Showing posts with label java. Show all posts
Showing posts with label java. Show all posts

Thursday, September 12, 2013

Lazy Lists

Let's talk about laziness. No, let's not just talk about being lazy - let's do something about it. Let's be lazy.

There's a common pattern that I use in Android programming where I will create objects lazily, especially when these objects will not necessarily or frequently be needed at runtime. This is especially true for code that I write for the Android framework, where we like to be as lean as possible and leave more memory available for the system and the application.

Often, these objects are collections. My personal favorite is ArrayList (ever since I moved on years ago from the original, but somewhat crusty, Vector class), although I am also known to use Hashmap for those key/value pair situations (especially after I got used to ActionScript's paradigm of everything-is-a-hashmap).

Of course, you can't simply start poking into a lazily-created collection willy-nilly, unless you're a big fan of NullPointerExceptions. So you need to first check whether the thing is null, and then create it as necessary. Similarly, if you're removing an item from the collection, you might want to check if it's now empty and reset the field to null again.

None of this is difficult... but it is tedious. And it tends to bloat the code that you have to read and write and maintain all over the place. And it's exactly the kind of thing that is easy to get wrong because your mind just blurs over the repeated pattern like it's yet another pile of dirty laundry next to the bed.

So I wondered if there was a way I could encapsulate the behavior I wanted to make my laziness easier, simpler, and more readable. And it turns out there was such a way (or else this would be a very short article that would have ended right around... now).

But first, let's look at the problem a bit more, to understand what we're trying to fix.

I have a class called LazyLists with a couple of List fields and some methods for adding and removing items of various types. First, there are the fields:

List<Integer> intList = null;
List<Float> floatList = null;

Then there are add/remove fields for the int/float types I care about:

public void addItem(int item) {
    if (intList == null) {
        intList = new ArrayList();
    }
    if (!intList.contains(item)) {
        intList.add(item);
    }
}

public void removeItem(int item) {
    if (intList != null) {
        intList.remove((Object) item);
        if (intList.isEmpty()) {
            intList = null;
        }
    }
}

public void addItem(float item) {
    if (floatList == null) {
        floatList = new ArrayList();
    }
    if (!floatList.contains(item)) {
        floatList.add(item);
    }
}

public void removeItem(float item) {
    if (floatList != null) {
        floatList.remove(item);
        if (floatList.isEmpty()) {
            floatList = null;
        }
    }
}

There are a few things to notice about these methods:
  • There's all of that boilerplate code I mentioned before that lazily creates and nulls out the appropriate list based on the state of the list at the time. This is what we'd like to clean up, since this code is repeated as many times as we have to access these list fields.
  • I run a uniqueness check in the addItem() methods because it suits me; I only want to add unique items, not the same items over and over. That's kind of a detail that's specific to my situation, but produces more boilerplate that I'd love to get rid of.
  • There an interesting nuance to the int variation of removeItem(). Do you see it? It's the cast to Object prior to removing the item from intList. This is because of the awkward crossover between primitive types (int, float, etc.) and Object types (Integer, Float, etc.) in Java. There are actually two remove() methods on List, one that takes an int and one that takes an Integer. The one that takes an int removes the item at that index, whereas the Integer variant removes that item itself. That's a pretty huge distinction. And maybe it's well-known to you if you've worked with Lists and ints, but I hit it when working on this example, and thought it was interesting enough to call out.
Anyway, moving on.

We can call the methods above and produce lists that dynamically change with the items that we add/remove. For example, this code creates the class, adds items to the two lists, and removes those items:

LazyLists lists = new LazyLists();
lists.addItem(0);
lists.addItem(1f);
lists.removeItem(0);
lists.removeItem(1f);

Adding a bit of tracing code gives us this output:

starting lists = null, null
populated lists = [0], [1.0]
ending lists = null, null

So there's not too much cruft above, but I figure the second time I'm repeating the same code, I should think about refactoring it in a way that avoids the repetition. And it's easy to imagine that there might be several places in real code that wants to add/remove items, or several different types going into several different types of collections. Then it's easy to see the little bits of repeated code above bloating into more than you might want to manage in the long haul.

There are various ways that you could do this, depending on the collection(s) you want to support, the extra logic you'd like (like my requirement for uniqueness in the lists), and stylistic elements about static methods, etc. But here's what I wound up with:

public class LazyListManager {

    public static  List add(List list, T item) {
        if (list == null) {
            list = new ArrayList();
            list.add(item);
        } else if (!list.contains(item)) {
            list.add(item);
        }
        return list;
    }

    public static  List remove(List list, T item) {
        if (list != null) {
            list.remove(item);
            if (list.isEmpty()) {
                list = null;
            }
        }
        return list;
    }
}

This simple class has two static methods on it to support adding and removing from an arbitrary List object. As needed, it will create a new List (actually, an ArrayList, but that's an implementation detail). It will check for uniqueness in the add() method, check for nullness in the remove() method, and null out an empty list in remove() as appropriate.

There is an important piece here that makes this work - callers must supply their target list as both a parameter to the function and as the recipient of the return value; this is what makes it possible for these utility methods to allocate or null-out the list as appropriate (since they do not have access to the original list, but only have a reference to it).

Given these two static utility methods, we can now write new addItem() and removeItem() methods that are a significantly better (you can't get less than one line of code, unless I missed that part in my CS education):

public void addItemBetter(int item) {
    intList = LazyListManager.add(intList, item);
}

public void removeItemBetter(int item) {
    intList = LazyListManager.remove(intList, item);
}

public void addItemBetter(float item) {
    floatList = LazyListManager.add(floatList, item);
}

public void removeItemBetter(float item) {
    floatList = LazyListManager.remove(floatList, item);
}

Calling these methods looks remarkably similar to what we saw before:

lists.addItemBetter(0);
lists.addItemBetter(1f);
lists.removeItemBetter(0);
lists.removeItemBetter(1f);

and results in exactly the same output (which shouldn't be a surprise. If the results were different, this approach wouldn't be a utility as much as a bug):

starting lists = null, null
populated lists = [0], [1.0]
ending lists = null, null
populated lists = [0], [1.0]
ending lists = null, null

The LazyListManager class has taken out all of the tedious boilerplate related to null checks, uniqueness, allocation, and nullification, and has left us with just one line of code to write whenever we want to add or remove items to/from one of our lists. That's just about the right amount of code for me to write without making a typo or a copy/paste error along the way.

If this were public API, I could envision the manager class offering various different kinds of collections and options, or maybe wrapping more of the capabilities of collections classes (like isEmpty() or contains()). But for now, it's a nice little internal class that can help me simplify my code whenever I need to use this lazy-allocation pattern.

All of the interesting code is inline above, but if you want the two source files, you can download them here.


Thursday, March 28, 2013

For API Nerds: Interfaces and Inner Classes

This article is for API developers only. If you're writing an application, you may not care about APIs, because you probably don't have any. But if you're writing a library that others will use, public API comes into play.

API is like a pile of laundry: you can either spread the huge, reeking mess out all over your floor, or you can stuff it into a nice, clean hamper. While the first approach provides the convenience of being able to select each day's pre-soiled attire quickly, since it is arrayed out in front of you when you get out of bed, the second solution presents a clean interface to people that make the mistake of walking into your pathetic, slovenly life.

Good APIs prefer hampers.

I was implementing a new piece of API and functionality and wondered the following: Is there any reason to not put my implementation (which I did not want in the public API, and which would be the only implementation of said interface) inside the interface itself? It seems odd, perhaps, maybe even a bit tawdry, but is there anything wrong with it?

My motivation was simple: I'm adding this API and implementation into a package that already has many classes to wade through. Should I bother adding more noise to the list for an implementation detail of this single interface? Why not put it into the interface file, and just bundle up that implementation in a logical place, tightly bound to the interface that it implements?

So I did this, coming up with something like the following:

public interface A {
    void a();

    static class AImpl implements A {

        @Override
        public void a() {
            // ...
       }
   }
}

Additionally, an existing class exposed a single method giving a reference to this interface:

public A getA() {
    return new AImpl();
}

This worked well - the users of the public getA() method got what they needed: an object of type A with its spectacular, if slightly under-documented, method a(). And I successfully hid the implementation class inside of this same file, as a package-private static class, saving my package the unbearable burden of yet another file.

Done!

Then I ran JavaDocs on my project and realized my mistake: my supposedly package-private implementation class was now part of the public API, showing up as the public class A.AImpl. What th-... I didn't say it was public! In fact, I explicitly made it package-private, so that the class exposing the new getA() method could instantiate an instance of that class. So what happened?

Interfaces happened. Interfaces do not use the same access rules as classes. Instead, all members of an interface are public by default. So while I used the correct (to my mind) syntax for package-private access, I was actually using the correct (to the mind of my interface) syntax for declaring that inner class public, and the JavaDocs did the rest.

Do I hear you yelling, "You could make it private!?" Or is that just the echo of my internal shouting when I first saw the problem? This is what I tried to do. This fails for (at least) two reasons. One is that I actually needed this class to be package-private (not private), so that I could instantiate it and override it from outside of this interface. An even better reason is that you cannot declare different access permissions than the default that the interface uses. In this case, that means that I cannot have a public interface with a private inner class. I could declare the entire interface to be private... but that defeats the whole thing I was going for by exposing the interface A as a public API.

There are other ways around this. For example, there is a mechanism we use in Android framework code, @hide, to solve the problem of having to expose API internally but not wanting it to be a part of the public API (a workaround for the language not having the ability to differentiate between internal and external access to library APIs). But at the point where I considered using this workaround, the awkwardness of putting the class inside of the interface just got to be too much.

In the end, I just pulled the class out and put it at the same level as A. It added another file to the package, but that really wasn't that big a deal anyway. And it was certainly better than the mess I was creating with my class-inside-interface approach.

The moral of the story is: API design is tricky. Consider not only the internal implementation details ("Can I make my life easier by implementing the solution in this particular way?"), but also (and wayyyyy more importantly), "Can I make the API, and therefore the lives of external developers, better by doing it in this other way?"

Here is a technical diagram, illustrating the problem and the original solution:



Sunday, August 1, 2010

Varargh!

Varargs, a feature of the C language since roughly the late Victorian era, was introduced into the Java language in JDK1.5.

I love varargs. They allow me to declare a function flexibly with the ability to take zero, one, or several parameters. This ability is useful when the user may have an unpredictable number of things to add to some data structure, eliminating the need for them to create some collection to pass the parameters into your function. It makes for a nice API. And I'm all about nice API. (And donuts).

But sometimes varargs don’t work so well. I suppose it’s because I expect too much of them. Like when you meet someone that you really like, so you call them every few minutes or so, then hang around outside their apartment and workplace and friends' houses until they get a restraining order on you. It’s not that they weren’t really cool and worth getting to know, but that your needs weren't necessarily compatible.

I was working recently on some API improvements for a library I'm writing. I had some constructors that users would call with exactly two values of different types, like this:

  public MyObject(int value1, int value2) {}

  public MyObject(float value1, float value2) {}

This worked well, vectoring the code path off in different directions depending on the types of the values that the user passed in.

But sometimes, the users might have just one value to pass in. Or three. Or nineteen. Rather than having variations for all of these situations, multiplied by the number of types that I wanted to support, I wanted something simpler that took variable numbers of parameters.

So I rewrote my API with varargs. It looked something like this:

  public MyObject(int... values) {}

  public MyObject(float... values) {}

This new version was awesome. Now the user could call my functions with the appropriate values and it would do the right thing, no matter what the types were or how many values they passed in. I compiled the code for the library, wrote my test code:

MyObject obj1 = new MyObject(1f);
MyObject obj2 = new MyObject(1);

and

FAIL

I got a compiler error. Specifically, Eclipse gave me the following completely unhelpful message for the test code: “The constructor MyObject(float[]) is ambiguous” for the second line of that test code (MyObject obj2 = new MyObject(1)).

Never mind the fact that the error was telling me that the constructor takes an array instead of varargs; I know that varargs is syntactic sugar for a construct that gets turned into an array. The real problem was that my code wouldn’t compile.

Apparently, varargs cannot handle the method overloading situation and make the correct type inference and just fails to compile. Although the compiler can make a good decision about the correctness of a constructor with float vs. int, it can’t make that decision for float[] vs int[]. So it bails.

Meanwhile, I discovered another nuance of using varags; you can’t pass them around between constructors willy-nilly. One of the things I wanted to do in my rewrite was to record the type that the user wanted to use and send it to a private constructor like this:

private MyObject(Class valueType, Object... values) {}

Meanwhile, I wrote a single type-specific constructor (to avoid the previous compiler error for now) that called this more generic constructor:

public MyObject(float... values) {
 this(float.class, values);
}

Again, my code compiled fine. And when I eliminated one of the float/int varargs combinations, I was able to write a test that successfully called my varargs constructor:

MyObject blah = new MyObject(1f, 2f, 3f);

This code called the private Object-based constructor above. But then I encountered bugs at runtime where the number of values was not what I was expected. In particular, the number of values being stored by the private constructor was 1, not 3.

But, but, but, ...

It turns out that the call to the private constructor turned a perfectly fine threesome of parameters of type float into a single parameter of type float[]. That is, my varargs had just turned into the array that it was pretending it wasn't when I first wrote the code.

At this point in an article, the climax as it were, I would typically say, “And here’s how you work around these issues.” I would love to do that, because I like neat tricks and workarounds. And I’d love to actually have that workaround in my code.

Unfortunately, I have no such workaround. I frankly don’t know how to make this work in any reasonable way. You can be more clear with the compiler, by telling it things like new MyObject(new float[](1f, 2f, 3f}) to get around the compiler error. And maybe you even like the way that that looks for an API. If you are sadistic. Or hallucinating. You could also find a different way to store the individual parameters in the varargs rather than having the VM think that it’s just a single array item. But at this point, my original attempt at a more attractive API and flexible implementation was looking like a bad prototype or an industrial accident.

So I did the only thing that any responsible captain would do after his vessel had hit an iceberg and was going down with all hands: abandon ship on the first lifeboat.

For my situation, I could limit the flexibility to just one or two parameters for the main case, and then have a completely different constructor with a custom data structure for the more flexible, and less common, case of having several parameters. So I walked away from varargs completely for my code with nary a look back. The second problem of calling the private constructor with varags then magically disappeared because I no longer had to pass the varargs around.

For other situations that may benefit from varags, but may run afoul of method overloading, I can only recommend that you know and understand the limitations of the nifty varargs feature. I searched around on the web for answers to my compiler problem and didn’t find anything immediately, so I thought it might be worth noting for posterity. Or at least for the pleasure of venting.

Varargs: great when they work. But when then don’t? Nobody knows. Maybe that's why the syntax uses ellipses. It's like a little language joke that says, "Sure, use varargs, Let's see what happens..."