Tag Archives: java

Java performance tip: Think about your container sizes

Take a look at this javadoc entry and then think about what happens if you create many thousands of HashMaps with only 1 or 2 entries in them.

Our work application has a very large number of objects to which you can attach attributes. We use HashMaps for these. Many of them have no attributes and were just getting their HashMap with a new HashMap() call. In order to investigate just why our application was eating up so much memory (relatively speaking. It still isn’t competing with netbeans, firefox, etc). I opened it up in the profiler, saw that of that memory usage the lion’s share of it was from HashMaps. I’ve now changed the code so they’re being given an initial capacity of 1. Things are much happier now.

This is basically just a heads up – if you’re using a lot of a container type which lets you specify an initial capacity (HashMap, HashSet and ArrayList, for example), give it at least some thought rather than using the default. If you only have a few or they’re purely transient it doesn’t really matter. Even if they’re many and long lived you don’t need to worry about getting it exactly right, but a little tuning can save you from a lot of the memory bloat that is common in Java applications.

This entry was posted in programming and tagged , on by .

Turn your toString methods inside out

All examples in this post will be written in a pseudo-dialect of Scala. Hopefully they should be easy to translate into your favourite programming language (or Java). I also haven’t bothered to compile any of them as they’re mostly not entirely valid. Feel free to point out errors.

Consider the following code:

class List[T]{
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = "[";

    while(it hasNext){
      result = result + (it next);
      if (it hasNext) result = result + ", ";
    }
    result + "]"
  }
}

What’s wrong with it?

Well, as you presumably know, concatenating two strings of length m and n is an O(m + n) operation (In Haskell or ML it would be an O(m) operation, so this can be made more efficient, but the basic point will still remain). This means we’ve accidentally made an O(n^2) toString algorithm. Oops.

So, the traditional response is:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = new StringBuilder();

    while(it hasNext){
      result.append(it next);
      if (it hasNext) result.append(", ");
    }
    result.append("]").toString;
  }
}

Great! We’ve removed all those expensive string concatenations.

Now, what happens if we call toString on a List[List[String]]? Umm…

Now, consider the following code snippet:

  println(myReallyLongList);

Let’s unpack what’s going on in it.

  val it = myReallyLongList.elements;
  var result = new StringBuilder();

  while(it hasNext){
    result.append(it next);
    if (it hasNext) result.append(", ");
  }
  println(result.append("]").toString);

So, we’ve created a big intermediate string via a StringBuilder, then printed it, discarding the string after that. Right?

Wouldn’t it be great if we’d written the following code instead?

  val it = myReallyLongList.elements;

  while(it hasNext){
    print(it next);
    if (it hasNext) print(", ");
  }
  println("]");

No intermediate structures created at all. And note that the code used to print is almost exactly the same as the code used to append to the StringBuilder.

Conveniently there’s a useful little interface in java.lang which people tend to ignore. If not, we’d have had to write wrappers. In particular this is a superclass of Writer, PrintStream, StringBuilder and StringBuffer. So, let’s rewrite the above code:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable){
    val it = this.elements;

    while(it hasNext){
      ap.append(... // err. What do we do here?

We could just do ap.append(it next toString). But that doesn’t solve the first problem – when we nest these things we’re creating a lot of intermediate strings and then immediately throwing them away, not to mention having once again introduced a hidden O(n^2) factor. Sadness. :(

Let’s do the following:

  trait Append{
    def appendTo(ap : Appendable) : Appendable;

    override def toString = appendTo(new java.lang.StringBuilder()) toString;    
  }

  object Appending{
    def append(any : AnyRef, ap : Appendable){
      if (any.isInstanceOf[Append]) any.asInstanceOf[Append].appendTo(ap);
      else ap.append(any toString)
    }
  }

Now we can write it as:

class List[T] extends Append{
  import Appending._;
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable) = {
    val it = this.elements;
    while(it hasNext){
      append(it next, ap);
      if (it hasNext) ap append(", ");
    }
    ap.append("]");
  }
}

Now, no matter how deeply we nest things, we’ll get things printed in a manner with completely consistent performance – no hidden gotchas.

There are also other benefits to structuring things this way. If you make everything work based on an API that looks like this you’ll tend to write things which work by injecting filters in reading and writing code. And, hey, suddenly all your code works completely transparently when you discover that you need to work with things that are e.g. read off the network, backed by something on the file system, etc. and really need a streaming version of the library.

Also note that I’m not saying “Strings are bad”. There are a lot of cases where what you need really is a persistently available string. Then, by all means, use toString! But even then this is helpful, as your toString code will work a lot better and more consistently than it might otherwise have done.

This entry was posted in programming and tagged , , , , on by .

I Aten’t Dead

Surgeon General’s Warning: This post contains an excess of hyperlinks. There is anecdotal evidence that excessive linking may be hazardous to your health.

I’m still here. :-) I’ve just been busy with my new job at Trampoline Systems and non-code things.

On the computer front, I’ve been learning (a bit) about spatial data structures and graph algorithms, mostly for work related reasons although also for personal interest. Tinkering with Haskell continues apace – I’ve been finding that it’s a very good language for thinking in, even if I don’t write anything big in it.

The Lazy Strings project may look like it died, but fear not! It continues. Ok, you probably didn’t care either way, but still it continues.

I’ve decided that a) Life is too short to write it in Java and that b) I should just pick an implementation and stick to it. Consequently I’ve moved to Scala, and have the basics of an implementation based on a finger tree, rather than the traditional balanced binary tree used in a rope.

Why a finger tree? Well, umm. Because. :-) Finger trees have nice performance characteristics, are easy to implement, and seem well suited to the task. The version I’m using is very heavily specialised to the task at hand, and measures a number of things up front (currently just length and hash code) to improve performance and allow for various nice optimisations.

The main thing to note is that I’ve been using scalacheck to test properties of the string. It’s been a great help. I’ve not found it that useful for actually tracking down the specifics of the bugs – its error reporting isn’t that great – but it’s been very useful for showing that they exist and providing enough of a general area that I can track them down myself. The fact that Scala has a REPL has been very useful in doing enough experimentation to pin it down.

The utility of these will come to no surprise to those functional programmers reading my blog. :-) Scalacheck isn’t quite as nice as Quickcheck and the Scala REPL isn’t as nice as most of the ML ones (it’s better than ghci though), but they’re both good enough, and it’s nice having these in Scala.

Scala itself I continue to have mixed feelings about. It’s a little too Java like. I very much like what it’s done with the object system (first class modules. Yay!), and about half of what it’s done with the type system, but the whole effect still feels kludgy to me. It’s definitely infinitely better than Java though, and doesn’t fall much short of being as pleasant as an ML (it’s better in some ways, worse in others).

How to create sealed classes in Java

As far as I can tell, no one else seems to have spotted this trick, so I thought I’d share my latest corruption of the Java language:


public class Sealed
{
private Sealed() { }

public final static class Foo extends Sealed {
public Foo() { }
}
public final static class Bar extends Sealed { }
}

What does this code do? It creates a class Sealed which is *not* final, but only has a finite number of subclasses which you determine at compile time (you can make those subclasses themselves extensible if you want, although I declared them final above). No one else can subclass it because they’re not able to call the super constructor as it’s not visible outside of this class.

This entry was posted in programming and tagged on by .

Good APIs

At least in the Java world[1], good APIs are very rare. The vast majority of the APIs I’ve used in Java are at best mediocre, and I’ve run into some real stinkers (No fingerpointing here, but a lot of them are even in the standard library!).

Part of this is the fault of the language. Java code… tends to be ugly. It’s not a language which lends itself to really flexible syntax, and so you have to work really hard to produce powerful abstractions which are actually nice to use.

The only APIs I’ve encountered so far which have really made me sit up and go “Wow, that’s well designed” are Google Guice and Joda time. They have well thought out class hierarchies, good object oriented style[2], sensible use of fluent interfaces / method chaining and just generally well thought out interfaces and names.

They’re not perfect by any means, but they show promise that it really is possible to write good APIs for Java.

Anyone else know of other similarly well designed libraries?

[1] And, unfortunately, I lack much in the way of non-trivially large development in other languages. The Parsec API is nice, ‘though it gives me a headache sometimes. Haskell libraries in general look rather pretty, though I suspect that’s more a function of the language than the API design. I’m not really experienced enough in it to judge good API design.

[2] I’m not an OO fanatic. It has advantages and disadvantages, and sometimes you just want to use a different approach (I rather like FP for example). But one thing I’ve observed is that if you do it right, OO can have the effect of producing some astonishingly readable code. I don’t know either well/at all really, but this style of design seems much more common in Ruby, smalltalk, etc.

This entry was posted in programming and tagged , , , on by .