Category Archives: programming

Java performance tip: Think about your container sizes

Take a look at this javadoc entry and then think about what happens if you create many thousands of HashMaps with only 1 or 2 entries in them.

Our work application has a very large number of objects to which you can attach attributes. We use HashMaps for these. Many of them have no attributes and were just getting their HashMap with a new HashMap() call. In order to investigate just why our application was eating up so much memory (relatively speaking. It still isn’t competing with netbeans, firefox, etc). I opened it up in the profiler, saw that of that memory usage the lion’s share of it was from HashMaps. I’ve now changed the code so they’re being given an initial capacity of 1. Things are much happier now.

This is basically just a heads up – if you’re using a lot of a container type which lets you specify an initial capacity (HashMap, HashSet and ArrayList, for example), give it at least some thought rather than using the default. If you only have a few or they’re purely transient it doesn’t really matter. Even if they’re many and long lived you don’t need to worry about getting it exactly right, but a little tuning can save you from a lot of the memory bloat that is common in Java applications.

This entry was posted in programming and tagged , on by .

Turn your toString methods inside out

All examples in this post will be written in a pseudo-dialect of Scala. Hopefully they should be easy to translate into your favourite programming language (or Java). I also haven’t bothered to compile any of them as they’re mostly not entirely valid. Feel free to point out errors.

Consider the following code:

class List[T]{
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = "[";

    while(it hasNext){
      result = result + (it next);
      if (it hasNext) result = result + ", ";
    }
    result + "]"
  }
}

What’s wrong with it?

Well, as you presumably know, concatenating two strings of length m and n is an O(m + n) operation (In Haskell or ML it would be an O(m) operation, so this can be made more efficient, but the basic point will still remain). This means we’ve accidentally made an O(n^2) toString algorithm. Oops.

So, the traditional response is:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = new StringBuilder();

    while(it hasNext){
      result.append(it next);
      if (it hasNext) result.append(", ");
    }
    result.append("]").toString;
  }
}

Great! We’ve removed all those expensive string concatenations.

Now, what happens if we call toString on a List[List[String]]? Umm…

Now, consider the following code snippet:

  println(myReallyLongList);

Let’s unpack what’s going on in it.

  val it = myReallyLongList.elements;
  var result = new StringBuilder();

  while(it hasNext){
    result.append(it next);
    if (it hasNext) result.append(", ");
  }
  println(result.append("]").toString);

So, we’ve created a big intermediate string via a StringBuilder, then printed it, discarding the string after that. Right?

Wouldn’t it be great if we’d written the following code instead?

  val it = myReallyLongList.elements;

  while(it hasNext){
    print(it next);
    if (it hasNext) print(", ");
  }
  println("]");

No intermediate structures created at all. And note that the code used to print is almost exactly the same as the code used to append to the StringBuilder.

Conveniently there’s a useful little interface in java.lang which people tend to ignore. If not, we’d have had to write wrappers. In particular this is a superclass of Writer, PrintStream, StringBuilder and StringBuffer. So, let’s rewrite the above code:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable){
    val it = this.elements;

    while(it hasNext){
      ap.append(... // err. What do we do here?

We could just do ap.append(it next toString). But that doesn’t solve the first problem – when we nest these things we’re creating a lot of intermediate strings and then immediately throwing them away, not to mention having once again introduced a hidden O(n^2) factor. Sadness. :(

Let’s do the following:

  trait Append{
    def appendTo(ap : Appendable) : Appendable;

    override def toString = appendTo(new java.lang.StringBuilder()) toString;    
  }

  object Appending{
    def append(any : AnyRef, ap : Appendable){
      if (any.isInstanceOf[Append]) any.asInstanceOf[Append].appendTo(ap);
      else ap.append(any toString)
    }
  }

Now we can write it as:

class List[T] extends Append{
  import Appending._;
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable) = {
    val it = this.elements;
    while(it hasNext){
      append(it next, ap);
      if (it hasNext) ap append(", ");
    }
    ap.append("]");
  }
}

Now, no matter how deeply we nest things, we’ll get things printed in a manner with completely consistent performance – no hidden gotchas.

There are also other benefits to structuring things this way. If you make everything work based on an API that looks like this you’ll tend to write things which work by injecting filters in reading and writing code. And, hey, suddenly all your code works completely transparently when you discover that you need to work with things that are e.g. read off the network, backed by something on the file system, etc. and really need a streaming version of the library.

Also note that I’m not saying “Strings are bad”. There are a lot of cases where what you need really is a persistently available string. Then, by all means, use toString! But even then this is helpful, as your toString code will work a lot better and more consistently than it might otherwise have done.

This entry was posted in programming and tagged , , , , on by .

Ooooooh. Shiny!

When I built this computer I bought two very nice monitors to go with it.

I then promptly discovered that I had *no idea* how to make two monitors work under linux. X configuration is dark magic to me. Consequently the second monitor got shunted to my other machine and I’ve had a spare old monitor kicking around for a while.

However, my computer recently died and I had to reinstall, and in doing so and poking around with video settings I discovered that the “nvidia-settings” program includes support for making multiple monitors work flawlessly. A few quick clicks and I had the two monitors working side by side very nicely. Yay!

I do however note that Gnome doesn’t handle them that well. This will probably induce me to finally get off my ass and try a better window manager. I tried Enlightenment earlier, but E16 didn’t work, the repository version of E17 is kinda unstable (not to mention Enlightenment is *really confusing*, but I’m sure I’d get used to it) and I haven’t quite worked up the courage/botheredness to install the CVS version yet.

Anyone know what a good window manager is for use with xinerama? Should I be giving XMonad a try?

This entry was posted in programming and tagged , on by .

Best error message ever

“The program ‘apt-get’ is not currently installed. You can install it by typing ‘apt-get install apt'”

This entry was posted in programming and tagged on by .

I Aten’t Dead

Surgeon General’s Warning: This post contains an excess of hyperlinks. There is anecdotal evidence that excessive linking may be hazardous to your health.

I’m still here. :-) I’ve just been busy with my new job at Trampoline Systems and non-code things.

On the computer front, I’ve been learning (a bit) about spatial data structures and graph algorithms, mostly for work related reasons although also for personal interest. Tinkering with Haskell continues apace – I’ve been finding that it’s a very good language for thinking in, even if I don’t write anything big in it.

The Lazy Strings project may look like it died, but fear not! It continues. Ok, you probably didn’t care either way, but still it continues.

I’ve decided that a) Life is too short to write it in Java and that b) I should just pick an implementation and stick to it. Consequently I’ve moved to Scala, and have the basics of an implementation based on a finger tree, rather than the traditional balanced binary tree used in a rope.

Why a finger tree? Well, umm. Because. :-) Finger trees have nice performance characteristics, are easy to implement, and seem well suited to the task. The version I’m using is very heavily specialised to the task at hand, and measures a number of things up front (currently just length and hash code) to improve performance and allow for various nice optimisations.

The main thing to note is that I’ve been using scalacheck to test properties of the string. It’s been a great help. I’ve not found it that useful for actually tracking down the specifics of the bugs – its error reporting isn’t that great – but it’s been very useful for showing that they exist and providing enough of a general area that I can track them down myself. The fact that Scala has a REPL has been very useful in doing enough experimentation to pin it down.

The utility of these will come to no surprise to those functional programmers reading my blog. :-) Scalacheck isn’t quite as nice as Quickcheck and the Scala REPL isn’t as nice as most of the ML ones (it’s better than ghci though), but they’re both good enough, and it’s nice having these in Scala.

Scala itself I continue to have mixed feelings about. It’s a little too Java like. I very much like what it’s done with the object system (first class modules. Yay!), and about half of what it’s done with the type system, but the whole effect still feels kludgy to me. It’s definitely infinitely better than Java though, and doesn’t fall much short of being as pleasant as an ML (it’s better in some ways, worse in others).