Tag Archives: data structures

Turn your toString methods inside out

All examples in this post will be written in a pseudo-dialect of Scala. Hopefully they should be easy to translate into your favourite programming language (or Java). I also haven’t bothered to compile any of them as they’re mostly not entirely valid. Feel free to point out errors.

Consider the following code:

class List[T]{
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = "[";

    while(it hasNext){
      result = result + (it next);
      if (it hasNext) result = result + ", ";
    }
    result + "]"
  }
}

What’s wrong with it?

Well, as you presumably know, concatenating two strings of length m and n is an O(m + n) operation (In Haskell or ML it would be an O(m) operation, so this can be made more efficient, but the basic point will still remain). This means we’ve accidentally made an O(n^2) toString algorithm. Oops.

So, the traditional response is:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = new StringBuilder();

    while(it hasNext){
      result.append(it next);
      if (it hasNext) result.append(", ");
    }
    result.append("]").toString;
  }
}

Great! We’ve removed all those expensive string concatenations.

Now, what happens if we call toString on a List[List[String]]? Umm…

Now, consider the following code snippet:

  println(myReallyLongList);

Let’s unpack what’s going on in it.

  val it = myReallyLongList.elements;
  var result = new StringBuilder();

  while(it hasNext){
    result.append(it next);
    if (it hasNext) result.append(", ");
  }
  println(result.append("]").toString);

So, we’ve created a big intermediate string via a StringBuilder, then printed it, discarding the string after that. Right?

Wouldn’t it be great if we’d written the following code instead?

  val it = myReallyLongList.elements;

  while(it hasNext){
    print(it next);
    if (it hasNext) print(", ");
  }
  println("]");

No intermediate structures created at all. And note that the code used to print is almost exactly the same as the code used to append to the StringBuilder.

Conveniently there’s a useful little interface in java.lang which people tend to ignore. If not, we’d have had to write wrappers. In particular this is a superclass of Writer, PrintStream, StringBuilder and StringBuffer. So, let’s rewrite the above code:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable){
    val it = this.elements;

    while(it hasNext){
      ap.append(... // err. What do we do here?

We could just do ap.append(it next toString). But that doesn’t solve the first problem – when we nest these things we’re creating a lot of intermediate strings and then immediately throwing them away, not to mention having once again introduced a hidden O(n^2) factor. Sadness. :(

Let’s do the following:

  trait Append{
    def appendTo(ap : Appendable) : Appendable;

    override def toString = appendTo(new java.lang.StringBuilder()) toString;    
  }

  object Appending{
    def append(any : AnyRef, ap : Appendable){
      if (any.isInstanceOf[Append]) any.asInstanceOf[Append].appendTo(ap);
      else ap.append(any toString)
    }
  }

Now we can write it as:

class List[T] extends Append{
  import Appending._;
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable) = {
    val it = this.elements;
    while(it hasNext){
      append(it next, ap);
      if (it hasNext) ap append(", ");
    }
    ap.append("]");
  }
}

Now, no matter how deeply we nest things, we’ll get things printed in a manner with completely consistent performance – no hidden gotchas.

There are also other benefits to structuring things this way. If you make everything work based on an API that looks like this you’ll tend to write things which work by injecting filters in reading and writing code. And, hey, suddenly all your code works completely transparently when you discover that you need to work with things that are e.g. read off the network, backed by something on the file system, etc. and really need a streaming version of the library.

Also note that I’m not saying “Strings are bad”. There are a lot of cases where what you need really is a persistently available string. Then, by all means, use toString! But even then this is helpful, as your toString code will work a lot better and more consistently than it might otherwise have done.

This entry was posted in programming and tagged , , , , on by .

Data structures in Java

I’ve been playing with implementing some simple data structures in Java, for later use in the lazy strings project. The results are… a bit depressing, frankly. Clearly I need to learn to optimise better, as right now I’m creating specialised data structures for certain types which turn out to perform worse than the general purpose implementations in the standard library. :-)

In particular, I threw together a simple implementation of a Trie, and it performs significantly worse than just using a TreeMap. As the number of collisions increases the relative performance of the Trie improves somewhat, eventually beating the TreeMap, but still in most situations the performance is pretty poor.

It also doesn’t help that the Trie ends up bloating the keys significantly, as you end up having to create far too many objects per character. I can definitely cut back on the object creation, but even one object per character may prove to be too much to get an efficient implementation in Java.

This entry was posted in programming and tagged , , on by .