Tag Archives: scala

Turn your toString methods inside out

All examples in this post will be written in a pseudo-dialect of Scala. Hopefully they should be easy to translate into your favourite programming language (or Java). I also haven’t bothered to compile any of them as they’re mostly not entirely valid. Feel free to point out errors.

Consider the following code:

class List[T]{
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = "[";

    while(it hasNext){
      result = result + (it next);
      if (it hasNext) result = result + ", ";
    }
    result + "]"
  }
}

What’s wrong with it?

Well, as you presumably know, concatenating two strings of length m and n is an O(m + n) operation (In Haskell or ML it would be an O(m) operation, so this can be made more efficient, but the basic point will still remain). This means we’ve accidentally made an O(n^2) toString algorithm. Oops.

So, the traditional response is:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = new StringBuilder();

    while(it hasNext){
      result.append(it next);
      if (it hasNext) result.append(", ");
    }
    result.append("]").toString;
  }
}

Great! We’ve removed all those expensive string concatenations.

Now, what happens if we call toString on a List[List[String]]? Umm…

Now, consider the following code snippet:

  println(myReallyLongList);

Let’s unpack what’s going on in it.

  val it = myReallyLongList.elements;
  var result = new StringBuilder();

  while(it hasNext){
    result.append(it next);
    if (it hasNext) result.append(", ");
  }
  println(result.append("]").toString);

So, we’ve created a big intermediate string via a StringBuilder, then printed it, discarding the string after that. Right?

Wouldn’t it be great if we’d written the following code instead?

  val it = myReallyLongList.elements;

  while(it hasNext){
    print(it next);
    if (it hasNext) print(", ");
  }
  println("]");

No intermediate structures created at all. And note that the code used to print is almost exactly the same as the code used to append to the StringBuilder.

Conveniently there’s a useful little interface in java.lang which people tend to ignore. If not, we’d have had to write wrappers. In particular this is a superclass of Writer, PrintStream, StringBuilder and StringBuffer. So, let’s rewrite the above code:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable){
    val it = this.elements;

    while(it hasNext){
      ap.append(... // err. What do we do here?

We could just do ap.append(it next toString). But that doesn’t solve the first problem – when we nest these things we’re creating a lot of intermediate strings and then immediately throwing them away, not to mention having once again introduced a hidden O(n^2) factor. Sadness. :(

Let’s do the following:

  trait Append{
    def appendTo(ap : Appendable) : Appendable;

    override def toString = appendTo(new java.lang.StringBuilder()) toString;    
  }

  object Appending{
    def append(any : AnyRef, ap : Appendable){
      if (any.isInstanceOf[Append]) any.asInstanceOf[Append].appendTo(ap);
      else ap.append(any toString)
    }
  }

Now we can write it as:

class List[T] extends Append{
  import Appending._;
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable) = {
    val it = this.elements;
    while(it hasNext){
      append(it next, ap);
      if (it hasNext) ap append(", ");
    }
    ap.append("]");
  }
}

Now, no matter how deeply we nest things, we’ll get things printed in a manner with completely consistent performance – no hidden gotchas.

There are also other benefits to structuring things this way. If you make everything work based on an API that looks like this you’ll tend to write things which work by injecting filters in reading and writing code. And, hey, suddenly all your code works completely transparently when you discover that you need to work with things that are e.g. read off the network, backed by something on the file system, etc. and really need a streaming version of the library.

Also note that I’m not saying “Strings are bad”. There are a lot of cases where what you need really is a persistently available string. Then, by all means, use toString! But even then this is helpful, as your toString code will work a lot better and more consistently than it might otherwise have done.

This entry was posted in programming and tagged , , , , on by .

I Aten’t Dead

Surgeon General’s Warning: This post contains an excess of hyperlinks. There is anecdotal evidence that excessive linking may be hazardous to your health.

I’m still here. :-) I’ve just been busy with my new job at Trampoline Systems and non-code things.

On the computer front, I’ve been learning (a bit) about spatial data structures and graph algorithms, mostly for work related reasons although also for personal interest. Tinkering with Haskell continues apace – I’ve been finding that it’s a very good language for thinking in, even if I don’t write anything big in it.

The Lazy Strings project may look like it died, but fear not! It continues. Ok, you probably didn’t care either way, but still it continues.

I’ve decided that a) Life is too short to write it in Java and that b) I should just pick an implementation and stick to it. Consequently I’ve moved to Scala, and have the basics of an implementation based on a finger tree, rather than the traditional balanced binary tree used in a rope.

Why a finger tree? Well, umm. Because. :-) Finger trees have nice performance characteristics, are easy to implement, and seem well suited to the task. The version I’m using is very heavily specialised to the task at hand, and measures a number of things up front (currently just length and hash code) to improve performance and allow for various nice optimisations.

The main thing to note is that I’ve been using scalacheck to test properties of the string. It’s been a great help. I’ve not found it that useful for actually tracking down the specifics of the bugs – its error reporting isn’t that great – but it’s been very useful for showing that they exist and providing enough of a general area that I can track them down myself. The fact that Scala has a REPL has been very useful in doing enough experimentation to pin it down.

The utility of these will come to no surprise to those functional programmers reading my blog. :-) Scalacheck isn’t quite as nice as Quickcheck and the Scala REPL isn’t as nice as most of the ML ones (it’s better than ghci though), but they’re both good enough, and it’s nice having these in Scala.

Scala itself I continue to have mixed feelings about. It’s a little too Java like. I very much like what it’s done with the object system (first class modules. Yay!), and about half of what it’s done with the type system, but the whole effect still feels kludgy to me. It’s definitely infinitely better than Java though, and doesn’t fall much short of being as pleasant as an ML (it’s better in some ways, worse in others).

Tail call optimisation in Scala

This is just a quick note. I couldn’t seem to find any good information on what sorts of tail call optimisations were performed by Scala so I ran a few quick tests. Disappointingly, the answer seems to be not much. A simple tail recursion was turned into a loop, but the following code got no love:


object Main extends Application{
def foo (x : Int){
if (x == Integer.MAX_VALUE) Console.println("Hello world!");
else bar(x + 1);
}

def bar (x : Int){
if (x == Integer.MAX_VALUE) Console.println("Hello world!");
else foo(x + 1);
}
foo(0);
}

This isn’t really surprising, although it’s a bit sad. Eliminating general tail calls seems to be quite hard without just converting everything into CPS and making everything work that way (at least so I’m lead to believe. My expertise on the subject is basically nonexistent), which is probably not a great idea on the JVM.

Curiously, if you enabled optimisations with -XO, foo was inlined into bar but the resulting tail recursion was not eliminated. That’s probably a bug.

Update: It’s been pointed out to me that I’ve gotten myself completely confused on the relationship between continuations and tail call elimination. Please ignore any mumbling to that effect. :-) The issue is apparently that TCO is easy when compiling to assembly and hard when compiling to something like JVM byte code which is higher level and doesn’t already support it.

This entry was posted in programming and tagged , on by .