Tag Archives: lazystrings

I Aten’t Dead

Surgeon General’s Warning: This post contains an excess of hyperlinks. There is anecdotal evidence that excessive linking may be hazardous to your health.

I’m still here. :-) I’ve just been busy with my new job at Trampoline Systems and non-code things.

On the computer front, I’ve been learning (a bit) about spatial data structures and graph algorithms, mostly for work-related reasons although also for personal interest. Tinkering with Haskell continues apace – I’ve been finding that it’s a very good language for thinking in, even if I don’t write anything big in it.

The Lazy Strings project may look like it died, but fear not! It continues. Ok, you probably didn’t care either way, but still it continues.

I’ve decided that (a) life is too short to write it in Java, and (b) I should just pick an implementation and stick to it. Consequently I’ve moved to Scala, and have the basics of an implementation based on a finger tree, rather than the traditional balanced binary tree used in a rope.

Why a finger tree? Well, umm. Because. :-) Finger trees have nice performance characteristics, are easy to implement, and seem well suited to the task. The version I’m using is very heavily specialised to the task at hand, and measures a number of things up front (currently just length and hash code) to improve performance and allow for various nice optimisations.
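To illustrate the “measure things up front” idea, here’s a hypothetical Java sketch (the names Chunk and Concat are mine, not the project’s): a concatenation node can compute its length and hash code at construction time from its children’s cached values, because Java’s String.hashCode is a polynomial in base 31 and so composes as h(a ++ b) = h(a) · 31^|b| + h(b).

```java
// Hypothetical sketch (names are mine, not the project's): a rope-style
// concatenation node that measures length and hash code up front.
abstract class Node {
    abstract int length();
    abstract int hash();  // kept compatible with String.hashCode()

    abstract void writeTo(StringBuilder out);

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder(length());
        writeTo(sb);
        return sb.toString();
    }
}

final class Chunk extends Node {
    final String s;
    Chunk(String s) { this.s = s; }
    int length() { return s.length(); }
    int hash() { return s.hashCode(); }
    void writeTo(StringBuilder out) { out.append(s); }
}

final class Concat extends Node {
    final Node left, right;
    final int length, hash;  // measured once, at construction

    Concat(Node left, Node right) {
        this.left = left;
        this.right = right;
        this.length = left.length() + right.length();
        // String.hashCode is a base-31 polynomial, so
        // h(a ++ b) = h(a) * 31^|b| + h(b): no rescan needed.
        this.hash = left.hash() * pow31(right.length()) + right.hash();
    }

    int length() { return length; }
    int hash() { return hash; }
    void writeTo(StringBuilder out) { left.writeTo(out); right.writeTo(out); }

    // O(n) here for clarity; a real version would cache or square-and-multiply.
    private static int pow31(int n) {
        int r = 1;
        for (int i = 0; i < n; i++) r *= 31;
        return r;
    }
}
```

Overflow wraps mod 2³² in both the direct and the composed computation, so the identity holds exactly.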

The main thing to note is that I’ve been using ScalaCheck to test properties of the string. It’s been a great help. I’ve not found it that useful for actually tracking down the specifics of the bugs – its error reporting isn’t that great – but it’s been very useful for showing that they exist and providing enough of a general area that I can track them down myself. The fact that Scala has a REPL has been very useful in doing enough experimentation to pin them down.

The utility of these will come as no surprise to the functional programmers reading my blog. :-) ScalaCheck isn’t quite as nice as QuickCheck and the Scala REPL isn’t as nice as most of the ML ones (it’s better than ghci though), but they’re both good enough, and it’s nice having them in Scala.
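For anyone unfamiliar with property-based testing, here’s a hand-rolled Java sketch of the idea ScalaCheck automates: generate random inputs, assert a general law, and report the first counterexample. Plain String concatenation stands in for the structure under test; all the names here are illustrative.

```java
import java.util.Random;

// Hand-rolled property check (illustrative; ScalaCheck does all of this
// for you, plus shrinking counterexamples down to minimal ones).
class PropertyCheck {
    static String randomString(Random rng) {
        int len = rng.nextInt(20);
        StringBuilder sb = new StringBuilder(len);
        for (int i = 0; i < len; i++) sb.append((char) ('a' + rng.nextInt(26)));
        return sb.toString();
    }

    // Returns null if the property held for every trial, otherwise a
    // description of the first failing inputs.
    static String checkConcatProperty(int trials) {
        Random rng = new Random(42);  // fixed seed: failures stay reproducible
        for (int i = 0; i < trials; i++) {
            String a = randomString(rng), b = randomString(rng);
            String c = a + b;  // stand-in for the string structure under test
            boolean ok = c.length() == a.length() + b.length()
                    && c.startsWith(a) && c.endsWith(b);
            if (!ok) return "failed for a=\"" + a + "\", b=\"" + b + "\"";
        }
        return null;
    }
}
```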

Scala itself I continue to have mixed feelings about. It’s a little too Java like. I very much like what it’s done with the object system (first class modules. Yay!), and about half of what it’s done with the type system, but the whole effect still feels kludgy to me. It’s definitely infinitely better than Java though, and doesn’t fall much short of being as pleasant as an ML (it’s better in some ways, worse in others).

Data structures in Java

I’ve been playing with implementing some simple data structures in Java, for later use in the lazy strings project. The results are… a bit depressing, frankly. Clearly I need to learn to optimise better, as right now I’m creating specialised data structures for certain types which turn out to perform worse than the general purpose implementations in the standard library. :-)

In particular, I threw together a simple implementation of a Trie, and it performs significantly worse than just using a TreeMap. As the number of collisions increases the relative performance of the Trie improves somewhat, eventually beating the TreeMap, but still in most situations the performance is pretty poor.

It also doesn’t help that the Trie ends up bloating the keys significantly, as you end up having to create far too many objects per character. I can definitely cut back on the object creation, but even one object per character may prove to be too much to get an efficient implementation in Java.
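For concreteness, here’s a minimal trie along the lines described – one node object per character of key, which is exactly where the allocation bloat comes from. This is an illustrative sketch, not the implementation discussed above.

```java
import java.util.TreeMap;

// Illustrative trie, not the implementation above: note the allocation
// cost baked into the design - one child node per character of key.
class Trie<V> {
    private final TreeMap<Character, Trie<V>> children = new TreeMap<>();
    private V value;  // null when no key ends at this node

    void put(String key, V v) {
        Trie<V> node = this;
        for (int i = 0; i < key.length(); i++)
            // a fresh node allocated for every character not yet seen
            node = node.children.computeIfAbsent(key.charAt(i), c -> new Trie<>());
        node.value = v;
    }

    V get(String key) {
        Trie<V> node = this;
        for (int i = 0; i < key.length() && node != null; i++)
            node = node.children.get(key.charAt(i));
        return node == null ? null : node.value;
    }
}
```

Inserting a 10-character key can allocate up to 10 nodes (each with its own TreeMap), where a TreeMap<String, V> would allocate a single entry.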

This entry was posted in programming.

I fail miserably at performance testing

I’ve now standardised my test cases to give a consistent test for each of the three options I’m playing with:

Beginning test cases...

Testing name...

   append reader: StringBuilderAdapter took 11 millis.
   toString: StringBuilderAdapter took 2 millis.
   First run for StringBuilderAdapter took 13 millis.
   Second append: StringBuilderAdapter took 5 millis
   toString: StringBuilderAdapter took 9 millis.
   Second run for StringBuilderAdapter took 14 millis.

Testing name...

   append reader: JumpRope took 6 millis.
   toString: JumpRope took 20 millis.
   First run for JumpRope took 26 millis.
   Second append: JumpRope took 0 millis
   toString: JumpRope took 7 millis.
   Second run for JumpRope took 7 millis.

Testing name...

   append reader: TrefoilString took 11 millis.
   toString: TrefoilString took 18 millis.
   First run for TrefoilString took 29 millis.
   Second append: TrefoilString took 0 millis
   toString: TrefoilString took 8 millis.
   Second run for TrefoilString took 8 millis.

These numbers… are a little different than I’d previously believed. They shouldn’t be taken as definitive though – annoying things like “If I change the order of the tests around then the JIT optimises differently and the numbers change” and other random factors make them very variable.

Still, interesting points of note:

I’m a lot closer to StringBuilder level performance than I’d previously believed. The adapter slows it down a little, but these numbers were pretty close to what I’d been seeing with StringBuilder before – it’s the others that are much lower.

For lots of small appends, StringBuilder is about twice as fast as my implementations (the numbers are really really variable). For single large appends the situation is reversed.

There’s not a lot of performance difference between my dirty-but-supposedly-faster implementation and my nice high-level one. It turns out allocation is cheap after all.

Lessons to take out of this: I suck at performance testing and I need to set up some really extensive test suites if I want to get good performance numbers. Also, I should stick to a nice high level implementation and only worry about performance as a sanity check every now and then. Ho hum.
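For what it’s worth, here’s a sketch of the minimum warm-up discipline that would make numbers like the above more trustworthy: run the workload a few times untimed so the JIT settles, then collect several timed runs and look at the spread rather than a single figure. A serious harness would do much more; this just shows the shape.

```java
// Minimal warm-up-then-measure shape (a real benchmark harness does far
// more: forking fresh VMs, statistics, dead-code elimination guards).
class Bench {
    static long timeMillis(Runnable work) {
        long start = System.nanoTime();
        work.run();
        return (System.nanoTime() - start) / 1_000_000;
    }

    // Run the workload untimed first so the JIT has a chance to compile
    // it, then collect several timed runs instead of trusting one number.
    static long[] measure(Runnable work, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) work.run();
        long[] results = new long[runs];
        for (int i = 0; i < runs; i++) results[i] = timeMillis(work);
        return results;
    }
}
```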


Beating StringBuilder? Hell no.

So, it turns out that StringBuilder is almost insultingly fast. It’s very well written, does almost no allocation, and is almost certainly getting special help from the VM and JIT under the covers. Appending a medium sized string to it appears to be faster (when averaged over about 11000 appends) than allocating new objects!

I’m almost tempted to try writing a mutable JumpRopeBuilder which does the same job to see if I can beat it (and I probably will anyway, but use it to generate balanced JumpRopes rather than optimise it for appends), but I think I’ll restrain myself. I must remember that the goal is to have a String replacement rather than a StringBuilder replacement, and by that measure this does pretty well – the appends are very fast compared to String append (the actual conversion to a string is admittedly a little slow – it takes 100–200 ms for approximately 100k characters, and about half of that is the overhead of iterating over an unbalanced binary tree). And, unlike StringBuilder, this is totally thread safe by virtue of being immutable.
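A back-of-the-envelope way to see why immutable tree appends beat plain String appends: s = s + t copies every character of s on each append, so n appends cost O(n²) character copies, while a concatenation node records its two operands in O(1) and defers all copying to a single toString pass. The sketch below just counts copies under those assumptions; it isn’t timing anything.

```java
// Counting character copies, not timing: why n String appends are
// quadratic while n concat-node appends plus one toString are linear.
class CopyCount {
    // s = s + chunk copies the whole accumulated string every time.
    static long stringAppendCopies(int appends, int chunkSize) {
        long copies = 0, len = 0;
        for (int i = 0; i < appends; i++) {
            copies += len + chunkSize;  // copy old contents plus the new chunk
            len += chunkSize;
        }
        return copies;
    }

    // An immutable concat node copies nothing per append; every character
    // is copied exactly once in the final toString pass.
    static long treeAppendCopies(int appends, int chunkSize) {
        return (long) appends * chunkSize;
    }
}
```

At 1000 appends of 10 characters each, that’s 5,005,000 copies against 10,000 – a factor of about 500.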

Which reminds me, I should compare it against StringBuffer.

Anyway, I should really start adding some actual functionality now. :-)
