Lazy strings for Java

I’ve started yet another open source project, motivated by the fact that Java Strings suck. The aim is to have an efficient string implementation for Java which is optimised for a more sensible set of operations. It draws on ideas from Haskell and purely functional data structures, aiming to take advantage of the fact that the JVM’s garbage collection and allocation are now ludicrously well optimised.

At the moment it has a basic set of working operations. Unfortunately the performance for one of the crucial operation blows goats. It’s very very slow for appending to the right of large strings, and causes outrageous memory overflows. smallString.append(largeString) is pretty fast (I measured it at about 50% slower than appending to the end of a StringBuilder when averaged over a medium large file).

I know what the problem is though: A stupid list implementation. I tried to stick too closely to Haskell laziness and failed miserably because the JVM doesn’t handle things like that well. I’m going to rewrite it in more idiomatic Java and with more careful attention to detail, and I hope to get the desired O(1) append in every case.

I’m debating giving Nice another try and rewriting parts of this in it, particularly the list implementation – because of how I intend to handle it, it’s the sort of thing multiple dispatch and functional programming would be *really* handy for.

Update: Ok, maybe I don’t know how to solve it. I’ll put more thought into it and see if I come up with something. As an alternative plan, someone I know pointed me towards Ropes, which is an implementation of a very similar thing for C++ / C. Very likely worth my looking at for inspiration / porting.

This entry was posted in programming and tagged , on by .