I’ve been playing with implementing some simple data structures in Java, for later use in the lazy strings project. The results are… a bit depressing, frankly. Clearly I need to learn to optimise better, as right now I’m creating specialised data structures for certain types which turn out to perform worse than the general purpose implementations in the standard library. :-)
In particular, I threw together a simple implementation of a Trie, and it performs significantly worse than just using a TreeMap. As the number of collisions increases the relative performance of the Trie improves somewhat, eventually beating the TreeMap, but still in most situations the performance is pretty poor.
It also doesn’t help that the Trie ends up bloating the keys significantly, as you end up having to create far too many objects per character. I can definitely cut back on the object creation, but even one object per character may prove to be too much to get an efficient implementation in Java.
Java is usually depressing. :)
I don’t know much about what you’re doing but you might consider using the Flyweight pattern from GoF. They talk about representing characters in a word processor as objects, but using Flyweight to, instead of creating a fresh object for each character, instead creating an object for the first instance of any given character and then just pointing back to the ones you’ve already created. It should work if it’s sufficient for you to point at the character objects from inside your data structure.
If it can be done and you try it, it should improve both your resident size and your performance.
Ha. Yes, that it is. I think the problem is mostly not Java though, it’s more that I’m not yet very good at optimising things and that a naive implementation of a cunning datastructure can be less efficient than a heavily optimised version of a more normal datastructure.
Thanks for the suggestion. I like the idea, but it sounds like it will only work when the objects in question are immutable. The problem here isn’t Character objects (I’m storing char primitives on the nodes) but the actual nodes of the decision tree, which are mutable and thus can’t be reused like that.
I’m going to try to solve the problem by collapsing multiple nodes into one, so that when there are long runs of characters without branches they’ll be stored on a single node. This should hopefully reduce the number of objects by a very large factor (exponentially, in some cases).
Update: I’ve implemented that now, and my Trie is now only twice as slow as TreeMap. :-)