This is kind of a “Well, duh” moment. It’s obvious in retrospect, but I completely failed to spot it up front, so I thought I’d share.
I noticed that SBinary performance really, really sucked. We’re using it at work for saving application state, and reading and writing a file of only 900KB took about two seconds! This was bad.
Some quick performance testing suggested that this was almost entirely IO bound. Reading and writing the corresponding amount of data from a byte array took fairly little time – only about 200ms for writing, 300ms for reading. So, what was I doing wrong?
After a few seconds of head scratching I realised the problem. You see, RandomAccessFile implements the DataInput and DataOutput interfaces. This is useful for small things, but for doing non-trivial binary input and output? Not so much.
The problem is that the reads and writes for these implementations are totally unbuffered. This should have been obvious, but for some reason didn’t occur to me. Oops. I’m now buffering reads and writes explicitly (currently in a pretty stupid way, but oh well). It’s a lot faster now.
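The fix is just to wrap the file streams in buffering layers before handing them to `DataInputStream`/`DataOutputStream`. Here’s a minimal sketch of that pattern using only standard `java.io` classes – the class name, file, and int-array payload are illustrative, not from the actual application:

```java
import java.io.*;

// A sketch of buffered binary IO. Unlike calling writeInt/readInt directly on
// a RandomAccessFile, each small read or write here hits an in-memory buffer
// rather than the file, so the OS sees a few large operations instead of
// thousands of tiny ones.
public class BufferedBinaryIO {

    // Write a length-prefixed array of ints through a BufferedOutputStream.
    public static void writeInts(File file, int[] values) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(file)))) {
            out.writeInt(values.length);
            for (int v : values) out.writeInt(v);
        }
    }

    // Read the array back through a BufferedInputStream.
    public static int[] readInts(File file) throws IOException {
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(new FileInputStream(file)))) {
            int[] values = new int[in.readInt()];
            for (int i = 0; i < values.length; i++) values[i] = in.readInt();
            return values;
        }
    }

    public static void main(String[] args) throws IOException {
        File tmp = File.createTempFile("buffered-io", ".bin");
        tmp.deleteOnExit();
        writeInts(tmp, new int[] {1, 2, 3});
        System.out.println(java.util.Arrays.toString(readInts(tmp)));
    }
}
```

The only difference from the unbuffered version is the two extra wrapper constructors, which is why it’s so easy to forget them.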
Take a look at this javadoc entry and then think about what happens if you create many thousands of HashMaps with only 1 or 2 entries in them.
Our work application has a very large number of objects to which you can attach attributes, and we use HashMaps for these. Many of them have no attributes at all and were just getting their HashMap via a plain new HashMap() call. To investigate why our application was eating so much memory (relatively speaking – it still isn’t competing with NetBeans, Firefox, etc.), I opened it up in the profiler and saw that the lion’s share of that memory was going to HashMaps. I’ve now changed the code so they’re given an initial capacity of 1, and things are much happier.
This is basically just a heads up – if you’re using a lot of a container type that lets you specify an initial capacity (HashMap, HashSet and ArrayList, for example), give that capacity at least some thought rather than taking the default. If you only have a few instances, or they’re purely transient, it doesn’t really matter. Even if they’re numerous and long-lived you don’t need to get it exactly right, but a little tuning can save you a lot of the memory bloat that’s common in Java applications.
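For concreteness, here’s a sketch of the attribute-holder pattern described above. The class and field names are illustrative, not the actual work code – the whole change is the `1` passed to the HashMap constructor instead of taking the default capacity of 16:

```java
import java.util.HashMap;
import java.util.Map;

// A sketch of an object that can carry attributes. With thousands of these,
// and most holding zero, one, or two attributes, a default-sized HashMap's
// 16-bucket backing array per instance adds up to real memory.
public class AttributeHolder {
    // Initial capacity of 1: the map still grows on demand if an object
    // turns out to have many attributes, so this costs nothing in the
    // common mostly-empty case.
    private final Map<String, Object> attributes = new HashMap<>(1);

    public void setAttribute(String key, Object value) {
        attributes.put(key, value);
    }

    public Object getAttribute(String key) {
        return attributes.get(key);
    }

    public static void main(String[] args) {
        AttributeHolder h = new AttributeHolder();
        h.setAttribute("colour", "red");
        System.out.println(h.getAttribute("colour"));
    }
}
```

The behaviour is identical to the default-capacity version; only the per-instance memory footprint changes.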
I’ve now standardised my test cases to give a consistent test for each of the three options I’m playing with:
Beginning test cases...
append reader: StringBuilderAdapter took 11 millis.
toString: StringBuilderAdapter took 2 millis.
First run for StringBuilderAdapter took 13 millis.
Second append: StringBuilderAdapter took 5 millis
toString: StringBuilderAdapter took 9 millis.
Second run for StringBuilderAdapter took 14 millis.
append reader: JumpRope took 6 millis.
toString: JumpRope took 20 millis.
First run for JumpRope took 26 millis.
Second append: JumpRope took 0 millis
toString: JumpRope took 7 millis.
Second run for JumpRope took 7 millis.
append reader: TrefoilString took 11 millis.
toString: TrefoilString took 18 millis.
First run for TrefoilString took 29 millis.
Second append: TrefoilString took 0 millis
toString: TrefoilString took 8 millis.
Second run for TrefoilString took 8 millis.
These numbers… are a little different from what I’d previously believed. They shouldn’t be taken as definitive, though – annoying things like “if I change the order of the tests around, the JIT optimises differently and the numbers change” and other random factors make them very variable.
Still, interesting points of note:
I’m a lot closer to StringBuilder level performance than I’d previously believed. The adapter slows it down a little, but these numbers were pretty close to what I’d been seeing with StringBuilder before – it’s the others that are much lower.
For lots of small appends, StringBuilder is about twice as fast as my implementations (the numbers are really really variable). For single large appends the situation is reversed.
There’s not a lot of performance difference between my dirty-but-supposedly-faster implementation and my nice high-level one. It turns out allocation is cheap after all.
Lessons to take out of this: I suck at performance testing and I need to set up some really extensive test suites if I want to get good performance numbers. Also, I should stick to a nice high level implementation and only worry about performance as a sanity check every now and then. Ho hum.
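On the “I suck at performance testing” front, even a very small harness helps with the JIT-reordering problem mentioned above: warm the code up first, then time several runs and report the best. This is a minimal sketch (names and iteration counts are mine, not from the actual tests), not a substitute for a proper benchmark suite:

```java
// A tiny benchmark harness sketch. Warmup runs give the JIT a chance to
// compile the hot paths before anything is timed; taking the best of several
// timed runs damps out GC pauses and other one-off noise.
public class TinyBench {

    public static long timeBestNanos(Runnable task, int warmups, int runs) {
        for (int i = 0; i < warmups; i++) task.run(); // untimed warmup
        long best = Long.MAX_VALUE;
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            best = Math.min(best, System.nanoTime() - start);
        }
        return best; // nanoseconds for the fastest run
    }

    public static void main(String[] args) {
        long nanos = timeBestNanos(() -> {
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10_000; i++) sb.append("some medium sized string");
            sb.toString();
        }, 5, 10);
        System.out.println("best run: " + (nanos / 1_000_000.0) + " ms");
    }
}
```

A real suite would also vary input sizes, interleave the implementations, and run each test in a fresh VM, but even this much makes run-to-run numbers far less jumpy.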
So, it turns out that StringBuilder is almost insultingly fast. It’s very well written, does almost no allocation, and is almost certainly getting special help from the VM and JIT under the covers. Appending a medium sized string to it appears to be faster (when averaged over about 11000 appends) than allocating new objects!
I’m almost tempted to try writing a mutable JumpRopeBuilder which does the same job to see if I can beat it (and I probably will anyway, but use it to generate balanced JumpRopes rather than optimise it for appends), but I think I’ll restrain myself. I must remember that the goal is to have a String replacement rather than a StringBuilder replacement, and at that it does pretty well – the appends are very fast compared to String append (the actual conversion to a string is a little slow, admittedly – it takes 100–200ms for approximately 100k characters, and about half of that is the overhead of iterating over an unbalanced binary tree). And, unlike StringBuilder, this is totally thread safe by virtue of being immutable.
Which reminds me, I should compare it against StringBuffer.
Anyway, I should really start adding some actual functionality now. :-)