(Warning: I’m half asleep, and this post is somewhere between a brain dump and a rant. Coherency is strictly optional).
So, my latest random personal project has turned into a bit of a debacle.
I decided I wanted a Java bytecode manipulation library with a decent Scala API. The options were either “Write my own” or “Write bindings to an existing one”. I chose something of a middle ground: “Port an existing one”. Rather than go for any of the normal big names I went for an obscure little internal library at EPFL called FJBG (Fast Java Bytecode Generator). It’s basically a low-level interface onto the classfile format, and I’d used it before for code generation (e.g. for the structural proxies stuff) and found it pretty straightforward. Kind of hard to debug programming errors, but otherwise pretty robust.
One slight flaw: No test suite to speak of. But that’s ok, it’s used as part of the compiler backend for scalac, so I assume it gets relatively well covered by the scalac test suite. And it’s been around for quite a while, so it has had time to stabilise. Should be fine.
Right?
Right?
Anyway, the initial porting process went pretty smoothly. I was astonished at how smoothly, in fact: I basically ran it through jatran, spent about 6 hours fixing compiler errors, and then at the end it took about 10 minutes of bug fixing before it just worked. That left me with the bytecode generation code working in Scala and prettified to have nicer method names, etc. Not bad. The only slight problem was that the class file parsing code wasn’t working.
The problem was that the code used a fairly deeply nested inheritance hierarchy and maintained two parallel constructor hierarchies: one for creating things in memory, one for creating them from a DataInputStream. Because of the way Scala handles constructors (every auxiliary constructor has to end up calling the class’s single primary constructor, and only the primary constructor can call a superclass constructor) this is essentially impossible to do in Scala.
I’ve never thought this was a problem before, but this seemed to me to be quite a reasonable thing to do and I started to have doubts about Scala’s approach to constructors. I still have some, but fewer than I had at the time. The thing is, the two-hierarchy approach is really fragile. It means that each constructor needs to establish the class’s invariants in its own way, so you’ve given yourself twice as many opportunities to screw up.
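To make the constructor point concrete, here is a minimal sketch of the shape that tends to work in Scala: one primary constructor that owns the invariants, plus a companion-object factory for the “read it from a stream” path. The names (`Member`, `read`, `writeTo`, `roundTrip`) are made up for illustration and are not FJBG’s actual API, and the on-disk layout here is a toy one, not the real class file layout.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object ConstructorSketch {
  // Hypothetical stand-in for an FJBG-style member; not FJBG's real class.
  // The primary constructor is the only place the object gets built.
  class Member(val accessFlags: Int, val name: String) {
    def writeTo(out: DataOutputStream): Unit = {
      out.writeShort(accessFlags)
      out.writeUTF(name)
    }
  }

  object Member {
    // The "from a DataInputStream" path is a factory, not a second constructor:
    // it parses the fields and then funnels into the primary constructor.
    def read(in: DataInputStream): Member = {
      val flags = in.readUnsignedShort()
      val name  = in.readUTF()
      new Member(flags, name)
    }
  }

  // Write a member to bytes and read it back in, as a tiny round-trip check.
  def roundTrip(m: Member): Member = {
    val bytes = new ByteArrayOutputStream()
    val out   = new DataOutputStream(bytes)
    m.writeTo(out)
    out.flush()
    Member.read(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
  }
}
```

With this shape a subclass only ever has one superclass constructor to call, and the parsing logic lives next to the class rather than inside a second constructor chain.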
Anyway, after some struggling with approaches I eventually (took me several times as long as the previous part of the porting) got this ported in a reasonably straightforward way. It wasn’t the prettiest code ever, but the mapping onto the original wasn’t bad. So I tried it out on a few simple tests – generate a class file, read it back in again, compare them to make sure you got approximately the same thing.
Hm. And it didn’t work. How curious.
I stared at the implementation for a bit, stared at the original Java, couldn’t see a difference. So I ran the same test on the original Java and it broke in the same way. Great.
That turned out to be an easy fix. But it was an easy fix to a problem very definitely caused by the multiple constructor hierarchy. Oh well, that worked now.
Next part of the test. Write the newly read class file to a file, load it and try to run it.
Oops. It NPEs when I try to write the file. Guess I did something wrong – I wonder why that array is null there. Looks like the logic for initialising it is rather complex, let’s see how the original Java version handles this. So I wrote a simplified test case using the original which took a class file, read it to the in-memory representation and wrote it out again, and tested it against a random class file. It broke. In a totally different way to the way my version did – it didn’t even manage to read the file (I think the difference here is that this was a classfile found in the wild rather than one generated by FJBG). Tried it on a different, simpler one – specifically the class generated by the obvious HelloWorld.java. That broke too.
So at this point I was forced to conclude that the class file reading code in FJBG just didn’t work at all. What the hell? Wasn’t this used in the Scala compiler? Clearly it has to be able to parse class files in order to know what’s available on the classpath to compile against!
So, some digging through the compiler source later: scalac doesn’t use FJBG’s class reading code at all. It has its own entirely separate code for that. So this code which I thought was part of a fairly mature and robust compiler backend was in fact completely and utterly untested and unused. No wonder it was broken.
So, new rule (to many of you, a very old rule): If it’s library code and it’s not tested, it’s broken. An application you can judge by “Does it do the right thing?” to at least get some sense of how not broken it is. Besides, I only have to use it, not code against it. But if my code is going to depend on yours, yours had better be tested.
I’m usually pretty bad at tests actually. Applications I’ve written are certainly woefully undertested. SBinary’s tests are… well, adequate. And I don’t really recommend depending on any other libraries I’ve written – they’re all a bit incomplete and half assed. :-) Hopefully this will teach me to be better.
At this point I was already rather upset with FJBG’s object model – too mutable, too many back references. So on top of fixing the reading code I was going to have to fix that. At this point I decided that it was time to cut my losses, so I’m going to go back to option 1: Write my own. I’ll certainly reuse what I can salvage from the FJBG code (assuming some worries I have about licensing are resolved), but honestly the class file format is pretty easy. The overarching format took me two hours to write a parser for (I did it the same night as discovering that . The bytecode format for method bodies is harder, but I expect to be able to reuse FJBG code for this bit (and probably write a fair bit of my own).
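For a sense of how simple the outermost layer of the format is, here is a minimal sketch of reading the first few fields of a class file: the 4-byte magic number 0xCAFEBABE, the minor and major version, and the constant pool count. This is not the parser mentioned above; `Header` and `readHeader` are names invented for the example.

```scala
import java.io.{DataInputStream, IOException}

object ClassFileHeaderSketch {
  // Hypothetical holder for the leading fields of a class file.
  final case class Header(minor: Int, major: Int, constantPoolCount: Int)

  def readHeader(in: DataInputStream): Header = {
    // u4 magic: every class file starts with 0xCAFEBABE.
    val magic = in.readInt()
    if (magic != 0xCAFEBABE)
      throw new IOException("Not a class file, bad magic: " + Integer.toHexString(magic))
    // u2 minor_version, u2 major_version, u2 constant_pool_count, in that order.
    val minor   = in.readUnsignedShort()
    val major   = in.readUnsignedShort()
    val cpCount = in.readUnsignedShort()
    Header(minor, major, cpCount)
  }
}
```

Everything after this (constant pool, access flags, fields, methods, attributes) follows the same pattern of counts followed by fixed or length-prefixed records, which is why the overall structure is quick to parse.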
Anyway, hopefully this will turn out to be a good thing and I’ll end up with something much more scalic than a straight port of FJBG would have been. We’ll see. Watch this space to see if anything comes of this, and watch this repo to keep an eye on the code.
Maybe you’ll find this writeup interesting, comparing three bytecode libraries for Java bytecode: http://elliotth.blogspot.com/2008/04/generating-jvm-bytecode-3.html, from the author of the language Talc http://code.google.com/p/talc/. Cheers, Patrick
I’ve actually read that. The thing is, along with everyone else he basically recommends ASM.
Thing about ASM, and most of the other bytecode libraries suitable for my purposes: The visitor pattern gives me hives, and the APIs are really verbose.
Basically the objective here is to have a library which can take advantage of Scala’s features as much as possible. As such I’m probably better off building it from the ground up than using one of the existing ones.
Hi David,
What I did for my own Scala bytecode library (for internal use) was to use ASM as a class-file reader/writer, but keep a nice Scala-like representation. I’ve written code to parse class-files a few times, back in my Java days, and didn’t feel like doing it again.
ASM has some interesting design goals that I’m not interested in at all; small library size (hence the six different jars totaling 200k so you don’t include stuff you don’t need), small RAM footprint (visitor pattern), and instrumentation speed (doesn’t need to create lots of data structures on the heap, lets you easily leave parts of a class unchanged, doesn’t assume maxStack and maxLocals need recalculation from scratch). Once you get around that, it’s a pretty nice library to have around; the API is the cleanest I’ve found.
I’m sort of sick and tired of writing wrappers. :-) You end up spending huge amounts of time dealing with the impedance mismatch between what you have and what you want, and it costs you in both performance and usability. Also, I’m interested enough in the details of this that I think I’d like to do my own thing.
Fair enough. It’s a very nicely-documented format.
On a side note, the constant pool is a perfect example of something made much easier by Scala case classes.
Indeed. The actual code format is a bit crufty, but the overall structure is very straightforward.
And yes, the spec is full of instances of variant types. The constant pool is indeed a perfect example. :-)
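As a rough illustration of the constant pool point, here is a sketch of a few constant pool entry kinds as a sealed family of case classes, with a reader that dispatches on the tag byte. The tag values are the ones from the class file spec; the Scala names (`Entry`, `ClassRef`, `readEntry`, `className` and so on) are invented for this example, and only a subset of tags is handled.

```scala
import java.io.{DataInputStream, IOException}

object ConstantPoolSketch {
  // Each kind of constant pool entry becomes one case class (names are illustrative).
  sealed trait Entry
  final case class Utf8(value: String) extends Entry
  final case class IntegerConst(value: Int) extends Entry
  final case class ClassRef(nameIndex: Int) extends Entry
  final case class StringRef(utf8Index: Int) extends Entry
  final case class MethodRef(classIndex: Int, nameAndTypeIndex: Int) extends Entry
  final case class NameAndType(nameIndex: Int, descriptorIndex: Int) extends Entry

  // Reads one entry: a u1 tag followed by a tag-specific payload.
  def readEntry(in: DataInputStream): Entry =
    in.readUnsignedByte() match {
      case 1   => Utf8(in.readUTF()) // u2 length + modified UTF-8, which readUTF understands
      case 3   => IntegerConst(in.readInt())
      case 7   => ClassRef(in.readUnsignedShort())
      case 8   => StringRef(in.readUnsignedShort())
      case 10  => MethodRef(in.readUnsignedShort(), in.readUnsignedShort())
      case 12  => NameAndType(in.readUnsignedShort(), in.readUnsignedShort())
      case tag => throw new IOException("Tag not handled in this sketch: " + tag)
    }

  // Follows a ClassRef's index into the pool and extracts the name if it is a Utf8 entry.
  def className(pool: Map[Int, Entry], ref: ClassRef): Option[String] =
    pool.get(ref.nameIndex).collect { case Utf8(name) => name }
}
```

Pattern matching on the sealed trait is what replaces the visitor: the compiler can warn about unhandled entry kinds, and following an index into the pool is a one-line `collect`.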