They called me mad

July 21st, 2008

I haven’t written any fiction in ages, mostly because I kept dropping ideas as they were taking too long, so I’ve decided to start experimenting with microfiction as a format. Here’s the first of these.

They called me mad

I wake up on an operating table. I try to sit up, but I’m strapped down. Of course. Why is this never easy?

“Ah, Mr Michaels. You’re awake. Terribly sorry for the inconvenience, but we couldn’t have you interfering with the experiment. I’ll let you go as soon as we’re done here”.

I look at the speaker. An older man in a lab coat (why do they always wear lab coats? It’s not like they really need them. They’re always pristine white), fiddling around with some computer setup. My target.

There’s another table to my left. A dead woman on it - quite far gone. Well preserved, but withered and with a trace of decay. He’s further along than I’d hoped.

I strain against the bonds. No luck. I try to get to the knife in my hidden pocket, but it’s been taken. No way out but talking I guess.

“There’s still time to stop. I can guarantee you won’t be harmed.”

He looks genuinely puzzled.

“Why would I stop? Things are going so well”.

“What you’re doing is against nature! It will turn around and bite you if you don’t stop before it’s too late!”

“Do you live in a tree, Mr Michaels?”

“What?”

“Simple question. Do you live in a tree?”

“No. Why would I live in a tree?”

“Natural state of living for monkeys like ourselves. This modern housing, very unnatural.”

“That’s different”.

“Is it? Oh well, I suppose you’ve convinced me. I have seen the error of my ways and shall come quietly”.

“Really?”

“No, I’m afraid not. Anyway”, he said smiling brightly “all ready. We might as well begin”.

He presses a few buttons and a background humming noise I had hardly noticed raised in volume and pitch. My jaw dropped

“But how can you be ready? The storm isn’t for another two days!”

More bemused looks.

“Why would I need a storm? That sounds like a very unreliable way of working. I have a generator in the basement, and capacitors for storing the electric charge”.

I’m panicking now. I’d never let them get this close before. “Look, just stop! It’s all going to be horribly wrong!”

“Don’t be ridiculous. Do you think I haven’t tested this? I’m a scientist Mr Michaels. There is a process for these things. They called me mad, but the one accusation they never leveled was that my methods were insufficiently rigorous”.

“But you’ve never tested it on a human. Never on a being with a soul!”

He laughs. Not a cackle, just an amused little chuckle. I’ve heard a lot of mad laughs. I’m practically a connoisseur of a good diabolical laugh. Believe me, this chuckle is a lot worse.

“Of course I’ve tested it on a human. Tell me, Mr Michaels, did you think people normally wake up feeling quite so well rested after being shot in the face?”

I remember. I had my gun out, pointing at him, when there was a loud bang and then blackness. Oh god, he actually did it.

The humming raises to a fever pitch. There’s a crack, as if of thunder, and a bright white light fills the room. The dead woman sits up, decay fading and color and flesh returning to her. She smiles.

“Now, Mr Michaels, would you care to join Alice and myself for tea?”


java.lang.String#hashCode

July 16th, 2008

Apologies in advance. This is a really boring post.

For reasons you either know about by now or don’t care about, I was curious as to how well String’s hashCode was distributed (I suspected the answer was “not very”). I ran a few quick experiments to verify this.

For your amusement, here is a list of all hash collisions between alphanumeric strings of size 2: http://www.drmaciver.com/collisions.txt and here is a list of all which don’t collide with any others http://www.drmaciver.com/noncolliding.txt

Some statistics: There are 3844 alphanumeric strings of size 2. Of these 3570 collide with at least one other string. That is, 274 of these strings (or about 7% of them) *don’t* collide with something else.

Oh well. It’s a good thing no one would be stupid enough to rely on hashCode to distinguish the contents of two objects.

Edit: Additional facts I originally posted in the comments.

For what it’s worth, even fewer strings have unique hash codes for 3 characters. 3948 don’t collide, or about 1.6% of them.

This of course doesn’t mean that probability of a hash collision is really high. In reality it’s acceptably low. It’s just a demonstration that it’s not hard to find colliding pairs.

The following consist of all the String hash collisions contained in the sowpods scrabble dictionary:

  • isohel, epistolaries
  • righto, buzzards
  • hierarch, crinolines
  • inwork, hypercatalexes
  • wainages, presentencing
  • variants, gelato
  • misused, horsemints
  • trichothecenes, locular
  • pomatoes, eructation

Lessons learned in class

June 16th, 2008

(Warning: I’m half asleep, and this post is somewhere between a brain dump and a rant. Coherency is strictly optional).

So, my latest random personal project has turned into a bit of a debacle.

I decided I wanted a Java bytecode manipulation library with a decent Scala API. The options were either “Write my own” or “Write bindings to an existing one”. I chose something of a middle ground: “Port an existing one”. Rather than go for any of the normal big names I went for an obscure little internal library at EPFL called FJBG (Fast Java Bytecode Generator). It’s basically a low level interface onto the classfile format, and I’d used it before for code generation (e.g. for the structural proxies stuff) and found it pretty straightforward. Kindof hard to debug programming errors, but otherwise pretty robust.

One slight flaw: No test suite to speak of. But that’s ok, it’s used as part of the compiler backend for scalac, so I assume it gets relatively well covered by the scalac test suite. And it’s been around for quite a while, so has had itme to stabilise. Should be fine.

Right?

Right?

Anyway, the initial porting process went pretty smoothly. I was astonished at how smoothly in fact - after about 6 hours of work I had the bytecode generation code working in Scala and prettified to have nicer method names, etc. Pretty good going. I was frankly astonished - I basically ran it through jatran, spent about 6 hours fixing compiler errors and then at the end it took about 10 minutes of bug fixing before it just worked. Not bad. The only slight problem was that the class file parsing code wasn’t working.

The problem was that the way the code worked there was a fairly deeply nested inheritance strategy, and maintained two constructor hierarchies - one for creating things in memory, one for creating them from a DataInputStream. because of the way Scala handles constructors this is essentially impossible to do in Scala.

I’ve never thought this was a problem before, but this seemed to me to be quite a reasonable thing to do and I started to have doubts about Scala’s approach to constructors. I still have some, but not to the point that I previously had. The thing is, this approach is really fragile. It means that each constructor needs to balance the class’s invariants in different ways - you’ve given yourself twice as many opportunities to screw up.

Anyway, after some struggling with approaches I eventually (took me several times as long as the previous part of the porting) got this ported in a reasonably straightforward way. It wasn’t the prettiest code ever, but the mapping onto the original wasn’t bad. So I tried it out on a few simple tests - generate a class file, read it back in again, compare them to make sure you got approximately the same thing.

Hm. And it didn’t work. How curious.

I stared at the implementation for a bit, stared at the original Java, couldn’t see a difference. So I ran the same test on the original Java and it broke in the same way. Great.

That turned out to be an easy fix. But it was an easy fix to a problem very definitely caused by the multiple constructor hierarchy. Oh well, that worked now.

Next part of the test. Write the newly read class file to a file, load it and try to run it.

Oops. It NPEs when I try to write the file. Guess I did something wrong - I wonder why that array is null there. Looks like the logic for initialising it is rather complex, lets see how the original Java version handles this. So I wrote a simplified test case using the original which took a class file, read it to the in memory representation and wrote it out again and tested it against a random class file. It broke. In a totally different way to the way my version did - it didn’t even manage to read the file (I think the difference here is that this was a classfile found in the wild rather than one generated by FJBG). Tried it on a different, simpler one - Specifically the class generated by the obvious HelloWorld.java. That broke too.

So at this point I was forced to conclude that the class file reading code in FJBG just didn’t work at all. What the hell? Wasn’t this used in the Scala compiler? Clearly it has to be able to parse class files in order to know what’s available on the classpath to compile against!

So, some digging through the compiler source later: scalac doesn’t use FJBG’s class reading code at all. It has its own entirely separate code for that. So this code which I thought was part of a fairly mature and robust compiler backend was in fact completely and utterly untested and unused. No wonder it was broken.

So, new rule (to many of you, a very old rule): If it’s library code and it’s not tested, it’s broken. An application you can judge by “Does it do the right thing?” to at least get some sense of how not broken it is. Besides, I only have to use it, no code against it. But if my code is going to depend on yours, yours better be tested.

I’m usually pretty bad at tests actually. Applications I’ve written are certainly woefully undertested. SBinary’s tests are… well, adequate. And I don’t really recommend depending on any other libraries I’ve written - they’re all a bit incomplete and half assed. :-) Hopefully this will teach me to be better.

At this point I was already rather upset with FJBG’s object model - too mutable, too many back references. So on top of fixing the reading code I was going to have to fix that. At this point I decided that it was time to cut my losses, so I’m going to go back to option 1: Write my own. I’ll certainly reuse what I can salvage from the FJBG code (assuming some worries I have about licensing are resolved), but honestly the class file format is pretty easy. The overarching format took me two hours to write a parser for (I did it the same night as discovering that . The bytecode format for method bodies is harder, but I expect to be able to reuse FJBG code for this bit (and probably write a fair bit of my own).

Anyway, hopefully this will turn out to be a good thing and I’ll end up with something much more scalic than a straight port of FJBG would have been. We’ll see. Watch this space to see if anything comes of this, and watch this repo to keep an eye on the code.


Planet Scala

June 9th, 2008

As you might have noticed, I’m sometimes a very angry person. :-)

One of the things that has gotten me particularly angry recently is the shockingly poor quality of aggregation sites for Scala. There’s Artima Scala Buzz, which I’ve chronicled my irritation with before, and a new Scala driven ad farm which if you read the mailing lists you’ve probably noticed.

The particular reason this makes me so angry is that setting up a good aggregator is really damn easy. Planet planet is very simple to set up (it took me maybe half an hour of tinkering) and produces consistently good results. Planet Haskell is basically a vanilla setup of it and works very well.

So, as usual, I end up porting something from Haskell to Scala. Say hello to Planet Scala


Code generation for structural proxies

June 5th, 2008

In the latest in the long series of slightly half-assed projects of mine that never really approach anything near completion, I’ve been tinkering with runtime code generation for structural types.

Scala’s structural types are implemented using reflection. Reflection on the JVM is unfortunately quite slow, but the dynamic language community have put a fair bit of work into figuring out how to work around these issues. So, I wondered how we could apply those lessons to Scala’s structural typing.

Which, of course, explains why I didn’t bother to so much look as what they did and went off and did my own thing. :-)

The basic idea behind my design is that for a structural type

  { def foo : Bar;
    def baz : Bif;
  }

we generate an interface

  trait StructuralType$$namemanglingblargh{
    def foo : Bar;
    def baz : Bif;
  }

And for each class we want to call using the structural type we generate an adapter class for it. So if I call a method using a structural type I first box it in an adapter class that implements StructuralType$$namemanglingblargh and use that to do the method call.

However, rather than generating these classes at compile time I’m experimenting with generating them at runtime. Why? Because runtime code generation is fun. :-) Also because it makes it easier to use from Java and other languages that don’t have a Scala compiler handy.

So far I’ve only done the code for generating these wrappers. I suspect I’m going to stall horribly when it comes to actually integrating it with the compiler, so I’m going to work on how to make it as usable as possible as a library first.

Initial performance tests suggest that it’s currently only slightly faster than the standard structural types if you create a wrapper at every invocation but that if you can cache the wrapper (which happens for free if you e.g. have a recursive structural method). However these tests are totally unscientific and probably wrong, so take them with a pinch of salt.

You can get the code from here: http://www.drmaciver.com/repos/proxies/

It’s not currently the prettiest of things, but it seems to work. Mostly.

Edit: As usual, I fail at ad hoc performance testing. The current probably wrong figures are that it’s about twice as fast as the structural types if you don’t cache the wrapper class.