Tag Archives: scala

Functional code != Good code

There’s a dangerous trap to fall into: The belief that functional code is automatically good code. Hopefully not too many people would come out and actually claim this, but it seems to be an unstated common belief. It’s also utter bollocks.

Functional programming gives you tools for writing good code. Good functional code can be very good (good imperative code can be very good too! But that’s not the point of this post). Bad functional code can be just as bad as its imperative cousin.

Now, what post about functional programming would be complete without some trivial one liners? Let’s start by summing a collection of integer elements.

Imperatively:

   var x = 0;
   for (j <- myCollection) x += j;

Functionally:

  val x = myCollection.foldLeft(0)(_+_);

The functional code is a little shorter. You’ve avoided making the sum variable mutable, which is nice. It introduces one fewer local variable which helps with reading the code (but it wouldn’t if Scala didn’t have the cut notation for anonymous functions, you’d have had to write (j, i) => j + i, which introduces *more* variables). Is the fact that you’ve used a higher order function an improvement? No, not really. The loop is a higher order function too. We could have written it:

   var x = 0;
   myCollection.foreach(x += _);

and it would have been exactly the same thing (literally. The Scala compiler emits something nearly identical to the second version when given the first version). So the only difference here is the lack of mutability. Both examples are trivial, so there isn’t a whole load of difference. But most code is just a sequence of trivialities, so considering how they’re written is valuable.

One advantage of the fold here is that you can use it as an expression without binding it to a variable. The loop doesn’t let you do that. But is that really a good thing for you to be doing?

Thing is, both of these examples are the wrong way to do it in the middle of other code. The right way to do it?

  val x = sum(myCollection);

(Unfortunately there is no sum method in the Scala standard library. A minor wart. But there’s one in e.g. scalax, or your personal utility classes, or wherever you care to look really).

Inline calculations like this should be factored out into simple methods. This is a no brainer, everyone knows it’s a good idea. And the fact that you’ve written your code in a single expression with a fold doesn’t exempt you from that.

Let’s make our trivial example very slightly less trivial. Instead of a collection of integers, I have a collection of collections. I want to find the total length.

Imperative version:

  var x = 0;
  for (c <- myCollection) x += c.size;

Now we’ll see how the functional code really shines in this example:

Functional version:

  val x = myCollection.foldLeft(0)((y, c) => y + c.size)

See how much better abstracted that is than the imperative version?

Um, no. Me neither.

When you’re doing folding like this if you don’t have an already defined function to be reusing (which doesn’t have to be a named constant, it could be the result of some other computation. It probably shouldn’t be a lambda), this really isn’t any better than imperative version. At least to my eyes I find it a bit worse, but that’s largely a matter of opinion. It’s sure as hell not better.

So how *should* you be writing it? Am I now going to say you should define a method sumLengthsOfCollections?

Fuck no.

You should solve it, as with so many other things, by reducing it to a problem you’ve already solved.

   val x = sum(myCollection.map(_.size));

Here you actually *have* reused logic properly, and the code is at least somewhat better. And if you wanted to later change the implementations of sum (Think this is a ludicrous comment? Really? What if this were acting on BigInts rather than integers. Still think it is?).

In these trivial examples it doesn’t make much of a difference, but as the number of variations and steps in the pipeline increases this style of code really starts to make a difference. Earlier today I had to deal with a method definition that involved a really large number of folds, most of them doing trivial things. I was not amused. The fact that you can fold, doesn’t mean that you should, and the fact that you’ve been given higher order functions isn’t a license to use the wrong ones and assume your code will be good anyway.

Reusing Java’s tools: Cobertura and Scala

I’ve recently been writing some collection implementations for Scala which will hopefully go in the standard library in the upcoming release. Naturally I want to make sure that I tested the hell out of them, so I’ve been writing a reasonably large number of ScalaCheck tests to test the code and make sure it does what I want it to. Earlier I realised I should probably be doing some sort of coverage checking to make sure I had tested things adequately so thought I’d see how easy it would be to plug Cobertura into a Scala project. And I have good news! The answer is: Essentially trivial.

There were a few minor issues that cropped up while doing it:

  • Cobertura strips scala specific metadata out of the class files. This means that you must compile everything first, including your unit tests, before instrumenting code.
  • Scala does a lot of name mangling and generates a lot of classes with weird names. Cobertura naturally doesn’t understand the mapping back to Scala code, so you’ll see the mangled names. They’re not usually that hard to figure out.
  • Cobertura won’t be able to find your source files if you don’t follow Java directory conventions for packages.

Other than that, it all works fine. You get per line coverage information for your source files, decent per package and per class reports, etc. No problem.

Unfortunately my coverage wasn’t nearly as good as I hoped, so I’ll probably have to write a bunch more tests before I submit a patch.

Update: Actually, the line number reporting for Cobertura + Scala appears to be really bad, and often really hard to decipher. But it’s still a nice tool, and actually helped me catch some bugs.

This entry was posted in programming and tagged , , on by .

Unsigned comparison in Java/Scala

Java (and consequently Scala) lack unsigned integer types. Most of the time this isn’t an issue – the majority of the operations you’re likely to care about (addition, subtraction, bitwise operations) work the same regardless of whether you treat the integer as unsigned. There are a few cases where it is a pain though – one that bit me in the context of the work I’ve been doing on and off with Java classfiles is that converting between types of different size. The other one that’s an issue is comparison – integer comparison is signed in Java and there are no operations for doing the unsigned compare.

One way to fix this is to delegate to a solution to the previous problem: Bump the integers up to a type one size up in an unsigned way so that they get converted to positive values, do a comparison on there. This works fine for the smaller values, but if you were to do this with a long you’d have to upgrade to a BigInt, which is a bit sad. I’d previously been doing it this way but earlier when I needed to do unsigned comparisons on longs I decided to look for a better route. (Full disclosure: Actually what happened is I converted some code using ints to code using long and suddenly all my unit tests broke. At that point I discovered that I’d still been trying to do an unsigned comparison using the code for doing it for integers).

Anyway, I popped into ##java to see if anyone there had any ideas for how to do it. Totally counter to form, ##java was actually incredibly useful. The best solution came from someone who went by Tamutnefret and is as follows:

  def unsignedCompare(i : Long, j : Long) = (i < j) ^ (i < 0) ^ (j < 0)

Which is really neat and avoids doing any branching.

Let’s see why this works. We’ll break it down by cases on i (there’s probably a nicer way to see that it works, but I’m blanking on what it is).

Suppose i = 0. This becomes (0 < j) ^ false ^ (j < 0), which is (0 < j) ^ (j < 0), or j != 0. Which is correct, as the only unsigned integer not > 0 is 0.

Suppose now i < 0. This becomes (i < j) ^ true ^ (j < 0).
If (j < 0) this becomes i < j, as x ^ true ^ true = x ^ false = x. If j >= 0 then we certainly have i < j, so we get true ^ true ^ false = false, which is again correct: In unsigned comparison any negative number is greater than any non-negative one. The case i > 0 follows similarily (the standard mathematician’s weasel words).

Anyway, the explanation of why is boring. It takes less time to convince yourself of its truth than it did to read that. :-) But the point is that it’s a cute, useful formula which I’d not previously encountered so I thought I’d record it here.

This entry was posted in programming and tagged , on by .

Lessons learned in class

(Warning: I’m half asleep, and this post is somewhere between a brain dump and a rant. Coherency is strictly optional).

So, my latest random personal project has turned into a bit of a debacle.

I decided I wanted a Java bytecode manipulation library with a decent Scala API. The options were either “Write my own” or “Write bindings to an existing one”. I chose something of a middle ground: “Port an existing one”. Rather than go for any of the normal big names I went for an obscure little internal library at EPFL called FJBG (Fast Java Bytecode Generator). It’s basically a low level interface onto the classfile format, and I’d used it before for code generation (e.g. for the structural proxies stuff) and found it pretty straightforward. Kindof hard to debug programming errors, but otherwise pretty robust.

One slight flaw: No test suite to speak of. But that’s ok, it’s used as part of the compiler backend for scalac, so I assume it gets relatively well covered by the scalac test suite. And it’s been around for quite a while, so has had itme to stabilise. Should be fine.

Right?

Right?

Anyway, the initial porting process went pretty smoothly. I was astonished at how smoothly in fact – after about 6 hours of work I had the bytecode generation code working in Scala and prettified to have nicer method names, etc. Pretty good going. I was frankly astonished – I basically ran it through jatran, spent about 6 hours fixing compiler errors and then at the end it took about 10 minutes of bug fixing before it just worked. Not bad. The only slight problem was that the class file parsing code wasn’t working.

The problem was that the way the code worked there was a fairly deeply nested inheritance strategy, and maintained two constructor hierarchies – one for creating things in memory, one for creating them from a DataInputStream. because of the way Scala handles constructors this is essentially impossible to do in Scala.

I’ve never thought this was a problem before, but this seemed to me to be quite a reasonable thing to do and I started to have doubts about Scala’s approach to constructors. I still have some, but not to the point that I previously had. The thing is, this approach is really fragile. It means that each constructor needs to balance the class’s invariants in different ways – you’ve given yourself twice as many opportunities to screw up.

Anyway, after some struggling with approaches I eventually (took me several times as long as the previous part of the porting) got this ported in a reasonably straightforward way. It wasn’t the prettiest code ever, but the mapping onto the original wasn’t bad. So I tried it out on a few simple tests – generate a class file, read it back in again, compare them to make sure you got approximately the same thing.

Hm. And it didn’t work. How curious.

I stared at the implementation for a bit, stared at the original Java, couldn’t see a difference. So I ran the same test on the original Java and it broke in the same way. Great.

That turned out to be an easy fix. But it was an easy fix to a problem very definitely caused by the multiple constructor hierarchy. Oh well, that worked now.

Next part of the test. Write the newly read class file to a file, load it and try to run it.

Oops. It NPEs when I try to write the file. Guess I did something wrong – I wonder why that array is null there. Looks like the logic for initialising it is rather complex, lets see how the original Java version handles this. So I wrote a simplified test case using the original which took a class file, read it to the in memory representation and wrote it out again and tested it against a random class file. It broke. In a totally different way to the way my version did – it didn’t even manage to read the file (I think the difference here is that this was a classfile found in the wild rather than one generated by FJBG). Tried it on a different, simpler one – Specifically the class generated by the obvious HelloWorld.java. That broke too.

So at this point I was forced to conclude that the class file reading code in FJBG just didn’t work at all. What the hell? Wasn’t this used in the Scala compiler? Clearly it has to be able to parse class files in order to know what’s available on the classpath to compile against!

So, some digging through the compiler source later: scalac doesn’t use FJBG’s class reading code at all. It has its own entirely separate code for that. So this code which I thought was part of a fairly mature and robust compiler backend was in fact completely and utterly untested and unused. No wonder it was broken.

So, new rule (to many of you, a very old rule): If it’s library code and it’s not tested, it’s broken. An application you can judge by “Does it do the right thing?” to at least get some sense of how not broken it is. Besides, I only have to use it, no code against it. But if my code is going to depend on yours, yours better be tested.

I’m usually pretty bad at tests actually. Applications I’ve written are certainly woefully undertested. SBinary’s tests are… well, adequate. And I don’t really recommend depending on any other libraries I’ve written – they’re all a bit incomplete and half assed. :-) Hopefully this will teach me to be better.

At this point I was already rather upset with FJBG’s object model – too mutable, too many back references. So on top of fixing the reading code I was going to have to fix that. At this point I decided that it was time to cut my losses, so I’m going to go back to option 1: Write my own. I’ll certainly reuse what I can salvage from the FJBG code (assuming some worries I have about licensing are resolved), but honestly the class file format is pretty easy. The overarching format took me two hours to write a parser for (I did it the same night as discovering that . The bytecode format for method bodies is harder, but I expect to be able to reuse FJBG code for this bit (and probably write a fair bit of my own).

Anyway, hopefully this will turn out to be a good thing and I’ll end up with something much more scalic than a straight port of FJBG would have been. We’ll see. Watch this space to see if anything comes of this, and watch this repo to keep an eye on the code.

This entry was posted in programming and tagged , , on by .

Planet Scala

As you might have noticed, I’m sometimes a very angry person. :-)

One of the things that has gotten me particularly angry recently is the shockingly poor quality of aggregation sites for Scala. There’s Artima Scala Buzz, which I’ve chronicled my irritation with before, and a new Scala driven ad farm which if you read the mailing lists you’ve probably noticed.

The particular reason this makes me so angry is that setting up a good aggregator is really damn easy. Planet planet is very simple to set up (it took me maybe half an hour of tinkering) and produces consistently good results. Planet Haskell is basically a vanilla setup of it and works very well.

So, as usual, I end up porting something from Haskell to Scala. Say hello to Planet Scala

This entry was posted in Admin, programming and tagged , on by .