Archive for the 'programming' Category

Code generation for structural proxies

Thursday, June 5th, 2008

In the latest in the long series of slightly half-assed projects of mine that never really approach anything near completion, I’ve been tinkering with runtime code generation for structural types.

Scala’s structural types are implemented using reflection. Reflection on the JVM is unfortunately quite slow, but the dynamic language community have put a fair bit of work into figuring out how to work around these issues. So, I wondered how we could apply those lessons to Scala’s structural typing.

Which, of course, explains why I didn’t bother to so much look as what they did and went off and did my own thing. :-)

The basic idea behind my design is that for a structural type

  { def foo : Bar;
    def baz : Bif;
  }

we generate an interface

  trait StructuralType$$namemanglingblargh{
    def foo : Bar;
    def baz : Bif;
  }

And for each class we want to call using the structural type we generate an adapter class for it. So if I call a method using a structural type I first box it in an adapter class that implements StructuralType$$namemanglingblargh and use that to do the method call.

However, rather than generating these classes at compile time I’m experimenting with generating them at runtime. Why? Because runtime code generation is fun. :-) Also because it makes it easier to use from Java and other languages that don’t have a Scala compiler handy.

So far I’ve only done the code for generating these wrappers. I suspect I’m going to stall horribly when it comes to actually integrating it with the compiler, so I’m going to work on how to make it as usable as possible as a library first.

Initial performance tests suggest that it’s currently only slightly faster than the standard structural types if you create a wrapper at every invocation but that if you can cache the wrapper (which happens for free if you e.g. have a recursive structural method). However these tests are totally unscientific and probably wrong, so take them with a pinch of salt.

You can get the code from here: http://www.drmaciver.com/repos/proxies/

It’s not currently the prettiest of things, but it seems to work. Mostly.

Edit: As usual, I fail at ad hoc performance testing. The current probably wrong figures are that it’s about twice as fast as the structural types if you don’t cache the wrapper class.

Scala arrays

Tuesday, June 3rd, 2008

There’s been a thread on Scala debate in the last day or two about breaking the current array implementation. It’s in most regards a very tedious thread, but one thing that’s emerged in the course of the thread is that people don’t really understand how Scala arrays work. This isn’t surprising - they’re a bit confusing. It looks like they must be horribly inefficient, but actually they’re only mildly inefficient and much of the time the compiler can remove even that and treat them as normal Java arrays.

This post is basically a “What’s going on here?” post. I’ll explain some of the basics of how Scala arrays work and what you can expect in terms of performance. Warning: This post will contain JVM bytecode. If that scares you… well, close your eyes while reading or something. :-)

There are essentially two distinct cases here. The case where the arrays are polymorphic (depend on a type parameter we don’t know the class of. e.g. an Array[T]) and the case where they’re monomorphic (depend on a fully realised type parameter. e.g. an Array[Int] or an Array[String]. I’m also going to count things like Array[List[T]], where the class is known even though the actual type depends on a parameter. This is technically wrong, but works for the purpose of this discussion).

The issue here is that Scala supports polymorphic arrays where the type parameter ranges across both object and value types. This is interesting, because the JVM doesn’t. Consequently it comes with some overhead in the implementation. Occasionally this is a significant performance hit.

Ok. So, Scala is boxing arrays to make them appear polymorphic. It must be generating horribly inefficient code to do so, right? Let’s take a look to see what happens:

Consider the following method:

def foo = {
val bar = new Array[Int](3);
bar(0) = 1;
bar;

}

So, we’re expecting something involving boxing ints to and from objects. Let’s use javap to look at the generated bytecode.

public int[] foo();
Code:
0:    iconst_3
1:    newarray int
3:    astore_1
4:    aload_1
5:    iconst_0
6:    iconst_1
7:    iastore
8:    aload_1
9:    areturn


Oh, um. So, not much boxing huh?If you can’t read bytecode, what this is doing is basically exactly what a Java compiler using the corresponding code for int[] would generate. It’s using the correct instructions for creating and storing things into a primitive int array.

Object arrays behave similarly:

def foo = {
val bar = new Array[String](3);
bar(0) = "fish";
}

public java.lang.String[] foo();
Code:
0:    iconst_3
1:    anewarray    #14; //class java/lang/String
4:    astore_1
5:    aload_1
6:    iconst_0
7:    ldc    #16; //String fish
9:    aastore
10:    aload_1
11:    areturn

This correctly generates a String[] as you’d expect.

When you have a primitive array, Scala generates pretty efficient bytecode for it - it just uses a native Java primitive array. Similarly when you have an object array of a known class type.

Note that it’s specifically the classtype that has to be known. The actual type is of no importance.Here’s another example of it doing the right thing:

def foo[T] = {
val bar = new Array[List[T]](3);
bar(0) = Nil;
bar
}

public scala.List[] foo();
Code:
0:    iconst_3
1:    anewarray    #14; //class scala/List
4:    astore_1
5:    aload_1
6:    iconst_0
7:    getstatic    #20; //Field scala/Nil$.MODULE$:Lscala/Nil$;
10:    aastore
11:    aload_1
12:    areturn

Multidimensional arrays behave similarly:

  def foo = {
    val bar = new Array[Array[Int]](10, 10);
    bar(0)(3) = 1;
    bar;
  }
public int[][] foo();
  Code:
   0:   ldc     #13; //int 10
   2:   ldc     #13; //int 10
   4:   multianewarray  #15,  2; //class "[[I"
   8:   astore_1
   9:   aload_1
   10:  iconst_0
   11:  aaload
   12:  iconst_3
   13:  iconst_1
   14:  iastore
   15:  aload_1
   16:  areturn

So far, nothing very interesting to see here. It works exactly like you'd expect it to. Where the magic starts to happen is when you have an Array[T].

So, what happens if we use one of these things polymorphically?

def foo = {
val baz = new Array[Int](3);
bar(baz, 3);
}

def bar[T](ts : Array[T], t : T) = ts(0) = t;

Clearly we need to do something. In Java this would just work by using an Object[] under the scenes, but for Scala we can’t do that as we need to be able to pass it an array of primitives.

So what we do is we box the array. We create a standard wrapper which can take both a primitive array and an object array and present an array like API for both. This is in fact the thing you’re seeing when you see the API for scala.Array - all the methods like map, flatMap, projection, etc. belong to the BoxedArray wrapper class (or one of its superclasses).

Let’s see how this looks at the bytecode level:

public void bar(scala.runtime.BoxedArray, java.lang.Object);
Code:
0:    aload_1
1:    iconst_0
2:    aload_2
3:    invokevirtual    #18; //Method scala/runtime/BoxedArray.update:(ILjava/lang/Object;)V
6:    return

public void foo();
Code:
0:    iconst_3
1:    newarray int
3:    astore_1
4:    aload_0
5:    new    #28; //class scala/runtime/BoxedIntArray
8:    dup
9:    aload_1
10:    invokespecial    #31; //Method scala/runtime/BoxedIntArray."":([I)V
13:    iconst_3
14:    invokestatic    #37; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
17:    invokevirtual    #41; //Method bar:(Lscala/runtime/BoxedArray;Ljava/lang/Object;)V
20:    return

So, the method bar just generates the expected code using the BoxedArray class. On calling this code, we first create an int[] as normal, we box it with a BoxedIntArray (an implementation of BoxedArray for wrapping an int[]) and call the method as expected. There’s a bit of boxing overhead, but it’s not too bad. The main point is that at the backend we are still using a primitive array for storage.

Ok. This is all very well, but what happens if we need to create a polymorphic array? How can I do new Array[T](3) if I don’t know what T is so don’t know whether it needs to be a primitive or an object array how can I create this?

The key observation here is that as long as we are polymorphic in the array type we must box anyway. So you never have to expose a native Java array until a point where you know what the actual type is. Until that point we use a custom boxed array implementation called BoxedAnyArray. What happens here is that it starts out using an Object[] until it knows what type it is. At the point where we request it as a Java native array we now have a concrete array type we want it as. Consequently we create an array of that type at that point, populate it from the Object[] and change the BoxedAnyArray in place to now be backed by that array instead of the Object[] it originally had. The Object[] is now discarded.

Because the BoxedAnyArray’s backing store is now the newly created (possibly primitive) array, any updates to and from the boxed version will now point to this new array. Hence the native array and the boxed array will remain in synch, even though this isn’t the array it started out with. Further, and this is important, after this point we have a native array in our hands and can treat it accordingly. Anything that was using the BoxedAnyArray continues to do so, but there is no “taint” associated with our newly created array and in particular no additional overhead.

Unfortunately this trick doesn’t come for free. BoxedAnyArray is kindof a slow beast (I think some of this can be fixed with an additional level of indirection actually, but it currently isn’t). It does a lot of checking in the apply and update methods and, additionally, they’re synchronized. While you continue to use BoxedAnyArray your performance will suffer much more than the other BoxedArray implementations.

Let’s read some more bytecode:

def newarray[T] = new Array[T](3);

def foo = {
val x = newarray[Int]
x(0) = 3;
x
}

public scala.runtime.BoxedArray newarray();
Code:
0:    new    #14; //class scala/runtime/BoxedAnyArray
3:    dup
4:    iconst_3
5:    invokespecial    #17; //Method scala/runtime/BoxedAnyArray."":(I)V
8:    areturn

public int[] foo();
Code:
0:    getstatic    #25; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
3:    aload_0
4:    invokevirtual    #29; //Method newarray:()Lscala/runtime/BoxedArray;
7:    getstatic    #35; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
10:    invokevirtual    #39; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
13:    astore_2
14:    aload_2
15:    instanceof    #41; //class scala/runtime/BoxedArray
18:    ifeq    37
21:    getstatic    #25; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
24:    aload_2
25:    checkcast    #41; //class scala/runtime/BoxedArray
28:    getstatic    #35; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
31:    invokevirtual    #39; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
34:    goto    38
37:    aload_2
38:    checkcast    #43; //class "[I"
41:    astore_1
42:    aload_1
43:    iconst_0
44:    iconst_3
45:    iastore
46:    aload_1
47:    areturn

It's generating slightly redundant bytecode here, in that it's testing to see if the local variable is a BoxedArray or an int[] before unboxing it despite the fact that it’s always a BoxedArray, but that’s not a big deal. The key point here is that we had a BoxedAnyArray and we end up with an int[].

Here’s an example combining the two instances of polymorphism we’ve seen so far:

def foo = {
val x = newarray[Int]
bar(x, 3);
}

public void foo();
Code:
0:    getstatic    #39; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
3:    aload_0
4:    invokevirtual    #43; //Method newarray:()Lscala/runtime/BoxedArray;
7:    getstatic    #49; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
10:    invokevirtual    #53; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
13:    astore_2
14:    aload_2
15:    instanceof    #21; //class scala/runtime/BoxedArray
18:    ifeq    37
21:    getstatic    #39; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
24:    aload_2
25:    checkcast    #21; //class scala/runtime/BoxedArray
28:    getstatic    #49; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
31:    invokevirtual    #53; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
34:    goto    38
37:    aload_2
38:    checkcast    #55; //class "[I"
41:    astore_1
42:    aload_0
43:    new    #57; //class scala/runtime/BoxedIntArray
46:    dup
47:    aload_1
48:    invokespecial    #60; //Method scala/runtime/BoxedIntArray."":([I)V
51:    iconst_3
52:    invokestatic    #66; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
55:    invokevirtual    #68; //Method bar:(Lscala/runtime/BoxedArray;Ljava/lang/Object;)V   58:    return

Note how the BoxedAnyArray gets unboxed and reboxed into a BoxedIntArray before being passed anywhere. This is a good thing - BoxedIntArray is a much lighter weight thing.

Summary:

  • When used monomorphically, Scala arrays are Java arrays.
  • When used polymorphically, Scala arrays are scala.runtime.BoxedArray instances. There is a separate subclass of BoxedArray for each primitive type and one for object. When a monomorphic method receives a BoxedArray it will usually unbox it rather than call its apply and update methods.
  • There is also BoxedAnyArray. This has a fairly high overhead and is used *only* for creating polymorphic arrays. The compiler will get rid of it at the earliest possible opportunity and rebox as other forms.

So, what does this mean in terms of performance?

Firstly, if you use arrays monomorphically, they will be Java arrays and have no additional overhead associated with them (you'll need to box them for invoking methods on Array like map, projection, etc, but this isn't too bad). So if you want really high performance code, only use monomorphic arrays.

At the next level, you can still get reasonably high performance code if you use polymorphic arrays but only create them monomorphically. You'll get a boxing overhead for primitives, but that's true of any polymorphic code in Scala.

Slowest of all is when you create an array polymorphically and continue to use it polymorphically. Once you're in monomorphic code it will regain normal performance characteristics and future uses of it will behave as if it had been instantiated monomorphically (but previously stored references to it will continue to pass through a BoxedAnyArray and have the associated overhead).

One final point of warning. Consider the following code:

class Foo[T]{
private[this] val arr = new Array[T](2);
def apply(i : Int) = arr(i);
def update(i : Int, t : T) = arr(i) = t;
}

Note that the Array never escapes this. So there’s no way to unbox it!

Looking at the bytecode confirms:

public class arrays.Foo extends java.lang.Object implements scala.ScalaObject{
public arrays.Foo();
Code:
0:    aload_0
1:    invokespecial    #12; //Method java/lang/Object."":()V
4:    aload_0
5:    new    #14; //class scala/runtime/BoxedAnyArray
8:    dup
9:    iconst_2
10:    invokespecial    #17; //Method scala/runtime/BoxedAnyArray."":(I)V
13:    putfield    #21; //Field arr:Lscala/runtime/BoxedArray;
16:    return

public void update(int, java.lang.Object);
Code:
0:    aload_0
1:    getfield    #21; //Field arr:Lscala/runtime/BoxedArray;
4:    iload_1
5:    aload_2
6:    invokevirtual    #27; //Method scala/runtime/BoxedArray.update:(ILjava/lang/Object;)V
9:    return

public java.lang.Object apply(int);
Code:
0:    aload_0
1:    getfield    #21; //Field arr:Lscala/runtime/BoxedArray;
4:    iload_1
5:    invokevirtual    #38; //Method scala/runtime/BoxedArray.apply:(I)Ljava/lang/Object;
8:    areturn
}

The array never gets unboxed and consequently continues to use the higher overhead BoxedAnyArray forever. If you instead use an Array[AnyRef] here and cast your code will run noticably faster (which was the observation which sparked this whole discussion)

Random linear algebra hackery in Scala

Monday, June 2nd, 2008

One of my random pet projects at the moment is putting together a prettified linear algebra API for Scala. I’m not implementing the linear algebra routines myself, but am instead implementing it as a wrapper over Colt. This isn’t really any sort of stable release or project. It’s more of a “If you want this sort of thing you might find it useful” comment.

The basic design is:

  • Everything is mutable, but the API encourages using it in an immutable way.
  • Makes no attempt to hide its colt origins. If you want to use it with Colt proper, it will cheerfully let you do so, and you will often have to explicitly use colt classes.
  • API loosely inspired by matlab/Octave
  • Hideous abuses of overloading, implicits, functional programming and symbolic method names abound in order to get an API that allows for very compact code.  If you don’t like this sort of thing you may want to steer clear. :-)

It’s currently available from my misc hg repository, but it’s getting large enough that I’ll probably factor it out into its own soon.

SBinary 0.2

Sunday, June 1st, 2008

I quietly uploaded SBinary 0.2 last night. I was going to write up some documentation for it today and do an official announcement, but it’s now coming up to 19:00 and I still haven’t written any documentation so I have a sneaking suspicion it ain’t happening today. So, if you want to have a play with it there’s a new jar and scaladoc up there.

Changes from 0.1:

  • Improved API which is less closely tied to java.io
  • Supports a wider range of collections and standard types (not the entire standard library as I’d originally said I was going to do. There are some issues surrounding collections and open unions which I want to resolve before I do that).
  • A limited amount of support for lazy IO using Stream. (Specifically, Stream is read lazily from an input. However there are a lot of limitations on this at the moment). The design is such that it should be impossible to incorrectly interleave IO actions (i.e. reading something else from the input will force pending lazy reads to complete).
  • Improved some performance stupidities (nonbuffered reading/writing to files, unspecialised reading/writing of byte arrays)
  • Improved generic combinators for building binary instances (although the “case classes for free” one I want to add is waiting on a compiler bug before I can add it)

I hope to write up some proper documentation soon, and when I do I’ll send an announcement to the mailing list. In the meantime, feel free to have a play.

Book Review: The Art of Assembly Language

Thursday, May 29th, 2008

I’ve been meaning to get a handle on a lot more low level programming details for a while. I can do C, more or less, but I only had the vaguest notion of the lower level CPU architecture and assembly language. While I was over in the states, Victoria made the mistake of taking me into a book store. I emerged several books later. Among them, The Art of Assembly Language. This post is a review of the book. Well, ok. It’s more of a rant about the book.

First off, let me say this: I liked this book. I read it cover to cover (although a few of the chapters I only skimmed), which is extremely rare for a programming book. I can’t offhand think of another example of one I own (of which there are many) which I’ve read that thoroughly. Most of them I dip into and read chapters out of order as I feel like it. I came out feeling like I had a much better understanding of the way the x86 architecture works and various details that had previously escaped me. It’s interesting and largely well written.

So, it’s a good book, and I’m glad I read it. And that’s the last positive thing I’m likely to say about it in this post. The rest of the post is me being mean about its flaws.

First off: After reading this book, one comes out with a good sense the way various aspects of assembly are constructed and used. It clears up a lot of misconceptions you might have had if you don’t already know assembly and provides a high level sense of what makes up an assembly program (Examples: I had previously assumed that the stack was purely a higher level construct and didn’t have any specific CPU support. I also hadn’t realised quite how significant registers were for… well, everything, or the way the FPU was set up in relation to the rest of the CPU instructions). What it doesn’t teach you to do is write assembly.

The text uses High Level Assembler. This is an interesting pedagogical tool, as it certainly helps to remove a lot of the scariness of writing assembler (compare the hello worlds for example). The problem is that it’s not really an assembler. Instead it’s the bastard offspring of a highish level language (well, low to mediumish. Somewhere a little above C but below C++) and x86 assembly. Worse, the bits that look like x86 assembly often aren’t. Some of this takes the form of relatively harmless little extensions. e.g. in x86 assembly the mov instruction may only move data between registers or a register and a memory location. You can’t mov between two addresses in main memory. In HLA you can. mov(x, y) translates to push(x) pop(y).

If you know your assembly, you’ll just have spotted the other far more insidious way in which it’s different (I do not know my assembly, hence only discovered this on a) Reading some NASM source and getting very confused and b) Reading FAQ entry 8). The operands are backwards from intel’s mnemonics. Mostly. Except when they’re not. So in other words if you learn to write HLA you will most likely completely destroy your ability to write other assemblers because you will constantly get fuddled by operand order (ok, that’s probably not true. I suspect a few weeks with an assembler and someone standing over you with a big stick would cure that).

But that’s ok. I mean, surely HLA is a mature and well tested product? It’s been around for over a decade. If it’s that much better than existing approaches to assembler, what’s the harm in doing things differently?

Um.

Setting aside any philosophical issues with the way HLA does things, the author repeatedly and insistently describes the current implementation of HLA as horrible, not designed for performance and a prototype. This does not exactly fill me with confidence.

So, what does HLA add on top of x86 assembly?

  • A nice standard library.
  • In particular, good String handling (better than C’s in my opinion).
  • Support for composite datatypes (records, unions, pointers, arrays)
  • A calling convention for passing arguments to procedures. I’m not thrilled by this - something like it is definitely useful to have, but the calling convention chosen suffers from the C problem of making it all too easy to copy large amounts of data around on the stack (more so: It passes arrays by value).
  • A macro language (the author seems to be convinced that this is the most advanced macro language ever. It’s not. It’s a half assed scripting language built into a preprocessor. It’s not really better than using something like fmpp or similar. It’s not even close to lispy macros).
  • Ten thousand different types of loop.
  • Exceptions ?!
  • Objects ?!?!

The exception subsystem makes me particularly mad. The book goes into a great deal of detail (too much detail) about how all the different control flow extensions are implemented and is completely silent on the subject of exceptions. I assumed that this was an advanced subject left to the HLA manual. Not so much - no description of detail there.

So, we have a high level feature whose implementation is undocumented and which is used in many places in the standard library. It’s good that we’re writing assembly here - it really lets us see what the close to the metal programming is like.

The objects section is just sad. I don’t really have anything more to say about it than that.

As a tool for teaching, HLA serves its purpose, but it would work a lot better if it were heavily trimmed down and made less of an attempt at being higher level than normal assembly and more of one at being a lot of useful libraries, some extensions for procedure calls and a better macro language. As it is, I certainly wouldn’t want to program in it.