Tag Archives: scala

Code generation for structural proxies

In the latest in the long series of slightly half-assed projects of mine that never really approach anything near completion, I’ve been tinkering with runtime code generation for structural types.

Scala’s structural types are implemented using reflection. Reflection on the JVM is unfortunately quite slow, but the dynamic language community have put a fair bit of work into figuring out how to work around these issues. So, I wondered how we could apply those lessons to Scala’s structural typing.

Which, of course, explains why I didn’t bother to so much as look at what they did and went off and did my own thing. :-)

The basic idea behind my design is that for a structural type

  { def foo : Bar; 
    def baz : Bif; 
  }

we generate an interface

  trait StructuralType$$namemanglingblargh{ 
    def foo : Bar; 
    def baz : Bif; 
  }

And for each class we want to call using the structural type we generate an adapter class for it. So if I call a method using a structural type I first box it in an adapter class that implements StructuralType$$namemanglingblargh and use that to do the method call.
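To make that concrete, here’s a hand-written sketch of roughly what gets generated. Widget, Bar and Bif are hypothetical stand-ins for whatever the real types involved are, and in reality the adapter is emitted as bytecode at runtime rather than written out as source like this:

  class Bar
  class Bif

  // Some arbitrary class that happens to match the structural type.
  class Widget {
    def foo : Bar = new Bar
    def baz : Bif = new Bif
  }

  // The interface generated for the structural type.
  trait StructuralType$$namemanglingblargh {
    def foo : Bar
    def baz : Bif
  }

  // The adapter generated for Widget. It just forwards each call, so
  // invoking a method through the structural type becomes an ordinary
  // virtual call rather than a reflective one.
  class WidgetAdapter(underlying : Widget) extends StructuralType$$namemanglingblargh {
    def foo : Bar = underlying.foo
    def baz : Bif = underlying.baz
  }

  // At the call site we box the Widget in its adapter and call through that.
  def useStructurally(w : Widget) : Bar = new WidgetAdapter(w).foo

The point of the adapter is that the structural call turns into a plain virtual call on the generated interface instead of a reflective lookup.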

However, rather than generating these classes at compile time I’m experimenting with generating them at runtime. Why? Because runtime code generation is fun. :-) Also because it makes it easier to use from Java and other languages that don’t have a Scala compiler handy.

So far I’ve only done the code for generating these wrappers. I suspect I’m going to stall horribly when it comes to actually integrating it with the compiler, so I’m going to work on how to make it as usable as possible as a library first.

Initial performance tests suggest that it’s currently only slightly faster than the standard structural types if you create a wrapper at every invocation, but noticeably faster if you can cache the wrapper (which happens for free if you e.g. have a recursive structural method). However these tests are totally unscientific and probably wrong, so take them with a pinch of salt.

You can get the code from here: http://www.drmaciver.com/repos/proxies/

It’s not currently the prettiest of things, but it seems to work. Mostly.

Edit: As usual, I fail at ad hoc performance testing. The current probably wrong figures are that it’s about twice as fast as the structural types if you don’t cache the wrapper class.


Scala arrays

Update: Note that this is about the old implementation and no longer applies.

There’s been a thread on the scala-debate mailing list in the last day or two about breaking the current array implementation. It’s in most regards a very tedious thread, but one thing that’s emerged in the course of it is that people don’t really understand how Scala arrays work. This isn’t surprising – they’re a bit confusing. It looks like they must be horribly inefficient, but actually they’re only mildly inefficient, and much of the time the compiler can remove even that overhead and treat them as normal Java arrays.

This post is basically a “What’s going on here?” post. I’ll explain some of the basics of how Scala arrays work and what you can expect in terms of performance. Warning: This post will contain JVM bytecode. If that scares you… well, close your eyes while reading or something. :-)

There are essentially two distinct cases here: the case where the arrays are polymorphic (they depend on a type parameter whose class we don’t know, e.g. an Array[T]) and the case where they’re monomorphic (they depend on a fully realised type parameter, e.g. an Array[Int] or an Array[String]. I’m also going to count things like Array[List[T]], where the class is known even though the actual type depends on a parameter. This is technically wrong, but it works for the purpose of this discussion).

The issue here is that Scala supports polymorphic arrays where the type parameter ranges across both object and value types. This is interesting, because the JVM doesn’t. Consequently it comes with some overhead in the implementation. Occasionally this is a significant performance hit.

Ok. So, Scala is boxing arrays to make them appear polymorphic. It must be generating horribly inefficient code to do so, right? Let’s take a look to see what happens:

Consider the following method:

  def foo = {
    val bar = new Array[Int](3);
    bar(0) = 1;
    bar;
  }

So, we’re expecting something involving boxing ints to and from objects. Let’s use javap to look at the generated bytecode.

public int[] foo();
Code:
0:    iconst_3
1:    newarray int
3:    astore_1
4:    aload_1
5:    iconst_0
6:    iconst_1
7:    iastore
8:    aload_1
9:    areturn


Oh, um. So, not much boxing, huh? If you can’t read bytecode, what this is doing is basically exactly what a Java compiler would generate for the corresponding int[] code. It’s using the correct instructions for creating and storing things into a primitive int array.

Object arrays behave similarly:

  def foo = {
    val bar = new Array[String](3);
    bar(0) = "fish";
    bar;
  }

public java.lang.String[] foo();
Code:
0:    iconst_3
1:    anewarray    #14; //class java/lang/String
4:    astore_1
5:    aload_1
6:    iconst_0
7:    ldc    #16; //String fish
9:    aastore
10:    aload_1
11:    areturn

This correctly generates a String[] as you’d expect.

When you have a primitive array, Scala generates pretty efficient bytecode for it – it just uses a native Java primitive array. Similarly when you have an object array of a known class type.

Note that it’s specifically the class type that has to be known. The actual type is of no importance. Here’s another example of it doing the right thing:

  def foo[T] = {
    val bar = new Array[List[T]](3);
    bar(0) = Nil;
    bar
  }

public scala.List[] foo();
Code:
0:    iconst_3
1:    anewarray    #14; //class scala/List
4:    astore_1
5:    aload_1
6:    iconst_0
7:    getstatic    #20; //Field scala/Nil$.MODULE$:Lscala/Nil$;
10:    aastore
11:    aload_1
12:    areturn

Multidimensional arrays behave similarly:

  def foo = {
    val bar = new Array[Array[Int]](10, 10);
    bar(0)(3) = 1;
    bar;
  }
public int[][] foo();
  Code:
   0:   ldc     #13; //int 10
   2:   ldc     #13; //int 10
   4:   multianewarray  #15,  2; //class "[[I"
   8:   astore_1
   9:   aload_1
   10:  iconst_0
   11:  aaload
   12:  iconst_3
   13:  iconst_1
   14:  iastore
   15:  aload_1
   16:  areturn

So far, nothing very interesting to see here. It works exactly like you’d expect it to. Where the magic starts to happen is when you have an Array[T].

So, what happens if we use one of these things polymorphically?

  def foo = {
    val baz = new Array[Int](3);
    bar(baz, 3);
  }

  def bar[T](ts : Array[T], t : T) = ts(0) = t;

Clearly we need to do something. In Java this would just work by using an Object[] behind the scenes, but in Scala we can’t do that as we need to be able to pass it an array of primitives.

So what we do is box the array. We create a standard wrapper which can take both a primitive array and an object array and present an array-like API for both. This is in fact what you’re seeing when you look at the API for scala.Array – all the methods like map, flatMap, projection, etc. belong to the BoxedArray wrapper class (or one of its superclasses).
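As a rough illustration of the idea (a much-simplified sketch, not the actual scala.runtime source), a boxed int array looks conceptually something like this:

  // Simplified sketch only: the real BoxedArray hierarchy has far more
  // methods (map, flatMap, projection and friends).
  abstract class SimpleBoxedArray {
    def apply(i : Int) : Any
    def update(i : Int, x : Any) : Unit
    def length : Int
  }

  // Wraps a primitive int[], boxing on the way out and unboxing on the way in.
  class SimpleBoxedIntArray(val value : Array[Int]) extends SimpleBoxedArray {
    def apply(i : Int) : Any = value(i)
    def update(i : Int, x : Any) : Unit = value(i) = x.asInstanceOf[Int]
    def length : Int = value.length
  }

The important part is the value field – the backing store is still a plain primitive array; the wrapper just mediates access to it.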

Let’s see how this looks at the bytecode level:

public void bar(scala.runtime.BoxedArray, java.lang.Object);
Code:
0:    aload_1
1:    iconst_0
2:    aload_2
3:    invokevirtual    #18; //Method scala/runtime/BoxedArray.update:(ILjava/lang/Object;)V
6:    return

public void foo();
Code:
0:    iconst_3
1:    newarray int
3:    astore_1
4:    aload_0
5:    new    #28; //class scala/runtime/BoxedIntArray
8:    dup
9:    aload_1
10:    invokespecial    #31; //Method scala/runtime/BoxedIntArray."<init>":([I)V
13:    iconst_3
14:    invokestatic    #37; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
17:    invokevirtual    #41; //Method bar:(Lscala/runtime/BoxedArray;Ljava/lang/Object;)V
20:    return

So, the method bar just generates the expected code using the BoxedArray class. On calling this code, we first create an int[] as normal, we box it with a BoxedIntArray (an implementation of BoxedArray for wrapping an int[]) and call the method as expected. There’s a bit of boxing overhead, but it’s not too bad. The main point is that at the backend we are still using a primitive array for storage.

Ok. This is all very well, but what happens if we need to create a polymorphic array? How can I do new Array[T](3) if I don’t know what T is, and hence don’t know whether it needs to be a primitive or an object array?

The key observation here is that as long as we are polymorphic in the array type we must box anyway. So you never have to expose a native Java array until a point where you know what the actual type is. Until that point we use a custom boxed array implementation called BoxedAnyArray. What happens here is that it starts out using an Object[] until it knows what type it is. At the point where we request it as a Java native array we now have a concrete array type we want it as. Consequently we create an array of that type at that point, populate it from the Object[] and change the BoxedAnyArray in place to now be backed by that array instead of the Object[] it originally had. The Object[] is now discarded.

Because the BoxedAnyArray’s backing store is now the newly created (possibly primitive) array, any updates to and from the boxed version will now point to this new array. Hence the native array and the boxed array will remain in sync, even though this isn’t the array it started out with. Further, and this is important, after this point we have a native array in our hands and can treat it accordingly. Anything that was using the BoxedAnyArray continues to do so, but there is no “taint” associated with our newly created array and in particular no additional overhead.
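A very rough sketch of that rebacking trick (nothing like the real scala.runtime.BoxedAnyArray – just the shape of the idea, using java.lang.reflect.Array for brevity):

  // Starts out backed by an Object[]; the first time someone asks for a
  // native array of a concrete element type, we build one, copy the
  // elements across, and make it the new backing store.
  class SimpleBoxedAnyArray(length : Int) {
    private var boxed : Array[AnyRef] = new Array[AnyRef](length)
    private var unboxed : AnyRef = null   // the concrete array, once known

    def apply(i : Int) : Any =
      if (unboxed eq null) boxed(i)
      else java.lang.reflect.Array.get(unboxed, i)

    def update(i : Int, x : Any) : Unit =
      if (unboxed eq null) boxed(i) = x.asInstanceOf[AnyRef]
      else java.lang.reflect.Array.set(unboxed, i, x.asInstanceOf[AnyRef])

    // Called once the caller finally knows what element class it wants.
    def unbox(elemClass : Class[_]) : AnyRef = {
      if (unboxed eq null) {
        val arr = java.lang.reflect.Array.newInstance(elemClass, length)
        var i = 0
        while (i < length) {
          java.lang.reflect.Array.set(arr, i, boxed(i))
          i += 1
        }
        unboxed = arr
        boxed = null   // the Object[] is discarded
      }
      unboxed
    }
  }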

Unfortunately this trick doesn’t come for free. BoxedAnyArray is kind of a slow beast (I think some of this could be fixed with an additional level of indirection, but it currently isn’t). It does a lot of checking in the apply and update methods and, additionally, they’re synchronized. While you continue to use BoxedAnyArray your performance will suffer much more than with the other BoxedArray implementations.

Let’s read some more bytecode:

  def newarray[T] = new Array[T](3);

  def foo = {
    val x = newarray[Int]
    x(0) = 3;
    x
  }

public scala.runtime.BoxedArray newarray();
Code:
0:    new    #14; //class scala/runtime/BoxedAnyArray
3:    dup
4:    iconst_3
5:    invokespecial    #17; //Method scala/runtime/BoxedAnyArray."<init>":(I)V
8:    areturn

public int[] foo();
Code:
0:    getstatic    #25; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
3:    aload_0
4:    invokevirtual    #29; //Method newarray:()Lscala/runtime/BoxedArray;
7:    getstatic    #35; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
10:    invokevirtual    #39; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
13:    astore_2
14:    aload_2
15:    instanceof    #41; //class scala/runtime/BoxedArray
18:    ifeq    37
21:    getstatic    #25; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
24:    aload_2
25:    checkcast    #41; //class scala/runtime/BoxedArray
28:    getstatic    #35; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
31:    invokevirtual    #39; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
34:    goto    38
37:    aload_2
38:    checkcast    #43; //class "[I"
41:    astore_1
42:    aload_1
43:    iconst_0
44:    iconst_3
45:    iastore
46:    aload_1
47:    areturn

It’s generating slightly redundant bytecode here, in that it’s testing to see if the local variable is a BoxedArray or an int[] before unboxing it despite the fact that it’s always a BoxedArray, but that’s not a big deal. The key point here is that we had a BoxedAnyArray and we end up with an int[].

Here’s an example combining the two instances of polymorphism we’ve seen so far:

  def foo = {
    val x = newarray[Int]
    bar(x, 3);
  }

public void foo();
Code:
0:    getstatic    #39; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
3:    aload_0
4:    invokevirtual    #43; //Method newarray:()Lscala/runtime/BoxedArray;
7:    getstatic    #49; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
10:    invokevirtual    #53; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
13:    astore_2
14:    aload_2
15:    instanceof    #21; //class scala/runtime/BoxedArray
18:    ifeq    37
21:    getstatic    #39; //Field scala/runtime/ScalaRunTime$.MODULE$:Lscala/runtime/ScalaRunTime$;
24:    aload_2
25:    checkcast    #21; //class scala/runtime/BoxedArray
28:    getstatic    #49; //Field java/lang/Integer.TYPE:Ljava/lang/Class;
31:    invokevirtual    #53; //Method scala/runtime/ScalaRunTime$.arrayValue:(Lscala/runtime/BoxedArray;Ljava/lang/Class;)Ljava/lang/Object;
34:    goto    38
37:    aload_2
38:    checkcast    #55; //class "[I"
41:    astore_1
42:    aload_0
43:    new    #57; //class scala/runtime/BoxedIntArray
46:    dup
47:    aload_1
48:    invokespecial    #60; //Method scala/runtime/BoxedIntArray."<init>":([I)V
51:    iconst_3
52:    invokestatic    #66; //Method scala/runtime/BoxesRunTime.boxToInteger:(I)Ljava/lang/Integer;
55:    invokevirtual    #68; //Method bar:(Lscala/runtime/BoxedArray;Ljava/lang/Object;)V
58:    return

Note how the BoxedAnyArray gets unboxed and reboxed into a BoxedIntArray before being passed anywhere. This is a good thing – BoxedIntArray is a much lighter-weight class.

Summary:

  • When used monomorphically, Scala arrays are Java arrays.
  • When used polymorphically, Scala arrays are scala.runtime.BoxedArray instances. There is a separate subclass of BoxedArray for each primitive type and one for object. When a monomorphic method receives a BoxedArray it will usually unbox it rather than call its apply and update methods.
  • There is also BoxedAnyArray. This has a fairly high overhead and is used *only* for creating polymorphic arrays. The compiler will get rid of it at the earliest possible opportunity and rebox as other forms.

So, what does this mean in terms of performance?

Firstly, if you use arrays monomorphically, they will be Java arrays and have no additional overhead associated with them (you’ll need to box them for invoking methods on Array like map, projection, etc, but this isn’t too bad). So if you want really high performance code, only use monomorphic arrays.

At the next level, you can still get reasonably high performance code if you use polymorphic arrays but only create them monomorphically. You’ll get a boxing overhead for primitives, but that’s true of any polymorphic code in Scala.

Slowest of all is when you create an array polymorphically and continue to use it polymorphically. Once you’re in monomorphic code it will regain normal performance characteristics and future uses of it will behave as if it had been instantiated monomorphically (but previously stored references to it will continue to pass through a BoxedAnyArray and have the associated overhead).
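In code, the three tiers look roughly like this (a sketch only, following the old array implementation this post describes; the comments state the rough expectations above, not measurements):

  // Fastest: monomorphic creation and use. Compiles down to a plain int[].
  def sumMono(n : Int) : Int = {
    val xs = new Array[Int](n);
    var i = 0;
    var total = 0;
    while (i < n) { xs(i) = i; total += xs(i); i += 1 }
    total
  }

  // Middling: created monomorphically, used polymorphically. The int[] is
  // boxed into a BoxedIntArray at the polymorphic call boundary.
  def firstElem[T](xs : Array[T]) : T = xs(0)

  def useMiddling : Int = {
    val xs = new Array[Int](3);
    xs(0) = 42;
    firstElem(xs)
  }

  // Slowest: created and used polymorphically, which goes through the
  // heavyweight BoxedAnyArray until a concrete element type is known.
  def fillPoly[T](t : T) : Array[T] = {
    val xs = new Array[T](3);
    xs(0) = t;
    xs
  }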

One final point of warning. Consider the following code:

  class Foo[T]{
    private[this] val arr = new Array[T](2);
    def apply(i : Int) = arr(i);
    def update(i : Int, t : T) = arr(i) = t;
  }

Note that the array never escapes this class, so there’s no way to unbox it!

Looking at the bytecode confirms this:

public class arrays.Foo extends java.lang.Object implements scala.ScalaObject{
public arrays.Foo();
Code:
0:    aload_0
1:    invokespecial    #12; //Method java/lang/Object."<init>":()V
4:    aload_0
5:    new    #14; //class scala/runtime/BoxedAnyArray
8:    dup
9:    iconst_2
10:    invokespecial    #17; //Method scala/runtime/BoxedAnyArray."<init>":(I)V
13:    putfield    #21; //Field arr:Lscala/runtime/BoxedArray;
16:    return

public void update(int, java.lang.Object);
Code:
0:    aload_0
1:    getfield    #21; //Field arr:Lscala/runtime/BoxedArray;
4:    iload_1
5:    aload_2
6:    invokevirtual    #27; //Method scala/runtime/BoxedArray.update:(ILjava/lang/Object;)V
9:    return

public java.lang.Object apply(int);
Code:
0:    aload_0
1:    getfield    #21; //Field arr:Lscala/runtime/BoxedArray;
4:    iload_1
5:    invokevirtual    #38; //Method scala/runtime/BoxedArray.apply:(I)Ljava/lang/Object;
8:    areturn
}

The array never gets unboxed and consequently continues to use the higher overhead BoxedAnyArray forever. If you instead use an Array[AnyRef] here and cast, your code will run noticeably faster (which was the observation that sparked this whole discussion).
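For reference, here’s a sketch of that Array[AnyRef]-and-cast workaround. The casts are unchecked, and it’s only reasonable because the raw array never escapes the class:

  class FastFoo[T] {
    // Plain Object[] backing store, so no BoxedAnyArray is ever created.
    private[this] val arr = new Array[AnyRef](2);
    def apply(i : Int) : T = arr(i).asInstanceOf[T];
    def update(i : Int, t : T) = arr(i) = t.asInstanceOf[AnyRef];
  }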


Random linear algebra hackery in Scala

One of my random pet projects at the moment is putting together a prettified linear algebra API for Scala. I’m not implementing the linear algebra routines myself, but am instead implementing it as a wrapper over Colt. This isn’t really any sort of stable release or project. It’s more of a “If you want this sort of thing you might find it useful” comment.

The basic design is:

  • Everything is mutable, but the API encourages using it in an immutable way.
  • Makes no attempt to hide its Colt origins. If you want to use it with Colt proper, it will cheerfully let you do so, and you will often have to explicitly use Colt classes.
  • API loosely inspired by matlab/Octave
  • Hideous abuses of overloading, implicits, functional programming and symbolic method names abound in order to get an API that allows for very compact code.  If you don’t like this sort of thing you may want to steer clear. :-)

It’s currently available from my misc hg repository, but it’s getting large enough that I’ll probably factor it out into its own repository soon.


SBinary 0.2

I quietly uploaded SBinary 0.2 last night. I was going to write up some documentation for it today and do an official announcement, but it’s now coming up to 19:00 and I still haven’t written any documentation so I have a sneaking suspicion it ain’t happening today. So, if you want to have a play with it there’s a new jar and scaladoc up there.

Changes from 0.1:

  • Improved API which is less closely tied to java.io
  • Supports a wider range of collections and standard types (not the entire standard library as I’d originally said I was going to do. There are some issues surrounding collections and open unions which I want to resolve before I do that).
  • A limited amount of support for lazy IO using Stream. (Specifically, Stream is read lazily from an input. However there are a lot of limitations on this at the moment). The design is such that it should be impossible to incorrectly interleave IO actions (i.e. reading something else from the input will force pending lazy reads to complete).
  • Improved some performance stupidities (nonbuffered reading/writing to files, unspecialised reading/writing of byte arrays)
  • Improved generic combinators for building binary instances (although the “case classes for free” one I want to add is waiting on a compiler bug before I can add it)

I hope to write up some proper documentation soon, and when I do I’ll send an announcement to the mailing list. In the meantime, feel free to have a play.


A cute hack: Changing the scope of System.out

This isn’t any sort of deep trick. I just found the results surprisingly pleasant, so thought I’d share.

In the GUI interpreter I need to be able to capture standard out. Doing println in the interpreter should print to the screen, not the console. And that’s fine. Java provides System.setOut. I had to fight the stupidities of java.io a bit to make it work (annoying details with flushing among other things), but it worked in the end.

There is of course a problem with this. System.out is basically a great honking big global variable. We’ve now made it impossible to run multiple interpreters in the same VM. Le sigh.

Ok. So, what we do is a bit of juggling and set System.out appropriately before interpreting each thing. Right?

Nope. Doesn’t work. These things can and do live in separate threads. And for that matter, what about background threads spawned by one of the interpreters?

The solution is to make System.out thread local. And not just any type of thread local – an inheritable thread local. Spawned threads need to inherit the System.out of their parent thread so that something like spawn { println("Hello world") } prints to the right screen.

Once you have this, the idea is obvious. You define an object which extends OutputStream but delegates all functionality to some thread-local variable (snarfing the System.out that was present when the object loads as the default value), set System.out to be a print stream wrapping that, and you’re done. You can now set its value in a thread-local way and multiple interpreters can coexist peacefully.
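A minimal sketch of that object (the names are mine, and the real interpreter code presumably does a bit more around flushing and buffering):

  import java.io.{OutputStream, PrintStream}

  object ThreadLocalOut extends OutputStream {
    // Snarf the original System.out eagerly, before we replace it, so it
    // serves as the default target for threads that never set their own.
    private val original : PrintStream = System.out

    // Inheritable so that spawned threads pick up their parent's target.
    private val target = new InheritableThreadLocal[PrintStream] {
      override def initialValue() : PrintStream = original
    }

    // Point the current thread (and its future children) at a new target.
    def redirectTo(ps : PrintStream) : Unit = target.set(ps)

    override def write(b : Int) : Unit = target.get.write(b)
    override def write(b : Array[Byte], off : Int, len : Int) : Unit =
      target.get.write(b, off, len)
    override def flush() : Unit = target.get.flush()
  }

Installing it is then just a matter of calling System.setOut(new PrintStream(ThreadLocalOut, true)) once at startup; after that each interpreter calls redirectTo with its own PrintStream before running user code.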

Like I said, nothing special, but it seems like a surprisingly handy trick.
