Category Archives: programming

“Object Main extends Application” considered harmful

You’re probably all familiar with the standard Java invocation for creating an applicaiton’s start point.

class Main{
public static void main(String[] args){
// code
}
}

Scala has a nice feature which lets you use a trait to define your application where you don’t care about the initial arguments.


object Main extends Application{

//code

}

What happens is that Application has a method main(args : Array[String]) which gets inherited, then a corresponding static method is created on the class Main. The application logic goes in Main’s initialisation code. Neat, huh?

All we’re missing is what the definition of main is.


trait Application{

def main(args : Array[String]) = createHorrificallyConfusingBugsIfYouSoMuchAsLookAtAThread();

}

Oops.

Ok, that’s not actually true. Really it’s


trait Application{

def main(args : Array[String]) = ()

}

It doesn’t do anything. Everything happens in the object initialisation code.

Why is this an issue?


object Main extends Application{

val foo = stuff();

spawn {  doSomethingWith(foo) }

doSomeLongRunningThingThatBlocks();

}

Why is this an issue?

Well, foo is a field. And Main’s constructor hasn’t completed. This means that the visibility of foo to other threads is completely undefined. Bad things will happen.

This isn’t a hypothetical scenario. We’ve just spent a non-trivial amount of time debugging a problem caused by this. We were spawning a background process from Main, and then entering QApplication.start() (the “let QT start doing stuff” method). The background process was blocking on trying to read a field from Main, and only unblocking when QApplication finally exited.

You should not expose partially constructed objects like this. It’s a bad idea, the VM will hate you for it and it will punish you with incomprehensible bugs as a result.

Edit: In fact, it turns out that this isn’t about partially constructed objects. It’s about static initialization. Because the initialization of the Main$ class depends on the construction of the Main object, Main$ will not be fully initialized until the constructor has exited. The JVM semantics guarantee that class loading is single threaded and that other threads cannot access the class until it has finished loading. Hence the observed problem.

This entry was posted in programming and tagged , on by .

Scala syntax change proposal

I came up with a neat idea for changing the syntax for call by name parameters recently (it turned out that it’s actually a reversion to an older syntax for it! The new one was there to resolve some problems, but I like the old syntax better so would rather resolve those problems directly). In the discussion of this some problems were pointed out and the feature list sortof spiralled out of control and collided head on with a previous proposal by Andrew Foggin. Here’s a summary of the current state of the proposal.

  • Any of the modifiers currently allowed for local variables is allowed as either a function argument or a constructor parameter. i.e val, lazy val, var or def.
  • A parameter marked as def has the same semantics as call by name parameters currently do. It replaces the old syntax (or, rather, the new syntax).
  • A function taking N arguments is equivalent to a function taking a ProductN (modulo compiler optimisations). So given def stuff(val foo, var bar, def ba z) the invocation stuff(x, y, z) is equivalent to the invocation stuff(new Product3{ val _1 = x; var _2 = y; def _3 = z; })
  • In order to take into account the need to make call by name parameters constructor local, and generally improve the behaviour of constructors, we introduce an additional privacy modifier, “local”. Conceptually, things marked local are only visible within the constructor. It’s basically a stronger form of private, and is the scoping modifier that constructor arguments with no qualifiers currently have. local variables and defs are not visible outside the body of the class. Unlike private members, you may not access the local variables of another member of the same class. Edit: Seth Tisue has pointed out in the comments that you can already do this. The notation for it is private[this]
This entry was posted in programming and tagged , , on by .

Sbinary performance and Buffered IO

This is kinda a “Well, duh” moment. It’s obvious in retrospect, but I completely failed to spot it up front, so I thought I’d share.

I noticed that SBinary performance really really sucked. We’re using it at work for saving application state, and reading and writing a file of only 900kb took about two seconds! This was bad.

Some quick performance testing suggested that this was almost entirely IO bound. Reading and writing the corresponding amount of data from a byte array took fairly little time – only about 200ms for writing, 300ms for reading. So, what was I doing wrong?

After a few seconds of head scratching I realised the problem. You see, RandomAccessFile implements the DataInput and DataOutput interfaces. This is useful for small things, but for doing non-trivial binary input and output? Not so much.

The problem is that the reads and writes for these implementations are totally unbuffered. This should have been obvious, but for some reason didn’t occur to me. Oops. I’m now buffering reads and writes explicitly (currently in a pretty stupid way, but oh well). It’s a lot faster now.

This entry was posted in programming and tagged , , on by .

SBinary backends

I’m thinking about changing the scope of some of the code for SBinary.

Specifically, you remember that part where I said “SBinary is only for serializing objects and manipulating binary data, and it’s going to remain super minimal and specialised and this will never ever change!!”? I’m thinking of changing that. :-)

The reason for this change of heart is that I’m realising how incredibly generic the constructions you put together for SBinary are. You’re basically creating a walker for deconstructing and reconstructing your entire object graph. That’s pretty damn powerful. In particular I was thinking about how to modify formats to permit sharing (another post on that will be forthcoming) and suddenly thought “haaang on a minute. I’ve written this code before”. It looks suspiciously identical to some Java code I wrote a while back for generic cloning of object graphs*. A simple rebinding of the backend to use a queue of objects rather than input and output streams would give a pretty efficient deep clone mechanism. I’ve also been thinking of creating a JCR backend which mostly works the same as the binary data (indeed, most data would probably be stored as binary blobs in the JCR), but allows for references to other nodes (and would use this for data sharing).

At the very least, this will result in ditching the explicit dependency on java.io. It will still be used extensively in the back end, but this is only visible in the API for the parts that actually need to interact with it. (most likely approach – have an Input and Output opaque type to replace DataInput and DataOutput. These will just be wrappers around the java.io types, but this won’t be visible at first)

If I do do something like this, it would still be with making binary data the priority, and there would definitely be a specialised binary frontend which should be just as convenient as the current API. If it ever looks like feature creep is threatening to destroy that I’ll separate out projects and/or cut out the idea entirely.

* In the unlikely event that anyone who worked on that project actually reads this blog, they will probably shudder in horror at the mention of that code. It was very fragile with regards to changes in the rest of the code. But that wasn’t actually an issue with the cloning – it was an issue with the post-clone processing. The graph was of database mapped objects and it needed to be partially linearised in order to insert it back into the database due to constraint issues, and this never really worked right.

This entry was posted in programming and tagged , , , on by .

An introduction to implicit arguments

SBinary and Scalacheck are part of a small set of libraries that make extensive use of implicit arguments in a style very reminiscient of Haskell type classes. I’m hoping this style of programming will get more common in Scala – it’s a really useful technique and, in my completely unbiased opinion, both SBinary and Scalacheck are fantastic and you absolutely should use them. :-) But in order to do so you need to really understand how implicit arguments in Scala work.

This post is actually for work, as we’re using Scala there and this is a subject which has been confusing one of my colleagues.

As a starting point, in Scala you can declare a method to have multiple argument lists. This isn’t a fantastically useful feature, but here’s how it works:

scala> def foo(x : Int)(y : Int)
     | = x + y
foo: (Int)(Int)Int

scala> foo(1)(2);
res1: Int = 3

scala> foo(1, 2);
:6: error: wrong number of arguments for method foo: (Int)(Int)Int
       foo(1, 2);
       ^

scala> foo(1)
:6: error: missing arguments for method foo in object $iw;
follow this method with `_' if you want to treat it as a partially applied funct
ion
       foo(1)
       ^

i.e. “exactly the same as a single method parameter list but you have to use a different syntax for calling it”. Hurray.

This has one advantage though. You can declare the last parameter list of a function to be implicit. The syntax for this works as follows:

scala> def speakImplicitly (implicit greeting : String) = println(greeting)
speakImplicitly: (implicit String)Unit

scala> speakImplicitly("Goodbye world")
Goodbye world

scala> speakImplicitly
:6: error: no implicit argument matching parameter type String was foud.

scala> implicit val hello = "Hello world"
hello: java.lang.String = Hello world

scala> speakImplicitly
Hello world

So, we can call this as normal but, additionally, we can leave out the implicit argument list and the compiler will look for a value in the enclosing scope which has been marked as implicit. If we try to do that and there is no such value in scope then the compiler will complain.

Matching implicit arguments

Implicits are totally typesafe, and are selected based on the static type of the arguments. Here are some examples to show how things work.

Implicits of the wrong type
scala> def speakImplicitly (implicit greeting : String) = println(greeting)
speakImplicitly: (implicit String)Unit

scala> implicit val aUnit = ();
aUnit: Unit = ()

scala> speakImplicitly
:7: error: no implicit argument matching parameter type String was found.

Only an implicit of type String will be selected for an implicit argument of type String.

Implicits of the wrong static type
scala> def speakImplicitly (implicit greeting : String) = println(greeting)
speakImplicitly: (implicit String)Unit

scala> implicit val hello : Any = "Hello world"
hello: Any = Hello world

scala> speakImplicitly
:7: error: no implicit argument matching parameter type String was found.

Implicit selection happens on the *static* type of variables. It’s no use having something of the right dynamic type if the variable isn’t typed accordingly.

scala> def speakImplicitly (implicit greeting : String) = println(greeting)
speakImplicitly: (implicit String)Unit

scala> implicit val foo = "foo";
foo: java.lang.String = foo

scala> implicit val bar = "bar";
bar: java.lang.String = bar

scala> speakImplicitly
:9: error: ambiguous implicit values:
 both value bar in object $iw of type => java.lang.String
 and value foo in object $iw of type => java.lang.String
 match expected type String

If there are multiple implicit arguments of the same type, it will fail as it has no way of choosing between them. But…

Implicit arguments of subtypes
scala> def sayThings (implicit args : List[Any]) = args.foreach(println(_))
sayThings: (implicit List[Any])Unit

scala> implicit val nothingNiceToSay : List[Any] = Nil
nothingNiceToSay: List[Any] = List()

scala> sayThings

scala> implicit val hellos : List[String] = List("Hello world");
hellos: List[String] = List(Hello world)

scala> sayThings
Hello world

If you have an implicit argument of a subtype, it will also match as an implicit argument of this type. Moreover, if you have two implicit arguments which match and one is a subtype of the other, the more specific type will match.

Parameterized implicits
scala> def implicitly[T](implicit t : T) = t
implicitly: [T](implicit T)T

scala> implicit val foo = "foo"
foo: java.lang.String = foo

scala> implicit val aUnit = ()
aUnit: Unit = ()

scala> implicitly[String]
res2: String = foo

scala> implicitly[Unit]

Type parameters can quite happily take part in the implicits mechanism.

Defining implicit arguments

So, we know how to use defined implicit arguments now. But how can we define them? We’ve seen one way:

implicit val foo = "foo";

scala> implicitly[String]
res2: String = foo

If this was all we could do then it wouldn’t be that powerful a feature. A nice to have, but ultimately not *that* exciting. Fortunately there are a few more things we can do. For starters, Scala has the uniform access principle, so any (wait, no. That would be too general. We can’t have features without special cases. Sigh. Ok, let’s say most) things you can do with a val you can do with a def

implicit def foo = "foo"

scala> implicitly[String]
res2: String = foo

This def will be invoked each time we want the implicit. Here’s an example to demonstrate this

scala> implicit def aUnit : Unit = println("Hello world")
aUnit: Unit

scala> implicitly[Unit]
Hello world

scala> implicitly[Unit]
Hello world

scala> implicitly[Unit]
Hello world

In general, implicit defs shouldn’t have side effects. It can lead to some really counterintuitive behaviour. This is just for demonstration purposes.

Now, the ability to use defs opens up a bunch of possibilities. For example, they can have type parameters:

scala> implicit def emptyList[T] : List[T] = Nil;
emptyList: [T]List[T]

scala> implicitly[List[String]]
res9: List[String] = List(Hello world)
// Oops, we still had an implicit List[String] left over from an earlier example. Note how that was used in preference to the parameterized version. Let's try again.

scala> implicitly[List[Int]]
res10: List[Int] = List()

Moreover, implicit defs used in this way can themselves have implicit parameters. For example:

scala> case class Foo[T](t : T);
defined class Foo

scala> implicit val hello = "Hello"
hello: java.lang.String = Hello

scala> implicit def foo[T](implicit t : T) = Foo[T](t)
foo: [T](T)Foo[T]

scala> implicitly[Foo[String]]
res3: Foo[String] = Foo(Hello)

(Note: I originally tried to write this example with Option. It turns out there’s a bug with how covariant types are handled which made it not work)

The basic idea is that anything marked as implicit which you could write as a single identifier (possibly with a type signature to handhold the type inference system) is valid to be passed as an implicit argument.

More reading

This should provide enough to get you started. Your next step should probably be to check out the documentation for Scalacheck and SBinary (the latter of which is… less than stellar at the moment. I’ll fix that, I promise. :-)). If you’re looking for some slightly more hardcore reading, Generics of a Higher Kind has some interesting examples. Other than that, the best thing to do is play with some code.

This entry was posted in programming and tagged , , on by .