Tag Archives: API

Easy binary serialization of Scala types

I’m going to be prototyping some stuff in Scala at work in the coming week, and wanted a nice way of marshalling things to/from files and across the network. The BytePickle stuff in scala.io does nothing for me, and Java serialization gives me the screaming heebie jeebies, so this prompted me to get off my ass and do something I’ve been meaning to do for a while – port something akin to Haskell’s Data.Binary to Scala using the encoding of type classes I’ve previously discussed. Well, it’s done – it didn’t take very long at all. The port is *extremely* loose – in particular I’ve just written it for imperative use rather than define custom monads for reading and writing in a pure manner (sorry). The project is hosted on google code at http://code.google.com/p/sbinary/

At its heart it’s extremely simple:

trait Binary[T]{
  /**
   * Read a T from the DataInputStream, reading no more data than is necessary.
   */
  def reads(stream : DataInputStream) :T;

  /**
   * Write a T to the DataOutputStream.
   */
  def writes(t : T)(stream : DataOutputStream) : Unit; 
}

object Operations{
  /**
   * Use an implicit Binary[T] to read type T from the DataInputStream.
   */ 
  def read[T](stream : DataInputStream)(implicit bin : Binary[T]) : T = bin.reads(stream);

  /**
   * Use an implicit Binary[T] to write type T to the DataOutputStream.
   */
  def write[T](t : T)(stream : DataOutputStream)(implicit bin : Binary[T]) : Unit =  
    bin.writes(t)(stream);
}

Err. That’s it. Did you want more? :-)

There’s more to it than that of course, but most of the rest of the code I’ve written for this is just helper methods, instances and scalacheck tests.

Out of the box this will serialise tuples of any size that Scala supports (i.e. up to 22 elements), lists, arrays, immutable maps, options, Strings, all the AnyVal types and any combination thereof. Looking at the code should give you an idea of how to define your own Binary instances.
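As a rough illustration of what defining instances can look like, here is a minimal self-contained sketch. It repeats the Binary trait from above so that it compiles on its own. The names BinarySketch, IntBinary, StringBinary, pairBinary and roundTrip are made up for this example and are not necessarily what sbinary calls its own instances.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object BinarySketch {
  // Same shape as the Binary trait shown above.
  trait Binary[T] {
    def reads(stream: DataInputStream): T
    def writes(t: T)(stream: DataOutputStream): Unit
  }

  // Hypothetical instance for Int: four bytes, in DataOutputStream's big-endian format.
  implicit object IntBinary extends Binary[Int] {
    def reads(stream: DataInputStream): Int = stream.readInt()
    def writes(t: Int)(stream: DataOutputStream): Unit = stream.writeInt(t)
  }

  // Hypothetical instance for String, using the JDK's length-prefixed modified UTF-8.
  implicit object StringBinary extends Binary[String] {
    def reads(stream: DataInputStream): String = stream.readUTF()
    def writes(t: String)(stream: DataOutputStream): Unit = stream.writeUTF(t)
  }

  // Instances compose: a pair is written as its first element followed by its second.
  implicit def pairBinary[A, B](implicit ba: Binary[A], bb: Binary[B]): Binary[(A, B)] =
    new Binary[(A, B)] {
      def reads(stream: DataInputStream): (A, B) = {
        val a = ba.reads(stream)
        val b = bb.reads(stream)
        (a, b)
      }
      def writes(t: (A, B))(stream: DataOutputStream): Unit = {
        ba.writes(t._1)(stream)
        bb.writes(t._2)(stream)
      }
    }

  // Write a value to an in-memory buffer and read it straight back.
  def roundTrip[T](t: T)(implicit bin: Binary[T]): T = {
    val bytes = new ByteArrayOutputStream()
    bin.writes(t)(new DataOutputStream(bytes))
    bin.reads(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
  }
}
```

With these in scope, BinarySketch.roundTrip(("hello", 7)) finds the pair instance, which in turn finds the String and Int instances.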

Using it is very simple. It works by knowing the type of thing you want to read or write from the stream and selecting the appropriate logic based on that type (but, unlike Java serialization, if you give it the wrong type it will attempt to read it as that type anyway and probably do crazy things – this is very explicitly using the type to define a compact encoding and doesn’t select it based on dynamic information from the stream). e.g.

  import binary.Operations._;
  import binary.Instances._;
  val foo = read[(Int, Option[String], List[Int])](inStream);
  write(foo._2)(outStream);

The read and write methods on Operations take care of selecting an appropriate implicit instance of Binary and combining them to do the right thing.
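The caveat about reading with the wrong type can be made concrete with a small sketch. It is self-contained, so it redeclares Binary, read and write, and the names WrongTypeSketch, ShortBinary and intAsTwoShorts are invented for the illustration. It writes one Int and then decodes the same four bytes as two Shorts, and nothing stops it from doing so.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object WrongTypeSketch {
  trait Binary[T] {
    def reads(stream: DataInputStream): T
    def writes(t: T)(stream: DataOutputStream): Unit
  }

  implicit object IntBinary extends Binary[Int] {
    def reads(stream: DataInputStream): Int = stream.readInt()
    def writes(t: Int)(stream: DataOutputStream): Unit = stream.writeInt(t)
  }

  implicit object ShortBinary extends Binary[Short] {
    def reads(stream: DataInputStream): Short = stream.readShort()
    def writes(t: Short)(stream: DataOutputStream): Unit = stream.writeShort(t.toInt)
  }

  def read[T](stream: DataInputStream)(implicit bin: Binary[T]): T = bin.reads(stream)
  def write[T](t: T)(stream: DataOutputStream)(implicit bin: Binary[T]): Unit =
    bin.writes(t)(stream)

  // The stream carries no type tags, so the requested type alone decides how
  // the bytes are interpreted: one Int goes in, two Shorts come out.
  def intAsTwoShorts(i: Int): (Short, Short) = {
    val bytes = new ByteArrayOutputStream()
    write(i)(new DataOutputStream(bytes))
    val in = new DataInputStream(new ByteArrayInputStream(bytes.toByteArray))
    val first = read[Short](in)
    val second = read[Short](in)
    (first, second)
  }
}
```

Because DataOutputStream writes big-endian, intAsTwoShorts(0x00010002) yields the shorts 1 and 2 rather than an error.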

Note that binary serialization logic is kept entirely external to the class, so it’s almost as easy to define for classes from external libraries as it is for your own.
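To illustrate the point about external classes, here is a sketch of an instance for java.util.UUID, a JDK class that obviously cannot be edited. ExternalInstanceSketch, UuidBinary, toBytes and fromBytes are hypothetical names, and the Binary trait is repeated only to keep the example self-contained.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.UUID

object ExternalInstanceSketch {
  trait Binary[T] {
    def reads(stream: DataInputStream): T
    def writes(t: T)(stream: DataOutputStream): Unit
  }

  // All the serialization logic lives here, outside the UUID class itself.
  implicit object UuidBinary extends Binary[UUID] {
    def reads(stream: DataInputStream): UUID = {
      val high = stream.readLong()
      val low = stream.readLong()
      new UUID(high, low)
    }
    def writes(t: UUID)(stream: DataOutputStream): Unit = {
      stream.writeLong(t.getMostSignificantBits)
      stream.writeLong(t.getLeastSignificantBits)
    }
  }

  // Small helpers for going through an in-memory byte array.
  def toBytes[T](t: T)(implicit bin: Binary[T]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    bin.writes(t)(new DataOutputStream(bytes))
    bytes.toByteArray
  }

  def fromBytes[T](bytes: Array[Byte])(implicit bin: Binary[T]): T =
    bin.reads(new DataInputStream(new ByteArrayInputStream(bytes)))
}
```

The encoding is exactly 16 bytes per UUID, with no class name or other metadata in the stream.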

I’m not doing an official release yet – I want to have a play around with this and see how usable it is. Once I have, I might change the API around to improve it. On the other hand, the code works now and does enough (within its very simple objectives) that it’s probably useful. I’ve written a bunch of scalacheck tests for it and am reasonably confident it gets all the current binary instances right. If you want to use it for something, go right ahead! Report back to me and let me know how it goes.

Edit: By the way, this only works properly on 2.6.1 or higher. There were some problems with the implicit arguments implementation prior to then that prevent the instances from working correctly.

This entry was posted in programming.

Why not Scala?

I thought I’d follow up on my previous post on why one would want to use Scala with one on why you wouldn’t. I’m definitely planning to continue using it, but it would be dishonest of me to pretend it was a perfect language.

I’m not going to cover the usual ones – weak tool support, difficulty of hiring Scala programmers, etc. These are pretty standard and will be true in most ‘esoteric’ languages you care to name. They’re certainly important, but not the point of this post. I’m just going to focus on language (and implementation) issues.

You’re looking for a functional language

Scala is not a functional programming language. It has pretensions of being so, and it has adequate support for functional programming, but it only goes so far. It’s got better support for functional programming than C#, Ruby, etc. but if you compare its functional aspects to ML, Haskell, OCaml, etc. you’ll find it sadly lacking. Problems include:

  • Its pattern matching is really rather cumbersome.
  • An annoying distinction between methods and functions. Scala’s first class functions are really no more than a small amount of syntactic sugar around its objects. Because Scala’s scoping is sane this isn’t particularly an issue, but it occasionally shows up.
  • The handling of multiple arguments is annoying. It doesn’t have the pleasant feature of Haskell or ML that every function has a single argument (multiple arguments are encoded as either tuples or via currying). Admittedly this isn’t a prerequisite of a functional language – e.g. Scheme doesn’t do it – but it’s a very big deal in terms of typing and adds a nice consistency to the language. I’m not aware of any statically typed functional languages which *don’t* do this (although the emphasis between tupling and currying varies from language to language).
  • Almost no tail call elimination worth mentioning. A very small subset of tail calls (basically self tail calls – the ones you can obviously turn into loops) are eliminated. This is more the JVM’s fault than Scala’s, but Martin Odersky himself has shown that you can do better (although admittedly it comes with a performance hit).
  • The type inference is embarrassingly weak. e.g. recursive methods won’t have their return type inferred. Even what type inference is there is less than reliable.
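A few of the points in the list above can be sketched in code. This is written against a recent Scala 2 compiler, which may not behave identically to the version discussed here, and all the names (FunctionalSketch, addMethod, sumTo and so on) are made up for the example.

```scala
import scala.annotation.tailrec

object FunctionalSketch {
  // A method: it belongs to an object and is not itself a value.
  def addMethod(x: Int, y: Int): Int = x + y

  // A function value: really an object with an apply method (a Function2 instance).
  val addFunction: (Int, Int) => Int = (x, y) => x + y

  // Eta-expansion wraps the method in a function object.
  val lifted: (Int, Int) => Int = addMethod _

  // Two arguments, one tuple argument and the curried form are three distinct types.
  val tupled: ((Int, Int)) => Int = addFunction.tupled
  val curried: Int => Int => Int = addFunction.curried

  // A self tail call, which the compiler does turn into a loop (assumes n >= 0).
  // The explicit return type is required because the method is recursive.
  @tailrec
  def sumTo(n: Int, acc: Long = 0L): Long =
    if (n == 0) acc else sumTo(n - 1, acc + n)
}
```

Without the self tail call optimisation, sumTo(1000000) would need a million stack frames. Mutually recursive tail calls get no such treatment.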

Compiler stability

The compiler is buggy. It’s not as buggy as I sometimes get the impression it is – I’ve definitely claimed a few things to be bugs which turned out to be me misunderstanding features – but it’s buggy enough that you’ll definitely run into issues. They’re rarely blockers (although sometimes they are. Jan Kristen has run into a few with his recent experiments with wicket + scala), but more importantly the bugginess means you really can’t trust the compiler as much as you’d like to. When something goes wrong it’s not always certain whether it’s your fault or the compiler’s. This is a big deal when one of the selling points is supposed to be a type system which helps you catch a wide class of errors.

Language consistency

The language has a lot of edge cases. These can be really difficult to wrap your head around, and can be really annoying to remember.

Let’s take an example. Variables. Simple, eh? Well, no.

A variable (local or field) can be a function (or constructor) parameter, a val, or a var. A val is a definition – it can’t be assigned to after the definition is made. A var is a normal mutable variable like in Java. A function parameter is almost like a val, except for the parts where it isn’t. Additionally, a function parameter can also be a var or a val. But it doesn’t have to be. Variables can be call by value (normal), call by name (the expression is evaluated each time you reference its value) or lazy (the expression is evaluated the first time you need its value and never again). But only vals can be lazy. And function parameters can’t be lazy, even if they’re also vals (I don’t understand this one. It seems obviously stupid to me). Meanwhile, only function parameters can be call by name – you can’t assign them to vars or vals (a no argument def is the equivalent of a call by name val).

Clear as mud, eh? Now, granted I wrote the above to make it sound deliberately confusing (it’s probably owed a blog post later to make it seem deceptively simple), but it’s a fairly accurate representation of the state of affairs.
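The three evaluation strategies are easier to see by counting evaluations. The following is a small sketch with invented names (EvaluationSketch, expensive, byValue, byName, byNeed, count). Since a parameter cannot be declared lazy, byNeed shows one possible workaround: caching a by-name parameter in a local lazy val.

```scala
object EvaluationSketch {
  // Counts how many times the "expensive" expression has been evaluated.
  var evaluations = 0
  def expensive(): Int = { evaluations += 1; 42 }

  // Call by value: the argument is evaluated once, before the call.
  def byValue(x: Int): Int = x + x

  // Call by name: the argument expression is re-evaluated on every reference.
  def byName(x: => Int): Int = x + x

  // Call by need, built by hand: evaluated on first use, then cached.
  def byNeed(x: => Int): Int = {
    lazy val cached = x
    cached + cached
  }

  // Resets the counter, runs `body` once, and reports how many evaluations it caused.
  def count(body: => Int): Int = {
    evaluations = 0
    body // force the by-name argument
    evaluations
  }
}
```

Here count(byValue(expensive())) is 1, count(byName(expensive())) is 2, and count(byNeed(expensive())) is 1.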

Here’s another one (it’s related to the arguments issue). Consider the following snippet of code:

def foo = "Hello world";
println(foo());

def bar() = "Goodbye world";
println(bar);

Pop quiz: Does this code compile? If not, which bit breaks? No cheating and running it through the compiler!

Answer: No, it doesn’t. Because foo was defined without an argument list, it can’t be invoked as foo(). However, despite bar being defined with an (empty) argument list we can invoke it without one.

I could keep going, but I won’t. The short of it is that there are a lot of these little annoying edge cases. It seems to give beginners to the language a lot of grief.

Too much sugar

Scala has a lot of syntactic sugar. Too much in my opinion. There’s the apply/update sugar, unary operators defined by prefixing with unary_, and general overloaded assignment (which, as I discovered when testing, only works in the presence of an associated def to go with it. Another edge case). Operators ending in : are right associative. Constructors are infixed in pattern matching on case classes but not in application. And so on. It’s hard to keep track of it all, and most of it is annoyingly superfluous.
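For reference, here is a compact sketch of several of these sugars side by side. The classes Grid, Vec and Stack and the operator +: on Stack are invented for the example.

```scala
object SugarSketch {
  class Grid(size: Int) {
    private val cells = new Array[Int](size)
    private var label = "grid"

    // grid(i) is sugar for grid.apply(i)
    def apply(i: Int): Int = cells(i)
    // grid(i) = v is sugar for grid.update(i, v)
    def update(i: Int, v: Int): Unit = { cells(i) = v }

    // grid.name = "x" is sugar for grid.name_=("x"), and needs the matching getter.
    def name: String = label
    def name_=(s: String): Unit = { label = s }
  }

  case class Vec(x: Int, y: Int) {
    // -v is sugar for v.unary_-
    def unary_- : Vec = Vec(-x, -y)
  }

  // A made-up cons-like operator. Its name ends in ':', so it groups to the
  // right and is invoked on its right-hand operand:
  // a +: b +: s means s.+:(b).+:(a)
  case class Stack(items: List[Int]) {
    def +:(i: Int): Stack = Stack(i :: items)
  }
}
```

So 1 +: 2 +: Stack(Nil) builds Stack(List(1, 2)), with the calls happening from right to left.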

Lack of libraries

Yes, yes, I know. It has all of the Java libraries to play with. And this is great. Except… well, they’re Java libraries. They’re designed with a Java mindset, and they can’t take advantage of Scala’s advanced features. Implicit conversions, and a number of other tricks, are quite useful for making an API more palatable, but there’s a strong danger that what you end up with isn’t much more than Java with funny syntax. Much more than that requires a reasonable amount of porting work to get a good API for your use.
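As a small example of the implicit conversion trick, the sketch below bolts a Scala-style higher order method onto java.util.List. RichJavaList, enrichJavaList and mapToList are hypothetical names, not part of any library.

```scala
import java.util.{ArrayList => JArrayList, List => JList}
import scala.language.implicitConversions

object PalatableJavaSketch {
  // Hypothetical wrapper adding one higher order method to java.util.List.
  class RichJavaList[A](underlying: JList[A]) {
    def mapToList[B](f: A => B): List[B] = {
      val it = underlying.iterator()
      val out = List.newBuilder[B]
      while (it.hasNext) out += f(it.next())
      out.result()
    }
  }

  // The conversion the compiler inserts when mapToList is called on a Java list.
  implicit def enrichJavaList[A](list: JList[A]): RichJavaList[A] =
    new RichJavaList(list)

  def demo(): List[Int] = {
    val names = new JArrayList[String]()
    names.add("guice")
    names.add("joda")
    names.mapToList(_.length) // the Java list appears to have grown a new method
  }
}
```

This makes the call site nicer, but the underlying API is still shaped like Java, which is exactly the limitation described above.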

All in all, I find these add up to just a bunch of annoyances. It’s still my preferred language for the JVM, but depending on how you weight your priorities they might be more significant for you. Even for me I occasionally find myself getting *very* irritated with some of these.


Dependency injection in Scala

I (and some others in #scala) have been wondering recently about the state of play for dependency injection in Scala. This is mostly just a brain dump of a few thoughts and a request for feedback. If anyone has any good ideas, please share!

As I see it, most of the Java dependency injection frameworks should work fine for Scala. Guice won’t, because of generics issues, and the generics support in other frameworks (e.g. Spring’s type collections) won’t work either, so you lose a great deal of type safety. You’re back to an almost Java-like level of type safety in fact. :) These frameworks also don’t take advantage of many of Scala’s great features (higher order functions and a more advanced object system in particular), so the whole thing seems rather unsatisfactory.

I wondered briefly about a system based on abstract method injection using traits, but I couldn’t make it work in a satisfactory manner. The fact that you’d expose dependencies as defs was also unsatisfactory because it means that the compiler doesn’t know that they’re stable so you can’t e.g. import them.
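The def-versus-val point can be shown in a few lines. This is only an illustration of the stability issue, not a workable injection scheme, and every name in it (TraitInjectionSketch, Logger, UsesLoggerDef, UsesLoggerVal and so on) is invented.

```scala
object TraitInjectionSketch {
  trait Logger { def log(msg: String): String }

  object PrefixLogger extends Logger {
    def log(msg: String): String = "[log] " + msg
  }

  // The dependency is an abstract member that whoever instantiates the trait must supply.
  trait UsesLoggerDef {
    def logger: Logger   // a def is not a stable identifier,
    // import logger._   // so this import would be rejected by the compiler
    def run(): String = logger.log("via def")
  }

  trait UsesLoggerVal {
    val logger: Logger   // a val is stable,
    import logger._      // so its members can be imported
    def run(): String = log("via val")
  }

  // "Injection" happens at instantiation time by filling in the abstract member.
  val fromDef: UsesLoggerDef = new UsesLoggerDef { def logger: Logger = PrefixLogger }
  val fromVal: UsesLoggerVal = new UsesLoggerVal { val logger: Logger = PrefixLogger }
}
```

Declaring dependencies as vals restores the import, at the cost of having to think about initialisation order if the val is used during construction.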

There was some discussion in #scala last night about how “dependency injection is useless if you have higher order functions”. This seems like nonsense to me. A well designed scala program may have less need for DI because of the presence of higher order functions but the basic need for composing of modules (that’s what dependency injection frameworks really are after all – a module composition DSL) is still there, for more or less the same reason why Scala has objects as well as functions.

It’s not entirely clear to me how DI should work in Scala, both from an API and an implementation point of view. Something Guice-like might be a good starting point (but only a starting point! Porting Guice verbatim to Scala would almost certainly be a bad idea), but it’s not clear to me how one would even implement it in Scala. Part of the problem is that Scala lacks a satisfactory metaprogramming facility. It can use Java’s reflection, but the scala.reflect packages seem sadly meager. (There do seem to be a bunch of interesting sounding classes in there, but there appears to be no documentation or evidence of prior usage, so I can’t figure out what on earth they’re for).


Turn your toString methods inside out

All examples in this post will be written in a pseudo-dialect of Scala. Hopefully they should be easy to translate into your favourite programming language (or Java). I also haven’t bothered to compile any of them as they’re mostly not entirely valid. Feel free to point out errors.

Consider the following code:

class List[T]{
  // list implementation

  override def toString : String = {
    val it = this.elements;
    var result = "[";

    while(it hasNext){
      result = result + (it next);
      if (it hasNext) result = result + ", ";
    }
    result + "]"
  }
}

What’s wrong with it?

Well, as you presumably know, concatenating two strings of length m and n is an O(m + n) operation (In Haskell or ML it would be an O(m) operation, so this can be made more efficient, but the basic point will still remain). This means we’ve accidentally made an O(n^2) toString algorithm. Oops.

So, the traditional response is:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  override def toString : String = {
    val it = this.elements;
    val result = new StringBuilder("[");

    while(it hasNext){
      result.append(it next);
      if (it hasNext) result.append(", ");
    }
    result.append("]").toString;
  }
}

Great! We’ve removed all those expensive string concatenations.

Now, what happens if we call toString on a List[List[String]]? Umm…

Now, consider the following code snippet:

  println(myReallyLongList);

Let’s unpack what’s going on in it.

  val it = myReallyLongList.elements;
  val result = new StringBuilder("[");

  while(it hasNext){
    result.append(it next);
    if (it hasNext) result.append(", ");
  }
  println(result.append("]").toString);

So, we’ve created a big intermediate string via a StringBuilder, then printed it, discarding the string after that. Right?

Wouldn’t it be great if we’d written the following code instead?

  print("[");
  val it = myReallyLongList.elements;

  while(it hasNext){
    print(it next);
    if (it hasNext) print(", ");
  }
  println("]");

No intermediate structures created at all. And note that the code used to print is almost exactly the same as the code used to append to the StringBuilder.

Conveniently there’s a useful little interface in java.lang which people tend to ignore: Appendable. If there weren’t, we’d have had to write wrappers. In particular it is implemented by Writer, PrintStream, StringBuilder and StringBuffer. So, let’s rewrite the above code:

class List[T]{
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable){
    val it = this.elements;

    while(it hasNext){
      ap.append(... // err. What do we do here?

We could just do ap.append(it next toString). But that doesn’t solve the first problem – when we nest these things we’re creating a lot of intermediate strings and then immediately throwing them away, not to mention having once again introduced a hidden O(n^2) factor. Sadness. :(

Let’s do the following:

  trait Append{
    def appendTo(ap : Appendable) : Appendable;

    override def toString = appendTo(new java.lang.StringBuilder()) toString;    
  }

  object Appending{
    // Stream values that know how to append themselves; fall back to toString otherwise.
    def append(any : Any, ap : Appendable) : Unit = any match {
      case a : Append => a.appendTo(ap)
      case other => ap.append(other.toString)
    }
  }

Now we can write it as:

class List[T] extends Append{
  import Appending._;
  import java.lang.StringBuilder;
  // list implementation

  def appendTo(ap : Appendable) = {
    ap.append("[");
    val it = this.elements;
    while(it hasNext){
      append(it next, ap);
      if (it hasNext) ap append(", ");
    }
    ap.append("]");
  }
}

Now, no matter how deeply we nest things, we’ll get things printed in a manner with completely consistent performance – no hidden gotchas.
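Since the snippets in this post are pseudo-code, here is a compilable sketch of the same idea for a current Scala 2 compiler. Items is a hypothetical stand-in for the List class above (backed by a standard List, to avoid clashing with it), and AppendSketch, nested and printed are likewise invented names.

```scala
import java.io.{ByteArrayOutputStream, PrintStream}

object AppendSketch {
  trait Append {
    def appendTo(ap: Appendable): Appendable
    // toString becomes a special case of appendTo, targeting a StringBuilder.
    override def toString: String = appendTo(new java.lang.StringBuilder()).toString
  }

  // Stream values that know how to append themselves; fall back to toString otherwise.
  def append(any: Any, ap: Appendable): Appendable = any match {
    case a: Append => a.appendTo(ap)
    case other     => ap.append(other.toString)
  }

  // Hypothetical stand-in for the post's List class.
  final class Items[T](elems: List[T]) extends Append {
    def appendTo(ap: Appendable): Appendable = {
      ap.append("[")
      val it = elems.iterator
      while (it.hasNext) {
        append(it.next(), ap)
        if (it.hasNext) ap.append(", ")
      }
      ap.append("]")
    }
  }

  val nested = new Items(List(new Items(List(1, 2)), new Items(List(3))))

  // The same appendTo code writes to a PrintStream without building a String first.
  def printed: String = {
    val bytes = new ByteArrayOutputStream()
    val out = new PrintStream(bytes, true, "UTF-8")
    nested.appendTo(out)
    out.flush()
    bytes.toString("UTF-8")
  }
}
```

Both nested.toString and printed produce [[1, 2], [3]]. The inner lists write directly into the outer target rather than building their own strings first.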

There are also other benefits to structuring things this way. If you make everything work based on an API that looks like this you’ll tend to write things which work by injecting filters in reading and writing code. And, hey, suddenly all your code works completely transparently when you discover that you need to work with things that are e.g. read off the network, backed by something on the file system, etc. and really need a streaming version of the library.
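One way to read "injecting filters" is as wrapping one Appendable in another. The sketch below is an assumption about what such a filter might look like, with a made-up UpperCasing class that upper-cases everything passing through it on the way to the underlying target.

```scala
import java.util.Locale

object FilterSketch {
  // Hypothetical filter: forwards to `underlying`, transforming text on the way through.
  final class UpperCasing(underlying: Appendable) extends Appendable {
    def append(csq: CharSequence): Appendable = {
      // String.valueOf turns a null CharSequence into "null", as Appendable specifies.
      underlying.append(String.valueOf(csq).toUpperCase(Locale.ROOT))
      this
    }

    def append(csq: CharSequence, start: Int, end: Int): Appendable =
      append(String.valueOf(csq).subSequence(start, end))

    def append(c: Char): Appendable = {
      underlying.append(Character.toUpperCase(c))
      this
    }
  }

  def demo(): String = {
    val sb = new java.lang.StringBuilder()
    new UpperCasing(sb).append("abc").append('d')
    sb.toString
  }
}
```

Any appendTo implementation can be pointed at new UpperCasing(target) without knowing that a filter is in the way, and the target can just as well be a file or socket writer.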

Also note that I’m not saying “Strings are bad”. There are a lot of cases where what you need really is a persistently available string. Then, by all means, use toString! But even then this is helpful, as your toString code will work a lot better and more consistently than it might otherwise have done.


Good APIs

At least in the Java world[1], good APIs are very rare. The vast majority of the APIs I’ve used in Java are at best mediocre, and I’ve run into some real stinkers (No fingerpointing here, but a lot of them are even in the standard library!).

Part of this is the fault of the language. Java code… tends to be ugly. It’s not a language which lends itself to really flexible syntax, and so you have to work really hard to produce powerful abstractions which are actually nice to use.

The only APIs I’ve encountered so far which have really made me sit up and go “Wow, that’s well designed” are Google Guice and Joda time. They have well thought out class hierarchies, good object oriented style[2], sensible use of fluent interfaces / method chaining and just generally well thought out interfaces and names.

They’re not perfect by any means, but they show promise that it really is possible to write good APIs for Java.

Anyone else know of other similarly well designed libraries?

[1] And, unfortunately, I lack much experience of non-trivially large development in other languages. The Parsec API is nice, though it gives me a headache sometimes. Haskell libraries in general look rather pretty, though I suspect that’s more a function of the language than the API design. I’m not really experienced enough in it to judge good API design.

[2] I’m not an OO fanatic. It has advantages and disadvantages, and sometimes you just want to use a different approach (I rather like FP for example). But one thing I’ve observed is that if you do it right, OO can have the effect of producing some astonishingly readable code. I don’t know either well (or at all, really), but this style of design seems much more common in Ruby, Smalltalk, etc.
