Easy binary serialization of Scala types

I’m going to be prototyping some stuff in Scala at work in the coming week, and I wanted a nice way of marshalling things to/from files and across the network. The BytePickle stuff in scala.io does nothing for me, and Java serialization gives me the screaming heebie-jeebies, so this prompted me to get off my ass and do something I’ve been meaning to do for a while – port something akin to Haskell’s Data.Binary to Scala using the encoding of type classes I’ve previously discussed. Well, it’s done – it didn’t take very long at all. The port is *extremely* loose – in particular I’ve just written it for imperative use rather than defining custom monads for reading and writing in a pure manner (sorry). The project is hosted on Google Code at http://code.google.com/p/sbinary/

At its heart it’s extremely simple:

trait Binary[T]{
  /**
   * Read a T from the DataInputStream, reading no more data than is necessary.
   */
  def reads(stream : DataInputStream) :T;

  /**
   * Write a T to the DataOutputStream.
   */
  def writes(t : T)(stream : DataOutputStream) : Unit; 
}

object Operations{
  /**
   * Use an implicit Binary[T] to read type T from the DataInputStream.
   */ 
  def read[T](stream : DataInputStream)(implicit bin : Binary[T]) : T = bin.reads(stream);

  /**
   * Use an implicit Binary[T] to write type T to the DataOutputStream.
   */
  def write[T](t : T)(stream : DataOutputStream)(implicit bin : Binary[T]) : Unit =  
    bin.writes(t)(stream);
}

Err. That’s it. Did you want more? :-)

There’s more to it than that of course, but most of the rest of the code I’ve written for this is just helper methods, instances and scalacheck tests.

Out of the box this will serialise tuples of any size that Scala supports (i.e. of 22 elements or fewer), lists, arrays, immutable maps, options, Strings, all the AnyVal types and any combination thereof. Looking at the code should give you an idea of how to define your own Binary instances.
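To give a rough idea of what a hand-written instance might look like, here is a minimal sketch. The `Point` case class, the `pointIsBinary` name and the `roundTrip` helper are all hypothetical and not part of the library; the `Binary` trait is repeated from above only so the snippet stands alone. The field encoding (two ints followed by a `writeUTF` string) is just one reasonable choice.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object CustomInstanceSketch {
  // Same shape as the Binary trait shown earlier in the post.
  trait Binary[T] {
    def reads(stream: DataInputStream): T
    def writes(t: T)(stream: DataOutputStream): Unit
  }

  // Hypothetical user-defined type, used purely for illustration.
  final case class Point(x: Int, y: Int, label: String)

  // Fields are written in a fixed order and read back in that same order.
  implicit val pointIsBinary: Binary[Point] = new Binary[Point] {
    def reads(stream: DataInputStream): Point = {
      val x = stream.readInt()
      val y = stream.readInt()
      val label = stream.readUTF() // writeUTF/readUTF cap the encoded string at 65535 bytes
      Point(x, y, label)
    }

    def writes(p: Point)(stream: DataOutputStream): Unit = {
      stream.writeInt(p.x)
      stream.writeInt(p.y)
      stream.writeUTF(p.label)
    }
  }

  // Hypothetical helper: write a value to a byte array, then read it back.
  def roundTrip[T](t: T)(implicit bin: Binary[T]): T = {
    val bytes = new ByteArrayOutputStream()
    bin.writes(t)(new DataOutputStream(bytes))
    bin.reads(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray)))
  }

  val original = Point(3, -4, "corner")
  val restored = roundTrip(original)
}
```

The only real rule is that `reads` must consume exactly what `writes` produced, in the same order.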

Using it is very simple. It works by knowing the type of thing you want to read from or write to the stream and selecting the appropriate logic based on that type. Unlike Java serialization, though, if you give it the wrong type it will attempt to read the data as that type anyway and probably do crazy things: this is very explicitly using the static type to define a compact encoding, and it doesn’t select the encoding based on dynamic information from the stream. e.g.

  import binary.Operations._;
  import binary.Instances._;
  val foo = read[(Int, Option[String], List[Int])](inStream);
  write(foo._2)(outStream);

The read and write methods on Operations take care of selecting an appropriate implicit instance of Binary and combining them to do the right thing.
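As a sketch of how that combining could work, the snippet below builds a `Binary[Option[List[Int]]]` out of three smaller instances that the compiler chains together. The wire formats here (a tag byte for options, a length prefix for lists) are assumptions made for illustration and are not necessarily what the library uses; `toBytes` and `fromBytes` are hypothetical helpers. The last line also shows the wrong-type hazard mentioned above.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}

object ImplicitCompositionSketch {
  trait Binary[T] {
    def reads(stream: DataInputStream): T
    def writes(t: T)(stream: DataOutputStream): Unit
  }

  def read[T](stream: DataInputStream)(implicit bin: Binary[T]): T = bin.reads(stream)
  def write[T](t: T)(stream: DataOutputStream)(implicit bin: Binary[T]): Unit =
    bin.writes(t)(stream)

  implicit val intIsBinary: Binary[Int] = new Binary[Int] {
    def reads(stream: DataInputStream): Int = stream.readInt()
    def writes(t: Int)(stream: DataOutputStream): Unit = stream.writeInt(t)
  }

  // Assumed encoding: one tag byte (0 = None, 1 = Some), then the payload if present.
  implicit def optionIsBinary[T](implicit bin: Binary[T]): Binary[Option[T]] =
    new Binary[Option[T]] {
      def reads(stream: DataInputStream): Option[T] =
        if (stream.readByte() == 0) None else Some(bin.reads(stream))

      def writes(t: Option[T])(stream: DataOutputStream): Unit = t match {
        case None =>
          stream.writeByte(0)
        case Some(v) =>
          stream.writeByte(1)
          bin.writes(v)(stream)
      }
    }

  // Assumed encoding: element count as an Int, then each element in order.
  implicit def listIsBinary[T](implicit bin: Binary[T]): Binary[List[T]] =
    new Binary[List[T]] {
      def reads(stream: DataInputStream): List[T] = {
        val length = stream.readInt()
        List.fill(length)(bin.reads(stream))
      }

      def writes(ts: List[T])(stream: DataOutputStream): Unit = {
        stream.writeInt(ts.length)
        ts.foreach(t => bin.writes(t)(stream))
      }
    }

  // Hypothetical helpers for going via a byte array.
  def toBytes[T](t: T)(implicit bin: Binary[T]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    write(t)(new DataOutputStream(bytes))(bin)
    bytes.toByteArray
  }

  def fromBytes[T](bytes: Array[Byte])(implicit bin: Binary[T]): T =
    read[T](new DataInputStream(new ByteArrayInputStream(bytes)))(bin)

  // The static type is Option[List[Int]], so the compiler assembles
  // optionIsBinary(listIsBinary(intIsBinary)) without being told to.
  val nested: Option[List[Int]] = Some(List(1, 2, 3))
  val roundTripped = fromBytes[Option[List[Int]]](toBytes(nested))

  // Wrong type on the way back: the Int 1 is the bytes 00 00 00 01, and the
  // leading 00 is taken as the None tag. No error, just a wrong answer.
  val misread = fromBytes[Option[Int]](toBytes(1))
}
```

Note the type annotation on `nested`: because the instances are selected from the static type, a value typed as `Some[List[Int]]` would not find an instance in this sketch.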

Note that binary serialization logic is kept entirely external to the class, so it’s almost as easy to define for classes from external libraries as it is for your own.
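For instance, here is a hedged sketch of an instance for `java.util.UUID`, a JDK class you obviously can’t modify. The `uuidIsBinary` name, the two-longs encoding and the helper methods are my own illustrative choices, not something the library ships; the `Binary` trait is repeated so the snippet is self-contained.

```scala
import java.io.{ByteArrayInputStream, ByteArrayOutputStream, DataInputStream, DataOutputStream}
import java.util.UUID

object ExternalInstanceSketch {
  trait Binary[T] {
    def reads(stream: DataInputStream): T
    def writes(t: T)(stream: DataOutputStream): Unit
  }

  // The instance lives entirely outside UUID; only its public API is used.
  implicit val uuidIsBinary: Binary[UUID] = new Binary[UUID] {
    def reads(stream: DataInputStream): UUID = {
      val mostSignificant = stream.readLong()
      val leastSignificant = stream.readLong()
      new UUID(mostSignificant, leastSignificant)
    }

    def writes(id: UUID)(stream: DataOutputStream): Unit = {
      stream.writeLong(id.getMostSignificantBits)
      stream.writeLong(id.getLeastSignificantBits)
    }
  }

  // Hypothetical helpers for going via a byte array.
  def toBytes[T](t: T)(implicit bin: Binary[T]): Array[Byte] = {
    val bytes = new ByteArrayOutputStream()
    bin.writes(t)(new DataOutputStream(bytes))
    bytes.toByteArray
  }

  def fromBytes[T](bytes: Array[Byte])(implicit bin: Binary[T]): T =
    bin.reads(new DataInputStream(new ByteArrayInputStream(bytes)))

  val original = UUID.fromString("123e4567-e89b-12d3-a456-426614174000")
  val encoded = toBytes(original)          // two longs, so 16 bytes
  val restored = fromBytes[UUID](encoded)
}
```

The “almost” in “almost as easy” is that you are limited to whatever the external class exposes publicly, which for UUID happens to be enough.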

I’m not doing an official release yet – I want to have a play around with this and see how usable it is. Once I have, I might change the API around to improve it. On the other hand, the code works now and does enough (within its very simple objectives) that it’s probably useful. I’ve written a bunch of scalacheck tests for it and am reasonably confident it gets all the current binary instances right. If you want to use it for something, go right ahead! Report back to me and let me know how it goes.

Edit: By the way, this only works properly on 2.6.1 or higher. There were some problems with the implicit arguments implementation prior to then that prevent the instances from working correctly.


4 thoughts on “Easy binary serialization of Scala types”

  1. Jason

    Excellent. I was just about to embark on a similar project (ditto the screaming heebie-jeebies with java serialization). Thanks!

  2. David R. MacIver

    Cool. Hope you find it useful! Feel free to ask questions about it, and please send me feedback if you do use it. :-)

  3. David R. MacIver

    Yeah, the separately defined read and write functions are moderately annoying. It isn’t ideal. I’m probably going to leave those as is, but

    a) Provide helpful combinators for building up binary instances.
    b) See if I can figure out a nice way of doing code generation for defining binary instances for the cases where you don’t care about the precise formats.

    C++ templates have an advantage here (albeit one which requires black magic to use), as they can integrate the code generation into the actual API.
