Tag Archives: sbinary

SBinary 0.2

I quietly uploaded SBinary 0.2 last night. I was going to write up some documentation for it today and do an official announcement, but it’s now coming up to 19:00 and I still haven’t written any documentation, so I have a sneaking suspicion it ain’t happening today. So, if you want to have a play with it, there’s a new jar and scaladoc up there.

Changes from 0.1:

  • Improved API which is less closely tied to java.io
  • Supports a wider range of collections and standard types (not the entire standard library, as I’d originally said I would; there are some issues surrounding collections and open unions that I want to resolve before I do that).
  • A limited amount of support for lazy IO using Stream. (Specifically, a Stream is read lazily from an input. However, there are a lot of limitations on this at the moment.) The design is such that it should be impossible to incorrectly interleave IO actions (i.e. reading something else from the input will force pending lazy reads to complete; see the sketch after this list).
  • Fixed some performance stupidities (unbuffered reading/writing to files, unspecialised reading/writing of byte arrays)
  • Improved generic combinators for building binary instances (although the “case classes for free” combinator I want to add is blocked on a compiler bug)
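
To make the interleaving guarantee concrete, here is a minimal sketch of one way an input type can force pending lazy reads before any strict read happens. Every name here (Input, Binary, deferRead) is an assumption made up for illustration; the actual SBinary internals may well be structured differently.

```scala
// A toy input that tracks lazy reads still in flight. Purely illustrative:
// the names and structure are assumptions, not SBinary's real internals.
trait Binary[T] {
  def reads(in: Input): T
}

abstract class Input {
  def readByte(): Byte

  // Thunks that finish off any lazily-read Streams handed out so far.
  private var pending: List[() => Unit] = Nil

  // A lazily-read Stream registers a thunk that forces the rest of itself.
  protected def deferRead(force: () => Unit): Unit =
    pending ::= force

  // Every strict read first forces the outstanding lazy reads, so a Stream
  // can never end up pointing at bytes that a later read already consumed.
  final def read[T](implicit bin: Binary[T]): T = {
    val toForce = pending.reverse
    pending = Nil
    toForce.foreach(f => f())
    bin.reads(this)
  }
}
```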

I hope to write up some proper documentation soon, and when I do I’ll send an announcement to the mailing list. In the meantime, feel free to have a play.


SBinary progress

If things have seemed a little quiet on the SBinary front, do not despair! It’s not because I’ve abandoned it. Partly I’ve been very busy recently, but I’ve also been held up by various issues with the implementation. One was waiting on Scala 2.7.1, as it fixes an issue I had with implicits, and another was a feature that I’ve decided to defer to 0.3 (to do with modifiers for binary instances; in particular, I wanted to get sharing based on identity working properly, but I kept running into issues).

Anyway, I’ve spent most of today working on it and things are going pretty well. You can expect a 0.2 release at some point after 2.7.1 goes final. It will feature:

  • A revised API that I think is nicer to work with. It replaces the use of DataInput and DataOutput with custom Input and Output types. These define read and write methods for reading and writing things with Binary instances, plus a few other useful methods (a rough sketch of the shape follows this list).
  • Improved generic methods for defining binary instances. In particular, the length encoding has had a revamp and asUnionN has become significantly less irritating to work with.
  • A certain amount of experimental support for lazy IO via Streams. I’m not totally convinced this is a good idea, but it’s sufficiently useful that I’m going to provide it anyway with a big red warning sticker.
  • A much larger set of data types handled out of the box. It should cover most of the types from the Scala standard library that it reasonably can (it can’t handle things like functions, etc.).
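
As a rough idea of the shape the revised API could take, here is a hedged sketch. The trait and method names (Input, Output, Binary, reads, writes) are assumptions for illustration and may not match the final 0.2 signatures.

```scala
// Illustrative only: assumed names and signatures, not the final 0.2 API.
trait Input  { def readByte(): Byte }
trait Output { def writeByte(b: Byte): Unit }

trait Binary[T] {
  def reads(in: Input): T
  def writes(t: T, out: Output): Unit
}

object Examples {
  // A hand-written instance for Int, big-endian.
  implicit object IntIsBinary extends Binary[Int] {
    def reads(in: Input): Int =
      (0 until 4).foldLeft(0)((acc, _) => (acc << 8) | (in.readByte() & 0xff))
    def writes(t: Int, out: Output): Unit =
      for (shift <- List(24, 16, 8, 0)) out.writeByte((t >>> shift).toByte)
  }

  // An instance for a small case class, written by hand; the generic
  // combinators are meant to remove most of this boilerplate.
  case class Point(x: Int, y: Int)
  implicit object PointIsBinary extends Binary[Point] {
    def reads(in: Input): Point =
      Point(IntIsBinary.reads(in), IntIsBinary.reads(in))
    def writes(p: Point, out: Output): Unit = {
      IntIsBinary.writes(p.x, out)
      IntIsBinary.writes(p.y, out)
    }
  }
}
```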

As promised, this release should still be binary compatible with the last one.


SBinary performance and buffered IO

This is kind of a “Well, duh” moment. It’s obvious in retrospect, but I completely failed to spot it up front, so I thought I’d share.

I noticed that SBinary performance really, really sucked. We’re using it at work for saving application state, and reading and writing a file of only 900KB took about two seconds! This was bad.

Some quick performance testing suggested that this was almost entirely IO bound. Reading and writing the corresponding amount of data from a byte array took fairly little time – only about 200ms for writing, 300ms for reading. So, what was I doing wrong?

After a few seconds of head scratching I realised the problem. You see, RandomAccessFile implements the DataInput and DataOutput interfaces. This is useful for small things, but for doing non-trivial binary input and output? Not so much.

The problem is that the reads and writes for these implementations are totally unbuffered. This should have been obvious, but for some reason didn’t occur to me. Oops. I’m now buffering reads and writes explicitly (currently in a pretty stupid way, but oh well). It’s a lot faster now.
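
For concreteness, the general fix looks something like the following in plain java.io: wrap the file streams in buffers before handing them to the Data streams. This is just the standard-library pattern, not necessarily the exact change in SBinary (which, as noted above, is currently doing its own buffering in a fairly crude way).

```scala
import java.io._

object BufferedIOExample {
  // Unbuffered: each writeInt goes more or less straight to the OS.
  def writeUnbuffered(file: File, values: Array[Int]): Unit = {
    val raf = new RandomAccessFile(file, "rw")
    try values.foreach(v => raf.writeInt(v)) finally raf.close()
  }

  // Buffered: the same writes hit an in-memory buffer and get flushed to
  // the file in large chunks, which is dramatically faster.
  def writeBuffered(file: File, values: Array[Int]): Unit = {
    val out = new DataOutputStream(
      new BufferedOutputStream(new FileOutputStream(file)))
    try values.foreach(v => out.writeInt(v)) finally out.close()
  }

  def readBuffered(file: File, count: Int): Array[Int] = {
    val in = new DataInputStream(
      new BufferedInputStream(new FileInputStream(file)))
    try Array.fill(count)(in.readInt()) finally in.close()
  }
}
```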


SBinary backends

I’m thinking about changing the scope of some of the code for SBinary.

Specifically, you remember that part where I said “SBinary is only for serializing objects and manipulating binary data, and it’s going to remain super minimal and specialised and this will never ever change!!”? I’m thinking of changing that. :-)

The reason for this change of heart is that I’m realising how incredibly generic the constructions you put together for SBinary are. You’re basically creating a walker for deconstructing and reconstructing your entire object graph. That’s pretty damn powerful. In particular, I was thinking about how to modify formats to permit sharing (another post on that will be forthcoming) and suddenly thought “haaang on a minute, I’ve written this code before”. It looks suspiciously identical to some Java code I wrote a while back for generic cloning of object graphs*. A simple rebinding of the backend to use a queue of objects rather than input and output streams would give a pretty efficient deep clone mechanism. I’ve also been thinking of creating a JCR backend which would mostly work the same way as the binary one (indeed, most data would probably be stored as binary blobs in the JCR), but would allow references to other nodes (and would use this for data sharing).

At the very least, this will result in ditching the explicit dependency on java.io. It will still be used extensively in the back end, but it will only be visible in the API for the parts that actually need to interact with it. (The most likely approach is to have opaque Input and Output types replacing DataInput and DataOutput. These will just be wrappers around the java.io types, but that won’t be visible at first.)
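
A very rough sketch of what that could look like, combined with the queue-of-objects idea from the previous paragraph. Every name here is an assumption made up for illustration, not the planned SBinary API.

```scala
import java.io.{DataInput, DataOutput}
import scala.collection.mutable.Queue

// Illustrative sketch only: assumed names, not SBinary's planned API.

// Opaque Input/Output types; user code never touches java.io directly.
trait Output { def writePrimitive(x: Any): Unit }
trait Input  { def readPrimitive(): Any }

trait Binary[T] {
  def reads(in: Input): T
  def writes(t: T, out: Output): Unit
}

// The obvious backend: thin wrappers around DataInput/DataOutput.
class StreamOutput(out: DataOutput) extends Output {
  def writePrimitive(x: Any): Unit = x match {
    case i: Int    => out.writeInt(i)
    case b: Byte   => out.writeByte(b)
    case s: String => out.writeUTF(s)
    case other     => sys.error("unsupported primitive: " + other)
  }
}

class StreamInput(in: DataInput) extends Input {
  // A real implementation would need to know which primitive to expect;
  // this sketch just reads ints to keep the example short.
  def readPrimitive(): Any = in.readInt()
}

// A queue-of-objects backend: "writing" enqueues values and "reading"
// dequeues them, so no bytes are involved at all.
class QueueBackend extends Input with Output {
  private val queue = Queue.empty[Any]
  def writePrimitive(x: Any): Unit = queue.enqueue(x)
  def readPrimitive(): Any = queue.dequeue()
}

object Clone {
  // Rebinding the backend: the same Binary instance that serialises T to a
  // stream can deep-clone it by round-tripping through the queue.
  def deepClone[T](t: T)(implicit bin: Binary[T]): T = {
    val backend = new QueueBackend
    bin.writes(t, backend)
    bin.reads(backend)
  }
}
```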

If I do do something like this, binary data would still be the priority, and there would definitely be a specialised binary frontend that should be just as convenient as the current API. If it ever looks like feature creep is threatening to destroy that, I’ll separate out the projects and/or cut the idea entirely.

* In the unlikely event that anyone who worked on that project actually reads this blog, they will probably shudder in horror at the mention of that code. It was very fragile with regard to changes in the rest of the code. But that wasn’t actually an issue with the cloning; it was an issue with the post-clone processing. The graph was of database-mapped objects, and it needed to be partially linearised in order to insert it back into the database due to constraint issues, and this never really worked right.
