Against Human Readability, Part 2 of Many: The toolchain argument

One of the most heavily promoted arguments for text-based formats (which are not the same thing as human-readable formats, of course, but are a prerequisite for them) is that you can use your existing Unix toolchain on them.

I am skeptical.

You can, of course, use your text editor on them; that’s more or less the definition of a text format. But being able to use the other tools limits you to very specific sorts of text formats, in particular ones that are heavily line-oriented. The thing is, the Unix toolchain is actually a very specific tool. It works very well for things that are Unix-shaped and quite terribly for anything else. So if you want to use it on your data, you end up needing tools that convert to and from Unix-shaped formats. Witness jsonpipe as an example of this sort of shenanigans. And if you’re going to be converting your data into and out of a different textual format anyway, why does your source format need to be text-based at all?
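To make the “Unix-shaped” point concrete, here is a minimal sketch of the kind of conversion a tool like jsonpipe performs: flattening a nested JSON document into one `path<TAB>value` line per leaf, so that grep, sort and cut have something line-oriented to chew on. The path syntax and output layout here are my own assumptions for illustration, not necessarily jsonpipe’s exact format.

```python
import json


def flatten(value, path=""):
    """Yield (path, leaf) pairs for every scalar in a nested JSON value.

    Paths use "/" separators, loosely in the style of jsonpipe; the exact
    format is an illustrative assumption, not jsonpipe's specification.
    """
    if isinstance(value, dict):
        for key, child in value.items():
            yield from flatten(child, f"{path}/{key}")
    elif isinstance(value, list):
        for index, child in enumerate(value):
            yield from flatten(child, f"{path}/{index}")
    else:
        # A bare scalar at the top level gets the root path "/".
        yield path or "/", value


def to_lines(document):
    """Render a JSON string as line-oriented 'path<TAB>json-value' text."""
    return [f"{p}\t{json.dumps(v)}" for p, v in flatten(json.loads(document))]


if __name__ == "__main__":
    doc = '{"name": "example", "tags": ["a", "b"], "size": {"w": 3, "h": 4}}'
    for line in to_lines(doc):
        print(line)
```

Piping that output through `grep /size/` works nicely, but you now need the inverse conversion to get back to JSON, and even this small sketch silently drops empty objects and arrays, which hints at how fiddly doing the round trip properly gets.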

This still leaves you with your text editor.

If you’re going to design a binary format, I think you really need some sort of pretty printer for it. In our discussion in #againstreadability, Andy and I had the following conversation:

21:19 < andyjpb> if you have a pretty printer you have to do most of the hardwork anyway ;-)
21:19 <@DRMacIver> Not so! 
21:19 <@DRMacIver> Generators are much easier than parsers
21:19 <@DRMacIver> Because you don't have to worry about the ambiguity introduced 
21:19 < andyjpb> well, I suppose it depends on whether you have a reader or not.. but once you've got a printer, a reader is not so far away
21:19 <@DRMacIver> Yeah, but don't do that :)
21:20 < andyjpb> whynot? modifying bits in a blob is a solid usecase

I think I’ve changed my mind. If you really find you need to edit the format directly in a text editor, it’s not completely unreasonable to have a reader as well as a pretty printer. This does require a semi-defined textual format alongside your binary one, but because it’s not a primary means of interchange, its requirements and costs are lower (in particular, it doesn’t carry nearly as serious performance and security concerns).
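As a rough sketch of how small this can be, here is a pretty printer plus a matching reader for a made-up binary format: records of a 1-byte type tag, a 2-byte big-endian length and a payload. The format and every name in it (`TEXT`, `BLOB`, `pretty_print`, `read`) are hypothetical, chosen purely for illustration. The printer just picks one rendering per record; the reader is where the extra work lives, because it has to interpret and reject arbitrary input lines.

```python
import json
import struct

# Hypothetical toy format: each record is a 1-byte type tag, a 2-byte
# big-endian payload length, then the payload bytes.
TEXT, BLOB = 0x01, 0x02
HEADER = struct.Struct(">BH")


def encode(records):
    """Encode a list of (kind, payload_bytes) records into the binary format."""
    return b"".join(HEADER.pack(kind, len(payload)) + payload
                    for kind, payload in records)


def decode(data):
    """Decode binary data back into a list of (kind, payload_bytes) records."""
    records, offset = [], 0
    while offset < len(data):
        kind, length = HEADER.unpack_from(data, offset)
        offset += HEADER.size
        if offset + length > len(data):
            raise ValueError("truncated record")
        records.append((kind, data[offset:offset + length]))
        offset += length
    return records


def pretty_print(data):
    """Render one line per record: text payloads quoted, blobs as hex."""
    lines = []
    for kind, payload in decode(data):
        if kind == TEXT:
            # json.dumps gives us quoting and escaping for free.
            lines.append("text " + json.dumps(payload.decode("utf-8")))
        elif kind == BLOB:
            lines.append("blob " + payload.hex())
        else:
            raise ValueError(f"unknown record kind: {kind}")
    return "\n".join(lines)


def read(text):
    """Parse pretty-printed text back into the binary format."""
    records = []
    for line in text.splitlines():
        tag, _, rest = line.partition(" ")
        if tag == "text":
            records.append((TEXT, json.loads(rest).encode("utf-8")))
        elif tag == "blob":
            records.append((BLOB, bytes.fromhex(rest)))
        else:
            raise ValueError(f"unknown record tag: {tag!r}")
    return encode(records)


if __name__ == "__main__":
    original = encode([(TEXT, b"hello"), (BLOB, b"\x00\xff")])
    text = pretty_print(original)
    print(text)
    # "Modifying bits in a blob": edit the text form, then read it back.
    edited = read(text.replace("hello", "goodbye"))
    print(decode(edited))
```

Note that the textual side here is only a debugging and editing aid; nothing about it needs to be as carefully specified as the binary format itself.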

But I think this is the wrong way to do it. On further thought, I realised that the toolchain argument misses something really quite crucial. The omission isn’t surprising (I think the argument dates from before this point was actually valid), but once I realised it I was kicking myself for how blindingly obvious it is.

UNIX is not your only toolchain

There are programming languages other than shell, many of them with very good interactive shells. As soon as you’ve got bindings to Python, or Ruby, or Erlang, or even freaking JavaScript, you have an interactive environment and a rich set of libraries with which to operate on your data format.

By manipulating your data in these languages instead of in shell, you gain much finer control and far more expressive power than the primitive language of shell allows. Further, you do it without imposing any requirements on representation: the parts that are human-readable you can manipulate easily as strings, and the parts that aren’t you can either ignore or process with whatever libraries your language offers for operating on binary data.
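For instance, suppose (hypothetically) a format made of fixed-size records: a 16-byte NUL-padded UTF-8 name, a little-endian unsigned 32-bit count and 4 opaque flag bytes. The layout and the helper names below are invented for illustration. In a Python session, the readable field becomes an ordinary string and the rest is handled by the standard `struct` module, with no line-oriented text format anywhere in sight.

```python
import struct

# Hypothetical record layout, for illustration only: a 16-byte NUL-padded
# UTF-8 name, a little-endian unsigned 32-bit count, and 4 opaque flag bytes.
RECORD = struct.Struct("<16sI4s")


def parse(data):
    """Split a blob of fixed-size records into (name, count, flags) tuples."""
    return [
        (raw_name.rstrip(b"\0").decode("utf-8"), count, flags)
        for raw_name, count, flags in RECORD.iter_unpack(data)
    ]


def serialize(records):
    """Pack (name, count, flags) tuples back into the binary layout."""
    out = []
    for name, count, flags in records:
        encoded = name.encode("utf-8")
        if len(encoded) > 16:
            # struct would silently truncate, possibly mid-character.
            raise ValueError(f"name too long: {name!r}")
        out.append(RECORD.pack(encoded, count, flags))
    return b"".join(out)


if __name__ == "__main__":
    blob = serialize([("alpha", 3, b"\x00\x00\x00\x01"),
                      ("beta", 10, b"\x00\x00\x00\x00")])
    records = parse(blob)
    # The human-readable field is just a string: use ordinary str methods.
    renamed = [(name.upper(), count, flags) for name, count, flags in records]
    # The binary fields are just ints and bytes: filter and aggregate freely.
    print([name for name, count, _ in renamed if count > 5])
    print(sum(count for _, count, _ in records))
    # Write the modified data straight back out in the binary format.
    new_blob = serialize(renamed)
```

The opaque flag bytes are simply carried through untouched, which is the “ignore it” option; decoding them properly would be one more `struct` call away.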

This entry was posted in programming.