David R. MacIver's Blog: Situated Software

Situated Software

20 November 2018

Attention Conservation Notice: Sometimes I write about tech.

There’s a concept I’ve been using recently that I find quite helpful as a descriptive tool, which is the notion of software being situated.

Software which is situated is software which is highly specific to some particular context - e.g. a single person, a single task or category of task, a single computer. It is non-reusable outside of that context, and as a result it is able to make very in-depth (and usually implicit) assumptions about reality.

This isn’t a binary. All software has a context, so no software is truly unsituated, but that context can be more or less specific. The question is not whether software is situated, but where and how strongly it is situated.

There are a number of interesting ways that software can be situated.

Hypothesis is not especially situated software (libraries tend not to be), but Hypothesis’s build system is very situated despite being designed to run on a variety of different computers and CI systems - it is adapted to a very specific task (running Hypothesis build tasks).

Another situated codebase I work on is the code that powers my notebook. This code is what Simon Tatham calls SymbiosisWare: code that is part of the extended self rather than a distinct project in its own right. (That said, jml has recently adapted the code for his own use. It’s important to remember that even situated software can be reused, it just requires adaptation to the new environment.) In this case the context to which it is situated is that of a single person.

Another piece of situated software is the code that exists to manage experiments I run on my workstation, Szalinski. Szalinski is a bit of a beast of a machine and I use it to run experiments on test-case reduction, so there is a lot of code that is very specific to it and its storage environment. This code is mostly written so that it could be run on another machine with a broadly similar configuration, but we’d almost certainly find a huge wealth of baked-in assumptions by making it do so.
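To make "baked-in assumptions" concrete, here is a purely hypothetical sketch. None of the names or paths below come from the real Szalinski code; they are invented for illustration. The first function is highly situated, because it silently assumes one machine's storage layout. The second is slightly less situated, because the caller has to state that assumption explicitly.

```python
from pathlib import Path

# Hypothetical path: assumed to exist only on one particular workstation.
SITUATED_RESULTS_DIR = Path("/mnt/experiments/results")


def situated_result_path(experiment: str) -> Path:
    """Highly situated: the storage layout is an implicit assumption."""
    return SITUATED_RESULTS_DIR / f"{experiment}.json"


def result_path(experiment: str, results_dir: Path) -> Path:
    """Less situated: the storage location is an explicit parameter."""
    return Path(results_dir) / f"{experiment}.json"
```

The situated version is shorter to write and to call, which is much of its appeal. The price is that the assumption only shows up when the code meets a machine where it does not hold.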

A fourth category of situated software is most software companies’ internal code (except the code for software that they’re actually shipping to end users, if there is any). Generally a lot of the individual applications are not that situated, because you’ve generalised them enough to make them run in dev, staging, and production, but there’s still a large ecosystem of glue and deployment code that is highly situated to a specific environment that is only found within the company.

Note that the same codebase can contain a variety of code that is situated to a greater or lesser degree. Your junk drawer module is probably full of highly situated code even if the project it supports is not especially situated, and many highly situated codebases will contain nice well-factored code that you could easily extract out into its own reusable library if you wanted to.

“Situated” is not a negative term. I like all three of the situated codebases I work on. The great thing about situated code is that it’s easy to create, and because you don’t have a huge range of use cases to consider it is also often very easy to modify. This can make it worthwhile to create highly situated code in cases where less situated code would be too high cost.

It’s not a positive term either. Situating software absolutely has its downsides.

The first is the nature of situation - if you have a highly specific context outside of which the code is not useful, then the code is not useful outside of that highly specific context! The Hypothesis build system is great. We keep thinking about turning it into a generic tool, but honestly that sounds exhausting so we never do. As a result Hypothesis’s build has some great functionality that nobody else gets to benefit from.

The second is that situated software can be difficult to get started with as a newcomer. It gains much of its power from being embedded in a particular context, but that context is often implicit to some large degree, which means that as a newcomer you essentially have to reverse engineer it. In the same way that you have to be embedded in the context to use the situated code, you have to be embedded in the context to even understand the situated code, and that comes with a certain amount of up-front work. I’ve been seeing this in the context of getting some students up to scratch on Hypothesis internals - there’s a lot of highly situated code in the insides of the fuzzer, with all sorts of non-obvious baked-in assumptions.

Although “situated” says nothing about the quality of the code, I still find it a very useful descriptive term, partly because it lets you name and observe certain trade-offs that might not otherwise be obvious.

In particular, knowing that situated code exists and is valid is important because it gives you permission to write situated code! There are many circumstances in which this is a useful thing to do and I think we resist doing it because it feels like we’d be writing bad code.

But code is only good or bad in context, and I think many of the things we would dismiss as bad practice are only bad in the context of reusability. We shouldn’t feel guilty about much of the code we feel guilty about: it’s actually perfectly acceptable code, it’s just highly situated.

Comments

pozorvlak on 2018-11-20 12:54:56:

I’m not entirely sure what features the Hypothesis build system has, but apparently NPM has libraries for the release-on-merge feature and some other helpful things: https://hackernoon.com/these-6-essential-tools-will-maintain-your-npm-modules-for-you-4cbbee88e0cb

Also, I see from the Hypothesis docs that you have problems with long Travis build queues. Have you considered using https://bors.tech/? It batches up all currently-approved pull requests into one big merge, finds failing PRs using binary search, and updates the master branch with the passing subset. This means you run fewer builds, you eliminate the “Your PR is out of date; update with latest changes?” race-condition, and your `master` branch always contains exactly what was tested.

pozorvlak on 2018-11-27 12:52:40:

Tying that back to the actual thesis of the post...

The NPM build-automation libraries are probably not directly useful to you, because they’re situated to the context of JavaScript/NPM (I haven’t checked, but this seems likely). It might be possible to generalise them to support Python/PyPI, but this may well be more effort than it’s worth; my main reason for mentioning them is that they might serve as useful design inspiration, for how to carve up the Hypothesis build system into libraries, how to design their APIs, etc. I’m thinking that UI similarity between JS and Python build-automation tools might reduce cognitive load on developers who work in both contexts.

Bors is an interesting study in situation. It’s a reimplementation of a tool developed by the Rust project, and still has some Rust-isms. It runs as a Web service that listens to GitHub webhooks, which makes it independent of your choice of CI tool(s), but not of your choice of version-control host. They provide it as a hosted service, and they’ve also gone to some lengths to make the service easy to run yourself using a variety of hosting providers, both by making it a twelve-factor app and by providing multiple sets of setup instructions and example config (even a “deploy to Heroku” button). Which is just as well, because it’s written in an unusual language (Elixir) that most potential users don’t know how to deploy. This makes me think that one way of understanding the twelve-factor-app design pattern (and, come to that, the “small program that reads and writes text streams” Unix design pattern) is as a strategy for making your code less situated. One can also think of containerisation this way: who cares about the precise environment needed for a program to run, if that environment can be cheaply and reliably duplicated wherever it’s needed? But of course, that just exposes us to the situational requirements outside the boundary of your environment-automation system: Heroku/CloudFormation/Kubernetes/Docker-compose can create web and database servers for you, but they can’t create the GitHub account to connect to.
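As a minimal sketch of the twelve-factor idea of keeping configuration in the environment: the variable names `DATABASE_URL` and `PORT` and the default values below are assumptions chosen for illustration, not Bors's actual configuration. The point is that the code no longer hard-codes where it runs, so those assumptions move out of the code and into whatever environment it is deployed to.

```python
import os


def load_config(environ=os.environ):
    """Read deployment-specific settings from the environment.

    The variable names and defaults are hypothetical; they stand in for
    whatever a particular service would need to know about its surroundings.
    """
    return {
        "database_url": environ.get("DATABASE_URL", "sqlite:///local.db"),
        "port": int(environ.get("PORT", "8000")),
    }
```

Calling `load_config({"PORT": "5000"})` gives a port of `5000` and falls back to the default database URL. The code is less situated, but the situation has not gone away: something outside the program still has to set those variables correctly.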