Category Archives: programming

In defence of developer exceptionalism

Epistemic status: I am thinking through a possible position out loud. While I believe the argument I am making in this post, I am not totally convinced by it. I am aware of a number of counter-arguments that I do not touch on here.

There’s a rather annoying attitude that often comes up: Software Developers are special and you should treat them differently from everyone else when designing how your organisation works.

I don’t think this is true. Software development is knowledge work with a fairly logical (though less logical than we like to pretend) basis, and most of the specific features of it as a profession just emerge out of that.

But I am starting to think it might be useful.

There is a category of argument I have often encountered in negotiation (not at all coincidentally, usually right before I turned down a job offer or shortly before I left a company). I call it the “We can’t treat you fairly because then we’d have to treat other people fairly too” argument. e.g. I’ve seen this in requests for flexible working (I like four day weeks a lot, and will in general happily take 80% of the salary for a 4 day week) where the response to the request is “But loads of people want that and if you had it they’d ask why they can’t have it too.”

My attitude to this in the past has always been “Well, why can’t they have it too?”

This is, of course, still my attitude, but I’ve come to accept that it will win the argument precisely 0% of the time.

The problem with accepting arguments of the form “Why should you get this treatment if everybody else does not?” is that they remove all possibility of incremental change. Even when everybody should get something, there’s probably no route to that that doesn’t pass through some people getting it and not others.

One example of where this comes up in a more explicitly developers vs non developers scenario is in responses to things like “Why you should not interrupt a programmer”. Every time this or similar is posted, a lot of my friends become angry and say “Programmers aren’t special! You shouldn’t interrupt anyone!”.

This is more or less true (there are large categories of jobs where it’s not only fine but desirable to interrupt people, because responding to interruptions is the job, but for basically all knowledge work and a whole bunch of adjacent jobs it holds). You shouldn’t interrupt anyone.

But it’s a lot easier to win arguments about not interrupting programmers.

And, perhaps, it’s a lot easier to win arguments about not interrupting other people once you’ve established a norm that it’s not OK to interrupt programmers?

And this is why I think developer exceptionalism might be useful. Not because developers are exceptional, but because it is a place where we can put the thin end of the wedge.

A lot of how business is run is very dysfunctional, and exceptionalism gives us a convenient way of sidestepping that. If your business is very interrupt driven but you can establish a small island of uninterrupted calm around your programmers, maybe later the designers can come join you on the island? And maybe after that marketing can note that actually a lot of their work involves pretty intense thinking like that and they’d like to be uninterrupted at least some of the time and…

I can’t promise that it will work, but I’m pretty sure it won’t work without it. Systems are quite stable without something to shake them up, and without some specific argument that you should be treated specially, you’ll be treated the same way as everyone else: Badly.

Exceptionalism and solidarity feel fundamentally at odds, and perhaps they are, but perhaps solidarity without the permission to occasionally engage in exceptionalism is just a crab bucket, and the best way to get everyone to where you want them to be is to let some people get ahead by whatever means necessary and then pull the others up behind them?

Of course, if this is going to work you have to actually pull people up behind you.

The correct progression is:

  1. “You can’t have X”
  2. “But developers are special!”
  3. “OK fine developers can have X”
  4. “You can’t have X”
  5. “But the developers have X and it worked really well at improving Y which is also our problem”
  6. (Developers chime in) “We can totally back that up. This is what we’ve seen out of X and it really does help with Y”
  7. “Oh OK, you can have X too”

An incorrect progression would be:

  1. “You can’t have X”
  2. “But developers are special!”
  3. “OK fine developers can have X”
  4. “You can’t have X”
  5. “But the developers have X and it worked really well at improving Y which is also our problem”
  6. (Developers chime in) “No! X is only for developers! You are not a developer and thus are not special and cannot have X”
  7. “You can’t have X”

If developer exceptionalism is to be a useful lie then we need to do two things:

  1. Never forget that it is a useful lie and not actually the truth.
  2. Have other people’s backs when they try to follow our lead.

Otherwise it’s just developers being privileged jerks, which we have quite enough of already.

 

This entry was posted in life, programming on by .

Language reconstruction based fuzzing without reconstructing languages

Note: This idea has not been implemented yet so may turn out not to work. It all clicked together while I was walking around Paris this afternoon and I’m writing it down while it’s fresh in my mind. I think it will work though.

Secondary note: This post may be too inside baseball for you.

An idea that I’ve been banging my head against recently is that of reconstructing languages corresponding to tests based on some bounded data. I’ve posted a bit about its use for reducing, but I’ve also been interested in its use for fuzzing.

Here’s a problem Hypothesis experiences: The | operator is currently not associative. x | (y | z) will produce a different data distribution than (x | y) | z. This is suboptimal. I could special case it (and for a long time I thought I had, but apparently I dreamed that code), but it’s easy to come up with trivial ways to defeat that. e.g. (x | y).map(lambda x: x) | z should really have the same distribution as x | y | z but doesn’t if you special case |.

But under the hood Hypothesis just works with streams of bytes, so in theory you can represent its data distribution as a finite state machine. You could then randomly walk the state machine, reweighting your choices by the number of reachable states, and thus get much better coverage of the set of available paths by reaching rare paths much more often.

Or, somewhat suggestively, here’s another algorithm that could work well and is a lot simpler than reweighting:

  1. Pick a state uniformly at random
  2. Take the shortest path to that state (or, indeed, any random path to that state).
  3. Continue the random walk from there
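
To make those three steps concrete, here is a toy sketch in Python against an explicit transition table. Everything in it, including the TRANSITIONS table itself, is made up for illustration; Hypothesis has no such explicit machine, which is exactly the problem that comes next:

import random
from collections import deque

# A made-up byte-labelled state machine; a state with no outgoing
# transitions stands in for a complete example.
TRANSITIONS = {
    0: {0: 1, 1: 2},
    1: {0: 1, 1: 3},
    2: {0: 3, 1: 2},
    3: {},
}

def shortest_path(start, target):
    # Plain breadth-first search, returning the byte labels along the path.
    queue, visited = deque([(start, [])]), {start}
    while queue:
        state, path = queue.popleft()
        if state == target:
            return path
        for byte, nxt in sorted(TRANSITIONS[state].items()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append((nxt, path + [byte]))
    raise ValueError("target not reachable")

def generate():
    target = random.choice(list(TRANSITIONS))  # 1. pick a state uniformly
    path = shortest_path(0, target)            # 2. shortest path to it
    state = target
    while TRANSITIONS[state]:                  # 3. continue the walk at random
        byte = random.choice(list(TRANSITIONS[state]))
        path.append(byte)
        state = TRANSITIONS[state][byte]
    return bytes(path)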

Unfortunately we run into a major problem with either of these approaches: Reconstructing regular languages for unknown predicates is really hard.

The insight I had this afternoon is this: We don’t need to! We can use the same trick as I used to produce self-optimizing boolean expressions to just store exemplars of distinct states we’ve found. We then just pick a random state we’ve already discovered and run the generation from there (Hypothesis can extend any string to a complete but possibly invalid example).

How does this work?

We maintain three sets of data:

  1. A corpus of examples such that we know that every element of this list definitely corresponds to a different state in the state machine for our predicate.
  2. A set of previously seen strings (this doesn’t have to be precise and can e.g. be a bloom filter)
  3. A binary search tree with branches labelled by strings and leaves labelled by indices into our list of examples.

We populate this as follows:

  1. The corpus starts out containing just the empty string
  2. The set of previously seen strings contains just the empty string
  3. The binary tree is a single leaf pointing to 0 (the index of the empty string).

We then fuzz as follows:

  1. Pick a random element of our corpus.
  2. Generate an example such that that random element is a strict prefix of the example (it’s easy to do this in Hypothesis).
  3. For each proper prefix of that example, perform the incorporate operation.

The incorporate operation when run on a string s is:

  1. If the string is in our set of seen strings, stop. Else, add it to the set.
  2. Walk the binary tree until we hit a leaf. At each branch look at the label e. If the concatenation of s followed by e is valid, walk right. Else, walk left.
  3. Once we are at a leaf, find the string in our corpus indexed by it. Call that t. By repeatedly fuzzing extensions of s and t, attempt to find an e such that s + e and t + e produce different results. If we do find such an e, add s to the corpus as a new element and split the current tree node, creating a branch labelled by e with the left and right children being leaves containing s and t (which one is which depending on which way around the difference occurred). Otherwise, if s is a better example than t (shorter, or the same length and lexicographically smaller), replace t in the corpus with s.

The experiments we split on witness that two nodes must be in different states (cf. the Myhill-Nerode theorem), so we guarantee that our corpus consists of unique states. The fuzzing step is probabilistic, so we can sometimes discard a state inappropriately, but this shouldn’t be a major problem in practice – states that we can’t distinguish with high probability might as well be the same from a fuzzing point of view.
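
Here is a rough Python sketch of the data structures and the two operations above. It is illustrative only: is_valid, random_extension and generate_example_extending are hypothetical stand-ins for the underlying validity predicate and for the Hypothesis machinery that extends a prefix into a complete example, and a plain set stands in for the bloom filter.

import random

ATTEMPTS = 100  # how hard we try to distinguish two candidate states

class Leaf:
    def __init__(self, index):
        self.index = index  # index into the corpus

class Branch:
    def __init__(self, label, left, right):
        self.label = label  # the distinguishing suffix e
        self.left = left    # strings s with is_valid(s + e) false go left
        self.right = right  # strings s with is_valid(s + e) true go right

corpus = [b""]  # exemplars believed to correspond to distinct states
seen = {b""}    # stand-in for a bloom filter of previously seen strings
tree = Leaf(0)

def incorporate(s):
    global tree
    if s in seen:
        return
    seen.add(s)
    # Walk the tree, branching on the result of each stored experiment.
    node, parent, went_right = tree, None, False
    while isinstance(node, Branch):
        parent, went_right = node, is_valid(s + node.label)  # hypothetical predicate
        node = node.right if went_right else node.left
    t = corpus[node.index]
    # Try to find an experiment e witnessing that s and t are distinct states.
    for _ in range(ATTEMPTS):
        e = random_extension()  # hypothetical: some random suffix to try
        vs, vt = is_valid(s + e), is_valid(t + e)
        if vs == vt:
            continue
        corpus.append(s)
        new_leaf, old_leaf = Leaf(len(corpus) - 1), Leaf(node.index)
        split = Branch(e, left=new_leaf if vt else old_leaf,
                       right=old_leaf if vt else new_leaf)
        if parent is None:
            tree = split
        elif went_right:
            parent.right = split
        else:
            parent.left = split
        return
    # Couldn't distinguish them: keep whichever exemplar is "better".
    if (len(s), s) < (len(t), t):
        corpus[node.index] = s

def fuzz_step():
    prefix = random.choice(corpus)
    example = generate_example_extending(prefix)  # hypothetical
    for i in range(len(example)):                 # every proper prefix
        incorporate(example[:i])

False positives in the seen set only cause us to skip the odd string, which is why an approximate structure like a bloom filter is good enough there.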

Anyway, like I said, it’s currently unclear how useful this is because I haven’t actually implemented it! In particular I suspect it’s more useful for long running fuzzing processes than for the sort of fast fuzzing Hypothesis tries to do. Fortunately I’m planning how to incorporate such workflows into Hypothesis. However I definitely still think it’s a good idea and I’m looking forward to exploring it further.

This entry was posted in Hypothesis, programming on by .

Self-optimizing boolean expressions

This is a trick I figured out this morning. It seems to work (in that I have a prototype that works just well enough to pass a single test), and I think it might have some interesting applications. I haven’t yet decided whether it’s more than a fun toy though.

It’s entirely possible (likely, even) that this is a well known idea that has some fancy name that I don’t know. I was doing some digging on related areas and couldn’t see anything like it, but that just means I wasn’t looking in the right places.  It may also be that it’s never been written up because it’s a bad idea. Usual disclaimers about having clever ideas apply.

The idea is this: We’re working with boolean expressions over a set of variables. We want a canonical representation of them. We could use reduced ordered binary decision diagrams, but we dislike exponential blow-up. So what do we do?

There are a whole pile of extensions to the idea of BDDs which relax constraints, add features, etc. We’re not going to do that. Instead we’re going to canonicalize as we go using the awesome power of mutability.

For this trick we will need:

  1. A SAT solver (this problem is intrinsically equivalent to solving SAT, so there’s no getting away from that. We might as well let someone else do the hard work). I used minisat for my prototype. This would presumably work better with one of the various patched minisat extensions.
  2. A single mutable binary tree with leaves as boolean expressions and splits labelled by sets of variables.
  3. Union find.
  4. Hash consing (with weak references if you prefer but it will hurt speed)

Our working representation for boolean expressions will be that an expression can be one of:

  1. A literal true
  2. A literal false
  3. A single variable
  4. The negation of another boolean expression
  5. The conjunction (and) of two other boolean expressions

We hash cons the representation of our expressions so that any two structurally equal expressions are reference equal. We then make them our items in union find, so additionally every expression has a reference to another expression, or to itself if it is our current canonical representation for equivalent expressions.

Now, every time we introduce a new expression that is not previously in our hash cons we want to find out if it is equivalent to any expression we have previously seen. If it is, we perform a merge operation in our union find, rooting the more complicated of the two in the simpler of the two. If not, we add it as a new root.
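
As a sketch of that machinery (Python, purely illustrative: the complexity function in particular is a made-up stand-in for whatever notion of “simpler” you prefer):

class Expr:
    _interned = {}  # hash cons table: structural key -> unique instance

    def __new__(cls, kind, children=(), name=None):
        key = (kind, children, name)
        cached = Expr._interned.get(key)
        if cached is not None:
            return cached
        self = object.__new__(cls)
        self.kind, self.children, self.name = kind, children, name
        self.parent = self  # union find pointer; roots point at themselves
        Expr._interned[key] = self
        return self

TRUE, FALSE = Expr("true"), Expr("false")

def var(name):
    return Expr("var", name=name)

def neg(e):
    return Expr("not", (find(e),))

def conj(a, b):
    return Expr("and", (find(a), find(b)))

def find(e):
    # Union find "find" with path halving.
    while e.parent is not e:
        e.parent = e.parent.parent
        e = e.parent
    return e

def complexity(e):
    # Made-up heuristic for "how complicated is this expression".
    return 1 + sum(complexity(c) for c in e.children)

def merge(a, b):
    # Root the more complicated of the two at the simpler one.
    a, b = find(a), find(b)
    if a is not b:
        if complexity(a) <= complexity(b):
            b.parent = a
        else:
            a.parent = b

Canonicalizing children before consing them (the find calls in neg and conj) is a design choice: it stops structurally distinct but equivalent expressions from piling up in the table, at the cost of a find per construction.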

Obviously we don’t do that by searching through every expression we’ve seen so far. This is where the binary tree comes in.

The binary tree starts with a single branch keyed with the empty set with the literal false as the left branch and the literal true as the right branch. When we insert a new node, we walk the tree as follows:

At a split node, look at the set of variables and determine what value this expression has when those variables are set to true and every other variable is set to false. If that result is true, walk to the right child. Else, walk to the left child.

Once you get to a leaf, look at the expression that is currently there. Using a SAT solver, attempt to find a variable assignment that produces different results for the two expressions. If the solver doesn’t find one, the expressions are equivalent: merge them. If the solver does find one, split the leaf out into a branch labelled by the set of variables that are true in that assignment (everything else being false), with the two expressions as its leaves.
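
Continuing the sketch from above, the walk might look something like the following. Here evaluate(e, true_vars) is ordinary structural evaluation with the given variables true and every other variable false, and differing_assignment(a, b) is a hypothetical wrapper around the SAT solver that returns a distinguishing assignment (as the set of variables it makes true) or None if the two expressions are equivalent; neither is shown.

class Node:
    def __init__(self, true_vars=None, left=None, right=None, expr=None):
        self.true_vars = true_vars  # branch label: the variables set to true
        self.left, self.right = left, right
        self.expr = expr            # set only on leaves

# The tree starts as a single split on the empty assignment, with the
# literal false on the left and the literal true on the right.
root = Node(true_vars=frozenset(), left=Node(expr=FALSE), right=Node(expr=TRUE))

def canonicalize(e):
    e = find(e)
    node = root
    while node.expr is None:
        node = node.right if evaluate(e, node.true_vars) else node.left
    other = find(node.expr)
    if other is e:
        return e
    witness = differing_assignment(e, other)  # hypothetical SAT solver call
    if witness is None:
        merge(e, other)  # provably equivalent: merge them in the union find
        return find(e)
    # Distinguishable: split this leaf on the witnessing assignment.
    node.true_vars = frozenset(witness)
    for expr in (e, other):
        child = Node(expr=expr)
        if evaluate(expr, node.true_vars):
            node.right = child
        else:
            node.left = child
    node.expr = None
    return e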

Anyway, once you have this data structure, you can always cheaply convert an expression that you’ve already constructed to the simplest equivalent expression you’ve seen so far. Testing equivalence of expressions then becomes equally cheap – just simplify both and see if you get the same answer.

This is of course a value of cheap that involves having run a large number of calls to a SAT solver to get to this point, but for the case where you’ve got a large number of small expressions which you’re then going to want to pass to a SAT solver all at once later, I think that’s probably not too bad – passing the individual expressions to a SAT solver is extremely cheap, and then you just convert the whole batch to CNF together rather than trying to insert them into the data structure.


 

Notes:

  1. The tree will almost certainly get unbalanced. Occasional rebalancing probably becomes necessary. I’m not sure what the best way to do this is going to be, as it’s not actually an ordered tree in the traditional sense. I think something Splay Tree like could be made to work.
  2. You have to decide what constitutes a “better” expression in an implementation of this, but some sort of heuristic about number of reachable distinct subexpressions is a good start.
  3. It is probably worth putting some effort in to minimizing and reusing the expressions you get back from the SAT solver. It depends a bit on what the hit rate is likely to be.
  4. There are some optimizations you can do to make the evaluation a bit more efficient that I haven’t gone into here.
  5. There are some simple algebraic rewrite rules it’s worth performing before farming anything off to the SAT solver.
  6. You can also optimize a little based on the fact that literals, variables and their negations are always distinct from each other.

 

This entry was posted in programming on by .

Shrinking failing input using a SAT solver

Advance warning: This is mostly me writing up a negative to neutral result. The technique described in this post works OK, but it doesn’t seem to work better than well optimized delta debugging and as such does not currently justify its complexity. It also doesn’t work very well when you start from large examples, but that’s fortunately not a huge problem given the ideas I outlined in my last post, because you can work with smaller sequences until the example is small enough to be feasible when approached bytewise.

My hope was that it would give a system that was much better at finding longer intervals to delete: delta debugging only guarantees that you cannot delete any single byte, and guaranteeing that you cannot delete any interval provably requires a quadratic number of tests to check, but I was hoping that this would be able to exploit structure found in the string to find intervals that pure delta debugging would have missed.

I’ve got some ideas I want to pursue to extend it that might make it worth it, but this post just describes the current state of work in progress. Also I might end up incorporating it into my shrinker anyway and hope that I can improve its working with more cleverness later (I already have a prototype of doing this).

Anyway, disclaimers over, on to the idea.

We have a string \(S\) and a predicate \(f\), and we are trying to construct a string \(T\) of minimal length such that \(f(T)\) is true.

This builds on my previous idea of using regular language induction to aid shrinking. The version I described in that post appears to be too slow to make work.

The core idea is still that we are looking for a DFA with fewer than \(|S|\) nodes such that every accepted string satisfies \(f\). If we can find such a DFA then we can shrink \(S\), because the shortest path to an accepting node can be no longer than the number of nodes, which is fewer than \(|S|\) by assumption.

The idea is to try to manufacture a DFA from \(S\) using a SAT solver (disclaimer: I actually used the naive algorithm that the authors aimed to replace, because I didn’t quite understand their work and wanted to prototype something quickly) to colour the indices in a manner consistent with the transitions observed in \(S\) (i.e. if we colour the indices \(i\) and \(j\) the same and \(S[i] = S[j]\), then we colour \(i + 1\) and \(j + 1\) the same).

We maintain a set of pairs of indices which we know do not correspond to the same state in the DFA. We initialize it to \(\{(0, |S|)\}\) at the start and will add to it as we go.

We now produce a colouring consistent with that set and the transition requirement.
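
To state the requirement concretely, here is a checking function for a candidate colouring (Python, illustrative; the actual search for such a colouring, whether naive or via the SAT solver, is not shown):

def is_consistent(S, colouring, inconsistencies):
    # colouring[k] is the colour assigned to index k, for 0 <= k <= len(S).
    for i in range(len(S)):
        for j in range(len(S)):
            if colouring[i] == colouring[j] and S[i] == S[j]:
                if colouring[i + 1] != colouring[j + 1]:
                    return False
    # Every pair we have already proved distinct must get different colours.
    return all(colouring[i] != colouring[j] for i, j in inconsistencies)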

If that colouring gives a unique colour to every index, stop: there is no DFA with fewer than \(|S|\) nodes consistent with \(S\) and our set of known inconsistencies.

Otherwise, produce the quotient DFA and look at the minimal path to an accepting node. If it satisfies \(f\), we have shrunk the string. Start again from the beginning with the new string (this usually doesn’t work, but it’s worth checking).

If that didn’t work, we now produce either a strictly smaller string satisfying \(f\) or we increase the size of our set of known inconsistencies as follows:

For each colour, we look at the smallest and the largest index in it, \(i, j\). If deleting the interval \([i, j)\) from the string produces a string satisfying \(f\) then we have shrunk the string: start again from the beginning with the new string. Otherwise, we have proved that the prefixes of \(S\) up to \(i\) and \(j\) respectively must live in distinct states (because following the suffix of \(S\) starting at \(j\) from each of them produces different results), so add \((i, j)\) to the set of inconsistencies. Do this for each colour (starting from the ones which produce the largest gap, so we maximize how much we shrink by), and if none of them produced a shrink then try colouring again.
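
Putting the loop together as a sketch (Python, illustrative: consistent_colouring stands in for the colouring search above and quotient_shortest_path for building the quotient DFA and taking its shortest accepted string, neither of which is shown):

def shrink(S, f):
    # S satisfies f; we look for a shorter string that also satisfies it.
    inconsistencies = {(0, len(S))}  # index pairs known to be distinct states
    while True:
        # Hypothetical: colour indices 0..len(S) consistently with the
        # transition rule and with every pair in inconsistencies.
        colouring = consistent_colouring(S, inconsistencies)
        if len(set(colouring)) == len(S) + 1:
            return S  # every index needs its own colour: give up

        candidate = quotient_shortest_path(S, colouring)  # hypothetical
        if f(candidate) and len(candidate) < len(S):
            return shrink(candidate, f)  # start again from the new string

        # Group indices by colour and try deleting the widest spans first.
        by_colour = {}
        for index, colour in enumerate(colouring):
            by_colour.setdefault(colour, []).append(index)
        spans = sorted(
            ((min(ix), max(ix)) for ix in by_colour.values() if len(ix) > 1),
            key=lambda span: span[0] - span[1],
        )
        for i, j in spans:
            candidate = S[:i] + S[j:]
            if f(candidate):
                return shrink(candidate, f)
            # The deletion failed, so the prefixes of length i and j must
            # correspond to different states in any suitable DFA.
            inconsistencies.add((i, j))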

Eventually we’ll either hit the point where we need all \(|S| + 1\) colours to colour the indices, or we’ll have shrunk the string.

Note: I think this doesn’t actually guarantee even 1-minimality like delta debugging does. The problem is that you might have an index that is not equivalent to its successor index even though you can delete the character there. I’m not currently sure if this algorithm can witness that possibility or not.

Anyway, I’ve been running the example I used to prototype structureshrink (the C++ hello world) with this as the list minimization algorithm. It seems to do alright, but it’s nothing to write home about. Still, it’s been an interesting experiment and I’m glad to have got the excuse to play with minisat.

This entry was posted in programming, Python on by .

Delta debugging with inferred separators

Delta debugging, or something akin to it, tends to be phase one of most file based shrinkers: Shrinking is much easier the faster your test function is, test functions tend to be faster on smaller data, and shrinking is also easier on smaller data because you’ve got fewer things to try. So anything you can do to prune down the file before you get onto the hard work is good, and delta debugging like algorithms do a pretty good job of that.

But delta debugging in its original form is very line oriented: You’re trying to make the file one-minimal in the sense that you can’t delete any single line. What do you do if you want to delete within lines? Or if your file is binary and not line oriented at all?

Delta debugging really works on arbitrary sequences, so you can always just do bytewise delta debugging and it will work perfectly well, but there are usually a lot more bytes than there are lines, and deleting an arbitrary block of bytes is much less likely to work than deleting a sequence of lines is. So it works, but it takes what is often a rather slow process and makes it even slower.

I spotted a trick this morning that works rather well. Define the set of separators for a string as containing all one or two byte substrings.

Now for each separator, we perform delta debugging on the file split by that separator (i.e. we treat it as if it were the newline in our classic delta debugging: We split by separator, try to minimize that sequence subject to the constraint that joining by that separator is interesting).

On its own, this works but not all that well, because most separators don’t work or work no better than shrinking bytewise. But there’s a refinement we can perform, which is that we can give each separator a score: the score of a separator is the smallest length in bytes of any component in the split. We then process separators in decreasing order of score.

The reason this works is that this guarantees that on a successful shrink we always delete at least the score of the separator many bytes (because we always delete at least one component). This lets us focus on separators which produce large shrinks first, so by the time we get to less useful separators we should have much less to do and already be running pretty fast.
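
A sketch of the whole scheme (Python, illustrative: delta_debug(items, test) stands in for your favourite sequence minimizer, e.g. classic ddmin, returning a minimized subsequence for which the test still holds):

def separators(data):
    # Every one and two byte substring of the data.
    return {data[i:i + n] for n in (1, 2) for i in range(len(data) - n + 1)}

def score(data, sep):
    # Smallest component of the split: a lower bound on the number of bytes
    # removed by any successful shrink performed with this separator.
    return min(len(part) for part in data.split(sep))

def shrink_with_separators(data, interesting):
    for sep in sorted(separators(data), key=lambda s: score(data, s), reverse=True):
        parts = data.split(sep)
        if len(parts) < 2:
            continue  # the separator may no longer occur after earlier shrinks
        # Hypothetical sequence minimizer, constrained so that rejoining by
        # this separator keeps the result interesting.
        parts = delta_debug(parts, lambda ps: interesting(sep.join(ps)))
        data = sep.join(parts)
    return data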

So far I’ve only run this on one example (John Regehr’s from Reducers are Fuzzers), but it works pretty well there: certainly much better than straightforward bytewise delta debugging, and the file has a lot of reduction potential other than newlines.

In this example at least it’s particularly interesting noticing which separators seem to work well – the obvious ones like ‘ ’ and ‘\n’ do of course, but ‘{’ worked particularly well for this example, as did ‘.c’ and ‘e;’. I think this is because they are by luck or reality capturing some nicely exploitable facet of the structure of the example, which suggests this idea has broader application.

I’m not sure these techniques will make it into Hypothesis, as Hypothesis has some special hinting to figure out what’s best to delete that is probably already good enough, and this seems like it would be a technique that worked better for the problem of reducing slow operations on large files, but I’m still pretty pleased with it and will probably find an opportunity to use it somewhere.

This entry was posted in Hypothesis, programming, Python on by .