Category Archives: Uncategorized

There is no single acceptable defect rate

This is a response to a good piece called “Infrastructure as code would be easy if we cared”. It’s not required reading for this post, but I recommend it anyway, and I mostly agree with its message.

But there’s one point in it that I feel the need to quibble with, partly because I spent so long making it myself before I realised it was bogus:

But any real business:

  • Knows what their acceptable defect rate is
  • Is already operating at it

This isn’t true.

I’m not going to quibble about “any real business” or “know”. I could, but it would be needless pedantry.

What I want to quibble with is the idea that businesses operate at their acceptable defect rate.

The defect rate a business operates at is not a single number that is magically acceptable when all others aren’t. It’s the rate they can’t afford to improve: when making improvements to your defect rate costs more than the reduced rate of defects would save you, you stop trying to improve it.

You might call that the acceptable defect rate if you like, but if you do, the point is that the acceptable defect rate is inherently unstable: the idea that it would remain the same under changes to your workflow is just untrue.

This is not a trivial point: It means that you can reduce the defect rate and get more money if you change the economics. A reduction in defects that is currently non-viable because it would cost you 100 person hours could suddenly become viable if it cost you 50 person hours instead (it could also become viable if e.g. new regulations come in that make some of those defects really expensive).

And some things do change the costs of reducing defects. Obviously I think Hypothesis is one of those things, because it increases your ability to find defects much more quickly, thus decreasing the cost of fixing them. This is a comparatively rare example of something that changes the cost of finding and fixing defects while keeping most other things constant, and I genuinely believe that it reduces the defect rate in software.

It’s always tempting to think of the world as immutable and hard to change, but the reality is that it’s mostly just the result of large systems responding to costs and incentives, and small changes in those costs and incentives can produce remarkably large effects if you give them time to work.


Designing a feature auction

I’m thinking of doing a crowd funding campaign for Conjecture.

One of the things that makes this a nice proposition is that there’s an almost unbounded amount of work I can do on it, but there’s also quite a nice finite core that would still be very useful to make really good and would take much less time.

However, how would I decide which of the unbounded work to do? The classic model seems to be stretch goals. I hate stretch goals, particularly ones where the stretch goal required to make it useful for you is 2 or 3 items down the list (I’m looking at you, people who put Android support as a stretch goal).

The obvious answer is to let people pay for work. I was originally thinking in terms of reward tiers there, but I’m not a huge fan of that. If I declare that, say, Ruby support costs £3000 and 300 Ruby developers are willing to chip in £10 each, I should do Ruby support. That seems like exactly the point of crowd funding.

So I started sketching out how I’d like this to work and came up with a system I’m pretty pleased with. I’m not sure if it’s a good idea, but it’s a nice design, so I thought I’d share.

It’s based heavily on the single transferable vote method of proportional representation, but with some tweaks to fit the problem.

Usage: When creating a campaign, you set an “initial cost” figure (this should be the same as your campaign goal). You also specify a list of additional features, in your personal priority order, with costs attached to them.

Every pound (dollar, euro, whatever) people contribute to your crowd funding campaign then gives them voting power to choose which of these features is implemented. They vote by simply listing their feature preferences in order of most preferred to least. They can list as many or as few of the features as they like.

First, everyone pays for the initial cost you set. Everyone pays the same fraction of their contribution, chosen so that the total exactly covers that cost.

Voting now proceeds in rounds. In each round there are a number of active features – inactive features have either been chosen to be implemented or disqualified. Additionally each voter has a set of remaining funds, which starts at the amount they contributed and is reduced as they pay for features.

Each contributor votes for their current most preferred active item. If they’ve run out of active items they care about, they vote for your most preferred active item (this part ensures that you don’t get “free money” – as much money as can be spent on features will be spent on features; it’s just that if people don’t express a preference, you get to choose).

Now, some features may have been funded: Any feature for which the total remaining funds of the people voting for it exceeds its cost is funded. The feature is chosen to be implemented and marked inactive. Anyone who voted for it now has their remaining funds reduced by the same fraction, so that just enough is spent to cover the cost. E.g. if Ruby cost £3000 and there had been £6000 available funds, each person voting for it would spend half their remaining funds.

If no features were funded, one of the features drops out. Take the feature that is furthest from being funded (i.e. cost – funds allocated is highest), breaking ties by picking the one that is lowest in your preference order. This is disqualified and marked inactive.

This process is repeated until the total funds remaining is smaller than the cost of any active feature.

If it’s also less than the cost of any feature that was not chosen, stop. Otherwise, start again from the beginning with the current remaining funds and only the set of features that were not chosen previously.

Repeat this until you make it through an entire vote without choosing anything. At that point just fill in the remaining features in order of your preferences.
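
To make that concrete, here’s a rough Python sketch of the whole process. All of the names (run_auction, owner_preference and so on) and the data layout are mine, it ignores currency rounding, and it’s meant to illustrate the rules above rather than be a production implementation:

    def run_auction(initial_cost, contributions, preferences, owner_preference, features):
        """contributions: {voter: amount}; preferences: {voter: [feature, ...]} from
        most to least preferred; owner_preference: [feature, ...], the campaign
        owner's priority order, covering every feature; features: {feature: cost}.
        Returns the features chosen for implementation, in the order they were funded."""
        funds = dict(contributions)

        # First, everyone pays the same fraction of their contribution, chosen
        # so that the total exactly covers the initial cost.
        total = sum(funds.values())
        for voter in funds:
            funds[voter] *= 1 - initial_cost / total

        remaining = dict(features)  # features not yet chosen
        chosen = []
        progressed = True
        while progressed and remaining:  # rerun the whole vote until a pass chooses nothing
            progressed = False
            active = set(remaining)
            while active and sum(funds.values()) >= min(remaining[f] for f in active):
                # Each contributor backs their most preferred active feature,
                # falling back to the owner's preference list if theirs runs out.
                votes = {f: 0.0 for f in active}
                backers = {f: [] for f in active}
                for voter, amount in funds.items():
                    for f in preferences.get(voter, []) + owner_preference:
                        if f in active:
                            votes[f] += amount
                            backers[f].append(voter)
                            break
                funded = [f for f in active if votes[f] >= remaining[f]]
                if funded:
                    for f in funded:
                        # Backers each spend the same fraction of their remaining
                        # funds, just enough to cover the cost.
                        fraction = remaining[f] / votes[f]
                        for voter in backers[f]:
                            funds[voter] *= 1 - fraction
                        chosen.append(f)
                        active.discard(f)
                        del remaining[f]
                        progressed = True
                else:
                    # Nothing funded: disqualify the feature furthest from being
                    # funded, breaking ties towards the owner's least preferred.
                    drop = max(active, key=lambda f: (remaining[f] - votes[f],
                                                      owner_preference.index(f)))
                    active.discard(drop)
        # Anything never funded just gets slotted in afterwards in the owner's
        # preference order.
        return chosen

Run on the Ruby example above (a £3000 feature with £6000 of remaining funds backing it), this funds Ruby and halves each backer’s remaining funds.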

Design notes

  1. It is a little complicated, as there are a bunch of edge cases I noticed when writing this up, but I think it’s simpler to use than to describe, and most of those edge cases contribute to making it better for both you and the contributors.
  2. I’m not sure how essential the use of the preference list is. Certainly the “uncast votes go to your preference list” part is quite useful, because it lets you shape the results in your direction while still complying with people’s wishes – e.g. if Ruby and Lua both cost £3000 and currently have £2500 voting for each, but I prefer to do Lua and now have £1000 in my spare change pool, I get to choose Lua.
  3. The multiple repeats thing is annoying, but it feels unfair to just go straight to the preference list.

Structure aware simplification for Conjecture

I’m continuing to work on some of the issues I described in my last post about simplification for conjecture. I’ve got something that needs further work but is pretty good.

The idea is to let the data generation provide structure hints. As such it adds two additional functions to the Conjecture API: one starts an example, another ends it. Examples can be, and indeed routinely are, nested.

We use this API to produce a series of intervals as [start, end) pairs. These then become the main focus of where we try to shrink.
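
I won’t swear this is what the real API looks like, but the bookkeeping can be as simple as a stack of open positions. A Python sketch with names of my own choosing:

    class ExampleTracker:
        """A guess at the bookkeeping behind the two new API functions: every
        time an example ends we record the half-open interval of buffer
        positions it consumed. Nesting falls out of using a stack."""

        def __init__(self):
            self.open_starts = []  # start positions of currently open examples
            self.intervals = []    # completed [start, end) pairs

        def start_example(self, position):
            self.open_starts.append(position)

        def end_example(self, position):
            start = self.open_starts.pop()
            self.intervals.append((start, position))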

So here’s how the shrinking works given that:

Every time we run a test, if the test fails we replace the current best buffer with the buffer that triggered the failure, trimming it to only the initial segment that was read (we only ever consider buffers that are simpler than the current best, so we don’t need to check that it is).

When we do this we also record all the intervals in the buffer that correspond to examples. We sort these from most complicated to simplest in the same order that we use for buffers – that is, an interval is simpler than another if it is shorter than it or is of the same length but lexicographically before.

The first step is to try to prune the buffer down as much as possible by throwing away parts of it that don’t matter. We greedily delete intervals, starting from the most complicated and working down to the least. If any deletion succeeds we make another pass, repeating until no intervals can be deleted.

Some details:

  1. A trick: If we started from the most complicated interval each time then we’d end up doing a lot of work that never succeeds. What we actually do is maintain an index variable and always take the interval at that index in the list. When we hit the end, we either stop the process or, if we’ve made any changes, start again from the beginning. This works despite the fact that the list of intervals may be changing under us (there’s a sketch of this after the list).
  2. In my test cases there end up being a lot of small examples and it’s not worth trying to delete them all in this crude trimming pass. The current magic number here is that we ignore any intervals with fewer than 8 bytes.
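
Here’s the sketch mentioned above, putting the pruning pass, the index trick and the size threshold together. test_fails is a stand-in of my own: it reruns the test on a candidate buffer and returns the new (buffer, intervals) pair if the test still fails, or None otherwise (the real loop does this bookkeeping itself).

    MIN_INTERVAL_SIZE = 8  # the magic number from note 2

    def prune(buffer, intervals, test_fails):
        def sort_key(interval):
            start, end = interval
            data = buffer[start:end]
            return (len(data), data)  # shorter, then lexicographically earlier, is simpler

        while True:
            changed = False
            i = 0
            while i < len(intervals):
                # The index trick: keep a running index rather than restarting
                # from the most complicated interval after every success.
                start, end = sorted(intervals, key=sort_key, reverse=True)[i]
                if end - start >= MIN_INTERVAL_SIZE:
                    result = test_fails(buffer[:start] + buffer[end:])
                    if result is not None:
                        buffer, intervals = result
                        changed = True
                        continue  # the list changed under us; keep the same index
                i += 1
            if not changed:
                return buffer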

In my test cases (which are not very representative, it must be said) this typically cuts the buffer size by a factor of ten. It’s also a lot faster than structure-ignorant shrinking of this sort was – if the structure gave us nothing but this, it would already be a win.

Starting from that buffer, for each interval (from least complicated to most complicated) we then perform a simplification I’m calling “improve and clone”.

We first try to improve the interval as much as possible: This involves repeatedly applying bytewise simplification operations. Basically:

  1. For each index, try replacing the byte with a smaller one and see if that helps (a linear probe is actually OK here, though I use something slightly smarter to cut down on constant factors).
  2. Try a replacement chosen uniformly at random that sorts lexicographically before the current value.
  3. Interpreting the interval as an unsigned n-byte integer, try subtracting one.

You should normally avoid operations that look like the third one (actually the second one isn’t great either), but note that we only try it once and then we cycle back to bytewise lowering, which will then tend to feed on its results to produce a lot more of a shrink.
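
As a rough sketch of one pass over those three operations (the real loop cycles through them repeatedly; try_replace is a hypothetical helper of mine that applies a candidate replacement for the interval, reruns the test, and reports whether it still fails):

    import random

    def improve(data, try_replace):
        """data is the byte string currently occupying the interval."""
        # 1. Bytewise lowering: for each index, look for a smaller byte that
        #    still fails. (A plain linear probe, smallest first, for simplicity.)
        for i in range(len(data)):
            for smaller in range(data[i]):
                candidate = data[:i] + bytes([smaller]) + data[i + 1:]
                if try_replace(candidate):
                    data = candidate
                    break

        # 2. A random replacement that sorts lexicographically before the
        #    current value (crudely: draw at random, keep it only if it's before).
        candidate = bytes(random.randrange(256) for _ in data)
        if candidate < data and try_replace(candidate):
            data = candidate

        # 3. Interpret the bytes as one big unsigned integer and subtract one.
        n = int.from_bytes(data, "big")
        if n > 0:
            candidate = (n - 1).to_bytes(len(data), "big")
            if try_replace(candidate):
                data = candidate
        return data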

Once that’s done we try cloning the interval: We take the segment currently in that interval and try replacing other intervals with it. We only choose intervals which are at least the same length (so we don’t make the buffer longer), are “not too much larger”, and, if the two intervals are the same size, sort lexicographically larger than the current one. The current upper bound on “too much larger” is twice the size of the source interval, or 16 bytes if that would be less than 16. We do this regardless of whether any changes were made in the earlier improve phase.
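
The eligibility rules for cloning might look something like this (again a sketch with made-up names; each yielded target would be overwritten with the source bytes and the test rerun):

    def clone_targets(buffer, intervals, source):
        """Yield the intervals we would try overwriting with the bytes of
        `source`: at least as long (so the buffer never grows), not too much
        larger (at most twice as long, or 16 bytes if that's bigger), and, when
        the lengths match, lexicographically after the source."""
        src_start, src_end = source
        src_bytes = buffer[src_start:src_end]
        src_len = len(src_bytes)
        upper = max(2 * src_len, 16)
        for start, end in intervals:
            if (start, end) == source:
                continue
            length = end - start
            if length < src_len or length > upper:
                continue
            if length == src_len and buffer[start:end] <= src_bytes:
                continue
            yield (start, end)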

The way the cloning phase is mixed in is important: As well as being a generally good strategy, by doing the example cloning on the element we just shrank we can avoid duplicating a lot of our hard work when it comes to shrinking other things.

We use the same index trick to do this for every interval until there are no more improvements or we run out of time / hit other configuration limits.

This still needs more work, but for the examples I’ve tried it on (from Hypothesis’s example test suite) it appears to be nearly as good as Hypothesis’s simplification now (i.e. probably better than most other Quickchecks). This is only for some of the structurally simpler examples (they’re large but not very complicated), and I suspect as I flesh it out further I’ll find limitations to this approach, but they’re still examples that the initial attempts struggled massively on, so it’s pretty encouraging.


The two voices of progress

There are two voices in my head.

Both are just my inner monologue of course. But sometimes it speaks with one voice, sometimes another.

One voice says: Here’s how you can make this happen.

The other says: Here’s what will happen if you do that.

Historically I often described the former as the software developer’s role and the latter as the ops role. I think the reality is that everyone needs both, and not just because we’re all devops now.

There are many ways to fail to make progress, but ultimately all of them come down to hitting some sort of obstacle and failing to be able to get past it.

Sometimes the obstacle is a creative block: You could get past it if you just had the right solution. Other times it’s a foresight block: You needed to plan for this six months ago, and if you didn’t there’s actually not much you can do here. Different voices tell you how to overcome different obstacles.

I think most people are predisposed to listen to and value one voice over the other. Certainly historically I’ve been much more of a “here’s how you can make this happen” type of person, and have gradually learned that the other voice has useful contributions – the kind that would have prevented the debacle we’ve found ourselves in, if only I’d listened.

But you don’t have to be good at listening to both voices when they’re inside your head, because the voices come from outside your head too: Your team mates.

The problem comes not when you’re not good at thinking in both ways, the problem comes when you discount what people who are good at thinking in the way that you’re not are saying as unimportant.

This is probably familiar to anyone who has tried to be the voice of reason in the room: A bunch of cowboys dismiss your concerns as probably not very important in practice – how likely is that to happen anyway? Let’s be pragmatic here.

My impression is that right now the tech industry is heavily biased towards the “let’s make it work” voice, as exemplified by the “move fast and break things” attitude, but it goes the other way too. When you see everything constantly being broken around you, it’s tempting to think that if only everyone would just listen to you, everything would be great.

I think this is a mistake too. The danger of being too far on the predictive side is that it prevents you from experimenting. It also tends to produce a sort of analysis paralysis: Many ideas that have proven to be very successful would never have been started if their creators had realised how much work they were going to be.

I think in many ways the optimal way to make progress is to have the two voices handing off to each other. Experiment, then stabilize, then repeat. Skip the experiment step and you don’t get anywhere; skip the stabilize step and you get somewhere you didn’t want to be.

It seems very hard to get these two approaches in balance. I’d like to fix that. I’m not sure how to make it happen, but I’m pretty sure that what will happen if we do is pretty great.


Apparently I’m still doing this fanfic thing

After a discussion on Twitter about the mechanics of wishing (a niche interest of mine, and apparently of others) I somehow ended up being talked into / talking myself into writing some fanfic of Disney’s Aladdin.

Yeah, I know.

I keep thinking “It’s completely impossible to write this because once (event happens) then (character) has too much power and you just can’t construct a plot around that”, and then I come up with a great solution to that problem, come up with a scene around it, and write that scene.

The result is that it’s not really a whole fiction so much as a set of isolated scenes that probably make sense if you’ve seen the movie and definitely don’t if you haven’t. There will almost certainly be a certain amount of backfilling and it may eventually turn into a complete story in its own right.

Current features:

  1. The Genie features a very large number of rules, which are dynamically altered as people try to work around them. Wishing is powerful, but you simply can’t bootstrap into godhood with it. Mostly because every time I think of a way of bootstrapping into godhood, I write a scene in which someone tries to do that and then the rules are added to in a way that prevents it.
  2. Everyone who gets their hands on the lamp (and it’s not just Aladdin and Jafar) is severely clued in and makes intelligent use of their wishes.
  3. Jasmine is a character with a great deal of agency and does not do the whole “I’m not just a prize to be won!” thing followed by having all her agency taken away and acting as a prize to be won in a fight between two men.
  4. Jasmine is scary. Don’t mess with her. You’ll regret it.

If that sounds appealing, here’s the work in progress.
