
Things that are not documentation

I’m about as bad at documenting my code as the average software developer, but I’m trying to be better. With Hypothesis I think I’m making good progress on that front, though the result still very visibly reads as the work of someone who isn’t actually good at documentation but is trying to do better.

One of the first steps of doing better is not to fool yourself with substitutes. The following is a short, incomplete, list of things that developers falsely believe to be documentation:

  1. The code itself
  2. The tests for the code
  3. The comments in the code
  4. The type signatures
  5. The person who wrote the code being available for questions
  6. Discussions in your issue tracker

Here is a short, complete, list of things that are actually documentation:

  1. A document, written in a natural language, which describes how to use your software.

If it is not on this second list, it’s not documentation.

Everything on the first list is a useful tool which aids program comprehension. The presence of good documentation does not remove the need for them. However, their presence does not remove the need for good documentation either.

The confusion comes, I think, from people conflating “substitutable to a point” with “substitutable”. Type signatures, or tests, can fill some of the use cases of documentation, and can reduce the need for it for a time, so it’s tempting to think of them as a sort of documentation, but they cannot actually fill the niche of comprehension that documentation enables.

Let me try an analogy: Consider coffee and sleep, subjects dear to my heart. Can you substitute coffee for sleep? Certainly, up to a point – if you’ve had a bad night, coffee will help. Can you substitute sleep for coffee? Certainly. I’ve heard rumours from people who are familiar with the concept that if you have a good night’s sleep then you need less coffee the next day. Can coffee improve alertness even in the presence of enough sleep? Yep.

Is coffee a type of sleep? Uh, no.

The fact that two tools solve overlapping problems is no excuse for confusing them.

Why am I taking such a hard line about this?

It’s because developers hate writing documentation but know that it’s a thing they’re supposed to do.

So if you let people believe that something that is not documentation is documentation, they’ll just do that instead and tick the box that says “Yep! I documented it”, and feel good about themselves for having written code that does not, in fact, have documentation.


Revising some thoughts on test driven development

Epistemic status: Still thinking this through. This is a collection of thoughts, not an advocacy piece.

I’ve previously been pretty against TDD. It is possible that this has always been based on a straw understanding of what TDD is supposed to be for, but if so that is a straw understanding shared by a number of people who have tried to sell me on it.

I am currently moving towards a more nuanced position of “I still don’t think TDD is especially useful in most cases but there are some cases where it’s really amazingly helpful”.

Part of the source of my dislike of TDD has, I think, been underlying philosophical differences. A test suite has two major purposes:

  1. It helps you prevent bugs in your code
  2. It acts as executable documentation for your code

As far as I am concerned, the important one is the first. People who think their test suite is a good substitute for documentation are wrong. Your code is not self-documenting. If you haven’t written actual for reals documentation, your code is not documented no matter how good your test suite is.

And my belief has always been, and remains, that TDD is actively harmful to the first purpose. Good testing is adversarial, and the number one obstacle to good testing (other than “not testing in the first place”) is encoding the same assumptions in your tests as in your code. TDD couples writing the tests and the code so closely that you can’t help but encode the same assumptions in both, even if it forces you to think about those assumptions more clearly.

I am aware of the counter-argument that TDD is good because it ensures your code is better tested than it otherwise would be. I consider this to be true but irrelevant, because mandating 100% coverage has the same property but forces you to maintain a significantly higher standard of testing.

So if TDD is harmful for the purpose of testing that matters, it must be at best useless and at worst harmful, right?

As far as I’m concerned, right. If your goal is a well tested code base, TDD is not a tool that I believe will help you get there. Use coverage instead.
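For concreteness, one way to mandate this in a Python project is to make the test run fail below full coverage. This assumes the pytest-cov plugin, with mypackage standing in for your own package name:

    pytest --cov=mypackage --cov-fail-under=100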

But it turns out that there are benefits to TDD that have absolutely nothing to do with testing. If you think of TDD purely as a tool of thought for design, then it can be quite useful. You then still have to ensure your code is well tested, but as long as you don’t pretend that TDD gets you there, there’s nothing stopping you from using it along the way.

Using tests to drive the design of your API lets you treat the computer as an external brain, and provides you with a tool of thought that forces you to think about how people will use your code and design it accordingly.

The way I arrived at finally realising this is via two related design tools I have recently been finding very useful:

  1. Writing documentation and fixing the bits that embarrassed you when you had to explain them
  2. Making liberal use of aspirational examples. Starting a design from “Wouldn’t it be great if this worked?” and seeing if you can make it work.

TDD turns out to be a great way of combining both of these things in an executable (and thus checkable) format:

  1. The role of tests as executable documentation may not actually be a valid substitute for documentation, but it happily fills the same niche in terms of making you embarrassed when your API is terrible by forcing you to look at how people will use it.
  2. A test is literally an executable aspirational example. You start from “Wouldn’t it be great if this test passed?” and then write code to make the test pass.

When designing new segments of API where I’ve got the details roughly together in my head but am not quite clear on the specifics of how they should all fit together, I’ve found that using tests can be very clarifying, and it results in a workflow that looks close to, but not exactly like, classic TDD.

The workflow in question is as follows:

As per classic TDD, work is centered around features. For example, if I was designing a database API, the following might be features:

  1. I can connect to the database
  2. I can close a connection to the database
  3. I can create tables in the database
  4. I can insert data into the database
  5. I can select data from the database

Most of these are likely to be a single function. Some of them are probably two or three. The point is that as with classic TDD you’re focusing on features not functions. I think this is a bit more coarse grained than advocated by TDD, but I’ve no idea how TDD as she is spoken differs from TDD as described.

Working on an individual feature involves the following:

  1. Start from code. All types and functions you think you’ll need for this feature are defined. Functions should all raise some error. InvalidArgument or similar is a good one, but any fatal condition you can reasonably expect to happen when calling that function is fine. If there is really no possible way a function could raise an exception, return some default value like 0, “” or None. (A minimal code sketch of steps 1 and 2 follows below.)
  2. Write lots of tests, not just one, for all the cases where those functions should be failing. Most of these tests should pass because e.g. they’re asserting that your function raises an invalid argument when you don’t specify a database to connect to. Your function considers all arguments to be invalid, so this test is fine!
  3. For any tests of error conditions that do not currently pass, modify the code to make them pass. This may require you to flesh out some of your types so as to have actual data.
  4. Now write some tests that shouldn’t error. Again, cover a reasonable range of cases. The goal is to sketch out a whole bunch of example uses of your API.
  5. Now develop until those tests pass. Any edge cases you spot along the way should immediately get their own test.
  6. Now take a long hard look at the tests for which bits of the API usage are annoying and clunky. Improve the API until it does not embarrass you. This may and probably will require you to revise earlier stages as well and that’s fine.
  7. If you’re feeling virtuous (I’m often not and leave this to the end) run coverage now and add more tests until you reach 100%. You may find this requires you to change the API and return to step 5.

Apply this to each feature in turn, then apply a final pass of steps 6 and 7 to the thing as a whole.
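To make steps 1 and 2 concrete, here is a minimal sketch for the database example above. All names here are hypothetical and the test assumes pytest; it illustrates the shape of the workflow, not a prescribed implementation:

    import pytest

    class InvalidArgument(Exception):
        """Raised when a function is called with arguments it can't accept."""

    def connect(database=None):
        # Step 1: the stub considers every argument invalid for now.
        raise InvalidArgument("no database specified")

    # Step 2: a test for an error condition. It passes already, and from here
    # on it acts as a constraint that must stay true as the real code grows.
    def test_connect_without_a_database_is_an_error():
        with pytest.raises(InvalidArgument):
            connect()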

This isn’t very different from a classic TDD workflow. I think the units are more coarsely grained, and the emphasis on testing error conditions first means that you tend to start with tests which are passing and act as constraints that they should stay passing, rather than tests that are failing and which act as drivers to make them pass, but it’s, say, no more than a standard deviation out from what I would expect normal TDD practice to look like.

The emphasis on error conditions is a personal idiosyncrasy. Where by personal idiosyncrasy I mean that I am entirely correct, everyone else is entirely wrong, and for the love of god people, think about your error conditions, please. Starting from a point of “Where should this break?” forces you to think about the edge cases in your design up front, and acts as something of a counterbalance to imagining only the virtuous path and missing bugs that happen when people stray slightly off it as a result.

So far this approach has proven quite helpful in the cases where I’ve used it. I’m definitely not using it for all development, and I wouldn’t want to, but it’s been quite helpful where I need to design a new API from scratch and the requirements are vague enough that it helps to have a tool to help me think them through.


The war cannot be won, yet still we must fight our little battles

I’ve only half jokingly referred to 2015 as the year I declare war on the software industry.

You could make a David and Goliath analogy here, but the thing is that instead of my felling the giant with my plucky shepherd’s weapon, what’s actually going to happen is that Goliath is going to put his hand on my forehead, hold me out at arm’s length, and laugh as I struggle ineffectually against his vastly superior strength and size.

But maybe, just maybe, if I struggle hard enough and push with all my might, I can get Goliath to take a single step back.

Hypothesis was my opening volley, an attempt to raise the benchmark for quality – if I make it easy enough to find your bugs, maybe you’ll fix them?

As an opening volley, it’s a pretty weak one. Most of the reason why software is bad isn’t because it’s too hard to write tests, it’s because of social reasons – people are so conditioned to bad software at this point that it’s just not that much of a problem to release broken software into the world because people will use it anyway.

But maybe if we make it easier for the people who care to write quality software on time and on budget, we can start to change the norms. If you can choose between two equally shiny and feature-full pieces of software except that one actually works properly, perhaps you’ll start to care more about software quality, and if your customers start to desert you for the software that actually works, maybe you’ll really start to care about software quality.

Hypothesis alone will never achieve this, but each tool gives Goliath a nudge in the right direction.

Then, tired of the burden of free labour we put on people as an industry, I wrote “It’s OK for your open source software to be a bit shitty”.

Will it change minds? Maybe a few. The responses on the corresponding reddit thread were really a lot higher quality than I would generally expect of reddit. It certainly seemed to help a whole bunch of people who were concerned about the quality of their own open source work, and hopefully it has given people a weapon to defend themselves when dealing with people who feel entitled to their free labour.

Then I wrote the Two Day Manifesto, an attempt to attack the problem from the other end. If the problem is that companies are built on free labour, maybe they could contribute some paid labour back?

Probably not, but again maybe we can nudge Goliath in the right direction.

Because ultimately all of these efforts are mostly there in the hope that I’ll find just the right way to push the giant, or that I will manage to push at the same time as enough other people, and that maybe he will take just a single step back.

And then someone else, or more likely some other group, will make him take another step back.

And over time he will retreat to the horizon, and we will follow, still pushing.

The horizon retreats ever into the distance, but we can look over our shoulders and see how much territory we’ve reclaimed from the giant, and we will redouble our efforts and push further.

And maybe, maybe, one day we will circle the world, and come back to where we started, and he will meet our forces on the other side and discover he no longer has anywhere to go.

But that day is far in the future. We will never see it, nor will the next generation. We will not win this war, and perhaps it will never be won.

But we’re still going to push.


Honey I shrunk the clones: List simplification in Hypothesis

Simplification in Hypothesis is something of a vanity feature for me. I spend significantly more development effort on it than it strictly warrants, because I like nice examples. In reality it doesn’t really matter whether the simplest value of a list of integers with at least three duplicates is [0, 0, 0] or [-37, -37, -37], but it matters to me because the latter makes me look bad. (To me. Nobody else cares, I suspect.)

So I was really pleased that I got simultaneous simplification in as a feature for Hypothesis 1.0. If a list contains a bunch of duplicate values (which, thanks to templating, is easy to check for – all templates are hashable and comparable for equality), then before trying to simplify them individually, Hypothesis tries to simplify them all at once as a batch.

As well as solving the above ugly example, this turns out to be really good for performance when it fires. Even if your test case has nothing to do with duplication, a lot of the time there will be elements in the list whose value fundamentally doesn’t matter. e.g. imagine that all that is needed to trigger a failure is a list of more than 100 elements. The individual elements don’t matter at all. If we happen to have produced an example where we have a list of 100 elements of the same value, we can simplify this 100x as fast as if we had to simplify every individual element.

I’m doing a bunch of work on simplification at the moment, and as a result I have lots of tests for example quality of the form “In this circumstance, Hypothesis should produce an example that looks like this”. Some of them had a tendency to get into really pathological performance problems because they’re in exactly this scenario: They have to do a lot of individual simplifications of values in order to find an optimal solution, and this takes a long time. For example, I have a test that says that the list must contain at least 70 elements of size at least ten. This example was deliberately constructed to make some of the existing most powerful simplification passes in Hypothesis cry, but it ended up wreaking havoc on basically all the passes. The goal is that you should get a list of 70 copies of 10 out at the end, and the simplification run should not take more than 5 seconds.
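As a rough illustration of what such an example-quality test looks like, here is a sketch written against a later version of the Hypothesis API for readability (the 1.0-era interface this post describes differed in its details):

    from hypothesis import find
    import hypothesis.strategies as st

    # Ask Hypothesis for the minimal list with at least 70 elements of size >= 10.
    result = find(
        st.lists(st.integers()),
        lambda xs: len([x for x in xs if x >= 10]) >= 70,
    )
    # The optimal answer is 70 copies of 10, found within the time budget.
    assert result == [10] * 70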

This obviously is going to work well with simultaneous simplification: if you have a set of duplicated elements greater than 10, simultaneous simplification will move them all to 10 at once.

Unfortunately the chances of getting an interesting amount of duplication for integers larger than 10 are pretty low, so this rarely fires and we usually have to fall back to individual simplification, which ends up taking ages (I don’t actually know how long – I’ve got a hard 5 second timeout on the test that it usually hits, but eyeballing it, it looked like it got about halfway in that time).

So the question is this: Can we design another simplification pass that is designed to deliberately put the list into a state where simultaneous simplification can fire?

On the face of it the answer is obviously yes: You can just change a bunch of elements in the list into duplicates. Then if any of those are also falsifying examples we end up in a state where we can apply simultaneous simplification and race to the finish line.

Naturally there’s a wrinkle. The problem is that simplification in Hypothesis should be designed to make progress towards a goal. Loops aren’t a problem, but things which cause unbounded paths in the simplify graph will cause it to spend a vast amount of time not doing anything very useful until it gets bored of this simplification, declares enough is enough, and gives you whatever it’s got at the time (a lesson I should probably learn from my own creation).

Or, to put it more directly: how can we tell whether a list with cloned elements is actually simpler? If we have [1, 2, 3, 4, 5, 6, 7, 8, 9999999999999999] we’d really much rather clone the 1 all over the place than the 9999999999999999.

The solution is to extend SearchStrategy with yet another bloody method (it has a default, fortunately), which allows it to test whether one of two templates should be considered strictly simpler than the other. This is a strict partial order, so x is never strictly simpler than x, and it needn’t be the case that for any x and y one of the two is simpler than the other. In general this is intended as a heuristic, so being fast matters more than being highly accurate.

For the particular case of integers the rule is that every positive number is simpler than every negative number, and otherwise a number is simpler if it’s closer to zero.
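As a standalone sketch (the real method lives on SearchStrategy and its exact signature may differ), the integer rule looks something like this:

    def strictly_simpler(x, y):
        # Every non-negative number is simpler than every negative number.
        if (x < 0) != (y < 0):
            return y < 0
        # Otherwise, the number closer to zero is simpler. Note the strict
        # comparison: x is never strictly simpler than itself.
        return abs(x) < abs(y)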

We can now use this to implement a cloning strategy which always makes progress towards a simpler list (or produces nothing); a code sketch follows these steps:

  1. Pick a random element of the list. Call this the original.
  2. Find the indices of every element in the list for which the original is strictly simpler.
  3. If that set is empty, nothing to do here.
  4. Otherwise pick a random subset of those indices (the current choice is that we pick an integer uniformly at random between 1 and the size of the list and then pick that many elements. This is not a terribly scientifically chosen approach but seems to work well).
  5. Replace the element at each chosen index with a clone of the original.
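Putting those steps together, a sketch of the pass might look like the following. The names are hypothetical, and the real implementation inside Hypothesis’s list strategy differs in its details:

    import random

    def try_clone_pass(templates, strictly_simpler, rng=random):
        # Returns a simpler variant of the list, or None if there's nothing to do.
        if not templates:
            return None
        # 1. Pick a random element of the list as the original.
        original = rng.choice(templates)
        # 2. Find the indices of every element the original is strictly simpler than.
        candidates = [i for i, t in enumerate(templates)
                      if strictly_simpler(original, t)]
        # 3. If that set is empty, there is nothing to do here.
        if not candidates:
            return None
        # 4. Pick a random subset: an integer uniformly between 1 and the size
        #    of the list, then up to that many of the candidate indices.
        n = rng.randint(1, len(templates))
        chosen = rng.sample(candidates, min(n, len(candidates)))
        # 5. Replace the element at each chosen index with a clone of the original.
        result = list(templates)
        for i in chosen:
            result[i] = original
        return result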

Empirically this seems to do a very good job of solving the particular example it was designed to solve and does not harm the remaining cases too much (I also have an example which deliberately throws away duplicates. Believe it or not this sped that example up).

It’s unclear how useful this is in real practical tests, but at the very least it shouldn’t hurt, and it appears it will often help. The list generation code is fairly fundamental in that most other collection generation code (and later on, stateful testing) is built on it, so it’s worth putting in this sort of effort.


How to make other people seem like humans

This is a technique that is by now so ingrained in how I think about things that it’s sometimes hard for me to remember that not only do normal people not constantly do this, it took me most of 30 years to figure out how to do it.

Do you frequently catch yourself treating your political enemies as if they are basically bogeymen who want to eat kittens? Do you frequently find yourself saying “I literally can’t imagine how anyone could think this way”?

You may be suffering from demonization: The tendency to believe that people who disagree with you are inhuman and fundamentally incomprehensible monsters.

You can keep doing this if you want, but personally I don’t recommend it. It’s not a useful model of the world, and given how large a proportion of the world probably disagrees with you, I imagine it’s quite stressful going around thinking they’re all fundamentally stupid and/or evil and barely being restrained by society from chowing down on a nice bowl of kitten pops.

So if you find yourself unable to comprehend how someone could possibly hold a position, consider the following technique:

Take a set of things you care about. These can either be things you value, things you fear, or any mixture of the two. Now exaggerate some of them and downplay others.

I find moral foundations theory to be often useful here (I don’t know enough experimental moral philosophy to comment on its truth, but that’s not actually a required feature for this). For example, in order to understand conservative thinking I dial down care/harm a bit and dial up the other five axes.

Moral qualities are not the only dials you have to twiddle. Trust of a particular group is often a good one too. E.g. anti-vaxxers become much more comprehensible when you consider that there were more than a few instances in the 20th century of “we’re totes vaccinating you, honest” medical experiments, and vaccination programs have not proven to be without ulterior motives in the 21st century either. It’s not hard to imagine distrusting the people who tell you that vaccines are OK, and dialling up the fear of harming your kids (sanctity/degradation helps here too).

Generally speaking I rarely find a position so alien that I could not imagine myself holding it if my priorities were very different. Sometimes I have to really completely distort my priorities (I can just about stretch to understanding people who are against late term abortion, but for people who are against early term abortion I basically have to start from “Well, if I believed this entirely wrong thing…”), but even then most positions are usually reachable.

It’s worth noting that the purpose of these mental gymnastics is neither to provide an accurate model of people’s beliefs, nor to come up with a reason that they’re OK. The fact that I have a somewhat better understanding of conservative morality than I used to does not make me significantly more inclined to be conservative, and the fact that I can somewhat understand the position of anti-vaxxers does not make me any less inclined to think that they’re child murdering scum who should be sent to jail (it turns out that even once you’ve thought of someone as entirely human with a coherent set of motivations you can still passionately hate them).

The purpose is to give you a mental picture you can work with, and start treating people as individuals to be engaged with, and if necessary dealt with, without treating them as caricatures. It’s a useful working principle for getting things done, not for acquiring a perfect understanding of someone’s motivations. Once you have engaged with them, you will probably find you acquire a more nuanced view of their actual motivations.

It’s also helpful for making me feel better about the world. I don’t know about you, but I find it nice to know that it’s not actually full of moustache twirling villains who are basically out to cause harm, but instead full of people with coherent sets of motivations that are different from my own.

Obviously you should feel under no compulsion to actually do this. This is intended as a useful technique, not a moral obligation. If you don’t feel comfortable or able to do this, I’m pretty sure I can understand your position.
