Author Archives: david

The DRMacIver survival kit

Up front warning: These are all affiliate links. I doubt I’ll get much out of that, but I have it set up and I was writing the post anyway so I figured I might as well.

I carry a lot of stuff around with me by default. Not in a “Well of course I carry a leatherman with me. Doesn’t everyone?” sense, I’ve just accumulated a bunch of things that solve specific problems. Some of these are personal peculiarities, but some of them have been life-changingly useful, so it seemed like it might be useful to enumerate them.

Very few of these are essential. The essential leaving the house check is wallet-keys-phone (my phone is a Nexus 4. I had a 5 for a while, but it broke and the 4 is just better). The rest of these are more… the chance of my life being dramatically improved by having these available without advance planning is high.

So, yes, sorry. This is my survival kit, not a guide to surviving me. You’re on your own on that one.

Unambiguously useful

  1. Bag. I have a lot of stuff, and while the fashion gods have decided that my gender gets pockets, I don’t get that many pockets. I like satchels. A backpack would be more useful, but vanity. The one I’m currently using as my main bag is a slightly battered and cheaper variant of this one.
  2. Musician’s ear plugs. These are so good. If you have any sort of trouble in loud bars you need to get yourself a pair of these. They cut out general ambient noise while mostly not filtering conversational noise. You should get these ones. I’ve tried 3 or 4 brands at this point and these ones are amazing and the others I’ve tried vary from OK to useless. People spotting that I was wearing these and going “You have literally changed my life” when I explained them is the major thing that prompted me to write this post.
  3. Kindle. I read probably somewhere in the region of ten million words of fiction per year. This used to be a major problem – a lot of my travel weight allowance always went on books, because otherwise I’d run out mid travel! Then I found out I could have a book store in my pocket and, well. I do not leave home without my Kindle if I can possibly help it. Note. You will probably want a case for this. I have the standard hard leather case. I don’t know about the paperwhite, but my previous kindles were very non-durable without a case.
  4. External battery for phone and kindle. I use a RAVPower. The Kindle has a long enough battery life that this is mostly for the phone or for when I forget the Kindle. Phones running out of battery is awful. This battery is huge but lasts me probably 6 or 7 phone charges, so I only need to remember to charge it once a week.
  5. USB cables to go with the above. I’m using these ones, which were recommended to me as giving great charging time, though I find they don’t always stay in my phone very well (other devices such as kindle and battery are fine).
  6. A watch. I have a cheap Casio. It’s pretty great. It’s not the most attractive of objects (though I have the metal one so it’s inoffensive), so it fails the “watch as socially acceptable men’s jewelry” role, but it’s pretty great at telling the time, lasts forever on one battery charge, almost impossible to damage (I’ve put it through the wash) and means that I don’t get distracted by my phone when I check the time.
  7. A water bottle. I have a metal Siggs which is great. I like hydration, but hate disposable plastic. The 0.4 litre size seems pretty good to me – I rarely run out of water for casual use with this size, but it also doesn’t add too much to the weight.
  8. Nail Clippers (generic link. I don’t actually have specifically these ones). My nails splinter a lot, and it’s so annoying to not have these to hand and they take up basically no room.
  9. Ear Phones. Sometimes you just don’t want to listen to your surroundings. I like the Amazon Basics ones.
  10. A key ring to tie things together. I’m a big fan of these ones. They’re super cheap, fit a lot, and are generally very useful.

Ambiguously Useful

These pull their weight enough that I haven’t removed them, but I wouldn’t necessarily rush out and get one yourself.

  1. USB flash drive. I have this one, which is great and can just go on a key chain. It’s ambiguously useful because I don’t think I’ve used it since I put it there. I thought I would use this all the time, but it turns out I just don’t need a USB key.
  2. LED flashlight. I need to see in the dark sometimes, and the phone flashlight basically nukes the battery. Hence a flashlight. The one I have is fairly bulky. I have something that’s a slightly cheaper version of this. I’m considering downgrading to a tiny key ring one.
  3. Hip Flask. I carry this around because I have fairly specific tastes in drinks and am often in places that can’t satisfy them. (Basically I don’t really drink beer or wine much and am a spirits snob). So I thought carrying a hip flask of decent whiskey around would be a good idea. Problem is it turns out I hardly drink these days, and when I do it tends to be in places that will serve me good cocktails. So that’s not very useful after all.
  4. Utili-key. Turns out I do carry a leatherman-ish around with me all the time. I’ve never used it, and that’s not just because I’m bad at opening it.

Things I’m considering trialling

  1. Coin purse. I hate carrying change. I’m wondering if having one of these in my bag would help.
  2. Pill bottle. As previously mentioned, Ibuprofen is an essential part of this complete David. My main concern here is that I don’t feel great about carrying around an unlabelled bottle of pills, and having them in the packet isn’t that inconvenient.
  3. Travel bottles. Mainly for things like moisturiser which are really annoying to need and not have.


Do I recommend you follow suit and go out and buy all these things? Well, maybe. I think carrying a bag is a good idea in general (especially if you’re one of those freeloaders who relies on other people carrying a bag), and once you’ve got that having some of these is really helpful.

I don’t normally have comments open, but I’ve opened them for this post, so feel free to share any essentials you have. I promise not to edit my affiliate code into the link. :-)

This entry was posted in life.

Anyone want a speaker?

I’m in the process of doing a lot of speaking and putting together a lot of talks. This means I’m always up for new places to speak at. So if you’re looking for tech speakers at your meetup group, conference or company, read on.

The following are talk subjects I currently have ready to go (or could have ready to go on short notice):

  1. Various things on property-based testing in general and Hypothesis in particular. I’ve got two talks prepared for this: “Finding more bugs with less work” and “The plural of anecdote is not test suite”
  2. Gory details of how Conjecture works and why this is cool
  3. “Writing libraries is terrible”. A short rant about all the social and technical problems one runs into when writing open source libraries plus some things I think might help.
  4. “Your CI is the wrong shape”. A piece about designing your CI to fit with your developer workflow instead of spending all your time waiting on CI. Somewhat based on my empirically derived testing principles post.

I’ve done a large number of variations on the first one at this point. They’ve all gone very well, but I’m keen to try some of the others.

Also, I have plenty of other things I can speak on (if you’re at this blog you’ve probably noticed I have a few opinions to share) and haven’t turned into a talk yet, so if none of those quite fit feel free to get in touch anyway and I might have something for you.

I do have some (fairly reasonable) requirements:

  1. If you’re a meetup group or conference, you must have a code of conduct (which I will look at before agreeing).
  2. If you’re a paid conference, I require a free ticket if I’m speaking (I’m self-employed and on a budget until I manage to get my income variance way down from where it currently is, so this is particularly important, but I also think it’s just appropriate to not make speakers pay for tickets).
  3. If you’re somewhere that is not easily accessible from Cambridge UK (London is fine) I’ll probably need travel and accommodation expenses (see above. A London train fare is fine, but anything more than that starts to hurt).
  4. Half hour or longer speaking slots. I can do and have done shorter talks, but it’s just not worth it unless it’s at an event I’m going to be at anyway.
  5. If you’re a company then I’m still happy to do a free talk, but I’m going to want to sell you training and/or consulting services, so I’ll happily trade a talk for a meeting with e.g. someone who has access to training budget.

All that sound good? Great! Do get in touch.

This entry was posted in programming, Python.

Conjecture, parametrization and data distribution

Up front warning: This is a very inside baseball post, and I’m the only person who plays this particular variant of the game. This blog post is mostly a mix of notes to self and sharing my working.

I’m in the process of trying to rewrite the Hypothesis backend to use the Conjecture approach.

At this point the thing I was originally worried was intractable – shrinking of data – is basically solved. Conjecture shrinks as well as or better than Hypothesis. There are a few quirks to still pay attention to – the shrinking can always be improved, and I’m still on the fence as to whether some of the work I have with explicit costing and output based shrink control is useful (I think it’s probably not), but basically I could ship what I have today for shrinking and it would be fine.

However I’m discovering another problem: The other major innovative area of Hypothesis is its parametrized approach to data generation. More generally, I’m finding that getting great quality initial data out of Conjecture is hard.

This manifests in two major ways:

  1. It can be difficult to get good data when you also have good shrinking because you want to try nasty distributions. e.g. just generating 8 bytes and converting it to an IEEE 754 binary float representation produces great shrinking, but a fairly sub-par distribution – e.g. the probability of generating NaN is 1 in 2048 (actually very slightly lower).
  2. The big important feature of Hypothesis’s parametrization is correlated output. e.g. you can’t feasibly generate a list of 100 positive integers by chance if you’re generating each element independently. Correlated output is good for finding bugs.
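To make the first point concrete, here is a minimal sketch of the “8 bytes to a float” conversion and where the 1 in 2048 figure comes from. The function names are mine, not anything from Conjecture, and I’m assuming a big-endian double.

```python
import math
import struct
from fractions import Fraction


def float_from_bytes(raw: bytes) -> float:
    """Reinterpret exactly 8 bytes as a big-endian IEEE 754 double."""
    (value,) = struct.unpack(">d", raw)
    return value


def is_nan_pattern(raw: bytes) -> bool:
    """NaN means: all 11 exponent bits set and a non-zero 52-bit mantissa."""
    bits = int.from_bytes(raw, "big")
    exponent = (bits >> 52) & 0x7FF
    mantissa = bits & ((1 << 52) - 1)
    return exponent == 0x7FF and mantissa != 0


# Chance that 8 uniformly random bytes decode to NaN: the exponent must be
# all ones (1 in 2048) and the mantissa must be non-zero (a zero mantissa
# would be an infinity instead), hence "very slightly lower" than 1 in 2048.
NAN_PROBABILITY = Fraction(1, 2**11) * Fraction(2**52 - 1, 2**52)

# One particular NaN bit pattern, decoded and checked both ways.
example = b"\x7f\xf8" + bytes(6)
print(math.isnan(float_from_bytes(example)), is_nan_pattern(example))
```

All-zero bytes decode to 0.0, which is why this representation shrinks so nicely even though uniform bytes almost never land on the interesting special values.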

1 is relatively easily solved by letting data generators participate in the initial distribution: Instead of having the signature draw_bytes(self, n) you have the signature draw_bytes(self, n, distribution=uniform). So you can let the floating point generator specify an alternative distribution that is good at hitting special case floating point numbers without worrying about how it affects shrinking. Then, you run the tests in two modes: The first where you’re building the data as you go and use the provided distributions, the second where you’re drawing from a pre-allocated block of data and ignore the distribution entirely.
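Roughly what I have in mind for the two modes, as a sketch. ByteSource, nasty_float_bytes and the rnd/prefix arguments are names invented for this example; only the draw_bytes(self, n, distribution=uniform) signature comes from the text above.

```python
import random


def uniform(rnd, n):
    """Default distribution: n independent uniformly random bytes."""
    return bytes(rnd.getrandbits(8) for _ in range(n))


NAN_BYTES = b"\x7f\xf8" + bytes(6)  # one big-endian double NaN bit pattern


def nasty_float_bytes(rnd, n):
    """Hypothetical distribution that hits NaN far more often than 1 in 2048."""
    if n == 8 and rnd.random() < 0.25:
        return NAN_BYTES
    return uniform(rnd, n)


class ByteSource:
    """Hypothetical stand-in for the test data object."""

    def __init__(self, rnd=None, prefix=None):
        if (rnd is None) == (prefix is None):
            raise ValueError("pass exactly one of rnd or prefix")
        self.rnd = rnd
        self.prefix = prefix
        self.buffer = bytearray()  # everything drawn so far

    def draw_bytes(self, n, distribution=uniform):
        start = len(self.buffer)
        if self.prefix is None:
            # Generation mode: build the data as we go, using the
            # distribution the caller asked for.
            result = distribution(self.rnd, n)
        else:
            # Replay mode: read from the pre-allocated block and ignore
            # the distribution entirely.
            result = self.prefix[start:start + n]
            if len(result) < n:
                raise IndexError("ran out of pre-allocated data")
        self.buffer.extend(result)
        return bytes(result)
```

The point of the split is that the shrinker only ever works in replay mode, so the distribution can be as nasty as you like without changing what a given block of bytes means.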

This is a bit low-level unfortunately, but I think it’s mostly a very low level problem. I’m still hoping for a better solution. Watch this space.

For the second part… I think I can just steal Hypothesis’s solution to some degree. Instead of the current case where strategies expose a single function draw_value(self, data) they can now expose functions draw_parameter(self, data) and draw_value(self, data, parameter). A normal draw call then just does strategy.draw_value(data, strategy.draw_parameter(data)), but you can use alternate calls to induce correlation.
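A sketch of what that split could look like. Only draw_parameter, draw_value and the “normal draw” composition come from the paragraph above; the Data stand-in and the two example strategies are made up for illustration and are much cruder than anything Hypothesis does.

```python
import random


class Data:
    """Hypothetical stand-in for Conjecture's test data: a source of bytes."""

    def __init__(self, seed):
        self.rnd = random.Random(seed)

    def draw_bytes(self, n):
        return bytes(self.rnd.getrandbits(8) for _ in range(n))


class Strategy:
    def draw_parameter(self, data):
        raise NotImplementedError

    def draw_value(self, data, parameter):
        raise NotImplementedError

    def draw(self, data):
        # A normal draw: fresh parameter, then a value from that parameter.
        return self.draw_value(data, self.draw_parameter(data))


class SignedIntegers(Strategy):
    def draw_parameter(self, data):
        # The parameter is a threshold in 0..255 controlling how often
        # values come out negative.
        return data.draw_bytes(1)[0]

    def draw_value(self, data, parameter):
        sign_byte, magnitude = data.draw_bytes(2)
        return -magnitude if sign_byte < parameter else magnitude


class Lists(Strategy):
    def __init__(self, elements, size):
        self.elements = elements
        self.size = size

    def draw_parameter(self, data):
        # One element parameter for the whole list: this is the alternate
        # call pattern that induces correlation between elements.
        return self.elements.draw_parameter(data)

    def draw_value(self, data, parameter):
        return [self.elements.draw_value(data, parameter) for _ in range(self.size)]
```

With a threshold of 0 every element is non-negative, so a list of 100 non-negative integers is produced whenever the single parameter draw comes out that way, rather than needing 100 independent coin flips to agree.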

There are a couple problems with this:

  1. It significantly complicates the usage pattern: I think the parametrization is one of the bits of Hypothesis people who look at the internals least understand, and one of the selling points of Conjecture was “You just write functions”. On the other hand I’m increasingly not sold on “You just write functions” as a good thing: A lot of the value of Hypothesis is the strategies library, and having a slightly more structured data type there is quite useful. It’s still easy to go from a function from testdata to a value to a strategy, so this isn’t a major loss.
  2. It’s much less language agnostic. In statically typed languages you need some way to encode different strategies having different parameter types, ideally without this being exposed in the strategy (because then strategies don’t form a monad, or even an applicative). You can solve this problem a bit by making parameters an opaque identifier and keeping track of them in some sort of state dictionary on the strategy, but that’s a bit gross.
  3. Much more care with parameter design is needed than in Hypothesis because the parameter affects the shrinking. As long as shrinking of the parameter works sensibly this should be OK, but this can become much more complicated. An example of where this gets complicated later.
  4. I currently have no good ideas how parameters should work for flatmap, and only some bad ones. This isn’t a major problem because you can fall back to a slightly worse distribution but it’s annoying because Conjecture previously had the property that the monadic and applicative interfaces were equivalently good.

Here’s an example of where parametrization can be a bit tricky:

Suppose you have the strategy one_of(s1, …, sn) – that is, you have n strategies and you want to pick a random one and then draw from that.

One natural way to parametrize this is as follows: Pick a random non-empty subset of {1, .., n}. Those are the enabled alternatives. Now pick a parameter for each of these options. Drawing a value is then picking a random one of the enabled alternatives and feeding it its parameter.

There are a couple major problems with this, but the main one is that it shrinks terribly.

First off: The general approach to shrinking directions Hypothesis takes for alternation is that earlier branches are preserved. e.g. if I do integers() | text() we’ll prefer integers. If I do text() | integers() we’ll prefer text. This generally works quite well. Conjecture’s preference for things that consume less data slightly ruins this (e.g. The integer 1 will always be preferred to the string “antidisestablishmentarianism” regardless of the order), but not to an intolerable degree, and it would be nice to preserve this property.

More generally, we don’t want a bad initial parameter draw to screw things up for us. So for example if we have just(None) | something_really_complicated() and we happen to draw a parameter which only allows the second, but it turns out this value doesn’t matter at all, we really want to be able to simplify to None.

So what we need is a parameter that shrinks in a way that makes it more permissive. The way to do this is to:

  1. Draw n bits.
  2. Invert those n bits.
  3. If the result is zero, try again.
  4. Else, return a parameter that allows all set bits.

The reason for this is that the initially drawn n bits will shrink towards zero, so as you shrink, the parameter will have more set bits.
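The four steps above can be sketched like this. draw_bits is a hypothetical callable that returns an integer in the range 0 to 2^n - 1; in the real thing it would be reading from the underlying byte stream.

```python
def draw_enabled_mask(draw_bits, n):
    """Return a non-zero n-bit mask of enabled alternatives.

    The drawn bits are inverted, so a raw draw of zero (which is what the
    underlying data shrinks towards) enables every alternative.
    """
    full = (1 << n) - 1
    while True:
        raw = draw_bits(n)      # step 1: draw n bits
        mask = ~raw & full      # step 2: invert them
        if mask != 0:           # step 3: zero means nothing enabled, retry
            return mask         # step 4: the set bits are the allowed branches


def enabled_indices(mask, n):
    """List the 0-based alternatives a mask allows."""
    return [i for i in range(n) if (mask >> i) & 1]
```

A raw draw of zero gives the all-ones mask, so the simplest possible draw is also the most permissive parameter.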

This then presents two further problems that need solving.

The next problem is that if we pick options through choice(enabled_parameters) then this will change as we enable more things. This may sometimes work, but in general will require difficult to manage simultaneous shrinks to work well. We want to be able to shrink the parameter and the elements independently if at all possible.

So what we do is rejection sampling: We generate a random number from one to n, then if that bit is set we accept it, if not we start again. If the number of set bits is very low this can be horrendously inefficient, but we can short-circuit that problem by using the control over the distribution of bytes suggested above!

The nice thing about doing it this way is that we can mark the intermediate draws as deletable, so they get discarded and if you pay no attention to the instrumentation behind the curtain it looks like our rejection sampling magically always draws the right thing on its first draw. We can then try bytewise shrinking of the parameter, which leads to a more permissive set of options (that could then later allow us to shrink this), and the previously chosen option remains stable.
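A sketch of the rejection sampling, with the rejected draws recorded so they could be marked as deletable. I’m using 0-based indices here rather than one to n, and draw_index is a hypothetical callable standing in for a draw from the data.

```python
def choose_enabled(draw_index, mask, n):
    """Pick an enabled alternative by rejection sampling.

    Returns (index, rejected), where rejected lists the draws that were
    thrown away before a set bit was hit.
    """
    if mask & ((1 << n) - 1) == 0:
        raise ValueError("mask enables no alternatives")
    rejected = []
    while True:
        i = draw_index(n)
        if (mask >> i) & 1:
            return i, rejected
        rejected.append(i)


# First run: the mask only allows alternative 1, so two draws are rejected.
first_run = iter([0, 2, 1])
choice, rejected = choose_enabled(lambda n: next(first_run), 0b010, 3)

# After deleting the rejected draws and shrinking the parameter to the
# all-ones mask, replaying just the accepted draw picks the same branch.
replayed = iter([choice])
stable_choice, stable_rejected = choose_enabled(lambda n: next(replayed), 0b111, 3)
```

This is the property we were after: making the mask more permissive never invalidates an accepted draw, so the parameter and the choice can shrink independently.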

This then leads to the final problem: If we draw all the parameters up front, adding in more bits will cause us to read more data, because we’ll have to draw parameters for them. This is forbidden: Conjecture requires shrinks to read no more data than the example you started from (for good reason – this both helps guarantee the termination of the shrink process and keeps you in areas where shrinking is fast).

The solution here is to generate parameters lazily. When you pick alternative i, you first check if you’ve already generated a parameter for it. If you have you use that, if not you generate a new one there and then. This keeps the number and location of generated parameters relatively stable.
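In code, the lazy version is basically a cache. This is a sketch with invented names; draw_child_parameter stands in for whatever draws a parameter for alternative i.

```python
class LazyOneOfParameter:
    """Hypothetical parameter for one_of: a mask plus lazily drawn child parameters."""

    def __init__(self, mask):
        self.mask = mask
        self.child_parameters = {}  # alternative index -> its parameter

    def parameter_for(self, i, draw_child_parameter):
        if not (self.mask >> i) & 1:
            raise ValueError("alternative %d is not enabled" % i)
        # Only draw a parameter the first time alternative i is picked.
        if i not in self.child_parameters:
            self.child_parameters[i] = draw_child_parameter(i)
        return self.child_parameters[i]
```

Enabling more bits in the mask costs nothing until one of the newly enabled alternatives is actually chosen, which is what keeps a more permissive parameter from reading extra data up front.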

In writing this, a natural generalization occurred to me. It’s a little weird, but it nicely solves this problem in a way that also generalizes to monadic bind:

  1. Parameters are generated from data.new_parameter(). All this is is an integer counter.
  2. There is a function data.parameter_value(parameter, strategy) which does the same lazy calculation keyed off the parameter ID: If we already have a parameter value for this ID and strategy, use that. If we don’t, draw a new one and store that.
  3. Before drawing from it, all strategies are interned. That is, replaced with an equivalent strategy we’ve previously seen in this test run. This means that if you have something like booleans().flatmap(lambda b: lists(just(b))), both lists(just(False)) and lists(just(True)) will be replaced with stable strategies from a pool when drawing. This means that parameters get reused.
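Here’s a sketch of how those three pieces might fit together. new_parameter and parameter_value are the names from the list; intern, parameter_draws and the toy Just and Lists classes are made up for the example, and I’m assuming strategies can be compared for equivalence (here by making them frozen dataclasses).

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Just:
    value: object

    def draw_parameter(self, data):
        return None  # a constant needs no parameter


@dataclass(frozen=True)
class Lists:
    elements: object

    def draw_parameter(self, data):
        data.parameter_draws += 1  # instrumentation for the example only
        return {"element_parameter": data.new_parameter()}


class Data:
    def __init__(self):
        self._ids = itertools.count()
        self._interned = {}  # strategy -> first equivalent strategy seen
        self._values = {}    # (parameter ID, strategy) -> parameter value
        self.parameter_draws = 0

    def new_parameter(self):
        # A parameter is nothing more than the next integer.
        return next(self._ids)

    def intern(self, strategy):
        # Replace the strategy with an equivalent one seen earlier in this run.
        return self._interned.setdefault(strategy, strategy)

    def parameter_value(self, parameter, strategy):
        strategy = self.intern(strategy)
        key = (parameter, strategy)
        if key not in self._values:
            # Lazily draw and remember the value for this ID and strategy.
            self._values[key] = strategy.draw_parameter(self)
        return self._values[key]
```

Two separately constructed Lists(Just(True)) objects share one parameter value under the same ID, which is the behaviour you want when flatmap rebuilds its inner strategy on every draw.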

I think this might be a good idea. It’s actually a better API, because it becomes much harder to use the wrong parameter value, and there’s no worry about leaking values or state on strategy objects, because the life cycle is fairly sharply confined to that of the test. It doesn’t solve the problem with typing this well, but it solves the problem of using it incorrectly well enough that an unsafe cast is probably fine if you’re unable to do so.

Anyway, brain dump over. I’m not sure this made sense to anyone but me, but it helped me think through the problems quite a lot.

This entry was posted in Hypothesis.

Services that won’t buzz off

I’ve long maintained that one of the best things about Beeminder is that it doesn’t go away just because you can’t be bothered. You can’t ignore it, and you can’t just vaguely not feel like it. You can always actively decide not to use it, but it doesn’t get you off the hook for another week, so you can’t give in to temporary weakness.

I’ve recently figured out a usage pattern for this that is working out quite well for me that comes as a direct application of this idea: By roping them to Beeminder, you can give other services the same property.

Let’s be concrete:

I love Todoist. I think it’s genuinely great software. It’s well designed, has a great Android app (it has an iOS app, I assume it’s also great), has a great API, and just generally seems like they’ve put a lot of effort into it. It’s in the category of software for which I don’t really need the premium features but I pay for them anyway to support the free version (Beeminder is also somewhat in this category, though I do make use of precisely one premium feature).

But… there’s a problem. I find TODO lists mildly aversive. I don’t have crippling TODO list dread or anything (I’m not being facetious here. That’s a genuine thing people experience), but they make me a bit stressed out so I will tend to default to ignoring them if I can. It’s totally fine when I actually get around to doing them, but I will put it off nearly indefinitely if I can.

Which is where Beeminder comes in! Through a careful application of If This Then That (the only service mentioned in this post that I don’t pay for, and that’s only because they won’t let me pay them. IFTTT, if you’re reading this, please give me a premium option?), you can rope Todoist to Beeminder’s refusal to be ignored.

So this is what I’m currently doing:

  1. I have a Beeminder goal called todone. It is a do more goal which tracks the number of TODO items I complete. I am required to complete 8 per week. The goal is set to trim the safety buffer so I can’t build up more than 8 days (the fact that these numbers are both 8 is a coincidence. They’re “slightly more than one a day” and “slightly more than a week”) of backlog, although I started this at a short safety buffer so I haven’t quite reached that point yet.
  2. I have an IFTTT rule using the Todoist and Beeminder channels that enters an item into that goal every time I complete any task.

It’s important to note what this is not: This is not a goal about being a productivity machine. 8 TODO items a week is not a large amount, particularly because many of the TODO items are recurring tasks that I would do regularly anyway. I have recurring scheduled tasks for things like “change my pillow cases” (which I always put off for a couple days more than I should) or “shave my head” (which I always intend to do more often than I do. Though given how cold it is right now maybe not). This blog post alone is netting me two TODO items because I have a recurring blogging task (every 6 days) and am now entering draft blog ideas into their own Todoist project (I heartily recommend doing that by the way, in the general interests of writing more). It does also contain more significant tasks – contact a particular client, submit a talk to a particular conference, etc. But I get to choose the mix of task difficulty, so 8 tasks a week is not hard.

The purpose of this goal is twofold:

  1. Keep me using Todoist
  2. Do not increase the stress level of using Todoist. I have previously had a more elaborate Todoist system that was more “productivity machine” focused and that was stressful as all get out and made me hate both Todoist and Beeminder. Do not recommend.

And it seems to be working rather well for this. It turns Todoist into a regular feature of my life, and it makes an excellent piece to add to my Exobrain.

This was originally designed to help me get out of various aversive behaviours, and I think the jury is still out on whether it’s succeeded at this, but it seems to be helping a bit. Even if all it does is keep me using Todoist though I think it’s an unambiguous win and I heartily recommend the combination.

This entry was posted in Better living through subservience to the machine.

Free work is hard work

As has previously been established, I’m a big fan of paying people for labour and think you deserve to get what you pay for when it comes to software (yes, I do run a free operating system. And yes, I get more or less precisely what I pay for).

But… honesty compels me to admit that paying for work is no panacea. If you compare open source software to similar commercial products, the commercial ones are usually really bad too.

They’re often bad in different ways, but they’re still bad. Clearly being paid for work is not sufficient to make good software.

I think I’ve figured out a piece of this puzzle recently: When working for free, the decision making process looks very different from paid work. The result is that you can be good at things that people aren’t good at when writing commercial software, but it’s also that you just do a lot more work that you would otherwise never have bothered with.

Consider the question: “Should I do this piece of work?”

Free answer: “Well… people seem to think it’s important, and it looks a bit shit that I haven’t done it. I guess I can find some free time to work on it.”

Paid answer: “Will I earn more money from the results of doing this work than the work costs me?”

The result is that in free work, a lot of things happen that if you put your hard nosed businessperson hat on and looked at objectively make absolutely no sense.

A concrete example: Hypothesis supports Windows. This isn’t very hard, but it does create an ongoing maintenance burden.

Personally, I care not one whit for Python on Windows. I am certain that near 100% of the usage of Hypothesis on Windows is by people who also don’t care about Windows but feel bad about not supporting it. I only implemented it because “real” libraries are cross platform and I felt bad about not running on Windows.

In a business setting, the sensible solution would be to not do the work until someone asked for it, and then either quote them for the work or charge a high enough license fee that you can soak the cost of platform support.

So what to do about this?

Well, on the open source side I think it makes sense to start being a bit more “on demand” with this stuff. I probably shouldn’t have supported Windows until I knew someone cared, and then I should have invited the person who cared to contribute time or money to it. Note: I am probably not going to follow my own advice here because I am an annoying perfectionist who is hurting the grade curve.

On the business side, I would advise that you get better software if you do sometimes do a bunch of stuff that doesn’t make immediate business sense, as it rounds everything off and provides a better overall user experience even if any individual decision doesn’t make much sense. But honestly almost everyone writing software who is likely to read this is off in startup and adtech la la land where none of your decisions make any financial sense anyway, so that’s not very useful advice either.

So perhaps this isn’t the most actionable observation in the world, but now that I’ve noticed it’s going on I’ll be keeping a watchful eye out to observe its mechanisms and consequences.

This entry was posted in life, programming, Python.