Author Archives: david

Seeking recommendations for better sleep data

OK. Let’s start with the problem.

The ultimate problem is that I often/usually wake up feeling like shit. I am in the process of attempting to determine whether this is caffeine related by removing caffeine from my life, but I would describe myself as something like 80% confident that it is not. I don’t believe I have sleep apnea, but checking that is also something I will be trying to do at some point.

I have been using a Jawbone UP for sleep tracking. I am considering replacing it with a Ouija board in order to get more useful and relevant information. The UP appears to be categorically unable to detect whether I am actually asleep, let alone what sort of sleep I am currently experiencing (though I have been getting some benefit out of the sleep tracking functionality).

However, it has highlighted one interesting thing: I noticed this morning that my heart rate spiked to > 100bpm twice last night. My “normal” heart rate is somewhere in the region of 55bpm, so that’s quite a spike.

And I think I’ve seen that before, just based on the shape of the graph, though it hadn’t consciously registered.

So why don’t I look at the data?

Well, I’d love to! Except Jawbone throw away all the detailed heart rate data that’s more than 24 hours old, and don’t let you access the detailed data even from the last 24 hours. I’d be fine with scripting a data export and just dumping it somewhere, but it’s completely impossible in a classic internet of things “You thought that just because you own the device you can actually make use of its functionality? Sucker” backstab.

So that’s the proximate problem: I would like to answer the question “Does my heart rate routinely spike during the night and is that indicative of something?”

The obvious way of answering this is using some sort of continuous recording of heart rate data which has the controversial and exciting feature of my actually being able to access the data.

I only really need this while I am asleep, but it would be nice to have it while I’m out and about too.

So, here is approximately what I am looking for:

  1. A wearable heart rate monitor with accurate data. I am perfectly happy for this to be a chest strap based one rather than a watch.
  2. Which I can get a complete data export from. Ideally I would be able to do this for historic data without needing a nearby bluetooth capable device to send the data to live.
  3. That isn’t ruinously expensive (definitely not > £100. Ideally more in the £50 region).

I currently have three contenders:

  1. A Fitbit. They provide full historical heart rate data and an API you can get it from. The main reason I’m not going straight for this one is that I’ve heard fairly bad things about the accuracy of Fitbit’s heart rate monitoring.
  2. A Zephyr. People seem to mostly be recommending the ruinously expensive Bioharness, but they also have the merely slightly pricey HxM. I do not believe I can get historic data out of them, only live data. These are mostly recommended because they are accurate and have a good bluetooth API.
  3. Get a cheap chest strap monitor that speaks the standard heart rate service bluetooth specification (there are some really cheap ones), and try out the various android apps that speak it, then later if that works well, write a small Python script to just dump it to a database and run it on a raspberry pi or something next to my bed.

I’m currently most tempted to try the third option first despite it being the “worst” in many regards (requiring the most manual effort on my part). Heart rate service is pretty standard, as far as I can tell, so I can experiment with a heart rate monitor that costs ~£15 and upgrade if the data looks promising (e.g. the Polar heart rate monitors are supposedly quite good and speak heart rate service).
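For what it’s worth, the data-handling half of option three is genuinely small. Here’s a minimal sketch: the byte layout follows the standard Heart Rate Measurement characteristic (0x2A37), but the function names and database schema are my own invention, and the actual bluetooth transport (e.g. subscribing to notifications with a library like bleak) is omitted.

```python
import sqlite3
import time

def parse_heart_rate(data):
    # Heart Rate Measurement characteristic (0x2A37): byte 0 is a flags
    # field; bit 0 says whether the value is a uint8 or a little-endian
    # uint16. Energy expended / RR intervals may follow; ignored here.
    flags = data[0]
    if flags & 0x01:
        return data[1] | (data[2] << 8)
    return data[1]

def store(db, bpm):
    # Append one reading, with a timestamp, to a local sqlite database.
    db.execute("CREATE TABLE IF NOT EXISTS heart_rate (ts REAL, bpm INTEGER)")
    db.execute("INSERT INTO heart_rate VALUES (?, ?)", (time.time(), bpm))
    db.commit()
```

Hooking this up to a chest strap would just mean calling `store(db, parse_heart_rate(data))` on every notification the bluetooth library delivers, which is the sort of thing that runs happily on a raspberry pi.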

I am however not very satisfied by any of these options, and am open to general advice, recommendations, etc on any of the above, on any point from the proximate to the ultimate problem. So please share. Comments are open on this post, and you’re also welcome to tweet or email me.

This entry was posted in Uncategorized on by .

First Past The Post is not the problem, districts are

It will come as no surprise to anyone that I am thoroughly against the systems used for electing representatives in both the UK and the US.

What may come as a surprise is that I’m pretty much indifferent to the fact that they both use first past the post, or simple plurality voting.

Don’t get me wrong. First past the post (FPTP from now on) is a rubbish voting system. It has essentially no redeeming values that are not also shared by approval voting, which is in all ways a superior system to it (I’m not massively in favour of approval voting, but it’s unambiguously better than FPTP). The point is not that FPTP is good, it’s that in this context it is irrelevant.

The reason it is irrelevant is that almost any voting system in its place would produce results just as bad or worse, because the biggest failure is in how we are applying voting, not in the voting itself.

To see this, consider the following scenario:

  1. We’ve got some geographical districts and each one gets exactly one representative.
  2. The representatives are divided into two parties. Call them the Purple and Green parties.
  3. 50.1% of the populace think that Purple is amazing and Green is literally the worst. The other 49.9% think the opposite.
  4. This division is uniformly spread over the entire country, with little to no local variation.

What happens? Well, you get an entirely purple house of representatives! Because in each district you have a strict majority of people who prefer Purple. Green gets exactly zero representatives despite the fact that 49.9% of the populace are strongly in favour of Green.
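For concreteness, here is the scenario above as a tiny simulation (the district and voter counts are arbitrary):

```python
# A uniform 50.1% Purple preference in every district hands Purple
# every single seat.
districts = 100
voters_per_district = 1000
purple_share = 0.501

seats = {"Purple": 0, "Green": 0}
for _ in range(districts):
    purple_votes = round(voters_per_district * purple_share)  # 501 in every district
    green_votes = voters_per_district - purple_votes          # 499
    winner = "Purple" if purple_votes > green_votes else "Green"
    seats[winner] += 1

print(seats)  # {'Purple': 100, 'Green': 0}
```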

This is manifestly ridiculous, and the resulting government cannot be said to have any more of a democratic mandate than an equally ridiculous 100% Green one would. I’ll leave it up to you to decide how many more seats Purple should have than Green, but I’ll take it as read that getting all the seats is not an acceptable answer.

And getting this scenario required almost nothing about first past the post. If you replaced it with approval voting, alternative vote, range voting, whatever, the result would still be the same, because it boiled down to a simple majority vote and the majority in each district genuinely preferred Purple.

The choice of voting system influences what sort of distortions you can get. As your voting system gets better, the above tends to become the only sort of distortion you see, but it always remains possible: with districts, whatever single winner you would get by running the vote over the whole populace can always end up with 100% of the seats (note: there are some technical conditions required for this to be strictly true, but in practice it holds for basically every system).

In practice the distortions are usually not this extreme because most of the time the distribution is not uniform. But that doesn’t make the result more representative of the populace, it just makes it a consequence of the vagaries of where people happen to live. You then get Gerrymandering happening, which is essentially the art of deliberately creating these distortions in a way that furthers your political aims.

The only way to avoid this problem altogether and also have districts is to have a system where some districts can genuinely have a majority preference for a candidate but that candidate still doesn’t get elected. This isn’t as unreasonable as it sounds. Two examples of reasonable ways to do this are random ballot and biproportional apportionment. However both of these are niche and you’re unlikely to get them.

Amongst systems with mainstream support, you really need to do away with districting in part or in whole. You need to move to a full proportional representation system across the whole country, or across larger multi-member districts.

But one thing you can’t do is just tinker with the system that you use in each district if you want to make a difference. Changing the voting system you use within districts is just rearranging deck chairs on the Titanic, and continuing to focus on first past the post as a problem is at best going to get you another vote on the optimal deck chair arrangement rather than getting people off the sinking ship.

This entry was posted in voting on by .

Contributors do not save time

(This is based off a small tweet storm from yesterday).

There’s this idea that what open source projects need to become sustainable is contributors – either from people working at companies who use the project, or from individuals.

It is entirely wrong.

First off: Contributors are great. I love contributors, and I am extremely grateful to anyone who has ever contributed to Hypothesis or any of my other open source projects. This post is not intended to discourage anyone from contributing.

But contributors are great because they increase capabilities, not because they decrease the effort required. Each contributor brings fresh eyes and experience to the project – they’ve seen something you haven’t, or know something you don’t.

Generally speaking a contribution is work you weren’t going to do. It might be work you were going to do later. If you’re really unlucky it’s work you’re currently in the process of doing. Often it’s work that you never wanted to do.

So regardless of the nature of the contribution, it creates a sense of obligation to do more work: You have to deal with the contributor in order to support some work you weren’t going to do.

Often these dealings are pleasant. Many contributions are good, and most contributors are good. However it’s very rare that contributions are perfect unless they are also trivial. The vast majority of contributions that I can just say “Thanks!” and click merge on are things that fix typos. Most of the rest just fix a single bug. Everything else needs more than the couple of minutes of work (not zero work, mind you) that it took to determine the contribution was trivially mergeable.

That work can take a variety of forms: You can click merge anyway and fix it yourself, you can click merge anyway and just deal with the consequences forever (I don’t recommend this one), you can talk the contributor through the process of fixing it themselves, or you can reject the contribution as not really something you want to do.

All of these are work. They’re sometimes a lot of work, and sometimes quite emotionally draining work. Telling someone no is hard. Teaching someone enough of the idiosyncrasies of your project to help them contribute is also hard. Code review is hard.

And remember, all of this is at the bare minimum work on something that you weren’t previously going to do just yet, and may be work on something that you were never going to do.

Again, this is not a complaint. I am happy to put in that work, and I am happy to welcome new contributors.

But it is a description of the reality of the situation: Trying to fix the problems of unpaid labour in open source by adding contributors will never work, because it only creates more unpaid labour.

This entry was posted in programming on by .

Against Virtue Environmentalism

I came up with the term “Virtue Environmentalism” recently and I think it’s a good one and will probably be using it more often.

This came up when talking to a friend about a frustrating experience he’d had. Afterwards he vented at me about it for a bit and we had a good conversation on the subject.

The friend in question cares a lot about the environment. He’s mostly vegan and donates a lot of money to a variety of environmental charities and generally spends a fair bit of time stressing out about global warming.

But he also drives. Like, a lot. And I don’t mean a Tesla (I don’t know enough about cars to tell you about fuel efficiency, but it’s a conventional engine). Both short distance and long road trips. There’s no physical reason he has to drive – he’s in tolerably good shape and could definitely cycle a lot of the places he drives to if he wanted, but he really likes driving.

These aren’t inconsistent positions. I don’t think I was the one who convinced him of this, but he’s basically on board with the idea of donating instead of making personal interventions. He’s decided, quite reasonably, that his life is sufficiently better for all this driving that he’s willing to make the trade-off, and he donates more than enough to be massively carbon-negative despite it, even without the veganism.

But someone he met at a party recently really took issue with that, basically calling him a hypocrite. I’m not sure how the subject came up, but it got quite heated.

Over the course of the conversation it emerged that the person in question was not vegetarian and did not donate anything to charity, but was very conscientious about taking public transport everywhere they couldn’t cycle, turning off all the lights, recycling everything, doing home composting, etc.

One of these people is making a big environmental difference. The other one is giving the person who is making a big environmental difference a hard time for not making a big enough difference.

(Note: This account has been somewhat fictionalized to protect the guilty)

I’m going to start describing this behaviour as virtue environmentalism.

The term comes from ethical theory. Approximately, we have consequentialist ethics and virtue ethics (it’s more complicated than that, but that’s the relevant subset here).

Consequentialist ethics says that ethical behaviour comes from acts which produce good outcomes, virtue ethics says that ethical behaviour comes from acts which exhibit virtues.

Similarly, consequentialist environmentalism says that environmental behaviour comes from acts which produce environmentally good consequences, while virtue environmentalism says that it comes from acts which demonstrate environmentally friendly virtues.

So, on this view, donating money to charity is consequentially good but mostly not a virtue – sure, you might as well do it, but it’s not real environmentalism.

My biases are clearly showing here. I largely subscribe to consequentialist ethics, but think virtue ethics has its place. There are good arguments that virtue ethics produces better consequential outcomes in many cases, and also that it produces better adjusted people. I’m not sure I buy these arguments, but it’s a valid life choice.

But virtue environmentalism is mostly bullshit.

Atmospheric carbon and other greenhouse gasses are amongst the most fungible types of harm out there. If I pump 100 tonnes of carbon into the atmosphere (a very high footprint) and extract 110 from it into some sort of long term storage (e.g. donating to prevent deforestation or plant new trees), then I’ve removed ten tonnes of carbon from the atmosphere, and as a result I’ve done more good than someone who has only pumped 5 tonnes of carbon into the atmosphere (a very low footprint) but hasn’t removed any.

Virtue environmentalism largely results in three things:

  1. Spending lots of time and effort on actions that make no practical difference at all but are highly visible.
  2. Feeling good enough about yourself that you don’t perform the actions that would actually help.
  3. Pissing off other people and making them care less about environmentalism overall.

The third is particularly important. If we want our descendants not to gently broil in the inevitable consequences of our own environmental waste, we need to get everyone to start taking this seriously, and if you keep telling people that the only valid way to do environmentalism is this sort of hair-shirt-wearing nonsense, then the result will be that people do neither that nor the actually useful things they would probably be quite happy to do.

If you want to do “environmentally friendly” things that don’t help much but make you feel better then sure, go for it. But stop expecting other people to do the same if you actually want to help the planet instead of just feeling good about yourself.

This entry was posted in Charitable giving, life on by .

Fuzzing through multi-objective shrinking

This is an experiment I’ve been running for the last couple of days (on and off and with a bunch of tinkering). It was intended as a prototype for using glassbox in a next-gen version of Hypothesis, but it’s proven interesting in its own right.

The idea is a specific automated way of using a test case reducer as a fuzzer using branch instrumentation (I’m using AFL’s instrumentation via the afl-showmap command): For every branch we ever observe the program taking, we try to construct a minimal example that hits that branch.

This will tend to produce interesting examples because you throw away a lot of extraneous detail that isn’t required to hit that branch. This is particularly true of “tolerant” parsers, which try to recover from a lot of errors.

How it works

The core idea is that we take a normal test case reducer and repeatedly apply it in a way that automatically turns it into a multi-objective reducer.

Say we have a function, label, which takes a binary string and returns a set of labels. Labels can be anything, but in the case of using AFL’s instrumentation they’re essentially branches the program can take, along with a rough count of how many times each branch was taken (“essentially” because the branches are hashed, so some distinct branches may end up equated with each other).

We replace the labelling function with a side-effectful version of it which returns the original results but also updates a table which maps each label to its “best” example we’ve seen so far. We consider a string better than another if it is either shorter or the same length but sorts lexicographically before the other (when viewed as a sequence of unsigned 8 bit integers).

We then repeatedly iterate the following process: Pick a label, take the best example for that label, and reduce that test case with respect to the condition that it has that label (updating the other test cases with every call).

There are various heuristics you could use to pick a label. The ones I’ve tried are:

  • Pick one of the labels which currently has the best example
  • Pick one of the labels which currently has the worst example
  • Pick any label, uniformly at random

Uniformly at random seems to work best: The others have a tendency to get stuck. In the case of ‘best’ there are a lot of small labels, and it ends up spending a lot of time trying to shrink them all without doing very interesting work in the process. In the case of ‘worst’ it tends to spend all its time on labels that are very hard to shrink, without getting very far. Uniformly at random consistently makes progress and finds interesting results.
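To make the process concrete, here is a minimal sketch of the whole loop. The class and method names are my own, and the reducer is a deliberately crude greedy byte-deletion pass standing in for a real test case reducer (delta debugging or similar):

```python
import random

def sort_key(s):
    # "Better" means shortlex-smaller: shorter strings first, with ties
    # broken by lexicographic order on the unsigned bytes.
    return (len(s), s)

class Fuzzer:
    def __init__(self, label_fn):
        self.label_fn = label_fn
        self.best = {}  # label -> best example seen so far

    def record(self, string):
        # Side-effectful labelling: return the labels, and update the
        # table of best examples for every label the string exhibits.
        labels = self.label_fn(string)
        for label in labels:
            if label not in self.best or sort_key(string) < sort_key(self.best[label]):
                self.best[label] = string
        return labels

    def shrink_label(self, label):
        # Crude reducer: repeatedly try deleting one byte, keeping any
        # deletion that preserves the target label. Calling record()
        # on every candidate updates the other labels' entries too.
        string = self.best[label]
        improved = True
        while improved:
            improved = False
            for i in range(len(string)):
                candidate = string[:i] + string[i + 1:]
                if label in self.record(candidate):
                    string = candidate
                    improved = True
                    break
        return string

    def run(self, initial, steps=100):
        self.record(initial)
        for _ in range(steps):
            # The label-picking heuristic: uniformly at random.
            self.shrink_label(random.choice(list(self.best)))
```

With a toy labelling function like `lambda s: set(s)` (the set of byte values present), each label’s best example shrinks down to the single byte that carries it.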


There are a couple of extra useful things you can do to speed up the process.

The first is that every time label is called you can mark the string as known. Then when shrinking instead of shrinking by whether the string has the label, you shrink by whether the string is either the current best for the label or is unknown.

This works because if the string were simpler than the current best and already known, then the current best would already have been updated to that string.

This is the equivalent of caching the predicate for delta-debugging, but you don’t want to cache the label function because its outputs are complex values (they’re sets of labels, so there are \(2^n\) distinct values even after interning) so end up consuming a lot of memory if you cache them.
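A minimal sketch of that predicate, assuming a dict mapping each label to its best example so far and a set of every string the labelling function has been run on (the names here are mine):

```python
def make_shrink_predicate(label, best, known, label_fn):
    # best: dict mapping label -> shortlex-best string seen so far.
    # known: set of every string the labelling function has been run on.
    def predicate(string):
        if string in known:
            # A known string simpler than the current best would already
            # have replaced it, so among known strings only the current
            # best can pass.
            return string == best.get(label)
        known.add(string)
        labels = label_fn(string)
        for l in labels:
            if l not in best or (len(string), string) < (len(best[l]), best[l]):
                best[l] = string
        return label in labels
    return predicate
```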

The second is that you can often tell when a label is going to be useless to shrink and skip it. There are two things you can do here:

  • If when you tried to shrink a label it made no changes, you can mark that label as ‘finished’. If another shrink later improves the label, you remove the finished mark. A finished label cannot be shrunk further and thus can be skipped.
  • By maintaining a counter that is updated every time a label is improved or added to the table, you can tell if an attempt to shrink did anything at all by checking the counter before and after. If it did nothing, you can mark the string as finished. Any labels whose current best string is finished can also be skipped.

This also gives a way of terminating the fuzz when there’s nothing left that’s discoverable: If every label is skippable, you’re done.
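Sketched as code (the structure and names are my own invention, not from any particular implementation):

```python
class SkipBook:
    # Bookkeeping for skipping useless shrinks. The counter is bumped
    # whenever any label's best example improves or a new label appears.
    def __init__(self):
        self.counter = 0
        self.finished_labels = set()
        self.finished_strings = set()

    def note_improvement(self, label):
        self.counter += 1
        # An improvement un-finishes the label.
        self.finished_labels.discard(label)

    def attempt_shrink(self, label, string, shrink):
        before = self.counter
        shrink()
        if self.counter == before:
            # The attempt changed nothing anywhere in the table, so both
            # this label and its current best string are finished.
            self.finished_labels.add(label)
            self.finished_strings.add(string)

    def skippable(self, label, string):
        return label in self.finished_labels or string in self.finished_strings

    def done(self, table):
        # Termination: every label's current best can be skipped.
        return all(self.skippable(l, s) for l, s in table.items())
```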


This seems to work quite well in practice. Starting from a relatively large initial example, it quickly increases the number of labels by about an order of magnitude (some of these are just differences in branch counts, as AFL counts not just whether a branch was hit but also a bucketed version of how many times).

It also works pretty well at finding bugs. I’ve been running it for about 48 hours total (a bit longer by clock time but I turned it off in the middle while I made some changes) and it’s found two bugs in a widely deployed file format parser that’s been stable for a couple of years (I’ve sent both to the author of the parser, and don’t want to say which one it is until I’ve got permission to do so. I don’t think either of them are security issues but hard to say for sure). One of them is confirmed novel, and I haven’t heard back about the other one yet. It found the first one after about 10 hours, but that appears to have been mostly luck – rerunning with a bunch of changes that otherwise improved the process hasn’t refound that bug yet.

Anecdotally, almost all of the examples produced are not valid instances of the format (i.e. the tool under test exits with a non-zero status code). This isn’t very surprising: The expectation is that it will give you just enough of the file to get you to the point you’re looking for and then throw away the rest, which is unlikely to get you a valid file unless the branch you’re looking for is taken after the file validity has already been determined.

Comparison with AFL

In some ways this is obviously quite similar to AFL, given that it uses the same instrumentation, but in other ways it’s quite different. My suspicion is that overall this approach will work better as an approach to providing a corpus to AFL than it will just on its own, but it’s surprisingly competitive even without that.

In particular it seems to hit an order of magnitude increase in the number of seen labels much faster than I would expect AFL to. I think it helps that it’s using AFL’s instrumentation much more extensively than AFL itself actually does – AFL just uses the instrumentation for novelty detection, whereas this approach treats each label as a target in its own right and thus can take much more advantage of it.

The AFL algorithm is roughly just to repeatedly iterate the following:

  1. Pick an example from the corpus and mutate it
  2. If the mutated example exhibits any labels that we’ve not previously seen, add it to the corpus

It’s not really interested in the labels beyond novelty detection, and it doesn’t ever prune the corpus down to smaller examples like this does.
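The two steps above, as a toy sketch (just the novelty-detection core; none of AFL’s real mutation machinery or instrumentation):

```python
import random

def afl_loop(corpus, label_fn, mutate, steps):
    # Novelty detection: the union of all labels ever seen.
    seen = set()
    for example in corpus:
        seen |= label_fn(example)
    for _ in range(steps):
        # 1. Pick an example from the corpus and mutate it.
        candidate = mutate(random.choice(corpus))
        labels = label_fn(candidate)
        # 2. Keep it only if it exhibits a label we've never seen.
        if not labels <= seen:
            corpus.append(candidate)
            seen |= labels
    return corpus
```

Note that nothing here ever replaces or shrinks an existing corpus entry, which is exactly the self-optimizing behaviour AFL lacks.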

This approach also has a “self-optimizing” character that AFL lacks: Because AFL never replaces examples in its corpus, if you start with large examples you’re stuck with large examples forever. Because of this, AFL encourages you to start with very small, fast examples. This approach on the other hand will take whatever large examples you throw at it and will generally turn them into small examples.

To be clear: This isn’t a better approach than AFL. Maybe if it were highly optimized, tuned and refined it would become at least as good, but even then they would both have strengths and weaknesses compared to each other. But it’s not obviously a worse approach either, and even now it has some interesting advantages over the approach that AFL takes.

This entry was posted in programming on by .