There’s no single error rate

This post is going to explain a couple of things that I often see people getting confused about. If you’ve got any background in statistics, decision theory, or similar, it may be quite obvious to you – this post is for everyone else.

When people make decisions they often make the wrong ones. Hopefully you are not shocked by this information.

In an ideal world, this would not be the case, and we would always make perfect decisions. Unfortunately that’s fantasy land, and the reason is that in reality we need to make decisions based on imperfect information and a finite budget. It’s no good trying to decide on the perfect dinner if it takes you a year to do so. That way lies Chidi.

So what people often do is try to minimize the rate of errors in the time and resource budget available to them. This seems like a reasonable thing to do. Unfortunately it’s wrong.

The problem is that there is not a single rate of errors, there are in fact multiple different types of error that matter: One for each possible outcome of the decision you made.

I’m going to focus on a particular type of decision: You are given a thing of some sort (a car, a person, an idea), and you have to either accept it (decide to use it in some sense – sell it, hire it, implement it) or reject it (throw it away and no longer care about it). There are many other types of decisions, and much of what I’m going to say will apply to other types, but this is a particularly useful simple model of decision making to consider.

Let’s start with a concrete example: Suppose you are part of the quality assurance team at a car manufacturer. Your job is to certify the finished cars as safe or not (please note that I know nothing about car manufacture and this example is entirely an illustrative just so story).

Each time a car is presented to you, you can make two types of error:

  • You can accept an unsafe car as safe
  • You can reject a safe car as unsafe

Both of these errors are bad, but they are not the same sort of thing. Rejecting a safe car is expensive, but passing an unsafe car is a potentially fatal error (and also very expensive if you care more about that sort of thing).

Although you cannot ensure you never make any errors, there are two strategies you can adopt that will let you reduce one of these error rates to zero:

  • To ensure that you never pass an unsafe car, never accept any cars.
  • To ensure that you never reject a safe car, accept every car.

That is, in order to reduce one error rate to 0% you have to make the other error rate 100%: When you reject every unsafe car, you also reject every safe car. When you accept every safe car, you also accept every unsafe car.
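As a toy illustration (the data and the `decide` function here are entirely made up, not anything from real quality assurance), here is a sketch of how the two rates come apart, and how each degenerate strategy zeroes one rate at the cost of maximising the other:

```python
def error_rates(cars, decide):
    """Compute the two error rates separately for a batch of cars.

    `cars` is a list of (is_safe, features) pairs; `decide` maps
    features to True (accept) or False (reject).
    Returns (rate of unsafe cars accepted, rate of safe cars rejected).
    """
    unsafe_accepted = safe_rejected = unsafe_total = safe_total = 0
    for is_safe, features in cars:
        accepted = decide(features)
        if is_safe:
            safe_total += 1
            safe_rejected += not accepted
        else:
            unsafe_total += 1
            unsafe_accepted += accepted
    return (unsafe_accepted / max(unsafe_total, 1),
            safe_rejected / max(safe_total, 1))

# A hypothetical batch: 90 safe cars, 10 unsafe ones.
cars = [(True, None)] * 90 + [(False, None)] * 10

# The two degenerate strategies each drive one rate to 0% and the
# other to 100%.
assert error_rates(cars, lambda f: False) == (0.0, 1.0)  # reject everything
assert error_rates(cars, lambda f: True) == (1.0, 0.0)   # accept everything
```

The point of keeping the two rates as separate return values, rather than averaging them into one number, is exactly the point of this post: a single combined "error rate" would score both degenerate strategies identically while hiding which kind of mistake you are making.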

For quality checking cars, obviously both of these are bad strategies that you should not try to adopt. However there are absolutely cases of decision making where these are reasonable strategies for reducing the error rate, they just tend not to quite match the accept/reject model. In some cases if the cost of quality checking is high and the cost or rate of bad candidates is low, it might be worth accepting everything. If you’re rejecting everything, you might as well not run the process in the first place.

So what should you do instead? One way is to assign a cost to each type of error and optimise for total cost given your resource constraints. Another way is to decide on an acceptable rate for one type of error (e.g. accept no more than one in ten thousand unsafe cars) and then optimise the other error rate under that constraint (try to accept as many safe cars as you can while maintaining that limit). There are a lot of mathematical considerations here, but I recommend not worrying about them. For most decisions you make, you won’t have fine enough control over the rates to worry too much about the details of the numbers.
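The second approach can be sketched in a few lines. This is a brute-force illustration under made-up assumptions: each car gets a numeric "safety score" from some inspection process (entirely hypothetical here), we accept a car when its score clears a threshold, and we pick the lowest threshold whose unsafe-acceptance rate stays within budget, since lowering the threshold further would accept more safe cars but blow the constraint:

```python
import bisect

def pick_threshold(scored, budget):
    """scored: list of (score, is_safe) pairs; a car is accepted when
    its score is >= the threshold. Return the lowest threshold whose
    unsafe-acceptance rate is within `budget` (a lower threshold
    accepts more safe cars, so lowest-that-fits is best)."""
    unsafe = sorted(s for s, safe in scored if not safe)
    n = len(unsafe)
    for t in sorted({s for s, _ in scored}):
        accepted_unsafe = n - bisect.bisect_left(unsafe, t)
        if n == 0 or accepted_unsafe / n <= budget:
            return t
    return float("inf")  # no threshold meets the budget: reject everything

# Hypothetical scores: three safe cars and two unsafe ones.
scored = [(0.9, True), (0.8, True), (0.7, False), (0.6, True), (0.3, False)]

# With a zero budget we must exclude every unsafe score, which also
# costs us the safe car scoring 0.6.
assert pick_threshold(scored, 0.0) == 0.8

# Relaxing the budget to 50% lets the threshold drop, rescuing that
# safe car at the price of accepting the unsafe car scoring 0.7.
assert pick_threshold(scored, 0.5) == 0.6
```

Notice how the two assertions demonstrate the trade-off directly: loosening the constraint on one error rate immediately improves the other.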

However, there’s a decent rule of thumb: Unless you are radically changing your process, any change that makes one of these error rates go down will make the other one go up. You can increase the stringency of your quality assurance and sell fewer unsafe cars, but this will probably cause you to reject more cars that would have been fine. You can be less nitpicky about some aspects of your quality assurance, which will mean you reject fewer safe cars, but this will cause you to sell more unsafe ones.

One specific problem that tends to come up in this model is that often you don’t know what one of the error rates is: If you pass an unsafe car, you’ll find out about it when it breaks down. If you reject a safe car it might just be scrapped. You’ll never know if it would have been fine on the road because nobody is ever going to be driving it.

This is where it is particularly important to bear in mind that there is no single error rate, because although there are still two error rates, now you only see one, and it’s tempting to assume that the other one is the same when in fact the opposite is likely to be true due to our rule of thumb: If the error rate you see is low, the error rate you don’t see is high.

This matters for two reasons. One I’ve written about before is that if the thing you’re accepting or rejecting is a person, the rate at which you reject good candidates is a measure of how unfair you are being to those candidates, and additionally can be systemically biased against some people. I won’t go into that in detail here; I recommend the previous article if you want to read more about that.

The other reason is that bad rejections, e.g. rejecting perfectly good cars, add to your costs. If you are rejecting 50% of safe cars then each safe car you sell costs twice as much to produce as it needs to. This is expensive! Even rejecting 10% of safe cars is an 11% increase in your manufacturing cost. It may still be worth that cost, but if you don’t know what that cost is, how can you tell?
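The arithmetic behind those figures is worth spelling out. Assuming rejected safe cars are scrapped at full production cost, scrapping a fraction r of your safe cars means building 1/(1 − r) cars for every one you actually sell:

```python
def cost_multiplier(safe_reject_rate):
    """Cars you must build per safe car actually sold, assuming each
    rejected safe car is scrapped at full production cost."""
    return 1 / (1 - safe_reject_rate)

assert cost_multiplier(0.5) == 2.0               # reject half: double the cost
assert abs(cost_multiplier(0.1) - 1.111) < 0.001  # reject 10%: ~11% extra
```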

So now that I’ve explained the problem, I’d like to finish with two small pieces of advice.

  1. Whenever you have a decision making process, consider what the different types of errors you can make are.
  2. If some of those error rates are currently hidden from you, try to fix that and measure them. You don’t necessarily need to change them, but until you know what they are, you can’t know whether you need to.

PS. The typical language used for talking about this is false positives and false negatives, but I don’t like that terminology because I find people tend to get them mixed up. Ideally use terms specific to your problems, as I have in this post.
