When interviewing people, there is one significant ethical component to the decision making process that is rarely made explicit, and that is often done badly as a result. I’ve written about it previously (here, here, and here), but you don’t need to read those – this is a standalone piece that sums up my current thinking.
The problem is this: The natural shape of the problem of interviewing encourages you to hire people who are similar to yourself. As a result, the aggregate effect of seemingly innocuous and effective decision making processes is often to amplify structural inequalities in your industry.
It will take a while to unpack how this plays out, so bear with me.
I’ll start with a simplification and only consider the case where you’re hiring one person for one role. Many real interview processes don’t play out this way – you might hire multiple people, there might be some flexibility about roles, etc. The dynamic is clearer in the one person to one role case, so I’ll stick to that, but most of the same considerations apply in general even if the details change.
When you are running such an interview process, your goal is to sift through a lot of candidates and hire one who would be good for the role. (The question of what “good in that role” means is very complicated and could easily take up an entire post on its own. I intend to ignore it entirely, as others are much better placed to talk about it than I am.) Effectively, you see a series of people, make a yes/no decision about hiring each of them, and stop once you’ve made your hire.
When making this yes or no decision to hire someone, there are two ways it can go wrong: You can say yes when you should have said no (a bad hire, or false positive) or you can say no when you should have said yes (a bad rejection, or false negative).
The most important thing to understand about interviewing is that you are extremely concerned with the false positive rate, and only somewhat interested in the false negative rate. That is, you want to ensure that the people you hire would be good in that role, and you are less concerned with the question of whether the people you reject would have been bad in that role.
Why? Well, because the cost of a bad hire is almost always much higher than the cost of keeping on trying unless your standards are incredibly stringent. You pay an opportunity cost, but as long as there are a lot of candidates, the opportunity cost of rejecting any one good candidate is low. All you are really paying is the cost of interviewing, so as long as the base rate of good candidates is tolerably high and your false negative rate isn’t too high, it’s mostly OK.
Why does it work out this way? It’s because the cost of a false positive is high and visible, while the cost of a false negative is low and invisible.
Because you have no further contact with most rejected candidates, a false negative and a true negative are functionally indistinguishable – if you could tell they were a false negative they wouldn’t be a false negative! As a result, if you have a high false negative rate it won’t look like a high false negative rate, it will look like you’re getting a lot of bad candidates and the cost of interviewing is correspondingly high.
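To make that concrete, here’s a minimal simulation (all the numbers in it are made up for illustration): from the interviewer’s side, a strong candidate pool filtered through a leaky screen produces exactly the same stream of outcomes as a weaker pool filtered through a perfect one.

```python
import random

random.seed(0)

def apparent_good_rate(n_candidates, base_rate, false_negative_rate):
    """Fraction of candidates passing a screen that never accepts a bad
    candidate but wrongly rejects some good ones."""
    passed = 0
    for _ in range(n_candidates):
        is_good = random.random() < base_rate
        wrongly_rejected = random.random() < false_negative_rate
        if is_good and not wrongly_rejected:
            passed += 1
    return passed / n_candidates

# Two situations an interviewer cannot tell apart from the inside:
# a strong pool with a leaky screen, and a weaker pool with a perfect one.
print(apparent_good_rate(100_000, base_rate=0.5, false_negative_rate=0.6))
print(apparent_good_rate(100_000, base_rate=0.2, false_negative_rate=0.0))
# Both print roughly 0.20.
```

All you ever observe is the product of the base rate and (1 − false negative rate); the two factors are not separately visible from inside the process.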
In contrast, if you hire a candidate, you now get a lot more information about them. Over the coming months and years you will find out how they actually perform, and will eventually become reasonably certain as to whether your hire was good or not. This means that false positives will almost always eventually become visible to you, but they will do so at great cost: the bad hire will have spent significant time as dead weight (or active toxicity) in your organisation, and the opportunity cost is large. You were hiring because you needed someone to help you out, and you’ve now spent months or years not getting that benefit.
As a result, every false positive is both extremely costly and eventually discovered. This means that you have both a strong incentive to keep it low, and good feedback that allows you to do so.
You can therefore roughly think of “rational” interview design as minimizing the false positive rate while keeping the cost of hiring at some reasonable level.

Before I go on I want to emphasise that this is very reasonable behaviour. Asking a company to ignore or substantially raise its false positive rate is not going to go down well, and is more likely to result in a kind of theatre of signalling, where they find more complicated or worse ways to get the same benefit.
However, it’s worth thinking a bit more about the structure of this process, and to try to shift it a bit, by adding new constraints and incentives.
Why? Let’s turn to a fact that we have been ignoring so far: candidates are people, with their own needs and motivations.

A candidate similarly pays a cost for interviewing, but their interests in the error rates are reversed. The cost of false negatives is almost entirely paid by the candidates, because it means they are denied something they wanted (and deserved), and there is a high emotional impact to rejection. In interviewing they also have a significantly higher opportunity cost, because there are fewer companies they can reasonably apply to than you have candidates applying (this is less true in other examples of this dynamic).
As a result, false negatives are not nearly as free as they looked. Instead they are what economists call an externality – something that looks free because someone else, in this case the candidate, is paying for it.
How much should this matter to you as an interviewer? Well, some, certainly. If you want to behave ethically as an interviewer and as a company, you do need to at least consider the harm to the candidate. Anecdotally, though, this seems to be something people are vaguely aware of and consider an acceptable cost – after all, most interviewers have been interviewees, so they have some idea of what it’s like to be on the receiving end. So most companies already show at least a certain amount of respect for the candidate’s time (large, self-important companies with long multi-day interview processes notwithstanding). It could certainly be considered more, but it’s not a huge moral crisis.
Unfortunately “respect for the candidate’s time” does not fully capture the cost of false negatives, because not all false negatives are created equal.
We now need to unpack another thing that we’ve been ignoring so far: Interviewing is not a magic black box that spits out an answer of good or bad, it’s a reasoning process based on interacting with the candidate.
In an ideal world you would have a perfect simulation of what it would be like to work with the candidate, and you would hire them if that simulation came out positive. In the actual interviewing process, where you have a small amount of time, you basically just ask them some questions, get them to perform a task, and make your best judgement based on the evidence.

Again, this is fine – there’s literally nothing else you can do, so you should do that – but you shouldn’t do it uncritically, and it is worth thinking in more detail about the specifics of the reasoning process you have.
The core problem is that other interviewers are likely to reason in similar ways to you. This means that individual candidates may experience a much higher false negative rate than average.
Take, for example, someone who finds interviews very stressful and thus underperforms in them relative to their actual job ability. They will experience a significantly higher false negative rate than average, and pay a correspondingly higher cost to interviewing.
When the variation is low, it’s tempting to not worry about this that much – so what if a candidate has to interview at twice as many places to get a job? That’s not your problem, and it doesn’t seem to be that big a deal.
Unfortunately there’s no reason to expect the variation to be low, and if some people find they are rejected vastly more often than average, those people are at a significant disadvantage in life.
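To put rough numbers on the “twice as many places” intuition: if each interview independently rejects a qualified candidate with probability equal to their personal false negative rate, the number of interviews they need before an offer is geometrically distributed, and the average climbs sharply. (All the rates here are invented for illustration.)

```python
# Hypothetical per-candidate false negative rates, purely illustrative.
# The number of interviews before an offer is geometric with mean 1 / (1 - fnr).
for fnr in (0.2, 0.5, 0.8, 0.95):
    expected_interviews = 1 / (1 - fnr)
    print(f"personal false negative rate {fnr:.0%}: "
          f"~{expected_interviews:.1f} interviews per offer")
```

And this assumes each interview is an independent draw. Since, as above, interviewers tend to reason in similar ways, rejections are correlated in practice, which makes the spread between candidates even wider.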
When you participate in a system which significantly disadvantages people like this, you need to think long and hard about whether you are doing them an injustice. I think in this case we clearly are.
In the worst case you could imagine people who will never get hired, even when they would be very good at the job. These people experience a 100% false negative rate, no matter how low your false negative rate is on average.
How might such groups arise? Well, racism for starters. In a society with pervasive and overt racism (e.g. apartheid South Africa, the USA during segregation) you might be entirely excluded from jobs simply because of your race. In a society where people pay lip service to not being racist, the numbers will look less extreme, but as long as there is significant prejudice against a group that group will find it harder to get hired.
Here the corresponding ethical advice seems easy, right? Don’t be racist. Job done. Unfortunately, it’s not nearly so simple as that.
The problem is that you can get broadly similar effects without any active prejudice on your part. Suppose you had a test that gave you 100% accuracy on 80% of the population – that is, for those people the test says you should hire them if and only if you actually should – while the remaining 20% of the population always fail the test. If you ignore the population effect, hiring based on this test looks very good: it has no false positives, a fairly low false negative rate, and it permanently marginalizes a fifth of the population.
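Here’s that arithmetic as a toy calculation, assuming (purely for illustration) a 50% base rate of good candidates and an excluded 20% drawn independently of ability:

```python
import random

random.seed(0)

# Assumptions for illustration only: 50% of candidates would be good
# hires, and an independent 20% of the population always fails the test.
people = [(random.random() < 0.5, random.random() < 0.2)
          for _ in range(100_000)]  # (is_good, in_excluded_group)

def test_says_hire(is_good, in_excluded_group):
    # Perfectly accurate on the 80%, always "no" for the excluded 20%.
    return is_good and not in_excluded_group

false_positives = sum(test_says_hire(g, x) and not g for g, x in people)
good_people = [(g, x) for g, x in people if g]
false_negative_rate = (sum(not test_says_hire(g, x) for g, x in good_people)
                       / len(good_people))

print(false_positives)      # 0: the test never recommends a bad hire
print(false_negative_rate)  # ~0.2: looks tolerable as an average
```

The headline numbers look great, but that 20% average false negative rate is composed of 0% for the lucky majority and exactly 100% for the excluded fifth.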
How plausible is it that such a test exists?
Well, that’s an extreme case, but when hiring software developers I think a reasonable case can be made that looking for open source contributions is a decent approximation to it. It’s certainly not the case that all open source developers are good hires, but looking at someone’s open source code is a pretty good signal of whether they’re good at software development. However, this means that people who don’t contribute to open source get left out. If you use open source as a necessary signal, you’ll exclude a whole bunch of people outright, but even if you use it only as a sufficient signal, it hands a significant handicap to everyone who doesn’t contribute – and open source contributions come very disproportionately from well-off white men, so there’s a strong structural bias there.

This should be the point where I tell you the right thing to do here, but I honestly don’t know. I’d certainly say you shouldn’t require open source contributions, but I don’t actually think it’s reasonable to say you should ignore them – primarily because even if I told you to, you’d ignore the advice, but also because that in itself would exclude a bunch of people who e.g. have no formal education and want to be able to use their open source contributions as evidence of competence.

Even if you manage to rid your interview process of tests like this that are structurally prejudiced against a group, it will be hard to remove bias entirely. The problem is that, fundamentally, selecting to minimize false positives gives an intrinsic advantage to the people you understand how to interview – and those will disproportionately be people who are similar to you.
Suppose you are interviewing someone, and you want to get a sense of whether you’d like working with them. In order to do this you need to have a conversation. If they are someone you share a lot of culture with, this is easy – you have things to talk about, and you share a language (partly in the sense that your first languages may be the same, but also in the sense of shared cultural references, jokes, etc.). It’s easy to talk to them.
In contrast, someone from a very different culture will take more work – you need to establish common ground and figure out a mode of conversation that works well for you. If you were working with this person you would get a long period of time to do that, but in an interview you have an extremely short time, and as a result the conversation will often flow less naturally than it might. As a result you are less able to tell whether you will actually get along well with this person when working with them, and this again gives the people who are similar to you an advantage.
This pattern repeats itself over and over again: If you are in familiar territory, you know how to predict accurately in that territory, so when you come to try to reduce false positives you will automatically select for the familiar – the unfamiliar is uncertain, and so your false positive rate goes up.
How do you get familiar with what working with someone from a particular group is like? Well, you work with them. Which you’re less likely to do if there’s some significant structural prejudice against them in the interview process. In this way, structural prejudices tend to reinforce themselves over time – we hire people who are like us, and this increasingly refines our models of which people are good to work with into ones that are based on that familiar set of people.
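For what it’s worth, here’s a deliberately crude toy model of that feedback loop – every number and distribution in it is invented, so treat it as a sketch of the shape of the problem, not a claim about real hiring. The interviewer’s read on a candidate gets less noisy with familiarity, they only hire when confident enough to keep false positives low, and each hire feeds back into familiarity.

```python
import math
import random

random.seed(1)

# Two groups with identical ability distributions; the only difference is
# how familiar the interviewer is with each (group A starts with a
# hypothetical head start). More familiarity means a less noisy read on a
# candidate, and a false-positive-averse interviewer demands stronger
# evidence when their read is noisy.
familiarity = {"A": 50, "B": 5}
hires = {"A": 0, "B": 0}

for _ in range(2000):
    group = random.choice("AB")
    ability = random.gauss(0, 1)               # same for both groups
    sigma = 1 / math.sqrt(familiarity[group])  # noise shrinks with familiarity
    estimate = ability + random.gauss(0, sigma)
    # Hire only when ~95% confident (under a standard normal prior on
    # ability) that the candidate is above average; a noisier estimate
    # therefore has to clear a higher bar.
    if estimate > 1.645 * sigma * math.sqrt(1 + sigma**2):
        hires[group] += 1
        familiarity[group] += 1                # each hire builds familiarity

print(hires)  # group A accumulates more hires, despite equal ability
```

Neither group is believed to be worse, and both have identical ability, yet the less familiar group clears the confidence bar less often – and because hires are what build familiarity, the initial gap closes only slowly.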
None of this is to deny that there are myriad other structural prejudices in interviewing; this one is just important to highlight because of how insidious it is. It doesn’t operate through any negative belief about the group being disadvantaged – it acts solely through uncertainty and a reasonable set of behaviours following incentives – and so even people who are genuinely committed to doing better can fall victim to it without noticing.
Naturally at this point I should tell you about the solution. Unfortunately, that’s mostly a hard pass from me, sorry. The general shape of the solution is “get better at interviewing”, and I’m not actually good enough at conducting an interview process to really offer advice on how to do so.
So, to the degree I have advice, it’s this: Most interview processes I’ve been a part of (on either side of the table) have been quite slapdash, and that’s only going to exacerbate this problem. Given the impact of the natural pressures of interviewing, this has to change. Interviewing is a hard problem, with huge social impacts, and it deserves to be treated as such.
As a result, I would like people to think much harder about designing their interview processes, and do some reading and learn how to do it properly.
Most of this has to be fixed at the company level, but if you have any good resources on e.g. reading, courses, etc. that people can take, please feel free to leave comments on the blog or let me know through some other medium.
“Interviewing” is a key word here: it is by design a source of many biases, and at the same time it is definitely NOT the end goal, merely a means to an end: establishing whether a candidate is likely to be a fit.

Now I think the good news is that there are other ways, in particular standardized tests. These tests can in turn be biased, but if they are going to be used many times, it is worth examining them more thoroughly – and that’s easier to do than examining our own biases and thought processes.

Tests don’t have to cover only current technology: logic, language, perception of space or sounds, manual skills, certain personality traits, etc. can be evaluated as well. Some recruiting specialists have databases of such tests, which may be imperfect, but I suspect they can also be improved (though I don’t know how!). A challenge would be to effectively pool such tests while minimizing bias (and tests should probably be designed so that learning specific answers doesn’t help).
Interviews should still happen, because tests are probably not complete, but having more than a single signal should help.
“Interviewing” is a pretty catch all term here – I definitely include e.g. giving people take home tests. I think however you’re underestimating how much standardised testing has the same problem.