A while back Tim Chevalier wrote a post called “Hiring based on hobbies: effective or exclusive?” that I seem to feel the need to keep signal boosting. It’s about the problems with using hobbies as a factor when hiring people. I recommend reading the whole thing, but allow me to quote a relevant section:
[…] if you decide someone isn’t worth hiring because they don’t have “interesting” hobbies, what you’re really saying is that people who didn’t come from an affluent background aren’t interesting. That people with child care or home responsibilities aren’t interesting. That disabled people aren’t interesting. That people who have depression or anxiety aren’t interesting. That people who might spend their free time on political activism that they don’t feel comfortable mentioning to a stranger with power over them aren’t interesting. That people who, for another reason, just don’t have access to hacker spaces and don’t fit into hobbyist subcultures aren’t interesting.
Essentially the point is that hiring based on hobbies selects for people who are from a similar background to you.
You see, the problem is that the answer to the question of whether this is effective or exclusive is that it’s both. Hobbies are in many ways a really good signal of an active mind which will engage well with the job.
The problem is that they are also a signal that the person has the time and energy to pursue those hobbies.
Amongst people who lack such commitments and routinely have the mental energy for it, there may well be a very strong correlation between hobbies and competence (I’d actually place bets that it’s significantly weaker than we like to believe, but I have no data so I’m not going to speculate too wildly on that front. Let’s assume for the sake of the argument that popular wisdom is correct here).
The problem is that hobbies are a form of false proxy. We’re unable to perform the test that would really allow us to determine someone’s competence (that is to say: hire them and work closely with them for a couple of years in a variety of different scenarios), so instead we pick something which we can easily measure and which appears to be a good signal for it.
And how did we learn that it was a good signal?
Well, by looking around us. Look at the people we know that are good. If we’re feeling very virtuous we can look at the people we know are bad and actually compare differences rather than just falling into the “good people do X. Therefore X is good” fallacy.
The problem is that you’re looking at a very small population, and when you do this you’re necessarily massively over-fitting to the data. When you create tests based on your observation of your current in-group, you end up creating tests that work very well for people who look like the data you’re modelling and at best perform more noisily for people outside it, but more likely actively penalise them.
Why? Because this is where privilege comes in. (Note: Privilege in this context is a technical term. If you feel the need to comment something along the lines of “How dare you call me privileged? I worked hard to get where I am!” etc just… don’t. Please).
The problem is that the advantages you have are mostly ones you don’t see. Because most of society’s advantages don’t come in terms of people giving you a leg up (certainly some do, but we tend to be more aware of those), they come in the form of things you don’t have to worry about. You may not have to worry about the constraints of chronic illness, of dependants, of currently being in a position where money is tight. It’s hard to notice the absence of something you’ve never experienced, and consequently you often don’t notice these group advantages. This is especially true because even if some people experience it at the individual level, as a group we’re a fairly privileged lot and so most of our group behaviours will lack these constraints.
There’s another way these false proxies can creep in. There have been a bunch of discussions about the myth of the 10x engineer recently. I also had some interesting conversations about various biotruthy beliefs about programming on Twitter (mostly with tef and janepipistrelle I think). Underlying both of these is a common question: What do we mean by programming ability?
Well, it’s obvious of course. Programming ability is being good at the things that I’m good at. Those things I’m not good at? Yeah I suppose some of them are kinda important, but not nearly as much.
This is a less egotistical viewpoint than you might think it is. Certainly some people believe this because they’re full of themselves, but it’s entirely possible to accidentally find yourself believing this with the best intentions in the world.
How?
Well, what are the things you work at getting better at?
Right. The things you think are important. So it’s not just that people think things they are good at are important. It’s also that people work to get good at the things they think are important.
So what do you do when you want to decide how well someone will contribute to a team? You look at how good they are of course.
That is… how much they’re good at the things you’re also good at.
Or how much they look like you.
Oh, you’ll still choose people with complementary skills. I don’t think many of us fall into this trap so overtly as to only hire people with the exact same skill set as us. But step up a level, there are the meta qualities. Things like mathematical ability, passion about technology, reliability, being able to think through problems quickly, ability to communicate well and confidently, etc. Certainly these are some of the things I value. Coincidentally they’re things I also think I do quite well in. Funny that, huh?
These don’t necessarily contribute to a group diversity problem. Some of them do – it’s easier to talk confidently when you’re not socially punished for doing so, it’s easier to be passionate when you’ve got the energy to spare – but even when there’s little between-group variation in them (e.g. mathematical ability) they contribute to a personality monoculture. Not only do we all look the same; we also all think rather more similarly than sheer chance would suggest.
Ultimately what we’re selecting for when hiring isn’t actually about the individual we’re hiring. What we really want to know is if the addition of this person to the team will make us better in ways we need to be better. This doesn’t necessarily mean they have “cultural fit” or that their abilities align well with the team – it might mean that they have a slow, methodical approach that will provide the necessary damper on the team’s mad rush to ship regardless of how broken. It might mean that they provide a calm, mediating voice. It might simply mean that they’re good at programming in ways that we wouldn’t expect because they’re different from our own. The point is that we don’t actually just need to hire people who are like us, we should hire people who augment us. People from across the spectrum who will make us better in a myriad different ways.
But we can’t test for that, because we don’t know how. So instead we invent these tests which we think provide good proxies for it.
But many of these tests are false proxies which are really testing how similar they are to us.
And then we act surprised when our whole team looks like us, and we claim that well it’s just what the tests show all the best candidates looked like and we have to accept the reality.
What a lovely meritocracy we work in.
The problem that I have with this is that the same argument can be used to invalidate just about any signal other than having people do the exact job that you intend to hire them for.
Should I hire my Java programmers on the basis of how well people answer questions about Java programming? Probably not, because the skill of answering questions in an interview is pretty different from the skill of actually writing and debugging code.
Should I hire on the basis of university degrees and professional certifications? Probably not, because we have evidence[1] that GPA and certifications don’t correlate well with actual job performance.
Should I hire on the basis of strong recommendations from existing employees? Probably not, that discriminates in favor of those who happen to be friends with existing employees, and certainly exacerbates existing industry bias against women and minorities.
Should I take on potential hires as unpaid “interns” for a time to evaluate their performance before paying them? No, because that discriminates against those too poor to support themselves on savings while working “for free”. (It’s also illegal in many jurisdictions.)
Taken to the logical extreme, we should post job descriptions then hire willing people at random, firing them later if they aren’t doing as well as we hoped. But I hope everyone understands how badly that would work out.
I think that the best course between the Scylla of discriminatory hiring and the Charybdis of poor hiring is to pay attention to signals like hobbies that are likely to be useful but may be discriminatory via secondary effects, while staying aware that they may be biasing us and making a deliberate effort to account for that.
[1] – there are probably better sources, but the first of my hiring-related bookmarks I came across was http://blog.alinelerner.com/lessons-from-a-years-worth-of-hiring-data/
Correct! Interview tests should be as close to the actual job you want them to do as you can feasibly make them, and you should be making the decision based almost entirely on those skills. (I will admit that I regard “Can we work with this person or will we all want to kill each other?” as a relevant job skill I test for, though.)
I don’t regard any of these specific questions as especially puzzling.
My solution to this is take-home coding tests which let people solve an actual problem in a couple of hours of their own time. I do indeed not quiz people about (language) programming in the interview, because it’s tantamount to useless brainteaser questions, which also don’t work.
This is only even relevant for hiring people who have no experience. For that case? I dunno. I think I’d rather screen people based on some email questions to judge rough ability than use background for it. I’m not very good at hiring people new to the industry though, so I don’t really know.
Probably not? Absolutely not! Friends of employees go through exactly the same interview process as everyone else.
Um. No, of course you shouldn’t do this, because you’re presumably not an awful person.
Naturally I do have some thoughts about how to use randomisation to improve this problem, but I agree that hiring purely at random is a strawman that no-one is proposing…
Whereas I think you got it right the first time: we should test whether people are actually good at the job and hire them on the basis of that.