Category Archives: Hiring

Interviewing: Test, don’t sample

Do you ask for a code sample when interviewing someone?

Don’t. It’s a terrible idea. It creates stress and doesn’t give you any useful answers.

Seeing code they’ve written is obviously good and useful, but the way to do this is not to ask for a sample, it’s to set them a small task (something that shouldn’t take more than an hour or two) and ask them to code a solution to it.

Sure, for some people it will take more time, but for most this will be less stressful and for you it will be infinitely more useful.

Why it is less stressful:

It puts everyone on an equal footing. Some people can’t give you a recent coding sample because everything they’ve written recently is under NDA.
They are not trying to guess what you’re looking for because you’ve said what you are looking for. They don’t need to guess whether you’d prefer something cute and clever or boring but well tested. They don’t need to spend ages sorting through a bunch of code they’ve written trying to figure out what will best fit your subjective requirements.

Why it is better for you:

Less stressed candidates produce more representative answers.
You have more control over what you are testing for, and can refine this over time.
Any question where you can’t compare the answer between candidates is a waste of your time and theirs because it’s so subjective and poorly calibrated that you might as well just toss a coin. You can compare coding tests, you can’t compare coding samples.

Code samples: Bad for the candidate, and worse for you. Just say no.

The false proxies of mirror images

A while back Tim Chevalier wrote a post called “Hiring based on hobbies: effective or exclusive?” that I seem to feel the need to keep signal boosting. It’s about the problems with using hobbies as a factor when hiring people. I recommend reading the whole thing, but allow me to quote a relevant section:

[…] if you decide someone isn’t worth hiring because they don’t have “interesting” hobbies, what you’re really saying is that people who didn’t come from an affluent background aren’t interesting. That people with child care or home responsibilities aren’t interesting. That disabled people aren’t interesting. That people who have depression or anxiety aren’t interesting. That people who might spend their free time on political activism that they don’t feel comfortable mentioning to a stranger with power over them aren’t interesting. That people who, for another reason, just don’t have access to hacker spaces and don’t fit into hobbyist subcultures aren’t interesting.

Essentially the point is that hiring based on hobbies selects for people who are from a similar background to you.

You see, the problem is that the answer to the question of whether this is effective or exclusive is that it’s both. Hobbies are in many ways a really good signal of an active mind which will engage well with the job.

The problem is that they are also a signal that the person has the time and energy to pursue those hobbies.

Amongst people who have the lack of commitments and routinely have the mental energy for it, it may be that there is a very strong correlation between hobbies and competence (I’d actually place bets that it’s significantly less strong than we like to believe it is, but I have no data so I’m not going to speculate too wildly on that front. Lets assume for the sake of the argument that popular wisdom is correct here).

The problem is that hobbies are a form of false proxy. We’re unable to perform the test that would really allow us to determine someone’s competence (that is to say: Hire them and work closely with them for a couple years in a variety of different scenarios), so instead we pick something which we can easily measure and appears to be a good signal for it.

And how did we learn that it was a good signal?

Well, by looking around us. Look at the people we know that are good. If we’re feeling very virtuous we can look at the people we know are bad and actually compare differences rather than just falling into the “good people do X. Therefore X is good” fallacy.

The problem is that you’re looking at a very small population, and when you do this you’re necessarily massively over-fitting for the data. When you create tests based on your observation of your current in-group you end up creating tests that work very well for people who look like the data you’re modelling and at best perform more noisily for people outside it, but more likely actively penalise them.

Why? Because this is where privilege comes in. (Note: Privilege in this context is a technical term. If you feel the need to comment something along the lines of “How dare you call me privileged? I worked hard to get where I am!” etc just… don’t. Please).

The problem is that the advantages you have are mostly ones you don’t see. Because most of society’s advantages don’t come in terms of people giving you a leg up (certainly some do, but we tend to be more aware of those), they come in the form of things you don’t have to worry about. You may not have to worry about the constraints of chronic illness, of dependants, of currently being in a position where money is tight. It’s hard to notice the absence of something you’ve never experienced, and consequently you often don’t notice these group advantages. This is especially true because even if some people experience it at the individual level, as a group we’re a fairly privileged lot and so most of our group behaviours will lack these constraints.

There’s another way these false proxies can creep in. There have been a bunch of discussions about the myth of the 10x engineer recently. I also had some interesting conversations about various biotruthy beliefs about programming on Twitter (mostly with tef and janepipistrelle I think). Underlying both of these is a common question: What do we mean by programming ability?

Well, it’s obvious of course. Programming ability is being good at the things that I’m good at. Those things I’m not good at? Yeah I suppose some of them are kinda important, but not nearly as much.

This is a less egotistical viewpoint than you might think it is. Certainly some people believe this because they’re full of themselves, but it’s entirely possible to accidentally finding yourself believing this with the best intentions in the world.

How?

Well, what are the things you work at getting better at?

Right. The things you think are important. So it’s not just that people think things they are good at are important. It’s also that people work to get good at the things they think are important.

So what do you do when you want to decide how well someone will contribute to a team? You look at how good they are of course.

That is… how much they’re good at the things you’re also good at.

Or how much they look like you.

Oh, you’ll still choose people with complementary skills. I don’t think many of us fall into this trap so overtly as to only hire people with the exact same skill set as us. But step up a level, there are the meta qualities. Things like mathematical ability, passion about technology, reliability, being able to think through problems quickly, ability to communicate well and confidently, etc. Certainly these are some of the things I value. Coincidentally they’re things I also think I do quite well in. Funny that, huh?

These don’t necessarily contribute to a group diversity problem. Some of them do – it’s easier to talk confidently when you’re not socially punished for doing so, it’s easier to be passionate when you’ve got the energy to spare – but even when there’s little between-group variation in them (e.g. mathematical ability) they contribute to a personality monoculture. We don’t all look the same, we also all think rather more similarly than sheer chance would suggest.

Ultimately what we’re selecting for when hiring isn’t actually about the individual we’re hiring. What we really want to know is if the addition of this person to the team will make us better in ways we need to be better. This doesn’t necessarily mean they have “cultural fit” or that their abilities align well with the team – it might mean that they have a slow, methodical approach that will provide the necessary damper on the team’s mad rush to ship regardless of how broken. It might mean that they provide a calm, mediating voice. It might simply mean that they’re good at programming in ways that we wouldn’t expect because they’re different from our own. The point is that we don’t actually just need to hire people who are like us, we should hire people who augment us. People from across the spectrum who will make us better in a myriad different ways.

But we can’t test for that, because we don’t know how. So instead we invent these tests which we think provide good proxies for it.

But many of these tests are false proxies which are really testing how similar they are to us.

And then we act surprised when our whole team looks like us, and we claim that well it’s just what the tests show all the best candidates looked like and we have to accept the reality.

What a lovely meritocracy we work in.

An interview question

Edit for warning: This problem has proven substantially harder than I intended it to be. The spec keeps turning out to be ambiguous and needing patching and there are enough nasty edge cases that basically no one gets everything right (indeed, my reference implementation has just been pointed out to contain a bug). I’m going to leave it as is for the challenge but be warned.

This isn’t a question I’ve ever used to interview people, but it’s not dissimilar from the coding test we’ve got very good results from at Aframe (the problem is not at all similar, but the setup is fairly). I’ve been pondering this as an alternative, and I thought it might interesting to share it with people. I’ll explain what it’s designed to test in a later post.

If you want to answer it, I’m happy grade your answer, but there’s no exciting reward for doing so other than public or private recognition of a job well done. Details on this after the question.

The problem

You are required to write a command line program which reads lines which are terminated by ‘\n’ or EOF from its STDIN until it gets an end of file and then writes a stably sorted version of them to STDOUT. Where a line is terminated by EOF it should have an implicit ‘\n’ inserted after it.

It is perfectly OK to use library sort functions for this. Additionally, ability to handle large numbers of lines is not required – the only performance requirement is to sort 500 lines of less than 1000 bytes each in less than 10 seconds (this should not be in any way onerous). You can write an external sort if you want, and you’ll get extra kudos for it, but it is in no way required for a correct answer. The task is to implement the comparator used for the sorting, not to implement a sorting algorithm.

The comparator to be implemented is as follows:

Each line is to be considered to be an arbitrary sequence of non-‘\n’ bytes, which will be interpreted as a sequence of non-overlapping tokens. A token is EITHER:

a sequence of ascii letter characters a-zA-Z
a numeric string. This is a decimal representation of a number, containing 1 or more digits 0-9, possibly with a leading – sign, possibly with a single decimal point (.) followed by additional digits. The decimal point must come after at least one digit. No other characters are permitted. e.g. -5 and 5.0 are valid numeric tokens but +5, .5 and 5. are not (though they all contain a valid numeric token: 5)

Non-ascii or ascii but non-alphanumeric bytes may be present in the line, but must not be considered to be part of a token.

Each token should be the longest possible token it can be, with ambiguity resolved in favour of making the leftmost token longer. So for example the line “foobar” is the token “foobar”, not the tokens “foo” and “bar”, and “3.14.5” is “3.14”, “5”

Note also that lines may contain characters other than those permitted in tokens, and that tokens are not necessarily separated by whitespace:

“-10.11foo10 kittens%$£$%”

should be tokenized as

-10.11, foo, 10, kittens

Lines should then be compared lexicographically as their list of tokens (as usual, if one is a prefix of the other then the shorter one comes first), with individual tokens being compared as follows:

Two numeric tokens should be compared as their numeric value when interpreted as a decimal representation
Any numeric token is less than any non numeric token
Non-numeric tokens should be compared lexicographically by single character, with characters compared case insensitively. Case insensitivity should be performed as in English with ‘a’ corresponding to ‘A’, ‘b’ to ‘B’, etc.

Submission

If you want me to grade your answer, email it to me at [email protected]. If it’s not obvious how to run it, please include instructions. I will need to be able to run it on a linux VM.

Your email should include:

The source for your solution, either attached or as a link
Whether you want me to publish your name and grade (you don’t get to change your mind after you’ve been graded)
If so, how you want you want to be cited (pseudonym, full name, etc. I’m also happy to include a link
Whether you’re OK with me using your source code as an example in follow up posts (I will assume you are unless you explicitly say you are not)
Roughly how long you spent on the problem (I won’t publish this except in aggregate, it’s mostly just for my information)

I will then grade your submission as soon as I feasibly can (I don’t expect to be overwhelmed by submissions, and most of the grading will be automated, so hopefully this shouldn’t take too long). I will reply with your grade. You can at this point (assuming you didn’t get everything right. If you did, well done!) ask for examples of where you went wrong or submit an amended solution. We can keep iterating this process until one of us gets bored (I’m unlikely to accept more than 3 solutions from the same person). If you submit multiple solutions I’ll include your grades for all of them if you’ve asked for your name to be published.

Grading is as follows:

F: You have suffered from a serious failure to read the spec
C: You got some of it right, but there are significant omissions
B: You have mostly got it right, but you missed some edge cases
A: You have passed every test case I can throw at it
A+: And you implemented an external sort or otherwise did something clever. Go you!

(Note that there are no rewards for cleverness if you haven’t got the basic problem right. Such is life)

Hall of fame

(In order of solutions coming in)

Dave Stark with a grade of B, A. Bonus kudos also for the fact that his solution uncovered a bug in my reference solution
Alexander Azarov with a grade of B. Also his first solution uncovered some ambiguities in my spec
Kat (a different one than I’ve referenced here before). B on her first try followed by an A (she got all the hard edge cases right the first time but was tripped over by an annoying one). Also kudos for pointing out a lot of spec ambiguities.
Eiríkr Åsheim too gets kudos for the first solution which got everything right first time. Also for rolling his own IO code and finite state machine.

Comments policy

Feel free to point out problems in the spec. Note that a lack of detail is not a problem, but ambiguity is.

Do not post solutions in the comments. I will delete your comment. I will also delete or edit any comment I think gives too much away.

Also, comments of the form “What a stupidly easy test” will be deleted unless you have submitted a solution that was graded A.

Update on voice recording in interviews

So I’ve had my first interview. I think I’ve greatly overestimated the practicality of the voice recording plan – attempting to work the questions into the interview is a much less structured process than I’d hoped it would be, which makes it very hard to introduce the recorder.

I’m not completely giving up on it as an idea, but for now I’m just going with hand written notes.

A discussion on free time and hiring and employment practices

I’ve storified a long twitter discussion I had about this yesterday. Left here without further comment.

(I’ve not embedded it because it’s long and storify’s infinite scrolling is annoyingly blog breaking)