An amusing problem to solve

I was thinking about coding questions, and about how you would write one that tested people’s ability to explore algorithms in areas they weren’t likely to be that familiar with, without necessarily being able to come up with optimal solutions.

I mulled over this a bit and came up with one which is entirely too evil to use. However I rather feel like I’m going to have to try to implement it anyway because it’s going to bug me. I figured I would inflict this on the rest of you, because misery loves company.

The problem is this:

You are given a non-negative symmetric \(n \times n\) matrix \(A\) with \(A_{ii} = 0\), with real numbers represented as double precision. This is your potential matrix.

Find an assignment of coordinates \(x_1, \ldots, x_n \in [0, 1]\) that has a low value for the energy function \(E = \sum\limits_{1 \leq i < j \leq n} \frac{A_{ij}}{|x_i - x_j|}\). You are not required to minimize it, but if you can that would be lovely.

Some initial thoughts on how one might solve it:

  1. Any optimal solution for a non-zero \(A\) assigns at least one point to 0 and at least one point to 1.
  2. For any given starting position there is a relatively straightforward gradient descent algorithm to find a local minimum.
  3. If \(A_{ij} > 0\) for all \(i \neq j\) then there are at least \(n!\) local minima.
  4. If you’ve placed \(k\) items and you want to place a \((k+1)\)th, then as long as it has a non-zero entry in \(A\) shared with at least one of the items you’ve placed so far, there is a unique place to put the \((k+1)\)th item (though finding this is \(O(k)\)). This probably makes a greedy algorithm quite productive for smallish \(n\).
  5. If the graph with edges \(\{ \{i, j\} : A_{ij} \neq 0\}\) is not connected you can run the problem separately on each component and merge the results, because the components don’t affect each other.
  6. Some form of e.g. simulated annealing with the mutation operator being to swap pairs of elements then run a local optimisation would probably be quite productive.
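To make thought 2 concrete, here’s a minimal sketch of the energy function and a projected gradient descent in Python. The function names, fixed step size, and step count are all my own choices, and it assumes the starting coordinates are pairwise distinct (coinciding points with non-zero \(A_{ij}\) make the energy infinite); a real implementation would want a line search and some collision handling:

```python
def energy(A, xs):
    """E = sum over i < j of A[i][j] / |x_i - x_j| (coordinates assumed distinct)."""
    n = len(xs)
    return sum(
        A[i][j] / abs(xs[i] - xs[j])
        for i in range(n)
        for j in range(i + 1, n)
        if A[i][j] != 0
    )

def descend(A, xs, rate=1e-3, steps=1000):
    """Crude projected gradient descent: step against the gradient, clamp to [0, 1]."""
    n = len(xs)
    xs = list(xs)
    for _ in range(steps):
        grad = [0.0] * n
        for i in range(n):
            for j in range(n):
                if i != j and A[i][j] != 0:
                    d = xs[i] - xs[j]
                    # d/dx_i of A[i][j] / |x_i - x_j| is -A[i][j] * sign(d) / d^2
                    grad[i] -= A[i][j] * (1.0 if d > 0 else -1.0) / d ** 2
        xs = [min(1.0, max(0.0, x - rate * g)) for x, g in zip(xs, grad)]
    return xs
```

With the all-ones potential on three points this pushes the outer pair out to 0 and 1, which is at least consistent with thought 1.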

As an additional comment: Trying to do combinatorial optimisation on permutations of indices is an exercise in frustration. Believe me, I know. This problem may prove more annoying than it’s worth.

This entry was posted in How do I computer? and Numbers are hard.

The false proxies of mirror images

A while back Tim Chevalier wrote a post called “Hiring based on hobbies: effective or exclusive?” that I seem to feel the need to keep signal boosting. It’s about the problems with using hobbies as a factor when hiring people. I recommend reading the whole thing, but allow me to quote a relevant section:

[…] if you decide someone isn’t worth hiring because they don’t have “interesting” hobbies, what you’re really saying is that people who didn’t come from an affluent background aren’t interesting. That people with child care or home responsibilities aren’t interesting. That disabled people aren’t interesting. That people who have depression or anxiety aren’t interesting. That people who might spend their free time on political activism that they don’t feel comfortable mentioning to a stranger with power over them aren’t interesting. That people who, for another reason, just don’t have access to hacker spaces and don’t fit into hobbyist subcultures aren’t interesting.

Essentially the point is that hiring based on hobbies selects for people who are from a similar background to you.

You see, the problem is that the answer to the question of whether this is effective or exclusive is that it’s both. Hobbies are in many ways a really good signal of an active mind which will engage well with the job.

The problem is that they are also a signal that the person has the time and energy to pursue those hobbies.

Amongst people who are free of such commitments and routinely have the mental energy for it, it may be that there is a very strong correlation between hobbies and competence. (I’d actually place bets that it’s significantly less strong than we like to believe it is, but I have no data so I’m not going to speculate too wildly on that front. Let’s assume for the sake of the argument that popular wisdom is correct here.)

The problem is that hobbies are a form of false proxy. We’re unable to perform the test that would really allow us to determine someone’s competence (that is to say: Hire them and work closely with them for a couple years in a variety of different scenarios), so instead we pick something which we can easily measure and appears to be a good signal for it.

And how did we learn that it was a good signal?

Well, by looking around us. Look at the people we know who are good. If we’re feeling very virtuous we can look at the people we know are bad and actually compare differences, rather than just falling into the “good people do X, therefore X is good” fallacy.

The problem is that you’re looking at a very small population, and when you do this you’re necessarily massively over-fitting for the data. When you create tests based on your observation of your current in-group you end up creating tests that work very well for people who look like the data you’re modelling and at best perform more noisily for people outside it, but more likely actively penalise them.

Why? Because this is where privilege comes in. (Note: Privilege in this context is a technical term. If you feel the need to comment something along the lines of “How dare you call me privileged? I worked hard to get where I am!” etc just… don’t. Please).

The problem is that the advantages you have are mostly ones you don’t see. Because most of society’s advantages don’t come in terms of people giving you a leg up (certainly some do, but we tend to be more aware of those), they come in the form of things you don’t have to worry about. You may not have to worry about the constraints of chronic illness, of dependants, of currently being in a position where money is tight. It’s hard to notice the absence of something you’ve never experienced, and consequently you often don’t notice these group advantages. This is especially true because even if some people experience it at the individual level, as a group we’re a fairly privileged lot and so most of our group behaviours will lack these constraints.

There’s another way these false proxies can creep in. There have been a bunch of discussions about the myth of the 10x engineer recently. I also had some interesting conversations about various biotruthy beliefs about programming on Twitter (mostly with tef and janepipistrelle I think). Underlying both of these is a common question: What do we mean by programming ability?

Well, it’s obvious of course. Programming ability is being good at the things that I’m good at. Those things I’m not good at? Yeah I suppose some of them are kinda important, but not nearly as much.

This is a less egotistical viewpoint than you might think it is. Certainly some people believe this because they’re full of themselves, but it’s entirely possible to accidentally find yourself believing this with the best intentions in the world.

How?

Well, what are the things you work at getting better at?

Right. The things you think are important. So it’s not just that people think things they are good at are important. It’s also that people work to get good at the things they think are important.

So what do you do when you want to decide how well someone will contribute to a team? You look at how good they are of course.

That is… how much they’re good at the things you’re also good at.

Or how much they look like you.

Oh, you’ll still choose people with complementary skills. I don’t think many of us fall into this trap so overtly as to only hire people with the exact same skill set as us. But step up a level, there are the meta qualities. Things like mathematical ability, passion about technology, reliability, being able to think through problems quickly, ability to communicate well and confidently, etc. Certainly these are some of the things I value. Coincidentally they’re things I also think I do quite well in. Funny that, huh?

These don’t necessarily contribute to a group diversity problem. Some of them do – it’s easier to talk confidently when you’re not socially punished for doing so, it’s easier to be passionate when you’ve got the energy to spare – but even when there’s little between-group variation in them (e.g. mathematical ability) they contribute to a personality monoculture. We don’t all look the same, we also all think rather more similarly than sheer chance would suggest.

Ultimately what we’re selecting for when hiring isn’t actually about the individual we’re hiring. What we really want to know is if the addition of this person to the team will make us better in ways we need to be better. This doesn’t necessarily mean they have “cultural fit” or that their abilities align well with the team – it might mean that they have a slow, methodical approach that will provide the necessary damper on the team’s mad rush to ship regardless of how broken. It might mean that they provide a calm, mediating voice. It might simply mean that they’re good at programming in ways that we wouldn’t expect because they’re different from our own. The point is that we don’t actually just need to hire people who are like us, we should hire people who augment us. People from across the spectrum who will make us better in a myriad different ways.

But we can’t test for that, because we don’t know how. So instead we invent these tests which we think provide good proxies for it.

But many of these tests are false proxies which are really testing how similar they are to us.

And then we act surprised when our whole team looks like us, and we claim that well it’s just what the tests show all the best candidates looked like and we have to accept the reality.

What a lovely meritocracy we work in.

This entry was posted in Feminism, Hiring and How do I computer?.

Designing a tech meet-up for inclusiveness

A while ago, shortly after attending Nine Worlds, I found myself very impressed with their code of conduct and wondering why more software development meetups don’t have a similar thing. An increasing number of our conferences do (although I think still not a large percentage), so it’s not that we’re completely ignorant of the need, but none of the London tech meet-ups have one. Some recent conversations reminded me of this question, and I think it’s an idea that might have legs.

One tech meet-up in the USA (I’m afraid I don’t remember which one and can’t find the relevant tweet right now) told me they had adopted the Hacker School Code of Conduct, and that it had worked well for them, so we’ve at least got an existence proof that such a group can work.

So it seems like the answer to the question “Why isn’t there a tech meet-up in London with a code of conduct?” is simply “We haven’t created one yet”. It seems like it might be worth fixing that.

So I set about sketching a meet-up group that would satisfy the criteria “Have a code of conduct”. Of course you can just do this by taking any meet-up group and slapping a code of conduct on it, so this is mostly some thoughts around a) designing an interesting meet-up group and b) supporting that code of conduct slightly better.

Initial design goals:

  1. A code of conduct. I don’t know exactly which one. I like the specific Hacker School guidelines, so those should be stolen. The Geek Feminism sample anti-harassment policy is also good. Some hybrid of these two should be made and continually refined as we figure out what works. (As a minor bugbear, and because much of the meet-up will probably happen in a bar because that’s how these things usually end up happening in London, one thing that should be explicitly on the CoC is that pressuring people to drink is not OK.)
  2. This explicitly should not be a “$SUBGROUP in tech” meet-up. Those are totally worthwhile and I’m glad they exist but, well, you may notice that I look rather a lot like the dominant group in tech. As a result it would be a bit problematic for me to organise such a group. I would hope to promote a diverse group, but I would hope to do that by talking about interesting things whilst not being arseholes. If there are other things we can do to promote it within our group we should do them, but fundamentally this should be a meet-up group who get together to talk about interesting tech things where everyone can be themselves rather than be singled out as being a specific type of person in tech.
  3. It should be interesting for more than just being a room full of interesting people drinking.
  4. People should come away from it improved in some capacity, even if it’s just “Hey I learned a thing”
  5. I like democracy. It’s a good thing. The group should evolve from its initial conception, and it should do so under the guidance of its membership rather than some benevolent dictator (though said dictator may be useful for getting the initial concept up and running).

So here is the design I settled on. Some or all of it may be terrible and liable to change if it hits reality, so consider these more a sketch of an idea than a concrete proposal:

The theme of the group is “Making software development better”. It’s a deliberately loose theme which can cover a multitude of topics. In particular it includes both “making better software” and “making it better to work in software”. Over time the group can decide what precisely we want this to mean – how much we want to focus on technologies, on process, etc. My assumption is that initially most of the talks will focus on “Here’s this thing we’re doing at work. It seems to work pretty well” or “Here’s a problem we’re having at work. Can you lot help me solve it?” – the point being that the focus is on software development rather than programming. This is true both in the sense that we’ll be talking more about “Here’s a neat debugging technique” than “Here let me teach you Haskell”, but also in the sense that non-coders who are part of the process are decidedly welcome. Indeed, while we definitely don’t want to be afraid of advanced technical content, we should try to make sure that a good chunk of what’s talked about is also about the larger process of development.

The rough structure would be:

  1. People arrive and mingle for about half an hour
  2. We have a half hour talk or discussion group, everyone sits
  3. Anyone who wants to get up and talk for < 5 minutes may. These can be lightning talks or they can just be “hey, this is my company, we’re hiring”.
  4. We have a 15 minute break
  5. We have another talk/discussion group
  6. Formalised fun ends. People can now hang around for as long as they like

One talk should be technical (here’s a tool we’re using and how it helps. Here’s a debugging technique. Let me teach you a bit more about security than you probably know), one should be non-technical (here’s a change we made to our sprint process that’s really helped. Here’s one weird trick to make meetings less painful. Here’s how we run our hiring process). Which one goes first will alternate.

The discussion groups are worth highlighting. In these the speaker would basically be acting as a moderator rather than talking for half an hour. Their goal would be to frame the discussion by asking questions and try and direct what’s on topic. The group would probably use something like Occupy movement hand signals (I used these recently, so they’re on my mind) to discuss these points without it just becoming a free-for-all. Basically the goal is to have a productive group discussion around a subject. This is also helpful for people who would like to speak but are nervous about it because they have to do an awful lot less speaking.

Speakers come from within the group rather than external. Everyone is encouraged but not required to speak. Initially I’d imagine that the process for talk selection would be “Whomever we can get to speak”. Eventually when there are enough people prepared to speak something else can probably be decided on – e.g. priority to people who haven’t previously talked, then randomly select a speaker amongst those left. A soft goal here is to provide an environment in which people who might otherwise be reluctant to speak feel comfortable doing so because they know a lot of the people around them and know that they’re in a supportive environment where people will behave decently.

Membership of the group would be closed, but with an aim of bringing ever more people into it: Ideally in any given session about a fifth (I made this number up and it might not be the right balance, but something in that region) of the group should be new. I’m not quite sure how that would work. My default here would be to have a list of people who are interested in joining and select from it by lot to fill a newness quota each session (though I expect that initially there will be trouble finding enough people rather than too many people joining at once). I’d also probably suggest that after someone has been to one session as a newbie there’s an option for a “this person was kinda a jerk and we’d rather they didn’t come back” vote amongst the existing group.

It’s also possible to be expelled from the group. The main way to do this is code of conduct violations – except for extreme violations (sexual harassment, full-blown racist ranting, etc) this isn’t a zero tolerance policy, but people who violate it will be asked to moderate their behaviour and if they don’t then they will be expelled.

There probably needs to be another mechanism for removing people from the group for people who are technically not violating the code of conduct but are making the group a worse place. Best mechanism I can come up with right now is “quietly have a word with the organisers. They’ll quietly poll to get consensus as to whether this is a real problem. If they think it is, they’ll have a word with the person in question about their behaviour. If this continues to be a problem they’ll ask them to leave”.

Anyway, that’s most of my thoughts for now. I’m aware I’ve spent far more on the details of organising the group than on the actual content, but that’s mostly because those are the bits that are novel, and because I have a reasonable amount of faith that if you put a bunch of interesting, intelligent and pleasant people in a room they’ll find good things to talk about. One thing developers aren’t short of is opinions, and giving people a non-prejudicial place to speak about those opinions seems like a great way to get a different set of voices heard than the ones that normally dominate.

Right now this is just an idea. I do not have a good history of follow-through on ideas. If you want this to happen I recommend shouting excitedly at me until I do it, or stealing the idea for yourself. I’m mostly putting it out there to see if people are interested. Are you?

Update: OK. I’ve got some initial interest. I think I’d like to define acceptance criteria which will cause me to try to organise such a thing. I may adjust these later, these are just an initial attempt at it. I will run an initial attempt at this if I get interest from:

  1. At least 15 people able and interested to attend such an event in London. This number is arbitrarily chosen and I might relax it, but it doesn’t feel like an unreasonable amount.
  2. At least 5 of these people must not be personally known to me. For a concrete criterion let’s say I haven’t met them in person. This is mainly to stop this turning into a “David’s tech friends all get together” event.
  3. At least 5 of these people must not be white cis men. I dislike quotas in general, but this shouldn’t be a hard condition to satisfy and if 2/3rds of the initial audience for this group are just like all the other tech meet-ups in London it feels a little pointless. TBH I’d hope to do a lot better than this, I just want to set a lower bound.
  4. Someone puts forth a technical talk they’d like to do (I’ll hijack the non-technical talk slot for the first one to do a “What do we want out of this?” discussion group). If the first three conditions are satisfied I’m sure I’ll be able to pester someone into doing this.

(I’m indifferent as to whether the 5 in those two conditions overlap or not)

If you’re interested, drop me a tweet or an email (@DRMacIver or [email protected] respectively) to let me know.

This entry was posted in Feminism and How do I computer?.

How learning Scala made me a better programmer

People will often tell you how learning functional programming / C / their current favourite language paradigm will make you a better programmer even if you don’t actually use it.

I mean, yes, I suppose this is true. It’s very hard for learning new things about programming to not teach you something general purpose, and if a language is unfamiliar enough that it’s not just a syntax change you’re almost certainly going to learn new ways of thinking along with it. I don’t know that it’s the most efficient way to become a better programmer, but it’s definitely not a bad one.

Scala, though, gave me something more: while there are a bunch of neat features which I still miss and occasionally emulate badly, I can point to a much more concrete and useful way in which it made me a better programmer. Unfortunately a) I don’t think it’s likely to work any more and b) you’re not going to like it.

When I was learning Scala back in… 2006 I think it might have been? Certainly early 2007 at the latest. Anyway, sure, I was learning interesting things about static typing, object orientation and functional programming (maybe not much of the last one – I already knew Haskell and ML tolerably well), but what I actually learned the most from was an entirely different feature of the language.

Specifically that the compiler was ridiculously buggy.

I assume things are much better than they used to be these days, and I’m sure my memory is exaggerating how bad they were at the time, but it feels like back then the development cycle was compile, run tests, report compiler bug. I think I probably hit more compiler bugs than most people – I’m not sure why; maybe I just had a tendency to push at the edge cases. It could also be that I just was nicer about reporting them than most people, I don’t know. Either way, there was a reasonable period of time when I had the top reported number of bugs on the tracker (I think Paul Phillips was the one who ousted me. Certainly once he came along he blew my track record out of the water).

Back in the dim and distant past when I wrote “No seriously, why Scala?” someone asked the question “Why isn’t a buggy compiler a show stopper?”. There were some pretty good answers, but really the accurate one is just “Well… because it’s not?”. You pretty quickly learn to get used to a buggy compiler and integrate its bugginess into your workflow – not that you come to depend on it, but it just becomes one of those things you know you have to deal with and compensate for. It potentially slows you down, but as long as you have good habits to prevent it tripping you up, and the rest of the language speeds you up enough to compensate, this isn’t a major problem.

I’m not saying I’d actively seek out a buggy compiler today. Obviously if you can choose to not be slowed down this is better than being slowed down, and no matter how good your procedures for compensating for it eventually you’re going to be hit by a compiler bug in production. If I even need to say it, there are clearly downsides to writing production code with a buggy compiler.

But from the point of view of my development as a programmer it was amazing.

Why?

Well, partly just because being good at writing a decent bug report is a nice skill to have to endear you to a maintainer and this is where I learned a lot of those skills (though it took me being on the wrong end of bad bug reports to properly internalise them).

But mostly, I think, just because it was really useful having that much practice finding and submitting bugs. In much the same way that we spend more time reading than writing code, we also spend more time finding bugs than writing them. Having lots of practice at this turns out to be super helpful.

Compiler bugs have an interesting character to them. Most good advice about debugging advises you to not assume that the bug you’re experiencing is in the compiler, or even your system libraries, and that’s in large part because it indeed rarely is, so you tend not to experience this character too often.

What is that character?

It’s simple: You cannot trust the code you have written. You cannot deduce the bug by reading through the code until you get a wrong answer. The code is lying to you, because it doesn’t do what you think it does.

To an extent this is always true. Your brain does not include a perfect implementation of the language spec and the implementations of all the libraries, so the code is always capable of doing something different than you think it does, but with compiler bugs you don’t even have the approximation to the truth you normally rely on.

“Code is data” is a bit of a truism these days, but with a compiler bug it’s actually true. Your code is not code, it’s simply the input to another program that is producing the wrong result. You are no longer debugging your code, you are manipulating your code to debug something else.

This is interesting because it forces you to think about your code in a different way. It forces you to treat it as an object to be manipulated instead of a tool you are using to manipulate. It gives you a fresh perspective on it, and one that can be helpful.

How do you debug a compiler bug? Well. Sometimes it’s obvious and you just go “Oh, hey, this bit of code is being compiled wrong”. You copy and paste it out, create a fresh test case and tinker with it a little bit and you’re done.

This will probably not happen for your first ten compiler bugs.

Fortunately there’s a mechanical procedure for doing this. For some languages there’s even literally a program to implement this mechanical procedure. I find it quite instructive to do it manually, but that’s what people always say about tedious tasks that are on the cusp of being automated. Still, I’m glad to have done it.

What is this mechanical procedure?

Simple. You create a test case for the thing that’s going wrong (this might just be “try to compile your code” or it might be “run this program that produced wrong output”. For the sake of minimalism I prefer not to use a unit testing framework here). You check out a fresh branch (you don’t have to do this in source control but there’s going to be a lot of undo/redo so you probably want to). You now start deleting code like it’s going out of fashion.

The goal is to simply delete as much code as possible while still preserving the compiler bug. You can start with crude deletion of files, then functions, etc. You’ll have to patch up the stuff that depends on it usually, but often you can just delete that too.

The cycle goes:

  1. Delete
  2. Run test
  3. If the test still demonstrates the problem, hurrah. Go back to step 1 and delete some more.
  4. If the test no longer demonstrates the problem, that’s interesting. Note this code as probably relevant, undelete it, and go delete something else.

Basically keep iterating this until you can no longer make progress.
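The cycle above is easy to mechanise. Here’s a minimal sketch in Python, where `still_fails` is whatever predicate you supply (e.g. one that writes the candidate out to a file, invokes the compiler, and checks for the miscompile); the halving chunk sizes are my own choice, and real reducers are rather cleverer about which deletions to try:

```python
def minimize(lines, still_fails):
    """Greedily delete chunks of a test case while the bug still reproduces."""
    chunk = len(lines) // 2 or 1
    while chunk > 0:
        i = 0
        while i < len(lines):
            candidate = lines[:i] + lines[i + chunk:]
            if candidate and still_fails(candidate):
                lines = candidate      # Deletion preserved the bug: keep it.
            else:
                i += chunk             # Probably relevant code: leave it, move on.
        chunk //= 2                    # Retry with finer-grained deletions.
    return lines
```

Run against a toy predicate that “fails” whenever a particular line is present, this shrinks the input down to just that line.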

When you stop making progress you can either go “Welp, that’s small enough” (generally I regard small enough to be a single file under a few hundred lines, ideally less) or you can try some other things. e.g.

  1. Inlining imported files
  2. Inlining functions
  3. Manually constant folding arguments to functions (i.e. if we only ever call f(x, y, 1) in a program, remove the last argument to f and replace it with 1 in the function body)

The point being that you’re thinking in terms of operations on your code which are likely to preserve the bug. Finding those operations will help you understand the bug and guide your minimization process. Then at the end of it you will have a small program demonstrating it which is hopefully small enough that the bug is obvious.

Is this how I normally debug programs? No way. It’s a seriously heavyweight tool for most bugs. Most bugs are shallow and can be solved with about 10 minutes of reading the code until it’s obvious why it’s wrong.

Even more complicated bugs I tend not to break this out for. What I do take from this though is the lesson that you can transform your program to make the bug more obvious.

Sometimes though, when all else fails, this is the technique I pull out. I can think of maybe half a dozen times I’ve done it for something that wasn’t a compiler bug, and it’s been incredibly useful each time.

All those half dozen times were for a very specific class of bug. That specific class of bug being “ruby”. There’s a hilarious thing that can happen in ruby where namespacing isn’t really the cool thing to do and everything messes around with everyone else’s internal implementation. This potentially leads to some really bizarre spooky interaction at a distance (often involving JSON libraries. *shakes fist*). This technique proved invaluable in getting to the bottom of these. For example, the time I discovered that if your require order was not exactly right, having a JSON column in datamapper would cause everything using JSON to break. That was fun.

But even when I’m not explicitly using these techniques, it feels like a lot of my debugging style was learned in the trenches of the Scala compiler. There’s a certain ramp up when debugging where you start with intuition and pile on increasingly methodical techniques as the bugs get more mysterious, and I think exposure to a lot of really mysterious bugs helped me learn that much earlier in my career than I otherwise would have.

It’s possible that the fact that it was a compiler was irrelevant. It feels relevant, but maybe I’d have learned variations on the technique at any point when I was daily interacting with a large, buggy piece of software. But to date, the 2007-2008 era Scala compiler is the best example I’ve had of working with such, so it’s entirely possible I’d have never learned that skill otherwise, and that would have been a shame because it’s one that’s served me well on many other projects.

This entry was posted in programming.

Some things that are the same size as each other

In my last post I introduced the notion of the size of an infinite set and proved two of the fundamental theorems about them.

In particular I pointed out that infinite sets could be the same size as proper subsets of themselves, and that they were strictly smaller than power sets of themselves.

One of the reasons why the fact that there are different sizes of infinity is a little surprising is that it turns out that a lot of things you’d think of as being different sizes are actually the same size.

This post is mostly worked examples of that. Along the way I’ll also use these examples to introduce some of the basic ideas of cardinal arithmetic, but this is mostly about showing how some of this works in practice.

A piece of notation: The set of real numbers is denoted \(\mathbb{R}\), the set of rational numbers \(\mathbb{Q}\) (a rational number is anything that can be represented as \(\frac{m}{n}\) for integers \(m\) and \(n\) with \(n \neq 0\)), and the set of integers \(\mathbb{Z}\). I’m not going to attempt to give a formal definition of the real numbers here, but I also shouldn’t rely on anything too surprising about them.

How large are these sets? They’re the same size as the two main sets we’ve already studied.

  • \(|\mathbb{Z}| = \aleph_0\)
  • \(|\mathbb{Q}| = \aleph_0\)
  • \(|\mathbb{R}| = 2^{\aleph_0}\)

In order to prove the first two, we’re first going to look at a slightly different set. Consider \(\mathbb{N}^2\), the set of ordered pairs of natural numbers (e.g. \((1, 2)\), \((7, 3)\), etc). How large is this set?

In fact, \(|\mathbb{N}^2| = \aleph_0\).

Why? Well, certainly \(|\mathbb{N}^2| \geq \aleph_0\), because the function \(f(n) = (n, 0)\) is an injection \(f : \mathbb{N} \to \mathbb{N}^2\). But we can also construct an injection \(g : \mathbb{N}^2 \to \mathbb{N}\) as \(g((i, j)) = 2^i 3^j\). Uniqueness of prime decomposition then gives us that this is an injection, and thus \(|\mathbb{N}^2| = \aleph_0\).
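
The mathematical claim is of course about all of \(\mathbb{N}^2\), but a quick finite sanity check of the prime-coding trick is easy to run (purely illustrative; the function name is mine):

```python
# Injection g : N^2 -> N via g((i, j)) = 2^i * 3^j.
# Uniqueness of prime factorisation means distinct pairs get
# distinct codes; we spot-check that on a finite grid.

def g(i, j):
    return 2**i * 3**j

pairs = [(i, j) for i in range(50) for j in range(50)]
codes = {g(i, j) for (i, j) in pairs}
assert len(codes) == len(pairs)  # no collisions on this grid
```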

You can also construct a bijection explicitly. It’s kinda informative to do so, but the details are fiddly enough that I confess I can’t really be bothered with them.

But this fact now gives us the cardinality of \(\mathbb{Z}\) nearly for free. Let \(f : \mathbb{Z} \to \mathbb{N}^2\) be such that \(f(n) = (s, \mathrm{abs}(n))\), where \(s = 0\) if \(n < 0\) and \(s = 1\) otherwise. Then this is an injection \(f : \mathbb{Z} \to \mathbb{N}^2\). We already know \(|\mathbb{Z}| \geq \aleph_0\) (because it contains \(\mathbb{N}\) as a subset), so \(|\mathbb{Z}| = \aleph_0\).

We can then do the same with \(\mathbb{Q}\). Express a rational \(x = \frac{m}{n}\) as a reduced fraction (i.e. \(n > 0\) and \(m\) and \(n\) have no common divisor). Define \(g(x) = (f(m), n)\), where \(f\) is our injection from above. This is an injection into \(\mathbb{N}^2 \times \mathbb{N}\), which has size \(\aleph_0\) (compose the first coordinate with our injection \(\mathbb{N}^2 \to \mathbb{N}\)), and thus \(|\mathbb{Q}| = \aleph_0\).
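
Here is a small sketch of that encoding, using Python’s `Fraction` (which happens to normalise to lowest terms with a positive denominator, matching the reduced-fraction convention above; the function names are mine):

```python
from fractions import Fraction

def f(n):
    # Injection Z -> N^2: (sign bit, absolute value)
    return (0 if n < 0 else 1, abs(n))

def g(x):
    # Injection Q -> N^2 x N via the reduced fraction m/n
    x = Fraction(x)  # Fraction stores lowest terms, denominator > 0
    return (f(x.numerator), x.denominator)

# Distinct rationals get distinct codes on a finite sample
rationals = {Fraction(m, n) for m in range(-30, 30) for n in range(1, 30)}
assert len({g(x) for x in rationals}) == len(rationals)
```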

The use of pairs for studying cardinality turns out to be quite useful. In fact, for any infinite set \(A\) it’s the case that \(|A^2| = |A|\). It will be some time before we have the tools to prove this and I don’t know if I’m going to, but it’s worth knowing. We will prove it later for another concrete example though.

In general, pairs like this give us a notion of multiplication for cardinal numbers. We can construct the pair set \(A \times B\) as the set of ordered pairs where the first is an element of \(A\) and the second is an element of \(B\). Then we write \(|A \times B| = |A| |B|\).

If \(|A| \leq |C|\) and \(|B| \leq |D|\) then \(|A| |B| \leq |C| |D|\). We can see this by just pairing together the two injections we have from this inequality: if \(f : A \to C\), \(g : B \to D\) are injections then the function \(h((a, b)) = (f(a), g(b))\) is an injection \(A \times B \to C \times D\). This also shows that our notion of multiplication is well defined in the sense that if different sets have the same cardinality then their pair sets also have the same cardinality.

In particular this means that e.g. \(|\mathbb{Q}^2| = |\mathbb{Z} \times \mathbb{N}| = \aleph_0\). Any pairing of sets of size \(\aleph_0\) will give a set of size \(\aleph_0\).

Now let’s look at the real numbers. Why is \(|\mathbb{R}| = 2^{\aleph_0}\)?

Well. First consider the set of all functions \(f : \mathbb{N} \to \{0,1\}\). We’ll call this set \(2^\mathbb{N}\). This has a natural bijection with \(\mathcal{P}(\mathbb{N})\) by identifying a set with its indicator function. i.e. \(f(A)(n) = 1\) if \(n \in A\) else \(f(A)(n) = 0\). We’ll show this set is in bijection with the real numbers.

First, here’s an injection \(f : 2^\mathbb{N} \to \mathbb{R}\): let \(f(t)\) be the number whose \(n\)th decimal digit is \(1\) if \(t(n) = 1\) and \(0\) otherwise. This is an injection because decimal representations are unique as long as they contain no infinite sequence of trailing 9s, and these contain only 0s and 1s.
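
For intuition, here’s a finite analogue of that decimal trick, truncating the sequence to its first few values (names and the truncation are mine; exact rational arithmetic avoids any floating-point caveats):

```python
from fractions import Fraction
from itertools import product

def f(t):
    # A finite 0/1 tuple t stands in for the first len(t) values of
    # t(n); it becomes the number 0.t(0)t(1)t(2)... in decimal.
    return sum(Fraction(d, 10**(n + 1)) for n, d in enumerate(t))

# Distinct 0/1 tuples give distinct numbers: expansions using only
# the digits 0 and 1 have no trailing 9s, so they are unambiguous.
ts = list(product([0, 1], repeat=8))
assert len({f(t) for t in ts}) == len(ts)
```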

In the other direction, choose a binary representation for \(x\) (it doesn’t matter which, but let’s arbitrarily pick the version without trailing 1s where there’s ambiguity). Then the function \(g(x)(n) = \) the \(n\)th binary digit of \(x\) is an injection \(g : \mathbb{R} \to 2^\mathbb{N}\) (strictly this handles \(x \in [0, 1)\); first compose with any injection \(\mathbb{R} \to [0, 1)\) to cover the whole line). Hence \(|\mathbb{R}| = |2^\mathbb{N}| = 2^{\aleph_0}\).

Note the bit at the end there where we wrote \(|2^\mathbb{N}| = 2^{\aleph_0}\)? This is of course where the notation \(2^{\aleph_0}\) actually comes from.

In general we can define a notion of cardinal exponentiation: \(|A|^{|B|} = |A^B|\), where \(A^B\) is the set of functions \(f : B \to A\). This is consistent with the above notation because \(|\{0,1\}| = 2\).

Note that if \(|A| \leq |C|\) and \(|B| \leq |D|\) then \(|A|^{|B|} \leq |C|^{|D|}\). If \(g : A \to C\) and \(f : B \to D\) are our injections then pick some point \(c \in C\). Define \(h : A^B \to C^D\) as \(h(t)(d) = g(t(f^{-1}(d)))\) if \(f^{-1}\) is defined at \(d\), else \(h(t)(d) = c\). This is an injection because if \(t \neq t’\) then we can find \(b\) with \(t(b) \neq t'(b)\). Therefore \(h(t)(f(b)) \neq h(t’)(f(b))\) (as \(g\) is injective) and so \(h(t) \neq h(t’)\).

While we’re here, we should also define cardinal addition. The idea of cardinal addition is that we want a disjoint union operation: \(|A| + |B|\) is \(|A \cup B|\), ignoring the fact that \(A\) and \(B\) might overlap. We can do this by defining \(|A| + |B| = |\{0\} \times A \cup \{1\} \times B|\), with the pairing just there to force disjointness. We’ll write that disjoint union operation \(A \sqcup B\).

Turns out a lot of the nice properties you’d hope held do hold. In particular:

  • \(|A| |B| = |B| |A|\)
  • \(|A| + |B| = |B| + |A|\)
  • \(|A| (|B| + |C|) = |A| |B| + |A| |C|\)

I’m going to label these “exercise for the reader”. There are fairly natural bijections in all cases.

One thing which I will do because it’s important for the next bit is show that the following exponentiation law holds: \(|A|^{|B|} \times |A|^{|C|} = |A|^{|B| + |C|}\).

Proof: This is because we can define a function on \(B \sqcup C\) by defining it piecewise on \(B\) and \(C\). So given a pair \(t = (f, g) \in A^B \times A^C\) we can define \(h(t)((0, b)) = f(b)\), \(h(t)((1, c)) = g(c)\). \(h\) is a bijection. QED
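
A finite head-count makes this law plausible: for finite sets, gluing a pair of functions piecewise hits every function on the disjoint union exactly once, so \(|A|^{|B|} \cdot |A|^{|C|} = |A|^{|B| + |C|}\). A small sketch (all names mine):

```python
from itertools import product

A, B, C = ['x', 'y', 'z'], [0, 1], ['p', 'q', 'r']

# Disjoint union B ⊔ C = {0}×B ∪ {1}×C
BC = [(0, b) for b in B] + [(1, c) for c in C]

def functions(dom, cod):
    """All functions dom -> cod, represented as dicts."""
    return [dict(zip(dom, vals)) for vals in product(cod, repeat=len(dom))]

def h(f, g):
    """Glue the pair (f, g) into one function on B ⊔ C, piecewise."""
    return {**{(0, b): f[b] for b in B}, **{(1, c): g[c] for c in C}}

glued = [h(f, g) for f in functions(B, A) for g in functions(C, A)]
# Every function on B ⊔ C arises, and arises exactly once:
assert len(glued) == len(functions(BC, A)) == len(A) ** (len(B) + len(C))
assert len({frozenset(d.items()) for d in glued}) == len(glued)
```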

This will let us prove that \(|\mathbb{R}^2| = |\mathbb{R}|\).

First note that \(\aleph_0 + \aleph_0 = \aleph_0\) because the definition of \(\mathbb{N} \sqcup \mathbb{N}\) actually defines it as a subset of \(\mathbb{N}^2\).

This then gives us that \(|\mathbb{R}^2| = |\mathbb{R}|^2 = (2^{\aleph_0})^2 = 2^{\aleph_0 + \aleph_0} = 2^{\aleph_0} = |\mathbb{R}|\).

So there are as many pairs of real numbers as real numbers.

It turns out to be more extreme than that. \(|\mathbb{R}^\mathbb{N}| = |\mathbb{R}|\). i.e. there are as many infinite sequences of real numbers as there are real numbers.

Why?

Well, first: we have the exponentiation rule that \((|A|^{|B|})^{|C|} = |A|^{|B| |C|}\). This is because given a function \(f : B \times C \to A\) you can map it to the function \(f'(c)(b) = f((b, c))\), and this mapping is a bijection between \(A^{B \times C}\) and \((A^B)^C\).
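
Programmers may recognise this mapping as currying. It’s easy to sketch for finite sets, where the counts \((|A|^{|B|})^{|C|}\) and \(|A|^{|B||C|}\) visibly agree (all names mine):

```python
from itertools import product

A, B, C = [0, 1], ['b1', 'b2'], ['c1', 'c2', 'c3']
BxC = [(b, c) for b in B for c in C]

def curry(f):
    # f : B×C -> A (as a dict) becomes f' with f'(c)(b) = f((b, c))
    return {c: {b: f[(b, c)] for b in B} for c in C}

fs = [dict(zip(BxC, vals)) for vals in product(A, repeat=len(BxC))]
assert len(fs) == len(A) ** (len(B) * len(C))  # |A|^(|B||C|) functions

# curry is injective, hence a bijection onto the functions C -> A^B
curried = {tuple(sorted((c, tuple(sorted(d.items()))) for c, d in curry(f).items()))
           for f in fs}
assert len(curried) == len(fs)
```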

This means that \(|\mathbb{R}^\mathbb{N}| = (2^{\aleph_0})^{\aleph_0} = 2^{\aleph_0 \aleph_0} = 2^{\aleph_0} = |\mathbb{R}|\).

One thing that is strictly larger than \(\mathbb{R}\) is of course \(\mathbb{R}^\mathbb{R}\). This is because \(|\mathbb{R}^\mathbb{R}| \geq |2^\mathbb{R}| > |\mathbb{R}|\). So there are strictly more functions from \(\mathbb{R} \to \mathbb{R}\) than there are real numbers.

This isn’t true for continuous functions though. Let \(\mathcal{C}\) be the set of continuous functions \(\mathbb{R} \to \mathbb{R}\). Then \(|\mathcal{C}| = |\mathbb{R}|\).

Why? Well, because a continuous function is uniquely defined by its values on the rationals (every real number has rationals arbitrarily close to it), so the function \(h : \mathcal{C} \to \mathbb{R}^\mathbb{Q}\) defined by \(h(f) = f|_\mathbb{Q}\) is an injection. But \(|\mathbb{R}^\mathbb{Q}| = |\mathbb{R}^\mathbb{N}| = |\mathbb{R}|\). Conversely, the constant functions give an injection \(\mathbb{R} \to \mathcal{C}\), so \(|\mathcal{C}| = |\mathbb{R}|\).

So, there you have it. Some examples of infinity, some things that are the same size as each other, and some that aren’t. Hopefully this makes some of the ideas slightly more familiar.


This entry was posted in Numbers are hard on by .