Author Archives: david

Advance warning of advanced mathematics

A long, long time ago (about 6 years now) my friend John Bytheway and I were working on a joint paper. It was an odd little thing – a combination of Banach Space geometry and point-set topology with just enough analysis in between to glue the two together. There was probably something publishable in it, but we never really got it to a point where we were happy enough with the results to try. Still, I really quite liked the material in it, and I seem to come back to it every few years, polish bits of it, have a tinker and wonder if it’s worth trying to get it published. Part of this was we discovered later on that less of it was new than we’d previously believed.

I’ve come to the conclusion that I really don’t care enough about the stamp of approval for being published in a journal but that I’d quite like to get the results out in public, for some closure if nothing else but also in case anyone finds them interesting/useful. So I’m going to dig up some of the individual material and try to package them up so they stand alone as blog posts and just put them up here.

This entry was posted in Numbers are hard on by .

A nicer way to complete metric spaces

It’s a standard result that you learn almost as soon as you learn about metric spaces that every metric space has a completion.

The normal way to prove this is to construct a pseudometric on the space of Cauchy sequences on the metric space and then quotient out pairs with zero distance to get a metric space. I think this is a bit of a waste of time – the details are fiddly and annoying and it’s not terribly enlightening. Here’s a nicer way.

Let \(X\) be a set. It’s a standard result that \(l^\infty(X)\), the set of bounded functions \(X \to \mathbb{R}\) together with the uniform metric, is a complete metric space.

Now let \(X\) be a metric space and fix arbitrary \(c \in X\).

Claim: The function \(\iota : x \to f_x\) where \(f_x(y) = d(x, y) – d(c, y)\) is an isometric embedding of \(X\) into \(l^\infty(X)\).

If this claim is true we’re done. We’ve got an isometric embedding of \(X\) into a complete metric space, so the closure of the image is its completion.

In order to prove this we’ll need the following elementary result:

Lemma: Let \(x, y, z \in X\). Then \(|d(x, z) – d(y, z)| \leq d(x, y)\).
Proof:

Simple application of the triangle inequality. \(d(x, z) \leq d(x, y) + d(y, z)\) so \(d(x, z) – d(y, z) \leq d(x, y) \). Similarly \(d(x, z) – d(y, z) \geq -d(x, y) \). QED

Now to prove that \(\iota\) is an isometric embedding.

First we must show that \(f_x \in l^\infty(X)\). This is a simple consequence of the above lemma: $$|f_x(y)| = |d(x, c) – d(y, c)| \leq d(x, c)$$

Hence \(f_x\) is bounded

Now for isometry:

Consider \(x, w \in X\). Then $$|f_x(y) – f_w(y)| = |d(x, y) – d(y, c) + d(y, c) – d(w, w)| = |d(x, y) – d(w, y)| \leq d(x, y)|$$

So, taking the supremum over \(y\), we have \(||f_x – f_w|| \leq d(x, w)\). We now need only show equality.

For this, consider \(|f_x(w) – f_w(w)| = |d(x, w) – d(x, x)| = d(x, w)\). Hence we must have \(||f_x – f_w|| \geq d(x, w)\) and the result is proved.

Notes

  • I’m afraid I don’t remember where I first saw this so I can’t provide a reference, but it’s definitely not original to me. I probably encountered it in some textbook long ago
  • The thing I like about this is that it’s quite explicit, and at each step along the way you do the only thing that could possibly work once you’ve had the initial idea of trying to embed it into \(l^\infty(X)\): The only function you know about on it is the metric, so you have to use that. It’s not bounded, so what’s a non-artificial way to make it so, etc. The details are also much less fiddly than the normal proof because we did all the hard work of showing completeness when we proved that \(l^\infty(X)\) was complete.
This entry was posted in Numbers are hard on by .

The power of fixed point theorems in Set Theory

For reasons that aren’t relevant here, I was reminded of the power of fixed point theorems on partially ordered sets to prove some fundamental results in set theory earlier. This is an expository post about that. If you don’t care about set theory you probably won’t care about this.

I’m going to mention two fixed point theorems and use each of them to give a simple proof of a well known result in set theory.

Preliminaries

A poset is a set \(X\) with a relation \(\leq\) on it that is

  1. Reflexive: \( \forall x. x \leq x \)
  2. Antisymmetric: \(\forall x, y. x \leq y\) and \(y \leq x \implies x = y\)
  3. Transitive: \(\forall x, y, z. x \leq y\) and \(y \leq z \implies x \leq z\)

Additional reminders:

  • A poset is said to be totally ordered if additionally it is Total: \( \forall x, y. x \leq y\) or \(y \leq x\).
  • A chain is a subset of a poset which is totally ordered under the restriction of the relationship to it.
  • \(x\) is an upper bound for a set \(A\) if \(\forall a \in A. a \leq x \)
  • \(x\) is a least upper bound for a set \(A\) if it is an upper bound and \(x \leq y\) for all upper bounds of A \(y\). When such an upper bound exists it is unique (by anti-symmetry) and we write it as \(\sup A\)
  • A poset is said to be complete (resp. chain-complete) if every subset (resp chain) has a least upper bound
  • Every chain complete poset has a least element (the least upper bound of the empty set).

Examples of posets are any of the normal sets of numbers – e.g. \(\mathbb{N}, \mathbb{Q}, \mathbb{R}\). All of these are total orders, none of them are complete. \([0, 1] \subseteq \mathbb{R}\) is a complete total order.

For any \(X\), the power set \(\mathcal{P}(X)\) is a complete poset with the order being set inclusion.
Our fixed point theorems will concern functions

Our fixed point theorems are both about particular types of mappings of posets to themselves.

Knaster–Tarski fixed point theorem

(This is actually a slightly weakened form of the theorem because it’s slightly easier to prove and we don’t need the full thing)

Theorem: Let \(X\) be a complete poset and \(f : X \to X\) be order presering. That is \(x \leq y \implies f(x) \leq f(y)\). Then \(f\) has a fixed point.

Proof:

Let \(A = \{ x : x \leq f(x) \} \).

If \(x \in A\) then \(f(x) \in A\) as if \(x \leq f(x)\) then \(f(x) \leq f(f(x))\) due to \(f\) being order preserving.

Suppose now that \(y = \sup A\). Then if \(x \in A\) we have \(x \leq y\) and so \(f(x) \leq f(y)\). But \(x \leq f(x)\) by hypothesis of \(x \in A\), so we have \(x \leq f(x) \leq f(y)\). Therefore \(f(y)\) is also an upper bound of \(A\) and thus, because \(y\) is the least upper bound, \(y \leq f(y)\) and thus \(y \in A\).

But then as per above we must also have \(f(y) \in A\), and thus \(f(y) \leq y\). Therefore \(f(y) = y\).

QED.

This then lets us provide a very short proof of one of the basic results of set theory.

Cantor–Bernstein–Schroeder theorem

Let \(A, B\) be sets, and let \(f : A \to B\) and \(g : B \to A\) be injections. There exists a bijection \(h : A \to B\)

Proof:

The idea is that we want to find some subset \(H \subseteq A\) for which we can define our function h as \(h|_H = f\) and \(h|_{H^c} = g^{-1}\).

Claim: If \(g(f(H)^c) = H^c\) then the function h is well defined and is a bijection.

Well defined is easy: By the assumption, every element of \(H^c\) is in the image of \(g\), so \(g^{-1}\) is well defined on it.

Injective: It’s injective when restricted to \(H\) and to \(H^c\), the only question is whether we can have \(h(x) = h(y)\) with \(x \in H\) and \(y \in H^c\). But by assumption and injectivity of \(g\) we have \(g^{-1}(H^c) \subseteq f(H)^c\), so this is impossible.

Surjective: By assumption, \(g^{-1}(H^c) = f(H)^c\), so the function is surjective.

So now all that remains is to find such an H. Which we will now do by using Knaster-Tarski to pull it out of a hat.

Rewrite the requirement as \(H = g(f(H)^c)^c\)

i.e. we have the function \(S(A) = g(f(A)^c)^c\) and are looking for \(S(H) = H\).

Suppose \(A \subseteq B\). Then \(f(A) \subseteq f(B)\), so \(f(B)^c \subseteq f(A)^c\), so \(g(f(B)^c) \subseteq g(f(A)^c)\), so \( g(f(A)^c)^c \subseteq g(f(B)^c)^c\). i.e. \(S(A) \subseteq S(B)\).

So \(S\) is an order preserving map on the complete poset \((\mathcal{P}, \subseteq)\). Thus by the Knaster–Tarski fixed point theorem it gives us \(H\) we want.
QED

I find Knaster-Tarski a particularly “magic” fixed point theorem usage here because there’s no obvious connection between \(A\) and \(S(A)\) – we’ve just defined \(S(A)\) so that if it had a fixed point it would be the thing we wanted, and thus a solution appears.

Our next fixed point theorem is one I find much more intuitive, which makes it perhaps slightly odd that the proof we’ll use it in is intimately connected with the Axiom of Choice (which is intuitive itself but produces unintuitive results) and Zorn’s lemma (which is too technical to really have much intuition about).

Bourbaki–Witt fixed point theorem

Let \(X\) be a chain-complete poset and \(f : X \to X\) such that \(f(x) \geq x\). For every \(x \ni X\), \(f\) has a fixed point \(y \geq x\)

Proof:

The proof is by transfinite induction. That is, we’ll take some ordinal \(\kappa\) with no injection into \(X\) (such an ordinal exists by Hartog’s Lemma) and define \(g : \kappa \to X\) as follows:

  1. \(g(0) = x\)
  2. \(g(\alpha + 1) = f(g(\alpha))\)
  3. \(g(\beta) = \sup \{ g(\alpha) : \alpha < \beta\}\) when \(\beta\) is a limit ordinal (this is well defined because g produces a chain by construction and \(X\) is chain complete)

\(g\) is, by construction and that \(x \leq f(x)\), a monotone increasing function. i.e. \(\alpha \leq \beta \implies g(\alpha) \leq g(\beta)\). It’s not injective, therefore we can find some \(\alpha < \beta\) with \(g(\alpha) = g(\beta)\). Then \(\alpha + 1 \leq \beta\), so \(g(\alpha) \leq g(\alpha + 1) \leq g(\beta) = g(\alpha)\). Therefore \(g(\alpha) = g(\alpha + 1) = f(g(\alpha))\) and \(g(\alpha)\) is our desired fixed point (and is \(\geq x\) because \(g(0) = x\). QED Now, how do we use this? Well, it provides a very nice proof of Zorn's lemma by way of a closely related result.

Hausdorff maximality theorem

Let \(X\) be a poset. Every chain \(C \subseteq X\) is contained in a maximal chain (that is one not properly contained in any other chain).

This is a theorem very closely related to Zorn’s lemma – either can very easily be proved from the other. But Zorn’s lemma is a simple corollary of Hausdorff’s maximality theorem (whileas the other direction requires at least a bit of machinery), and this way round we get to apply a nice fixed point theorem to give a very simple proof.

Proof:

Let \(\mathcal{C}\) be the set of chains of X ordered by inclusion.

Claim: This is a chain complete poset.
Proof: Let \(\mathcal{A} \subseteq \mathcal{C}\) be a chain. Let \(A = \bigcup \mathcal{A}\). If \(x, y \in A\) then \(x, y \in B\) for some \(B \in \mathcal{A}\) (because it’s a chain, so find one containing each and take the larger of the two). Therefore \(x \leq y\) or \(y \leq x\). Hence \(A \in \mathcal{C}\). Clealy it’s a least upper bound for \(\mathcal{A}\) as it is one in the poset \(\mathcal{P}(X)\).

So, we’ll now apply the Bourbaki Witt fixed point theorem. Define \(f : \mathcal{C} \to \mathcal{C}\) as follows:

  1. If A is maximal, \(f(A) = A \)
  2. If A is not maximal, use the axiom of choice to arbitrarily pick some chain strictly containing \(A\) as \(f(A)\)

By construction \(f(A) \supseteq A\), by the above \(\mathcal{C}\) is chain complete, therefore by Bourbaki Witt \(f\) has a fixed point above every element of \(\mathcal{C}\). The elements of \(\mathcal{C}\) are the chains of \(X\) and the fixed points of \(f\) must be maximal. Hence the result is proved.

QED

As promised, Zorn’s lemma is just a simple corollary:

Zorn’s Lemma

Let \(X\) be a poset such that every chain has an upper bound (not necessarily a least one). \(X\) has a maximal element.

Proof:

Pick a maximal chain, \(C\). Let \(x\) be an upper bound for \(C\). If there were some element \(y \) with \(x < y\) then \(C \cup \{y\}\) would be a chain strictly containing \(C\). Therefore no such \(y\) exists and \(x\) is maximal.

Conclusion

A lot of the proofs presented for both of these results are pretty inscrutable. They’re not hard to understand, but there are quite a lot of details to get right.

I’ve got to admit, this mostly exhausts my list of cute things in set theory you can prove by way of fixed point theorems. I’m sure there are more, and I’d be interested to hear about them, but it’s been a while since I’ve looked at this and these are the only two I remember.

Update: In talking to Imre Leader, who taught me the course in which I learned Bourbaki Witt, he points out that as far as set theory is concerned Bourbaki Witt is almost entirely useless. It can essentially be regarded as “the choice free part of Zorn’s lemma”, and as a result no one actually uses it because the hypotheses of Zorn’s lemma are so much easier to verify.

This entry was posted in Numbers are hard on by .

“Agda is now mainstream”

Various people on the internets have been pointing out that the results for this is a mainstream language on hammer principle are a bit weird. In particular Agda was showing up as number 10 for it which if you know of Agda you’ll realise how ridiculous it is and if you don’t, well exactly.

It’s not there any more, as I tweaked the algorithm slightly to prune statement/item pairs with very little data, but the basic problem of the confusingness of this particular statement remains. The problem is alluded to on the text of the page, but I expect no one reads that text anyway. Oh well. So, quoted here:

For items where we’ve got a lot of responses the numbers should be pretty good. For the rest you can probably just expect them to hang around the middle of the rankings wondering what they’re doing here.

This is because the way the ranking algorithm works is that it sorts by an initial score and then looks for evidence to move items from that point. If there’s little data that score will be about in the middle and there’ll be little evidence to move it.

This is true on all the pages, but the problem with this particular one is that in this case lack of data is correlated with the correct position of the language in the ranking. This means that particularly non mainstream languages will lack data and thus will be sitting there in the middle, while as only mildly non mainstream languages will be parked down at the bottom because there was a lot of evidence that they weren’t mainstream.

The main problem here is not really the ranking algorithm (although it could be improved. I have an improvement in the works, but it’ll take a while to plumb in), it’s that the site doesn’t make it clear when an item’s position in a list is really more of a rough guess than a reality. I don’t currently have any good ideas as to how to indicate that, so the status quo will probably remain for now.

It’s also worth noting that the aggregate rankings are really more an amusement than they are a reliable source of decision making (especially now that the comparison pages provide you with more hard data). They should inspire, and they should lead your line of inquiry, but certainly don’t take them as rock solid fact. Even if they captured and presented the data perfectly you’re still just looking at a bunch of answers to subjective question from a bunch of randoms on the internet. Fun and interesting, but not necessarily representative of The Truth.

This entry was posted in programming on by .

New, more principled, hammers

So, do you remember Hammer Principle? That thing Mike and I did about two years ago?

Well, it’s still around. It even gets a fair bit of traffic. It averages about 100 unique visitors a day, which isn’t nothing, and every now and then reddit or hacker news remember it exists and the world briefly descends on it.

Which makes it pretty embarrassing how much we’ve been neglecting it. It’s largely just been trucking along, doing its thing, and we’ve been cheerfully ignoring it. It had a few bugs, but it basically worked and people liked it so we weren’t that worried.

We briefly tried to do a relaunch last year when the one year anniversary came up, but we got trapped into a “REDESIGN ALL THE THINGS” plan and it ended up floundering. Too much work to get back to where we started.

But not this time! Mike has been getting very excited about The Lean Startup recently, so when we got interested in doing some more work on Hammer Principle we decided to do it that way. We’ve not been terribly strict about it, but we’ve started measuring some things and making little tweaks in response to that.

To be honest the main benefit for me has been the instant feedback. Not even from the measuring so much (though it’s surprisingly addictive), just the fact that we’re iterating in small changes and features. It’s nice to see rapid progress on it again, and it motivates us to do more and to make yet more rapid progress.

Here are some of the things we’ve done:

  • Tweaked the URL schema. Not exciting for you, but google analytics and subdomain based grouping were not friends which made it hard to tell what’s going on (all old URLs should now redirect to new ones. Tell me if they don’t or you find broken links we’ve missed
  • New navigation bar between nails. It was a little hard to tell due to the aforementioned issues with old URL schema, but as far as we could tell basically no one navigated between nails. If you came for the programming languages you were unlikely to stay for the databases. This is an attempt to fix that, and it seems to be helping
  • Redesigned the colour scheme. The old one was a bit garish – it kinda made sense at the time, but as we added different colours for different nails it made much less
  • Added random assertions to the home page, giving people examples of the sort of fact the system throws out
  • Complete redesign of the comparison pages. More on this in a moment

The comparison page is the one that excites me the most. Partly because it’s the biggest change, partly because I thought the old ones were a bit shit, but mostly because I think the new ones are really quite cool.

The old comparison pages were basically a list: The top 10 statements people thought each item was better at than another. This lead to a lot of conclusion because they looked like they were suggesting these statements were empirically good for that language when that often wasn’t the case (when you’re comparing assembler and C for “this language would be good for a web project” you know you’re in trouble). It also wasn’t very informative.

The new ones however give you much more information about the breakdown between the two sides, and with less of an implication that either are good for it. It also gives you more data about what’s going on in a reasonably easy to digest way. I think the overall result is a much clearer picture of the differences between the things you’re comparing. Try comparing Ruby and Python, Git and Mercurial or Judo and Aikido.

Hopefully this is just the start. We’re building up a list of things to try and experiments to run and the result should be a variety of changes coming in the near future.

This entry was posted in Uncategorized on by .