Author Archives: david

A heuristic for problem solving

This is a heuristic for choosing what steps to take during a debugging or problem solving process. I think it is too laborious to calculate in practice, but it might be interesting either for machine learning or for some sort of interactive problem solving system that talks you through the steps (indeed my friend Dave and I have talked about building such a thing. We’ve tentatively settled on the name Roboduck).

I also suspect I’m reinventing somebody’s wheel here, but I’m not sure whose. If nothing else it pretty strongly resembles some of the maths you would use for building decision trees.

It started after I read John Regehr’s how to debug. In it he illustrates the debugging process with what are basically probability distributions over likely causes of bugs.

One particularly interesting paragraph is about how you choose what experiments to perform in order to eliminate hypotheses:

Hopefully the pattern is now clear: based on our intuition about where the bug lies, we design an experiment that will reject roughly 50% of the possible bugs. Of course I’m oversimplifying a bit. First, experiments don’t always have yes/no results. Second, it may be preferable to run an easy experiment that rejects 15% of the possible bugs rather than a difficult one that rejects exactly 50% of them. There’s no wrong way to do this as long as we rapidly home in on the bug.

It occurred to me that, although there was no wrong way to do it, clearly some ways of doing it were better than others, and we can actually make this mathematically precise.

Basically, when you have a probability distribution over what you think the state of the world is, it is possible to assign a number that almost literally means “How confused am I?” (more precisely, it means “How much additional information do I expect to need to figure out what’s going on if I proceed optimally?”): the Shannon entropy.

Your goal is to get the entropy of your current set of hypotheses down to 0 (I know what is going on).

This means that a fairly sensible way to proceed is to maximize the rate of decrease of entropy (it isn’t necessarily optimal, because this is a purely local search which you might be able to escape by finding a better starting point, but it should be pretty good).

So, how do you choose which experiment to perform? You look at the expected decrease in entropy and divide it by the time it will take you to perform the experiment.

That’s pretty much it, but let’s do some worked examples.

Suppose we have \(N\) equally likely possibilities, and an experiment will let us determine if it’s one of \(k\) of them. That is, if the experiment gives one answer we will have \(k\) equally likely possibilities, and if it gives the other we will have \(N - k\) equally likely possibilities. Let’s write \(p = \frac{k}{N}\) for convenience.

Our current entropy is \(\log(N)\) (all logs will be base 2).

We expect to get the first answer with probability \(p\), and the second answer with probability \(1 - p\).

For the first answer we have entropy \(\log(k)\). For the second we have entropy \(\log(N - k)\).

So the expected reduction in entropy is

\[
\begin{align*}
E(h) &= \log(N) - p \log(k) - (1 - p) \log(N - k) \\
&= p(\log(N) - \log(k)) + (1 - p)(\log(N) - \log(N - k)) \\
&= -p \log(p) - (1 - p) \log(1 - p)\\
\end{align*}
\]

Which it turns out is exactly the entropy of the experimental result (that is, our experimental result is a random variable which takes one value with probability \(p\) and the other with probability \(1 - p\), and this is the entropy of such a variable).

This is a nice result which is unfortunately completely false in general: the information gained is not always equal to the entropy of the experiment’s outcome. To see this, consider an experiment where we flip a coin and observe the result. This outcome has entropy \(\log(2) = 1\), which is at least as high as that of any of the experiments above, but it gives us precisely 0 information about our hypotheses.

Let’s continue with our example and use it to analyze the example in the quote:

Suppose we have an experiment that would let us eliminate 50% of cases. This gives us an expected information gain of \(1\) bit. Now suppose we have an experiment that would let us eliminate either 15% or 85% of cases. This has an information gain of \(-0.15 \log(0.15) - 0.85 \log(0.85) \approx 0.61\). So the former experiment is about \(1.64\) times as good as the latter, and should be preferred unless you can perform the latter experiment at least \(1.64\) times as quickly.
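To make this concrete, here is a minimal sketch in Python (the function names are my own) of the information-gain calculation, using base-2 logs as above:

```python
from math import log2

def information_gain(p: float) -> float:
    """Expected reduction in entropy (in bits) from a yes/no experiment
    whose outcome splits the hypotheses with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0  # a certain outcome tells us nothing
    return -p * log2(p) - (1 - p) * log2(1 - p)

def experiment_value(p: float, cost: float) -> float:
    """Expected bits gained per unit of time spent on the experiment."""
    return information_gain(p) / cost

print(information_gain(0.5))    # 1.0 bit for the 50/50 experiment
print(information_gain(0.15))   # about 0.61 bits for the 15/85 one
# A less balanced experiment is still the better choice if it is cheap enough:
print(experiment_value(0.5, 60) < experiment_value(0.15, 30))  # True
```

The division by cost in `experiment_value` is exactly the “expected decrease in entropy divided by the time to perform the experiment” rule from above.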

This entry was posted in Numbers are hard, rambling nonsense.

A file that should exist in all your ruby projects

Alright, the title is maybe a little more general than is strictly valid. I’m making the assumption that if you’re writing ruby then you are, like me, the sort of trend-bucking nonconformist that does exactly the same thing as all the other trend-bucking nonconformists.

Specifically I’m assuming that if you are using ruby you’re also using bundler (you should be. It is the way and the truth and the light. There is no salvation save through bundler), and you are using git. If you’re not using the former you should fix that (did I mention you should fix that?). If you’re not using the latter then this might be worth reading anyway, but the specific file you’re going to need is different.

Anyway, at the root of your git repo you should have a file called “.gitattributes”. That file may contain various things, but the line it needs to contain is

Gemfile.lock -merge

What’s going on here?

Well, the Gemfile.lock is basically a compiled and pinned down version of your Gemfile. You’re supposed to commit it to your repo so as to get a consistent gem environment across all the different platforms you run on.

The problem comes when you’re working with other people, or even just on different branches, and you make changes to the Gemfile (or even just the Gemfile.lock) on each of those branches. You might get away with it, but there’s a good chance that when you merge the branches git will silently merge your Gemfile.lock files, because the lock file is a text format and git assumes text is safe to merge.

Sometimes this will cause you no problems, or will cause problems you notice very quickly. But often it will produce a Gemfile.lock that bundler appears to cope with, and then much later down the line you will get really confusing bundler errors when you try to use it in a slightly different context (we’ve found, for example, that having two incompatible versions of a gem in your Gemfile.lock can cause confusingly different results depending on what’s installed on your system).

So what does this .gitattributes line do? Simple: it tells git never to merge your Gemfile.locks. For merge purposes it treats them as binary files and generates a conflict, thus localising the error to the point where the problem occurred instead of some distant point down the line.
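You can watch the attribute work in a throwaway repo (this is a sketch assuming git is installed; the gem names and lock file contents are made up, and a real Gemfile.lock looks nothing like this):

```shell
# Build a demo repo where two branches each change a different line
# of the lock file, which a plain text merge would silently combine.
set -e
cd "$(mktemp -d)"
git init -q repo && cd repo
git config user.email dev@example.com
git config user.name dev

# The one important line:
echo 'Gemfile.lock -merge' > .gitattributes
printf 'rack (1.0.0)\nrake (10.0.0)\n' > Gemfile.lock
git add -A && git commit -qm 'base'

# On a feature branch, change one line of the lock file...
git checkout -qb feature
printf 'rack (1.1.0)\nrake (10.0.0)\n' > Gemfile.lock
git commit -qam 'bump rack'

# ...and on the original branch, change a different line.
git checkout -q -
printf 'rack (1.0.0)\nrake (10.1.0)\n' > Gemfile.lock
git commit -qam 'bump rake'

# With the merge attribute unset, git stops with a conflict
# instead of silently combining the two edits.
git merge feature || echo 'conflict in Gemfile.lock, as desired'
git ls-files -u    # non-empty: Gemfile.lock is left unmerged
```

After the conflict you re-run `bundle install` to regenerate a consistent lock file, rather than trusting a line-by-line merge.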

This entry was posted in programming.

What are developers bad at?

I’m currently trying to put together a post about different modes of thinking that I think developers are not very good at and traditionally leave to other roles to prop them up on, but I’m not having much luck. It occurred to me that a large part of the problem here was that I am, in fact, a developer so most of my perspective of this is from the inside looking out. I’d like to fix that.

As I’ve previously expressed, I think having a good team who you can rely on to support you in areas you’re bad at is a wonderful thing, but I also think that relying on them too much and not meeting them half way creates unnecessary blockages and inefficiencies in your workflow: you’ll have to wait for someone else to do something that would really have taken you 5 minutes, so it will instead take several hours and may be indefinitely delayed.

When I’d previously written about this I was thinking in terms of different types of development, but it now occurs to me that exactly the same problem occurs when developers work with non-developers.

So, people who work closely with developers: What are we bad at that you’d like us to meet you half way on? What would it make your life easier if we all got a little bit better at? Can you offer concrete advice on how to do so?

This entry was posted in programming, Work.

So I accidentally designed a voting system

…and I have no idea what its properties are and whether it’s a good idea.

I was wondering how you might extend random ballot to multi-winner elections: specifically, the case where you have \(C\) candidates and you want to elect \(N\) of them. The following is the system I hit upon:

Voters provide a list of candidates they would like elected, in order of preference for how strongly they would like them to be elected. They do not have to rank all the candidates, though strategically I think it’s advantageous to rank the full candidate list.

We now run the following algorithm:

While we have elected fewer than N candidates, iterate the following procedure:

  1. Remove all votes that contain only candidates who have already been elected.
  2. If we have no votes left, something has gone terribly wrong and we can’t actually elect enough candidates.
  3. Now pick a random voter.
  4. Add the voter’s highest ranked preference who has not yet been elected to the list of elected candidates.

Note that the sampling is with replacement: We do not remove a voter when we pick their vote.
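The procedure above is short enough to sketch directly in Python (the function name and the choice to signal failure with a `ValueError` are mine):

```python
import random

def elect(votes, n, rng=random):
    """Elect n candidates by repeated random ballot.

    votes: a list of preference lists, most preferred first.
    Sampling is with replacement: choosing a voter never removes them.
    """
    elected = []
    while len(elected) < n:
        # Drop votes naming only already-elected candidates.
        live = [v for v in votes if any(c not in elected for c in v)]
        if not live:
            raise ValueError("cannot elect enough candidates")
        ballot = rng.choice(live)
        # The voter's highest-ranked preference not yet elected wins a seat.
        elected.append(next(c for c in ballot if c not in elected))
    return elected

print(elect([["a", "b"], ["b", "a"], ["c"]], 2, random.Random(0)))
```

Passing an explicit `random.Random` instance makes runs reproducible for experimentation, which seems useful if you want to study the system’s properties empirically.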

I currently have no idea what the properties of this procedure are, but I think it might be interesting. My intuition is that it’s probably strategy proof in the sense that you have no incentive to vote other than your true preference (if so, the sampling with replacement is crucial for this – otherwise you might rank a less preferred but also less widely popular candidate first), but I haven’t worked through the details so there’s a high likelihood of this being wrong.

This entry was posted in voting.

Different ways of defining topologies

This is almost certainly not new, but it’s something I’ve been thinking about recently and haven’t seen before (it’s pretty close to a treatment in terms of closure operators, which I have seen before but not as a pedagogical approach).

Normally a topology is defined as a family of open sets. This is unfortunate, as open sets are in many ways the least intuitive topological object when you first encounter them. I was thinking about what a more intuitive way of defining a topology might look like, and I hit upon the following approach.

Given a set \(X\), a topology is a binary relation between points of \(X\) and subsets of \(X\). We will call this relation “touches”.

The intuitive idea is that x touches A if A contains points “arbitrarily close to” x, a term which we will leave undefined for the moment and which is only intended to be an intuitive justification. For example, if you consider the open interval \((0, 1) \subseteq \mathbb{R}\), this touches the points 0 and 1 despite not containing them, but it does not touch the point 2.

We require this relationship to obey the following axioms:

  1. Nothing touches the empty set
  2. If \(x \in A\) then \(x\) touches \(A\)
  3. If \(x\) touches \(A\) and \(A \subseteq B\) then \(x\) touches \(B\)
  4. If \(x\) touches \(A \cup B\) then \(x\) touches \(A\) or \(B\)
  5. If every point \(x \in B\) touches \(A\) and \(y\) touches \(B\) then \(y\) touches \(A\)

The last two are hopefully the only ones that aren’t immediately obvious. I don’t really have a good justification for 4 at the moment, but hopefully it’s not too unreasonable. The reasoning behind 5 is to exclude things where you can “make the set much larger” by adding in nearby points.

For example, consider \(X = \mathbb{N}\), with \(x\) touching \(A\) if it’s within distance 1 of some element of \(A\). Then by repeatedly adding in nearby points you would eventually fill up the whole set. This is something we want to avoid.
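This relation’s failure of axiom 5 is easy to check concretely. A quick sketch, cutting \(\mathbb{N}\) down to a finite range for the computation:

```python
def touches(x, a):
    """The 'within distance 1 of some element' relation on the naturals."""
    return any(abs(x - n) <= 1 for n in a)

A = {0}
# B: every point (up to a cutoff) that touches A.
B = {x for x in range(10) if touches(x, A)}
print(B)                                  # {0, 1}
print(all(touches(b, A) for b in B))      # True: every point of B touches A
print(touches(2, B))                      # True: 2 touches B...
print(touches(2, A))                      # False: ...but 2 does not touch A
```

Every point of \(B\) touches \(A\), yet 2 touches \(B\) without touching \(A\), which is exactly what axiom 5 forbids.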

A topological space is then a set \(X\) together with this touches relation. We will tend to abuse notation by identifying the space with its underlying set and letting the topology be assumed.

I’m now going to show how this definition relates to the normal ones.

Let \(X\) be a topological space defined as above. Say that a set \(A \subseteq X\) is closed if \(x \in A\) whenever \(x\) touches \(A\).

We’ll prove the following properties of closed sets:

Proposition: \(\emptyset\) and \(X\) are both closed.

This is true almost by definition. \(X\) certainly contains every point in the space, so contains every point it touches. No points touch the empty set and thus every point it touches is contained in it.

Proposition: If \(\{ A_i : i \in \mathcal{I} \}\) are all closed, then so is \(\bigcap_{i \in \mathcal{I}} A_i \)

Proof: Suppose \(x\) touches \(\bigcap_{i \in \mathcal{I}} A_i \). For any \(i\) this set is a subset of \(A_i\), and thus by axiom 3, \(x\) also touches \(A_i\). Thus \(\forall i. x \in A_i \). Therefore \(x \in \bigcap_{i \in \mathcal{I}} A_i \).

Proposition: If \(A, B\) are closed then so is \(A \cup B\).

Proof: If \(x\) touches \(A \cup B\) then by axiom 4 it touches at least one of \(A\) or \(B\) and thus because they are closed is contained in at least one of \(A\) or \(B\). Therefore \(x \in A \cup B\).

So our closed sets satisfy all the usual conditions for closed sets of a normally defined topological space, thus their complements satisfy all the axioms for open sets in a topological space.

Now how do we recover the “touches” relationship from a normal topological space?

Well if we define the closure of a set as normal as the smallest closed set containing it, i.e.

\[ cl(A) = \bigcap \{ B \supseteq A : B \text{ is closed } \} \]

Then we can define a touches relationship as \(x\) touches \(A\) if \(x \in cl(A)\).

Does this satisfy all our axioms?

First we must prove the following properties of closure operators:

  1. \(A = cl(A)\) iff \(A\) is closed
  2. \(A \subseteq cl(A) \)
  3. \(A \subseteq B \) implies \(cl(A) \subseteq cl(B)\)
  4. \(cl(A \cup B) = cl(A) \cup cl(B)\)
  5. \(cl(cl(A)) = cl(A)\)

The first three properties are pretty trivial consequences of the definitions and the properties of closed sets.

To prove 4: First note that \(cl(A) \cup cl(B)\) is a closed set (being a union of two closed sets) containing \(A \cup B\), so certainly \(cl(A \cup B) \subseteq cl(A) \cup cl(B)\). Further, because \(A, B \subseteq A \cup B\) we have \(cl(A), cl(B) \subseteq cl(A \cup B)\), so \(cl(A) \cup cl(B) \subseteq cl(A \cup B)\). So we’ve proved the inclusion both ways and the two are equal.

To prove 5: This is a simple consequence of the fact that the closure of a set is closed.

We can now show that the touches relationship we’ve defined obeys all the properties we asked for.

  1. \(\emptyset\) is closed, so \(cl(\emptyset) = \emptyset\) and no points touch it.
  2. \(A \subseteq cl(A)\), so every point in \(A\) touches \(A\)
  3. \(cl(A) \subseteq cl(B)\) so if \(x \in cl(A)\) then \(x \in cl(B)\)
  4. If \(x \in cl(A \cup B)\) then \(x \in cl(A) \cup cl(B)\), so \(x \in cl(A)\) or \(x \in cl(B)\), so \(x\) touches \(A\) or \(B\)
  5. If every \(x \in B\) touches \(A\) then \(B \subseteq cl(A)\), so \(cl(B) \subseteq cl(cl(A)) = cl(A)\). So if \(y\) touches \(B\) then \(y \in cl(B)\) and so \(y\) touches \(A\)

So we get a proper touches relation back as we wanted.
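For a finite space this round trip can be verified mechanically. Here is a sketch in Python, using a small example topology of my own choosing, that derives the touches relation from the closure operator and checks all five axioms by brute force:

```python
from itertools import chain, combinations

X = frozenset({0, 1, 2})
# Closed sets of a small example topology on X (the family contains
# both extremes and is closed under finite unions and intersections).
CLOSED = [frozenset(), frozenset({0}), frozenset({0, 1}), X]

def subsets(s):
    s = list(s)
    return [frozenset(c)
            for c in chain.from_iterable(combinations(s, r)
                                         for r in range(len(s) + 1))]

def cl(a):
    """Closure: the intersection of all closed supersets of a."""
    result = X
    for c in CLOSED:
        if a <= c:
            result &= c
    return result

def touches(x, a):
    return x in cl(a)

# Check all five axioms over every point and pair of subsets.
for a in subsets(X):
    assert not any(touches(x, frozenset()) for x in X)             # 1
    assert all(touches(x, a) for x in a)                           # 2
    for b in subsets(X):
        if a <= b:                                                 # 3
            assert all(touches(x, b) for x in X if touches(x, a))
        for x in X:                                                # 4
            if touches(x, a | b):
                assert touches(x, a) or touches(x, b)
        if all(touches(x, a) for x in b):                          # 5
            assert all(touches(y, a) for y in X if touches(y, b))
print("all five axioms hold")
```

The brute-force check is only feasible because the space is tiny, but it is a nice sanity test of the proofs above.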

Now the next question is: Do these define the same thing? If we start from a touches relation and go to a set of closed sets then back again, do we get the same relation back? And vice versa?

The answer is still yes. Time for more definition chasing!

We’ve already seen that a set is closed iff it is equal to its closure, and that we can construct a touches relation from the closure. So if we show that the touches relation also uniquely defines the closure operator then we’re done.

So for this we need a theorem:

Given the closed sets constructed from our touches operator, \(cl(A) = \{ x : x \text{ touches } A \} \)

Proof:

This is where we will finally have to use our fifth axiom for touching (we haven’t so far). In order to see that it’s necessary, consider our example with \(\mathbb{N}\) and touching meaning “within distance 1”. Then in this case the closed sets we will generate will be only the sets \(\emptyset\) and \(\mathbb{N}\), so the touches relationship we will get back is that every point touches every set.

Now, suppose \(x\) touches \(A\). Then by axiom 3 it touches every closed set containing \(A\), and because those sets are closed it is contained in each of them, and hence in their intersection, the closure. So \(cl(A) \supseteq \{ x : x \text{ touches } A \} \). The right hand side contains \(A\) by axiom 2, so if we show it is also closed then it must contain the closure, and we will be done.

But this is almost immediate from the 5th axiom: Suppose \(y\) touches \( \{ x : x \text{ touches } A \}\). Then certainly every element of this set touches \(A\) (by definition). Therefore \(y\) touches \(A\). Therefore \(y \in \{ x : x \text{ touches } A \}\). So the set contains every element that touches it, and the result is proved.

So this means that \(x\) touches \(A\) if and only if \(x \in cl(A)\) with the closure operator we’ve defined, and for every closure operator we can define a touches relationship which satisfies this. So we’re done.

We can also nicely define continuity in terms of the touches relationship.

Given two topological spaces \(X\) and \(Y\), we can then define a function \(f : X \to Y \) to be continuous if whenever \(x \in X, A \subseteq X\) with \(x\) touching \(A\), we have \(f(x)\) touching \(f(A)\). i.e. a function is continuous if it doesn’t pull points away from sets.

How does this connect to the normal definition of continuity?

First, suppose \(f\) is continuous under this definition, and let \(A \subseteq Y\) be closed. Suppose \(x\) touches \(f^{-1}(A)\). Then \(f(x)\) touches \(f(f^{-1}(A)) \subseteq A\), so \(f(x)\) touches \(A\) and is thus in \(A\) (because it is closed), so \(x \in f^{-1}(A)\).

Therefore if \(f\) is continuous under this definition then whenever \(A\) is closed then \(f^{-1}(A)\) is also closed.

Now to show the converse. Suppose that whenever \(A\) is closed then \(f^{-1}(A)\) is also closed, we will show \(f\) is continuous.

Suppose that \(x\) touches \(A\) but \(f(x)\) does not touch \(f(A)\). Let \(B = cl(f(A))\). Then \(f(x) \not\in B\). But \(f^{-1}(B)\) is a closed set, and it contains \(A\). Therefore it must contain \(cl(A)\). But \(x \in cl(A)\), so \(x \in f^{-1}(B)\), i.e. \(f(x) \in B\), contradicting \(f(x) \not\in B\). QED

This entry was posted in Numbers are hard.