Category Archives: Numbers are hard

A heuristic for problem solving

This is a heuristic for how to choose what steps to take during a debugging or problem solving process. I think it is too laborious to calculate in practice, but it might be interesting either for machine learning or for some sort of interactive problem solving system which talks you through the steps (indeed my friend Dave and I have talked about building such a thing. We’ve tentatively settled on the name Roboduck).

I also suspect I’m reinventing somebody’s wheel here, but I’m not sure whose. If nothing else it pretty strongly resembles some of the maths you would use for building decision trees.

It started after I read John Regehr’s how to debug. In it he illustrates the debugging process with what are basically probability distributions over likely causes of bugs.

One particularly interesting paragraph is about how you choose what experiments to perform in order to eliminate hypotheses:

Hopefully the pattern is now clear: based on our intuition about where the bug lies, we design an experiment that will reject roughly 50% of the possible bugs. Of course I’m oversimplifying a bit. First, experiments don’t always have yes/no results. Second, it may be preferable to run an easy experiment that rejects 15% of the possible bugs rather than a difficult one that rejects exactly 50% of them. There’s no wrong way to do this as long as we rapidly home in on the bug.

It occurred to me that, although there was no wrong way to do it, clearly some ways of doing it were better than others, and we can actually make this mathematically precise.

Basically, when you have your probability distribution about what you think the state of the world is, it is possible to assign a number that almost literally means “How confused am I?” (actually it means “How much additional information do I expect to need to figure out what’s going on if I proceed optimally?”). The Shannon entropy.

Your goal is to get the entropy of your current set of hypotheses down to 0 (I know what is going on).

This means that a fairly sensible way to proceed is to maximize the rate of decrease of entropy (it isn’t necessarily correct because this is a purely local search, which you might be able to improve on by finding a better starting point, but it should be pretty good).

So, how do you choose which experiment to perform? You look at the expected decrease in entropy and divide it by the time it will take you to perform the experiment.
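That rule is easy to sketch in code. In this sketch (the hypothesis labels are made up, and an "experiment" is modelled, simplistically, as a partition of the hypotheses by outcome plus a time cost) we pick the experiment with the greatest expected entropy decrease per unit time:

```python
from math import log2

def entropy(dist):
    # Shannon entropy (in bits) of a distribution {hypothesis: probability}.
    return -sum(p * log2(p) for p in dist.values() if p > 0)

def expected_gain(prior, partition):
    # `partition` maps each hypothesis to the outcome the experiment would
    # give if that hypothesis were true.  The gain is the current entropy
    # minus the expected entropy of the posterior.
    by_outcome = {}
    for h, o in partition.items():
        by_outcome.setdefault(o, {})[h] = prior[h]
    expected_posterior = 0.0
    for group in by_outcome.values():
        weight = sum(group.values())
        conditional = {h: p / weight for h, p in group.items()}
        expected_posterior += weight * entropy(conditional)
    return entropy(prior) - expected_posterior

def best_experiment(prior, experiments):
    # Each experiment is a (partition, time_cost) pair; pick the one with
    # the greatest expected decrease in entropy per unit time.
    return max(experiments, key=lambda e: expected_gain(prior, e[0]) / e[1])

# Four equally likely hypotheses: a slow even split vs. a fast uneven one.
prior = {"a": 0.25, "b": 0.25, "c": 0.25, "d": 0.25}
halves = {"a": 0, "b": 0, "c": 1, "d": 1}   # rejects 50%, takes 1 unit of time
one_out = {"a": 0, "b": 1, "c": 1, "d": 1}  # rejects 25% or 75%, takes 0.5
chosen = best_experiment(prior, [(halves, 1.0), (one_out, 0.5)])  # picks one_out
```

Here the cheap uneven experiment wins on gain per unit time even though the even split gains more per run.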

That’s pretty much it, but let’s do some worked examples.

Suppose we have \(N\) equally likely possibilities, and an experiment will let us determine if it’s one of \(k\) of them. That is, if the experiment gives one answer we will have \(k\) equally likely possibilities, and if it gives the other we will have \(N - k\) equally likely possibilities. Let’s write \(p = \frac{k}{N}\) for convenience.

Our current entropy is \(\log(N)\) (all logs will be base 2).

We expect to get the first answer with probability \(p\), and the second answer with probability \(1 - p\).

For the first answer we have entropy \(\log(k)\). For the second we have entropy \(\log(N - k)\).

So the expected decrease in entropy (the expected information gain) is

\[
\begin{align*}
E(h) &= \log(N) - p \log(k) - (1 - p) \log(N - k) \\
&= p(\log(N) - \log(k)) + (1 - p)(\log(N) - \log(N - k)) \\
&= -p \log(p) - (1 - p) \log(1 - p)\\
\end{align*}
\]

Which it turns out is exactly the entropy of the experimental result (that is, our experimental result is a random variable which takes one value with probability \(p\) and the other with probability \(1 - p\), and this is the entropy of such a variable).

This is a nice result which unfortunately does not hold in general. In order to see this, consider running an experiment where we flip a coin and observe the outcome: this experiment's outcome has entropy \(\log(2)\), which is at least as high as that of the preceding experiment, but it gives us precisely 0 information about our hypotheses.

Let’s continue with our example and use it to analyze the example in the quote:

Suppose we have an experiment that would let us eliminate 50% of cases. This gives us an expected information gain of \(1\) bit. Now suppose we have an experiment that would let us eliminate 15% or 85% of cases. This has an information gain of \(-\log(0.15) * 0.15 + -\log(0.85) * 0.85 \approx 0.61\). So the former experiment is about \(1.64\) times as good as the latter experiment, and should be preferred unless you can perform the latter experiment at least \(1.64\) times as quickly.
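These numbers are quick to check in Python, computing the gains in base 2 throughout:

```python
from math import log2

def binary_gain(p):
    # Expected information gain (in bits) of an experiment whose two
    # outcomes, with probabilities p and 1 - p, exactly split the
    # remaining possibilities.
    return -(p * log2(p) + (1 - p) * log2(1 - p))

even = binary_gain(0.5)     # the 50/50 experiment: exactly 1 bit
skewed = binary_gain(0.15)  # the 15%/85% experiment
ratio = even / skewed
```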

This entry was posted in Numbers are hard, rambling nonsense on by .

Different ways of defining topologies

This is almost certainly not new, but it’s something I’ve been thinking about recently and haven’t seen before (it’s pretty close to a treatment in terms of closure operators, which I have seen before but not as a pedagogical approach).

Normally a topology is defined as a family of open sets. This is unfortunate, as open sets are in many ways the least intuitive topological object when you first encounter them. I was thinking in terms of what a more intuitive way of defining topology would be and I hit upon the following approach.

Given a set X, a topology is a binary relationship between points in X and subsets of X. We will call this relation “touches”.

The intuitive idea is that x touches A if A contains points “arbitrarily close to” x, a term which we will leave undefined for the moment and which is only intended to be an intuitive justification. For example, if you consider the open interval \((0, 1) \subseteq \mathbb{R}\), this touches the points 0 and 1 despite not containing them, but it does not touch the point 2.

We require this relationship to obey the following axioms:

  1. Nothing touches the empty set
  2. If \(x \in A\) then \(x\) touches \(A\)
  3. If \(x\) touches \(A\) and \(A \subseteq B\) then \(x\) touches \(B\)
  4. If \(x\) touches \(A \cup B\) then \(x\) touches \(A\) or \(B\)
  5. If every point \(x \in B\) touches \(A\) and \(y\) touches \(B\) then \(y\) touches \(A\)

The last two are hopefully the only ones that aren’t immediately obvious. I don’t really have a good justification for 4 at the moment, but hopefully it’s not too unreasonable. The reasoning behind 5 is to exclude things where you can “make the set much larger” by adding in nearby points.

For example, you could consider \(X = \mathbb{N}\), and that \(x\) touches \(A\) if it’s within distance 1 of some element of \(A\). Then by repeatedly adding in points you would eventually fill up the whole set. This is something we want to avoid.
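The failure of axiom 5 for this relation is easy to check mechanically. A quick sketch (the specific sets here are just illustrative):

```python
def touches(x, A):
    # The "within distance 1 of some element of A" relation on the naturals.
    return any(abs(x - a) <= 1 for a in A)

A = {5}
B = {x for x in range(10) if touches(x, A)}  # B = {4, 5, 6}

# Every point of B touches A, and 3 touches B (since |3 - 4| <= 1)...
assert all(touches(b, A) for b in B)
assert touches(3, B)
# ...but 3 does not touch A, so axiom 5 fails for this relation.
assert not touches(3, A)
```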

A topological space is then a set X together with this touches relation. We will tend to abuse notation by identifying the space with its underlying set and letting the topology be assumed.

I’m now going to show how this definition relates to the normal ones.

Let \(X\) be a topological space defined as above. Say that a set \(A \subseteq X\) is closed if \(x \in A\) whenever \(x\) touches \(A\).

We’ll prove the following properties of closed sets:

Proposition: \(\emptyset\) and \(X\) are both closed.

This is true almost by definition. \(X\) certainly contains every point in the space, so contains every point it touches. No points touch the empty set and thus every point it touches is contained in it.

Proposition: If \(\{ A_i : i \in \mathcal{I} \}\) are all closed, then so is \(\bigcap_{i \in \mathcal{I}} A_i \)

Proof: Suppose \(x\) touches \(\bigcap_{i \in \mathcal{I}} A_i \). For any \(i\) this set is a subset of \(A_i\), and thus by axiom 3, \(x\) also touches \(A_i\). Thus \(\forall i. x \in A_i \). Therefore \(x \in \bigcap_{i \in \mathcal{I}} A_i \).

Proposition: If \(A, B\) are closed then so is \(A \cup B\).

Proof: If \(x\) touches \(A \cup B\) then by axiom 4 it touches at least one of \(A\) or \(B\) and thus because they are closed is contained in at least one of \(A\) or \(B\). Therefore \(x \in A \cup B\).

So our closed sets satisfy all the usual conditions for closed sets of a normally defined topological space, thus their complements satisfy all the axioms for open sets in a topological space.

Now how do we recover the “touches” relationship from a normal topological space?

Well, if we define the closure of a set, as usual, as the smallest closed set containing it, i.e.

\[ cl(A) = \bigcap \{ B \supseteq A : B \text{ is closed } \} \]

Then we can define a touches relationship as \(x\) touches \(A\) if \(x \in cl(A)\).

Does this satisfy all our axioms?

First we must prove the following properties of closure operators:

  1. \(A = cl(A)\) iff \(A\) is closed
  2. \(A \subseteq cl(A) \)
  3. \(A \subseteq B \) implies \(cl(A) \subseteq cl(B)\)
  4. \(cl(A \cup B) = cl(A) \cup cl(B)\)
  5. \(cl(cl(A)) = cl(A)\)

The first three properties are pretty trivial consequences of the definitions and the properties of closed sets.

To prove 4: First note that \(cl(A) \cup cl(B)\) is a closed set containing \(A \cup B\), so certainly \(cl(A \cup B) \subseteq cl(A) \cup cl(B)\). Further, because \(A, B \subseteq A \cup B\) we have \(cl(A), cl(B) \subseteq cl(A \cup B)\), so \(cl(A) \cup cl(B) \subseteq cl(A \cup B)\). So we’ve proved the inclusion both ways and the two are equal.

To prove 5: This is a simple consequence of the fact that the closure of a set is closed.

We can now show that the touches relationship we’ve defined obeys all the properties we asked for.

  1. \(\emptyset\) is closed, so \(cl(\emptyset) = \emptyset\) and no points touch it.
  2. \(A \subseteq cl(A)\), so every point in \(A\) touches \(A\)
  3. \(cl(A) \subseteq cl(B)\) so if \(x \in cl(A)\) then \(x \in cl(B)\)
  4. If \(x \in cl(A \cup B)\) then \(x \in cl(A) \cup cl(B)\) so \(x \in cl(A)\) or \(x \in cl(B)\) so \(x\) touches \(A\) or \(B\)
  5. If every \(x \in B \) touches \(A\) then \(B \subseteq cl(A)\) so \(cl(B) \subseteq cl(cl(A)) = cl(A)\). So if \(y\) touches \(B\) then \(y \in cl(B)\) and so \(y\) touches \(A\)

So we get a proper touches relation back as we wanted.

Now the next question is: Do these define the same thing? If we start from a touches relation and go to a set of closed sets then back again, do we get the same relation back? And vice versa?

The answer is still yes. Time for more definition chasing!

We’ve already seen that a set is closed iff it is equal to its closure, and that we can construct a touches relation from the closure. So if we show that the touches relation also uniquely defines the closure operator then we’re done.

So for this we need a theorem:

Given the closed sets constructed from our touches relation, \(cl(A) = \{ x : x \text{ touches } A \} \)

Proof:

This is where we will finally have to use our fifth axiom for touching (we haven’t so far). In order to see that it’s necessary, consider our example with \(\mathbb{N}\) and touching meaning “within distance 1”. Then in this case the closed sets we will generate will be only the sets \(\emptyset\) and \(\mathbb{N}\), so the touches relationship we will get back is that every point touches every set.

Now, suppose \(x\) touches \(A\). Then it touches every closed set containing \(A\), and thus it touches the closure which is the intersection of those sets. So \(cl(A) \supseteq \{ x : x \text{ touches } A \} \). We will now show the right hand side is closed, and thus contains the closure and we will be done.

But this is almost immediate from the 5th axiom: Suppose \(y\) touches \( \{ x : x \text{ touches } A \}\). Then certainly every element of this set touches \(A\) (by definition). Therefore \(y\) touches \(A\). Therefore \(y \in \{ x : x \text{ touches } A \}\). So the set contains every element that touches it, and the result is proved.

So this means that \(x\) touches \(A\) if and only if \(x \in cl(A)\) with the closure operator we’ve defined, and for every closure operator we can define a touches relationship which satisfies this. So we’re done.

We can also nicely define continuity in terms of the touches relationship.

Given two topological spaces \(X\) and \(Y\), we can then define a function \(f : X \to Y \) to be continuous if whenever \(x \in X, A \subseteq X\) with \(x\) touching \(A\), we have \(f(x)\) touching \(f(A)\). i.e. a function is continuous if it doesn’t pull points away from sets.

How does this connect to the normal definition of continuity?

First, if \(f\) is continuous under this definition, let \(A \subseteq Y\) be closed. Then let \(x\) touch \(f^{-1}(A)\). Then we must have \(f(x)\) touching \(f(f^{-1}(A)) \subseteq A\). So \(f(x)\) touches \(A\) and thus is in \(A\) (because it is closed), so \(x \in f^{-1}(A)\).

Therefore if \(f\) is continuous under this definition then whenever \(A\) is closed then \(f^{-1}(A)\) is also closed.

Now to show the converse. Suppose that whenever \(A\) is closed then \(f^{-1}(A)\) is also closed, we will show \(f\) is continuous.

Suppose that \(x\) touches \(A\) but \(f(x)\) does not touch \(f(A)\). Let \(B = cl(f(A))\). Then \(f(x) \not\in B\). But \(f^{-1}(B)\) is a closed set, and it contains \(A\). Therefore it must contain \(cl(A)\). But \(x \in cl(A)\), so we must have \(x \in f^{-1}(B)\), i.e. \(f(x) \in B\), a contradiction. QED

This entry was posted in Numbers are hard on by .

Constructing the negative numbers

Advance warning: This is a post about a standard foundational construction in mathematics – constructing the integers from the natural numbers. It’s partly a prelude to another post, and I plan to do the details in slightly more generality than is normally done, but if you’re interested in this subject it’s probably nothing new and if it’s something new you’re probably not interested in this subject. Also a lot of this is going to be spelled out in way more detail than you’re likely to care about.

Second warning: I’m pretty sure my lecturer when he did this proof for me back more than a decade ago in my first year of my maths degree added the caveat “And now you should forget about this proof and never do it again”. So I’m going against the health advisory of a trained mathematician. Some maths contained herein may be hazardous to your health.

This post is about the following theorem:

Let G be a set with operations + and * such that they obey all the axioms of a ring except for the existence of additive inverses, but have the additional property that + cancels in the sense that if x + z = y + z then x = y.

There is a ring R and an isomorphic embedding \(f : G \to R\). Further, there is a unique (up to isomorphism) minimal such R.

First, let’s prove uniqueness, as it will motivate the construction.

Lemma: Suppose R is minimal. Then every element of R can be written as \(f(x) - f(y)\) for \(x, y \in G\).

Proof: We need only show that the set of such elements forms a ring, then the result will follow by minimality.

It suffices to show that it’s closed under the ring operations, but this is just simple algebra:

\[-(f(x) - f(y)) = f(y) - f(x)\] \[(f(x) - f(y)) + (f(u) - f(v)) = f(x + u) - f(y + v)\] \[(f(x) - f(y)) * (f(u) - f(v)) = f(x * u + y * v) - f(x * v + y * u)\]

So this set is a subring of R containing f(G), so it must be R due to minimality. QED

As an aside, note that it is not necessarily the case that every element of R is f(x) or -f(x) for some \(x \in G\). Consider for example \(G = \{ x \in \mathbb{N} : x > 1 \}\).

Now suppose we have another minimal isomorphic embedding \(f' : G \to R'\). We want to construct an isomorphism \(h : R \to R'\) such that \(h \cdot f = f'\). This will prove uniqueness.

We construct h as follows:

Let \(r = f(x) - f(y)\). Define \(h(r) = f'(x) - f'(y)\).

First we must show this is well defined.

Specifically we will show that if \(f(x) - f(y) = f(u) - f(v)\) then \(f'(x) - f'(y) = f'(u) - f'(v)\). This turns out to just be basic manipulation of the fact that \(f\) and \(f'\) are both isomorphic embeddings:

\[
\begin{aligned}
f(x) - f(y) &= f(u) - f(v) \\
f(x) + f(v) &= f(u) + f(y) \\
f(x + v) &= f(u + y) \\
x + v &= u + y \\
f'(x + v) &= f'(u + y) \\
f'(x) + f'(v) &= f'(u) + f'(y) \\
f'(x) - f'(y) &= f'(u) - f'(v) \\
\end{aligned}
\]

So as a result, \(h\) is well defined. Additionally, you can construct \(h' : R' \to R\) in exactly the same manner, and it’s clear by definition that \(h\) and \(h'\) are inverses, so \(h\) is a bijection.

If we pick \(b \in G\) then

\[
\begin{aligned}
h(f(x)) &= h(f(x + b) - f(b)) \\
&= f'(x + b) - f'(b) \\
&= f'(x) \\
\end{aligned}
\]

So we’ve proven that \(h(f(x)) = f'(x)\) and thus \(h \cdot f = f'\).

We still need to show that \(h\) is an isomorphism, but this follows simply from our formulae for \(+\) and \(*\) on things of the form \(f(x) - f(y)\): Each operation you can convert into operations in the original set, pass to \(f'\) and then convert back.

QED

This uniqueness proof nicely informs the construction too. Every element in our target ring is going to be the difference of two elements in our set. Therefore we construct the target ring as the set of differences. But of course two differences might result in the same element, so we define an equivalence relation for when this should happen and quotient out by that.

So we will now construct our ring:

Let \(R = G^2 / \sim\), where \((x,y) \sim (u, v)\) if \(x + v = y + u\) (i.e. if \(x - y = u - v\)).

First we must show this is an equivalence relation. Reflexivity is obvious, symmetry follows from commutativity of addition. We only need to show transitivity.

We’ll need some notation: If \(a = (x, y)\), write \(x = a^+\) and \(y = a^-\).

(This will give us slightly fewer variables to keep track of).

Suppose \(a \sim b\) and \(b \sim c\).

Then
\[\begin{aligned}
a^+ + b^- &= a^- + b^+\\
b^+ + c^- &= b^- + c^+\\
a^+ + b^- + b^+ + c^- &= a^- + b^+ + b^- + c^+\\
a^+ + c^- + (b^- + b^+) &= a^- + c^+ + (b^- + b^+)\\
a^+ + c^- &= a^- + c^+ \\
a &\sim c \\
\end{aligned}\]

(note we had to use the cancellation property for + here)

Fix \(b \in G\) and define \(f : G \to R\) as \(f(x) = [(x + b, b)] \). (We’re not guaranteed a 0 element in G, which is why we can’t just map it to \([(x, 0)]\)).

We will now prove that \(R\) is a ring and \(f\) an isomorphic embedding into it.

First note that f does not depend on the choice of b.

Proof: \((x + b, b) \sim (x + b', b')\) because \(x + b + b' = x + b' + b\).

Now note that f is injective: This is because of the cancellation property we required on addition: If \((x + b, b) \sim (y + b, b)\) then \(x + 2b = y + 2b\) and so by cancellation \(x = y\).

In order to show that it’s an isomorphic embedding we first need to construct a ring structure on R.

We’ll use our formulae from above. Define:

\[(x, y) + (u, v) = (x + u, y + v)\] \[(x,y) * (u, v) = (x * u + y * v, x * v + y * u)\]

We first need to show that these are compatible with the equivalence relation \(\sim\).

Now suppose \(a,b,c,d \in G^2\) with \(a \sim c\) and \(b \sim d\).

We first want to show that \(a + b \sim c + d\). i.e. that \((a + b)^+ + (c + d)^- = (a + b)^- + (c + d)^+\).

\[\begin{aligned}
(a + b)^+ + (c + d)^- &= a^+ + b^+ + c^- + d^-\\
&= (a^+ + c^-) + (b^+ + d^-)\\
&= (a^- + c^+) + (b^- + d^+)\\
&= (c^+ + d^+) + (a^- + b^-) \\
&= (c + d)^+ + (a + b)^- \\
&= (a + b)^- + (c + d)^+ \\
\end{aligned}\]

So \(+\) is compatible with the equivalence relation, and thus can be inherited by \(R\).

We now have to do the same with \(*\). To simplify the calculation we’ll show that \(a * b \sim a * d\). The proof that \(a * d \sim c * d\) will be identical, and the whole result will follow from transitivity.

So
\[\begin{aligned}
(a * b)^+ + (a * d)^- &= a^+ * b^+ + a^- * b^- + a^+ * d^- + a^- * d^+ \\
&= a^+ * ( b^+ + d^-) + a^- * (b^- + d^+) \\
&= a^+ * ( b^- + d^+) + a^- * (b^+ + d^-) \\
&= a^+ * b^- + a^- * b^+ + a^+ * d^+ + a^- * d^- \\
&= (a * b)^- + (a * d)^+ \\
a * b &\sim a * d \\
\end{aligned}\]

So the operations are well defined on the equivalence classes.

Now all we have to do is show that they satisfy the ring operations. Ideally without losing the will to live.

It’s obvious by construction that + is commutative and associative (because it is on \(G\)).

The equivalence class \(0 = [(x, x)]\) is a unit for + (it should be evident that every choice of x produces an equivalent element because \(x + y = x + y\)).

Proof: Let \(a \in G^2\) and \(b = (x, x)\).

\[\begin{aligned}
(a + b)^+ + a^- &= a^+ + b^+ + a^- \\
&= a^+ + b^- + a^- \\
&= a^+ + (a + b)^- \\
a + b &\sim a \\
\end{aligned}\]

Negations exist because \(a + (a^-, a^+) = (a^+ + a^-, a^+ + a^-) \sim 0\).

So now we just have to prove that * is associative and distributes over +.

Associative:

\[\begin{aligned}
((a * b) * c)^+ &= (a * b)^+ * c^+ + (a * b)^- * c^- \\
&= a^+ * b^+ * c^+ + a^- * b^- * c^+ + a^- * b^+ * c^- + a^+ * b^- * c^-\\
(a * (b * c))^+ &= a^+ * (b * c)^+ + a^- * (b * c)^- \\
&= a^+ * b^+ * c^+ + a^+ * b^- * c^- + a^- * b^+ * c^- + a^- * b^- * c^+ \\
&= ((a * b) * c)^+ \\
\end{aligned}\]

At this point I’m prepared to accept on faith that the negative half works out the same way. Are you? Good.

Distributive (just going to show left distributive. Right should follow similarly):

\[\begin{aligned}
(a * (b + c))^+ &= a^+ * (b + c)^+ + a^- * (b + c)^- \\
&= a^+ * b^+ + a^+ * c^+ + a^- * b^- + a^- * c^- \\
&= (a^+ * b^+ + a^- * b^-) + (a^+ * c^+ + a^- * c^-) \\
&= (a * b)^+ + (a * c)^+ \\
(a * (b + c))^- &= a^+ * (b + c)^- + a^- * (b + c)^+ \\
&= a^+ * b^- + a^+ * c^- + a^- * b^+ + a^- * c^+ \\
&= (a * b)^- + (a * c)^- \\
\end{aligned}\]

Which means we’re done. Phew. That was a lot of work.

Unfortunately, now on to the next lot of work!

Several things more to prove:

Theorem:

If G has a multiplicative unit, which we’ll call 1, then \(f(1)\) is a multiplicative unit for \(R\).

Proof:

\[\begin{aligned}
(x, y) * (1 + b, b) &= (x + x * b + y * b, x * b + y + y * b) \\
&= (x + (x * b + y * b), y + (x * b + y * b))\\
&\sim (x, y) \\
\end{aligned}\]

(Using the fact that \((x + c, y + c) \sim (x, y)\).)

Theorem:

If \(*\) is commutative on \(G\) then it is commutative on \(R\).

Proof: Follows almost directly from how \(*\) on \(R\) is defined in terms of \(*\) on \(G\).

Ok. Now we’re done.

We apply this construction to \(\mathbb{N}\), and we get \(\mathbb{Z}\): The minimal commutative ring with a 1 that \(\mathbb{N}\) embeds isomorphically into.
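The construction is concrete enough to actually run. Here is a minimal sketch for \(G = \mathbb{N}\), where a pair \((x, y)\) stands for the formal difference \(x - y\) (the class and function names here are my own):

```python
class Diff:
    # An element of R = G^2 / ~, for G the natural numbers.
    def __init__(self, x, y):
        self.x, self.y = x, y

    def __eq__(self, other):
        # (x, y) ~ (u, v)  iff  x + v = y + u
        return self.x + other.y == self.y + other.x

    def __add__(self, other):
        return Diff(self.x + other.x, self.y + other.y)

    def __mul__(self, other):
        return Diff(self.x * other.x + self.y * other.y,
                    self.x * other.y + self.y * other.x)

    def __neg__(self):
        # The negation constructed above: -(x, y) = (y, x).
        return Diff(self.y, self.x)

def embed(x, b=1):
    # f(x) = [(x + b, b)]; the choice of b does not matter.
    return Diff(x + b, b)

# (1, 4) stands for 1 - 4 = -3 and (6, 2) for 4; arithmetic matches Z.
assert embed(2) == embed(2, b=7)                 # f doesn't depend on b
assert Diff(1, 4) + Diff(6, 2) == embed(1)       # -3 + 4 == 1
assert Diff(1, 4) * Diff(6, 2) == -embed(12)     # -3 * 4 == -12
assert Diff(1, 4) + (-Diff(1, 4)) == Diff(0, 0)  # additive inverses exist
```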

Time to relax.

This entry was posted in Numbers are hard on by .

A non-transitive tournament

I have a reason for thinking about this, but don’t want to get in to the philosophy behind it, so let this post stand on its own as just a neat thing.

Consider the following game: You have a tournament of players. Each player is a rock, a paper or a scissors. Victory is determined in the usual way, with a coin flip or some other arbitrary decision between players of the same type (we’re only going to consider tallies of the number of players of each type, so it doesn’t matter who wins).

A tournament is played as follows: Pick two players at random. They fight. The loser drops out. The tournament is won when only one player is left.

Now, consider a tournament with one scissors and many papers but no rocks. The scissors is a guaranteed winner. Now add a rock. What happens? The scissors still probably wins – the paper is likely to kill the rock before the rock kills the scissors. But what happens as you keep adding rocks?

When you add the first rock, obviously it introduces the possibility that paper can win: The rock can kill the scissors, at which point the remaining paper obliterates the rock. As you add more rocks this becomes more likely to happen and scissors’ chance of winning drops.

There turns out to be a nice formula for the probability of scissors winning in this case. If we write \(S(r,p,s)\) for the probability of scissors winning with \(r\) rocks, \(p\) papers and \(s\) scissors then \[S(r, p, 1) = \frac{p(p+1)}{(p + r)(p + r + 1)}\]

(Unfortunately I don’t have a nice proof of this. I arrived at it empirically with the aid of wolfram alpha, then proved it inductively using the recurrence on the probabilities)

So as you can see, the probability of victory for scissors monotonically decreases towards zero as \(O(\frac{1}{r^2})\).
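Since I only arrived at the formula empirically, a mechanical check against the recurrence on the probabilities is reassuring. Here's a sketch using exact rational arithmetic (the function names are mine):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def S(rock, paper, scissors):
    # Probability that scissors wins, computed from the recurrence on the
    # tallies: each term is (chance that type loses a member) * (win
    # probability from the resulting state).
    if scissors == 0: return Fraction(0)
    if rock == 0:     return Fraction(1)
    if paper == 0:    return Fraction(0)
    total = rock + paper + scissors
    return (S(rock - 1, paper, scissors) * rock * (rock + 2 * paper - 1)
            + S(rock, paper - 1, scissors) * paper * (paper + 2 * scissors - 1)
            + S(rock, paper, scissors - 1) * scissors * (scissors + 2 * rock - 1)
            ) / (total * (total - 1))

def closed_form(r, p):
    # The formula above for S(r, p, 1).
    return Fraction(p * (p + 1), (p + r) * (p + r + 1))
```

The two agree exactly on every small case I tried (e.g. `S(1, 1, 1)` is `1/3`, as symmetry demands).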

So scissors is likely to lose after a few rocks have been added (the probability drops to half or less when \(r \geq \frac{1}{2}\left(\sqrt{8 p^2 + 8p + 1} - 2p - 1\right)\). Now you know), but what is the probability of a rock victory?

Not so good it turns out. As far as I can tell there’s no nice general formula for rock and paper victory probabilities, but I’ve done some calculations for simple cases and I have some code for solving the general case and I’m sorry to report that the chances of success for rock are looking pretty grim.

The problem intuitively is that with a lot of rocks you’re likely to kill off the scissors before it manages to kill off all the paper. Empirically, the best number of rocks appears to be \(\lfloor p / 2 \rfloor + 1\), but even then the probability is pretty low.
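A small search makes this observation easy to reproduce (assuming, as in the discussion above, a single scissors; the function names here are mine):

```python
from fractions import Fraction
from functools import lru_cache

@lru_cache(maxsize=None)
def S(rock, paper, scissors):
    # Probability that scissors wins, from the recurrence on the tallies.
    if scissors == 0: return Fraction(0)
    if rock == 0:     return Fraction(1)
    if paper == 0:    return Fraction(0)
    total = rock + paper + scissors
    return (S(rock - 1, paper, scissors) * rock * (rock + 2 * paper - 1)
            + S(rock, paper - 1, scissors) * paper * (paper + 2 * scissors - 1)
            + S(rock, paper, scissors - 1) * scissors * (scissors + 2 * rock - 1)
            ) / (total * (total - 1))

def R(rock, paper, scissors):
    # Rock's predator is paper and its prey is scissors, so permute roles.
    return S(paper, scissors, rock)

def best_rock_count(p, max_r=30):
    # The r in 1..max_r maximising rock's chance against p papers
    # and a single scissors.
    return max(range(1, max_r + 1), key=lambda r: R(r, p, 1))
```

`best_rock_count(p)` can then be compared against \(\lfloor p / 2 \rfloor + 1\) for small \(p\). As a sanity check, the three win probabilities for any starting position sum to exactly 1.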

Here are some graphs of the probabilities of rock victories:

With 1 paper:

With 2 paper:

With 10 paper:

Empirically it seems to be the case that the single paper scenario is the only one where rock is ever tied for victory probability, let alone winning, and is also the only case where it ever is more likely to win than scissors.

That’s about it in terms of what I can prove or demonstrate about this problem. I’m interested in it, but it seems harder to get a handle on than I’d expect (or I’m worse at this sort of maths than I think I should be. Probably both). As a parting shot, I’ll leave you with some python code to calculate the probabilities:

class ProbabilityTable:
  def __init__(self):
    self.table = {}
 
  def S(self, rock, paper, scissors):
    if scissors == 0: return 0.0
    if rock == 0:     return 1.0
    if paper == 0:    return 0.0
    key = (rock, paper, scissors)
    if key in self.table:
      return self.table[key]
    total = rock + paper + scissors
    result = (self.S(rock - 1, paper, scissors) * rock * (rock + 2 * paper - 1) + 
              self.S(rock, paper - 1, scissors) * paper * (paper + 2 * scissors - 1) + 
              self.S(rock, paper, scissors - 1) * scissors * (scissors + 2 * rock - 1)) / (total * (total - 1))
    self.table[key] = result
    return result
 
  def P(self, rock, paper, scissors):
    return self.S(scissors, rock, paper)
 
  def R(self, rock, paper, scissors):
    return self.S(paper, scissors, rock)

Note: The dynamic programming is important here. It takes the run time from \(O(3^{r + p + s})\) down to \(O(r p s)\).

Let me know if you figure out anything about this problem that I haven’t spotted or didn’t prove.

This entry was posted in Numbers are hard on by .

Sieving out prime factorizations

For a project Euler problem I had an algorithm the first step of which was to find the prime factorization of the number you fed it. This was by far the slowest part of the operation (mostly because the rest of it was pretty heavily cached to a much smaller number of solutions).

I was running this function on the numbers from 1 to 10^8, and it occurred to me that there must be a simpler way to precompute the prime factorizations of all of these without doing the full factorization for each. After some thought I came up with the following solution. It’s in some sense the most obvious thing you can possibly do (at least given a basic knowledge of some of the algorithms around this), but I’d not seen it before and knowing it exists might save you some time in similar scenarios, so I thought I’d post it here.

It’s essentially a modified Sieve of Eratosthenes. The difference is what you store in the sieve and what you do in the crossing off operation. There’s also an additional step.

The idea is this: We maintain a list of prime factors found so far for each number.

We then loop over the numbers in order from 2 upwards. If when we reach a number its list of prime factors is empty then it’s a prime, and we start marking off as follows:

Call the current number p. First we mark off all multiples of p and add p to their list. Then we mark off all multiples of p^2 and add p again. Then p^3, etc. until p^k > n.

Then we’re done, and move on to the next number.

That’s really all there is to it.

Here’s some python code that implements it:

def prime_factorizations(n):
  sieve = [[] for x in xrange(0, n)]
  for i in xrange(2, n):
    if not sieve[i]:
      q = i
      while q < n:
        for r in xrange(q, n, q):
          sieve[r].append(i)
        q *= i
  return sieve

Here it is in action:

>>> prime_factorizations(15)
[[], [], [2], [3], [2, 2], [5], [2, 3], [7], [2, 2, 2], [3, 3], [2, 5], [11], [2, 2, 3], [13], [2, 7]]

Edit: In the comments, Rich points out that you can make this more space efficient by instead storing a single prime for each integer: Specifically, the smallest prime that divides it. You can then reconstruct the desired list by iterated division – if the integer stored is equal to the current index, it’s a prime. If not, it contains one of its prime factors. Then divide by that factor and recurse. This might be a bit slower but is a lot more space efficient. Here’s some more python demonstrating this:

def prime_factorizations(n):
  sieve = [0 for x in xrange(0, n)]
  for i in xrange(2, n):
    if not sieve[i]:
      sieve[i] = i
      for r in xrange(i, n, i):
        sieve[r] = sieve[r] or i
  return sieve
 
def factor_list_from_sieve(sieve, n):
  results = []
  while n > 1:
    p = sieve[n]
    results.append(p)
    if p == n: break
    n = n / p
  return results

The first method returns the described array of primes, the second reconstructs the prime factor list from it.

Example:

>>> sieve = prime_factorizations(1000)
>>> factor_list_from_sieve(sieve, 992)
[2, 2, 2, 2, 2, 31]
This entry was posted in Numbers are hard, programming on by .