Archive for the ‘Mathematics’ Category

A cute lower bound for weighted FAS tournaments

Saturday, December 10th, 2011

As mentioned, for my random hack project du jour I am currently working on an optimiser for weighted feedback arc sets.

This problem can largely be summarised as follows: I have a non-negative weight matrix \(A\) and I want to find a permutation \(\pi\) that maximizes the score $$w(\pi) = \sum_{\pi(i) < \pi(j)} A_{ij}$$

(It's more normal to phrase this in terms of minimizing the sum going the other way, but the problems are equivalent)

The thing I was wondering about recently: What sort of scores am I guaranteed to be able to achieve? There's the trivial lower bound that

$$w(\pi) \geq \sum_{i < j} \min(A_{ij}, A_{ji})$$

but, equally trivially, in the case where \(A\) is not a constant matrix this is easy to exceed. My question is: What sort of values are we guaranteed to be able to find a value of \(\pi\) that exceeds them?

I realized earlier that there's a really nice solution to this. Let \(\chi\) be a random variable that picks a permutation uniformly at random. Then we can write $$w(\chi) = \sum_{i < j} S_{ij}$$ where \(S_{ij}\) is a random variable that takes the value \(A_{ij}\) if \(\chi(i) < \chi(j)\) and else takes the value \(A_{ji}\). It's easy to see that \(S_{ij}\) takes its two values with equal probability, so $$E(S_{ij}) = \frac{1}{2}(A_{ij} + A_{ji})$$

But then we have $$E(w(\chi)) = E(\sum_{i < j} S_{ij}) = \sum_{i < j} E(S_{ij}) = \frac{1}{2} \sum A_{ij}$$

So given that the expected weight of a random permutation is \(\geq\) this value, it's certainly the case that there exists a permutation with weight \(\geq\) this value.

This is nice and all, but it turns out to be relatively easy to do better. Consider \(S_{ij}\) and \(S_{kl}\). You can prove by inspection of cases that these are independent of eachother (note that the set \(S_{ij}\) is not independent - even sets of three elements of it typically aren't. This only works for pairs). This is enough that the variances add. Therefore we have

$$Var(w(\chi)) = \sum Var(S_{ij})$$

It's a simple calculation to show that

$$Var(S_{ij}) = \frac{1}{4} (A_{ij} - A_{ji})^2$$

So

$$Var(w(\chi)) = \frac{1}{4} \sum_{i < j} (A_{ij} - A_{ji})^2$$

Why is this important? Well, we are guaranteed that we have some point which is at least the standard deviation away from the mean, and the distribution of \(w(\chi)\) is symmetric about its mean (you can see this by considering what happens under \(\pi \to \tau \cdot \pi\) where \(\tau\) is a reversal). Therefore there must be a point which is at least the mean plus the standard deviation. Therefore we can find a permutation \(\pi\) with

$$w(\pi) \geq \frac{1}{2}\left(\sum A_{ij} + \sqrt{\sum_{i < j} (A_{ij} - A_{ji})^2} \right)$$

Some closing notes:

  1. Yay for mathjax. This is my first post using it
  2. My optimizer pretty consistently exceeds this bound anyway (it would be sad if it were doing worse than moderately determined random sampling), but I’ve now added a first pass which shuffles until it does.
  3. I’m nearly certain this bound is not tight except for in the trivial case of the constant matrix. When plotting graphs of the distribution of \(w(\chi)\) the result has a distinctly bell curved nature (although obviously it’s actually discrete) which drops off smoothly rather than cutting off at the SD. However calculating more about the distribution of random sampling is hard, and I suspect may not be that useful, so for now I’m leaving it at this.
  4. I have no idea if this is a known result, but it’s pretty trivial to calculate once you have the basic idea, so it’s likely it’s one of those things that has been discovered several times but wasn’t regarded as interesting enough to publish. Yay for blogs

Generating random spheres

Thursday, November 10th, 2011

So when I was in the shower earlier and I wondered to myself “Hm. How would I generate random points on an N-sphere?” (true story).

I thought about it for a bit and generated the two obvious solutions. Unfortunately the two obvious solutions are, respectively, wrong and massively inefficient (actually the inefficient one isn’t too bad for low dimensions, but I was interested in the high dimensional case where it rapidly becomes massively inefficient). Once out of the shower I looked the answer up, and found the whole thing interesting enough that I figured someone else might too, so this is a post about that.

Obvious but wrong solution

Generate N random uniform numbers between -1 and 1. Divide the resulting vector by its norm.

Why is this wrong? Well, the initial N random numbers will lie uniformly in the N cube. So the probability density at a given point is proportional to the length of the line that goes from the zero, through that point and out to the N sphere. This line is not of uniform length for the very simple reason that cubes aren’t spheres.

Obvious but slow solution

Generate N uniform random numbers between -1 and 1. Retry this until the resulting norm is <= 1. Divide the resulting vector by its norm.

Why does this work? Because we're using a rejection method to make sure we get points inside the unit ball. The result is not uniformly distributed inside the unit ball but it is (evidently) spherically symmetrical. By dividing by the norm we’re then projecting the spherically symmetrical distribution onto the sphere and thus getting a uniform distribution on it.

Why is it slow? Because of the curse of dimensionality. When we do the rejection we’re effectively generating a geometric distribution with probability of success equal to the proportion of the N-cube the N-ball occupies. This means we expect to perform the generation 1 / that probability times, which tends towards infinity quite rapidly as N gets large.

Obvious in retrospect solution

At this point I looked up the answer, though I should have been able to figure it out.

What we want is a spherically symmetric distribution a la the previous solution but which is more efficient to calculate.

It turns out that a good choice for this is an N dimensional normal distribution, which can be obtained by simply generating N Normal(0, 1) random variables. You can then divide by the norm of those variables and get a distribution on the unit sphere.

This is apparently not the best solution (generating normal random numbers is a little expensive), but I couldn’t find a reference to a version which works in N dimensions for any N.

Here’s some C code implementing this: https://gist.github.com/1354700.

Rank aggregation basics: Local Kemeny optimisation

Friday, June 18th, 2010

The technical content of this post is based heavily on Rank Aggregation Revisited” by Ravi Kuma, Moni Naorz, D. Sivakumarx and Cynthia Dwork. It’s purely expository in nature on top of that.

You’ve all seen Hammer Principle now, right? If not, go check it out.

The task it performs of aggregating all the individual opinions into a single overall ranking is harder than it looks. This is a post about some of the technical details involved.

The setup is as follows: We have a bunch of items, and a bunch of votes placing a subset of those items in order.

The naive idea one starts with is as follows: If the majority of people prefer A to B, rank A higher than B. This is an obvious thing to aim for. Unfortunately it’s impossible, even if everyone ranks every item. Considering the following set of votes:

1: A, B, C
2: B, C, A
3: C, A, B

Then 1 and 3 think A < B, 1 and 2 think B < C, and 2 and 3 think C < A. So if we tried to order by majority opinion we'd have A < B < C < A.

This is called Condorcet's paradox: Majority voting amongst > 2 items is impossible.

Nevertheless, we hope to be able to do at least reasonably well.

One natural approach is called Kemeny optimisation: We try to find a solution which minimizes the number of pairwise disagreements between the end rank and the individual voters. That is, for some set of items I, votes V and an aggregate ranking r, the score is

K(r) = #{ i, j in I, v in V, (r(i) < r(j)) != (v(i) < v(j)) }

If r is a minimum for this score we say it's Kemeny optimal. There needn't be a unique Kemeny optimal solution: In the above example, all three of the individual votes are Kemeny optimal rankings.

Kemeny optimal rankings have the following nice majority voting property:

Let r be Kemeny optimal. If U and V partition the set of items, and for every u in U and v in V the majority think u < v, then for every u in U and v in V r(u) < r(v).

Proof:

Suppose we had r(u) > r(v) for some u, v. By passing to a smaler u and a larger v we can ensure that there is no z with r(u) > r(z) > r(v). (we can take the smallest u such that r(u) > r(v) then the largest v such that r(u) > r(v)).

But then if we were to define the ranking r’ which is exactly r except that u and v were swapped, this would decrease K: Because there are no points between them, the only pairwise disagreements it changes are those on u and v, so K(r’) – K(r) = #{ t : t(u) < t(v) } - #{ t : t(u) > t(v)}. But we know the majority think u < v, so this change in score must be negative, so we have K(r') < K(r), contradicting minimality.

QED

We call the conclusion of this theorem the generalised Condorcet criterion (the condorcet criterion is that if there's a single element that beats all the others in a majority vote then it should be the winner). It's a nice property - it is in some sense an approximation of majority voting. In many cases satisfying it will uniquely determine a great deal about the resulting ranking - it's only when there's ambiguity (as in the Condorcet paradox example) that it fails to do so. This should give us confidence that the Kemeny optimal solution is the right approach.

So, we want to calculate a kemeny optimal solution.

There's just one problem: We've defined the problem in terms of a minimization over all rankings of n items, of which there are n!. This search space is, to borrow a technical term, freaking huge. This makes it a hard problem. In fact finding a Kemeny optimal solution for rankings on n items is NP hard, even with only four votes. I won't prove this here. Read the paper if you care.

So, no Kemeny optimal solutions for us. How sad.

What would work well as a substitute?

Well, examining our proof that Kemeny optimal solutions satisfy the generalised Condorcet condition, an interesting feature appears: We don't actually use anywhere that it is a global minimum. We only use the fact that swapping two adjacent pairs can’t decrease the score. This suggests the following relaxation of the condition:

A ranking r is locally Kemeny optimal if swapping two adjacent elements of the ranking does not decrease K.

By the above observation, locally Kemeny optimal solutions satisfy the generalized Condorcet condition.

This turns our problem from a global search to a local one: Basically we can start from any point in the search space and search locally by swapping adjacent pairs until we hit a minimum. This turns out to be quite easy to do. We basically run insertion sort: At step n we have the first n items in a locally Kemeny optimal order. Swap the n+1th item backwards until the majority think its predecessor is < it. This ensures all adjacent pairs are in the majority order, so swapping them would result in a greater than or equal K.

This is of course an O(n^2) algorithm. In fact, the problem of merely finding a locally Kemeny optimal solution can be done in O(n log(n)) (for much the same reason as you can sort better than insertion sort). You just take the directed graph of majority votes and find a Hamiltonian Path. The nice thing about the above version of the algorithm is that it gives you a lot of control over where you start your search.

The above algorithm however produces an ordering which is consistent with its starting point in the following sense:

If we start with a ranking r and then run the above algorithm on it to get a ranking r’, then if r’(u) < r'(v) we must have r(u) < r(v) or a majority vote that u < v.

This is easy to see: If r(u) > r(v) and the majority think that u > v then when moving u backwards we would have to stop at v at the latest.

In fact for any starting point there is only one kemeny optimal solution which is consistent for it in this way: We can see this inductively. If it’s true for the first n – 1 items, then where can we put the nth item? The only possible place is where the above algorithm puts it: If it were to put it after that place, it wouldn’t be Kemeny optimal. If it were to put it before it, it wouldn’t be consistent because it would be less than some item which it was greater than in the original ordering.

The unique locally Kemeny optimal solution is called the local Kemenization of the ranking.

So, this process gives us the core idea of the rank aggregation mechanism we’re using: We pick a starting point according to some heuristic (I’ll explain the heuristic we’re using in a later post), and from that starting point calculate its local Kemenisation. This gives us a ranking which is guaranteed to satisfy the generalised Condorcet condition, and thus likely to be a good ranking, but takes into account whatever our heuristic thought was important as well – this can matter a lot for cases where there is ambiguity in the ranking data.

Sequential compactness and the splitting number

Sunday, March 14th, 2010

In talking with people in #scala earlier I ended up looking through some of my old maths articles. In particular this one. I noticed that I mentioned “So far the only counterexampe I have depends on the value of the reaping number, tau. It’s fairly standard (I’ll write a post on it at some point) that {0, 1}^{tau} is not sequentially compact”.

The details of the above are wrong: Firstly, I meant the splitting number rather than the reaping number (a post I found elsewhere on the internet confirms I was confusing the two – I’m not even sure what the reaping number actually is). Secondly, “fairly standard” my ass. I couldn’t find anything on it, so had to recreate it from scratch. I mean, it’s not a new result and there have been papers on it, but it’s not by any means a commonly reproduced proof. I have no access to maths journals in order to check out the original papers, and hence there will be a deplorable lack of citations here. Sorry.

Still, four years later, here’s that post. I’ve written it up as a pdf rather than subjecting you to embedded LaTeX in the post.

Axioms, definitions and agreement

Sunday, June 28th, 2009

A while ago I posted A Problem of Language, a response to an article claiming that Scala was not a functional language. This isn’t an attempt to revive that argument (and please don’t respond to it with such attempts. I’m likely to ignore or delete comments on the question of whether Scala is a functional language). It’s a post which is barely about programming, except by example. Really it’s a post about the philosophy of arguments.

My point was basically that without a definition of “functional language” (which no one had provided) it was a meaningless assertion to make.

Unfortunately this point isn’t really true. I think I knew that at the time of writing but glossed over it to avoid muddying the waters, as it’s false in a way that doesn’t detract from the basic point of the article, but it’s been bugging me slightly so I thought I’d elaborate on the point and the basic ideas.

Let’s start with what’s hopefully an unambiguous statement:

Brainfuck is not a functional language

Hopefully no one wants to argue the point. :-)

Well, why is brainfuck not a functional language? It doesn’t have functions!

So, we’re making the following claim:

A functional language must have a notion of function

(in order to make this fully formal you’d probably have to assert some more properties functions have to satisfy. I can’t be bothered to do that).

Hopefully this claim is uncontroversial.

But what have we done here? We’ve, based on commonly agreed statements, proved that Brainfuck is not functional without having defined “functional programming language”. i.e. my claim that you need a definition in order to meaningfully claim that a language is not functional is false.

What you need in order to make this claim is a necessary condition for the language to be functional. Then on showing that condition does not hold you have demonstrated the dysfunctionality of the language.

But how do we arrive at necessary conditions without a definition? Well, we simply assert them to be true and hope that people agree. If they do agree, we’ve achieved a basis on which we can conduct an argument. If they don’t agree, we need to try harder.

A lot of moral arguments come down to this sort of thing. Without wanting to get into details, things like arguments over abortion or homosexuality frequently come down to arguments over a basic tenet: Do you consider a fetus to be of equal value to a human life, do you consider homosexuality to be inherently wrong, etc. (what I said about arguments RE Scala holds in spades for arguments on these subjects). It’s very rare for one side to convince the other of anything by reasoned argument, because in order to construct a reasoned argument you have to find a point of agreement from which to argue and that point of agreement just isn’t there.

Mathematically speaking, what we’re talking about is an Axiom. Wikipedia says:

In traditional logic, an axiom or postulate is a proposition that is not proved or demonstrated but considered to be either self-evident, or subject to necessary decision. Therefore, its truth is taken for granted, and serves as a starting point for deducing and inferring other (theory dependent) truths.

I consider this definition to be true, but perhaps a bit obfuscated. I’d like to propose the following definition. It’s overly informal, but I find it’s a better way to think about it:

An axiom is a point which we agree to consider true without further discussion as a basis for arriving at an agreement.

(This may give the hardcore formalists a bit of a fit. If so, I apologise. :-) It is intended to be formalist more in spirit than letter )

The most important part of this is that axioms are social tools. They don’t have any sort of deeper truth or meaning, they’re just there to form a basis for the discussion.