Archive for the ‘Uncategorized’ Category

Northpaw, part 1

Sunday, August 15th, 2010

So I’ve bought myself a Northpaw kit.

Why? Well, partly because I’m interested in this sort of thing – different modes of perception of and interaction with information – but mostly because it looked cool and I was in a gadget buying mood.

After some grief from the royal mail it finally arrived on friday, and I spent some time today at the london hack space putting it together.

I don’t have anything very exciting to report so far: I basically got it to work, but mangled the assembly. The electronics turned out to be relatively easy – the results aren’t elegant, as I have the manual dexterity of a blind walrus and haven’t soldered in 12 years, but they seem to work. The positioning of the motors however turns out to be rather a challenge.

I found the ribbon extremely hard to configure correctly, and the power connection for it very hard to get right in a way that gave me any capability to position the motors correctly. I’m sure it’s doable, but I couldn’t do it. So the upshot is that I’m going to have to rebuild that part. Fortunately the motors all work and the electronics seems to talk to them correctly. My inclination is to take the ribbon mount off and wire it together with normal wires rather than ribbon.

Another update will be forthcoming when I’ve actually got the thing wearable.

What did you study?

Wednesday, July 7th, 2010

Most of the time when I tell people I did mathematics at university they reply with something along the lines of “Oh, I really hated that at school”.

My brother (who did philosophy) apparently often hears “So can you read my mind?”. I don’t quite get it.

Apparently musicians often hear “Oh, I played the X a while ago but stopped when I left school”. Sometimes suffixed with “Apparently I was good enough to do it professionally, but I didn’t want to”.

Beyond that, I don’t know. I’m sure most subjects have a wealth of stereotypical answers, but I’ve no idea what they are!

So… lets find out. I’ve created a site to gather all those answers which annoy you so much when you get them. It’s pretty limited so far: You just get to add answers and see other peoples’. If it goes well there are all sorts of cool things I can do with the data, but right now we’re at just the basics.

Have fun

Irrelevant alternatives aren’t

Thursday, July 1st, 2010

Every now and then someone discovers Arrow’s Impossibility Theorem and gets all excited. “Democracy is impossible! Let’s have a dictator!” they declare from the rooftops, or words to that effect. I certainly found this fascinating when I first discovered it.

Eventually they calm down. There are a lot of commonly mentioned reasons why Arrow’s impossibility theorem doesn’t have massive real world consequences – it’s not like anyone thought they were using a perfectly fair voting system in the first place, and the mechanism described in the theorem doesn’t actually correspond that closely with real world votes, which are mostly just trying to elect a single winner and don’t require nearly so strong consistency.

One reason I haven’t seen mentioned is the following: If it were possible to create a voting system which satisfied the criteria of Arrow’s impossibility theorem, it would be a bad idea. Independence of irrelevant alternatives, that the ordering of A and B doesn’t depend on the introduction of C, is an appealing condition on the face of it, but it turns out that you don’t actually want real world voting systems to have it. Consider the following set of opinions:

A > B > C
B > C > A
C > A > B
B > C > A

The numbers work out as follows:

There is a 50-50 split on A > B.
75% of people think that B > C
75% of people think that C > A.

Therefore even though there is a tie between A and B, the only fair combined ordering is that B > C > A – any other ordering would make a lot more people unhappy. So the introduction of an apparently irrelevant alternative has taken a tie between A and B and broken it decisively.

Rank Aggregation in The Right Tool

Tuesday, May 11th, 2010

As you’ve probably noticed, over the weekend I created The Right Tool, for discovering how we describe programming languages. In response to massive amounts of data and feedback I’ve thrown together some pages for seeing the results of which statements describe which languages well.

There’s one set for which statements match a given language and one for how well each language matches a given statement.

The method I’m using is straightforward enough, but quite neat. It’s an adaptation of a method from “Rank Aggregation methods For The Web, by Dwork, Kumar, Naor and Sivakumar” which they refer to as MC4. I’ve modified it to better take into account shades of grey and to be more resilient to rare events.

It belongs to a general class of scoring mechanisms based on Markov Chains. Google’s pagerank is an instance of these. Essentially what you do is define a Markov chain on the things you want to score. You try to arrange it so that transitions to good nodes are more likely than transitions to bad nodes. You then find the stationary distribution for this markov chain (the long-term probability of being at a particular node). The idea is that if transitions to good nodes are more likely than transitions to bad nodes, the nodes which you spend more time in are more likely to be good. These probabilities then form a measure of what you are trying to rank.

So, that’s all very well, but how do we decide what markov chain to define?

In MC4 the markov chain is defined as following: When at node A, pick a node at random. If the majority of people agree that that node is better than this one, transition there. Else stay where you are.

This has a couple of problems.

The first is this: How do we decide what constitutes “the majority of people”? A flat out majority is very sensitive to small numbers of voters: If only one person ranked both of two languages and ranked it the “wrong” way, we’re still going to treat it as better. I tried various things to make this work: One was just a lower bound on the number of votes we counted, one was modelling as a Bernoulli trial and requiring N (for N = 1 or 2. 1 seemed to work slightly better) standard deviations above average. It all felt a bit arbitrary though.

The second problem was shades of grey: It makes absolutely no distinction between language A being a lot better than language B and language A being slightly better than language B. As well as creating weird biases, this results in the method being very bad at getting the tail right: You get a lot of languages which everyone thinks are the pits, and they all get assigned a score of 0, so there’s no way to distinguish between them. It happened that the approximation I was using somewhat mitigated this, but it was still a bit annoying.

So I’ve modified the chain definition as follows: First pick a language B at random. Now consider the probability of language B beating A in a comparison, and jump to B with that probability, else stay at A. This means that when there’s only a slight benefit to B over A the score of B is only slightly higher than that of A, because we’ve still got a decent chance of transitioning back (it turns out that with only two languages A and B in the chain where B beats A with probability p we assign a score of p to B and 1 – p to A).

One final refinement: Obviously we don’t know the “true” probability of B beating A, so we have to estimate that from the data. The way I’ve done this is a very simple weighted bayesian average. We estimate it as (0.5 * N + Number of times B beat A) / (N + number of times B and A were compared), where N is some number representative of how hard we are to convince of a probability (I think I currently have it set to 5). For very small samples we’re artificially forcing it closer to half, and even for larger samples we’re never allowing it to quite hit 0 or 1.

That’s about it. We now have these probabilities. The ranking is then simply defined by ordering from highest to lowest probability.

This seems to work pretty well. I don’t have good metrics in place, but the lists it’s generating “feel” better than any of the previous incarnations, and it’s conceptually pleasing. It also appears to be quite spam resistant, which is a nice touch.

The best way to handle exceptions

Monday, April 26th, 2010

So, it’s well known that you shouldn’t have code that looks like this (examples in ruby and ruby-like pseudo code, but they’re trivially translatable to any other language):

begin
   do stuff
rescue
end

i.e. you’re swallowing the exceptions because they scared you and you wanted to hide them. This is naughty.

A more subtle trap people fall into is the following example from the rails boot.rb (this isn’t a rails specific problem. I see it everywhere)

    def load_rails_gem
      if version = self.class.gem_version
        gem 'rails', version
      else
        gem 'rails'
      end
    rescue Gem::LoadError => load_error
      $stderr.puts %(Missing the Rails #{version} gem. Please `gem install -v=#{version} rails`, update your RAILS_GEM_VERSION setting in config/environment.rb for the Rails version you do have installed, or comment out RAILS_GEM_VERSION
      exit 1
    end

Let’s consider the results of this rescue block: A generic error message is printed, and we exit with a non-zero status code.

Now let’s consider the results of not having this rescue block: A specific error message is printed, we exit with a non-zero status code and we get a stack trace telling us exactly what went wrong.

So by including this rescue, we have lost information. Often this information doesn’t matter, but as it turns out in this case it does: If you have a gem version clash where rails depends on a different version of a gem that has already been loaded you will get a Gem::LoadError and, consequently, a very misleading error message.

I don’t want to pick on rails. Well, correction. I don’t want to pick on rails here. This post is actually inspired by a similar incident at work: There was a piece of code that basically looked like this:

begin
  connect to server
rescue
  STDERR.puts "Could not connect to server"
  exit
end

And were getting very puzzling errors where we were sure all the details were correct but it was failing to connect to the server. Once we deleted the rescue code it was immediately obvious why it was failing (if you care, the reason was that we weren’t loading the config correctly so it was trying to connect with some incorrect default values).

Which brings me to the point of this article: The best way to handle exceptions is not to handle them. If it’s not an exception you can reasonably recover from, the chances are pretty good that the default behaviour is more informative than the “helpful” code you were going to write in order to catch and log the error. So by not writing it you get to have less code and spend less time debugging it when it inevitably goes wrong.