Author Archives: david

Rube Goldberg 2.0

I just voted up a link on reddit. The link doesn’t matter, but the vote started me thinking about the chain of events that followed it.

First, a bit of JavaScript submits some information back to reddit’s server, where reddit’s Python code receives and stores it.

At some point later, some Ruby code running on my Slicehost server requests a JSON file from reddit. The Python serves this up; the Ruby fetches it, parses it and hands it off to delicious.com, where a mix of PHP and C++ (I think) stores it in their backend.

At some point later, a PHP script running on my blog fetches that data from delicious and dumps it into a MySQL database.

At some point later still, someone comes along to my website. They see a link to the thing I voted up on reddit, served up by PHP.

Alternative titles: “Polyglotism now!”, “…except for too much indirection”, “He Knows The Unix Way”.

This entry was posted in programming.

Open source term extraction

This is just a quick announcement to let people know that we’ve open sourced our JRuby library for term extraction. You can get the code from my GitHub page.

Unlike a lot of term extraction libraries, this doesn’t take any stance as to the “significance” of the terms it extracts. It’s purely about looking at the syntax and determining where good boundaries for terms are. There are a couple reasons for this, but basically we’ve found that separating the two steps is more effective and makes it easier to tinker around with them independently. The criteria for “interestingness” of terms seem to be largely distinct from those for terms which simply make sense linguistically. So we have a two stage pipeline: one stage extracts semantically meaningful terms, and the other determines which terms are actually interesting in the context of the document. The second step is much more complicated, and we’re not open sourcing that (yet? Probably not any time soon, if ever. Even if we wanted to, it relies on a lot more global information across the document corpus, so it is very tied in with how SONAR operates, which makes it much harder to isolate).
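To make the separation concrete, here is a minimal Python sketch of a two stage pipeline in which the extractor and the scorer are independent, swappable functions. Everything in it is an illustrative assumption rather than the library’s code: `run_pipeline`, `naive_extract` and `make_tfidf_scorer` are hypothetical names, the extractor is a crude word splitter standing in for a syntactic extractor, and the scorer uses plain TF-IDF as a stand-in for the unpublished second step, chosen only because it too draws on information from across a corpus.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List

# Hypothetical stage signatures: stage one proposes candidate terms,
# stage two assigns each candidate an "interestingness" score.
Extractor = Callable[[str], List[str]]
Scorer = Callable[[List[str]], Dict[str, float]]


def naive_extract(doc: str) -> List[str]:
    """Stand-in for a syntactic extractor: lowercase words of 4+ letters."""
    return [w for w in re.findall(r"[a-z]+", doc.lower()) if len(w) >= 4]


def make_tfidf_scorer(corpus: List[List[str]]) -> Scorer:
    """Stand-in scorer: TF-IDF, using term lists of other documents as the corpus."""
    n_docs = len(corpus)
    doc_freq = Counter(term for terms in corpus for term in set(terms))

    def score(candidates: List[str]) -> Dict[str, float]:
        counts = Counter(candidates)
        # Smoothed IDF: a term found in every corpus document scores 0.
        return {
            term: count * math.log((1 + n_docs) / (1 + doc_freq[term]))
            for term, count in counts.items()
        }

    return score


def run_pipeline(doc: str, extract: Extractor, score: Scorer, top_n: int = 5) -> List[str]:
    """Run the two stages in sequence and return the top-scoring terms."""
    candidates = extract(doc)   # stage 1: what makes sense linguistically
    scores = score(candidates)  # stage 2: what is interesting in context
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_n]


corpus = [["terms", "syntax"], ["terms", "pipeline"], ["terms"]]
print(run_pipeline("Terms, terms and a pipeline.", naive_extract, make_tfidf_scorer(corpus)))
# ['pipeline', 'terms']
```

Because `run_pipeline` only depends on the two signatures, either stage can be replaced or tuned without touching the other, which is the practical benefit of keeping the steps apart.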

So, how does it work? Black magic and voodoo!

Actually, no. It’s pretty straightforward. It builds on top of the excellent OpenNLP library, using its tools for part-of-speech tagging, sentence splitting (a much harder problem than you’d imagine) and phrase chunking. On top of that it’s currently a rules-based system: while you’re still figuring things out, it makes much more sense to stick with something that is so easy to fine-tune. Our expectation is that we’ll gradually replace bits of it with machine learning techniques as we start to hit the limitations of a rules-based system, but for now it’s working pretty well.
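As a rough illustration of what a rules layer on top of part-of-speech tags can look like, here is a toy Python sketch. It assumes the sentence has already been split and tagged (by OpenNLP or anything else) with Penn Treebank tags, the tagset used by OpenNLP’s standard English models. The single rule, “a maximal run of adjectives, numbers and nouns that ends in a noun, plus each shorter tail of that run”, and the name `extract_terms` are invented for illustration; this is not the library’s actual rule set.

```python
from typing import List, Tuple

# Penn Treebank noun tags.
NOUN_TAGS = {"NN", "NNS", "NNP", "NNPS"}
# Tags allowed inside a term in this toy rule: adjectives, cardinal numbers, nouns.
TERM_TAGS = {"JJ", "CD"} | NOUN_TAGS


def extract_terms(tagged: List[Tuple[str, str]]) -> List[str]:
    """Toy rule: each maximal run of TERM_TAGS tokens that ends in a noun
    yields the run itself plus every shorter tail ending on the same noun."""
    terms: List[str] = []
    i = 0
    while i < len(tagged):
        # Extend j over the longest run of tokens allowed inside a term.
        j = i
        while j < len(tagged) and tagged[j][1] in TERM_TAGS:
            j += 1
        # Drop trailing adjectives/numbers so the term ends on its head noun.
        end = j
        while end > i and tagged[end - 1][1] not in NOUN_TAGS:
            end -= 1
        words = [word for word, _ in tagged[i:end]]
        for start in range(len(words)):
            terms.append(" ".join(words[start:]))
        # Continue after the run, or step past a token that can't start one.
        i = max(j, i + 1)
    return terms


sentence = [("So", "RB"), ("we", "PRP"), ("have", "VBP"), ("a", "DT"),
            ("two", "CD"), ("stage", "NN"), ("pipeline", "NN"), (".", ".")]
print(extract_terms(sentence))
# ['two stage pipeline', 'stage pipeline', 'pipeline']
```

A real rule set needs many more cases (prepositional phrases such as “context of the document”, adverbs such as “semantically”), and adding them one at a time is exactly the kind of fine-tuning that makes a rules-based approach attractive early on.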

Let’s have an example. If we feed the second paragraph of this post into the term extractor, we get the following terms back:

term extraction libraries
stance
terms
syntax
good boundaries
couple reasons
two steps
steps
criteria
interestingness
sense
two stage pipeline
stage pipeline
semantically meaningful terms
context
context of the document
document
second step
open sourcing
time
document corpus
SONAR

Hope you find this useful. Let us know if you build anything cool with it!

This entry was posted in Code, programming.

A story, and an oft overlooked point

I’m going to tell you a story. It’s not an argument, it’s certainly not a statistic, it’s just a thing that happened.

About two and a half years ago, right before Christmas, my father fell out of a tree, into a river, and broke his spine.

My father is fine now. Were it not for a few visible pieces of metal in his back (which he’ll happily show off to you) and the fact that he went from being an inch taller than me to an inch shorter than me (which he’ll vehemently contest every time you mention it), you’d never be able to tell it happened. But were it not for every single thing going right from that point and the incredible care he received over the subsequent weeks, he would be dead or paralysed.

I could tell you that the NHS saved my father’s life. It would be true, and I owe them a tremendous debt of gratitude for it, but it would also be somewhat missing the point.

I got back from a family reunion in the States about a week and a half ago. One of the themes of this reunion could quite legitimately be summed up as “Hurray! None of us are dead yet!”. We’ve had more than our fair share of near-fatal medical experiences on both sides of the pond, and everyone has come through ok. America and Britain both have amazing doctors.

This is probably the point at which you expect me to give a sob story about bankruptcy resulting from one of these medical conditions in the States. Fortunately not. As far as I know (certainly I’ve not heard anything to the contrary), everyone over there was suitably covered.

Perhaps then I should tell you that if my father were over there he would be bankrupted?

Actually, again, no. My father has private health insurance.

Yes, really. His employer provides it. It’s a very American setup.

You see, we may have this big, scary-seeming socialist monster of the NHS, but it’s not like we don’t have private healthcare too. It’s not even very expensive: I think I could probably get private care for about £100/month without any employer subsidy, if I wanted it. I used to be covered by my father’s insurance when I was younger. It was very convenient for fast-tracking things that would have had a longer wait on the NHS, and I generally received good care on it (I’d describe it as about as good as the NHS care I received, but slightly more personalised), but I don’t think I needed to use it more than a handful of times.

So, why isn’t this a story about the great private healthcare my dad got in the UK? Simple: when my mother told the hospital that they had private health insurance, the hospital shut the idea down flat. If he’d gone to a private hospital here, they wouldn’t have known what to do with him. For top end critical care in the UK, you use the NHS. It has the best emergency services, it has the best facilities, it has the best doctors. If you need medical care and you need it right now, you go to the NHS.

This entry was posted in life.

A delicious way to use up stale bread

I had the end of a loaf of whole grain olive bread left over. It was very tasty when fresh, but that was about two days ago, and by now it wasn’t good for much on its own. So I did the following:

I took some pomodoro tomatoes, quartered them, tossed them in olive oil, rosemary, sea salt and balsamic vinegar, and fried them until they were quite soft. I then sliced the bread very finely (about half the normal thickness of a slice of bread, then cut in half crossways), added it to the mix and kept frying until the bread had very thoroughly soaked up the tomatoes’ juices.

It was delicious. I served it with scrambled eggs and a diced cucumber salad.

This entry was posted in Food.

How to start the week

Ah, Monday. A new start! Full of fresh opportunity, and excitement about what the week to come will bring.

This Monday, we have set the SNAFU dial on David’s life to 11. Let’s see how long it takes him to notice…

First, some context. My flat contains its own boiler. A little combination electric/gas thing. Somewhat flaky, but it provides me with an unlimited supply of hot water on demand and is therefore my friend. On Thursday night I came home to discover that my friend was dead and bleeding all over the carpet. After quickly placing a widget to catch the water and rapidly consulting the landlord’s answering machine, email address and emergency backup number, I turned the water in my flat off.

The next morning an engineer came in and performed the autopsy. He pronounced that there was no hope and that the leak had completely knackered the electrics. This was an ex boiler. Arrangements were made for the boiler to be replaced on Monday, and my parents got the lovely surprise of their son popping up for a weekend visit to their hot water supply. I returned on Sunday night and got ready for the boiler maintenance on Monday.

The start of the day was pretty chaotic. A lot more stuff than I’d realised had to be moved to enable the boiler installation. That’s ok. It was doable. My friend Michael had kindly agreed to house-sit my flat while the maintenance people were here, as I had to be at my contracting job at Wordtracker by noon. Unfortunately Michael overslept, not arriving at my flat till noon. Never mind. It’s only a fifteen-minute trip, and arriving at 12:15 isn’t the end of the world, but as a result I’m a little flustered when I leave. I call Wordtracker to let them know I’ll be late, but no one picks up.

En route I realise I’ve forgotten the key code for the gate there. It’s not the first time. I think I remember it, but I might have the digits in the wrong order, or one digit wrong, or something. Hopefully it won’t be a problem.

I arrive at the Wordtracker offices. As I approach the gate, I see a large crowd of people in suits outside it, having some sort of event. I think it’s a bit weird but mostly ignore it and try to get in.

As feared, the code doesn’t work, and the gate declares in a loud American voice “The code you have entered is invalid”. I imagine, more than see, the dirty looks I’m getting from the besuited people, but I’m sure they were there. I try once or twice more, each time denied by the electronic American. I give Wordtracker another call: no response. At this point I begin to suspect I have the wrong number for them. I try the code some more.

So here I am, standing there like a prat, typing numbers into the gate and getting admonished by an American machine.

When a hearse pulls up behind me

Not really wanting to interrupt what is now a funeral procession with electronic equivalents of “You shall not pass!”, I beat a hasty retreat and attempt to find out the code through means other than brute force. So I check their website for contact details, using the amazingly shitty and probably ludicrously expensive net access on my phone (I am not yet among those blessed with a smartphone). I find a contact number, different from the one I have, call it, and get through to their customer support answering machine. Great.

“Ok”, I think, “I have an email with the gate code. All I need to do is check that email”, and so begins the ordeal of trying to read Gmail on the aforementioned useless mobile phone. I’m about 90% of the way through the login process when I hear “Oh, hello” as one of the Wordtracker people passes me on their way to lunch. They inform me that the code for the little gate is broken and that I need to use a different code, which opens both it and the large gate. They’re very nicely apologetic for not having let me know. No harm done.

So I sneak through the funeral (which is now looking a lot less jolly than it was pre-hearse), enter the code (which again results in loud electronic Americanisms) and get into Wordtracker. At last.

To find that the person I’m supposed to be working with isn’t there, and the one person who is there doesn’t know if he’s coming in.

Sigh. :-)

Somehow I manage to avoid cracking up, leave my number for Marcus to get in touch with me if he arrives, and head out through the funeral and back home.

The rest of the story is a much happier one. About half an hour later I get a call from Marcus letting me know that he’s there (unavoidable transit delays, and he didn’t have my number) and that I can come in whenever I want. I do so, and we in fact proceed to have a very productive day. Further, on my return home I am blessed with a working boiler. Here’s hoping it stays that way, and the rest of the week follows on from today’s evening rather than its morning.

This entry was posted in life.