David R. MacIver's Blog: Computational linguistics and Me

Computational linguistics and Me

25 January 2009

Apparently I’m a computational linguistics blogger. This is sortof news to me. The closest I’ve come to blogging about computational linguistics is in writing a borderline rant about academia.

That being said, I do work in computational linguistics: SONAR is basically a great big NLP system.

This fact, however, is almost totally unrepresented in my blogging.

Actually, that’s part of why I’ve been blogging so much less recently. Since moving onto SONAR my brain has been afire with newly acquired knowledge and trying to figure out how best to apply it to work problems. This has left relatively little time for most of the other stuff I think about that normally generates blogging.

Of course the obvious solution is that I should be blogging about computational linguistics. But that has some obstacles. Primarily:

Confidentiality

All the computational linguistics stuff I do is for work. I tinker around with it at home, but haven’t really done anything useful. This makes it difficult to know what I can blog about: I certainly can’t go “HEY GUYS. I FIGURED OUT THIS AWESOME ALGORITHM WHICH WE’RE USING IN SONAR” for everything. We rather rely on some of that magic to make us money. :-)

That being said, there’s definitely stuff I can blog about. e.g. there’s nothing particularly confidential in how we extract likely candidate phrases from a document, and it’s at least mildly interesting (probably more to non-linguists, but who knows?). In fact, we’re actually all encouraged to blog more about what we do but never find the time. So, really, work isn’t that much of an obstacle to blogging about this. It just requires a bit of careful thought.

Experience

I’m very new to computational lingusitics. As such, I’ve a much less clear idea what’s bloggable about in it. If we look at my blogging history, I started blogging about programming in february 2007. That’s just shy of a year after I started working as a programmer (which, effectively, is just shy of a year after I started programming anything in earnest). And I think it took another six months of blogging before I actually wrote anything worth reading. In comparison, I’ve not even worked in computational linguistics for 6 months (I think I started work on SONAR in september and had no exposure to it before that). So I’m very much still sortof fumbling along, trying to figure out the best way to do things.

From a work point of view that’s fine. Actually some of my best work is done when I don’t know what I’m doing: I’m more able to ask stupid questions and get useful answers and I come at things from a sufficiently different angle to normal that sometimes I produce unexpected results.

But from a blogging point of view it’s pretty likely that what I end up writing about will range from the trivial to the wrong, until I find my feet. Some of it might be of interest to non-linguists but too basic to be of interest to linguists. Some of it might be so esoteric that it would only be of interest to linguists, at least it would if they weren’t so easily able to point out why it’s wrong. Some of it might be of interest only to me.

But actually this is a really piss poor excuse to not blog about it. Because, frankly, I do not write to amuse you. Writing for other people is, to me, a waste of time. I write about what is of interest to me. With any luck other people will find it interesting too, but that isn’t the primary point.

So...

In conclusion, my two main reasons for not blogging more about comptuational linguistics, natural language processing, etc. suck. So expect to see more about it here in the future. This probably means you’ll see more Ruby as well, as that’s what we use at work and I don’t expect I’ll bother translating into Scala except when I have a specific reason to do so.

Comments

Jan Berkel on 2009-01-27 01:34:17:

wow, great stuff david. while you write a blog post about not having enough time to write i could write a post (if i had a blog) about not finding enough time to read all the blogs i’m subscribed too. information overload, welcome to the 21st century.

david on 2009-01-27 01:38:09:

I keep meaning to write tools to help me manage the information overload, but I never find the time. :-)