Archive for January, 2009

Old mathematics posts

Sunday, January 25th, 2009

I’ve imported my old maths blog, A Mathematician’s Scratchpad. It’s available under the category mathematics. Unfortunately I can’t get the LaTeX to compile. The old LaTeX-render plugin I used no longer works with the latest version of WordPress, and I’m experiencing a host of problems with the new one (not least among them “It makes my site shit-slow”), so I’ve disabled it. Hopefully the posts are reasonably readable anyway.

Most likely none of this stuff is particularly interesting to anyone who reads this blog currently, but I’m enjoying the nostalgia trip. :-)

Computational linguistics and Me

Sunday, January 25th, 2009

Apparently I’m a computational linguistics blogger. This is sort of news to me. The closest I’ve come to blogging about computational linguistics is in writing a borderline rant about academia.

That being said, I do work in computational linguistics: SONAR is basically a great big NLP system.

This fact, however, is almost totally unrepresented in my blogging.

Actually, that’s part of why I’ve been blogging so much less recently. Since moving onto SONAR my brain has been afire with newly acquired knowledge and trying to figure out how best to apply it to work problems. This has left relatively little time for most of the other stuff I think about that normally generates blogging.

Of course the obvious solution is that I should be blogging about computational linguistics. But that has some obstacles. Primarily:

Confidentiality

All the computational linguistics stuff I do is for work. I tinker around with it at home, but haven’t really done anything useful. This makes it difficult to know what I can blog about: I certainly can’t go “HEY GUYS. I FIGURED OUT THIS AWESOME ALGORITHM WHICH WE’RE USING IN SONAR” for everything. We rather rely on some of that magic to make us money. :-)

That being said, there’s definitely stuff I can blog about. For example, there’s nothing particularly confidential in how we extract likely candidate phrases from a document, and it’s at least mildly interesting (probably more to non-linguists, but who knows?). In fact, we’re all encouraged to blog more about what we do, but we never find the time. So, really, work isn’t that much of an obstacle to blogging about this. It just requires a bit of careful thought.

Experience

I’m very new to computational linguistics. As such, I’ve a much less clear idea of what’s bloggable in it. If we look at my blogging history, I started blogging about programming in February 2007. That’s just shy of a year after I started working as a programmer (which, effectively, is just shy of a year after I started programming anything in earnest). And I think it took another six months of blogging before I actually wrote anything worth reading. In comparison, I’ve not even worked in computational linguistics for six months (I think I started work on SONAR in September and had no exposure to it before that). So I’m very much still sort of fumbling along, trying to figure out the best way to do things.

From a work point of view that’s fine. Actually some of my best work is done when I don’t know what I’m doing: I’m more able to ask stupid questions and get useful answers and I come at things from a sufficiently different angle to normal that sometimes I produce unexpected results.

But from a blogging point of view it’s pretty likely that what I end up writing about will range from the trivial to the wrong, until I find my feet. Some of it might be of interest to non-linguists but too basic to be of interest to linguists. Some of it might be so esoteric that it would only be of interest to linguists, at least it would if they weren’t so easily able to point out why it’s wrong. Some of it might be of interest only to me.

But actually this is a really piss poor excuse to not blog about it. Because, frankly, I do not write to amuse you. Writing for other people is, to me, a waste of time. I write about what is of interest to me. With any luck other people will find it interesting too, but that isn’t the primary point.

So…

In conclusion, my two main reasons for not blogging more about computational linguistics, natural language processing, etc. suck. So expect to see more about it here in the future. This probably means you’ll see more Ruby as well, as that’s what we use at work and I don’t expect I’ll bother translating into Scala except when I have a specific reason to do so.

Writing things right

Monday, January 19th, 2009

OO has contributed many big and important innovations to programming. Among these, the foremost is that you write functions after rather than before their argument.

No, really.

It’s not just OO languages of course. Concatenative languages do the same thing. There’s a long history of mathematicians doing it as well (though we don’t like to talk about them. The cool mathematicians all write their functions on the left).

It’s funny how attached people get to this fact though.

Consider the following piece of Scala code:

object StringUtils{
  /**
   * Trims whitespace from the end of s.
   */
  def rtrim(s : String) = ...
}

We can invoke this as StringUtils.rtrim(myString). Or, if we import StringUtils._, just rtrim(myString).

People get very upset if you ask them to do so though, and they go to all sorts of lengths to avoid it.
Consider the following three examples from different languages:

Scala:

object StringUtils{
   implicit def string2RTrim(s : String) = new { def rtrim = ...; }
}

Ruby:

class String
  def rtrim
  ...
  end
end

C#:

// Extension methods must be declared in a static class.
static class StringUtils{
   public static String rtrim(this String s) {
     ...
   }
}

What do these achieve over the previous version? Simple: You can write myString.rtrim instead of rtrim(myString). That’s it. (Actually the Ruby and Scala versions both *can* allow you to do different things than that. It’s just that here and in 90% of the use cases they aren’t used for anything else. The C# version literally doesn’t do anything else).

The thing is, while I’m making fun of this to a certain degree, it’s actually a perfectly reasonable thing to want to do. Designing things in noun-verb order is a good principle of UI design, and it works for programming as well. Things chain better – when you want to add new functions to a pipeline you add them at the point your cursor is naturally at and it matches well with thinking of it as a pipeline of “take this thing, do this to it, do that to it, do this other thing to it, get this value out”. Also you write far fewer brackets. :-) (compare Haskell’s foo . bar . baz $ thing idiom for a similar bracket avoidance tool).
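As a rough illustration of the chaining and bracket point, here is a small Ruby sketch. The helper names (rtrim, shout, exclaim) are made up for this example and don't come from any library; the postfix version only uses methods that String already has.

```ruby
# Hypothetical utility functions, written "on the left" of their argument.
def rtrim(s)
  s.sub(/\s+\z/, "")
end

def shout(s)
  s.upcase
end

def exclaim(s)
  s + "!"
end

# Prefix style: reads inside-out, and every new step wraps the whole expression.
prefix_result = exclaim(shout(rtrim("hello   ")))

# Postfix style: reads left to right, and every new step is appended
# at the end, where the cursor already is.
postfix_result = "hello   ".rstrip.upcase.concat("!")
```

Both expressions compute the same string; the only difference is the order in which you read and write the steps.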

Of these, I’d say that the Ruby solution is the most obvious (it just uses the fact that classes are open to add a new method to String), but it comes with the possibility of amusingly non-obvious runtime errors when someone else defines a conflicting method. The C# solution seems the best to me: it’s relatively little overhead over writing the utility method as you would otherwise, and it comes with the option to invoke it either as myString.rtrim() or StringUtils.rtrim(myString), so when namespacing conflicts inevitably occur you have an easy fallback. But of course it uses a language feature specifically added to do this, while the other two are applications of more general language features. The Scala solution is, to my mind, decidedly the worst of the three. It’s syntactically noisy and comes with a significant additional runtime overhead: a wrapper object is created on every call, and because the wrapper has a structural type, rtrim is invoked via reflection.
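To make the Ruby failure mode concrete, here is a minimal sketch of two libraries reopening String. Both "libraries" are hypothetical, and the method bodies are assumptions chosen only to show the clash.

```ruby
# Hypothetical library A reopens String: rtrim strips trailing whitespace.
class String
  def rtrim
    sub(/\s+\z/, "")
  end
end

before = "hello  \n".rtrim

# Hypothetical library B, loaded later, reopens String again and defines
# rtrim with a different meaning: strip trailing slashes.
class String
  def rtrim
    sub(%r{/+\z}, "")
  end
end

# Same call site as before, but it now silently runs library B's version.
after = "hello  \n".rtrim
```

Nothing fails at the point of redefinition (Ruby only warns about it when warnings are enabled), so the breakage shows up later in whichever code still expected the first behaviour.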

But honestly I’m not particularly happy with any of these solutions. The Scala and Ruby solutions come with costs disproportionate to the benefit they give, and the C# solution requires an additional language feature. Moreover, each of these solutions requires effort at each definition site in order to make something available that you always want at the use site. Wouldn’t it be better if for every utility function you automatically had the option to write it on the right?

Let’s take a digression. What language is the following (rather pointless) code written in?

[1, 2, 3].sort.length

Ruby, right?

Actually, no. It’s Haskell.

Wait, what?

Well, it’s Haskell if you do something slightly evil and redefine the (.) operator (which normally means composition):

Prelude Data.List> let (.) x f = f x
Prelude Data.List> [1, 2, 3].sort.length
3

I saw this trick a while ago (the author was amusingly apologetic for it). It’s evil Haskell code because of the way it redefines an operator that normally means something else (this is totally typesafe of course – existing code will continue to use the old operator definition). But it’s a perfectly valid operator definition, and a rather nice one.

It works well with additional arguments to functions too:

Prelude Data.List> [1, 2, 3].sortBy(compare).length
3

The reason this works is that sortBy is curried and takes the list as its last argument, so sortBy(compare) gives something of type [Int] -> [Int], which we can then apply as above. Function application binds more tightly than any operator in Haskell, so sortBy(compare) is evaluated before our new (.) gets involved.

So this is a nice trick, but how is it useful to you? Well, it’s probably not. I can’t think of any low noise way of making it work in any of the other languages mentioned so far (the best I can come up with is an evil evil hack in Ruby that would make god go on a kitten killing spree and a mildly nasty hack with operators and implicit conversions in Scala that’s much too noisy to really use), and using it in Haskell will make other Haskell programmers very unhappy with you. But it’s an interesting trick, and I’ll be sure to bear it in mind if I ever get around to creating DRMacIverLang.
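A much tamer relative of the trick exists in Ruby 2.6 and later, which postdates this post: Object#then yields the receiver to a block and returns the block's result, so any plain function can be written after its argument without touching String. This is a sketch only, and it is not the hack alluded to above; rtrim and surround are hypothetical helpers.

```ruby
# Hypothetical plain utility functions; nothing is added to String.
def rtrim(s)
  s.sub(/\s+\z/, "")
end

def surround(s, left, right)
  "#{left}#{s}#{right}"
end

# Pass an existing function as the block ...
trimmed = "hello   ".then(&method(:rtrim))

# ... or use explicit blocks when a function needs extra arguments.
wrapped = "hello   "
  .then { |s| rtrim(s) }
  .then { |s| surround(s, "[", "]") }
```

It is noisier than the Haskell operator, but it needs no work at the definition site, which was the complaint about the three solutions earlier.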

Planet Just Scala

Sunday, January 18th, 2009

After a little bit of hacking around with Yahoo Pipes I’ve created a filtered version of the Planet Scala feed. I couldn’t figure out a way to make Yahoo Pipes give me a pretty URL (which is stupid, as it gives a pretty web URL just fine), so here’s a decent-urled version of it. http://pipes.yahoo.decenturl.com/planet-just-scala

I’ll probably replace this with a quick custom script at some point when I can be bothered, but this works for now. :-)

Planet Scala: By Scala programmers, usually about Scala

Sunday, January 18th, 2009

Planet Scala’s selection of feeds has always been a bit haphazard – in some cases the whole feed, in some cases a Scala specific one.

At the end of last year I mentioned that I was thinking of changing this, and I’m now experimenting with doing so. Except where the ratio of non-Scala to Scala posts is very high, I’m switching the feeds over to provide the full feed. My impression so far is that 90% of the additional stuff acquired this way should be of general interest to Scala programmers and that the volume is not that high.

You’ll probably notice an initial spike in non-Scala content as the backlog of non-Scala posts becomes available, but this should settle down fairly rapidly.

I’ll also look into an easy way of providing a Scala-filtered RSS feed on top of this for people who don’t want the extra content.