Archive for July, 2009

Crowding the trampoline

Wednesday, July 29th, 2009

As most of you probably know by now, even though I don’t talk about them that much, I work for a company called Trampoline Systems. We’re a startup doing some interesting tech things. That’s not what this post is about.

We’re seeking series B funding at the moment, but it’s a difficult time to be doing it through the normal VC route, so now we’re trying something new: Crowdfunding. Rather than getting a few people to give us lots of money, let’s get lots of people to give us a little money. Alistair knows more about it than me, so I’ll refer you to him if you want to know the details.

There are a bunch of legal difficulties with this in terms of who the FSA will allow us to solicit funding from. In particular I’d be surprised if even 10% of people reading this were on the list. So, this isn’t a “Give us money” request. To be honest, even if it weren’t for the regulations it probably wouldn’t have been one: other people in the company know more about the financial side of things and can say it better than I can.

What’s most interesting to me about the crowd funding isn’t actually the financial aspects. I mean, obviously ensuring the survival of the company is a good thing, but the crowd funding is interesting in a way that merely receiving a big chunk o’ VC funding wouldn’t have been (not that it would have been unwelcome!).

What’s interesting is the additional flexibility it buys. I’m big on the subject of open source and open information (I’m not a GNU style fanatic – I’m absolutely fine with closed source too. I believe in closing as much source as you need to and opening as much source as you can). There’s been a movement amongst the dev team (particularly me and Craig, our CTO) to see what we can extract from SONAR in the way of useful open source tools. Our term extraction code for example (which takes a blob of text and gives you useful fragments of text from it which make sense in isolation) is ripe for open sourcing. Unfortunately we’ve held off on it because it sounds like a much bigger chunk of our IP than it actually is, and we need to be super careful about how things look to our funders. This is understandable from their point of view, but somewhat disheartening from mine.

With crowd funding our hope (or at least my hope. This is all still under discussion) is that the larger group will be much more amenable to a policy of openness than the smaller. In many ways it’s much more in keeping with the style of the thing, and with less invested per person there’s less of a strong financial incentive to be risk averse and more of a reason to trust us with these decisions.

So, from my point of view, I’m quite looking forward to seeing what the future brings and, with any luck, it will include a few shiny new toys for you to play with.

How packages work in Scala

Thursday, July 16th, 2009

Every now and then someone discovers how packages work in Scala. This process typically passes through a number of stages.

  1. Confusion: “Hey, guys, I found this weird bug. Can you take a look?”
  2. Surprise: “What? It works like that? Really?”
  3. Denial: “No, I don’t believe you. This has to be a bug.”
  4. Anger: “Dear scala-debate. This is the worst feature in the entire world, and if you don’t agree with me you’re a big poopy head”
  5. Acceptance: “Actually, this is quite a neat feature”

Not everyone reaches step 5. Many stay in step 4 permanently, often because they’ve discovered that this interacts poorly with certain conventions they use.

This behaviour is particularly unfortunate because actually Scala’s package behaviour is quite nice. But people don’t seem to be willing to believe this and instead make up all sorts of behaviour which it doesn’t have and never has had and then get upset when the reality does not correspond to their fiction.

And so, in the hopes of dispelling some of this confusion, I bring to you the reality of how packages work in Scala. Some of this is very basic material, but I’m presenting it in case you’ve not explicitly thought about it in these terms, as it will help with the lead-up to the actually important part.

Identifiers

You have a bunch of identifiers in scope. These are names for things. It doesn’t matter what they’re names for: They could be vals, defs, packages, objects, etc. So for example suppose I have:

package foo;
object bar;
object baz{
   val kittens = "kittens";
}

Within this file, say within the object bar, we’ve got a bunch of identifiers in scope: we have foo, the package we are in; bar, an object; and baz, another object. We don’t have kittens in scope (except within the object baz).

Within the object baz, everything in scope at the outer level is in scope here, but we’ve introduced the additional identifier kittens.

Note that a package conceptually constitutes one “level”. Everything from your current package is in scope, regardless of how you split it up into files – I could have moved some of the objects above into separate files and nothing would have changed.

Top level identifiers

Packages like foo are “top level” – they live in the global scope. Any file can refer to the identifier foo.

Nesting of packages

In the same way we had an object inside a package and introduced a new scope, we can nest a package inside a package.

package mammals;

package rodents{
   class Rat;
}

This places the package “rodents” inside the package “mammals”. In exactly the same way the object did, this inherits everything from the outer scope (and remember: the scope of the package is the scope of everything in that package, regardless of which file it is defined in). So if we write

package mammals;

class Cat;

package rodents{
   class Rat{
     def flee(moggy : Cat) = println("Help, help! Run away! It's " + moggy)
   }
}

then the identifiers of the outer scope are available in the inner one: Rat can refer to Cat without qualifying it.

But this sort of deeply nested package structure gets very ugly to write, so what one tends to do is put one package per file, even for the nested ones, and there's syntax to support that:

package mammals.rodents;

class Rat{
  def flee(moggy : Cat) = println("Help, help! Run away! It's " + moggy)
}

This is exactly the same as the previous example except we've moved Cat to another file. It's still in scope as before.

Members

Identifiers can have members. These are other identifiers which live on them and can be accessed with a dot (.).

For example, to refer to Rat from the package mammals we would refer to it as rodents.Rat.
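As a compilable sketch of the same idea (written with braces so it fits in one file; the Zoo object and the toString override are hypothetical additions, and flee returns its message rather than printing it so the result can be checked), Rat is reached from inside mammals as the member rodents.Rat:

```scala
package mammals {
  class Cat {
    override def toString = "a cat"
  }

  package rodents {
    class Rat {
      // Cat comes from the enclosing package mammals: the inner
      // package inherits the outer package's scope.
      def flee(moggy: Cat): String = "Help, help! Run away! It's " + moggy
    }
  }

  // Hypothetical object showing member access from within mammals.
  object Zoo {
    val cat = new Cat
    // rodents is an identifier in scope here, and Rat is one of its members.
    val rat = new rodents.Rat
  }
}
```

From outside mammals entirely, the full path mammals.rodents.Rat works the same way.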

Shadowing

You can reintroduce the same identifier at an inner level. Going back to our first example suppose we had written baz as

object baz{
   val bar = "kittens"
   val kittens = bar
}

Then kittens would still contain the string "kittens", as it refers to the definition of bar in the current scope, not the outer one. Outside of baz, bar would still refer to the object.
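Here's a compilable version of that example, with foo written using braces. The outside object is a hypothetical addition whose only job is to show that bar is unaffected outside baz:

```scala
package foo {
  object bar

  object baz {
    val bar = "kittens" // shadows the object foo.bar inside baz
    val kittens = bar   // the innermost bar wins: this is the string
  }

  // Hypothetical object: outside baz, bar still names the object.
  object outside {
    val whatIsBar: Any = bar
  }
}
```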

An important aspect of this: You can shadow packages just like anything else!

Suppose we have

package foo{
   object baz;
   package foo{
     object baz;

     object stuff{
       val it = foo.baz;
     }
   }
}

Then "it" points to the innermost baz, not the outermost one: We've shadowed the definition of foo.
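The snippet above can't show which baz you actually got, so here's a sketch with a hypothetical where field added to each baz to make the shadowing observable:

```scala
package foo {
  object baz { val where = "outer" }

  package foo {
    object baz { val where = "inner" }

    object stuff {
      // Lookup for foo finds the inner package foo.foo (a member of the
      // enclosing package) before the top-level foo, so this is foo.foo.baz.
      val it = foo.baz
    }
  }
}
```

From outside both packages, foo once again means the top-level package, so foo.baz is the outer one.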

And this is where the problem lies.

Suppose I have

package net.liftweb{
   object AwesomeWebWidget{
      def doStuffWith(url : java.io.File) = ...
   }
}

and someone comes along (remember: this doesn't have to be in the same file; it can even be in a jar) and introduces

package net.java.kittens;

class Kitten;

Now the lift code will no longer work! The problem is that what we have actually looks like this:

package net{
   package java{
     package kittens{
       class Kitten;
     }
   }

   package liftweb{
      object AwesomeWebWidget{
         def doStuffWith(url : java.io.File) = ...
      }
   }
}

The problem is that we have a different java identifier in scope from the one we meant. It actually refers to net.java, which we acquire from the enclosing net package, rather than the base java that lives in the root as desired. Since net.java has no io member, java.io.File no longer compiles. This is the problem that sparked the latest "discussion" in scala-debate on this subject.

The solutions

One thing which everyone immediately leaps to propose is to change the way imports work in Scala. Hopefully the above should have demonstrated that this wouldn't help: I have not mentioned the word "import" anywhere in this explanation. So we can safely discard this as a non-solution.

The primary current solution is, unfortunately, a bit of an ugly one. When you want to say "the java at the root and I really damn mean it" you can refer to it as _root_.java.io.File. Adding _root_ to your fully qualified names forces them to be looked up from the root package, and many people have taken to using it on all their imports to prevent this sort of accidental shadowing.

Personally I find this highly unnecessary (though I don't use the Java reverse domain name convention, so I rarely run into the negative aspects of this behaviour). My preferred solution is to avoid that convention: not having your top level package as something common greatly reduces the chance of packages being accidentally injected into your scope like this.
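Applied to the lift example, a sketch of the _root_ fix looks like this. The parameter is renamed to file, and the body, which the original elides, is replaced with a hypothetical one that just returns the file name:

```scala
package net {
  // Someone else's package; it could just as well come from a jar.
  package java {
    package kittens {
      class Kitten
    }
  }

  package liftweb {
    object AwesomeWebWidget {
      // Writing plain java.io.File here would look up java as net.java,
      // which has no io member, so it would not compile. _root_ makes the
      // lookup start from the root package instead.
      def doStuffWith(file: _root_.java.io.File): String = file.getName
    }
  }
}
```

Code outside the net package is unaffected: there, java still means the root java.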

Other solutions are currently under discussion in scala-debate, so some of this may change.

reddilicious: Automatically import your links from other sites into delicious

Monday, July 6th, 2009

I appear to have done something highly out of character and created a tool which is simply useful, without any real theoretical interest to it.

I followed up on the useful scripts I posted a while ago and decided to turn them into something slightly more complete and robust. The result is reddilicious, a tool for automatically importing your links from various sites into delicious. (Note: It’s written in Ruby, mostly due to the rather excellent HTTParty and Mechanize libraries. If you’re coming from planet scala expecting to see my awesome scala code, sorry, instead you get some rather grim Ruby code).

It currently handles:

  • Reddit: Pulls in any pages you’ve voted up. Tags them with the subreddit and via:reddit
  • Stumbleupon: Similarly pulls in any pages you’ve thumbed up. If you’ve written a blog entry on a page, pulls that in as an extended comment. Pulls in any tags. Tags it via:stumbleupon
  • Twitter: Pulls in any links mentioned in your friends timeline. Tags them via:twitter, from:user (the poster), to:user (for anyone @-mentioned in the tweet) and with the hashtags mentioned in the tweet
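reddilicious itself is Ruby and this isn't its code, but the Twitter tagging rule reads roughly like the following Scala sketch. The tagsFor name, the regexes and the choice to store hashtags without the # are all assumptions, and the regexes are deliberately naive (an email address would count as a mention):

```scala
object TweetTags {
  // Naive patterns: @name and #tag made of word characters.
  private val Mention = """@(\w+)""".r
  private val Hashtag = """#(\w+)""".r

  /** Hypothetical helper: tags for a link found in a tweet by `author`. */
  def tagsFor(author: String, text: String): List[String] = {
    val to   = Mention.findAllMatchIn(text).map(m => "to:" + m.group(1)).toList
    val tags = Hashtag.findAllMatchIn(text).map(m => m.group(1)).toList
    ("via:twitter" :: ("from:" + author) :: (to ++ tags)).distinct
  }
}
```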

It correctly handles historical data for all of them (for Twitter it only goes back as far as your friends timeline does, not to everything ever mentioned by a friend of yours), with timestamps set appropriately (on reddit the timestamp is the post date rather than the date you voted it up).

It’s all very rough around the edges at the moment, but it does work rather well if you’re prepared to put up with its quirks. The basic mechanism is as follows: There’s a script “reddilicious” in the distribution (note: It currently gets unhappy if you refer to it with a symlink due to path handling issues. Fixing that is on my todo list) which handles all its operations. A reddilicious instance corresponds to a directory where it stores all its data (including passwords in plaintext. Sorry). You create an instance with:

reddilicious create somedir

add accounts to it with e.g.

reddilicious twitter somedir

and update it with

reddilicious update somedir

update will pull in new items and post them to delicious. It should be pretty well behaved about not stomping existing bookmarks when there are duplicates (but will add in additional tags to them, add an extended description if there isn’t one already there, etc).
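The duplicate handling amounts to a merge rule. Here's a minimal Scala sketch of it, assuming a hypothetical Bookmark type (this is neither the delicious API nor the actual Ruby code): existing tags are kept and new ones added, and the extended description is only filled in when it's empty.

```scala
// Hypothetical bookmark model; not the delicious API.
final case class Bookmark(url: String, tags: Set[String], extended: String)

object BookmarkMerge {
  // Combine an incoming bookmark with whatever is already saved for its URL.
  def merge(existing: Option[Bookmark], incoming: Bookmark): Bookmark =
    existing match {
      case None => incoming // nothing there yet: post it as-is
      case Some(old) =>
        old.copy(
          tags = old.tags ++ incoming.tags, // keep old tags, add new ones
          // only fill in the extended description if it's currently empty
          extended = if (old.extended.nonEmpty) old.extended else incoming.extended
        )
    }
}
```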

Its logging output is very chatty (an artifact of my spending way too much time debugging it), and currently logs to stdout. I currently have it running in cron redirected to a file.