Open source nostalgia

I write a lot of open source code.

This is not, however, the same as saying I write a lot of useful open source code. I suspect 90% of open source code I’ve written has never been productively used by anyone except me (and probably half of that 90% hasn’t been used productively by me either and was just an amusing hack). That’s ok though. It’s not really meant to be – when I create a project I actively intend for other people to use I put a bit more effort into it, but for the most part it’s really a case of “I wrote this. I don’t care about it that much. Maybe someone else will find it useful”. Most of the time they don’t, but far be it from me to judge what would and wouldn’t be useful to other people among my random hackings.

There was a nice example of this the other day, when I had the following exchange on twitter with Josh Reich (transcribed in IRC format because I don’t have a better twitter transcript convention):

<@i2pi> I'm stealing some java code found on the internets written by one @DRMacIver - thanks!
<@DRMacIver> @i2pi You're welcome. :-) Which code?
<@i2pi> @DRMacIver FlatteningIterator + its recursive cousin
<@DRMacIver> @i2pi Oh,wow. Code from the dawn of time. :-)
<@i2pi> @DRMacIver It's funny - years ago I used your FlatteningIterator, but never realized it was 
             yours until I went to grab the code again today.
<@DRMacIver> @i2pi Glad it was useful! I'm never sure how much, if any, of the random code I put 
             out there is.

That’s all. Nothing earth shattering – but it was nice to get a thank you for a little bit of code I wrote once upon a time, and to know that that code had helped someone out. It made me smile on what was otherwise turning out to be a fairly meh day, so was extra appreciated for that.

The code in question is here. It’s nothing special – it’s an iterator that recursively descends into other iterators in a depth first traversal. It’s one of those things that just about anyone could write, but if someone else has already written it you’d probably want to reuse their version rather than write your own owing to the presence of a few fiddly edge cases.
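For readers who have not seen the idea before, here is a minimal sketch of such a flattening iterator. It is not the original FlatteningIterator: the class name, the look-ahead design and the decision to leave arrays out entirely are assumptions made for illustration. It does show where the fiddly edge cases come from: empty nested collections, exhausted levels that have to be popped, and null leaf values.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;

/** Illustrative sketch only; the name and design are assumptions, not the original code. */
final class FlatteningSketch implements Iterator<Object> {
    // Iterators still being walked; the innermost one is on top of the stack.
    private final Deque<Iterator<?>> stack = new ArrayDeque<>();
    // The next leaf value, found ahead of time so that hasNext() is cheap and accurate.
    private Object next;
    private boolean hasNext;

    FlatteningSketch(Iterator<?> source) {
        stack.push(source);
        advance();
    }

    // Finds the next leaf depth first, descending into nested iterators and iterables.
    private void advance() {
        hasNext = false;
        next = null;
        while (!stack.isEmpty()) {
            Iterator<?> top = stack.peek();
            if (!top.hasNext()) {
                stack.pop(); // this level is exhausted (or was empty to begin with)
                continue;
            }
            Object item = top.next();
            if (item instanceof Iterator) {
                stack.push((Iterator<?>) item);
            } else if (item instanceof Iterable) {
                stack.push(((Iterable<?>) item).iterator());
            } else {
                next = item; // a leaf value, which may legitimately be null
                hasNext = true;
                return;
            }
        }
    }

    @Override
    public boolean hasNext() {
        return hasNext;
    }

    @Override
    public Object next() {
        if (!hasNext) {
            throw new NoSuchElementException();
        }
        Object result = next;
        advance();
        return result;
    }

    /** Convenience helper (hypothetical): collects every leaf value into a list. */
    static List<Object> flatten(Iterable<?> source) {
        List<Object> out = new ArrayList<>();
        Iterator<Object> it = new FlatteningSketch(source.iterator());
        while (it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

Looking one element ahead keeps hasNext() honest when the remaining nested collections are all empty, which is the case a naive version tends to get wrong.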

To be honest I’d forgotten all about it. I actually wrote this code about 8 or so months after I first really learned to program. It’s not bad code given that: I can’t see anything obvious about it that makes me go “Oh my god, I did that?”. The design of the overall API is a bit icky (you have iterators which return a mix of values, other iterators and collections, and it descends into every iterator or collection it finds and returns every value), but given the specified behaviour the code is ok. Except for the bug I just noticed, where the code for supporting arrays is completely wrong. The commenting style is a bit too “I should write comments as much as possible”, so a little over-obvious. Still, it was nice to know I wasn’t a complete idiot back then (the Array bug is an instance of stuff I seem to still be guilty of today – not testing enough – so I’m not counting that).

Anyway, I enjoyed being reminded of this code, enjoyed the fact that it was useful to someone and appreciated being thanked for it. I didn’t really have anything more profound to say than that.

This entry was posted in programming.

Crowding the trampoline

As most of you probably know by now, even though I don’t talk about them that much, I work for a company called Trampoline Systems. We’re a startup doing some interesting tech things. That’s not what this post is about.

We’re seeking series B funding at the moment, but it’s a difficult time to be doing it through the normal VC route, so now we’re trying something new: Crowdfunding. Rather than getting a few people to give us lots of money, let’s get lots of people to give us a little money. Alistair knows more about it than me, so I’ll refer you to him if you want to know the details.

There are a bunch of legal difficulties with this in terms of who the FSA will allow us to solicit funding from. In particular I’d be surprised if even 10% of people reading this were on the list. So, this isn’t a “Give us money” request. To be honest, even if it weren’t for the regulations it probably wouldn’t have been – other people in the company know more about the financial side of things and can say it better than I can.

What’s most interesting to me about the crowd funding isn’t actually the financial aspects. I mean, obviously ensuring the survival of the company is a good thing, but the crowd funding is interesting in a way that merely receiving a big chunk o’ VC funding wouldn’t have been (not that it would have been unwelcome!).

What’s interesting is the additional flexibility it buys. I’m big on the subject of open source and open information (I’m not a GNU style fanatic – I’m absolutely fine with closed source too. I believe in closing as much source as you need to and opening as much source as you can). There’s been a movement amongst the dev team (particularly me and Craig, our CTO) to see what we can extract from SONAR in the way of useful open source tools. Our term extraction code for example (which takes a blob of text and gives you useful fragments of text from it which make sense in isolation) is ripe for open sourcing. Unfortunately we’ve held off on it because it sounds like a much bigger chunk of our IP than it actually is, and we need to be super careful about how things look to our funders. This is understandable from their point of view, but somewhat disheartening from mine.

With crowd funding our hope (or at least my hope. This is all still under discussion) is that the larger group will be much more amenable to a policy of openness than the smaller. In many ways it’s much more in keeping with the style of the thing, and with less invested per person there’s less of a strong financial incentive to be risk averse and more of a reason to trust us with these decisions.

So, from my point of view, I’m quite looking forward to seeing what the future brings and, with any luck, it will include a few shiny new toys for you to play with.

This entry was posted in life, programming.

How packages work in Scala

THIS PIECE IS FULL OF LIES DO NOT TRUST IT

More accurately, its information is out of date and no longer valid. This describes the old behaviour of the Scala package system. Its behaviour has been different from this for some years now, as it turned out most people weren’t reaching the “acceptance” stage I describe below and after enough shouting the behaviour got changed. This is preserved solely for posterity. Do not rely on it for accurate information.

Original piece follows:

Every now and then someone discovers how packages work in Scala. This process typically passes through a number of stages.

  1. Confusion: “Hey, guys, I found this weird bug. Can you take a look?”
  2. Surprise: “What? It works like that? Really?”
  3. Denial: “No, I don’t believe you. This has to be a bug.”
  4. Anger: “Dear scala-debate. This is the worst feature in the entire world, and if you don’t agree with me you’re a big poopy head”
  5. Acceptance: “Actually, this is quite a neat feature”

Not everyone reaches step 5. Many stay in step 4 permanently, often because they’ve discovered that this interacts poorly with certain conventions they use.

This behaviour is particularly unfortunate because actually Scala’s package behaviour is quite nice. But people don’t seem to be willing to believe this and instead make up all sorts of behaviour which it doesn’t have and never has had and then get upset when the reality does not correspond to their fiction.

And so, in the hopes of dispelling some of this confusion, I bring to you the reality of how packages work in Scala. Some of this is very basic material, but I’m presenting it in case you’ve not explicitly thought about it in these terms as it will help with the leadup to the actually important part.

Identifiers

You have a bunch of identifiers in scope. These are names for things. It doesn’t matter what they’re names for: They could be vals, defs, packages, objects, etc. So for example suppose I have:

package foo;
object bar;
object baz{
   val kittens = "kittens";
}

within this file, say within the object bar, we’ve got a bunch of identifiers in scope: We have foo, the package we are in, bar, an object, and baz, another object. We don’t have kittens in scope (except within the object baz).

Within the object baz, everything in scope at the outer level is in scope here, but we’ve introduced the additional identifier kittens.

Note that a package conceptually constitutes one “level”. Everything from your current package is in scope, regardless of how you split it up into files – I could have moved some of the objects above into separate files and nothing would have changed.

Top level identifiers

Packages like foo are “top level” – they live in the global scope. Any file can refer to the identifier foo.

Nesting of packages

In the same way we had an object inside a package and introduced a new scope, we can nest a package inside a package.

package mammals;

package rodents{ 
   class Rat;
}

This places the package “rodents” inside the package “mammals”. In exactly the same way the object did, this inherits everything from the outer scope (and remember: the scope of the package is the scope of everything in that package, regardless of which file it is defined in). So in

package mammals;

class Cat;

package rodents{ 
   class Rat{
     def flee(moggy : Cat) = println("Help, help! Run away! It's " + moggy)
   }
}

the identifiers of the outer scope are available in the inner one.

But this sort of deeply nested package structure gets very ugly to write, so what one tends to do is separate it out to one package in a given file, even the nested ones, and so there’s syntax to support it:

package mammals.rodents;

class Rat{
  def flee(moggy : Cat) = println("Help, help! Run away! It's " + moggy)
}

This is exactly the same as the previous example except we’ve moved Cat to another file. It’s still in scope as before.

Members

Identifiers can have members. These are other identifiers which live on them and can be accessed with a .

For example, to refer to Rat from the package mammals we would refer to it as rodents.Rat.

Shadowing

You can reintroduce the same identifier at an inner level. Going back to our first example suppose we had written baz as

object baz{
   val bar = "kittens"
   val kittens = bar
}

Then kittens would still contain the string “kittens”, as it refers to the definition of bar in the current scope not the outside one. Outside of baz, bar would still refer to the object.

An important aspect of this: You can shadow packages just like anything else!

Suppose we have

package foo{
   object baz;
   package foo{
     object baz;

     object stuff{
       val it = foo.baz;
     }
   }
}

Then “it” points to the innermost baz, not the outermost one: We’ve shadowed the definition of foo.

And this is where the problem lies.

Suppose I have

package net.liftweb{
   object AwesomeWebWidget{
      def doStuffWith(url : java.io.File) = ...
   }
}

and someone comes along (remember this doesn’t have to be in the same file – it can even be in a jar) and introduces

package net.java.kittens;

class Kitten;

Now the lift code will no longer work! The problem is that what we have actually looks like this:

package net{
   package java{
     package kittens{
       class Kitten;
     }
   }

   package liftweb{
      object AwesomeWebWidget{
         def doStuffWith(url : java.io.File) = ...
      }
   }
}

The problem is that we have a different java identifier in scope than the one we wanted. The name actually refers to the java identifier that we acquire from the net package, rather than the base java that lives in the root as desired. This is the problem that sparked the latest “discussion” in scala-debate on this subject.

The solutions

One thing which everyone immediately leaps to propose is to change the way imports work in Scala. Hopefully the above should have demonstrated that this wouldn’t help: I have not mentioned the word “import” anywhere in this explanation. So we can safely discard this as a non-solution.

The primary current solution is, unfortunately, a bit of an ugly one. When you want to say “the java at the root and I really damn mean it” you can refer to it as _root_.java.io.File. Adding this to your fully qualified names will force it to refer to the right one. Many people have taken to using _root_ on all their imports to fully qualify them. Personally I don’t feel the need (I don’t use Java reverse name conventions though, so I rarely run into the negative aspects of this behaviour).

My preferred solution is to avoid the reverse domain name convention altogether: not having something common as your top level package greatly reduces the chance of other packages being accidentally injected into your scope like this.

Other solutions are currently under discussion in scala-debate, so some of this may change.

This entry was posted in programming.

reddilicious: Automatically import your links from other sites into delicious

I appear to have done something highly out of character and created a tool which is simply useful, without any real theoretical interest to it.

I followed up on the useful scripts I posted a while ago and decided to turn them into something slightly more complete and robust. The result is reddilicious, a tool for automatically importing from various sites into delicious. (Note: It’s written in Ruby, mostly due to the rather excellent HTTParty and Mechanize libraries. If you’re coming from planet scala expecting to see my awesome scala code, sorry, instead you get some rather grim Ruby code).

It currently handles:

  • Reddit: Pulls in any pages you’ve voted up. Tags them with the subreddit and via:reddit
  • Stumbleupon: Similarly pulls in any pages you’ve thumbed up. Pulls in your blog entry on it if there is one as an extended comment. Pulls in any tags. Tags it via:stumbleupon
  • Twitter: Pulls in any links mentioned in your friends timeline. Tags them via:twitter, from:user, to:users (anyone mentioned @ in the tweet) and with the hashtags mentioned in the tweet

It correctly handles historical data for all of them (for twitter it only goes back to the limit of your friends timeline, not everything ever mentioned by a friend of yours), with timestamps set appropriately (on reddit the timestamp is the post date rather than the date you voted it up).

It’s all very rough around the edges at the moment, but it does work rather well if you’re prepared to put up with its quirks. The basic mechanism is as follows: There’s a script “reddilicious” in the distribution (note: It currently gets unhappy if you refer to it with a symlink due to path handling issues. Fixing that is on my todo list) which handles all its operations. A reddilicious instance corresponds to a directory where it stores all its data (including passwords in plaintext. Sorry). You create an instance with:

reddilicious create somedir

add accounts to it with e.g.

reddilicious twitter somedir

and update it with

reddilicious update somedir

update will pull in new items and post them to delicious. It should be pretty well behaved about not stomping existing bookmarks when there are duplicates (but will add in additional tags to them, add an extended description if there isn’t one already there, etc).
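The duplicate handling described above can be sketched roughly as follows. This is not reddilicious’s own code (which is Ruby): the Bookmark class, its field names and the mergeWith method are all hypothetical, written only to illustrate the rule of keeping the existing bookmark, adding any new tags, and filling in the extended description only when there is none.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** Hypothetical bookmark record; the field names are assumptions, not the tool's data model. */
final class Bookmark {
    final String url;
    final Set<String> tags;
    final String extended; // extended description; null or empty means "none yet"

    Bookmark(String url, Set<String> tags, String extended) {
        this.url = url;
        this.tags = new LinkedHashSet<>(tags);
        this.extended = extended;
    }

    /** Merges a freshly imported copy into this existing bookmark without stomping on it. */
    Bookmark mergeWith(Bookmark imported) {
        // Existing tags come first; tags from the import are added if they are new.
        Set<String> mergedTags = new LinkedHashSet<>(tags);
        mergedTags.addAll(imported.tags);

        // Only take the imported description when there isn't one already.
        boolean hasExtended = extended != null && !extended.isEmpty();
        String mergedExtended = hasExtended ? extended : imported.extended;

        return new Bookmark(url, mergedTags, mergedExtended);
    }
}
```

Returning a new object rather than mutating the existing one is just a choice made for the sketch; the point is only that the existing data always wins where the two conflict.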

Its logging output is very chatty (an artifact of my spending way too much time debugging it), and currently logs to stdout. I currently have it running in cron redirected to a file.

This entry was posted in programming.

Axioms, definitions and agreement

A while ago I posted A Problem of Language, a response to an article claiming that Scala was not a functional language. This isn’t an attempt to revive that argument (and please don’t respond to it with such attempts. I’m likely to ignore or delete comments on the question of whether Scala is a functional language). It’s a post which is barely about programming, except by example. Really it’s a post about the philosophy of arguments.

My point was basically that without a definition of “functional language” (which no one had provided) it was a meaningless assertion to make.

Unfortunately this point isn’t really true. I think I knew that at the time of writing but glossed over it to avoid muddying the waters, as it’s false in a way that doesn’t detract from the basic point of the article, but it’s been bugging me slightly so I thought I’d elaborate on the point and the basic ideas.

Let’s start with what’s hopefully an unambiguous statement:

Brainfuck is not a functional language

Hopefully no one wants to argue the point. :-)

Well, why is Brainfuck not a functional language? It doesn’t have functions!

So, we’re making the following claim:

A functional language must have a notion of function

(in order to make this fully formal you’d probably have to assert some more properties functions have to satisfy. I can’t be bothered to do that).

Hopefully this claim is uncontroversial.

But what have we done here? Based on commonly agreed statements, we’ve proved that Brainfuck is not functional without ever having defined “functional programming language”. In other words, my claim that you need a definition in order to meaningfully claim that a language is not functional is false.

What you need in order to make this claim is a necessary condition for the language to be functional. Then on showing that condition does not hold you have demonstrated the dysfunctionality of the language.
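Written out, the shape of the argument is just modus tollens. The predicate names below are informal labels chosen for this sketch, not definitions:

```latex
\[
\forall L.\ \mathrm{Functional}(L) \Rightarrow \mathrm{HasFunctions}(L),
\qquad
\neg\,\mathrm{HasFunctions}(\mathrm{Brainfuck})
\;\vdash\;
\neg\,\mathrm{Functional}(\mathrm{Brainfuck})
\]
```

Nothing here says what Functional means. Only the necessary condition and the fact about Brainfuck are used.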

But how do we arrive at necessary conditions without a definition? Well, we simply assert them to be true and hope that people agree. If they do agree, we’ve achieved a basis on which we can conduct an argument. If they don’t agree, we need to try harder.

A lot of moral arguments come down to this sort of thing. Without wanting to get into details, things like arguments over abortion or homosexuality frequently come down to arguments over a basic tenet: Do you consider a fetus to be of equal value to a human life, do you consider homosexuality to be inherently wrong, etc. (what I said about arguments RE Scala holds in spades for arguments on these subjects). It’s very rare for one side to convince the other of anything by reasoned argument, because in order to construct a reasoned argument you have to find a point of agreement from which to argue and that point of agreement just isn’t there.

Mathematically speaking, what we’re talking about is an Axiom. Wikipedia says:

In traditional logic, an axiom or postulate is a proposition that is not proved or demonstrated but considered to be either self-evident, or subject to necessary decision. Therefore, its truth is taken for granted, and serves as a starting point for deducing and inferring other (theory dependent) truths.

I consider this definition to be true, but perhaps a bit obfuscated. I’d like to propose the following definition. It’s overly informal, but I find it’s a better way to think about it:

An axiom is a point which we agree to consider true without further discussion as a basis for arriving at an agreement.

(This may give the hardcore formalists a bit of a fit. If so, I apologise. :-) It is intended to be formalist more in spirit than letter.)

The most important part of this is that axioms are social tools. They don’t have any sort of deeper truth or meaning, they’re just there to form a basis for the discussion.

This entry was posted in Numbers are hard, rambling nonsense.