# How packages work in Scala

### THIS PIECE IS FULL OF LIES DO NOT TRUST IT

More accurately, its information is out of date and no longer valid. This describes the old behaviour of the Scala package system. Its behaviour has been different from this for some years now, as it turned out most people weren’t reaching the “acceptance” stage I describe below and after enough shouting the behaviour got changed. This is preserved solely for posterity. Do not rely on it for accurate information.

Original piece follows:

Every now and then someone discovers how packages work in Scala. This process typically passes through a number of stages.

1. Confusion: “Hey, guys, I found this weird bug. Can you take a look?”
2. Surprise: “What? It works like that? Really?”
3. Denial: “No, I don’t believe you. This has to be a bug.”
4. Anger: “Dear scala-debate. This is the worst feature in the entire world, and if you don’t agree with me you’re a big poopy head”
5. Acceptance: “Actually, this is quite a neat feature”

Not everyone reaches step 5. Many stay in step 4 permanently, often because they’ve discovered that this interacts poorly with certain conventions they use.

This behaviour is particularly unfortunate because actually Scala’s package behaviour is quite nice. But people don’t seem to be willing to believe this and instead make up all sorts of behaviour which it doesn’t have and never has had and then get upset when the reality does not correspond to their fiction.

And so, in the hopes of dispelling some of this confusion, I bring to you the reality of how packages work in Scala. Some of this is very basic material, but I’m presenting it in case you’ve not explicitly thought about it in these terms as it will help with the leadup to the actually important part.

### Identifiers

You have a bunch of identifiers in scope. These are names for things. It doesn’t matter what they’re names for: They could be vals, defs, packages, objects, etc. So for example suppose I have:

package foo;
object bar;
object baz{
val kittens = "kittens";
}

Within this file, say within the object bar, we’ve got a bunch of identifiers in scope: we have foo, the package we are in; bar, an object; and baz, another object. We don’t have kittens in scope (except within the object baz).

Within the object baz, everything in scope at the outer level is in scope here, but we’ve introduced the additional identifier kittens.

Note that a package conceptually constitutes one “level”. Everything from your current package is in scope, regardless of how you split it up into files – I could have moved some of the objects above into separate files and nothing would have changed.

### Top level identifiers

Packages like foo are “top level” – they live in the global scope. Any file can refer to the identifier foo.

### Nesting of packages

In the same way we had an object inside a package and introduced a new scope, we can nest a package inside a package.

package mammals;

package rodents{
class Rat;
}

This places the package “rodents” inside the package “mammals”. In exactly the same way the object did, this inherits everything from the outer scope (and remember: the scope of the package is the scope of everything in that package, however it is split across files). So in

package mammals;

class Cat;

package rodents{
class Rat{
def flee(moggy : Cat) = println("Help, help! Run away! It's " + moggy)
}
}

the identifiers of the outer scope are available in the inner one.

But this sort of deeply nested package structure gets very ugly to write, so what one tends to do is separate it out into one package per file, even for the nested ones, and there’s syntax to support that:

package mammals.rodents;

class Rat{
def flee(moggy : Cat) = println("Help, help! Run away! It's " + moggy)
}

This is exactly the same as the previous example except we’ve moved Cat to another file. It’s still in scope as before.

### Members

Identifiers can have members. These are other identifiers which live on them and can be accessed with a dot (.).

For example, to refer to Rat from the package mammals we would refer to it as rodents.Rat.

You can reintroduce the same identifier at an inner level. Going back to our first example suppose we had written baz as

object baz{
val bar = "kittens"
val kittens = bar
}

Then kittens would still contain the string “kittens”, as it refers to the definition of bar in the current scope not the outside one. Outside of baz, bar would still refer to the object.

An important aspect of this: You can shadow packages just like anything else!

Suppose we have

package foo{
  object baz;
  package foo{
    object baz;

    object stuff{
      val it = foo.baz;
    }
  }
}

Then “it” points to the innermost baz, not the outermost one: We’ve shadowed the definition of foo.

And this is where the problem lies.

Suppose I have

package net.liftweb{
object AwesomeWebWidget{
def doStuffWith(url : java.io.File) = ...
}
}

and someone comes along (remember this doesn’t have to be in the same file – it can even be in a jar) and introduces

package net.java.kittens;

class Kitten;

Now the lift code will no longer work! The problem is that what we have actually looks like this:

package net{
  package java{
    package kittens{
      class Kitten;
    }
  }

  package liftweb{
    object AwesomeWebWidget{
      def doStuffWith(url : java.io.File) = ...
    }
  }
}

The problem is that we have a different java identifier in scope from the one we wanted. It actually refers to the java identifier that we acquire from the net package, rather than the base java that lives in the root as desired. This is the problem that sparked the latest “discussion” in scala-debate on this subject.

### The solutions

One thing which everyone immediately leaps to propose is to change the way imports work in Scala. Hopefully the above should have demonstrated that this wouldn’t help: I have not mentioned the word “import” anywhere in this explanation. So we can safely discard this as a non-solution.

The primary current solution is, unfortunately, a bit of an ugly one. When you want to say “the java at the root and I really damn mean it” you can refer to it as _root_.java.io.File. Adding this to your fully qualified names will force it to refer to the right one. Many people have taken to using _root_ on all their imports to fully qualify them. Personally I don’t feel the need (I don’t use Java reverse name conventions though, so I rarely run into the negative aspects of this behaviour).
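As a rough illustration of the `_root_` escape hatch, here is a minimal sketch that reuses the nested layout from above. The body of doStuffWith is made up for the example (the original elides it), and the file needs to be compiled as a source file rather than pasted into the interpreter, since it declares packages.

```scala
package net {
  package java {
    package kittens {
      class Kitten
    }
  }

  package liftweb {
    object AwesomeWebWidget {
      // Inside package net, a bare `java` means net.java, which has no `io` member.
      // Prefixing with _root_ forces the lookup to start from the root package.
      // The body is a hypothetical stand-in for the elided original.
      def doStuffWith(url: _root_.java.io.File): String = url.getName
    }
  }
}
```

Dropping the `_root_` prefix in this file should make the compiler complain that `io` is not a member of the `net.java` package, which is the failure described above.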

My preferred solution is to avoid the reverse domain name convention: not having something common as your top level package greatly reduces the chance of having packages accidentally injected into your scope like this.

Other solutions are currently under discussion in scala-debate, so some of this may be prone to change.

This entry was posted in programming.

# Determining logical project structure from commit logs

In a bored 5 minutes at work I threw the following together: Logical source file groupings in the Scala repo

The largest cluster is clearly noisy and random. I more or less expected that. But the small and medium ones often make a lot of sense.

The basic technique is straightforward: We use a trivial script to scrape SVN logs to get a list of files that change in each commit. From these binary changed/unchanged observations we calculate the Pearson correlation between each pair of files to get a measure of their similarity (a number between -1 and 1, though we throw away anything <= 0). We then use Markov clustering to cluster the results into distinct groupings.
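The similarity step can be sketched as follows. This is not the original script: the object name, the representation of a commit as a set of file paths, and the sample data are all assumptions for illustration. For 0/1 data the Pearson correlation reduces to a formula over simple counts, which is what the function computes.

```scala
object CoChange {
  // Made-up sample data: each commit is the set of files it touched.
  val sampleCommits: Seq[Set[String]] = Seq(
    Set("A.scala", "B.scala"),
    Set("A.scala", "B.scala"),
    Set("C.scala"),
    Set("C.scala", "A.scala")
  )

  // Pearson correlation of two 0/1 "changed in this commit" indicators.
  def pearson(a: String, b: String, commits: Seq[Set[String]]): Double = {
    val n   = commits.size.toDouble
    val na  = commits.count(_.contains(a)).toDouble
    val nb  = commits.count(_.contains(b)).toDouble
    val nab = commits.count(c => c.contains(a) && c.contains(b)).toDouble
    val numerator   = n * nab - na * nb
    val denominator = math.sqrt(na * (n - na) * nb * (n - nb))
    // A file that changes in every commit or in none has no variance; treat as unrelated.
    if (denominator == 0.0) 0.0 else numerator / denominator
  }

  // Keep only positively correlated pairs, as described above.
  def similarity(a: String, b: String, commits: Seq[Set[String]]): Option[Double] =
    Some(pearson(a, b, commits)).filter(_ > 0.0)
}
```

The surviving positive scores would then be fed to the clustering step as edge weights; that part is not sketched here.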

The results are obviously far from perfect. But equally obviously there’s a lot of interesting information in them, and the technique could certainly be refined (e.g. by looking at sizes of diffs on each file and using that rather than a simple 0/1 changed. Also experimenting with other clustering algorithms, etc). Maybe something worth pursuing?


# Scala trivia of the day: Traits can extend classes

A lot of people seem not to know this. In particular, 90% of the uses of self types I’ve seen appear to exist solely because people do not know this.

Observe the following interpreter session:

scala> class Foo;
defined class Foo

scala> class Bar;
defined class Bar

scala> trait Stuff extends Foo;
defined trait Stuff

scala> new Foo with Stuff;
res0: Foo with Stuff = $anon$1@...

scala> new Bar with Stuff;
<console>:8: error: illegal inheritance; superclass Bar
is not a subclass of the superclass Foo
of the mixin trait Stuff
new Bar with Stuff;
^

You can basically view this as putting a constraint on the trait, saying that all classes that implement this trait must extend this superclass. This can be particularly useful for adding various sorts of behaviour to classes. e.g. traits which add behaviours to GUI components.
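To make the “adding behaviour” use a little more concrete, here is a small sketch. Component, Button and Highlighted are hypothetical names standing in for a real GUI toolkit; nothing here comes from an actual library.

```scala
// Hypothetical stand-in for a GUI toolkit's base class.
class Component {
  def name: String = "component"
  def render: String = "<" + name + ">"
}

class Button extends Component {
  override def name: String = "button"
}

// Because the trait extends Component it can call and override Component's
// members directly, and it can only be mixed into Component or its subclasses.
trait Highlighted extends Component {
  override def render: String = "*" + super.render + "*"
}

object TraitDemo {
  val plain = new Button
  val fancy = new Button with Highlighted
}
```

Here `TraitDemo.fancy.render` goes through the trait's override and then on to Component's implementation, while something like `new String("x") with Highlighted` would be rejected for the same reason as `new Bar with Stuff` above.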

Thus ends our public service announcement.


# How do you talk about Scala?

I gave a talk about Scala at Last.fm last night (It’s not online: not on Last.fm. At. I physically walked over to their offices and gave the talk to some of their devs).

Depending on how you look at it, it either went moderately well or was a complete disaster. People seemed interested, and went away knowing more about Scala than they came in with, so that part went well. The big issue is that I never actually talked about the subject I went in prepared to talk about. Oops.

I think part of the problem is that I aimed the talk at slightly too high a level. The talk I had designed was about testing with ScalaCheck, as that’s a pretty nice distinguisher of Scala from most of the other JVM languages – it’s something that really takes advantage of the type system in powerful ways, but isn’t too scary. I went in with a set of code I’d already written and was going to live code the tests for it. I still think this would have been a good talk, but I think it would have been a much better second talk on Scala than an introduction.

What happened instead was that we very quickly got diverted onto basic questions of syntax and semantics, and ended up touring Scala through the interpreter and performing a sort of general Q&A about it. This worked reasonably well (at least partly because Miles joined in and backed me up on some of the questions. Thanks, Miles), and would probably have worked even better if I’d come in with a couple small examples to demo this way. So maybe there’s something worth refining in there – certainly the “Here’s the interpreter. Let’s talk about code” model seemed like a pleasant one.

But I feel the talk still sort of missed the mark. It wasn’t a bad introduction to Scala, but it wasn’t a good “Here’s why you should use Scala”. And I’m still not sure how to do one.

The traditional way to present Scala seems to be to present it as a better Java. “Here’s an example in Java. Let’s translate it to Scala. Oooh, look how much shorter it is!”. This seems to go pretty well from what I’ve heard reported. But I have a couple problems with it: Some philosophical, some practical.

The biggest one is that the JVM is not short of languages which are better than Java. I don’t consider the question “Should I use Scala instead of Java” interesting. The answer, as far as I’m concerned, is obviously YES. Except for a few remaining interoperability issues I don’t consider that there’s any good reason to use Java these days (Disclaimer: The one exception I grant is in creating other JVM languages which you don’t want to be burdened with the Scala standard library on top of your language’s runtime). The interesting questions are “Should I use Scala instead of Clojure?” or “Should I use Scala instead of JRuby?”.

And here’s the thing: Rewriting a simple Java example in a better language looks exactly the same with almost any nicer language. You do a straight port, you find opportunities to introduce some functional programming, maybe you take advantage of sequence comprehensions or such like, etc. but basically what you end up with is the original program with a terser syntax and a few obvious abstractions factored in. It’s really not that exciting, and it doesn’t sell the language well. It might, if you’re talking to a group that has no experience of other languages (I wasn’t), but even then you’re basically playing on your group’s ignorance rather than making a compelling argument.

There are a bunch of neat things in Scala that could be used as a compelling argument: ScalaCheck, actors, parser combinators, pattern matching in general, etc. but I’m not really sure how to go from zero to a full fledged example in one of these in only an hour. Perhaps if (rather than starting from a blank slate like I did) I started from a pre-written example and dissected it, but I’ve no idea if that would work any better.

So, what have you done? Did it work? Any ideas for how to do it better?


# Writing things right

OO has contributed many big and important innovations to programming. Among these, the foremost is that you write functions after rather than before their argument.

No, really.

It’s not just OO languages of course. Concatenative languages do the same thing. There’s a long history of mathematicians doing it as well (though we don’t like to talk about them. The cool mathematicians all write their functions on the left).

It’s funny how attached people get to this fact though.

Consider the following piece of Scala code:

object StringUtils{
/**
* Trims whitespace from the end of s.
*/
def rtrim(s : String) = ...
}

We can invoke this as StringUtils.rtrim(myString). Or, if we import StringUtils._, just rtrim(myString).

People get very upset if you ask them to do so though, and they go to all sorts of lengths to avoid it.
Consider the following three examples from different languages:

Scala:

object StringUtils{
implicit def string2RTrim(s : String) = new { def rtrim = ...; }
}

Ruby:

class String
def rtrim
...
end
end

C#:

static class StringUtils{
public static String rtrim(this String s) {
...
}
}

What do these achieve over the previous version? Simple: You can write myString.rtrim instead of rtrim(myString). That’s it. (Actually the Ruby and Scala versions both *can* allow you to do different things than that. It’s just that here and in 90% of the use cases they aren’t used for anything else. The C# version literally doesn’t do anything else).
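For concreteness, here is a filled-in sketch of the Scala variant next to the plain utility function. The regex body of rtrim and the RTrimOps name are assumptions made for the example (the listings above elide the bodies), and a named wrapper class is used in place of the anonymous `new { ... }` so the sketch does not depend on structural types.

```scala
import scala.language.implicitConversions

object StringUtils {
  /** Trims whitespace from the end of s. The body is a hypothetical implementation. */
  def rtrim(s: String): String = s.replaceAll("\\s+$", "")

  // Hypothetical wrapper whose only job is to let callers write s.rtrim.
  class RTrimOps(s: String) {
    def rtrim: String = StringUtils.rtrim(s)
  }

  implicit def string2RTrim(s: String): RTrimOps = new RTrimOps(s)
}

object RTrimDemo {
  import StringUtils._

  val viaFunction = rtrim("kittens   ")  // function on the left
  val viaMethod   = "kittens   ".rtrim   // function on the right, via the implicit
}
```

Both calls end up in the same function; the wrapper and the implicit conversion exist purely to change where the name is written.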

The thing is, while I’m making fun of this to a certain degree, it’s actually a perfectly reasonable thing to want to do. Designing things in noun-verb order is a good principle of UI design, and it works for programming as well. Things chain better – when you want to add new functions to a pipeline you add them at the point your cursor is naturally at and it matches well with thinking of it as a pipeline of “take this thing, do this to it, do that to it, do this other thing to it, get this value out”. Also you write far fewer brackets. :-) (compare Haskell’s foo . bar . baz $ thing idiom for a similar bracket avoidance tool).

Of these, I’d say that the Ruby solution is the most obvious (it just uses the fact that classes are open to add a new method to String), but it comes with the possibility of amusingly non-obvious runtime errors when someone else defines a conflicting method. The C# solution seems the best to me – it’s relatively little overhead over writing the utility method as you would otherwise and comes with the option to invoke it either as myString.rtrim or StringUtils.rtrim(myString), so when namespacing conflicts inevitably occur you have an easy fallback. But of course it uses a language feature specifically added to do this, while the other two are functions of more general language features. The Scala solution is, to my mind, decidedly the worst of the three. It’s syntactically noisy and comes with a significant additional runtime overhead.

But honestly I’m not particularly happy with any of these solutions. The Scala and Ruby solutions come with disproportionate costs to the benefit they give and the C# solution requires an additional language feature. Moreover, each of these solutions requires effort at each definition site in order to make something available that you always want at the use site. Wouldn’t it be better if for every utility function you automatically had the option to write it on the right?

Let’s take a digression. What language is the following (rather pointless) code written in?

[1, 2, 3].sort.length

Ruby, right?

Actually, no. It’s Haskell.

Wait, what?

Well, it’s Haskell if you do something slightly evil and redefine the (.) operator (which normally means composition):

Prelude Data.List> let (.) x f = f x
Prelude Data.List> [1, 2, 3].sort.length
3

I saw this trick a while ago (the author was amusingly apologetic for it). It’s evil Haskell code because of the way it redefines an operator that normally means something else (this is totally typesafe of course – existing code will continue to use the old operator definition). But it’s a perfectly valid operator definition, and a rather nice one.

It works well with additional arguments to functions too:

Prelude Data.List> [1, 2, 3].sortBy(compare).length
3

The reason this works is that sortBy takes the list argument curried as its last argument, so sortBy(compare) gives something of type [Int] -> [Int] which we can then apply as above (Haskell’s precedence rules make this work).

So this is a nice trick, but how is it useful to you? Well, it’s probably not. I can’t think of any low noise way of making it work in any of the other languages mentioned so far (the best I can come up with is an evil evil hack in Ruby that would make god go on a kitten killing spree and a mildly nasty hack with operators and implicit conversions in Scala that’s much too noisy to really use), and using it in Haskell will make other Haskell programmers very unhappy with you. But it’s an interesting trick, and I’ll be sure to bear it in mind if I ever get around to creating DRMacIverLang.
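For reference, one possible shape for an operator-plus-implicit-conversion hack in Scala is sketched below. This is a guess at the general idea rather than the specific hack alluded to above; the `|>` name and the Pipe and PipeOps identifiers are assumptions, and it relies on implicit classes, which need Scala 2.10 or later.

```scala
object Pipe {
  // Wraps any value so that `x |> f` means `f(x)`.
  implicit class PipeOps[A](val value: A) {
    def |>[B](f: A => B): B = f(value)
  }
}

object PipeDemo {
  import Pipe._

  // Rough counterpart of the Haskell [1, 2, 3].sort.length example.
  val sortedLength: Int = List(3, 1, 2) |> (_.sorted) |> (_.length)

  // Extra arguments go inside the function literal rather than being curried.
  val joined: String = List(3, 1, 2) |> (_.sorted) |> (_.mkString(", "))
}
```

The brackets and underscores around every stage are the noise in question: unlike the Haskell version, an existing method cannot simply be named after the operator.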
