# Planet Scala: By Scala programmers, usually about Scala

Planet Scala’s selection of feeds has always been a bit haphazard – in some cases the whole feed, in some cases a Scala specific one.

As I mentioned I was thinking of doing at the end of last year, I’m experimenting with changing this. Except in the highest non-scala : scala volume ratios, I’m switching the feeds over to provide the full feed. My impression so far is that 90% of the additional stuff acquired this way should be of general interest to Scala programmers and that the volume is not that high.

You’ll probably notice an initial spike in non-Scala content as the backlog of non-Scala posts becomes available, but this should settle down fairly rapidly.

I’ll also look into an easy way of providing a Scala-filtered RSS feed on top of this for people who don’t want the extra content.

This entry was posted in Admin, programming and tagged , on by .

# A cute Scala hack

 class ErrRef[S](s : S){ private[this] var contents : Either[Throwable, S] = Right(s);   def update(value : =>S){ contents = try { Right(value) } catch { case (e : Throwable) => Left(e) } }   def apply() = contents match { case Right(s) => s; case Left(e) => throw e.fillInStackTrace(); } }   object ER{ def apply[S](s : S) = new ErrRef(s); }

And using it…

 scala> ER(1) res0: ErrRef[Int] = ErrRef@a96606   scala> res0() res1: Int = 1   scala> res0() = 3   scala> res0() res3: Int = 3   scala> res0() = { println("Hello world"); 3} Hello world   scala> res0() res5: Int = 3   scala> res0() = error("Lets see what happens here...")   scala> res0() java.lang.RuntimeException: Lets see what happens here... at ErrRef.apply(RefExcept.scala:11) at .(:6) at .() at RequestResult$.(:3) at RequestResult$.() at RequestResult$result() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAcc... scala> res0() = throw new IllegalArgumentException("Go away") scala> res0() java.lang.IllegalArgumentException: Go away at ErrRef.apply(RefExcept.scala:11) at .(:6) at .() at RequestResult$.(:3) at RequestResult$.() at RequestResult$result() at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorI...
This entry was posted in programming and tagged , on by .

So I was taking a look at the Scala Array source code, as part of research for a forthcoming blog post (coming soon to a drmaciver.com near you) when I suddenly realised something. Then I suddenly had a WTF moment.

The scala.Array object has a method appy(xs : Int*) : Array[Int], and a method appy(xs : Double*) : Array[Double], and… etc. i.e. it has an overload for each value type and one for the reference types.

Something like this:

 object Overload{ def foo(xs : String*) = "foo"; def foo(xs : Int*) = "bar"; }

Um.

 /home/david/Foo.scala:3: error: double definition: method foo:(Int*)java.lang.String and method foo:(String*)java.lang.String at line 2 have same type after erasure: (Seq)java.lang.String def foo(xs : Int*) = "bar"; ^

See, Scala varargs use Seq, so all varargs erase to the same thing.

So how the hell is scala.Array working? Is it compiler magic?

Well, sortof compiler magic. It’s a compiler magic you too can use. Lets change the above code slightly:

 object Overload{ def foo(xs : String*) = "foo"; def foo(xs : Int*) = 3; }

This compiles and works fine.

So, what’s going on here?

Well, although this fact is visible only as an implementation detail in most languages, the JVM actually lets you overload based on return type. Two methods are considered identical if they have precisely the same (erased) argument and return types. And although Scala doesn’t let you overload based on return type directly, it seems it will happily accept methods whose arguments erase to the same thing as long as their return types don’t.

This is a bit strange.

This entry was posted in programming and tagged , , on by .

# New collections in Scala 2.7.2

Astute observers might have noticed my name on the release note for the latest version of Scala. Alas, not because I’ve been involved in reengineering the compiler from the ground up and making Scala into a dependently typed purely functional language with backwards in time debugging, but because I’ve added a bunch of collections implementations to the standard library. This post is just a brief introduction to them.

Performance disclaimer: All numbers I mention in this post should be assumed to be referring to specific results in specific tests run on one computer. Consequently they’re at most a guideline and shouldn’t be assumed to be hard and fast rules about performance

## Immutable collections

There are a number of immutable map implementations included. The principle inspiration for this is that I have a bit of a hatred of the standard immutable HashMap implementation. It’s mutable behind the scenes, which means that it has to synchronize in weird ways and is thus unsafe for passing between multiple threads. Additionally it means that it has a lot of correctness problems, and has performance characteristics more like that of a mutable map (in particular a very low degree of sharing between distinct instances)

All three immutable map implementations I’ve provided are truly immutable. They don’t use mutation behind the scenes and will have a fairly high degree of sharing between distinct instances. They don’t synchronize and should generally be rather more reliable.

Unfortunately they’re also somewhat slower. The numbers for this are all a bit suspect as they’re extremely dependent on number of cores, cache size, etc. so vary from computer to computer, but because of the tree based implementation instead of the array based one they may be about a factor of two slower on get and update (on the other hand, Ismael Juma has reported them being somewhat faster  on get in some tests on his machine. Like I said, the numbers are hard to make precise). Curiously, and I’m not entirely sure why in some cases, they appear to be dramatically faster on bulk operations like map, foreach and filter. Additionally in algorithms which can benefit heavily from sharing they may give you a better algorithmic complexity.

When should you use these? Well, I’m probably going to use them by default when I want immutable maps, and not just because I’m biased. :-) They’re never dramatically slower than the standard immutable HashMap and are occasionally dramatically faster. Additionally there are a lot of nice correctness advantages (at least, I hope so! I certainly don’t promise that they’re bug free, but I’ve tested them reasonably heavily).

As a general point of advice: None of the immutable collections performance is that great. If you need to build a collection once and then never modify it and you have fairly stringent performance requirements, you may be better building a mutable one and then calling readOnly on it. It won’t give you an order of magnitude speedup, but it will definitely be faster.

### immutable.IntMap & immutable.LongMap

This is a pair of data structures specialised for integer and long keys respectively, implementing Okasaki and Gill’s “Fast Mergeable Integer Maps”. As well as the standard operations for immutable maps they offer a number of additional methods for merging two maps together (most of which I think should probably be added to immutable.Map).

### immutable.TreeHashMap

TreeHashMap should be a drop in replacement for the standard immutable.HashMap. Its implementation is as an IntMap of hash codes to lists of key value pairs (well, it’s not literally a List[(K, V)], but it’s the same idea).

## Mutable collections

By and large there aren’t correctness problems with Scala’s mutable collections. The main problem with them is that they’re a bit slow. So I added two alternative implementations which should be significantly faster.

### ArrayStack

The most natural way to build an iterator over a tree is by using a stack to simulate recursion. So that’s what I did when I built the iterator for IntMap. To my great surprise it was incredibly slow. About 5 times slower than that of HashMap’s. So I rewrote it to use a special purposed implementation of a fixed depth stack and it promptly sped up by an order of magnitude (so it’s now about twice as fast as HashMap’s).
This was a little shocking, so I investigated and discovered that the mutable stack in the standard library was not exactly brilliantly implemented and was based on a linked list. This was a bit sad, so I provided this implementation based on a growable array. It also contains some useful methods for stack shuffing that the existing one doesn’t.

### OpenHashMap

When I was benchmarking the immutable map implementations I also tried them against the mutable implementations and discovered that jcl.HashMap was significantly faster than mutable.HashMap. This was a bit sad, so I decided to fix it by providing a faster mutabe map implementation. After some experimenting I settled on this implementation which is based on an open hashing scheme approximately borrowed from Python’s dictionary implementation.

Interfacewise, it’s just another mutable map. However it does have the nice feature of providing guarantees about certain types of mutations you can do during iteration. Specifically, you are allowed to delete keys and modify the values of existing mappings (but not add new mappings) while iterating.

This entry was posted in programming and tagged , on by .

# A follow up to yesterday’s article

There seems to have been a lot of misunderstanding about what yesterday’s article was about. Mostly from people who didn’t read past the topic. So, here’s a summary version more in line with your attention spans.

# FUNCTIONAL PROGRAMMING: UR DOIN’ IT RONG

This entry was posted in programming and tagged , on by .