David R. MacIver's Blog: Tail call optimisation in Scala

Tail call optimisation in Scala

4 September 2007

This is just a quick note. I couldn’t seem to find any good information on what sorts of tail call optimisations were performed by Scala so I ran a few quick tests. Disappointingly, the answer seems to be not much. A simple tail recursion was turned into a loop, but the following code got no love:

object Main extends Application{
def foo (x : Int){
if (x == Integer.MAX_VALUE) Console.println(“Hello world!”);
else bar(x + 1);
}

def bar (x : Int){
if (x == Integer.MAX_VALUE) Console.println(“Hello world!”);
else foo(x + 1);
}
foo(0);
}

This isn’t really surprising, although it’s a bit sad. Eliminating general tail calls seems to be quite hard without just converting everything into CPS and making everything work that way (at least so I’m lead to believe. My expertise on the subject is basically nonexistent), which is probably not a great idea on the JVM.

Curiously, if you enabled optimisations with -XO, foo was inlined into bar but the resulting tail recursion was not eliminated. That’s probably a bug.

Update: It’s been pointed out to me that I’ve gotten myself completely confused on the relationship between continuations and tail call elimination. Please ignore any mumbling to that effect. :-) The issue is apparently that TCO is easy when compiling to assembly and hard when compiling to something like JVM byte code which is higher level and doesn’t already support it.

Comments

James Iry on 2007-09-05 04:44:00:

Continuations and TCO aren’t totally unrelated. Full blown continuations require the compiler to have direct control of the “stack” and once you’ve got that control its pretty easy to throw away frames - hence TCO. But continuations aren’t a requirement for TCO. Anyway, it’s not impossible on the JVM but there’s a tradeoff. SICP is a Scheme with full tail call optimization (and first class continuations) on the JVM. It does this by creating its own heap based control structure for function calls. The downside is that SICP pays a performance penalty on all calls, whether or not they need TCO or continuations. Scala aims for Java like performance and seamless integration, so it uses the native JVM stack. The result: full TCO and continuations just aren’t in the picture. On the other hand, tail recursion optimization is easy because the JVM has “goto”. TRO just means the tail recusive call sets the parameter variables to the new values and does a goto back to the “top”.

David R. MacIver on 2007-09-05 09:10:00:

Yep. I understand why tail recursion is relatively easy compared to other tail call optimisations (and why the others are hard). I was just confused as to what would be needed to make the whole thing work ‘properly’. :-)

It’s probably not worth the effort, but I wonder if a policy of aggressive inlining prior to tail call elimination would help? It would be able to get every case, but it would probably help with a lot of the simple ones. e.g. in the example I posted, if you inline foo into bar and vice versa then the resulting functions become tail recursive. In cases where there’s only one cycle in the call graph it should be possible to compleletely optimise out the cycle into a loop. It could cause a significant amount of code bloat, so probably worth making optional.

(You can probably even deal with some more complicated things with multiple cycles by inlining several function bodies into one and swapping between them using gotos. But that’s beyond the scope of this “I wonder...” :-) ).

Iulian on 2007-09-05 14:47:00:

David, you’re right about general tail call optimization and the JVM: there’s no easy way to do it, if you want to keep using JVM’s stack (which I think it’s essential for decent performance). On the other hand, I think there are few useful programs that use tail calls and are not directly tail recursive. Maybe I just didn’t program enough in Scheme :-)

I agree about the weird behaviour after inlining. Tail recursion should be handled after inlining, but right now the compiler does tail recursion optimization very early, and general optimization very late. We should revisit this someday.

David R. MacIver on 2007-09-05 16:26:00:

I think there are probably some quite reasonable use cases for mutually tail recursive functions like in my example. ’though I admit I don’t really have a good one to hand (emulating coroutines maybe?)

More generally, full tail call optimisation (which is not something my suggested “aggressively inline everything!” approach would really help with) enables you to write in continuation passing style. Which umm... I’m sure there’s some good reason to do so... :-)