Consider what should be the basis of a general purpose collections API. In Scala it is Iterable, which uses an Iterator to define a foreach method. Most other things are defined in terms of this (note: This statement is untrue. Many of them use the iterator directly. But it is morally true).
This is all icky and imperative. Yuck.
As every good functional programmer knows, the true notion behind this silly “iteration” thing is actually folding.
The JVM is mean to us and doesn’t support lazy evaluation or variable sized stacks, so foldRights tend to be an exercise in pain: They stack overflow distressingly soon. So let’s base our collections API on foldLeft: Most designs of it are tail recursive (or iterative. Sigh). As well as dealing with JVM limitations this lets us develop a very high performance interface, and we can do pleasant things like dealing with very large lazy collections in constant space[1]
So:
trait Foldable[T]{
def foldLeft[S](s : S)(f : (S, T) => S) : S;
}
There. That’s a nice basic design.
What methods can we add to this?
Well the obvious thing we can do to placate the Java programmers:
def foreach(f : T => Unit) = foldLeft[Unit](())((s, t) => f(t))
Now let’s define some stuff we care about. One handy method on Iterable is find. So let’s add that.
def find(f : T => Boolean) : Option[T] =
foldLeft[Option[T]](None)((s, t) =>
s match {
case Some(s) => Some(s);
case None if f(t) => Some(t);
_ => None
})
Nice, isn’t it?
So what happens if we do Stream.from(1).find(_ > 10).
It loops forever. :-(
Unfortunately our nice efficient foldLeft based implementation always eats the entire collection. This is sad: We can’t deal with infinite collections.It’s also really inefficient for large collections where we don’t need the whole thing.
So let’s add a way of stopping when we’re done:
def foldLeft[S](s : S)(f : (S, T) => Option[S]) : S;
The idea is that the Option is used to signal a stop condition. We return a None when we’re done consuming the collection, and the foldLeft stops.
Rewriting find in terms of this:
def find(f : T => Boolean) : Option[T] =
foldLeft[Option[T]](None)((s, t) =>
s match {
case Some(s) => None;
case None if f(t) => Some(Some(t));
_ => Some(None)
})
Ok. That’s a bit gross. But that’s purely a function of the fact that we were already trying to return an Option[T]. I’m sure for normal code it will look ok.
What happens is that once we’ve already found something, we now know we can stop, so we return a None to indicate stopping.
Hmm. It would be nice if we could incorporate some of this functionality into find itself. Let’s consider the following example:
(1 to 100000000 by 3).find(_ == 11)
We know this collection is ordered, so once we’ve gotten to 13 we know we’re never going to find 11. But find keeps on trucking till it hits the end of the collection. :-(
So, let’s modify find to support early termination as well. First we change its signature to f => Option[Boolean], and then…
No, sorry. My sense of good taste just revolted.
The problem is we now have to modify all signatures that might possibly want to support early termination of the collection. And they then have to explicitly deal with passing that on to foldLeft in the right way. All just to allow callers to pretend that the collection is smaller than it really is. It would be nice if we could write this functionality once and then have all the higher order functions based on foldLeft inherit it for free.
Agreed?
Well, there fortunately *is* a way to do that. It involves some loss of type safety, but as long as foldLeft is always properly implemented this should be acceptable.
case object Stop extends Throwable;
def stop : Nothing = throw Stop;
Implementations of foldLeft then have to catch Stop and treat that point as the end of the collection.[2]
Now, let’s implement find as follows:
def find(f : T => Boolean) : Option[T] =
foldLeft[Option[T]](None)((s, t) =>
s match {
case Some(s) => stop;
case None if f(t) => Some(t);
_ => None
})
Only a simple modification to our original implementation, but now it correctly consumes only a minimal amount of the collection (actually it consumes one element too many, but that’s because I’ve modified the implementation from the real one we’d use for pedagogical reasons).
And, for free, we can use stop in the argument to find as well:
(1 to 100000000 by 3).find{x => val c = x compare 11; if(c > 0) stop else (c == 0)}
And we no longer continue past the point where we need to, even though we don’t find our element. Yay!
So, hopefully I have persuaded you that left folds with an early termination forms a nice basis for a collection library: Clean, functional, and efficient. What more could you hope for?
Oh, before I finish. One more little bone to throw to our Java friends to make them feel more comfortable with this design:
def break = stop;
[1] See “Towards the best collections API” for a discussion of building an API around this design (note that their solution does involve a bit more than what we talk about here, some of which is linguistically difficult in Scala without continuations).
[2] You may be making squawking noises about performance now. Don’t. It’s fine. But now is not the time for discussion of this.