David R. MacIver's Blog: Writing things right

Writing things right

19 January 2009

OO has contributed many big and important innovations to programming. Among these, the foremost is that you write functions after rather than before their argument.

No, really.

It’s not just OO languages of course. Concatenative languages do the same thing. There’s a long history of mathematicians doing it as well (though we don’t like to talk about them. The cool mathematicians all write their functions on the left).

It’s funny how attached people get to this fact though.

Consider the following piece of Scala code:

object StringUtils{
  /** 
   * Trims whitespace from the end of s.
   */
  def rtrim(s : String) = ...
}

We can invoke this as StringUtils.rtrim(myString). Or, if we import StringUtils._, just rtrim(myString).

People get very upset if you ask them to do so though, and they go to all sorts of lengths to avoid it.
Consider the following three examples from different languages:

Scala:

object StringUtils{
   implicit def string2RTrim(s : String) = new { def rtrim = ...; }   
}

Ruby:

class String
  def rtrim
  ...
  end
end

C#:

static class StringUtils{
   public static String rtrim(this String s) {
     ...
   }
}

What do these achieve over the previous version? Simple: You can write myString.rtrim instead of rtrim(myString). That’s it. (Actually the Ruby and Scala versions both *can* allow you to do different things than that. It’s just that here and in 90% of the use cases they aren’t used for anything else. The C# version literally doesn’t do anything else).

The thing is, while I’m making fun of this to a certain degree, it’s actually a perfectly reasonable thing to want to do. Designing things in noun-verb order is a good principle of UI design, and it works for programming as well. Things chain better: when you want to add a new function to a pipeline, you add it at the point where your cursor naturally is, and the code matches the mental model of a pipeline: “take this thing, do this to it, do that to it, do this other thing to it, get this value out”. You also write far fewer brackets. :-) (Compare Haskell’s foo . bar . baz $ thing idiom, a similar bracket-avoidance tool.)
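To make the bracket comparison concrete, here is a minimal sketch in Haskell. The helpers rtrim and shout are hypothetical stand-ins for whatever utility functions you happen to have; they are not taken from any library. Both definitions compute the same value, but the second needs no nested brackets:

```haskell
import Data.Char (isSpace, toUpper)
import Data.List (dropWhileEnd)

-- Hypothetical utility functions, introduced only for this illustration.
rtrim :: String -> String
rtrim = dropWhileEnd isSpace

shout :: String -> String
shout = map toUpper

-- The same computation written twice.
nested, composed :: Int
nested   = length (shout (rtrim "some string  "))   -- brackets pile up at the end
composed = length . shout . rtrim $ "some string  " -- composition, then one application

main :: IO ()
main = print (nested, composed)
```

Note that the composed version still reads right to left: rtrim runs first even though it is written last.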

Of these, I’d say that the Ruby solution is the most obvious (it just uses the fact that classes are open to add a new method to String), but it comes with the possibility of amusingly non-obvious runtime errors when someone else defines a conflicting method. The C# solution seems the best to me: it adds relatively little overhead over writing the utility method as you would otherwise, and it comes with the option to invoke it either as myString.rtrim() or StringUtils.rtrim(myString), so when namespacing conflicts inevitably occur you have an easy fallback. But of course it uses a language feature specifically added to do this, while the other two are consequences of more general language features. The Scala solution is, to my mind, decidedly the worst of the three. It’s syntactically noisy and comes with a significant additional runtime overhead: every call allocates a wrapper object, and the method on the anonymous structural type is invoked reflectively.

But honestly I’m not particularly happy with any of these solutions. The Scala and Ruby solutions come with costs disproportionate to the benefit they give, and the C# solution requires an additional language feature. Moreover, each of these solutions requires effort at each definition site in order to make something available that you always want at the use site. Wouldn’t it be better if for every utility function you automatically had the option to write it on the right?

Let’s take a digression. What language is the following (rather pointless) code written in?

[1, 2, 3].sort.length

Ruby, right?

Actually, no. It’s Haskell.

Wait, what?

Well, it’s Haskell if you do something slightly evil and redefine the (.) operator (which normally means composition):

Prelude Data.List> let (.) x f = f x
Prelude Data.List> [1, 2, 3].sort.length
3

I saw this trick a while ago (the author was amusingly apologetic for it). It’s evil Haskell code because of the way it redefines an operator that normally means something else (this is totally typesafe of course - existing code will continue to use the old operator definition). But it’s a perfectly valid operator definition, and a rather nice one.

It works well with additional arguments to functions too:

Prelude Data.List> [1, 2, 3].sortBy(compare).length
3

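The reason this works is that sortBy is curried and takes the list as its last argument, so sortBy(compare) gives something of type [Int] -> [Int] which we can then apply as above. Haskell’s precedence rules make this work: function application binds more tightly than any operator, so sortBy(compare) is grouped as a unit before the (.) on either side of it is applied.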
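As a rough sketch, the same trick in a source file might look like the following. It assumes the Prelude’s own (.) is hidden in this module so that the redefinition does not clash with it; the module layout and main are only an illustration.

```haskell
import Prelude hiding ((.))
import Data.List (sort, sortBy)

-- Reverse application, spelled as (.). With no fixity declaration an
-- operator defaults to infixl 9, so chains group to the left:
-- xs.f.g means (xs . f) . g, i.e. g (f xs).
(.) :: a -> (a -> b) -> b
x . f = f x

main :: IO ()
main = do
  print ([3, 1, 2].sort)
  print ([3, 1, 2].sort.length)
  -- sortBy(compare) is ordinary function application, which binds
  -- tighter than any operator, so it is grouped before either (.).
  print ([3, 1, 2].sortBy(compare).length)
```

Only this module sees the new meaning; other modules keep ordinary composition. For what it’s worth, base 4.8 and later export a reverse application operator (&) from Data.Function, so [3, 1, 2] & sort & length gives the same left-to-right reading without shadowing anything.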

So this is a nice trick, but how is it useful to you? Well, it’s probably not. I can’t think of any low noise way of making it work in any of the other languages mentioned so far (the best I can come up with is an evil evil hack in Ruby that would make god go on a kitten killing spree and a mildly nasty hack with operators and implicit conversions in Scala that’s much too noisy to really use), and using it in Haskell will make other Haskell programmers very unhappy with you. But it’s an interesting trick, and I’ll be sure to bear it in mind if I ever get around to creating DRMacIverLang.

Comments

Basu on 2009-01-19 13:12:14:

Interesting argument. But I think there is something to be said from a software engineering standpoint. If we do obj.func(), then there is an interpretation that func() is some function that should naturally happen to obj (let’s say trim a string). Hence the language/class/library designer chose to place func as part of the class definition itself.
On the other hand func(obj) means that func is something that could technically be done to obj, but it’s not placed inside obj’s class def because it’s a domain specific application (treating a string as a pathname) or otherwise limited enough in scope that making it part of the class def would only be bloat.
Of course, this doesn’t really help the issue of syntactic clutter for users. Going back to software engineering, there are two things you can do if you need to do something on a string that doesn’t come packaged. Using your string example, you could write a StringUtils class containing a bunch of functions that work on strings, but most languages would give you the problems you described. But you can also subclass String to have SuperString objects on which you can use the postfix notation.
I’ll be the first to admit that using subclassing too much in OO is not a good idea, but I’ll also wager that it’s justified when we’re talking about performing natural actions on objects.

david on 2009-01-19 13:32:10:

No, subclassing isn’t the right solution to the problem at all. The problem is that all the existing methods on String return String, not MyString. e.g. suppose I wanted to do

"foo".substring(1).rtrim

I’d now have to do

(new MyString("foo".substring(1))).rtrim

And it wouldn’t compose.

e.g. (new MyString((new MyString("foo")).rtrim.concat("stuff"))).rtrim

You need to add wrappers for MyString all over the place.

In Scala you could do this with an implicit conversion to remove the baggage (although in both Scala and Java String is final so you can’t do this at all), but then we’re back to the previous non-optimal solution.

The nice thing about the Haskell style solution is that you get to make the choice about position. If you feel that this is somehow “intrinsic” you can write it on the right. If you don’t, you can write it on the left. e.g. print x and x.print are both valid ways of writing it.

Ultimately I don’t think your notion of intrinsic operations on an object is at all useful, and usage patterns from people with extension methods, implicit conversions and open classes seem to agree. I would much rather have the flexibility and low impact of being able to write it either way as I choose.

Onne on 2009-01-19 14:09:03:

Quite useful I’d argue:

length(nodups(toLower(trim(" some String "))))

is quite annoying, compared to:

" some String ".trim().toLower().nodups().length()

The latter is more like a recipe:

mix the eggs and the flour, add the milk, stir until smooth, add the sugar, set aside for 20 minutes … etc.

While the first is like this, and rather hard to read:

set aside for 20 minutes( add the sugar( stir until smooth( add the milk( mix the( eggs and flour )))))

right? first things first, basically; that is why people (at least me) prefer functions after the object. (I’m not a native English speaker, so my recipe example might sound a bit off.)

david on 2009-01-19 14:18:27:

I agree it’s annoying to write and read things in that order. That’s somewhat the point of the article. :-)

But the trick I mention at the end is not all that useful in most languages because of the difficulty of implementing it in a non-noisy manner.

Ricky Clarkson on 2009-01-19 16:42:21:

Onne:

The difference in your case is the amount of nesting of parentheses. If you had to write your second example as (((("some String").trim()).toLower()).noDups()).length() you’d feel just as bad about it as the first. Normal Haskell code:

length . nodups . toLower . trim $ "some String"

or:

trim >>> toLower >>> noDups >>> length $ "some String"

Henrik Huttunen on 2009-01-21 05:25:38:

Hi David.

How come “each of these solutions requires effort at each definition site in order to make something available that you always want at the use site”. Did I miss something?

object Test{
  object Library{

    def sum(array: Array[Int]): Int = array.foldLeft(0)(_ + _)

    def filter(array: Array[Int]) = array filter(_ % 2 == 0)
  }

  import Library._

  class Extended[t](x: t){
    def >>[s](lambda: t=>s):s = {
      lambda(x)
    }
  }

  implicit def any2lambda[t](x : t) = {
    new Extended(x)
  }

  println( Array(4, 3, 10, 5) >> filter _ >> sum _ ) // 14
}

I think this is fairly useful.

jherber on 2009-01-21 23:31:30:

can always view through the lens of hidden intent:

send operation to subject
"string".send :trim
// when you want to emphasize the invocation of an operation and de-emphasize the declaration between the object and a verb property?

perform verb on object
trim("string")
// when you want to emphasize verb?

object has a property that is a verb
"string".trim
// OO conformance - move along, nothing exciting here!

i would consider order of execution of multiple operations on a subject as yet another dimension. do we want to “pipe” between lazy functions or are we chaining methods or monads?

the bottom line may be that flexibility gives us expressiveness, and some languages have chosen to let us assign operators to invocation operations, while others have not :/

Keith Braithwaite on 2009-04-09 21:24:47:

You said: OO has contributed many big and important innovations to programming. Among these, the foremost is that you write functions after rather than before their argument.

That’s not the contribution of OO, it’s the contribution of single dispatch OO languages. The Object-oriented language Lisp has no such feature, for example.

(I’m sure you know this, I’m just saying :)

Andrew B. on 2009-04-09 21:28:06:

david, regarding subclassing, see Django’s SafeString class. It would be pretty simple to extend the technique it uses to automatically wrap the results of all method calls on subclass instances.

jon on 2009-04-09 22:29:49:

mark on 2009-04-09 23:20:09:

Actually ruby already has .rtrim method built in.

Anyway, I think one point is that this fine article lost track a bit.

You started with “OO has contributed many big and important innovations to programming.”
but then changed to Haskell quickly... which left me confused. What has Haskell to do with OO?

Haskell itself is evil already, I don’t know why syntax changes make Haskell code any more evil - it is as close as it gets to how Satan codes.

God may use Lisp, ancient as he may be if he would exist - but Satan uses Haskell.

It is as evil as programming can be.

Reinier Zwitserloot on 2009-04-10 00:14:27:

A language I’ve always wanted to see is one where the type, its canonical representation (in most circumstances, the fields of an object in memory), and the operations available to any given singular unit of code, are all separate. Currently all OO languages I know of effectively let object representation and operations available be the same for all involved units of code that make up a program.

I want to be able to say: Okay, Strings are this class, and this object is an instance of String, but whenever *THIS* bit of code is dealing with strings, I want you to offer the string through this filter. The filter is free to do just about anything. It could, for example, add an rtrim() method. It could also offer deprecated functionality that the canonical (newer) version no longer has. There are quite a few things you can do with such a feature:

- put however much lipstick on whatever pig you have to work with until you find it palatable (such as adding rtrim as a method, but only for your code unit e.g. package, module, source file, whatever is suitable).

- aggressively update APIs without any concern for backwards compatibility - then write a filter that is automatically offered to all other bits of code that declare that they were expecting v(old). You now not only have 2 separate code bases that represent the same thing (String v1 and String v2), but strings from one version are 100% interchangeable and compatible with the other. They’re really the same object; each code base simply sees a different set of operations on them.

- aggressively modularize and personalize the entire language. If *I* am a folding fanatic and I want /: to mean fold left, that’s fine, and I can adopt this in my view. But, if you take this notion to the extreme, a tool can analyse my call to /: on an iterable, and figure out that the ‘foldLeft’ call in a different view of the same type means the same thing, and translate. If I pass this code off to another developer who has set up his programming environment to translate things to his particular view if at all possible, he will see .foldLeft, and if he saves this and sends it back to me, I will see /: again. In actual fact, the canonical method is located somewhere outside of the Iterable type altogether (let’s say in an IterablesUtils class, or some such - analogous to your C# StringUtils) - but nobody involved here even needs to know this. Even better: If that other guy finds the code unbearably long-winded and complicated, he can look through a specific ‘view’ definition I’ve set up that makes this code much clearer to me (think DSL), and selectively start enabling each transformation described in it as he groks them, which instantly translates all the code of that project to incorporate these transformations. He can also elect to start off with my view, and anytime he sees gibberish he doesn’t understand, just mouse-over or do some other quick HID operation to temporarily see how the given code snippet would have looked if I programmed it according to my view definitions.

Pete on 2009-04-10 14:08:22:

This would have been better with a look at Objective-C, which allows you to interleave a method’s name with its arguments. For example, if you have a “NSMutableArray *myArray”, you can do this:

[myArray insertObject:myObj atIndex:myIndex]

...where the message “insertObject:atIndex:” is sent to myArray with “myObj” and “myIndex” as its arguments.
