David R. MacIver's Blog: It might be worth learning an ML-family language

It might be worth learning an ML-family language

26 July 2016

It’s long been a popular opinion that learning Haskell or another ML-family language will make you a better programmer. I think this is probably true, but I think it’s an overly specific claim because learning almost anything will make you a better programmer, and I’ve not been convinced that Haskell is a much better choice than many other things in terms of reshaping your thinking. I’ve never thought that you shouldn’t learn Haskell of course, I’ve just not been convinced that learning Haskell purely for the sake of learning Haskell was the best use of time.

But... I’ve been noticing something recently when teaching Python programmers to use Hypothesis that has made me reconsider somewhat. Not so much a fundamental reshaping of the way you think as a highly useful microskill that people seem to struggle to learn in dynamically typed languages.

That skill is this: Keeping track of what the type of the value in a variable is.

That may not seem like an important skill in a dynamic language, but it really is: Although functions will tend to be more lenient about what type of value they accept (is it a list or a tuple? Who cares!), they will tend to go wrong in interesting and confusing ways when you get it too wrong, and you then waste valuable debugging time trying to figure out what you did wrong. A good development workflow will typically let you find the problem, but it will still take significantly longer than just not making the mistake in the first place.

In particular this seems to come up when the types are related but distinct. Hypothesis has a notion of a “strategy”, which is essentially a data generator, and people routinely seem to get confused as to whether something is a value of a type, a strategy for producing values of that type, or a function that returns a strategy for producing the type.

It might be that I’ve just created a really confusing API, but I don’t think that’s it - people generally seem to really like the API and this is by far the second most common basic usage error people make with it (the first most common is confusing the functions one_of and sampled_from, which do similar but distinct things. I’m still trying to figure out better names for them).

It took me a while to notice this because I just don’t think of it as a difficult thing to keep track of, but it’s definitely a common pattern. It also appears to be almost entirely absent from people who have used Haskell (and presumably other ML-family languages - any statically typed language with type-inference and a bias towards functional programming really) but I don’t know of anyone who has tried to use Hypothesis knowing an ML-family language without also knowing Haskell).

I think the reason for this is that in an ML family language, where the types are static but inferred, you are constantly playing a learning game with the compiler as your teacher (literally a typing tutor): Whenever you get this wrong, the compiler tells you immediately that you’ve done so and localises it to the place where you’ve made the mistake. The error messages aren’t always that easy to understand, but it’s a lot easier to figure out where you’ve made the mistake than when the error message is instead “AttributeError: ‘int’ object has no attribute ‘kittens’” in some function unrelated to where you made the actual error. In the dynamically typed context, there’s a larger separation between the observed problem and the solution, which makes it harder to learn from the experience.

This is probably a game worth playing. If people are making this error when using Hypothesis, they I’d expect them to be making it in many other places too. I don’t expect many of these errors are making it through to production (especially if your code is well tested), but they’re certainly wasting time while developing.

In terms of which ML-family language to choose for this, I’m not sure. I haven’t actually used it myself yet (I don’t really have anything I want to write in the space that it targets), but I suspect Elm is probably the best choice. They’ve done some really good work on making type errors clear and easy to understand, which is likely exactly what you need for this sort of learning exercise.

Comments

F# Weekly #31, 2016 – Sergey Tihon's Blog on 2016-07-31 15:18:13:

[…] It might be worth learning an ML-family language – David R. MacIver […]