Category Archives: Uncategorized

Heuristics are for learning, not for teaching

Complex bodies of knowledge, and issues where the answer is “it depends”, often get boiled down to simple rules.

  • Look both ways before crossing.
  • Wear a helmet when cycling.
  • Don’t eat too much sugar (or is it fat we’re not supposed to eat now? Or processed foods? Are vegetables bad for you now?)
  • Correlation is not causation.
  • Don’t compare floating point numbers for equality.

You can probably think of many others.

These rules are useful. They provide you with handy heuristic shortcuts that you can remember and follow without having to load the whole complex set of circumstances back into your head. They’re things to bear in mind and follow when the cost of thinking about it is greater than the cost of just following the rule.

But that’s all they are. They are useful heuristics. They are there to help you in your daily activities, not to be preached as gospel.

And what this means is that if you think you see someone who is not following the heuristic you do, that’s OK. You should at most say “hey so here’s a heuristic I use in these circumstances. Maybe you would find this useful?”.

If you instead say “You must always follow this heuristic and you are a bad person and should feel bad for not doing so” then you have forgotten the purpose of your heuristic and you are a bad person and you should feel bad.

There may be circumstances in which you are in a position of consensually teaching someone about something and you knowingly say “Here is a heuristic that is not always true, but it is useful and teaching you the full complex set of actual rules would take way more time than it’s worth to you right now”. That’s OK. That is not the circumstance you find yourself in with strangers.

I would like “assume competency until proven otherwise” to be a basic rule of courtesy. I know many people aren’t competent, but treating someone like they’re incompetent is a fast track to an extremely unpleasant interaction, and saying “Have you considered that maybe you should unerringly follow this simple reductive rule instead of the complex picture of the situation you thought you had?” is one of the easiest ways to achieve that. Don’t do it.

PS. Attempting to boil this post down to simple reductive rules in order to annoy me is extremely predictable and not actually funny. Don’t do that either.

This entry was posted in Uncategorized.

A wish list for programming languages

Once upon a time I was really into programming languages. I cared a lot about Scala and Haskell, I was interested in all sorts of weird languages (Shout out to anyone else who has used Nice or Clay).

These days, eh. I mostly write Python. Some C. I could do C++ if I had to but I generally don’t have to. I’ve considered checking out Julia but, well.

It’s not that I’m no longer interested, and it’s certainly not that I’m against exotic languages, or no longer care about type systems. I’m very glad that there are other people who still actively pursue these things, as the current state of programming languages is pretty piss-poor and I would like it to be better, but I find that these days I lack the energy to care and my priorities in a language have shifted to things that are more… pedestrian.

And yet somehow still really hard to satisfy.

So here’s a laundry list of stuff that would feature in my dream language. Advance warning that I am a grumpy old man and this is a super boring list that contains almost no cool features. Also it’s not in any particular order of priority – it’s mostly the order I thought of things in – and it’s definitely not complete – it’s just the stuff I thought of before I got bored of writing this post.

Also, most of these are things that you can’t get without large amounts of time, effort and money. They’re boring, not easy, and if anything them being boring makes them harder because you can’t really get people excited about working on them.

Community

Community is so important.

Here is what I want out of a programming language community:

  1. Large. Small communities are nice, but I want a community I can share the workload with, and a small community isn’t it.
  2. Friendly. Elitism is toxic, and a community that isn’t helpful to beginners or goes on and on about how they’re super smart for using this language that other people don’t get is the worst.
  3. Diverse, and committed to it. Codes of Conduct for everyone all round.
  4. Committed to quality. Documentation matters. Testing matters. We like having high quality libraries and we’re prepared to put the work (and, where possible, money) in to get them.

Packaging Infrastructure

Good packaging infrastructure is vital. And so hard to do. Basically nobody does it well. Packages should be:

  1. Easy to create new packages. If a problem could be solved by creating a new package it should be easier to solve by creating a new package than by not. You should never find yourself going “oh god but I have to write all that XML”.
  2. Versioned, with version constraints between dependencies automatically resolved
  3. Local to a project
    • without a lengthy compile each time you install into a new project
    • no pollution in the global install namespace
  4. Easy to mirror
  5. But with a good standard central repository
  6. Clearly marked for stability
  7. Careful about maintaining compatibility between package versions
  8. Easy to write in a way that is portable to other versions of the language (e.g. don’t be Scala where a package compiled for one version of the language doesn’t work with any others, even between point releases)
  9. Easy to write in a way that is compatible with multiple operating systems.
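Item 2 is the one that hides an actual algorithm. As a minimal sketch of what “automatically resolved” means, here is a toy backtracking resolver. The package index is hypothetical, and integer versions with `range` objects as constraints are a deliberate simplification. Real tools deal with much richer version syntax and much larger search spaces.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical index: package name -> version -> list of (dependency, allowed versions).
# Versions are plain integers and constraints are `range` objects, purely for illustration.
Constraint = Tuple[str, range]
INDEX: Dict[str, Dict[int, List[Constraint]]] = {
    "app": {1: [("web", range(2, 4)), ("json", range(1, 3))]},
    "web": {2: [("json", range(1, 2))], 3: [("json", range(3, 4))]},
    "json": {1: [], 2: [], 3: []},
}


def resolve(index, requirements: List[Constraint],
            chosen: Optional[Dict[str, int]] = None) -> Optional[Dict[str, int]]:
    """Pick one version per package so that every constraint holds, or return None."""
    chosen = dict(chosen or {})  # copy so that backtracking never sees stale choices
    if not requirements:
        return chosen
    (name, allowed), rest = requirements[0], requirements[1:]
    if name in chosen:
        # Already pinned earlier: this constraint must agree with that pin.
        return resolve(index, rest, chosen) if chosen[name] in allowed else None
    for version in sorted(index[name], reverse=True):  # prefer the newest version
        if version not in allowed:
            continue
        chosen[name] = version
        # Choosing this version brings its own dependencies into play.
        result = resolve(index, rest + index[name][version], chosen)
        if result is not None:
            return result
        del chosen[name]  # conflict further down: backtrack and try an older version
    return None


print(resolve(INDEX, [("app", range(1, 2))]))  # {'app': 1, 'web': 2, 'json': 1}
```

Here `web` 3 is tried first, but it needs `json` 3, which conflicts with the constraint from `app`. The resolver backtracks to `web` 2 and succeeds.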

It would be great if you could install multiple versions of a library in the same binary, but having this work correctly is sufficiently rare that I’m worried this might be a grass-is-greener-on-the-other-side issue. I’d be more comfortable with this in a statically typed language where you can wall off different versions from each other by having them be distinct types.

Most languages eventually get something which is approximately this. Cabal with sandboxes, pip with virtualenv and Ruby with Bundler all manage most of this (mirroring is typically not handled well. I think maybe it is in Cabal, but it’s not in Python or Ruby).

Testing tools

There should be a standard test runner that works sufficiently well that nobody bothers writing their own unless they’re someone with an rspec fetish, and the community should laugh at the people with rspec fetishes and tell them to go play elsewhere (or rather, politely suggest that maybe this isn’t adding very much to the testing workflow).

There should be good code coverage tools. They should work reliably with minimal overhead (if it takes twice as long to run under coverage then this is very sad and people will use it less). It should be able to do branch coverage. It would be great if it could do predicate coverage. More features – e.g. stats on paths and traces – would be amazing.
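To make the line-versus-branch distinction concrete, here is a toy line-coverage tracer built on Python’s `sys.settrace`. It is only a sketch for illustration and no substitute for a real coverage tool. A single test with a negative input executes every line, so line coverage reports 100%. The path where the `if` is false has still never run, and that gap is exactly what branch coverage reports.

```python
import sys


# Function under test: one call with x < 0 executes every line of it.
def clamp_non_negative(x):
    if x < 0:
        x = 0
    return x


def lines_executed(func, *args):
    """Call func(*args) and return executed line numbers, relative to the `def` line."""
    code = func.__code__
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # don't trace any other function
        if event == "line":
            seen.add(frame.f_lineno - code.co_firstlineno)
        return tracer  # keep receiving events for this frame

    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(None)
    return seen


print(lines_executed(clamp_non_negative, -5))  # {1, 2, 3}: full line coverage
print(lines_executed(clamp_non_negative, 5))   # {1, 3}: the branch the first test missed
```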

It would be fantastic to steal the CPAN feature that tests run on install and report back pass/fail information to somewhere sensible, which means you want testing integrated with the packaging system. Given the aforementioned versioning constraints and per-project installs, you probably only want to do this once per distinct set of versions of dependent libraries.

Obviously all languages should have a QuickCheck-like testing tool (if I didn’t think this I probably wouldn’t have sunk more than six months of free full time labour into making Hypothesis).
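The core loop of a QuickCheck-style tool is small. Generate random inputs and check a property. When the property fails, shrink the input to a simpler example that still fails. Here is a deliberately tiny sketch of that loop for lists of integers. Every name in it is hypothetical, and it is neither Hypothesis’s API nor its shrinking algorithm.

```python
import random


def gen_int_list(rnd):
    """Generate a random list of small integers."""
    return [rnd.randint(-100, 100) for _ in range(rnd.randint(0, 10))]


def shrink_int_list(xs):
    """Yield 'simpler' candidates: drop one element, or move one element halfway to zero."""
    for i in range(len(xs)):
        yield xs[:i] + xs[i + 1:]
    for i, x in enumerate(xs):
        if x != 0:
            smaller = x // 2 if x > 0 else -((-x) // 2)
            yield xs[:i] + [smaller] + xs[i + 1:]


def check_property(prop, gen, shrink, runs=200, seed=0):
    """Return None if prop held for every generated input, else a shrunk counterexample."""
    rnd = random.Random(seed)
    for _ in range(runs):
        value = gen(rnd)
        if prop(value):
            continue
        # Greedy shrinking: keep taking the first simpler candidate that still fails.
        shrinking = True
        while shrinking:
            shrinking = False
            for candidate in shrink(value):
                if not prop(candidate):
                    value = candidate
                    shrinking = True
                    break
        return value
    return None


# A false "property": every list is already sorted.
print(check_property(lambda xs: sorted(xs) == xs, gen_int_list, shrink_int_list))
# prints a minimal unsorted list: [1, 0] or [0, -1]
```

Shrinking matters as much as generation. A random 10-element failing list says very little, but a 2-element one usually points straight at the bug.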

Good tools for working with source code

I’m mostly very indifferent to Go, but there’s one feature that I think it gets so very right and wish everyone else would steal right now.

Your standard library should include a precise parser that can take any valid (or, ideally, nearly valid) source code, parse it to an AST, and then print that AST back out as a file that is byte-for-byte identical to the original.

It should also include a pretty printer that outputs code in a “standard” and correct format.
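Python’s standard library shows the gap. Its `ast` module parses source, and `ast.unparse` (Python 3.9+) prints it back in a normalized style, which is close to the pretty printer half of this. The round trip is lossy, though: comments and the original formatting are gone, so it cannot serve as the precise parser.

```python
import ast

source = "x = ( 1+2 )  # keep me\n"
printed = ast.unparse(ast.parse(source))

print(printed)            # x = 1 + 2
print(printed == source)  # False: spacing, parentheses and the comment are all lost
```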

It should also be easy to make tools that use the AST representation to make changes to your source code.
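As a sketch of the kind of tool this enables, here is a hypothetical rename refactoring built on Python’s `ast.NodeTransformer`. It only rewrites bare names, not function definitions or attributes. It works, but because it goes through `ast.unparse` it also silently strips every comment from the code it rewrites, which is why a lossless parser matters for refactoring tools.

```python
import ast


class RenameName(ast.NodeTransformer):
    """Rewrite every bare reference to the name `old` so that it refers to `new`."""

    def __init__(self, old, new):
        self.old = old
        self.new = new

    def visit_Name(self, node):
        if node.id == self.old:
            return ast.copy_location(ast.Name(id=self.new, ctx=node.ctx), node)
        return node


def rename(source, old, new):
    tree = RenameName(old, new).visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


print(rename("def f(x):\n    return compute(x) + compute(1)  # hot path\n", "compute", "evaluate"))
# def f(x):
#     return evaluate(x) + evaluate(1)
```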

Basically: I want good refactoring and reformatting tools, and in order to get that I want standard ways of building them.

I also want good static analysis tools. A well designed language obviates the need for a lot of these, but there’s always room for improvement.

Foreign Function Interface

There should be a standard, good foreign function interface that makes it easy to bind to C libraries and does not expose the language implementation’s internals.

In reality almost every language has too many ways to do it, none of them good.
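As a point of comparison, here is Python’s `ctypes` calling `strlen` from the C standard library. It is reasonably pleasant for small jobs, and it is also one of several competing ways of doing this in Python. The snippet assumes a POSIX system; library lookup works differently on Windows.

```python
import ctypes
import ctypes.util

# Find the C standard library; if lookup fails, CDLL(None) falls back to the symbols
# already loaded into the current process (POSIX only).
libc = ctypes.CDLL(ctypes.util.find_library("c") or None)

# Declare the C signature explicitly: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

print(libc.strlen(b"hypothesis"))  # 10
```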

Relatedly, please run valgrind clean by default. I know it’s a pain, and I know it hurts some microbenchmarks, but it makes debugging code integrated with C so much easier.

Text handling

Text is:

  1. Efficiently represented.
  2. Always immutable (Note: Having a separate editable representation for text is perfectly reasonable, but it’s not your default).
  3. Always Unicode.
  4. Always understood to be a variable length encoding that you cannot index into by an offset.
  5. Easy to read and write to a variety of encodings.

Anything else is wrong and you are a sinner for contemplating it.
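Python 3’s `str` meets several of these: it is immutable, always Unicode, and encoding and decoding are explicit. It fails item 4, because it lets you index by code point, and a code point is not the same thing as a user-perceived character. A short illustration:

```python
import unicodedata

text = "caf\u00e9"  # "café" with a precomposed é (U+00E9)

# Text is immutable: item assignment raises TypeError.
try:
    text[0] = "C"
except TypeError:
    print("str is immutable")

# Reading and writing different encodings is explicit, and byte lengths differ.
utf8 = text.encode("utf-8")        # é needs two bytes in UTF-8
utf16 = text.encode("utf-16-le")   # every code point here needs two bytes
print(len(utf8), len(utf16))       # 5 8

# The same visible text can contain a different number of code points,
# so integer offsets are not a meaningful notion of "character".
decomposed = unicodedata.normalize("NFD", text)  # "cafe" + U+0301 COMBINING ACUTE ACCENT
print(len(text), len(decomposed))  # 4 5
```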

Equality works correctly

  1. There is a single operator you use for equality. You do not use different ones for different types.
  2. Differently typed values are never equal. Yes I know this violates the Liskov Substitution Principle. I don’t even slightly care. Ideally comparing different types for equality should be an error.
  3. Equality is reflexive. That is, x == x, always. I don’t know what to do about NaN here. So far my most practical solution involves a time machine. My second most practical answer is “Ignore IEEE and deal with the resulting confusion”.
  4. Equality is symmetric.
  5. Equality is transitive.
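Python breaks these rules in instructive ways. NaN is not equal to itself, yet a list containing it compares equal to itself, because list comparison checks identity before calling `==`. Values of three different types also compare equal. `strict_eq` below is a hypothetical helper that makes cross-type comparison an error, as item 2 asks. It does nothing about NaN.

```python
nan = float("nan")
print(nan == nan)         # False: IEEE 754 makes NaN unequal to itself
print([nan] == [nan])     # True: the identity shortcut skips the NaN comparison
print(1 == 1.0 == True)   # True: int, float and bool all compare equal


def strict_eq(a, b):
    """Hypothetical stricter equality: comparing values of different types is an error."""
    if type(a) is not type(b):
        raise TypeError(f"cannot compare {type(a).__name__} with {type(b).__name__}")
    return a == b


print(strict_eq(2, 2))    # True
# strict_eq(1, 1.0) raises TypeError instead of quietly returning True
```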

Good numeric hierarchy

Your language should be able to represent:

  1. Signed and unsigned fixed size integers of various machine sizes
  2. Arbitrary precision integers
  3. Double and single precision floats
  4. Arbitrary precision rational numbers
  5. Fixed width decimal arithmetic

These should all be easy to convert to each other (but not compare equal if they are of different types!), and they should certainly all consistently use standard operators.

Most of this should be implementable as libraries rather than needing to be baked in to the language.
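Python is a partial example of the library approach. Arbitrary precision integers are built in, while rationals and decimals come from the `fractions` and `decimal` modules. Fixed-size integers are really only available through `ctypes` here, and Python happily compares values of different numeric types as equal. The `decimal` part uses a fixed number of significant digits, which is my closest stdlib approximation of fixed width decimal arithmetic.

```python
import ctypes
import decimal
import struct
from fractions import Fraction

# Fixed-size machine integers: ctypes wraps silently rather than checking for overflow.
print(ctypes.c_uint8(300).value)   # 44
print(ctypes.c_int8(200).value)    # -56

# Arbitrary precision integers are Python's default int.
big = 2 ** 100

# Double precision is float; single precision can be emulated by round-tripping through struct.
single = struct.unpack("<f", struct.pack("<f", 0.1))[0]

# Arbitrary precision rationals.
half = Fraction(1, 3) + Fraction(1, 6)   # Fraction(1, 2)

# Decimal arithmetic with a fixed number of significant digits.
with decimal.localcontext() as ctx:
    ctx.prec = 6
    ratio = decimal.Decimal(1) / decimal.Decimal(3)   # Decimal('0.333333')

# ...and, against the wish above, these compare equal across types:
print(Fraction(1, 2) == 0.5 == decimal.Decimal("0.5"))   # True
```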

Packed data

At some point you are going to need to deal with arrays of “primitive” types – bytes, doubles, machine words, etc. If you cannot represent this in an efficient way when you come to do this, you will be sad. Ideally you want to do this in a way that makes it easy to interact with the aforementioned foreign function interface.

Ideally this would also support arrays of structs of some sort. I don’t really care about representing structs as individual values efficiently, but for large arrays of data it matters.
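In Python the standard library gets you partway there. The `array` module stores primitive values contiguously and exposes them through the buffer protocol, and `struct` can pack and unpack an “array of structs” from raw bytes. The record layout below is made up for illustration.

```python
import array
import struct

# A packed array of C doubles: contiguous memory, no per-element Python objects.
doubles = array.array("d", [1.0, 2.5, 3.25])
raw = doubles.tobytes()
print(len(raw) == 3 * doubles.itemsize)  # True

# The buffer protocol exposes the same memory without copying.
view = memoryview(doubles)
print(view.format)  # d

# An "array of structs": each record is a little-endian int32 id followed by a float64.
record = struct.Struct("<id")
buf = b"".join(record.pack(i, i * 0.5) for i in range(3))
records = list(record.iter_unpack(buf))
print(record.size)  # 12: "<" means standard sizes and no padding
print(records)      # [(0, 0.0), (1, 0.5), (2, 1.0)]
```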

Namespacing and scoping

Everything should have a clear, lexical, scope. It should be obvious where a variable is introduced. The answer should never be “into the global namespace”. It should be hard to make typos and not notice.

As far as I can tell, basically the only languages which get this right are statically typed or a lisp. (ETA: Apparently Perl with use strict also gets this right).
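Python shows the typo problem clearly: assignment introduces a variable, so a misspelled name just creates a new one. The sketch below shows the failure, followed by a deliberately crude `ast`-based check for it. `assigned_but_never_read` is a hypothetical helper that ignores scoping entirely. Real linters such as pyflakes do proper scope analysis.

```python
import ast

# A function with a typo: `totl` silently creates a new local instead of updating `total`.
BUGGY = """
def total_price(prices):
    total = 0
    for p in prices:
        totl = total + p
    return total
"""

namespace = {}
exec(BUGGY, namespace)
print(namespace["total_price"]([1, 2, 3]))  # 0, not 6, and no error anywhere


def assigned_but_never_read(source):
    """Crude check: names stored somewhere in the source but never loaded anywhere."""
    stored, loaded = set(), set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Name):
            if isinstance(node.ctx, ast.Store):
                stored.add(node.id)
            elif isinstance(node.ctx, ast.Load):
                loaded.add(node.id)
    return stored - loaded


print(assigned_but_never_read(BUGGY))  # {'totl'}
```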

Higher order functions

Languages should have first class functions, and higher order functions like map or filter.

This one… is doing pretty well actually. This debate is over and we won. The last time I checked, Java was the only mainstream language that didn’t do this. Since Java 8 arrived last year there are no mainstream languages that don’t do this.
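For completeness, here is the everyday Python version: functions passed to `map`, `filter` and `functools.reduce`, plus a function that builds and returns a new function. `compose` is a hypothetical helper, not a standard library function.

```python
from functools import reduce


def compose(f, g):
    """Hypothetical helper: return a new function computing f(g(x))."""
    return lambda x: f(g(x))


inc_then_double = compose(lambda x: x * 2, lambda x: x + 1)

evens = list(filter(lambda n: n % 2 == 0, range(10)))  # [0, 2, 4, 6, 8]
results = list(map(inc_then_double, evens))            # [2, 6, 10, 14, 18]
total = reduce(lambda acc, n: acc + n, results, 0)     # 50
```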

A REPL

Not much to say here except that a REPL is so invaluable to how I work that it’s really painful using languages without one. I can do it of course, but it tends to involve writing lots of tiny little throwaway programs that act as a poorer version of a REPL.

Take typing seriously

I’m fine with dynamically typed languages. I’m also fine with languages with fairly serious static type systems (Haskell, OCaml, F#. Even C++ and C# are pretty OK). But if you’re going to have a type system don’t half-arse it. Good type systems are good, but bad type systems are worse than no type systems.

Note: Type system wars in the comments will not make it through moderation.

Solid, high performance, implementation

Why are we all using slow and unreliable implementations? It makes me really sad.

I mean, I do know the answer, it’s because writing a concurrent garbage collector and a high performance compiler is hard and reusing language-specific VMs mostly works but has its own set of problems.

Basically I want garbage collection and threading to just work, and I want to be able to write code that looks as if it should be reasonably low level and have it not produce something that’s hundreds of times slower than the equivalent C. If you can compile high level abstractions down to low level code, that’s great too.

Yes I know that low level concurrency is passé and we’re all doing message passing now. A good message passing API on top of the concurrency primitives would be great, but I want the primitives too.
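Here is a sketch of layering message passing over the primitives: a minimal unbounded channel built from a lock plus a condition variable. The `Channel` class is hypothetical. Python’s own `queue.Queue` is a production version of the same idea.

```python
import threading
from collections import deque


class Channel:
    """Hypothetical unbounded channel: send never blocks, receive waits for a message."""

    def __init__(self):
        self._items = deque()
        self._cond = threading.Condition()  # a lock plus wait/notify

    def send(self, item):
        with self._cond:
            self._items.append(item)
            self._cond.notify()  # wake one waiting receiver, if any

    def receive(self):
        with self._cond:
            while not self._items:  # loop guards against spurious wakeups
                self._cond.wait()
            return self._items.popleft()


ch = Channel()


def producer():
    for i in range(5):
        ch.send(i * i)
    ch.send(None)  # sentinel: no more messages


worker = threading.Thread(target=producer)
worker.start()
received = []
while (msg := ch.receive()) is not None:
    received.append(msg)
worker.join()
print(received)  # [0, 1, 4, 9, 16]
```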

Rich standard library

It has major problems, but the size and (mostly) quality of the Java standard library is one of the few things I miss about it.

The standard library should have all of the normal really boring things we need to get things done.

  • File system access
  • Sockets – client and server
  • A solid HTTP client (I’m ambivalent as to whether there should also be a server. Experience of how little ones from the standard libraries of existing languages are used suggests no)
  • Parsing for standard formats – XML, JSON, etc.
  • Good concurrency primitives
  • Pseudo random number generators
  • Invoking and running external programs
  • Probably many others I’m forgetting

There are plenty of things that shouldn’t be in the standard library because you want a faster release cycle or because there are multiple good ways to do them, but in general there are things that we’ve basically got figured out and are commonly needed and those should be standardized.
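Here is a small illustration of the boring essentials living in a standard library. It uses only Python’s: running an external program, parsing a standard format, and a seeded pseudo random number generator.

```python
import json
import random
import subprocess
import sys

# Invoking an external program (here another Python interpreter) and capturing its output.
proc = subprocess.run(
    [sys.executable, "-c", "import json; print(json.dumps({'ok': True}))"],
    capture_output=True,
    text=True,
    check=True,
)

# Parsing a standard format.
payload = json.loads(proc.stdout)
print(payload)  # {'ok': True}

# A seeded pseudo random number generator: the same seed gives the same sequence.
a, b = random.Random(42), random.Random(42)
same = [a.random() for _ in range(3)] == [b.random() for _ in range(3)]
print(same)  # True
```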

Collections Library

I really want a good collections library. With standard interfaces for things. We seem to have settled on “Eh, you’ve got hash tables and dynamically sized arrays, what more do you want?”. I’ll tell you what more I want:

  • Uniform interfaces. There are many things I dislike about Python but high up on the list is that if I write add when I meant append or append when I meant add one more time I’m going to scream
  • Immutable collections (not just frozenset. I want efficiently updatable immutable collections)
  • Sorted collections
  • Heaps
  • Priority queues

Java collections library I miss you. Please come back?
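Python does have pieces of this list, just not behind uniform interfaces. `heapq` gives a priority queue over a plain list, and `bisect` keeps a plain list sorted. A short sketch, with the task names made up:

```python
import bisect
import heapq

# Priority queue: heapq keeps the smallest (priority, task) tuple at index 0.
tasks = []
heapq.heappush(tasks, (2, "write docs"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "refactor"))
order = [heapq.heappop(tasks)[1] for _ in range(len(tasks))]
print(order)  # ['fix bug', 'write docs', 'refactor']

# Sorted collection: bisect maintains ordering inside an ordinary list.
scores = [10, 30, 50]
bisect.insort(scores, 40)
print(scores)                          # [10, 30, 40, 50]
print(bisect.bisect_left(scores, 30))  # 1

# The interface mismatch: lists append, sets add.
seen_list, seen_set = [], set()
seen_list.append("x")
seen_set.add("x")
```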

Database access

There should be a standard API for talking to a relational database. It doesn’t need to (and shouldn’t) bundle everything into it, but it would be nice if the API were standard and the standard library came with e.g. a sqlite3 adapter.
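Python is close to this. PEP 249 (DB-API 2.0) specifies a common interface for database adapters, and the standard library ships the `sqlite3` adapter. Here is a minimal in-memory example, with a made-up table and rows:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE languages (name TEXT PRIMARY KEY, typing TEXT)")
conn.executemany(
    "INSERT INTO languages VALUES (?, ?)",
    [("Haskell", "static"), ("Python", "dynamic"), ("OCaml", "static")],
)
# Parameters are passed separately from the SQL rather than formatted into it.
rows = conn.execute(
    "SELECT name FROM languages WHERE typing = ? ORDER BY name", ("static",)
).fetchall()
print(rows)  # [('Haskell',), ('OCaml',)]
conn.close()
```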

Summary

I think this can mostly be summarized by saying that what I want is completeness and quality. The domain of programming is large, messy, and broken, and it would be nice if the language that I use to interact with it were a bastion of things mostly working and being easy rather than fighting against me every step of the way. There’s enough stuff that is common and known how to do well that it would be great to just do it well and then stop having to worry about it.

This will of course never happen, but there are enough standard sources of annoyance and things that languages get wrong that it sure would be nice if we could do without, and every one we manage to fix is one less thing to worry about.

Edit to add: You are welcome to suggest languages in the comments if you really feel the need to, but I am unlikely to dignify them with a response. Chances are extremely good that I am aware of the language you are suggesting and do not feel it lives up to this list.

This entry was posted in Hypothesis, Uncategorized.

Hypothesis 1.7.1 is out

(Note: I’ve realised that this blog has a much higher number of interested readers than the mailing list does, so I’m going to start mirroring announcements here)

As of this past Monday, Hypothesis 1.7.1 (Codename: There is no Hypothesis 1.7.0) is out.

The main feature this release adds is Python 2.6 support. Thanks hugely to Jeff Meadows for doing most of the work for getting this in.

Other features:

  • Strategies now has a permutations() function which returns a strategy yielding permutations of values from a given collection.
  • If you have a flaky test it will print the exception that it last saw before failing with Flaky, even if you do not have verbose reporting on.
  • Slightly experimental git merge script available as “python -m hypothesis.tools.mergedbs”. Instructions on how to use it are in the docstring of that file.

This also contains two important counting related bug fixes:

  • floats() with a negative min_value would not have worked correctly (worryingly, it would have just silently failed to run any examples). This is now fixed.
  • Tests using sampled_from would error if the number of sampled elements was smaller than min_satisfying_examples.

It also contains some changes to filtering that should improve performance and reliability in cases where you’re filtering by hard to satisfy conditions (although they could also hurt performance simply by virtue of enabling Hypothesis to find more examples and thus running your test more times!).

This should be a pretty safe upgrade, and given the counting bugs I would strongly encourage you to do so.

This entry was posted in Hypothesis, Uncategorized.

Thinking with the machine

Content note: Rambling and slightly incoherent.

When was the last time you got lost?

It used to be very easy for me to get lost. I have a terrible sense of direction. Now I have an excellent sense of direction. It’s a little black and glass oblong, fits in my pocket. I only really get lost if I’m out of power or data [edit: Or, as I discovered half an hour after writing this, driving and unable to access my phone].

I also tend not to forget commonly available information, because I can just Google it.

It used to be the case that calculating pi was a life’s work. At the conference this weekend, people were complaining about how in Python it took seconds – sometimes even minutes – to get this sort of approximation.

If I want to think through a line of thought I can write it down and save it for later. Even just considering raw typing speed, this is about three times faster than doing so by hand, but it also offers unprecedented editing capabilities and an essentially unlimited amount of writing space. This allows me to coherently put together more complicated thoughts than I would ever be able to do unaided.

In a very real sense I am vastly more intelligent than someone of even a hundred years ago, let alone a thousand. It’s not that I’m more intelligent in some biological sense (though due to advances in nutrition and healthcare this may be true too), but I am augmented by the world around me and the tools available to me in such a way as to greatly boost my natural capabilities.

There’s a joke in AI circles that artificial intelligence is whatever hasn’t been done yet [irony: Counter-example to my above claim about forgetting things. Googling for this phrase just turns up irrelevant stuff about AI risk. I had to resort to my other source of transactive memory]. If we have figured out how to make a computer do it then it’s just calculating. The same seems to hold true for natural intelligence – once upon a time, a good memory was considered the hallmark of intelligence. Now it’s just a thing you use your computer for.

So we’ve got the useful things the computers do and the actual intelligence that we leave to the humans.

But there is a middle way, and I think that that way is where the really exciting stuff lies.

If you looked at my examples above, you might have noticed that as the bird says, one of these things is not like the others.

When I navigate, the computer is doing the work. When I Google, there is art in asking the right question, but answering the question is all the computer’s work.

With writing in order to think through a problem though, the computer isn’t really doing the work. I am. The computer is lending me its capabilities – the ones that aren’t “really” intelligence, but that somehow when you add them to “real” intelligence you get something greater.

Computers are good at many things that we are not. In this case I am more or less using the computer as a working memory, because my working memory is pretty good but still bounded, while the computer’s is effectively infinite (in that it’s finite, but it’s so much larger than mine that I hit my limits in terms of how I can offload to it long before we hit its limits). The result is that the ecosystem of me plus the computer is something greater than the sum of our parts – I can use the computer’s strengths to remove my weaknesses, and the result is something that I could not have produced unaided, and the computer certainly couldn’t have produced.

This is also how I think of Hypothesis.

People talk about QuickCheck, or Hypothesis, finding a bug in their software. This is not correct. Hypothesis does not find bugs, people do. Hypothesis sure helps though.

There is software that finds bugs without you having to do anything other than run it. e.g. static analysis tools fall into this category. This is not what property based testing does. In property based testing you are still the one writing the tests, you are still the one finding the bugs, the computer is just there to help you out at the bits you’re bad at by doing the thing that computers do best: Repeating the same task over and over again really quickly.

When I started this post I thought I was going to be introducing the concept of “transhumanist software tools”. Software tools that work by augmenting human intelligence in order to help us write better software. There are some tools that I think are unambiguously of this style: Property based testing, interactive theorem provers, IDEs (in particular autocomplete).

But I think this is a wrong label. In much the same way that there is no such thing as a functional programming language, I don’t think there’s any such thing as a transhumanist software tool. It’s too fuzzy a category. Is a REPL transhumanist? Is a type system? The answer is obvious: “Kinda?”.

There is such a thing as transhumanist software development though: Software development where we lean heavily on the computer, and think in terms of how we can not just make the computer work for us but also with us.

And I think there’s a lot of potential to explore here. Right now we assume any task is either intrinsically human or is “automation”, where we just want to replace the people doing it with a small shell script, and the middle ground is really underexplored.

Computers cannot write software (yet). But sometimes it feels like neither can humans. Perhaps together we can?

This entry was posted in Hypothesis, Python, Uncategorized.

Hypothesis continues to teach me

I’ve learned a lot technically in my work so far on Hypothesis. It’s taught me interesting computer science things, and I think it has also caused me to level up a lot as a developer. It’s been a great, if occasionally frustrating, experience and I expect it will continue to be one for some time yet.

But that’s not what I’m learning about right now. As you’ve probably noticed, and I mentioned previously, I’ve not been doing a huge amount of development recently. There have been a couple of patch releases for bug fixes and example quality but nothing very serious. I have some interesting work going on behind the scenes on finding multiple bugs with one test, but it’s probably a while off yet.

Because right now, what I’m learning about because of Hypothesis is:

  • Public speaking
  • Marketing
  • Pricing and sales

You know, “fluffy stuff”.

I’m also learning how to basically suck it up and admit I want things. A combination of geek and English social failings makes it very hard for me to do that. So when I put out a new project or write a blog post there’s always this weird dance of “yeah I totally just did this for me. I guess you can retweet it if you like, maybe star it on github, but whatever I don’t really care” followed by staring obsessively at every notification about it.

With Hypothesis it’s different, because there’s no pretence. I want Hypothesis to be popular. It will make the world a better place, and potentially it will make me some money (or at least help me recoup the money I effectively burned by taking a sabbatical to make it).

And this is weird to me, because it’s basically forcing me out of my shell and making me develop the skills I’ve always shunned. Public speaking is something I assumed I would never be good at (turns out that I’m actually pretty OK at it. Maybe with some practice I’ll even be good). Sales and marketing have always been things where… I knew abstractly that they weren’t intrinsically evil, but they always felt dirty and I didn’t really want to have anything to do with them. This wasn’t my reasoned and held position so much as my subconscious biases at work, but those are if anything harder to go against.

With Hypothesis, I need to figure out how to promote it if I want people to use it, and I do want people to use it, so I’m forced into a sales and marketing position. Moreover, talking about it to new groups is one of the best things I can do to promote it, so this in turn forces me into public speaking.

Moreover, it’s fairly unambiguously a good thing for me to ask for money for it. I know I’ve done great work in Hypothesis, and I want to continue doing great work in Hypothesis, but in order to do that I also need to eat, have a place to live, etc.

Moreover it’s clearly a bad thing for me to undercharge! As well as value of labour, etc. etc. it’s a bad thing simply because I’m mostly not charging for the open source development part, so if I’m undercharging that means I have to do more work that isn’t that in order to make decent money, which will in turn mean that less work that benefits everybody gets done.

Not undercharging turns out to be hard. I’ve had multiple conversations with friends to the tune of “I was thinking of charging £X?” “Um. No. It would be cheap at £2X.” “I guess I could charge £Y?” “MORE MONEY” “OK OK how about £Z?” “Yeah I guess you could start there and raise your prices later”. I understand where these numbers come from, and my friends are right and I am wrong, but that’s sure not how it feels.

Ultimately this is proving to be an… interesting experience. It’s super uncomfortable, as I’m having to go against all my social instincts and unlearn a lot of bad habits, but I think it will be a good thing for me, and hopefully it will be a good thing for Hypothesis too.

This entry was posted in Hypothesis, Python, Uncategorized.