Category Archives: Hypothesis

A wish list for programming languages

Once upon a time I was really into programming languages. I cared a lot about Scala and Haskell, and I was interested in all sorts of weird languages (shout out to anyone else who has used Nice or Clay).

These days, eh. I mostly write Python. Some C. I could do C++ if I had to but I generally don’t have to. I’ve considered checking out Julia but, well.

It’s not that I’m no longer interested, and it’s certainly not that I’m against exotic languages, or no longer care about type systems. I’m very glad that there are other people who still actively pursue these things, as the current state of programming languages is pretty piss-poor and I would like it to be better. But I find that these days I lack the energy to care, and my priorities in a language have shifted to things that are more… pedestrian.

And yet somehow still really hard to satisfy.

So here’s a laundry list of stuff that would feature in my dream language. Advance warning that I am a grumpy old man and this is a super boring list that contains almost no cool features. Also it’s not in any particular order of priority – it’s mostly the order I thought of things in – and it’s definitely not complete – it’s just the stuff I thought of before I got bored of writing this post.

Also, most of these are things that you can’t get without large amounts of time, effort and money. They’re boring, not easy, and if anything their being boring makes them harder, because you can’t really get people excited about working on them.

Community

Community is so important.

Here is what I want out of a programming language community:

  1. Large. Small communities are nice, but I want a community I can share the workload with, and a small community isn’t it.
  2. Friendly. Elitism is toxic, and a community that isn’t helpful to beginners or goes on and on about how they’re super smart for using this language that other people don’t get is the worst.
  3. Diverse, and committed to it. Codes of Conduct for everyone all round.
  4. Committed to quality. Documentation matters. Testing matters. We like having high quality libraries and we’re prepared to put the work (and, where possible, money) in to get them.

Packaging Infrastructure

Good packaging infrastructure is vital. And so hard to do. Basically nobody does it well. Packages should be:

  1. Easy to create. If a problem could be solved by creating a new package, it should be easier to solve by creating one than by not. You should never find yourself going “oh god, but I have to write all that XML”.
  2. Versioned, with version constraints between dependencies automatically resolved
  3. Local to a project
    • without a lengthy compile each time you install into a new project
    • no pollution in the global install namespace
  4. Easy to mirror
  5. But with a good standard central repository
  6. Clearly marked for stability
  7. Committed to maintaining compatibility between package versions
  8. Easy to write in a way that is portable to other versions of the language (e.g. don’t be Scala where a package compiled for one version of the language doesn’t work with any others, even between point releases)
  9. Easy to write in a way that is compatible with multiple operating systems.

It would be great if you could install multiple versions of a library in the same binary, but having this work correctly is sufficiently rare that I’m worried this might be a grass-is-greener-on-the-other-side issue. I’d be more comfortable with this in a statically typed language, where you can wall off different versions from each other by having them be distinct types.

Most languages eventually get something which is approximately this. Cabal with sandboxes, pip with virtualenv, Ruby with bundler – all manage most of this (mirroring is typically not handled well; I think maybe it is in Cabal, but it’s not in Python or Ruby).

Testing tools

There should be a standard test runner that works sufficiently well that nobody bothers writing their own unless they’re someone with an rspec fetish, and the community should politely suggest to the people with rspec fetishes that maybe this isn’t adding very much to the testing workflow.

There should be good code coverage tools. They should work reliably with minimal overhead (if it takes twice as long to run under coverage then this is very sad and people will use it less). It should be able to do branch coverage. It would be great if it could do predicate coverage. More features – e.g. stats on paths and traces – would be amazing.

It would be fantastic to steal the CPAN feature where tests run on install and report pass/fail information back to somewhere sensible, which means you want testing integrated with the packaging system. Given the aforementioned versioning constraints and per-project installs, you probably only want to do this once per distinct set of versions of dependent libraries.

Obviously all languages should have a Quickcheck like testing tool (if I didn’t think this I probably wouldn’t have sunk more than six months of free full time labour into making Hypothesis).
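To make the idea concrete, here is a minimal, stdlib-only sketch of the Quickcheck approach (`check_property` and its parameters are made up for illustration; real tools like Hypothesis add example minimisation, a failure database, and a great deal more):

```python
import random

# A toy Quickcheck: generate random inputs, check a property, return the
# first counterexample found (or None if every run passed).
def check_property(generate, prop, runs=200, seed=0):
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None

random_list = lambda rng: [rng.randint(0, 9) for _ in range(rng.randint(0, 10))]

# Reversing twice is the identity: no counterexample expected.
assert check_property(
    random_list, lambda xs: list(reversed(list(reversed(xs)))) == xs) is None

# Sorting is not the same as reversing: a counterexample turns up quickly.
assert check_property(
    random_list, lambda xs: sorted(xs) == list(reversed(xs))) is not None
```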

Good tools for working with source code

I’m mostly very indifferent to Go, but there’s one feature that I think it gets so very right and wish everyone else would steal right now.

Your standard library should include a precise parser: for any valid (or, ideally, nearly valid) source file, you should be able to parse it to an AST and then print that AST back out as a bytewise identical copy of the original.

It should also include a pretty printer that outputs code in a “standard” and correct format.

It should also be easy to make tools that use the AST representation to make changes to your source code.

Basically: I want good refactoring and reformatting tools, and in order to get that I want standard ways of building them.
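Python’s own stdlib gets partway here and illustrates the gap: `ast.parse` plus `ast.unparse` (Python 3.9+) round-trips semantics, not bytes, so formatting and comments are lost – but AST-based transformation tools are already easy to write:

```python
import ast

source = "x   =  1 + 2   # a comment\n"
tree = ast.parse(source)

# unparse produces canonical formatting; the comment and spacing are gone,
# so the round-trip is semantic rather than bytewise identical.
regenerated = ast.unparse(tree)
assert regenerated == "x = 1 + 2"

# Tools over the AST are straightforward, though: rename x to y everywhere.
class Rename(ast.NodeTransformer):
    def visit_Name(self, node):
        if node.id == "x":
            node.id = "y"
        return node

renamed = ast.unparse(Rename().visit(tree))
assert renamed == "y = 1 + 2"
```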

I also want good static analysis tools. A well designed language obviates the need for a lot of these, but there’s always room for improvement.

Foreign Function Interface

There should be a standard, good foreign function interface that makes it easy to bind to C libraries without exposing the language’s internals.

In reality almost every language has too many ways to do it, none of them good.

Relatedly, please be valgrind-clean by default. I know it’s a pain, and I know it hurts some microbenchmarks, but it makes debugging code integrated with C so much easier.

Text handling

Text should be:

  1. Efficiently represented.
  2. Always immutable (Note: Having a separate editable representation for text is perfectly reasonable, but it’s not your default).
  3. Always unicode.
  4. Always understood to use a variable-width encoding that you cannot index into by byte offset.
  5. Easy to read and write to a variety of encodings.

Anything else is wrong and you are a sinner for contemplating it.
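A quick Python illustration of point 4, that variable-width encodings make byte offsets meaningless:

```python
s = "naïve café"
encoded = s.encode("utf-8")
assert len(s) == 10        # ten code points...
assert len(encoded) == 12  # ...but twelve bytes: ï and é take two bytes each

# Slicing the bytes at an arbitrary offset can split a character in half:
try:
    encoded[:3].decode("utf-8")  # ends midway through ï
except UnicodeDecodeError:
    pass
else:
    raise AssertionError("expected a decode error")
```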

Equality works correctly

  1. There is a single operator you use for equality. You do not use different ones for different types.
  2. Differently typed values are never equal. Yes I know this violates the Liskov Substitution Principle. I don’t even slightly care. Ideally comparing different types for equality should be an error.
  3. Equality is reflexive. That is, x == x, always. I don’t know what to do about NaN here. So far my most practical solution involves a time machine. My second most practical answer is “Ignore IEEE and deal with the resulting confusion”.
  4. Equality is symmetric.
  5. Equality is transitive.

Good numeric hierarchy

Your language should be able to represent:

  1. Signed and unsigned fixed size integers of various machine sizes
  2. Arbitrary precision integers
  3. Double and single precision floats
  4. Arbitrary precision rational numbers
  5. Fixed width decimal arithmetic

These should all be easy to convert to each other (but not compare equal if they are of different types!), and they should certainly all consistently use standard operators.

Most of this should be implementable as libraries rather than needing to be baked in to the language.
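Python’s stdlib is a reasonable approximation of this list, modulo the cross-type equality complaint above:

```python
from decimal import Decimal
from fractions import Fraction

assert Fraction(1, 3) + Fraction(1, 6) == Fraction(1, 2)  # exact rationals
assert Decimal("0.1") + Decimal("0.2") == Decimal("0.3")  # decimal arithmetic
assert 0.1 + 0.2 != 0.3                                   # unlike binary floats

# Conversion between the types is easy...
assert Fraction(Decimal("0.5")) == Fraction(1, 2)
# ...but, contrary to the wish list, cross-type equality holds:
assert Fraction(1, 2) == 0.5
```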

Packed data

At some point you are going to need to deal with arrays of “primitive” types – bytes, doubles, machine words, etc. If you cannot represent this in an efficient way when you come to do this, you will be sad. Ideally you want to do this in a way that makes it easy to interact with the aforementioned foreign function interface.

Ideally this would also support arrays of structs of some sort. I don’t really care about representing structs as individual values efficiently, but for large arrays of data it matters.
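In Python, the stdlib’s `array` and `struct` modules are the usual answer for packed primitives and simple record packing:

```python
import struct
from array import array

# array stores machine-typed values contiguously rather than as boxed objects
doubles = array("d", [1.0, 2.0, 3.0])
assert doubles.itemsize == 8                  # one C double per element
assert len(doubles) * doubles.itemsize == 24  # 24 bytes of payload

# struct packs a record - a poor man's array-of-structs element
record = struct.pack("<id", 42, 3.5)  # little-endian (int32, float64)
assert struct.unpack("<id", record) == (42, 3.5)
```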

Namespacing and scoping

Everything should have a clear, lexical, scope. It should be obvious where a variable is introduced. The answer should never be “into the global namespace”. It should be hard to make typos and not notice.

As far as I can tell, basically the only languages which get this right are statically typed or a lisp. (ETA: Apparently perl with use strict also gets this right).

Higher order functions

Languages should have first class functions, and higher order functions like map or filter.

This one… is doing pretty well actually. This debate is over and we won. The last time I checked, Java was the only mainstream language that didn’t do this. Since Java 8 last year there are no mainstream languages that don’t do this.
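For reference, the Python spelling of these features:

```python
# First-class functions: functions are ordinary values you can pass around.
def twice(f, x):
    return f(f(x))

assert twice(lambda n: n + 3, 0) == 6

# Higher-order functions over collections:
assert list(map(lambda n: n * n, [1, 2, 3])) == [1, 4, 9]
assert list(filter(lambda n: n % 2 == 0, [1, 2, 3, 4])) == [2, 4]
```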

A REPL

Not much to say here except that a REPL is so invaluable to how I work that it’s really painful using languages without one. I can do it of course, but it tends to involve writing lots of tiny little throwaway programs that act as a poorer version of a REPL.

Take typing seriously

I’m fine with dynamically typed languages. I’m also fine with languages with fairly serious static type systems (Haskell, OCaml, F#. Even C++ and C# are pretty OK). But if you’re going to have a type system don’t half-arse it. Good type systems are good, but bad type systems are worse than no type systems.

Note: Type system wars in the comments will not make it through moderation.

Solid, high performance, implementation

Why are we all using slow and unreliable implementations? It makes me really sad.

I mean, I do know the answer, it’s because writing a concurrent garbage collector and a high performance compiler is hard and reusing language-specific VMs mostly works but has its own set of problems.

Basically I want garbage collection and threading to just work, and I want to be able to write code that looks as if it should be reasonably low level and have it not produce something that’s hundreds of times slower than the equivalent C. If you can compile high level abstractions down to low level code, that’s great too.

Yes I know that low level concurrency is passé and we’re all doing message passing now. A good message passing API on top of the concurrency primitives would be great, but I want the primitives too.

Rich standard library

It has major problems, but the size and (mostly) quality of the Java standard library is one of the few things I miss about it.

The standard library should have all of the normal really boring things we need to get things done.

  • File system access
  • Sockets – client and server
  • A solid HTTP client (I’m ambivalent as to whether there should also be a server. Experience of how little the ones in existing languages’ standard libraries actually get used suggests no)
  • Parsing for standard formats – XML, JSON, etc.
  • Good concurrency primitives
  • Pseudo random number generators
  • Invoking and running external programs
  • Probably many others I’m forgetting

There are plenty of things that shouldn’t be in the standard library because you want a faster release cycle or because there are multiple good ways to do them, but in general there are things that we’ve basically got figured out and are commonly needed and those should be standardized.

Collections Library

I really want a good collections library. With standard interfaces for things. We seem to have settled on “Eh, you’ve got hash tables and dynamically sized arrays, what more do you want?”. I’ll tell you what more I want:

  • Uniform interfaces. There are many things I dislike about Python but high up on the list is that if I write add when I meant append or append when I meant add one more time I’m going to scream
  • Immutable collections (not just frozenset. I want efficiently updateable immutable collections)
  • Sorted collections
  • Heaps
  • Priority queues

Java collections library, I miss you. Please come back?
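Python’s `heapq` is an example of the stdlib partially filling these gaps: a plain list kept in heap order doubles as a basic priority queue, though sorted collections and efficiently updateable immutable collections still need third-party libraries:

```python
import heapq

heap = []
for priority, task in [(3, "low"), (1, "urgent"), (2, "normal")]:
    heapq.heappush(heap, (priority, task))  # a list kept in heap order

assert heapq.heappop(heap) == (1, "urgent")  # smallest priority value first
assert heapq.heappop(heap) == (2, "normal")
```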

Database access

There should be a standard API for talking to a relational database. It doesn’t need to (and shouldn’t) bundle everything into it, but it would be nice if the API were standard and the standard library came with e.g. a sqlite3 adapter.
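Python is an existence proof that this works: DB-API 2.0 (PEP 249) is the standard interface, and `sqlite3` ships in the stdlib as an adapter for it:

```python
import sqlite3

# An in-memory database is enough to exercise the standard DB-API surface.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE langs (name TEXT, typed INTEGER)")
conn.execute("INSERT INTO langs VALUES (?, ?)", ("Haskell", 1))
row = conn.execute("SELECT name FROM langs WHERE typed = 1").fetchone()
assert row == ("Haskell",)
conn.close()
```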

Summary

I think this can mostly be summarized by saying that what I want is completeness and quality. The domain of programming is large, messy, and broken, and it would be nice if the language that I use to interact with it were a bastion of things mostly working and being easy rather than fighting against me every step of the way. There’s enough stuff that is common and known how to do well that it would be great to just do it well and then stop having to worry about it.

This will of course never happen, but there are enough standard sources of annoyance and things that languages get wrong that it sure would be nice if we could do without, and every one we manage to fix is one less thing to worry about.

Edit to add: You are welcome to suggest languages in the comments if you really feel the need to, but I am unlikely to dignify them with a response. Chances are extremely good that I am aware of the language you are suggesting and do not feel it lives up to this list.

This entry was posted in Hypothesis, Uncategorized.

Hypothesis 1.7.2 is out

I’ve just released Hypothesis 1.7.2. It’s quite a small diff against 1.7.1, but there were some significant bugs in there that I thought it was worth pushing out a fix sooner rather than later. Most of these have been in there for a while, so you might want to upgrade, but if you’re not currently seeing any weird behaviour you’re probably not being affected by them.

  • When using floats() with stale data in the database you could sometimes get values in your tests that did not respect min_value or max_value.
  • When getting a Flaky error from an unreliable test it would have incorrectly displayed the example that caused it.
  • The Python 2.6 dependency on backports was incorrectly specified. This would only have caused you problems if you were building a universal wheel from Hypothesis, which is not how Hypothesis ships, so unless you’re explicitly building wheels for your dependencies and supporting Python 2.6 plus a later version of Python, this probably would never have affected you.
  • If you use flatmap in a way that the strategy on the right hand side depends sensitively on the left hand side you may have occasionally seen Flaky errors caused by producing unreliable examples when minimizing a bug. This use case may still be somewhat fraught to be honest. This code is due a major rearchitecture for 1.8, but in the meantime this release fixes the only source of this error that I’m aware of.
This entry was posted in Hypothesis, Python.

Using z3 to try to solve Feedback Arc Set for Tournaments

I’ve been meaning to check out Z3 for a while – it’s an open source theorem prover from Microsoft with great looking Python bindings. I’m partly interested just out of pure curiosity, partly because there are some interesting potential applications for Hypothesis (it’s currently unclear to me how possible those are – the architecture of Hypothesis is not well set up for concolic testing, nor are the semantics of Python, but still).

Anyway, I thought I’d turn it loose on my favourite evil problem, feedback arc set for tournaments. It’s at that weird boundary where what you want is sort of a SAT solver and sort of an integer linear programming solver. There’s some suggestion that using hybrid approaches would be better than either, but I haven’t really found any easily usable hybrid solvers (there’s a fork of minisat that might work; it’s on my TODO list).

Anyway, basically what we do with z3 is that we create the initial problem, ask it for a feasible solution, then successively ask it to improve on that feasible solution until it no longer can. Here’s the code:

The good: z3 was extremely easy to use. The API was pleasant, reasonably intuitive, and basically did what I expected.

The bad: This solution is strictly worse than a solution which just generates a massive LP file and hands it off to lp_solve. So although it was very easy to solve this problem with z3, z3 appears not to be well suited to it. Z3 struggled at n = 15 and seemed to get stuck at n = 20.

This shouldn’t necessarily be considered indicative that Z3 is “bad”. This is a problem with a lot of constraints (O(n^3)), so it’s probably not representative of the sort of things Z3 is good at, and it is representative of the sort of things that lp_solve is good at. So I’m not really deriving any lessons from the fact that Z3 wasn’t very good at this problem. I’m more interested in the fact that it was super easy to use, so I really should look at using it more in future.

This entry was posted in Hypothesis, Python.

Properties for testing optimisation

I’ve been thinking a bunch recently about how to use Hypothesis to test constrained optimisation problems. I put together this talk for pydata London, and yesterday I sketched out these tests for how you might test a solution to the knapsack problem (both of these are specifically combinatorial optimisation problems, but the principles I’m outlining here work in general).

Essentially there are only two properties to constrained optimisation problems:

  1. For all input problems, the optimised solution satisfies the constraints
  2. For all input problems and solutions satisfying the constraints of those problems, the solution does not score better than the optimised solution.

The first property is usable as is – you write a generator for the problems you’re trying to solve, write a test that runs the optimiser on the problem and spits out a solution, check it satisfies the constraints. This is a super useful test for the correctness of your optimiser and you should definitely write it.

But really the one we’re interested in is the second property. We want to assert that the solution our optimiser provides us with is actually optimal.

The problem is that fuzz testers are not optimisers, and if your optimal solution is worse than simply picking a point at random, basically any sort of test will demonstrate this. You can’t reliably expect Hypothesis or similar to find an improvement just by asking for arbitrary data.

If we have a reference solution (e.g. by restriction to a sufficiently small corner of the space that we can solve the problem through brute force) then that’s one way of doing it: Just run the problem through both your optimiser and the reference solution and assert that they produce solutions with the same score (note: Not necessarily the same solution. There may be multiple equally good optima). This also works if you have a reference but not very good solution – if you implement say a cheap local optimiser and assert that running the local optimiser on a randomly picked point doesn’t improve on your optimal solution, this is still a pretty good test.

But I’m interested in testing the case where we don’t have a reference solution. How would we do that?

Without further ado, here are some properties I’ve figured out that seem to work well:

  1. If we take the optimal solution and perturb it slightly, this should never improve on the optimal solution. Bonus: If you have a local optimiser, perturb the solution slightly and then run the local optimiser on that perturbed solution.
  2. If we make the problem strictly easier (e.g. in the knapsack problem by raising the value of one of the items we’ve already chosen) then this should strictly improve the score if we rerun the optimiser.
  3. If we make the problem strictly harder (e.g. by removing one of the chosen items in the knapsack) and rerun the optimiser, this should never improve the score.

The last seems the most promising to me, because it helps create situations where you break out of local optima. You potentially kill off the local optima you’ve found yourself in and see if you can find a better one. Notably, both of the tests that demonstrated a problem in the knapsack algorithm (increasing the weight of a chosen item, removing a chosen item) are of this form.
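To make property 3 concrete for the knapsack problem, here is an illustrative sketch. Everything in it is a stand-in: `solve_knapsack` is a brute-force reference optimiser (in a real test the optimiser under scrutiny goes there), and Hypothesis would generate the instances rather than a seeded loop:

```python
import random
from itertools import combinations

def solve_knapsack(items, capacity):
    """Stand-in reference optimiser: brute force over all subsets.
    items is a list of (value, weight) pairs."""
    best, best_value = [], 0
    for r in range(len(items) + 1):
        for subset in combinations(items, r):
            if sum(w for _, w in subset) <= capacity:
                value = sum(v for v, _ in subset)
                if value > best_value:
                    best, best_value = list(subset), value
    return best

def check_harder_never_improves(items, capacity):
    """Property 3: dropping a chosen item makes the problem strictly harder,
    so re-running the optimiser must never improve the score."""
    solution = solve_knapsack(items, capacity)
    score = sum(v for v, _ in solution)
    for chosen in solution:
        reduced = list(items)
        reduced.remove(chosen)
        new_score = sum(v for v, _ in solve_knapsack(reduced, capacity))
        assert new_score <= score

random.seed(0)
for _ in range(20):
    items = [(random.randint(1, 10), random.randint(1, 10)) for _ in range(6)]
    check_harder_never_improves(items, capacity=12)
```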

The first is also useful, particularly for testing a local optimiser because it helps you find new things you can add in to the optimisation process! e.g. if you find that reversing the solution and rerunning the optimiser on the reversed solution sometimes improves matters, why not iterate that to a fixed point in your optimisation process?

The middle property is, I suspect, not that useful, because in general the ways of making the problem easier work by promoting stuff that was already chosen, so the optimiser will tend to make the same choice. I might be missing cases where this is actually a useful thing to be doing. Edit: I was wrong, because I’d got the form of the property slightly wrong. What you want to be doing is not “make the current solution better” but “keep the current solution possible while adding in something that you could use instead”. In the knapsack packing problem, you can do this by taking an item from the current solution and cloning it but adding one to the value. This means that the current solution is still usable, but there is definitely a solution with a higher score.

I’m going to keep an eye on this, as it seems to be an under-explored area (not completely unexplored – there’s some research on doing more classic fuzz testing on SMT solvers that I need to read up on) and I think it might inform useful heuristics for other places where you can do property based testing (machine learning for example is something that I’m interested in figuring out how to do).

This entry was posted in Hypothesis, Python.

Want to take a weekend to write tests?

I’d like to get more people using Hypothesis, and I’d like to raise the general quality of open source software. For a bonus, let’s see if we can help out people who wouldn’t normally contribute to open source. So, here’s the plan!

  1. Some number of us gather together in London one weekend.
  2. The rest of you pair up and work on writing tests for open source software using Hypothesis, and fixing any bugs you find.
  3. I act in a supervisory and teaching role, running around helping people out when they get stuck or need some advice.

(Obviously there will be a code of conduct for the event. I will probably continue my habit of using Recurse Center social rules despite never having attended Recurse Center).

If this sounds like your cup of tea, coffee, or other beverage of your choice, please put your name down and I’ll keep you apprised of any further developments.

This entry was posted in Hypothesis, Python.