Once upon a time I was really into programming languages. I cared a lot about Scala and Haskell, I was interested in all sorts of weird languages (Shout out to anyone else who has used Nice or Clay).
These days, eh. I mostly write Python. Some C. I could do C++ if I had to but I generally don’t have to. I’ve considered checking out Julia but, well.
It’s not that I’m no longer interested, and it’s certainly not that I’m against exotic languages, or no longer care about type systems. I’m very glad that there are other people who still actively pursue these things, as the current state of programming languages is pretty piss-poor and I would like it to be better, but I find that these days I lack the energy to care and my priorities in a language have shifted to things that are more… pedestrian.
And yet somehow still really hard to satisfy.
So here’s a laundry list of stuff that would feature in my dream language. Advance warning that I am a grumpy old man and this is a super boring list that contains almost no cool features. Also it’s not in any particular order of priority – it’s mostly the order I thought of things in – and it’s definitely not complete – it’s just the stuff I thought of before I got bored of writing this post.
Also, most of these are things that you can’t get without large amounts of time, effort and money. They’re boring, not easy, and if anything their being boring makes them harder, because you can’t really get people excited about working on them.
Community
Community is so important.
Here is what I want out of a programming language community:
- Large. Small communities are nice but I want a community who I can share the work load with, and a small community isn’t it.
- Friendly. Elitism is toxic, and a community that isn’t helpful to beginners or goes on and on about how they’re super smart for using this language that other people don’t get is the worst.
- Diverse, and committed to it. Codes of Conduct for everyone all round.
- Committed to quality. Documentation matters. Testing matters. We like having high quality libraries and we’re prepared to put the work (and, where possible, money) in to get them.
Packaging Infrastructure
Good packaging infrastructure is vital. And so hard to do. Basically nobody does it well. Packages should be:
- Easy to create new packages. If a problem could be solved by creating a new package it should be easier to solve by creating a new package than by not. You should never find yourself going “oh god but I have to write all that XML”.
- Versioned, with version constraints between dependencies automatically resolved
- Local to a project
- without a lengthy compile each time you install into a new project
- no pollution in the global install namespace
- Easy to mirror
- But with a good standard central repository
- Clearly marked for stability
- Try hard at maintaining compatibility between package versions
- Easy to write in a way that is portable to other versions of the language (e.g. don’t be Scala where a package compiled for one version of the language doesn’t work with any others, even between point releases)
- Easy to write in a way that is compatible with multiple operating systems.
It would be great if you could install multiple versions of a library in the same binary, but having this work correctly is sufficiently rare that I’m worried this might be a grass is greener on the other side issue. I’d be more comfortable with this in a statically typed language, where you can wall off different versions from each other by having them be distinct types.
Most languages eventually get something which is approximately this. Cabal with sandboxes, pip with virtualenv, ruby with bundler, all manage most of this (mirroring is typically not handled well. I think maybe it is in Cabal but it’s not in python or ruby).
Testing tools
There should be a standard test runner that works sufficiently well that nobody bothers writing their own unless they’re someone with an rspec fetish, and the community should, rather than laughing at the people with rspec fetishes and telling them to go play elsewhere, politely suggest that maybe this isn’t adding very much to the testing workflow.
There should be good code coverage tools. They should work reliably with minimal overhead (if it takes twice as long to run under coverage then this is very sad and people will use it less). It should be able to do branch coverage. It would be great if it could do predicate coverage. More features – e.g. stats on paths and traces – would be amazing.
It would be fantastic to steal the CPAN feature that tests run on install and report back pass/fail information to somewhere sensible, which means you want testing integrated with the packaging system. Given the aforementioned versioning constraints and per project installs you probably only want to do this once per distinct set of versions of dependent libraries.
Obviously all languages should have a Quickcheck like testing tool (if I didn’t think this I probably wouldn’t have sunk more than six months of free full time labour into making Hypothesis).
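For anyone who hasn’t met the idea, here is a minimal sketch of what a Quickcheck like test looks like, using only the Python standard library. It is not Hypothesis’s API: check_property, random_string and the run length encoder are hypothetical names made up for illustration, and a real tool would also shrink the failing input to something small.

```python
import random


def run_length_encode(s):
    """Encode a string as a list of (character, count) pairs."""
    out = []
    for ch in s:
        if out and out[-1][0] == ch:
            out[-1] = (ch, out[-1][1] + 1)
        else:
            out.append((ch, 1))
    return out


def run_length_decode(pairs):
    """Invert run_length_encode."""
    return "".join(ch * n for ch, n in pairs)


def check_property(prop, generate, runs=200, seed=0):
    """Hypothetical helper: call prop on `runs` randomly generated inputs.

    Returns the first falsifying input, or None if every run passed.
    """
    rng = random.Random(seed)
    for _ in range(runs):
        value = generate(rng)
        if not prop(value):
            return value
    return None


def random_string(rng):
    # A two letter alphabet makes repeated characters (the interesting case) likely.
    return "".join(rng.choice("ab") for _ in range(rng.randint(0, 10)))


def round_trips(s):
    # The property under test: decoding an encoded string gives the string back.
    return run_length_decode(run_length_encode(s)) == s


counterexample = check_property(round_trips, random_string)
```

The point is that you state a property that should hold for all inputs and let the tool go looking for an input that breaks it, rather than hand-picking examples.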
Good tools for working with source code
I’m mostly very indifferent to Go, but there’s one feature that I think it gets so very right and wish everyone else would steal right now.
Your standard library should include a precise parser: for any valid (or, ideally, nearly valid) source code, you can parse it to an AST and then print the AST back out as a file that is bytewise identical to the original.
It should also include a pretty printer that outputs code in a “standard” and correct format.
It should also be easy to make tools that use the AST representation to make changes to your source code.
Basically: I want good refactoring and reformatting tools, and in order to get that I want standard ways of building them.
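To illustrate the gap, here is a rough sketch using Python’s standard ast module (ast.unparse needs Python 3.9 or later). The RenameX class and the sample source line are made up for the example. The AST is easy enough to transform, but the round trip is lossy: comments and the original spacing are gone, which is exactly what a precise parser would avoid.

```python
import ast

# Sample input with odd spacing, redundant parentheses and a comment.
source = "x = (1 +  2)  # keep me\n"

tree = ast.parse(source)


class RenameX(ast.NodeTransformer):
    """Hypothetical refactoring: rename every use of the name `x` to `total`."""

    def visit_Name(self, node):
        if node.id == "x":
            return ast.copy_location(ast.Name(id="total", ctx=node.ctx), node)
        return node


new_tree = ast.fix_missing_locations(RenameX().visit(tree))

# Printing the AST back out normalises the formatting and drops the comment,
# so this is a pretty printer, not a byte-identical round trip.
rewritten = ast.unparse(new_tree)
```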
I also want good static analysis tools. A well designed language obviates the need for a lot of these, but there’s always room for improvement.
Foreign Function Interface
There should be a standard, good, foreign function interface which makes it easy to bind to C libraries and which does not expose the internals of the language implementation.
In reality almost every language has too many ways to do it, none of them good.
Relatedly, please run valgrind clean by default. I know it’s a pain, and I know it hurts some microbenchmarks, but it makes debugging code integrated with C so much easier.
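As a point of reference, this is roughly what a small binding looks like with Python’s ctypes. It is a sketch that assumes a Unix-like system (Linux or macOS), where ctypes.CDLL(None) gives access to the symbols already loaded into the process, including the C library; on Windows you would have to load a specific DLL instead.

```python
import ctypes

# Assumption: Unix-like platform, so CDLL(None) exposes the C library's symbols.
libc = ctypes.CDLL(None)

# Declare the C signature: size_t strlen(const char *s);
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t

length = libc.strlen(b"hello")
```

Note that nothing checks the declared signature against the real one. Get argtypes wrong and you find out at runtime, usually with a crash.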
Text handling
Text is:
- Efficiently represented.
- Always immutable (Note: Having a separate editable representation for text is perfectly reasonable, but it’s not your default).
- Always unicode.
- Always understood to be a variable length encoding that you cannot index into by an offset.
- Easy to read and write to a variety of encodings.
Anything else is wrong and you are a sinner for contemplating it.
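To make the variable length point concrete, here is a small Python sketch. The sample string is arbitrary. Python’s own str does let you index by code point, so treat this as an illustration of the problem rather than of the wish: the same text has a different length in every encoding, and cutting the encoded bytes at an arbitrary offset can land in the middle of a character.

```python
# "naïve café", written with escapes so the accented letters are single code points.
text = "na\u00efve caf\u00e9"

utf8 = text.encode("utf-8")
utf16 = text.encode("utf-16-le")
latin1 = text.encode("latin-1")

# len() counts code points for str and bytes for the encoded forms.
lengths = (len(text), len(utf8), len(utf16), len(latin1))

# Byte offset 3 falls inside the two-byte UTF-8 encoding of "ï".
try:
    utf8[:3].decode("utf-8")
    sliced_ok = True
except UnicodeDecodeError:
    sliced_ok = False
```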
Equality works correctly
- There is a single operator you use for equality. You do not use different ones for different types.
- Differently typed values are never equal. Yes I know this violates the Liskov Substitution Principle. I don’t even slightly care. Ideally comparing different types for equality should be an error.
- Equality is reflexive. That is, x == x, always. I don’t know what to do about NaN here. So far my most practical solution involves a time machine. My second most practical answer is “Ignore IEEE and deal with the resulting confusion”.
- Equality is symmetric.
- Equality is transitive.
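For comparison, here is how Python (my current day to day language) does against this list. A quick sketch; the particular values are arbitrary.

```python
from fractions import Fraction

# Differently typed values that nonetheless compare equal.
cross_type = [
    1 == 1.0,                # int vs float
    True == 1,               # bool vs int
    Fraction(1, 2) == 0.5,   # rational vs float
    "1" == 1,                # str vs int: at least this one is not equal
]

# Reflexivity fails for NaN, as IEEE 754 requires.
nan = float("nan")
reflexive = (nan == nan)
```

So numeric types compare equal across types, comparing a string to a number is silently False rather than an error, and NaN breaks reflexivity.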
Good numeric hierarchy
Your language should be able to represent:
- Signed and unsigned fixed size integers of various machine sizes
- Arbitrary precision integers
- Double and single precision floats
- Arbitrary precision rational numbers
- Fixed width decimal arithmetic
These should all be easy to convert to each other (but not compare equal if they are of different types!), and they should certainly all consistently use standard operators.
Most of this should be implementable as libraries rather than needing to be baked in to the language.
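Python’s standard library covers a fair chunk of this, mostly as libraries, so here is a sketch of what it looks like in practice. Caveats: Python’s Decimal is context-precision decimal arithmetic rather than strictly fixed width, and as far as I know fixed size machine integers and single precision floats only show up via modules like array, struct or ctypes rather than as ordinary values.

```python
from decimal import Decimal
from fractions import Fraction

big = 2 ** 100                               # arbitrary precision integer
third = Fraction(1, 3)                       # arbitrary precision rational
exact = third + Fraction(1, 6)               # exactly one half
money = Decimal("0.10") + Decimal("0.20")    # decimal arithmetic, no binary rounding
approx = 0.1 + 0.2                           # double precision float

# Conversions between the types use the ordinary constructors.
as_float = float(exact)
as_fraction = Fraction(money)
```

All of them use the same +, -, * and / operators, which is the part I care about most.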
Packed data
At some point you are going to need to deal with arrays of “primitive” types – bytes, doubles, machine words, etc. If you cannot represent this in an efficient way when you come to do this, you will be sad. Ideally you want to do this in a way that makes it easy to interact with the aforementioned foreign function interface.
Ideally this would also support arrays of structs of some sort. I don’t really care about representing structs as individual values efficiently, but for large arrays of data it matters.
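A sketch of what this looks like with Python’s standard array and struct modules. The record layout (a 32-bit id followed by a double) is hypothetical, chosen just to show an array of structs packed into one contiguous buffer.

```python
import array
import struct

# A contiguous block of C doubles, not a list of boxed float objects.
doubles = array.array("d", [1.0, 2.5, -3.0])

# Hypothetical record: little-endian unsigned 32-bit id followed by a 64-bit float.
# The "<" prefix means standard sizes with no padding, so each record is 12 bytes.
record = struct.Struct("<Id")
points = [(1, 0.5), (2, 1.5), (3, 2.5)]

# Pack every record into a single bytearray, back to back.
buf = bytearray(record.size * len(points))
for i, point in enumerate(points):
    record.pack_into(buf, i * record.size, *point)

# Read them back out by offset.
decoded = [record.unpack_from(buf, i * record.size) for i in range(len(points))]
```

It works, but you are doing the offset arithmetic by hand, which is the sort of thing I would like the language to take care of.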
Namespacing and scoping
Everything should have a clear, lexical, scope. It should be obvious where a variable is introduced. The answer should never be “into the global namespace”. It should be hard to make typos and not notice.
As far as I can tell, basically the only languages which get this right are statically typed or a lisp. (ETA: Apparently perl with use strict also gets this right).
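A small Python sketch of the kind of thing I mean (the function names are made up). A typo on the left hand side of an assignment is not an error, it just quietly introduces a new variable; a typo when reading a name is only caught when that line actually runs.

```python
def total_price(prices):
    total = 0
    for price in prices:
        totl = total + price  # typo: silently creates a new local called totl
    return total              # so this is always 0


def read_before_assignment():
    try:
        return counter  # no such name anywhere; only detected at runtime
    except NameError:
        return "NameError"
```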
Higher order functions
Languages should have first class functions, and higher order functions like map or filter.
This one… is doing pretty well actually. This debate is over and we won. The last time I checked, Java was the only mainstream language that didn’t do this. Since Java 8 last year there are no mainstream languages that don’t do this.
A REPL
Not much to say here except that a REPL is so invaluable to how I work that it’s really painful using languages without one. I can do it of course, but it tends to involve writing lots of tiny little throwaway programs that act as a poorer version of a REPL.
Take typing seriously
I’m fine with dynamically typed languages. I’m also fine with languages with fairly serious static type systems (Haskell, OCaml, F#. Even C++ and C# are pretty OK). But if you’re going to have a type system don’t half-arse it. Good type systems are good, but bad type systems are worse than no type systems.
Note: Type system wars in the comments will not make it through moderation.
Solid, high performance, implementation
Why are we all using slow and unreliable implementations? It makes me really sad.
I mean, I do know the answer, it’s because writing a concurrent garbage collector and a high performance compiler is hard and reusing language-specific VMs mostly works but has its own set of problems.
Basically I want garbage collection and threading to just work, and I want to be able to write code that looks as if it should be reasonably low level and have it not produce something that’s hundreds of times slower than the equivalent C. If you can compile high level abstractions down to low level code, that’s great too.
Yes I know that low level concurrency is passé and we’re all doing message passing now. A good message passing API on top of the concurrency primitives would be great, but I want the primitives too.
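By “message passing on top of the primitives” I mean something like the following sketch, which uses Python’s threading and queue modules. The worker function and the None sentinel are just one conventional way to do it, not the only one.

```python
import queue
import threading


def worker(inbox, outbox):
    """Receive numbers from inbox and send their squares to outbox."""
    while True:
        msg = inbox.get()
        if msg is None:  # sentinel: no more work
            break
        outbox.put(msg * msg)


inbox, outbox = queue.Queue(), queue.Queue()
thread = threading.Thread(target=worker, args=(inbox, outbox))
thread.start()

for n in range(5):
    inbox.put(n)
inbox.put(None)
thread.join()

# A single worker processes messages in order, so the results come out in order.
results = [outbox.get() for _ in range(5)]
```

The queue is the message passing API; the thread and the locks inside the queue are the primitives, and they are still there when you need them.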
Rich standard library
It has major problems, but the size and (mostly) quality of the Java standard library is one of the few things I miss about it.
The standard library should have all of the normal really boring things we need to get things done.
- File system access
- Sockets – client and server
- A solid HTTP client (I’m ambivalent as to whether there should also be a server. Experience of how little ones from the standard libraries of existing languages are used suggests no)
- Parsing for standard formats – XML, JSON, etc.
- Good concurrency primitives
- Pseudo random number generators
- Invoking and running external programs
- Probably many others I’m forgetting
There are plenty of things that shouldn’t be in the standard library because you want a faster release cycle or because there are multiple good ways to do them, but in general there are things that we’ve basically got figured out and are commonly needed and those should be standardized.
Collections Library
I really want a good collections library. With standard interfaces for things. We seem to have settled on “Eh, you’ve got hash tables and dynamically sized arrays, what more do you want?”. I’ll tell you what more I want:
- Uniform interfaces. There are many things I dislike about Python but high up on the list is that if I write add when I meant append or append when I meant add one more time I’m going to scream
- Immutable collections (not just frozenset. I want efficiently updateable immutable collections)
- Sorted collections
- Heaps
- Priority queues
Java collections library I miss you. Please come back?
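For reference, this is roughly what Python’s standard library offers for the items on that list. A sketch with arbitrary sample data. Sorted collections and heaps are free functions that operate on a plain list rather than collection types with their own interface, and “updating” a frozenset means building a whole new one.

```python
import bisect
import heapq

# "Sorted collection": a list that you promise to keep sorted yourself.
scores = []
for s in [5, 1, 4, 2]:
    bisect.insort(scores, s)

# "Priority queue": heap functions applied to a list of (priority, item) tuples.
tasks = []
heapq.heappush(tasks, (2, "write docs"))
heapq.heappush(tasks, (1, "fix bug"))
heapq.heappush(tasks, (3, "refactor"))
first = heapq.heappop(tasks)

# Immutable set: the union copies the elements into a new frozenset.
frozen = frozenset({1, 2})
updated = frozen | {3}
```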
Database access
There should be a standard API for talking to a relational database. It doesn’t need to (and shouldn’t) bundle everything into it, but it would be nice if the API were standard and the standard library came with e.g. a sqlite3 adapter.
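Python is a reasonable example of this being done: there is a standard database API and the standard library ships a sqlite3 module that implements it. A minimal sketch, with a made up table and made up rows:

```python
import sqlite3

# An in-memory database, so nothing touches the file system.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE todo (id INTEGER PRIMARY KEY, title TEXT, done INTEGER)")

# Parameterised inserts: values are passed separately from the SQL text.
conn.executemany(
    "INSERT INTO todo (title, done) VALUES (?, ?)",
    [("write post", 1), ("fix packaging", 0), ("add tests", 0)],
)

rows = conn.execute(
    "SELECT title FROM todo WHERE done = ? ORDER BY title", (0,)
).fetchall()
conn.close()
```

Adapters for other databases expose the same connect, execute and fetch shape, which is the part that should be standard.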
Summary
I think this can mostly be summarized by saying that what I want is completeness and quality. The domain of programming is large, messy, and broken, and it would be nice if the language that I use to interact with it were a bastion of things mostly working and being easy rather than fighting against me every step of the way. There’s enough stuff that is common and known how to do well that it would be great to just do it well and then stop having to worry about it.
This will of course never happen, but there are enough standard sources of annoyance and things that languages get wrong that it sure would be nice if we could do without, and every one we manage to fix is one less thing to worry about.
Edit to add: You are welcome to suggest languages in the comments if you really feel the need to, but I am unlikely to dignify them with a response. Chances are extremely good that I am aware of the language you are suggesting and do not feel it lives up to this list.
I think you should have a look at D (http://dlang.org). It’s basically a better C++, safer, stronger, with a good standard library and nice abstractions that seems to address most of your points. The only two points that don’t seem to be addressed are the community (although there are many users, the community is still small) and packaging (because you’re right, nobody does it well).
It is not a better C++ but the one true C++.
If someone accidentally skips the introduction, it looks like a list of Elixir language properties. Have you checked it?
You seem very well informed on current production worthy languages, so I’m not going to suggest one as your answer. However, what you’re asking for doesn’t seem so far off of some of the languages you have experience with that I’m wondering if you forgot to include a list of gripes against them. Would a multi-threaded OCaml with better tooling be what you want? What about an F# experience with module support ?
I’m not *suggesting* a language, but I’m more curious as to what gets ‘closest’ and how far away you think it would be or the effort it would take to get it to where you want it?
I didn’t forget to include a list of gripes, I mostly don’t want this to turn into me dumping on different programming languages because that never ends well.
RE Ocaml: I’ve no particular objections to it, but it’s never really endeared itself to me. The community has never struck me as terribly together and I’m not vastly keen on the core language, but I’d like to like it more than I do. Last time I made a serious effort at using it, the package management really wasn’t there, but I know OPAM has improved matters a lot and I’ve been meaning to take another look now that it’s further along.
I used OCaml as my main language in 1997-1999, but a lot of the pain points that existed then still exist now, despite OPAM. For example, people are still coming up with their own project directory structures, Makefiles, and the like. There is still no single standard command to create an empty “hello world” project that is immediately buildable, testable, and publishable. This is unbelievable to me.
Umm…Clojure?
I think you’ve got a great wish list there, and I really get where you’re coming from. Something I’d really like to see that doesn’t really seem to be addressed often is an extension to your wish for ‘packed’ data. That’d be great, if not a huge improvement over what’s sometimes available now, but I’d really like the means to map a standardized algebraic data type interface to a completely user-defined representation, including reliable control over data alignment/padding, bit field ordering, byte ordering, and user-defined discrimination functions for variants/unions that can be constrained enough to produce helpful compile-time warnings when ambiguity or un-covered cases arise. Platform variations would be exposed in a standard, modular way that allows the compiler to choose natively performant representations when the exact representation isn’t an issue but allows the programmer to exert just the amount of representational control necessary.
As a frequent programmer of device drivers and network stack components for embedded systems that need to be portable across various architectures, I find C’s support for this level of programming very tedious; there are a number of options to end up with the code you need, but none of them are *great* options and it seems to me that we could do a lot better. Erlang’s binary pattern matching is a nice answer to one part of this, and there are pieces of it in various research languages like Habit from PSU/Galois’ HASP project and old systems languages like DEC BLISS, but they seem to focus *either* on interpreting binary stream *or* on memory-mapped register programming, not on a solution that encompasses both. Alas, the closest thing I can find right now to my ideal in this area is probably Ada, and that hasn’t exactly got a thriving community.
A central package repository would be nice. The repo doesn’t necessarily need to store the package, just point to where it is. A central repo would allow one to search for a namespace and have the package manager automatically download the best package for your language version. It should be possible to do this from the IDE. In fact, as soon as the syntax checker knows that an imported namespace cannot be resolved, the IDE should give you the option to install the missing package once the cursor has moved off the line that declares the import.
Apart from a multiline REPL, I’d like instant function evaluation in the REPL like LightTable/clojure, and eclipse/scala.
I want a fully functional language with default persistent data, including persistent collections. Super-strong typing, a la Haskell, with parametric polymorphism, and the best possible type inference in functions.
No void/null confusion please. Errors when nulls aren’t wrapped within options.
I LOVE this list. In fact, I love it so much that instead of listing things that I love about it I will list the things I don’t love about it. Every thing I don’t mention here is something I love. And I have only some very, very small ways in which I weaken your claims:
* NAN – I like your first choice solution here (a time machine to change the IEEE spec). I am less sanguine about your second choice (ignore IEEE specified NAN behavior). I think I would be open to EITHER your proposed behavior (ignore IEEE and let NAN=NAN) OR a behavior where NAN != NAN but some weird magic makes it OK to use NANs in places like collections which need to assume X == X behavior.
* You required arbitrary precision rational numbers in your numeric hierarchy. I could live without them.
Finally, you require a language have ALL of these things. I am more willing to give a new language some slack. If a new language has MANY of these features and aspires to get the rest of them, then I could jump on board. Except for the specific features that it MUST get right from the start to have any chance of EVER succeeding — I think that’s the attitude of the community, a sane approach to typing (I agree not to argue about just which way of handling is best). And it needs the majority of the other features plus plans to add the rest.
I think we’re mostly in agreement.
Python essentially has the weird magic, whereby most collections which you’d think have == behaviour relax it to the constraint where they consider x to always be equal to x (specifically if “x is y” returns True then collections consider x == y). For example, with x = float('nan'), x in [x] returns True. It hasn’t made my life dramatically easier – if anything it’s made it harder because there’s one more weird edge case to deal with.
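Spelled out as a sketch (this describes CPython’s behaviour; the variable names are arbitrary), the edge case is that membership depends on which NaN object you happen to be holding:

```python
x = float("nan")
y = float("nan")  # a second, distinct NaN object

same_object = x in [x]    # identity check succeeds before == is ever consulted
other_object = y in [x]   # different object, and nan == nan is False

as_key = {x: "value"}
lookup_same = x in as_key
lookup_other = y in as_key
```

Here same_object and lookup_same are True while other_object and lookup_other are False, even though every value involved is “NaN”.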
Another alternative someone suggested which I initially rejected but on further consideration think might be my preferred approach is to use signalling NaNs everywhere and to not have NaN as a representable value. I think this has undesirable behaviour for float vectors, but you could have the implementation use quiet nans internally and then raise on read.
Arbitrary precision rationals are also low down on my list of things I care about. They’re useful enough occasionally that I’d like them available as standard, but I wouldn’t lose sleep about their absence.
In general there are no languages which implement all of these, and my current languages of choice definitely don’t, so I’d be pretty interested in any language which merely does well at implementing the list. I just think I’ve probably passed out of my early adopter of languages phase for the moment, so I’m probably more willing to wait to see how the languages do at making progress on these. :-)
Also it’s worth noting that some features aren’t mandatory to get right early on, but the righter you get them the less pain you’ll have later. e.g. packaging is a right nuisance to get wrong. It is fixable but you end up with all sorts of backwards compatibility bodges if you don’t get it right quite quickly.
Great list. I’m having a try at evaluating my current language of choice against your list. Could you expand on:
1. Where do you draw the line between large / small communities ?
2. Packaging Infrastructure > Local to a project, could you expand on this in general and specifically “no pollution of the global namespace”
3. Efficiently updateable immutable collections
cheers -ben
1. I don’t really. It’s a sliding scale. In general a small community is one where the chances are low that someone has solved a common problem before me.
2. If I need multiple versions of a package installed because different projects depend on different versions this should never cause problems. This is particularly bad when you consider version constraints that are imposed by packages depending on other packages.
3. Good point. Agreed.