Author Archives: david

Simplify starts from the wrong end

This is a thing I just noticed when working out some performance bugs in Hypothesis in prep for the 1.0 release: In a lot of cases, I’ve been doing simplification (shrinking in QuickCheck terms) entirely backwards.

It turns out that if I’d paid attention this information was in QuickCheck all along – I haven’t found it in the paper (ok I haven’t looked in the paper) but this is what the API does.

Basically, if you write a function simplify which takes a value and gives you something to iterate over the simpler versions of that value, you will be tempted to start from things that are most similar to that value and work your way down. This is totally wrong. The correct order is to start from the simplest version and work your way up.

The reason for this is that it will result in far fewer calls to simplify in most cases: Most of the time your values will be a lot more complicated than you need them to be, and you will end up with a lot of recursive calls to simplify if you start from the top. If you start from the bottom you will very rapidly converge on the simplest thing that can possibly work.
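To make that concrete, here is a minimal sketch of a greedy shrinker run with both orderings. This is not Hypothesis’s actual code: the names shrink, simplify_from_top and simplify_from_bottom are made up for illustration, and plain integers stand in for real values.

```python
def simplify_from_top(n):
    """Candidates most similar to n first: n - 1, n - 2, ..., 0."""
    for i in range(n - 1, -1, -1):
        yield i


def simplify_from_bottom(n):
    """Simplest candidates first: 0, 1, ..., n - 1."""
    yield from range(n)


def shrink(value, simplify, still_fails):
    """Greedily move to the first simpler value that still fails.

    Returns the final value and how many times simplify was called.
    """
    calls = 0
    improved = True
    while improved:
        improved = False
        calls += 1
        for candidate in simplify(value):
            if still_fails(candidate):
                value = candidate
                improved = True
                break
    return value, calls


def still_fails(x):
    # Stand-in for "the test still fails on this example".
    return x >= 10


top_result = shrink(1000, simplify_from_top, still_fails)
bottom_result = shrink(1000, simplify_from_bottom, still_fails)
print(top_result, bottom_result)
```

Both orderings end up at 10, but starting from the top takes 991 calls to simplify while starting from the bottom takes 2.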

I can’t guarantee I’ll catch all the instances of my doing this before the 1.0 release, but I should catch all the instances where this is causing significant performance problems. In particular, simplifying strings is now much faster.

This entry was posted in Uncategorized on by .

I can dream, right?

You wake up. There’s a demon in front of you. You can tell it’s a demon from the horns, the red skin, and the general smell of brimstone.

You suspect this is not a good sign.

“What happened? Where am I?”

“Oh come on. You’re clever. You can probably figure it out”

“Am… am I in hell?”

“Got it in one! I told you you were clever!”

“But why am I here? I lived a good life! I helped widows and orphans! I gave money to charity!”

“Hmm. That’s a good question. Why are you here? Your file looks quite clean. I don’t see why… oh, right. It says here you’re a programmer. How were you at writing error messages?”

“What? Well, uh, I guess I was OK at it. I mean uh, no one writes great error messages, right?”

“So you included enough information to debug the error in the message?”

You laugh nervously.

“Well, uh, I guess? I mean I said that something went wrong and I gave you a line number. That’s usually enough, right?”

“Ah, that’ll be it. There’s a bug in our filing software you see. It categorises programmers who write bad error messages in with the betrayers, child molesters and people who talk in cinemas. No one is quite sure why. Every time we try to rerun the result it just says ‘Out of cheese error. Redo from start’”

“Oh. Um. Well, maybe I could help you fix your software?”

The demon appears to perk up. “Oh, could you? That would be great. Honestly, bane of my existence. Bloody computers, right? Do you really think you could help?”

“Well I am a software developer. It’s sort of what I do”

“Great. How’s your COBOL?”

You swallow nervously.

“Well, I can probably figure it out given the documentation”

“Oh, we used the documentation to kindle the eternal flame. But it’s fairly self documenting, or so the imps who wrote it told me before I ate them. You should be fine”

“Oh, ok. I guess I’ll just check the test suite”

“What’s a test suite?”

At this point you are a nice green to match the demon’s red.

“I… guess… I… can… just… read the code?”

“Excellent idea! Here, let me show you to your terminal!”

He leads you to a decidedly old fashioned computer, blinking green on black at you.

“Here you go. Good luck! I’ll give you a couple hours to familiarise yourself with the system, and then your introductory flaying will begin. Don’t worry if you can’t figure it out immediately – you can always come back to it later. You’ve got an eternity to sort it out after all”


##computer-enthusiasm, an experiment in liking things again

I hang out a lot in ##computer, the IRC channel for Computer Anonymous. I like the people there, I think the code of conduct is good, and I genuinely think it’s a great channel which I am glad exists.


It turns out that when a bunch of people band together as a refuge against how terrible everything is, a lot of the conversation centers around how terrible everything is. There’s frequent linking to examples of things being terrible, a massive amount of snark in response to various events, etc.

Which is fine. That’s a large part of what the channel is for. But given what a large percentage of my IRC time it is, and given that I’m currently living in a city where I’m pretty isolated from everyone in the tech community who isn’t one of my flatmates, it’s making my exposure to the world of tech very unbalanced. There’s a lot of stuff in tech that is terrible, but not everything in tech is terrible, and a picture which makes it look like it is is proving very bad for my mental health.

So I would like to try an experiment to offset it. A place to talk about things being actually pretty good! And this is where the new sister channel, ##computer-enthusiasm (Note: Two hashes at the front. It’s a freenode thing) comes in.

Here are the rules of ##computer-enthusiasm:

  1. The code of conduct of Computer Anonymous still applies. This is an inclusive space, and isms of all stripes are unwelcome.
  2. No starting conversations about things being bad. In particular no hate-linking to things to show how terrible they are. Note: “Here is this great idea someone had for fixing thing that is bad” is totally allowed and actively encouraged.
  3. If you think something that someone else has brought up is bad, you’re entirely welcome to say so, but please explain why. If possible, try to do so non-judgementally. “I had a bad experience with that software” is fine. “Oh god that software is terrible” is not. “When people do X it tends to exclude people like me who are Y” is fine (and super encouraged) but “Only assholes do X” is not.
  4. Sarcasm and snark are discouraged. This is super hard to police, so we’re not going to, but I would like you to self regulate, and give people gentle nudges if you think they’re doing a bad job of it. If this proves insufficient we can try to come up with something more concrete.
  5. Mentioning or linking to things you think are neat is a great idea. Do as much of that as you want. It doesn’t even have to be tech related.
  6. There is no such thing as off topic as long as it adheres to the above rules.

I wouldn’t want these norms to be in place everywhere – that would be pure cult of positive thinking, which is legit terrible – but I’d like a place where they hold. Tech can be terrible, but it also can be pretty amazing when it goes well, and I’d like somewhere that we can focus on that.


Stable serialization and cryptographic hashing for tracking seen objects

This is a clever trick I figured out yesterday. It’s in retrospect moderately obvious, but I’ve never seen it before and it’s taken Hypothesis example tracking down from an expected space complexity of O(mn) and time complexity of O(god I don’t even know) to a deterministic O(n) space complexity and an O(mn) time complexity. It’s also vastly improved the constant factors by moving a bunch of stuff from pure Python to native code.

(Actually there’s a bit of a wrinkle in that it’s not strictly deterministic because n is a random function of the number of examples you’re looking for based on the duplicate example rate, but in practice it’s rarely more than a few times larger than the number of examples).

First: The problem. Due to the way its data is generated, Hypothesis has a decent chance of producing the same example more than once (if it didn’t there would be a lot of other interesting examples it struggled to produce). Rather than run the same example multiple times, Hypothesis keeps a record of which examples it’s already tried and doesn’t try them again. This is also used during simplification to avoid loops in the simplify graph (it’s a lot easier to write simplifies that might occasionally have loops and let Hypothesis sort that out than it is for every simplify method to do a lot of careful book keeping).

Previously I was using the obvious data structure to track which examples we’ve already seen: A hash table. Stick the object in a hash table when it’s seen, check in the hash table to see if we’ve already seen it.

There are two problems with this. One is intrinsic, the other implementation specific.

The intrinsic one is that we’re keeping all these objects around! That uses up a lot of memory. Some of these objects may be large nested lists of things, and even if they aren’t Python is not exactly good at compact object representation.

The second problem is that Python’s default equality and hashing are completely unsuitable for this. We don’t want to consider {1} and frozenset({1.0}) equal for these purposes, and we do want to consider nan and nan equal for these purposes. Additionally we need to be able to track things like lists which aren’t hashable. So this means we need our own hashing and equality logic. This was written in pure Python and thus was very slow when dealing with large objects.
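The built-in behaviour is easy to check in a plain Python session. The variable names here are purely for illustration:

```python
# A set and a frozenset compare equal when their contents do, even though
# one holds an int and the other a float.
sets_equal = {1} == frozenset({1.0})

# Two NaNs never compare equal, so a freshly built NaN is not found in a
# set that already contains a different NaN object.
nan_a, nan_b = float("nan"), float("nan")
nans_equal = nan_a == nan_b
nan_found = nan_b in {nan_a}

# Lists cannot go into a hash table at all.
try:
    hash([1, 2, 3])
    list_hashable = True
except TypeError:
    list_hashable = False

print(sets_equal, nans_equal, nan_found, list_hashable)
# → True False False False
```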

This is very sub-optimal, but it’s not clear how I could have done it better when tracking arbitrary Python objects.

…which is the clue. These days Hypothesis doesn’t need to track arbitrary Python objects. It tracks templates, which I can require to have a much more specific type than the objects tracked. In particular I can require them to be marshallable.

In the end I decided on a slightly laxer requirement because marshalling doesn’t handle certain types I wanted to be able to use (specifically namedtuple instances). Every template needs to be either:

  1. Marshallable
  2. An iterable type containing other templates

Combining these by flattening out collections into tuples + a type tag and then marshalling the whole lot lets us generate a binary string for every template value such that if two binary strings are equal then the templates are equivalent (and generally speaking the converse too, though there are some special cases where that might not quite work if the iteration order is unstable and we just accept the slight duplication).
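A rough sketch of what that flattening could look like follows. The helpers flatten and serialize are hypothetical names, not the ones Hypothesis uses. Pinning the marshal format to version 2 is also my assumption: newer format versions add reference flags that can make the bytes for equal values differ.

```python
import marshal
from collections import namedtuple

# Assumption: pin marshal format version 2 so equal values give equal bytes.
MARSHAL_VERSION = 2


def flatten(template):
    """Return a marshallable stand-in for a template (hypothetical helper)."""
    try:
        marshal.dumps(template, MARSHAL_VERSION)
    except ValueError:
        # Not marshallable: treat it as an iterable of templates and
        # replace it with a (type tag, tuple of flattened children) pair.
        tag = type(template).__name__
        return (tag, tuple(flatten(child) for child in template))
    return template


def serialize(template):
    """Binary string intended to be equal for equivalent templates."""
    return marshal.dumps(flatten(template), MARSHAL_VERSION)


# marshal rejects namedtuple instances, so this goes down the iterable path.
Point = namedtuple("Point", ["x", "y"])

same = serialize(Point(1, [2, 3])) == serialize(Point(1, [2, 3]))
nan_same = serialize(float("nan")) == serialize(float("nan"))
sets_differ = serialize({1}) != serialize(frozenset({1.0}))
print(same, nan_same, sets_differ)
```

Because the comparison is now on bytes, nan matches nan, and {1} no longer matches frozenset({1.0}) since the type codes differ.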

This is already a pretty great saving: We have native equality and hashing which does what we want it to do, and the representation is so much more compact than the Python object representation.

But we can do better! Sometimes (often) the strings we get here will be quite long. When a string is at least twenty bytes long we replace it with its sha1 hash. This gives us yet another space saving (in theory it also gives us a time saving, but the hash for binary strings is quite good and with random data equality will tend to return False early, so in practice I don’t expect equality comparisons between long strings would have been a noticeable time cost).
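As a sketch of the idea, assuming the serialized templates are already bytes: compact_key and Tracker are illustrative names, not necessarily what Hypothesis calls them.

```python
import hashlib

SHA1_LENGTH = 20  # a SHA-1 digest is always 20 bytes


def compact_key(serialized):
    """Replace strings of at least twenty bytes with their SHA-1 digest."""
    if len(serialized) >= SHA1_LENGTH:
        return hashlib.sha1(serialized).digest()
    return serialized


class Tracker:
    """Remembers which serialized templates have been seen (hypothetical)."""

    def __init__(self):
        self.seen = set()

    def track(self, serialized):
        """Record the value and return True if it had not been seen before."""
        key = compact_key(serialized)
        if key in self.seen:
            return False
        self.seen.add(key)
        return True


tracker = Tracker()
first = tracker.track(b"x" * 1000)
second = tracker.track(b"x" * 1000)
short = tracker.track(b"tiny")
print(first, second, short, max(len(k) for k in tracker.seen))
```

Since the digest is exactly twenty bytes, no stored key is ever longer than that. A hashed key also cannot be mistaken for an unhashed one, because unhashed keys are always shorter than twenty bytes.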

The result of this change has been that some parts of my build are now nearly ten times faster than they used to be, and the memory usage has become much more predictable. It’s also meant that I could ditch a ton of code that I really hated – the custom hashing and equality code used to have other use cases, but this was the last remaining one. Overall, this has been one of my better ideas.


An alternate timeline for Stargate SG1

Advance warning: All my non-fiction-writing brain is taken up by writing documentation for Hypothesis, so instead you’re getting teased with another piece of fiction I’ll never actually write.

So I watch actually quite a lot of TV. I rarely sit down and watch TV, but it’s great background for while I’m doing other things – cooking, cleaning, doing exercises, etc.

One of my all time favourite TV series is Stargate SG1. It’s goofy as hell, but it’s a lot of fun and I love the characters.

But… you know, every now and then they have these episodes where Evil Civilian Regulators come in and point out that actually Stargate Command is pretty incompetent and look at all these things they did wrong and maybe there should be some competent people in charge instead?

Obviously these people are civilians and thus secretly evil and a suitable deus ex machina like the thunder god Thor descending from the heavens and telling them to back off (this is not a hypothetical example) occurs and the evil civilian attempt to instil some competence is thwarted.

As you can tell, I think the evil civilians have a point. For example, it does not occur to anyone until they nearly unleash a devastating plague that could wipe out all of human civilization that maybe they should have quarantine procedures in place for people coming back from human populations that have been isolated from Earth for thousands of years. This takes precisely 4 episodes to bite them.

So I’d like to propose a Stargate AU. Competent!Stargate you might call it. No-one has had a massive intelligence upgrade, the whole thing is not a massive game of Xanatos speed chess, it’s just… everyone behaves at a level of competence and strategic thinking that you would expect from their character and role.

Oh, that and nobody off Earth speaks English because argh seriously?

It’s similar to this alternate starting point for Atlantis, but not compatible with it.

Why is it not compatible with it? Well, easy. Stargate Command couldn’t possibly spare Doctor Weir from her role as head of the civilian half of Stargate Command, a position she’s held since Catherine retired from it.

Uh, let me start from the beginning.

It starts with a very simple point of departure. A conversation between Colonel O’Neill (the actual point of departure is that he’s had two ls from the beginning which put him in a much better state of mind) and General West.

O’Neill: General, we detonated the bomb and eliminated the threat, as per orders.
West: And is the Stargate on the other side destroyed?
O’Neill: Well, no.
West: All right, we’ll send through another bomb.
O’Neill: You can’t do that, General.
West: And why not exactly?
O’Neill: Well because on the other side there’s no threat, but there are an awful lot of innocent civilians. Oh and also a large quantity of the mineral that the Stargate is made out of which by all accounts is an amazing power source.
West: Hmm. A power source you say?
O’Neill: Ra said it would enhance our nuke by, uh, a lot.
West: I do like enhanced nuclear weapons…

An expedition is sent back through the gate, where they’re met by Daniel Jackson and a bunch of the Abydonians. It is politely explained to them that while they are welcome guests they are on the Abydonians’ land and will not be allowed to just start mining Naquadah. Daniel requests Catherine be involved in negotiating rights.

Between Daniel and Catherine they negotiate a deal for a provisional survey in exchange for food, medicines and a variety of other things that the Abydonians could really use from modern Earth civilizations. Captain Carter comes through the gate and is so ridiculously excited by the Naquadah that she talks General West, and through him the US government, into realising they basically have to have these mining rights.

Daniel Jackson is not very keen on the military at this point, so part of the bargaining is that he really wants to deal with civilians, and preferably ones he trusts. Eventually a compromise agreement is reached where Stargate Command will be a joint military/civilian operation with Catherine in charge of the civilian side, especially focused around coordinating the scientific and archaeological research teams on Abydos and liaisons with the Abydonians.

The US is leased mineral rights to mine for Naquadah, plus the assistance of several Abydonians who are experienced in its mining, in exchange for a merely extortionate fee and assistance with converting that fee into personnel and goods.

So they get cracking on this. Carter heads up a team researching the various applications of Naquadah. Daniel assists with the archaeologists investigating the pyramid.

After some months of research they discover the Abydos cartouche and Sam rapidly figures out the program she wrote in canon SG1 to convert these into coordinates they can dial. The SGC’s exploration program begins 8 months early, long before Apophis has discovered the coordinates to Earth or Abydos.

Colonel O’Neill asks to head up SG1 as he’s really bored with the military side of the SGC – it’s all depressingly quiet, and he wants to see the universe. Several members of his old team join him. Daniel Jackson and Sam Carter are way too busy doing important research and it’s not like they’ve been incompetent enough to get anyone they cared about abducted by aliens, so why would they come along?

After a planet or two of cautious exploration, they discover that basically Goa’uld is the lingua franca of the galaxy. It’s not that far from the dialect they speak on Abydos, but none of the US military personnel have bothered to learn more than a couple words like “moonshine” in it, so this doesn’t go so well.

A new plan is formed: There are enough Abydonians who speak tolerable English and who want to see the universe who are more than happy to go exploring. Each SG team is equipped with one of them as a translator. Additionally, the SGC puts on regular classes in Goa’uld. Everyone who wishes to go offworld is required to attend these classes – the translators are a sufficient solution for now, so exploration continues, but being in an SG team and not trying to learn to speak the language is not an option.

Jack picks it up surprisingly quickly, with help from Skaara (they play baseball together on weekends), though his accent is and will remain atrocious.

After some months of exploration and talking to people they learn about the Goa’uld in more detail. They’ve yet to actually encounter any, but have figured out that Ra was definitely not the last of his race and that they should be prepared.

As part of this preparedness they realise they need better security to prevent invasion through the gate. The SGC installs Irises on both the Earth and Abydos gates (the latter under the control of the Abydonians – they insisted, and Earth was keen enough to protect their Naquadah mine that they didn’t really have the option to say no).

After some less than responsible behaviour from people when confronted by natives who think they are gods, the SGC institutes a program of extremely strict psychological screening for teams. Jack is raised as a potential issue but persuades the psychologists that he is dealing well with his grief. Jonas Hanson is flagged as unsuitable for first contact situations.

Carter and her team meanwhile are making extremely good progress figuring out Naquadah power sources, given the ample supply of raw material and Goa’uld technology left on Abydos. She does miss being in the field but it would be cutting into way too much valuable research time. She’s the leading expert on this stuff.

Daniel on the other hand feels that the rest of the archaeological team has the study of the history of Abydos well in hand and requests to join SG1 with Jack. Jack is enthusiastic about the idea and his membership is accepted.

The SGC exploration teams discover Avnil and its abandoned Naquadah mines. After the aforementioned entirely psychologically stable team comes through, informs the locals that they are not gods, and does some investigation they find the radiation shield and turn it on (ably assisted by Carter’s team of scientists and engineers who, while they haven’t yet got the underlying principles very clear, are more than capable of finding the on switch). The locals are ridiculously grateful and are more than happy to let them set up a modern mining operation here. The mine is pretty close to depleted, so it’s not a patch on the Abydos mining operation, but the SGC are keen on having a redundant operation which they have much less ambiguous rights to.

They install an iris on the Avnil stargate and declare it to be their beta site. A garrison is established there and SG teams are routed through it when exploring so as to provide a secondary quarantine measure.

Apophis finds Ra’s cache of hidden addresses and dials both Earth and Abydos. He does not have the relevant codes, so the Irises do not open and he concludes the gates must be buried.

The SGC encounter Apophis’s forces. They attempt peaceful negotiation, but Apophis is having none of these uppity humans who think they are his equal.

A guerrilla war between Apophis’s forces and the SGC begins. The SGC’s goal is generally to avoid conflict, but generally speaking where conflict occurs they wipe the floor with Apophis’s forces. They’re better at tactics and have better weapons (it’s basically canon that Earth projectile weapons are vastly more effective than the staffs the Jaffa are typically armed with).

Teal’c and Bra’tac hear word of the fact that there is a human force out there engaged in regular stargate travel and capable of taking on the Goa’uld. They start quietly sounding out people for starting the Jaffa rebellion. They gradually grow their network, with instructions to try to make contact with anyone from the SGC they run into.

However first, SG1 are met by a man with glowing eyes who tells them that not all Goa’uld are as they think. Although some are parasites, others achieve true symbiosis – the host and the symbiote sharing the body and making decisions together. They speak with the host as well, and he confirms this story.

That Goa’uld’s name? Ba’al.

Obviously they’re not stupid enough to do anything that would give Ba’al the coordinates of Earth, but they set up yet another off world base with an iris and give Ba’al the access codes for that. This becomes their point of coordination for the new offensive against Apophis.

With Ba’al providing ships and technology while they provide the manpower, the SGC rapidly puts Apophis on the defensive.

Jack O’Neill leads the final invasion of Apophis’s palace. There he faces Teal’c, who tells him that he has important information about their new “ally” Ba’al, and surrenders to O’Neill. Teal’c is taken back to Earth to be debriefed.

Apophis however is nowhere to be found, having escaped through the Stargate before the invasion.

Meanwhile, Ba’al’s forces have successfully invaded Apophis’s private sanctum, where they find, among other things, his private stash of Stargate addresses, including the ones he obtained from Ra.

Acting on this information, Ba’al and his servants add these addresses to their program of exploration. They are particularly interested in this one for the Tau’ri given their interesting new allies with the remarkably populous planet.

Season 1 ends.


  • I really like Sam as a character, but it just makes no sense at all for her to be on the front line given her expertise.
  • Daniel on the other hand actually is a field researcher, so has reason to join SG1; he just doesn’t have quite as strong an incentive as in canon.
  • In Competent!Stargate it’s really hard to get Teal’c into contact with SG1, which is why it takes until end of season 1.
  • Ba’al’s appearance is brought forward four seasons. This is partly because he’s the only actually credibly competent threat amongst the Goa’uld and partly because as a competent Goa’uld there’s no way he wouldn’t take this opportunity to cement his power amongst the system lords (in canon the year after Ra’s death is basically one giant power struggle amongst the system lords, with Ba’al, Apophis and several others coming out on top).
  • Technological advancement proceeds a little faster than in canon because of the much greater availability of Naquadah and Goa’uld technology from Abydos, and later from Ba’al. Also because we actually have Sam Carter and her team working on it as a full time job. It doesn’t proceed ridiculously fast though because they’re still trying to reverse engineer a vastly more advanced alien technology – e.g. it will probably still take contact with the Orbanians before there are portable Naquadah generators.


