Note: This is more of a chapter fragment than a full chapter. Sorry.

The report only took about a ksec to compose – it’s not like there was a great deal to say. “Look, some legacy code that we should raise the rewrite priority on” is practically routine. I had weak evidence that it was a root cause for a problem, but more importantly it had raised its head and got in the way, and for something with this many red flags on it that was a sign that we should look into replacing it. Step one of that was simulation.

I thought for a bit. Of course, there was no reason we couldn’t do step two in parallel, and it would make the eventual simulation much easier and higher impact.

“Ide, can we synthesise a replacement program?”

“We lack a sufficient formal model of Go# to do so formally, but the program interface is sufficiently constrained and the typical program lifetime is short enough that it is likely that an empirical sampling method would suffice to guarantee a replacement within no more than five megaseconds.”

“Hmm. Can we speed that up?”

“Given sufficient simulation resources a trial candidate for phased roll-out could be synthesized in approximately 500 kiloseconds.”

Ugh, right. Simulation time which we didn’t have.

“OK. Start synthesizing a replacement in real time then.”

“Scheduling now.”

That was probably about as much as I could do with this particular program in isolation for now.

“Is there anything downstream of this?”

“Program influence terminates in physical control of plumbing temperature regulation with no additional software control.”

So, effectively, *everything* was downstream of this. The problem with working on plumbing is that the main communication channel was the physical environment. It makes for some… interesting interactions.

I checked Kimiko’s availability and found they were still busy. A quick check of my social modelling software confirmed a hunch: around a 60% chance they were stalling me.

Oh well. That was their prerogative, and I could certainly understand wanting to put off a difficult conversation.

I wasn’t exactly sure *how* they would know that this was going to be a difficult conversation, but other people were weirdly good at spotting that kind of thing.

I decided to continue working to distract myself.

“Ide, give me another interesting issue in the area.”

Next chapter when it happens. Nominally in two weeks’ time, but the schedule has become very erratic. Hopefully the next one will be longer.

Like this? Why not support me on Patreon! It’s good for the soul, and will result in you seeing chapters as they’re written (which is currently not very much earlier than they’re published, but hopefully I’ll get back on track soon).

The auction was won by Zack M. Davis. He bid £550 and paid £501 for help working with some maths self-study on Causal Inference. We’re working through understanding a textbook he’s studying, and did one three hour session over Google hangouts and will do another one later this week.

I think we were both a little sceptical that the format would work well (even without the fact that I don’t know much about causal inference), but if so we were both wrong. There were some difficulties – communicating about maths without a shared blackboard (or, if you insist, whiteboard) is hard, but fortunately we had the textbook in front of us for reference and could fall back to typing if necessary, which worked more than well enough. At the end I think we both understood the material a lot better than at the beginning, and Zack seemed very happy with the result (and I enjoyed it too!).

So all told I’m calling that a success: I’m happy and Zack is happy. I got an amount of money for my time that I’m perfectly happy with (it’s a lot less than I’d do for my high priced consulting, but a pretty good day rate for general purpose dev contracting), and Zack got help that he found useful at a price he was happy to pay. A good time was had by all.

Some statistics for the interested:

- There were a total of 15 bids.
- The first and second prices were quite a lot higher than the others; the third-highest bid was £200.
- The lowest bid was £15.
- The median (and modal) bid was £100.
- Not terribly surprisingly, almost every bid came from someone in a personal capacity rather than a company. The exceptions were a few people who are basically freelancers with a company for their work, and Beeminder – probably partly because Danny heard “Vickrey auction” and came running.

In general, I’m very pleased with this experiment, and will be repeating it by running another auction next month (I’m skipping this month both because of the delay and because everything is a bit busy right now).

But this runs into a problem I’ve pointed out with voting on group meals before: if you do it often, you consistently make the same sort of choices over and over again, excluding the same people each time. If the group is mostly fine with Wednesdays or Thursdays but has a slight Thursday preference, and I can never do Thursdays, then I’m out of luck.

As in the previous lunch post, random ballot solves this: It gives you proportional representation of interests over time.

But while this worked reasonably well for lunch places, it doesn’t work so well for scheduling: Typical scheduling problems have any given person being genuinely indifferent between four or five options. Why force them to express a strict preference for one of those options, especially when it may be that some of those days work much better for some people than others?

So for scheduling it seems like approval voting is in fact very well suited. But is there a system that combines the benefits of both?

There is! It’s in fact an extremely direct hybrid of the two, but weirdly not one I’ve seen before.

- Everyone casts a vote as in approval voting. That is, they specify a list of acceptable options.
- When tallying, you pick a random voter and restrict the set of possible winners to the ones on their ballot.
- Amongst that restricted set, votes are tallied as per normal approval voting, picking the one that the largest number of people approve of.

And that’s all there is to it. It satisfies the proportionality criterion that any voting bloc will on average get an acceptable answer at least in proportion to the size of the bloc (they may get it much more than that, but that’s intrinsic to the problem: e.g. the set of people who are happy with every answer will get an acceptable answer 100% of the time), while still being broadly majoritarian because of the approval voting component.
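Since the tally is so simple, a minimal sketch may help. This is my own illustrative code (the function and variable names are invented), assuming each ballot is given as a set of acceptable options:

```python
import random

def hybrid_tally(ballots, rng=random):
    """Random-ballot / approval hybrid.

    `ballots` is a list of sets, one per voter, each containing
    every option that voter finds acceptable.
    """
    # Pick a random voter; only options on their ballot may win.
    restricted = rng.choice(ballots)

    # Among those, pick the option the most voters approve of.
    def approvals(option):
        return sum(option in ballot for ballot in ballots)

    return max(restricted, key=approvals)
```

For example, with ballots `[{"Wed", "Thu"}, {"Thu"}, {"Thu"}, {"Wed"}]`, Thursday wins whenever one of the first three voters is drawn, and Wednesday wins when the fourth is – an acceptable answer for the Wednesday-only voter a quarter of the time, matching the proportionality property described above. Ties are broken arbitrarily here; a real implementation might want to randomise them.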

You can also add a threshold if you like. Because each person votes for multiple options, doing so is much simpler and more reasonable than in normal random ballot, so you could quite easily decide to e.g. exclude any options from the schedule that fewer than half of the people can go to (or apply more complicated criteria, e.g. requiring an option to get at least some fixed fraction of the approvals that the overall approval winner gets). The result is that you only end up excluding people who really can’t make any day that most other people in the group can. Which is fair enough really, though a bit unfortunate.

Is this applicable to non-scheduling problems? I don’t know. It might be! I think it adds a strong centrist bias to random ballot, which can be both good and bad.

The properties of scheduling which matter here are I think:

- Every voter typically has many acceptable options
- It’s something done often enough that the randomness tends to balance out
- It’s something where, despite that, consensus is probably more important than true proportionality.

It also has the nice property that experimenting with voting systems for scheduling is rather low impact. I’d want to do a more formal analysis of this before proposing it for anything more serious, but I suspect it might still have some nice properties that make it worth considering.

Sam and I worked in companionable silence for about five kiloseconds, but eventually they had to go lead a Krav Maga session, so we kissed each other goodbye. Kimiko was still flagged as busy, so I took the opportunity to retreat to a pod to do some real work.

Uh, not that social network analysis *isn’t* real work, you understand, it’s just not exactly in my remit. It’s a useful skill to keep your hand in, but I try to avoid the trap of becoming a generalist.

I reviewed where I was on my current task: I still didn’t know much about what was going wrong, but had a hint that it was something to do with temperature events.

At this point I *could* do an exhaustive analysis and try to binary search out the exact problem. It would take ages and require a lot of detail work, but it would almost certainly work in narrowing down at least one real problem.

But I wasn’t really in the right frame of mind for detail work, so I decided to gamble instead.

“Ide, show me something interesting to do with the current task.”

“I have a temperature control program marked as critical that exhibits anomalous command output prior to the event and currently has a failing build. Is that suitable?”

“Perfect.”

Almost too perfect in fact. I wondered why that hadn’t been flagged up before.

“How many other equally interesting things could you have shown me?”

“113”

Right.

“OK, call up the specs for this program.”

- Subject: Stochastic Temperature Control Feedback Regulation Unit 3
- Origin: New Earth 2
- Language: Go#
- Importance: Critical
- Reliability: High
- Obsolescence: High
- Fragility: High
- Notes: It might be best to leave now, you probably shouldn't touch this.

That wasn’t encouraging. Also, I wasn’t thrilled by the idea of learning about *another* weird Grounder programming language.

I sighed. Still, I wasn’t just going to stop without looking into it a *bit*.

“Wiki, show me the specs for Go#”

- Subject: Go#
- Category: Programming language, text based.
- Lineage: Pre-diaspora, began as a dialect of Go in 2021.
- Common Tags: Archaic, Esoteric, Moderate Complexity, Evolutionary Dead End, Poorly Thought Out.
- Normalised Rating: Please tell me you're not still using this.

*Definitely less than* encouraging.

“OK, show me the failing build step.”

Ide displayed a bunch of code for me. I can’t say I understood any of it, but one thing stood out.

“Wiki, what’s Hypothesis?”

“Hypothesis is a generic term for a family of testing tools that were popular for a period of approximately five gigaseconds before and around the diaspora. They work by generating random data to run a conventional unit test against.”

“Wow, really? Does that work?”

“Significantly better than the methods that predate it. Unassisted humans tend to be very bad at writing correct software, which results in many shallow bugs that simple random testing can uncover. However, it has largely been supplanted by modern symbolic execution and formal methods, as the number of bugs it finds grows logarithmically.”

“Ide, how long did it take to find this particular bug?”

“Approximately nine gigaseconds of compute time.”

Wow. This code had run for most of a crew lifespan before eventually finding a bug. That was rather adorable. I vaguely saluted whatever grounder was responsible for this thing, and reflected on how grateful I was to not be them and to have access to modern tooling.

“How long would it have taken given appropriate formal methods?”

“Difficult to estimate due to low availability of formal models for this language. However, based on the execution trace this is a known bug in OpenSSH, where the bug was found within the first four seconds of active testing under a more modern test suite, written as a student exercise in a class on software archaeology on the Star Struck three gigaseconds ago coordinated time.”

That was about what I expected.

“Wiki, what’s OpenSSH?”

“It is a secure network communication protocol, originally designed to provide remote access to a system via a local PTY.”

“What’s a PTY?”

“Warning: This information has been tagged as a memetic hazard, subcategory can of worms. Do you wish to proceed?”

I blinked. That was unexpected. I was almost curious enough to proceed anyway, but these warnings were usually worth taking seriously and they didn’t normally get attached to *interesting* awful information. Besides, this really wasn’t that relevant.

“No, that’s fine.”

I thought for a bit. I was pretty sure why this known bug was still present, but decided to check anyway.

“Ide, why has this bug not been fixed despite being known?”

“Due to the rating of this process as high in all of criticality, stability and fragility, it was flagged as an ultra-low priority fix.”

That’s what I thought. It works, but trying to fix it is probably more likely to break it, and then the plumbing backs up. Not unlike the problem I’d run into with the ramscoop, but the difference was that this one *was* in my remit.

“Is this bug being triggered in the wild?”

“Unknown as to whether the particular sequence of events Hypothesis has found is present in the wild, but logs indicate that the underlying OpenSSH bug is triggered.”

“Is it being triggered in the vicinity of the anomalous event?”

“Yes.”

OK. So this was definitely a plausible culprit.

“Can we run a simulation of what would happen if this bug was fixed?”

“Warning: At current resource availability, such a simulation would take 0.8 gigaseconds to complete.”

“Ugh. Show me the cost curve.”

I looked at the cost curve. Right. All those game theory simulations the programmers at arms were running were taking up most of our spare capacity, and I didn’t have budget to outbid them on anything except the very tiny subset of capacity I had priority on.

Which wasn’t wholly unfair. But I now had evidence that a critical system was misbehaving and might be triggering an anomalous plumbing event, which was serious. Granted it was less *important* than the fate of humanity, but it might be higher priority. Time to try and free up some budget for simulation.

I sighed, and started putting together a report.

Next chapter: Can we speed that up?


Some of this is because the way people think about binary search is somewhat flawed. It’s often treated as being about sorted arrays, when that’s really only one application of it. So let’s start with a review of the *right* way to think about binary search.

We have two integers \(a\) and \(b\) (probably non-negative, but it doesn’t matter), with \(a < b\). We also have some function \(f\) that takes integers \(a \leq i \leq b\), with \(f(a) \neq f(b)\). We want to find \(c\) with \(a \leq c < b\) such that \(f(c) \neq f(c + 1)\).

i.e. we’re looking to find a single exact point at which a function *changes value*. For functions that are monotonic (that is, either non-increasing or non-decreasing), this point will be unique, but in general it may not be.

To recover the normal idea of binary search, suppose we have some array \(xs\) of length \(n\). We want to find the smallest insertion point for some value \(v\).

To do this, we can use the function \(f(i)\) that returns whether \(xs[i] < v\). Either this function is constantly true (in which case every element is \(< v\) and \(v\) should be inserted at the end), constantly false (in which case \(v\) should be inserted at the beginning), or the index \(i\) with \(f(i) \neq f(i + 1)\) is the point after which \(v\) should be inserted.

This also helps clarify the logic for writing a binary search:

```
def binary_search(f, lo, hi):
    # Invariant: f(hi) != f(lo)
    while lo + 1 < hi:
        assert f(lo) != f(hi)
        mid = (lo + hi) // 2
        if f(mid) == f(lo):
            lo = mid
        else:
            hi = mid
    return lo
```

Every iteration we cut the interval in half - since the loop only runs while the gap between lo and hi is at least two, the midpoint is strictly between them, so the interval genuinely shrinks each time. If \(f\) gives the same value to the midpoint as to lo, the midpoint becomes our new lower bound; if not, it becomes our new upper bound (note that in general we might not have \(f(mid) = f(hi)\) in that case, though in the typical case where \(f\) only takes two values we will).
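To tie this back to the insertion-point problem: a hypothetical `insertion_point` wrapper (the name is my own) can fold the two constant cases into the search by pretending there are virtual endpoints with \(f(-1)\) always true and \(f(n)\) always false, so a change point always exists. The standard library's `bisect.bisect_left` serves as a cross-check, and `binary_search` is repeated so the snippet runs on its own:

```python
import bisect

def binary_search(f, lo, hi):
    # Same as above: find lo <= c < hi with f(c) != f(c + 1).
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid) == f(lo):
            lo = mid
        else:
            hi = mid
    return lo

def insertion_point(xs, v):
    # f(i) is True while xs[i] < v; virtual endpoints f(-1) = True
    # and f(len(xs)) = False guarantee f(lo) != f(hi) always holds.
    def f(i):
        return i == -1 or (i != len(xs) and xs[i] < v)

    # f flips from True to False at the change point c, so v belongs at c + 1.
    return binary_search(f, -1, len(xs)) + 1
```

As a quick check, `insertion_point([1, 3, 3, 7], 3)` agrees with `bisect.bisect_left([1, 3, 3, 7], 3)`, both giving 1.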

Anyway, all of this is beside the point of this post, it's just scene setting.

Because the point of this post is this: Is this actually optimal?

Generically, yes it is. If we consider the family of functions \(f_k(i) = i < k\), each value we examine can only cut out half of the remaining candidates, so we must ask at least \(\log_2(b - a)\) questions, and binary search is optimal.

But that's the generic case. In a lot of typical cases we have something else going for us: often we expect change to be quite frequent, or at least the change point to be very close to the lower bound. For example, suppose we were binary searching for a small value in a sorted list. Chances are good it's going to be a lot closer to the left hand side than the right, but we're going to do a full \(\log_2(n)\) calls every single time.

We can solve this by starting the binary search with an exponential probe - we try small offsets, growing the gap by a factor of two each time, until we find one that gives a different value. This then gives us a (hopefully much smaller) upper bound, and a lower bound somewhat closer to it.

```
def exponential_probe(f, lo, hi):
    gap = 1
    while lo + gap < hi:
        if f(lo + gap) == f(lo):
            lo += gap
            gap *= 2
        else:
            return lo, lo + gap
    return lo, hi
```

We can then put these together to give a better search algorithm, by using the exponential probe as the new upper bound for our binary search:

```
def find_change_point(f, lo, hi):
    assert f(lo) != f(hi)
    return binary_search(f, *exponential_probe(f, lo, hi))
```

When the value found is near or after the middle, this will end up being more expensive by a factor of about two - we have to do an extra \(\log_2(n)\) calls to probe up to the midpoint - but when the heuristic wins it potentially wins big: it will often take the \(\log_2(n)\) factor (which although not huge can easily be in the 10-20 range for reasonably sized gaps) and turn it into 1 or 2. Complexity-wise, this will run in \(O(\log(k - lo))\), where \(k\) is the value returned, rather than the original \(O(\log(hi - lo))\).
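To put rough numbers on that claim, here is a throwaway harness (the counting helper is my own invention; the functions are repeated without their asserts so those don't inflate the counts) that counts calls to \(f\) for the family \(f_k(i) = i < k\) from the optimality argument above:

```python
def binary_search(f, lo, hi):
    while lo + 1 < hi:
        mid = (lo + hi) // 2
        if f(mid) == f(lo):
            lo = mid
        else:
            hi = mid
    return lo

def exponential_probe(f, lo, hi):
    gap = 1
    while lo + gap < hi:
        if f(lo + gap) == f(lo):
            lo += gap
            gap *= 2
        else:
            return lo, lo + gap
    return lo, hi

def find_change_point(f, lo, hi):
    return binary_search(f, *exponential_probe(f, lo, hi))

def calls_used(search, k, n):
    # Count invocations of f_k(i) = i < k while searching [0, n].
    count = 0
    def f(i):
        nonlocal count
        count += 1
        return i < k
    assert search(f, 0, n) == k - 1  # the change point is at k - 1
    return count
```

With \(n = 2^{20}\) and the change point near the start (say \(k = 3\)), `find_change_point` uses a handful of calls where plain `binary_search` uses around forty; for \(k\) near the middle or beyond, the probe roughly doubles the cost instead.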

This idea isn't as intrinsically valuable as binary search, because it doesn't really improve the constant factors or the response to random data, but it's still very useful in a lot of real world applications. I first came across it in the context of timsort, which uses this to find a good merge point when merging two sublists in its merge step.

Edit to note: It was pointed out to me on Twitter that I'm relying on Python's bigints to avoid the overflow problem that binary search will have if you implement it on fixed-size integers. I did know this at one point, but I confess I had forgotten. The above code works fine in Python, but if int is fixed size you want the following slightly less clear versions:

```
def midpoint(lo, hi):
    if lo <= 0 and hi >= 0:
        return (lo + hi) // 2
    else:
        return lo + (hi - lo) // 2

def binary_search(f, lo, hi):
    # Invariant: f(hi) != f(lo)
    while lo + 1 < hi:
        assert f(lo) != f(hi)
        mid = midpoint(lo, hi)
        if f(mid) == f(lo):
            lo = mid
        else:
            hi = mid
    return lo

def exponential_probe(f, lo, hi):
    gap = 1
    midway = midpoint(lo, hi)
    while True:
        if f(lo + gap) == f(lo):
            lo += gap
            if lo >= midway:
                break
            else:
                gap *= 2
        else:
            hi = lo + gap
            break
    return lo, hi
```

These avoid calculating any intermediate integers which overflow in the midpoint calculation:

- If \(lo \leq 0\) and \(hi \geq 0\) then \(lo \leq hi + lo \leq hi\), so is representable.
- If \(lo \geq 0\) then \(0 \leq hi - lo \leq hi\), so is representable.
- In the remaining case \(hi < 0\), we have \(0 < hi - lo \leq -1 - lo \leq \) INT_MAX, so is representable.

The reason we need the two different cases is that e.g. if \(lo\) were INT_MIN and \(hi\) were INT_MAX, then \(hi - lo\) would overflow but \(lo + hi\) would be fine. Conversely if \(lo\) were INT_MAX - 1 and \(hi\) were INT_MAX, \(hi - lo\) would be fine but \(hi + lo\) would overflow.

The following should then be a branch free way of doing the same:

```
def midpoint(lo, hi):
    large_part = lo // 2 + hi // 2
    small_part = ((lo & 1) + (hi & 1)) // 2
    return large_part + small_part
```

We calculate (x + y) // 2 as x // 2 + y // 2, and then we fix up the rounding error this causes by calculating the midpoint of the low bits correctly. The intermediate parts don't overflow because we know the first sum fits in \([lo, hi]\), and the second fits in \([0, 1]\). The final sum also fits in \([lo, hi]\) so also doesn't overflow.
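For extra confidence, the claim is small enough to check exhaustively at a narrow width - here every pair in a pretend 8-bit range, compared against the bigint formula. (Note this verifies Python's floor-division semantics; C's `/` truncates towards zero on negative operands, so a C port would want its own check.)

```python
def midpoint(lo, hi):
    large_part = lo // 2 + hi // 2
    small_part = ((lo & 1) + (hi & 1)) // 2
    return large_part + small_part

INT_MIN, INT_MAX = -128, 127  # pretend two's complement 8-bit range

for lo in range(INT_MIN, INT_MAX + 1):
    for hi in range(lo, INT_MAX + 1):
        m = midpoint(lo, hi)
        assert m == (lo + hi) // 2      # agrees with the bigint version
        assert INT_MIN <= m <= INT_MAX  # and the result stays in range
```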

I haven't verified this part too carefully, but Hypothesis tells me it at least works for Python's big integers, and I think it should still work for normal C integers.
