How to make good things

I like to make things that are good. Sometimes I even succeed.

I realised recently that there’s a pattern that a lot of my successful attempts match (and many of my unsuccessful attempts too; there is no royal road to quality), one which feels different from how other people often seem to go about this, so I thought it might be worth writing it down and codifying it.

As with all such patterns, this post is probably not literally true, but I think it might be quite helpful anyway.

So suppose you want to make something good? What do you do?

“Make” can be a fairly general term here. I use something like this technique for making recommendations to people as much as I do for creating new things. The creative examples are probably more interesting because they’re harder work and easier to get wrong, but I’ll use both for examples.

The system that I have more or less boils down to two main principles:

  • Start with examples of things that are good that you don’t want to make
  • Define rules for your project and stick to them until you can’t feasibly do so

If you want the short, pithy advice and don’t care about why, you can stop here. The rest of this article is going to be about explaining where these principles come from, how they help, and how to work with them effectively.

These principles are mostly designed around the desire to take an unstructured creative process and lightly structure it in a way that doesn’t add much cost but tends to improve the outcome.

When trying to make something, I think most people rely on a fairly unstructured process for figuring it out, which results in a plan of attack that looks something like the following:

  1. Decide what “good” looks like for this problem.
  2. Make something good.

It’s not an unreasonable plan of attack, and in many cases it will succeed.

Examples of how applying this strategy looks:

  1. When recommending a book, just suggest a book you really liked.
  2. When writing software, ask people what their problem is and write software that solves that problem.
  3. When writing a story, come up with an idea for it and write that story really well.

All of these more or less succeed or fail on the strength of how well your skills fit the problem and how well you apply them to it: If your taste in books is a good proxy for the taste of the person you’re recommending a book to, you will recommend books well. If you’re good at writing software and understanding people’s problems you will write software that solves their problems. If you’re a good writer, you will produce a good book.

There’s a lot more to say about it than that, but it’s more or less beside the point of this essay. For now I will take it as read that if you’re good at what you do and you execute it well, you will mostly produce something good.

Mostly.

The major problem with this plan of attack for me is novelty.

Often novelty makes the difference between something that is good and something that is merely OK.

Example: I actually really enjoy Elementary as a TV show, but I wouldn’t exactly describe it as good.

Because let’s be honest. We’ve seen this show before.

There’s nothing wrong with making the same thing over and over again. It’s a useful skill-building exercise, particularly when you’re starting out, and often as a consumer of stuff you don’t necessarily want new and challenging; you just want something fun that you know you’re going to like.

But even within that some novelty is important:

For learning: Making the same sort of large thing over and over again quickly plateaus as a learning technique. Practising a skill repeatedly only works well if the problem is extremely small, so that you get fast feedback. At a larger scale, if you’re not doing something substantially different you’re learning progressively less with each repeat.

For consumption: Even within the brain candy category, I think we overestimate how much we want the same thing over and over again. If we really wanted that we’d just watch reruns or reread the book. You know how an author’s second book often seems a lot worse than their first book? It probably mostly isn’t. It’s just that the spice that novelty added has worn off and you’re seeing a more “objective” measure of how good their work actually is.

So those are the reasons why novelty is Actually Important. It’s a stronger priority than that for me – I just don’t like retreading well explored ground – but even if you don’t go as far as me on that front novelty is still important.

(side note: This will probably prompt someone to send me a link to prior art for the subject of this post as a hilarious joke. Please do. I don’t read enough on this and I’d like to read more, but it’s hard to separate the wheat from the chaff so I often don’t bother)

So, the key to producing something good is novelty? Great. Let’s make something novel.

Unfortunately just setting out to produce something novel also doesn’t work. If you just seek novelty then you will probably fail at the goal of being good. Novelty is easy: Pick a bunch of random things (use dice or cards or computers because humans are bad at random), throw them together, see what happens.

What will usually happen is that it won’t work. I can pick a random book and the person I’ve recommended it to probably won’t have read it, but they probably also won’t like it. I can implement this Facebook for Dogs in a custom Forth interpreter implemented in C using continuation passing style, but it will probably segfault, and even if it doesn’t it’s still Facebook for Dogs. I can write a story about ten random characters saving the world from an attack force of flying cheese graters, but at best you’re going to get an absurdist comedy.

Even at milder levels, novelty can often be a trap: You get recommendations that are more weird than good, half-baked prototypes, or stories where the core idea doesn’t really work. (I have done all these things).

So you need to find a way to produce things that are both novel and good. Neither suffices on their own.

You can’t optimise for two things simultaneously, so you need to figure out a way to combine them. One way would be to allow the two to trade off against each other – some amount of novelty can be sacrificed for some amount of goodness – but I mostly find that produces results you’re not very satisfied with on either front.

Instead the way I like to think about this is that novelty should never be the goal, but it can reasonably be a constraint. You are not trying to produce something maximally novel, you are trying to produce something maximally good which is also novel enough.

This matches what we’re trying to do much better: Goodness is the priority, but novelty is a necessity.

And the way to achieve this is not to seek novelty directly, but to have a process that produces novelty automatically while you work on making the thing good. Fortunately, I have one of those to share with you.

The basic starting point is to take this idea of novelty as a constraint and modify the starting procedure as follows:

  1. Decide what “good” looks like here.
  2. Make something good that is also novel.

Note: This is a bad plan.

The problem with this plan is that the constraint “it should be novel” is too fuzzy. Fuzzy constraints are the enemy – they either get compromised on as you work or you constantly second guess yourself about them. If you try to follow the above plan you will usually end up with something where you’re not really sure it’s good and you’re not really sure it’s novel (regardless of how good or novel it actually is if you’re anything like me on this front). You end up with something mostly like the attempt to optimise for the combination of both but with more self-doubt.

The following refined plan is better:

  1. Decide what “good” looks like here.
  2. Decide how what you make will be novel.
  3. Make something good that is also novel in the prescribed way.

This is better. It solves the fuzzy constraints problem by making them non-fuzzy.

But it’s still not great. The problem is that it forces you to perform most of your novelty up front when you least understand the problem. That’s not really how creativity works – often the most interesting ideas will only occur to you after you’ve been bashing your head against the problem for a while.

But you can get around that quite easily: The idea is to not come up with a specific novel feature out of thin air, but instead to create constraints that when satisfied will automatically produce novelty.

That is, instead of deciding how what you make is going to be novel, you invert it. You decide how it won’t be like the prior art.

You do this by producing a set of principles with which the work must comply, generated by something that looks roughly like the following algorithm:

  1. Pick some prior art you like.
  2. Find something about it you don’t want to emulate. If there’s something about it you actively don’t like then that’s great, use that. If not, just find some other way it would be interesting to be different from it.
  3. Create a simple rule that splits the space of possibilities on an interesting axis and avoids that thing.

Iterate this, each time picking prior art that satisfies the rules you have so far, until you’re bored of doing this or you can’t think of anything you like that doesn’t satisfy those rules.

Here are some examples of me doing this:

  1. Recommending fantasy that is not like Tolkien and doesn’t contain elves.
  2. It is really annoying when QuickCheck and similar fail in erratic ways, so Hypothesis takes as a core design constraint that when you rerun a failing test it should immediately fail with the same error as before.
  3. In Programmer at Large, in order to avoid it being like just about every other nerd power fantasy book (which, to be clear, is a genre that I totally read and enjoy as a nerd with power fantasies), the protagonist is specifically designed to not be especially brilliant or competent and just be run of the mill good at their job.

An important thing to note is that you are not picking prior art that is bad. You are picking prior art that is good. It’s easy to pick on examples you hate, but it won’t produce nearly as interesting results. Just trying to avoid being bad is a recipe for mediocrity. The key here is that we’re trying to be good but different.

This solution almost works, but it has two failure modes that you need to adapt it to avoid:

  1. You might fail to produce something that is both good and satisfies the rules (either because you couldn’t satisfy all the rules at all, or because you could but the result wasn’t very good).
  2. You produced something good that satisfied the constraints but it turns out to not be all that novel and in fact is quite similar to something that already exists in ways that annoy you.

The second is only really a failure mode in the sense that you’ve produced something no better than the original method would have: You’ve still produced something good, even if it isn’t novel enough. You might be happy with that. One worry is that you could have produced something significantly less good than you otherwise would have without the constraints, but I don’t usually find that’s a problem – often solving in a constrained problem domain is just a good way to make you think about the problem harder, and produces results that are better even when they’re not novel. The relation between creativity and constraints is so well trodden at this point that it’s practically a cliche.

This also tends to mean that in the first failure mode the case to worry about is not really “I produced something that satisfied these constraints but it wasn’t very good”. It’s not that that doesn’t happen, but it tends to happen much less often than you might expect. The real thing to worry about is that you’ve made life too hard for yourself to come up with any solution to all the rules – either because none exists or because it’s too hard to find.

With both of these failure modes the fix is to restart the process and change the rules – if you couldn’t satisfy them, figure out some relaxed or different rules. If satisfying them produced something that wasn’t very novel, you now have a couple of new examples to try to exclude!

Depending on how hard the process is, this might be a perfectly reasonable thing to do. If you’re recommending a book or working on a problem you can solve in an afternoon, this is probably fine. With larger things you can also use whatever is left over from the first iteration as raw material to stop you from having to start completely from scratch.

But even when restarting isn’t that expensive, it’s probably still best not to do it too much.

The “not actually novel” trap is hard to avoid when you don’t have a good idea of what exists. The best fix for this is familiarity with the prior art, but you can also outsource this – if you can’t think of examples satisfying your rules, ask other people for some! Then become familiar with those examples.

For avoiding the trap of creating rules that are too restrictive, I find that it’s useful to maintain an existence proof as you build your rules: An example that satisfies all the rules you’ve got so far. This can be an existing thing, or a sketch concept of how you could satisfy the rules. When you add a rule you either pick one that the existence proof satisfies, or one where you can find a new example satisfying it and all the existing rules.

The reason why this works and isn’t the same as just solving the problem as you go is this: Your existence proof doesn’t have to be good. In many ways it’s often better if it’s bad because that gives you a starting point for the next part of the creative process: Why is this bad and can I fix that?

Instead of making it good, you can just solve it through brute force: Solve the problem directly in front of you and do the simplest thing that can possibly work, without thinking about the bigger picture. All you’re trying to do is show that a solution is possible, not find a good one right now.

I think of a lot of the early versions of Hypothesis as being like this. The modern implementation of Hypothesis only became possible because it passed through a number of intermediate solutions that were kinda awful but demonstrated that a solution was possible.

So, to put this all together:

  1. Maintain a list of rules with accompanying reference examples, which starts empty, and an existence proof, which is any example of the thing you’re trying to create that satisfies all the rules (regardless of whether it is good or not).
  2. Come up with an example of something you like that satisfies all the existing rules. This might be your existence proof or it might be something else. If you can’t think of one, ask people. If they can’t think of one either, proceed to step 4.
  3. Come up with a simple rule that excludes the example you came up with, together with an existence proof that it can be satisfied alongside all your existing rules (this may be your existing existence proof). Add the rule to your list with the example from step 2 as its reference example, update your existence proof if necessary, then go back to step 2.
  4. Try to create something good that satisfies all the rules.
  5. If you created something good, check if you still think it’s novel. If it’s not, go back to step 2 with it as the example. If it is, stop. You’re done. Congratulations, you made a good thing!
  6. If you failed to create something good, try to figure out what the rule or rules at fault were and see if you can modify them in such a way that they still exclude their reference examples but avoid the blockage you encountered. Then go back to step 2.

I very rarely (never, I think, though now that I’ve written it down I might try it) actually sit down and follow the above steps. I’ve probably never even done something that perfectly executed those steps implicitly – I’m much worse at searching the existing literature than that would imply, even though the results are usually better for it when I do – but I think they capture a core process pretty well nevertheless.

And that process is extremely beneficial. It forces you onto a path which seeks out novelty, and by doing so through creating constraints it does a very good job of inspiring the levels of creativity that are required to create that novelty.

It’s also another technique that is very good for learning while doing. The focus on rules and constraints forces you to learn a lot about how the problem space fits together. Although it elides a lot of details in the steps (how do you produce something good within the constraints? How do you come up with a rule?), by giving you more focused questions to answer and explicitly exploring the shape of the problem I suspect you’ll learn a lot more than you would by just seeking to produce something good.

Ultimately regardless of how closely you follow the specific steps, I think the ideas here are important, and thinking in terms of them will help many and possibly most creative processes.

(And if you found this useful, and would like me to keep making good things on this blog, you’d be very welcome to donate to my Patreon for supporting my blogging to say so regardless of whether that’s novel)

This entry was posted in Open sourcing my brain.

Programmer at Large: What is this?

GNU GENERAL PUBLIC LICENSE
Version 9, 21 January 2089

Copyright (C) 2089 Free Software Foundation, Inc. <http://fsf.org/>
Everyone is permitted to copy and distribute verbatim copies
of this license document, but changing it is not allowed.

I stared at the words on my HUD. It wasn’t the point. I knew it wasn’t the point. But still… I had to ask.

“Wiki, what is this?”

“It’s the software license that the code is provided under. The GNU General Public License, or GPL, was a series of licenses widely used in pre-diaspora civilisations on Earth.”

“Elaborate. What’s a software license?”

“In large contiguous civilisations with strong contract law it is common that rather than selling software you sell licenses, which grant the buyer the rights to use the software in a particular way.”

“I don’t understand. Why would you ever buy that instead of the software?”

“Typically the software is not made available for purchase.”

“Why?”

“Because the sellers feel that that would limit their ability to sell licenses – the purchasers could simply turn around and undercut them.”

“No but why were they able to do that at all? Why didn’t they immediately get out-competed by people who were selling software?”

“I don’t have a short answer to that question. There is currently a 650 millivote bounty on this question, and I can provide you with several detailed ethnographic studies on the subject if you wish to attempt one?”

“No, never mind”

“OK. Would you like to leave a bounty?”

“Sure. Add, say, 5 millivotes to the bounty.”

“Done”

“What does the license require?”

“It requires that if you provide the software to anyone then you must provide the source code.”

“Sorry, what?”

“I don’t understand. What are you confused by?”

“How were people providing the software without providing the source code? Isn’t the software the same as the source code?”

“At the time that this license was popular it was common that the version of the software that would be provided with a license was in a purely binary form that allowed the software to be executed but not easily modified.”

“You mean they were just providing people with build artifacts?”

“That’s correct.”

My skin crawled. You can’t crew an interstellar trader without some exposure to local cultures, and the nature of software archaeology is that you often have to understand the historical context in which things were written, but it’s rare to run into such direct evidence of outright perversion.

“Do people still do that?”

“Approximately 30% of planetary civilizations we visit engage in this practice, but it is commonly understood that it does not make sense for interstellar trade so we rarely encounter the practice directly.”

“Wow.”

The wiki was silent. It’s programmed not to respond to simple exclamations like that.

“Are we compliant with the terms of the license?”

“As far as we can be. Many of the terms of the license refer to concepts that no longer exist or apply to us, and the rest are automatically satisfied by modern software practices. It is generally felt that the creators of the license would be very happy with how we use the software.”

“But we wouldn’t be compliant if we deleted the license header?”

“That is correct.”

“Would it matter if we did it anyway?”

“No entity who could enforce the license still exists. However, the last experiment at removing it globally caused 4,197 build steps to fail, and a 11253CE vote in our inherited constitution declared them to be important cultural heritage which should be preserved.”

“OK, fine, but what if-”

At this point my distraction alarm pinged. I’d passed some threshold of deviance from my intended task and it was making sure I was aware of that.

I could override it – this was a lot more interesting than the mess I was supposed to be looking into, and I didn’t really feel like spending the time it would take to learn whatever ancient grounder language this C++ was right now – but it was right, I was way off track.

Which probably meant my mind was wandering and it was time to take a break. I waved my HUD into casual mode and exited my pod to head for the common area.

Next Chapter, “What’s your name?”.

This entry was posted in Fiction, Programmer at Large.

Convergence of alternating sums

Colin Beveridge tweeted about the following Formula for Pi the other day:

\(\pi = 1 + \frac{1}{2} + \frac{1}{3} + \frac{1}{4} - \frac{1}{5} + \frac{1}{6} + \ldots\)

With the exact formula being:

After the first two terms, the signs are determined as follows: If the denominator is a prime of the form 4m - 1, the sign is positive; if the denominator is a prime of the form 4m + 1, the sign is negative; for composite denominators, the sign is equal to the product of the signs of their prime factors.

This is due to Euler.

(If you just want to see a proof of this, skip to near the end of this post)

I opined that it wasn’t that surprising, because for any real number \(x\) you can choose a sequence of signs for this sum which makes it converge to \(x\), as follows:

Inductively define \(\epsilon_n\) as follows: \(\epsilon_{n+1}\) is \(1\) if \(\sum\limits_{i \leq n} \frac{\epsilon_i}{i} < x\), else \(\epsilon_{n+1} = -1\).

After the first time the partial sums \(s_n = \sum\limits_{i \leq n} \frac{\epsilon_i}{i} \) cross \(x\), you must have \(|s_n - x| \leq \frac{1}{n}\), so the sum converges to \(x\).
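Here’s a minimal Python sketch of that construction (the function name, the target value and the number of terms are arbitrary choices, purely for illustration):

```python
# Greedy sign choice: add 1/i while the partial sum is below the target,
# subtract 1/i while it is above. Once the partial sums first cross the
# target they stay within 1/n of it.
def greedy_signs(target, n_terms):
    signs, partial = [], 0.0
    for i in range(1, n_terms + 1):
        eps = 1 if partial < target else -1
        signs.append(eps)
        partial += eps / i
    return signs, partial

signs, partial = greedy_signs(3.14159, 10_000)
print(partial)  # agrees with the target to within 1/10000
```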

There are many inequivalent sums that lead to \(x\) too. e.g. You can make the convergence arbitrarily slow if you like: If you have some sequence of non-negative numbers \(b_n \to 0\) then you can construct a sequence \(\epsilon_n\) as above where instead of changing the sign every time you cross \(x\), you change the sign to negative when you cross \(x + b_n\), then back to positive when you cross \(x – b_{n+1}\), etc.

There’s nothing special about \(\frac{1}{n}\) here either – all that matters is that it’s a non-negative sequence tending to zero whose sum does not converge.

But there is something special about \(\frac{1}{n}\) here that means we can probably expect a particularly rich set of “natural” examples from it due to a slightly stronger condition it satisfies: It’s in \(l^2\). That is, \(\sum\frac{1}{n^2}\) converges (specifically to \(\frac{\pi^2}{6}\), but that doesn’t matter for now).

Why does this matter?

Well it turns out to be interesting for the following reason: If \(a_n\) is a sequence in \(l^2\), and \(B_1, \ldots B_n, \ldots\) are independent Bernoulli random variables, then \(\sum (-1)^{B_n} a_n\) converges to a finite value with probability 1. That is, if we just randomly pick each sign then the result will converge to something.
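This is easy to play with numerically. A quick sketch (purely illustrative, using \(a_n = \frac{1}{n}\), an arbitrary seed and arbitrary truncation points) shows the partial sums of a single trial settling down as more terms are added:

```python
# Randomly signed harmonic series: the partial sums of one trial stabilise
# as more terms are added (different trials stabilise to different values).
import random

random.seed(0)  # arbitrary seed so the run is repeatable
partial = 0.0
for n in range(1, 1_000_001):
    partial += random.choice((-1, 1)) / n
    if n in (1_000, 10_000, 100_000, 1_000_000):
        print(n, partial)
```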

This follows from the following stronger theorem: Let \(A_i\) be a sequence of independent random variables with \(E(A_i) = 0\) and \(\sum E(A_i^2) < \infty\). Then \(\sum A_i\) converges to a finite value with probability one.

First a thing to note: Because of Kolmogorov’s zero-one law, the sum of such a sequence will converge with probability \(0\) or \(1\) – no in-between values are possible. I think this makes the result significantly less surprising (though we won’t actually use this fact in the proof).

The key idea of the proof is this: We’ll use the bound on the tail behaviour that the convergence of \(\sum E(A_i^2)\) gives us to look at the \(\lim\inf\) and the \(\lim\sup\) (the limit inferior and limit superior) of the partial sums of the random variables and show that these are equal almost surely. This implies that the limit exists and is equal to their common value (which might be infinite).

And the way we’ll do that is we’ll consider the possibility that they are well separated: Let \(D_{a, b} = \{ \lim\inf S_n < a < b < \lim\sup S_n\}\), where \(S_n = \sum\limits_{i \leq n} A_i\). Then \(\{\lim\inf S_n < \lim\sup S_n\} = \bigcup\limits_{a, b \in \mathbb{Q}} D_{a, b}\), which is a countable union, so if each \(D_{a, b}\) has probability zero then so does the whole set, and the limit exists.

So let’s now establish an upper bound on the probability of lying in \(D_{a, b}\). Pick any \(N\). If \(D_{a, b}\) occurs then the sums lie below \(a\) and above \(b\) (not at the same time) infinitely often after \(N\) because of the definitions of limit inferior and superior. So we can find some \(N < m < n\) with \(S_m < a\) and \(S_n > b\).

But then \(\sum\limits_{i=m}^n A_i > b – a\), and hence \(\sum\limits_{i=m}^n |A_i| > b – a\).

By the Cauchy-Schwartz inequality, for any \(t_1, \ldots t_n \geq 0\) we have \(n \sum t_i = (1, \ldots, 1) \cdot (t_1, \ldots, t_n) \leq ||(1, \ldots, 1)|| ||(t_1, \ldots, t_n)|| = \sqrt{n} (\sum t_i^2)^{\frac{1}{2}}\), so \(\sum t_i^2 \geq n (\sum t_i)^2\). Edit: This step is wrong. The result is still true because of deeper theory than I go into here (it comes from Martingale theory) and this step may be recoverable. I’ll try to fix it later. Thanks to @sl2c for pointing this out.

In particular applied to the above this means that  \(\sum\limits_{i=m}^n |A_i|^2 \geq (n – m) (\sum |A_i|)^2 \geq (n – m)(b – a)^2 \geq (b – a)^2 \).

But \(\sum\limits_{i=m}^n |A_i|^2 \leq \sum\limits_{i=N}^\infty |A_i|^2\). So we must have \(\sum\limits_{i=N}^\infty |A_i|^2 \geq (b – a)^2\).

But for any non-negative random variable, \(P(X \geq x) \leq \frac{E(X)}{x}\). This means we must have \(P(D_{a, b}) \leq P\left(\sum\limits_{i=N}^\infty |A_i|^2 \geq (b - a)^2\right) \leq \frac{E\left(\sum\limits_{i=N}^\infty |A_i|^2\right)}{(b - a)^2} = \frac{\sum\limits_{i=N}^\infty E(|A_i|^2)}{(b - a)^2} \).

But \(N\) was arbitrary, and we know \(\sum\limits_{i=1}^\infty E(|A_i|^2) < \infty\), so as \(N \to \infty\) the right hand side tends to zero. Therefore \(P(D_{a, b}) = 0\) as desired and the result is almost proved.

Note that we’ve not used the independence of the \(A_i\) so far! It’s only needed to prove that the sum converges to a finite value. To see that we need some sort of condition for that, consider the following: Let \(X\) be a random variable that takes the values \(1, -1\) with equal probability. Let \(A_i = \frac{X}{i}\). Then \(E(A_i) = 0\) and \(\sum E(A_i^2) < \infty\), but \(\sum A_i\) takes the values \(\infty, -\infty\) with equal probability (and no other values).

But with independence this can’t happen for the following reason: For any random variable we have \(E(X) \leq E(X^2)^{\frac{1}{2}}\) (because \(0 \leq \mathrm{Var}(X) = E(X^2) - E(X)^2\), though it also follows from other less elementary results). So we have \(E(|S_n|) \leq E(S_n^2)^{\frac{1}{2}}\). But \(E(S_n^2) = \sum\limits_{i, j \leq n} E(A_i A_j) = \sum\limits_{i \leq n} E(A_i^2)\), because \(A_i, A_j\) are independent for \(i \neq j\) so \(E(A_i A_j) = E(A_i) E(A_j) = 0\).

This means that \(E(|\sum A_i|) \leq \liminf E(|S_n|) \leq \sqrt{\sum E(A_i^2)} < \infty\) (the first inequality is Fatou’s lemma). Thus, necessarily, \(P(|\sum A_i| = \infty) = 0\).

And now the result is proved.

The proper setting for the above result is really the theory of Martingales, and the proof I gave above is mostly a modified proof of Doob’s Second Martingale convergence theorem, where I’ve used a stronger hypothesis to bound the difference in the inferior and the superior limits.

In fact, for the special case of our coin tossing a stronger statement is true: Suppose \(a_n\) is a sequence of positive numbers such that \(\sum a_n = \infty\) and \(\sum a_n^2 < \infty\). Let \(A_n\) be a sequence of independent random variables with \(A_n = \pm a_n\), taking each value with equal probability. Then for any interval \(a < b\), \(P(a < \sum A_n < b) > 0\).

To see this, write \((a, b) = (x – \epsilon, x + \epsilon)\) by letting \(x = \frac{a + b}{2}\) and \(\epsilon = \frac{b – a}{2}\).

Run the process above for constructing a sequence converging to \(x\) until we have \(N\) such that our choices so far have led us to \(|\sum_{i \leq N} A_i - x| < \frac{\epsilon}{2}\) and \(\sum_{i > N} a_i^2 < \frac{\epsilon^2}{8}\). The initial sequence of choices that led us here happens with probability \(2^{-N}\), and the condition on the sum of the tail guarantees via Chebyshev’s inequality that \(P(|\sum_{i > N} A_i| \leq \frac{\epsilon}{2}) \geq \frac{1}{2}\), so we have

\(P(|\sum A_i - x| < \epsilon) \geq P\left(\left|\sum\limits_{i \leq N} A_i - x\right| < \frac{\epsilon}{2}\right) P\left(\left|\sum\limits_{i > N} A_i\right| \leq \frac{\epsilon}{2}\right) \geq 2^{-N} \cdot \frac{1}{2} = 2^{-N-1} > 0\).

So any value is not only possible but at least somewhat probable.

But this is a true continuous distribution, in the sense that \(P(\sum A_i = u) = 0\) for any value \(u\). To see this, consider the values taken on only a subset of the indices. Let \(A(T) = \sum\limits_{i \in T} A_i\). Then \(A(T)\) and \(A(\mathbb{N} \setminus T)\) are independent, so if \(\sum A_i\) has atoms then both of \(A(T)\) and \(A(\mathbb{N} \setminus T)\) must also. But if we pick \(T\) to be some subsequence \(t_n\) with \(a_{t_{n+1}} < \frac{1}{2} a_{t_n}\), then each term is bigger than the sum of all the subsequent ones, so distinct assignments of signs produce distinct results, and the probability of achieving any given value is zero. (Thanks to Robert Israel for this proof).

But still, \(\pi\) is a bit of an outlier: The variance of this distribution is of course \(\sum \frac{1}{n^2} = \frac{\pi^2}{6}\), so \(\pi\) is \(\sqrt{6} \approx 2.45\) standard deviations out. I don’t have an exact estimate of how probable that is for this distribution, but again by Chebyshev’s inequality we know that we can’t get a result at least this extreme more than a sixth of the time.

So why do we get this particular value here?

Well it turns out to be an unsurprising result for another reason, which is that it comes from a general technique for producing interesting infinite sums by starting from interesting products.

Suppose we have \(\sum f(n) \frac{1}{n}\) where \(f(n)\) is some multiplicative function, i.e. \(f(mn) = f(m) f(n)\). Then we can write this as \(\prod (1 - f(p_i) \frac{1}{p_i})^{-1}\), with \(p_i\) being the i’th prime. There are issues of convergence we should worry about here, but in the finest Eulerian style we won’t. Both sides converging is probably a sufficient condition. A lot of this can probably be fixed by introducing a \(t\) parameter, assuming \(|t| < 1\) and taking the limit \(t \to 1\) of \(\sum t^n f(n) \frac{1}{n}\) and \(\prod (1 - t f(p_i) \frac{1}{p_i})^{-1}\), then because mumble mumble analytic continuation the two limits are equal. I haven’t checked the details, so I won’t do this here and will just follow Euler’s convention of assuming infinity isn’t a problem.

To prove this, we’ll consider the following sequence of infinite sums, \(P_k = \sum\limits_{n \in R_k} f(n) \frac{1}{n} \), where \(R_k\) is the set of numbers not divisible by any of the first \(k\) primes (so \(R_0 = \mathbb{N}\) and \(P_0\) is the sum we are interested in).

If we split \(R_n\) up into the set of numbers which are a multiple of \(p_{n+1}\) and those that are not, we get the recurrence relationship that \(P_n = \frac{f(p_{n+1})}{p_{n+1}} P_n + P_{n+1}\), by just taking out a single factor of \(\frac{f(p_{n+1})}{p_{n+1}} \) from the components of the sum over the values that are divisible by \(p_{n+1}\). So \((1 – \frac{f(p_{n+1})}{p_{n+1}} ) P_n = P_{n+1}\)

Iterating this, we get \(P_0 \prod\limits_{n=0}^\infty \left(1 - \frac{f(p_{n+1})}{p_{n+1}}\right) = \lim\limits_{n \to \infty} P_n = 1\).

i.e. \(\sum f(n) \frac{1}{n} = P_0 = \prod\limits_{n=0}^\infty (1 - \frac{f(p_{n+1})}{p_{n+1}})^{-1} \) as desired.

We can also go from products to sums: If we have \(\prod (1 – b_i )^{-1} \) then we can write it as \(\sum S_n \), where \(S_n\) is the sum of all products of sequences of \(n\) of the \(b_i\) (allowing repetitions).

If we then have \(b_i = f(p_i) \frac{1}{p_i}\), this becomes \(\prod (1 - f(p_i) \frac{1}{p_i})^{-1}\), the products all become unique, and we have \(\sum f(n) \frac{1}{n}\) again, where we extend \(f\) to all numbers by multiplying its values on their prime factors.

Using these tools, proving the sequence result for \(\pi\) becomes a matter of (somewhat tedious) computation.

We start with \(\frac{\pi}{4} = \arctan(1) = 1 - \frac{1}{3} + \frac{1}{5} - \ldots\) using the standard power series for arctan. This can be written in the above form with \(f(n) = 0\) if \(n\) is even, \(f(n) = -1\) if \(n \mod 4 = 3\) and \(f(n) = 1\) if \(n \mod 4 = 1\). So we have:

\(\frac{\pi}{4} = \prod (1 – f(p_i) \frac{1}{p_i})^{-1}\).

We can do the same with the standard sum \(\frac{\pi^2}{6} = \sum \frac{1}{n^2} = \prod (1 – \frac{1}{p_i^2})^{-1}\) (because \(n \to \frac{1}{n}\) is certainly multiplicative).

Now, divide the second product by the first and we get:

\(\frac{2}{3} \pi = \frac{1}{1 – \frac{1}{2^2}} \prod\limits_{p_i > 2} \left( \frac{1 – \frac{1}{p_i^2}}{1 – f(p_i) \frac{1}{p_i}} \right)^{-1}\)

The term outside the product comes from the fact that there’s no \(2\) term in our product for \(\frac{\pi}{4}\) and is equal to \(\frac{4}{3}\), so rearranging we get \(\frac{\pi}{2} = \prod\limits_{p_i > 2} \ldots\).

The term inside the product actually splits up fairly nicely, because we can factor \(1 - \frac{1}{p_i^2} = (1 - \frac{1}{p_i})(1 + \frac{1}{p_i})\). The bottom of the fraction is one of these two factors, so the fraction reduces to whichever of the two the bottom is not, i.e. to \(1 + f(p_i) \frac{1}{p_i}\).

So from this we conclude that

\(\frac{\pi}{2} = \prod\limits_{p_i > 2} (1 + f(p_i) \frac{1}{p_i})^{-1}\), or \(\pi = \frac{1}{1 - \frac{1}{2}}  \prod\limits_{p_i > 2} (1 + f(p_i) \frac{1}{p_i})^{-1}\).

If we define \(g\) as \(g(2) = 1\), \(g(p) = -f(p)\) for \(p\) an odd prime, and extend to composite numbers multiplicatively, this then becomes \(\pi = \prod (1 - g(p_i) \frac{1}{p_i})^{-1} = \sum g(n) \frac{1}{n}\), which was our desired sum.
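As a quick sanity check on the sign rule (rather than on the convergence, which is very slow), here is a small Python sketch that computes \(g(n)\) by trial division; the first few signs reproduce the series at the top of the post. The function name and the cut-offs are arbitrary choices for illustration.

```python
# g(n): product over the prime factorisation of n of +1 for the prime 2,
# +1 for primes of the form 4m - 1, and -1 for primes of the form 4m + 1.
def g(n):
    sign, p = 1, 2
    while p * p <= n:
        while n % p == 0:
            if p % 4 == 1:
                sign = -sign
            n //= p
        p += 1
    if n > 1 and n % 4 == 1:  # whatever remains is a single prime factor
        sign = -sign
    return sign

print([g(n) for n in range(1, 11)])
# [1, 1, 1, 1, -1, 1, 1, 1, 1, -1], matching 1 + 1/2 + 1/3 + 1/4 - 1/5 + 1/6 + ...
print(sum(g(n) / n for n in range(1, 100_001)))  # creeps very slowly towards pi
```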

This proof more or less follows Euler’s, via the translation by Springer (which I borrowed off a friend and would honestly recommend you do the same rather than paying more than £100 for). The details are a bit different mostly because it was the only way I could follow them – in particular he tends to eschew the detailed algebra and just manipulate examples – and unlike him I feel guilty about the lack of precise analysis of convergence, but it’s certainly derived from his.

The product representation also allows us to strengthen our original result: Not only can any number be the limit of the sum by choosing primes appropriately, we can insist that the signs are multiplicative in the sense that the sign assigned to \(mn\) is the product of the signs assigned to \(m\) and \(n\).

The reason for this is that \(\sum \frac{1}{p_i}\) diverges. As a result, \(\prod\limits_{p_i > n} (1 + \frac{1}{p_i}) = \infty\) and \(\prod\limits_{p_i > n} (1 – \frac{1}{p_i}) = 0\). This means we can just repeat our original construction with the product instead of the sum: Assign positive signs as long as the partial product is less than the desired result, assign negative ones as long as it’s greater. The result is a product converging to the desired value, which we can then turn into a sum converging to the desired value with multiplicative signs.

So that more or less completes the journey down this particular rabbit hole: You can get any number by manipulating the signs, you should expect most choices of numbers to be at least reasonably plausible, and we have some useful machinery for picking good choices of signs for leading to specific numbers.

Whether or not it’s still surprising is up to you.

(This post turned out to be a ridiculous amount of work to write. If you want more like it, or just want to say thanks, I do have a Patreon for this blog. Any donations you want to send my way would be warmly appreciated)

This entry was posted in Numbers are hard.

Faster random sampling by handling rejection well

Attention conservation notice: You probably don’t have the problem this post solves. It might be interesting anyway.

As part of some ongoing development for Hypothesis, I find myself in need of a well optimised solution to the discrete sampling problem: We have weights \(w_1, \ldots, w_n\) and we want to produce a number in \(\{1, \ldots, n\}\) such that \(i\) is drawn with probability proportional to \(w_i\).

All solutions to this problem are intrinsically \(O(n)\): If you have \(w_i = 1\) for \(i \neq j\) and \(w_j = 10^6\) then you need to inspect \(w_j\) to know whether to generate values \(i \neq j\) often or rarely.

You can however use that \(O(n)\) work once to build a data structure for repeated drawing. For example you can use the alias method to build a data structure that lets you do future draws in \(O(1)\).

Which is great and all if you’re expecting to repeatedly draw from the same distribution over and over again. Unfortunately that’s not what my workload looks like – the distributions used are many and varied. Distributions will be reused, but not in any easy to predict manner.

You can of course do cached lookup to an alias method sampler. This works, but if your cache miss rate gets too high (which it often will in my workload) this doesn’t actually improve matters. Additionally, depending on how good your caching is and how much is in there, it might not be much cheaper to look something up in the cache than it is to just create a new sampler from scratch.

So what we’d like is a method that is significantly cheaper to initialise. Fortunately there is one, it’s called rejection sampling!

You just need to calculate \(W = \max w_i\). Now you iterate the following procedure:

  1. Pick \(i\) uniformly at random
  2. With probability \(\frac{w_i}{W}\) return \(i\)
  3. Go to 1

This is very cheap to initialise but not so great in terms of run time: If the weights are uniform then it’s optimal (because you always return \(i\) in step 2), but if they’re very far from uniform then you’re going to make a lot of trips (potentially \(O(n)\) in the case where one weight dominates everything else) through that loop.
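In Python the basic loop looks something like the following sketch (not Hypothesis’s actual implementation; the weights at the bottom are made up purely for demonstration):

```python
import random

def rejection_sample(weights, rng=random):
    """Draw an index i with probability proportional to weights[i]."""
    max_weight = max(weights)  # the only per-distribution setup work needed
    while True:
        i = rng.randrange(len(weights))             # 1. pick i uniformly at random
        if rng.random() < weights[i] / max_weight:  # 2. accept with probability w_i / W
            return i
        # 3. otherwise go round the loop again

counts = [0, 0, 0]
for _ in range(10_000):
    counts[rejection_sample([1.0, 2.0, 7.0])] += 1
print(counts)  # roughly in the ratio 1 : 2 : 7
```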

But there turns out to be a hybrid solution that works pretty well and gets us most of the cheapness of initialising rejection sampling with most of the performance of our better sampling methods, with a tuning parameter deciding how much of each we want.

Pick an integer parameter \(R > 0\) (good choices for \(R\) are probably \(1\) or \(2\)) and then run the following algorithm:

  1. Pick \(i\) uniformly at random
  2. With probability \(\frac{w_i}{W}\) return \(i\)
  3. If we’ve reached this point fewer than \(R\) times, go to 1.
  4. Look up or initialise a new sampler for these weights and draw from that.

So we perform the cheap rejection sampling until it’s obvious that it’s not helping much and then move to a proper sampling method.
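A sketch of how that might look in Python is below. The fallback is a simple cumulative-sum sampler (binary search over running totals) standing in for the alias method or whatever proper sampler you prefer, and the dict keyed on the weights tuple is just an illustrative stand-in for a real cache; none of this is Hypothesis’s actual code.

```python
import bisect
import itertools
import random

_sampler_cache = {}

def cumulative_sampler(weights):
    """O(n) setup for a sampler that then draws in O(log n) by binary search."""
    totals = list(itertools.accumulate(weights))
    return lambda: bisect.bisect_left(totals, random.random() * totals[-1])

def hybrid_sample(weights, retries=1):
    """Try cheap rejection sampling up to `retries` times, then fall back."""
    max_weight = max(weights)
    for _ in range(retries):
        i = random.randrange(len(weights))
        if random.random() < weights[i] / max_weight:
            return i
    key = tuple(weights)
    if key not in _sampler_cache:
        _sampler_cache[key] = cumulative_sampler(weights)
    return _sampler_cache[key]()
```

With near-uniform weights the fallback is almost never touched; with very skewed weights you waste at most `retries` candidates before handing off to the proper sampler.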

For cases where we’re near the optimum for rejection sampling we’ll do well and mostly use it; for the rest we’ll pay a small extra cost. Potentially we could also do some extra bookkeeping – if we tend to find rejection sampling never works we could just turn off that phase of the algorithm entirely.

Experiments so far in the thing I’m implementing suggest that this idea provides a modest improvement compared to always initialising the sampler (I’m actually using a data structure from Succinct Sampling from Discrete Distributions, on grounds of it being substantially cheaper to initialise at the cost of being slightly more expensive to run. Plausibly this would give a much bigger improvement if I were using the alias method).

This probably isn’t that useful a trick. The workload I’m optimising for is a bit odd, and the performance improvements under that workload are only modest. Additionally, the baseline is already fairly fast. But it’s not something I’d seen before and it seemed worth noting down.

(Something something Patreon something. It feels a bit cheap to plug it here given the relatively niche nature of this post, but hey I was doing the Hypothesis research that led to it for free, so I guess it balances out?)

This entry was posted in programming.

On persuasion and anger

Humans are messy and complicated, and there are very few universal rules of human behaviour, so when you encounter a useful one it’s worth paying attention to it, because it’s basically gold dust.

There’s one I’ve had to learn the hard way. Most people seem to know it when you point it out, but very few people seem able to bear it in mind in their actions.

The rule is this: The angrier you make someone at you while trying to persuade them of something, the less likely they are to be persuaded of that thing and the harder they will be to persuade of it in future.

That’s it. That’s the whole rule.

Importantly, the rule applies no matter who is correct. Sadly, truth is not nearly as relevant to persuasion as one might hope.

Obviously it’s not actually completely universal – there are always going to be a few exceptions, whether they are people, subjects or specific circumstances – but it’s close enough to universal that I think it’s worth just treating it as true until you’ve got overwhelming evidence that you’re in one of those special cases. Certainly I’ve seen it play out this way a lot, from both sides of the argument.

A little bit of introspection will probably give you plenty of evidence of your doing this. Think of someone condescendingly explaining your error to you, insulting your intelligence along the way. Are you likely to want to listen? What if you’re actually wrong? Will you get the point faster or slower than if they’d just said “Hey, are you sure about that? I think X, Y and Z”.

I was going to include some examples here, but the problem with doing that is that it will just make the groups I use as examples angry and they’ll not be persuaded of this point. Also it’s somewhat unnecessary given that my primary example is everyone.

So if you can’t think of examples of this yourself, feel free to disregard this post, but I wish you wouldn’t, because I think it’s really important if you have any desire to change people’s beliefs.

Because so many attempts to evangelise or persuade completely ignore this and instead go straight for making the person you’re talking to as angry as possible.

If you lecture someone, or treat them like they’re a bad person for being wrong, or declare your superiority over them in some way, you will make them angry.

If you try and engage with them calmly, try to understand where they’re coming from, and work with them to find common ground and identify where your disagreement lies, you probably won’t. You won’t necessarily succeed at persuading them, but you’re a lot more likely to than if you told them they were an immoral idiot and the correctness of your position was obvious to anyone with two brain cells to rub together.

Most attempts to persuade on issues that people feel strongly about look a lot more like the first form (“How can you disagree with my obviously correct position you monster?”) than the second (“I’m not sure I get where you’re coming from. Can we discuss this?”), despite it being an entirely ineffective strategy.

I’m not totally sure why this happens, but I can think of a couple of plausible reasons:

  • Persuading people is hard. Most attempts are going to fail, so it’s hard to get feedback on what actually works because most of the time nothing does.
  • When we are angry, it takes a lot of self-control to behave calmly, and behaving angrily will tend to cause people to reciprocate.
  • Engaging people constructively is intrinsically hard and doesn’t feel that rewarding. A good rant will make you feel good about yourself and is much lower effort. If you don’t really expect to succeed, why not go for the easy option that makes you feel better?
  • We socially reward posturing and dominance much more than we do a successful conversion. The people who already agree with you (who probably make up a large fraction of your social group) are much more likely to go “Nice put down! You really showed that guy!” than they are “Well done on patiently sitting there educating that person”.
  • There may be other non-persuasion benefits to getting someone angry (e.g. what it reveals to the audience. Though I think this one often backfires)

So we’ve got an easy option that will make you feel good and a hard option that won’t. It’s perhaps not actually that surprising which one we tend to pick.

You probably think I’m going to tell you that you have to do the hard thing, but in all honesty I’m not sure it’s really worth it most of the time.

By and large, persuasion will only happen when someone is ready to be persuaded. If someone seems able and willing to be persuaded and tries to engage you in productive dialogue about it and you feel up to reciprocating, go for it. It might do some good, or at least you might get an interesting conversation out of it.

The rest of the time though? It’s up to you whether you really want to or not. I’m increasingly disinclined to, myself.

But I do recommend doing less of the easy thing, or at least being aware of what it is you’re really doing: If you find yourself shouting at someone, maybe stop being surprised that it doesn’t result in them agreeing with you, no matter how loudly or repeatedly you do it, or how obviously correct your position is.

If you want to keep doing it anyway because you’re angry and it feels good, go for it I guess. I’m probably not going to be able to persuade you not to.

(This post was done as a Patreon request, but I won’t try to persuade you to donate beyond noting that here, regardless of whether it would make you angry)

This entry was posted in Uncategorized.