How to Explain Anything to Anyone

Context: I keep promising that I’m going to write a blog post about explaining things and I’ve tried three times, but it keeps getting unwieldy, so this is an attempt at a short blog post about how to explain things.

Intended Audience: Everybody.

Epistemic Status: I am pretty confident that this is true, but it’s necessarily an oversimplification.


There are exactly three things you need to do in order to explain things well:

  1. Decide what you want to explain.
  2. Find out what the listener already knows.
  3. Express the first in terms of the second.

People routinely miss anywhere between one and all three of these steps, which is unfortunate because every one of these steps is vital if you want your explanation to do anything useful.

I will now briefly elaborate on each step.

Decide What You Want to Explain

This may seem easy but it’s not. This section is the important bit.

The key thing that people miss is that you are not trying to explain everything that is possibly relevant, you are trying to find the subset that is useful to someone.

Take my research. I’m a PhD student, so people often ask me what my research is about. They usually do this with a note of trepidation in their voice hinting “I’m too stupid to understand this, but…”. That is because they expect that as a PhD student I am bad at explaining things to normal people.

The following is what I say:

I work on a problem called test-case reduction. That is, you’ve got some program with a bug, and you’ve got something triggering the bug. Say a big document crashes Word or something. Debugging these problems is annoying when the thing triggering the bug is large and complicated, so to understand the bug we want to try to trigger it with something small and simple. Doing that by hand is tedious, so I work on tools for automating the process of turning that initial large, complicated example into something smaller and simpler that is easier for a person to debug.
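
If you want a slightly more concrete picture than I would give at a party, here is a toy sketch of the sort of tool I mean – a simple greedy reducer of my own illustration, not my actual research code. It assumes the failing input can be treated as a list of parts (say, lines of the document) and that we have a predicate telling us whether a candidate input still triggers the bug:

    # Toy greedy test-case reduction: repeatedly try deleting chunks of the
    # input and keep any deletion after which the bug still triggers.
    def reduce_test_case(parts, triggers_bug):
        chunk = max(len(parts) // 2, 1)
        while chunk > 0:
            i = 0
            while i < len(parts):
                candidate = parts[:i] + parts[i + chunk:]
                if candidate and triggers_bug(candidate):
                    parts = candidate   # the smaller input still fails, keep it
                else:
                    i += chunk          # this chunk was needed, move on
            chunk //= 2                 # retry with finer-grained deletions
        return parts

You would call this as something like reduce_test_case(document_lines, crashes_word), where crashes_word is a hypothetical predicate that runs the program on the candidate input and reports whether it still crashes.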

I could talk about why this is hard, or algorithmic complexity issues, or shortlex optimisation, and if I did that then the person would nod and smile and go find someone else to talk to because none of that is relevant to them.

A good rule of thumb here is that you should always explain the problem, not the solution, unless the person you are talking to already understands the problem. This is also the approach I took with my neural networks post – it makes no attempt to really explain what a neural network is, only the problems they solve.

Find Out What the Listener Already Knows

Ask them. I usually start my explanations of my research with the question “How familiar are you with software testing?” followed by “Have you heard of QuickCheck or Hypothesis?” if they say they are fairly familiar.

If you can’t ask them because it’s an article, decide what your target audience is and assume they’re on the lower end of understanding for that audience. Ideally find some beta readers who are in that target audience.

Edit to add: David Jones pointed out on Twitter something you can do when writing that helps here, and that I don’t do enough, which is to be explicit about who the article is for so that your audience can self-select in or out. My “Attention Conservation Notice” and the new “Intended Audience” classifiers help a bit here, but it can be useful to have a longer paragraph saying “This is what you’ll need to already know to get much out of this post. Here’s a link to some background if you don’t know that.”

Express the First in Terms of the Second

In order to explain things to someone you must do so in terms of concepts they already have. This may require you to give them concepts that they do not already have before explaining the thing you actually want to explain. The term I use for this is “the abstraction stack” – you have a stack of abstract concepts each building on top of the ones below it.

For example, in my “what is a neural network” post, I had to explain what machine learning was before I could explain what a neural network was. Indeed, most of the post was about explaining what machine learning was. Similarly, when explaining Hypothesis, I usually spend more time explaining software testing to lay people than I do explaining Hypothesis or property-based testing.

Broadly, the goal of explanation is to help someone build up their abstraction stack until it contains the thing you want to explain.

The following two techniques are the most useful things I can tell you about how to do that:

  1. Make sure they understand the current level before putting something on top of it. Generally this means you will spend more time explaining the foundational concepts than the thing you actually set out to explain.
  2. Every time you introduce a new concept, illustrate it with examples of the concept.

How do people get this wrong?

The main thing that causes people to get this wrong is what’s called the “curse of knowledge” – you tend to assume that things you already understand are much easier than they actually are. This causes people to skip over details.

For example, suppose someone at a party asked me what Hypothesis was. I could say something like:

It does a thing called property-based testing, which lets you parametrize a software test by a source of input data. This comes from a tool called QuickCheck, whose goal was to bridge some of the gap between software testing and formal verification.

This is a perfectly reasonable explanation that will almost certainly fail to land, because I have assumed people understand all sorts of things that they almost certainly don’t.
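
For the curious reader who does know some Python, here is roughly what “parametrize a software test by a source of input data” cashes out to – a minimal Hypothesis example, included as an illustration rather than as something I would say out loud at a party:

    # A property-based test: Hypothesis generates many lists of integers,
    # runs the test for each one, and shrinks any failing example it finds.
    from hypothesis import given, strategies as st

    @given(st.lists(st.integers()))
    def test_sorting_is_idempotent(xs):
        assert sorted(sorted(xs)) == sorted(xs)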

Another way this can fail is to rush over bits that are obvious to you. For example I could have explained neural networks as follows:

Machine learning is a way of making automated decisions. Neural networks use multiple layers of “neurons” to try to make these decisions better. The stacks of layers let you build up complex features that understand structural information about the data that flatter machine learning techniques will fail to pick up on.

This is true, and if the listener already understands machine learning it is probably fine, but because of the amount of elided context they will probably get a really bad idea of what is possible and assume machine learning is much more powerful than it actually is.

How can you start getting it right?

The following three rules will put you on the right path:

  1. Begin by explicitly deciding what you want people to get out of your explanation. Try to keep this as small as possible.
  2. Begin all explanations either by asking what they already understand or by considering a model recipient of the explanation.
  3. Think explicitly in terms of the abstraction stack between your end goal and where the recipient is now.

These won’t result in perfect explanations, but if you are not already doing these things then fixing that will almost certainly result in improved explanations.

Caveats

Despite the title there are a bunch of limitations to this method – not cases where it doesn’t apply, but cases where it on its own is insufficient.

Danil Suits points out on Twitter that there’s nothing in the above for helping people unlearn prior flawed understanding. That’s true. I don’t currently have any good advice for that. I think all of the above still helps in that scenario, but you need an entire separate set of skills and techniques for debugging failure to understand, and you may also need to be good at persuading people for the cases where they refuse to understand or actively disagree.

Another thing I’ve since realised is that this method assumes that they are able to understand the thing you are explaining – someone with cognitive difficulties might struggle with that, or require reframing of some of the abstractions that you used to understand the problem.

It also assumes that the recipient is motivated to understand. I struggle with this when teaching – I can generally do a good job of framing things in terms of reasons why someone should care about them, but at some point you hit a wall when someone is fundamentally uninterested in learning, and I don’t know what to do about that (part of why I’m not a teacher I guess).


What is a neural network?

A friend asked for articles to explain neural networks to business people, and I couldn’t find any articles I liked, so I decided to write one. This was originally written in my notebook, but it fits the format here better, so I’ve moved it over.


Neural networks are very hot right now, and people seem to treat them as a magic black box that can solve any problem. Unfortunately, while neural networks are a powerful and useful tool, they are much more limited than their popular representation would suggest, and I’d like to give you a bit of a sense of what they are, why they’re useful, and where that use falls down.

The following are the most important things to understand about neural networks:

  • Neural networks are an interesting implementation technique for a well studied set of problems.
  • Neural networks do not allow a machine to “think” in any general sense – they are mostly useful for implementing simple decision procedures and predictive models, but they do not do any general reasoning or planning (they may be used as components in systems that do though).
  • Neural networks are only as good as the data you give them.

What do neural networks do?

The general problem that neural networks solve is called machine learning. This is a somewhat misleading term because how machines “learn” in this sense doesn’t map that well to how humans learn. Machine learning is really just a particular type of automated statistics that can be used to make simple predictions and decisions based on similarity to previously observed data. There are many ways to do machine learning, and any given instance of it is a particular algorithm, a precise set of rules that describe how the computer “learns” in this particular problem.

Machine learning always starts with some set of data that we “train” on. In the same way that “learning” is misleading, “training” is also misleading. The analogy is that we are teaching the machine to do something by showing it lots of examples, but the reality is more that a program does statistics to determine patterns in the data that it can use to make predictions later.

Typically we present this to the machine learning algorithm as a collection of features, which are a way of breaking down the objects we want to work on into a set of numeric or categorical (i.e. one of a small number of possibilities) properties. For example:

  • We might represent an image as red-green-blue integer values for each pixel.
  • We might represent a person in terms of “age”, “country of birth”, “gender”.
  • We might represent a piece of text as a “bag of words”, which counts the number of times each word appears in it.

All decisions made by the machine learning process are based only on these features – two things that are somehow different in a way not captured by their features will still get the same result. Features are not intended to fully represent the things we study, but to capture some aspects of them that we think are important and should be sufficient to predict the results we want to know. The process of turning real world things into useful features is something of an art, and is often as important as, or more important than, the actual machine learning you do.
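
As a small illustration of how mechanical this can be, the “bag of words” feature from the list above really is just word counting. The following sketch is mine, not any particular library’s API:

    # Turn a piece of text into a "bag of words": count how often each word
    # appears. This is a toy sketch for illustration only.
    from collections import Counter

    def bag_of_words(text):
        return Counter(text.lower().split())

    bag_of_words("the cat sat on the mat")
    # Counter({'the': 2, 'cat': 1, 'sat': 1, 'on': 1, 'mat': 1})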

Once we have turned our data set into features, we train on it. This is the process of taking our data set and turning it into a model that we will use as the basis of our future decisions. Sometimes we do this with the data presented entirely up front, and sometimes the algorithm is designed so that it can learn as you go. The former is much more common, especially with neural networks, but both approaches are possible.

One important difference between machine and human learning is that machine learning tends to need a lot more data than you would expect a human to. Often we train our machine learning algorithms on hundreds of thousands, or millions, of data points, where you might expect a human to learn after only a few. It takes a lot longer for most machine learning approaches to learn what a picture of a dog looks like than it does for a toddler. Why this is the case is complicated, and how to do better is an open research problem, but the main reasons are:

  • the toddler has a lot of built in machinery about how to do image processing already, while the machine learning system has to learn that from scratch.
  • the machine learning system is much more general and can learn things that humans are bad at as easily as it can learn things that humans are good at – e.g. a toddler would struggle to do much with actuarial data that a machine learning algorithm would find easier than recognising dogs.

The need for these large data sets for training is where “big data” comes in. There are a lot of arguments as to what should count as big data, but a good rule of thumb is “if you can fit it on a single computer then it’s not big data”. Given how large computers can affordably get these days, most things that people call big data probably aren’t.

Anyway, given a set of input data, there are roughly three types of commonly studied machine learning that we can do with it:

  • Supervised learning takes some input data and some labels or scores for it, and tries to learn how to predict those labels or scores for other similar data. For example, you might want to classify an email as “spam” or “not spam”, or you might want to predict the life expectancy of someone given a set of data about their health.
  • Reinforcement learning is for making decisions based on data. Based on the outcome of the decision, you feed back into the process with either a “reward” or a “punishment” (I want to emphasise again that this is a metaphor and the algorithm is not in any meaningful sense thinking or able to experience pleasure or pain), and the algorithm adjusts its decision making process to favour decisions that are similar to previous ones that have worked well and different from previous ones that have worked badly. e.g. you might use this for stock picking, and reward the algorithm for picking stocks that do well and punish it for picking stocks that do badly.
  • Unsupervised learning builds a model of the “shape” of the data. I won’t talk about this too much, but if you’ve seen those deep dream pictures or “I trained a bot on this corpus of text and this is what I got” articles, that is usually what’s going on here (although the much more common case, particularly with the articles, is that someone has just made it up and no machine learning was involved at all): The system has built a predictive model of the data and is used to randomly generate something that matches that model.

Almost all applications of machine learning you hear about are one of these three things, possibly with some additional logic layered on top. For example, AlphaGo, Google’s Go-playing AI, is roughly a combination of supervised and unsupervised learning with a rules-based system that describes the rules of Go and uses the output of the machine learning to choose moves.

Neural networks are generally not used for reinforcement learning. I’m unclear on to what degree this is due to intrinsic limitations and to what degree it’s just not well supported in the tooling, but the way we usually build neural networks requires a large batch training process, so applications of neural networks will generally be supervised or unsupervised learning.

There are a very large number of different approaches you can take to these problems, of which neural networks are only one. An easy-to-understand and classic approach to machine learning is the use of decision trees: Simple rules of the form “If (this feature has this range of values) then (do this) else (do this other thing)”. These are very easy to learn – you pick some feature that splits the data set well, break the data set into two parts based on that, and then try again on the smaller parts. If nothing works particularly well, you add a rule that says “Predict (most common outcome)” (I am eliding a lot of details here that don’t matter to you unless you want to actually implement these ideas).
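
To make that concrete, the rules a decision tree learns end up looking something like the following. The features and thresholds here are made up purely for illustration:

    # What a small learned decision tree amounts to: nested if/else rules over
    # features. These particular features and cut-offs are hypothetical.
    def predict_spam(email):
        if email["num_links"] > 5:
            return "spam"
        elif email["sender_in_contacts"]:
            return "not spam"
        else:
            return "spam" if email["all_caps_subject"] else "not spam"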

One classic limitation that machine learning struggles with is learning patterns that require understanding complex aspects of the data that are not easily understood from a small number of features. If you imagine image processing, to the computer a small image is just a list of a couple of hundred thousand numbers. A human would struggle to get anything useful from that too!

Neural networks attempt to overcome this by learning in layers. Each layer consists of something that looks a bit like unsupervised learning and a bit like supervised learning, where each layer “learns” to extract high level features from the one below. In the image case you start with the bottom layer that looks at the raw pixel data, and then it might e.g. try to identify patterns of lines in that data. Each layer is essentially a new set of features that takes the overly detailed previous layer’s features and tries to turn them into a representation that it can more easily work with. This allows us to build a very simple learning technique (the “neurons”, which are really just a very simple bit of machine learning that tries to turn a set of features into a score between zero and one) up into a much more complex one that is able to handle high level features of the data. This is where the “deep” in “deep learning” comes from – a deep neural network is one with many layers.
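
To give a rough sense of how simple each individual piece is, here is a sketch of one “neuron” and one layer using numpy. This illustrates the general idea rather than how any particular framework actually implements it:

    # One "neuron": weight its input features, add a bias, and squash the
    # result into a score between zero and one with a sigmoid.
    import numpy as np

    def neuron(features, weights, bias):
        return 1.0 / (1.0 + np.exp(-(np.dot(weights, features) + bias)))

    # A layer is many neurons reading the same inputs; a deep network feeds
    # each layer's outputs in as the next layer's features.
    def layer(features, weight_matrix, biases):
        return 1.0 / (1.0 + np.exp(-(weight_matrix @ features + biases)))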

The result of this is that where a “shallower” machine learning system might suffer from a problem of not being able to see the wood for the trees, neural networks can make predictions based on a more structured representation of the data, which makes things obvious that were hard for the algorithm to see in the more fine-grained representation. Sometimes these structured representations will be ones that are obvious to humans (e.g. lines in images), but often they will not be (e.g. they might capture some strategic aspect of the Go board).

Why are neural networks cool right now?

Neural networks are not at all new technology, but are recently seeing a revival for a number of reasons:

  • We have a lot more data now, which allows us to paper over any limitations by just training it more.
  • We have much faster computers now, particularly as a lot of neural network training can be made highly parallel (that means that you can break it up into small chunks that can be run at the same time without waiting for each other, then combine the results).
  • We have made a number of incremental improvements to the algorithms that allow us to speed up our training and improve the quality of results.

This has allowed us to apply them to problems where it was previously infeasible, which is the main source of all of the current progress in this field.

What is going to go wrong when I use neural networks?

This is important to understand, because things will go wrong. The main things that it is important to understand about this are:

  1. It is much more important to have good input data than good algorithms, and gathering good input data is expensive. Machine learning is only as good as its training, and if your input data is biased or missing important examples, your machine learning will perform badly. e.g. A classic failure mode here is that if you train image recognition only on white people, it will often fail to see people of colour.
  2. Compared to a human decision maker, neural networks are fragile. There is an entire class of things called “adversarial examples” where carefully chosen trivial changes that a human wouldn’t even notice can cause the neural network to output an entirely different answer.
  3. Even without adversaries, machine learning algorithms will make stupid goofs that are obvious to a human decision maker. They will do this even if they are on average better than a human decision maker. This is simply because different things are obvious to machines and humans. Depending on how visible these decisions are, this will probably make you look silly when it happens.

When should I use neural networks?

First off, you should not be making a decision about whether to use neural networks if this article is teaching you new things. You should be making a decision about whether to use machine learning. Leave the decision of what type you should be using to an expert – there is a good chance they will choose neural networks (tech people are very trend driven), but there might be something simpler and better suited to your problem.

Roughly the following rules of thumb should help you decide whether to use machine learning:

  1. Is this a problem that would benefit from being able to make lots of fairly constrained decisions very fast? If no, then maybe talk to an expert (some problems don’t look like this on the face of it but can still benefit – e.g. translation isn’t obviously of this form, but it can benefit from machine learning), but you’re probably out of luck and even if you’re not this is going to be a huge and expensive research project.
  2. Could you just write down a short list of rules that are sufficient to make those decisions? If yes, just do that instead of trying to learn them from data.
  3. If you had an averagely intelligent human with maybe a couple of days of on the job training and enough background knowledge to understand the problem making those decisions, would their answers be good enough? If no, you’re really going to struggle to get your machine learning to be good enough.
  4. Think through the failure modes of the previous section. Do you have a plan in place to avoid them, or to mitigate them when they occur? Is the cost of that plan worth the benefits of using machine learning?
  5. Reflect on the following question: When someone in the media says “Why did you use a machine rather than a person here?” and your answer is “It was cheaper”, how is it going to play out and are you happy with that outcome? If no, your problem is probably politically a poor fit for machine learning.

If your problem passed all of those tests, it might well be amenable to machine learning. Go talk to an expert about it.


My Position on Functional Programming

Note: I’m going to be lazy and often use “functional programming” to mean “statically typed functional programming”. Pretend I’m saying “algebraic programming” if that makes you feel better.

I seem to get cited a lot as an example of someone who has “abandoned” functional programming. I thought it would be useful to have an explanation in one place that people can just link to rather than having to explain my position each time the subject comes up.

I do not intend this piece to convince anyone of anything, it’s just documentation of who I am in this context and where I stand.

Background

I learned to program in Standard ML during my pure mathematics degree, but not to any very high standard – I understood the language tolerably well, but didn’t really know much (if anything) about software development. When I got out of university, I got a job as a Java developer. I was quickly frustrated by the language and in my spare time also learned Haskell and Scala.

I have never shipped production Haskell code. I have shipped some production Scala code, and some of my code was included in the Scala standard library and compiler (but not a lot, and it wasn’t very good).

I have done a fair bit of writing about both Haskell and Scala, and people seem to have found that writing useful. In addition, I was at one point a fairly prominent (read: loud) member of the Scala community, and for better or for worse had some influence on the development of the language. CanBuildFrom was indirectly my fault (I proposed it as a thought experiment, and argued against its actual inclusion).

I would describe myself as tolerably competent at the details of both languages. I’m rather rusty when it comes to writing them, but I can do it. Due to my relatively restricted experience I do not have much big picture knowledge of software development in them.

I formally decided to stop writing Scala in 2009 (I think). Since then most of my work has been in Python or Ruby, with a little bit of C, a little bit of Rust, and the occasional dabbling in other languages.

What I think of FP

Short version: Fine but over-hyped. Not worth dealing with the community.

You can probably infer the rest of what I’m going to say, but if you can’t then the long version now follows.

There are a lot of claims of statically typed functional programming’s obvious superiority, up to and including claims that people who don’t use it are “insane” or “unethical”. I think people making these claims are bad and they should feel bad. I refer you to Hillel for more details.

Broadly speaking, I like having a type checker. I happen to mostly be writing dynamically typed languages at the moment, but that’s mostly coincidence. One of the reasons I’d like to be writing more Rust (despite being a bit lukewarm on the language in many ways) is that I like having the type checker. I just think they’re oversold, and the evidence that they are in any way essential is non-existent to unconvincing.

I’m not a big fan of Haskell’s insistence on purity, or Scala’s design in general. I think both languages are large and complex enough that any decision to use them or not doesn’t come down to any one thing about them (this is true of essentially every mature programming language. It almost can’t be otherwise), so a rejection of them is not a rejection of FP per se. I keep meaning to get around to giving OCaml another try – Rust is nodding in the right direction, but isn’t quite the right sort of thing for scratching this itch.

The big problem I have with FP is the community. That’s not to say that only bad people do FP – this is absolutely not the case. The overwhelming majority of people I know who do FP are nice, kind, and interesting people. Similarly anyone telling you that FP is for academics and elitists is full of it – I know many people who are into FP for extremely practical reasons.

The problem is that a nice community is not one that contains nice people, it’s one where your interactions with the community are nice. If the community is almost entirely nice people but there is one jerk who you are guaranteed to interact with, then it is not actually a nice community.

In this sense, the FP community is awful, and I have absolutely no patience for it or inclination to deal with it. There is a relatively small number of you who are ruining it for others (or at least for me, and people within the community I know accept that they are problems and do not know what to do about it), and the community seems unable to self-police on this front.

What should you do with this information?

Link to this blog post rather than @-ing me into conversations about this on Twitter. Otherwise, probably nothing.


More blogging over there

I recently created a notebook site and a lot of my recent writing has been happening over there.

It’s got a bit of a different character to this blog, and I do intend to return to better long-form blogging over here, but I’ve been a bit rushed off my feet by a variety of things in the last month so I’ve not had a huge amount of time to work on more “proper” blog posts.

In the long run I suspect I will be merging this blog into the notebook in some manner – I’m finding it a lot more pleasant to write in than WordPress by far – but for now I will attempt to balance the two.


The Can Opener Principle

Context: What’s that? I have a paper deadline in 28 hours? Why don’t I write about literally anything else!

Epistemic Status: It’s a normative theory.


A thing I think about a lot because my brain is completely normal and as a result I spend a lot of time thinking about completely normal things is the role of consent in the ethics of explanation.

The following is a line from the Hypothesis code of conduct:

It’s OK not to want to know something. If you think someone’s question is fundamentally flawed, you should still ask permission before explaining what they should actually be asking.

The Hypothesis Code of Conduct

This is a response to The XY Problem, which is a legitimate problem but is often used as an excuse for explaining to someone why their question is bad and they should feel bad. I do think there are cases where non-consensual explanation may be ethically justified, or even mandatory, but they are generally things of the form “You are causing harm by your ignorance and I need you to understand this”.

That’s not to say that in normal interactions you need to bend over backwards to make sure that people are consenting to every word you say to them, but it’s worth checking in with someone before you launch into a significant attempt to educate them.

A thing I realised recently is that it’s not just consent that’s important, it’s informed consent. If you are explaining things to someone, then hopefully you understand it better than they do, which means that you have a burden of responsibility to help guard against infohazards.

This was prompted by a conversation in which a friend who had previously mostly written Ruby was asking me about Python’s class system. We got a while down the rabbit hole before I realised how bad this was going to get and said the following:

I am happy to give you as much information on this subject as you want me to, but my considered advice is that my life has not been made better by the fact that I understand this and neither will yours be.

I think the following is a nice framing of this sort of interaction:

Do not hand a can opener to someone until you are sure that they understand that the can they are holding is full of worms

https://twitter.com/DRMacIver/status/1032882236918517761

It’s explanation as gentle knowledge editing rather than explanation as gatekeeping: If they do understand that they’re holding a can of worms (maybe it’s for their garden?), handing them the can opener is fine, but as the current holder of the can opener it is your responsibility to make sure they know what they’re letting themselves in for.
