
What is a neural network?

A friend asked for articles to explain neural networks to business people, and I couldn’t find any articles I liked, so I decided to write one. This was originally written in my notebook, but it fits the format here better, so I’ve moved it over.


Neural networks are very hot right now, and people seem to treat them as a magic black box that can solve any problem. Unfortunately, while neural networks are a powerful and useful tool, they are much more limited than their popular representation would suggest, and I’d like to give you a bit of a sense of what they are, why they’re useful, and where that use falls down.

The following are the most important things to understand about neural networks:

  • Neural networks are an interesting implementation technique for a well studied set of problems.
  • Neural networks do not allow a machine to “think” in any general sense – they are mostly useful for implementing simple decision procedures and predictive models, but they do not do any general reasoning or planning (they may be used as components in systems that do though).
  • Neural networks are only as good as the data you give them.

What do neural networks do?

The general problem that neural networks solve is called machine learning. This is a somewhat misleading term because how machines “learn” in this sense doesn’t map that well to how humans learn. Machine learning is really just a particular type of automated statistics that can be used to make simple predictions and decisions based on similarity to previously observed data. There are many ways to do machine learning, and any given instance of it is a particular algorithm, a precise set of rules that describe how the computer “learns” in this particular problem.

Machine learning always starts with some set of data that we “train” on. In the same way that “learning” is misleading, “training” is also misleading. The analogy is that we are teaching the machine to do something by showing it lots of examples, but the reality is more that a program does statistics to find patterns in the data that it can use to make predictions later.

Typically we present this to the machine learning algorithm as a collection of features, which are a way of breaking down the objects we want to work on into a set of numeric or categorical (i.e. one of a small number of possibilities) properties. For example:

  • We might represent an image as red-green-blue integer values for each pixel.
  • We might represent a person in terms of “age”, “country of birth”, “gender”.
  • We might represent a piece of text as a “bag of words”, which counts the number of times each word appears in it.

All decisions made by the machine learning process are based only on these features – two things that differ in some way not captured by their features will still get the same result. Features are not intended to fully represent the things we study, but to capture some aspects of them that we think are important and should be sufficient to predict the results we want to know. The process of turning real world things into useful features is something of an art, and is often at least as important as the actual machine learning you do.
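To make this concrete, here is a minimal sketch in Python of the “bag of words” representation from the list above. The example sentence is made up, and a real system would also do things like stripping punctuation and handling rare words:

    from collections import Counter

    def bag_of_words(text):
        # Lower-case the text, split it into words, and count how often each
        # word appears. The counts are the features: word order is thrown away.
        return Counter(text.lower().split())

    features = bag_of_words("Send money now and win money")
    print(features)  # Counter({'money': 2, 'send': 1, 'now': 1, 'and': 1, 'win': 1})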

Once we have turned our data set into features, we train on it. This is the process of taking our data set and turning it into a model that we will use as the basis of our future decisions. Sometimes we do this with the data presented entirely up front, and sometimes the algorithm is designed so that it can learn as you go. The former is much more common, especially with neural networks, but both approaches are possible.

One important difference between machine and human learning is that machine learning tends to need a lot more data than you would expect a human to. Often we train our machine learning algorithms on hundreds of thousands, or millions, of data points, where you might expect a human to learn from only a few. It takes a lot longer for most machine learning approaches to learn what a picture of a dog looks like than it does for a toddler. Why this is the case is complicated, and how to do better is an open research problem, but the main reasons are:

  • the toddler has a lot of built-in machinery for doing image processing already, while the machine learning system has to learn that from scratch.
  • the machine learning system is much more general and can learn things that humans are bad at as easily as it can learn things that humans are good at – e.g. a toddler would struggle to do much with actuarial data that a machine learning algorithm would find easier than recognising dogs.

The need for these large data sets for training is where “big data” comes in. There are a lot of arguments as to what should count as big data, but a good rule of thumb is “if you can fit it on a single computer then it’s not big data”. Given how large computers can affordably get these days, most things that people call big data probably aren’t.

Anyway, given a set of input data, there are roughly three types of commonly studied machine learning that we can do with it:

  • Supervised learning takes some input data and some labels or scores for it, and tries to learn how to predict those labels or scores for other similar data. For example, you might want to classify an email as “spam” or “not spam”, or you might want to predict the life expectancy of someone given a set of data about their health.
  • Reinforcement learning is for making decisions based on data. Based on the outcome of the decision, you feed back into the process with either a “reward” or a “punishment” (I want to emphasise again that this is a metaphor and the algorithm is not in any meaningful sense thinking or able to experience pleasure or pain), and the algorithm adjusts its decision making process to favour decisions that are similar to previous ones that have worked well and different from previous ones that have worked badly. e.g. you might use this for stock picking, and reward the algorithm for picking stocks that do well and punish it for picking stocks that do badly.
  • Unsupervised learning builds a model of the “shape” of the data. I won’t talk about this too much, but if you’ve seen those deep dream pictures or “I trained a bot on this corpus of text and this is what I got” articles, that is usually what’s going on here (although the much more common case, particularly with the articles, is that someone has just made it up and no machine learning was involved at all): The system has built a predictive model of the data, which is then used to randomly generate something that matches that model.

Almost all applications of machine learning you hear about are one of these three things, possibly with some additional logic layered on top. For example, AlphaGo, Google’s Go-playing AI, is roughly a combination of supervised and unsupervised learning with a rules-based system that describes the rules of Go and uses the output of the machine learning to choose moves.
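To give a sense of what the supervised case looks like in code, here is a minimal sketch using the scikit-learn library on a made-up, absurdly small spam example (a real system would need many thousands of emails):

    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.naive_bayes import MultinomialNB

    # A tiny, made-up training set: some emails and a label for each one.
    emails = [
        "win a free prize now",
        "cheap pills limited offer",
        "meeting moved to tuesday",
        "please review the attached report",
    ]
    labels = ["spam", "spam", "not spam", "not spam"]

    # Turn each email into bag-of-words features, then fit a simple classifier.
    vectorizer = CountVectorizer()
    model = MultinomialNB().fit(vectorizer.fit_transform(emails), labels)

    # Predict a label for an email the model has never seen before.
    print(model.predict(vectorizer.transform(["claim your free prize"])))  # ['spam']

The classifier here is not a neural network – it’s a much simpler technique called naive Bayes – but the supervised workflow of features, training, and prediction is the same regardless of which algorithm you plug in.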

Neural networks are generally not used for reinforcement learning. I’m unclear to what degree this is due to intrinsic limitations and to what degree it’s just that the tooling doesn’t support it well and that the way we usually build neural networks requires a large batch training process. Either way, applications of neural networks will generally be supervised or unsupervised learning.

There are a very large number of different approaches you can take to these problems, of which neural networks are only one. An easy to understand and classic approach to machine learning is the use of decision trees: Simple rules of the form “If (this feature has this range of values) then (do this) else (do this other thing)”. These are very easy to learn – you pick some feature that splits the data set well, break the data set into two parts based on that, and then try again on the smaller parts. If nothing works particularly well, you add a rule that says “Predict (most common outcome)” (I am eliding a lot of details here that don’t matter to you unless you want to actually implement these ideas).
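If you’re curious what that looks like in practice, here is a rough Python sketch of the idea. This is a toy illustration only – I’m still eliding the details of how real implementations score splits and decide when to stop:

    from collections import Counter

    def learn_tree(rows, labels, depth=0, max_depth=3):
        # Each row is a dict of numeric features, e.g. {"age": 35, "income": 60}.
        most_common = Counter(labels).most_common(1)[0][0]
        if depth == max_depth or len(set(labels)) == 1:
            # Either we've gone deep enough or every example has the same label,
            # so just predict the most common outcome.
            return {"predict": most_common}

        # Try every (feature, value) split and keep the one whose two halves are
        # "purest". Real implementations use measures like information gain.
        best = None
        for feature in rows[0]:
            for value in sorted({row[feature] for row in rows}):
                left = [l for r, l in zip(rows, labels) if r[feature] <= value]
                right = [l for r, l in zip(rows, labels) if r[feature] > value]
                if not left or not right:
                    continue
                purity = (Counter(left).most_common(1)[0][1] +
                          Counter(right).most_common(1)[0][1]) / len(labels)
                if best is None or purity > best[0]:
                    best = (purity, feature, value)

        if best is None:
            return {"predict": most_common}

        _, feature, value = best
        left = [(r, l) for r, l in zip(rows, labels) if r[feature] <= value]
        right = [(r, l) for r, l in zip(rows, labels) if r[feature] > value]
        return {
            "feature": feature,
            "value": value,
            "left": learn_tree([r for r, _ in left], [l for _, l in left],
                               depth + 1, max_depth),
            "right": learn_tree([r for r, _ in right], [l for _, l in right],
                                depth + 1, max_depth),
        }

Real decision tree learners add pruning, better split measures, and handling for categorical and missing data, but the core of the idea really is this small.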

One classic problem that machine learning struggles with is learning patterns that require understanding complex aspects of the data that are not easily captured by a small number of features. If you imagine image processing, to the computer even a small image is just a list of a couple of hundred thousand numbers. A human would struggle to get anything useful from that too!

Neural networks attempt to overcome this by learning in layers. Each layer consists of something that looks a bit like unsupervised learning and a bit like supervised learning, where each layer “learns” to extract high level features from the one below. In the image case you start with the bottom layer that looks at the raw pixel data, and then it might e.g. try to identify patterns of lines in that data. Each layer is essentially a new set of features that takes the overly detailed previous layer’s features and tries to turn them into a representation that it can more easily work with. This allows us to combine many copies of a very simple learning technique (the “neurons”, which are really just a very simple bit of machine learning that tries to turn a set of features into a score between zero and one) into a much more complex one that is able to handle high level features of the data. This is where the “deep” in “deep learning” comes from – a deep neural network is one with many layers.
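To show just how simple each individual “neuron” is, here is a minimal Python sketch. The weights below are made up – in a real network, training is the process of nudging millions of such numbers until the scores become useful:

    import math

    def neuron(inputs, weights, bias):
        # A weighted sum of the inputs, squashed into a score between 0 and 1.
        total = sum(x * w for x, w in zip(inputs, weights)) + bias
        return 1 / (1 + math.exp(-total))

    def layer(inputs, neurons):
        # A layer is just many neurons looking at the same inputs; their outputs
        # become the (hopefully higher level) features for the next layer up.
        return [neuron(inputs, weights, bias) for weights, bias in neurons]

    # Two made-up neurons looking at three input features.
    print(layer([0.2, 0.7, 0.1], [([0.5, -1.0, 2.0], 0.1), ([1.5, 0.3, -0.2], -0.4)]))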

The result of this is that where a “shallower” machine learning system might suffer from not being able to see the wood for the trees, neural networks can make predictions based on a more structured representation of the data, which makes things obvious that were hard for the algorithm to see in the more fine-grained representation. Sometimes these structured representations will be ones that are obvious to humans (e.g. lines in images), but often they will not be (e.g. they capture some strategic aspect of the Go board).

Why are neural networks cool right now?

Neural networks are not at all new technology, but are recently seeing a revival for a number of reasons:

  • We have a lot more data now, which allows us to paper over many limitations by simply training on more of it.
  • We have much faster computers now, particularly as a lot of neural network training can be made highly parallel (that means that you can break it up into small chunks that can be run at the same time without waiting for each other, then combine the results).
  • We have made a number of incremental improvements to the algorithms that allow us to speed up our training and improve the quality of results.

This has allowed us to apply them to problems where it was previously infeasible, which is the main source of all of the current progress in this field.

What is going to go wrong when I use neural networks?

This is important to understand, because things will go wrong. The most important points are:

  1. It is much more important to have good input data than good algorithms, and gathering good input data is expensive. Machine learning is only as good as its training, and if your input data is biased or missing important examples, your machine learning will perform badly. e.g. A classic failure mode is that if you train an image recognition system only on pictures of white people, it will often fail to see people of colour.
  2. Compared to a human decision maker, neural networks are fragile. There is an entire class of things called “adversarial examples” where carefully chosen trivial changes that a human wouldn’t even notice can cause the neural network to output an entirely different answer.
  3. Even without adversaries, a machine learning algorithm will make stupid goofs that are obvious to a human decision maker. It will do this even if it is on average better than a human decision maker, simply because different things are obvious to machines and to humans. Depending on how visible these decisions are, this will probably make you look silly when it happens.

When should I use neural networks?

First off, you should not be making a decision about whether to use neural networks if this article is teaching you new things. You should be making a decision about whether to use machine learning. Leave the decision of what type to use to an expert – there is a good chance they will choose neural networks (tech people are very trend driven), but there might be something simpler and better suited to your problem.

Roughly the following rules of thumb should help you decide whether to use machine learning:

  1. Is this a problem that would benefit from being able to make lots of fairly constrained decisions very fast? If no, then maybe talk to an expert (some problems don’t look like this on the face of it but can still benefit – e.g. translation isn’t obviously of this form, but it can benefit from machine learning), but you’re probably out of luck and even if you’re not this is going to be a huge and expensive research project.
  2. Could you just write down a short list of rules that are sufficient to make those decisions? If yes, just do that instead of trying to learn them from data.
  3. If you had an averagely intelligent human with maybe a couple of days of on the job training and enough background knowledge to understand the problem making those decisions, would their answers be good enough? If no, you’re really going to struggle to get your machine learning to be good enough.
  4. Think through the failure modes of the previous section. Do you have a plan in place to avoid them, or to mitigate them when they occur? Is the cost of that plan worth the benefits of using machine learning?
  5. Reflect on the following question: When someone in the media says “Why did you use a machine rather than a person here?” and your answer is “It was cheaper”, how is it going to play out and are you happy with that outcome? If no, your problem is probably politically a poor fit for machine learning.

If your problem passed all of those tests, it might well be amenable to machine learning. Go talk to an expert about it.


My Position on Functional Programming

Note: I’m going to be lazy and often use “functional programming” to mean “statically typed functional programming”. Pretend I’m saying “algebraic programming” if that makes you feel better.

I seem to get cited a lot as an example of someone who has “abandoned” functional programming. I thought it would be useful to have an explanation in one place that people can just link to rather than having to explain my position each time the subject comes up.

I do not intend this piece to convince anyone of anything; it’s just documentation of who I am in this context and where I stand.

Background

I learned to program in Standard ML during my pure mathematics degree, but not to any very high standard – I understood the language tolerably well, but didn’t really know much (if anything) about software development. When I got out of university, I got a job as a Java developer. I was quickly frustrated by the language and in my spare time also learned Haskell and Scala.

I have never shipped production Haskell code. I have shipped some production Scala code, and some of my code was included in the Scala standard library and compiler (but not a lot, and it wasn’t very good).

I have done a fair bit of writing about both Haskell and Scala, and people seem to have found that writing useful. In addition, I was at one point a fairly prominent (read: loud) member of the Scala community, and for better or for worse had some influence on the development of the language. CanBuildFrom was indirectly my fault (I proposed it as a thought experiment, and argued against its actual inclusion).

I would describe myself as tolerably competent at the details of both languages. I’m rather rusty when it comes to writing them, but I can do it. Due to my relatively restricted experience I do not have much big picture knowledge of software development in them.

I formally decided to stop writing Scala in 2009 (I think). Since then most of my work has been in Python or Ruby, with a little bit of C, a little bit of Rust, and the occasional dabbling in other languages.

What I think of FP

Short version: Fine but over-hyped. Not worth dealing with the community.

You can probably infer the rest of what I’m going to say, but if you can’t then the long version now follows.

There are a lot of claims of statically typed functional programming’s obvious superiority, up to and including claims that people who don’t use it are “insane” or “unethical”. I think people making these claims are bad and they should feel bad. I refer you to Hillel for more details.

Broadly speaking, I like having a type checker. I happen to mostly be writing dynamically typed languages at the moment, but that’s mostly coincidence. One of the reasons I’d like to be writing more Rust (despite being a bit lukewarm on the language in many ways) is that I like having the type checker. I just think type checkers are oversold, and the evidence that they are in any way essential ranges from non-existent to unconvincing.

I’m not a big fan of Haskell’s insistence on purity, or Scala’s design in general. I think both languages are large and complex enough that any decision to use them or not doesn’t come down to any one thing about them (this is true of essentially every mature programming language. It almost can’t be otherwise), so a rejection of them is not a rejection of FP per se. I keep meaning to get around to giving OCaml another try – Rust is nodding in the right direction, but isn’t quite the right sort of thing for scratching this itch.

The big problem I have with FP is the community. That’s not to say that only bad people do FP – this is absolutely not the case. The overwhelming majority of people I know who do FP are nice, kind, and interesting people. Similarly anyone telling you that FP is for academics and elitists is full of it – I know many people who are into FP for extremely practical reasons.

The problem is that a nice community is not one that contains nice people, it’s one where your interactions with the community are nice. If the community is almost entirely nice people but there is one jerk who you are guaranteed to interact with, then it is not actually a nice community.

In this sense, the FP community is awful, and I have absolutely no patience for it or inclination to deal with it. There is a relatively small number of you who are ruining it for others (or at least for me, and people within the community I know accept that they are problems and do not know what to do about it), and the community seems unable to self-police on this front.

What should you do with this information?

Link to this blog post rather than @-ing me into conversations about this on Twitter. Otherwise, probably nothing.


More blogging over there

I recently created a notebook site and a lot of my recent writing has been happening over there.

It’s got a bit of a different character to this blog, and I do intend to return to better long-form blogging over here, but I’ve been a bit rushed off my feet by a variety of things in the last month so I’ve not had a huge amount of time to work on more “proper” blog posts.

In the long run I suspect I will be merging this blog into the notebook in some manner – I’m finding it far more pleasant to write in than WordPress – but for now I will attempt to balance the two.


The Can Opener Principle

Context: What’s that? I have a paper deadline in 28 hours? Why don’t I write about literally anything else!

Epistemic Status: It’s a normative theory.


A thing I think about a lot (because my brain is completely normal, and as a result I spend a lot of time thinking about completely normal things) is the role of consent in the ethics of explanation.

The following is a line from the Hypothesis code of conduct:

It’s OK not to want to know something. If you think someone’s question is fundamentally flawed, you should still ask permission before explaining what they should actually be asking.

The Hypothesis Code of Conduct

This is a response to The XY Problem, which is a legitimate problem but often used as an excuse for explaining to someone why their question is bad and they should feel bad. I do think there are cases where non-consensual explanation may be ethically justified, or even mandatory, but they are generally things of the form “You are causing harm by your ignorance and I need you to understand this”.

That’s not to say that in normal interactions you need to bend over backwards to make sure that people are consenting to every word you say to them, but it’s worth checking in with someone before you launch into a significant attempt to educate them.

A thing I realised recently is that it’s not just consent that’s important, it’s informed consent. If you are explaining things to someone, then hopefully you understand them better than they do, which means that you have a burden of responsibility to help guard against infohazards.

This was prompted by a conversation in which a friend who had previously mostly written Ruby was asking me about Python’s class system. We got a while down the rabbit hole before I realised how bad this was going to get and said the following:

I am happy to give you as much information on this subject as you want me to, but my considered advice is that my life has not been made better by the fact that I understand this and neither will yours be.

I think the following is a nice framing of this sort of interaction:

Do not hand a can opener to someone until you are sure that they understand that the can they are holding is full of worms

https://twitter.com/DRMacIver/status/1032882236918517761

It’s explanation as gentle knowledge editing rather than explanation as gatekeeping: If they do understand that they’re holding a can of worms (maybe it’s for their garden?), handing them the can opener is fine, but as the current holder of the can opener it is your responsibility to make sure they know what they’re letting themselves in for.


Research Is A Lot Like Sex

Context: Facetious response to John Regehr’s generally excellent “Why Research Isn’t Like Sex”. I agree with all of the actual advice in that article, but I have a paper deadline tomorrow so I would literally rather do anything but work on the paper and my sense of humour is in a weird place.

Attention Conservation Notice: Surprisingly SFW.


Nobody really agrees on the definition, but they’ve got really strong opinions about what counts, and on what the right way to do it is.

You probably shouldn’t take the version of it you see on TV too seriously.

Nobody is good at it the first time.

It’s usually a bad sign when someone tells you they’re great at it.

When it’s going well it’s amazing, when it’s going badly it’s awful.

It’s a collaborative activity, and works much better with a good partner.

The best bits are spontaneous, but it takes a lot of work and cooperation to get to that point.

Not everybody is into it and that’s OK.

Those who are into it are into all sorts of different types anyway, and do it in a huge variety of different ways.

It’s very important that you get consent from all the participants.

Most of us worry that everybody else is doing a lot more of it than we are.

Sometimes it’s useful, but that’s mostly not why we do it. Either way, it’s a bad sign if you’re not having fun.

If the people you’re doing it with are laughing, you’ve either done something very wrong or very right.
