Weekly Reading Post #6 (Bi-Weekly Edition)

There was very little to report last week, so I ended up skipping this post for a week.

Random Links

Selected Links

  • Explaining and Harnessing Adversarial Examples – a neat paper about the fragility of neural network models when given things that are just slightly off the typical distribution.
  • E-Prime language – a linguistic style in which one avoids statements of the form “X is Y” in preference to statements of the form “X does Z”. This was sent to me in response to my pointing out that arguments about the former tend to be very frustrating and largely go away if you focus on the latter.
  • How to Grow a Weetabix – an interesting breakdown of the effect of a possible exit from the EU on the British farming industry and landscape, along with a lot of interesting related information about British farming subsidies and their ecological focus.


I’ve been reading Thinking, Fast and Slow this week. It’s come very highly recommended, but to be honest I’m not very convinced by it. Part of the problem is that I distrust a lot of the underlying research.

Elements of Information Theory arrived as a gift from Zack M. Davis. Thanks, Zack!

This entry was posted in Weekly Reading Posts.

Windows Progress Report

Well it’s been about a month since I switched to Windows, so I thought I’d mention how it’s going.

Advance warning: This is just a braindump of some thoughts and is not particularly coherent.

It’s going… OK. One of the interesting things about switching from a niche to a mainstream implementation of the same thing is that you find out that a lot of the things that you thought didn’t work correctly because of your weird life choices actually just don’t work correctly. In particular, the following work less well now that I’ve switched to Windows:

  1. WiFi
  2. Suspend
  3. Copy and Paste

As you can imagine, I have been somewhat frustrated to discover this.

My original plan of trying to use Windows purely as a host and running a desktop environment in a VM didn’t work very well, for a variety of reasons, so instead I’ve been running local VMs using Vagrant setups for each project. I’m tempted to try the original plan again though, as the alternatives are also annoying me.

To be honest, I’ve had a long-standing prejudice against the local use of VMs and this has mostly reinforced it. They work pretty well on the server where someone has already done the hard work of management and you don’t have to care about the interaction with the host OS at all, but locally they’re just pissing me off. In particular:

  1. Hyper-V looks nice but doesn’t support shared folders
  2. Virtualbox claims to support shared folders, but the reality is that sharing between a Linux guest and a Windows host doesn’t even slightly work unless you have a definition of “work” that involves routine data corruption.
  3. Virtualbox also has completely unusable graphics performance
  4. VMWare… mostly works. Though you will have to fork out a significant amount of money for this option (both for VMware itself and Vagrant if you end up wanting to use it on VMWare).
  5. About the only stable way of SSHing into a local VM is to use Vagrant. Installing iTunes so that Windows understands zeroconf might work, but I didn’t really feel like trying it.

My current development environment involves vagranting all the things, using the Windows command prompt (it’s surprisingly OK in Windows 10) and gvim. I was trying to use IntelliJ for a while, but I found it pissed me off too much in too many different ways.

One of the most annoying things in general so far has been getting tools to understand that yes, I really do want Unix line endings. No, I do not want any of this carriage return nonsense in my files. Even tools that should know better seem determined to try to guess, and do the wrong thing as a result.

One of the most pleasant surprises is that actually Windows 10 window management is very good. It’s more or less a tiling window manager with good keyboard controls.

Overall: I hope I can install Linux in a working manner on this laptop again in the near future. It’s been nice to have the non-development stuff Just Work to a slightly higher degree, but to be honest I’ve had more random breakage on Windows than I did running Linux on a well supported set of hardware. That being said, this hasn’t been awful and has merely been slightly annoying. It’s certainly a viable way of working.

This entry was posted in Uncategorized.

Weekly reading post #5

This is the weekly reading post.

Random Links

These are links to things that I was reading when I answered a tagtime ping with nonfiction.

Selected Links


I’ve not done a huge amount of actual book reading this week, but I spent a little bit of time on Combinatorial Optimization, How To Read A Book, and Switch: How to change when change is hard. See last week for opinions on the first two.

Switch is interesting. There’s a lot I’m unconvinced by / disagree with in it, but I’m definitely thinking about the material on organisational change in the context of Making Work Better, and it will probably influence my suggested approach for the better. More on it when I’ve devoted a bit more time to it.

No wishlist arrivals this week.

This entry was posted in Weekly Reading Posts.

L* Search and Inference of Deterministic Finite Automata

This post follows on from my previous one about the Myhill–Nerode theorem. You should read that one first.

When trying to construct a deterministic finite automaton for a language in practice, one thing we need to know is how it’s represented. If all we have is a yes/no predicate, the problem is in general intractable (it could be any set of strings at all!). If all we have is a set of examples and non-examples and we want to construct a minimal deterministic finite automaton (DFA), the problem is NP hard.

This post is about what we can do if we augment the yes/no predicate only slightly: we have what is called a minimally adequate teacher. It can answer two types of questions about our language \(L\):

  1. Is this string in the language?
  2. Given this DFA, does it correctly recognise the language and if not what is an example of a string where it gives the wrong answer?
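In code terms, a minimally adequate teacher is just these two queries. Here is a toy sketch in Python (all names are illustrative, not from the paper; answering equivalence queries exactly is the hard part, so this toy teacher brute-forces strings up to a length cutoff, which only works when counterexamples are short):

```python
from itertools import product

class Teacher:
    """Toy minimally adequate teacher. `in_language` is any predicate
    defining L; equivalence queries are answered by brute-force search
    up to `max_length`, which only works for small examples."""

    def __init__(self, in_language, alphabet, max_length=8):
        self.in_language = in_language
        self.alphabet = alphabet
        self.max_length = max_length

    def member(self, s):
        """Membership query: is the string s in the language?"""
        return self.in_language(s)

    def counterexample(self, accepts):
        """Equivalence query: `accepts` is the hypothesis DFA's acceptance
        predicate. Return a string the hypothesis gets wrong, or None if
        no disagreement is found."""
        for n in range(self.max_length + 1):
            for chars in product(self.alphabet, repeat=n):
                s = "".join(chars)
                if accepts(s) != self.in_language(s):
                    return s
        return None
```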

In this scenario there is a polynomial time algorithm in the size of the minimal DFA (and logarithmic in the length of the longest counterexample the teacher returns) which will terminate in a correct DFA for the language.

This algorithm was originally presented by Dana Angluin in her paper “Learning Regular Sets from Queries and Counterexamples” under the name L* search. The version I’m going to present here is in fact a slightly later version of the algorithm presented by Ronald L. Rivest and Robert E. Schapire in “Inference of Finite Automata Using Homing Sequences” which improves on the algorithmic complexity and makes the underlying nature of the algorithm slightly clearer. Regardless, I’m still going to call it L* search.

The idea of L* search (although not how it was originally presented) is that we are going to try to construct the Myhill–Nerode automaton, \(Z\), directly. We will do this as follows:

  1. We will maintain a sequence \(w_1, \ldots, w_m\) of witnesses and a sequence \(t_1, \ldots, t_n\) of tests. The \(w_i\) are strings no two of which are word equivalent, i.e. each is a member of a distinct equivalence class in \(Z\), and the \(t_i\) are tests that witness this: for any \(i, j \leq m\) with \(i \ne j\) there is some \(k\) such that \(w_i t_k \in L \ne w_j t_k \in L\).
  2. We start with \(w_1 = t_1 = \epsilon\).
  3. At each stage we either terminate or we will extend the \(t_i\) by one test and the \(w_i\) by one or more strings.
  4. This must terminate, because we’re creating a sequence of non word equivalent strings so we can’t have more elements in it than there are states in \(Z\) (because the states are equivalence classes of strings). It’s not currently obvious that we can always make progress prior to that point, but that will emerge in the construction of the algorithm.

The core idea is to use the \(t_i\) to ‘finger print’ every string. The \(t_i\) give us a mapping \(\mathrm{row}: S \to \{0, 1\}^n\) where \(\mathrm{row}(s)_i\) is \(1\) if \(st_i \in L\), else \(0\). We can calculate this fingerprint because we are allowed to ask the teacher whether a string is in the language. We know for a fact that if \(\mathrm{row}(s) \ne \mathrm{row}(t)\) then \(s\) and \(t\) aren’t word equivalent, because one of the \(t_i\) must witness that.
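As a sketch, assuming a `member` function wrapping the teacher's membership query (the name is illustrative), the fingerprint can be computed like this, with caching so that each membership query is only ever asked once:

```python
from functools import lru_cache

def make_row(member, tests):
    """Build the fingerprint function row: S -> {0,1}^n induced by the
    current test sequence t_1, ..., t_n. `member` is the teacher's
    membership query; names are illustrative."""

    @lru_cache(maxsize=None)
    def query(s):
        # Cache so each membership query is made at most once.
        return member(s)

    def row(s):
        return tuple(1 if query(s + t) else 0 for t in tests)

    return row
```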

Given our current sequence of tests \(t_i\) and our witnesses \(w_i\), we use the row function to construct the following automaton whose states are the \(w_i\). In the course of constructing it we may have to extend \(w_i\):

For each \(w_i\) and each \(a \in A\), we consider \(\mathrm{row}(w_i a)\). Define \(\delta(w_i, a) = w_j\), where \(w_j\) is chosen such that \(\mathrm{row}(w_i a) = \mathrm{row}(w_j)\). If no such \(w_j\) exists, then extend the sequence by adding \(w_i a\) to the end so that it does.

We continue this process until the transition function has been defined for all elements of the sequence (including the ones newly added).

We then define the initial state of this finite state machine to be \(w_1 = \epsilon\), and a state \(w_i\) to be accepting if and only if \(w_i \in L\).
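Putting the last few paragraphs together, a sketch of the hypothesis construction might look like this (assuming a `member` membership-query function; all names are illustrative):

```python
def build_hypothesis(member, tests, witnesses, alphabet):
    """Construct the hypothesis DFA from the current witnesses and
    tests, extending `witnesses` in place whenever a transition target
    has no matching row. The initial state is the empty string."""

    def row(s):
        return tuple(member(s + t) for t in tests)

    transitions = {}
    i = 0
    while i < len(witnesses):  # witnesses may grow while we scan
        w = witnesses[i]
        for a in alphabet:
            target_row = row(w + a)
            for v in witnesses:
                if row(v) == target_row:
                    transitions[(w, a)] = v
                    break
            else:
                # No witness matches: a new equivalence class.
                witnesses.append(w + a)
                transitions[(w, a)] = w + a
        i += 1

    accepting = {w for w in witnesses if member(w)}
    return transitions, accepting
```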

The resulting state machine is more or less the same as the Myhill–Nerode automaton. The only differences are:

  1. We have explicitly chosen an instance of each equivalence class (the \(w_i\))
  2. Instead of considering whether states are distinguishable by any test, we only ask if they are distinguishable by our current tests \(t_i\). This can cause us to conflate more states than we should, and fixing that will be the core point of what we do next.

So we now have a DFA to present to the teacher. At this point we perform a counter-example query. Either our teacher says that the DFA is correct, in which case we’re done, or it presents us with a counter-example in the form of a string that the DFA predicts the wrong answer for. Call our current DFA \(Q\), and that counter-example \(c\).

What we want to do now is to take that mispredicted string and extract from it an experiment that shows us where we went wrong: At some point we must have made a transition \(\delta(w_i, a) = w_j\) when in actual fact \(w_ia\) and \(w_j\) are not word equivalent (if we never made such a transition then the path we walked is the same as the path we would have walked in \(Z\), which does correctly recognise the language). So what we want to do is find an experiment \(t\) such that \(w_i a t \in L \ne w_j t \in L\), and add \(t\) as our next experiment.

If we do that, then when we build the next version of the DFA with the new set of experiments the number of states must increase: we already know that \(w_i a\) is not word-equivalent to \(w_k\) for any \(k \ne j\). We have just shown that it is also not word equivalent to \(w_j\). Thus, in order to find a state to transition to from \(w_i\) on \(a\), we must either have already added a new state or we add one there and then.

The idea to find such a transition, and an experiment that would distinguish it, is this: We use the suffixes of \(c\) as our experiment candidates.

We’ll look at a sequence of strings that should be equivalent if our DFA were correct, and look for the exact moment that something goes wrong to find our experiment.

Consider splitting \(c\) up into a sequence of strings \(u_i v_i\), with \(0 \leq i \leq |c|\) and \(|u_i| = i\), i.e. \(u_i\) is the first \(i\) characters of \(c\) and \(v_i\) is the remaining characters.

(Notation: \(|s|\) is the length of the string \(s\)).

Now, we can replace \(u_i\) with \(\delta_Q(q_0, u_i)\) – the string that represents the state that you get to after reading the first \(i\) characters – and not have this change whether the result is in the language recognized by \(Q\). However, the same is not true of the language \(L\): if we consider \(i = |c|\) then this string is \(\delta_Q(q_0, c)\), and by the fact that \(c\) is a counter-example we know that \(\delta_Q(q_0, c) \in L \ne c \in L\).

Call a value of \(i\) bad if it has the property that \(\delta_Q(q_0, u_i) v_i \in L \ne c \in L\), i.e. if replacing the state we expect to be in after \(i\) characters with the representative for that state changes whether the string is recognised as being in the desired language. Necessarily, \(|c|\) is bad and \(0\) is good (because \(u_0 = \epsilon\), whose representative is \(\epsilon\), so the replacement string is just \(c\) again).
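As a sketch, assuming `member` wraps the membership query and `run(s)` returns the representative witness string for the state the current DFA reaches on \(s\) (both names illustrative), the badness predicate is:

```python
def make_bad(member, run, c):
    """Return the predicate bad(i) from the text. `member` is the
    teacher's membership query, `run(s)` gives the witness string
    representing the hypothesis DFA's state after reading s, and `c`
    is the counter-example."""
    c_in_L = member(c)

    def bad(i):
        u, v = c[:i], c[i:]    # split c as u_i v_i with |u_i| = i
        replaced = run(u) + v  # swap the prefix for its state's representative
        return member(replaced) != c_in_L

    return bad
```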

What we want to do is find an index \(i\) such that \(i\) is good and \(i + 1\) is bad. We’ll see how to do that in a moment, but first let’s see why it would help:

What we want to show is that if we have such an \(i\) then the transition from \(\delta_Q(q_0, u_i)\) to \(\delta_Q(q_0, u_{i + 1})\) via \(c_{i + 1}\) is the bad transition we were looking for, and that \(v_{i+1}\) is a test that witnesses this.

To see this, note that because \(i + 1\) is bad we must have \(\delta_Q(q_0, u_{i + 1}) v_{i + 1} \in L \ne c \in L\), and because \(i\) is good we must have \(\delta_Q(q_0, u_{i}) v_{i} \in L = c \in L\), so \(\delta_Q(q_0, u_{i}) v_{i} \in L \ne \delta_Q(q_0, u_{i + 1}) v_{i + 1} \in L\). But \(v_i = c_{i + 1} v_{i + 1}\), so this means we have \(\delta_Q(q_0, u_{i}) c_{i + 1} v_{i + 1} \in L \ne \delta_Q(q_0, u_{i + 1}) v_{i + 1} \in L\).

i.e. \(v_{i + 1}\) is an experiment distinguishing \(\delta_Q(q_0, u_{i}) c_{i + 1}\) from \(\delta_Q(q_0, u_{i + 1})\). But by construction, \(\delta_Q(q_0, u_{i + 1}) = \delta_Q(\delta_Q(q_0, u_i), c_{i + 1})\), which makes \(v_{i + 1}\) our desired experiment to add to the list.

Now, to complete the process: How do we find such an \(i\)?

The most direct approach would be to just do a linear scan, but we can do better: you can use binary search, observing that the general binary search algorithm doesn’t need anything to do with sorting at all if you regard it as looking for the point at which a boolean function changes value.

In python pseudocode, we can find such an \(i\) as follows:

# Invariants: lo is good, hi is bad
lo = 0
hi = len(c)
while lo + 1 < hi:
    mid = (lo + hi) // 2
    if bad(mid):
        hi = mid
    else:
        lo = mid

This then takes \(\log_2(|c|)\) queries and when it terminates we can take \(i = \mathrm{lo}\).

Which was the final step in our puzzle for the construction: as long as our automaton does not recognise \(L\), we can find a new experiment, add it to the list, and thus force our automaton to keep growing. When it can grow no longer it must be equivalent to \(Z\), because it is an automaton of the same size which recognises \(L\).
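To tie everything together, here is a compact sketch of the whole loop. It assumes `member` answers membership queries and `find_counterexample` answers equivalence queries (returning None when the hypothesis is correct); both names are illustrative, and this is an illustration of the construction described in this post rather than a production implementation:

```python
def lstar(member, find_counterexample, alphabet):
    """Sketch of L* search with the Rivest-Schapire counterexample
    handling. Returns (transitions, accepting); the initial state is
    the empty string and states are witness strings."""
    witnesses, tests = [""], [""]

    def row(s):
        return tuple(member(s + t) for t in tests)

    while True:
        # Build the hypothesis DFA over the current witnesses/tests,
        # growing the witness list as new rows are discovered.
        transitions = {}
        i = 0
        while i < len(witnesses):
            w = witnesses[i]
            for a in alphabet:
                r = row(w + a)
                match = next((v for v in witnesses if row(v) == r), None)
                if match is None:
                    witnesses.append(w + a)
                    match = w + a
                transitions[(w, a)] = match
            i += 1
        accepting = {w for w in witnesses if member(w)}

        c = find_counterexample(transitions, accepting)
        if c is None:
            return transitions, accepting

        def run(s):
            state = ""
            for ch in s:
                state = transitions[(state, ch)]
            return state

        def bad(i):
            return member(run(c[:i]) + c[i:]) != member(c)

        # Binary search for a good/bad boundary, then add the new test.
        lo, hi = 0, len(c)
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if bad(mid):
                hi = mid
            else:
                lo = mid
        tests.append(c[lo + 1:])
```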

We’re almost done, but let’s do some analysis of how many queries we made here (in this analysis I will assume that all answers to membership queries are cached so we only make each one once).

Suppose our final set of tests is \(T\). Note that \(|T| \leq |Z|\) because every time we added to \(T\) we increased the number of states by at least one, and the algorithm terminated when the number of states was equal to \(|Z|\).

We have performed precisely \(|T|\) counter-example queries because every time we performed a counter-example query we added exactly \(1\) element to \(T\).

Now let’s see how many membership queries we’ve performed.

We performed membership queries in two places: First, when building the transition table. Secondly, when trying to construct the tests.

For the first: For every witness, every \(a \in A\) and every \(t \in T\) we have calculated whether it is a member of \(L\) in order to compute the rows and the transitions. So that comes to \(|Z||A||T| \leq |A| |Z|^2\) membership queries.

In calculating the test, we had to perform one membership query at every point in the binary search in order to determine whether mid was bad. Therefore we performed \(O(\log(|c|))\) membership queries per counter-example. Letting \(m\) be the maximum length of any counter-example provided in the course of our run, we thus performed a total of \(O(|T| \log(m)) = O(|Z| \log(m))\) membership queries.

So, in our final analysis, running L* search with a minimally adequate teacher for our language requires:

  1. \(O(|A||Z|^2 + |Z| \log(m))\) membership queries
  2. \(O(|Z|)\) counterexample queries

In fact these estimates are a bit too conservative. As we’ll see in a later post, it is sometimes going to be the case that \(|T| \ll |Z|\).

Why do we care?

Personally, I mostly care because I think L* is a super-neat algorithm, and the concept of a minimally adequate teacher as a learning environment is surprisingly general.

You can also put this into a stochastic setting to relax the requirement of the existence of a minimally adequate teacher (which may be hard to obtain), in which case you instead get to build DFAs which give the correct answer with at least some chosen probability when the string is drawn from a fixed choice of distribution.

However, the evidence seems to be that most people working on language learning don’t care about this algorithm. There hasn’t been a great deal of research on it in recent years that I’ve been able to find, and most of the research on it from back in the day was less concerned with languages than with automata: you frequently find it framed in the context of a robot learning to navigate a world.

I recently read Grammatical Inference: Learning Automata and Grammars, which I can highly recommend if you found this post interesting, as there won’t be many more posts like it in this series. The next post will use some of the ideas from it, but will be moving away from the area of inference.


This entry was posted in Automata Theory.

Weekly reading post #4

Admin notes:

  • I’m going to separate out books from links to articles. Books will no longer appear on the random reading list on the grounds that any book I’m actually reading I’ll read long enough for a ping.
  • I’ll be writing a little bit about each link/book in order to provide more context for whether it’s worth following.
  • There’s now a Weekly Reading Post category if you want to see all of these.

Random links

These are links to things that I was reading when I answered a tagtime ping with nonfiction. There are a few omissions if I wasn’t able to conveniently note something down when I was reading it, but no deliberate omissions other than books.

Selected links

These are links that I think are worth highlighting but did not get subjected to a tagtime ping (or did get subjected to one but I feel are especially worth noting).

Apparently the main selected links this week are all about conversational styles.

Miller’s Law in the Archipelago of Weird is about a variety of things, but I particularly like the point about conversational styles (in this case around performing small talk) differing strongly between autistic and allistic people and this creating a situation of competing needs.

It also leads into the next batch of selected links, which are about a different aspect of conversational styles, and the role interruption plays in conversation:

I was looking for the first one and got linked to the second two by other people (Ceren and Jeremy), but they’re all good reads.

The major takeaway is that in some cultures interruption is the height of rudeness because you’re signalling that you think you are more important than the current speaker, while in others not interrupting is the height of rudeness because you’re signalling that you don’t care about the conversation. Interruption can be combative, or it can be collaborative, and a clash over this without understanding that that’s what’s going on can lead to some horrible conversations.

Other things I liked that don’t fit the theme:

Books read this week

These are any books I spent a substantial amount of time with this week.

  • So Good They Can’t Ignore You – I was somewhat skeptical about this book going in but decided to give it a go off the back of a good essay by the author (I forget which one), but my skepticism was wholly unwarranted. This is actually a really good book with really good career advice. In particular it’s a nice antidote to “follow your passion, be brave” style advice. Would recommend to just about anyone in a knowledge work style field. Status: Finished.
  • Soul of a New Machine – this is a book about Data General’s development of an earlyish 32-bit machine. It was interesting, but I felt like I was not the target audience of this book. It felt a bit too real for me – the culture of this team read like some of the terrible startups I’ve seen dialled up to 11. Would recommend to people who want to understand what that’s like, wouldn’t necessarily recommend to people who already know what it’s like. Status: Finished
  • How To Read A Book – this is a book about active reading – reading by not just passively absorbing material but by full on engaging with the material in order to understand it. It’s already changed a lot about how I read, and I can recommend it for that, but it feels like a lot of book for what it actually covers so I’ve been following their advice and reading it highly non-linearly. Status: Ongoing
  • Combinatorial Optimization – this was gifted to me a while back and I’ve been feeling bad about the fact that my reading of it has stalled quite a lot. As mentioned in my last post I’m trying a new technique to properly engage with the material (loosely based off the “How to Read a Book” material), so I’ve resumed reading it. It’s a good book, but it’s both huge and dense, so I would recommend it if you’re willing to invest the time to properly get to grips with the material and not if you want a light summary of the subject matter. Status: Ongoing

Books Gifted This Week

One book arrived from my wishlist this week: Switch: How to change when change is hard. This was bought for me by Zack M. Davis.

This entry was posted in Weekly Reading Posts.