I recently learned a whole bunch about a random corner of language theory that I hadn’t previously known much about. I thought it might be interesting to write up some of it. So this is the first post amongst several (I currently have two others planned but they may spawn more).

This is a post about the Myhill–Nerode theorem, behind which is a nice construction (in the abstract sense. You couldn’t perform it directly on an actual machine) for the minimal deterministic automaton matching any language.

Lets start with some terms: We’ve got some finite non-empty alphabet \(A\). A string on \(A\) is any finite sequence of elements from \(A\). A language on \(A\) is any set of strings.

For this post we’ll fix \(A\) and then \(S\) will be the set of strings on \(A\).

String notation:

- \(\epsilon\) is the string of length \(0\)
- If \(u, v\) are strings then \(uv\) is the string made by concatenating them in order.
- Similarly if \(a \in A\) then \(au\) is the string that consists of prepending \(a\) to \(u\) and \(ua\) is the string that consists of appending \(a\) to \(u\).

In general we are interested in languages which have nice compact definitions rather than having to specify the whole set. One nice way of defining languages which often provides this is a deterministic automaton. These also have the advantage of being very simple to work with computationally once you have one.

A deterministic automaton on \(A\) consists of:

- A set of states \(Q\), (possibly infinite – if it’s a finite set we call it a deterministic finite automaton)
- An initial state \(q_0 \in Q\)
- An update function \(\delta : Q \times A \to Q\)
- A set of accepting states \(F \subseteq Q\).

We can extend the update function to take any finite string: Let \(\delta(q, \epsilon) = q\) and \(\delta(q, au) = \delta(\delta(q, a), u)\). i.e. we just follow the update function along the string.

We will additionally assume that every state \(Q\) is reachable. That is it’s \(\delta(q_0, s)\) for some string \(s\). This is a fairly harmless restriction because for any automaton we can just create the automaton that replaces \(Q\) and \(F\) with their reachable subsets.

A deterministic automaton *recognises* a language \(L\) if \(s \in L\) if and only if \(\delta(q_0, s) \in F\). i.e. if you follow the automaton along the string then you end up in an accepting state if and only if the string is in the desired language.

Every language can be recognised by a deterministic automaton:

- Let \(Q = S\)
- Let \(q_0 = \epsilon\)
- Let \(\delta(s, a) = sa\)
- Let \(F = L\)

This works because every state in this automaton has a unique string that reaches it from the initial state (which is the string representing the state), and the state is accepting if and only if that string is in the language, so by construction recognises the language.

This simplifies matters not even slightly because we’ve just reframed the problem and determining if a state is accepting is equivalent to determining whether the string is in our language, and the automaton constructed this way is *always* infinite (though may have finitely many accepting states if \(L\) is finite), even if a very small finite automaton might recognise our language. If anything we’ve made the problem worse because we’ve taken a potentially finite language and moved to an always infinite representation.

So, can we find a smaller automaton that recognises a given language?

In fact we can. For any language there is a unique minimal deterministic automaton for any language. We’ll see exactly what that means in a bit, but first we’ll construct the automaton we want.

The idea for constructing it is this: We take the above automaton, and we try to collapse it by conflating equivalent states. We look for equivalent states with the notion of a *test*.

A test is just a string. The idea is that if we have some string \(t\) and two strings \(u\) and \(v\) then if \((ut \in L) \ne (vt \in L)\), \(u\) and \(v\) we must have \(\delta(q_0, u) \ne \delta(q_0, v)\) for any automaton recognising \(L\) – if not then we would have \(\delta(q_0, ut) = \delta(q_0, vt)\), which is impossible because one is an accepting state and the other is not.

Two strings are said to be word equivalent if there is no test that distinguishes them. The idea is that we can then shrink our above automaton by conflating all states that are word equivalent.

So if \(s\) is a string let \([s]\) be the set of all strings that are word equivalent to \(s\). We can then define the following automaton:

- Let \(Q = \{[s]: s \in S\}\)
- Let \(q_0 = [\epsilon]\)
- Let \(\delta([s], a) = [sa]\)
- Let \(F = \{[s]: s \in L\}\)

We’ll call this the Myhill-Nerode automaton and label it \(Z\).

We first need to check that \(delta\) is well defined. i.e. if \(u\) is word equivalent to \(v\) then \(ua\) is word equivalent to \(va\). This follows from the fact that if \(t\) is a test distinguishing \(ua\) and \(va\) then \(at\) is a test distinguishing \(u\) and \(v\).

So this is a well defined state machine, but does it recognise \(L\)?

First note:

- \(\delta(q_0, s) = [s]\) (more or less by definition)
- If \(u, v\) are word-equivalent then \(u \in L = v \in L\) because otherwise \(\epsilon\) is a test distinguishing them

So the \(\delta(q_0, s)\) is an accepting state if and only if it is \([t]\) for some \(t \in L\), which is true if and only if \(s \in L\). i.e. this automaton recognises \(L\).

So we’ve got another automaton recognising \(L\), and it sure *looks* smaller than the first one, but is it minimal?

And what do we even mean by minimal?

For that we’ll need the idea of an automaton reduction: Let \(P, Q\) be automata (call the initial state of \(P\) \(p_0\) and we’ll use subscripts to distinguish \(\delta\) and \(F\). \(Q\) is a reduction of \(P\) if there is a function \(r: P \to Q\) such that:

- \(r(p_0) = q_0\)
- If \(p \in P\) and \(a \in A\) then \(r(\delta_P(p, a)) = \delta_Q(r(p), a)\).
- \(r(p) \in F_Q\) if and only if \(p \in F_P\)
- Every \(q \in Q\) is \(r(p)\) for some \(p\)

i.e. we label every state in \(P\) with a state in \(Q\) such that everything lines up correctly for transitions.

Our claim of minimality is this: If \(Q\) is any automaton recognising \(L\), then \(Z\) is a reduction of \(Q\).

The key property for proving this is that if \(u\) and \(v\) are two strings with \(\delta_Q(q_0, u) = \delta_Q(q_0, v) = q\) then they must be word equivalent. Otherwise there would be some test \(t\) such that \(\delta(q, t) \ne \delta(q, t)\). So we choose as our reduction function \(r\) the function \(r(q) = [s]\) where \(s\) is any string such that \(\delta_Q(q_0, s) = q\) (recall we’re assuming all states are reachable).

Lets check that this has the desired properties:

- \(\delta_Q(q_0, \epsilon) = q_0\), so \(r(q_0) = [\epsilon] = z_0\)
- If \(\delta_Q(q_0, s) = q\) then \(delta_Q(q_0, sa) = \delta(q, a)\), so \(r(\delta_Q(q_0, s), a)) = [sa] = \delta_Z([s], a) = \delta_Z(r(q), a)\) as desired
- This follows automatically because both \(Q\) and \(Z\) recognise \(L\)
- \([s] = r(\delta(q_0, s))\)

So for any automaton we can reduce it to \(Z\), as desired.

And finally we need to prove uniqueness: The uniqueness claim is that if \(Q\) is any other automaton that has this property then a reduction \(r: Z \to Q\) must also have the property that if \(s \ne t\) then \(r(s) \ne r(t)\). i.e. \(r\) is an isomorphism, just relabelling the vertices.

Proof: Suppose \([u] \ne [v]\) are two distinct states of \(Z\). Then there is some \(t\) such that \(\delta_Z([u], t) \in F_Z \ne \delta_Z(([v], t) \in F_Z\). But then by the properties of the reduction we must have \(\delta_Q(r([u]), t) \in F_Q \ne \delta_Q((r([v]), t) \in F_Q\) and thus \(r([u]) \ne r([v])\) as desired.

We’ll finally finish with the title of this post: The Myhill–Nerode theorem.

In the language we have so far, this states that \(L\) is regular if and only if the \(Z\) we constructed is finite. But this follows immediately from what we’ve done above: \(Z\) is an automaton recognizing \(L\), so by definition if \(Z\) is finite then \(L\) must be regular. Conversely, if there is any finite automaton that recognizes \(L\) then it reduces to \(Z\), so \(Z\) must also be finite.

Why do we care about all of this?

Well, as far as I know, the Myhill–Nerode theorem itself isn’t actually very important. It’s mostly an intellectual curiousity. I do like that it shows that *every *language regardless of structural properties has a minimal deterministic automaton, but I’m not sure I have any actual use for that fact.

But the machinery used to prove it is independently quite interesting. In particular, minimal (or at least small) deterministic finite automaton are very useful in practice and we can use a similar construction starting from an existing finite automaton to produce one (this is Hopcroft’s Algorithm).

Moreover, the idea of a *test* that we started from proves quite interesting, and you can use it to construct alternative (often more compact) representations of a DFA, and it forms the basis for the inference of regular languages in certain circumstances. More on that in later posts.

Pingback: The Derivative as an Alternative Minimal Deterministic Automaton | David R. MacIver

Pingback: Weekly reading post #4 | David R. MacIver

Pingback: L* Search and Inference of Deterministic Finite Automata | David R. MacIver