The Derivative as an Alternative Minimal Deterministic Automaton

This is another post about formal language theory. It builds on yesterday’s post about the Myhill–Nerode theorem, and you should read that one first. It’s not one of my originally intended posts for this series, so there are still at least two more to go.

There turns out to be another construction of the minimal deterministic automaton of a language, one that is often nicer to work with in practice: the Brzozowski derivative.

The idea of the Brzozowski derivative is that given a language and some prefix, you can construct a new language consisting of the suffixes of strings in the language that start with that prefix. i.e. for a language \(L\) and a string \(u\), the derivative is \(\delta(L, u) = \{v \in S: uv \in L\}\). The intuition is that we read some prefix of the input and are left with a new language matching whatever remains.
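To make this concrete, here’s a minimal sketch in Python of the derivative when a language is represented simply as a membership predicate on strings; the names here are my own rather than anything from the post or a standard library.

```python
from typing import Callable

# A language represented as a membership predicate on strings.
Language = Callable[[str], bool]

def derivative(lang: Language, prefix: str) -> Language:
    """Brzozowski derivative: the language of suffixes v such that prefix + v is in lang."""
    return lambda v: lang(prefix + v)

# Example: L = strings that start with "ab".
L = lambda s: s.startswith("ab")
dL = derivative(L, "a")
assert dL("b")       # "ab" is in L, so "b" is in the derivative by "a"
assert not dL("a")   # "aa" is not in L
```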

You can use this to construct a different deterministic automaton matching the language as follows:

  • Let \(Q\) be the set of all languages on \(A\). Note that this is a very large set, but in practice the subset of it which is reachable will usually be much smaller.
  • Let \(q_0 = L\).
  • Let the transition function \(\delta(q, a)\) be the Brzozowski derivative of the language \(q\) by \(a\), as defined above.
  • Let \(F = \{ M \subseteq S : \epsilon \in M \}\).

i.e. we label states with languages, we evolve a state by taking the derivative, and a state is accepting if and only if the corresponding language matches the empty string.
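For a finite language the derivative is directly computable, so we can actually carry out this construction and enumerate the reachable states. Here is a small sketch, with a finite language represented as a frozenset of strings; the helper names are mine.

```python
from collections import deque

def derivative(lang: frozenset, a: str) -> frozenset:
    """Derivative of a finite language by the single character a."""
    return frozenset(s[1:] for s in lang if s.startswith(a))

def derivative_automaton(lang: frozenset, alphabet: str):
    """Breadth-first exploration of the reachable states of the derivative automaton."""
    start = frozenset(lang)
    states, transitions = {start}, {}
    queue = deque([start])
    while queue:
        q = queue.popleft()
        for a in alphabet:
            nxt = derivative(q, a)
            transitions[(q, a)] = nxt
            if nxt not in states:
                states.add(nxt)
                queue.append(nxt)
    accepting = {q for q in states if "" in q}  # a state accepts iff it contains the empty string
    return start, states, transitions, accepting

# Example: the language {"ab", "aab", "b"} over the alphabet {a, b}.
start, states, transitions, accepting = derivative_automaton(
    frozenset({"ab", "aab", "b"}), "ab"
)
print(len(states))  # the handful of reachable derivatives, including the empty language
```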

In order to see that this really is an automaton matching the language, the first thing to notice is that the Brzozowski derivative and the transition function agree on all strings, not just on single-character ones (you can prove this by induction on the length of the string, using the definition of how we extend the transition function).

This then lets us make sense of the definition of whether a state is accepting: A state is accepting if and only if it matches the empty string, i.e. a string \(s\) is accepted if and only if \(\epsilon \in \delta(L, s)\). But this happens precisely when \(s\epsilon \in L\), by definition of the derivative, and \(s\epsilon = s\) because \(\epsilon\) is the empty string. Thus \(s\) is accepted if and only if \(s \in L\), as desired.

We define the derivative automaton, \(D\), to be the automaton constructed by restricting the above to its set of reachable states.

This turns out to be a minimal deterministic automaton too, which we can see by showing that it’s isomorphic to the Myhill–Nerode automaton.

We’ll do this by showing that the reduction function \(r: D \to Z\) must be injective, i.e. that if \(S \ne T\) are distinct states of \(D\) then \(r(S) \ne r(T)\).

Since \(S\) and \(T\) are distinct languages, there is some string \(u\) in one of them and not the other. Let’s say it’s in \(S\) but not \(T\), swapping \(S\) and \(T\) if necessary.

Then \(\delta(S, u)\) is an accepting state (because \(u \in S\) means \(\epsilon \in \delta(S, u)\)) and \(\delta(T, u)\) is not. Thus \(\delta(r(S), u)\) is accepting and \(\delta(r(T), u)\) is not, and so \(r(S) \ne r(T)\).

Because the derivative automaton is isomorphic to the Myhill–Nerode automaton, we thus get for free that it must in fact be a minimal deterministic automaton itself.


Why do we care?

Well, the main reason we care is that in practice this is a much more tractable construction than the Myhill–Nerode automaton. Given standard definitions of a regular or context-free language, you can calculate the derivative directly, and you can build the (potentially infinite) automaton lazily as you go rather than having to do an up-front construction. This turns out to be very useful.
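As an illustration of the lazy approach, here is a sketch of the standard derivative rules for regular expressions, which let you match a string by repeatedly taking derivatives without ever building the automaton up front. The representation and names are my own; a practical version would also simplify expressions as it goes, so that only finitely many distinct derivatives arise.

```python
from dataclasses import dataclass

class Regex: pass

@dataclass(frozen=True)
class Empty(Regex): pass     # the empty language

@dataclass(frozen=True)
class Eps(Regex): pass       # the language containing only the empty string

@dataclass(frozen=True)
class Char(Regex):
    c: str

@dataclass(frozen=True)
class Alt(Regex):            # union
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Cat(Regex):            # concatenation
    left: Regex
    right: Regex

@dataclass(frozen=True)
class Star(Regex):           # Kleene star
    inner: Regex

def nullable(r: Regex) -> bool:
    """Does the language of r contain the empty string?"""
    if isinstance(r, (Empty, Char)):
        return False
    if isinstance(r, (Eps, Star)):
        return True
    if isinstance(r, Alt):
        return nullable(r.left) or nullable(r.right)
    return nullable(r.left) and nullable(r.right)  # Cat

def deriv(r: Regex, a: str) -> Regex:
    """The Brzozowski derivative of r by the single character a."""
    if isinstance(r, (Empty, Eps)):
        return Empty()
    if isinstance(r, Char):
        return Eps() if r.c == a else Empty()
    if isinstance(r, Alt):
        return Alt(deriv(r.left, a), deriv(r.right, a))
    if isinstance(r, Star):
        return Cat(deriv(r.inner, a), r)
    # Cat: if the left part can match the empty string, the derivative may also skip it.
    d = Cat(deriv(r.left, a), r.right)
    return Alt(d, deriv(r.right, a)) if nullable(r.left) else d

def matches(r: Regex, s: str) -> bool:
    """Walk the derivative automaton lazily, one character at a time."""
    for a in s:
        r = deriv(r, a)
    return nullable(r)

# Example: (ab)* matches "abab" but not "aba".
r = Star(Cat(Char("a"), Char("b")))
assert matches(r, "abab") and not matches(r, "aba")
```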

One limitation is that determining the equivalence of even context-free languages is undecidable, so when used in practice this will not always produce a minimized automaton, because some states that should be conflated will not be. However, a lot of simple reductions are possible for keeping the number of duplicated states relatively manageable.

I’ve never actually seen someone suggest using this approach for DFA minimization. It might be tractable, because there are decent algorithms for testing DFA equivalence without minimization, but it seems sufficiently obvious from adjacent theory that I assume that if it worked someone would have tried it already.


The Myhill–Nerode theorem and the Minimal Deterministic Automaton

I recently learned a whole bunch about a random corner of language theory that I hadn’t previously known much about. I thought it might be interesting to write up some of it. So this is the first post amongst several (I currently have two others planned but they may spawn more).

This is a post about the Myhill–Nerode theorem, behind which is a nice construction (in the abstract sense: you couldn’t perform it directly on an actual machine) of the minimal deterministic automaton matching any language.

Let’s start with some terms: We’ve got some finite non-empty alphabet \(A\). A string on \(A\) is any finite sequence of elements from \(A\). A language on \(A\) is any set of strings.

For this post we’ll fix \(A\) and then \(S\) will be the set of strings on \(A\).

String notation:

  • \(\epsilon\) is the string of length \(0\)
  • If \(u, v\) are strings then \(uv\) is the string made by concatenating them in order.
  • Similarly if \(a \in A\) then \(au\) is the string that consists of prepending \(a\) to \(u\) and \(ua\) is the string that consists of appending \(a\) to \(u\).

In general we are interested in languages which have nice compact definitions rather than having to specify the whole set. One nice way of defining languages which often provides this is a deterministic automaton. These also have the advantage of being very simple to work with computationally once you have one.

A deterministic automaton on \(A\) consists of:

  • A set of states \(Q\) (possibly infinite; if it is a finite set we call the automaton a deterministic finite automaton)
  • An initial state \(q_0 \in Q\)
  • An update function \(\delta : Q \times A \to Q\)
  • A set of accepting states \(F \subseteq Q\).

We can extend the update function to take any finite string: Let \(\delta(q, \epsilon) = q\) and \(\delta(q, au) = \delta(\delta(q, a), u)\). i.e. we just follow the update function along the string.

We will additionally assume that every state \(q \in Q\) is reachable, that is, \(q = \delta(q_0, s)\) for some string \(s\). This is a fairly harmless restriction, because for any automaton we can just create the automaton that replaces \(Q\) and \(F\) with their reachable subsets.

A deterministic automaton recognises a language \(L\) if, for every string \(s\), \(s \in L\) if and only if \(\delta(q_0, s) \in F\). i.e. if you follow the automaton along the string then you end up in an accepting state if and only if the string is in the desired language.
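Here is a minimal Python sketch of these definitions, mainly to fix notation; the class and method names are my own. `run` is the extended update function and `accepts` is the recognition condition.

```python
class DFA:
    def __init__(self, states, start, delta, accepting):
        self.states = states        # Q
        self.start = start          # q_0
        self.delta = delta          # dict mapping (state, character) -> state
        self.accepting = accepting  # F

    def run(self, s):
        """The extended update function: follow delta along the whole string."""
        q = self.start
        for a in s:
            q = self.delta[(q, a)]
        return q

    def accepts(self, s):
        """The automaton recognises L when accepts(s) holds exactly when s is in L."""
        return self.run(s) in self.accepting

# Example: strings over {a, b} with an even number of a's.
even_as = DFA(
    states={"even", "odd"},
    start="even",
    delta={("even", "a"): "odd", ("odd", "a"): "even",
           ("even", "b"): "even", ("odd", "b"): "odd"},
    accepting={"even"},
)
assert even_as.accepts("abba") and not even_as.accepts("ab")
```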

Every language can be recognised by a deterministic automaton:

  • Let \(Q = S\)
  • Let \(q_0 = \epsilon\)
  • Let \(\delta(s, a) = sa\)
  • Let \(F = L\)

This works because every state in this automaton has a unique string that reaches it from the initial state (namely the string representing the state), and a state is accepting if and only if that string is in the language, so by construction it recognises the language.
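Since the state set here is all of \(S\), we can’t write it down, but we can still represent the automaton lazily. A tiny sketch, with the hypothetical membership predicate `member` standing in for \(L\) (the names are mine):

```python
def trivial_automaton(member):
    """The 'states are strings' automaton; states are only materialised as we visit them."""
    q0 = ""                        # the initial state is the empty string
    delta = lambda s, a: s + a     # the transition just appends the character
    accepting = member             # a state s is accepting iff s is in L
    return q0, delta, accepting

# Example: L = strings with an even number of a's.
q0, delta, accepting = trivial_automaton(lambda s: s.count("a") % 2 == 0)
q = q0
for a in "abba":
    q = delta(q, a)
assert accepting(q)
```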

This simplifies matters not even slightly: we’ve just reframed the problem, since determining whether a state is accepting is equivalent to determining whether the corresponding string is in our language. Moreover, the automaton constructed this way is always infinite (though it may have finitely many accepting states if \(L\) is finite), even when a very small finite automaton might recognise our language. If anything we’ve made the problem worse, because we’ve taken a potentially finite language and moved to an always infinite representation.

So, can we find a smaller automaton that recognises a given language?

In fact we can. For any language there is a unique minimal deterministic automaton. We’ll see exactly what that means in a bit, but first we’ll construct the automaton we want.

The idea for constructing it is this: We take the above automaton, and we try to collapse it by conflating equivalent states. We look for equivalent states with the notion of a test.

A test is just a string. The idea is that if we have some test \(t\) and two strings \(u\) and \(v\) with \((ut \in L) \ne (vt \in L)\), then we must have \(\delta(q_0, u) \ne \delta(q_0, v)\) for any automaton recognising \(L\): if not, we would have \(\delta(q_0, ut) = \delta(q_0, vt)\), which is impossible because one is an accepting state and the other is not.

Two strings are said to be word equivalent if there is no test that distinguishes them. The idea is that we can then shrink our above automaton by conflating all states that are word equivalent.
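Here is a quick sketch of the idea of a test in code, with a hypothetical `member` predicate standing in for membership in \(L\) (the names are mine). Note that true word equivalence quantifies over all tests, so checking any finite pool of tests can only ever prove that two strings are not equivalent:

```python
def distinguishes(test, u, v, member):
    """A test t distinguishes u and v if exactly one of ut, vt is in the language."""
    return member(u + test) != member(v + test)

def equivalent_on(tests, u, v, member):
    """Word equivalence approximated over a finite pool of tests."""
    return not any(distinguishes(t, u, v, member) for t in tests)

# Example: L = strings with an even number of a's.
member = lambda s: s.count("a") % 2 == 0
tests = ["", "a", "b", "ab"]
assert equivalent_on(tests, "aa", "", member)     # no test here tells them apart
assert not equivalent_on(tests, "a", "", member)  # the empty test already distinguishes them
```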

So if \(s\) is a string let \([s]\) be the set of all strings that are word equivalent to \(s\). We can then define the following automaton:

  • Let \(Q = \{[s]: s \in S\}\)
  • Let \(q_0 = [\epsilon]\)
  • Let \(\delta([s], a) = [sa]\)
  • Let \(F = \{[s]: s \in L\}\)

We’ll call this the Myhill–Nerode automaton and label it \(Z\).

We first need to check that \(\delta\) is well defined. i.e. if \(u\) is word equivalent to \(v\) then \(ua\) is word equivalent to \(va\). This follows from the fact that if \(t\) is a test distinguishing \(ua\) and \(va\) then \(at\) is a test distinguishing \(u\) and \(v\).

So this is a well defined state machine, but does it recognise \(L\)?

First note:

  • \(\delta(q_0, s) = [s]\) (more or less by definition)
  • If \(u, v\) are word-equivalent then \((u \in L) = (v \in L)\), because otherwise \(\epsilon\) is a test distinguishing them

So \(\delta(q_0, s)\) is an accepting state if and only if it is \([t]\) for some \(t \in L\), which is true if and only if \(s \in L\). i.e. this automaton recognises \(L\).

So we’ve got another automaton recognising \(L\), and it sure looks smaller than the first one, but is it minimal?

And what do we even mean by minimal?

For that we’ll need the idea of an automaton reduction: Let \(P, Q\) be automata (call the initial state of \(P\) \(p_0\), and we’ll use subscripts to distinguish their \(\delta\) and \(F\)). \(Q\) is a reduction of \(P\) if there is a function \(r: P \to Q\) such that:

  1. \(r(p_0) = q_0\)
  2. If \(p \in P\) and \(a \in A\) then \(r(\delta_P(p, a)) = \delta_Q(r(p), a)\).
  3. \(r(p) \in F_Q\) if and only if \(p \in F_P\)
  4. Every \(q \in Q\) is \(r(p)\) for some \(p\)

i.e. we label every state in \(P\) with a state in \(Q\) such that everything lines up correctly for transitions.

Our claim of minimality is this: If \(Q\) is any automaton recognising \(L\), then \(Z\) is a reduction of \(Q\).

The key property for proving this is that if \(u\) and \(v\) are two strings with \(\delta_Q(q_0, u) = \delta_Q(q_0, v) = q\) then they must be word equivalent: otherwise there would be some test \(t\) with \((ut \in L) \ne (vt \in L)\), but \(\delta_Q(q_0, ut) = \delta_Q(q, t) = \delta_Q(q_0, vt)\), so \(Q\) could not recognise both correctly. So we choose as our reduction function \(r\) the function \(r(q) = [s]\), where \(s\) is any string such that \(\delta_Q(q_0, s) = q\) (recall we’re assuming all states are reachable); the key property ensures this doesn’t depend on which \(s\) we pick.

Let’s check that this has the desired properties:

  1. \(\delta_Q(q_0, \epsilon) = q_0\), so \(r(q_0) = [\epsilon]\), the initial state of \(Z\)
  2. If \(\delta_Q(q_0, s) = q\) then \(\delta_Q(q_0, sa) = \delta_Q(q, a)\), so \(r(\delta_Q(q, a)) = [sa] = \delta_Z([s], a) = \delta_Z(r(q), a)\), as desired
  3. This follows automatically because both \(Q\) and \(Z\) recognise \(L\)
  4. \([s] = r(\delta_Q(q_0, s))\)

So any automaton recognising \(L\) can be reduced to \(Z\), as desired.

And finally we need to prove uniqueness: The uniqueness claim is that if \(Q\) is any other automaton with this property, then a reduction \(r: Z \to Q\) must also have the property that if \(S \ne T\) are distinct states of \(Z\) then \(r(S) \ne r(T)\). i.e. \(r\) is an isomorphism, just relabelling the states.

Proof: Suppose \([u] \ne [v]\) are two distinct states of \(Z\). Then there is some test \(t\) such that \((\delta_Z([u], t) \in F_Z) \ne (\delta_Z([v], t) \in F_Z)\). But then by the properties of the reduction (property 2 extends to whole strings by induction) we must have \((\delta_Q(r([u]), t) \in F_Q) \ne (\delta_Q(r([v]), t) \in F_Q)\), and thus \(r([u]) \ne r([v])\) as desired.

We’ll finally finish with the title of this post: The Myhill–Nerode theorem.

In the language we have so far, this states that \(L\) is regular if and only if the \(Z\) we constructed is finite. But this follows immediately from what we’ve done above: \(Z\) is an automaton recognising \(L\), so by definition if \(Z\) is finite then \(L\) must be regular. Conversely, if there is any finite automaton that recognises \(L\) then it reduces to \(Z\), so \(Z\) must also be finite.


Why do we care about all of this?

Well, as far as I know, the Myhill–Nerode theorem itself isn’t actually very important. It’s mostly an intellectual curiosity. I do like that it shows that every language, regardless of structural properties, has a minimal deterministic automaton, but I’m not sure I have any actual use for that fact.

But the machinery used to prove it is independently quite interesting. In particular, minimal (or at least small) deterministic finite automata are very useful in practice, and we can use a similar construction starting from an existing finite automaton to produce one (this is Hopcroft’s algorithm).
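To make the partition-refinement idea concrete, here’s a small sketch in Python. It’s the simpler Moore-style refinement rather than Hopcroft’s faster version, and the names are my own: start by separating accepting from non-accepting states, then repeatedly split any block whose members disagree about which block some character sends them to.

```python
def minimise(states, alphabet, delta, accepting):
    """Return the partition of states into classes that no test can distinguish."""
    partition = [set(accepting), set(states) - set(accepting)]
    partition = [block for block in partition if block]
    changed = True
    while changed:
        changed = False
        new_partition = []
        for block in partition:
            # Group the block's states by which block each character sends them to.
            groups = {}
            for q in block:
                key = tuple(
                    next(i for i, b in enumerate(partition) if delta[(q, a)] in b)
                    for a in alphabet
                )
                groups.setdefault(key, set()).add(q)
            new_partition.extend(groups.values())
            if len(groups) > 1:
                changed = True
        partition = new_partition
    return partition

# Example: count a's modulo 4, accepting on even counts. The minimal automaton
# only needs the count modulo 2, so the four states merge into two classes.
states = {0, 1, 2, 3}
delta = {(i, "a"): (i + 1) % 4 for i in states}
print(minimise(states, "a", delta, accepting={0, 2}))  # e.g. [{0, 2}, {1, 3}]
```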

Moreover, the idea of a test that we started from proves quite interesting, and you can use it to construct alternative (often more compact) representations of a DFA, and it forms the basis for the inference of regular languages in certain circumstances. More on that in later posts.


Weekly reading post #3

Random things I’ve been reading this week

(I’m going to stop recording hour counts here, as it’s fiddly to keep track of separately from tagtime and you probably don’t care. This remains a random sample by time though)

Specific things I’ve been reading this week

Here are some things I’ve read in the last week that were memorable and interesting enough to note here but didn’t actually get a tagtime ping (this is mostly mined from my read items in feedbin and stars on twitter). As such this is a much more biased sample of what I actually read than the above, but maybe it’s interesting because of that rather than despite it.

Gifted books

No wishlist books this week. That’s really very OK given that it was a slow reading week and I’m massively backed up on books right now.

I did do an interesting thing with my wishlist though, which is that I wrote a script that randomly shuffles items to the top: The list is rather huge and keeps growing, and because of the way Amazon displays it to people that creates a massive recency bias, so this is designed to offset that by taking five random books from history and moving them to the top.

Caveat emptor: I’ve only tested this on my one wishlist, and while I believe it is allowed by the Amazon terms of service, I Am Not A Lawyer.


You’re now all on commission

I would like more customers. This requires me to do more sales and marketing. I am bad at sales and marketing.

The solution is of course to have someone else do it for me. Which is why you’re all now on commission.

Starting from today, I’m offering literally everyone the following deal:

If you introduce me (personally, rather than just a scattershot “You should talk to these people”, though I’ll probably figure out some way to reward the latter too) to a business or anyone else who becomes a customer of mine, I will pay you 20% commission on up to the first £5000 I invoice to that customer. i.e. 20% of whatever I invoice, capped to giving you a maximum of £1000 per customer you introduce me to. This will apply to all invoices to that customer up to the cap.

The main things I offer customers are listed over on hypothesis.works but in particular I offer:

I’m also potentially available for similar work for things that aren’t necessarily directly Hypothesis related. e.g. I have a ScalaCheck-related contract coming up soon, and I can offer consulting on a variety of algorithmic or design problems. The commissions aren’t limited to anything in particular: they apply to any paid work (of course I may not accept any particular piece of work just because you suggest it).

If you want to introduce me to someone or have any questions about this, please email me at [email protected].

Notes

  • For the sake of avoiding this being outright bribery, I’m not going to offer commission for introducing me to a company that you actually work for. I might change that later once I’m clearer on the implications.
  • If for some reason you are not able or willing to accept money from me I am happy to donate the commission to a charity of your choice.
  • What I invoice depends a lot on the project and the customer, but hitting that £1000 limit is definitely not implausible.
  • Neither this nor the 10% of each invoice that I donate to charity comes out of the other. You get 20%, charity gets 10%, I get 70%.

Weekly reading post #2

This is the weekly reading post. It should have been published yesterday but I forgot.

This Week’s Reading

Gifted books

The following books have arrived from my thank you wishlist:

Reading Suggestion Request

I’ve noticed that my non-fiction reading habits skew very male (my fiction reading habits skew in the opposite direction). A lot of that is about the gender bias of who publishes in areas I’m most interested in, but this still bothers me and I’d like to put some effort into fixing it.

So, I’d like recommendations for blogs and books by women that you think I’d find interesting as well. Comments are open for once, so feel free to post there or elsewhere as you prefer.
