David R. MacIver's Blog: The Derivative as an Alternative Minimal Deterministic Automaton

29 May 2016

This is another post about formal language theory. It builds on yesterday’s post about the Myhill–Nerode theorem and you should read that one first. It’s not one of my originally intended posts for this series, so there are still at least two more to go.

There turns out to be another construction for the minimal deterministic automaton of a language that is often nicer to work with in practice. This is to use the Brzozowski derivative.

The idea of the Brzozowski derivative is that given a language and some prefix you can construct a new language consisting of the suffixes of those strings in the language that start with that prefix. That is, for a language $L$ and a string $u$, the derivative is $\delta(L, u) = \{v \in S: uv \in L\}$, where $S$ is the set of all strings. The intuition is that we read some amount of the input and then have a new language matching only what’s left.
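As a small illustration of the definition, here is a sketch for the special case where the language is a finite set of strings. The function name `derivative` and the example language are made up for this sketch; real languages are usually infinite and need a symbolic representation instead.

```python
def derivative(language, prefix):
    """Return {v : prefix + v in language} for a finite language.

    `language` is any iterable of strings; the result is a frozenset so
    that it can later be used as a dictionary key or set element.
    """
    return frozenset(
        w[len(prefix):] for w in language if w.startswith(prefix)
    )


L = {"cat", "car", "dog"}
print(sorted(derivative(L, "ca")))   # ['r', 't']
print(sorted(derivative(L, "dog")))  # ['']  (only the empty string is left)
print(sorted(derivative(L, "x")))    # []    (no string starts with "x")
```

Note the difference between the last two results: the derivative by "dog" is the language containing just the empty string, while the derivative by "x" is the empty language.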

You can use this to construct a different deterministic automaton matching the language as follows:

Let $Q$ be the set of all languages on $A$. Note that this is a very large set, but in practice the subset of it which is reachable will usually be much smaller.
Let $q_0 = L$.
Let $\delta(q, a)$ be the Brzozowski derivative as defined above, treating the character $a$ as a string of length one.
Let $F = \{M \subseteq S : \epsilon \in M\}$.
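The construction above can be sketched in code for the easy case of a finite language, where every state is itself a finite set of strings. This is only an illustration under that assumption: the names `derivative_automaton` and `accepts` are hypothetical, and the sketch explores only the reachable states rather than the full set $Q$.

```python
from collections import deque


def derivative(language, prefix):
    """Brzozowski derivative of a finite language (a set of strings)."""
    return frozenset(
        w[len(prefix):] for w in language if w.startswith(prefix)
    )


def derivative_automaton(language, alphabet):
    """Build the reachable part of the derivative automaton.

    States are frozensets of strings, the start state is the language
    itself, and a state is accepting iff it contains the empty string.
    """
    start = frozenset(language)
    transitions = {}
    seen = {start}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        for a in alphabet:
            nxt = derivative(state, a)
            transitions[state, a] = nxt
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    accepting = {q for q in seen if "" in q}
    return start, transitions, accepting


def accepts(automaton, s):
    """Run the automaton on s (every character must be in the alphabet)."""
    start, transitions, accepting = automaton
    state = start
    for c in s:
        state = transitions[state, c]
    return state in accepting


auto = derivative_automaton({"ab", "cb"}, "abc")
print(accepts(auto, "ab"), accepts(auto, "cb"), accepts(auto, "a"))
# True True False
```

In this example reading "a" and reading "c" from the start state both lead to the same state, the language containing only "b", because states are identified by the language they represent.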

i.e. we label states with languages, we evolve a state by taking the derivative, and a state is accepting if and only if the corresponding language matches the empty string.

In order to see that this really is an automaton matching the language, the first thing to notice is that the Brzozowski derivative and the transition function agree on all strings, not just on single character ones (you can prove this by induction on the length of the string and the definition of how we extend the transition function).
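The induction can be spelled out roughly as follows. This sketch writes $\hat\delta$ for the extended transition function (a symbol introduced here for clarity) and assumes it is defined by $\hat\delta(q, \epsilon) = q$ and $\hat\delta(q, ua) = \delta(\hat\delta(q, u), a)$; if the extension is defined by peeling characters off the other end, the argument is symmetric. For any language $M$, the base case is

$$\hat\delta(M, \epsilon) = M = \{v \in S : \epsilon v \in M\},$$

and for the inductive step, assuming $\hat\delta(M, u) = \{v \in S : uv \in M\}$,

$$\hat\delta(M, ua) = \delta(\hat\delta(M, u), a) = \{v \in S : av \in \hat\delta(M, u)\} = \{v \in S : uav \in M\}.$$

The second equality is the definition of the derivative by a single character, and the third is the inductive hypothesis applied to the string $av$.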

This then lets us make sense of the definition of whether a state is accepting: A state is accepting if and only if it matches the empty string, i.e. a string $s$ is accepted if and only if $\epsilon \in \delta(L, s)$. But this happens precisely when $s\epsilon \in L$, by definition of the derivative, and $s\epsilon = s$ because $\epsilon$ is the empty string. Thus $s$ is accepted if and only if $s \in L$, as desired.

We define the derivative automaton, $D$, to be the automaton constructed by restricting the above to its set of reachable states.

This turns out to be a minimal deterministic automaton too, which we can see by showing that it’s isomorphic to the Myhill–Nerode automaton.

We’ll do this by showing that the reduction function $r: D \to Z$, from the derivative automaton to the Myhill–Nerode automaton $Z$, must be injective. i.e. if $S \ne T$ are states of $D$ then $r(S) \ne r(T)$.

Because $S$ and $T$ are distinct languages, we have some string $u$ in one of $S, T$ and not the other. Let’s say it’s in $S$ but not $T$, swapping $S$ and $T$ if necessary.

Then $\delta(S, u)$ is an accepting state and $\delta(T, u)$ is not. By the defining properties of a reduction function, $\delta(r(S), u)$ is therefore accepting and $\delta(r(T), u)$ is not, and so $r(S) \ne r(T)$.

Because the derivative automaton is isomorphic to the Myhill–Nerode automaton, we thus get for free that it must in fact be a minimal deterministic automaton itself.

Why do we care?

Well, the main reason we care is that in practice this is a much more tractable construction than the Myhill–Nerode automaton. Given standard definitions of a regular or context-free language, you can calculate the derivative directly, and you can build the (potentially infinite) automaton lazily as you go rather than having to do an up-front construction. This turns out to be very useful.
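To make this concrete for regular languages, here is a minimal sketch of derivatives computed directly on regular expressions, with the automaton's transition table filled in only as input is read. Everything here is an assumption made for illustration: the tuple encoding, the helper names (`char`, `alt`, `cat`, `star`, `nullable`, `deriv`) and the `LazyDFA` class are not from any particular library, and the handful of simplifications in the constructors are not enough to guarantee a finite number of states for every expression.

```python
EMPTY = ("empty",)   # matches no strings
EPS = ("eps",)       # matches only the empty string

def char(c):
    return ("chr", c)

def alt(r, s):
    if r == EMPTY:
        return s
    if s == EMPTY or r == s:
        return r
    return ("alt", r, s)

def cat(r, s):
    if r == EMPTY or s == EMPTY:
        return EMPTY
    if r == EPS:
        return s
    if s == EPS:
        return r
    return ("cat", r, s)

def star(r):
    if r in (EMPTY, EPS):
        return EPS
    return r if r[0] == "star" else ("star", r)

def nullable(r):
    """True iff the language of r contains the empty string."""
    tag = r[0]
    if tag in ("eps", "star"):
        return True
    if tag == "alt":
        return nullable(r[1]) or nullable(r[2])
    if tag == "cat":
        return nullable(r[1]) and nullable(r[2])
    return False  # "empty" and "chr"

def deriv(r, c):
    """Derivative of r with respect to the single character c."""
    tag = r[0]
    if tag == "chr":
        return EPS if r[1] == c else EMPTY
    if tag == "alt":
        return alt(deriv(r[1], c), deriv(r[2], c))
    if tag == "cat":
        head = cat(deriv(r[1], c), r[2])
        return alt(head, deriv(r[2], c)) if nullable(r[1]) else head
    if tag == "star":
        return cat(deriv(r[1], c), r)
    return EMPTY  # "empty" and "eps"

class LazyDFA:
    """States are expressions; transitions are computed on first use."""

    def __init__(self, regex):
        self.start = regex
        self.transitions = {}  # (state, char) -> state

    def step(self, state, c):
        key = (state, c)
        if key not in self.transitions:
            self.transitions[key] = deriv(state, c)
        return self.transitions[key]

    def matches(self, s):
        state = self.start
        for c in s:
            state = self.step(state, c)
        return nullable(state)

dfa = LazyDFA(star(cat(char("a"), char("b"))))   # (ab)*
print(dfa.matches("abab"), dfa.matches("aba"))    # True False
print(len(dfa.transitions))                       # 2
```

Only the two transitions actually exercised by the inputs have been computed; the dead state reached by reading a "b" first is never built unless some input needs it.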

One limitation is that determining the equivalence of even context-free languages is undecidable, so when used in practice this will not always produce a minimized automaton, because some states that should be conflated will not be. However, a lot of simple reductions are possible for keeping the number of duplicated states relatively tractable.
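One example of such a reduction, sketched under an assumed tuple encoding of expressions (the `alt` constructor and the tags below are hypothetical), is to normalise alternations so that expressions differing only in the order, nesting, or repetition of their branches become the same state. It is a purely syntactic check, so it catches only some equivalences.

```python
EMPTY = ("empty",)  # the language with no strings


def alt(*parts):
    """Alternation normalised up to associativity, commutativity and
    idempotence, with the empty language dropped."""
    flat = set()
    for p in parts:
        if p[0] == "alt":
            flat.update(p[1])   # flatten nested alternations
        elif p != EMPTY:
            flat.add(p)         # r | EMPTY is just r
    if not flat:
        return EMPTY
    if len(flat) == 1:
        return next(iter(flat))
    return ("alt", frozenset(flat))


a, b, c = ("chr", "a"), ("chr", "b"), ("chr", "c")

# Naive tuples would treat these as three different states.
print(alt(a, alt(b, c)) == alt(alt(c, a), b))  # True
print(alt(a, a) == a)                          # True
print(alt(a, EMPTY) == a)                      # True
```

Because the normalised form is hashable, it can be used directly as a key in a table of states, so equal expressions share one entry.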

I’ve never actually seen someone suggest using this approach for DFA minimization. It might be tractable, because there are decent algorithms for testing DFA equivalence without minimization, but it seems sufficiently obvious from adjacent theory that I assume that if it worked someone would have tried it already.