Attention conservation notice: Unless you fit into a very similar academic niche to me this post likely uses techniques you don’t know to solve problems you don’t care about. Even if you are in that niche this is more of a theoretical curiousity than a useful technique at present, but it was useful for me to think through this to see how some stuff connects up.

I’ve been reading a bit about probabilistic bisimilarity distances recently, and I wanted to explore how it relates to things I’ve previously studied about minimizing deterministic finite automata.

The basic trick they use is to realise the object we want (an equivalence relationship or a pseudometric in this case) and consider it as an element of some complete lattice, define a monotonic function on that lattice such that a fixed point of it will have the desired properties, and then use the Knaster-Tarski fixed point theorem to show that the set of such fixed points is no-empty. The desired object is then either the smallest or largest fixed point (both of which the theorem guarantees to exist, and indeed shows us how to construct).

This turns out to work with deterministic finite automata too!

Suppose we have some deterministic finite automaton \(M\) with states \(S = \{s_1, \ldots, s_n\}\), alphabet \(A\), transition function \(\tau : A \times S \to S\) and acceptance function \(\alpha: S \to \{0, 1\}\). We want to construct the minimal deterministic finite automaton which matches the same language by merging equivalent states. This requires us to define an equivalence relationship \(\sim\) on \(M_S\) such that \(s_i \sim s_j\) if and only if the languages starting from \(s_i\) and \(s_j\) are the same. This is called the Myhill-Nerode relationship.

We can realise this equivalence relationship as follows:

We use the observation that the set of equivalence relationships over any set forms a complete lattice (to get the upper bound of a set of equivalence relationships, just take their unions as sets and then take the transitive closure to get back to an equivalence relationship).

What we are looking for is an equivalence relationship with the following two properties:

- If \(s_i \sim s_j\) then \(\alpha(s_i) = \alpha(s_j)\).
- If \(s_i \sim s_j\) and \(a \in A\) then \(\tau(s_i, a) \sim \tau(s_j, j)\).

i.e. no accepting state is equivalent to a non-accepting state, and the equivalence relationship works with the transition function so that if two states are equivalent they transition to equivalent states given the same character.

Any such equivalence relationship is called a bisimilarity.

We can find such an equivalence relationship as follows:

Let \(E\) be the set of equivalence relations over \(S\) and define \(F: E \to E\) as \(F(R) = \{(s, t) \in S^2: \alpha(s) = \alpha(t) \wedge \forall a \in A, (\tau(a, s), \tau(a, t)) \in R\}\).

(The fact that if \(R\) is an equivalence relationship then so is \(F(R)\) is not immediately obvious, but follows from some fairly straightforward mechanical checking that I will leave as an exercise for the interested reader)

This definition may seem somewhat non-obvious, so lets unpack it:

- We require that the resulting equivalence relationship always respects \(\alpha\). i.e. we throw away anything we would otherwise get where it disagrees on whether the state should be accepting.
- We take the new relationship to be the set of pairs that the old relationship cannot distinguish any transitions from. So for example if we had states \(s_i, s_j\) such that every transition from them went to an \(R\)-equivalent state, \(F(R)\) would treat them as equivalent. Equally, if you had two R-equivalent states such that \((\tau(a, s_i), \tau(a, s_j)) \not\in R\) (i.e. R distinguishes some transition from them)

So you can think of \(F\) as taking an equivalence relationship and returning something that is in some sense “a bit closer” to being a bisimilarity. If that intuition is correct, then it should be the case that if \(R\) is a fixed point of \(F\) then it is a bisimilarity.

Suppose \(R = F(R)\). Then by construction of \(F\) we have the first property satisfied (that if \(s \sim t\) then \(\alpha(s) = \alpha(t)\).

For the second, suppose that we have \(s, t, a\) with \((\tau(a, s), \tau(a, t)) \not\in R\). Then by definition of \(F\) we musts have that \((s, t) \not\in F(R) = R\). i.e. if \((s, t) \in R\) then \((\tau(a, s), \tau(a, t)) \in R\).

So in order to find a bisimilarity we just need to find fixed points of \(F\), but this is exactly what the Knaster-Tarski fixed point theorem lets us do. \(F\) is montonic, i.e. if \(R \subseteq R’\) then \(F(R) \subseteq F(R’)\) (another “not totally obvious but a mechanical exercise for the interested reader to check” claim), the set of equivalence relationships form a complete lattice, so the set of fixed points of \(F\) also form a complete lattice.

In particular there is a greatest fixed point of \(F\), called the bisimulation relationship. This is exactly the \(\sim\) relationship we were looking for (because if a word causes two states to reach a different language then they could not have been bisimilar, so that relationship is necessarily maximal).

Additionally, the proof of the Knaster-Tarski fixed point theorem shows us how to construct this relationship: We just start with the largest equivalence relationship on \(S\) and repeatedly apply \(F\) to it until we get a fixed point (in general “repeatedly” might require transfinite induction, but because \(S\) is finite so is the set of equivalence relationships over \(F\) and as a result we only have to iterate finitely many times).

But if you squint at it right, this is actually just Hopcroft’s algorithm for DFA minimization! We start with an equivalence relationship which conflates everything, and then we split states apart as we find that they have the wrong label or lead to states that are now known to be equivalent, iterating this until we reach a point where there are no more states to split (i.e. a fixed point of \(F\)). The steps are slightly different in that we actually just split the states by labels once at the beginning and then don’t have to do so again, but because of the way \(F\) behaves in these circumstances the difference is pretty inessential (in particular we always have \(F(R) \subseteq R\) for \(R\) obtained by iterating \(F\) from the largest equivalence relationship, even though this does not hold for arbitrary \(R\)).

Is this observation useful?

Well, not really. It’s useful in the sense that it helped me understand why fixed points on lattices are such a useful technique, and it was nice to relate this back to some things I already understood. So it’s a useful exercise in pedagogy.

Are there any applications of this? Well, maybe. In the context of the probabilistic work the next step here is to look at other interesting lattices which can use similar structure, and that’s pretty interesting, but I’m not sure how useful it is in the language case. Also most of the ways in which it’s useful are likely better served by transforming it into the probabilistic case.