Windows Progress Report

Well it’s been about a month since I switched to Windows, so I thought I’d mention how it’s going.

Advance warning: This is just a braindump of some thoughts and is not particularly coherent.

It’s going… OK. One of the interesting things about switching from a niche to a mainstream implementation of the same thing is that you find out that a lot of the things that you thought didn’t work correctly because of your weird life choices actually just don’t work correctly. In particular, the following work less well now that I’ve switched to Windows:

  1. WiFi
  2. Suspend
  3. Copy and Paste

As you can imagine, I have been somewhat frustrated to discover this.

My original plan of trying to use Windows purely as a host and running a desktop environment on a VM didn’t work very well, for a variety of reasons. I’m tempted to start trying it again, as the alternatives are also annoying me and it might be worth trying. Instead I’ve been running local VMs using Vagrant setups for each project.

To be honest, I’ve had a long-standing prejudice against the local use of VMs and this has mostly reinforced it. They work pretty well on the server where someone has already done the hard work of management and you don’t have to care about the interaction with the host OS at all, but locally they’re just pissing me off. In particular:

  1. Hyper-V looks nice but doesn’t support shared folders
  2. Virtualbox alleges to support shared folders but the reality is that it doesn’t even slightly work when trying to share between a Linux guest and a Windows host unless you have a definition of “work” that involves routine data corruption.
  3. Virtualbox also has completely unusable graphics performance
  4. VMWare… mostly works. Though you will have to fork out a significant amount of money for this option (both for VMware itself and Vagrant if you end up wanting to use it on VMWare).
  5. About the only stable way of SSHing into a local VM is to use Vagrant. Installing iTunes so that Windows understands zeroconf might work, but I didn’t really feel like trying it.

My current development environment involves vagranting all the things, using the Windows command prompt (it’s surprisingly OK in Windows 10) and gvim. I was trying to use IntelliJ for a while, but I found it pissed me off too much in too many different ways.

One of the most annoying things in general so far has been getting tools to understand that yes, I really do want Unix line endings. No, I do not want any of this carriage return nonsense in my files. Even tools that should know better seem determined to try to guess and do the wrong thing as a result.

One of the most pleasant surprises is that actually Windows 10 window management is very good. It’s more or less a tiling window manager with good keyboard controls.

Overall: I hope I can install Linux in a working manner on this laptop again in the near future. It’s been nice to have the non-development stuff Just Work to a slightly higher degree, but to be honest I’ve had more random breakage on Windows than I did running Linux on a well supported set of hardware. That being said, this hasn’t been awful and has merely been slightly annoying. It’s certainly a viable way of working.

This entry was posted in Uncategorized.

Weekly reading post #5

This is the weekly reading post.

Random Links

These are links to things that I was reading when I answered a tagtime ping with nonfiction.

Selected Links

Books

I’ve not done a huge amount of actual book reading this week, but I spent a little bit of time on Combinatorial Optimization, How To Read A Book and Switch: How to change when change is hard. See last week for opinions on the first two.

Switch is interesting. There’s a lot I’m unconvinced by / disagree with in it, but I’m definitely thinking about the material on organisational change in the context of Making Work Better, and it will probably influence my suggested approach for the better. More on it when I’ve devoted a bit more time to it.

No wishlist arrivals this week.

This entry was posted in Weekly Reading Posts.

L* Search and Inference of Deterministic Finite Automata

This post follows on from my previous one about the Myhill–Nerode theorem. You should read that one first.

When trying to construct a deterministic finite automaton for a language in practice, one thing we need to know is how the language is represented. If all we have is a yes/no predicate, the problem is in general intractable (it could be any set of strings at all!). If all we have is a set of examples and non-examples and we want to construct a minimal deterministic finite automaton (DFA), the problem is NP-hard.

This post is about what we can do if we augment the yes/no predicate only slightly: We have what is called a minimally adequate teacher. It can answer two types of questions about our language \(L\):

  1. Is this string in the language?
  2. Given this DFA, does it correctly recognise the language and if not what is an example of a string where it gives the wrong answer?
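To make this concrete, here is a minimal sketch of a teacher in Python for one specific regular language (strings over {a, b} with an even number of ‘a’s). All names are illustrative, and the equivalence query is approximated by brute force over short strings, which a real teacher of course cannot do in general:

```python
from itertools import product

class EvenAsTeacher:
    """Sketch of a minimally adequate teacher for the language of
    strings over {a, b} with an even number of 'a's.
    (Illustrative only; not taken from the original papers.)"""

    alphabet = "ab"

    def member(self, s):
        # Membership query: is the string s in L?
        return s.count("a") % 2 == 0

    def counterexample(self, dfa, max_len=6):
        # Equivalence query, approximated by checking all short strings.
        # Returns a mispredicted string, or None if none is found.
        for n in range(max_len + 1):
            for chars in product(self.alphabet, repeat=n):
                s = "".join(chars)
                if dfa.accepts(s) != self.member(s):
                    return s
        return None
```

The brute-force equivalence check here is a stand-in: the algorithm below only assumes that *some* oracle answers the two queries, not that it works this way.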

In this scenario there is an algorithm, polynomial in the size of the minimal DFA (and logarithmic in the length of the longest counterexample the teacher returns), which terminates with a correct DFA for the language.

This algorithm was originally presented by Dana Angluin in her paper “Learning Regular Sets from Queries and Counterexamples” under the name L* search. The version I’m going to present here is in fact a slightly later version of the algorithm presented by Ronald L. Rivest and Robert E. Schapire in “Inference of Finite Automata Using Homing Sequences” which improves on the algorithmic complexity and makes the underlying nature of the algorithm slightly clearer. Regardless, I’m still going to call it L* search.

The idea of L* search (although not how it was originally presented) is that we are going to try to construct the Myhill–Nerode automaton, \(Z\), directly. We will do this as follows:

  1. We will maintain a sequence \(w_1, \ldots, w_m\) of witnesses and a sequence \(t_1, \ldots, t_n\) of tests. The witnesses are strings no two of which are word equivalent, i.e. each is a member of a distinct equivalence class in \(Z\), and the tests witness this: for any \(i, j \leq m\) with \(i \ne j\) there is some \(k\) such that \(w_i t_k \in L \ne w_j t_k \in L\).
  2. We start with \(w_1 = t_1 = \epsilon\).
  3. At each stage we either terminate or we extend the tests \(t_i\) by one element and the witnesses \(w_i\) by one or more strings.
  4. This must terminate, because we’re creating a sequence of non word equivalent strings so we can’t have more elements in it than there are states in \(Z\) (because the states are equivalence classes of strings). It’s not currently obvious that we can always make progress prior to that point, but that will emerge in the construction of the algorithm.

The core idea is to use the \(t_i\) to ‘finger print’ every string. The \(t_i\) give us a mapping \(\mathrm{row}: S \to \{0, 1\}^n\) where \(\mathrm{row}(s)_i\) is \(1\) if \(st_i \in L\), else \(0\). We can calculate this fingerprint because we are allowed to ask the teacher whether a string is in the language. We know for a fact that if \(\mathrm{row}(s) \ne \mathrm{row}(t)\) then \(s\) and \(t\) aren’t word equivalent, because one of the \(t_i\) must witness that.
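In Python, the fingerprint might look like this (a sketch, with `member` standing in for the teacher’s membership query):

```python
def row(s, tests, member):
    """Fingerprint of s against the current tests: the i-th component
    is 1 exactly when s followed by the i-th test is in the language."""
    return tuple(1 if member(s + t) else 0 for t in tests)
```

Two strings with different fingerprints are provably not word equivalent; equal fingerprints prove nothing, which is exactly the conflation problem discussed below.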

Given our current sequence of tests \(t_i\) and our witnesses \(w_i\), we use the row function to construct the following automaton whose states are the \(w_i\). In the course of constructing it we may have to extend the witness sequence:

For each \(w_i\) and each \(a \in A\), we consider \(\mathrm{row}(w_i a)\). Define \(\delta(w_i, a) = w_j\), where \(w_j\) is chosen such that \(\mathrm{row}(w_i a) = \mathrm{row}(w_j)\). If no such \(w_j\) exists, then extend the sequence by adding \(w_i a\) to the end so that it does.

We continue this process until the transition function has been defined for all elements of the sequence (including the ones newly added).

We then define the initial state of this finite state machine to be \(w_1 = \epsilon\), and a state \(w_i\) to be accepting if and only if \(w_i \in L\).
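The construction above can be sketched in Python as follows (the names are mine, not from the papers; `member` is the teacher’s membership query, and the witness list is extended in place whenever a transition has no matching row):

```python
def build_hypothesis(witnesses, tests, alphabet, member):
    """Build the hypothesis DFA over the current witnesses and tests,
    adding new witnesses when a transition's row matches no known state.
    Returns the transition function and the set of accepting witnesses."""
    def row(s):
        return tuple(member(s + t) for t in tests)

    delta = {}
    i = 0
    while i < len(witnesses):
        w = witnesses[i]
        for a in alphabet:
            target = row(w + a)
            rep = next((v for v in witnesses if row(v) == target), None)
            if rep is None:
                # No witness has this fingerprint: w + a is a new state.
                witnesses.append(w + a)
                rep = w + a
            delta[(w, a)] = rep
        i += 1
    accepting = {w for w in witnesses if member(w)}
    return delta, accepting
```

Note that the loop runs over witnesses added during the loop too, which is exactly the “continue until the transition function is defined for all elements, including newly added ones” step.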

The resulting state machine is more or less the same as the Myhill–Nerode automaton. The only differences are:

  1. We have explicitly chosen an instance of each equivalence class (the \(w_i\))
  2. Instead of considering whether states are distinguishable by any test, we only ask whether they are distinguishable by our current tests \(t_i\). This can cause us to conflate more states than we should, and fixing that will be the core point of what we do next.

So we now have a DFA to present to the teacher. At this point we perform a counter-example query. Either our teacher says that the DFA is correct, in which case we’re done, or it presents us with a counter-example in the form of a string that the DFA predicts the wrong answer for. Call our current DFA \(Q\), and that counter-example \(c\).

What we want to do now is to take that mispredicted string and extract from it an experiment that shows us where we went wrong: At some point we must have made a transition \(\delta(w_i, a) = w_j\) when in actual fact \(w_ia\) and \(w_j\) are not word equivalent (if we never made such a transition then the path we walked is the same as the path we would have walked in \(Z\), which does correctly recognise the language). So what we want to do is find an experiment \(t\) such that \(w_i a t \in L \ne w_j t \in L\), and add \(t\) as our next experiment.

If we do that, then when we build the next version of the DFA with the new set of experiments the number of states must increase: We already know that \(w_i a\) is not word equivalent to \(w_k\) for any \(k \ne j\). We have just shown that it is also not word equivalent to \(w_j\). Thus, in order to find a state to transition to from \(w_i\) on \(a\), we must either have already added a new state or we add one there and then.

The idea for finding such a transition, and an experiment that distinguishes it, is this: We use the suffixes of \(c\) as our candidate experiments.

We’ll look at a sequence of strings that should be equivalent if our DFA were correct, and look for the exact moment that something goes wrong to find our experiment.

Consider splitting \(c\) into pairs of strings \(u_i v_i\), with \(0 \leq i \leq |c|\) and \(|u_i| = i\), i.e. \(u_i\) is the first \(i\) characters of \(c\) and \(v_i\) is the remaining characters.

(Notation: \(|s|\) is the length of the string \(s\)).

Now, we can replace \(u_i\) with \(\delta_Q(q_0, u_i)\) – the string that represents the state that you get to after reading the first \(i\) characters – and not change whether the result is in the language recognized by \(Q\). However, the same is not true of the language \(L\): If we consider \(i = |c|\) then this string is \(\delta_Q(q_0, c)\), and by the fact that \(c\) is a counter-example we know that \(\delta_Q(q_0, c) \in L \ne c \in L\).

Call a value of \(i\) bad if it has the property that \(\delta_Q(q_0, u_i) v_i \in L \ne c \in L\), i.e. if replacing the state we expect to be in after \(i\) characters with the representative for that state changes whether the string is recognised as being in the desired language. Necessarily, \(|c|\) is bad and \(0\) is good (because \(u_0 = \epsilon\), so the replacement string is just \(c\) again).
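As a sketch of the bad predicate in Python (here `run` is assumed to map a string to the representative witness of the state \(Q\) reaches on it, and `member` is the membership query):

```python
def make_bad(c, member, run):
    """Return the predicate bad(i): replace the first i characters of the
    counter-example c by the representative of the state they reach, and
    report whether that changes membership relative to c itself."""
    target = member(c)
    def bad(i):
        return member(run(c[:i]) + c[i:]) != target
    return bad
```

Each call to `bad` costs exactly one membership query, which is where the \(\log\) factor in the analysis below comes from.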

What we want to do is find an index \(i\) such that \(i\) is good and \(i + 1\) is bad. We’ll see how to do that in a moment, but first let’s see why it would help:

What we want to show is that if we have such an \(i\) then the transition from \(\delta_Q(q_0, u_i)\) to \(\delta_Q(q_0, u_{i + 1})\) via the character \(c_{i + 1}\) is the bad transition we were looking for, and that \(v_{i+1}\) is a test that witnesses this.

To see this, note that because \(i + 1\) is bad we must have \(\delta_Q(q_0, u_{i + 1}) v_{i + 1} \in L \ne c \in L\), and because \(i\) is good we must have \(\delta_Q(q_0, u_{i}) v_{i} \in L = c \in L\), so \(\delta_Q(q_0, u_{i}) v_{i} \in L \ne \delta_Q(q_0, u_{i + 1}) v_{i + 1} \in L\). But \(v_i = c_{i + 1} v_{i + 1}\), so this means we have \(\delta_Q(q_0, u_{i}) c_{i + 1} v_{i + 1} \in L \ne \delta_Q(q_0, u_{i + 1}) v_{i + 1} \in L\).

i.e. \(v_{i + 1}\) is an experiment distinguishing \(\delta_Q(q_0, u_{i}) c_{i + 1}\) from \(\delta_Q(q_0, u_{i + 1})\). But by construction, \(\delta_Q(q_0, u_{i + 1}) = \delta_Q(\delta_Q(q_0, u_i), c_{i + 1})\), which makes \(v_{i + 1}\) our desired experiment to add to the list.

Now, to complete the process: How do we find such an \(i\)?

The most direct approach would be to just do a linear scan, but we can do better: we can use binary search, by observing that the general binary search algorithm has nothing to do with sorting if you regard it as looking for the point at which a function changes in a particular way.

In python pseudocode, we can find such an \(i\) as follows:

# Invariants: lo is good, hi is bad
lo = 0
hi = len(c)
while lo + 1 < hi:
    mid = (lo + hi) // 2
    if bad(mid):
        hi = mid
    else:
        lo = mid

This then takes \(\log_2(|c|)\) queries and when it terminates we can take \(i = \mathrm{lo}\).

That was the final step in our puzzle for the construction: As long as our automaton does not recognise \(L\), we can find a new experiment, add it to the list, and thus force our automaton to keep growing. When it can grow no longer it must be equivalent to \(Z\), because it is an automaton of the same size which recognises \(L\).
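Putting the whole construction together, here is a compact end-to-end sketch in Python. Everything here is illustrative rather than definitive: the `counterexample` oracle stands in for the teacher’s equivalence query, and is handed the hypothesis as a transition table plus accepting set:

```python
def lstar(alphabet, member, counterexample):
    """L* search as described above: grow witnesses and tests until the
    teacher accepts the hypothesis. Returns (witnesses, delta, accepting)."""
    witnesses, tests = [""], [""]
    while True:
        # Build the hypothesis automaton from the current witnesses/tests.
        def row(s):
            return tuple(member(s + t) for t in tests)
        delta = {}
        i = 0
        while i < len(witnesses):
            w = witnesses[i]
            for a in alphabet:
                target = row(w + a)
                rep = next((v for v in witnesses if row(v) == target), None)
                if rep is None:
                    witnesses.append(w + a)
                    rep = w + a
                delta[(w, a)] = rep
            i += 1
        accepting = {w for w in witnesses if member(w)}

        c = counterexample(delta, accepting)
        if c is None:
            return witnesses, delta, accepting

        def run(s):
            state = ""
            for ch in s:
                state = delta[(state, ch)]
            return state

        # Binary search for a good i with i + 1 bad, as in the text,
        # then add the distinguishing suffix as a new test.
        def bad(j):
            return member(run(c[:j]) + c[j:]) != member(c)
        lo, hi = 0, len(c)
        while lo + 1 < hi:
            mid = (lo + hi) // 2
            if bad(mid):
                hi = mid
            else:
                lo = mid
        tests.append(c[lo + 1:])
```

For example, for the language “at least two ‘a’s” the first hypothesis has a single state (the empty string’s row conflates everything), a counterexample such as “aa” yields the new test “a”, and the second hypothesis is the correct three-state DFA.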

We’re almost done, but let’s do some analysis of how many queries we made here (in this analysis I will assume that all answers to membership queries are cached, so we only make them once).

Suppose our final set of tests is \(T\). Note that \(|T| \leq |Z|\) because every time we added to \(T\) we increased the number of states by at least one, and the algorithm terminated when the number of states was equal to \(|Z|\).

We have performed precisely \(|T|\) counter-example queries: every counter-example query except the final one (which told us we were done) added exactly one element to \(T\), and \(t_1 = \epsilon\) was there from the start, so that’s \(|T| - 1\) queries plus the last.

Now let’s see how many membership queries we’ve performed.

We performed membership queries in two places: first, when building the transition table; second, when trying to construct the tests.

For the first: For every witness, every \(a \in A\) and every \(t \in T\) we have calculated whether it is a member of \(L\) in order to compute the rows and the transitions. So that comes to \(|Z||A||T| \leq |A| |Z|^2\) membership queries.

In calculating the test, we had to perform one membership query at every point in the binary search in order to determine whether mid was bad. Therefore we performed \(O(\log |c|)\) membership queries per counter-example. Letting \(m\) be the maximum length of any counter-example provided in the course of our run, we thus performed a total of \(O(|T| \log m) = O(|Z| \log m)\) membership queries.

So, in our final analysis, running L* search with a minimally adequate teacher for our language requires:

  1. \(O(|A||Z|^2 + |Z| \log(m))\) membership queries
  2. \(O(|Z|)\) counterexample queries

In fact these estimates are a bit too conservative. As we’ll see in a later post, it is sometimes going to be the case that \(|T| \ll |Z|\).


Why do we care?

Personally, I mostly care because I think L* is a super-neat algorithm, and the concept of a minimally adequate teacher as a learning environment is surprisingly general.

You can also put this into a stochastic setting to relax the requirement of a minimally adequate teacher (which may be hard to obtain), in which case you instead get to build DFAs which give the correct answer with at least some chosen probability when the string is drawn from a fixed choice of distribution.

However, the evidence seems to be that most people working on language learning don’t care about this algorithm. There hasn’t been a great deal of research on it in recent years that I’ve been able to find, and most of the research on it from back in the day was less concerned with languages than automata: you frequently find it framed in the context of a robot learning to navigate a world.

I recently read Grammatical Inference: Learning Automata and Grammars, which I can highly recommend if you found this post interesting, as there won’t be many more like it in this series. The next post will use some of the ideas from it, but will be moving away from the area of inference.


This entry was posted in Automata Theory.

Weekly reading post #4

Admin notes:

  • I’m going to separate out books from links to articles. Books will no longer appear on the random reading list on the grounds that any book I’m actually reading I’ll read long enough for a ping.
  • I’ll be writing a little bit about each link/book in order to provide more context for whether it’s worth following.
  • There’s now a Weekly Reading Post category if you want to see all of these.

Random links

These are links to things that I was reading when I answered a tagtime ping with nonfiction. There are a few omissions if I wasn’t able to conveniently note something down when I was reading it, but no deliberate omissions other than books.

Selected links

These are links that I think are worth highlighting but did not get subjected to a tagtime ping (or did get subjected to one but I feel are especially worth noting).

Apparently the main selected links this week are all about conversational styles.

Miller’s Law in the Archipelago of Weird is about a variety of things, but I particularly like the point about conversational styles (in this case around performing small talk) differing strongly between autistic and allistic people and this creating a situation of competing needs.

It also leads into the next batch of selected links, which are about a different aspect of conversational styles, and the role interruption plays in conversation:

I was looking for the first one and got linked to the second two by other people (Ceren and Jeremy), but they’re all good reads.

The major takeaway is that in some cultures interruption is the height of rudeness, because you’re signalling that you think you are more important than the current speaker, while in others not interrupting is the height of rudeness, because you’re signalling that you don’t care about the conversation. Interruption can be combative, or it can be collaborative, and a clash over this without understanding that that’s what’s going on can lead to some horrible conversations.

Other things I liked that don’t fit the theme:

Books read this week

These are any books I spent a substantial amount of time with this week.

  • So Good They Can’t Ignore You – I was somewhat skeptical about this book going in but decided to give it a go off the back of a good essay by the author (I forget which one), but my skepticism was wholly unwarranted. This is actually a really good book with really good career advice. In particular it’s a nice antidote to “follow your passion, be brave” style advice. Would recommend to just about anyone in a knowledge work style field. Status: Finished.
  • Soul of a New Machine – this is a book about Data General’s development of an earlyish 32-bit machine. It was interesting, but I felt like I was not the target audience of this book. It felt a bit too real for me – the culture of this team read like some of the terrible startups I’ve seen dialled up to 11. Would recommend to people who want to understand what that’s like, wouldn’t necessarily recommend to people who already know what it’s like. Status: Finished
  • How To Read A Book – this is a book about active reading – reading by not just passively absorbing material but by full on engaging with the material in order to understand it. It’s already changed a lot about how I read, and I can recommend it for that, but it feels like a lot of book for what it actually covers so I’ve been following their advice and reading it highly non-linearly. Status: Ongoing
  • Combinatorial Optimization – this was gifted to me a while back and I’ve been feeling bad about the fact that my reading of it has stalled quite a lot. As mentioned in my last post I’m trying a  new technique to properly engage in the material (loosely based off the “How to read a book” material), so I’ve resumed reading on it. It’s a good book, but it’s both huge and dense, so I would recommend it if you’re willing to invest the time to properly get to grips with the material and not if you want a light summary of the subject matter. Status: Ongoing

Books Gifted This Week

One book arrived from my wishlist this week: Switch: How to change when change is hard. This was bought for me by Zack M. Davis.

This entry was posted in Weekly Reading Posts.

How to read a mathematics textbook

Miikka asked me how I read a maths textbook the other day, and I didn’t have a better answer than “badly”. I’ve mostly tried to read textbooks linearly cover to cover, and this doesn’t actually work – I sometimes end up understanding the material, but it’s usually not until long after I’ve finished or given up on the book.

After some thought and experimentation, I think I now have a better answer that helps integrate understanding and reading immediately. It’s very much a work in progress, and I’m sure I will refine the process over time, but initial experience with it seems promising.

It’s designed along a couple of important principles:

  1. The goal is not, in fact, to read the textbook. The goal is to understand the material the textbook covers, and reading it helps you achieve that.
  2. Your obligation to the author ended when you bought the book. You may make use of it in whatever way you like, regardless of how the author intended it to be used. In particular you are under no obligation to read the material in the order in which it is presented, and you don’t have to read the bits you don’t care about.
  3. If you’re constantly making progress in the right direction, you will benefit even if you stop midway through the process.
  4. If it feels easy you’re probably not learning very much.

It’s very loosely based on some of the principles of active reading outlined in How To Read A Book, but they mostly don’t cover this subject so I had to invent my own techniques.


The technique we’re going to use is one of goal-directed learning. If we try to take in the entire textbook at once we’ll be overwhelmed completely (textbooks are large), so instead we repeatedly pick a goal and try to achieve that goal. You’ll end up essentially rewriting just enough of the book in your own words to understand the parts of the material that you need to achieve that goal.

A good goal is a theorem or an exercise (really an exercise is just a theorem with the proof omitted): You want to be able to prove the theorem or solve the exercise. For these purposes a proposition or a lemma is just a small theorem.

Pick one. It doesn’t really matter how. If there’s something specific you want to learn, use that. Otherwise, open the book randomly somewhere in the middle and scout around until you find something that looks interesting.

Now, get a large quantity of paper and a pencil or pen. First dedicate two sheets and create two lists: One is your Current list, the other is your Pending list. Items on the Current list are numbered, items on the Pending list are unnumbered.

The Pending list starts empty, the Current list starts with one item on it (numbered item 1), which is your chosen starting point.

Now, repeatedly iterate the following procedure:

  1. If every item on your Current list has a tick next to it, move an item from your Pending list to your Current list by crossing it off from Pending and adding it to the bottom of Current. If your Pending list is empty, stop. You’re done for now.
  2. Pick an item on your Current list that does not have a tick next to it. The highest number is usually a good default.
  3. Read the part of the book (only a paragraph or two usually) that describes that item.
  4. For any definitions or theorems you don’t currently know, add them to the Current list if they’re not already on it. If any of these were already on the Pending list, cross them off it. Once you’ve done this, go back to step 2.
  5. If you understand all the referenced items, start a new sheet of paper with the item number and its name at the top. Write that sheet (see below), then put a tick next to the item on your current list.
  6. Anything interesting you encounter in steps 3 and 4 that is not directly part of solving your current goal, add to the Pending list.

A sheet for an entry on your list depends on the type of item:

If it’s a theorem, the sheet is a proof of the theorem. You should write this out by hand. Try to prove as much of it yourself as you can. If the proof is complicated and you don’t feel it provides much insight into your current goal, add “Prove this theorem” to your pending list and instead just write out the statement of the theorem in a way you understand and treat the proof as a black box for now.

A sheet for a definition consists of a statement of that definition that you understand followed by the answer to two or three questions:

  1. What is an example of this?
  2. Why do we care? (This may be just “We need it for X on the list”)
  3. Why is this well defined? (Optional as this is often obvious)

At the end of this, you should have written a proof that you understand of the thing that you wanted to understand, and also picked up a significant chunk of the textbook along the way. Congratulations! You should definitely now take a break (to be honest, this process probably took you multiple days so you’d better have been taking breaks along the way anyway). When you’re refreshed, you can either decide you’ve got enough out of the textbook or pick another starting point. Try to pick one that is at least related to your last one so you can build on previous understanding.

Worked Example

I am applying this technique to the combinatorial optimization textbook I got as a gift, and it seems to be helping a lot, though it’s early in the process so it’s hard to say for sure.

For my purposes I picked “Edmonds’ Matroid Intersection Algorithm” because it sounded interesting and I didn’t know what a Matroid was, so it seemed like a good test. I’m in the middle of it still, so the following is a work in progress.

My Current list is:

  1. Edmonds’ Algorithm
  2. Matroid intersection problem ✓
  3. Matroid ✓
  4. Independence Oracle ✓
  5. Circuit ✓
  6. Independence System ✓
  7. Proposition 13.4
  8. Lower Rank
  9. Rank Quotient
  10. Proposition 13.29
  11. Rank Function ✓
  12. C(X, e) ✓
  13. Theorem 13.12(b)
  14. Theorem 13.8
  15. Proposition 13.7
  16. Theorem 13.10 ✓

My Pending list is:

  • Duality of Matroids
  • Best-in greedy algorithm
  • Worst-out greedy algorithm
  • Theorem 13.19
  • Theorem 13.20

How does it feel?

Well, it feels like a lot of work. But it feels like useful work. I think I have a decent grasp on what a Matroid is and why they’re interesting, and how we can work with them.

Also, understanding this book was always going to be a lot of work. The problem was that I wasn’t doing the work, and the great thing about this technique is that it’s given me a process which actually lets me do the work without feeling overwhelmed, because I can let go of the idea of trying to read the whole book and just focus on the specific goal and the process of getting there.

Once I’ve completed this section I’m going to turn my attention to the Simplex algorithm section, which is entirely different in that I am familiar with the Simplex algorithm but have never really managed to understand it to any significant degree. If this approach helps me achieve that then I’ll definitely regard it as a winner.

This entry was posted in Numbers are hard.