Doing the Two-Step for Efficient Reversible Deletes

Attention conservation notice: I’m back to blogging about tech again. Sorry.

Credit: Thanks to Suzy Hamilton for useful information about dance types.

I recently read Donald Knuth’s dancing links paper. It was sent to me as an example of good writing, which I don’t entirely agree with (it’s locally good writing, but I didn’t really understand it until I rewrote the first couple of pages in my own words and notation, after which it was fine). Regardless of whether it’s good writing, though, it’s definitely an interesting technique, one that I was previously only vaguely aware of and didn’t appreciate.

Roughly the idea of dancing links is this:

  1. When implementing depth first search to solve some hard combinatorial problem (e.g. SAT solving, but also others; the main example of the paper is the exact cover problem), it is very useful to have mutable data structures with undoable operations. In each recursive call you make a bunch of changes, and you undo them all before returning.
  2. A particularly useful category of undoable change is maintaining a list of elements from which you can delete individual elements, then undo those deletes.
  3. A doubly-linked list lets you implement that efficiently: you unlink a node from the list but keep it around with its pointers intact, then splice it back in when you undo. Apparently this makes the links of the doubly-linked list “dance”, hence the name.

1 and 2 are interesting points that I think will have a significant impact on how I design things. I am very glad to have read the paper for that.

Point 3, eh. No thanks.

There are two problems with it:

  1. Doubly-linked lists are a bullshit data structure
  2. Point 1 of this list is in large part responsible for one of my main problems with wrapping my head around the paper: if you give me an interesting algorithmic approach, I’m going to want to wrap it up as an ADT so I can treat it as an abstract set of operations and reason about those separately from the algorithm that uses them. Doing this as a doubly-linked list makes that hard because of what the deletion operation ends up looking like.

It turns out that it’s actually really easy to fix both of these problems if you are willing to relax one constraint very slightly: the requirement that the elements stay in a particular order.

There is a data structure that supports the following operations:

  • Initialise with N elements in O(N).
  • Get the current number of elements in O(1).
  • Get the element at index i in O(1).
  • Delete the element at index i in O(1), putting the element that was previously at index length – 1 in its place (or just removing it if i = length – 1).
  • Undo a single delete in O(1).

So if you perform k deletes and then call undo k times, the list is back in the state it was before the deletes.

Moreover there is very little indirection involved (everything is just stored in an array), so algorithms using this involve almost no pointer-chasing, unlike a doubly-linked list, and are likely substantially more efficient as a result (but I have done literally zero benchmarking to demonstrate this). It also uses less memory per item.

One important thing to note is that although the order of the elements changes, it does so in a sufficiently predictable way that forward iteration over the list is easy: you just iterate as normal, and if you want to delete the current element you just skip incrementing the loop index, because the deletion moves a not-yet-visited element into the current position.
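To make the iteration pattern concrete, here is a minimal sketch in Python using a plain list and the swap-with-last delete inline (no undo stack yet, just the loop shape; the example data and the “delete evens” rule are mine):

```python
# Forward iteration with in-loop deletes: swap the deleted element with the
# last live element and shrink the live region, instead of shifting everything.
items = [3, 1, 4, 1, 5, 9, 2, 6]
length = len(items)  # number of live elements; live region is items[:length]

i = 0
while i < length:
    if items[i] % 2 == 0:  # delete the even elements, say
        length -= 1
        items[i], items[length] = items[length], items[i]
        # don't increment i: a not-yet-visited element now occupies index i
    else:
        i += 1

print(items[:length])  # the odd elements, in a shuffled order
```

Every live element is visited exactly once, so the whole filtering pass is O(N) even with arbitrarily many deletes.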

How does this data structure work?

  1. Put all the desired elements in an array of length N. Store a length field and an undo stack. The length field keeps track of the end of the currently present elements – everything that has been deleted is kept in the array but stored starting from index length.
  2. When you delete the element at index i, push i onto the undo stack, decrement the length by 1, and swap the elements at position i and position length.
  3. When you undo, pop i from the undo stack, swap positions i and length, and then increment length.

That’s it. We’re done.
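The three steps above translate pretty much directly into code. Here’s a sketch in Python (the class and method names are my own invention, not from any library):

```python
class SwapList:
    """Array-backed list with O(1) swap-deletes and O(1) undo.

    Live elements occupy indices [0, length); deleted elements are kept
    in the array at indices [length, capacity)."""

    def __init__(self, elements):
        self._items = list(elements)     # all elements, live ones first
        self._length = len(self._items)  # count of live elements
        self._undo_stack = []            # index of each delete, in order

    def __len__(self):
        return self._length

    def __getitem__(self, i):
        if not 0 <= i < self._length:
            raise IndexError(i)
        return self._items[i]

    def delete(self, i):
        """Delete the element at index i, moving the last live element into its place."""
        assert 0 <= i < self._length
        self._undo_stack.append(i)
        self._length -= 1
        items = self._items
        items[i], items[self._length] = items[self._length], items[i]

    def undo(self):
        """Reverse the most recent delete (the exact inverse of the swap above)."""
        i = self._undo_stack.pop()
        items = self._items
        items[i], items[self._length] = items[self._length], items[i]
        self._length += 1
```

Because each `undo` pops the same index its `delete` pushed and performs the identical swap, k deletes followed by k undos restores the list, order and all, to exactly its prior state.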

This is probably a known technique – it only took me about an hour of thinking about the problem to come up with it, so it would be frankly bizarre if nobody had noticed this since the dancing links technique was invented in 1979 – but some cursory googling didn’t find anything, so if nothing else this post will hopefully make the idea more visible and maybe someone who knows of it can send me a link (dancing or otherwise).

In general this technique of “You can remove an element from an array in O(1) if you don’t mind losing the order” is surprisingly powerful and I think underappreciated. It’s pretty straightforward once you see it, but allows for some pleasingly elegant data structure designs.
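The whole trick, without the undo machinery, fits in three lines (Rust’s standard library ships it as `Vec::swap_remove`); a Python sketch:

```python
def swap_remove(xs, i):
    """Remove and return xs[i] in O(1), moving the last element into its place.

    The order of the remaining elements is not preserved."""
    xs[i], xs[-1] = xs[-1], xs[i]
    return xs.pop()
```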

This entry was posted in programming.