This is another post brought to you by the school of the bloody obvious. Or at least it should be, but despite having implemented a trivial variation on this data structure before it still took me far too long to remember it existed.
My previous use case was the original version of IntAllocator in the rabbitmq Java client API (it’s an internal class used for allocating channel numbers). In looking it up I’ve noticed that it no longer does, for reasons that do make sense, but those reasons don’t apply for my current use case (basically the new version uses an internal bitset, trading space for time. RabbitMQ only needs to allocate one of these, and the range isn’t that large, so trading space for time is completely acceptable).
My current use case was for building separation graphs for metric spaces. Part of an algorithm I’m playing with requires keeping a candidate list for each element: That is, a list of elements which might be > t away but we’re not sure. This needs to not take up O(n) space when the answer is “the whole set” and we need to be able to remove elements from it efficiently. Hence the constraints in the title.
The resulting structure is basically a range compressed bitset built out of a balanced binary tree. It works as follows:
A node in the tree may be either empty, an inclusive range represented by its endpoints or a split around a pivot of m such that the left hand side of the tree contains elements <= m and the right contains elements > m. We enforce the invariants that neither child of a split may be empty, and that ranges must have endpoints which admit at least one element.
Allocating an entire range is then just a matter of creating a range node holding two integers, i.e. O(1).
Deleting an element is harder, but not much.
- Deleting an element from an empty node does nothing.
- Deleting an element from a range has three distinct cases:
- If the element is not contained within the range, do nothing
- If the element is one of the endpoints, decrement or increment the relevant endpoint. If this results in the range being empty, replace this with an empty node
- If the element is internal to the range, split this range around its midpoint and delete from the split instead
- Deleting an element from a split is even more straightforward: You determine which child the element should belong in and delete from that. If this results in that child being empty, replace this node with the other child.
- I’ve put together a Haskell implementation to demonstrate the algorithm
- There are at least two sensible ways to split a range. You can either do it around the element being deleted (which guarantees you’ll not have to recurse – you just allocate a split and two new ranges) or you can do it around the midpoint of the range. The former is cheaper but can result in the tree becoming unbalanced, the latter guarantees you’ll never need more than log(n) levels of recursion but also means most of the time you’ll use log(n) levels of recursion. You could also implement a hybrid where you split around the deleted point unless you’re below a certain depth in the tree, but I’m not sure it’s worth the added implementation cost
- It is useful for split nodes to keep track of the smallest interval containing them as you can then shortcut if the element to be deleted lies outside that interval
- You can also implement deleteRange efficiently (I think in O(log(n), though there’s a detail I haven’t checked there). This is left as an exercise for the interested reader
- When I implemented IntAllocator I added a cache for this: A bounded size stack which you pushed elements to delete onto. When the stack grew full you sorted it into ranges and deleted the ranges all at once. There was a specific reason for this that doesn’t apply (sometimes you wanted to say “pop an element, any element” and that came from the stack if it was non-empty), but this might still be a useful thing to do for imperative versions of this code (it’s a bad idea for pure ones)
Pingback: David R. MacIver
I’d be interested to hear if you come up with sensible approaches to union, intersection, and set subtraction for this datastructure as well. I’ve attempted to come up with a solution before, but it was difficult and messy and the resulting code ended up with weird time and space behaviour.
Hm. That’s a good question. I’ve not needed to do that, so haven’t worried about it yet.
Set removal and intersection don’t look like they’d be *too* awkward, but I’m not sure you can do them with sensible bounds because you can’t easily ensure both argument shrink when recursing. I’m not even sure what union looks like at the moment.
I’ll have a think about it.
Pingback: Compressed ranges for intmap | David R. MacIver