# Optimal solutions to FAST

This is another post about FAST which I mentioned in the last post and provided some heuristic solutions to. In this I’d like to talk about finding exact solutions to it.

Note: There’s a lot of literature about this and I am only starting to get to grips with it (I have a book on order which will help with that). What I’m presenting does not I think add anything very new to the state of the art except that it was a lot simpler for me to understand than anything equivalently good I could find.

My not very well optimised python version of this can handle $$20 \times 20$$ matrices pretty quickly (a couple of seconds) and $$25 \times 25$$ in under a minute. This is actually a pretty good achievement. I think a reasonably well written C or C++ version that wasn’t committing atrocities in how it communicated with the LP solver could easily be an order of magnitude and possibly two faster, which might bring things up to about $$30 \times 30$$ within feasible reach.

Also, if you want to see code instead of maths, you can just skip straight to the repo where I’ve implemented this solution (plus a few details not mentioned here).

There are a couple ideas in this solution:

Firstly, note that we only need to find the weight of the optimal solution. If we can do this then we can reconstruct an optimal solution with only an additional $$O(n^2)$$ additional calls (and because of the way this solution will use dynamic programming this will actually be a lot faster than it sounds): We can find the head of the solution by looking at the weights of optimal solutions for each of the subsets of size $$n – 1$$ and then recursively find an ordering for the best subset. So specifically we want the function $$\mathrm{FasWeight}(A) = \min\limits_{\sigma \in S_n} w(A, \sigma)$$.

From henceforth, let $$A$$ be fixed. We can also assume without loss of generality that $$A$$ is non-negative and integer valued, so we will because it makes our life easier.

The starting point for our solution will be the following recursive brute force implementation of the function $$\mathrm{FasWeightSubset}(U)$$ where $$U \subseteq N = \{1, \ldots, n\}$$.

If $$|U| \leq 1$$ then $$\mathrm{FasWeightSubset}(U) = 0$$

Else $$\mathrm{FasWeightSubset}(U) = \min\limits_{x \in U} \left[ \mathrm{FasWeightSubset}(U – \{x\}) +\mathrm{HeadScore}(x, U) \right]$$

Where $$\mathrm{HeadScore}(x, U) = \sum\limits_{y \in U, y \neq x} A_{yx}$$.

What we’re doing here is conditioning on each element of $$x$$ whether it’s at the front of the list. If it is then the optimal score is the optimal score for ordering $$U$$ without $$x$$ plus the cost of all the backward arcs to $$x$$. We take the minimum over all $$x$$ and get the desired result.

If one uses dynamic programming to memoize the result for each subset (which you really have to do in order for it to be remotely feasible to run this) then this is an $$O(n^2 2^n)$$ algorithm. It also uses rather a lot of memory ($$\sum\limits_{k \leq n} k {n \choose k}$$, which is probably some nice formula but basically is “lots”).

The idea for improving this is that if we have some fast way of calculating a lower bound for how good a set could possibly be then we may be able to skip some subsets without actually doing the full calculation. e.g. a trivial lower bound is $$0$$. If a subset would need a negative score to improve the current best then we can disregard it. A slightly less trivial lower bound is $$\sum\limits_{x, y \in U, x < y} \min(A_{xy}, A_{yx})$$, because for each pair of indices you  must pay the cost of an arc in at least one direction. It turns out that neither of these scores are good enough, but we’ll skip the question of what is. For now assume we have a function $$\mathrm{LowerBound}(U)$$ which provides some number such that every ordering of $$U$$ creates weight $$\geq \mathrm{LowerBound}(U)$$.

We can now use this to define the  function $$\mathrm{CappedFasWeightSubset}(U, t)$$. This will be equal to $$\min(\mathrm{FasWeightSubset}(U), t)$$ but we can offer a more efficient algorithm for it:

For $$U$$ with $$\mathrm{LowerBound}(U) \geq t$$, let $$\mathrm{CappedFasWeightSubset}(U, t) = t$$.

Otherwise, $$\mathrm{CappedFasWeightSubset}(U, t) = \min\limits_{x \in U} \left[ \mathrm{CappedFasWeightSubset}(U – \{x\}, t – \mathrm{HeadScore}(x, U)) + \mathrm{HeadScore}(x, U) \right]$$

At each recursive call, placing $$x$$ at the head incurs a certain cost and thus lowers the threshold we pass to the tail.

In fact we can do better than this: Given any known good solution lower to that threshold we can lower the threshold to that value, as we’re only interested in whether we can improve on that. This also means that we can use the running minimum of the sub-problems to reduce the threshold. I won’t write that in the current notation because it’s a pain. It’ll be clearer in the code though.

This is basically it. There are a few minor optimisations, and how to memoize the results may not be obvious (you don’t want the threshold in the cache key or you’ll get zero hits), but there’s not much more too it than that.

However: This works about as badly as the earlier version if your lower bound isn’t good enough – even with an only reasonably good one I found I was only cutting out maybe 70% of the subsets, which was nice but not actually worth the cost of calculating the lower bound.

So where do we get better lower bounds?

Well it turns out that there’s another formulation of the problem. You can treat this as an instance of integer linear programming. It requires $$O(n^2)$$ variables and $$O(n^3)$$ constraints, so the integer linear programming problem rapidly becomes too difficult to calculate, but the convex relaxation of it (i.e. the same linear program without the integer constraints) provides a good lower bound.

The way this works is that we introduce 0/1 $$D_{ij}$$ variables for each pair $$i \neq j$$. $$D_{ij}$$ will define a total ordering $$\prec$$ of $$N$$, with $$D_{ij} = 1$$ if $$i \prec j$$. This gives us the following ILP:

Minimize: $$\sum\limits_{i \neq j} A_{ji} D_{ij}$$ (you pay the cost of the backwards arcs)

Subject to: $$\forall i, j, D_{ij} + D_{ji} + 1$$ (the order is reflexive)

And: $$\forall i, j, k, D_{ij} + D_{jk} + D_{ki} \leq 2$$ (the order is transitive)

Solving this linear program without the integer constraints then gives us a fairly good lower bound.

What’s more, it turns out that often the integer constraints happen to be satisfied anyway. This allows us a further shortcut: If the lower bound we found turns out to have all integer variables, we can just immediately return that as the solution.

This entry was posted in Uncategorized on by .

## 4 thoughts on “Optimal solutions to FAST”

1. pozorvlak

I think you’ve reinvented the branch-and-bound algorithm, which is a common approach to solving ILPs. In particular, GLPK uses it. How large a FAST instance can you solve by asking glpsolve to solve the ILP you describe in the final section?

1. david Post author

No, I’m aware of branch and bound. This is indeed a form of it, but it’s specialised to the particular problem.

…except it turns out you’re right. lp_solve actually handles this with integer constraints much better than I thought it did. I’m not sure if it’s the switch from glpk that did it (lp_solve is supposedly very good at 0/1 integer programming) or whether it was a problem with my earlier interface to the lp solver (which was super broken because it turns out one of the most popular python libraries for this is shit).

I think this is still better, but maybe it’s not as much better as I thought. Hm.

1. david Post author

So some investigation suggests that this solution isn’t terrible compared to lp_solve’s integer programming, but it’s also not good (up to an order of magnitude slower). The interface I’m using to lp_solve is really bad performance-wise though, so TBH for cases where there are lots of branches I would expect that to dominate and it’s hard to draw conclusions as to which is the better approach except for in cases where one is outright not viable and the other one works.

I have plans to replace the interface to the LP with some C code rather than the current shelling out to lp_solve atrocity. We’ll see how well that works when I have, but in the meantime there’s a decent chance you’re right that actually the more normal branch and bound variant works fine for this.