A momentous occasion has just, err, occurred. I got a green build on my weighted feedback arc set solver. This has never happened before.
This might seem like a confusing concept. How can it never have passed its own test suite?
Well, the test suite has always been more aspirational than reality-based. It effectively works like this: it has a lot of test data, and for each file it tracks the best solution found along with its score. The quality test is that the score of the solution found is within 5% of the best known solution (side note: part of why the tests are now green is that I raised this threshold a bit – it was originally at 1%. Most tests still hit that tighter threshold). A side-effect of this is that improvements to the solver will often make the tests harder to pass, because they find better solutions. Surprisingly, this is still happening – I’ve spent enough CPU cycles and algorithmic tweaks that I keep thinking I’ve found all the optimal solutions, only to be surprised by another improvement. The improvements are rarely large, but they keep coming.
The result was that I would often make improvements to the FAS solver and then back them out because they were too slow, leaving the new test data behind – so the tests would fail as the FAS solver dreamed of the long lost days when it was cleverer.
Additionally, all the tests are capped at the fairly arbitrary maximum acceptable runtime of one minute.
At this point I should mention that finding even an approximation of the optimal score for weighted FAS is an NP-hard problem (I forget to within what bound – I think it’s something like half), and that even calculating the score of a given ordering is \(O(n^2)\). So this isn’t an easy problem to solve.
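To make that quadratic cost concrete, here is a minimal Python sketch of the scoring step. The weight-matrix representation and the names are my illustration, not the solver’s actual code: assume `weights[i][j]` is the weight of edge `i -> j`, and the score of an ordering is the total weight of edges that point backwards in it.

```python
def score(weights, order):
    """Total weight of back-edges under `order`.

    `weights[i][j]` is the weight of the edge i -> j (0 if absent).
    Every pair of positions gets checked, hence the O(n^2) cost.
    """
    total = 0
    for pos, later in enumerate(order):
        for earlier in order[:pos]:
            total += weights[later][earlier]  # edge pointing backwards
    return total

# Three nodes with edges 0->1 (weight 1), 1->2 (weight 2), 2->0 (weight 3):
w = [[0, 1, 0], [0, 0, 2], [3, 0, 0]]
print(score(w, [0, 1, 2]))  # only 2->0 points backwards, so 3
```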
I’d largely stalled on development on it, but today I decided to go back to it, and after a day of tinkering I’ve managed to make a surprising amount of progress. At the start of the day it was painfully slow on several test cases (several minutes each), non-deterministic, and badly failing several of its quality goals. Now it is deterministic, runs each test case in under a minute, and gets within 2.5% of the best known result on all of them. As an added bonus, the latest version has even produced the best known result for several of them, and the code is shorter and clearer than it was at the start of the day.
How does it work?
It’s pretty simple. In the end it’s ended up looking hilariously close to the algorithm I threw together in an hour from a random paper for The Right Tool (now Hammer Principle) a few years ago.
The process is basically a mix of the following techniques:
- Initial scoring. This is basically a Markov chain scoring system as before. The details are slightly different, but not really in an interesting way – the idea is still to transition to better nodes and use an approximation of the stable distribution as a score
- What I refer to in the code as “forcing connectivity”. This is today’s clever idea, and it works really well, but it probably shouldn’t, and the fact that it appears to might be a test data artifact. If you’re feeling generous it could be described as a heuristic approach, but a fairer description would perhaps be an unprincipled hack. It’s based on the observation that in sparse data sets you often end up with long runs of ties, and these almost completely defeat local searches and make it hard to improve things. The forcing pass runs through the array one element at a time, and whenever an element is unconnected to the next one it pulls forward the next element to which it is connected (it doesn’t concern itself with ordering beyond that – the pulled element might be strictly greater or less than it, it doesn’t care)
- Local sort, as before. Basically an insertion sort on the hopelessly inconsistent ordering the tournament induces. This had largely disappeared from the code – it was implicitly happening anyway, but there was no explicit local sort stage – and bringing it back in was a surprisingly big win
- Window optimisation. Basically, you can brute force the problem for small arrays. Window optimisation is brute forcing every subrange of some length. This is applied at various lengths in order to get mixing
- Single move optimisation. This is a really expensive process, but wins big enough that it’s worth it. Basically we try every possible move of a single element from one point to another and apply any that improve the score
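Taking the “forcing connectivity” description literally, one plausible reading of the hack looks like this in Python. The representation and every name here are my guesses for illustration; the post doesn’t show the real code, and “unconnected” is assumed to mean no edge in either direction.

```python
def force_connectivity(weights, order):
    """Walk the order left to right; whenever order[i] has no edge either
    way to order[i+1], pull forward the next element it IS connected to.
    No attempt is made to orient the pulled element correctly."""
    order = list(order)
    i = 0
    while i < len(order) - 1:
        a, b = order[i], order[i + 1]
        if weights[a][b] == 0 and weights[b][a] == 0:  # a tie: no edge either way
            for j in range(i + 2, len(order)):
                c = order[j]
                if weights[a][c] or weights[c][a]:
                    order.insert(i + 1, order.pop(j))  # pull c next to a
                    break
        i += 1
    return order
```

For example, with nodes 0 and 1 unconnected but an edge from 0 to 2, the pass rewrites `[0, 1, 2]` as `[0, 2, 1]`.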
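The last two techniques are precise enough to sketch directly. One useful observation for window optimisation: permuting a window leaves every element’s order relative to everything outside the window unchanged, so only pairs inside the window need rescoring. Again, this assumes the weight-matrix representation and invented names; it’s an illustration of the technique, not the solver’s code.

```python
from itertools import permutations

def local_score(weights, seq):
    """Weight of back-edges among the elements of `seq` only."""
    return sum(weights[seq[j]][seq[i]]
               for i in range(len(seq))
               for j in range(i + 1, len(seq)))

def window_optimise(weights, order, width):
    """Brute-force every length-`width` window: try all width! permutations
    and keep the best. Pairs straddling the window boundary are unaffected,
    so scoring each window locally is enough."""
    order = list(order)
    for start in range(len(order) - width + 1):
        window = order[start:start + width]
        best = min(permutations(window), key=lambda p: local_score(weights, p))
        order[start:start + width] = best
    return order

def single_move_optimise(weights, order):
    """Try moving each element to every other position; apply any move that
    improves the total score, and repeat until nothing does."""
    order = list(order)
    improved = True
    while improved:
        improved = False
        for i in range(len(order)):
            for j in range(len(order)):
                if i == j:
                    continue
                candidate = order[:]
                candidate.insert(j, candidate.pop(i))
                if local_score(weights, candidate) < local_score(weights, order):
                    order = candidate
                    improved = True
    return order
```

Note the `width!` factor in window optimisation, which is why it only makes sense at small widths, and that single-move optimisation rescores the whole array per candidate move – consistent with the post’s description of it as really expensive.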
It’s surprising how well this combination works – I’ve tried a whole bunch of cleverer things, including various random search approaches, simulated annealing, block sorting (basically: build a new tournament out of chunks of the existing one and recurse) and a variety of local search techniques, but they generally turned out to be significantly slower and/or less effective.
Update: The test suite is now failing again. This is partly because I’ve tightened the requirements (down to within 3% rather than 5% of the best known result), and partly because after some more tinkering with the code I’ve managed to improve the best known quality of several test cases. This is actually quite pleasant, as it shows that there are some nicely tunable parameters in here.