Ibuildings, a PHP development company with offices in the Netherlands, UK and Italy, have been running a series of challenges, with prizes such as iPads and tickets to the Dutch PHP Conference (DPC, of which they host). The latest completed challenge, the Elephpant challenge, was in the form of a code contest to write the fastest, shortest and least complex algorithm in PHP to solve a Travelling Salesman problem.
I’m happy to say that I was one of the winners for this task, getting 30 points (10 in each scoring method), which was the maximum points possible, for the medium category (people with 2 to 4 years experience), and winning a ticket to the DPC.
Here I will describe the two solutions I researched and completed. I only submitted the one which I thought would give me the biggest chance of winning, but each had their own specialities.

Picture under Creative Commons Attribution Non-commercial license by drewm on Flicker.
First, to understand the problem you must research TSP. There are two types of algorithms, exact and approximate, and a TSP can be asymetric (costs different depending on the direction of travel) or symetric. In the challenge I needed a exact symetric algorithm, of which Wikipedia briefly mentions several different algorithms.
I picked the two most promising algorithms, one based on dynamic programming, and one branch and bound algorithm. The former being shorter and least complex, and the latter being faster and more scalable.
Dynamic Programming
I wont go into too much detail of this one. Although it was the one I picked to submit for the competition it is in fact extremely simple, based on a single premise that a optimal tour can be calculated by finding the minimum tour length of each unvisited node added to the optimal subtour of the remaining nodes. It is recursive in nature, dividing up into smaller subtours until there is a single cost for traveling from one node to another.
Where i is the last landmark and S is the set of landmarks excluding the first and last.

This I managed to write in a total of 33 (PHP_CodeCoverage) / 38 (result) lines of executable code, a C.R.A.P index of 6 (as Ivo mentions in the comments, my result would have been in fact 4 for his calculations), and according to the Ibuildings test a time of 333 seconds. I however recorded the speed at 54 seconds on my laptop (not particularly high spec) with xdebug disabled.
Here is the solution, of which I have fully commented. Please consider this script with having a New BSD license if you wish to use a derivative of it.
Branch and bound
Wikipedia mentions branch and bound algorithms in relation to TSP as being the best algorithm short of cutting-plane algorithms. I ignored the latter due to lack of information found in my research, and the likelyhood of it being much too complex for the challenge.
Branch and bound algorithms, it states as being an improvement in speed over other algorithms such as the dynamic programming one above. It can handle a lot more landmarks on top of this, however the challenge only needed to visit a total of 12 landmarks (including the start and end nodes).
When first testing the dynamic programming algorithm, I found it to be much too slow, and wanted to try a branch and bound algorithm to compete with it. This proved challenging and much harder to research and understand. I finally came across an extremely helpful web page describing a mathematical description of an algorithm.
When writing a branch and bound algorithm, it is important to select a good function to calculate the lower bound (something that is lower cost or equal to the best solution). If a tour is found to be the best calculated so far, all branches with higher lower bounds can be discarded.
The lower bound I used was simply based on the premise that the best tour is the sum of the edges of the tour, and therefore the lower bound must be equal to or less than this.
The rest of the algorithm is to choose the branches, add constraints that better select the lower bound, and check through them, picking each edge one at a time, testing whether to include or exclude it from the list of edges that make up the lower bound. Eventually enough edges will be included to get a tour. This I leave for you to read in the article I mentioned.
Now when I had written this, there were several slow areas, such as calculating the lower bound, and checking to eliminate subtours (tours that do not go through every point, but form clusters of smaller tours). As the article didn’t go into detail on how to do this, I had to come up with an inovative fast check. I researched subtour elimination, and came across this article, which although did not give the exact answer in a way I could understand, gave me an idea.
My previous subtour checks were scanning through the list of included nodes, tracing each path to see if it went back to the start or was too short. This was slow (although twice the speed of my dynamic programming algorithm). My idea was to instead track each cluster of joined edges as they get added, and check the constraints on the ones affected by the addition, making sure they didn’t hit the required edge count for each node without completing a full tour.
This proved to be much faster, totalling a time of 20 seconds to calculate the best tour, however the end result was a large amount of code, with many functions, loops and if checks. I’d have been lucky to get 10 points with an attempt at the fastest algorithm. Anyone who scored less than me in each section (speed, lines of code, and complexity) could have scored better. So it was back to optimising the dynamic programming algorithm above for me.
The solution can be seen here, which also I will license under New BSD. Sorry that there are no comments in the code at the moment.
Speed, lines of executable code and complexity
These were the categories of scoring, which award 10 points for each.
Speed varies a lot with algorithms, and the choice could lead to 0 points if choosing a brute force search, or 10 if choosing the best algorithm. I’d recommend anyone who tries to then optimise the code should make use of xdebug’s profiler to see which bits of code are sub-optimal. Trying to optimise the code without this is just a guessing game, and you may end up wasting time changing code that cannot be optimised further.
Lines of executable code I don’t consider the most important aspect of programming, but it does add to complexity in maintaining code, and shorter code can often be easier to understand. In a challenge, however, it can easily be abused at the expense of readability.
Complexity was calculated using a similar method to Change Risk Analysis and Predictions (CRAP) index. This was new to me as I had not come across it before. It is calculated based on the code coverage of a piece of code and the number of code branches in the code (functions, ifs, loops, logical ands and ors etc). This can also be abused to a point. As you can see in my winning code, when printing out a list of the landmarks for the best tour, I chose rather than to write a foreach to loop through and print out the names of the landmarks, I wrote a single line function to splice the landmarks names onto an ordered list of identifiers.
Conclusion
In the end I submitted my first solution, which ended up winning as I had hoped. The effort gone into the second solution I don’t consider wasted even if it was not quicker than the results. It has the capability to run on a much larger set of landmarks with a good time, although not good enough to tour Sweden.
As for the challenge results, you can see that this was a difficult challenge. None of the entries with the 0-2 years work experience fit the requirements, and even in my category many people failed the test.
One thing I must consider is that in a few months I’ll no longer be in the medium category, and challenges such as this will be much more competitive. I have already completed Ibuildings next challenge, and it may be the last before I can only enter into the senior category.
I congratulate the winner for the senior category, Michiel Brandenburg, winning the iPad, who scored 26. 10 for both lines of code and complexity, and 6 for being the 3rd fastest in his category (which was faster than mine as well).
Also, thanks to Ibuildings for running this challenge. I found it both interesting and challenging. Ibuildings have already started a new challenge you can enter into, based on Test Driven Development.
For any of you reading who are going to the Dutch PHP Conference, I hope to see you there
Now all I need is to somehow win my own stuffed elephpant to complete my own journey from London to the DPC.
Congrats, and thanks for the extensive write-up. It’s very well possible that performance is better on your laptop; since I needed to hog a server for days I had to make do with old hardware lying around. The benefit of a slower machine is that it makes the differences more visible.
I think the ‘nearest neighbour algorithm’ was used the most by the faster implementations, some of them had created optimizations of those. Also, since point 1 and 12 are fixed, you don’t need to take these into account in the whole algorithm.
Since you asked on twitter I did the calculation of the complexity for your entry. It’s not the exact documented CRAP index from PHPUnit since the implementations of those had some caveats, but I’ve used very similar techniques. You would’ve had a complexity of 4 with my calculations, which is indeed better than the complexities the others had.
Hi Ivo, thanks for the response.
I see now, I avoided the nearest neighbour algorithm due to it being an approximate algorithm. Where it may have passed the test for the supplied csv, I had considered that the secret test may fail to give the optimal solution.
Thank you for doing the calculation
Congrats, with winning the DPC10 tickets. Gonna take a look at you solution
I plugged in a few changes to my dynamic programming script to make it a form of branch and bound, based on Erik Snoeijs’s algorithm:
http://www.underdevelopment.eu/2010/06/30/travelingelephpant/
It was much faster and got the correct results for both csvs.
The results would have been:
Speed: 333 * 3.345 / 54 = 20.6 seconds – very subjective calculation I know
LOC: 36
Complexity: 5
My new script can be seen here:
http://andytson.com/files/2010/07/contest-bab2.php
Congrats on your win, I went straight on the B&B approach with custom optimizations, that was real fun, always loved maths, I’ll have a look at your code