May 17, 2010

The Travelling Elephpant challenge, my two solutions

Ibuildings, a PHP development company with offices in the Netherlands, UK and Italy, have been running a series of challenges, with prizes such as iPads and tickets to the Dutch PHP Conference (DPC, of which they host). The latest completed challenge, the Elephpant challenge, was in the form of a code contest to write the fastest, shortest and least complex algorithm in PHP to solve a Travelling Salesman problem.

I’m happy to say that I was one of the winners for this task, getting 30 points (10 in each scoring method), which was the maximum points possible, for the medium category (people with 2 to 4 years experience), and winning a ticket to the DPC.

Here I will describe the two solutions I researched and completed. I only submitted the one which I thought would give me the biggest chance of winning, but each had their own specialities.


Picture under Creative Commons Attribution Non-commercial license by drewm on Flicker.

First, to understand the problem you must research TSP. There are two types of algorithms, exact and approximate, and a TSP can be asymetric (costs different depending on the direction of travel) or symetric. In the challenge I needed a exact symetric algorithm, of which Wikipedia briefly mentions several different algorithms.

I picked the two most promising algorithms, one based on dynamic programming, and one branch and bound algorithm. The former being shorter and least complex, and the latter being faster and more scalable.

Dynamic Programming

I wont go into too much detail of this one. Although it was the one I picked to submit for the competition it is in fact extremely simple, based on a single premise that a optimal tour can be calculated by finding the minimum tour length of each unvisited node added to the optimal subtour of the remaining nodes. It is recursive in nature, dividing up into smaller subtours until there is a single cost for traveling from one node to another.

Where i is the last landmark and S is the set of landmarks excluding the first and last.

This I managed to write in a total of 33 (PHP_CodeCoverage) / 38 (result) lines of executable code, a C.R.A.P index of 6 (as Ivo mentions in the comments, my result would have been in fact 4 for his calculations), and according to the Ibuildings test a time of 333 seconds. I however recorded the speed at 54 seconds on my laptop (not particularly high spec) with xdebug disabled.

Here is the solution, of which I have fully commented. Please consider this script with having a New BSD license if you wish to use a derivative of it.

Branch and bound

Wikipedia mentions branch and bound algorithms in relation to TSP as being the best algorithm short of cutting-plane algorithms. I ignored the latter due to lack of information found in my research, and the likelyhood of it being much too complex for the challenge.

Branch and bound algorithms, it states as being an improvement in speed over other algorithms such as the dynamic programming one above. It can handle a lot more landmarks on top of this, however the challenge only needed to visit a total of 12 landmarks (including the start and end nodes).

When first testing the dynamic programming algorithm, I found it to be much too slow, and wanted to try a branch and bound algorithm to compete with it. This proved challenging and much harder to research and understand. I finally came across an extremely helpful web page describing a mathematical description of an algorithm.

When writing a branch and bound algorithm, it is important to select a good function to calculate the lower bound (something that is lower cost or equal to the best solution). If a tour is found to be the best calculated so far, all branches with higher lower bounds can be discarded.

The lower bound I used was simply based on the premise that the best tour is the sum of the edges of the tour, and therefore the lower bound must be equal to or less than this.

The rest of the algorithm is to choose the branches, add constraints that better select the lower bound, and check through them, picking each edge one at a time, testing whether to include or exclude it from the list of edges that make up the lower bound. Eventually enough edges will be included to get a tour. This I leave for you to read in the article I mentioned.

Now when I had written this, there were several slow areas, such as calculating the lower bound, and checking to eliminate subtours (tours that do not go through every point, but form clusters of smaller tours). As the article didn’t go into detail on how to do this, I had to come up with an inovative fast check. I researched subtour elimination, and came across this article, which although did not give the exact answer in a way I could understand, gave me an idea.

My previous subtour checks were scanning through the list of included nodes, tracing each path to see if it went back to the start or was too short. This was slow (although twice the speed of my dynamic programming algorithm). My idea was to instead track each cluster of joined edges as they get added, and check the constraints on the ones affected by the addition, making sure they didn’t hit the required edge count for each node without completing a full tour.

This proved to be much faster, totalling a time of 20 seconds to calculate the best tour, however the end result was a large amount of code, with many functions, loops and if checks. I’d have been lucky to get 10 points with an attempt at the fastest algorithm. Anyone who scored less than me in each section (speed, lines of code, and complexity) could have scored better. So it was back to optimising the dynamic programming algorithm above for me.

The solution can be seen here, which also I will license under New BSD. Sorry that there are no comments in the code at the moment.

Speed, lines of executable code and complexity

These were the categories of scoring, which award 10 points for each.

Speed varies a lot with algorithms, and the choice could lead to 0 points if choosing a brute force search, or 10 if choosing the best algorithm. I’d recommend anyone who tries to then optimise the code should make use of xdebug’s profiler to see which bits of code are sub-optimal. Trying to optimise the code without this is just a guessing game, and you may end up wasting time changing code that cannot be optimised further.

Lines of executable code I don’t consider the most important aspect of programming, but it does add to complexity in maintaining code, and shorter code can often be easier to understand. In a challenge, however, it can easily be abused at the expense of readability.

Complexity was calculated using a similar method to Change Risk Analysis and Predictions (CRAP) index. This was new to me as I had not come across it before. It is calculated based on the code coverage of a piece of code and the number of code branches in the code (functions, ifs, loops, logical ands and ors etc). This can also be abused to a point. As you can see in my winning code, when printing out a list of the landmarks for the best tour, I chose rather than to write a foreach to loop through and print out the names of the landmarks, I wrote a single line function to splice the landmarks names onto an ordered list of identifiers.

Conclusion

In the end I submitted my first solution, which ended up winning as I had hoped. The effort gone into the second solution I don’t consider wasted even if it was not quicker than the results. It has the capability to run on a much larger set of landmarks with a good time, although not good enough to tour Sweden.

As for the challenge results, you can see that this was a difficult challenge. None of the entries with the 0-2 years work experience fit the requirements, and even in my category many people failed the test.

One thing I must consider is that in a few months I’ll no longer be in the medium category, and challenges such as this will be much more competitive. I have already completed Ibuildings next challenge, and it may be the last before I can only enter into the senior category.

I congratulate the winner for the senior category, Michiel Brandenburg, winning the iPad, who scored 26. 10 for both lines of code and complexity, and 6 for being the 3rd fastest in his category (which was faster than mine as well).

Also, thanks to Ibuildings for running this challenge. I found it both interesting and challenging. Ibuildings have already started a new challenge you can enter into, based on Test Driven Development.

For any of you reading who are going to the Dutch PHP Conference, I hope to see you there :)

Now all I need is to somehow win my own stuffed elephpant to complete my own journey from London to the DPC.

Leave a Reply