
Bachelor Beta-Gamma

Major Artificial Intelligence

An Analysis of Hardness for Euclidean Traveling Salesman Instances with Exact and Heuristic Algorithms

Heijstek, Martijn

10800441

University of Amsterdam

Institute for Interdisciplinary Studies

Faculty of Science

Science Park 904

1098 XH Amsterdam

May 22, 2018

Supervisor: dhr. drs. D. van den Berg, Faculty of Science IVI

Credits: 18 ec



Abstract

This thesis is a partial replication of the methodology of Fischer et al. (2005) [12]. Fischer et al. introduced two operators that apply a controlled amount of perturbation to grid instances of the Traveling Salesman Problem. The effect of the spatial distribution of cities on the performance of Branch and Bound, a Genetic Algorithm, and Simulated Annealing is measured. The results indicate that the spatial distribution of cities is indeed an important feature of TSP instances that might contribute to the hardness of the Traveling Salesman Problem.

Keywords: (Euclidean) Traveling Salesman Problem, NP-hardness, hardness of instances, TSP order parameter, Branch and Bound, Genetic Algorithm, Simulated Annealing.

Acknowledgements

Foremost I would like to thank my mentor dhr. drs. D. van den Berg for his guidance and help during this research. My special regards also go to Joeri, Markus and Richard for providing additional feedback.


CHAPTER 1

Introduction

The Travelling Salesman Problem (TSP) is a well-known constrained optimisation problem, where a set of cities needs to be connected by a route of minimal length. A route between cities is referred to as a 'tour', and a solution to the Traveling Salesman Problem is a tour that contains each city exactly once and is cyclic, such that the tour is closed. In the original description of the Traveling Salesman Problem, Euclidean distances are used on a two-dimensional plane [26]; however, the general problem definition of the TSP can be applied to any (a)symmetric distance matrix [13]. An important aspect of the TSP is the complexity of finding an optimal tour. In the worst case all possible tours must be examined to guarantee that the shortest tour is found. Therefore the TSP is classified as an NP-hard problem (assuming P ≠ NP). This implies the TSP includes instances with an infeasible computation time when solved with polynomial algorithms [10]. Although the Euclidean Traveling Salesman Problem can be approximated by a polynomial time scheme [4], there remain TSP instances that are intrinsically hard to solve with polynomial algorithms.

However, as Cheeseman et al. [8] suggest, many NP problems have one or more 'order parameters' that give an indication of the complexity of an instance. According to Cheeseman et al. there might be a specific region of the state space where many, if not all, hard problems occur. This region is centred around the critical value of the order parameter, which makes it possible to predict how hard an instance will be to solve [8]. Cheeseman et al. conclude that all NP-complete problems have at least one such parameter, but pose as an open question whether this property extends to the class of NP-hard problems.

One proposed attribute of the TSP that might contribute to the hardness of instances is the arrangement of cities in the Traveling Salesman Problem. Fischer et al. [12] analyse structural features of the Euclidean TSP by measuring the influence of the arrangement of cities on the run-time of algorithms. Fischer et al. state: "The most important property of (metric) TSP instances besides their size is the distribution of cities in the given space.", which suggests that this property could be an order parameter in the terminology of Cheeseman et al. To measure the impact of the city distribution on the performance of algorithms, Fischer et al. used perfect grid instances with an optimal chessboard-like pattern (figure 3.1), which can be solved in polynomial time [12]. When randomness is introduced to the instances, the variability in run-times increases, and Fischer et al. tried to map this variability against the structure of the instance. To distort the structure of TSP instances, Fischer et al. defined two operators: the shake operator and the reduction operator. These two methods apply a controlled amount of perturbation to the perfect grid, after which the instances were solved with both a heuristic algorithm and an exact algorithm. Fischer et al. measured the CPU time of Applegate's exact Branch and Cut algorithm [3] and compared it to the heuristic Lin-Kernighan algorithm from Helsgaun [14].

Although the findings of Fischer et al. suggest an effect of the structural perturbation on the run-time of the algorithms, it is worth investigating how these results hold under different commonly used algorithms. In this research the methodology of Fischer et al. is partially replicated to evaluate the effect of the distribution of cities on the performance of heuristic and exact algorithms when applied to the TSP. Although Fischer et al. used both grid and fractal variations, the scope of this research is limited to the grid instances with Euclidean distances.


After introducing perturbation, the instances are solved with the exact Branch and Bound, the heuristic Genetic Algorithm, and the heuristic Simulated Annealing.

The effect of the two operators defined by Fischer et al. on the performance of the three algorithms is measured, and through an empirical evaluation an attempt is made to distinguish features that determine the hardness of an instance.

The work of Fischer et al. and Cheeseman et al. is part of a body of research on the analysis of complexity for problems in NP. Related research in this field focuses on algorithm run-time prediction by either machine learning or theoretical proof [19], [15]. The structural component used by Fischer et al. is one of many features of the Traveling Salesman Problem that are used to predict the performance of an algorithm. Besides spatial attributes there are many other node distribution features, such as clustering, nearest neighbour information and cost matrix attributes, that are used to analyse the TSP [15]. To evaluate these features, an Empirical Performance Model can be used to compare the impact on the algorithms [15].

CHAPTER 2

Theory

2.1

NP-hardness and the (Euclidean) TSP

NP-hardness is currently one of the greatest challenges in computer science. The question of P versus NP is regarded as one of the Millennium Prize Problems posed by the Clay Mathematics Institute¹ and has yet to be solved. Decision problems that have a non-deterministic polynomial time complexity belong to the class of NP problems and pose great difficulty for algorithmic performance. Especially the class of NP-hard problems, which are 'intrinsically harder than those that can be solved by a non-deterministic Turing machine in polynomial time' [23], tends to cause much variability in the performance of algorithms. NP-hard problems may require an exhaustive search of the state space before an optimal solution for a problem instance can be guaranteed. According to Reinelt [24] the computational complexity of the TSP can be classified as both NP-easy and NP-hard, which puts the Traveling Salesman Problem in the class of NP-equivalent decision problems.

The Euclidean TSP is a subset of the metric Traveling Salesman Problem where all distances are Euclidean and symmetric [13]. In the unconstrained version of the Euclidean TSP there exists a potential edge between every pair of cities on the grid. Therefore, in an instance of size n each city has (n − 1) possible edges, and the worst case requires examining (n − 1)! tours. However, this exhaustive state space search is an absolute worst case, and for many average cases of the Euclidean TSP the complexity is less dramatic [8]. There are even cases of the Euclidean Traveling Salesman Problem, for instance the constant TSP and the Gilmore-Gomory TSP, that are solvable in polynomial time [13]. Although these instances constitute a relatively small subclass, all Euclidean TSP instances can be approximated in polynomial time [4]. Arora [4] found an approximation scheme that produces a (1 + 1/c)-optimal tour (where c is larger than one) with time complexity O(n (log n)^O(c)).

However, this approximation is limited to a 1/c bound above the optimum, and for the NP-hard version of the TSP this scheme is not applicable.

¹ CMI awards one million dollars for the solution of the P versus NP problem. http://www.claymath.org/


2.2

Heuristic and exact algorithms for the TSP

Many exact algorithms exist for solving the Traveling Salesman Problem, although some seem more effective than others. Especially on large instances the state space of the TSP is enormous, since the state space grows exponentially with the instance size [5], and even the most state-of-the-art algorithms can exhibit extreme run-time variations across instances. In addition, there can be a dramatic difference in the run-time of two algorithms on the same instance [15]; therefore the choice of algorithm is essential to the solvability of the TSP. However, there is little theoretical knowledge of what causes run-time variation on TSP instances, and most research of the past decade is based on empirical research [15]. An important algorithmic distinction is the difference between heuristic and exact methods [2]. Both types of algorithms are fundamentally different in their approach to solving the Traveling Salesman Problem.

The exact algorithms are deterministic and are guaranteed to find the optimal solution; however, finding an optimal solution can require a time consumption that increases faster than polynomially with the instance size [2]. A heuristic algorithm produces a valid tour at each point during the optimisation process and can be used as an anytime algorithm [6]. In general heuristic algorithms take less computation time than exact algorithms, at the expense of guaranteed optimality [2]. As mentioned in the introduction, Fischer et al. used the heuristic Lin-Kernighan Helsgaun algorithm and the exact Branch and Cut algorithm from Applegate. At the time of implementation (2005) these algorithms were considered high performance [12], although in the past decade there has been a rise in swarm intelligence algorithms and meta-heuristics [25]. These heuristic algorithms are effective on large TSP instances, although their performance on small instances is comparable to other Evolutionary Algorithms and Branch and Bound [16].

CHAPTER 3

Methods

This section elaborates on the implemented methodology. In order to conduct the experiments, first the operators and algorithms used are described, followed by the experimental set up and evaluation metrics. The project is implemented in IntelliJ IDEA 2017.1 (Java); all experiments are run on a PC with Windows 10 and an Intel(R) Core(TM) i5-6400 CPU @ 2.70 GHz.

3.1

Representation of grid instances and tours of the TSP

Fischer et al. conduct experiments with perfectly arranged instances, such as in figure 3.1. The cities are ordered in rows and columns, and the distance between two neighbouring cities on the horizontal or vertical axis is a constant. A perfect grid of m rows, n columns, and a fixed neighbour distance d has a deterministic optimal tour length. If either m or n is even, the tour length is given by m ∗ n ∗ d. If both m and n are odd, the length of the optimal tour is (m ∗ n − 1) ∗ d + √2 ∗ d. Due to this property only instances with at least one even side are used for the experiments; in addition, the instances should be as square as possible. This restriction is added to keep the ratio between length and width equal between various instance sizes, although this restriction is not mentioned in the methodology of Fischer et al. By restricting the dimensions of the perfect grid, the number of suitable instance sizes is reduced, as is shown in figure 6.1.
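As a quick illustration of this closed-form length, a minimal Java sketch is given below; the helper name optimalTourLength is hypothetical and not part of the thesis code.

public class GridTourLength {
    // Closed-form optimal tour length of an m-by-n perfect grid with
    // neighbour distance d, following the two cases described above.
    static double optimalTourLength(int m, int n, double d) {
        if (m % 2 == 0 || n % 2 == 0) {
            return m * n * d;                      // at least one even side
        }
        return (m * n - 1) * d + Math.sqrt(2) * d; // both m and n odd
    }

    public static void main(String[] args) {
        System.out.println(optimalTourLength(5, 4, 1.0)); // 20.0, cf. figure 6.1
    }
}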


Figure 3.1: Left: A perfect grid of size 12. Right: An example of an optimal tour for the TSP

3.2

Perturbation operators

The predictive parameter defined by Fischer et al. is based on the structure of a grid instance. The reduction and shake operators are intended to apply a controlled perturbation to the (spatial) distribution of cities. The reduction operator (algorithm 1) uses the original grid and a given amount of preservation to produce a perturbed instance. The original grid is distorted, where the preservation indicates the percentage of cities preserved. An example is shown in figure 3.2.

Algorithm 1 Reduction operator

procedure REDUCE(perfectGrid, preservation)
    remove ← size(perfectGrid) − (size(perfectGrid) ∗ preservation)/100
    for i = 0 to remove do
        removeRandomCity(perfectGrid)
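A minimal Java sketch of this operator follows, under the assumption that an instance is a list of integer city coordinates; all class, method and variable names are illustrative and not taken from the thesis code.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class ReductionOperator {
    // Build an m-by-n perfect grid with neighbour distance d (integer coordinates).
    static List<int[]> perfectGrid(int m, int n, int d) {
        List<int[]> grid = new ArrayList<>();
        for (int row = 0; row < m; row++)
            for (int col = 0; col < n; col++)
                grid.add(new int[] { col * d, row * d });
        return grid;
    }

    // Remove cities at random until `preservation` percent of the grid remains.
    static List<int[]> reduce(List<int[]> grid, double preservation, Random rng) {
        List<int[]> instance = new ArrayList<>(grid);
        int remove = (int) (instance.size() - instance.size() * preservation / 100.0);
        for (int i = 0; i < remove; i++)
            instance.remove(rng.nextInt(instance.size())); // removeRandomCity
        return instance;
    }

    public static void main(String[] args) {
        List<int[]> instance = reduce(perfectGrid(4, 5, 10), 75.0, new Random(42));
        System.out.println(instance.size() + " cities preserved"); // 15 of 20
    }
}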

The shake operator (algorithm 2) applies a controlled amount of offset to each individual city. First the standard deviation of a Gaussian function with zero mean is determined by the original neighbour distance and the gradation of the shake operator. Each sample from the distribution corresponds to the offset for a city, although the methodology of Fischer et al. does not specify how this offset is applied to the cities. It is assumed that the offset is used as the radius of a circle centred on the city, and that the new position of the city is a random (integer) coordinate on the circumference of that circle. To accomplish this effect the offset is applied at a random angle, such that the total offset is decomposed into x- and y-components and the city is moved to the nearest integer position of the new coordinate. The resulting instance has the same number of cities as the original grid; however, the distances between the cities are altered. An example is shown in figure 3.2.

Algorithm 2 Shake operator

procedure SHAKE(perfectGrid, perfectNeighbourDistance, shakeLevel)
    standardDeviation ← perfectNeighbourDistance ∗ shakeLevel ∗ 0.01
    for all city in perfectGrid do
        // sample from a Gaussian distribution with mean 0
        offset ← N(0, standardDeviation)
        randomAngle ← randomInteger(360)
        newXCoordinate ← xCoord(city) + sin(randomAngle) ∗ offset
        newYCoordinate ← yCoord(city) + cos(randomAngle) ∗ offset
        // move the city to the nearest integer coordinate that corresponds to the new x, y
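The interpretation described above (offset as radius, random angle, rounding to the nearest integer coordinate) can be sketched in Java as follows; the in-place update of int[] cities mirrors the sketch after algorithm 1 and is an assumption, not the thesis implementation.

import java.util.List;
import java.util.Random;

public class ShakeOperator {
    // Displace each city by a Gaussian offset in a uniformly random direction,
    // then round to the nearest integer coordinate (algorithm 2).
    static void shake(List<int[]> grid, double neighbourDistance, double shakeLevel, Random rng) {
        double sd = neighbourDistance * shakeLevel * 0.01;
        for (int[] city : grid) {
            double offset = rng.nextGaussian() * sd;        // sample from N(0, sd)
            double angle = Math.toRadians(rng.nextInt(360));
            city[0] = (int) Math.round(city[0] + Math.sin(angle) * offset);
            city[1] = (int) Math.round(city[1] + Math.cos(angle) * offset);
        }
    }
}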


(a) Reduction operator with 95% preservation

(b) Reduction operator with 75% preservation

(c) Reduction operator with 25% preservation

(d) Shake operator with 5% shake (e) Shake operator with 25% shake

(f) Shake operator with 150% shake

Figure 3.2: Example instances for a perfect grid of size 20, perturbed with the shake/reduce operator

3.3

Algorithms

After applying perturbation with the shake/reduce operator, the instances are solved with three algorithms. The implementations of Branch and Bound, the Genetic Algorithm, and Simulated Annealing are elaborated in this section.

3.3.1

Branch and Bound

Branch and Bound is an exact algorithm that works best on small TSP instances [16]. The Branch and Bound algorithm partitions the set of all possible tours into smaller and smaller subsets in a tree-like structure. Each node in the tree represents a set of tours, and for each node the Branch and Bound algorithm calculates a lower bound on the tour length of the set. For a Euclidean TSP the Branch and Bound tree is generated with the following principle:

"At the first level in the tree a node representing the partial tour [1] is constructed. At the next level, nodes representing the partial tours [1, 2], [1, 3], [1, 4], . . ., [1, n] are generated. At the third level, nodes representing the partial tours [1, 2, 3], [1, 2, 4], [1, 2, 5], . . ., [1, 2, n], [1, 3, 2], [1, 3, 4], . . ., [1, 3, n], . . ., [1, n, 2], [1, n, 3], . . ., [1, n, n − 1] are generated. This pattern continues until at the lowest level, nodes representing full tours are obtained." Wiener (2003) [27]

For each of the partial tours a lower bound is determined, which is used to prune away parts of the tree such that the search space is limited. This operation relies on three basic components [9]: the bounding function, the selection strategy, and the branching function.

The bounding function is a component that estimates the lower bound for the tour length of a given set. Finding the exact lower bound for a given set is an NP-hard problem; however, an approximation can be made in polynomial time [9]. This function should be as tight as possible, but must never overestimate the actual minimal tour length. A weak bounding function [9] results in redundant branching, which makes the Branch and Bound algorithm less efficient. Common methods for this operation are based on row/column reduction following the methods of [18], or on minimum spanning trees [14]. For this implementation a computationally inexpensive bounding operator (see algorithm 3) is implemented, based on row reduction. The idea is based on the principle that "In any tour the length of an edge taken when leaving a city must be at least as large as the length of the shortest edge emanating from that city" [27].
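A hedged Java sketch of such a row-reduction bound is shown below. The full cost matrix representation is an assumption for illustration; in the actual algorithm the rows would be restricted to the cities that still have to be departed in the partial tour (see algorithm 3).

public class RowReductionBound {
    // Lower-bound a tour by summing, for every city, its cheapest outgoing
    // edge: no tour can leave a city more cheaply than that.
    static double lowerBound(double[][] costMatrix) {
        double bound = 0.0;
        for (double[] row : costMatrix) {
            double rowMinimum = Double.POSITIVE_INFINITY;
            for (double cost : row)
                if (cost > 0 && cost < rowMinimum)  // skip the zero-cost self-loop
                    rowMinimum = cost;
            bound += rowMinimum;
        }
        return bound;
    }
}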


After the lower bound is computed for the nodes of the tree, the next node to expand has to be chosen. This procedure is known as the selection strategy, and various tree traversal methods, such as Depth First Search, Breadth First Search and Best First Search, can be implemented [9]. Best First Search selects, among the generated nodes, the set of tours with the current minimum estimated lower bound. For small instance sizes Best First Search can be used in combination with eager node evaluation to prove optimality, where only the critical nodes have to be assessed [9]. This method is prone to memory problems when applied to large instances [9]; however, for the scope of this research Best First Search suffices as the selection strategy (see algorithm 4).

The selection strategy provides the node with the lowest estimated lower bound, and this set of tours is most likely to contain the optimal tour. To separate the optimal tour from the other tours in the set, a branch operator subdivides the search space. This splitting is achieved by polytomic branching [9] of the current optimal set. At each level of the tree the number of cities in the partial tour is increased by one, which means a tour of n cities is found at level n − 1. The branch operator (see algorithm 5) recursively branches the state space until the tree depth equals n − 1; then the tour is completed and the tour length is computed.

Further down the tree the lower bound estimate increases, as the size of the partial tour converges to the size of a completed tour. An overview of Branch and Bound is given in algorithm 6.
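Putting the three components together, a compact best-first Branch and Bound loop could look like the sketch below. This is a simplified illustration under assumed representations (a static distance matrix, tours as lists of city indices), not the thesis code; the bound adds the cheapest outgoing edge of every city that still has to be departed, in the spirit of the row-reduction principle above.

import java.util.ArrayList;
import java.util.List;
import java.util.PriorityQueue;

public class BranchAndBoundSketch {
    static double[][] dist;  // symmetric cost matrix, filled in by the caller
    static double[] minOut;  // cheapest edge leaving each city

    static class Node {
        List<Integer> tour; double cost; double bound;
        Node(List<Integer> tour, double cost) {
            this.tour = tour; this.cost = cost;
            // every city not yet departed must still be left once, at a cost of
            // at least its cheapest outgoing edge (row-reduction idea)
            boolean[] visited = new boolean[dist.length];
            for (int c : tour) visited[c] = true;
            double b = cost + minOut[tour.get(tour.size() - 1)];
            for (int c = 0; c < dist.length; c++) if (!visited[c]) b += minOut[c];
            this.bound = b;
        }
    }

    static double solve() {
        int n = dist.length;
        minOut = new double[n];
        for (int i = 0; i < n; i++) {
            minOut[i] = Double.POSITIVE_INFINITY;
            for (int j = 0; j < n; j++)
                if (j != i) minOut[i] = Math.min(minOut[i], dist[i][j]);
        }
        PriorityQueue<Node> queue =
            new PriorityQueue<>((a, b) -> Double.compare(a.bound, b.bound));
        queue.add(new Node(new ArrayList<>(List.of(0)), 0.0));
        double best = Double.POSITIVE_INFINITY;
        while (!queue.isEmpty()) {
            Node node = queue.poll();
            if (node.bound >= best) continue;       // prune: cannot beat the incumbent
            int last = node.tour.get(node.tour.size() - 1);
            if (node.tour.size() == n) {            // close the cycle back to city 0
                best = Math.min(best, node.cost + dist[last][0]);
                continue;
            }
            for (int city = 1; city < n; city++) {  // branch on every unvisited city
                if (node.tour.contains(city)) continue;
                List<Integer> ext = new ArrayList<>(node.tour);
                ext.add(city);
                queue.add(new Node(ext, node.cost + dist[last][city]));
            }
        }
        return best;
    }

    public static void main(String[] args) {
        dist = new double[][] { {0, 1, 2, 1}, {1, 0, 1, 2}, {2, 1, 0, 1}, {1, 2, 1, 0} };
        System.out.println(solve()); // 4.0: the cycle 0-1-2-3-0
    }
}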

Operator  Implementation
Branch    Polytomic
Select    Best First Search
Bound     Row reduction

Figure 3.3: Operators used in the Branch and Bound algorithm

3.3.2

Genetic Algorithm

The Genetic Algorithm is an evolutionary algorithm, where the driving function is the 'fitness' of a given object. The aim of the Genetic Algorithm is to maximise the fitness by using a method similar to natural selection [11]. For the Traveling Salesman Problem each individual tour has a fitness, which is inversely related to the tour length. A population is a set of tours for a specific TSP instance, and through selection, recombination, and mutation the fitness of the tours converges to the optimum tour length [20], [11]. The GA has proven to be an efficient heuristic algorithm for the Traveling Salesman Problem, which can solve relatively large instances (150 cities) up to optimality [7]. This specific implementation is applied to small instances of 20 to 30 cities in various web tutorials¹.

The first step in the Genetic Algorithm is the initialisation of a number of random tours for a given grid instance; this set of tours is known as the initial population. The algorithm enters a bounded loop where, in each iteration, the fitness of the population is improved and the fittest tour is saved. During each iteration three operators are applied, divided over two processes: selection of the individuals for reproduction, and manipulation of the selected individuals by mutation and crossover [20]. The first process consists entirely of the parent selection operator, and the second process relies on the crossover operator and the mutation operator. The implementation of the three operators is based on the descriptions of [20], [2] and [7].

The parent selection operator (algorithm 8) can be instantiated by multiple selection mechanisms, such as tournament selection and proportional- or rank-based roulette wheel selection [20]. For small instances tournament selection has been shown to outperform the roulette wheel mechanisms in solution quality at low computation times [20]. Tournament selection is a simple yet efficient method for parent selection [20]. The mechanism randomly selects n tours from the population; this subset is known as the tournament. The fitness of each individual in the tournament is determined, and the fittest individual is returned as a parent tour for reproduction.
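A minimal Java sketch of tournament selection is given below; the tour representation and the tourLength callback are illustrative assumptions rather than the thesis code.

import java.util.List;
import java.util.Random;
import java.util.function.ToDoubleFunction;

public class TournamentSelection {
    // Draw `tournamentSize` random tours and return the fittest one;
    // fitness is taken as 1 / tourLength, so the shortest tour wins.
    static List<Integer> selectParent(List<List<Integer>> population, int tournamentSize,
                                      ToDoubleFunction<List<Integer>> tourLength, Random rng) {
        List<Integer> fittestTour = null;
        double fittest = -1.0;
        for (int i = 0; i < tournamentSize; i++) {
            List<Integer> tour = population.get(rng.nextInt(population.size()));
            double fitness = 1.0 / tourLength.applyAsDouble(tour);
            if (fitness > fittest) { fittest = fitness; fittestTour = tour; }
        }
        return fittestTour;
    }
}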

¹ Example GA in Java: http://www.theprojectspot.com/tutorial-post/applying-a-genetic-algorithm-to-the-travelling-salesman-problem/


The process of manipulating the selected individuals commences with the crossover operator (algorithm 9). This operator recombines two tours into new tours without creating an invalid solution. One common method is a crossover operator that relies on a variation of the partially mapped crossover (PMX), defined by [17]. In order to create a child tour, the operator copies a segment of random length from the first parent tour and fills in the remaining cities in the order in which they appear in the second parent tour, making minimal changes to achieve a valid tour [17]. One set of parents can create multiple children in the next generation.
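The recombination step can be sketched as follows, under the same assumed tour representation. Note that this simplified variant fills the remainder in the order of the second parent, which is one reading of the PMX variation described above rather than a strict PMX implementation.

import java.util.ArrayList;
import java.util.List;
import java.util.Random;

public class CrossoverOperator {
    // Copy a random segment of parent1 into the child, then append every city
    // that is still missing, in the order it appears in parent2.
    static List<Integer> crossover(List<Integer> parent1, List<Integer> parent2, Random rng) {
        int i = rng.nextInt(parent1.size());
        int j = rng.nextInt(parent1.size());
        int start = Math.min(i, j), end = Math.max(i, j);
        List<Integer> child = new ArrayList<>(parent1.subList(start, end));
        for (int city : parent2)
            if (!child.contains(city)) child.add(city);
        return child; // always a valid permutation of all cities
    }
}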

The final operator is intended to add randomness to the new population. Without a small amount of random mutation the algorithm could prematurely converge to a local optimum. A percentage of the population is mutated by applying a random swap (algorithm 11) in the sequence of the tour, as a form of 'exchange mutation' [17]. A swap changes the order in which cities are visited, but the resulting tour remains valid (see algorithm 10).
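A sketch of the exchange mutation in the same style; mutationRate is applied per tour, matching algorithm 10, and all names are illustrative.

import java.util.Collections;
import java.util.List;
import java.util.Random;

public class ExchangeMutation {
    // With probability `mutationRate`, swap two distinct positions in the tour;
    // the result is always a valid permutation of the same cities.
    static void mutate(List<Integer> tour, double mutationRate, Random rng) {
        if (rng.nextDouble() >= mutationRate) return;
        int a = rng.nextInt(tour.size());
        int b = rng.nextInt(tour.size());
        while (b == a) b = rng.nextInt(tour.size());
        Collections.swap(tour, a, b);
    }
}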

The combination of these three operators evolves the initial population into a new population with a higher average fitness. Due to the swap operator some individuals become less fit, but on average the fitness increases, leading to shorter tour lengths. After a specified number of iterations the algorithm is stopped and the current best tour is returned.

Operator          Implementation
Parent selection  Tournament selection
Crossover         Partially Mapped Crossover variation [17]
Mutation          Exchange mutation

Variable         Value
Iterations       100
Population size  50
Tournament size  5
Mutation rate    0.015

Figure 3.4: Operators and variables used in the Genetic Algorithm

3.3.3

Simulated Annealing

Simulated Annealing is a probabilistic method that uses a natural analogy, in a similar way as natural selection does for the Genetic Algorithm. Simulated Annealing is based on techniques for the controlled solidification of metal, where the final product anneals according to a cooling scheme. The algorithm uses a hill-climber to approximate the optimal solution, which additionally accepts negative improvements of the tour length to avoid local optima. During the annealing process the chance of accepting a negative improvement is gradually decreased, simulating the fixation of a cooling solid. By accepting negative changes the algorithm can circumnavigate local optima and sample from a large area of the state space.

The algorithm is run iteratively and the best found solution is returned after completion. In each iteration Simulated Annealing goes through one cooling-down procedure. In this procedure the algorithm initialises with a random tour and tries to optimise the tour by applying a hill-climber. A variable called the temperature decreases at each step of the hill-climber, as specified by a cooling function. This function determines the rate of the cooling scheme and can be linear, quadratic, or logarithmic [1], [21]. For this implementation a linear cooling scheme is chosen (see figure 3.5).

If the hill-climber finds a dis-improvement of the current optimum, the chance of accepting the new tour depends on the current temperature. The acceptance function (algorithm 13) [1] determines whether a new tour is accepted by comparing its tour length with the current optimal tour length. An improvement is accepted unconditionally; for a dis-improvement the chance of acceptance is relatively large at a high temperature, whilst at low temperatures the chance of acceptance is almost zero. The hill-climber is based on a random swapping mechanism [1], which uses the same swap operator defined in algorithm 11.
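This acceptance rule is a Metropolis-style test and can be sketched in Java as follows, consistent with algorithm 13; parameter names are illustrative.

import java.util.Random;

public class AcceptanceFunction {
    // Improvements are always accepted; a worse tour is accepted with probability
    // exp((currentLength - newLength) / temperature), which is close to 1 at high
    // temperature and near 0 once the system has cooled down.
    static boolean accept(double newLength, double currentLength, double temperature, Random rng) {
        if (newLength <= currentLength) return true;
        double acceptanceBound = Math.exp((currentLength - newLength) / temperature);
        return rng.nextDouble() < acceptanceBound;
    }
}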


Operator          Implementation
Cooling scheme    Linear
Cooling function  temperature *= (1 − CoolingRate)

Variable             Value
Iterations           100
CoolingRate          0.0003
Initial temperature  10000

Figure 3.5: Operators and variables used in Simulated Annealing

3.4

Settings

The algorithms from the previous section are instantiated with fixed algorithmic parameters (see figures 3.4 and 3.5) that are essential to the performance on the grid instances. The parameters are chosen according to similar experiments on TSP instances of twenty to thirty cities [22]; no additional parameter tuning is performed for the algorithms.

In addition to the algorithmic variables, there are settings which determine the experimental set up. The experimental set up is partially based on the methodology of Fischer et al., where the shake operator and the reduction operator are applied in different gradations. Fischer et al. used {25, 50, 75, 85, 90, 95, 98, 99} percent of preservation for the reduction operator and {5, 10, 25, 50, 75, 100, 150, 200} percent of perturbation applied by the shake operator (see figure 3.2 for examples). Per operator, 100 instances are generated for each level of perturbation, for a total of 800 generated instances per experiment. Although these methods were defined by Fischer et al., the number of cities on the perfect grid instance is not specified in their research. To keep the computations within reasonable limits, the operators are applied to perfect grid instances of size 20 (see figure 6.2). In addition, a bound of 10.000.000 iterations is introduced for the Branch and Bound; when this limit is exceeded an instance is labelled 'infeasible' and excluded from further analysis. In previous tests this bound was exceeded twice, and the solutions were only found above 250.000.000 iterations.
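The generation loop for one experiment could be organised as in the sketch below; the Perturbation interface and all names are assumptions for illustration, not the thesis code.

import java.util.ArrayList;
import java.util.List;

public class ExperimentSetup {
    interface Perturbation { List<int[]> apply(List<int[]> perfectGrid, int level); }

    static final int[] PRESERVATION_LEVELS = {25, 50, 75, 85, 90, 95, 98, 99};
    static final int[] SHAKE_LEVELS = {5, 10, 25, 50, 75, 100, 150, 200};
    static final int INSTANCES_PER_LEVEL = 100;

    // 8 gradations x 100 instances = 800 instances per experiment.
    static List<List<int[]>> generate(List<int[]> perfectGrid, Perturbation op, int[] levels) {
        List<List<int[]>> instances = new ArrayList<>();
        for (int level : levels)
            for (int i = 0; i < INSTANCES_PER_LEVEL; i++)
                instances.add(op.apply(perfectGrid, level));
        return instances;
    }
}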

After applying the operators, the instances are solved with Branch and Bound, the Genetic Algorithm, and Simulated Annealing. Branch and Bound is applied first to find the optimal length for an instance, and the CPU time for this computation is stored. The minimum length of the instance is used to measure the performance of the heuristic algorithms. The results of the Genetic Algorithm and Simulated Annealing are expressed as the relative score of the best solution compared to the optimal solution. After 100 iterations the algorithms are cut off and the percentage above the minimal tour length is computed. In total six experiments are conducted by combining the two operators with each of the three algorithms (an overview is shown in figure 6.2).
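The relative score used for the heuristic algorithms is simply the percentage above the Branch and Bound optimum, as in this small sketch (names are illustrative):

public class RelativeScore {
    // 0.0 means the heuristic found the optimal tour length exactly.
    static double percentageAboveOptimum(double heuristicLength, double optimalLength) {
        return (heuristicLength - optimalLength) / optimalLength * 100.0;
    }
}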

3.5

Evaluation

To get better insight into the data, the scatter points are plotted along with the mean and the 95% confidence interval of the mean. The data are evaluated by a comparison to the results of the research of Fischer et al. Although the algorithms in this thesis differ from those implemented by Fischer et al., their conclusions evaluate the operators, not the algorithms. Therefore the general trends and conclusions from the research of Fischer et al. are used for comparison.

Fischer et al. evaluated the shake operator on both the heuristic and the exact algorithm. For their exact algorithm, Fischer et al. observed an increase in CPU time, reaching a maximum at a shake level of 25%. At higher levels of perturbation the CPU time decreased again, and for the most structured instances the CPU time showed only a relatively small degree of variation. In addition, 5 to 10% of the instances took significantly more CPU time to solve under the shake operator. Fischer et al. conclude that when the structure of instances is reduced there should be a significant increase in computational hardness; however, when subjected to very strong perturbations, instances tend to become easier again.


For the heuristic Lin-Kernighan Helsgaun algorithm, Fischer et al. used a system of time penalties for instances that were not solved to optimality. This system requires an arbitrary choice of the height of the penalty, which makes comparison between two unequal algorithms infeasible. Therefore this research uses the relative performance of the algorithms after a fixed number of iterations.

According to Fischer et al. there should be an increase in computational cost for shake levels up to 50%, after which there is no further increase in difficulty. The reduction operator in combination with the heuristic algorithm had the hardest instances around 75 to 90% preservation levels. Fischer et al. claim that the well-structured instances (98 and 99% preservation) are solved efficiently by both their heuristic and exact algorithm. However, Fischer et al. also state that "When subjected to very strong perturbations, instances tend to become easier again. For reduction, this effect can be explained by the decrease in the number of cities.". These claims will be evaluated with the data from the following section.

CHAPTER 4

Results

The results of the Branch and Bound algorithm are shown in figure 4.2 and figure 4.3. For both operators there seems to be an effect on the CPU time of Branch and Bound under the varying amounts of perturbation.

The Branch and Bound algorithm reached the limit of 10.000.000 iterations for twenty instances (figure 4.1) when combined with the shake operator. Especially at the shake levels with the highest perturbation (150% and 200% shake) the bound was reached up to eight times. Therefore the average time displayed at these levels might differ from the values displayed in figure 4.2.

The average CPU time of the Branch and Bound algorithm seems to increase when more perturbation is applied by the shake operator. The most structured instances are solved significantly faster than those at the higher levels, and when more perturbation is added, the CPU time increases. The variability in CPU time within one level of the shake operator also tends to increase when the instances become less structured.

Operator   Level  Number of infeasible instances
Shake      75     1
Shake      100    3
Shake      150    8
Shake      200    8
Reduction  95     2

Figure 4.1: Unsuccessful runs of the Branch and Bound algorithm, where more than 10.000.000 iterations were needed to find a solution


Figure 4.2: Influence of the shake operator on the CPU time of Branch and Bound for instances with 20 cities

The reduction operator displays the opposite effect on the CPU time: when instances are heavily perturbed, the computation time is relatively low for the Branch and Bound algorithm (figure 4.3). The instances with the lowest preservation levels (25 or 50%) are solved to optimality in less than one millisecond. In the region of 90 to 95% preservation the CPU time reaches a maximum, and for the most structured instances of 98% and 99% preservation, the CPU time declines by 21 milliseconds.

Figure 4.3: Influence of the reduction operator on the CPU time of Branch and Bound for instances with 20 cities. The subplot shows the data on a logarithmic scale


When the shake operator is applied to the heuristic algorithms, their performance seems much less affected than that of the exact algorithm. Both the Genetic Algorithm and Simulated Annealing show relatively small variations in the average solution quality compared to the effects on Branch and Bound. The Genetic Algorithm seems to find solutions that are on average closer to the optimum tour length than Simulated Annealing. The performance of both the Genetic Algorithm and Simulated Annealing tends to improve at the extreme values of the shake operator. The performance differs around 5 percent for the Genetic Algorithm and 10 percent for Simulated Annealing. The performance of the Genetic Algorithm seems to increase especially on the highly structured instances. When the perturbation increases there is an increase in variability within one level of shake. Simulated Annealing also displays a slight increase in performance for the highly structured instances, although this effect seems more evenly spread over the first three levels of the shake operator (5%, 15% and 25%).


(a) Genetic Algorithm with the reduction operator. (b) Genetic Algorithm with the shake operator


(c) Simulated Annealing with reduction operator (d) Simulated Annealing with the shake operator

Figure 4.4: The performance of the heuristic Genetic Algorithm (GA) and the heuristic Simulated Annealing (SA) for both the shake and reduction operators


When the reduction operator is combined with the Genetic Algorithm (figure 4.4a), the average percentage above the optimum is lowest for the preservation levels of 25%, 50% and 75%. The average percentage above the optimum increases until a preservation level of 95%. For higher levels of preservation (98% and 99%) the solution quality improves. The reduction operator in combination with Simulated Annealing (figure 4.4c) appears to show a similar trend as the Genetic Algorithm. Especially on the highly structured instances there seems to be a relative increase in performance, since the solution quality becomes closer to the optimum.

Besides the general trend there appears to be some clustering in the results of Simulated Annealing. When Simulated Annealing is combined with either the reduction or the shake operator, the resulting data do not seem to concentrate around the mean. There are two areas that are highly populated with data points; however, unlike for Branch and Bound and the Genetic Algorithm, the area around the mean does not contain many data points.

In search of a cause for this dichotomy, the points above and below the mean of Simulated Annealing were separated and stored in individual clusters. These clusters contain the same instances that were solved with the Genetic Algorithm; thus every instance that is solved by Simulated Annealing (figures 4.4c and 4.4d) can be traced in the graphs of the Genetic Algorithm (figures 4.4a and 4.4b). The average performance of the Genetic Algorithm on the separate clusters is computed to see if the instances that are hard for Simulated Annealing are also difficult for the Genetic Algorithm. However, the average performance on the separate clusters does not indicate the same dichotomy in the data for the Genetic Algorithm (figure 4.5).

Average percentage above optimum (GA):

Shake level  Cluster 1      Cluster 2        Reduction level  Cluster 1            Cluster 2
5%           11.0676480005  9.996850449      99%              10.34192544222       10.40441731005
15%          11.9127035283  11.888728011     98%              10.67617672153       11.84388532464
25%          14.1045960528  14.850010831     95%              11.81137904975       9.4267582962
50%          13.4285886888  18.624429163     90%              10.80983427270       12.05790097700
75%          15.2223688099  13.733961732     85%              9.37398120245        11.10536770628
100%         15.1252228542  16.380327264     75%              8.28148778094        8.62414982620
150%         12.8046741359  14.248584852     50%              1.565587801486       0.61477591435
200%         14.2620244402  12.638102058     25%              -1.705302565824E-14  1.19670355496E-15

Figure 4.5: Mapping of the clusters in SA to the performance of GA. Cluster 1 contains all data points that lie below the mean of the operator in combination with SA (figures 4.4c and 4.4d) and Cluster 2 contains all data points that lie above that mean


CHAPTER 5

Discussion and Conclusion

The results from the previous section indicate that the perturbation operators defined by Fischer et al. have, at least to some extent, influence on the computational hardness of instances. The measurements of the CPU time of the exact algorithm seem to indicate a correlation between the perturbation and the computational cost of instances. For each higher level of perturbation applied by the shake operator the CPU time of Branch and Bound increases. Fischer et al. indicated a peak in the CPU time at 25% shake in their data; this is not apparent in the results of figure 4.2. A possible factor in this deviation could be the initial problem size, which Fischer et al. indicate can have a strong influence on the variability in solving time. If the instance sizes in this research differ from those used by Fischer et al. there might be a shift of the maximum for the shake operator. When Branch and Bound was combined with the shake operator a total of 20 instances were not solved to optimality. Especially at the shake levels of 150% and 200%, up to 8 percent of the instances remained unsolved; these were left out of the analysis.

The heuristic algorithms in figures 4.4b and 4.4d show a weaker effect of the shake operator than the exact algorithm. The performance of both the Genetic Algorithm and Simulated Annealing tends to improve at the extreme values of the shake operator, which is consistent with the prediction of Fischer et al. The most structured instances are solved relatively easily by the Genetic Algorithm and Simulated Annealing. As opposed to the exact algorithm, the performance of the heuristic algorithms seems to improve at high levels of the shake operator, which might be caused by the difference in metrics. The relative score was newly introduced as a metric for the heuristic algorithms and gives a low contrast between the different levels of the shake operator.

The reduction operator seems to display a similar trend for all three algorithms. The heavily reduced instances are significantly easier than the instances with higher preservation levels. This trend is consistent with the results of Fischer et al., who stated that this effect can be explained by the decrease in the number of cities. The number of cities appears to be an important factor, and when more cities are preserved there is indeed an increase in computational hardness. However, not every increase yields a higher computational cost; in the most highly structured instances the number of cities increases, yet the computational hardness decreases for all three algorithms. This anomaly might be an effect of the spatial distribution, although it might not be the only attribute that causes the decrease in instance hardness.

A remarkable trend in the data of Simulated Annealing is the appearance of clusters above and below the mean performance for both the shake and the reduction operator. When the clusters above and below the average performance of Simulated Annealing were separately mapped to the performance of the Genetic Algorithm, there was no apparent correlation. Based on the relative score of the algorithms there appears to be no relation between the instances that are computationally expensive for Simulated Annealing and those that are expensive for the Genetic Algorithm. A possible explanation for the clustering might be the algorithmic parameters of Simulated Annealing. Since no additional parameter tuning is performed on the algorithm, the probabilistic scheme could cause the algorithm to get stuck in a local optimum. The combination of the iteration bound with the chosen cooling scheme could lead to sub-optimal performance of Simulated Annealing.


In line with the prediction of Fischer et al. there is an effect of the reduction and shake operators on the computational hardness of instances. For both the heuristic and the exact algorithms, highly structured instances were relatively easy to solve, and increasing the perturbation on average lowered the performance of the algorithms. Instances heavily perturbed by the reduction operator became easier to solve again, although Fischer et al. claim this effect can be partially explained by the influence of the instance size. Perturbation applied by the shake operator decreases the performance of the exact algorithm; however, the results indicate a slight increase in performance of the heuristic algorithms. Although both operators influence the performance of Branch and Bound, the Genetic Algorithm, and Simulated Annealing, it is hard to determine to what extent this influence is due to the structure of a grid, because of the many attributes of the Traveling Salesman Problem.

For continuation of this research I would recommend further parameter tuning for Simulated Annealing. The iteration limit on the Branch and Bound should also preferably be raised or removed, since the number of infeasible instances could influence the average performance of the algorithms. Since the size of an instance seems to be an important factor, I would recommend multiple experiments with different instance sizes following the principle shown in figure 6.1. In addition, a possible extension could be a statistical analysis with an Empirical Performance Model. Finally, I would suggest further research on a theoretical foundation for the variable performance of algorithms on the Traveling Salesman Problem. As Hutter et al. (2014) [15] state, not much is known about what causes the variability of algorithms on the instances of the Traveling Salesman Problem.


CHAPTER 6

Appendix

6.1

Glossary

1. TSP instance: one instantiation of the Traveling Salesman Problem, consisting of a set of cities

2. Tour: a route that connects a given set of cities and is cyclic

3. Subtour: a tour that does not connect all cities of a given set

4. NP-hard: Complexity class for decision problems that are intrinsically harder than those that can be solved by a non-deterministic Turing machine in polynomial time [23]

5. NP: (non-deterministic polynomial time) Complexity class for decision problems

6. Order parameter: feature of a (TSP) instance that predicts its computational hardness

6.2

Additional settings

Grid dimensions  Instance size  Length optimal tour
4 × 3            12             12 ∗ d
4 × 4            16             16 ∗ d
5 × 4            20             20 ∗ d
6 × 5            30             30 ∗ d
...

Figure 6.1: Instance sizes and the corresponding grid dimensions


Description                         Algorithm            Operator   Measured variables  Perfect grid size  Iterations bound
BB time vs. shake                   Branch and Bound     shake      iterations/time     20                 10.000.000
BB time vs. reduction               Branch and Bound     reduction  iterations/time     20                 10.000.000
GA score performance vs. shake      Genetic Algorithm    shake      best score          20                 100
GA score performance vs. reduction  Genetic Algorithm    reduction  best score          20                 100
SA score performance vs. shake      Simulated Annealing  shake      best score          20                 100
SA score performance vs. reduction  Simulated Annealing  reduction  best score          20                 100

Figure 6.2: The experimental set up for the three algorithms. In total six experiments were conducted, where each operator was applied in eight different gradations. Per gradation the experiment was repeated 100 times, resulting in 800 instances per experiment.


6.3

Pseudo-code

6.3.1

Branch and Bound

Algorithm 3 Bounding operator

procedure BOUND(partialTour)
    costMatrix ← costMatrix(partialTour)
    lowerBound ← 0
    for row in size(partialTour) do
        rowMinimum ← minimum(costMatrix, row)
        lowerBound ← lowerBound + rowMinimum
    return lowerBound

Algorithm 4 Selection strategy based on Best First Search

procedure SELECTIONSTRATEGY(parentNode)
    initialise priorityQueue
    children ← getChildren(parentNode)
    for node in children do
        AddTo(priorityQueue, node)
    return priorityQueue

Algorithm 5 Branch operator

procedure BRANCH(partialTour, grid)
    n ← size(grid)
    if size(partialTour) = n − 1 then
        return AddTo(partialTour, lastCity)
    else
        for city in grid do
            if city not in partialTour then
                newTour ← AddTo(partialTour, city)
                AddTo(newBranches, newTour)
        return newBranches

Algorithm 6 Outline Branch and Bound

procedure BB(grid)
    AddTo(priorityQueue, initialSubtour)
    while priorityQueue is not empty do
        newTour ← pop(priorityQueue)
        if size(newTour) < size(grid) − 1 then
            newBranches ← Branch(newTour)
            for all partialTour in newBranches do
                AddTo(priorityQueue, partialTour)
        if size(newTour) = size(grid) − 1 then
            city ← findRemainingCity
            AddTo(newTour, city)
            if length(newTour) < length(bestTour) then
                bestTour ← newTour


6.3.2

Genetic Algorithm

Algorithm 7 Fitness function

procedure GETFITNESS(tour)
    tourLength ← 0
    for edge in tour do
        tourLength ← tourLength + length(edge)
    fitness ← 1/tourLength
    return fitness

Algorithm 8 Parent selection operator

procedure SELECTPARENT(population, tournamentSize)
    fittest ← 0
    for i in tournamentSize do
        tour ← randomTour(population)
        fitness ← getFitness(tour)
        if fitness > fittest then
            fittest ← fitness
            parent ← tour
    return parent

Algorithm 9 Crossover operator

procedure CROSSOVER(tour1, tour2)
    startIndex ← chooseRandomIndex(size(tour1))
    endIndex ← chooseRandomIndex(size(tour1))
    if startIndex < endIndex then
        subTour ← getSubTour(tour1, startIndex, endIndex)
    else
        subTour ← getSubTour(tour1, endIndex, startIndex)
    AddTo(child, subTour)
    for city in tour2 do
        if city not in child then
            AddTo(child, city)
    return child

Algorithm 10 Mutation operator

procedure MUTATE(population, mutationRate)
    for tour in population do
        randomNumber ← random[0, 1)
        if randomNumber < mutationRate then
            swappedTour ← RandomSwap(tour)
            AddTo(newPopulation, swappedTour)
        else
            AddTo(newPopulation, tour)
    return newPopulation


Algorithm 11 Swap function

procedure RANDOMSWAP(tour)
    city1 ← selectRandomCity(tour)
    city2 ← selectRandomCity(tour)
    while city2 = city1 do
        city2 ← selectRandomCity(tour)
    city3 ← city1
    replace(city1, city2)
    replace(city2, city3)

Algorithm 12 Outline Genetic Algorithm

procedure GA(grid, bound, popSize)
    for i in popSize do
        tour ← generateRandomTour
        AddTo(population, tour)
    for i to bound do
        while size(newPopulation) < size(population) do
            parent1 ← SelectParent(population)
            parent2 ← SelectParent(population)
            children ← Crossover(parent1, parent2)
            for tour in children do
                AddTo(newPopulation, tour)
        population ← Mutate(newPopulation)
    return getFittest(population)

6.3.3

Simulated Annealing

Algorithm 13 Simulated Annealing acceptance function

procedure ACCEPT(newTour, currentOptimumTour, temperature)
    probability ← (length(currentOptimumTour) − length(newTour))/temperature
    acceptanceBound ← exp(probability)
    if length(newTour) ≤ length(currentOptimumTour) then
        return true
    else
        if randomNumber[0, 1) < acceptanceBound then
            return true
        else
            return false

Algorithm 14 Outline Simulated Annealing

procedure SA(grid, iterations, temperature, cooling)
    bestTour ← randomTour
    for i in iterations do
        currentTemp ← temperature
        while currentTemp > 0 do
            newTour ← RandomSwap(bestTour)
            if Accept(newTour, bestTour, currentTemp) then
                bestTour ← newTour
            currentTemp ← currentTemp ∗ (1 − cooling)

Bibliography

[1] Aarts, Emile, Korst, Jan, and Michiels, Wil. “Simulated annealing”. In: Search methodologies. Springer, 2014, pp. 265–285.

[2] Abdoun, Otman, Abouchabaka, Jaafar, and Tajani, Chakir. “Analyzing the performance of mutation operators to solve the travelling salesman problem”. In: arXiv preprint arXiv:1203.3099 (2012).

[3] Applegate, David et al. “Implementing the Dantzig-Fulkerson-Johnson algorithm for large traveling salesman problems”. In: Mathematical Programming 97.1 (2003), pp. 91–153.

[4] Arora, Sanjeev. “Polynomial time approximation schemes for Euclidean traveling salesman and other geometric problems”. In: Journal of the ACM (JACM) 45.5 (1998), pp. 753–782.

[5] Bertsimas, Dimitris, Tsitsiklis, John, et al. “Simulated annealing”. In: Statistical science 8.1 (1993), pp. 10–15.

[6] Boddy, Mark and Dean, Thomas L. Solving time-dependent planning problems. Brown University, Department of Computer Science, 1989.

[7] Braun, Heinrich. “On solving travelling salesman problems by genetic algorithms”. In: International Conference on Parallel Problem Solving from Nature. Springer. 1990, pp. 129–133.

[8] Cheeseman, Peter, Kanefsky, Bob, and Taylor, William M. “Where the Really Hard Problems Are.” In: IJCAI. Vol. 91. 1991, pp. 331–337.

[9] Clausen, Jens. “Branch and bound algorithms: principles and examples”. In: Department of Computer Science, University of Copenhagen (1999), pp. 1–30.

[10] Cobham, Alan. “The intrinsic computational difficulty of functions”. In: (1965).

[11] Davis, Lawrence. “Handbook of genetic algorithms”. In: (1991).

[12] Fischer, Thomas et al. “An Analysis Of The Hardness Of TSP Instances For Two High Performance Algorithms”. In: Proceedings of the Sixth Metaheuristics International Conference. 2005, pp. 361–367.

[13] Gutin, Gregory and Punnen, Abraham P. The traveling salesman problem and its variations. Vol. 12. Springer Science & Business Media, 2006.

[14] Helsgaun, Keld. “An effective implementation of the Lin–Kernighan traveling salesman heuristic”. In: European Journal of Operational Research 126.1 (2000), pp. 106–130.


[15] Hutter, Frank et al. “Algorithm runtime prediction: Methods & evaluation”. In: Artificial Intelligence 206 (2014), pp. 79–111.

[16] Jiang, Yan et al. “Comparing a hybrid branch and bound algorithm with evolutionary computation methods, local search and their hybrids on the tsp”. In: Computational Intelligence in Production and Logistics Systems (CIPLS), 2014 IEEE Symposium on. IEEE. 2014, pp. 148–155.

[17] Larranaga, Pedro et al. “Genetic algorithms for the travelling salesman problem: A review of representations and operators”. In: Artificial Intelligence Review 13.2 (1999), pp. 129–170.

[18] Little, John DC et al. “An algorithm for the traveling salesman problem”. In: Operations research 11.6 (1963), pp. 972–989.

[19] Nallaperuma, Samadhi et al. “Features of Easy and Hard Instances for Approximation Algorithms and the Traveling Salesperson Problem”. In: ().

[20] Noraini, Mohd Razali and Geraghty, John. “Genetic algorithm performance with different selection strategies in solving TSP”. In: (2011).

[21] Nourani, Yaghout and Andresen, Bjarne. “A comparison of simulated annealing cooling strategies”. In: Journal of Physics A: Mathematical and General 31.41 (1998), p. 8373.

[22] Philip, Adewole, Taofiki, Akinwale Adio, and Kehinde, Otunbanowo. “A Genetic Algorithm for Solving Travelling Salesman Problem”. In: ().

[23] Pieterse, Black. Algorithms and Theory of Computation Handbook. CRC Press LLC, 1999.

[24] Reinelt, Gerhard. The traveling salesman: computational solutions for TSP applications. Springer-Verlag, 1994.

[25] Uchida, Akihiro, Ito, Yasuaki, and Nakano, Koji. “An efficient GPU implementation of ant colony optimization for the traveling salesman problem”. In: Networking and computing (icnc), 2012 third international conference on. IEEE. 2012, pp. 94–102.

[26] Voigt, B. Fr. “Der Handlungsreisende, wie er sein soll und was er zu thun hat, um Aufträge zu erhalten und eines glücklichen Erfolgs in seinen Geschäften gewiss zu sein”. In: Commis-Voyageur, Ilmenau. Neu aufgelegt durch Verlag Schramm, Kiel (1981).

[27] Wiener, Richard. “Branch and Bound Implementations of the Traveling Salesperson Problem, Part 4: Distributed processing solution using RMI.” In: Journal of Object Technology 2.6 (2003), pp. 51–65.
