Traveling Salesman Problem

Isja Mannens

July 12, 2017

Bachelor Thesis
Supervisor: dr. Viresh Patel

Korteweg-de Vries Instituut voor Wiskunde


Abstract

The Traveling Salesman Problem (TSP) asks for a minimal cost tour along all vertices of an edge-weighted graph. In this thesis we discuss different versions of the problem and some of the algorithms designed to solve it. We discuss an exact algorithm for general TSP, found in (Björklund, 2010). The algorithm uses an algebraic approach, reducing Hamiltonicity to the Labeled Cycle Cover Sum. We also discuss an approximation algorithm for Euclidean TSP, found in (Arora, 1998). This algorithm divides the graph up using a quad-tree, which then allows for a dynamic programming algorithm on the resulting subgraphs. Both algorithms contain a random element; the resulting failure probabilities are bounded using Markov's inequality.

Title: Traveling Salesman Problem

Author: Isja Mannens, isja-m@hotmail.com, 10730346
Supervisor: dr. Viresh Patel

Date: July 12, 2017

Korteweg-de Vries Instituut voor Wiskunde Universiteit van Amsterdam

Science Park 904, 1098 XH Amsterdam
http://www.science.uva.nl/math


Contents

1 Introduction
2 Prior Knowledge
  2.1 Complexity of an algorithm
  2.2 Dynamic Programming
  2.3 Markov's Inequality
3 Exact TSP
  3.1 Framework
    3.1.1 Labeled Cycle Cover Sum
    3.1.2 Calculating the Labeled Cycle Cover Sum
    3.1.3 Hamiltonicity Reduction
    3.1.4 Extension to General TSP
  3.2 The algorithm
    3.2.1 Description
    3.2.2 Complexity
    3.2.3 False negatives
4 Approximate TSP
  4.1 Algorithm description
    4.1.1 Making the graph well rounded
    4.1.2 Building the shifted quad-tree
    4.1.3 Applying the dynamic programming algorithm
  4.2 Calculating the complexity of the algorithm
  4.3 Proof of correctness
  4.4 Additions
    4.4.1 Patch Lemma
    4.4.2 Reasons behind the shifted quad-tree
5 Conclusions
6 Popular Summary


1 Introduction

Some problems have achieved an almost mythical status in mathematics. They are the kind of problems which go unsolved for decades, sometimes even centuries. These problems tend to be the focal point of a lot of research, and as a result they can be a great source of innovation. One of these problems is the Traveling Salesman Problem, or more specifically, the search for a polynomial time algorithm to solve it.

The Traveling Salesman Problem (TSP) is a textbook example of a real-life problem turned into an abstract one. Given a weighted graph, the problem asks for the cheapest route along all vertices of the graph, in terms of the edge weights. The problem has been proven to be NP-hard, meaning that a polynomial time solution to TSP would provide polynomial time solutions to every problem in NP. This means that finding such a solution would also solve the 'P vs. NP' problem, one of the millennium problems.

Improvements to the runtime, even if they are still superpolynomial, can be very useful, since TSP has many applications. There are obvious applications like planning delivery routes and bus routes. There are also less obvious applications, like optimizing fuel use in telescope targeting[1] and even computing DNA sequences.

In this project we will attempt to learn more about TSP by examining two fairly recent papers on TSP. Each paper describes an algorithm for a certain version of TSP. The first paper discusses an algorithm for the general version and the second paper discusses an approximation algorithm for the Euclidean variant of TSP. Our main goal will be to understand these algorithms. Through this understanding we hope to get an idea of some of the techniques that are being used to tackle this problem.

I would like to thank dr. Viresh Patel for helping me with this project. His guidance has not only helped me to understand the subject matter but has also allowed me to find a reasonable focus in such a widely documented problem.

[1] http://www.math.uwaterloo.ca/tsp/apps/starlight.html


2 Prior Knowledge

2.1 Complexity of an algorithm

In this thesis we will discuss the complexity of a number of algorithms. The complexity of an algorithm is usually one of its most important characteristics. In this section we give a short introduction to the complexity of an algorithm and the notation used. Broadly speaking, the complexity of an algorithm indicates the rate at which the runtime of the algorithm increases as a function of the problem size. Here we interpret runtime as the number of computations.[1] Complexity is important because it gives an indication of the input sizes which are feasible to compute using the algorithm. For example, the runtime of multiplying two $n$-digit numbers, as a function of $n$, can be bounded by some multiple of $n^2$. However, the complexity of factorizing a number into its prime factors is assumed to be much higher. This fact is used in cryptographic methods like RSA, which depend on the fact that it is easy to calculate the product of two large primes, but very hard to find those primes given their product.

When we have found such a function $f$ which gives an upper bound on the runtime of an algorithm, we say that the algorithm has a runtime of $O(f(n))$. Formally, a function $g(n)$, which in our case describes the runtime of an algorithm, is $O(f(n))$ if there is some constant $c > 0$ such that $g(n) \leq c f(n)$ for all sufficiently large $n$.

Finally, there are certain classes of algorithms, known as complexity classes. For example, the complexity of the multiplication mentioned earlier can be described by a polynomial, and thus we say that it is a polynomial time algorithm. An algorithm with complexity $O(e^{\mathrm{poly}(n)})$ would be considered exponential time.

This definition of complexity is sometimes referred to as time complexity, since we may use the same principles to describe other aspects of an algorithm, most notably the amount of memory it would use on a computer.

[1] Some computations take more time than others. However, we will find that, as long as we have some upper bound on the time a computation takes, this has no impact on the resulting definition of complexity.


2.2 Dynamic Programming

Dynamic programming is a common technique used in algorithmics. The basic idea is to start at one or more trivial versions of the main problem and gradually build a lookup table of solutions, by combining solutions for smaller problems into solutions for bigger problems.

A simple example of this technique is the calculation of the $n$-th Fibonacci number. A recursive algorithm would calculate $fib(n)$ via $fib(n) = fib(n-1) + fib(n-2)$, meaning that the running time roughly doubles from $n-1$ to $n$, since the algorithm calls itself for both $fib(n-1)$ and $fib(n-2)$. This gives it a running time of $O(2^n)$. A dynamic programming algorithm for the same problem builds a list of all Fibonacci numbers, starting at $fib(0)$, and uses previous entries in the list to calculate each new entry. This means that the increase in running cost from $n$ to $n+1$ is only the cost of a single addition, making the running cost of this algorithm $O(n)$.
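To make the contrast concrete, here is a minimal Python sketch of both approaches (the function names are ours):

```python
def fib_recursive(n: int) -> int:
    """Naive recursion: each call spawns two more, giving O(2^n) running time."""
    if n < 2:
        return n
    return fib_recursive(n - 1) + fib_recursive(n - 2)


def fib_dynamic(n: int) -> int:
    """Dynamic programming: build the lookup table bottom-up in O(n) additions."""
    table = [0, 1]
    for i in range(2, n + 1):
        table.append(table[i - 1] + table[i - 2])  # one addition per new entry
    return table[n]
```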

2.3 Markov’s Inequality

Both algorithms discussed in this thesis use Markov’s inequality. This inequality is a useful tool for finding upper bounds on probabilities.

Given a non-negative random variable $X$ and $a > 0$, Markov's inequality tells us that
$$P(X \geq a) \leq \frac{E(X)}{a}$$
where $P(Y)$ denotes the probability of event $Y$ taking place and $E(X)$ denotes the expected value of $X$.
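As a preview of how this is used in section 3.2.3: there the number of unlabeled edges in a random Hamilton cycle will turn out to have expectation $\frac{n}{4}$, and applying the inequality with $a = \frac{n}{4} + 1$ gives

```latex
P\left(X \geq \tfrac{n}{4} + 1\right)
  \;\leq\; \frac{E(X)}{\tfrac{n}{4} + 1}
  \;=\; \frac{n/4}{(n+4)/4}
  \;=\; \frac{n}{n+4}
```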


3 Exact TSP

A common technique for finding algorithms is to reduce the problem to another problem, by describing a way to translate a solution of the second problem into a solution of the original problem. This can often simplify the problem. In this section we discuss an exact algorithm for General TSP, found in (Björklund, 2010).

Definition 1 (General TSP). Given a weighted graph $G = (V, E, w)$ with $w: E \to \mathbb{R}_{\geq 0}$, find a cycle $\pi$ such that all nodes in $V$ are visited by $\pi$ and the sum of the weights of the edges of $\pi$ is minimal over all possible choices of $\pi$.

The paper first gives an algorithm which calculates the so-called Labeled Cycle Cover Sum and then reduces the Hamiltonicity problem to the Labeled Cycle Cover Sum.

Definition 2 (Hamiltonicity). Given a graph $G = (V, E)$, determine whether there exists a cycle $\pi$ in $G$ such that all nodes in $V$ are visited by $\pi$.

The algorithm is then extended to an algorithm for General TSP.

3.1 Framework

3.1.1 Labeled Cycle Cover Sum

In this section we will discuss the labeled cycle cover sum and some lemmas which will help with the reduction in the next section.

Definition 3 (Labeled Cycle Cover). Given a directed graph $D = (V, A)$, a cycle cover is a subset $C \subseteq A$ such that for all $v_0 \in V$ there is exactly one $v_0v_1 \in C$ and exactly one $v_2v_0 \in C$.[1] Given a label set $L$, a labeled cycle cover is a cycle cover $C$ together with a surjective function $g: L \twoheadrightarrow C$. We denote the set of all labeled cycle covers $(C, g)$ on $D$ as $lcc(D, L)$ and the set of labeled Hamiltonian cycle covers $(H, g)$ on $D$ as $lhc(D, L)$.

Intuitively, a cycle cover is a disjoint collection of cycles on a graph which together cover all vertices of the graph. The function $g$ simply distributes the labels in $L$ over the cycle cover. Using this definition, we can define the labeled cycle cover sum.

Definition 4 (Labeled Cycle Cover Sum). Given a directed graph $D = (V, A)$, a label set $L$ and a function $f: A \times 2^L \setminus \{\emptyset\} \to R$ for some ring $R$, the labeled cycle cover sum is defined as
$$\Lambda(D, L, f) = \sum_{(C,g) \in lcc(D,L)} \prod_{a \in C} f(a, g^{-1}(a))$$

[1] We assume that $D$ has no arcs from a node to itself, and thus $v_1 \neq v_0 \neq v_2$.


Later on, we will use this sum to determine Hamiltonicity. The following definition and lemma show how we can remove all non-Hamiltonian cycle covers from the sum.

Definition 5 (s-oriented Mirror Function). Given a bidirected graph $D = (V, A)$, a finite set $L$ and a special node $s \in V$, an s-oriented mirror function is a function $f: A \times 2^L \setminus \{\emptyset\} \to R$ such that $f(uv, Z) = f(vu, Z)$ for all $Z \in 2^L \setminus \{\emptyset\}$ and $u \neq s \neq v$.

Lemma 1. Given a bidirected graph $D = (V, A)$, a finite set $L$ and a special node $s \in V$, let $f$ be an s-oriented mirror function with a codomain ring of characteristic two. Then
$$\Lambda(D, L, f) = \sum_{(H,g) \in lhc(D,L)} \prod_{a \in H} f(a, g^{-1}(a))$$

Proof (sketch). Since $f(a, g^{-1}(a))$ lies in a ring of characteristic two, equal elements cancel out. This means that if we can pair each labeled non-Hamiltonian cycle cover with another, unique labeled non-Hamiltonian cycle cover having the same term
$$\prod_{a \in H} f(a, g^{-1}(a))$$
then the paired terms cancel out and we are left with just the Hamiltonian cycle covers. We construct this pairing by choosing an ordering on the possible subcycles; we then find an element's partner by reversing the direction of the first subcycle which does not contain $s$.[3] Since $f$ is an s-oriented mirror function, reversing the direction of the subcycle does not change the cycle cover's contribution to the sum, and thus the pairs cancel out.

[3] We will not go into detail on how to construct this ordering, since it does not add much to the understanding of the proof.

3.1.2 Calculating the Labeled Cycle Cover Sum

In this section we will discuss a method for calculating the labeled cycle cover sum over a ring of characteristic two, which will prove to be relatively fast. This method is the main factor in determining the runtime of the algorithm and thus is crucial to the effectiveness of the overall algorithm.

Let $D = (V, F)$ be a directed graph. If we define a matrix $A$ as
$$A_{i,j} = \begin{cases} \omega(ij) & \text{if } ij \in F \\ 0 & \text{otherwise} \end{cases}$$
we find that
$$\operatorname{per}(A) = \sum_{\sigma: V \to V \text{ bijective}} \prod_{i=1}^{|V|} \omega(i\sigma(i)) = \sum_{C \in cc(D)} \prod_{ij \in C} \omega(ij)$$
where $cc(D)$ denotes the set of all cycle covers of $D$. Furthermore, we use the fact that in a ring of characteristic two, $\det(A) = \operatorname{per}(A)$, since the signs of the permutations vanish.
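As a quick sanity check of this identity, here is a brute-force Python sketch (exponential in $n$, for illustration only; the function names are ours) comparing the Leibniz expansions of the determinant and the permanent modulo 2:

```python
from itertools import permutations
from math import prod

def sign(sigma):
    """Sign of a permutation, computed from its inversion count."""
    inv = sum(1 for i in range(len(sigma)) for j in range(i + 1, len(sigma))
              if sigma[i] > sigma[j])
    return -1 if inv % 2 else 1

def determinant(A):
    """Leibniz formula: signed sum over all permutations."""
    n = len(A)
    return sum(sign(s) * prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

def permanent(A):
    """Same sum without the signs; counts cycle covers when A is an adjacency matrix."""
    n = len(A)
    return sum(prod(A[i][s[i]] for i in range(n))
               for s in permutations(range(n)))

# In characteristic two, -1 = 1, so the signs vanish and det == per (mod 2).
A = [[1, 1, 0],
     [0, 1, 1],
     [1, 0, 1]]
assert determinant(A) % 2 == permanent(A) % 2
```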



We will use this fact to construct a polynomial in $x$ out of determinants of certain matrices, for which the coefficient of $x^{|L|}$ will equal the labeled cycle cover sum of $D$.

First we will define the aforementioned matrices as follows. For $Z \subseteq L$, with $L$ some label set, let
$$M_f(Z)_{i,j} = \begin{cases} f(ij, Z) & \text{if } ij \in F,\ Z \neq \emptyset \\ 0 & \text{otherwise} \end{cases}$$
We then define the polynomial as
$$p(f, x) = \sum_{Y \subseteq L} \det\Big( \sum_{Z \subseteq Y} x^{|Z|} M_f(Z) \Big)$$

The following lemma then tells us how this polynomial relates to the labeled cycle cover sum.

Lemma 2. For a directed graph $D$, a label set $L$ and $f: A \times 2^L \setminus \{\emptyset\} \to GF(2^k)$,
$$[x^{|L|}]p(f, x) = \Lambda(D, L, f)$$
where $[x^k]p(x)$ denotes the coefficient of $x^k$ in $p(x)$.

Proof. Using the equality of determinants and permanents in rings of characteristic two, we write $p(f, x)$ as
$$p(f, x) = \sum_{Y \subseteq L} \operatorname{per}\Big( \sum_{Z \subseteq Y} x^{|Z|} M_f(Z) \Big) = \sum_{Y \subseteq L} \sum_{C \in cc(D)} \prod_{ij \in C} \sum_{Z \subseteq Y} x^{|Z|} f(ij, Z)$$
Now note that for any set of numbers $(a_{i,j})_{i,j=0}^{n,m}$,
$$\prod_{i=1}^{n} \sum_{j=1}^{m} a_{i,j} = \sum_{q: \{1,\dots,n\} \to \{1,\dots,m\}} \prod_{i=1}^{n} a_{i,q(i)}$$
Applying this to the polynomial we find
$$p(f, x) = \sum_{Y \subseteq L} \sum_{C \in cc(D)} \sum_{q: C \to 2^Y} \prod_{ij \in C} x^{|q(ij)|} f(ij, q(ij)) = \sum_{C \in cc(D)} \sum_{q: C \to 2^L} \sum_{\substack{Y \subseteq L \\ (\bigcup_{ij \in C} q(ij)) \subseteq Y}} \prod_{ij \in C} x^{|q(ij)|} f(ij, q(ij))$$


There are $2^{|L \setminus (\bigcup_{ij \in C} q(ij))|}$ different $Y$ such that $(\bigcup_{ij \in C} q(ij)) \subseteq Y$. This means that if $(\bigcup_{ij \in C} q(ij)) \neq L$, there is an even number of equal $\prod_{ij \in C} x^{|q(ij)|} f(ij, q(ij))$ terms in the final sum. Since we work in a ring of characteristic two, these cancel out, leaving us with
$$p(f, x) = \sum_{C \in cc(D)} \sum_{\substack{q: C \to 2^L \\ (\bigcup_{ij \in C} q(ij)) = L}} x^{\sum_{ij \in C} |q(ij)|} \prod_{ij \in C} f(ij, q(ij))$$
From this we find that
$$[x^{|L|}]p(f, x) = \sum_{C \in cc(D)} \sum_{\substack{q: C \to 2^L \\ (\bigcup_{ij \in C} q(ij)) = L \\ \forall a \neq b:\ q(a) \cap q(b) = \emptyset}} \prod_{ij \in C} f(ij, q(ij))$$
Because of this we may now 'reverse' $q$, by setting $g(a) := b$ where $a \in q(b)$. This results in
$$[x^{|L|}]p(f, x) = \sum_{C \in cc(D)} \sum_{g: L \twoheadrightarrow C} \prod_{ij \in C} f(ij, g^{-1}(ij)) = \sum_{(C,g) \in lcc(D,L)} \prod_{ij \in C} f(ij, g^{-1}(ij))$$
which is exactly the definition of the labeled cycle cover sum.

3.1.3 Hamiltonicity Reduction

In this section we will reduce the Hamiltonicity problem to the Labeled Cycle Cover Sum. The main idea is to divide the vertex set into two equal halves, using one half as labels and the other half as vertices in a new graph. We will find that the Labeled Cycle Cover Sum of this new graph is zero, precisely when the original graph is non-Hamiltonian.

Let $G = (V, E)$ be any graph. We divide $V$ into two halves $V_1$ and $V_2$ such that $|V_1| = |V_2|$. We call an edge in $G$ unlabeled by $V_2$ if it has no endpoints in $V_2$, and call the remaining edges labeled by $V_2$. We will denote by $U(G)$ and $L(G)$ the sets of unlabeled and labeled edges respectively. We also define $lhc^m_{V_2}(G)$ as the set of oriented Hamiltonian cycles $(H, \sigma)$ with precisely $m$ edges in $U(G)$, where $\sigma: L_m \hookrightarrow U(G)$ labels these $m$ edges.

In order to use Lemma 1 we will need to choose a label set, define a bidirected graph $D$ and define an s-oriented function $f$ over a ring of characteristic two. We define $D = (V_1, F)$ as a complete graph and use the label set $V_2 \cup L_m$ for some $0 \leq m \leq n - 1$, with $L_m$ an arbitrary label set of size $m$. We will later find that we have to check multiple values of $m$. We also choose some $s \in V_1$.

Before we can define our s-oriented mirror function $f$ we have to introduce some variables. For every edge $uv \in L(G)$, we introduce variables $x_{uv}$ and $x_{vu}$. We set $x_{uv} = x_{vu}$ precisely when $v \neq s \neq u$, which will help ensure that the function $f$ is s-oriented. For every edge $uv \in U(G)$ and every $d \in L_m$ we introduce variables $x_{uv,d}$ and $x_{vu,d}$. We again set $x_{uv,d} = x_{vu,d}$ precisely when $v \neq s \neq u$. Finally, let $P_{u,v}(X)$ be the set of paths from $u$ to $v$ which pass through every vertex in $X$. We now define $f$ as follows. For $uv \in F$ and $\emptyset \neq X \subseteq V_2$, we set
$$f(uv, X) = \sum_{P \in P_{u,v}(X)} \prod_{wz \in P} x_{wz}$$
For $uv \in F$ such that $uv \in U(G)$ and $d \in L_m$, we set
$$f(uv, \{d\}) = x_{uv,d}$$
Everywhere else, we set $f$ to zero. We will interpret these variables as polynomials over $GF(2^k)$ for some $k$; we will choose the value of $k$ in section 3.2.3.

Lemma 3. With $G$, $D$, $V_2$, $U$, $L$, $m$, $L_m$, $f$ and $lhc^m_{V_2}$ defined as above,

(i) $\Lambda(D, V_2 \cup L_m, f) = \sum_{(H,\sigma) \in lhc^m_{V_2}(G)} \prod_{uv \in U(H)} x_{uv,\sigma^{-1}(uv)} \prod_{uv \in L(H)} x_{uv}$

(ii) $\Lambda(D, V_2 \cup L_m, f)$ is the zero polynomial precisely when $hc^m_{V_2}(G) = \emptyset$, where $hc^m_{V_2}(G)$ denotes the set of oriented Hamilton cycles with precisely $m$ edges in $U(G)$, without a labeling.

Proof (i). Using Lemma 1 we find that
$$\Lambda(D, V_2 \cup L_m, f) = \sum_{(H,g) \in lhc(D, V_2 \cup L_m)} \prod_{a \in H} f(a, g^{-1}(a))$$
Since $f(a, g^{-1}(a)) = 0$ when $g^{-1}(a)$ intersects $L_m$ and is not a singleton, we may assume that
$$g(a) = \begin{cases} g_0(a) & \text{if } a \in V_2 \\ \sigma_g(a) & \text{if } a \in L_m \end{cases}$$
where $g_0$ is surjective and $\sigma_g$ is bijective onto their respective codomains $H_{g_0}$ and $H_{\sigma_g}$, such that $H_{g_0} \cup H_{\sigma_g} = H$ and $\mathrm{im}(g_0) \cap \mathrm{im}(\sigma_g) = \emptyset$.

Using this we write the sum as
$$\begin{aligned}
\Lambda(D, V_2 \cup L_m, f) &= \sum_{(H,g) \in lhc(D, V_2 \cup L_m)} \Bigg( \prod_{\substack{uv \in H \\ g^{-1}(uv) \subseteq V_2}} \sum_{P \in P_{u,v}(g^{-1}(uv))} \prod_{wz \in P} x_{wz} \Bigg) \Bigg( \prod_{\substack{uv \in H \\ g^{-1}(uv) \subseteq L_m}} x_{uv,g^{-1}(uv)} \Bigg) \\
&\overset{(1)}{=} \sum_{(H,g) \in lhc(D, V_2 \cup L_m)} \Bigg( \sum_{S \in S_{H,g}} \prod_{wz \in P \in S} x_{wz} \Bigg) \Bigg( \prod_{\substack{uv \in H \\ g^{-1}(uv) \subseteq L_m}} x_{uv,g^{-1}(uv)} \Bigg) \\
&\overset{(2)}{=} \sum_{(H,g) \in lhc(D, V_2 \cup L_m)} \sum_{S \in S_{H,g}} \Bigg( \prod_{wz \in P \in S} x_{wz} \Bigg) \Bigg( \prod_{\substack{uv \in H \\ \sigma_g^{-1}(uv) \subseteq L_m}} x_{uv,\sigma_g^{-1}(uv)} \Bigg) \\
&\overset{(3)}{=} \sum_{(H,\sigma) \in lhc^m_{V_2}(G)} \Bigg( \prod_{wz \in L(H)} x_{wz} \Bigg) \Bigg( \prod_{uv \in U(H)} x_{uv,\sigma^{-1}(uv)} \Bigg)
\end{aligned}$$


In step (1) we define $S_{H,g} := \prod_{uv \in H,\ g^{-1}(uv) \subseteq V_2} P_{u,v}(g^{-1}(uv))$ as the set of all possible combinations of paths $P$, and rewrite the expression accordingly. In step (2) we only use distributivity in rings to move the last product into the summation over $S_{H,g}$, and replace $g$ by $\sigma_g$. In step (3) we reinterpret the edges in the paths $P$ as labeled by $V_2$ and the others as unlabeled. We also note that each Hamilton cycle in $lhc^m_{V_2}(G)$ is precisely an ordering of $V_1$, together with a choice of paths between all but $m$ consecutive vertices in this ordering. This translates into a choice of $g$, to determine which vertices of $V_2$ lie between which vertices in the ordering and to determine which $m$ consecutive vertices are connected directly. Given $g$, we only need to determine the paths between consecutive vertices, which is precisely a choice of $S \in S_{H,g}$.

(ii). If $hc^m_{V_2}(G) = \emptyset$, then clearly the sum in (i) evaluates to zero.

For the other direction, we argue that each oriented Hamilton cycle in $hc^m_{V_2}(G)$ contributes $m!$ different monomials to the sum, one for each $\sigma$. If two oriented Hamilton cycles use different edges, then these edges contribute different variables and thus result in different monomials. If two different oriented Hamilton cycles use the same edges, then they must be the same cycle with opposite orientation. In this case the asymmetry of the variables around $s$ ensures that the edges around $s$ contribute different variables, and thus the two Hamilton cycles contribute different monomials.

Since each Hamilton cycle contributes a unique set of monomials, the sum can only be zero if $hc^m_{V_2}(G) = \emptyset$.

3.1.4 Extension to General TSP

In this section we will describe a way to extend the algorithm to instances of TSP with bounded integer weights. The main idea is to keep track of the weight of the edges in the Hamilton cycles by introducing a new variable $y$, whose exponent will represent the accumulated edge weight. We then need a way to find the smallest $l$ such that $y^l$ has a nonzero coefficient.

More precisely, given a weight function $w: E \to \mathbb{Z}_{\geq 0}$, we define a function $f_y$ as
$$f_y(uv, X) := \begin{cases} \sum_{P \in P_{u,v}(X)} \prod_{st \in P} y^{w(st)} x_{st} & \text{if } \emptyset \neq X \subseteq V_2 \\ y^{w(uv)} x_{uv,d} & \text{if } uv \in U(G) \text{ and } X = \{d\} \subseteq L_m \end{cases}$$
We essentially replace each variable $x_{uv}$ and $x_{uv,d}$ by $y^{w(uv)} x_{uv}$ and $y^{w(uv)} x_{uv,d}$. In order to keep the exponent from wrapping around, i.e. having an exponent larger than the size of the ring in which we work, we choose $k$ such that $|GF(2^k)|$ exceeds the weight of the heaviest Hamilton cycle. We can use any upper bound for this, such as $w_{total}$, the sum of the weights of all edges.

We now choose some generator $g$ of the multiplicative group of $GF(2^k)$ and define
$$T(l) := \sum_{i=0}^{m_{max}} \Lambda(D, V_2 \cup L_i, f_y)\Big|_{y = g^l}$$
We then compute the inverse Fourier transform of $T$,
$$t(j) := \sum_{l=0}^{2^k-2} g^{-jl} T(l)$$
which gives us the coefficient of $y^j$ in $\sum_{i=0}^{m_{max}} \Lambda(D, V_2 \cup L_i, f_y)$, since
$$\sum_{l=0}^{2^k-2} g^{-jl} g^{il} = \begin{cases} 1 & \text{if } i = j \\ 0 & \text{if } i \neq j \end{cases}$$
Now it is only a matter of checking $t(j)$ for $j = 0, \dots, w_{total}$ and returning the smallest $j$ for which we find a nonzero coefficient.
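The following Python sketch illustrates this coefficient-extraction trick, transplanted to a prime field $\mathbb{Z}_p$ for readability (all names and parameters here are ours). Note one difference: in $\mathbb{Z}_p$ the orthogonality sum equals $p - 1 = -1$ rather than $1$, so a sign flip is needed that is absent in characteristic two.

```python
p, g = 7, 3                      # 3 generates the multiplicative group of Z_7
coeffs = [2, 0, 5, 1, 0, 3]      # a polynomial of degree < p-1, coefficients c_0..c_5

def poly_at(y):
    """Evaluate the polynomial at y, mod p (this plays the role of T(l))."""
    return sum(c * pow(y, i, p) for i, c in enumerate(coeffs)) % p

# "T(l)": the polynomial evaluated at every power of the generator.
T = [poly_at(pow(g, l, p)) for l in range(p - 1)]

# "t(j)": inverse transform; g^{-jl} is computed via the inverse of g mod p.
g_inv = pow(g, p - 2, p)
for j in range(p - 1):
    t_j = sum(pow(g_inv, j * l, p) * T[l] for l in range(p - 1)) % p
    assert (-t_j) % p == coeffs[j]   # recovers each coefficient (sign flip, see above)
```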

3.2 The algorithm

In this section we will describe the steps of the actual algorithm. We will refer to definitions and lemmas from the previous section.

3.2.1 Description

Given a graph $G = (V, E)$, the main idea of the algorithm is to construct everything used in Lemma 3, choosing $V_1$ and $V_2$ at random. This allows us to define cycle cover sums which are nonzero precisely when the graph has a Hamilton cycle with $m$ edges unlabeled by $V_2$. We then evaluate these labeled cycle cover sums at some randomly chosen set of values.[5] We use the method described in section 3.1.2, which calculates the cycle cover sum by constructing a polynomial which has the value of $\Lambda(D, V_2 \cup L_m, f)$ as one of its coefficients. We do this for values of $m$ from 0 to $m_{max}$, for some value of $m_{max}$. This procedure is called a run. We perform $r$ runs, where $r$ and $m_{max}$ will be determined in section 3.2.2, and if any of the cycle cover sums evaluates to a nonzero value, we conclude that the graph is Hamiltonian. If all evaluations return zero, we assume that the graph is non-Hamiltonian.

In order to quickly compute the values of $f$ on a subset of $F \times 2^{V_2}$, we use the following recursion:
$$\sum_{P \in P_{u,v}(X)} \prod_{sz \in P} x_{sz} = \sum_{\substack{w \in X \\ uw \in E}} x_{uw} \Bigg( \sum_{P \in P_{w,v}(X \setminus \{w\})} \prod_{sz \in P} x_{sz} \Bigg)$$
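A sketch of how this recursion can be evaluated with memoization, with the subsets of $X$ encoded as bitmasks and concrete values substituted for the variables $x_{uv}$ (the encoding and names are ours):

```python
from functools import lru_cache

def path_sum(u, v, X_verts, x):
    """Evaluate the sum over paths from u to v visiting every vertex in X_verts,
    where x[a][b] holds the value substituted for x_ab (0 if ab is not an edge)."""
    full = (1 << len(X_verts)) - 1

    @lru_cache(maxsize=None)
    def rec(a, mask):
        # Paths from `a` to `v` passing through exactly the vertices in `mask`.
        if mask == 0:
            return x[a][v]                       # direct step a -> v (0 if absent)
        total = 0
        for i, w in enumerate(X_verts):
            if mask & (1 << i) and x[a][w]:      # extend the path by one edge a -> w
                total += x[a][w] * rec(w, mask & ~(1 << i))
        return total

    return rec(u, full)
```

Each state $(a, \text{mask})$ is computed only once, which is what makes tabulating $f$ over all subsets of $V_2$ feasible.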

In the case of general TSP, we use the function $f_y$ as defined in section 3.1.4. We then calculate $t(j)$ and search for the smallest $j$ for which $t(j) \neq 0$. This forms a run, which we then repeat $r$ times, using the smallest found tour as our output.[6]

[5] The cycle cover sum is a polynomial in this case.

[6] Note that we still use the method from section 3.1.2 in order to calculate $T(l) = \sum_{i=0}^{m_{max}} \Lambda(D, V_2 \cup L_i, f_y)\big|_{y=g^l}$.


3.2.2 Complexity

There are a number of factors which contribute to the complexity of the algorithm. First of all there is the calculation of $f(uv, X)$ for $X \subseteq V_2$. However, since $|X| \leq |V_2| = \frac{n}{2}$, this contributes $\tilde{O}(2^{n/2})$ to the runtime.[7] We will find that we may safely ignore this contribution.

Next, the number of runs also affects the runtime, by a factor of $r \cdot m_{max}$. We will find that setting $m_{max} = \frac{n}{4}$ and $r = n^2$ gives a good runtime, while still keeping the chance of false negatives at $e^{-\Omega(n)}$.

Finally, the method described in section 3.1.2 also adds to the complexity, and we will find that it forms the main contribution to the runtime. The following lemma shows exactly how.

Lemma 4. The labeled cycle cover sum $\Lambda(D, L, f)$ for a function $f$ with codomain $GF(2^k)$ on a directed graph $D$ on $n$ vertices, and with $2^k > |L|n$, can be computed in $O((|L|^2 n + |L| n^{1+\omega}) 2^{|L|} + |L|^2 n^2)$ operations, where $\omega$ is the square matrix multiplication exponent.

Proof. In order to find the coefficient of $x^{|L|}$ in $p(f, x)$, we will use a number of existing techniques. We evaluate $p(f, x)$ in $|L|n$ different values of $x$ and apply Lagrange interpolation to find the desired coefficient.[8] This step takes $O(|L|^2 n^2)$ operations and accounts for the second term in the runtime expression.

In order to apply the interpolation, we need to calculate $p(f, x_i)$ for all $|L|n$ choices of $x_i$. As a reminder, we have
$$p(f, x) = \sum_{Y \subseteq L} \det\Big( \sum_{Z \subseteq Y} x^{|Z|} M_f(Z) \Big)$$
We will calculate $\sum_{Z \subseteq Y} x_i^{|Z|} M_f(Z)$ for all $Y \subseteq L$ using Yates' fast zeta transform (Yates, 1937), using $O(|L| 2^{|L|})$ operations. We then use the determinant algorithm by Bunch and Hopcroft (Bunch and Hopcroft, 1974) to calculate $p(f, x_i)$ in $O(n^{\omega} 2^{|L|})$ operations, where $\omega$ is the square matrix multiplication exponent. Doing this for all $|L|n$ choices of $x_i$, we find a runtime of
$$O((|L| 2^{|L|} + n^{\omega} 2^{|L|}) |L| n) = O((|L|^2 n + |L| n^{1+\omega}) 2^{|L|})$$
which accounts for the first term in the runtime expression.

[7] $\tilde{O}(f(n))$ suppresses all polylogarithmic factors in $f(n)$, meaning that $\tilde{O}(\log(f(n))^k f(n)) = \tilde{O}(f(n))$ for any constant $k$.

[8] Note that the maximum degree that $p(f, x)$ can achieve is $|L|n$, coming from the determinant of $\sum_{Z \subseteq L} x^{|Z|} M_f(Z)$.

Since $|L| = |V_2| + |L_m| \leq \frac{n}{2} + m_{max}$, we find a total complexity of
$$\begin{aligned}
\text{Runtime} &= O\Big(r \cdot m_{max}\Big(\big((\tfrac{n}{2} + m_{max})^2 n + (\tfrac{n}{2} + m_{max})n^{1+\omega}\big)2^{\frac{n}{2}+m_{max}} + (\tfrac{n}{2} + m_{max})^2 n^2\Big)\Big) \\
&= O\big(n^2 \cdot m_{max}\big(n^5 2^{\frac{n}{2}+m_{max}} + n^4\big)\big) \\
&= O\Big(\tfrac{n^3}{4}\big(n^5 2^{\frac{n}{2}+\frac{n}{4}} + n^4\big)\Big) \\
&= O\big(n^8 2^{\frac{3n}{4}} + n^7\big) = O\big(n^8 2^{\frac{3n}{4}}\big) = \tilde{O}\big(2^{\frac{3n}{4}}\big)
\end{aligned}$$

In the case of general TSP, the runtime is increased by a factor of $w_{total}$, due to the calculation of $T(l)$ for each $l = 0, \dots, w_{total}$. The Fast Fourier Transform allows us to calculate $t(j)$ for all $j$ in only $O(w_{total}\log(w_{total}))$ operations. Since this is only done once per run, we add $r \cdot w_{total}\log(w_{total})$ to the runtime, which can be neglected in most cases.[9]

[9] Of course, if $w_{total}$ happens to be very large, this will have an effect on the runtime, but if the weight of the individual edges is assumed to be bounded by some finite value $M$, then $w_{total}$ is $O(n^2)$ and we may neglect it.

3.2.3 False negatives

Since the algorithm evaluates a polynomial at a random point in order to determine whether it is the zero polynomial, there is a chance of receiving false negatives. If the algorithm happens to choose one of the roots of a nonzero polynomial, it will wrongfully label it as the zero polynomial. If this happens every time $\Lambda(D, V_2 \cup L_m, f)$ is evaluated, the algorithm will report non-Hamiltonicity when the graph in question is actually Hamiltonian. In order to find an upper bound on the chance of this happening, we use the following well-known lemma.

Lemma 5 (Schwartz-Zippel). Let $P(x_1, x_2, \dots, x_n)$ be a nonzero $n$-variate polynomial of total degree $d$ over a field $F$. For $r_1, r_2, \dots, r_n \in F$ chosen uniformly at random, we have
$$P(P(r_1, r_2, \dots, r_n) = 0) \leq \frac{d}{|F|}$$

If we choose $k$ such that $|GF(2^k)| > cn$ for some $c > 1$, we find that
$$P(P(r_1, r_2, \dots, r_n) = 0) \leq \frac{n}{cn} = \frac{1}{c}$$
if the graph is Hamiltonian.
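A quick empirical illustration of the Schwartz-Zippel bound in Python, for the two-variable polynomial $P(x, y) = xy - 1$ of total degree 2 over $\mathbb{Z}_p$ (all parameters here are ours):

```python
import random

p = 101          # field size |F|
trials = 200_000

# P(x, y) = x*y - 1 has total degree 2, so Schwartz-Zippel bounds the
# probability that a uniform random point is a root by d/|F| = 2/p.
zeros = sum(
    1 for _ in range(trials)
    if (random.randrange(p) * random.randrange(p) - 1) % p == 0
)
print(f"observed {zeros / trials:.5f} <= bound {2 / p:.5f}")
```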

Another factor which may cause false negatives is the fact that only some values of $m$ are checked. It is possible that all Hamilton cycles have at least $m_{max} + 1$ edges unlabeled by $V_2$, given the chosen $V_1$ and $V_2$. Given the expected number of unlabeled edges in a Hamilton cycle, we can use Markov's inequality to find an upper bound on the chance of this happening.

$$\begin{aligned}
P\Big(\sum_{m=0}^{n/4} |lhc^m_{V_2}(G)| = 0\Big) &\leq P\Big(|U(H)| \geq \tfrac{n}{4} + 1\Big) \quad \text{for a given Hamilton cycle } H \\
&\leq \frac{E(|U(H)|)}{\tfrac{n}{4} + 1} \quad \text{(Markov's inequality)} \\
&= \frac{n/4}{(n+4)/4} = \frac{n}{n+4}
\end{aligned}$$
In the third step we use the fact that $E(|\{uv \in H \mid u \in V_1, v \in V_2\}|) = \frac{n}{2}$ and thus $E(|U(H)|) = \frac{n - E(|\{uv \in H \mid u \in V_1, v \in V_2\}|)}{2} = \frac{n}{4}$.

We find that the chance of a false negative after each run is
$$P(\text{False Negative, one run}) = P\Big(\sum_{m=0}^{n/4}|lhc^m_{V_2}(G)| = 0\Big) + \Big(1 - P\Big(\sum_{m=0}^{n/4}|lhc^m_{V_2}(G)| = 0\Big)\Big)\frac{1}{c} \leq \frac{n}{n+4} + \Big(1 - \frac{n}{n+4}\Big)\frac{1}{c} = \frac{n}{n+4} + \frac{4}{c(n+4)} = \frac{cn+4}{c(n+4)}$$
and thus after $n^2$ runs, the chance of a false negative is
$$P(\text{False Negative, } n^2 \text{ runs}) = \Big(\frac{cn+4}{c(n+4)}\Big)^{n^2} = \Big(\Big(\frac{cn+4}{c(n+4)}\Big)^n\Big)^n \leq \Big(\frac{c+4}{5c}\Big)^n$$
In the last step we use the fact that
$$\Big(\frac{cn+4}{c(n+4)}\Big)^n = \Big(1 - \frac{4c-4}{c(n+4)}\Big)^n \leq \Big(1 - \frac{c_0}{n}\Big)^n$$
for $c_0 = \frac{4c-4}{c}$. We know that $(1 - \frac{k}{n})^n$ is a monotonically decreasing function of $n$ for $k > 0$. This tells us that
$$\Big(\frac{cn+4}{c(n+4)}\Big)^n \leq \frac{c \cdot 1 + 4}{c(1+4)} = \frac{c+4}{5c}$$
for $n \geq 1$. We find that the chance of a false negative is exponentially small in $n$. In the case of general TSP, the same upper bound holds for the chance of returning a non-optimal salesman tour. The reason we can use this upper bound is that in its derivation we only assumed the existence of a single Hamilton cycle. If we substitute the optimal salesman tour for this Hamilton cycle, the exact same derivation still holds.


4 Approximate TSP

In this section we will discuss a polynomial time approximation algorithm for Euclidean TSP, found in (Arora, 1998). The precise statement of the problem is as follows:

Definition 6 (PTAS for Euclidean TSP). Given a set of $n$ nodes in $\mathbb{R}^d$ and a constant $c > 0$, let $OPT$ be the length of the shortest cycle visiting all nodes. A 'Polynomial Time Approximation Scheme for Euclidean TSP' is a polynomial time algorithm which outputs a cycle which visits all nodes and is of length at most $(1 + \frac{1}{c})OPT$.

In the following sections we will discuss the algorithm in detail, a proof of its correctness and some additions to the paper.

4.1 Algorithm description

In this section we will discuss the algorithm as it would be executed in $\mathbb{R}^2$. The main idea behind the algorithm is to build what is known as a quad-tree on the given graph: a certain series of subdivisions, starting with a square around the graph. The quad-tree is built on a randomly shifted dissection. We will then apply a dynamic programming algorithm to the resulting figure. Each step will be discussed in more detail in the following sections.

4.1.1 Making the graph well rounded

In order to make the graph easier to work with, the nodes are moved around slightly, such that they meet the following criteria:

• All nodes have integral coordinates.

• The distance between two nodes is at least 8 (unless two nodes are moved to the same set of coordinates).

• The maximum distance between nodes is $O(n)$.

The first criterion will allow us to make certain assumptions about the nodes; for example, none of the nodes will lie on a grid-line. The second criterion will allow us to find an upper bound on the number of times a salesman tour crosses a grid-line. This is related to a more technical argument which we will discuss later on, but on an intuitive level we may understand it as a way to rule out situations in which we jump back and forth between nodes which are relatively close together but still make us cross a large number of grid-lines. The last criterion gives us a way to relate the size of the bounding box around the graph to the number of nodes $n$ in the graph, which will allow us to express the runtime in terms of $n$.

This so-called 'perturbation' is executed as follows. Place a square around the graph, called the bounding box, and place a grid in it. The width of each square in the grid is equal to $\frac{L_0}{8nc}$, where $L_0$ is the width of the bounding box and $c$ is the constant mentioned in the statement of the problem. We then move each node to the nearest grid-point and scale up by a factor of $\frac{64nc}{L_0}$, making the distance between consecutive grid-lines 8.

Note that the perturbation increases the length of each edge in a salesman path by at most $2 \cdot \frac{L_0}{8nc}$, and thus the cost of the tour is increased by at most $2n \cdot \frac{L_0}{8nc} = \frac{L_0}{4c} < \frac{OPT}{4c}$, since the width of the bounding box is determined by the largest horizontal or vertical distance between two nodes[2] and thus $OPT > L_0$. Also note that the width $L$ of the new bounding box is $L = L_0 \cdot \frac{64nc}{L_0} = 64nc = O(nc)$.
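A sketch of the perturbation in Python under the assumptions above (bounding-box handling simplified; the function name is ours): snap each node to a grid of pitch $\frac{L_0}{8nc}$, then rescale so that consecutive grid-lines are distance 8 apart.

```python
def perturb(nodes, c):
    """nodes: list of (x, y) coordinates; c: the approximation constant.
    Returns integral coordinates on a grid of pitch 8, as in section 4.1.1."""
    n = len(nodes)
    xs, ys = zip(*nodes)
    L0 = max(max(xs) - min(xs), max(ys) - min(ys))  # width of the bounding box
    pitch = L0 / (8 * n * c)
    # Snapping to the nearest grid-point and multiplying the grid index by 8
    # is the same as scaling the snapped coordinates by 64nc / L0.
    return [(8 * round(x / pitch), 8 * round(y / pitch)) for x, y in nodes]
```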

4.1.2 Building the shifted quad-tree

We will now build a quad-tree on the graph. We will assume $L$ is a power of 2; if not, we increase the bounding box until $L$ is a power of 2, increasing $L$ by at most a factor of 2. The conclusion that $L = O(nc)$ still holds in this case.

Building a quad-tree is a recursive process. At each step, we determine the number of nodes in the current square. If there are two or more nodes in the square, we divide the square into four equally sized squares and apply the same process to each square. If there are one or no nodes in the square, we do nothing and the recursion stops.

Figure 4.1: An example of a quad-tree on a Euclidean graph and a (1,1)-shifted quad-tree. The original bounding box is shown in red. Note that the resulting quad-tree is very different from the original quad-tree, which had a (0,0)-shift.

A shifted quad-tree is constructed by first shifting the bounding box, horizontally and vertically, by integer values. An (a,b)-shift is a shift of distance $a$ horizontally and distance $b$ vertically. We then construct the quad-tree as normal, where we interpret squares at the edge as (possibly) wrapping around to the other side. In the example above, the first dissection in the (1,1)-shifted quad-tree would consist of the thick black lines.

[2] Technically $L_0$ should be 1 more than this distance, in order to make sure none of the nodes lie on the boundary of the bounding box.
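A recursive Python sketch of the construction (the wrap-around of border squares is modeled by translating the nodes modulo $L$; the names and the dictionary encoding are ours, and distinct node coordinates are assumed):

```python
def build_shifted_quadtree(nodes, L, a, b):
    """Build an (a,b)-shifted quad-tree on nodes in [0, L)^2, with L a power of 2.
    Shifting the bounding box is equivalent to translating every node mod L."""
    shifted = [((x + a) % L, (y + b) % L) for x, y in nodes]
    return dissect(shifted, 0.0, 0.0, float(L))

def dissect(nodes, x0, y0, size):
    """Recursively split a square into four; stop once it holds at most one node."""
    if len(nodes) <= 1:
        return {"bounds": (x0, y0, size), "nodes": nodes, "children": []}
    half = size / 2
    children = [
        dissect([(x, y) for x, y in nodes
                 if x0 + dx <= x < x0 + dx + half
                 and y0 + dy <= y < y0 + dy + half],
                x0 + dx, y0 + dy, half)
        for dx in (0, half) for dy in (0, half)
    ]
    return {"bounds": (x0, y0, size), "nodes": nodes, "children": children}
```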

4.1.3 Applying the dynamic programming algorithm

The general idea behind the dynamic programming step is to solve a generalized version of the problem for each square of the shifted quad-tree. These solutions can then be combined into solutions for larger squares, and so on, until we find the solution for the square which contains the whole graph.

In order to limit the number of solutions for each square, we will only permit the path to cross the quad-tree at certain points, and only a certain number of times. More specifically, we use the following definitions.

Definition 7. An m-regular set of portals on a quad-tree is a set of points on the edges of the squares in the quad-tree. Each square has one point on each corner and m equally spaced points on each edge.

Definition 8. An (m,r)-light salesman path, with respect to a quad-tree, is a salesman path which only crosses the quad-tree at an m-regular set of portals on the quad-tree, using at most $r$ portals on each edge of each square of the quad-tree, where we count multiple uses of the same portal as multiple portals. The edges in this path are allowed to 'bend' to reach a portal, meaning that an edge between two nodes doesn't have to be a straight line.

The dynamic programming step will find the optimal (m,r)-light salesman path with respect to the shifted quad-tree. In section 4.3 we will show that, with probability at least $\frac{1}{2}$ over the random shift of the quad-tree, this path is sufficiently close to the optimal salesman path.

The dynamic programming algorithm will use the following observations. Let $S$ be a square in the quad-tree, excluding the outer square. Then the section of the optimal (m,r)-light salesman path $OPT_{m,r}$ in $S$ is a sequence of $p$ paths such that:

• Let $a_1, \dots, a_{2p}$ be the sequence of portals which $OPT_{m,r}$ crosses, in the order it crosses them. Then the $i$-th path connects $a_{2i-1}$ to $a_{2i}$.

• Each node in $S$ is visited by one of the paths.

• The collection of paths uses at most $r$ portals on each edge, where we count multiple uses of the same portal as multiple portals.

On each square $S$ in the quad-tree, the section of $OPT_{m,r}$ in $S$ will be the 'cheapest' set of paths with these properties. If this weren't the case, we could switch the set of paths with a cheaper set and still have a salesman path, but with a difference in cost equal to the difference between the sets of paths. Using these properties we may define the generalized version of the (m,r)-light salesman path, a minimal (m,r)-multipath, as follows.


Definition 9. Let $S$ be a non-empty square in the shifted quad-tree. Let $M$ be a multiset of at most $r$ portals on each side of $S$, such that the total number of portals in $M$, including multiplicity, is even.[4] Let $\{\{p_1, p_2\}, \{p_3, p_4\}, \dots, \{p_{2t-1}, p_{2t}\}\}$ be a pairing between the portals in $M$. The optimal (m,r)-multipath is the 'cheapest' set of paths which connect $p_{2i-1}$ to $p_{2i}$ for $i = 1, \dots, t$ and collectively visit all nodes in $S$.

[4] Note that the use of a multiset implies that some portals may appear multiple times.

The optimal (m,r)-multipaths will form the entries of our lookup table, for all squares except the bounding box. The entry for the bounding box will be $OPT_{m,r}$. For more information on the use of lookup tables in dynamic programming algorithms, see section 2.2.

In order to find the entry of a larger square $S$, using the smaller squares $S_1, \dots, S_4$, we must have a way of combining smaller multipaths into larger multipaths. We do this by first choosing a set of at most $r$ portals on each of the four inner edges of the 'children' of $S$. We then choose an ordering of the inner portals and a way to distribute them over the paths in an (m,r)-multipath of $S$. We can then simply look up the four entries corresponding to the choice of portals and their ordering on each $S_i$. The minimum over all possible choices gives us the table entry for $S$.

Once we have the entries of the four children of the bounding box, we can find the (m,r)-light salesman path by checking each multiset and (reasonable) ordering of the inner portals, thus omitting the necessity for the inner portals to lie on a certain path.

4.2 Calculating the complexity of the algorithm

The smallest squares in the table contain one node and use at most $4r$ portals, so for each choice of portals computing the entry takes $O(r)$ time, by placing the node in each of the $O(r)$ paths. Each time we combine existing solutions, we have to try at most $(m+4)^{4r}$ multisets of portals on the inside edges, at most $(4r)!$ orderings of the portals and at most $(4r)^{4r}$ ways to distribute the portals over the given paths. We find that the calculation of each entry costs $O((m+4)^{4r}(4r)^{4r}(4r)!)$ time.

There are $T = O(n\log(n))$ squares in the quad-tree. For each square there are at most $(m+4)^{4r}$ ways to choose the portals and $(4r)!$ ways to order the portals. This means that every square contributes at most $(m+4)^{4r}(4r)!$ entries to the table, giving us a total of $O(T(m+4)^{4r}(4r)!)$ entries.

Factoring in the time for each entry, we find a running time of $O(T(m+4)^{8r}(4r)^{4r}((4r)!)^2)$. By choosing $m = O(c\log(n))$ and $r = O(c)$ we find:
$$\begin{aligned}
T_{run} &= O(T(m+4)^{8r}(4r)^{4r}((4r)!)^2) \\
&= O(n\log(n)(c\log(n)+4)^{8c}(4c)^{4c}((4c)!)^2) \\
&= O(n\log(n)(c\log(n))^{O(c)}(4c)^{O(c)}c^{O(c)}) \\
&= O(n(c\log(n))^{O(c)}c^{O(c)}) \\
&\overset{(*)}{=} O(n(\log(n))^{O(c)})
\end{aligned}$$


(*) Since $c$ is a constant, we find $O(c^{O(c)}) = O(1)$.

We see that the algorithm indeed runs in polynomial time, with the degree of the polynomial depending on the desired precision.

4.3 Proof of correctness

The algorithm described above computes the optimal (m,r)-light salesman path. The original problem, however, was to find an approximation of the optimal salesman path. In this section we will discuss a proof of the following theorem, which relates these two problems. We will not give the full proof and will focus on the important aspects instead.

Theorem 1 (Structure Theorem). Let $c > 0$ be any constant. Let the minimum nonzero internode distance in a Euclidean graph be 8 and let $L$ be the size of the bounding box. Then for a randomly chosen (a,b)-shifted quad-tree and $m = O(c\log(L))$, $r = O(c)$, with probability at least $\frac{1}{2}$ the optimal (m,r)-light salesman path is at most $1 + \frac{1}{c}$ times as expensive as the optimal salesman path.

In the proof we will use the following two lemmas.

Lemma 6 (Patching Lemma). There is a constant $g > 0$ such that the following is true. Let $S$ be any line segment of length $s$ and $\pi$ be a closed path that crosses $S$ at least three times. Then there exist line segments on $S$ whose total length is at most $g \cdot s$ and whose addition to $\pi$ changes it into a closed path $\pi'$ that crosses $S$ at most twice.

Lemma 7. Consider a grid where the distance between lines is of unit length. For $\pi$ a salesman path and $l$ a line in the grid, let $t(\pi, l)$ denote the number of times $\pi$ crosses $l$. If the minimum internode distance is at least 4 and $T$ is the length of $\pi$, then
$$\sum_{l \text{ vertical}} t(\pi, l) + \sum_{l \text{ horizontal}} t(\pi, l) \leq 2T$$

For a proof of the first lemma, resulting in $g = 4$, see section 4.4.1. The second lemma can be proven by looking at a single line segment of $\pi$ and showing that the number of lines it crosses is at most twice its length.

Proof (sketch). In the proof we will use the constants $s = 12gc$ and $r = s + 4$. In order to find the difference in cost between the optimal salesman path $\pi$ and the optimal (m,r)-light salesman path, we will modify $\pi$ into an (m,r)-light salesman path and show that with probability $\geq \frac{1}{2}$ it is at most $1 + \frac{1}{c}$ times as expensive as $\pi$.

Let $l$ be a line in the quad-tree of the graph. We say that a line is at level $i$ if it borders a square in the quad-tree which was formed after $i$ divisions. For example, the vertical lines on the outer borders of the left quad-tree in figure 4.1 are at levels 0, 1 and 2, while the vertical line through the middle of the square is only at levels 1 and 2.


We must first modify $\pi$ such that it is (m,r)-light at $l$. We do this by going through all subsections of $l$ and applying Lemma 6 whenever a subsection has more than $s$ crossings.[7] We do this using a bottom-up approach, starting with the edges of the smallest squares and gradually dealing with larger sections.[8] Rewriting some sums[9] and using the fact that we have applied the lemma at most $\frac{t(\pi,l)}{s-3}$ times, we find that the expected cost increase for the line $l$, over the random shift of $l$, is at most $\frac{2g \cdot t(\pi,l)}{s-3}$.

We then move each crossing of $\pi$ with $l$ to the nearest portal of an m-regular set of portals, where $m \geq 2s\log(L)$, by adding two line segments on $l$. The expected increase for this procedure is at most
$$E(\text{cost of moving crossings}) = \sum_{i=1}^{\log(L)} P(\text{level of } l = i) \cdot t(\pi, l) \cdot 2 \cdot (\text{distance to portal}) = \sum_{i=1}^{\log(L)} \frac{2^i}{L} \cdot t(\pi, l) \cdot \frac{L}{2^i m} = \frac{t(\pi, l)\log(L)}{m} \leq \frac{t(\pi, l)}{2s}$$
using $m \geq 2s\log(L)$ in the last step. Assuming that $s > 15$ and $g > 1$ we find
$$\frac{2g \cdot t(\pi, l)}{s-3} + \frac{t(\pi, l)}{2s} \leq \frac{3g \cdot t(\pi, l)}{s}$$
By adding the expected cost increases of all lines together and using Lemma 7 we find
$$\sum_{l \text{ vertical}} \frac{3g \cdot t(\pi, l)}{s} + \sum_{l \text{ horizontal}} \frac{3g \cdot t(\pi, l)}{s} \leq 6g\frac{OPT}{s} = \frac{OPT}{2c}$$
where $OPT$ is the length of $\pi$. A direct application of Markov's inequality now gives us that the total cost increase is less than $\frac{OPT}{c}$ with probability at least $\frac{1}{2}$.

[7] The reason why we apply the lemma for more than $s$ crossings is that we may also get more crossings when we apply the lemma on a line $l'$ perpendicular to $l$. We can limit this to 2 extra crossings per line, using the patching lemma, giving us at most 4 extra crossings per section, resulting in $s + 4 = r$ crossings.

[8] We will not discuss exactly how this approach works.

[9] We will not discuss these derivations.


4.4 Additions

4.4.1 Patch Lemma

In this section we will give an alternate proof of the patching lemma, which was used to prove the correctness of the algorithm. We will find a value of g = 4, instead of the g = 6 which was found in (Arora, 1998).

Lemma 8 (Patching Lemma). There is a constant $g > 0$ such that the following is true. Let $S$ be any line segment of length $s$ and $\pi$ be a closed path that crosses $S$ at least three times. Then there exist line segments on $S$ whose total length is at most $g \cdot s$ and whose addition to $\pi$ changes it into a closed path $\pi'$ that crosses $S$ at most twice.

Proof. Like the proof in (Arora, 1998), we will use the well-known result that any connected graph in which every vertex has even degree has an Eulerian circuit. In order to illustrate the proof, we will apply it to the example path below.

Figure 4.2: An example closed path $\pi$, crossing a line $S$ at the points $v_1, \dots, v_6$.

For ease of exposition, we will change the visualization of the given path. We will break the path up into arcs at the points where it crosses $S$; in the example, this is at $v_1, v_2, \dots, v_6$. We will then represent the crossing points as vertical lines and the arcs as horizontal lines between them. This results in the following 'layered' representation.

Figure 4.3: The 'layered' representation of the path $\pi$.


From here on, we will interpret the path as a graph, with the crossing points as vertices and the arcs between them as edges. Note that the layers only connect their endpoints, meaning that $v_1$ and $v_2$ are not connected. Also note that the vertical position of the 'layers' has no extra meaning. Our goal is to turn the graph into an Eulerian graph on both sides of $S$ by adding edges along $S$. This is sufficient, because we can then find the closed path $\pi'$ by first walking along an Eulerian circuit on the top side of $S$, starting and ending at one of the original crossing points, then crossing $S$ and doing the same on the bottom side, again crossing back at the same point.

We construct these Eulerian graphs by first connecting each pair of consecutive crossing points. This guarantees that the graph on $\{v_1, v_2, \dots, v_6\}$, with the horizontal lines as edges, is connected. We now add a second edge between two consecutive crossing points if the number of 'layers' between them is odd. This results in the following graph.

Figure 4.4: The 'layered' representation of the path $\pi'$. The added edges on $S$ are indicated in red.

Figure 4.5: The top half of $\pi'$, as a multigraph.

This graph has all even degrees, because the degree $d_i$ at crossing point $v_i$ is $d_i = E_{L,i} + E_{R,i}$, where $E_{L,i}$ is the number of edges starting at $v_i$ and moving to the left, and $E_{R,i}$ is defined similarly but to the right. We find that $E_{L,i} + E_{R,i} = L_{L,i} + L_{R,i} - 2L_{\bullet,i}$, where $L_{L,i}$ is the number of layers slightly to the left of $v_i$, $L_{R,i}$ the number slightly to the right, and $L_{\bullet,i}$ the number of layers that move along $v_i$ without ending or beginning at it. For example, on the top side of the example graph, we have $E_{L,3} = 3$, $E_{R,3} = 1$, $L_{L,3} = 4$, $L_{R,3} = 2$ and $L_{\bullet,3} = 1$. We find that $d_i \equiv L_{L,i} + L_{R,i} \equiv 0 + 0 \pmod 2$, and thus $v_i$ has even degree. This makes the graph Eulerian on both sides of $S$.

Finally, we have to find the value of $g$. We add at most 2 edges between each pair of consecutive crossing points, and thus add at most $2s$ worth of edges. We do this on both sides of $S$, giving us a final value of $g = 4$.

This lemma gives us an upper bound on $g$, but we can also find a lower bound on $g$ by looking at the following path.


Figure 4.6: A closed path for which $g \geq 2$, with crossing points $v_1$ and $v_2$.[11]

Note that for any choice of crossing points of $\pi'$, we can find a pair of crossing points which lie on crossing points of $\pi$ and are at least as efficient.[12] Thus we may assume that the crossing points of $\pi'$ lie on $\{v_1, v_2\}$.

If both crossing points lie on the same point, then $\pi'$ has to go back and forth along $S$ on the top side, giving us $g \geq 2$. If the crossing points lie on different points, one on $v_1$ and one on $v_2$, then $\pi'$ has to go along $S$ once on the top side and once on the bottom side, since the arcs on the bottom side also run along the entirety of $S$, and in order to get to the other side of $S$ the path has to traverse $S$ an odd number of times.

4.4.2 Reasons behind the shifted quad-tree

The randomly shifted quad-tree is probably the most mystifying part of this algorithm. In this section we will discuss its role in more detail.

As part of the proof of correctness, we find an upper bound on the expected cost increase from making a salesman tour (m,r)-light. Since the structure of each shifted quad-tree can be completely different, the number of times an optimal salesman tour crosses one of the lines in the tree can vary significantly between different quad-trees. Since the number of crossings is loosely related to the aforementioned cost increase, it is not surprising that some shifted quad-trees will result in much better approximations than others. The proof of correctness really just tells us that at least half of the shifted quad-trees will meet our requirements.

This explains the need for a random shift, but not the need for a quad-tree. First of all, the quad-tree decreases the runtime by roughly a factor of $n$, since a quad-tree has $O(n\log(n))$ squares which have to be computed in the dynamic programming step, whereas a full dissection has $O(n^2\log(n))$ squares. Next to that, the quad-tree limits the number of unnecessary crossings by not dividing up empty squares. This also factors into some of the derivations which were left out in section 4.3.

[11] Strictly speaking there should be four crossing points, but for the sake of clarity overlapping crossing points are left out.

[12] This is not hard to prove: simply consider all possible situations for a certain crossing point and you will see that the claim holds in each case.


5 Conclusions

In this thesis we have examined two algorithms, each of which solves a different version of TSP.

We have discussed an algorithm for general TSP, as seen in (Björklund, 2010). This algorithm uses some algebraic methods to reduce the problem to a simpler one, which is then solved using a combination of existing methods. One of the most notable techniques was the use of rings of characteristic two to make certain unwanted terms cancel out. Another notable technique was the use of Markov's inequality and the Schwartz-Zippel lemma to justify the assumption that, with high probability, a certain polynomial is the zero polynomial if it evaluates to zero on a random test input.

We have also discussed an approximation algorithm for Euclidean TSP, as seen in (Arora, 1998). This algorithm uses a more geometric approach, combined with dynamic programming. One of the most interesting results is the manner in which the problem is divided up into smaller subproblems in order to allow for a dynamic programming approach. Another surprising result is the fact that what seems like a very convoluted method turns out to be rather fast. Like the first algorithm, this algorithm depends on randomized elements to function, also using Markov's inequality to deal with the probabilities. It is not at all unlikely that we will see more algorithms incorporate random elements in the future.

Of course, there is still a lot of progress to be made on the Traveling Salesman Problem. In regard to the papers discussed in this thesis, one avenue of research may be the optimization of these algorithms. In (Björklund, 2010) there is a section discussing a more optimal choice of some of the parameters; it may be possible to improve the efficiency or the chance of correctness by further tweaking these variables. The same may hold for (Arora, 1998). In the case of (Arora, 1998) it may also be possible to find a more efficient dissection than a quad-tree, or perhaps one which doesn't require randomness to achieve a polynomial runtime.


6 Popular Summary

We've all been there: it's a Saturday, and you're in the city centre with a long list of errands to run. You have to pick up your prescription from the pharmacist, buy groceries for dinner, check five different stores to find a birthday gift for a friend, and so on, ending up back at the car park you started at. Of course, like most rational people, you strongly dislike shopping and want to finish as fast as possible. This raises the question: which order of the locations results in the shortest shopping trip? Most people will see this as an average Saturday, but a mathematician will immediately recognize an interesting scheduling problem in this situation. What we have here is an instance of the famous Traveling Salesman Problem, a problem which has been the focal point of a large amount of mathematical research. The problem asks for the fastest route along a collection of points which starts and ends at the same point. Like a lot of famous problems in mathematics, stating the problem is not very complex, but finding good solutions has proven to be quite difficult.

Figure 6.1: An example of a possible tour along a collection of cities.

In this case a 'good' solution consists of a fast algorithm which finds the fastest tour for a given instance of the Traveling Salesman Problem. Currently, the fastest known algorithms run in time exponential in the number of points. This means that, if there are $n$ points to visit, the runtime will be around $k^n$ for some constant $k$, so adding a single extra point multiplies the runtime by a factor of $k$.


Compare this to the number of possible routes we would have to check. If we fix one of the points as the starting point, we can pick any of the remaining $n - 1$ points as the next point, then any of the remaining $n - 2$, etc. This gives us a total of $(n-1)! = (n-1) \cdot (n-2) \cdots 3 \cdot 2 \cdot 1$ possible routes to check!

Luckily, putting certain restrictions on the problem can make it easier to solve. The assumption that the distances between points are 'Euclidean' in nature, meaning that the points behave as points in a 2-dimensional plane, allows for much faster algorithms. When we also allow approximate solutions, we can push the runtime down even further. In my thesis I discuss two algorithms: one for the general case of the Traveling Salesman Problem and one for the approximate, Euclidean variant. I discuss the workings of each algorithm and give a proof of correctness for each. I also try to add as much original work as I can, by giving alternative proofs for lemmas from the papers and other such additions.

One of the reasons this is such a famous problem is that it is 'NP-hard'. Essentially, this means that finding a fast solution to this problem would allow you to then find fast solutions to a whole class of problems. Among these problems are things like prime factorization, which is vital to the functioning of some widely used cryptographic methods. This means that finding a fast enough solution to the Traveling Salesman Problem could potentially endanger the privacy of a large portion of internet traffic.


7 Bibliography

Sanjeev Arora. 1998. Polynomial Time Approximation Schemes for Euclidean Traveling Salesman and Other Geometric Problems. J. ACM 45, 5 (Sept. 1998), 753–782. DOI: http://dx.doi.org/10.1145/290179.290180

Andreas Björklund. 2010. Determinant Sums for Undirected Hamiltonicity. CoRR abs/1008.0541 (2010). http://arxiv.org/abs/1008.0541

James R. Bunch and John E. Hopcroft. 1974. Triangular factorization and inversion by fast matrix multiplication. Math. Comp. 28, 125 (1974), 231–236. http://www.ams.org/jourcgi/jour-getitem?pii=S0025-5718-1974-0331751-8

F. Yates. 1937. The Design and Analysis of Factorial Experiments. Imperial Bureau of Soil Science. https://books.google.nl/books?id=YW1OAAAAMAAJ
