Where the really hard graph coloring problems are

a replication study

Melanie G.J.W. Baaten 11053909 Bachelor thesis Credits: 18 EC

Bachelor Opleiding Kunstmatige Intelligentie University of Amsterdam

Faculty of Science Science Park 904 1098 XH Amsterdam

Supervisor Jelle van Assema Faculty of Science University of Amsterdam

Science Park 904 1098 XH Amsterdam

Abstract

NP-complete problems are known to be among the most difficult problems in the field of computational complexity. Yet, it seems that most instances of these problems are quickly solved. In “Where the really hard problems are”, Cheeseman et al. (1991) set out to find where the instances are hiding that make an NP-complete problem hard to solve. Four NP-complete problems, one being the Graph Coloring problem, were put to the test. Cheeseman et al. (1991) found the hard instances to be located in the phase transition. This is an area in which the probability of solving an instance of a problem drops from almost one hundred percent to nearly zero. This research provides a replication of the experiment on Graph Coloring as carried out by Cheeseman et al. (1991). Additionally, the graph generation methods are altered to check whether these affect the phase transition. The results show that the research as put forward by Cheeseman et al. (1991) is reproducible and valid. Three sets of graphs were created: Random graphs, Cetal graphs and Fully Reduced graphs. For all three a phase transition is found at a critical degree of 8.1. On top of that, it is shown that the more reduction steps are applied to produce a graph, the further the area that holds the hard instances expands towards higher average degrees.

1 Introduction

Throughout the years, computational power has increased while computational costs have decreased. These improvements have made it possible to solve bigger and more complex problems. As a result, problems that still lack an efficient solution begin to stand out (Fortnow, 2009). Such problems can be found, for example, in the NP-complete complexity class. In the field of computational complexity these problems are part of the bigger P vs NP problem. Here, P stands for the problems that are solvable in polynomial time: as these problems grow in size, the time needed to solve them grows at most polynomially with the size of the input (Garey & Johnson, 1979). Sorting a list is part of the P complexity class; even in the worst case, the number of comparisons needed grows only polynomially with the length of the list. An algorithm that solves problems like this is a polynomial-time algorithm. For many problems in NP no such algorithm is known yet, and it is unknown whether one will ever be found. An example of an NP problem is factorization: no polynomial-time algorithm for it is known, but a proposed answer can be checked in polynomial time. A problem is NP-hard when it is at least as hard as every problem in NP; such a problem does not have to be in NP itself, so its answers cannot necessarily be checked in polynomial time (Garey & Johnson, 1979). The NP-complete subclass mentioned earlier is the overlap between the NP class and the NP-hard class. What sets these problems apart is the fact that they can all be translated into each other in polynomial time. That is why, if a polynomial-time algorithm is found for one of them, the equation NP = P is satisfied and the distinction between the two classes disappears. For now though, this does not seem to be the case (Fortnow, 2009). The problems classified as NP-complete are still known to be among the hardest problems to solve, and some instances can become so big that they are unsolvable in practice.

Part of the NP-complete class is the graph coloring problem. Classified as an NP-complete problem by Karp (1972), graph coloring has many real-life applications. Scheduling (Wood, 1969), air-traffic flow management (Barnier & Brisset, 2002) and register allocation (Allen et al., 2002) can all be tackled by translating the problem to graph coloring. Problems that are NP-complete are typically viewed as among the most complex problems. Notably, even though graph coloring falls into the NP-complete class, years ago researchers found the problem easy to solve (Minton et al., 1990). Turner (1988) even came out with the title: “Almost All k-Colorable Graphs are Easy to Color.” It turns out that not all instances of the graph coloring problem are hard to solve. The problem space can be split into an under-constrained and an over-constrained phase (Cheeseman et al., 1991). In the under-constrained phase an instance has many solutions and many paths that lead to these solutions. The probability of solving an instance located in this area is almost one hundred percent (Cheeseman et al., 1991). It looks like Minton et al. (1990) and Turner (1988) were solving instances in this under-constrained phase (Cheeseman et al., 1991). Counterintuitively, the instances in the over-constrained phase are also easy to ‘solve’. These instances have a solving probability of nearly zero. Algorithms that try to solve an instance in this phase can quickly detect that the problem is unsolvable, which leads to a quick return.

The question arises: “Where are the really hard problems?” This question is the driving force behind the research conducted by Cheeseman, Kanefsky and Taylor in 1991 (hereafter referred to as ‘Cetal’). Cetal found that something interesting happens in the phase transition. Located between the over-constrained and under-constrained phases, the phase transition is an area that holds almost all of the hard instances. Consequently, this is the area where the computation time drastically increases. Cetal perform four experiments and define order parameters that indicate the location of this phase transition for these experiments. One of these experiments concerns the graph coloring problem. The order parameter for graph coloring is defined by Cetal as the critical connectivity. Connectivity has gotten a new definition since Cetal’s paper came out (Beineke et al., 2002). What Cetal call connectivity now translates to degree: the average number of edges a vertex in a graph has. Knowing the value of the critical degree, and thereby the location of the phase transition, before solving a graph coloring problem is very useful. It could serve as a guide for which resources to use and how to handle the problem. Instances that are located in the phase transition could be tuned, by decomposition for instance, to avoid the phase transition. Since graph coloring has so many real-life applications, it is important to check whether the results as put forward by Cetal are reproducible and still valid. For that reason this research focuses on replicating the experiment on graph coloring as originally carried out by Cetal to answer the question: Where are the really hard instances of the graph coloring problem? To support this question a sub-question will be answered: How do the graph generation methods affect the phase transition?

The next section, section 2, gives an in-depth explanation of the graph coloring problem and discusses previous research on this subject. Section 3 provides insight into the graph generation process, as well as a detailed description of the chosen graph coloring algorithm. Section 4 then presents the results, where it can be seen that the phase transition changes as different generation methods are applied. An analysis of these results is provided in the same section. Section 5 summarizes these results and answers the research question. Finally, section 6 provides insights on difficulties experienced during this research and gives a possible direction for future work.

2 Theoretical Background

This section focuses on giving a more in-depth explanation of the graph coloring problem and what algorithms can be employed to solve this type of problem. Additionally, previous work on the subject will be discussed.

2.1 Graph Coloring

The graph coloring problem is a part of graph theory, a branch within mathematics that focuses on points that are connected by lines. The points in a graph are named vertices, and the lines are named edges (West, 2001). Being a constraint satisfaction problem, graph coloring has to conform to three constraints. Firstly, every vertex has to be colored. Additionally, this has to be done with a maximum of K colors. Lastly, no two adjacent vertices can have the same color.
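The three constraints can be made concrete with a small check. The sketch below is illustrative only: it assumes a graph stored as a plain dictionary that maps each vertex to the set of its neighbors (not the NetworkX objects used in this research), and the name `is_valid_coloring` is hypothetical.

```python
def is_valid_coloring(adjacency, coloring, k):
    """Return True if `coloring` satisfies the three constraints for K = k."""
    # Constraint 1: every vertex has been assigned a color.
    if any(v not in coloring for v in adjacency):
        return False
    # Constraint 2: at most k distinct colors are used.
    if len({coloring[v] for v in adjacency}) > k:
        return False
    # Constraint 3: no two adjacent vertices share a color.
    return all(coloring[u] != coloring[v]
               for u in adjacency for v in adjacency[u])


# A triangle: three mutually adjacent vertices.
triangle = {0: {1, 2}, 1: {0, 2}, 2: {0, 1}}
```

Checking a given coloring in this way takes time proportional to the number of vertices and edges, which illustrates why a proposed answer to the problem can be verified in polynomial time even though finding one may be hard.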

Graph coloring is, as mentioned before, an NP-complete problem. These are decision problems, which means they have a Yes/No answer. For graph coloring a ‘Yes’ answer is returned when the graph is K-colorable. ‘No’ is returned whenever no coloring with K colors exists for the graph. The search for such an answer is non-deterministic, which means that it can follow different paths to find the same answer. Additionally, the algorithm has to be complete, that is, it always has to return an answer. An algorithm that provides a Yes/No answer, can explore these different paths and is complete is depth-first search. The algorithm creates a tree structure where every node represents a partly colored graph (see figure 1). With every step down the tree an additional vertex is colored. The algorithm first traverses down the state-space, the space that holds all (partial) solutions, hence the name depth-first. An example in which a state-space is explored can be found in figure 1.

Figure 1: Depth-first search tree. Each node is a partly colored graph. The numbers represent the order in which the tree is explored. The solutions are located in the leaf nodes.

2.2 Related work

Cetal’s work was followed up with the paper “Computational Complexity” (Cheeseman et al., 1992). Both papers are referenced throughout this research. The follow-up paper has a more extensive description of the graph coloring experiment, which proved to be more useful for describing the methods of this research. “Where the really hard problems are” pioneered the study of phase transitions and sparked great interest in the subject within the research community. Various papers followed, such as the paper by Hogg & Williams (1994), in which a double phase transition was discovered. It was found that when provided with a longer computation time, another peak occurred slightly before the peak observed by Cetal. The researchers claim that this first peak holds the most difficult instances.

Cetal performed experiments on four different NP-complete problems: Hamilton circuits, Graph coloring, K-Satisfiability and Traveling Salesman. The experiments on Hamilton circuits and K-Satisfiability have since been replicated. The experiment on Hamilton circuits was successfully replicated by van Horn et al. (2018), who concluded that the results as put forward by Cetal are valid. Multiple algorithms were tested, which showed that a phase transition is present at all times. Additionally, van Horn et al. (2018) showed that sophisticated pruning has a significant effect: it minimizes the computational costs in the phase transition. Meeus (2019) replicated the K-Satisfiability experiment as a thesis project and found that, for this NP-complete problem, Cetal’s experiment is reproducible as well and the results are still valid.

3 General Methodology

The graphs are created with the NetworkX package, version 2.4.¹ The algorithms used for coloring are all written in Python version 3.6.4, and are run on a MacBook Pro with an Intel Core i5 processor. The methods by which the graphs are generated are discussed in this section, followed by the methods used to color the graphs. Next, a description of the evaluation methods is given. Finally, the set-up for the experiments is explained.

3.1 Graph generating

This research puts the graph generation methods to the test. Cetal performed experiments on graphs to be colored with K = 3 colors as well as with K = 4 colors. This research solely focuses on replicating the experiment with K = 4, as this experiment presents the best spread of degrees. Besides, the phase transition is more distinct for K = 4. To check whether the methods used affect the location of the phase transition, three subsets of graphs are produced: Random Graphs, Cetal Graphs and Fully Reduced Graphs.² The Random Graphs serve as a control group and are generated by a random graph generator provided by NetworkX.³ The foremost goal is to follow Cetal’s methods closely whilst making as few assumptions as possible. Unfortunately, it was noted early on that Cetal’s paper skips over important details, which means choices are left to be made. The first choice concerns the number of vertices used to build a graph. Cetal state that different values for N, the number of vertices, are used. The boundaries, however, are not specified. After some trials the lower boundary was set to 100 vertices, as going lower leaves most graphs to be reduced to the null graph by the reduction steps performed later. An N of 300 satisfies the required range of degrees and is therefore set as the upper boundary. The Cetal Graphs and the Fully Reduced Graphs are generated as follows. A number of N vertices, varying between 100 and 300, is added to the graph, each with a number as label. After the vertices are added, edges are formed between pairs of vertices based on a chance P. Hereafter, three reduction steps are applied, which are described in the next section. The graph generation algorithm can be found in algorithm 1.

¹ NetworkX documentation
² Generated graphs
³ NetworkX G

Algorithm 1: Generating Random Graphs

Input: amount of vertices N, amount of colors K, chance P
initialize graph G
for N do
    add vertex to G
for Vertex1 in G do
    for Vertex2 in G do
        if random(0,1) < P / N then
            G = create edge(G, K, Vertex1, Vertex2)
G = reduce graph(G, K)
return G
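The edge-forming part of algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration and not the code used in this research: it stores the graph as a plain dictionary of neighbor sets instead of a NetworkX graph, it leaves out the clique check and the reduction steps, and the names `generate_graph` and `average_degree` are hypothetical.

```python
import random


def generate_graph(n, p, seed=None):
    """Return an adjacency dict {vertex: set of neighbors} on n vertices."""
    rng = random.Random(seed)
    adjacency = {v: set() for v in range(n)}
    # Mirror the double loop of algorithm 1: every ordered pair of
    # distinct vertices gets one chance of p / n to form an edge.
    for v1 in adjacency:
        for v2 in adjacency:
            if v1 != v2 and rng.random() < p / n:
                adjacency[v1].add(v2)
                adjacency[v2].add(v1)
    return adjacency


def average_degree(adjacency):
    """Average number of edges per vertex, that is 2|E| / |V|."""
    if not adjacency:
        return 0.0
    return sum(len(nbrs) for nbrs in adjacency.values()) / len(adjacency)
```

If the double loop is read literally, every unordered pair of vertices is visited twice, so under this reading the expected average degree before reduction lies close to 2P rather than P for large N.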

Cliques, sets of vertices that are all connected to each other, could pose a problem for solving a graph. When the size of a clique is bigger than the number of colors K, the vertices in this clique become too constrained, which results in the graph being unsolvable. The Fully Reduced Graphs differ from the Cetal Graphs in the manner in which the edges are formed. Whilst Cetal did not check for a maximum size of cliques, this appeared to be a logical additional step. This is because when a clique bigger than K is present in a graph, it can be decided in polynomial time (for a fixed K) that this graph is unsolvable. Making sure that cliques like this are not formed might, as a result, produce graphs at a higher degree that are solvable. Whenever an edge is added to a Fully Reduced graph, it is checked whether this produces a clique bigger than K. When this is the case, the edge is removed and the algorithm moves on to a new pair of vertices. The way in which this algorithm operates can be found in algorithm 2.

Algorithm 2: Create Edge

Input: graph G, amount of colors K, Vertex1, Vertex2
if Vertex1 != Vertex2 then
    add edge between Vertex1 and Vertex2 in G
    for clique containing Vertex1 do
        if Vertex2 is in clique and length(clique) > K then
            remove edge from G
return G
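The clique check of algorithm 2 can be illustrated as follows. The sketch is an assumption-laden variant rather than the implementation used here: it works on a dictionary of neighbor sets, it tests for a large clique before adding the edge instead of adding and then removing it (the resulting graph is the same), and it uses a brute-force search over common neighbors that is only practical for small K. The function names are hypothetical.

```python
from itertools import combinations


def creates_large_clique(adjacency, v1, v2, k):
    """True if adding edge (v1, v2) would complete a clique of more than k vertices."""
    # Any clique containing the new edge consists of v1, v2 and mutually
    # adjacent common neighbors, so k - 1 of those give a clique of size k + 1.
    common = adjacency[v1] & adjacency[v2]
    return any(
        all(b in adjacency[a] for a, b in combinations(group, 2))
        for group in combinations(common, k - 1)
    )


def create_edge(adjacency, k, v1, v2):
    """Add edge (v1, v2) in place; return False if it is refused."""
    if v1 == v2 or creates_large_clique(adjacency, v1, v2, k):
        return False
    adjacency[v1].add(v2)
    adjacency[v2].add(v1)
    return True
```

For K = 4 the check comes down to looking for a triangle among the common neighbors of the two vertices, which keeps the cost of each edge insertion modest for the sparse graphs considered in this research.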

3.1.1 Cetal Reduction Steps

When a Cetal graph or a Fully Reduced graph is produced, a set of three reduction steps is applied. Cetal introduced these reduction steps because the hardness does not lie in coloring the vertices concerned. The first step is to eliminate under-constrained vertices (see figure 2). An under-constrained vertex can always be colored using K colors because it has at most K-1 edges. Next to that, subsumed vertices are eliminated (see figure 3). A subsumed vertex is one whose complete edge-list is a subset of the edge-list of another vertex. As every neighbor of the subsumed vertex is also a neighbor of the other vertex, the two can always be given the same color. Consequently, the one with the fewest edges, the subsumed vertex, can be removed. Lastly, symmetrical vertices that are connected via multiple cliques of size K-1 must have the same color and can, therefore, be merged (see figure 4). The order in which these three reduction steps are applied is interchangeable, and the application of one can lead to another becoming applicable (Cheeseman et al., 1992). Cetal do not specify how many times these reduction steps are applied, so the choice was made to apply them until the graph stabilizes. The application of the reduction steps can reduce a graph to the null graph. A null graph is a graph whose vertex list and edge list are both empty (West, 2001). The hardness does not lie in solving null graphs, thus these are not taken into account.
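To illustrate what applying the reduction steps until the graph stabilizes can look like, the following sketch implements the first two steps on a dictionary of neighbor sets. It is a simplified assumption, not the code of this research: the third step (merging symmetrical vertices) is left out because its description leaves more room for interpretation, and the names `remove_vertex` and `reduce_graph` are hypothetical.

```python
def remove_vertex(adjacency, v):
    """Delete v and all of its edges from the adjacency dict (in place)."""
    for u in adjacency.pop(v):
        adjacency[u].discard(v)


def reduce_graph(adjacency, k):
    """Apply the first two reduction steps until the graph stops changing."""
    changed = True
    while changed:
        changed = False
        # Step 1: a vertex with fewer than k edges can always be colored.
        for v in list(adjacency):
            if len(adjacency[v]) < k:
                remove_vertex(adjacency, v)
                changed = True
        # Step 2: v is subsumed when its neighbors are a subset of the
        # neighbors of some other vertex u.
        for v in list(adjacency):
            if any(u != v and adjacency[v] <= adjacency[u] for u in adjacency):
                remove_vertex(adjacency, v)
                changed = True
    return adjacency
```

A cycle of four vertices with K = 2, for example, is reduced to the null graph: two opposite vertices share the same neighbors, so one is subsumed, after which the remaining vertices become under-constrained. A complete graph on five vertices with K = 4 is left untouched, since every vertex has exactly K edges and no vertex is subsumed.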

Figure 2: Remove under-constrained vertices (K = 3)

Figure 3: Remove subsumed vertices (K = 3)

Figure 4: Merge symmetric vertices with each other (K = 3)

3.2 Graph coloring algorithm

The following section describes the methods in which the problem space is explored. Further, this section provides a description of the algorithm that determines the order in which a graph is colored.

3.2.1 Depth first search and coloring

The overall approach to solving the generated graphs conforms to a traditional depth-first search. The DSatur algorithm provides the search algorithm with the coloring order of the vertices. The coloring is done by assigning a color, represented by a number, to a vertex. One iteration can provide multiple vertices, and these vertices can at times be colored in multiple colors. This is why all choices are kept on a stack, so that backtracking is possible whenever needed. The algorithm for this part can be found in algorithm 3.

Instances located in the phase transition can take up such a large computation time that a boundary has to be set to cut off further exploration of the state-space. A closer inspection of the plot displayed by Cetal (see figure 5) shows that they set a boundary as well. This boundary was set at 144,000 Brelazsteps. Cetal do not specify why this number was chosen nor what Brelazsteps are. A careful reading of the paper by Brélaz (1979) that was referred to by Cetal did not yield a definition either. That is why this research defines the Brelazsteps parameter as a counter that increments with every node returned by the DSatur algorithm. Next to that, another measuring parameter was defined: Steps. This is a counter that increases whenever an element is popped from the stack. Trials were run to test various boundaries (see appendix A): one test set the boundary at 144,000 Brelazsteps and another one set the boundary at 100,000 Steps. The trials showed that the different boundaries did not affect the location of the phase transition. The choice was made to pick Steps as the computational cost parameter and to select 100,000 steps as the boundary.

Algorithm 3: Depth first search

Input: graph G, amount of colors K
create stack S
push G to S
steps = 0
while S is not empty do
    G* = S.pop
    steps = steps + 1
    if G* is fully colored then
        return Solved
    if steps > 100000 then
        return Stopped
    vertices = DSatur(G*)
    for vertex in vertices do
        G** = color-vertex(vertex, G*, K)
        if G** is not visited then
            mark G** as visited
            push G** to S
return Unsolved

Algorithm 4: color-vertex

Input: vertex, graph G, amount of colors K
for color in K do
    if color not in neighborhood of vertex then
        G* = apply color to vertex
        return G*
return G
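The interplay between the stack, the step counter and the boundary in algorithms 3 and 4 can be sketched as below. This is a simplified reading under stated assumptions and not the implementation of this research: partial colorings are stored as dictionaries instead of colored NetworkX graphs, only a single vertex is selected per iteration (ties are broken arbitrarily), every color that is still free for that vertex is pushed as a separate state, and no visited set is needed because each state is reached along exactly one path. The name `color_graph` is hypothetical.

```python
def color_graph(adjacency, k, max_steps=100_000):
    """Depth-first search for a k-coloring; returns (status, coloring, steps)."""
    stack = [{}]  # each state is a partial coloring {vertex: color}
    steps = 0
    while stack:
        coloring = stack.pop()
        steps += 1  # one step per state popped from the stack
        if len(coloring) == len(adjacency):
            return "solved", coloring, steps
        if steps > max_steps:
            return "stopped", None, steps
        # Pick the uncolored vertex whose neighbors use the most distinct
        # colors, breaking ties by degree (a DSatur-style choice).
        vertex = max(
            (v for v in adjacency if v not in coloring),
            key=lambda v: (
                len({coloring[u] for u in adjacency[v] if u in coloring}),
                len(adjacency[v]),
            ),
        )
        used = {coloring[u] for u in adjacency[vertex] if u in coloring}
        # Push one successor state per color that is still free.
        for color in reversed(range(k)):
            if color not in used:
                stack.append({**coloring, vertex: color})
    return "unsolved", None, steps
```

The three return values correspond to the solved, stopped and unsolved graphs that are distinguished in the evaluation. An instance is reported as unsolved only when the stack runs empty, which means the whole state-space was explored within the permitted number of steps.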

3.2.2 DSatur

Cetal chose the DSatur algorithm to determine the order of coloring. This was most likely the best scoring algorithm at the time for solving the graph coloring problem. The same algorithm is used here, as one of the goals of this research is validating the results introduced by Cetal. Introduced by Brélaz (1979), the DSatur algorithm uses the degree of saturation, the number of different colors already used by the neighbors of a vertex, to determine the coloring order. The vertex with the highest degree of saturation is colored first. If multiple vertices share the highest degree of saturation, the one with the highest degree is chosen. The NetworkX package provides this coloring strategy.⁴ When presented with a tie, the NetworkX algorithm makes a choice and returns one vertex. That is why this algorithm was slightly altered such that it returns not one vertex, but multiple at once. In this way these choices can be pushed on a stack and, whenever faced with a choice, the depth-first search algorithm can backtrack and pick another option.
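The altered selection rule, returning every tied vertex instead of a single one, can be sketched as follows. The sketch does not reproduce the modified NetworkX code; it assumes a dictionary of neighbor sets and a dictionary holding the colors assigned so far, it breaks ties on the total degree of a vertex, and the name `dsatur_candidates` is hypothetical.

```python
def dsatur_candidates(adjacency, coloring):
    """Return every uncolored vertex tied for the DSatur choice."""
    def key(v):
        # Degree of saturation: number of distinct colors among the
        # already colored neighbors; ties are broken by degree.
        saturation = len({coloring[u] for u in adjacency[v] if u in coloring})
        return saturation, len(adjacency[v])

    uncolored = [v for v in adjacency if v not in coloring]
    if not uncolored:
        return []
    best = max(key(v) for v in uncolored)
    return [v for v in uncolored if key(v) == best]
```

On a path of three vertices the middle vertex is returned first because it has the highest degree; once it is colored, both end vertices have the same degree of saturation and the same degree, so both are returned and can be pushed on the stack as alternatives.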

3.3 Evaluation

Cetal produced a plot to visualize the results (see figure 5). Similar plots are created to evaluate and compare the results of all three graph-sets. It was uncertain whether the ’Average Connectivity’ on the x-axis referred to the degree of the graph before reducing it or afterward. Hence, a trial was run to check the effects of selecting different variables for the x-axis on the location of the phase transition (see appendix B). On the grounds that no change in the location of the phase transition was observed, the average degree of the reduced graphs was selected as the value for the x-axis, because these are the graphs that are solved by the depth-first search algorithm. The variable on the y-axis is the number of steps taken to solve an instance. The main focus of these plots is to locate the phase transition, so the critical degree can be determined. The critical degree is found at the peak of the line visualizing the average computational costs per average degree. Further, a distinction is made between solved, unsolved and stopped graphs. The solved graphs should predominantly occur before the critical degree, as the phase transition is the area in which the solving probability drops to nearly zero. The stopped graphs should be centered around the phase transition. Separating and visualizing the solved and stopped instances therefore serves to validate the location of the phase transition. In Cetal’s plot (figure 5) the peak is seen at an average degree of 9.1. The stopped graphs are all centered between average degrees of 8.5 and 10.5. This is, consequently, the area where the hardest instances are located. To check whether similar results are found, the findings of Cetal are compared to the results of this research.
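Locating the critical degree as the peak of the average computational costs can be illustrated with a short sketch. How the costs are grouped per average degree is not specified above, so the fixed bin width of 0.5 used here is purely an assumption, as is the function name `critical_degree`.

```python
from collections import defaultdict


def critical_degree(results, bin_width=0.5):
    """results: iterable of (average_degree, steps) pairs.

    Returns (center of the peak bin, mean steps in that bin).
    """
    bins = defaultdict(list)
    for degree, steps in results:
        # Group instances whose average degree falls in the same bin.
        bins[int(degree // bin_width)].append(steps)
    means = {b: sum(s) / len(s) for b, s in bins.items()}
    peak = max(means, key=means.get)
    return (peak + 0.5) * bin_width, means[peak]
```

The choice of bin width influences the reported location of the peak by at most the width of one bin, so a smaller width gives a more precise but noisier estimate when few instances fall in each bin.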

Figure 5: Plot visualizing the computational costs (Cheeseman et al., 1991). Axes are altered for readability.

3.4 Experiments

Three distinct graph-sets are created to test the graph generation methods: Random graphs, Cetal graphs and Fully Reduced graphs. Each set of graphs is run a total of three times. This has to be done because a non-deterministic algorithm is used to color the graphs. The outcome of the algorithm can vary between runs, so an average is taken to evaluate the results. Cetal used five trials. The number of trials for this research is set to three; due to time limitations, this was the closest number of trials to Cetal’s that was possible.

4 Results and Analysis

This section displays the obtained results and provides an analysis. A plot displaying the average computational costs of three trials against the average degree is presented for all experiments. After providing the results for all three graph-sets an analysis and a comparison between the three graph-sets will be given.

4.1 Random Graphs

A total of 2700 Random graphs were produced. Figure 6 shows the average results of three trials for the Random graphs. A phase transition is observed with a peak in computational costs at an average degree of approximately 8.1. The critical degree is, consequently, for the Random graphs set at an average degree of 8.1. After this average degree, the probability of finding a solution to an instance is near zero. A closer inspection of the 1222 solved graphs showed that beyond an average degree of 8.06 no graphs are solved. Additionally, a total of 189 graphs were stopped, where the first appeared at an average degree of 5.99 and the last at an average degree of 10.62. Moreover, most of the instances of the random graphs that take up a vast amount of computational cost can be found in an area spanning from an average degree of 6 to an average degree of 10.5. The graphs that lie beyond this range are nearly all solved in a minimal amount of steps.

4.2 Cetal graphs

A total of 780 Cetal graphs were colored by the depth-first search algorithm. This number is lower than the number of Random graphs because the reduction steps reduce part of the produced graphs to the null graph, and it was decided not to solve the null graphs (see section 3.1). Figure 7 shows the average results of three trials for the Cetal graphs, run with the depth-first search using the DSatur heuristic. Because this plot differs from the one by Cetal, a trial was run with a new Cetal graph-set in which the null graphs were also used (see appendix C). Since including the null graphs did not affect the critical degree, it was chosen to use the Cetal graph-set without null graphs. The Cetal graphs show a phase transition with the critical degree located at an average degree of approximately 8.1. A total of 152 graphs were solved; the highest average degree among the solved graphs is 8.3. A total of 128 graphs reached the maximum number of permitted steps; these were located between an average degree of 6.47 and 14.03. The remaining graphs were unsolved, and most hard instances are centered within an area from an average degree of 7 to 13.

Figure 7: Computational costs Cetal graphs

4.3 Fully Reduced graphs

A total of 779 Fully Reduced graphs were created. As mentioned in the results for the Cetal graphs, this number is lower than the number of Random graphs because null graphs are not saved. Figure 8 displays the results obtained for these graphs. A phase transition can be seen with the peak in computational costs at an average degree of approximately 8.1. A total of 151 graphs were solved, with the highest average degree being 9.1. A total of 154 stopped graphs can be found between an average degree of 6.56 and 15.95. The remaining graphs were unsolved, and most hard instances are centered between an average degree of 7 and 14. Although the phase transition starts around the same average degree as seen in the Cetal graphs, it expands beyond where the Cetal graphs stopped. This is an expected result, as the additional step that prevents cliques bigger than K from being formed is applied to these graphs. When the algorithm finds a clique bigger than K, it can quickly see that the graph is unsolvable. Because these graphs do not contain such cliques, the number of steps required to determine that a graph is unsolvable has increased.

Figure 8: Computational costs Fully Reduced graphs

4.4 Combined Results

Table 1 displays the critical degree and the range of average degrees in which the hard instances are located for all sets of graphs as well as for the plot by Cetal. For the Random graphs, Cetal graphs and the Fully Reduced graphs a phase transition is seen, just like in Cetal’s work. Notably, the peak in computational cost for all three subsets is observed one degree earlier than in the results put forward by Cetal. However, the peak that is observed by Cetal does lie within the phase transition. Assuming that the findings of this research are correct, this means that hard instances begin to occur earlier than was initially thought. For all of the graph-sets, this is confirmed by the area in which hard instances occur starting earlier than in Cetal’s research.

The graph generation methods do not seem to affect the critical values of the order parameter. What is affected by the graph generation methods is the area in which hard instances are found. This area gets expanded as more reduction steps are applied to produce graphs. Because null graphs are filtered out, as more of the reduction steps get applied, harder graphs are produced. As the average degree increases a vertex’s constrainedness grows. Subsequently, the algorithm used to solve has a harder time solving these graphs. This means that the hard instances are not exclusively found around the critical degree but also beyond that degree, although in smaller numbers.

                       Critical degree   Hard instances (avg. degrees)
Random graphs          8.1               6 - 10.5
Cetal graphs           8.1               7 - 13
Fully Reduced graphs   8.1               7 - 14
Cheeseman et al.       9.1               8.5 - 11

Table 1: Critical degree and location of the hard instances for all three graph-sets and the results obtained by Cheeseman et al. (1991).

5 Conclusion

This research aimed to reproduce the experiment on graph coloring as introduced by Cetal to locate the hard instances of the graph coloring problem. It is fair to say that Cetal’s experiment is reproducible and that the results are valid. A phase transition is observed at all times. However, assuming that the findings of this research are correct, it can be concluded that the phase transition occurs slightly earlier than stated by Cetal. The three sets consisting of Random graphs, Cetal graphs and Fully Reduced graphs all show a spike in computational costs around an average degree of 8.1, while Cetal found the spike at an average degree of 9.1. Although all graph-sets share an identical critical degree, they do show a difference in the area in which hard instances occur. This observed difference provides an answer to the question: “How do the graph generation methods affect the phase transition?” The graph generation methods affect the area in which hard instances occur. The application of reduction steps leads to the production of instances that are harder to solve. As more of these steps are applied, the area in which hard instances occur expands. This leads to an answer to the research question: “Where are the really hard instances of the graph coloring problem?” The bulk of hard instances occur around a critical degree of 8.1; however, they do extend beyond this point. Additionally, this area is heavily affected by the manner in which a graph is produced. The critical degree is a good indicator of where to stop trying to solve instances, as nearly no instances are solved beyond this point.

6 Discussion

The foremost goal of this research was to replicate Cetal’s experiments on graph coloring. As mentioned throughout this research, plenty of details were not specified by Cetal. This left choices open, so it cannot be said with complete certainty that Cetal’s methods were duplicated. There was some uncertainty about the parameters used, specifically the Brelazsteps. Cetal do not clarify what a ‘Brelazstep’ exactly entails, which is why it cannot be certain that this research used the same metric. Comparing the computational costs produced by this research to the plots by Cetal is therefore tricky: an observed difference could be a result of these measurements being non-identical. For this reason the height of the peak in computational costs could not be compared to the height of the peak observed in the plot by Cetal. The critical degree, however, can be compared. For all three graph-sets, the critical degree is found at an average degree of 8.1. Cetal found the critical degree to be at an average degree of 9.1. Various methods were put to the test: different graph generation methods, various variables for the x-axis as well as the y-axis, and multiple boundaries. For all tests run in this research the critical degree was located at an average degree of 8.1. This gives reason to believe that this value for the critical degree is the correct one. However, the decision was made not to color graphs that had been reduced to the null graph. This seemed a logical step because a graph that is reduced to the null graph will not be a hard instance. In retrospect, it appears that Cetal did consider these graphs (see appendix C). An explanation as to why Cetal found the location of the phase transition at a higher average degree could be a combination of using a higher boundary and using null graphs. These alterations in methods could lead to a shift in the location of the phase transition. Unfortunately, this could not be tested within the scope of this research project, so it remains open to be investigated in future research.
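To illustrate how a graph can end up as the null graph, the sketch below applies a single reduction rule: repeatedly removing vertices with fewer than K neighbours, since such a vertex can always be given a free color afterwards. It is a minimal sketch under the assumption that this rule is among the reduction steps; it does not reproduce the full set of reduction operators, and the function name and the adjacency-dictionary representation are hypothetical.

```python
def remove_low_degree_vertices(adjacency, k):
    """Repeatedly delete vertices that have fewer than k neighbours.

    `adjacency` maps each vertex to the set of its neighbours (undirected).
    Deleting such a vertex does not change whether the graph is k-colorable.
    Returns a new adjacency dict; an empty dict represents the null graph.
    """
    graph = {v: set(neighbours) for v, neighbours in adjacency.items()}
    queue = [v for v, neighbours in graph.items() if len(neighbours) < k]
    while queue:
        v = queue.pop()
        if v not in graph:
            continue  # already removed via an earlier queue entry
        for u in graph.pop(v):
            graph[u].discard(v)
            # Removing v may push a neighbour below the threshold as well.
            if len(graph[u]) < k:
                queue.append(u)
    return graph


def is_null_graph(adjacency):
    """True when no vertices are left after reduction."""
    return len(adjacency) == 0
```

Under this rule a sparse graph collapses entirely, which is why such instances are trivially colorable, while a graph in which every vertex keeps at least K neighbours is left untouched.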

The research by Hogg & Williams (1994) pointed out, as mentioned in the related work section, that a second phase transition exists. This research did not find this second phase transition. This could be due to the maximum number of steps permitted to solve a graph. Future research could look into the effects of a higher boundary and attempt to locate this second phase transition.
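The role of the step boundary can be sketched with a small backtracking colorer that picks the most saturated vertex first, in the spirit of Brélaz (1979), and gives up once a step limit is reached. What counts as a step is an assumption here: one step is one color assignment. The function name, the status strings and the tie-breaking rule are hypothetical and are not taken from Cetal or from the implementation used in this research.

```python
class StepLimitReached(Exception):
    """Raised internally when the step boundary is hit."""


def color_with_step_limit(adjacency, k, max_steps):
    """Try to k-color a graph, counting one step per color assignment.

    `adjacency` maps each vertex to the set of its neighbours.
    Returns (status, steps) with status "colorable", "uncolorable" or
    "aborted" (the boundary was reached before an answer was found).
    """
    colors = {}
    steps = 0

    def neighbour_colors(v):
        return {colors[u] for u in adjacency[v] if u in colors}

    def search():
        nonlocal steps
        if len(colors) == len(adjacency):
            return True
        # Most distinct neighbour colors first; ties broken by degree.
        v = max((u for u in adjacency if u not in colors),
                key=lambda u: (len(neighbour_colors(u)), len(adjacency[u])))
        forbidden = neighbour_colors(v)
        for c in range(k):
            if c in forbidden:
                continue
            if steps >= max_steps:
                raise StepLimitReached
            steps += 1
            colors[v] = c
            if search():
                return True
            del colors[v]
        return False

    try:
        status = "colorable" if search() else "uncolorable"
    except StepLimitReached:
        status = "aborted"
    return status, steps
```

An instance that returns "aborted" is neither solved nor refuted, so raising `max_steps` can only move instances out of that category. The search is recursive, so very large graphs would need an iterative variant.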

For the Cetal graphs, as well as the Fully Reduced graphs, generating graphs of a higher average degree proved to be harder. The generation process for these graphs was time-consuming and, due to time limitations, more graphs could not be created. The Fully Reduced graphs required an increase in the number of vertices in order to obtain graphs of a higher average degree, because the formation of cliques bigger than K has to be prevented. Future work could focus on obtaining a wide range of degrees while using the same number of vertices. In that way, the relationship between the average degree, the computational costs and the number of vertices can be explored.

Acknowledgements

I want to express my gratitude to various people for their contribution to this thesis project. First and foremost, I would like to thank my supervisor, Jelle van Assema, who provided me with a great deal of assistance and support. Without his knowledge and dedicated involvement the goal of this thesis would not have been realized. I am also very grateful for the encouragement and support of Roos Riemersma and Romy Roomans while writing this thesis.

References

Allen, M., Kumaran, G., & Liu, T. (2002). A combined algorithm for graph-coloring in register allocation. In Proceedings of The Computational Symposium on Graph Coloring and its Generalizations (pp. 100–111).

Barnier, N. & Brisset, P. (2002). Graph coloring for air traffic flow management. In Proceedings of the Fourth International Workshop on Integration of AI and OR techniques in Constraint Programming for Combinatorial Optimisation Problems (pp. 163–178).

Beineke, L. W., Oellermann, O. R., & Pippert, R. E. (2002). The average connectivity of a graph. Discrete Mathematics, 252(1), 31–45.

Brélaz, D. (1979). New methods to color the vertices of a graph. Communications of the ACM, 22(4), 251–256.

Cheeseman, P., Kanefsky, B., & Taylor, W. M. (1991). Where the really hard problems are. In IJCAI, volume 91 (pp. 331–337).

Cheeseman, P., Kanefsky, B., & Taylor, W. M. (1992). Computational complexity and phase transitions. In Workshop on Physics and Computation (pp. 63–68). Dallas, Texas, USA.

Fortnow, L. (2009). The status of the P versus NP problem. Communications of the ACM, 52(9), 78–86.

Garey, M. R. & Johnson, D. S. (1979). Computers and intractability: a guide to the theory of NP-completeness. A series of books in the mathematical sciences. W. H. Freeman.

Hogg, T. & Williams, C. P. (1994). The hardest constraint problems: A double phase transition. Artificial Intelligence, 69(1-2), 359–377.

Karp, R. M. (1972). Reducibility among Combinatorial Problems (pp. 85–103). Springer US.

Meeus, T. J. (2019). Where sometimes the really hard problems are. Bachelor Thesis Artificial Intelligence, University of Amsterdam.

Minton, S., Johnston, M. D., Philips, A. B., & Laird, P. (1990). Solving large-scale constraint satisfaction and scheduling problems using a heuristic repair method. In Proceedings of the Eighth National Conference on Artificial Intelligence, volume 1 of AAAI’90 (pp. 17–24). AAAI Press.

Turner, J. S. (1988). Almost all k-colorable graphs are easy to color. J. Algorithms, 9(1), 63–82.

van Horn, G., Olij, R., Sleegers, J., & van den Berg, D. (2018). A predictive data analytic for the hardness of Hamiltonian cycle problem instances. In DATA ANALYTICS 2018: The Seventh International Conference on Data Analytics (pp. 91–96). IARIA XPS Press.

Theory Series. Prentice Hall.

Wood, D. C. (1969). A technique for colouring a graph applicable to large scale timetabling problems. The Computer Journal, 12(4), 317–319.

A Trial 1: Comparing variables for the boundary and y-axis

The following figures show tests that examine what effect different boundaries have on the location of the phase transition. Figure 9 is a trial run with the boundary set at 144,000 Brelazsteps, like the plot produced by Cetal. The variable on the y-axis is Brelazsteps, where one step is equal to 144 steps. Figure 10 shows a trial on the same graphs with 100,000 steps selected as the boundary. A comparison between figure 9 and figure 10 shows that the course of the two plots looks identical. For both trials the critical degree is found at an average degree of 8.1. On closer inspection, the difference between the two boundaries was exactly thirteen graphs. The trial with a boundary of 144,000 Brelazsteps solved thirteen more graphs, which is 1.6% more than when the boundary is set at 100,000 steps. These thirteen graphs are visualized in figure 11.
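The comparison between the two boundaries amounts to a set difference over the solved instances. The sketch below assumes that the number of steps needed per graph has been recorded in a dictionary, and that the percentage is taken relative to the number of graphs solved under the lower boundary; both the data layout and the function names are hypothetical.

```python
def solved_within(step_counts, boundary):
    """Identifiers of graphs whose recorded step count fits within the boundary."""
    return {graph for graph, steps in step_counts.items() if steps <= boundary}


def boundary_difference(step_counts, low, high):
    """Graphs solved under `high` but not under `low`, plus the relative gain.

    Returns (extra_graphs, percentage), where the percentage is relative to
    the number of graphs solved under the lower boundary.
    """
    solved_low = solved_within(step_counts, low)
    solved_high = solved_within(step_counts, high)
    extra = solved_high - solved_low
    if not solved_low:
        return extra, float("nan")
    return extra, 100 * len(extra) / len(solved_low)
```

Called with `low=100_000` and `high=144_000`, the first return value corresponds to the kind of set that figure 11 visualizes.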

Figure 11: Difference in instances solved between the boundary of 144,000 Brelazsteps and the boundary of 100,000 steps.

B Trial 2: Comparing variables for the x-axis

This trial was carried out to test whether the original degree or the reduced degree should be used as the variable for the x-axis. The trial was performed on the Random graphs set because this was the only set able to provide both the original degree and the reduced degree. The peak in computational costs is seen at an average degree of 8.1 both for the trial using the original degree as variable (figure 12) and for the trial using the reduced degree as variable (figure 13).

Figure 12: Original Degree on the x-axis

C Trial 3: Cetal graph-set with null graphs

After the results of the Cetal graphs were in and visualized in a plot, it was noticed that this plot looks different from the plot produced by Cheeseman et al. (1991). To investigate whether the null graphs were considered by Cetal, a trial was run with a new Cetal graph-set that also included the null graphs. The results are visualized in figure 14. The peak in computational costs is observed at an average degree of 8.1, just like the peak in the Cetal graph-set without null graphs.

Figure 14: Cetal graph-set with null graphs. Original degree on the x-axis
