Machine Learning for Selected NP-complete Problems

Academic year: 2021

MASTER THESIS

Machine Learning for Selected NP-complete Problems

by

IGOR PEJIC
12259330

June 30, 2020
48 EC, November 1, 2019 - June 30, 2020

Supervisor: drs. DAAN VAN DEN BERG
Assessor: dr. HERKE VAN HOOF

FACULTY OF SCIENCE, INFORMATICS INSTITUTE

I Introduction . . . 4

Monte Carlo Tree Search for Perfect Rectangle Packing . . . 5

I Introduction . . . 5

II Related work . . . 6

III Formal definition of the problem . . . 6

IV MCTS for Perfect Rectangle Packing . . . 6

V Problem generation . . . 7

VI Results . . . 8

VII Conclusion and future work . . . 9

Neural Networks for Perfect Rectangle Packing . . . 10

I Introduction . . . 11

II Related work . . . 12

III Problem definition . . . 12

IV State representations . . . 12

V Dataset generation . . . 13

VI NN Architecture . . . 13

VI-A Permutation invariance . . . 13

VI-B NN output . . . 14

VI-C Loss function . . . 14

VII NN implementation . . . 14

VIII Baselines . . . 15

IX Results . . . 15

X Conclusion, discussions and future work . . . 16

Monte Carlo Tree Search for Eternity II . . . 17

I Introduction . . . 18

II Related Work . . . 19

III Problem definition . . . 20

IV The MCTS algorithm . . . 20

V Implementation details . . . 21

VI Experimental setup . . . 21

VII MCTS Results . . . 21

VIII Backtracker results . . . 22

IX Discussions and future work . . . 24

III Conclusion . . . 25


ACKNOWLEDGEMENTS

I would like to thank my supervisor drs. Daan van den Berg for guidance during the research. He went out of his way to help me learn how to do proper science. I am eternally grateful for that.

Special thanks go to Martin Birač, who provided encouragement, as well as the infrastructure on which the majority of this work was performed.

A scientific thank you is deserved by colleague Florian Braam from the University of Amsterdam who has helped make the research on rectangle packing stronger by sharing results from his work. Another appreciation goes to Dr. Geoffrey Harris from Bond University for his help regarding the Eternity II puzzle.


ABSTRACT

Deep reinforcement learning is a research area combining deep learning, based on neural networks, with reinforcement learning. The methods discovered in this new area of research are pushing the limits of progress on problems whose complexity represents an insurmountable obstacle for traditional algorithms. Among these are NP-complete problems. In this work, we analyse the efficiency of the building blocks of deep reinforcement learning when applied to two different NP-complete problems: rectangle packing and edge-matching puzzles. Firstly, we analyse the performance of Monte Carlo tree search without a learned estimator on the rectangle packing problem. We show the difference between two action-selection strategies with six different rollout configurations on two differently constructed datasets. Secondly, we construct a neural network architecture for the rectangle packing problem by generating a dataset and training the network in a supervised fashion on subtasks derived from the problems. We show that the network is able to generalise across datasets of different dimensions and beat the analysed heuristic benchmarks on the given subtask. Finally, we apply the Monte Carlo tree search procedure to Eternity II-like edge-matching puzzles. We show that the procedure is not suitable for solving these puzzles and analyse why. For comparison, we implement an exhaustive backtracking solver whose speed is at state-of-the-art level.


I. INTRODUCTION

Recent applications of deep reinforcement learning have proven to be successful in areas previously dominated by hand-crafted heuristic methods. Learning from experience, these methods can explore and exploit new, previously unseen strategies, setting new state-of-the-art results on benchmarks such as the games of go and chess [75]. Our work explores the application of such techniques to rectangle packing and edge-matching puzzles. Rectangle packing is an NP-complete problem in which the goal is to place a set of smaller rectangles into one big rectangle. The Markovian property of the problem, as well as the existence of many heuristics, makes this problem a good candidate to be approached using reinforcement learning. Furthermore, the problem has many practical uses in areas such as scheduling, memory management, logistics and warehousing, which also makes it valuable from an industrial standpoint [40]. The second NP-complete problem we analyse is Eternity II, an edge-matching puzzle, unsolved for thirteen years. The two building blocks of reinforcement learning analysed are the Monte Carlo tree search, used to guide the search process, and neural networks, used to estimate the value and the policy. By analysing them individually, we can observe their strengths and drawbacks, as well as analyse the properties of different rectangle packing and edge-matching problems and how they affect the search procedure. We approach both problems using Monte Carlo tree search, and for rectangle packing, we design a neural network for the suitable subtasks.

Our work is structured in three self-contained parts, each containing an introduction to the analysed topic and an overview of related work. The three parts are independent of each other such that in this form they are more easily publishable. Part of the work extracted from the first part has already been accepted as part of GECCO 2020 [67]. The first part deals with the application of Monte Carlo tree search to perfect rectangle packing. The second part explores the usage of a neural network to solve the same problem. The third part solves edge-matching puzzles to analyse the behaviour of the Monte Carlo tree search method on two different NP-complete problems.


Monte Carlo Tree Search for Perfect Rectangle Packing

Igor Pejic

Faculty of Science, University of Amsterdam

The Netherlands

igorpejicw@gmail.com

Abstract—In recent years, Monte Carlo tree search has proven to be a promising method for developing new state-of-the-art systems for playing board and video games. We design a Monte Carlo tree search algorithm for solving perfect rectangle packing problems. Perfect rectangle packing is an NP-hard problem whose goal is to fit a set of small rectangles (tiles) of arbitrary sizes into a bigger rectangle (frame) such that no space is wasted. On this problem, we compare the performance of Monte Carlo tree search to an existing exhaustive rectangle packing algorithm. The experiments are run on two differently generated problem instance sets. In the results, we show the difference between the two analysed sets, the difference between two action-selection strategies (maxDepth and avgDepth), and the effect of increasing the number of simulations (nSim). When compared to the existing deterministic solver, the results show the solution properties of a heuristic non-exhaustive search for solving perfect rectangle packing problems without a learned machine learning state estimator.

I. INTRODUCTION

Reinforcement learning (RL) has proven to be one of the most promising directions of artificial intelligence for solving tasks that require learning through experience. In 2013, Volodymyr Mnih et al. brought reinforcement learning to the spotlight by setting new state-of-the-art scores on Atari games [59]. After that, RL was notably used to beat the world champion at the game of go in 2016 [74]. An extension to other board games soon followed, and new state-of-the-art results were accomplished on chess and shogi [75]. The progress did not stop at board games with a discrete action set. In 2019, the OpenAI Five system beat the world champions at the video game Dota 2 [6]. In the same year, the algorithm AlphaStar accomplished world-class performance at the strategy video game StarCraft II [81]. Recently, the same methods have even been applied to playing games without being provided the game rules, which resulted in a further performance increase [71]. However, games are not the only problems approached using RL algorithms. Promising results have been shown in applications to mathematical optimisation and decision problems such as the Travelling Salesman Problem, as shown by Wouter Kool et al. [47] and Michel Deudon et al. [20], or the Bin Packing problem, as shown by Alexandre Laterre et al. [49].

In order to learn, RL algorithms need to experience different states with varying rewards. These data points are often generated using the Monte Carlo tree search (MCTS) method. The basis of MCTS is the Monte Carlo method, which dates back to the middle of the twentieth century; its main idea is to utilise randomness in order to find solutions to various problems [24]. The method generates a random sequence of actions which results in states whose quality can be determined using an objective function. Aggregated statistics gathered from generated states allow quantifying the perceived 'quality' of different actions. Approaches combining a search tree with the Monte Carlo method, explored in works by Hyeong Soo Chang et al. [12] and Bruno Bouzy [8], were improved upon and officially named Monte Carlo tree search in 2006 [15]. In MCTS, during the search of the action space, a search tree is stored and utilised in order to estimate the best way to proceed from the current state. Such an approach is most easily visualised in turn-based games, where each state corresponds to a node in the search tree. For each of these states, the best next action is estimated by repeatedly performing random simulations from the current state and then evaluating their result. However, MCTS is not without drawbacks. Because of the stochasticity of the process, MCTS does not have the guarantees of finding a solution that exhaustive search methods have.
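The Monte Carlo idea of trading exactness for aggregated random sampling can be made concrete with a classic toy example (not from this thesis): estimating π from uniformly random points in the unit square.

```python
import random

def estimate_pi(n_samples: int, seed: int = 0) -> float:
    """Monte Carlo estimate of pi: the fraction of random points in the
    unit square that fall inside the quarter circle of radius 1 tends
    to pi/4 as the number of samples grows."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(n_samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4.0 * inside / n_samples

print(estimate_pi(100_000))  # close to 3.14159
```

As with MCTS rollouts, the estimate improves with the number of samples but never comes with an exactness guarantee.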

The perfect rectangle packing problem (PRPP) is an NP-complete problem [48] that consists of packing multiple smaller rectangles (tiles) into one bigger rectangle (frame). It is a constraint-satisfaction problem whose solutions are all tile configurations which fill the area of the frame such that there is no wasted space. The more general rectangle or bin packing problem is often solved in industry as an NP-hard [48] optimisation problem, for physical environments such as warehousing [22] or digital ones such as packing virtual machines in data centres [29]. In practice, many heuristics have been used to solve rectangle packing problems [68, 51, 38, 10, 40], with varying efficiency depending on the properties of the problem instances. These heuristics are a result of the effort put into exploring and comparing different approaches, similar to the heritage in extensively studied games such as go and chess, beaten by MCTS-based algorithms. In this paper, we analyse whether MCTS can have similar results on PRPP.

We start the exploration with an overview of the related works in Section II, and a formal definition of the problem in Section III. Then, we explain the MCTS algorithm used (Section IV). The problem generation procedures used to obtain the datasets are described in Section V. Finally, we present the results in Section VI and conclude the analysis in Section VII.


II. RELATED WORK

As mentioned, MCTS applications are present in different problem types; however, we consider constraint-satisfaction problems as they match the PRPP definition. In a study by Satomi Baba et al., the authors successfully apply MCTS to the generalised quantified constraint satisfaction problem (QCSP) [4]. Furthermore, MCTS has been successfully applied to other CSP problems such as graph colouring [42] and the game SameGame [70]. For an exhaustive survey of MCTS methods and their applications, we refer the reader to the survey by Cameron Browne et al. [11].

The application of MCTS to rectangle and bin packing has mostly been related to solving the optimisation problem. Hailiang Li et al. apply MCTS to solve three-dimensional problem instances and, using it, beat state-of-the-art algorithms on a total-volume-occupied metric [53]. Similarly, in a study by Stefan Edelkamp et al., a Nested MCTS search is used to solve two- and three-dimensional instances [26]. The authors use a different but comparable problem for 2D (finding the minimum rectangle for a set of squares) and focus on space minimisation in the 3D case. Nested MCTS is an extension of MCTS which remembers the simulation results from all depths of the search tree such that they can be used in combination with simulations from new states.

III. FORMAL DEFINITION OF THE PROBLEM

Let F = {f_1, f_2, f_3, ..., f_m} denote the set of frames and T = {t_1, t_2, t_3, ..., t_n} denote the set of tiles which are to be placed inside the frame. n specifies the number of tiles which are to be placed in F and m specifies the number of frames. In our experiments, we always set m = 1 and n = 20, i.e. we always use one frame (in further text: f, without an index) and twenty tiles. All elements f and t have two dimensions: a height (h) and a width (w). These dimensions are such that the total area covered by the frame, A_f = h_f · w_f, is equal to the sum of the areas covered by the individual tiles, A_T = Σ_{i=1..n} h_{t_i} · w_{t_i}:

A_f = A_T

As tiles have two dimensions, they are allowed to be rotated into two orientations before placement inside the frame:

t_{w,h} = t_{h,w}

The only rotation allowed is a rotation of 90°. The definition of a solution to the problem consists of a few requirements, described in the following text. Let PL represent the set of tiles T placed inside the frame f. For a solution to the problem, it must then hold:

PL = T

meaning that all the tiles t are placed inside the frame f. Now, let us place the bottom left corner of the frame f at the origin of a two-dimensional coordinate system. The x limits of the frame are then at the left border x_fl = 0 and at the right border x_fr = w_f. Similarly, the minimum y value of the frame is at y_fb = 0 and the maximum at y_ft = h_f. With these borders in place, we can define a correctly placed tile inside the frame. Before placing a tile in the frame, we predefine its orientation, which determines its width and height values. In the following text, we consider t(w, h) to be in its final rotation when placed.

Firstly, the tile must not surpass the borders of the frame. Let (x_t, y_t) represent the coordinate of the bottom left corner of the tile. Then it must hold:

x_t >= x_fl      (1)
y_t >= y_fb      (2)
x_t + w <= x_fr  (3)
y_t + h <= y_ft  (4)

Secondly, the individual tiles inside the frame must not intersect with each other. Let us define a tile with four values: x_l representing the leftmost x coordinate, x_r representing the rightmost x coordinate, y_b representing the lowest y coordinate and y_t representing the topmost y coordinate that the tile takes inside the frame's coordinate system. Two tiles t1 and t2 will then not intersect if:

x_l(t1) > x_r(t2)  ∨  x_r(t1) < x_l(t2)  ∨  y_b(t1) > y_t(t2)  ∨  y_t(t1) < y_b(t2)

This condition has to hold for all tile pairs p ∈ (PL choose 2), where p is a pair of two tiles from the set of all placed tiles PL.
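The constraints above translate almost directly into code. The sketch below is illustrative only (names and the continuous-coordinate assumption are ours, not the thesis implementation): it checks the border constraints (1)-(4), pairwise non-intersection, and the area condition A_f = A_T, treating tiles as half-open boxes so that tiles sharing only a border do not count as intersecting.

```python
from typing import List, Tuple

# A placed tile: (x, y, w, h) with (x, y) its bottom-left corner.
Tile = Tuple[int, int, int, int]

def inside_frame(t: Tile, wf: int, hf: int) -> bool:
    """Constraints (1)-(4): the tile must not surpass the frame borders."""
    x, y, w, h = t
    return x >= 0 and y >= 0 and x + w <= wf and y + h <= hf

def disjoint(a: Tile, b: Tile) -> bool:
    """Two tiles do not intersect iff one lies fully left of, right of,
    above, or below the other (the disjunction from Section III)."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax >= bx + bw or bx >= ax + aw or ay >= by + bh or by >= ay + ah

def is_perfect_packing(placed: List[Tile], wf: int, hf: int) -> bool:
    """Valid iff every tile lies in the frame, no pair overlaps, and the
    tile areas sum to the frame area (A_f = A_T), so no space is wasted."""
    if not all(inside_frame(t, wf, hf) for t in placed):
        return False
    for i in range(len(placed)):
        for j in range(i + 1, len(placed)):
            if not disjoint(placed[i], placed[j]):
                return False
    return sum(w * h for _, _, w, h in placed) == wf * hf

# Two 1x2 tiles perfectly fill a 2x2 frame.
print(is_perfect_packing([(0, 0, 1, 2), (1, 0, 1, 2)], 2, 2))  # True
```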

IV. MCTS FOR PERFECT RECTANGLE PACKING

We designed an MCTS-based algorithm that treats rectangle packing as a turn-based single-player game composed of turns, i.e. states. A state is a unique combination of the current configuration of tiles inside the frame and the tiles which are yet to be placed in the frame. The frame state is represented with a binary 2D matrix in which 1 represents that the location has been occupied and 0 that it has not yet been occupied. The dimensions of the matrix are equal to the dimensions of the frame. The state can therefore be written as S = (F, T_unplaced), where F is the previously described frame matrix with dimensions w × h and T_unplaced is the array of tiles which have not yet been placed in the frame. A graphical representation of the state mapping to its representation can be seen in Figure 1.

Figure 1: State representation

The initial state is, then, a pair of the null matrix F and the array T_unplaced containing all the tiles. This state is also the root of the search tree. From this state, only legal actions of placing a tile in the bottom left (BL) corner are considered, which is known as the bottom-left heuristic [13]. Placing the tiles always at the BL position reduces the search space dramatically but does not limit the space of possible perfect solutions. Legal actions are the ones which do not produce an illegal frame state according to the rules described in Section III. In the initial state, a total of n = length(T_unplaced) · 2 actions are considered. The factor 2 accommodates both possible orientations of the tiles.

MCTS consists of four conceptual steps: selection, expansion, simulation and backpropagation. In each state, the choice of the best next action is evaluated by analysing all possible actions from this state (selection). This is done by performing nSim simulations from the state which arises after performing each action. A simulation consists of randomly choosing the next actions and filling the frame until there are no more legal moves (either because a solution is found, or because each possible next move is illegal). The performed simulations receive a score equivalent to the depth reached during the simulation. For instance, a simulation from the root state to a state in which all tiles are placed would have a depth equivalent to the total number of tiles (length(T)). If a solution is found during the simulation phase, the whole process stops, and the solution is recorded. After performing nSim simulations for each action, the scores are aggregated either by taking the maximum or the arithmetic mean of the simulation depths. We call these strategies maxDepth and avgDepth, respectively. The state arising from the action with the highest aggregated result is chosen as the next state (expansion), and the process is repeated until the end of the game. In our implementation, the backpropagation step updates only the states from which the simulations start, i.e. it does not update the values in their children. The pseudocode of the algorithm is presented in Algorithm 1.
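A condensed, runnable Python sketch of this procedure on the binary-matrix state (an illustration under our own simplifications, not the thesis code; all names are invented):

```python
import random
from typing import List, Optional, Tuple

def bottom_left(grid: List[List[int]]) -> Optional[Tuple[int, int]]:
    """Return the lowest, leftmost empty cell (the BL insertion point)."""
    for y, row in enumerate(grid):
        for x, cell in enumerate(row):
            if cell == 0:
                return x, y
    return None

def fits(grid, x, y, w, h) -> bool:
    """True if a w x h tile with bottom-left corner at (x, y) stays
    inside the frame and covers only empty cells."""
    if x + w > len(grid[0]) or y + h > len(grid):
        return False
    return all(grid[y + dy][x + dx] == 0 for dy in range(h) for dx in range(w))

def place(grid, x, y, w, h):
    new = [row[:] for row in grid]
    for dy in range(h):
        for dx in range(w):
            new[y + dy][x + dx] = 1
    return new

def legal_actions(grid, tiles):
    """Each unplaced tile, in both orientations, at the BL point."""
    pos = bottom_left(grid)
    if pos is None:
        return []
    x, y = pos
    actions = []
    for i, (w, h) in enumerate(tiles):
        for tw, th in {(w, h), (h, w)}:
            if fits(grid, x, y, tw, th):
                actions.append((i, tw, th, x, y))
    return actions

def rollout(grid, tiles, rng) -> int:
    """Random simulation; the score is the depth (tiles placed) reached."""
    depth = 0
    while True:
        acts = legal_actions(grid, tiles)
        if not acts:
            return depth
        i, tw, th, x, y = rng.choice(acts)
        grid = place(grid, x, y, tw, th)
        tiles = tiles[:i] + tiles[i + 1:]
        depth += 1

def choose_action(grid, tiles, n_sim, strategy, rng):
    """Score each legal action by n_sim rollouts, aggregated by max
    (maxDepth) or arithmetic mean (avgDepth)."""
    best, best_score = None, -1.0
    for act in legal_actions(grid, tiles):
        i, tw, th, x, y = act
        g = place(grid, x, y, tw, th)
        t = tiles[:i] + tiles[i + 1:]
        depths = [rollout(g, t, rng) for _ in range(n_sim)]
        score = max(depths) if strategy == "maxDepth" else sum(depths) / len(depths)
        if score > best_score:
            best, best_score = act, score
    return best

# Toy run: pack two 1x2 tiles into a 2x2 frame.
rng = random.Random(0)
grid, tiles = [[0, 0], [0, 0]], [(1, 2), (2, 1)]
while (act := choose_action(grid, tiles, n_sim=10, strategy="avgDepth", rng=rng)):
    i, tw, th, x, y = act
    grid = place(grid, x, y, tw, th)
    tiles = tiles[:i] + tiles[i + 1:]
print(tiles == [])  # True: a perfect packing was found
```

The sketch omits the early stop on a solved simulation and the backpropagation bookkeeping described above, but the selection loop mirrors Algorithm 1.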

V. PROBLEM GENERATION

We utilise two different sets of problem instances: guillotinable problems (G) generated using the algorithm specified in a study by Eva Hopper and Brian Turton [37], and a differently constructed set of problems (B) obtained from currently unpublished work at the University of Amsterdam [9]. Guillotinable problems allow cuts through one tile at a time, while for non-guillotinable problems this does not hold. From a qualitative analysis of the B-set, we have concluded that most problems from this set are non-guillotinable.

Guillotinable problem instances are constructed by randomly selecting a cutting point on one of the edges of the frame and cutting through it with a straight line perpendicular to that edge. This cut generates two tiles, one of which will be chosen to be cut through in the next step. The full process is repeated iteratively, randomly selecting from all the available tiles, until the required number of tiles is extracted. Figure 2 illustrates this process, with numbers indicating the order of actions. First, a random edge and a point on that edge are chosen on the frame, generating point 1. From point 1, a straight perpendicular line is extended, creating two tiles: A and B. Then, tile B is randomly chosen from the set of tiles A and B. The same process repeats until the required number of tiles is reached.
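A minimal sketch of such a generator (illustrative; Hopper and Turton's exact procedure may differ in details such as how cut points and tiles are sampled). By construction, every generated instance has a perfect packing:

```python
import random

def guillotine_instance(wf, hf, n_tiles, seed=0):
    """Generate a guillotinable PRPP instance: start from the frame as a
    single tile and repeatedly cut a randomly chosen tile with a straight
    cut, until n_tiles tiles exist."""
    rng = random.Random(seed)
    tiles = [(wf, hf)]
    while len(tiles) < n_tiles:
        # Pick a tile that is large enough to be cut in some direction.
        cuttable = [i for i, (w, h) in enumerate(tiles) if w > 1 or h > 1]
        i = rng.choice(cuttable)
        w, h = tiles.pop(i)
        # Choose a cut direction among those possible for this tile.
        if w > 1 and (h == 1 or rng.random() < 0.5):
            c = rng.randint(1, w - 1)          # vertical cut
            tiles += [(c, h), (w - c, h)]
        else:
            c = rng.randint(1, h - 1)          # horizontal cut
            tiles += [(w, c), (w, h - c)]
    return tiles

tiles = guillotine_instance(30, 30, 20)
print(len(tiles), sum(w * h for w, h in tiles))  # 20 900
```

The tile areas always sum to the frame area, matching the A_f = A_T condition of Section III.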

Algorithm 1 MCTS for Perfect Rectangle Packing

▷ Main procedure
state ← (nullMatrix(width, height), tiles)
while hasLegalMoves(state) do
    bestAction ← NULL
    bestMaxDepth, bestAvgDepth ← 0, 0
    for legalAction in getLegalActions(state) do
        newState ← applyAction(state, legalAction)
        actionDepths ← List()
        for nSim times do
            depth ← performSimulation(newState)
            actionDepths.append(depth)
        end for
        maxDepth ← max(actionDepths)
        avgDepth ← average(actionDepths)
        if strategy = 'maxDepth' and maxDepth > bestMaxDepth then
            bestMaxDepth ← maxDepth
            bestAction ← legalAction
        else if strategy = 'avgDepth' and avgDepth > bestAvgDepth then
            bestAvgDepth ← avgDepth
            bestAction ← legalAction
        end if
    end for
    state ← applyAction(state, bestAction)
end while
if tiles.length() = 0 then
    solutionFound ← true
else
    solutionFound ← false
end if

function performSimulation(state)
    depth ← 0
    while hasLegalMoves(state) do
        action ← random(getLegalActions(state))
        depth ← depth + 1
        state ← applyAction(state, action)
    end while
    return depth
end function

The other problem set, B-set, was constructed in order to analyse the differences in complexity with a growing number of tiles [9]. Problems from this set were constructed by creating a random set of tiles with side dimensions between 1 and r_max. Trivially non-solvable problems are excluded by analysing the total dimensions of the combined tiles. The solvability of the remaining problems was checked using exhaustive search with pruning and heuristics. In our work, we have not considered the problems for which there are no solutions.

For both sets, the existence of the solution is guaranteed. In guillotinable problems, this arises from the generation algorithm. In the non-guillotinable set, this is guaranteed by selecting only the subset of problems for which the solution was previously found using the exhaustive method.


Figure 2: Guillotine problem generation procedure

VI. RESULTS

In all our experiments, we tracked execution through the proxy variable of the number of placed tiles. This number is a measure analogous to the number of moves played and is useful because it can be used for comparison between algorithms with different search processes. Furthermore, this measure eliminates hardware-specific bias from results. For instance, on the Intel(R) Core(TM) i5-7200U CPU @ 2.50GHz processor, one tile placement took around 0.15 milliseconds on one fully utilised core.

Figure 3: Visualization of the MCTS search tree. The solution is found during a simulation at the eleventh step.

We solve one thousand instances of G-set and B-set problems, whose generation procedure is described in Section V. All problem instances solved have twenty tiles.

Firstly, we analyse the effect of varying the number of simulations nSim during the action-selection procedure. Figure 4 displays the aggregated results obtained using the MCTS solver on all the problems from both problem sets, with nSim set to values of 100, 200, 500, 1000, 2000 and 5000. As expected, a higher number of simulations leads to a better score, but also a longer execution time. From the graph, it can also be seen that the search time, measured in the number of tiles placed, rises linearly with the number of simulations. The mean error drops with respect to the number of simulations, and the initial growth of the percentage of solutions found seems to slow down towards the end of the tested nSim values.

Figure 4: Comparison of performance with varying number of simulations nSim on all instances

In Table I, we present the comparison of performance between the two search strategies when using the number of unplaced tiles as the score measure. The scores represent how many wins each algorithm variant has had against the other on the G-set and B-set problem instances. The winner of a problem is the algorithm variant with the fewest unplaced tiles in the final solution. Each row shows the results for a different nSim configuration. The winning values are bolded, and the ones which are statistically significant with 95% confidence are underlined. The results show that, in terms of non-placed tiles, maxDepth performs better on B-set instances. However, this is not the case on G-set instances, where avgDepth proves to be significantly better at a smaller number of simulations.

Table I: Number of times one algorithm configuration performed better than the other on individual problems. Statistically significant wins are underlined.

nSim    G instances              B instances
        maxDepth   avgDepth     maxDepth   avgDepth
100     146        246          461        147
200     168        216          519        127
500     158        191          574        103
1000    168        178          609        88
2000    166        148          651        83
5000    135        125          678        78
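The 95%-confidence wins in Table I can be checked with a two-sided sign (binomial) test on the non-tied instances, assuming ties are excluded; a stdlib-only sketch (our illustration, not necessarily the exact test used in the thesis):

```python
from math import comb

def sign_test_p(wins_a: int, wins_b: int) -> float:
    """Two-sided exact binomial (sign) test of H0: each non-tied problem
    is equally likely to be won by either strategy (p = 0.5)."""
    n = wins_a + wins_b
    k = max(wins_a, wins_b)
    # P(X >= k) for X ~ Binomial(n, 0.5), doubled for the two-sided test.
    tail = sum(comb(n, i) for i in range(k, n + 1)) / 2 ** n
    return min(1.0, 2 * tail)

# G instances, nSim = 100: avgDepth won 246 problems, maxDepth won 146.
print(sign_test_p(146, 246) < 0.05)  # True: significant at 95% confidence
```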

Further comparison of maxDepth and avgDepth can be observed in Figure 5 which shows the distribution plots of unplaced tiles for varying nSim. Analysing the subplots in the vertical direction, we observe the qualitative progress in solutions when increasing nSim and by analysing them horizontally, we can observe the differences in solving the two problem sets. Finally, by focusing on the distributions inside the individual subplots, we can observe the difference in distributions produced for the two algorithm variations.


The figure clearly shows that avgDepth finds more solutions. However, it can also be seen that it has a much higher variance in the final number of non-placed tiles. By qualitatively analysing the search trees of a few problem instances using our visualisation tool shown in Figure 3, we estimate that the differences in performance arise from the differences in the problem properties in combination with the strategies. On problems with a very narrow solution tree shape, only a few search paths lead to a better score, and on such problems, it is expected for maxDepth to perform better. Problems with a more gradual narrowing of the solution tree benefit from avgDepth, as it does not get lost in cul-de-sac search paths. More research is needed to give a definitive answer to the underlying cause of the difference in performance.

Figure 5: Comparison of performance between maxDepth and avgDepth strategies (lower score is better) on G-set and B-set problem instances for different nSim values

Table II shows the final performance of the MCTS algorithm variations compared to the exhaustive deterministic solver. The deterministic solver is the solver used to validate the existence of solutions for B-set problem instances and utilises backtracking with an exhaustive search. The table shows the percentage of solutions found, the mean number of tiles placed during the search process and the mean error, which represents the mean number of non-placed tiles in the final solution. The solutions found column, again, shows the definite advantage of avgDepth for PRPP. The mean error shows that maxDepth has less variance in final results. Furthermore, the table displays the difference in relative hardness of the two problem sets, with the MCTS solver doing significantly better on G-set instances. For B-set instances, we also observe the comparison with the deterministic solver. Because of its exhaustive nature, the deterministic solver finds more solutions, but with a higher search time, as can be seen in the mean tiles placed value.

Table II: Final results

Problem set   Algorithm               Solutions found   Mean tiles placed   Mean error
G instances   avgDepth 100            26.6%             250,535             0.973
              maxDepth 100            20.0%             258,019             1.111
              avgDepth 200            30.0%             486,260             0.91
              maxDepth 200            24.1%             494,891             0.971
              avgDepth 500            38.9%             1,095,306           0.791
              maxDepth 500            31.5%             1,137,401           0.837
              avgDepth 1000           46.1%             1,987,120           0.716
              maxDepth 1000           40.3%             2,089,319           0.701
              avgDepth 2000           52.1%             3,577,742           0.633
              maxDepth 2000           46.0%             3,755,117           0.621
              avgDepth 5000           60.1%             7,460,459           0.541
              maxDepth 5000           55.0%             7,964,695           0.491
              Deterministic solver*   81.6%             407,018,300         0
B instances   avgDepth 100            0.9%              256,283             2.735
              maxDepth 100            0.1%              225,010             2.15
              avgDepth 200            0.9%              518,083             2.728
              maxDepth 200            0.0%              451,921             1.937
              avgDepth 500            2.1%              1,304,128           2.629
              maxDepth 500            0.2%              1,133,944           1.739
              avgDepth 1000           3.2%              2,607,251           2.559
              maxDepth 1000           0.4%              2,266,621           1.623
              avgDepth 2000           4.0%              5,203,417           2.544
              maxDepth 2000           0.7%              4,533,941           1.506
              avgDepth 5000           5.5%              12,903,938          2.484
              maxDepth 5000           1.4%              11,311,870          1.358
              Deterministic solver*   98.5%             254,973,310         0
* Time limit of 1 hour

VII. CONCLUSION AND FUTURE WORK

This paper presented the application of Monte Carlo tree search to the perfect rectangle packing problem. We analysed its performance on two different problem instance sets. The results have shown that, for guillotinable PRPPs, MCTS can be used with relative success. However, on the other set of mostly non-guillotinable problems, MCTS struggled to find solutions. The differences in performance showed the potential of utilising MCTS to estimate the relative hardness of different rectangle packing problems. Furthermore, we have analysed the effect of the number of simulations during the selection procedure (nSim) on the final score and the search duration. The results showed that increasing the number of simulations improves the performance of the algorithm, but after a certain point, the increase in performance starts slowing down.

Finally, we have compared two different action-selection strategies. We have observed that the avgDepth strategy finds the solution more often, but also has higher variance when compared to maxDepth. These results seem to indicate underlying differences in properties of the solved problems.

In future work, the MCTS implementation could be improved with Upper Confidence Bounds to guide the search process. Another improvement would be to temporarily store a bigger part of the search tree obtained through simulations to allow for backtracking. Finally, a (deep) machine learning model could be used to estimate the score (reward) of a state and replace the simulation step with an estimated evaluation, which could lead to better action selection.


Neural Networks for Perfect Rectangle Packing

Igor Pejic

Faculty of Science, University of Amsterdam

The Netherlands

igorpejicw@gmail.com

Abstract—In this work, we explore the use of neural networks for solving the perfect rectangle packing problem. We approach the problem by dividing it into smaller Markovian problems, for which we construct a labelled dataset used for supervised learning. We construct a dual-input convolutional neural network and test it on datasets with different frame dimensions and different numbers of tiles. We open-source the code and dataset generation procedures. The results show that the network can learn to predict the best tile to place given a frame-state. In doing so, it beats the analyzed deterministic baseline heuristics, such as always choosing the biggest tile. The abstractions learnt on one problem set are transferable to problem sets with different properties.

I. INTRODUCTION

Perfect rectangle packing (PRP) is a problem in which the goal is to place a set of rectangles (tiles) into a bigger rectangle (frame). In other words, the goal is to find a configuration of tiles inside the frame with no wasted frame space and no unplaced tiles. Rotation of the tiles is allowed, and the order of placement does not matter as long as the final configuration is valid. Even when rotation is not allowed, this problem was proven to be NP-complete in 2003 by Richard E. Korf [48] [38]. The proof was obtained by reduction from bin packing, which has also been classified as NP-complete [28]. In industry, rectangle packing arises in tasks ranging from VLSI design in chips [57] and CSS-sprite packing for webpages [56] to cutting material in the textile and leather industry [5].

Because of the NP-complete property, all exact PRP solvers have a time complexity which is superpolynomial in the input size, meaning that the time needed to find the solution is often impractical. Therefore, in practice, non-exact, heuristic algorithms are often employed to speed up the search process, but they give up on the guarantee of finding the solution (exactness).

Similarly, in this work, we utilize a neural network (NN), an approximator with no exactness guarantees, to solve the PRP problem. This approach is motivated by recent advances in using NNs to solve NP-complete problems. Daniel Selsam et al. designed a solver for the boolean satisfiability problem by training a NN to predict the satisfiability of SAT problems as a classification task [72]. Applying a graph NN architecture to the graph colouring problem, Lemos et al. [50] achieved accuracy comparable to a well-known heuristic for that problem (tabucol). Furthermore, the authors of a recent paper called "What Can Neural Networks Reason About?" show that a newly designed network based on LSTM ([36]) and the usual perceptron ([69]) layers, called Neural Exhaustive Search, achieves 98% test accuracy on the NP-complete subset sum problem, outperforming previous NN-based architectures [85]. Since the rectangle packing problem and its solution can be represented visually in a grid-like, matrix manner, we explore the intuition that the similarities in solution configurations could be exploited using a convolutional neural network (CNN). We explore the validity of this intuition by analyzing the performance of the NN on a set of PRP problems.

Figure 1: Example of a perfect rectangle packing problem subtask. It consists of a frame-state and unplaced tiles (A, B and C). The arrow indicates the bottom-left insertion point.

The paper is structured as follows. In Section II, we analyze a limited number of related works which employ NNs on the rectangle packing problem. We proceed by describing the rectangle packing problem and the decomposition into Markovian subtasks in Section III. After that, we explain the state representations into which the problem is transformed to be able to input it to a neural network (Section IV). The dataset generation is described in Section V. In Section VI, we describe the important properties of the NN architecture and the motivations behind choosing them. The final NN implementation is described in Section VII. Finally, we present the baselines for comparison with the NN performance in Section VIII and give the results in Section IX. We end with a discussion in Section X.


II. RELATED WORK

Since we could not find any work utilizing NNs on strictly perfect rectangle packing problems, we extend the discussion to non-perfect (optimization) rectangle packing, including stock cutting, and the bin packing problem, which is a superset of rectangle packing. The stock-cutting problem is solved in a study by Cihan H. Dagli and Pipatpong Poshyanonda using a combination of a genetic algorithm and a NN [16]. The NN is used to predict the locations and rotations of two tiles given at the input with the help of a postprocessor which adjusts the prediction to minimize the total packing area. Similarly, it is also solved utilizing a single-layer feedforward NN to generate an initial set of patterns which are later improved using simulated annealing [32]. The inputs to the NN are the coordinates of the pattern to be extracted, and the outputs are coordinate points representing the cut to be made. Feng Mao et al. employ a NN to solve 1D bin packing problems by first manually extracting sixteen features from the properties of the problems, and then predicting the best heuristic (best, first, next or worst fit for assignment and best or expect fit for allocation) to apply on that problem [55]. The best-performing heuristics on a specific problem are pre-computed, and the network is trained in a supervised fashion, obtaining a validation accuracy of around 65% with saturation happening early in the training process.

A novel 3D bin packing problem in which the goal is to minimize the area of the bin that packs all items is proposed in a study by Haoyuan Hu et al. [39]. This novel problem is solved using a pointer network ([82]), an architecture suited for sequential data (in this case a sequence of bins to be packed). The network is trained and generates solutions using policy-based reinforcement learning. The same problem is solved in a study by Lu Duan et al. where the pointer network is extended and trained for two tasks: sequence ordering, i.e. specifying the order of bins to fill the container and orientation classification which determines the orientation of the to-be-placed bin [22]. The training of the NN for both tasks is accomplished by sharing the encoder-decoder learning module NN and by training for a different task separately in each round.

In a study by Alexandre Laterre et al. a NN is also used to solve 2D and 3D bin packing problems [49]. The inputs to the network are the encoded features of the possible actions related to different strategies, and the output is the chosen action as well as the state-value estimate. A NN based on the transformer network ([79]) and a novel conditional query model is used in a work by Dongda Li et al. to solve 2D and 3D bin packing problems by utilizing an end-to-end NN architecture in which executed actions are fed back into the network [52].

III. PROBLEM DEFINITION

Ultimately, the goal of the PRP problem is to find a configuration of positions and orientations of a set of tiles which fit inside the frame. The complete solution can, therefore, be represented as an array of elements (t_w, t_h, x, y), i.e. the width and height of the placed tile, and the coordinates of the tile inside the frame. This solution space is exponential in the number of input nodes (frame and tile dimensions), and mapping such a relation represents a hard task for any function approximator, NNs included.

Therefore, we transform the problem into smaller subtasks. One such subtask is shown in Figure 1. The goal of each subtask is to place only one tile inside the frame. Given the initial state with the empty frame, the NN's goal is to predict the placement of only one tile inside the frame. After that tile is placed, the NN is reutilized to perform the same goal again until all the tiles are placed, or no more tiles can be placed. The total number of subtasks extracted from a single PRP problem is equal to the number of tiles, i.e. a problem with twenty tiles will have twenty subtasks to solve. This transformation of the PRP problem into subtasks is possible because each partial state obtained after completing a subtask is Markovian, meaning that each subtask has all the information needed to be analyzed individually, independent of the history of how the subtask was obtained. Moreover, for the goal of the subtask, only the unoccupied space inside the frame is important; the positions of the individual tiles in the occupied space do not matter. This property will prove to be vital as it simplifies the input space of the frame into a binary matrix, as will be seen in Section IV.

Considering the placement of one tile in the simplified case when we ignore the already occupied frame positions and take into account tile placements which result in illegal states, we see that the potential solution space of placing one tile is W · H · N_O. With two orientations (N_O = 2) and a frame of width W = 20 and height H = 20, the total solution space consists of 800 possible options to place one tile in the frame. This subtask is easier (in terms of the number of potential solutions) than the initial exponential search space. However, this too can be simplified even further.

If, in every subtask, we always placed the next tile deterministically in the same relative position according to some heuristic, then the solution space would consist only of choosing the correct tile and its orientation. We choose the well-known bottom-left (BL) heuristic [13] which always places the tile in the most bottom-left corner of the frame (shown with an arrow in Figure 1). Therefore, the final subtask of the NN is to predict which tile, and in which orientation, should be placed next in the BL position of a given frame. In the following Section IV, we analyze how to transform this subtask into a form suitable for input to a NN.
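As a concrete illustration, the BL insertion point of a frame-state can be found by a simple scan of its occupancy matrix. The sketch below is illustrative only; the function name and the bottom-row-first scanning convention are our assumptions, not the thesis implementation:

```python
import numpy as np

def bottom_left_point(frame):
    """Return (row, col) of the bottom-left-most free cell, or None.

    `frame` is a binary occupancy matrix (0 = free, 1 = occupied),
    with row 0 taken to be the bottom of the frame.
    """
    rows, cols = frame.shape
    for r in range(rows):        # scan from the bottom row upward
        for c in range(cols):    # left to right within a row
            if frame[r, c] == 0:
                return (r, c)
    return None                  # frame is completely filled

frame = np.ones((4, 4), dtype=int)
frame[2:, 1:] = 0                # only a top-right region is still empty
print(bottom_left_point(frame))  # (2, 1)
```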

IV. STATE REPRESENTATIONS

An input representation with many elements increases the NN complexity and can hurt generalizability because the network has to learn to generalize over a bigger input space [21][44]. Because of that, the optimal state representation is one that does not hide any crucial properties of the data while minimizing the level of detail at the input layer. At any moment, a frame-state consists of filled and non-filled space, and the most natural representation for it is a 2D matrix of dimensions W × H, the width and the height of the frame. As has been said, the choice of the next tile to be placed is constrained only by which locations in the frame are free. This means that the unique shapes of the placed tiles do not matter for the decision of the next move. Thus, the frame-state is represented as a binary matrix, 0 indicating an available, and 1 indicating an occupied position.
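A minimal sketch of this binary frame-state and of marking a placement in it, assuming NumPy and a hypothetical `place_tile` helper (not the thesis code):

```python
import numpy as np

def place_tile(frame, r, c, h, w):
    """Mark an h x w tile as occupied with its bottom-left corner at (r, c).

    Returns False (leaving `frame` untouched) if the tile would overlap
    already occupied cells or stick out of the frame.
    """
    if r + h > frame.shape[0] or c + w > frame.shape[1]:
        return False
    if frame[r:r + h, c:c + w].any():
        return False
    frame[r:r + h, c:c + w] = 1
    return True

frame = np.zeros((5, 5), dtype=int)       # 0 = free, 1 = occupied
assert place_tile(frame, 0, 0, 2, 3)      # fits in the empty corner
assert not place_tile(frame, 1, 1, 2, 2)  # overlaps the first tile
```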

As for the representation of the non-placed tiles, the first possibility is to represent the tiles in the same shape as the frame-state. We call this representation matrix-based. This representation is the approach used for chess, go and shogi by David Silver et al., in which the potential moves are stacked as layers in a 3D matrix [75]. Each tile is transformed into two 2D matrices, where each one represents one of the possible orientations by placing the tile in the bottom-left corner of an empty frame and representing it with values of 0 and 1 in a manner similar to the frame-state. The fact that the frame-state and tiles can be stacked allows for potential reuse of weights in the layers of the neural network, but the representation of a tile occupies W × H inputs when it could be represented by only two scalar values.

The big number of input units in the matrix-based representation is the motivation for the alternate tile representation in which the tiles are represented as an array of two-dimensional scalar pairs. We call this representation "scalar-based". The array can be formalized as a matrix of dimensions N × 2, where N is the number of tiles, and each element of the array holds one tile's width and height (t_w, t_h). To account for orientation, the array length is doubled by including both orientations of each tile: (t_w, t_h) and (t_h, t_w). A flag value of (0, 0) is used to represent tiles which were already placed. This doubling of the array length is done to preserve the constant shape of the input to the NN, as well as to be able to solve problems with a varying number of tiles. The disadvantage of the scalar-based representation is that the dimensions of the frame and tile representations are no longer equal, and the NN has to support and learn to deal with different input shapes and data types (binary for the values in the frame, and scalar for the tiles). This difference in shapes imposes higher complexity on the NN architecture, as can be observed in the dual-input NN property which we implement in Section VII. Illegal tile placements in the input are treated in the same way as already placed tiles, by masking them to (0, 0) in the input representation. This strategy diverges from the approach used by AlphaZero, where illegal moves are input to the network and eliminated only after the NN prediction is obtained [75]. After initial exploration, we have decided to use the scalar-based representation.
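The scalar-based encoding can be sketched as follows; `tiles_to_input` is a hypothetical helper name, and the row layout (the two orientations of each tile interleaved) is one possible convention consistent with the description above:

```python
import numpy as np

def tiles_to_input(tiles, n_tiles):
    """Encode unplaced tiles as a (2 * n_tiles, 2) scalar array.

    Each tile (w, h) contributes both orientations, (w, h) and (h, w);
    slots of already-placed (or absent) tiles stay masked as (0, 0).
    """
    arr = np.zeros((2 * n_tiles, 2), dtype=int)
    for i, (w, h) in enumerate(tiles):
        arr[2 * i] = (w, h)
        arr[2 * i + 1] = (h, w)
    return arr

# Two unplaced tiles out of three: the last two rows remain the (0, 0) flag.
print(tiles_to_input([(3, 1), (2, 2)], n_tiles=3))
```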

V. DATASET GENERATION

The dataset of complete problems is created by implementing the guillotine problem generation algorithm described in a study by Igor Pejic and Daan van den Berg [67]. The result of this generation algorithm is a frame, a set of tiles and one of the possible solutions to the problem. We open-source the guillotine dataset generation implementation in an online repository [63]. As we have described in Section III, the NN will not solve these complete problems, but rather the set of subtasks generated from them. Therefore, to obtain training instances, the solution of the complete problem is decomposed into subtasks. In the remaining text, 'instance' always refers to a subtask and not to the complete PRP problem. The dataset to train the NN is composed of these instances.

The decomposition of complete problems into subtasks is done by 'backpeeling' the complete problem solution, i.e. taking out one tile at a time from the final solution until the frame-state is completely empty. The tiles are taken out in top-right order, as it is the reverse of the BL insertion which will be learnt by the NN. The backpeeling procedure applied to a solved problem with twenty placed tiles generates twenty individual training instances. Complete problem solutions with different tiles may produce the same individual instances because, as has been said, the configuration of individual tiles inside the frame-state is transformed into one combined block representing the occupied positions. This equality between instances is more likely to happen at the beginning of the backpeeling process because at that stage there are fewer unplaced tiles (which need to be equal between instances). Therefore, we explicitly remove instances from the validation set that are also in the training set, to always validate the NN performance on truly unseen data.

In the experiments, we construct the training dataset as follows. First, we generate a thousand complete problems using the guillotine generation procedure. Using backpeeling, we transform these complete problems into 1000 · 20 = 20,000 subtask instances. Then, we explicitly remove instances with trivial solutions, i.e. those with only one possible legal tile placement, from our dataset. After that, we augment the set of instances by randomly shuffling the tiles twenty times (for why this is possible, see Subsection VI-A). This produces twenty new instances out of one, bringing the final number to 20,000 · 20 shuffles = 400,000 instances. The validation set is constructed in the same manner but without the data augmentation step. As we generate the datasets separately, we also perform duplication checks between the datasets and remove duplicates from the validation set. Finally, we eliminate the duplicates which occur inside the individual sets, as they could introduce bias. During training, the training set is split into batches of 128 instances.
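The backpeeling and shuffle-augmentation steps can be sketched roughly as below. The data layout (`solution` as a list of ((w, h), (x, y)) placements in BL insertion order) is our assumption for illustration, not the published implementation:

```python
import random

def backpeel(solution):
    """Turn one solved problem into per-tile training instances.

    `solution` is the list of tile placements in BL insertion order;
    element i is ((w, h), (x, y)).  Instance i pairs the frame-state
    after the first i placements with the tile placed at step i.
    """
    instances = []
    for i in range(len(solution)):
        placed, (target_tile, _) = solution[:i], solution[i]
        remaining = [t for t, _ in solution[i:]]
        instances.append((placed, remaining, target_tile))
    return instances

def augment(instance, n_shuffles=20, seed=0):
    """Create shuffled copies of one instance (tile order is irrelevant)."""
    placed, remaining, target = instance
    rng = random.Random(seed)
    out = []
    for _ in range(n_shuffles):
        perm = remaining[:]
        rng.shuffle(perm)
        out.append((placed, perm, target))
    return out
```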

VI. NN ARCHITECTURE

A. Permutation invariance

Since the order of tiles in the instance does not influence the end result, the NN has to learn to ignore this order and focus only on the contents of the given array of tiles. This property is called permutation invariance. A work by Nicholas Guttenberg et al. shows that permutation invariance can be embedded at the NN architecture level by combining the input pairs into a dense layer whose weights are shared among all the pairs [31]. The outputs of the dense layer are then forwarded to a pooling layer. In this way, on top of permutation invariance, interactions between input components can be captured.
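The shared-weights-plus-pooling idea can be illustrated with a tiny NumPy sketch (random weights, purely illustrative): the same dense layer is applied to every tile, and max pooling over tiles discards the ordering:

```python
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(2, 8))           # dense weights shared across all tiles

def embed(tiles):
    """Permutation-invariant embedding: shared dense layer + max pooling."""
    h = np.maximum(tiles @ W, 0.0)    # same weights applied to every tile
    return h.max(axis=0)              # pooling over tiles removes the order

tiles = np.array([[3., 1.], [2., 2.], [1., 4.]])
shuffled = tiles[[2, 0, 1]]
assert np.allclose(embed(tiles), embed(shuffled))  # order does not matter
```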

Permutation invariance can also be accomplished using data augmentation, a technique which uses data-suitable transformations to increase the number of instances in the training set. Training on such an augmented dataset makes the model more suitable for generalization. Recent works show that, when applied to convolutional neural networks, data augmentation leads to better generalization properties than methods such as dropout and weight decay [35]. In our experiments, we use data augmentation.

B. NN output

When considering the subtask decomposition described in Section V, two output forms for the NN are possible: one focused on the frame-state, and the other focused on the tiles. The first output considered, which we call "activation-heatmap", has the same shape as the frame-state, i.e. it is a 2D matrix with NN output probabilities on each element of the matrix (locus). The probabilities of the positions inside the matrix indicate the confidence of the NN that the next tile should be placed at that position of the frame-state. The idea takes strong inspiration from the fully convolutional network architecture first described in a study by Evan Shelhamer et al. for the computer vision task of semantic segmentation [73]. Applied to our subtask, the target output is a matrix with all elements 0 except for the loci in which the next tile should be placed, which have a value of 1. The output is then an indication of the most important region in the frame to be filled, considering the currently available tiles. However, the output defines only the region in the frame, and not which tile should be placed next. The decision on which tile should be placed next given such an output requires an additional decision step, made by analyzing the output of the NN and the current frame and tiles. One possible strategy is to choose the tile which covers the predicted locus area most efficiently (the highest average prediction per covered locus). Another strategy would be to always choose the tile which maximizes the total sum of probabilities covered by the tile. Using the activation-heatmap output has the advantage that the output of the network immediately points to the most critical areas of the frame to be filled next.

To address the drawback of the tile choice in the activation-heatmap output, we propose an output that eliminates the need for the tile choice in the last step. In this output, which we call "tile-choice", the prediction of the NN is an N-dimensional vector, where N is the length of the tiles vector given at the input. Each element in this vector is a floating-point number ranging from 0 to 1, representing the confidence of the network that the tile at that index should be placed next. The target output vector consists of the value 1 on the index which should be predicted, and 0 on all other vector indexes. Because the problem instances might contain duplicate and square tiles, multiple values in the target vector may have the value 1. In such a case, placement of any tile indexed with a value of 1 is correct, as they all have the same width and height dimensions. In our final experiments, we have used the tile-choice output as its loss function is more directly linked to the subtask.
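Building the tile-choice target vector, including the duplicate-tile case, might look like this (a hedged sketch; `target_vector` is our own helper name, not from the thesis code):

```python
import numpy as np

def target_vector(tiles, chosen):
    """Build the tile-choice target: 1 for every tile whose dimensions
    equal the chosen tile's (duplicates are all marked correct), 0
    elsewhere."""
    return np.array([1.0 if tuple(t) == tuple(chosen) else 0.0
                     for t in tiles])

tiles = [(3, 1), (1, 3), (2, 2), (2, 2)]  # includes a duplicated square tile
print(target_vector(tiles, (2, 2)))       # [0. 0. 1. 1.]
```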

C. Loss function

The loss function of a NN has to be defined when backpropagation is used as the optimization procedure. The previously described forms of output differ in what they predict, and therefore the choice of their loss functions is discussed separately. Firstly, for the activation-heatmap output, the loss function needs to represent the similarity of the predicted area to the target area of the frame-state. To represent this, we use the binary cross-entropy loss defined as:

L = −(1/N) Σ_{i=1}^{N} [ y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i) ]    (1)

where N is the number of loci in the output frame, y_i is the true output and ŷ_i is the predicted value. In this type of output, identical to the one in the image segmentation task, a problem can occur during training if the target area to be predicted is always significantly smaller than the total area. This phenomenon is called class imbalance and can be addressed using loss functions which give higher importance to correct predictions, such as the weighted cross-entropy function or the Dice loss function [77]. In our initial experiments, the usage of the loss functions addressing class imbalance did not show any improvement to the overall results.

For the tile-choice output form, the same binary cross-entropy loss given in Equation 1 is used. The only exception is that N now represents the total number of tiles given at the input and y_i represents the prediction that the tile at that position in the input vector should be placed next at the BL insertion point.
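Equation 1 can be checked numerically with a few lines of NumPy; the `eps` clipping is a standard numerical guard against log(0), not something stated in the text:

```python
import numpy as np

def bce(y_true, y_pred, eps=1e-7):
    """Binary cross-entropy of Equation 1, averaged over the N outputs."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return -np.mean(y_true * np.log(y_pred)
                    + (1.0 - y_true) * np.log(1.0 - y_pred))

y_true = np.array([1.0, 0.0, 0.0])
y_pred = np.array([0.9, 0.2, 0.1])
print(round(bce(y_true, y_pred), 4))  # 0.1446
```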

VII. NN IMPLEMENTATION

The final implementation of the NN follows the scalar-based input (Section IV) and the tile-choice output form (Section VI-B), i.e. the form in which the output is the index of the predicted tile. Figure 2 shows the conceptual architecture of the NN. As can be seen, the NN has two input heads. The frame-state input is processed by 2D convolutional layers, global max pooling and a dense layer. This setup allows the network to receive frames of different sizes, as all sizes are eventually reduced to the fixed number of units defined in the dense layer. The output of the dense layer is flattened and concatenated with the processed input from the tiles-state head.

The tiles-state head consists of residual blocks ending with dense layers. Residual blocks were introduced in a study by Kaiming He et al. [34] as a very efficient way to stack layers and increase the accuracy of a network without encountering the vanishing gradients problem. The vanishing gradients problem is diminished by utilizing skip connections which connect the end and the beginning of the residual block using a connection which skips the intermediary layers. The inner components of the residual block can be observed in the same Figure 2 inside the greyed-out box. After the residual blocks, two dense layers are added, ending with a dropout layer, and finally concatenated with the other head. Dropout layers [76] used in both heads help avoid overfitting and increase the accuracy on the validation set by randomly ignoring a part of the neural network during training and backpropagation which helps the network to learn more robust features.
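To make the dual-input idea concrete, here is a toy NumPy forward pass. It is not the actual Keras model; all layer widths and weights are arbitrary, and the residual blocks are omitted. It shows how global max pooling lets frames of different sizes produce a fixed-length vector that is concatenated with per-tile embeddings:

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical layer sizes; the real model's widths are not reproduced here.
K = rng.normal(size=(3, 3, 4))    # four 3x3 convolution filters
Wt = rng.normal(size=(2, 8))      # shared dense layer of the tiles head
Wo = rng.normal(size=(4 + 8, 1))  # output layer after concatenation

def conv_gmp(frame):
    """Valid 2D convolution followed by global max pooling: frames of any
    size (at least 3x3) collapse to one fixed-length feature vector."""
    H, W = frame.shape
    feats = np.empty(K.shape[2])
    for f in range(K.shape[2]):
        acts = [(frame[i:i + 3, j:j + 3] * K[:, :, f]).sum()
                for i in range(H - 2) for j in range(W - 2)]
        feats[f] = max(acts)
    return feats

def forward(frame, tiles):
    """Dual-input forward pass: frame head and tiles head, concatenated,
    yielding one sigmoid confidence per candidate tile."""
    frame_vec = conv_gmp(frame)
    tile_emb = np.maximum(tiles @ Wt, 0.0)  # shared per-tile dense layer
    scores = [1.0 / (1.0 + np.exp(-(np.concatenate([frame_vec, emb]) @ Wo)[0]))
              for emb in tile_emb]
    return np.array(scores)

out5 = forward(np.zeros((5, 5)), np.array([[3., 1.], [2., 2.]]))
out9 = forward(np.zeros((9, 7)), np.array([[3., 1.], [2., 2.]]))
assert out5.shape == out9.shape == (2,)  # frame size leaves output shape fixed
```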


Figure 2: Conceptual graph of the NN model. The frame-state and tiles-state inputs are processed in different input heads. For brevity, we show only one out of five residual blocks in the greyed box.

The NN is trained for five epochs with batch sizes of 128 training instances. For optimization, the Adam algorithm [46] is used with a learning rate of 0.001. The training is performed on a CUDA-enabled NVIDIA GeForce GTX 1080 Ti, which takes approximately 0.02 seconds of wall-clock time per training batch. The code of the model implemented in Keras and Tensorflow can be found in the published online repository [66].

VIII. BASELINES

There are many PRP benchmarks used to compare algorithms based on the percentage of final solutions found for the complete problems. However, for the subtask of choosing the correct tile to place next, we have found no benchmarks in the existing literature. Therefore, we introduce simple baseline benchmarks obtained by using heuristics.

The simplest baseline is the random baseline. It is obtained by randomly choosing the index of the next tile to be placed. If that tile's dimensions and orientation coincide with the correct solution (obtained by the backpeeling procedure), the guess is marked as correct. The final baseline score is the percentage of tiles which were correctly placed out of all the guessed actions. The subtasks analyzed originate from any stage of the backpeeling process as long as they fit the requirements discussed in Section V. Next to the random one, we construct four more baselines which take inspiration from the different heuristics used in rectangle packing solvers and base the decision on the dimensions of the tiles. Three of them, maxArea, maxHeight and maxWidth, always choose the tile with either the biggest area or the biggest corresponding rectangle side1. Finally, we add the minArea baseline, which always chooses the tile with the smallest area. The value of including these baselines in the analysis is twofold. Firstly, they provide more insights into the properties of the dataset, i.e. the final decomposed solutions. Secondly, these baselines have different accuracies than the random baseline depending on the problem instance.
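Two of these baselines might be implemented as follows. The tie-breaking rules follow the footnote (ties in maxWidth broken by the larger total area, and the first tile taken among equal-area candidates), and the (w, h) tuple convention with (0, 0) masking is carried over from Section IV; the function names are ours:

```python
def max_area(tiles):
    """maxArea baseline: pick the unplaced tile with the largest area;
    (0, 0) entries mark tiles that are already placed.  Python's max()
    returns the first maximal element, matching the deterministic
    first-tile tie-break."""
    candidates = [t for t in tiles if t != (0, 0)]
    return max(candidates, key=lambda t: t[0] * t[1])

def max_width(tiles):
    """maxWidth baseline; ties on width are broken by the larger area."""
    candidates = [t for t in tiles if t != (0, 0)]
    return max(candidates, key=lambda t: (t[0], t[0] * t[1]))

tiles = [(3, 1), (2, 2), (0, 0), (1, 4)]
print(max_area(tiles), max_width(tiles))  # (2, 2) (3, 1)
```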

IX. RESULTS

The obtained results consist of the accuracies of the NN and the baselines on datasets generated by PRP problems of different properties. The accuracy is the percentage of correctly predicted tile dimensions to be placed given the current state.

Figure 3 shows the validation accuracy of the network trained on 342,537 instances with twenty tiles and a frame size of (25, 25). The training dataset was constructed from a thousand backpeeled complete problem solutions with twenty tile shufflings each. The validation set is composed of a hundred backpeeled complete problem solutions, which resulted in 1847 validation instances. The figure shows that the NN accuracy outperforms all the analyzed baselines. It can also be seen that the baselines choosing tiles based on a maximum property outperform the random baseline. The left graph shows that the NN outperforms the baselines immediately after just one epoch of training, which means that the majority of the learning happens during the first epoch. To analyze this, we can observe the right side of Figure 3, which shows the validation accuracy after every batch of learning. The figure displays the validation accuracy for the first epoch, i.e. 2677 batches. It illustrates how the NN performance starts below the random baseline and quickly outperforms the baselines after a short period of learning. It also shows a slowing down of the learning improvements towards the end of the first epoch. Table I shows the results on validation sets of different problem sets. The dataset labels are shown in the column titles and are of the format {number of tiles}@{frame width}x{frame height}.

1In maxHeight and maxWidth, ties are broken by selecting the tile with the biggest total area. In the case of multiple tiles with the same area, and in the maxArea baseline, the first tile is deterministically chosen, as the index of the tile does not affect the accuracy; only the tile dimensions do.


Figure 3: The NN accuracy overtakes all the baseline accuracies after seeing ≈ 750 batches of training instances and eventually accomplishes accuracies which are 10 percentage points better than the best baseline. The left graph shows the validation accuracy after every training epoch and the right graph shows the accuracy after every training batch (batch = 128 instances). The validation accuracies are obtained on a set of 1847 instances with frame dimension (25, 25) and 20 tiles. The training set consists of 342,537 instances.

Table I: Comparison between accuracies of the NN and the baselines. Column labels show the validation dataset, row labels show which dataset the NN was trained on.

              20@20x20  20@25x25  20@30x20  20@20x30  20@30x30
NN T-20@20x20    39.91     32.54     36.99     28.24     31.37
NN T-20@25x25    36.99     40.81     37.65     33.06     35.95
NN T-20@30x20    39.19     33.45     41.37     29.44     32.46
NN T-20@20x30    35.40     36.81     35.84     37.93     36.87
NN T-20@30x30    33.64     38.61     38.14     35.66     40.19
random           18.64     18.54     18.69     15.42     16.06
maxArea          31.94     25.18     35.46     25.27     25.01
maxHeight        28.97     21.81     28.96     25        21.35
maxWidth         33.81     31.02     37.54     30.46     33.24
minArea           0.32      0.54      0.11      0.54      0.21

Table II: Comparison between accuracies of the NN and baselines for a varying number of tiles.

           10@25x25  15@25x25  30@25x25  50@25x25
NN            52.91     46.81     34.82     30.10
random        19.37     15.77     16.48     19.09
maxArea       29.49     30.13     21.75     16.18
maxHeight     23.04     25.92     18.87     14.57
maxWidth      36.21     37.41     27.72     21.47
minArea        0.89      0.53      0.49      0.41

The training dataset used to train the NN is indicated in the name of the NN in the first column. Because the network can process problems with varying frame sizes, we also test the networks on sets different from the ones on which they have been trained. As expected, on all sets, the best-performing NN is the one trained on the training set with the same properties. However, the table also shows that, for multiple datasets, NNs trained on other datasets still outperform all the deterministic baselines.

Table II shows the results of the NN on PRP problems with a varying number of tiles. Here the NNs are trained on a training dataset with the same dimensions as the validation dataset. The table shows that the network can handle inputs with a bigger number of tiles and, again, beat all the baselines. As expected, the accuracy drops when increasing the number of tiles.

X. CONCLUSION, DISCUSSIONS AND FUTURE WORK

In conclusion, we introduced a NN architecture to solve PRP problems by decomposing them into Markovian subtasks. The results show that after a relatively fast learning process, the NN beats the other analyzed deterministic baselines. Importantly, the results show that the NN can deal with problems with a varying number of tiles and varying frame sizes.

The main reason for using baselines for comparison to the NN performance is that there are no public benchmarks for this subtask. However, the comparison with baselines can be considered fair because of the nature of the problem. For instance, consider the case with an empty frame, and all tiles unplaced. In this situation, the network has to select the correct tile according to one possible solution, i.e. such a tile placement which is guaranteed to lead to a solution. The expectation that a NN or any other algorithm could solve such a subtask with 100% accuracy is unrealistic as it would mean that the NN has learnt to ’perfectly’ solve an NP-complete problem.

The selection of the baselines used is motivated by heuristics used in the tile selection step of published work on PRP problems. Eric Huang and Richard Korf state: "Our variable order is based on the observation that placing rectangles of larger area is more constraining than placing those of smaller area" [41]. Although the algorithm and task discussed in their work are different from ours, this intuition can also be observed in our results, where one can see that the baselines which choose a maximum value consistently beat the random baseline.

We hypothesize the reason behind the different performances of the baselines by analyzing the example shown in Figure 1. In this example, we see one problem instance in which seventeen tiles were already placed in the frame, and three tiles are still unplaced. The bottom-left insertion point in which the next tile will be placed is indicated by the arrow. Tiles A and B can be placed in the corner only with the shown orientation, while tile C can be placed in both the horizontal and vertical orientation. If tile A is placed, it constrains the leftover space such that any subsequent placement of B and C will certainly lead to a solution. Repeating the same reasoning for tiles B and C, it is easy to see that the placement of tile A is preferred, as it enforces that a solution is found in the following steps, while B and C may lead to a frame-state in which no solution can be reached. Tile A would be chosen by maxArea, maxHeight and maxWidth, tile C would be chosen by minArea, and any of the four possible legal actions (A horizontally, B horizontally, and C horizontally and vertically) could be chosen by the random baseline. As the density of the possible solutions which the actions can produce varies, it is evident that the baselines will have different scores based on the problem's properties.

The careful reader will notice that something is left unsaid about the properties of the analyzed problem set. The problems are decomposed by backpeeling randomly generated guillotinable problem solutions. However, it is very probable that the analyzed solution is not the only possible solution to the problem. Therefore, it is probable that, given a state, there is more than one correct placement of a tile which leads to a solution. Finding all solutions of a problem means that the whole solution space would have to be exhaustively searched first, producing multiple solutions which would then be backpeeled into individual instances. This exhaustive search would have to be done separately before training and is a time-intensive process which we did not explore. However, it is the only way to give an exact answer to the question “What are all the tile placements in this state that lead to at least one solution?”, and it could be a direction for future work. It would also be useful to modify an existing rectangle packing solver which utilizes one of the baseline heuristics, replace the heuristic with the NN prediction, and analyze how much this helps the search process. As the NN allows the learning of any arbitrary strategy which reduces the defined loss function, it could also be combined with a search process such as Monte Carlo tree search and trained in a reinforcement learning manner.
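One way such an integration could look is to make the solver's action selection a pluggable policy, so that a baseline heuristic and an NN score share one interface. `greedy_solve` and the state/action encodings below are hypothetical, not the solvers used in this work.

```python
def greedy_solve(state, legal_actions_fn, apply_fn, policy):
    """Repeatedly apply the policy's top-ranked action until none remain."""
    while True:
        actions = legal_actions_fn(state)
        if not actions:
            return state  # solved if nothing is left to place, else a dead end
        best = max(actions, key=lambda a: policy(state, a))
        state = apply_fn(state, best)

# A heuristic policy and an NN policy share the same signature, so swapping
# one for the other does not touch the search loop:
heuristic_policy = lambda state, a: a[1] * a[2]          # e.g. maxArea
# nn_policy = lambda state, a: model(encode(state, a))   # NN placement score

# Toy run: the "state" is simply the list of unplaced (label, w, h) tiles.
leftover = greedy_solve(
    [("A", 4, 5), ("B", 2, 3), ("C", 1, 1)],
    lambda s: s,                            # every tile is placeable
    lambda s, a: [t for t in s if t != a],  # placing a tile removes it
    heuristic_policy,
)
print(leftover)  # [] -- every tile was placed
```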


Monte Carlo Tree Search for Eternity II

Igor Pejic

Faculty of Science, University of Amsterdam

The Netherlands

igorpejicw@gmail.com

“I want it, how much does it cost?” he asked. “A lifetime”, Scuttlebutte responded.

– Comments on an Eternity II article [19]

Abstract—Eternity II is an edge-matching puzzle that has remained unsolved for thirteen years. In this work we approach solving it using Monte Carlo tree search (MCTS). The experiments are performed on a dataset of smaller Eternity-II-like puzzles with similarly uniform colour distributions across pieces. The results are compared to a backtracking solver we design. We explain the speed optimizations which apply to both solvers and which make the backtracker’s speed comparable to state-of-the-art solvers. The results show that MCTS is not suitable for solving these puzzles, as it manages to solve only puzzles of the smallest size tested (4 × 4). The analysis shows that increasing the number of simulations does not help the search procedure as much as expected, and we discuss why this could be. We open-source both solvers and the dataset used.

I. INTRODUCTION

In October 2000, two mathematicians from Cambridge became half-millionaires by solving a hard mathematical problem [62]. However, the problem was not one of the famous Millennium Prize Problems, as one might expect, but a seemingly simple and easily understandable polyform packing puzzle called “Eternity I” that could be bought in toy stores. The designer of the puzzle, Christopher Monckton, had provided part of the prize fund and was surprised when the puzzle was solved faster than he expected. It was solved by exploiting a weakness in the original puzzle design: the winners analyzed the “tileability” of individual pieces, which allowed them to prune the search branches that could not possibly lead to a solution and thereby shorten the search process.

Seven years later, learning from the flaws of Eternity I, Monckton designed “Eternity II” (E-II) and was very confident in the improbability of it being solved: “Our calculations are that if you used the world’s most powerful computer and let it run from now until the projected end of the universe, it might not stumble across one of the solutions” [27]. In July 2007, Christopher Monckton and TOMY UK Ltd. released the E-II puzzle accompanied by a new prize fund, almost double the amount of the first one. The alluring award again drew many enthusiasts and researchers into buying the puzzle and spending hours of thinking and computing power trying to solve it. However, the two-year deadline to claim the prize elapsed without a correct submission, and no solution has been found to this day. The complexity of the puzzle has turned out to be much higher than that of the original puzzle, and even though some heuristics perform better than random search, the remaining search space still poses an insurmountable problem for modern computer architectures.

General edge-matching puzzles (GEMP) are puzzles which consist of a rectangular frame and a set of pieces which are to be placed inside the frame such that each piece’s borders match the pattern (referred to as ’colour’ in the remainder of the text) of the neighbouring pieces. An example of an edge-matching puzzle can be seen in Figure 1. Erik and Martin Demaine have proven that edge-matching, jigsaw and polyomino packing puzzles can all be converted into each other, and that they are NP-complete [18]. The most popular GEMP from which E-II could have taken inspiration is TetraVex, known for its inclusion in the Windows 3.x operating system in the early 1990s. TetraVex was also separately proven to be NP-complete in an extensive proof given by Takenaga et al. [78]. TetraVex is meant as a leisure game, with an average game lasting no more than a few hours. With that in mind, it is clear that several adaptations were needed to transform such a game into one of the hardest human-designed unsolved puzzles. Firstly, the complexity is increased by raising the number of pieces to be placed: TetraVex game modes consisted of problems ranging from 4 to 36 pieces, whereas E-II uses 256 pieces. Secondly, TetraVex did not allow rotation of the pieces, i.e. the pieces were given in the correct orientation of the solution. In E-II, the correct orientation of a piece is not known and could be any of the four possible orientations. Ebbesen et al. prove that adding rotation to a one-row puzzle-matching problem transforms it from a tractable P problem into an NP-complete problem [23]¹. Furthermore, E-II is a Framed GEMP (GEMP-F), meaning that there is a unique colour which is allowed only on the pieces placed on the border of the frame. This constraint allows for adjusting the colour ratios between the two piece classes (’frame’ and ’inner’ pieces), which can be used to tune the problem ’hardness’. Finally, the hardness can be controlled by the total number of colours used, the ratio of colours in the inner versus the frame and corner pieces, and the frequency distribution of the colours across pieces. Ansótegui et al. show that for different frame sizes, the hardness with respect to the number of colours peaks at a certain value [3]. This value is called the “phase transition” and is observed

¹The authors use the terms NP-hard and NP-complete interchangeably. Even though in the text they mention it is NP-hard, in the final table they mark it as NP-complete, which we deem more appropriate.
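The matching constraint that defines a GEMP-F can be made concrete with a small sketch. We assume here that a piece is encoded as a tuple of four edge colours (top, right, bottom, left), that rotation cycles this tuple, and that colour 0 is the unique frame colour; these conventions are illustrative and not necessarily E-II’s actual encoding.

```python
def rotate(piece, k):
    """Rotate a (top, right, bottom, left) piece k quarter-turns clockwise."""
    k = k % 4
    return piece[4 - k:] + piece[:4 - k]

def fits(piece, top=None, right=None, bottom=None, left=None):
    """Check a placement: each argument is a required edge colour, or None
    when the neighbouring cell is still empty (unconstrained)."""
    wanted = (top, right, bottom, left)
    return all(w is None or e == w for e, w in zip(piece, wanted))

# A top-left corner cell demands the frame colour 0 on its top and left edges.
corner = (0, 3, 5, 0)
print(fits(corner, top=0, left=0))             # True: correct orientation
print(fits(rotate(corner, 1), top=0, left=0))  # False: rotated out of the corner
```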
