
Analysis of metaheuristic optimization techniques for line scheduling of finishing lines

Thibaut Vermeulen
Student number: 01313447

Supervisors: Dr. Jan Goedgebeur, Dr. Nicolas Van Cleemput
Counsellor: Mr. Ward Bijttebier (ArcelorMittal Ghent)

Master's dissertation submitted in order to obtain the academic degree of Master of Science in Computer Science Engineering

Academic year 2019-2020


Preface

Hundreds of topics were available for this year's master's dissertations, yet this topic immediately caught my attention. The industrial setting and the sheer complexity of the problem challenged me to tackle it head-on. The opportunity to gain experience in an enterprise like ArcelorMittal is priceless.

I would like to thank Ward Bijttebier for the excellent support and advice; the experience I gained is invaluable. Hopefully, the findings of this thesis can one day contribute to the day-to-day operation of ArcelorMittal.

Last but not least, I want to thank my supervisors Jan Goedgebeur and Nicolas Van Cleemput. Their contributions lifted this thesis to a level that would otherwise have been unreachable. Their unconditional dedication to helping is something I cannot appreciate enough.

Permission for use of content

The author gives permission to make this master's dissertation available for consultation and to copy parts of this master's dissertation for personal use. In all other cases of use, the copyright terms have to be respected, in particular with regard to the obligation to state explicitly the source when quoting results from this master's dissertation.


Analysis of metaheuristic optimization techniques for line scheduling of finishing lines

Thibaut Vermeulen

Supervisors: Dr. Jan Goedgebeur, Dr. Nicolas Van Cleemput

Counsellor: Mr. Ward Bijttebier (ArcelorMittal Ghent)

Abstract— This master's dissertation analyzed a variety of heuristics to tackle several scheduling problems in the production process of ArcelorMittal. The immense complexity of this scheduling requires advanced heuristics in order to achieve the lowest possible costs. Multiple local search heuristics were developed for two different lines at ArcelorMittal. These perform a variety of transformations on an initial solution to explore the solution space. Additionally, the problem was formulated as an Integer Linear Program in an attempt to obtain the optimal solution.

I. INTRODUCTION

ArcelorMittal is the biggest steel manufacturing corporation worldwide, with an annual steel production of 92.5 million metric tonnes (2018). They start from raw materials and deliver flat steel products to their customers. This process consists of a large number of treatments and manipulations of steel products.

II. PROBLEM DESCRIPTION

The goal of this master’s dissertation is to optimize the selection of previously produced steel products and the scheduling of these products for the next step in the production process. An optimal schedule could reduce the costs of the production process. Unfortunately, it is infeasible to try all possible schedules, especially as the goal is to use the program interactively. This will limit the allowed running time. Finding good solutions in a reasonable amount of time will require efficient heuristics.

A. Costs

An optimal solution will have the lowest possible costs. These costs depend on several factors, and are divided into three large groups: transition costs, selection costs and global costs.

1) Transition Costs: When scheduling a large number of steel products to be processed, a specific cost is associated with the transition from one product to another. These costs are called transition costs. They depend on various physical properties as well as the finishing requirements. Nodes that have the same physical properties, or require the same finishing treatment, will have a low transition cost.

2) Selection Costs: Not all products in the inventory can be scheduled at the same time. Therefore it is required to make a selection of the products that will be scheduled first. This is why selection costs were introduced. Products that have a deadline in the near future will have a very low selection cost. Other products that are less urgent will have a larger selection cost, to give priority to the urgent products.

3) Global Costs: It is not always possible to calculate the cost of a solution by looking only at the individual products and their neighbors. Some costs depend on multiple products, or even the whole schedule. The costs that are based on an extensive number of nodes in the schedule are called global costs.

B. Subproblems

This problem consists of two different subproblems. The first problem is the selection of the products. There is a large inventory of steel products, but not all products need to be scheduled right away. The second problem is that when a set of products is selected, they need to be scheduled as efficiently as possible. Unfortunately, it is impossible to solve these problems separately: one cannot find the best subset of products if the best order is unknown, and one cannot find the best order of products without knowing which products will be selected. When searching for a good solution, both subproblems have to be evaluated at the same time.

C. Complexity

As described before, there are two major subproblems that cannot be separated. The total number of possible solutions can be obtained by multiplying the number of possible solutions for each subproblem. This results in the following equation [1], where n is the number of items in the inventory:

$$\sum_{0 \le k \le n} \frac{n!}{(n-k)!}$$

It is clear that the number of possible solutions increases incredibly fast for an increasing number of steel products. A realistic example has 500 products in stock. For this value of n, the number of possible solutions is $3.31 \cdot 10^{1134}$.


Fig. 1: HSM Width example

III. ARCELORMITTAL LINES

The heuristics that were developed in this research were tested on two real lines from ArcelorMittal. The two lines are quite different from each other, which makes them ideal test cases for the heuristics.

A. COMBI Line

The first line was a COMBI line from ArcelorMittal Poland. This is a galvanisation and painting line in series. This line was chosen because of the good cost representation of the transition and selection costs. There is no requirement for global costs to achieve a good solution.

B. Hot Strip Mill (HSM)

The second line was a Hot Strip Mill (HSM) line from ArcelorMittal Ghent. This line takes large pieces of hot steel, and passes them through pairs of rolls to reduce the thickness in order to produce a long coil of sheet metal. This is a complex line that requires global costs to correctly evaluate schedules.

1) HSM global costs: This research focused on two important global costs: width run-down and width run-up costs. In the run-up phase of the schedule, it is important that there are no big increments in the width of a product. In the run-down phase, the width of the nodes should be decreasing. A violation of these rules will result in a large global cost. Figure 1 shows the widths for all products in a schedule that was generated by ArcelorMittal. The run-up and run-down phases are clearly visible.

IV. IMPLEMENTED HEURISTICS

The majority of the developed algorithms employed Local Search heuristics. Local Search heuristics use local changes to move from one solution to another. They were chosen because they can explore a large number of possible solutions in a limited time.

A. Local Search Heuristics

The following four Local Search heuristics were implemented and evaluated:

• Random Descent
• Tabu Search [2][3]
• Reduced Variable Neighborhood Search [4]
• Parallel Tempering

B. Integer Linear Programming (ILP)

In an attempt to achieve the optimal solution, the problem was formulated as an Integer Linear Program. The formulation was based on the Dantzig–Fulkerson–Johnson (DFJ) formulation [5] for the Travelling Salesman Problem, with a few adjustments. Contrary to the Travelling Salesman Problem, we allow solutions that do not visit all nodes.

V. CONCLUSION

A variety of Local Search heuristics were implemented to improve the solutions generated by the current algorithm from ArcelorMittal. The performance of these heuristics was investigated on two different production lines from ArcelorMittal. The first line only contains transition and selection costs, while the second line also contains global costs. We showed that it is still possible to improve upon the currently used algorithms. However, there was no clear-cut winner that always performs better.

REFERENCES

[1] P. Vansteenwegen and A. Gunawan, "State-of-the-art solution techniques for PTP and PCTSP," in Orienteering Problems, pp. 33–40, Springer, 2019.

[2] F. Glover, "Tabu search—part I," ORSA Journal on Computing, vol. 1, no. 3, pp. 190–206, 1989.

[3] F. Glover, "Tabu search—part II," ORSA Journal on Computing, vol. 2, no. 1, pp. 4–32, 1990.

[4] N. Mladenović, J. Petrović, V. Kovačević-Vujčić, and M. Čangalović, "Solving spread spectrum radar polyphase code design problem by tabu search and variable neighbourhood search," European Journal of Operational Research, vol. 151, no. 2, pp. 389–399, 2003.

[5] G. Dantzig, R. Fulkerson, and S. Johnson, "Solution of a large-scale traveling-salesman problem," Journal of the Operations Research Society of America, vol. 2, no. 4, pp. 393–410, 1954.


Contents

1 Introduction
  1.1 Context
  1.2 Problem Description
2 Problem Analysis
  2.1 Costs
    2.1.1 Transition Costs
    2.1.2 Selection Costs
    2.1.3 Global Costs
  2.2 Subproblems
    2.2.1 Subset Selection
    2.2.2 Subset Scheduling
  2.3 Problem Complexity
  2.4 Problem Conversion
    2.4.1 Cyclic
    2.4.2 Prize Collecting Travelling Salesman Problem
    2.4.3 Profitable Tour Problem
    2.4.4 Conversion from PTP to ATSP
    2.4.5 Uses of the PTP
3 Data
  3.1 Artificial Data
  3.2 COMBI Line
  3.3 Hot Strip Mill (HSM) Ghent
    3.3.1 Global Costs
    3.3.2 Segments
    3.3.3 Constraints
    3.3.4 Datasets
4 Heuristics
  4.1 Local Search Metaheuristics
    4.1.1 JAMES Framework Implementation
    4.1.2 Problem specification
    4.1.3 Solution structure
    4.1.4 Neighborhood functions
    4.1.5 Random Descent
    4.1.6 Steepest Descent
    4.1.7 Tabu Search
    4.1.8 Reduced Variable Neighborhood Search
    4.1.9 Metropolis Search
    4.1.10 Parallel Tempering
  4.2 Integer Linear Programming (ILP)
    4.2.1 ILP Solving Method
    4.2.2 Formulation
    4.2.3 Implementation
  4.3 Clustering
    4.3.1 Cluster Properties
    4.3.2 Clustering Implementation
  4.4 Other heuristics
5 Implementation Details: a case study of the HSM
  5.1 Introduction
  5.2 Solution representation
  5.3 Solution Generation
  5.4 Cost calculation
    5.4.1 Run-up costs
    5.4.2 Run-down costs
  5.5 Search optimization
6 Results
  6.1 Artificial Data
    6.1.1 Random Descent
    6.1.2 Tabu Search
    6.1.3 Reduced Variable Neighborhood Search
    6.1.4 Parallel Tempering
    6.1.5 Integer Linear Programming (ILP)
    6.1.6 Artificial datasets results overview
  6.2 COMBI Line Results
    6.2.1 Clustering
    6.2.2 Random Descent
    6.2.3 Reduced Variable Neighborhood Search (RVNS)
    6.2.4 Parallel Tempering
    6.2.5 Integer Linear Programming (ILP)
    6.2.6 Overview
  6.3 Hot Strip Mill Results
    6.3.1 Choice of heuristic
    6.3.2 HSM results without global costs
7 Conclusion
Bibliography


List of Tables

3.1 Artificial Datasets
3.2 COMBI Line Datasets
3.3 HSM Line Datasets
6.1 Random Descent on Artificial Datasets
6.2 Tabu Search on Artificial Datasets
6.3 RVNS on Artificial Datasets
6.4 Parallel Tempering on Artificial Datasets
6.5 ILP on Artificial Datasets
6.6 Artificial Datasets results overview
6.7 Clustering results on COMBI Line Datasets
6.8 Random Descent on COMBI Line Datasets
6.9 Random Descent on COMBI Line Datasets with clustering
6.10 RVNS on COMBI Line Datasets
6.11 RVNS on COMBI Line Datasets with clustering
6.12 Parallel Tempering on COMBI Line Datasets
6.13 Parallel Tempering on COMBI Line Datasets with clustering
6.14 COMBI Line Results overview
6.15 Parallel Tempering on HSM Line Datasets without clustering
6.16 Parallel Tempering on HSM Line Datasets with clustering
6.17 Parallel Tempering on HSM Line Datasets without clustering with global costs
6.18 Parallel Tempering on HSM Line Datasets with clustering with global costs


List of Figures

1.1 Aerial image of ArcelorMittal Ghent
2.1 An example path
2.2 An example cyclic solution
2.3 PTP Conversion to ATSP
2.4 PTP Conversion to ATSP example
2.5 Weighted PTP Conversion to ATSP
3.1 A plot of the quadratic weight cost function
3.2 Node properties of scenariusz-1
3.3 HSM Slab
3.4 Width Run-down Example
3.5 HSM Width Example
4.1 JAMES Framework Architecture (1)
4.2 3-Change Move Example


List of Acronyms

AM ArcelorMittal.
ATSP Asymmetric Travelling Salesman Problem.
HSM Hot Strip Mill.
ILP Integer Linear Programming.
LP Linear Programming.
PCTSP Prize Collecting Travelling Salesman Problem.
PT Parallel Tempering.
PTP Profitable Tour Problem.
RVNS Reduced Variable Neighborhood Search.
TSP Travelling Salesman Problem.


Chapter 1

Introduction

1.1 Context

ArcelorMittal Ghent formulated the problem that was tackled in this master's dissertation. ArcelorMittal is the biggest steel manufacturing corporation worldwide, with an annual steel production of 92.5 million metric tonnes (2018). The site of ArcelorMittal Ghent is located at the Ghent–Terneuzen Canal.


1.2 Problem Description

The vast majority of products produced at ArcelorMittal Ghent are flat steel products. A significant number of these products are used in the automotive industry. The process starts with raw materials that are delivered by ship. A series of processes converts the raw materials into metal slabs. These metal slabs are then processed by the Hot Strip Mill, where the thick slabs are converted into rolls of sheet metal. These rolls are finally processed by the finishing lines to fulfill the specific needs of the customers.

The goal of this master's dissertation is to optimize the selection of previously produced steel products and the scheduling of these products for the next step in the production process. An initial scheduling happens at the HSM. Here, the goal is to schedule slabs as efficiently as possible, according to their physical properties. The result of this process is a roll of sheet metal. Later, these rolls are scheduled once more to enter the finishing lines. In this step, the schedule greatly depends on the finishing requirements.

Since the number of rolls that can be scheduled is large, it is virtually impossible to achieve optimal results when scheduling manually. Algorithms are used extensively to achieve a more efficient schedule.


Chapter 2

Problem Analysis

2.1 Costs

To find an optimal schedule, it is required that we can calculate the costs of a solution. These costs depend on several factors, and are divided into three large groups: transition costs, selection costs and global costs.

2.1.1 Transition Costs

When scheduling a large number of steel products to be processed, a specific cost is associated with the transition from one product to another. These costs are called transition costs. They depend on various physical properties as well as the finishing requirements. An important physical value is the width of a steel product, because it limits the next product that can be scheduled. For example, when a steel product enters the Hot Strip Mill (HSM), it will leave a mark on the rolls at the edges of the product. When a following steel product passes these marks, the marks on the rolls will cause an imperfection. Because of this risk, it is preferred to transition from the widest to the narrowest products. The narrower product fits between the edges of the previous products, eliminating the risk of problems. Some lines weld the scheduled products together to create a continuous flow of steel. However, products that differ too much cannot be welded together in a secure way. A crucial physical property for this is the thickness of the product. Products with similar thicknesses can be firmly welded.

Apart from the physical properties, the finishing requirements are also vital for a good schedule. Nodes requiring the same treatment on the line should be grouped together. If a certain layer of color is applied, we want those products to be bundled as much as possible. Another example is galvanization: it is preferred that nodes that need a zinc layer are grouped together in a schedule, making a continuous finishing process possible.

When a transition from one product to another product is physically impossible, the transition cost is set to an extremely large value. A good schedule will never contain such a transition.

2.1.2 Selection Costs

Obviously not all products in inventory can be scheduled at the same time; therefore one needs to make a selection of the products to schedule first. The decision of which nodes are best is based on several factors. When a product is scheduled too early, it will be stored for a while, and a storage cost needs to be taken into account. The opposite can also occur: a customer should not have to wait too long for his product. These properties are combined into a cost for each product, called a selection cost. When a product is very urgent, it will have a large negative cost. Having negative costs is counter-intuitive, but these virtual costs make it more likely that an urgent product will be selected. A good solution will always try to select products with substantial negative costs to decrease the cost of the total solution.

2.1.3 Global Costs

A transition cost can be defined for each pair of products, while a selection cost is specific to a single product. However, it is not always possible to calculate the cost of a solution by looking solely at the products and their neighbors. Some costs depend on multiple products, or even the whole schedule. The costs that are based on an extensive number of nodes in the schedule are called global costs.

An example is when a product leaves a mark on the rolls, as discussed before. It is possible that the rolls will consequently leave a mark on all future products that have a larger width. So it is important to look at the width of all later nodes to calculate the cost of the whole schedule.

2.2 Subproblems

If one takes a closer look at the problem, it becomes clear that there are two subproblems. The first problem is the selection of the products. We have a large inventory of steel products, but not all of them need to be scheduled right away. The second problem is that when we have selected a set of products, we need to schedule them as efficiently as possible.

2.2.1 Subset Selection

The first subproblem is to select the products that need to be scheduled. The software that builds the solution receives a minimum and maximum solution length. The sum of the lengths of all products needs to be within this range. The solution needs to have a certain length, because a schedule has a substantial fixed financial cost. If more products can be scheduled, the fixed financial costs can be divided over multiple products. However, a schedule cannot be too long, as some of the machines can only handle a certain amount of material. If this limit were exceeded, quality issues could occur. To achieve a solution within these bounds, we add additional virtual costs if a schedule would be too short or too long. Note that it is still possible that the optimal solution is out of this range. This happens when any schedule within the bounds would contain a highly undesired transition between products.

2.2.2 Subset Scheduling

Once we have selected the nodes that we want to schedule, we also have to find the ideal order. This is where the transition costs and global costs define the optimal schedule. Because of the enormous variety of possible orders, and the computationally expensive calculation of global costs, this can be challenging.

2.3 Problem Complexity

As described before, we have two major subproblems. Unfortunately, it is impossible to solve these problems separately: one cannot find the best subset of products if the best order is unknown, and one cannot find the best order of products without knowing which products will be selected.

The number of possible subsets of a fixed size from the total inventory is given by the binomial coefficient:

$$\binom{n}{k} = \frac{n!}{k! \cdot (n-k)!}$$

where n is the total number of steel products in the inventory, and k is the number of selected products. However, we do not know how many products must be selected. In fact, every value of k should be taken into account. The following equation represents the total number of subsets of any size from the inventory.

$$\sum_{0 \le k \le n} \binom{n}{k} = \sum_{0 \le k \le n} \frac{n!}{k! \cdot (n-k)!} = 2^n$$

All of these combinations are still not ordered. A subset of size k can be ordered in k! different ways.

As stated above, it is impossible to solve these two subproblems separately. To find the total number of possible solutions, we need to consider every order of each subset. This number is given by the following formula (2):

$$\sum_{0 \le k \le n} \frac{n!}{k! \cdot (n-k)!} \cdot k! = \sum_{0 \le k \le n} \frac{n!}{(n-k)!}$$

It is clear that the number of possible solutions increases incredibly fast for an increasing number of steel products. A realistic example has 500 products in stock. For this value of n, the number of possible solutions is $3.31 \cdot 10^{1134}$.

We will need a more clever approach to tackle this problem.
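As a quick sanity check of this growth, the count for n = 500 can be reproduced with a few lines of Python (the language later used for the ILP implementation). This snippet is only an illustration and not part of the thesis code:

```python
import math

def total_solutions(n: int) -> int:
    # Sum over every subset size k of the number of ordered selections
    # of k products out of n: n! / (n - k)!
    return sum(math.factorial(n) // math.factorial(n - k) for k in range(n + 1))

count = total_solutions(500)
digits = str(count)
print(f"{digits[0]}.{digits[1:3]}e{len(digits) - 1}")  # prints 3.31e1134
```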

2.4 Problem Conversion

In this section we will try to translate this problem into another problem that has already been thoroughly studied. The current implementation from ArcelorMittal uses two dummy nodes at the beginning and the end of a solution. These nodes are always the same for every solution. Figure 2.1 shows an example solution. Here, nodes 1 and 10 are the dummy nodes, and all nodes in between are selected from the inventory. It is obvious that a solution represents a 'path': the nodes are processed/visited one by one in a specific order.

Figure 2.1: An example path

2.4.1 Cyclic

By taking a deeper look at the previous representation, one could observe that we can convert this solution to a cycle. We know that the first and the last nodes are dummy nodes, and that these are fixed. By making a link from the last node to the first node, we create a cycle that represents the same solution. Figure 2.2 shows the previous example in a cyclic way.

Figure 2.2: An example cyclic solution

It is clear that this represents the same solution. It turns out that the transformation from the original problem to the cyclic problem is easily achievable: we only need to change the transition costs from the last dummy node. We set the transition cost from the last dummy node to the first dummy node to $-\infty$, and the transition costs from the last dummy node to all other nodes to $+\infty$. With these transition costs, all good solutions will contain the edge from the last dummy node to the first dummy node; otherwise, the cost would be $+\infty$.
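The following is a minimal sketch of this conversion, assuming the transition costs are stored in a dense matrix T indexed by node (all names are illustrative). In an actual implementation, a large finite constant would typically replace the infinities to keep the cost arithmetic well-defined:

```python
import math

def make_cyclic(T, first_dummy, last_dummy):
    """Rewrite the transition costs of the last dummy node so that every
    good solution closes the cycle from the last to the first dummy node."""
    n = len(T)
    for j in range(n):
        T[last_dummy][j] = math.inf         # leaving the last dummy is forbidden...
    T[last_dummy][first_dummy] = -math.inf  # ...except towards the first dummy
    return T
```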


2.4.2 Prize Collecting Travelling Salesman Problem

The cyclic problem described above is called the Prize Collecting Travelling Salesman Problem (PCTSP) in the literature. The PCTSP is a variant of the very famous Travelling Salesman Problem (TSP). As one would expect, the PCTSP is an NP-hard problem (3).

This problem was already researched in 1989 by Balas (4). It was formulated as a model for scheduling the daily operation of a steel rolling mill, which is closely related to the current problem of ArcelorMittal. In the PCTSP, each node has a prize associated with it, and all edges from one node to another have a specific cost. The goal is to collect a minimum amount of prize money, while keeping the costs as low as possible. If we translate this to our problem, the prize money is equivalent to the weight of a node. Our problem defines the minimum and maximum sum of weights of the total solution. If a PCTSP solution has collected the minimum amount of prize money, then it has reached the minimum weight that is required. At first sight it might appear that a PCTSP solution would always contain a number of nodes close to the minimum required. However, this would only be the case if all costs were strictly positive. In our case, the selection costs can be negative: adding an extra node with a negative selection cost to the solution can decrease the total cost. An optimal solution will contain as many nodes with negative costs as possible, creating a large schedule.

The costs from one node to another are our transition costs. We can combine the selection and transition costs by adding the selection cost of a node to all edges that end in this node.

This way, we have converted our problem into one that has already been studied. Unfortunately, the amount of literature is still limited, and developed heuristics are scarce.

2.4.3 Profitable Tour Problem

Another problem that could be related to the PCTSP, is the Profitable Tour Problem (PTP). This is a relaxation of the PCTSP, where there is no minimum amount of prize money that needs to be collected. An empty path is a valid solution for the Profitable Tour Problem. Yet, this is not a valid solution for our problem. This means one cannot translate our problem directly to the Profitable Tour Problem.

2.4.4 Conversion from PTP to ATSP

Although we cannot translate our problem to the Profitable Tour Problem directly, it is still important to take a closer look at it. As it turns out, it is possible to convert a Profitable Tour Problem into an Asymmetric Travelling Salesman Problem (ATSP) (5).

If one wants to convert the PTP to the ATSP, the graph needs to be changed. Figure 2.3 shows the converted graph.


Figure 2.3: PTP Conversion to ATSP

All the nodes that are circled in red are nodes that existed in the original problem. All nodes circled in black are dummy nodes that are necessary for the conversion. Every red node can visit any other red node, except that the edge from the last to the first node is not allowed. If we look only at the red nodes, we see that it is possible to create any path between them, starting from the first node and ending at the last node. All nodes can be selected in any order, but not all nodes are required to be selected. Nodes one and eight are the dummy start and end nodes that were introduced by ArcelorMittal.

If we take a look at Figure 2.4, the path from node 1 to node 8 represents the solution path. In this example, we have 1 → 5 → 3 → 6 → 8. Nonetheless, not all red nodes are selected, and thus we do not yet have a valid ATSP path. For this reason the dummy nodes were introduced: they make sure that all nodes that are not part of the original path will still be visited. The dummy nodes are denoted by an apostrophe. There is a path from the last real node to the first dummy node, through all other dummy nodes, and finally back to the first real node. In our example, node 2 is not included in the solution. However, instead of going from dummy node 1′ to dummy node 2′ (1′ → 2′), we can visit the real node 2 in between (1′ → 2 → 2′). This way we have added node 2 to our path. Node 3 is part of our initial solution, so it does not need to be visited again: we go straight from dummy node 2′ to dummy node 3′ (2′ → 3′).

By repeating this until we get back to the first real node, we have created a cyclic path that contains all nodes of the graph, and have thus solved an ATSP instance. From this solution, we can recover the path of our original problem by looking only at the first part with real nodes (from node 1 to node 8 in our example).

Figure 2.4: PTP Conversion to ATSP example

A conversion from our problem to an ATSP can be very valuable because the ATSP has been studied very thoroughly. It is still an NP-complete problem, but some very good heuristics have been developed to tackle it. Nonetheless, the conversion from the PTP to the ATSP causes the number of nodes to almost double. For an NP-complete problem with (n−1)! possible solutions, this causes a huge increase in the number of possible solutions. Yet, the dummy nodes have a low degree, which reduces the number of possible solutions again. It turns out that professional solvers generate very good results on these converted graphs.

2.4.5 Uses of the PTP

In the previous section we discussed the possibility of converting a Profitable Tour Problem into an Asymmetric Travelling Salesman Problem. Yet, the problem described here is equivalent to the PCTSP, which cannot be transformed into a PTP without changing the problem. The main reason for this is that the PTP does not require a minimum number of nodes to be visited. In problems like these, we want to reduce the cost of a solution; because most costs are positive, the PTP solution of this problem would be empty, having a total cost of 0. If there were a way to transform our problem into a PTP, however, one could solve the problem with a professional TSP solver, e.g. Concorde (6), or use the Lin–Kernighan heuristic (7).

It is clear that the constraint that specifies the minimum and maximum size of the solution is the main challenge to overcome. It is important to know where these values originate: the schedulers at ArcelorMittal choose these parameters to achieve a solution of a desired length.

Instead of determining the bounds of the size of the solution at the start of the algorithm, perhaps it could be possible to let the algorithm choose the optimal size.


Between two schedules, the finishing lines need to be re-initiated. This transition between two schedules has a specific cost. Because of this cost, schedules should not be too short; otherwise the cost for schedule transitions would increase. Let $S_C$ be the cost of a transition between two schedules, and define $f(n)$ as the cost of the minimum-cost solution with n nodes. If we know the value n for which

$$f(n + 1) - f(n) > S_C$$

then we have the optimal solution size. If we searched for the best solution with n + 1 nodes, the difference in cost would be larger than $S_C$. This means that starting a new schedule would be cheaper.

If we take another look at the graph used to convert a PTP into an ATSP, one can observe that two new types of edges were added: edges from one dummy node to another, and edges between a dummy node and a real node. The edges between dummy nodes and real nodes are only visited if a real node has not been used in the solution path. If we add a cost to these edges, it represents the cost of not selecting a specific node. We could also add a cost to the edges between dummy nodes; this cost would equal the cost of selecting a specific node.

In Figure 2.5 we added a cost to every edge between two dummy nodes. This cost is $-S_C$. All edges between a dummy node and a real node have a cost of 0.

Figure 2.5: Weighted PTP Conversion to ATSP

Every time a node is selected in the solution, a negative cost of $-S_C$ is added to the total cost of the solution. A node is therefore only left out of the solution if the cost to include it was larger than $S_C$.

If we find the optimal solution for this converted graph, we can extract the optimal path. Adding any other node to this path would cause an increase in cost that is larger than beginning a new schedule.


Chapter 3

Data

3.1 Artificial Data

The first experiments were executed on artificial data. This data was generated by employees of ArcelorMittal. The goal was to generate datasets where the values were as realistic as possible. The problem is represented by the following properties:

• Transition Costs
• Selection Costs
• Node Weights
• Weight target range
• Slack Costs

The transition and selection costs were randomly generated according to the following constraints:

$$T[i][j] \in [0, 200] \quad \text{for } i, j = 1, 2, \ldots, n \qquad (3.1)$$

$$S[i] \in [-65, 85] \quad \text{for } i = 1, 2, \ldots, n \qquad (3.2)$$

The first constraint shows that all transition costs are positive. A favorable transition between two nodes will have a cost close to zero, while poor transitions will have a cost close to 200. Constraint 3.2 shows the range of the selection costs. Although negative selection costs may not make sense at first sight, they are used to push urgent nodes into the solution. When the deadline for production comes closer, the selection cost becomes more negative. This way, it might be possible to include a node that does not have the optimal transition costs, but is necessary for the schedule. For the artificial data, all nodes have the same weight; this weight is 10 for all datasets. The data also contains a weight target range. The goal is to find a solution where the sum of the weights of all nodes is in this range. A solution where the total weight is outside this range is still possible, but receives an additional cost. This cost is called the Weight Cost, defined by the following formulas:

$$(W_{Min} - W) \cdot C_{S,Min} \quad \text{if } W < W_{Min} \qquad (3.3)$$

$$(W - W_{Max}) \cdot C_{S,Max} \quad \text{if } W > W_{Max} \qquad (3.4)$$

Here, W represents the total weight, $W_{Min}$ and $W_{Max}$ the minimum and maximum weight targets respectively, and $C_{S,Min}$ and $C_{S,Max}$ the minimum and maximum slack costs. These slack costs are parameters that can be changed according to the requirements. If it is important that the total weight of the solution stays between the weight bounds, high slack costs are used. For our datasets, both the minimum and maximum slack costs are equal to 25.
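Formulas 3.3 and 3.4 translate directly into code. A small sketch (the names are illustrative):

```python
def weight_cost(w, w_min, w_max, c_s_min=25.0, c_s_max=25.0):
    """Extra cost when the total weight w falls outside the target range
    [w_min, w_max]; both slack costs are 25 for the artificial datasets."""
    if w < w_min:
        return (w_min - w) * c_s_min  # formula (3.3)
    if w > w_max:
        return (w - w_max) * c_s_max  # formula (3.4)
    return 0.0                        # inside the target range: no penalty
```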

Table 3.1 shows 5 different artificial datasets with the solutions from the current ArcelorMittal (AM) heuristic.

                   Artificial Datasets            AM Solution
Dataset       n    Min Weight  Max Weight   Total Cost  Length  Weight
artificial-1  486  2712        5661         -3806       274     2720
artificial-2  486  3901        7184         14          392     3900
artificial-3  486  2518        4886         -4597       254     2540
artificial-4  486  4085        8099         374         411     4090
artificial-5  486  3982        6646         297         401     3990

Table 3.1: Artificial Datasets

3.2 COMBI Line

The first real datasets were extracted from a COMBI finishing line of ArcelorMittal in Poland. This is a galvanisation and painting line in series. This line was chosen because the transition and selection costs represent its cost structure well: no global costs are required to achieve a good solution.

As was intended, the structure of the COMBI datasets is equivalent to that of the artificial datasets. However, there is one more weight restriction. The employees at the ArcelorMittal facility in Poland preferred longer solutions over shorter ones. To achieve this, they added an additional weight cost according to the following formula:

$$C_{S,p} \cdot \frac{(W - W_{Max})^2}{(W_{Min} - W_{Max})^2} - C_{S,p}$$


Figure 3.1: A plot of the quadratic weight cost function

In this formula, $C_{S,p}$ is a positive value. The formula gives a quadratic function that reaches its minimum of $-C_{S,p}$ when the weight is equal to the maximum allowed total weight, and is zero when the weight is equal to the minimum allowed total weight. Between these two values the function is continuously decreasing. Figure 3.1 shows a plot of the quadratic function. The parameter $C_{S,p}$ defines the steepness of the parabolic function. For our datasets, $C_{S,p}$ is equal to 250,000.
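For reference, the quadratic weight cost as a small sketch (names illustrative):

```python
def quadratic_weight_cost(w, w_min, w_max, c_s_p=250_000.0):
    """Quadratic bonus that favours long schedules: zero at w = w_min and
    a minimum of -c_s_p at w = w_max, as in the formula above."""
    return c_s_p * (w - w_max) ** 2 / (w_min - w_max) ** 2 - c_s_p
```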

Table 3.2 shows the different datasets. Unlike the artificial data, the weights of the nodes can be different. Also, the minimum and maximum slack costs are changed to 50,000 for all datasets.

                   COMBI Line Datasets            AM Solution
Dataset       n    Min Weight  Max Weight   Total Cost    Length  Weight
scenariusz-1  333  4000        7000         -150473.98    292     6524.3
scenariusz-2  97   4000        7000         108085567     85      1839.8
scenariusz-3  202  4000        7000         10134318.21   193     4118.1
scenariusz-4  530  4000        7000         -240069.57    315     6946.37
scenariusz-6  150  4000        7000         38779931.83   150     3227.0

Table 3.2: COMBI Line Datasets

We can calculate the cost of our solution with the parameters that were covered above. Note that the datasets also contain physical properties and finishing requirements of the nodes. These are used to calculate the transition costs. The following properties and finishing requirements are useful for our problem:


• Zinc layer thickness
• Thickness
• Paint color A
• Paint color B
• Width
• Output Width
• Inner Diameter Out
• Chemical Passivation Layer

Figure 3.2 shows some of the properties and finishing requirements listed above for the ArcelorMittal solution of dataset scenariusz-1. One can observe that a satisfactory solution groups nodes with the same physical properties and finishing requirements; the transition costs are very low for these transitions. Another essential requirement is the difference in thickness. Because the nodes are welded together for the finishing lines, consecutive nodes should have thicknesses that do not differ much. Otherwise, problems might arise with the weld.

3.3 Hot Strip Mill (HSM) Ghent

The last dataset is also the most complex. This dataset is from the Hot Strip Mill (HSM) located in Ghent. This line takes large pieces of hot steel, called 'slabs', and passes them through pairs of rolls to reduce the thickness, in order to produce a long coil of sheet metal. An example of a slab is shown in Figure 3.3.

3.3.1 Global Costs

The hot rolling process is rather complex, making it impossible to represent the cost of a solution using only transition and selection costs. Some costs are based on the full schedule, creating the need for global costs. The HSM has a large number of global costs; this research covers only the run-down and run-up costs.

Width Run-down Costs

When a slab of metal passes the rolls, it is possible that the rolls are damaged at the edges of the slab. When a next slab passes these marks on the rolls, it can lead to imperfections in the final product. One of the main strategies to counter this problem is to schedule the nodes by decreasing width. If the next product is narrower, it will not come into contact with these marks, since the damage happened at the edges of the previous product. The transition costs are adjusted in such a way that a transition from a wide node to a narrow node is cheaper than the reverse. Obviously, these transition costs do not represent the whole cost of a schedule.

Figure 3.3: HSM Slab

Figure 3.4: Width Run-down Example

If we take a look at Figure 3.4, we can see an example schedule. The graph shows the width of each node, in the scheduled order. One can observe that, globally, the widths are decreasing. Yet, node 5 is clearly narrower than many of the following nodes, causing the transition cost from node 5 to node 6 to be high, while the transition cost from node 6 to node 7 is low. Because node 5 is narrower than node 7, it is possible that node 7 will have imperfections. In this example, it is obvious that node 5 should incur a cost related to all future nodes that are wider. This is what the global Width Run-down Cost represents.

It is clear that for every node in the solution, all previous nodes have to be taken into account to check how many of them were narrower than the current node. A full width run-down cost calculation therefore requires $\frac{n \cdot (n-1)}{2}$ steps, making it an $O(n^2)$ algorithm. This takes a huge amount of time when searching for better solutions.
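To make this concrete, the following sketch computes the full run-down cost under the simplifying assumption that every earlier, narrower node contributes one fixed penalty (the exact per-violation cost is line-specific and not given here):

```python
def run_down_cost(widths, penalty=1.0):
    """Full O(n^2) width run-down cost: node i is penalised once for every
    earlier node in the schedule that is narrower than it."""
    return penalty * sum(
        sum(1 for earlier in widths[:i] if earlier < w)
        for i, w in enumerate(widths)
    )
```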

When we have a solution and want to calculate the cost of a neighboring solution, we do not always have to recalculate everything. For a single move (e.g. an insertion, removal, or inventory switch), we only have to look at the impact of the move. If a single node changes, we need to look at all previous nodes to count the narrower ones; then we can calculate the cost of the changed node. It is also possible that this changed node is narrower than following nodes, so it is necessary to check all following nodes and update their costs accordingly. This takes on the order of $n - 1$ operations, making the update an $O(n)$ operation.

Width Run-up Costs

From the previous paragraphs, we have learned that the products in a good solution should have decreasing widths. Nevertheless, this is only the case in the run-down phase. The run-down phase is preceded by a run-up phase. In this phase, we start from a narrower product and gradually move to wider products. This step is required for the hot rolling process.

This is quite the opposite of what we previously discussed for the run-down phase. However, the problems that could occur in the run-down phase when moving to wider products do not occur in the run-up phase. This is because the rolls are less likely to be damaged, as they are not yet fully heated by the hot slabs.

This initial run-up process contains different phases. For each phase, there are different transition costs. Also, not all products are allowed in a run-up phase. Because of this, we have multiple transition matrices: one for each phase.

A width run-up and run-down example

To clarify these run-up and run-down phases, we take a closer look at a schedule generated by ArcelorMittal. Figure 3.5 shows a solution for a problem with run-down and run-up costs. The widths of the nodes are shown in the order of the schedule. The transition from the run-up segments to the run-down segment happens at node 43, which is the widest node in the solution. Before this segment change, we can see the widths increasing. From the new segment onward, the trend of decreasing widths is visible.

Figure 3.5: HSM Width Example

3.3.2 Segments

As discussed for the global costs, a good solution requires parts with increasing width as well as parts with decreasing width. To define these parts, a solution consists of several segments. A segment is either run-up or run-down. The run-up segments always come before the run-down segments. Because the structure of the transitions is different for each segment, we have multiple transition matrices: one for each segment. A run-up segment has a small cost when going from a narrow node to a wide node; the opposite is true for a run-down segment.

3.3.3 Constraints

The HSM line has several constraints to determine the size of the solutions. These constraints are based on:

• Total length
• Total weight
• Total run-up weight

Each dataset has a minimum and maximum value for all the properties above. If the solution is outside of these bounds, an additional cost is added. These constraints are required to achieve an optimal solution in terms of costs and production quality.

3.3.4

Datasets

For the HSM problem, three different datasets were used. These datasets are very different in terms of the problem size. The size of these datasets varies from 539 nodes all the way up to 1738 nodes. Details of the datasets are shown in Table 3.3.

The first two datasets have four segments; the last one has five. In every case, there is only one run-down phase, which comes last. The cost of the solution from the current ArcelorMittal heuristic is shown both for the problem without global costs and with global costs.

HSM Line Datasets
Dataset  n     Segments              AM score  AM score with GC
hsm-1    539   4 [up,up,up,down]     -9644.4   -9548.9
hsm-6    927   4 [up,up,up,down]     -30285.6  -29447.3
hsm-16   1738  5 [up,up,up,up,down]  -26741.0  -26091.1

Table 3.3: HSM Line Datasets


Chapter 4

Heuristics

4.1 Local Search Metaheuristics

Local Search Metaheuristics are high-level strategies that try to explore a certain search space as efficiently as possible. The goal is to approximate the optimal solution as closely as possible. The search space is explored by iteratively moving to a neighboring solution. These neighbors are defined by one or more neighborhood functions. As an example for our problem, a neighborhood function could define all solutions where one node is removed as neighbors. The neighborhood functions are extremely important for the development of a good heuristic. Local Search Metaheuristics are ideal for search spaces that are too large to be sampled exhaustively, as they try to achieve solutions that are as close as possible to the optimal solution. Because of this, they are a very good choice for this problem. Multiple Local Search heuristics were developed, implemented using the JAMES Framework (8).

4.1.1 JAMES Framework Implementation

The JAMES Framework separates the problem specification from the search application. This way it is possible to try different algorithms on the same problem without rewriting everything. The architecture is shown in Figure 4.1.

4.1.2 Problem specification

The problem specification consists of different parts: data, the objective, constraints, and solution generators.

The data contains all the parameters from the problem definition, e.g. number of nodes, minimum and maximum size... In addition to this, we have all the product details, e.g. transition costs, selection costs, weights... The transition costs are represented by an n by n matrix. The selection costs and weights are saved in lists of size n.


Figure 4.1: JAMES Framework Architecture (1)

Newly generated solutions are evaluated by the objective. The total cost of the solution will be calculated here. The constraints are checked to verify that our solution is valid e.g. the solution has the right size.

At all times, an initial solution is required. That is why we specify different random solution generators. There are different options:

• An empty solution

• A random solution of size k, k ∈ [1, n]

• A full solution where node x is followed by node x + 1
• A solution from another heuristic

4.1.3 Solution structure

Our solution is represented by two lists. The first list contains all the nodes that are selected for the solution, in a specific order; this defines our path. The second list is a collection of nodes that are unused. This is required to determine possible future new solutions. The unused nodes could also be represented by a set, but a list was chosen to be able to quickly select specific elements.

4.1.4 Neighborhood functions

Neighborhood functions are an important part of Local Search heuristics: they make it possible to explore new solutions. Developing these carefully is important to achieve good solutions and to avoid getting stuck in local minima. Our implementation uses five different neighborhood functions, represented by possible moves.


Insertions

Insertion moves take an unused node, and insert it between two nodes of the current solution. No used nodes disappear because of this move.

Removals

Removals take a used node, and remove it from the solution. By removing a node, the transitions from and to this node will disappear, and a new transition will be created between the neighbors of the removed node.

3-Change

A 3-Change (9) move is the asymmetric variation of a 3-opt move. A 3-opt move cuts the solution in three different places, and reconnects the parts in a different order. There are seven distinct ways to reconnect the parts; however, only one of them preserves the order within all parts. This is critical for our asymmetric problem. An example of a 3-Change move is shown in Figure 4.2.

Figure 4.2: 3-Change Move Example

Switch Moves

Switch moves take two nodes from the current solution and switch their positions. This is a fast operation that is crucial for improving the order of nodes in a schedule. Because only two nodes change position, it is hard to improve groups of nodes that are in a bad location using switch moves alone.

Inventory Switch Moves

An inventory switch move takes one node from the solution and replaces it by an unused node. This way a new node enters the solution, and another one leaves it.


Neighborhood function combinations

The neighborhood functions above can be used individually in the Local Search heuristics. Yet, by only using one type of move, the search algorithm quickly gets stuck in a local minimum. For example, if we only used switch moves, it would be impossible to add unused nodes to our solution, or to exclude used nodes; the algorithm would only be able to find the best order for the nodes of the initial solution. By using multiple different moves, we can escape local minima and move closer towards the global minimum. The ratio between the different moves is important for the performance of the algorithm.
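As an illustration of how some of these moves act on the two-list representation from Section 4.1.3, consider the following sketch (plain Python rather than the actual JAMES implementation; all names are illustrative):

```python
import random

def switch_move(path):
    """Switch Move: swap the positions of two scheduled nodes
    (the fixed dummy end nodes are left alone)."""
    i, j = random.sample(range(1, len(path) - 1), 2)
    path[i], path[j] = path[j], path[i]

def inventory_switch_move(path, unused):
    """Inventory Switch Move: replace a scheduled node by an unused one."""
    i = random.randrange(1, len(path) - 1)
    k = random.randrange(len(unused))
    path[i], unused[k] = unused[k], path[i]

def insertion_move(path, unused):
    """Insertion Move: insert an unused node between two scheduled nodes."""
    k = random.randrange(len(unused))
    pos = random.randint(1, len(path) - 1)  # a position between the end nodes
    path.insert(pos, unused.pop(k))
```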

4.1.5 Random Descent

Random Descent is one of the most basic local search heuristics. In every iteration, the heuristic generates a new solution according to the neighborhood function. If this new solution is better than the current best solution, the new solution is accepted as the best solution.

4.1.6 Steepest Descent

In the Steepest Descent (10) heuristic, the best neighbor is determined in every iteration. If the best neighbor is better than the current best solution, the new solution is accepted. If no neighbor is better, the algorithm stops. Because of the huge number of possible solutions to our problem, it will not always be possible to try all neighbors in a reasonable amount of time when all possible moves discussed above are used. For example, a solution with n nodes has close to $n^3$ possible 3-Change moves. This should be considered when implementing the heuristic.

4.1.7 Tabu Search

Tabu Search (11)(12) differs from the previous Local Search heuristics in that it can accept a worse solution in a search step. During each iteration, the algorithm searches for the best non-tabu solution using steepest descent. A tabu memory contains recently visited solutions, so that the same solutions are not visited over and over again.

4.1.8 Reduced Variable Neighborhood Search

Reduced Variable Neighborhood Search (RVNS) (13) is a more advanced heuristic that can use multiple different neighborhoods. In every iteration, we take a random neighbor from neighborhood 1. If there is no improvement, we take a neighbor from neighborhood 2. We keep incrementing the neighborhood until an improvement is found, and then go back to the first neighborhood. If all neighborhoods have been used and no better solution was found, we restart with the first neighborhood.
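In pseudocode-like Python, the RVNS loop looks as follows (a sketch; each entry of neighborhoods is assumed to be a function that returns a new random neighbor of the given solution):

```python
def rvns(initial, neighborhoods, cost, iterations=100_000):
    """Reduced Variable Neighborhood Search: restart from the first
    neighborhood after every improvement, otherwise try the next one."""
    best, best_cost, k = initial, cost(initial), 0
    for _ in range(iterations):
        candidate = neighborhoods[k](best)    # random neighbor from N_k
        candidate_cost = cost(candidate)
        if candidate_cost < best_cost:
            best, best_cost, k = candidate, candidate_cost, 0
        else:
            k = (k + 1) % len(neighborhoods)  # wrap around and restart
    return best
```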


4.1.9 Metropolis Search

Metropolis Search is a heuristic based on random descent. However, here it is possible to accept inferior solutions with a certain probability. This probability is calculated by the following equation:

$$p = e^{-\Delta E / T}$$

with ∆E the cost difference, and T the temperature of the heuristic. The temperature T is a fixed value that is defined at the start of the search. A higher temperature increases the probability to accept inferior solutions, but slows the convergence. A low temperature will quickly converge, but will get more easily stuck in a local minimum.
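The acceptance rule in code (a minimal sketch):

```python
import math
import random

def metropolis_accept(delta_e, temperature):
    """Always accept improvements; accept a worse neighbor (delta_e > 0)
    with probability exp(-delta_e / T)."""
    return delta_e <= 0 or random.random() < math.exp(-delta_e / temperature)
```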

4.1.10 Parallel Tempering

The Parallel Tempering heuristic runs multiple instances of Metropolis Search with different temperatures at the same time. The heuristic has a minimum and maximum temperature, and assigns equally distributed temperatures in this range to the Metropolis Searches. During the search, the most promising solutions are pushed to the Metropolis Searches with low temperatures, while less promising solutions are pushed to the Metropolis Searches with higher temperatures. This way the convergence is increased at low temperatures, and the ability to escape local minima is improved at high temperatures.

4.2 Integer Linear Programming (ILP)

The goal of this section is to model the problem from ArcelorMittal as an Integer Linear Programming (ILP) problem. An ILP is a variant of a Linear Programming problem where one or more variables are restricted to be integers. Linear Program solvers can find an optimal solution in polynomial time. Unfortunately, this is not the case for ILP problems: solving ILP problems is proven to be NP-complete.

4.2.1 ILP Solving Method

As stated above, ILP problems cannot be solved in polynomial time. Professional ILP solvers use a branch-and-bound algorithm to achieve the optimal solution. If the integrality constraints on the variables are removed, a Linear Programming problem is created; this is called the relaxation of the ILP problem. The solution of the relaxed problem might not be feasible for the original problem, but it always provides a lower bound on the original problem. These lower bounds are used by the branch-and-bound algorithm to avoid unnecessary calculations.


4.2.2 Formulation

The problem can be formulated as an Integer Linear Programming (ILP) problem. The formulation is based on the Dantzig–Fulkerson–Johnson (DFJ) formulation (14) for the Travelling Salesman Problem, with a few adjustments.

$$\min \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} t_{ij} x_{ij} + s_j x_{ij} \qquad (4.1)$$

$$0 \le x_{ij} \le 1 \qquad i, j = 1, \ldots, n \qquad (4.2)$$

$$x_{n,1} = 1 \qquad (4.3)$$

$$\sum_{i=1, i \neq j}^{n} x_{ij} \le 1 \qquad j = 1, \ldots, n \qquad (4.4)$$

$$\sum_{j=1, j \neq i}^{n} x_{ij} \le 1 \qquad i = 1, \ldots, n \qquad (4.5)$$

$$\sum_{j=1, j \neq i}^{n} x_{ij} = \sum_{j=1, j \neq i}^{n} x_{ji} \qquad i = 1, \ldots, n \qquad (4.6)$$

$$\sum_{i \in Q} \sum_{j \in Q, j \neq i} x_{ij} \le |Q| - 1 \qquad \forall\, Q \subset \{2, \ldots, n-1\},\ |Q| \ge 2 \qquad (4.7)$$

$$W_{min} \le \sum_{i=1}^{n} \sum_{j=1, j \neq i}^{n} x_{ij} w_j \le W_{max} \qquad (4.8)$$

Expression (4.1) is the objective: minimize the cost of the solution. If $x_{ij}$ is 1, product j follows product i in the schedule; $t_{ij}$ represents the transition cost from product i to product j, and $s_j$ represents the selection cost of product j. The sum of all transition and selection costs is the total cost. Constraint (4.2) makes sure that all Linear Programming (LP) variables lie between 0 and 1; because the variables are integer, only the values 0 and 1 are possible. Constraint (4.3) makes sure that there is a path from the last dummy node to the first dummy node, which makes it possible to obtain a cycle. Constraint (4.4) makes sure that there is at most one predecessor for each product in the schedule, and constraint (4.5) that there is at most one successor. It is necessary that the number of predecessors equals the number of successors: if a node is part of the solution, it has both a predecessor and a successor; if it is not, it has neither. This is represented by constraint (4.6). It is also necessary that there is only one cycle in the solution; with the constraints above, it would still be possible to have multiple sub-tours. Constraint (4.7) makes sure that there are no sub-tours that do not contain the first and last vertices, so that a solution consists of a single tour containing the first and last nodes. Because the algorithm receives limits on the minimum and maximum total weight, this needs to be accounted for as well in the ILP. Constraint (4.8) makes sure that the sum of the weights of all the nodes in the solution is between these bounds. Here, $w_j$ represents the weight of product j.

4.2.3 Implementation

The ILP was implemented in Python 3.7 and solved with Gurobi (15). The formulation above is based on the Dantzig–Fulkerson–Johnson TSP formulation, which makes it stronger than, for example, the Miller–Tucker–Zemlin formulation (16). If we take a closer look at constraint (4.7), we can see that it leads to an exponential number of constraints, which is infeasible for the solver. Instead of adding all these constraints to the initial problem, they are added as lazy constraints. This way, they are inactive until a feasible solution is found, at which point the solution is checked.

When a feasible solution is found, all subtours of the solution are investigated. If there is only one subtour, the solution is valid. If there are multiple subtours, we add the lazy constraint

$$\sum_{i \in Q} \sum_{\substack{j \in Q \\ j \neq i}} x_{ij} \leq |Q| - 1$$

for each subtour $Q$ that does not contain the first and last dummy nodes. By adding this constraint, that specific subtour can no longer be formed in future solutions.
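Continuing the sketch above, the lazy subtour check can be wired in through a Gurobi callback. The subtour extraction assumes the degree constraints already hold in any candidate solution, so the selected arcs decompose into disjoint cycles; all names are again illustrative:

    def find_subtours(arcs, n):
        """Decompose the selected arcs (each node has one successor) into cycles."""
        succ = dict(arcs)
        seen, tours = set(), []
        for start in range(n):
            if start in seen or start not in succ:
                continue
            tour, node = [], start
            while node not in seen and node in succ:
                seen.add(node)
                tour.append(node)
                node = succ[node]
            tours.append(tour)
        return tours

    def subtour_callback(model, where):
        """Cut off candidate solutions whose subtours miss the dummy nodes (4.7)."""
        if where != GRB.Callback.MIPSOL:
            return
        vals = model.cbGetSolution(model._x)
        arcs = [(i, j) for (i, j) in model._x.keys() if vals[i, j] > 0.5]
        for tour in find_subtours(arcs, model._n):
            if 0 not in tour:  # keep the single tour through the dummy nodes
                model.cbLazy(gp.quicksum(model._x[i, j]
                                         for i in tour for j in tour if i != j)
                             <= len(tour) - 1)

    # Usage sketch:
    # m, x = build_model(t, s, w, n, w_min, w_max)
    # m._x, m._n = x, n
    # m.Params.LazyConstraints = 1
    # m.optimize(subtour_callback)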

4.3 Clustering

As discussed before, we are dealing with an NP-complete problem, so the number of nodes is a very limiting factor for the algorithms: even a slight increase in n can greatly increase the runtime. The opposite is also true: problems with fewer nodes are noticeably easier to solve. This is why we take a closer look at clustering. If we can be sure that some nodes will always be grouped together in an optimal solution, we can replace this cluster with a single node that carries the combined properties of those nodes.

4.3.1 Cluster Properties

When clustering our problem, we replace each cluster by a single node. With this approach, all algorithms that can solve the original problem can also handle the clustered data. When replacing a cluster by a node, the selection costs, transition costs and weights must remain the same.

Selection Costs and Weights

When changing a cluster into a single node, it is straightforward to combine the selection costs and weights. The selection cost of the cluster equals the sum of the selection costs of all the nodes in the cluster. The same reasoning can be applied to the weights: the weight of the cluster equals the sum of the weights of all the nodes in the cluster.

Transition Costs

The transition cost of a cluster is a little more complicated. There are two cases. In the first case, the transition costs from and to each node in the cluster are the same for all other nodes. The transition cost to the cluster then equals the transition cost to any node of the cluster, and likewise the transition cost from the cluster equals the transition cost from any node in the cluster. In the second case, the transition costs from and to the nodes in the cluster differ between nodes. Then, for each transition from and to the cluster, the internal order of the cluster must be known, because only then can the cost to the first node and from the last node of the cluster be determined. For our problem, the second case never occurred, because we only clustered nodes that had the same transition costs for all neighbors.

4.3.2 Clustering Implementation

Clustering was achieved by executing the following steps. First, we determine for each node its best neighbor, i.e. the neighbor with the lowest transition cost. If multiple neighbors share the same lowest cost, all of them are stored. The result of this step is a mapping $N(x)$ from each node $x$ to its set of best neighbors. In the next step, we search for pairs of nodes that are each other's best neighbor, i.e. pairs satisfying

$$y \in N(x) \land x \in N(y)$$

Such mutually best nodes are grouped into a cluster. An additional requirement is that the transition costs from every node in the cluster to all other nodes are identical, so that the internal order of the cluster is of no significance. Nodes with a low mutual transition cost are physically alike, so there was a big chance that nodes in the same cluster had exactly the same transition costs. To be sure, the transition costs from all nodes within a cluster to all other nodes must be compared; if they are not identical, the cluster has to be split. The pairing step is sketched below.
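A minimal sketch of this pairing step, assuming a directed transition-cost matrix cost[x][y] and integer node identifiers (both illustrative):

    def best_neighbors(cost, nodes):
        """Map each node to the set of neighbors reachable at minimal cost."""
        N = {}
        for x in nodes:
            best = min(cost[x][y] for y in nodes if y != x)
            N[x] = {y for y in nodes if y != x and cost[x][y] == best}
        return N

    def mutual_best_pairs(N):
        """Pairs that are each other's best neighbor: y in N(x) and x in N(y)."""
        return {(x, y) for x in N for y in N[x] if x in N[y] and x < y}

Pairs found this way are merged into clusters, after which the equality of the outgoing and incoming transition costs is verified as described above.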

One could argue that other approaches could create bigger clusters, reducing the number of nodes even more. However, it is important to be sure that the optimal solution can still be constructed from the clusters: if a group of nodes is combined into a cluster and that group is not part of the optimal solution, then it is impossible to reach that optimal solution on the clustered data. The approach discussed above was able to reduce the number of nodes significantly without sacrificing the quality of the results.


4.4 Other heuristics

In this thesis we focus on the heuristics described above. Other heuristics were implemented as well, but they are not discussed in detail because their performance was significantly worse. Several genetic algorithms were developed, but their solutions only improved through mutations, which is essentially equivalent to an inefficient local search. Some Ant Colony heuristics were developed as well; unfortunately, their solutions quickly got stuck in a local minimum. The large difference in results suggests that our local search heuristics offer the best chance of finding good solutions.


Chapter 5

Implementation Details: a case study of the HSM

5.1 Introduction

In this chapter, we take a closer look at the implementation of one of our heuristics. As the final goal is to use the program interactively, the running time is limited to 60 seconds, so it is especially important that the heuristics are implemented as efficiently as possible. This way, we can explore as much of the search space as possible.

The Parallel Tempering heuristic was chosen and implemented for the Hot Strip Mill datasets. It was implemented in Java using the JAMES Framework (8).

5.2 Solution representation

It is important to represent the current solution in an efficient way. Because the solution is a collection of nodes in a certain order, collecting the nodes in a list is a logical choice. However, a HSM schedule has multiple distinct parts, called segments. To represent these, we have a few options. The first option is to give each solution two lists: one with the nodes and one with the matching segment of each node. A second option is a single list of nodes, together with the indices where the different segments start. A third option is a list of nodes per segment, containing the nodes that are part of that specific segment. Here, the third option was chosen, with the goal of separating the logic of the different segments. Some of the global costs only apply to a certain type of segment; if a move changes one segment, the global costs of other segments may well be unchanged.

To avoid recalculating global costs on all segments over and over again, each segment should be able to keep track of its own cost. For this purpose, a class SegmentSolution was introduced. The class tracks whether the last calculated cost of the segment is still valid. When the cost of a full solution is calculated, the costs of the individual segment solutions are gathered, and the full cost of the solution is obtained by adding up the costs of all segments.

The following example clarifies this. Imagine we insert a node into the last segment of the current solution. This last segment is a run-down segment, and the move invalidates its current global cost, so the global cost of that run-down segment must be recalculated. However, the move has no impact on the preceding run-up segments, so it is not necessary to recalculate their global costs. A sketch of this caching scheme is given below.
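A minimal Python sketch of the caching idea (the actual implementation is a Java class within the JAMES-based search; names and structure here are illustrative):

    class SegmentSolution:
        """One segment of the schedule, caching its own global cost."""

        def __init__(self, nodes, cost_fn):
            self.nodes = nodes        # node numbers belonging to this segment
            self.cost_fn = cost_fn    # global cost function for this segment type
            self._cached = None       # last computed cost; None means invalid

        def invalidate(self):
            """Called whenever a move changes this segment."""
            self._cached = None

        def cost(self):
            # Recompute only when a move has invalidated the cached value.
            if self._cached is None:
                self._cached = self.cost_fn(self.nodes)
            return self._cached

    def total_cost(segments):
        """Full solution cost: the sum of the (mostly cached) segment costs."""
        return sum(seg.cost() for seg in segments)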

To reduce overhead, each node is represented by an integer: its number. The nodes do not store their own characteristics; these can be retrieved from the data using the number as an index.

5.3 Solution Generation

A local search starts from a previously generated initial solution. There are many ways to generate one: random solutions, empty solutions, sorted solutions, and so on.

For the HSM line, three different initial solution generators were tested:

• A solution consisting of nodes sorted by decreasing width

• A fully random selection of nodes for each segment

• A solution with empty run-up segments and randomly initialised run-down segments

The first generator might look unusual. However, the largest part of our solution consists of run-down segments, where an increase in width incurs a large cost, so ordering the nodes by decreasing width should produce good initial transitions. Unfortunately, this generator did not perform very well: a good solution was found very early, but the search got stuck in a local minimum very quickly. In our tests, the last generator performed best. The run-up segments could be filled early in the search with good values, and the randomness in the run-down segments caused a lot of variation. A sketch of this generator is given below.
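A minimal sketch of the best-performing generator, assuming the list-per-segment representation of Section 5.2 and fixed segment counts (all names are illustrative; segment size and weight limits are ignored here for brevity):

    import random

    def initial_solution(run_up_count, run_down_count, inventory):
        """Empty run-up segments; run-down segments filled randomly."""
        run_up = [[] for _ in range(run_up_count)]  # filled during the search
        pool = list(inventory)
        random.shuffle(pool)
        # Deal the shuffled nodes round-robin over the run-down segments.
        run_down = [pool[i::run_down_count] for i in range(run_down_count)]
        return run_up + run_down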

5.4 Cost calculation

As discussed above, the goal is to reduce unnecessary cost calculations. When a cost calculation is required, an efficient calculation is key to achieving good results. Transition costs and selection costs can be calculated by iterating over all nodes. However, when a move is executed on the current solution, a delta cost can be calculated instead: to determine the cost difference caused by the move, only the neighbors of the changed nodes have to be considered. This makes it possible to evaluate moves in constant time, as the sketch below illustrates for a removal move.
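As an illustration, the delta evaluation for removing the node at an interior position only touches the two adjacent arcs (transition matrix t and selection costs s as in Chapter 4; boundary handling is omitted):

    def removal_delta(seq, pos, t, s):
        """Cost change when removing seq[pos]; only adjacent arcs are affected."""
        prev, node, nxt = seq[pos - 1], seq[pos], seq[pos + 1]
        old = t[prev][node] + t[node][nxt] + s[node]
        new = t[prev][nxt]
        return new - old  # negative: the removal improves the solution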


Unfortunately, not all costs can be calculated this quickly. The global cost calculation hugely impacts the number of solutions that can be explored within a given runtime.

5.4.1 Run-up costs

The run-up cost of a node cannot be calculated by looking only at its neighboring nodes in the schedule: the minimum and maximum width of all previous nodes can influence the cost of a node in a run-up segment. This makes it necessary to iterate over all nodes, resulting in a linear running time for this part. The second part of the cost calculation is more expensive: for each node, the characteristics of the previous $k$ nodes are required for the global cost, where $k$ is defined per dataset. This results in a running time of $O(k \cdot n)$. This is still linear, but since this is a practical implementation, we cannot simply neglect the constants, and the asymptotic behaviour alone is not sufficient to judge the performance of the algorithm.

To avoid the expensive computations of the second part, it is key to avoid recalculating values. If the cost of a node is influenced by the previous $k$ nodes, then after a move it suffices to recalculate the costs of the changed nodes and of the $k$ nodes following each of them: if the $k$ predecessors of a node are unchanged, its cost does not change. This is sketched below.
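A sketch of this windowed update, for a move that replaces nodes in place so positions are preserved (cached holds each position's cost contribution; node_cost is an assumed per-node cost function looking back at most k nodes):

    def refresh_run_up_costs(seq, changed, k, node_cost, cached):
        """Recompute only positions whose k-window contains a changed position."""
        dirty = set()
        for p in changed:
            # The changed node and the k nodes after it see a different window.
            dirty.update(range(p, min(p + k + 1, len(seq))))
        for p in dirty:
            window = seq[max(0, p - k):p]
            cached[p] = node_cost(seq[p], window)
        return sum(cached)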

5.4.2 Run-down costs

As previously explained, it is important in run-down segments that the width decreases; otherwise, marks left in the rolls at the edges of nodes can cause problems for later nodes. Calculating these costs is an expensive operation: for each node, we have to check whether there are previous nodes that are less wide, since these could cause problems for the current node. A full cost calculation therefore requires $\frac{n(n-1)}{2}$ steps, making it an $O(n^2)$ procedure. It is possible to get rid of the quadratic complexity by only updating previously calculated values when necessary. When a move is executed, all changed nodes are evaluated: each changed node can affect all later nodes, and all earlier nodes affect the changed node, so each changed node takes $n$ steps. Most moves change only a few nodes at a time, which makes it possible to update the global costs in linear time per move.

5.5 Search optimization

Although the global cost calculation has been optimized, each calculation still requires linear time. If several linear-time procedures have to be executed in every step of the search, the number of steps that fit within the time budget is very limited, which significantly reduces the number of explored solutions.


For this problem, only the global costs cannot be calculated in constant time. This offers an opportunity: first, the cost difference without global costs is calculated. If this partial cost already results in a much worse solution, there is no point in calculating the global cost of the new solution, because it would be discarded anyway. This makes it possible to quickly discard bad solutions.

Because global costs are non-negative, a better solution is impossible if the new cost without global costs is already larger than the current total cost including global costs. It would make sense to discard the solution in that case. However, it is important not to discard too many solutions: even though such a solution is not an improvement, it might be key to escaping a local minimum in later steps. A good balance between discarding solutions and exploring the solution space is essential. The sketch below illustrates this two-stage evaluation.
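A sketch of the quick-reject test; the method names are illustrative, and the margin parameter expresses the balance discussed above:

    def evaluate_move(move, solution, margin):
        """Cheap lower bound first; expensive global costs only when needed."""
        cheap = solution.cost_without_global() + move.delta_without_global()
        # Global costs are non-negative, so `cheap` is a lower bound on the
        # new total cost: reject early when even the bound is clearly worse.
        if cheap > solution.total_cost() + margin:
            return None  # discarded without the expensive global calculation
        return cheap + move.global_cost_after()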


Chapter 6

Results

6.1 Artificial Data

First of all, we look at the performance of the different algorithms on artificial data. All five datasets have the same inventory size; the main difference between them is the required size of the solution. Unless stated otherwise, all algorithms had a maximum runtime of 60 seconds on these datasets. This runtime might appear very short for a problem of this complexity, but the generated schedule is still reviewed manually; to make it possible to look at several schedules and judge them manually, the runtime is limited.

6.1.1 Random Descent

First, we take a look at the Random Descent algorithm. To take full advantage of multi-core architectures, several instances of the algorithm were executed in parallel using the BasicParallelSearch class from the JAMES Framework, which manages the instances and keeps track of the best solution across all of them. Twelve instances were executed in parallel on a six-core, twelve-thread CPU. The results are listed in Table 6.1, after the sketch below.
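The underlying idea of running independent searches in parallel and keeping the overall best can be sketched as follows; this is a generic illustration, not the JAMES API (make_random_descent is an assumed module-level factory for a single search instance):

    from concurrent.futures import ProcessPoolExecutor

    def run_one(seed):
        """Run one independently seeded search; assumed to return (cost, schedule)."""
        search = make_random_descent(seed)  # assumed factory, defined elsewhere
        return search.run()

    def parallel_search(instances=12):
        """Run independent searches in parallel and keep the overall best."""
        with ProcessPoolExecutor(max_workers=instances) as pool:
            results = list(pool.map(run_one, range(instances)))
        return min(results)  # tuples compare on cost first

Since the instances are fully independent, they spread naturally over the available cores.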

Table 6.1: Artificial Datasets (Difference = Min score - AM score)

Dataset       n    AM score  Min score  Avg score  Max score  Difference
artificial-1  486  -3806     -4854      -4798.2    -4762      -1048
artificial-2  486  14        -1403      -1370.9    -1343      -1417
artificial-3  486  -4597     -5500      -5466.1    -5429      -903
artificial-4  486  374       -1057      -1028.5    -990       -1431
artificial-5  486  297       -1146      -1112.2    -1071      -1443

