Scheduling multi-event knockout tournaments in badminton

Tom Smit
s1778021
Master’s thesis Operations Research
University of Groningen
July 6, 2015

Abstract


1 Introduction

Not every athlete might be aware of it, but planning a sports tournament can be difficult. If players are allowed to take part in multiple events at the same time, bottlenecks and deviations from schedule can arise easily, especially if it is unknown who will participate in matches at later stages. A tournament structure with uncertainty in later rounds is the knockout (single-elimination) structure. In this format, the loser of every match is dismissed from the event until one player survives in the final. This thesis focuses on badminton tournaments, where men and women are allowed to play in singles, doubles and mixed doubles events. Some of our research is applicable in practice for other (racquet) sports with knockout tournaments as well.

For the reader unfamiliar with badminton, a short introduction is given. Competitive badminton has little in common with the casual rallies one might have played at a campsite or during physical education in high school. Matches are played in a hall on courts that are about 13 by 6 meters. Strokes over 300 km/h can be hit during play, making badminton the world’s fastest racquet sport. A match comprises at most three games, and the first player to win two games wins the match. A game is played to 21 points and must be won by a two-point margin, except that at 29-all the next point decides the game, making 30-29 the highest possible score in a game. Matches generally last between 25 minutes and one hour.

Large badminton tournaments include hundreds of players and usually have a knockout structure rather than a round-robin one. In the latter format, all players or pairs compete against each other to determine the winner. This is a convenient way to find the best team in a league, but due to time constraints this format is not possible for most tournaments. If some events are relatively small (up to about six entries), these might be played in a group, while the other events use the knockout structure. Sometimes, especially in youth or recreational tournaments, a mixed format is used where players are first distributed over small groups. All entries in a group compete against each other once and the best of every group go(es) to the next stage, where a knockout phase will determine the winner. For simplification, we will assume in this thesis that every event has a knockout format, regardless of the number of entries.


same amount of time, regardless of the difference in the players’ ability. If the designer ignores these differences, multiple long or short matches in a row can result in unexpected and unnecessary deviations from schedule.

Many tournaments are still largely scheduled by hand due to the lack of appropriate software, although the restrictions can often be expressed mathematically. In this thesis, we will construct models to support tournament planning in practice. We try to model the most common restrictions and goals in badminton knockout tournaments. The results can generally be adapted for other racquet sports and tournament formats. Most of the constraints and results we derive are applicable to the Dutch Open (DO), an annual six-day international badminton event which is introduced in Section 5.1. The person responsible for the schedule wants to know to which degree his task can be automated. Currently, it takes him about fifteen hours every year to make a good tournament schedule.

The main research question of this thesis is:

• How to formulate a model that is able to construct feasible schedules for multi-event knock-out tournaments with a minimal makespan?

Here, the makespan is defined as the end time of the final match. Such a schedule has to fulfill the following requirements:

• given the draw, it should be fair for players in terms of rest times;

• some or all matches should be planned beforehand;

• deviations from schedule should occur as little as possible.

To answer the research question, we have to decide how to model time and how to estimate match durations and winning probabilities. To keep the model size reasonable, it might be necessary to split a tournament into multiple parts and optimize for each day separately, or to put constraints on the times at which certain disciplines or rounds can be played. For example, time intervals may be prespecified for every round, or singles and doubles can be assigned to different parts of the day. We will investigate the effects of such rules on the model and its limits in practical applications.

The remainder of this thesis is structured as follows. Section 2 gives an overview of relevant literature for our subject. We construct a deterministic mixed-integer programming formulation for scheduling a tournament in Section 3, where we assume that every match lasts exactly its scheduled time. We drop this assumption in Section 4, where we extend our model with stochastic match durations and minimize deviations from schedule. Applications and computational results of the two models are presented in Section 5, where we perform a numerical case study of the Dutch Open. Finally, conclusions and opportunities for further research are summarized in Section 6.

2 Literature review


Literature on sports scheduling dates back more than 40 years. Kendall et al. (2010) provide an annotated bibliography of the most important sports scheduling articles until then, with the earliest paper being published in 1968. The authors claim that ‘the best results (on optimization problems) are often obtained by methods derived from the hybridization of integer programming, constraint programming and metaheuristics’. A reference list including more recent papers is maintained on a personal website of Knust (2015). Unfortunately, the majority of this research applies to round-robin tournaments with a moderate number of participants and is often not applicable to the knockout (single-elimination) structure. The terms knockout and single-elimination are used interchangeably in the literature. In the remainder of this thesis we will use the first one, as this term is simpler and more common in practice as well.

A book about the structuring and scheduling of knockout tournaments, along with a few other formats, is written by Rokosz (2000). Results are derived on the necessary number of rounds and matches for every number of players. Unfortunately, the book lacks mathematical models and therefore suggests scheduling matches by hand. On the structure of knockout tournaments, many theoretical results have been found, for example in Ryvkin (2005) and Horen and Riezman (1985). However, this theory is often focused on the optimality of draws and seedings, in terms of fairness and expected winning probabilities of the players. As the main interest of this thesis is the construction of a schedule based on a given draw and seedings, there is not much need to determine the optimality of the current seedings or to investigate which draw would have been better.

Only few papers exist on scheduling events with hundreds of participants. A huge task was tackled by Andreu and Corominas (1989), who made a decision support system to schedule the Olympic Games in 1992. The authors note that the ’best’ possible model is not well-defined: there has to be a trade-off between, for example, a large audience for important events without causing inconveniences to the athletes. Existing models were only present for a few aspects of specific sports and therefore not useful in their research. Kostuk (1997) constructs a way of scheduling a large multi-event tournament with over a hundred contestants using a kind of knockout format. However, his problem is restricted to a specific curling structure, where a losing team degrades to one of the consolation events and is removed from the tournament after too many losses. It is not really comparable to a badminton tournament where players can compete against each other in different disciplines without consolation events. This author also claims that almost no existing literature was helpful to his problem. We are not aware of any other research on the scheduling of large knockout tournaments. Because of the relevance in practice, such as the scheduling of Olympic events or Grand Slam tournaments in tennis, one would expect some directly applicable scheduling literature on the knockout format.


In existing literature, little research is focused on rest times or waiting times of the participants. An exception is in Knust (2008), who minimizes the waiting times for a tournament on one court, where every team only has to play twice. This author constructs models where the number of long waiting times is minimized, as well as the total waiting time. Rodney (1995) maximizes the minimal waiting time for teams playing against each other, again on a single court. In both papers, the number of teams is relatively small and there are no rounds and uncertainty about the players participating in a match as in a knockout structure.

Our scheduling assignment can be seen in a more general setting. A generic scheduling problem is usually described by a triplet $\alpha|\beta|\gamma$, where we stick to the notation as used, for example, in Pinedo (2010). Jobs have to be processed on machines subject to a number of constraints specified by the triplet. In a sports tournament, matches can be treated as jobs and courts as machines. The $\alpha$ contains the machine environment and can be written as $P$ (parallel machines) or $Q$ (machines with different speeds). In our case, $P$ means that match duration does not depend on a court, while $Q$ implies a dependency. The $\beta$ includes certain processing characteristics, where the common elements $r_j$ (release dates) and $prec$ (precedence constraints) appear in a tournament. The former implies that a job $j$ cannot start its processing before a release date, similar to the fact that some matches cannot be scheduled before a certain time, due to organizational preferences and the knockout structure. Because of this structure, precedence constraints apply as well, as a match of a later round cannot be played before the relevant previous rounds are finished. Finally, the $\gamma$ describes the objective to be minimized. In our deterministic model, we choose as objective to schedule the last match as early as possible. This is almost equivalent to the makespan objective: minimize the completion time of the last job.

In a tournament, the release date is the time a match is scheduled and the due date when it is planned to end. If we assume that matches can be played directly after each other on a court, the due date of a match equals the release date of the subsequent match. In scheduling, a job is late if it ends after its due date, and early if it ends before its due date. In the deterministic model, we will assume that a match exactly takes its planned time to play, implying that matches are never late or early. Making match durations stochastic, it does make sense to look at lateness in our model. More specifically, the tardiness of a match can be measured, which is the maximum of the lateness and zero. Similarly, the earliness can be defined in our case as the maximum of zero and the time a match starts ahead of schedule. For some matches, the need to start on time might be greater than general. A common objective is then to assign weights to each job, so that the total weighted tardiness, earliness or both can be minimized.
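The lateness, tardiness and earliness measures described above can be illustrated with a minimal Python sketch. The function names and the tuple layout `(actual_end, due, weight)` are assumptions made for this illustration only; all times are expressed in the same unit, for example minutes.

```python
def lateness(actual_end, due):
    """Lateness of a match: positive if it ends after its due date."""
    return actual_end - due


def tardiness(actual_end, due):
    """Tardiness: the maximum of the lateness and zero."""
    return max(0, lateness(actual_end, due))


def earliness(actual_start, scheduled_start):
    """Earliness: the maximum of zero and the time a match starts ahead of schedule."""
    return max(0, scheduled_start - actual_start)


def total_weighted_tardiness(matches):
    """Sum of weight * tardiness over (actual_end, due, weight) tuples."""
    return sum(w * tardiness(end, due) for end, due, w in matches)


# Hypothetical data: one match ends 5 minutes late (weight 2), one ends early (weight 1).
example = [(50, 45, 2), (40, 45, 1)]
weighted = total_weighted_tardiness(example)
```

Only the late match contributes to the weighted sum here, since a match that ends early has zero tardiness.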

We know that some matches in a round will last longer on average than others, while we schedule the same amount of time for those matches. For example, on the first days of the Dutch Open, release and due dates are planned on every multiple of a fixed duration, for example 45 minutes. As the ability gap between the players may be relatively small or large, the estimated match duration can be longer or shorter than the mean time for the average match in that round. Furthermore, as there is often a bound on the amount of time a match can start early, courts may stay empty for some time and this emptiness can be penalized as well. This complicates analysis and the possibilities of describing the problem by a combination of $\alpha|\beta|\gamma$. From Pinedo (2010), we further note that even the $1\,||\,\sum T_j$ problem, where the total tardiness has to be


3 A deterministic MIP formulation

In this section, we will formulate our tournament scheduling problem as a deterministic MIP. This will be the foundation of the stochastic model constructed in the next section. First, we treat our preliminary modeling choices and elaborate on the scheduling decisions a committee has to make beforehand. Next, we define the basic blocks of our model, using theory on knockout tournaments. Subsequently, we introduce a way to determine winning probabilities and we decide how to model time and match durations in our model. After that, all the sets, parameters and variables of the model can be introduced and their relevance in practice will be explained.

3.1 Modeling choices

We choose to model the deterministic version before introducing the stochastic variant, for an easier explanation of the main concepts of our model. Furthermore, this model will already be useful in practice, as it outputs feasible tournament schedules, given organizational desires and restrictions. All the constraints we will derive are applicable to some tournaments. Depending on the tournament’s size, the model will schedule a part of a day, one day or multiple days at once.

We start off with some general issues a tournament designer has to decide on. For large tournaments, it is undesirable to schedule all the matches at once. Instead, the tournament needs to be decomposed into different days or even further. It is sometimes particularly convenient to optimize for every day separately, as rest times can be ignored, both in the model and in practice, at the start of a new day. In general, a designer knows beforehand roughly when certain parts of the tournament will be scheduled. The bigger events with the most rounds are normally spread out over the day(s), to avoid possible bottlenecks later and to allow for more or less equal rest times for players participating in these events. This type of tournament design needs to be determined on some level, to decrease the size of the model. More specifically, the following questions should be answered before the actual modeling begins:

• For every discipline, which rounds have to be scheduled on which days?

• Do multiple rounds of a discipline have to be scheduled on one day, or can a round take different days to finish?

• Do there exist bounds on the earliest and/or latest times a match of a discipline’s round can be played?

• Are some matches or rounds attached to specific time slots?

For example, the Dutch Open contains men’s singles, women’s singles and mixed doubles on the second day (Wednesday). The men’s singles event is the largest one, as usual in badminton. This day starts with some men’s singles of the first round, creating a bound for the women’s singles and mixed doubles on the earliest time to play. In general, this day ends with some men’s singles of the next round, which implies the other disciplines have a bound on the latest time as well.

3.2 Problem setting


that court. In this deterministic setting, we assume that every match lasts exactly the scheduled amount of time. This assumption will be dropped in the stochastic model. In our upcoming declarations and constraints, t is the only time unit and days are not directly modeled. If a tournament consists of multiple days, these can be modeled separately, starting again at t = 0 at a new day. The choice of the time unit length will be discussed later.

Some of our vocabulary needs to be made clear. A pair in doubles or mixed doubles consists of two players p and q, who are both able to perform in singles disciplines as well. If we say that two players compete in a match against each other, the word players can usually be replaced by pairs. In this thesis, the term event often refers to discipline, unless the context implies that it refers to the tournament itself. We note that many recreational tournaments have the same discipline played in several categories for players with different levels; in that case multiple events share the same discipline and the distinction between the terms is important. However, in this thesis we will just assume one category per discipline for convenience. Our modeling applies to recreational tournaments as well and the reasoning can be easily extended to multiple categories. All indices, sets, parameters and variables that we define are also listed in the Appendix.

Indices

d, e    Discipline: 1 = men’s singles (MS), 2 = women’s singles (WS), 3 = men’s doubles (MD), 4 = women’s doubles (WD), 5 = mixed doubles (XD)
r, s    Round
m, n    Match
p, q    Player
t, u    Time slot
c       Court

Tournament structure (basis)

$D$      Set of disciplines
$P$      Set of players
$C$      Set of courts
$T$      Set of time slots
$E_d$    Number of entries (players/pairs) in discipline d
$R_d$    Number of rounds in discipline d, $R_d = \lceil \ln(E_d)/\ln(2) \rceil$
$M_d$    Number of matches in discipline d, $M_d = E_d - 1$

The indices can take on the following values:

$d \in D$; $r = 1, \ldots, R_d$; $m = 1, \ldots, 2^{r-1}$; $p \in P$; $t \geq 0$; $c \in C$.

For constraints where an index takes on these bounds, we will simply write ∀ before this index. These values can be different if the tournament is split into multiple parts and modeled as such. In that case, not every round is played on the same day and different bounds hold for r. If a round of an event is split in multiple parts, different bounds hold for m as well.


Figure 1: Knockout tournament with six players

number of matches per round can be written as $2^{r-1}$. The values for $M_d$ and $R_d$ are determined by the number of entries $E_d$. A knockout tournament with $E$ entries consists of $E - 1$ matches, since all players lose once except for the winner, implying $M_d = E_d - 1$. It can be shown that $R_d = \lceil \ln(E_d)/\ln(2) \rceil$, where $\lceil c \rceil$ denotes $c$ rounded up to the smallest integer that is not less than $c$.

A knockout tournament with two players only needs one round, the final. A tournament with three or four players results in $R = 2$, five to eight players leads to $R = 3$ and so on. The number of rounds needed for the tournament is the smallest $R \in \mathbb{N}$ for which $2^R \geq E$, where $E$ is the total number of entries. If $E$ is a power of two, the relation $2^R = E$ holds, implying $R = \ln(E)/\ln(2)$. For any other $E$, rounding the right-hand side up gives the result.
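As an illustration of these relations, the following Python sketch computes the number of rounds, matches and first-round byes for a given number of entries. The function name `bracket_size` is hypothetical. The sketch uses integer arithmetic rather than logarithms to find the smallest $R$ with $2^R \geq E$, which avoids floating-point rounding issues; this is an implementation choice for the example, not part of the thesis.

```python
def bracket_size(entries):
    """Return (rounds, matches, byes) for a knockout event with `entries` entries."""
    if entries < 2:
        raise ValueError("a knockout event needs at least two entries")
    # Smallest R with 2**R >= entries, i.e. the ceiling of log2(entries).
    rounds = (entries - 1).bit_length()
    # Every entry except the winner loses exactly once.
    matches = entries - 1
    # Entries that skip the first round because the bracket is not full.
    byes = 2 ** rounds - entries
    return rounds, matches, byes


six_player_event = bracket_size(6)
```

For six entries this gives three rounds, five matches and two byes, in line with the six-player example of Figure 1.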

If $E$ is not a power of two, that is, $2^R \neq E$ for every $R \in \mathbb{N}$, some players receive a bye in the first round: they go to the next round without playing. These are usually the highest seeded players. Most tournaments have extensive seeding regulations describing where byes should be placed in order to ensure a fair tournament, under the assumption that higher-seeded players deserve byes more than others. Figure 1 shows a tournament with backward numbered rounds and six players, where the player number implies the rank. In this example, Players 1 and 2 receive a bye in the quarter finals.

3.3 Players per match

As shown in Figure 1, the draw implies the matches a player is able to perform in. We can model this using the following parameter, defined for all players and matches in the tournament:

$Y_{drmp}$    Parameter indicating if player p can take part in match m in round r of discipline d (1 = yes, 0 = no).


each match in reality, using this no-lose assumption can be convenient. For example, it is not necessary to model that two players in doubles or mixed doubles belong to one pair. Furthermore, this assumption is used in practice as well, for example in scheduling the first few rounds of large tournaments such as the DO. As said before, all players should have a certain amount of rest time between their matches, while the schedule cannot be changed after some unlikely match outcome(s). Instead of taking possible winning probabilities into account, it is safest to first assume that every player in a match ’wins’ and continues to the next round. This is of course not tenable for later rounds. If a certain winning probability is very small for some player, a designer might ignore his possible appearance in a match if this results in a better schedule. This case will be further discussed in the next subsection.

We note that the parameters $Y_{drmp}$ are completely determined by the draw. The relation is derived below for the interested reader. First we number the matches in every round for a given discipline, starting at the top until match $2^{R-1}$ is reached. In Figure 1, the matches actually played in round 3 are thus matches 2 and 3. Then, assuming that a given player $p$ wins match $m$ in round $r$, he plays in match $\lceil m/2 \rceil$ in round $r - 1$. This holds because the winners of matches 1 and 2 in round $r$ both play match 1 in round $r - 1$, the winners of matches 3 and 4 in round $r$ play match 2 in the next round and so on. So if $Y_{drmp} = 1$ for some discipline, this implies that $Y_{d,r-1,\lceil m/2 \rceil,p} = 1$ as well. This relation on $Y$ is true for all subsequent rounds and thus, the following relation holds:

$$Y_{d,R_d,m,p} = 1 \;\Rightarrow\; Y_{d,R_d-j,\lceil m/2^j \rceil,p} = 1 \quad \text{for } j = 1, \ldots, R_d - 1;\ \forall d, m, p. \qquad (1)$$

The draw directly implies the subscripts for which $Y_{d,R_d,m,p} = 1$ and the other values can be calculated with (1). The parameters not affected by this relation are zero.
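Relation (1) can be sketched in a few lines of Python for a single discipline. The function name `propagate_draw`, the dictionary layout and the concrete six-player pairing below are assumptions for illustration; only the fact that Players 1 and 2 receive byes and that matches 2 and 3 are played in round 3 is taken from the text. Rounds are numbered backward, so round $R$ is the first round and round 1 is the final.

```python
def propagate_draw(first_round, rounds):
    """Apply relation (1) to a first-round draw.

    first_round maps a match number m (1-based) in round `rounds` to the
    players drawn into it. The result maps (r, m) to the set of players p
    with Y[r, m, p] = 1 under the assumption that every player keeps winning.
    """
    y = {}
    for m, players in first_round.items():
        for j in range(rounds):              # j = 0 is the first round itself
            r = rounds - j                   # backward-numbered round
            later_m = -(-m // 2 ** j)        # ceiling of m / 2**j
            y.setdefault((r, later_m), set()).update(players)
    return y


# Hypothetical draw with six players: matches 1 and 4 are byes.
draw = {1: [1], 2: [4, 5], 3: [3, 6], 4: [2]}
y = propagate_draw(draw, rounds=3)
```

In this example the first semi-final (round 2, match 1) can contain Players 1, 4 and 5, and every player can appear in the final.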

As two players are needed in a singles match and four in doubles, the following is true for every match:

$$\sum_{p \in P} Y_{d,R_d,m,p} = 2 \quad d = 1, 2, \qquad \sum_{p \in P} Y_{d,R_d,m,p} = 4 \quad d = 3, 4, 5.$$

If the number of entries in a discipline $d$ is not a power of 2, some matches do not have to be played in round $R_d$ and the corresponding players receive a bye in the draw. In that case, the following equations hold for these ’matches’:

$$\sum_{p \in P} Y_{d,R_d,m,p} = 1 \quad d = 1, 2, \qquad \sum_{p \in P} Y_{d,R_d,m,p} = 2 \quad d = 3, 4, 5.$$


Figure 2: Qualification stage

m = 2. Then (1) implies that the winner plays the first match in round 3. The other match in round 4 equivalently gets m = 7. These values imply as well the correct m for round 5, where we note that only match 4 and 13 have to be played.

The probability that someone performs in a certain match can be very small and a tournament designer might want to ignore such probabilities at an early stage. In the next subsection, we expand our formulation with winning probabilities for all players. This part is not essential in understanding our upcoming deterministic MIP and might be skipped by readers more interested in the formulation of this model.

3.4 Winning probabilities

So far, we assumed that all contestants in a match continue to the next round. This will cause inflexible modeling. In practice, a tournament designer often assumes that a player will exit the event at a certain stage. To determine the chance that a player reaches a certain round, we will introduce the concept of winning probabilities. As we will explain, this does not make our model stochastic yet; in this deterministic model, winning probabilities are merely used to change specific values $Y_{drmp} = 1$ to $Y_{drmp} = 0$.

Winning probabilities in sports can be modeled in many different ways. In badminton, there are several scales to measure this: per match, game or point, where the probability per point may even depend on the person serving. As this thesis is not meant to construct the most detailed and realistic model, we will not consider winning probabilities per point. In the model with stochastic match durations, we will estimate a fixed length per game, while the estimated number of games depends on the players’ abilities. Therefore, we choose to model the winning probability per game. First we define the necessary parameters:

$A_{dp}$         Ability of player p in discipline d
$PA_{dp_1p_2}$   Ability of a pair of players $p_1$ and $p_2$ in discipline d (d = 3, 4, 5)
$G_{dpq}$        Probability that player p wins a game against player q in discipline d
$W_{dpq}$        Probability that player p wins a match against player q in discipline d
$K_{drmp}$       Probability that player p competes in match m in round r of discipline d


Table 1: Winning probabilities for two given players or pairs p and q

                                        2 games          3 games          W
P(player p wins the match)              $G^2$            $2G^2(1-G)$      $G^2(3-2G)$
P(player q wins the match)              $(1-G)^2$        $2G(1-G)^2$      $1-G^2(3-2G)$
P(match played in i games), i = 2, 3    $1-2G(1-G)$      $2G(1-G)$

An elegant option to determine winning probabilities is proposed by Ryvkin (2005), by assuming that a player’s performance depends on his ability $A$ plus a noise term $\upsilon \sim N(0, \sigma^2)$. The output for one player $p$ in a match is then defined as $u_p = A_p + \upsilon_p$, where the player with the highest output wins the match. The author assumes that a probability density function $f(A)$ of the abilities exists. Although the noise distribution is assumed to be normal, the ability distribution could be normal, Pareto or uniform, for example. The density function is then normalized to get $\mathrm{Var}(A) = 1$. Instead of dealing with an entire match at once, we will use this method to determine $G_{dpq}$, the probability that player p wins a game against player q in discipline d:

$$G_{dpq} = \Phi\left(\frac{A_{dp} - A_{dq}}{\sigma\sqrt{2}}\right),$$

where $\Phi(\cdot)$ is the distribution function of the standard normal distribution. With this expression, the probabilities that a player wins a match in two or three games can be calculated. For example, using $G_{dqp} = 1 - G_{dpq}$, $P(\text{player } p \text{ wins in 3 games}) = 2G_{dpq}^2(1 - G_{dpq})$, as the first two games are won by different players and the last game by player p. By summing the applicable expressions, the winning probabilities $W_{dpq}$ can be calculated as well. Here, we take the walkover probability $WO$ into account, where a walkover means that a player resigns from the match before or during play. We assume that this walkover probability is equal for all players, rounds and disciplines. For two players p and q competing in a singles match, we obtain:

$$W_{dpq} = WO + (1 - 2WO) \cdot G_{dpq}^2(3 - 2G_{dpq}),$$

where the term $(1 - 2WO)$ is the probability that neither player resigns; the case that both players give up is ignored. An overview of these values is given in Table 1, where the last column reflects the winning probabilities for each player. For convenience, subscripts and walkover possibilities are omitted.
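The game and match probabilities above can be evaluated with a short Python sketch. The function names and the default values `sigma=1.0` and `walkover=0.0` are assumptions for this illustration; the standard normal distribution function is computed from the error function in the Python standard library.

```python
from math import erf, sqrt


def std_normal_cdf(x):
    """Distribution function of the standard normal distribution."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))


def game_win_prob(ability_p, ability_q, sigma=1.0):
    """G: probability that p wins a single game against q."""
    return std_normal_cdf((ability_p - ability_q) / (sigma * sqrt(2.0)))


def three_game_prob(g):
    """Probability that the match needs a third game (Table 1, last row)."""
    return 2.0 * g * (1.0 - g)


def match_win_prob(g, walkover=0.0):
    """W: probability that p wins a best-of-three match, with walkover probability WO."""
    return walkover + (1.0 - 2.0 * walkover) * g * g * (3.0 - 2.0 * g)


# Hypothetical abilities: p is half a standard deviation stronger than q.
g = game_win_prob(1.5, 1.0)
w = match_win_prob(g, walkover=0.01)
```

Since a best-of-three match amplifies a per-game advantage, `w` exceeds `g` whenever `g` is above one half and the walkover probability is small.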

The winning probabilities above are well defined for singles disciplines. To determine those in (mixed) doubles, a function on the respective abilities $A_{dp_1}$ and $A_{dp_2}$ for a pair of players $p_1$ and $p_2$ should be constructed to obtain $PA_{dp_1p_2}$. A possible option for mixed doubles is to add more weight to the male player, as a man usually has more influence in rallies than an equivalently ranked woman. However, to determine the seedings list in a general tournament and therefore the estimated quality of a pair, individual rankings are just summed in both doubles and mixed doubles. Analogously, we set:

$$PA_{dp_1p_2} = A_{dp_1} + A_{dp_2}.$$


Abilities can generally be well estimated by using an official ranking constructed by some (inter)national federation. For a better reflection of actual winning probabilities, these values should generally be transformed, for example by taking the logarithm. If the committee thinks that a player’s ranking is not reflecting his ability, this ranking should be assigned manually. Using these winning probabilities, we can formulate for each player the probability that he reaches a certain round. These playing probabilities $K_{drmp}$ are determined in an iterative way and the procedure will be shown for the interested reader. After these probabilities are defined for every player and match, a tournament designer might choose to disregard the possible participation of a player in a match for small $K$. For these players, sufficient rest time will not be guaranteed after some unlikely match outcomes. We define

$\alpha$    Minimal value of $K_{drmp}$ for player p to include his possible appearance in that match.

Subsequently, we set

$$Y_{drmp} = 0 \quad \forall d, r, m, p \mid K_{drmp} < \alpha.$$
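The thresholding step can be sketched as follows in Python. The function name `prune_draw`, the use of dictionaries keyed by `(d, r, m, p)` tuples and the example values are assumptions made for this illustration.

```python
def prune_draw(y, k, alpha):
    """Set Y to 0 wherever the playing probability K is below alpha.

    y and k are dictionaries keyed by (d, r, m, p). A key missing from k is
    treated as a playing probability of zero.
    """
    return {key: (value if k.get(key, 0.0) >= alpha else 0)
            for key, value in y.items()}


# Hypothetical values: player 8 is very unlikely to reach round 2 of discipline 1.
y = {(1, 2, 1, 7): 1, (1, 2, 1, 8): 1}
k = {(1, 2, 1, 7): 0.60, (1, 2, 1, 8): 0.01}
pruned = prune_draw(y, k, alpha=0.05)
```

A larger threshold removes more potential appearances and thereby relaxes the schedule, at the price of not guaranteeing rest times after unlikely outcomes.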

Finally, the derivation of $K$ is given. The equations are constructed for singles events; the procedure for doubles is similar. A player is always present in the first round:

$$K_{d,R_d,m,p} = Y_{d,R_d,m,p} \quad \forall d, m, p.$$

The values for the next round are modeled as follows, where we use the probability from Table 1 that player $p$ wins the match:

$$K_{d,R_d-1,\lceil m/2 \rceil,p} = \sum_{q \in P^*} W_{dpq} \quad \forall d, m, p,$$

where $P^* = \{q \in P \mid q \neq p,\ Y_{d,R_d,m,q} = 1\}$. This is the set of player $p$’s possible opponents in round $R_d$, consisting of only one entry here. On the left-hand side, the subscript $m$ is determined in the same way as for the $Y$ parameters. For subsequent rounds, this equation has to be extended. Given all possible opponents in a round, a player’s winning probability has to be calculated and multiplied by the probabilities that these opponents are still present at that stage. The probability that the player reaches the corresponding round himself is needed as well:

$$K_{d,R_d-2,\lceil m/2 \rceil,p} = K_{d,R_d-1,m,p} \cdot \sum_{q \in P^*} K_{d,R_d-1,m,q} \cdot W_{dpq} \quad \forall d, m, p.$$

Here, we have $P^* = \{q \in P \mid q \neq p,\ K_{d,R_d-1,m,q} > 0,\ Y_{d,R_d,2m,p} \neq Y_{d,R_d,2m,q}\}$. The last condition on $Y$ excludes players that could have been player $p$’s opponents in the previous round and will therefore not be present in the current round. This procedure can be repeated for every round after $R_d - 2$. Similarly to the derivation of $Y$, all the combinations of $d$, $r$, $m$ and $p$ not affected by this procedure result in $K_{drmp} = 0$.
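The iterative procedure can be sketched in Python for one singles event. This is a minimal illustration under stated assumptions, not the implementation used in the thesis: the names `playing_probabilities` and `advance` are hypothetical, the draw is given as a list of first-round pairs with `None` marking a bye, a player with a bye is assumed to advance with probability one, and match outcomes in different parts of the draw are treated as independent.

```python
def advance(side, other, win):
    """Probability per player of winning the match from `side` against `other`."""
    if not other:                      # bye: nobody to play against
        return dict(side)
    return {p: kp * sum(kq * win(p, q) for q, kq in other.items())
            for p, kp in side.items()}


def playing_probabilities(first_round, win):
    """Return one list of matches per round, first round first.

    Each match is a dict {player: probability of playing that match}.
    first_round is a list of (p, q) pairs, q being None for a bye, and its
    length is assumed to be a power of two. win(p, q) is the probability
    that p beats q in a match.
    """
    # Each match has two sides; a side holds the players that can arrive there.
    sides = []
    for p, q in first_round:
        sides.append({p: 1.0})
        sides.append({q: 1.0} if q is not None else {})
    rounds = []
    while len(sides) > 1:
        matches, winners = [], []
        for a, b in zip(sides[0::2], sides[1::2]):
            matches.append({**a, **b})
            winners.append({**advance(a, b, win), **advance(b, a, win)})
        rounds.append(matches)
        sides = winners                # winners form the sides of the next round
    return rounds


# Hypothetical three-player event with equal abilities; player 1 has a bye.
k = playing_probabilities([(1, None), (2, 3)], win=lambda p, q: 0.5)
final = k[-1][0]
```

In the example, player 1 plays the final with certainty, while players 2 and 3 each reach it with probability one half, so the playing probabilities in the final sum to two.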


3.5 Time planning

The players possibly taking part in a match are now determined. Next, we make an estimation of the match durations. More attention to the uncertainty of these durations is given in Section 4. Here, we assume that every match takes exactly its scheduled amount of time to play. It is first necessary to make a rough estimation of the match durations. In practice, later rounds have more exciting and therefore on average longer matches. The duration may thus depend on the round and most tournaments schedule more time for the final rounds than for the first ones. In many recreational (one-day or two-day) tournaments, it is common to increase the gaps between time slots after a given point in time, for example, from 35 minutes per match to 40 minutes. This can happen at multiple moments of the day, although these moments are often chosen somewhat arbitrarily. Some tournaments, such as the DO, tend to schedule possibly exciting matches on specified courts, implying that match duration can depend on a court as well. It can be argued that the discipline is another factor to determine match length, but in practice there is hardly any difference between durations across the different disciplines.

Summarizing, the estimated match duration can depend on round, time and/or court. It is important to pick a time unit that is easy to calculate with and reflects reality well. In almost every tournament, the minimal rest time for a player has to be assured correctly. For the DO and other large tournaments, the time between two adjacent matches in a round has to be taken into account as well, to avoid large differences in rest times among opponents. Below we define the parameters regarding these concepts; the corresponding constraints will be constructed at a later stage.

$TM_{rtc}$    Scheduled match duration for a match in round r at time t on court c
$TR$          Minimal start time difference between two subsequent matches for a player
$TB$          Maximal start time difference between two adjacent matches of which the winners meet each other in the next round.

Sometimes, a constant duration is used for (almost) the entire tournament. In that case, the time unit in the model can simply be this value, as matches are generally planned just at time instances that are multiples of this duration. Other tournaments increase the duration after a certain point in time by 5, 10 or possibly 15 minutes, and a corresponding time unit length can be used in the model if the original duration is a multiple of this length as well. For example, suppose a tournament starts at 9:00 AM, where 30 minutes are planned for matches starting before 12:00 PM (noon) and 40 minutes for matches starting afterwards. In that case, a time unit of 10 minutes can be used, resulting in $T = \{0, 3, \ldots, 18, 22, 26, \ldots\}$. To decrease the model size, it is possible to split the model at the points where the duration increases. In that case, different models are optimized where each model has its own match length as time unit.
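The construction of the set of time slots in this example can be sketched in Python. The function name `slot_starts`, the `horizon` argument and the way the duration is passed as a function of the start time are assumptions for this illustration; times are expressed in units of 10 minutes counted from 9:00 AM, so noon corresponds to t = 18.

```python
def slot_starts(duration_at, horizon):
    """Start times of consecutive matches on one court before `horizon`.

    duration_at(t) gives the scheduled duration, in time units, of a match
    starting at time t.
    """
    slots, t = [], 0
    while t < horizon:
        slots.append(t)
        t += duration_at(t)
    return slots


def example_duration(t):
    """30 minutes (3 units) before noon (t = 18), 40 minutes (4 units) from noon on."""
    return 3 if t < 18 else 4


slots = slot_starts(example_duration, horizon=30)
```

With a horizon of 30 time units (2:00 PM), the sketch yields the start times 0, 3, ..., 15, 18, 22 and 26.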


resulting in T = (0, 8, 9, 16, 18, ...).

3.6 Constraints on start times

In this subsection, we will formulate the constraints in our model that purely affect the start times and not the players. We first have to discuss the time unit and the consequences for our model. Therefore, we need variables corresponding to the time and place of all matches. These will be the decision variables of the model:

$x_{drmtc}$  Binary variable indicating if match $m$ in round $r$ of discipline $d$ is scheduled at time $t$ on court $c$ (1 = yes, 0 = no).

This formulation appears to create a huge number of variables for realistic instances, which is in essence true. However, many of those variables are zero in any solution. For example, the final cannot be played before both semi-finals are finished, so natural bounds apply to the start times of matches in later rounds. Similarly, matches in earlier rounds have to be finished before some time, to make space for the finals. Many constraints will exclude combinations of time slots and courts for specific matches. Some of them are rather obvious, others will be explained in more detail. After all constraints of the model are listed, some valid inequalities will be provided. In general, a match cannot be scheduled at every multiple of the time unit. We thus have

$$x_{drmtc} = 0 \qquad t \notin T;\ \forall d, r, m, c, \qquad (2)$$

where the set $T$ can be manually constructed or implied from the model. For example, if $TM$ only depends on $c$ and if there are no general breaks in the daily schedule, as in the DO, matches will be planned on time slots that are a multiple of the match duration on that court. The time instances in between are indicated by an index $b$ running from 1 to $TM_c - 1$ in the next constraint. We get:

$$x_{d,r,m,\,TM_c \cdot t + b,\,c} = 0 \qquad t = 0, \ldots, \lfloor TE / TM_c \rfloor;\ b = 1, \ldots, TM_c - 1;\ \forall d, r, m, c.$$
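To illustrate how the set of time slots follows from court-dependent durations, the sketch below lists the admissible start times per court and their union. The durations used (9 time units on courts 1 and 2, 8 time units on courts 3 and 4, with a time unit of five minutes) are our reading of the DO example; the function names are hypothetical.

```python
def allowed_slots(court_durations, latest):
    """Admissible start times per court (multiples of TM_c) and their union."""
    per_court = {
        court: list(range(0, latest + 1, duration))
        for court, duration in court_durations.items()
    }
    union = sorted(set().union(*per_court.values()))
    return per_court, union


def is_forbidden(court, t, court_durations):
    """True if x_drmtc is fixed to zero because t is not a multiple of TM_c."""
    return t % court_durations[court] != 0


durations = {1: 9, 2: 9, 3: 8, 4: 8}  # assumed values, in units of five minutes
per_court, union = allowed_slots(durations, latest=18)
print(union)
```

Every variable with a start time outside the list of its own court is fixed to zero, which removes a large share of the variables before the solver starts.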

Recall that this results in $T = (0, 8, 9, 16, 18, \ldots)$ for the DO with a time unit of five minutes. Another option is to take the minimal duration as the time unit for this tournament. This means a time unit of 40 minutes, where matches are planned on every time unit on each of the courts. The advantage is that fewer time slots are considered, but there is also a large disadvantage. On courts 1 and 2, the time has to be corrected to stay synchronized with courts 3 and 4. For every eight matches on courts 1 and 2, nine matches on courts 3 and 4 have to be planned in practice. Therefore, after every eight matches a dummy match can be scheduled on both courts 1 and 2, after which the next match starts again at the correct time in the model. Using this workaround, the true rest times per player and the start time differences between adjacent matches differ by at most 40 minutes from the values in the model. It is possible to increase $TR$ and decrease $TB$ by this amount, to make sure that the actual values do not violate them. However, this might result in unnecessarily large rest times and small gaps between adjacent matches, and possibly in apparent infeasibility of the model. Therefore, it is not recommended to use this method for measuring time.

We know that only one match on each court can be planned at a time and a match cannot be started before the current one has finished. If T M only depends on c, every match lasts T Mc


If T M depends on r or t, this constraint cannot be defined for the entire tournament at once. In that case, this constraint has to be defined for different time periods in which these rounds or matches are allowed to be played.

As mentioned earlier, a designer should know beforehand when to schedule certain parts of the tournament approximately. Constraints have to be put on the time slots that these matches can or cannot be played. A way to do this is to construct time windows for each round of an event. Of course, every tournament has a final time slot for scheduling matches. Below we declare this time, along with the time window parameters.

$TE$  Latest time a match can be planned

$NB_{dr}$  Before this time, a match in round $r$ of discipline $d$ cannot be scheduled

$NA_{dr}$  After this time, a match in round $r$ of discipline $d$ cannot be scheduled.

The default values are $NB_{dr} = 0$ and $NA_{dr} = TE$. By specifying these windows, assumptions have to be made on the match durations. Setting these bounds too tightly might result in infeasible models. A round of an event has to be scheduled between the times $NB_{dr}$ and $NA_{dr}$. Equivalently, matches in a round cannot be played on time slots outside this window:

$$x_{drmtc} = 0 \qquad t = 0, \ldots, NB_{dr} - 1;\ \forall d, r, m, c, \qquad (4)$$

$$x_{drmtc} = 0 \qquad t = NA_{dr} + 1, \ldots, TE;\ \forall d, r, m, c. \qquad (5)$$

Inside the corresponding time windows, every match has to be played once:

$$\sum_{t = NB_{dr}}^{NA_{dr}} \sum_{c \in C} x_{drmtc} = 1 \qquad \forall d, r, m. \qquad (6)$$
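As a small illustration of the time windows, the following sketch checks a given assignment of start times against the windows $[NB_{dr}, NA_{dr}]$. The data layout (dictionaries keyed by discipline, round and match) and the function name are our own assumptions, chosen only for this example.

```python
def window_violations(start_times, latest, not_before=None, not_after=None):
    """Return the matches whose start time lies outside their round's window.

    start_times -- {(d, r, m): t}; one start time per match, so (6) holds
                   as soon as the start time lies inside the window
    latest      -- TE, the latest time a match can be planned
    not_before  -- {(d, r): NB_dr}; missing entries default to 0
    not_after   -- {(d, r): NA_dr}; missing entries default to TE
    """
    not_before = not_before or {}
    not_after = not_after or {}
    violations = []
    for (d, r, m), t in sorted(start_times.items()):
        nb = not_before.get((d, r), 0)
        na = not_after.get((d, r), latest)
        if t < nb or t > na:  # (4) or (5) fixes x_drmtc to zero here
            violations.append((d, r, m))
    return violations


# Discipline 1: two semi-finals (r = 2) and the final (r = 1).
starts = {(1, 2, 1): 0, (1, 2, 2): 8, (1, 1, 1): 8}
print(window_violations(starts, latest=40, not_before={(1, 1): 16}))
```

In this hypothetical instance the final is planned at time 8 although its window opens at time 16, so it is reported as a violation.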

Some tournaments explicitly declare the number of matches in a round that have to be played per time slot, before making the actual schedule. For example, a tournament with six courts might want to start off with three men's singles and three mixed doubles. A designer might want to fill every time slot already with non-specified matches of some round. Subsequently, the actual matches are inserted in the schedule and it is then investigated whether an allocation exists in which every player has enough rest time. The parameter below concerns the making of those schedules:

$MS_{drt}$  Number of matches in round $r$ of discipline $d$ planned at time $t$.

This way of scheduling does not need to be used for the entire tournament, which implies that a parameter $MS$ does not have to be specified for every combination of $d, r, t$. The notation $d, r, t \mid \exists MS_{drt}$ in the constraint below means that the constraint is imposed only for those indices for which this parameter is defined.

$$\sum_{c \in C} \sum_{m=1}^{2^{r-1}} x_{drmtc} = MS_{drt} \qquad d, r, t \mid \exists MS_{drt}. \qquad (7)$$

Next, we formulate constraints on the start times of matches and introduce the following variable:

$\tau_{drm}$  Start time of match $m$ in round $r$ of discipline $d$.

This start time follows directly from the decision variables:

$$\tau_{drm} = \sum_{t \in T} \sum_{c \in C} t \cdot x_{drmtc} \qquad \forall d, r, m, \qquad (8)$$

as this sum consists of only one positive element, namely $t$ for the combination of subscripts for which $x = 1$. Hence, all $\tau_{drm}$ are defined by other variables in the model. As we need this expression for other constraints as well, we choose to model it as a variable for convenience. Especially at the final stage of the tournament, a match in a later round is usually not played before the preceding rounds are finished. This can be modeled by specifying tight time windows per round that do not overlap for subsequent rounds. If these windows do overlap, a constraint should be specified for these particular rounds. Since rounds are numbered backwards from the final ($r = 1$), a match in round $r - 1$ may never be played before a match in round $r$ of the same discipline:

$$\tau_{d,r-1,m} \ge \tau_{d,r,n} \qquad r = 2, \ldots, R_d;\ m = 1, \ldots, 2^{r-2};\ n = 1, \ldots, 2^{r-1};\ \forall d. \qquad (9)$$

Next, we define constraints on the maximal time $TB$ between two adjacent matches. Most tournaments, especially professional ones, do not want this gap to be too large, as this might result in an unfair difference in waiting times. We divide every round before the final into pairs of matches. First we assume that there are no byes, so all these matches have to be played. If match $2m$ in a round is played after match $2m - 1$, the following has to hold:

$$\tau_{d,r,2m} - \tau_{d,r,2m-1} \le TB \qquad r = 2, \ldots, R_d;\ m = 1, \ldots, 2^{r-2};\ \forall d. \qquad (10)$$

If match $2m$ in a round is played before match $2m - 1$, the next constraint is necessary:

$$\tau_{d,r,2m-1} - \tau_{d,r,2m} \le TB \qquad r = 2, \ldots, R_d;\ m = 1, \ldots, 2^{r-2};\ \forall d. \qquad (11)$$

To explain these constraints, we take again two semi-finals as an example. For $r = 2$, these constraints only have to hold for $m = 1$. Consider the first constraint. If match 2 is played after match 1, the first constraint assures that the difference between the start times is at most $TB$. If match 2 is not played after match 1, the left-hand side is non-positive and the inequality always holds. Similarly, the second constraint assures that the gap is sufficiently small if match 1 is played after match 2. If this order does not hold, the left-hand side is non-positive and the inequality is always true. Together, (10) and (11) are equivalent to $|\tau_{d,r,2m} - \tau_{d,r,2m-1}| \le TB$. These arguments hold for all pairs of matches in every round before the final. These constraints have to be adapted if byes in the schedule are allowed; both matches $2m - 1$ and $2m$ have to be played in round $R_d$ for these constraints to hold.
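The pair of inequalities (10) and (11) can be checked for a given round with the sketch below, which also verifies that the two linear constraints together bound the absolute start time difference. The function name and the dictionary layout are hypothetical choices for this illustration.

```python
def adjacent_pairs_ok(tau, tb):
    """Check (10) and (11) for every pair (2m - 1, 2m) of one round.

    tau -- {match number: start time} for matches 1, ..., 2k of a round
    tb  -- maximal start time difference TB between adjacent matches
    """
    ok = True
    for m in range(1, len(tau) // 2 + 1):
        first, second = tau[2 * m - 1], tau[2 * m]
        c10 = second - first <= tb  # binding if match 2m starts later
        c11 = first - second <= tb  # binding if match 2m - 1 starts later
        # The two linear constraints together bound the absolute difference.
        assert (c10 and c11) == (abs(second - first) <= tb)
        ok = ok and c10 and c11
    return ok


# Quarter-finals of one event, with TB = 16 time units.
print(adjacent_pairs_ok({1: 0, 2: 8, 3: 16, 4: 40}, tb=16))
print(adjacent_pairs_ok({1: 8, 2: 0, 3: 16, 4: 24}, tb=16))
```

In the first call the second pair starts 24 time units apart, so the check fails; in the second call both pairs respect the bound regardless of their order.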

3.7 Constraints on courts

A few straightforward constraints apply to court availability. We define the following sets, which will be used in the subsequent constraints:

$CA^j_{drm}$  Set $j$ of courts available for match $m$ in round $r$ in discipline $d$

$CS$  Set of courts only suitable for singles matches

$CL^j_{TC_j}$  Set $j$ of courts left after a given time $TC_j$.

Some matches may be assigned to certain courts beforehand, often to increase public interest or due to contractual obligations. These matches usually exist in all rounds and disciplines. Taking the DO as an example, two of those special sets are present. Some matches have to be played on court 1, so $CA^1_{drm} = \{1\}$ for these. Other matches have to be played on either court 1 or 2, implying $CA^2_{drm} = \{1, 2\}$. For all other matches, we simply have $CA^j_{drm} = C$. To make sure these matches are indeed scheduled on one of those courts, we impose the constraint:

$$x_{drmtc} = 0 \qquad c \notin CA^j_{drm};\ \forall d, r, m, t. \qquad (12)$$

There exist sports halls, mostly suited for recreational tournaments, with courts very close to the wall or marked only with the lines used in singles matches. These courts are therefore inappropriate for playing doubles and, as an application of (12), the next constraint holds for them:

$$x_{drmtc} = 0 \qquad c \in CS;\ d = 3, 4, 5;\ \forall r, m, t. \qquad (13)$$

For some tournaments, mostly recreational ones, courts are unavailable after a given time instance. Other tournaments reserve fewer courts for the last few rounds or matches, to increase public interest. As an example, the DO has only two courts available for the last three days. For courts that are unavailable from time $TC_j$ on, the following holds:

$$x_{drmtc} = 0 \qquad c \notin CL^j_{TC_j};\ t = TC_j, \ldots, TE;\ \forall d, r, m. \qquad (14)$$

3.8 Constraints on players

Now we consider constraints that affect players performing in certain matches. To include a constraint on the minimum amount of rest time for each player, it is necessary to measure if one can play at a certain time instance. We will explain two ways to do this, of which the second one is introduced in the stochastic formulation of our model. The first way is by using new variables:

$z_{drmpt}$  Binary variable indicating if player $p$ performs in match $m$ in round $r$ of discipline $d$ at time $t$ (1 = yes, 0 = no)

$\varsigma_{pt}$  Binary variable indicating if player $p$ performs at time $t$ (1 = yes, 0 = no).

The variable $z$ equals zero if a player cannot compete in a match at all, and equals zero as well if a match cannot be played at a certain time instance. It depends on $Y$ and $x$ in the following way:

$$z_{drmpt} = Y_{drmp} \cdot \sum_{c \in C} x_{drmtc}. \qquad (15)$$

As every player can perform in at most one match at a time, $\varsigma_{pt}$ takes on binary values as well. Summing over all possible matches one can play at a time slot, we obtain:

$$\varsigma_{pt} = \sum_{d \in D} \sum_{r=1}^{R_d} \sum_{m=1}^{2^{r-1}} z_{drmpt} \qquad \forall p, t.$$

The constraint that assures enough rest time per player is then given by:

$$\sum_{u=t}^{t+TR-1} \varsigma_{pu} \le 1 \qquad \forall p, t. \qquad (16)$$

As an example, consider a player scheduled at time $t = 60$. With $TR = 24$, we know that the next match will not start before $t = 84$, as $\varsigma_{p,60} = 1$. These constraints have to be defined for every player and every window of $TR$ consecutive time units.
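The rest time constraint can be verified for a single player by sliding a window of $TR$ time units over the day, as in the sketch below. The function name and the `horizon` argument are our own; the code checks (16) literally rather than solving the model.

```python
def rest_time_ok(start_times, tr, horizon):
    """Check (16): every window of TR consecutive time units contains
    at most one match start of the player.

    start_times -- start times of the player's matches
    tr          -- minimal start time difference TR
    horizon     -- last time unit for which a window is checked
    """
    starts = set(start_times)
    if len(starts) != len(start_times):
        return False  # two matches planned at the same time
    for t in range(horizon + 1):
        in_window = sum(1 for u in range(t, t + tr) if u in starts)
        if in_window > 1:
            return False
    return True


print(rest_time_ok([60, 84], tr=24, horizon=120))  # next match exactly TR later
print(rest_time_ok([60, 83], tr=24, horizon=120))  # one time unit too early
```

The first call is feasible because the window starting at $t = 60$ ends at $t = 83$; the second call is infeasible because both starts fall in that window.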

Another way to ease the pressure on players is by putting a restriction on the maximal amount of matches one has to play during a given time period. This restriction is sometimes used in multi-day tournaments, often with the first phase of an event played in groups. We define:

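$TP$  End of the time period in which the number of matches per player is restricted

$MM$  Maximal number of matches a player has to play during this period.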

The corresponding constraint is written as:

$$\sum_{t=0}^{TP} \varsigma_{pt} \le MM \qquad \forall p, \qquad (17)$$

if the number of matches is just restricted from the start of the tournament to t = T P . This constraint can be easily expanded if desired by, for example, assigning different values to T P and M M for specific periods on the day(s).

Organizing committees often receive one or more requests from people who are not able to play on some day before a certain time. It is up to the schedulers whether or not they honor an individual request. We define the following parameter:

$NP_{pt}$  Parameter indicating if player $p$ is unable to play up to time $t$, but able to play after time $t + 1$ (1 = yes, 0 = no).

This parameter is usually positive for only a few indices. If such a request is honored for a player, we have:

$$\sum_{u=0}^{t} \varsigma_{pu} = 0 \qquad p, t \mid NP_{pt} = 1. \qquad (18)$$

This constraint can be easily adapted for other, less common, cases of unavailability. Players might be occupied temporarily, or have to leave early some day at a multi-day tournament.

3.9 Overview

The constraints listed so far comprise the important scheduling aspects for most tournaments. Next, we formally define the makespan, the variable that will be minimized in our objective function:

$C_{\max}$  Time to schedule the last match.

This variable equals the start time of the last match(es) if it is minimized. If the entire tournament is scheduled at once, the last matches to be played are finals:

$$C_{\max} \ge \tau_{d,1,1} \qquad \forall d. \qquad (19)$$

It is not necessary to model that $C_{\max}$ has to be at least the start time of every non-final match, since constraints such as (4), (5), (7) and (9) make sure that the final is indeed the last match of an event. Finally, the binary and integrality constraints have to hold for the relevant parameters and variables. Summarizing, the model can be described as follows:

min $C_{\max}$

s.t.

Constraints on start times

(2) Matches can only be played at available time slots.

(3) At most one match is played on a court at a time.

(4) No match in a round before a certain time.

(5) No match in a round after a certain time.

(6) Every match has to be played once.

(7) Some time slots are (partially) assigned to matches of a certain round.

(9) A round has to be finished before the next round is started.

(10) - (11) Two adjacent matches have to be started within a certain time period.

(19) The makespan equals the start time of the latest match(es).

Constraints on courts

(12) Some matches are assigned to specific courts.

(13) Doubles matches cannot be played on narrow courts.

(14) Some courts are unavailable after a given time.

Constraints on players

(16) A player has a minimal rest time between the start of two consecutive matches.

(17) A player competes in a maximum number of matches in a given time period.

(18) A player is not available before a given time.

Some remarks can be made on this formulation. If only a few or no restrictions are set on the time slots in which matches can be played, constraints (9) and (16) will result in a huge number of inequalities. If too many constraints are specified or the time windows are set too narrowly, the model will be infeasible. The same can easily happen if one of the many parameters is accidentally defined wrongly; as always, inputting the data should be done with care.

An improvement could be to include variables on rest times in the objective function. Then it becomes possible to maximize the minimal rest time for all players, or to minimize the maximal rest time. Unfortunately, from the model it is not clear in which order the matches are played and optimizing rest times for a player is therefore difficult. Some sensitivity analysis can be done by running the model for different values of $TR$ and checking feasibility and the resulting $C_{\max}$.

Choosing $C_{\max}$ as our objective function usually means that many optimal schedules are possible. Therefore, the computation time of a MIP solver can be relatively small. As is shown in Section 5, we are able to compute solutions in reasonable time for a tournament of realistic size. Hence, we can expand our model and allow for a second goal: minimizing a function on the total deviations from schedule. To achieve this, the model is extended with stochastic match durations in the next section.

3.10 Valid inequalities

For the interested reader, we conclude this section with some valid inequalities. By including these constraints in our model, computation time decreases. First, (9) can be formulated differently:

$$\sum_{c \in C} \sum_{u = NB_{dr}}^{t} \sum_{n=1}^{2^{r-1}} x_{d,r,n,u,c} \ge 2^{r-1} \cdot \sum_{c \in C} x_{d,r-1,m,t,c} \qquad r = 2, \ldots, R_d;\ m = 1, \ldots, 2^{r-2};\ t = NB_{dr}, \ldots, NA_{dr};\ \forall d.$$

This assures that all the $2^{r-1}$ matches of round $r$ are scheduled before a match in round $r - 1$ will be played. For round $R_d$, this number equals the matches that have to be played if byes are


played between $NB_{d2}$ and $NA_{d2}$. If these matches are not both played by time $t$, the left-hand side is smaller than 2 and the final of this event cannot yet be played. If this final would instead be scheduled at time $t$, the sum over $x$ on the right-hand side would be 1. Multiplying by the constant $2^{r-1} = 2$ then violates the inequality. This check has to be done for every match $m$ in a round and also for every time slot between $NB_{dr}$ and $NA_{dr}$. If there are only some pairs of subsequent rounds with overlapping time windows, these constraints can be specified for those cases separately instead of considering every window of a round.

Instead of treating $Y$ as a parameter, we can pretend that it is a binary variable. Since $x \in \{0, 1\}$ as well, we can 'linearize' (15):

$$z_{drmpt} \le \sum_{c \in C} x_{drmtc}, \qquad z_{drmpt} \le Y_{drmp}, \qquad z_{drmpt} \ge \sum_{c \in C} x_{drmtc} + Y_{drmp} - 1$$

for all $d, r, m, p$ and $t$. Of course, there might be more useful valid inequalities than the ones presented in this section.
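That the three linear inequalities reproduce the product in (15) can be confirmed by enumerating all binary combinations. In the sketch below, `x` stands for the sum $\sum_{c \in C} x_{drmtc}$, which is 0 or 1 because a match is played at most once at a given time; the function name is hypothetical.

```python
from itertools import product


def feasible_z(x, y):
    """Binary values of z allowed by z <= x, z <= y and z >= x + y - 1."""
    return [z for z in (0, 1) if z <= x and z <= y and z >= x + y - 1]


for x, y in product((0, 1), repeat=2):
    # Exactly one value of z is feasible, and it equals the product x * y.
    assert feasible_z(x, y) == [x * y]
    print(x, y, feasible_z(x, y))
```

For every combination only $z = x \cdot Y$ remains feasible, so the linear system is an exact reformulation for binary values.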

4 A stochastic MIR formulation

In this section, we expand our deterministic model to a stochastic one and minimize deviations from schedule. We start off with a few general notes on how to schedule uncertainty in practice. Thereafter, we introduce some notation of a stochastic model in general and describe its random parts. We will formulate a stochastic mixed-integer program with recourse variables (MIR), where matches are scheduled on the same time intervals as in the deterministic model, but have random durations. If we assume that courts are never empty, the problem can be written as a simple recourse model. For every match, its estimated duration depends on the winning probabilities per player up to that round and the corresponding ability differences. The stochastic models, with or without empty courts, can be used to solve small problems to optimality if every match duration can take on a few values. It is not practical to solve larger instances to optimality. Instead of making durations random, we may assume that some matches take on their expected durations, based on the ability differences of the players.

4.1 Estimating durations


Ideally, a schedule is always on time, with no early or late matches or empty courts. For every minute that a match starts early or late, a penalty could be incurred. The time that a court is idle, to which we refer as emptiness, can be sanctioned as well. A modeling question is how these concepts are weighted against each other. If one is indifferent between matches starting early or on time, total earliness can simply be ignored. Some tournament designers might like early matches, so that a small bonus could apply for the total earliness rather than a penalty. Empty courts are to be avoided in general, although some might not dislike it strongly enough to put a penalty on it. Late matches are never good and the amount of time a match is late is generally more disliked than the amount of time a court is empty. We define the earliness of a match m by the variable:

$$y_m^+ = \max(0, E_m),$$

and the tardiness of a match by:

$$y_m^- = \max(0, L_m),$$

where $E_m$ denotes the time match $m$ starts early and $L_m$ the time this match starts late. An objective is then to minimize a function on the total tardiness, earliness and emptiness.

We mentioned that there are generally many schedules with the same optimal makespan in the deterministic model. A natural extension is then, given this makespan, to determine schedules that minimize a function on the total deviation. In general, it is expected that schedules where short and long matches alternate on a court result in the lowest objective values. If too many matches in a row last longer than scheduled, their successors cannot start on time and an unnecessarily high lateness follows. Conversely, too many short matches after each other result in empty courts, if there is a bound on the time a match can start early. If we assume that matches are scheduled at equidistant time slots, it is already nontrivial to find the optimal order for a few matches with stochastic durations on a single court. We will show this by means of an example.

One important restriction applies to the analysis in this section. The deterministic model assigns a time slot and court number to every match. However, in practice matches are often allowed to take place on any free court. For example, a Dutch Open match not assigned to court 1 or 2 will be played on the remaining court available at that time, which can be either court 3 or 4. In our upcoming model, we will not check at every time instance if a free court is available. Hence, the earliness and lateness per match as calculated in the model for these courts differ from practice. It is complicated to formulate a linear program that allows a match with random duration to start on any free court. Instead, simulation can be used to evaluate earliness and tardiness of those matches in practice.


pair of opponents. These probabilities are derived in a way similar to the calculations in Section 3.4; the exact derivation is omitted here.

4.2 Notation

Before we write down our model in general, we will introduce some notation in stochastic programming similar to Klein Haneveld and Van der Vlerk (2012). Readers either familiar with this notation or more interested in practical applications may skip some of these parts. Let the following LP model be specified:

$$\min_{x \in X} \{cx : Ax = b,\ Tx \sim h\},$$

where $X$ specifies the simple bounds on $x$. The constraints $Ax = b$ are hard constraints and cannot be violated. The constraints $Tx \sim h$ can be violated, where $\sim$ stands for $\le$, $\ge$ or $=$. We rewrite the LP model as the following penalty model:

$$\min_{x \in X} \{cx + v(h - Tx) : Ax = b\} = \min_{x \in X} \{cx + v(z) : Ax = b,\ Tx + z = h\}.$$

In this model, $v(z)$ is the penalty cost function for deviations $z = h - Tx$ of the constraints. This penalty function allows for recourse actions: corrections compensating for observed deviations. Define $q$ as the recourse cost coefficients vector and $W$ as the recourse matrix. For some recourse structure $(q, W)$, it holds that:

$$v(z) = \min_{y \in Y} \{qy : Wy \sim z\},$$

where $Y$ specifies the bounds on $y$, the vector with recourse variables. Integer variables can be present in both $x$ and $y$. The function $v$ gives the minimum recourse costs necessary to compensate for deviations $z \in \mathbb{R}^m$ in the constraints $Tx \sim h$. These constraints appear in the second stage of the model, while the first stage consists of the constraints $Ax = b$. Thus, the following LP problem is obtained:

$$\min_{x, y} \{cx + qy : Ax = b,\ Tx + Wy \sim h,\ x \in X,\ y \in Y\}.$$

Now we apply our model to this general notation. All our variables take on integer values, where the makespan $C_{\max}$ and start times $\tau$ are the only non-binary variables. Thus we have $x \in \mathbb{Z}^n_+$.

In our model, $h$ is the vector containing match durations. At the start of every match, we will model its earliness or tardiness. For simplicity, we assume that a match will always be played on the scheduled court, so that the deviations from schedule are independent among different courts. The earliness and tardiness of a match are then implied by the durations and start times of its predecessors.


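$\Omega = \{\omega^1, \ldots, \omega^S\}$ and $P(\omega = \omega^s) = \pi_s$ (with $s = 1, \ldots, S$) to get:

$$\begin{aligned} \min_{x, y}\ \{\, & cx + \pi_1 \cdot q y(\omega^1) + \ldots + \pi_S \cdot q y(\omega^S) : \\ & Ax = b \\ & T(\omega^1) x + W y(\omega^1) \sim h(\omega^1) \\ & \qquad \vdots \\ & T(\omega^S) x + W y(\omega^S) \sim h(\omega^S) \\ & x \in X,\ y \in Y \,\}. \end{aligned}$$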

In our formulations, either $T$ or $h$ will not depend on $\omega$, which will simplify analysis. However, this model has a huge number of constraints, even for a moderate number of random variables and possible realizations. Therefore, we have to keep the number of realizations per random variable small.
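How quickly the number of scenarios grows can be seen by enumerating them explicitly. The sketch below assumes that the match durations are independent random variables, which is our simplifying assumption for this illustration; the data are hypothetical and the function name is our own.

```python
from itertools import product
from math import isclose, prod


def enumerate_scenarios(realizations, probabilities):
    """Yield (durations, probability) for every joint scenario.

    realizations  -- per match, a tuple of possible durations
    probabilities -- per match, a tuple of matching probabilities
    Assumes independent match durations (illustrative assumption).
    """
    index_sets = [range(len(r)) for r in realizations]
    for idx in product(*index_sets):
        durations = tuple(realizations[m][i] for m, i in enumerate(idx))
        probability = prod(probabilities[m][i] for m, i in enumerate(idx))
        yield durations, probability


# Five matches with two possible durations each (hypothetical data).
realizations = [(7, 11)] * 5
probabilities = [(0.9, 0.1), (0.8, 0.2), (0.7, 0.3), (0.6, 0.4), (0.5, 0.5)]
scenarios = list(enumerate_scenarios(realizations, probabilities))
print(len(scenarios))
assert isclose(sum(p for _, p in scenarios), 1.0)
```

Each additional match with two realizations doubles the number of scenarios, and every scenario adds its own block of second-stage constraints to the model.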

4.3 Simple recourse

If we assume that courts are never empty at all, we consider just the total earliness and tardiness. It is then possible to write the model as simple recourse (SR): a common structure where deviations from the goal constraints are merely penalized, without modeling actual recourse actions. As all second-stage LPs can be solved in closed form, simple recourse models are much easier to solve than general recourse models. The SR model can serve as an approximation of reality, if the model with emptiness is unable to solve a problem in reasonable time. As the model assumes courts are never empty, matches will start earlier than in practice. Therefore, this model will generally underestimate lateness and overestimate earliness.

First we derive the constraints for the first stage. Some new variables and constraints should be added to the deterministic model from Section 3. To construct a SR model, we introduce a binary indicator matrix denoting the predecessors or ancestors of every match. For easy explanation, we illustrate our model with a small-sized example. At the last day of a multi-day tournament like the DO, the finals are played on a single court. Here, we also assume that a final in every discipline has to be played with one court available. All players and their abilities are known and nobody plays in multiple matches. The expected durations differ among matches, although we schedule them at equidistant time slots as in the previous section. The objective is to minimize a function on the expected deviations from schedule.

Indices

$m, n$  Match

$t$  Time

$i$  Realization

Tournament structure (basis)

$M$  Set of matches

$T$  Set of time slots

$x_{mt}$  Binary variable indicating if match $m$ is played at time $t$ (1 = yes, 0 = no)

$TM$  Scheduled match duration

$\omega_m^i$  Realization $i$ of the duration of match $m$

$\pi_m^i$  Probability that realization $\omega_m^i$ occurs.

Table 2: Match durations and their probabilities for given ability differences

Match        $A_{dp} - A_{dq}$    $\omega_m^i$    $\pi_m^i$        Mean
d = m = 1    2.7550               (7, 11)         (0.95, 0.05)     36 min
d = m = 2    2.2886               (7, 11)         (0.90, 0.10)     37 min
d = m = 3    1.4871               (7, 11)         (0.75, 0.25)     40 min
d = m = 4    1.0631               (7, 11)         (0.65, 0.35)     42 min
d = m = 5    0                    (7, 11)         (0.50, 0.50)     45 min

We assume that $M = \{1, 2, \ldots, 6\}$, where we end with dummy match 6. This dummy match is necessary to measure the early or late end time of the last match. We take five minutes as the time unit and assume that a match takes on average 40 minutes to play. Hence, $TM = 8$ and matches can be played at $T = \{0, 8, \ldots, 40\}$. In this example, we have $\omega_m^i \in \mathbb{N}$ for all $m$ and $i$; note that $\omega_m^i \in \mathbb{R}_+$ can hold as well, while $\tau_m \in \mathbb{N}$ always holds for all $m$. In Table 2, the probabilities $\pi_m^i$ of two or three games, implying a duration of 35 or 55 minutes respectively, are listed for every realization $\omega_m^i$. The ability differences are stylized to result in nice values for these probabilities. Note that the average estimated match duration equals 40 minutes. The actual winning probabilities $W_{dpq}$ are omitted as they do not have any influence on the optimality of the scheme.
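The means in Table 2 and the overall average of 40 minutes can be recomputed directly from the realizations and their probabilities. The short sketch below uses the values of Table 2; the variable names are our own.

```python
UNIT = 5  # minutes per time unit
REALIZATIONS = (7, 11)  # two or three games: 35 or 55 minutes

# Probabilities (two games, three games) for matches 1 to 5, as in Table 2.
probabilities = [(0.95, 0.05), (0.90, 0.10), (0.75, 0.25), (0.65, 0.35), (0.50, 0.50)]

means = [
    UNIT * sum(p * w for p, w in zip(probs, REALIZATIONS))
    for probs in probabilities
]
rounded = [round(mean) for mean in means]
average = sum(means) / len(means)

print(rounded)         # expected durations per match, in minutes
print(round(average))  # average over the five matches, in minutes
```

The expected durations are 36, 37, 40, 42 and 45 minutes, which average to the scheduled 40 minutes, so $TM = 8$ time units is consistent with the stylized probabilities.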

First stage

We start with the first-stage constraints. Every match has to be played once, as in (6):

$$\sum_{t \in T} x_{mt} = 1 \qquad \forall m. \qquad (20)$$

A match is planned at every available time slot. This is a special case of (7):

$$\sum_{m \in M} x_{mt} = 1 \qquad \forall t. \qquad (21)$$

As in (8), the planned start time $\tau$ is directly implied by other variables in the model:

$$\tau_m = \sum_{t \in T} t \cdot x_{mt} \qquad \forall m.$$

The dummy match is played at $t = M \cdot TM$, where $M$ denotes the total number of non-dummy matches. Thus, in this case:

$$\tau_6 = 5 \cdot TM. \qquad (22)$$

In order to obtain simple recourse, we have to model the predecessors for every match. We define:

$\beta_{mn}$  Binary variable indicating if match $n$ is played before match $m$ (1 = yes, 0 = no).


following constraint:

$$\tau_m - \tau_n \le M_\tau \cdot \beta_{mn} \qquad \forall m, n, \qquad (23)$$

where $M_\tau$ is a sufficiently high number.

Second stage

Let the following second-stage concepts be defined:

$\rho_m(\omega)$  Realized start time of match $m$

$y_m^+(\omega)$  Earliness of match $m$

$y_m^-(\omega)$  Tardiness of match $m$

$q^+$  Cost parameter for an early match

$q^-$  Cost parameter for a late match.

As $\omega_m^i \in \mathbb{N}$ for all $m$ and $i$, we have $y_m^+(\omega), y_m^-(\omega) \in \mathbb{N}$ as well for all $m$. Note that $q^+ < 0$ can hold if an early match is (slightly) better than a match starting on time. In any case, $q^+ + q^- \ge 0$ for a simple recourse model.

The realized start time of a match $m$ is calculated as follows:

$$\rho_m(\omega) = \sum_{n \in M} \omega_n \beta_{mn} \qquad \forall m.$$

Using this, the earliness and tardiness of a match are given by

$$\rho_m(\omega) + y_m^+(\omega) - y_m^-(\omega) = \tau_m \qquad \forall m. \qquad (24)$$

This implies that the matrix $T$ depends on $\omega$, while $h$ is deterministic. As always with simple recourse, the penalty function for the $i$-th constraint can be defined as:

$$v_i(z_i) = q_i^+ (z_i)^+ + q_i^- (z_i)^-,$$

where $z_i$ denotes the deviation in the $i$-th constraint of (24).
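For one scenario and one playing order on a single court, the simple recourse costs can be evaluated in closed form, as the sketch below illustrates. It assumes, as in the SR model, that the court is never empty, so the realized start time of a match is the sum of the durations of its predecessors; the function name, the data and the cost values are hypothetical.

```python
def simple_recourse_cost(order, durations, tm, q_plus, q_minus):
    """Cost and deviations of one scenario on a single, never-empty court.

    order     -- match ids in playing order; position k is planned at k * tm
    durations -- {match id: realized duration} for this scenario
    A final dummy match measures the early or late end of the last match.
    """
    cost = 0
    deviations = []
    realized_start = 0  # rho: sum of the durations of all predecessors
    for position, match in enumerate(list(order) + ["dummy"]):
        planned = position * tm
        earliness = max(0, planned - realized_start)  # y+ in (24)
        tardiness = max(0, realized_start - planned)  # y- in (24)
        deviations.append((match, earliness, tardiness))
        cost += q_plus * earliness + q_minus * tardiness
        if match != "dummy":
            realized_start += durations[match]
    return cost, deviations


cost, deviations = simple_recourse_cost(
    order=[1, 2, 3], durations={1: 7, 2: 11, 3: 7}, tm=8, q_plus=0, q_minus=1
)
print(cost, deviations)
```

Averaging this cost over all scenarios, weighted by their probabilities, gives the expected deviation costs of a playing order; comparing orders in this way is a brute-force alternative to solving the SR model and is only practical for small instances.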

4.4 Model with emptiness

In practice, the time a match can begin before its planned start time is often bounded, resulting in an empty court if a match ends too early. In this section, we propose a model that includes the possibility of an empty court and we will refer to it as the emptiness model. First, we define the following:

$q^e$  Cost parameter for an empty court

$e_m(\omega)$  Time that the court is empty between match $m$ and its predecessor

$EA$  Maximum time a match can be started early.

When $q^e$ is included in our model, the simple recourse structure no longer holds. If a match ends before the next match on that court can be started, the court stays empty and $q^e$ penalizes this time. As $q^e > q^+$, large deviations are penalized harder than smaller ones and it


can only serve as an approximation of reality.

First stage

Instead of exploring the MSR as an approximation of reality, we will now construct a model that defines emptiness correctly. We use the same notation as in the SR model and need constraints (20) - (22) as well. Furthermore, we add a dummy match to the model which will be played at time −T M . This dummy match is necessary for defining constraints in the recourse part and for constructing valid inequalities, which will be explained later. Instead of measuring all ancestors per match, we now measure if two matches are played in succession:

$\delta_{mn}$  Binary variable indicating if match $n$ is played directly after match $m$ (1 = yes, 0 = no).

We know that every match has a successor, except for the last match:

$$\sum_{n=1}^{6} \delta_{mn} = 1 \qquad m = 0, \ldots, 5, \qquad (25)$$

and every match has a predecessor, except for the first match:

$$\sum_{m=0}^{5} \delta_{mn} = 1 \qquad n = 1, \ldots, 6. \qquad (26)$$

At least one of these two constraints should be implemented. Including (25) in the model makes (26) a useful valid inequality, and vice versa.

Second stage

To explain the second stage constraints, we first assume that starting a match early is slightly better than starting it on time and that a late match is worse than an empty court. Possible values are:

$$q^+ = -0.01; \quad q^- = 0.5; \quad q^e = 0.2; \quad EA = 1.$$

As mentioned earlier, a match cannot start too early. This only holds for non-dummy matches; the last dummy match starts at the time the last actual match has ended.

$$y_m^+ \le EA \qquad m = 1, \ldots, 5. \qquad (27)$$

We know that dummy match 0 will always start at time $-TM$ and is never early or late:

$$\tau_0 + y_0^+ + y_0^- + e_0 = -TM. \qquad (28)$$

Now we can show how the earliness and tardiness of a match depends on the end time of its predecessor and the past realizations of ω. We know that τn − τm = T M if match n follows


where ωm is the duration of match m. To show this mechanism in practice, we consider an

example. Take dummy match 0 with ω0 = 8 and let match 1 follow at time t = 0. From (28)

and (29), we get

e1= y1+= y − 1 = 0.

Now assume that match 1 takes ω1= 7 time units and match 2 follows afterwards with a planned

start time of τ2= 8. As EA = 1, match 2 can already start at t = 7. We obtain:

8 − 0 − e2− y2++ y −

2 + 0 − 0 = 7.

As a reward $q^+$ is incurred if $y_2^+$ is positive, the variable will be as large as possible in an optimal solution. Thus, $y_2^+ = 1$ and $y_2^- = e_2 = 0$. This mechanism continues until the last dummy match is scheduled. Note that for real-valued $\omega$, the recourse variables can take on non-integer values as well, while the $\tau$ in the equations necessarily stay integer. As said before, the constraints involving $\omega$ only have to hold for two subsequent matches. Similar to (23), we have $\delta_{mn} = 1$ if and only if match $n$ is scheduled directly after match $m$. Since (29) is an equality instead of an inequality, an additional constraint is needed, as explained in Williams (2013). We obtain:

$$T^M - e_n - y_n^+ + y_n^- + y_m^+ - y_m^- - \omega_m \le M_y \cdot (1 - \delta_{mn}) \tag{30}$$

$$T^M - e_n - y_n^+ + y_n^- + y_m^+ - y_m^- - \omega_m \ge m_y \cdot (1 - \delta_{mn}) \tag{31}$$

for all $m, n$, where $M_y$ and $m_y$ are upper and lower bounds on the left-hand side of the constraints.
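The pair (30)-(31) forces the left-hand side to zero when $\delta_{mn} = 1$ and leaves it free within its bounds otherwise. The sketch below illustrates this with one way of deriving $M_y$ and $m_y$ from variable bounds. The limits on tardiness, emptiness and duration (`max_late`, `max_empty`, `omega_min`, `omega_max`) are assumptions made for illustration and are not values from the thesis.

```python
def left_hand_side(t_m, e_n, early_n, late_n, early_m, late_m, omega_m):
    """Left-hand side shared by (30) and (31)."""
    return t_m - e_n - early_n + late_n + early_m - late_m - omega_m


def big_m_bounds(t_m, ea, max_late, max_empty, omega_min, omega_max):
    """Bounds M_y and m_y on the left-hand side, assuming
    0 <= y+ <= ea, 0 <= y- <= max_late, 0 <= e <= max_empty and
    omega_min <= omega <= omega_max (the last three limits are assumed)."""
    upper = t_m + max_late + ea - omega_min
    lower = t_m - max_empty - ea - max_late - omega_max
    return upper, lower


def pair_is_feasible(delta_mn, lhs, upper, lower):
    """(30) and (31): lhs must equal 0 if delta_mn = 1 and only has to
    stay within [lower, upper] if delta_mn = 0."""
    return lower * (1 - delta_mn) <= lhs <= upper * (1 - delta_mn)


M_y, m_y = big_m_bounds(t_m=8, ea=1, max_late=10, max_empty=10,
                        omega_min=5, omega_max=12)

# Worked example from the text: match 2 follows match 1, omega_1 = 7,
# y_2^+ = 1 and all other deviations are zero.
lhs = left_hand_side(8, e_n=0, early_n=1, late_n=0,
                     early_m=0, late_m=0, omega_m=7)
print(M_y, m_y, pair_is_feasible(1, lhs, M_y, m_y))
```

Tighter bounds give a stronger linear relaxation, so in practice $M_y$ and $m_y$ would be chosen as small in absolute value as the data allow.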

For the final constraints in the model, we consider the case that $q^+ \ge 0$, while $q^-$ and $q^e$ remain the same. If a match ends early, it might then be profitable in an optimal solution to leave the court empty and start the next match on time, to prevent that match from ending early as well. Conversely, if a match is started early and will end early, a higher total penalty could be incurred. However, as durations are not known beforehand, matches will in practice be announced when possible. Hence, it has to be modeled that a court should never be unnecessarily empty during the day. Additional constraints are needed to assure that $e_m > 0$ only if $y_m^+ = EA$ for a match $m$. Thus, at most one of the following two conditions can hold at a time for a given $m$: $e_m > 0$ or $y_m^+ < EA$. Reversing these conditions and defining the binary variable $\eta_m$ and a large number $M_q$, we can model this as follows for all $m$:

$$e_m \le M_q \cdot \eta_m \tag{32}$$

$$y_m^+ \ge EA - M_q \cdot (1 - \eta_m). \tag{33}$$
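The second-stage mechanism can be traced with a short simulation. The sketch below assumes that every match starts as soon as the court is free, but never more than $EA$ before its planned start, which is the behaviour enforced by (27), (29), (32) and (33). The function name and the example data are hypothetical, and only the real matches are simulated.

```python
def second_stage(planned_starts, durations, ea=1):
    """Return (earliness, tardiness, emptiness) per match, in playing order.

    The first dummy match is assumed to end at time 0, so the court is
    free from time 0 onwards.
    """
    court_free_at = 0
    deviations = []
    for tau, omega in zip(planned_starts, durations):
        start = max(court_free_at, tau - ea)   # never more than ea early
        early = max(0, tau - start)            # y^+
        late = max(0, start - tau)             # y^-
        empty = start - court_free_at          # e, positive only if early == ea
        deviations.append((early, late, empty))
        court_free_at = start + omega
    return deviations


def total_cost(deviations, q_early=-0.01, q_late=0.5, q_empty=0.2):
    """Second-stage cost for one scenario with the parameter values above."""
    return sum(q_early * early + q_late * late + q_empty * empty
               for early, late, empty in deviations)


# Planned starts every 8 time units; first match lasts 7 as in the text.
example = second_stage([0, 8, 16], [7, 10, 8])
print(example, total_cost(example))
```

In this run the second match starts one time unit early and the third match one time unit late. With a first duration of 5 instead of 7, the second match still starts at $t = 7$ and the court is empty for two time units.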

4.5 Overview

Now we formulate the objective function of the model with emptiness and give an overview of its constraints. As the makespan is already determined, we only consider the deviations in the second stage. In the example, there are $2^5 = 32$ possible scenarios for the match durations. Let $\pi_i^*$ denote the probability of scenario $\omega_i$. The model can then be described as follows, where the binary and integrality constraints have to hold as well:

$$\min \sum_{i=1}^{32} \pi_i^* \cdot \left( \sum_{m=1}^{6} \left[ q^+ y_m^+(\omega_i) + q^- y_m^-(\omega_i) + q^e e_m(\omega_i) \right] \right)$$

s.t.

First-stage constraints

(20) Every match has to be played once.

(21) A match is played in every available time slot.

(22) The second dummy match ends the tournament.

(25) Every match has a successor, except for the last dummy match.

(26) Every match has a predecessor, except for the first dummy match.

Second-stage constraints

(27) A match cannot start too early.

(28) The first dummy match always starts at time $-T^M$.

(30) - (31) The start time of a match is determined by the match durations, earliness, tardiness and emptiness up to then.

(32) - (33) A court may never be unnecessarily empty.

Table 3: Optimal match orders for different cost parameter values

Simple recourse | | Emptiness model ($q^+ = 0$) |
Cost parameters | Match order | Cost parameters | Match order
$-1 \le q^+/q^- < 0.56$ | 1-2-3-4-5 | $0 \le q^e/q^- < 0.17$ | 1-2-3-4-5
$0.56 \le q^+/q^- \le 1$ | 1-2-5-3-4 | $0.17 \le q^e/q^- \le 1$ | 1-2-4-3-5

The obvious downside of this MIR is its size. A lot of constraints and variables are defined for calculating the deviations of only 5! = 120 match orders. However, the example above is constructed to show the mechanism. In the next section, it will be implemented for the Wednesday of the DO, where all other kinds of constraints on rest times and adjacent matches have to hold as well.
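Because the example has only 5! = 120 match orders and 32 scenarios, its expected cost can also be computed by plain enumeration, which is a useful check on a solver result. The sketch below does this under several assumptions: the two possible durations per match in `DURATIONS` are invented for illustration, all scenarios are taken to be equally likely, and $q^+ = 0$ as in the emptiness model of Table 3. It is not the MIR itself, only a brute-force evaluation of the same objective.

```python
from itertools import permutations, product

T_M, EA = 8, 1                              # slot length and maximum earliness
Q_EARLY, Q_LATE, Q_EMPTY = 0.0, 0.5, 0.2    # q+, q-, q^e

# Hypothetical data: two equally likely durations for each real match.
DURATIONS = {1: (6, 8), 2: (7, 9), 3: (6, 10), 4: (5, 11), 5: (7, 10)}


def scenario_cost(order, omega):
    """Cost of one match order under one realisation omega of the durations."""
    free_at, cost = 0, 0.0
    for slot, match in enumerate(order):
        tau = slot * T_M
        start = max(free_at, tau - EA)
        cost += (Q_EARLY * max(0, tau - start)
                 + Q_LATE * max(0, start - tau)
                 + Q_EMPTY * (start - free_at))
        free_at = start + omega[match]
    # Last dummy match: starts when the last real match ends, without emptiness.
    tau_end = len(order) * T_M
    cost += Q_EARLY * max(0, tau_end - free_at) + Q_LATE * max(0, free_at - tau_end)
    return cost


def all_scenarios():
    """All combinations of the two durations per match."""
    matches = sorted(DURATIONS)
    return [dict(zip(matches, combo))
            for combo in product(*(DURATIONS[m] for m in matches))]


def expected_cost(order):
    """Average cost over all scenarios (equal probabilities assumed)."""
    scenarios = all_scenarios()
    return sum(scenario_cost(order, omega) for omega in scenarios) / len(scenarios)


orders = list(permutations(DURATIONS))
best_order = min(orders, key=expected_cost)
print(len(orders), len(all_scenarios()), best_order,
      round(expected_cost(best_order), 3))
```

Enumeration stops being an option quickly, as the number of orders grows factorially and the number of scenarios exponentially in the number of matches.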

The optimal match orders of the example are given in Table 3 for some meaningful parameter choices in both models. For a small ratio of $q^+/q^-$, the simple recourse model can serve as an approximation of the model with emptiness, as the optimal orders are the same and matches are just ordered by increasing estimated duration.

If the entire model is written as a large LP, the special structure of the SR model cannot be utilized well and a mixed-integer solver needs computation time similar to the model with emptiness. Due to the integrality of the $\tau$ variables and the binary constraints, the computation times of both the SR model and the model with emptiness are disappointing, which we illustrate next. We take nonzero values for $q^+$ and $q^-$ in the SR model and let $q^-$ and $q^e$ both be positive in the model with emptiness, keeping $q^+ = 0$. With these parameters, the toy example of this section can be solved with both models in two seconds. For a tournament with six matches and $2^6$ possible scenarios, computation time heavily depends on the values of the cost parameters, especially in the SR model, where it can take 5 to 30 seconds for different ratios of $q^+/q^-$. Solving an instance [...] the model with emptiness and five minutes in the SR model. Adding matches and scenarios results in hours of computation time. For realistic problems with dozens of matches with random durations, the number of scenarios should therefore be very small. A possibility is to consider the EV model: one scenario where each match lasts its mean duration. The optimal solution to this model can then be evaluated by simulation, taking (a sample of) all possible distributions, as shown in the next section.
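The EV approach can be sketched in a few lines: optimise the order for the single mean-duration scenario, then estimate its cost by sampling durations. Everything specific below is an assumption for illustration, namely the duration data, the choice $q^+ = 0$, the sample size, and the use of brute force in place of a solver. The last dummy match is left out to keep the sketch short.

```python
import random
from itertools import permutations

T_M, EA = 8, 1
Q_LATE, Q_EMPTY = 0.5, 0.2    # q- and q^e; q+ is taken as 0 here

# Hypothetical data: possible durations for each real match.
DURATIONS = {1: (6, 8), 2: (7, 9), 3: (6, 10), 4: (5, 11), 5: (7, 10)}


def cost(order, omega):
    """Tardiness and emptiness cost of an order for given durations omega."""
    free_at, total = 0, 0.0
    for slot, match in enumerate(order):
        tau = slot * T_M
        start = max(free_at, tau - EA)
        total += Q_LATE * max(0, start - tau) + Q_EMPTY * (start - free_at)
        free_at = start + omega[match]
    return total


# EV model: a single scenario in which each match lasts its mean duration.
mean_duration = {m: sum(d) / len(d) for m, d in DURATIONS.items()}
ev_order = min(permutations(DURATIONS), key=lambda o: cost(o, mean_duration))


def simulated_cost(order, n_samples=2000, seed=0):
    """Estimate the expected cost of an order by sampling durations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        omega = {m: rng.choice(d) for m, d in DURATIONS.items()}
        total += cost(order, omega)
    return total / n_samples


print(ev_order, round(cost(ev_order, mean_duration), 3),
      round(simulated_cost(ev_order), 3))
```

Comparing the EV objective value with the simulated estimate indicates how much the single-scenario model misjudges the cost of its own solution.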

4.6 Valid inequalities

Some valid inequalities can be constructed for both the simple recourse and the emptiness model. In (30) and (31) of the latter model, the second-stage variables can be ignored. Furthermore, it can decrease computation time to replace $T^M$ by $\tau_n - \tau_m$, resulting in:

$$\tau_n - \tau_m - T^M \le M_\delta \cdot (1 - \delta_{mn})$$

$$\tau_n - \tau_m - T^M \ge m_\delta \cdot (1 - \delta_{mn})$$

where $M_\delta$ is an upper bound on $\tau_n - \tau_m - T^M$ and $m_\delta$ a corresponding lower bound. These variables and constraints on successive matches could also be included in the deterministic model. However, this has to be defined for each court, and corresponding dummy matches should be added to all courts as well. The number of variables and constraints therefore becomes quite high in realistic instances. In our tests, computation time went up when these constraints were included. It is of interest to look at another way to define rest time constraints. This formulation does not result in faster computation times in our deterministic model, but it is convenient for this simple stochastic model. In the deterministic model, the Y parameter values imply the matches with common players. In this simple model, these parameters or z variables in rest time constraints are not considered. The following parameter can be used instead:

$CP_{mn}$  Parameter indicating if a player can take part in both match $m$ and $n$ (1 = yes, 0 = no).

For two consecutive matches, the difference in start times has to be at least $T^R$ and this can only hold if match $n$ is played after match $m$, thus if $\beta_{mn} = 1$. The constraints assuring rest time then become:

$$\tau_m - \tau_n \ge T^R - 1 - M_c \cdot \beta_{mn} \tag{34}$$

$$\tau_n - \tau_m \ge T^R - 1 - M_c \cdot (1 - \beta_{mn}), \tag{35}$$

for all $m, n$ with $CP_{mn} = 1$, where $M_c$ is a sufficiently large number.
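The either-or structure of (34)-(35) can be checked for a given schedule as follows. In the model $\beta_{mn}$ is a decision variable; in this sketch it is simply derived from the order of the start times, and the start times, the pairs with a common player and the value of $M_c$ are hypothetical.

```python
def rest_time_ok(tau, common_player_pairs, t_r, m_c):
    """Check (34) and (35) for every pair (m, n) with CP_mn = 1.

    beta_mn is set to 1 if match n starts after match m, and to 0 otherwise.
    """
    for m, n in common_player_pairs:
        beta = 1 if tau[n] > tau[m] else 0
        holds_34 = tau[m] - tau[n] >= t_r - 1 - m_c * beta
        holds_35 = tau[n] - tau[m] >= t_r - 1 - m_c * (1 - beta)
        if not (holds_34 and holds_35):
            return False
    return True


# Hypothetical start times (in time slots) of three matches.
tau = {1: 0, 2: 1, 3: 3}
print(rest_time_ok(tau, [(1, 3)], t_r=3, m_c=100))
print(rest_time_ok(tau, [(1, 2)], t_r=3, m_c=100))
```

With $\beta_{mn} = 1$, constraint (34) is relaxed by $M_c$ and (35) is binding; with $\beta_{mn} = 0$ the roles are reversed, so exactly one of the two orderings has to respect the rest time.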


Contents

1 Introduction
2 Literature review
3 A deterministic MIP formulation
3.1 Modeling choices
3.2 Problem setting
3.3 Players per match
3.4 Winning probabilities
3.5 Time planning
3.6 Constraints on start times
3.7 Constraints on courts
3.8 Constraints on players
3.9 Overview
3.10 Valid inequalities
4 A stochastic MIR formulation
4.1 Estimating durations
4.2 Notation
4.3 Simple recourse
4.4 Model with emptiness
4.5 Overview
4.6 Valid inequalities
5 Applications and results