Next Generation Sequencing in the UMC Groningen


August 14, 2015

MASTER THESIS

NEXT GENERATION SEQUENCING IN THE UMC GRONINGEN

Imke Gerritsma

Faculty of Electrical Engineering, Mathematics and Computer Science (EEMCS) Stochastic Operations Research

Committee:

Prof.dr. R.J. Boucherie

Dr. T.J. de Koning

J. Dijkhuis

Dr. J.B. Timmer

Dr. B. Manthey


Preface

This report is the result of my graduation project, which I completed as part of the Master’s programme in Applied Mathematics of the University of Twente (UT). The research described in this report was performed at the diagnostics department of the University Medical Center Groningen (UMCG), from September 2014 until August 2015. The main goal of the project is to improve the organisation of DNA diagnostic processes in the diagnostics laboratory.

This project gave me the opportunity to combine two interests of mine: mathematics and healthcare.

It allowed me to show the managers and technicians of the UMCG what can be achieved by applying operations research in their everyday work. I really enjoyed their enthusiasm about modelling and their trust in my results. I’d like to thank Tom de Koning and Jos Dijkhuis for guiding me during my time in the hospital. They were always willing to organise everything I needed for the research.

I would also like to thank Richard Boucherie, for showing me the possibilities of applying mathematics in healthcare, for providing me with this project and for supporting me throughout. Thank you for thinking along with me during the discussions in our sessions, and for always being critical of what I came up with.

My sincere thanks to my family, for letting me stay over when I was working at the hospital. Thank you for all the rides to the bus station, the warm meals ready for me when I came home, and the love and support always.

Imke Gerritsma


Summary

Next Generation Sequencing (NGS) is becoming increasingly important in DNA diagnostic laboratories. It is expected that within a few years’ time nearly all traditional Sanger sequencing will be replaced by NGS. The University Medical Center Groningen (UMCG) has already transitioned most of its testing to NGS. For the use of NGS, work flows need adjustment. In contrast to Sanger sequencing, NGS methods require longer processing times, which stresses the need for efficient planning of all procedures.

This report analyses the NGS processes in the diagnostics laboratory of the UMCG, and aims to find scheduling tactics to improve the work flow of DNA diagnostics.

Currently, the UMCG uses a static schedule to plan the NGS processes. The schedule serves as a guideline for executing the processes. The technicians decide, based on the number of samples waiting to be tested, which processes of the schedule are started.

Three sample types can be identified. The types “NGS” and “SSID” have a target date 42 days after their arrival. By the target date, the results of the DNA tests have to be analysed and a letter with the outcome has to be sent to the physician. Additionally, the sample type “Rush” can be identified. When a physician has a pressing case, the physician can ask for the DNA tests to be rushed. These rush samples have a target date 14 days after their arrival. Rush samples are not processed with NGS yet, because the throughput times are too long.

This research aims to find scheduling tactics to improve the planning of NGS processes. A three-phase approach is designed to aid this goal. First, a weekly schedule is developed with a Mixed Integer Linear Programming (MILP) model. NGS processes can be started and executed according to this schedule. Then, a Markov Decision Problem (MDP) model finds an optimal policy for selecting NGS processes, according to a certain performance criterion. Finally, different policies are compared using a simulation model.

Two policies can be identified as optimal, one for minimising the throughput time of DNA samples, the other for minimising processing costs. However, due to the simplifications made during the modelling process, the current results are not ready for implementation right away. Further research is advised.


Preface

Summary

1 Introduction
1.1 DNA diagnostics and Next Generation Sequencing
1.2 Problem description
1.3 Outline of the thesis

2 Context analysis
2.1 The UMCG and the diagnostics laboratory
2.2 Arrival process
2.3 The simplified NGS process

3 Approach
3.1 Method description
3.2 Model description
3.3 Mathematical model
3.4 Implementation difficulties
3.5 Solution heuristic

4 Phase 1: MILP
4.1 Method description
4.2 Model description
4.3 Model simplifications
4.4 Mathematical model

5 Phase 2: MDP model
5.1 Model simplifications
5.2 Mathematical model

6 Phase 3: Simulation
6.1 Method description
6.2 Model description

7 Numerical experiments
7.1 Numerical experiments with the MILP model
7.2 Numerical experiments with the MDP model
7.3 Numerical experiments with the simulation model

8 Conclusion and recommendations
8.1 Conclusions
8.2 Discussion
8.3 Recommendations


Chapter 1 Introduction

The diagnostics department of the University Medical Center Groningen (UMCG) performs DNA diagnostics with two main methods: Sanger sequencing and Next Generation Sequencing (NGS). Currently, the diagnostics laboratory is transitioning most of its DNA sequencing work to the second method, NGS. NGS offers higher throughput at lower cost. However, the processing time of NGS is significantly longer than that of Sanger sequencing. Requests for DNA testing with NGS are increasing rapidly. Therefore, it becomes more and more important to plan the process efficiently. The aim of the research presented in this master’s thesis is to investigate the process of DNA diagnostics with NGS methods, focussing on the possibility of lowering the throughput time of DNA samples.

1.1 DNA diagnostics and Next Generation Sequencing

Deoxyribonucleic acid (DNA) is a molecule that functions as the most important carrier of genetic information. A single DNA molecule consists of two biopolymer strands, coiled around each other to form a double helix. The strands are composed of simpler units, called nucleotides. These nucleotides are formed by a nucleobase, a monosaccharide sugar called deoxyribose, and a phosphate group. There are four types of nucleobases: cytosine (C), guanine (G), adenine (A) and thymine (T). The order of the nucleotides carries the genetic information. DNA sequencing is the process of determining the precise order of nucleotides within a DNA molecule. There are many methods for DNA sequencing. The newest methods, which have high throughput, are called Next Generation Sequencing (NGS) methods. These methods can sequence thousands or millions of sequences concurrently.

Figure 1.1: The structure of a DNA molecule


1.2 Problem description

The UMC Groningen is moving more of its DNA sequencing to Next Generation Sequencing. In contrast to Sanger sequencing, NGS methods require longer processing times, stressing the need for efficient planning of all procedures. Optimizing planning and production structures can be done effectively with the help of Operations Research. A first step has been made by creating a static schedule for multiple processes. The schedule consists of three processes, running concurrently, with as many samples in the processes as possible. If a process is started with fewer samples, the processing times of certain steps would be shorter than in the created schedule. The processing time of every step in the static schedule thus serves as an upper bound on the actual processing time.

In the last few years, the diagnostics laboratory of the UMCG has decreased the throughput time of regular DNA diagnostics significantly. The UMCG strives to be the fastest operating laboratory in the Netherlands. To achieve the low turnaround time, every incoming DNA sample is tested right away.

This approach is not suitable for NGS methods. It is too expensive to process every patient right away, because of the high costs of an NGS process. Therefore, some DNA samples will have to wait, until enough samples are available to make a batch and process them simultaneously.

The static schedule offers no directions for optimal batching strategies. The influence of choosing small batches with shorter access times, or bigger batches with longer access times, has not been reviewed before.

Moreover, rush samples with a processing deadline of two weeks are not being processed with NGS at all. Without modelling the system, it is unclear how to incorporate the rush samples into the NGS processes. Optimally, a new batching strategy would enable the diagnostics laboratory to process rush samples with the NGS method. This is desirable, since the new method, NGS, is more reliable than the old method and gives a lot more results.

The aim of this research is to improve the scheduling tactics of the diagnostics laboratory. Instead of using the static schedule, this research examines whether a scheduling policy can be found. This policy should describe when to start a new process, and what size this process should be.

Research question

What is the optimal way to organize the NGS processes, weighing the throughput time of DNA samples and the costs for the laboratory, using the currently available resources?

Subquestions

1. What is the current situation of the NGS processes, the organizational characteristics and the throughput time of samples, at the diagnostics laboratory of the UMCG?

2. What is the optimal way to organise the NGS processes, when minimizing the throughput time of the samples?

3. What is the optimal way to organise the NGS processes, when mainly considering the costs of processing and the occupation of the resources?

4. Is it possible to incorporate the testing of Rush samples into the NGS processes?

1.3 Outline of the thesis

The current situation of the NGS processes in the UMCG is described in Chapter 2. The chapter gives an analysis of the arrivals of samples and the flow of an NGS process. Chapter 3 describes the approach of the research. Since a mathematical model cannot be formulated directly, a solution heuristic is developed. The research is done in three phases: developing a schedule, developing scheduling policies, and evaluating and comparing the policies. The phases of the three-phase approach are each described in a chapter. Phase one is treated in Chapter 4. Phase two is treated in Chapter 5. The last phase is treated in Chapter 6. The results of the three-phase approach are given in Chapter 7. A weekly schedule is found, several scheduling policies are obtained and these policies are simulated and compared. Finally, conclusions about optimal scheduling tactics are discussed in Chapter 8. Two optimal policies are identified, one that minimises throughput time and one that minimises processing costs.


Chapter 2 Context analysis

This chapter describes the organizational characteristics of the diagnostics laboratory in the UMCG. A description of the arrival process and the NGS process will be the building blocks for the mathematical models that are developed further on in this report. Parts of this analysis are derived from Gerritsma [2015]. There is not enough data available for a reliable performance analysis of the current system. Therefore, an analysis of the access times and throughput times is omitted.

Section 2.1 describes some background of the UMCG and the organisation of the diagnostics laboratory. Section 2.2 describes how samples of DNA become available for the NGS process. The flow of requests and the arrival rates of samples are treated. Section 2.3 describes the NGS process, step by step, with a clarifying flowchart.

2.1 The UMCG and the diagnostics laboratory

In 1797, Professor Thomassen Theussink was responsible for the establishment of a real teaching hospital in Groningen, the Nosocomium Academicum. This hospital can be seen as the earlier version of the University Medical Center Groningen. Nowadays, the hospital is occupied by over 15,000 patients, employees, students and visitors every day. The UMCG has room for over 1,300 patients and has more than 800,000 consults and visits a year [University Medical Center Groningen, 2014].

DNA diagnostics in the University Medical Center Groningen falls under the Genetics department. The department deals with all aspects of heredity, patient care as well as scientific research. The whole department consists of about 250 employees. The Genetics department is divided into sections. The DNA testing for diagnostic purposes is performed by the section genome diagnostics. This section is split up into four teams: general and technical, molecular genetics, molecular cytogenetics and karyotyping.

In the process of DNA diagnostics with the NGS method, team 1 (general and technical) and team 2 (molecular genetics) are involved. Team 1 performs the preparation work, by isolating the DNA from a blood sample. Team 2 performs the actual procedure for sequencing the DNA. The organization of the work of team 1 will not change due to the transition to the NGS method. The work of team 2, however, changes drastically. Therefore, this research will focus solely on the work of team 2.

2.2 Arrival process

There are two types of arrivals for the Next Generation Sequencing process. In one case, a patient who has not been tested before needs testing. This patient needs to donate a blood sample to the laboratory. In the other case, a test is requested for a patient who has been tested before. Sometimes there is still DNA from this patient available in the laboratory. In this case, the patient does not have to donate blood. When there is no DNA available in the laboratory, the arrival process goes as follows.

The process starts with the visit of a patient to a physician. When the physician decides that a DNA test is necessary, the physician fills in a form and sends it to the secretariat of Genetics. The patient has to get their blood drawn and a tube with the blood goes to the Genetics department. The tube of blood goes to Team 1, the team that isolates DNA from the blood. The form goes from the secretariat, who puts the request in the system, to the technicians of Team 2, who decide which kind of test is necessary for the research. They put the test and the patient number in the “work list”. Usually the isolation of the DNA takes longer than putting the request on the work list. This means that as soon as the DNA is isolated, Team 2 can start testing.

Figure 2.1: Flowchart of arrival process

When a request for a test is submitted for a patient who has been tested before, there can already be isolated DNA in the laboratory. Team 2 can start testing when the request is added to the work list, since they don’t have to wait for the DNA to be isolated. The steps “Draw blood” and “Isolate DNA from blood sample” are skipped in this case.

This means that there are two different ways in which samples become available for testing for the NGS process. In the first case the isolation date of the DNA is the arrival date of the sample. In the second case the entry in the work list is the arrival date. A schematic overview of the pathway of the samples can be found in Figure 2.1.

2.2.1 Arrival rates

The UMCG documents every sample of DNA that arrives for diagnostic testing. The sample is added to a database, which stores information like test type, isolation date, start date, end date and target date.

Figure 2.2 shows an example of this documentation.

Figure 2.2: Excerpt of the documentation of samples

The arriving samples can be divided into three categories. The first category is “NGS”. This category includes samples for different kinds of tests, for example oncology or cardiovascular tests. The samples have to follow the NGS trajectory described in Section 2.3.

The second category is “SSID”. This is the test for mental retardation. The samples follow an adjusted version of the described NGS trajectory. Samples of this category are tested in trios: mother, father and child. Because the SSID samples are tested for a wider range of genes, they need another machine for the sequencing; they are processed on the HiSeq instead of the MiSeq.

The deadline for both the “NGS” samples and the “SSID” samples is 42 days. Before the deadline, a letter with the result of the tests should be delivered to the physician who requested the tests.

The third category is “Rush”; this concerns samples for which the results are required within two weeks. Rush samples are not being sequenced with NGS right now, because it is too hard to meet the two-week deadline. Sometimes, results are requested sooner than 42 days, but not as quickly as for rush samples. For example, deadlines of 21 or 28 days are possible. These samples are incorporated into the set of rush samples.
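The deadlines above amount to a fixed offset from the arrival date. The following is a minimal sketch of that rule, assuming the deadlines are counted in calendar days (the text does not say whether weekends are excluded). The names `DEADLINE_DAYS`, `target_date` and `days_remaining` are hypothetical, and the intermediate deadlines of 21 or 28 days are not modelled separately.

```python
from datetime import date, timedelta

# Deadlines in days after arrival, as stated in the text.
# Assumption: calendar days, not working days.
DEADLINE_DAYS = {"NGS": 42, "SSID": 42, "Rush": 14}


def target_date(arrival: date, sample_type: str) -> date:
    """Date by which the result letter should reach the physician."""
    return arrival + timedelta(days=DEADLINE_DAYS[sample_type])


def days_remaining(today: date, arrival: date, sample_type: str) -> int:
    """Days left until the target date (negative if the deadline has passed)."""
    return (target_date(arrival, sample_type) - today).days
```

A quantity such as `days_remaining` could serve as an urgency measure when deciding which samples to put in the next batch.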

From the sample input days and the sample isolation dates, the arrival times of the samples are derived.

For the analysis, only data of the year 2015 is used. According to the diagnostics laboratory, this data is most representative for the current situation.

The arrivals are tested for certain distributions. Rush samples seem to follow a Poisson distribution, with parameter 0.24. However, there is not enough data available to support this claim with a Chi-squared statistical test. The other types of samples do not seem to fit a Poisson process, therefore an empirical distribution will be used. The arrival distribution for the NGS samples can be found in Table 2.1a and the distribution for SSID samples in Table 2.1b. The total arrival rate, which consists of the arrivals of NGS as well as SSID samples, can be found in Table 2.2. When the samples are being tested, they are processed in batches. The possible batch sizes are multiples of eight. Therefore, it can be helpful to batch the arrivals. The result of batching the samples and counting the arrivals of batches only is displayed in Table 2.3. Batching can be done in several ways. For this research, it is done as follows. Every arriving sample is taken into consideration. If one sample has arrived during the day, this is counted as the arrival of one batch. The same holds if two up to eight samples have arrived during a day. All days on which nine up to 16 samples arrived are counted as the arrival of two batches. There is no documentation of more than 16 samples arriving during one day.
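To illustrate what the Poisson assumption for the Rush samples implies, the sketch below evaluates the Poisson probability mass function. It assumes that the parameter 0.24 is the mean number of Rush arrivals per day, which the text does not state explicitly; `poisson_pmf` and `RUSH_RATE` are hypothetical names.

```python
import math


def poisson_pmf(k: int, rate: float) -> float:
    """P(N = k) for a Poisson distributed number of arrivals N."""
    return math.exp(-rate) * rate ** k / math.factorial(k)


# Assumption: 0.24 is the mean number of Rush samples arriving per day.
RUSH_RATE = 0.24

# Probabilities of 0, 1, 2 and 3 Rush arrivals on one day.
rush_probabilities = [poisson_pmf(k, RUSH_RATE) for k in range(4)]
```

Under this assumption, most of the probability mass lies on days without any Rush arrival.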

Table 2.1: Number of samples arriving per day divided by type.

(a) Number of NGS samples arriving per day.

#samples      0     1     2     3     4     5     6     7     8     9     10    11
#occurrences  8     4     9     10    6     10    9     6     1     5     3     1
Probability   .111  .056  .125  .139  .083  .139  .125  .083  .014  .069  .042  .014

(b) Number of SSID samples arriving per day.

#samples      0     1     2     3     4     5     6     7     8     9     10    11
#occurrences  36    4     4     10    0     1     2     0     0     0     0     0
Probability   .632  .070  .070  .175  .035  0     0     0     0     0     0     0

Table 2.2: Number of samples arriving per day.

#samples  #occurrences  Probability
0         6             0.083
1         5             0.069
2         2             0.028
3         8             0.111
4         7             0.097
5         14            0.194
6         7             0.097
7         8             0.111
8         4             0.056
9         4             0.056
10        3             0.042
11        2             0.028
12        1             0.014
13        0             0
14        1             0.014

Table 2.3: Number of batches arriving per day.

#batches  Probability
0         0.083
1         0.764
2         0.153
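The batching rule described above can be sketched in a few lines. The example below is an illustration under the stated rule, not the procedure actually used in the research; it takes the occurrence counts of Table 2.2 as input and rounds the number of samples per day up to whole batches of eight. The names `batches_for` and `batch_distribution` are hypothetical.

```python
import math
from collections import Counter

BATCH_SIZE = 8

# Table 2.2: number of days on which a given number of samples arrived.
OCCURRENCES = {0: 6, 1: 5, 2: 2, 3: 8, 4: 7, 5: 14, 6: 7, 7: 8,
               8: 4, 9: 4, 10: 3, 11: 2, 12: 1, 13: 0, 14: 1}


def batches_for(samples: int) -> int:
    """Number of batches of eight needed for the samples of one day."""
    return math.ceil(samples / BATCH_SIZE)


def batch_distribution(occurrences: dict) -> dict:
    """Empirical probability of each number of batches arriving per day."""
    total_days = sum(occurrences.values())
    days_per_batch_count = Counter()
    for samples, days in occurrences.items():
        days_per_batch_count[batches_for(samples)] += days
    return {b: d / total_days for b, d in sorted(days_per_batch_count.items())}


distribution = batch_distribution(OCCURRENCES)
```

Rounded to three decimals, `distribution` reproduces the probabilities in Table 2.3.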


2.3 The simplified NGS process

The aim of this research is to find planning tactics for NGS. The main concern is when to start a process, and how to batch the samples. Therefore, the interest does not lie with an exact planning of the steps of the processes. Instead, a more general schedule of the processes is sufficient. To evaluate the flow of samples, it is only necessary to know when the samples arrive and when they are done being processed. Therefore, contrary to Gerritsma [2015], all the steps in the NGS process that don’t have a significant contribution to the processing time of a batch, are left out. These steps are mostly preparation steps that can be done during the waiting time of another step. The flow of samples through the simplified NGS process is described below.

Figure 2.3: NGS process

Arrivals

The arrivals occur as described in Section 2.2. The product that arrives is an isolated DNA sample. The NGS process starts with a batch of those isolated DNA samples.

Quality control

The process starts with the quality control of the samples. The analysts test if the concentration of DNA in every sample is high enough. Extra samples are taken into account for the tests. If a sample turns out to be bad, it can be exchanged for another sample.

Covaris

After testing, the samples go in the Covaris. The first part of the task consists of preparing the machine. When the Covaris is prepared, the samples are treated in it. This machine breaks the DNA into fragments.

Size selection and purification

After the DNA is fragmented, the samples are put on the next machine called the Bravo Automated Liquid Handling Platform, or in short the Bravo. This is a robot that can perform several protocols. In this step, the Bravo makes sure only the right size fragments are preserved and the rest of the fragments are washed away.

Sample preparation

Then, the Bravo runs another protocol, where the samples are prepared for amplification and hybridization. Adapters are ligated on the ends of the fragmented DNA. These adapters are a starting point for the DNA multiplying PCR reaction and a connector for sample specific bar codes.

Prepare for PCR

The Bravo mixes the samples with a mixture, to prepare for the PCR reaction. This mixture contains nucleotides that can bind to the DNA fragments, to copy and multiply them.

PCR

The samples are put in the PCR machine for multiplying the DNA with a polymerase chain reaction.

Purification

The samples are purified on the Bravo machine. The bad fragments and unused nucleotides are washed away.

Verification on tapestation

The samples get tested, to verify that the first round of selecting and amplifying has gone according to plan.

Figure 2.4: NGS process continued

Normalization

In the previous step, the concentration of the samples is measured. In this step the samples are normalized, which means that all concentrations are made to be almost equal.

Concentrate

After the normalization, the samples are concentrated. To this end, the samples are put in a PCR machine for an hour.

Hybridization

The Bravo mixes the hybridization mixture and the samples together.

Incubate

After the hybridization, the samples are incubated. The samples are put in a machine for 20 to 24 hours. In this step, the contents of the hybridization mixture bind with the DNA fragments.

Capturing

At this point, correctly prepared fragments of DNA are in between strands of DNA and buffer material. To extract the DNA fragments, they have to be “captured”. The Bravo runs a protocol to capture the correctly prepared fragments of DNA.

Prepare for PCR

It is important to have a good concentration of DNA in the samples for the remainder of the process. Therefore, the samples get amplified in the PCR machine again. First, the samples are prepared for the PCR machine on the Bravo.

PCR

The samples are put in the PCR machine for multiplication.

Post capture washing

Throughout the process, the DNA has always been in some kind of buffer fluid. At this point, only the strands of DNA will be processed further. Therefore, all buffer fluids and wrong DNA fragments are washed away. This happens during the post capture washing protocol on the Bravo.

Determine concentration

The next step is to put the samples on the MiSeq machine. Since this is a long and expensive step, it has to be checked first if the samples are prepared right. Furthermore, the samples have to be pooled together in a certain proportion in the MiSeq. In this step, these aspects are checked.

Figure 2.5: NGS process continued

MiSeq preparation and MiSeq run

The last step for the samples is to be put in the MiSeq machine. Here, the actual sequencing takes place. The samples will be in the machine for about 23 hours and the result is a lot of data on a computer. Before the samples can be placed in the MiSeq, the machine and the samples have to be prepared. The samples have to be mixed with different fluids and eventually put together. The machine has to be turned on and properly prepared. The samples are placed in the machine and the machine sequences the DNA and produces digital data.

Wash MiSeq

The data is extracted from the machine and put on an external disc. The machine is cleaned.

Nextgene

A thorough analysis is done on the data.

Coverage

The final step is to decide whether all samples have gone through the process correctly.
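For the models later in this report, the simplified process can be thought of as an ordered list of steps. The sketch below records the steps in the order described above, together with the few durations that the text mentions (one hour for concentrating, 20 to 24 hours for incubating, about 23 hours for the MiSeq run). All other durations are left unknown rather than guessed; the names `NGS_STEPS` and `known_duration_bounds` are hypothetical.

```python
# Steps of the simplified NGS process, in the order described in Section 2.3.
# Durations are (min_hours, max_hours) where the text states them and None
# where it does not. Treating the MiSeq run as exactly 23 hours is an
# approximation of "about 23 hours".
NGS_STEPS = [
    ("Quality control", None),
    ("Covaris", None),
    ("Size selection and purification", None),
    ("Sample preparation", None),
    ("Prepare for PCR", None),
    ("PCR", None),
    ("Purification", None),
    ("Verification on tapestation", None),
    ("Normalization", None),
    ("Concentrate", (1, 1)),
    ("Hybridization", None),
    ("Incubate", (20, 24)),
    ("Capturing", None),
    ("Prepare for PCR", None),
    ("PCR", None),
    ("Post capture washing", None),
    ("Determine concentration", None),
    ("MiSeq preparation and MiSeq run", (23, 23)),
    ("Wash MiSeq", None),
    ("Nextgene", None),
    ("Coverage", None),
]


def known_duration_bounds(steps):
    """Sum the (min, max) hours over the steps whose duration is known."""
    known = [duration for _, duration in steps if duration is not None]
    return sum(lo for lo, _ in known), sum(hi for _, hi in known)


lower_hours, upper_hours = known_duration_bounds(NGS_STEPS)
```

The resulting bounds cover only the three steps with a stated duration, so they underestimate the processing time of a full batch.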


Chapter 3 Approach

In this chapter, a Markov Decision Problem model is developed. The MDP model should assist in developing a planning strategy for the NGS processes. Section 3.1 gives the necessary theoretical background information about MDP models. Section 3.2 describes the MDP formulation of the real world problem. Section 3.3 describes this mathematical model. Several problems occur with formulating the MDP model. These difficulties are discussed in Section 3.4. A solution heuristic is proposed in Section 3.5.

3.1 Method description

A Markov Decision Process (MDP) is a mathematical model that consists of five elements: decision epochs, states, actions, transition probabilities, and rewards [Puterman, 2005, p. 17]. The model describes a probabilistic system that can be influenced by an outside decision maker. The decision maker’s goal is to choose a sequence of actions which results in an optimal performance of the system, with respect to a predetermined performance criterion. The system is ongoing. Therefore, decisions must not be made short-sightedly. The choice of action must anticipate the opportunities and costs of possible future states.

Decisions are made at fixed moments in time, called decision epochs. The set of decision epochs is denoted by T , and can be discrete or continuous, finite or infinite. When decisions are only made at discrete time points, the time is divided into periods, or stages. The decision epochs correspond with the start of a period. When the set of decision epochs is finite, the set T is denoted as T = {1, 2, . . . , N }, for some integer N < ∞. If T is infinite, T = {1, 2, . . .}. Elements of T are denoted by t, and often referred to as “time t”.

At each decision epoch, the system is in a certain system state. The set of possible states is denoted by S. An element of the set S is denoted by s.

The actions that are available to the decision maker can depend on the system state. The set of actions at every state is then denoted by $A_s$. A certain action $a$ is only available to the decision maker if at time $t$ the system is in state $s$ and $a \in A_s$.

Every possible action corresponds with a reward, r(s, a). If r(s, a) is positive, it can be seen as profit of a certain action, whereas if r(s, a) is negative, it can be seen as a cost.

When an action is chosen by the decision maker, the system moves on to the next period. Since the model concerns a probabilistic system, the next state of the system isn’t fixed. Instead, there are transition probabilities $p(s' \mid s, a)$, which describe the probability of going to state $s'$, when the prior state was $s$ and the choice of action in that state is $a$.

The collection of objects

$$\{T, S, A_s, p(\cdot \mid s, a), r(s, a)\}$$

is called the Markov decision process or Markov decision problem. The qualifier “Markov” is used because the transition probability and reward functions depend only on the past through the current state and the action selected in that state.
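As an illustration of this collection of objects, the sketch below stores the states, action sets, transition probabilities and rewards of a small MDP in plain Python containers. The class `MDP`, the toy states "empty" and "waiting", and all numbers are hypothetical and chosen only for illustration; they are not taken from the laboratory data. The decision epochs are implicit: one decision per period.

```python
from dataclasses import dataclass


@dataclass
class MDP:
    states: list        # S
    actions: dict       # actions[s] is the list A_s
    transitions: dict   # transitions[(s, a)] maps s' to p(s' | s, a)
    rewards: dict       # rewards[(s, a)] is r(s, a)

    def validate(self, tol: float = 1e-9) -> bool:
        """Check that every (s, a) has a reward and a proper distribution."""
        for s in self.states:
            for a in self.actions[s]:
                dist = self.transitions[(s, a)]
                assert set(dist) <= set(self.states)
                assert abs(sum(dist.values()) - 1.0) < tol
                assert (s, a) in self.rewards
        return True


# Hypothetical toy example: samples are either waiting or not.
toy = MDP(
    states=["empty", "waiting"],
    actions={"empty": ["idle"], "waiting": ["idle", "start"]},
    transitions={
        ("empty", "idle"): {"empty": 0.3, "waiting": 0.7},
        ("waiting", "idle"): {"waiting": 1.0},
        ("waiting", "start"): {"empty": 0.3, "waiting": 0.7},
    },
    rewards={
        ("empty", "idle"): 0.0,
        ("waiting", "idle"): -1.0,   # cost of letting samples wait
        ("waiting", "start"): -3.0,  # cost of starting a process
    },
)
```

Calling `toy.validate()` confirms that the transition probabilities of every state-action pair sum to one.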

As said before, the goal of the decision maker is to find an optimal sequence of actions. A sequence of actions is called a policy. A policy describes which action has to be chosen for every state, at every decision epoch. The policy is called stationary if the decisions are the same at every decision epoch. The policy is called deterministic if in every state one action is chosen with certainty. If several actions can be chosen with a certain probability, the policy is randomized.

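The MDP model can be solved by linear programming. The linear program (LP) that represents the MDP is as follows.

$$
\begin{aligned}
\text{Minimize} \quad & \sum_{s \in S} \sum_{a \in A_s} r(s, a)\, x(s, a), \\
\text{subject to} \quad & \sum_{a \in A_{s'}} x(s', a) - \sum_{s \in S} \sum_{a \in A_s} \lambda\, p(s' \mid s, a)\, x(s, a) = \alpha(s') \quad \text{for } s' \in S, \\
& x(s, a) \geq 0 \quad \text{for } a \in A_s \text{ and } s \in S.
\end{aligned}
\tag{3.1}
$$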
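One way to read the variables $x(s, a)$ in (3.1) is as discounted state-action frequencies of a policy. The sketch below checks numerically, on a hypothetical two-state example, that the frequencies of a fixed stationary deterministic policy satisfy the equality constraints of (3.1). It assumes that $\lambda$ is a discount factor in $[0, 1)$ and that $\alpha$ is a vector of positive weights summing to one, as in the standard formulation; the text itself does not define these symbols. The fixed-point iteration is used here only for illustration and is not an LP solver.

```python
# Hypothetical toy MDP, used only to illustrate the constraints of (3.1).
STATES = ["empty", "waiting"]
ACTIONS = {"empty": ["idle"], "waiting": ["idle", "start"]}
P = {
    ("empty", "idle"): {"empty": 0.3, "waiting": 0.7},
    ("waiting", "idle"): {"waiting": 1.0},
    ("waiting", "start"): {"empty": 0.3, "waiting": 0.7},
}
LAMBDA = 0.9                              # assumed discount factor
ALPHA = {"empty": 0.5, "waiting": 0.5}    # assumed initial weights


def discounted_frequencies(policy, iterations=2000):
    """x(s, a) of a deterministic stationary policy, by fixed-point iteration."""
    mass = dict(ALPHA)  # discounted expected number of visits per state
    for _ in range(iterations):
        mass = {
            s2: ALPHA[s2] + LAMBDA * sum(
                mass[s] * P[(s, policy[s])].get(s2, 0.0) for s in STATES)
            for s2 in STATES
        }
    return {(s, a): (mass[s] if a == policy[s] else 0.0)
            for s in STATES for a in ACTIONS[s]}


def constraint_residuals(x):
    """Left-hand side minus right-hand side of the equalities in (3.1)."""
    residuals = {}
    for s2 in STATES:
        outflow = sum(x[(s2, a)] for a in ACTIONS[s2])
        inflow = sum(LAMBDA * P[(s, a)].get(s2, 0.0) * x[(s, a)]
                     for s in STATES for a in ACTIONS[s])
        residuals[s2] = outflow - inflow - ALPHA[s2]
    return residuals


policy = {"empty": "idle", "waiting": "start"}
x = discounted_frequencies(policy)
residuals = constraint_residuals(x)
```

The residuals vanish up to rounding, and the entries of `x` sum to $1/(1-\lambda)$, which is the total discounted time spent in the system.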

The optimal solution of the LP can be connected to the optimal policy for the MDP in the following way [Puterman, 2005, p. 227].

Theorem 3.1.1. Suppose we have bounded rewards: $|r(s, a)| \leq M < \infty$ for all $a \in A_s$ and $s \in S$. Then:

a. There exists a bounded optimal basic feasible solution $x^*$ to the LP.

b. Suppose $x^*$ is an optimal solution to the dual linear program, then $(d_{x^*})^\infty$ is an optimal policy.

c. Suppose $x^*$ is an optimal basic solution to the dual linear program, then $(d_{x^*})^\infty$ is a deterministic optimal policy.

Here, the policy $(d_{x^*})^\infty$ is defined as:

$$P\{d_x(s) = a\} = \frac{x(s, a)}{\sum_{a' \in A_s} x(s, a')}, \quad \text{for each } s \in S, \quad \sum_{a \in A_s} x(s, a) > 0.$$
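The definition of the policy $d_x$ translates directly into code. The sketch below normalises $x(s, a)$ per state and checks whether the resulting policy is deterministic; the function names and the example values are hypothetical.

```python
def policy_from_x(x, actions):
    """P{d_x(s) = a} = x(s, a) / sum over a' of x(s, a'), per state s."""
    policy = {}
    for s, available in actions.items():
        total = sum(x[(s, a)] for a in available)
        if total > 0:  # the definition only applies to states with positive mass
            policy[s] = {a: x[(s, a)] / total for a in available}
    return policy


def is_deterministic(policy, tol=1e-12):
    """True if every state selects one action with probability one."""
    return all(max(dist.values()) > 1 - tol for dist in policy.values())


ACTIONS = {"empty": ["idle"], "waiting": ["idle", "start"]}

# Hypothetical solution with all mass on one action per state.
x_basic = {("empty", "idle"): 3.0, ("waiting", "idle"): 0.0,
           ("waiting", "start"): 7.0}

# Hypothetical solution that splits the mass in state "waiting".
x_mixed = {("empty", "idle"): 3.0, ("waiting", "idle"): 1.75,
           ("waiting", "start"): 5.25}

deterministic_policy = policy_from_x(x_basic, ACTIONS)
randomized_policy = policy_from_x(x_mixed, ACTIONS)
```

Here `x_basic` yields a deterministic policy, while `x_mixed` yields a randomized policy that starts a process with probability 0.75 in state "waiting".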

The output of the model will be a policy. A stationary deterministic policy is highly preferred, since this type of policy is the easiest to implement. When the state and action spaces are finite, as is the case in this research, it is always possible to find an optimal deterministic policy [Puterman, 2005, p. 90].

3.2 Model description

The MDP model in this research, describes the system of NGS processes in the diagnostics laboratory of the UMCG. The goal of the model is to optimize the tactical planning strategies for the laboratory.

The action space is developed according to this goal. The most important action to consider is starting a process. At the start of each day, the decision can be made to start a process. When the decision to start a process is made, it should also be decided how many samples are put in the process, and of which type these samples are. Besides the decisions concerning the start of a process, there are actions possible when the processes are already running. Processes can be paused and put in the freezer, and be further processed later. It is even possible, under certain conditions, to merge processes.

The moments at which these decisions are made define the decision epochs of the model. But monotonicity of the model has to be taken into account; every period between two decision epochs has to be of equal length. To this end, all decision making is moved to the start of the day. This gives one decision epoch every day, with a period length of one day.

At every decision epoch, the state of the system should describe everything that is necessary to describe the system’s possible actions and transition probabilities. Since the technicians of the laboratory only work on weekdays and not on weekends, the day of the week has to be in the state description. Furthermore, all information about the already running processes has to be put in the state description. This includes how many samples are in the processes, of which type these samples are and at which task the processes are. Finally, the queue also has to be put in the state description. This information should describe the queue size for every sample type.

Most of the transitions are fixed. The probability for going from one day to the next is clearly one, if the next day is the following weekday, and zero otherwise. The transition probabilities for the queue depend on the chosen action and the arrival rates. This only leaves the transitions concerning the states of the processes. These transitions are harder to describe, because they are affected by the planning of the different tasks in the processes.
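The two simplest parts of these transitions can be sketched as follows. The day of the week advances deterministically, and the queue changes by the batches that are started and the batches that arrive. The sketch makes several assumptions that are not in the text: the queue is counted in batches of eight for a single sample type, the batch arrival probabilities of Table 2.3 apply to every day, and the function names are hypothetical. The transitions of the running processes are not covered.

```python
DAYS = ["Monday", "Tuesday", "Wednesday", "Thursday", "Friday",
        "Saturday", "Sunday"]

# Table 2.3: probability of 0, 1 or 2 batches arriving on a day.
BATCH_ARRIVALS = {0: 0.083, 1: 0.764, 2: 0.153}


def next_day(day: str) -> str:
    """The day component moves to the following day with probability one."""
    return DAYS[(DAYS.index(day) + 1) % len(DAYS)]


def queue_transition(queue: int, started: int) -> dict:
    """Distribution of the next queue length (in batches).

    `queue` is the number of waiting batches, `started` the number of
    batches taken out of the queue by the chosen action.
    """
    if not 0 <= started <= queue:
        raise ValueError("cannot start more batches than are waiting")
    return {queue - started + arrivals: probability
            for arrivals, probability in BATCH_ARRIVALS.items()}
```

For example, `queue_transition(3, 2)` gives the distribution over a queue of one, two or three batches on the next day.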

The rewards can be modified according to the desired performance criterion. When the only interest is minimizing the sojourn time of the samples, this can be taken as a cost. When the interest also lies with managing resources, costs for starting a process can be added.

3.3 Mathematical model

The model description of the decision problem, described in Section 3.2, can be transformed into a mathematical model. The mathematical model will be described by means of the five elements of Section 3.1: decision epochs, states, actions, transition probabilities, and rewards.

The decision epochs of the MDP

Decisions can be made once a day, at the start of a day. This gives a discrete set of decision epochs with infinite horizon.

T = {1, 2, . . .}

The states of the MDP

A state of the MDP is denoted by

\[
s = (d, n_p, k_p, c_p, q),
\]

where $d$ denotes the day of the week; $n_p$ denotes the number of samples in every running process $p$; $k_p$ denotes the type of every process: does it contain regular NGS samples, SSID samples or also Rush samples. The index $c_p$ describes the condition of the process: which step of the NGS process it is in, or if it is paused. The last element, $q = (q_{NGS}, q_{SSID}, q_{RUSH})$, denotes the size of the queue, for every sample type.

The options for d are: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday.

The options for $n_p$ are: 8, 16, 24, 32, 48 or 96 for every process.

The options for $k_p$ are: "NGS", "SSID" or "RUSH" for every process.

The options for $c_p$ are: all the steps described in Section 2.3 and paused.

The options for $q$ are: 1, 2, . . . for every sample type.
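To make the state description concrete, the tuple can be written down as a small data structure. The following is only a sketch in Python: the class and field names (`RunningProcess`, `State`, `processes`) are hypothetical and not part of the thesis model, and the condition `c` is kept as free text because the step names of Section 2.3 are not repeated here.

```python
from dataclasses import dataclass
from typing import Tuple

ALLOWED_SIZES = (8, 16, 24, 32, 48, 96)   # options for n_p
PROCESS_TYPES = ("NGS", "SSID", "RUSH")   # options for k_p
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday",
        "Friday", "Saturday", "Sunday")   # options for d


@dataclass(frozen=True)
class RunningProcess:
    n: int   # number of samples n_p
    k: str   # process type k_p
    c: str   # condition c_p: a step name or "paused" (free text here)

    def __post_init__(self):
        if self.n not in ALLOWED_SIZES:
            raise ValueError(f"invalid process size: {self.n}")
        if self.k not in PROCESS_TYPES:
            raise ValueError(f"invalid process type: {self.k}")


@dataclass(frozen=True)
class State:
    d: str                                  # day of the week
    processes: Tuple[RunningProcess, ...]   # one entry per running process p
    q: Tuple[int, int, int]                 # (q_NGS, q_SSID, q_RUSH)

    def __post_init__(self):
        if self.d not in DAYS:
            raise ValueError(f"invalid day: {self.d}")
```

Because both classes are frozen, states are hashable and could serve as dictionary keys, which is convenient when a value function is stored per state.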

Actions in the MDP

At the start of every day, several decisions can be made. First of all, the decision to start a process can be made. The decision should include the size and the type of the process. There are also decisions to be made about the already running processes. They may run for another day, or be paused, or be merged.

An action is denoted by $a = (a_n, a_p)$, and consists of two elements. The first element, $a_n$, describes the action taken with regard to starting a new process. The second element, $a_p$, describes the action taken for every already running process $p$. The elements are defined as described below.

\[
a_n \in \left\{ (x_{NGS}, x_{SSID}, x_{RUSH}) \;\middle|\; x_v \in \mathbb{N},\ \sum_v x_v \in \{8, 16, 24, 32, 48, 96\},\ x_{SSID}\,(x_{NGS} + x_{RUSH}) = 0 \right\},
\]

\[
a_p \in \{\text{Pause}, \text{Resume}, \text{Merge}, \text{Do nothing}\}.
\]

The set of actions $A$ is formed by all possible actions $a$. Depending on the state of the system, some actions are available and others are not. The available actions per state are described by the set $A_s$.
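The conditions on the start element of an action can be checked mechanically. The sketch below, in Python, tests whether a triple of sample counts satisfies the three conditions: non-negative counts, a total equal to one of the allowed process sizes, and no SSID samples combined with NGS or Rush samples. The function name `is_valid_start_action` is hypothetical, and the case in which no process is started is deliberately left outside this check.

```python
ALLOWED_SIZES = {8, 16, 24, 32, 48, 96}


def is_valid_start_action(x_ngs: int, x_ssid: int, x_rush: int) -> bool:
    """Check the conditions on a start action (x_NGS, x_SSID, x_RUSH)."""
    xs = (x_ngs, x_ssid, x_rush)
    # Every count has to be a natural number.
    if any(x < 0 for x in xs):
        return False
    # The total number of samples has to match an allowed process size.
    if sum(xs) not in ALLOWED_SIZES:
        return False
    # SSID samples may not share a process with NGS or Rush samples.
    return x_ssid * (x_ngs + x_rush) == 0
```

For example, `is_valid_start_action(4, 0, 4)` holds (eight samples, no SSID), while `is_valid_start_action(8, 8, 0)` fails because SSID and NGS samples would be mixed.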


Transition probabilities of the MDP

The transition probabilities are

\[
P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a), \tag{3.2}
\]

where

\[
s = (d, n_p, k_p, c_p, q), \tag{3.3}
\]

\[
s' = (d', n'_p, k'_p, c'_p, q'). \tag{3.4}
\]

The transition probabilities can be seen as a combination of several, easier to define, probabilities. Those will be listed below.

The transitions for going from one day to another day are the easiest to define.

\[
P(d, d') = \Pr(d_{t+1} = d' \mid d_t = d) =
\begin{cases}
1 & \text{if } d' = (d + 1) \bmod 7, \\
0 & \text{otherwise.}
\end{cases}
\]

The transitions that happen with regard to the arrival of samples depend on the arrival distribution.

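\[
P_a(q, q') = \Pr(q_{t+1} = q' \mid d_t = d,\ q_t = q,\ a_t = a) =
\begin{cases}
1 & \text{if } (d = 0 \text{ or } d = 6) \text{ and } q' = q, \\
\gamma(q' - q + a_n) & \text{otherwise,}
\end{cases}
\]

where $\gamma(y_{NGS}, y_{SSID}, y_{RUSH})$ is the arrival distribution, that describes the probability of $y_v$ new samples being available for processing at the start of a new weekday for every sample type. The argument $q' - q + a_n$ is the number of samples per type that has to arrive to bring the queue from $q - a_n$, the queue after the start action, to $q'$.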

The transitions concerning the already running processes remain undefined. To know when a process is finished, or in which step of the NGS process it is, a schedule is needed by which the processes are executed. This problem is discussed more elaborately in Section 3.4.

Rewards in the MDP

The rewards of the MDP consist of different elements. A part for processing, queueing and starting processes can be distinguished.

\[
r(s, a) =
\begin{cases}
-P_{\text{processing}} \sum_p n_p - P_{\text{queue}} \sum_v q_v & \text{if } a_n = (0, 0, 0), \\
-P_{\text{processing}} \sum_p n_p - P_{\text{queue}} \sum_v q_v - P_{\text{rush}}\, q_{RUSH} - P_{\text{start}} & \text{otherwise,}
\end{cases}
\]

where

$P_{\text{processing}}$ = price for the time samples are in the process,
$P_{\text{queue}}$ = price per sample for a day in the queue,
$P_{\text{rush}}$ = extra costs for rush samples in the queue,
$P_{\text{start}}$ = price for starting a process.
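The two cases of the reward function translate directly into a small function. The sketch below is written in Python; the argument names (`n_running`, `start`, `p_processing`, and so on) are hypothetical, and the prices used in the example are made up for illustration only.

```python
from typing import Sequence, Tuple


def reward(n_running: Sequence[int],
           q: Tuple[int, int, int],
           start: Tuple[int, int, int],
           p_processing: float,
           p_queue: float,
           p_rush: float,
           p_start: float) -> float:
    """Reward r(s, a) for one decision epoch.

    n_running: number of samples n_p in every running process p
    q:         queue sizes (q_NGS, q_SSID, q_RUSH)
    start:     the start element a_n of the action
    """
    # Costs that are always incurred: processing and queueing.
    base = -p_processing * sum(n_running) - p_queue * sum(q)
    if start == (0, 0, 0):
        return base
    # Second case of the formula: rush costs and the start price are added.
    q_rush = q[2]
    return base - p_rush * q_rush - p_start


# Illustrative numbers only: two running processes, six samples waiting.
example = reward([8, 16], (2, 1, 3), (8, 0, 0),
                 p_processing=1.0, p_queue=2.0, p_rush=5.0, p_start=10.0)
```

Setting `p_processing` and `p_start` to zero leaves only the queueing costs, which corresponds to the case in which minimizing the sojourn time is the only interest.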

3.4 Implementation difficulties

The model description of Section 3.2 is impossible to transform to a mathematical model without simplifications. Some simplifications turn out to be essential for formulating the problem as an MDP. Additional simplifications are necessary, to reduce the model size and make implementation and solving possible.

A big difficulty with the development of the Markov decision problem turns out to be modelling the transition probabilities. When multiple processes run concurrently, the running time of these processes is dependent on the chosen schedule. For example, the work on one of the processes can be postponed to decrease the running time of another process. The scheduling decisions have to be incorporated in the MDP model, or an optimal schedule has to be determined after every decision epoch. Both options will make the result of the MDP practically useless, since the diagnostics laboratory does not want to change their schedule after every decision. Moreover, incorporating all these options into the MDP would lead to a model that is very difficult to formulate, and too big to implement. The problems concerning the running of multiple processes are eliminated by creating a schedule for the processes. Actions to start a process can only be taken if they are in agreement with the created schedule. The processes are executed according to the schedule, providing a guideline on when processes are finished.

Besides the problems with the transition probabilities, other complications can occur. The state and action space of the MDP model are very large. Simplifications have to be made to reduce the problem. Otherwise, solving the model with a computer will be impossible.

3.5 Solution heuristic

A solution heuristic is developed to eliminate the problems with formulating the MDP model. The heuristic will form a three phase approach. In phase 1, a static schedule is developed. Processes can be started and executed according to this schedule. This makes it possible to define the transition probabilities. In phase 2, the simplified MDP is formulated, with scheduling decisions and transition probabilities according to the schedule of phase 1. Finally, phase 3 is added, to evaluate the results of the MDP.

3.5.1 Phase 1: MILP

Phase 1 of the three phase approach is developed to eliminate the scheduling difficulty. In phase 1, a cyclic schedule is created with the use of an MILP. The schedule will contain as many processes as possible, starting on different days of the week. The MDP model can choose which processes to start, but the execution of the processes will be according to the schedule. A detailed discussion of phase 1 can be found in Chapter 4.

3.5.2 Phase 2: MDP

In phase 2, an MDP model finds an optimal planning policy for a given instance of the problem. An instance describes the arrival rates of samples and the costs influencing the decision making. The available actions and transition probabilities are defined according to the schedule developed in phase 1. Modelling real world problems usually results in very large mathematical models. To make sure the mathematical model can be implemented and solved with a computer, several simplifications have to be made. These simplifications affect how accurately the model represents the real world. Due to the simplifications, information is lost about how the found policy will behave in the real world. To evaluate the behaviour of the policies in a model that looks more like the real world problem, a simulation model is developed. The MDP model is more elaborately discussed in Chapter 5.

3.5.3 Phase 3: Simulation

Finding an optimal solution for a system requires evaluating a lot of different realisations of that system. A simulation model only looks at one realisation of the system. Therefore, a simulation model can be more detailed than an optimization model, while staying small enough to implement. This fact is exploited in phase 3. A simulation model is developed to represent the real world problem, with more detail than the MDP model does. The behaviour of the policy in the real world can be evaluated more accurately with the simulation model.

Different instances in the MDP model result in different policies. Most of the time, policies found by an MDP model are not easy to implement in the real world. The simulation model provides a tool to compare the policies of the MDP model with simplified policies. The simplified policies should be easier to use by the laboratory, but the performance of these policies is possibly worse. The simulation model is described in Chapter 6.


Chapter 4 Phase 1: MILP

This chapter describes phase 1 of the three phase approach. It describes the Mixed Integer Linear Program, used for the development of the cyclic NGS schedule. First, Section 4.1 gives the necessary theoretical background information about MILP models. A general description of the model follows in Section 4.2. Section 4.3 describes simplifications of the general model description, that make sure the size of the problem is manageable. Finally, Section 4.4 describes the mathematical formulation of the problem.

4.1 Method description

A linear programming problem (LP) is an optimization problem with the following aspects. [Winston, 2004, p. 53]

• The aim is to maximize or minimize a linear function of the decision variables. This function is called the objective function.

• The values of the decision variables must satisfy a set of constraints. Each constraint must be a linear equation or linear inequality.

• A sign restriction is associated with each variable.

Mathematically, this has the following form:

\[
\begin{aligned}
\text{minimize} \quad & c^T x \\
\text{subject to} \quad & a_i x = b_i && \text{for } i \in I, \\
& a_i x \ge b_i && \text{for } i \in [n] \setminus I, \\
& x_j \ge 0 && \forall j \in J.
\end{aligned}
\tag{4.1}
\]

Here, $I \subseteq [n]$ and $J \subseteq [d]$, where $n$ is the number of constraints and $d$ the number of variables. The vectors $a_1, \ldots, a_n$, $c$ and the numbers $b_1, \ldots, b_n$ are given, while $x_1, \ldots, x_d$ are variables. If $j \notin J$, then the variable $x_j$ is called unrestricted.

In a mixed integer linear program (MILP), some of the variables are required to be integers, while the other variables can assume any real value.

An $x \in \mathbb{R}^d$ is called a feasible solution of (4.1) if it fulfils all constraints. It is called an optimal solution if it minimizes $c^T x$ among all feasible solutions.
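The notions of feasible and optimal solution can be illustrated on a toy instance of (4.1) in which both variables are required to be integer. The sketch below, in Python, simply enumerates all integer points in a small box; the data `c`, `A`, `b` and the bound `upper` are made up for this illustration. Enumeration is only workable for such tiny examples and is not the solution method used in this research.

```python
from itertools import product

# Toy instance: minimize 3*x1 + 2*x2
# subject to x1 + x2 >= 4, x1 + 3*x2 >= 6, x1, x2 >= 0 and integer.
c = (3, 2)
A = ((1, 1), (1, 3))   # rows a_i
b = (4, 6)


def is_feasible(x):
    """A point is feasible if it fulfils every constraint a_i x >= b_i."""
    return all(
        sum(a_ij * x_j for a_ij, x_j in zip(a_i, x)) >= b_i
        for a_i, b_i in zip(A, b)
    )


def solve_by_enumeration(upper=10):
    """Return the feasible integer point in [0, upper]^d with minimal c^T x."""
    best_x, best_val = None, None
    for x in product(range(upper + 1), repeat=len(c)):
        if not is_feasible(x):
            continue
        val = sum(c_j * x_j for c_j, x_j in zip(c, x))
        if best_val is None or val < best_val:
            best_x, best_val = x, val
    return best_x, best_val


print(solve_by_enumeration())   # (0, 4) with objective value 8
```

The point (3, 1) is feasible as well, but its objective value 11 is larger, so it is not optimal.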

4.2 Model description

The output of the MILP model, designed for this research, will be a planning. This planning should state the starting time of every task, for multiple processes. Therefore, the variables of the MILP are the start time of every job of every process. The goal of the MILP model is to optimize the planning of all processes with respect to the objective function. The objective function will contain the total processing time of all processes, but also the processing time of the individual processes.

The planning has to comply with certain restrictions. One of the most obvious restrictions concerns the ordering of the tasks. The tasks are the same as described in Section 2.3 and the ordering must be the same as the flow of the samples described there. The working hours of the employees restrict the planning, since active work on the tasks can only be done during office hours. The planning is further restricted by the availability of the machines. Since only one task of one process can be on the same machine at the same time, efficient sharing of the machines is a great concern when making a planning. Furthermore, the work on the samples imposes some restrictions. For certain treatments, the technicians use fluids that need defrost time. The tasks involving this treatment can therefore not happen at the start of the day, since the fluids need some time to defrost first. Also, certain tasks have to be followed by the next task on the same day, since otherwise the samples will deteriorate. These restrictions will form the constraints of the MILP, together with some extra constraints that are needed for the proper mathematical formulation of the problem.

4.3 Model simplifications

Solving a real world problem with a mathematical model almost certainly requires simplifying the problem. The art of modelling lies in choosing the simplifications in such a way that implementing and using the mathematical model becomes easier, while a good representation of the reality is maintained. After all, the results obtained with the model have to stay useful.

The first simplification, necessary for implementing the planning model, concerns time. As described in Section 4.2, the MILP contains variables that define the start time of the tasks. If the problem is handled with continuous time, the possibilities for the starting time would be infinite and the problem cannot be implemented in a computer. Therefore, time periods are introduced. Tasks of the NGS process can only start at the beginning of such a period, and the duration of the tasks must be a whole number of periods. Here, the size of the time periods is chosen to be 20 minutes, because the processing time of most tasks is close to a multiple of 20 minutes.

Another necessary simplification is to divide the work of the NGS process into tasks. Not every act of a technician for the NGS process can be scheduled. Instead, several acts are grouped and form a task. Only the generalized tasks will be put in the schedule. The acts are grouped together in such a way that only one bottleneck machine is used in every task. Furthermore, some acts are left out. There are a lot of steps in the NGS process that serve as preparation for other steps. Most of these preparation steps are performed during the processing of another task, when there is some waiting time. Since the interest for the planning in this research lies only in a general planning and the throughput time of the samples, these preparation steps are omitted.

The scheduling of the tasks is simplified further, by disregarding several practical implications such as break times, meetings and holidays. The scheduling model assumes that, every weekday from 8.00 a.m. to 4.30 p.m., every period is available for working on the NGS process.
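The time discretisation and the working hours can be sketched in a few lines of Python. The sketch rests on assumptions that are not stated in the text: periods are counted from Monday 00:00 starting at index 0, task durations are rounded up to whole periods, and a period only counts as working time if it lies entirely between 8.00 a.m. and 4.30 p.m. The function names are hypothetical.

```python
PERIOD_MINUTES = 20
PERIODS_PER_DAY = 24 * 60 // PERIOD_MINUTES   # 72 periods in a day
WORK_START_MIN = 8 * 60                       # 8.00 a.m.
WORK_END_MIN = 16 * 60 + 30                   # 4.30 p.m.


def duration_in_periods(minutes: int) -> int:
    """Round a task duration up to a whole number of periods (assumption)."""
    return -(-minutes // PERIOD_MINUTES)


def is_outside_working_hours(t: int, workdays=(0, 1, 2, 3, 4)) -> bool:
    """Indicator for period t, with t = 0 the period starting Monday 00:00."""
    day, slot = divmod(t, PERIODS_PER_DAY)
    if day % 7 not in workdays:
        return True                           # weekend
    start = slot * PERIOD_MINUTES             # minutes since midnight
    end = start + PERIOD_MINUTES
    # Assumption: the whole period has to fit inside the working hours.
    return start < WORK_START_MIN or end > WORK_END_MIN
```

Under these assumptions a task of 50 minutes occupies three periods, and the period from 4.20 p.m. to 4.40 p.m. is treated as outside working hours because it only partly overlaps with them.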

There are also simplifications concerning the resources of the NGS process. For the execution of the NGS processes, several machines are used and employees are needed. It takes too much time to gather all data necessary for incorporating these aspects in the planning model. Moreover, the planning model itself will become too big and solving it will be very time consuming. Therefore, it is assumed that the machines and necessary employees are available at all times.

The last simplification is introducing fixed starting days for the NGS processes. By fixing the starting day of every process, a truly optimal schedule will not be found. However, for the purpose of this research, the interest lies more with finding a cyclic schedule. Fixing the starting days can help to find such a schedule. Moreover, it significantly improves the running time of the model.

4.4 Mathematical model

The planning model used for this research is a simplified model from [Gerritsma, 2015]. The model is adapted to incorporate the simplifications of Section 4.3. This results in smaller sets and fewer variables and constraints, allowing bigger instances to be solved.

Sets

Set           Elements                                            Description                                 Corresponding index
$P$           $p_1, \ldots, p_k, \ldots, p_{|P|}$                 Processes                                   $p, q$
$J$           $j_1, \ldots, j_k, \ldots, j_{|J|}$                 All jobs                                    $i, j$
$J_p$         $j_{p,1}, \ldots, j_{p,k}, \ldots, j_{p,|J_p|}$     Jobs in process $p$ (subset of all jobs)    $i, j$
$T$           $t_1, \ldots, t_k, \ldots, t_{|T|}$                 Periods                                     $t, u$
$D$           $d_1, \ldots, d_k, \ldots, d_{|D|}$                 Days                                        $d$
$T^{day}_d$                                                       Periods per day (subset of periods)         $t$
$M$           $m_1, \ldots, m_k, \ldots, m_{|M|}$                 Bottleneck machines                         $m$

Parameters

Description                                                                                          Symbol
Processing time of a job (in periods)                                                                $PT_{p,j}$
Active time within processing time of a job                                                          $PTA_{p,j}$
Waiting time within processing time of a job                                                         $PTW_{p,j}$
Indicator if job $j$ has some active processing time                                                 $A_{p,j}$
Indicator if job $j$ has some waiting time                                                           $W_{p,j}$
Indicator if job $i$ has to start before job $j$                                                     $O_{p,i,j}$
Indicator if job $i$ and job $j$ have to take place on the same day                                  $SD_{p,i,j}$
Indicator if samples are occupied in a job                                                           $SB_{p,j}$
Indicator if the ending of a job requires activity (and therefore has to be inside working hours)    $E_{p,j}$
Indicator if job $j$ is processed on one of the bottleneck machines                                  $B_{p,j,m}$
Indicator if job $j$ needs to defrost some fluids before it can start                                $F_{p,j}$
Indicator if period represents a time outside working hours                                          $IP_t$
Indicator if period represents a time during the first half hour of the day                          $FP_t$
Indicator if a certain day is a workday                                                              $D_d$
Number of periods that span a workday                                                                $N$
Weight parameter that eliminates multiple optimal solutions                                          $Y_t$
Weight parameter that indicates the importance of the processing time of an individual process       $Z_t$
Lower bound for the starting time of a process                                                       $L_p$
Upper bound for the starting time of a process                                                       $U_p$

Variables

Indicator if job $j$ of process $p$ starts in period $t$: $s_{p,j,t} \in \{0, 1\}$

Starting time of job $j$ in process $p$: $st_{p,j} \in \mathbb{N}$

Constraints

1. Every job belonging to a certain process starts exactly once.

\[
\sum_t s_{p,j,t} = 1 \quad \forall (p, j) \text{ s.t. } j \in J_p
\]

2. Define the starting time of job $j$ in process $p$.

\[
st_{p,j} = \sum_t t \cdot s_{p,j,t} \quad \forall (p, j) \text{ s.t. } j \in J_p
\]

3. The start of job $j$ has to be after the active time of job $i$, if job $j$ comes after job $i$, and if the samples are needed in job $j$, then the waiting time of job $i$ also has to be over.

\[
st_{p,j} \ge st_{p,i} + PTA_{p,i} + PTW_{p,i} \cdot SB_{p,j} + (1 - A_{p,i}) \cdot (1 - SB_{p,j}) \quad \forall (i, j) \text{ s.t. } O_{p,i,j} = 1
\]

4. All the active work has to take place during the day.

\[
\left( \sum_{k=0}^{PTA_{p,j} + PTW_{p,j} \cdot E_{p,j} - (A_{p,j} + W_{p,j} \cdot E_{p,j} - A_{p,j} \cdot W_{p,j} \cdot E_{p,j})} IP_{t+k} \right) \cdot s_{p,j,t} = 0 \quad \forall (p, j, t) \text{ s.t. } j \in J_p
\]


5. Designated jobs have to take place on the same day (as described in $SD_{p,i,j}$).

\[
st_{p,j} - st_{p,i} \le N \quad \forall (i, j) \text{ s.t. } SD_{p,i,j} = 1
\]

\[
st_{p,j} - st_{p,i} \ge -N \quad \forall (i, j) \text{ s.t. } SD_{p,i,j} = 1
\]

6. A job that needs fluids to defrost cannot start in the first half hour of a day.

\[
s_{p,j,t} + FP_t \le 1 \quad \forall (p, j, t) \text{ s.t. } j \in J_p,\ F_{p,j} = 1
\]

7. There cannot be two jobs at the same time on a bottleneck machine, from different processes.

\[
s_{p,i,t} + s_{q,j,u} \le 1 \quad \forall (p, q, i, j, t, u, m) \text{ s.t. } B_{p,i,m} = B_{q,j,m} = 1,\ p \ne q,\ t \le u \le t + PT_{p,i} - 1
\]

8. Define the order of the processes.

\[
st_{p_k,\, j_{p_k,1}} \le st_{p_{k+1},\, j_{p_{k+1},1}} - 1 \quad \forall k \text{ s.t. } 1 \le k < |P|
\]

\[
st_{p_k,\, j_{p_k,|J_{p_k}|}} \le st_{p_{k+1},\, j_{p_{k+1},|J_{p_{k+1}}|}} - 1 \quad \forall k \text{ s.t. } 1 \le k < |P|
\]

9. Fix the starting day of the processes.

\[
st_{p,\, j_{p,1}} \ge L_p \quad \forall p
\]

\[
st_{p,\, j_{p,1}} \le U_p \quad \forall p
\]

Objective

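\[
\begin{aligned}
\text{Minimize} \quad & st_{p_{|P|},\, j_{p_{|P|},|J_{p_{|P|}}|}} + PT_{p_{|P|},\, j_{p_{|P|},|J_{p_{|P|}}|}} - st_{p_1,\, j_{p_1,1}} - 1 \\
& + \sum_{p,t} X_t \cdot b_{p,t} + \sum_{p,j,t} Y_t \cdot s_{p,j,t} + Z \cdot \sum_p \left( st_{p,\, j_{p,|J_p|}} + PT_{p,\, j_{p,|J_p|}} - st_{p,\, j_{p,1}} - 1 \right)
\end{aligned}
\tag{4.2}
\]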
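Two of the constraints above, the precedence constraint (3) and the bottleneck machine constraint (7), can be checked for a given set of start times with a short script. The sketch below is written in Python and is meant as a feasibility check for a candidate schedule, not as a solver. The names `Job`, `precedence_ok` and `machine_conflicts` are hypothetical, and the sketch assumes that the processing time of a job equals its active time plus its waiting time.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

JobKey = Tuple[str, str]   # (process, job)


@dataclass(frozen=True)
class Job:
    pta: int                       # active time PTA, in periods
    ptw: int                       # waiting time PTW, in periods
    sb: bool                       # SB: samples are occupied in this job
    machine: Optional[str] = None  # bottleneck machine used, if any

    @property
    def pt(self) -> int:
        # Assumption: processing time = active time + waiting time.
        return self.pta + self.ptw


def precedence_ok(st_i: int, st_j: int, job_i: Job, job_j: Job) -> bool:
    """Constraint 3 for a pair (i, j) in which job i has to precede job j."""
    a_i = 1 if job_i.pta > 0 else 0
    sb_j = 1 if job_j.sb else 0
    return st_j >= st_i + job_i.pta + job_i.ptw * sb_j + (1 - a_i) * (1 - sb_j)


def machine_conflicts(jobs: Dict[JobKey, Job],
                      st: Dict[JobKey, int]) -> List[Tuple[JobKey, JobKey]]:
    """Constraint 7: pairs of jobs from different processes that overlap."""
    conflicts = []
    for (p, i), job_i in jobs.items():
        for (q, j), job_j in jobs.items():
            if p == q or job_i.machine is None:
                continue
            if job_i.machine != job_j.machine:
                continue
            t, u = st[(p, i)], st[(q, j)]
            # Job (q, j) may not start while job (p, i) occupies the machine.
            if t <= u <= t + job_i.pt - 1:
                conflicts.append(((p, i), (q, j)))
    return conflicts
```

An empty list returned by `machine_conflicts` means that no two processes claim the same bottleneck machine in the same period.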


Chapter 5 Phase 2: MDP model

This chapter describes phase 2 of the three phase approach. In this phase, a Markov Decision Problem model is developed. This model should assist in developing a planning strategy for the NGS processes. Theoretical background information about MDP models can be found in Section 3.1. The model description without simplifications can be found in Section 3.2. Section 5.1 gives the necessary simplifications to make formulating the problem as an MDP model possible. Section 5.2 describes the mathematical model of the simplified MDP.

5.1 Model simplifications

As discussed in Section 3.4, it is not possible to formulate the problem as an MDP model without the use of simplifications.

The first encountered problem is the formulation of the transition probabilities. In the model, it is possible for multiple NGS processes to run concurrently. However, no decisions are made on how to plan these processes. Running multiple processes gives a lot of options for planning these processes. For example, one process can be paused, to speed up the running time of another process. These options cannot be incorporated into the MDP model, without making it too big to implement. The scheduling problems are eliminated by the use of a schedule. The MDP can only start processes according to this schedule. Moreover, the processes are executed according to this schedule, providing a guideline for formulating the transition probabilities.

Using the schedule to model the problem as an MDP also affects other parts of the modelling. Without the schedule, all running processes would have to be registered in the states of the model; now only the processes that affect the action space have to be documented. The created schedule is a weekly schedule, with two main processes and two extra processes. These extra processes may only be started under certain conditions. Therefore, the state space should include a documentation of these processes. For more detail about the schedule, see Section 7.1.

Needing the schedule to formulate the problem as an MDP restricts the action set more than just the starting days. In the real world, it is possible to interrupt or pause processes, batch processes together or have failures during the process. All these options are omitted in the model, since they will disrupt the schedule.

More simplifications are necessary, to keep the model size tractable. The action space is reduced further, by introducing an upper bound on the possible batch sizes. Instead of allowing the processes to start with 8, 16, 24, 32, 48 or 96 samples, only batch sizes of 8, 16 or 24 are allowed. Furthermore, sample types are ignored and arrivals are modified. Instead of letting every individual sample arrive, only batches of samples will arrive. As a consequence, the queue also only has to document batches of samples. Upper bounds are also introduced for the queue. Putting upper bounds on the actions and queue calls for further modification of the model. When the queue is near its upper bound, no more batches of samples can arrive than the difference between the upper bound and the queue size. The arrival rates are modified to accommodate this. Also, the decision to start a process with 3 batches is changed, to represent starting a process with all batches in the queue. This makes it harder for the queue to become full.


5.2 Mathematical model

The model description of the decision problem, described in Section 3.2, together with the simplifications of Section 5.1, can be transformed into a mathematical model. The mathematical model will again be described by means of the five elements of Section 3.1: decision epochs, states, actions, transition probabilities, and rewards. Some elements are directly copied from the mathematical model formulation of Section 3.3. Other elements have to be modified, to incorporate the simplifications.

The decision epochs of the MDP

Decisions can be made once a day, at the start of a day. This gives a discrete set of decision epochs with infinite horizon.

T = {1, 2, . . .}

The states of the MDP

A state of the MDP is denoted by

s = (d, k, q) ,

where $d$ denotes the day of the week; $k$ describes the type of running process that interferes with starting other processes; $q$ denotes the number of sample batches in the queue.

The options for d are: Monday, Tuesday, Wednesday, Thursday, Friday, Saturday, Sunday. Denoted as {1, 2, 3, 4, 5, 6, 0}.

The options for k are: A process that has started on Thursday, a process that has started on Friday, or none. Denoted as {4, 5, 0}

The options for q are: 1, 2, ..., Q, where Q serves as an upper bound to keep the state space small.

Actions in the MDP

An action states how many batches of samples will be started on a process. Zero batches means the process is not started. A batch contains eight samples, so the action 1 means to start a process with eight samples. Since processes can only start on certain days, and the processes started on Thursday or Friday can at most contain two batches, the set $A_s$ differs from the set $A$.

\[
A = \{0, 1, 2, 3\},
\]

\[
A_s =
\begin{cases}
\{0\} & \text{if } d = 2 \text{ or } d = 6 \text{ or } d = 0, \\
\{a \mid a < q\} & \text{if } q < 3 \text{ and } (d = 1 \text{ or } d = 3), \\
\{3\} & \text{if } q \ge 3 \text{ and } (d = 1 \text{ or } d = 3), \\
\{a \mid a < q \text{ and } a \le 2\} & \text{if } (d = 4 \text{ and } k \ne 5) \text{ or } (d = 5 \text{ and } k = 0).
\end{cases}
\]
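The case distinction for the available actions can be sketched as a function of the state. The Python sketch below follows the four cases listed above; the function name `available_actions` is hypothetical. The states that are not covered by any of the four cases (Thursday with a Friday process running, Friday with any tracked process running) are assumed here to allow only the action 0, which is an assumption of this sketch.

```python
def available_actions(d: int, k: int, q: int) -> set:
    """Available actions in state (d, k, q).

    d: day of the week, 1 = Monday, ..., 6 = Saturday, 0 = Sunday
    k: tracked process, 4 (started Thursday), 5 (started Friday) or 0 (none)
    q: number of batches in the queue
    """
    all_actions = range(4)                    # A = {0, 1, 2, 3}
    if d in (1, 3):                           # main processes: Monday, Wednesday
        if q >= 3:
            return {3}
        return {a for a in all_actions if a < q}
    if (d == 4 and k != 5) or (d == 5 and k == 0):
        # Extra processes: at most two batches.
        return {a for a in all_actions if a < q and a <= 2}
    # Tuesday and weekend, plus (by assumption) all remaining states.
    return {0}
```

For example, `available_actions(4, 0, 5)` gives `{0, 1, 2}`, while `available_actions(4, 5, 5)` gives `{0}` because a process started on Friday blocks the Thursday start.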

Transition probabilities of the MDP

The transition probabilities are

\[
P_a(s, s') = \Pr(s_{t+1} = s' \mid s_t = s,\ a_t = a),
\]

where

\[
s = (d, k, q), \qquad s' = (d', k', q').
\]

The transition probabilities can be seen as a combination of several, easier to define, probabilities. Those will be listed below.

The transitions for going from one day to another day are the easiest to define.

\[
P(d, d') = \Pr(d_{t+1} = d' \mid d_t = d) =
\begin{cases}
1 & \text{if } d' = (d + 1) \bmod 7, \\
0 & \text{otherwise.}
\end{cases}
\]


The transitions that happen when starting a process are a little more involved. The type of process that is running, is only tracked for the processes that start on Thursday or Friday, so a state change for the types will only happen if a positive action is chosen on those days.

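\[
P_{a>0}(k, k') = \Pr(k_{t+1} = k' \mid d_t = d,\ a_t = a) =
\begin{cases}
1 & \text{if } a > 0 \text{ and } k' = d \text{ and } (d = 4 \text{ or } d = 5), \\
1 & \text{if } a > 0 \text{ and } k' = k \text{ and } d \ne 4 \text{ and } d \ne 5, \\
0 & \text{otherwise.}
\end{cases}
\]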

The transitions that happen when the decision is not to start a new process, work in a similar way as the transitions for starting a process. Since only the processes that start on Thursday or Friday are tracked, only transitions describing the end of those processes have to be defined.

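\[
P_{a=0}(k, k') = \Pr(k_{t+1} = k' \mid d_t = d,\ a_t = a) =
\begin{cases}
1 & \text{if } a = 0 \text{ and } d = 4 \text{ and } ((k = 4 \text{ and } k' = 0) \text{ or } (k \ne 4 \text{ and } k' = k)), \\
1 & \text{if } a = 0 \text{ and } d = 5 \text{ and } ((k = 5 \text{ and } k' = 0) \text{ or } (k \ne 5 \text{ and } k' = k)), \\
1 & \text{if } a = 0 \text{ and } d \ne 4 \text{ and } d \ne 5 \text{ and } k = k', \\
0 & \text{otherwise.}
\end{cases}
\]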
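Since the transitions of the tracked process type are deterministic, they can be summarised by a function that returns the successor $k'$. The Python sketch below reads the two case distinctions in such a way that exactly one successor has probability one for every combination of day, tracked process and action; this reading, as well as the function name `next_process_type`, is an assumption of the sketch.

```python
def next_process_type(d: int, k: int, a: int) -> int:
    """Successor k' of the tracked process type.

    d: day of the week, 1 = Monday, ..., 6 = Saturday, 0 = Sunday
    k: tracked process, 4 (started Thursday), 5 (started Friday) or 0 (none)
    a: number of batches started today
    """
    if a > 0:
        # A process started on Thursday or Friday becomes the tracked one;
        # starts on other days leave the tracked process unchanged.
        return d if d in (4, 5) else k
    # No process is started: a tracked process ends on the weekday on which
    # it was started, otherwise nothing changes.
    if d in (4, 5) and k == d:
        return 0
    return k
```

With this function, the probability of moving from $k$ to $k'$ is one if `next_process_type(d, k, a) == k'` and zero otherwise.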

Last, there are the transitions that happen with regard to the arrival of samples.

\[
P_a(q, q') = \Pr(q_{t+1} = q' \mid d_t = d,\ q_t = q,\ a_t = a) =
\begin{cases}
1 & \text{if } (d = 0 \text{ or } d = 6) \text{ and } q' = q, \\
\gamma(q') & \text{if } 1 \le d \le 5 \text{ and } a = 3 \text{ and } q' \le 3, \\
\gamma(q' - q + a) & \text{if } 1 \le d \le 5 \text{ and } a < 3 \text{ and } q' - q + a \le 3 \text{ and } q' < Q, \\
\sum_{x = q' - q + a}^{3} \gamma(x) & \text{if } 1 \le d \le 5 \text{ and } a < 3 \text{ and } q' - q + a \le 3 \text{ and } q' = Q, \\
0 & \text{otherwise,}
\end{cases}
\]

where $\gamma(x)$ is the arrival distribution, that describes the probability of $x$ new samples being available for processing at the start of a new weekday.
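The queue transitions, including the truncation at the upper bound $Q$, can be sketched as follows in Python. The sketch reads the weekday condition as $1 \le d \le 5$, represents the arrival distribution as a dictionary `gamma` with at most three arriving batches, and uses made-up probabilities in the example; the function name `queue_transition` is hypothetical.

```python
from typing import Dict


def queue_transition(q: int, q_next: int, d: int, a: int,
                     gamma: Dict[int, float], Q: int) -> float:
    """Probability of moving from queue size q to q_next on day d under action a."""
    if d in (0, 6):
        # Weekend: nothing arrives and nothing is started.
        return 1.0 if q_next == q else 0.0
    if a == 3:
        # Action 3 starts all batches in the queue, so only arrivals remain.
        return gamma.get(q_next, 0.0) if q_next <= 3 else 0.0
    x = q_next - q + a   # number of arriving batches needed to reach q_next
    if x > 3:
        return 0.0
    if q_next < Q:
        return gamma.get(x, 0.0)
    if q_next == Q:
        # At the upper bound, all larger arrival counts are lumped together.
        return sum(gamma.get(y, 0.0) for y in range(x, 4))
    return 0.0


# Illustrative arrival distribution (made-up numbers).
gamma = {0: 0.4, 1: 0.3, 2: 0.2, 3: 0.1}
row = [queue_transition(4, q_next, d=1, a=1, gamma=gamma, Q=5)
       for q_next in range(0, 6)]
```

In the example, the queue holds four batches and one batch is started, so the successor queue sizes 3, 4 and 5 receive probabilities 0.4, 0.3 and 0.2 + 0.1, and the row sums to one.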

The transition probabilities can now be defined as follows:

\[
P_a(s, s') = P(d, d') \cdot \left( P_{a>0}(k, k') + P_{a=0}(k, k') \right) \cdot P_a(q, q'). \tag{5.1}
\]

Rewards in the MDP

\[
r(s, a) =
\begin{cases}
-P_p \cdot q - P_s & \text{if } a = 3, \\
-P_p \cdot a - P_q \cdot (q - a) - P_s - P_5 & \text{if } a > 0 \text{ and } d = 5, \\
-P_p \cdot a - P_q \cdot (q - a) - P_s & \text{if } a > 0 \text{ and } d \ne 5, \\
-P_q \cdot q & \text{otherwise,}
\end{cases}
\]

where

$P_p$ = price for the time samples are in the process,
$P_q$ = price per sample for a day in the queue,
$P_s$ = price for starting a process,
$P_5$ = extra price for the process starting on Friday.
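The reward function of the simplified model can be sketched as a function of the day, the queue size and the action. The cases are evaluated in the order in which they are listed, so the case $a = 3$ takes precedence over the cases with $a > 0$. The Python function name `reward` and the prices in the example are hypothetical.

```python
def reward(d: int, q: int, a: int,
           p_p: float, p_q: float, p_s: float, p_5: float) -> float:
    """Reward r(s, a) of the simplified MDP for day d, queue q and action a."""
    if a == 3:
        # All batches in the queue are started.
        return -p_p * q - p_s
    if a > 0:
        r = -p_p * a - p_q * (q - a) - p_s
        # Starting on Friday (d = 5) carries an extra price.
        return r - p_5 if d == 5 else r
    # No process is started: only queueing costs.
    return -p_q * q


# Illustrative prices only.
friday_start = reward(d=5, q=3, a=1, p_p=1.0, p_q=2.0, p_s=10.0, p_5=5.0)
```

With these illustrative prices, starting one batch on Friday with three batches waiting costs 1 for processing, 4 for the two batches left in the queue, 10 for the start and 5 for the Friday surcharge.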


Chapter 6 Phase 3: Simulation

This chapter describes the final phase of the three phase approach. In this phase, a simulation model is developed. This simulation model will help evaluate the results of the MDP model of Chapter 5. Section 6.1 gives some theoretical background information. Section 6.2 describes the simulation model, including the modelling assumptions and simplifications.

6.1 Method description

Mathematical simulation can be defined as:

"Experimentation with a simplified imitation (on a computer) of an operating system as it progresses through time, for the purpose of better understanding and/or improving that system." [Robinson, 2014, p. 5]

The simulation method used in this research is called Discrete Event Simulation (DES). With this method, the system is represented by entities flowing from one activity to another. The activities are separated from each other by queues. These queues result when the next activity for an entity can’t start directly after the preceding one. Discrete event simulation is event-oriented simulation, which means it only inspects those points in time, at which events take place within the simulation model.
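The event-oriented idea can be illustrated with a minimal event loop. The Python sketch below simulates a single activity with one server and a first-come-first-served queue; it is a generic textbook-style illustration and not the simulation model of this research. The names `simulate`, `ARRIVAL` and `DEPARTURE`, and the fixed service time, are assumptions of the sketch.

```python
import heapq
from collections import deque

ARRIVAL, DEPARTURE = 0, 1


def simulate(arrival_times, service_time):
    """Return the departure time of every entity for one activity with a queue."""
    # The event list holds (time, kind, entity) and is ordered by time.
    events = [(t, ARRIVAL, i) for i, t in enumerate(arrival_times)]
    heapq.heapify(events)
    waiting = deque()   # entities queueing in front of the activity
    busy = False        # whether the activity is occupied
    departures = {}
    while events:
        # Jump directly to the next point in time at which something happens.
        now, kind, entity = heapq.heappop(events)
        if kind == ARRIVAL:
            waiting.append(entity)
        else:
            departures[entity] = now
            busy = False
        if not busy and waiting:
            nxt = waiting.popleft()
            busy = True
            heapq.heappush(events, (now + service_time, DEPARTURE, nxt))
    return departures


print(simulate([0, 1, 2], service_time=3))   # {0: 3, 1: 6, 2: 9}
```

The clock never advances in fixed steps; it only visits the times at which an arrival or a departure occurs, which is what distinguishes discrete event simulation from time-stepped simulation.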

Developing a simulation model is a cyclical process. Figure 6.1 shows an outline of a simulation study. The boxes are the key stages of the study, the arrows are the activities that enable movement between the stages.

[Figure 6.1: outline of a simulation study. Stages: real world (problem), conceptual model, computer model, improvements/understanding. Activities: conceptual modelling, model coding, experimentation, implementation.]
