Incident End Time Prediction During the Incident Recovery Process

MASTER’S THESIS

T. Kraai (Tim) February 2021

Supervisors University of Twente

Dr. E. Topan

Dr. C.G.M. Groothuis-Oudshoorn

Supervisors ProRail

Saskia Wevers

Martijn van der Weide

Management Summary

Railway incidents can have a large impact on train operations. By communicating prognoses of the incident end time, ProRail informs railway operators and travellers about the expected end time of an incident. For each incident type, the company Consultants in Quantitative Methods (CQM) has developed a decision tree that determines an initial prognosis during the intake of an incident. At the beginning of an incident, estimating the end time is difficult because limited information is available. For a prognosis to be reliable, it must be given in time (at least 35 minutes before the end of the incident) and it must be precise (the prognosed end time is later than the actual end time). A reliable prognosis of the incident end time makes it possible to create an overlap between finishing the recovery activities and planning the restart of the trains. Unreliable prognoses of the incident end time lead to delay. Because of the importance of reliable prognoses and the complexity of the incident recovery process, prognoses during incidents are currently given by an incident coordinator based on expertise. A reliable prediction of the incident end time, determined with a data-based model, can support the coordinator in giving in-time and precise prognoses. Therefore, the goal of this project is defined as:

Create a data-based model based on literature and previous research at ProRail that provides predictions of the incident end time to support in time and precise prognoses.

Method

Data analysis of the current reliability of the prognoses shows that most prognoses are precise but are not given in time. This project focusses on the incident type collision with a person because of the impact that this incident type has on the train operations and the amount of data that is available during this type of incident. The data of these incidents is pre-processed by removing outliers and imputing missing data. The review of the literature on prediction models and previous research at ProRail show the use of machine learning methods for the development of a data-based incident end time prediction model. Based on this, several machine learning methods are selected. The prediction performance of these methods is determined with cross-validation on all the data of incidents from June 2017 to November 2020. The prediction performance on all data showed the highest performance for the eXtreme Gradient Boosting (XGBoost) method. This method is used to develop a model from which feature importance on all data is analysed. With the model, predictions are determined during chronological stages of the incident. For each stage, feature importance is analysed to identify features with a big impact on the predictions and 90% prediction intervals are generated to communicate the uncertainty of the predictions. The XGBoost model is also used to predict the incident end time when new information about the incident becomes known for three new incidents in December 2020. Lastly, prognoses overpredict the actual duration of an incident to be precise. The model however predicts the incident end time by penalizing under- and overprediction evenly. Thus, shifting of the distribution of the residuals is proposed to achieve the desired percentage of overprediction.

Results

The feature importance on all data shows that the updated and final prognoses have a high impact on the prediction accuracy of the developed model. Although the prognoses are important, a clear improvement in predictions and a decrease in the width of the prediction intervals can be seen at stage 2, when the first people arrive at the incident location. After this stage, the predictions slowly improve when more information becomes available.

The features that show the highest impact on the prediction performance of the model at stage 2 are the number of deceased victims and the degree of fragmentation of the body of the victim(s). Important features at other stages are the estimated arrival time and the prognosed finish time of the mortician and the actual arrival time of the AL.

At the final stage, when all data about the incidents is included, a clear split can be seen in the incidents with wide prediction intervals between incidents with a duration < 100 minutes and incidents > 200 minutes. The number of deceased victims is found to be the main feature to explain this split in duration. During the new incidents, the predictions of the incident end time converge to the actual incident end time. The biggest improvement in prediction for these incidents appears when the degree of fragmentation and the prognoses become available.

The predictions from the XGBoost model are given in time because this model predicts the incident end time at every moment during the incident. The incident end time is, however, predicted with even under- and overprediction, while prognoses focus on overprediction. Therefore, the predictions from the XGBoost model cannot directly be compared with the prognoses of the AL. However, a desired percentage of overprediction can be achieved by shifting the distribution of the residuals of the predictions to obtain precise prognoses.

Recommendations

This project shows that an XGBoost model can be used for collision with a person incidents to support in time prognoses. Further research is recommended to determine if the XGBoost model can also support in time prognoses for other incident types.

For the incident type collision with a person, the degree of fragmentation shows a clear improvement in the prediction performance. The CQM decision tree for collision with a person includes the feature degree of fragmentation. However, during the intake, this feature is not known and therefore this CQM decision tree is currently not implemented. It is recommended to ProRail to implement this CQM decision tree at the moment the degree of fragmentation becomes available. Other features that could be used to extend this model are the arrival time of the AL, the estimated time of arrival of the mortician and the prognosed end time of the work of the mortician.

The updated and final prognoses prove to be important features for prediction. Therefore, it is recommended to ProRail to keep these prognoses. Further analysis of predictions from the XGBoost model during new incidents can provide more insight into important moments at which a data-based model can support the AL with extra prognosis updates.

Lastly, overprediction of the actual duration of the incident is desired to prevent delay. However, high overprediction leads to additional waiting time before trains can be restarted. Therefore, it is recommended to ProRail to specify the penalty for different levels of under- and overprediction. With these levels a custom objective function can be defined, with which the XGBoost model can determine a precise prognosis directly.

Preface

It is a pleasure to present my master’s thesis to you. This thesis is the result of half a year of research at ProRail and marks the end of my student time at the University of Twente. I am very thankful for the opportunity to perform my final internship at ProRail. During my internship, I have spoken with many approachable people that were all very open to help and share their knowledge via video calls. This made me feel involved in the process while working from home.

First, I would like to thank my daily supervisors at ProRail, Saskia Wevers and Martijn van der Weide. They motivated me to explore the incident process and the company by inviting me to meetings of other projects and showed great interest in the findings of my research. Secondly, I would like to thank my supervisors of the university, Engin Topan and Karin Groothuis-Oudshoorn, for their advice, comments and feedback on my thesis. As sparring partners, they helped me to improve my understanding of the used methods.

Finally, I am very grateful for the support of my family, who challenged me and helped with improving the quality of my thesis, as well as my friends and girlfriend for the mental support and the interest they showed in the topic.

Tim Kraai

Utrecht, February 2021

List of Figures

Figure 1. Problem cluster
Figure 2. Report Structure
Figure 3. Incident recovery process bathtub model
Figure 4. Decision tree section malfunction (CQM, 2019)
Figure 5. Swimlane diagram of the incident recovery process
Figure 6. Final prognosis not in time
Figure 7. Prognosed end time before the actual end time
Figure 8. Actual incident end time before prognosed end time
Figure 9. Overview incident types in time prognoses
Figure 10. Overview incident types precise prognoses
Figure 11. Defect material plots. Top left: Distribution duration. Top right: In time. Bottom left: Precise. Bottom right: In time vs Precise
Figure 12. Collision with a person plots. Top left: Distribution duration. Top right: In time. Bottom left: Precise. Bottom right: In time vs Precise
Figure 13. Supervised machine learning process
Figure 14. Bayesian Network with unknown cause (Zilko, 2017)
Figure 15. Bayesian Network with known cause (Zilko, 2017)
Figure 16. Example of an Artificial Neural Network (Nielsen, 2015)
Figure 17. Grid search vs Random search for hyperparameter optimization
Figure 18. Quantile loss per error
Figure 19. Correlation with Incident duration
Figure 20. Degree of fragmentation boxplot
Figure 21. p-values ANOVA with incident duration
Figure 22. Nested Cross-validation
Figure 23. Performance of the modelling methods
Figure 24. XGBoost Permutation feature importance
Figure 25. Prediction performance at different stages during the incidents
Figure 26. Feature importance stage 2
Figure 27. Feature importance stage 9
Figure 28. Prediction intervals stage begin
Figure 29. Prediction intervals stage 2
Figure 30. Prediction intervals stage final
Figure 31. Decision Tree showing most important feature for split in short and long incidents. Value = duration
Figure 32. CQM decision tree collision with a person
Figure 33. Timeline incident 1
Figure 34. Timeline incident 2
Figure 35. Timeline incident 3
Figure 36. Final prognosis residuals
Figure 37. Residuals incident end time prediction with and without overprediction
Figure 38. Correlation matrix numerical features

List of Tables

Table 1 Tasks per role in the incident recovery process
Table 2 Features per incident type (Wemelsfelder, 2019)
Table 3 Wrapper RFE features for Support Vector Machine and Neural Network

Abbreviations

In this thesis, many abbreviations are used. Because ProRail is a Dutch company, an English translation is provided. To prevent confusion, the Dutch abbreviation and the Dutch name are given in brackets. The list below gives an overview of the abbreviations and their English translations.

AL      Algemeen Leider (General Leader)
ANN     Artificial Neural Network
DT      Decision Tree
GB      Gradient Boosting
Lasso   Lasso Regression
MKS     Meldkamer Spoor (Railway Alarm Room)
PI      Prediction Interval
RF      Random Forest
RFE     Recursive Feature Elimination
SVM     Support Vector Machine
TIS     Trein Incident Scenario (Train Incident Scenario)
TOBS    Ten Onrechte Bezet Spoor (Train Vacancy Detection Failures)
TRDL    Treindienstleider (Railway Traffic Controller)
XGB     eXtreme Gradient Boosting

Management Summary
Preface
Abbreviations
1 Introduction
1.1 About ProRail
1.2 Problem introduction
1.3 Definition of problem and goal of this project
1.4 Methodology
1.5 Research Framework
1.6 Research scope
2 Incident recovery process
2.1 The ideal incident recovery process
2.2 Unreliable prognoses
2.3 Remarks about the incident process
2.4 Current reliability of final prognoses
2.5 Summary
3 Literature review
3.1 Previous prognosis research at ProRail
3.2 Machine learning
3.3 Data pre-processing
3.4 Feature selection
3.5 Modelling methods
3.6 Hyperparameter tuning
3.7 Model evaluation
3.8 Summary
4 Data pre-processing
4.1 Incident type selection
4.2 Data collection
4.3 Data pre-processing
4.4 Feature selection
4.5 Data preparation
4.6 Summary
5 Model development and results
5.1 Methods for prediction
5.2 Prediction at incident stages
5.3 Prediction throughout incidents
5.4 Incident end time prediction to support reliable final prognoses
6 Conclusion
7 Discussion and recommendations
7.1 Discussion
7.2 Recommendations
References
Appendices

1 Introduction

In this chapter, the role of ProRail in the railway system of the Netherlands is described, followed by the motivation for this project in Section 1.2. In Section 1.3, the problem statement and goal of this project are formulated. Section 1.4 describes the methodology for this project and the structure. In Section 1.5, the approach for solving the problem and research questions are defined. Lastly, the scope of this project is described in Section 1.6.

At ProRail many abbreviations are used. Because ProRail is a Dutch company, an English translation is provided. However, to prevent confusion, the Dutch abbreviation and Dutch name will be given in brackets, for example, General Leader (AL, Algemeen Leider). After the explanation, the Dutch abbreviations will be used throughout the report. A full list of all the abbreviations used can be found in the Abbreviations section.

1.1 About ProRail

The Netherlands has one of the world’s busiest railway networks. Every day, 1 million people travel by train and 100,000 tons of goods are transported over the 7,000 kilometres of track (ProRail, 2019a).

In the next 20 years, the population of the Netherlands is expected to increase by 1.6 million. This increase is mostly expected in the Randstad, the urban area in the west of the Netherlands (Kooiman et al., 2016). For many of these people, the train will be a vital mode of transport to commute and travel (van Ammelrooy, 2020). This will lead to a 25-40% increase in the number of passengers. Besides the number of passengers, the transport of goods is expected to have doubled by 2040. These goods mostly originate from the port of Rotterdam and have to pass through the crowded Randstad to Germany and the rest of Europe (ProRail, 2019a).

ProRail B.V. is a private company, with the Dutch government as its only shareholder, that facilitates the rail infrastructure in the Netherlands. Nearly 4000 employees working in different departments construct, maintain and improve the tracks, organize the train schedules, manage the traffic and respond to incidents.

The construction and maintenance of the rail infrastructure are not performed by ProRail itself, but by different rail contractors depending on the region. Railway operators for passengers and goods pay a fee to ProRail for the use of railways. To facilitate the growth in travellers and goods, ProRail is constantly improving the tracks and railway processes to offer a sustainable mode of transport (ProRail, 2019a).

1.2 Problem introduction

This project focuses on the duration of railway incidents. An incident is defined as a negative, unexpected and unforeseen event that can be troublesome (van Dale, 2019). Railway incidents can have a big impact on train operations. Trains might have to be rerouted or cancelled, which hinders travellers and the transport of goods. In 2019, 209 high-impact incidents occurred, each with more than 10 hours of accumulated delay (ProRail, 2019b).

When an incident occurs, ProRail determines an initial prognosis of the incident end time. This initial prognosis and the information from the incident are used by the train traffic controller (TRDL, Treindienstleider) to inform the trains about the location of the incident. The railway operators use this information to replan their rolling stock and crew. The initial prognosis is also used to inform travellers and change train schedules.

Incident end time prognoses are made at different moments. The initial prognosis is used as an indication and it is given during the intake when the incident is reported, based on the information that is available at that moment. During the incident, new prognoses called updated prognoses can be provided by the general leader (AL, Algemeen Leider), who is the incident coordinator from ProRail. Updated prognoses keep the involved parties informed. Near the end time of the incident, a highly certain final prognosis is communicated by the AL, to make it possible to start replanning the train schedule.

To inform the TRDL, railway operators and travellers correctly, the prognoses of the incident end time must be reliable. Each under- or overprediction results in waiting time before the trains can start. Therefore, reliable prognoses result in less waiting time and better information for the travellers.

The initial prognosis is currently determined with a decision tree per incident type. The parameters for this decision tree have been determined based on previous research at ProRail.

The updated and final prognoses are not determined with a decision tree but are given manually, and are therefore largely based on the expertise of the AL. This results in differences in the moment at which the prognoses are given and in the reliability of these prognoses.

ProRail believes that more reliable prognoses of the incident end time can reduce waiting time before trains can start and can improve the quality of the information to the travellers.

1.3 Definition of problem and goal of this project

Problems with incident end time prognoses and their relations are bundled in a problem cluster in Figure 1. In this section, the problems in the problem cluster are described and the problem statement and goal of this project are defined.

Many different types of incidents occur, each requiring a special process for recovery. Incidents can be split into technical and non-technical incidents. In a technical incident, a contractor is required to perform a repair to the railway infrastructure. In a non-technical incident, other actions must be performed for incident recovery (e.g., when a train is malfunctioning, the train has to be pulled away).

During an incident, the AL supervises the incident recovery process, communicates with the different parties involved and makes decisions at the location of the incident. During an incident, the AL also gives new prognoses for the end time of the incident. At ProRail a final prognosis is considered reliable if it is given in-time and if it is precise.

An in-time final prognosis makes it possible to create an overlap between the last work activities that must be performed to finish the incident, and the replanning of the crew and rolling stock to restart the train traffic. This overlap will reduce the waiting time before the start of the first trains after the incident. Currently, 35 minutes is set as the time needed for the replanning. Therefore, a final prognosis is in-time if it is given at least 35 minutes before the actual end time of the incident.

A final prognosis is precise if the last work activities of the incident are finished before the time of the final prognosis. When the activities are finished after the final prognosis, the plan made for the restart of the trains must be changed. This results in delay and unclear communication to passengers. Activities that are finished before the final prognosis are less problematic, unless they are finished far before the final prognosis, because this results in additional waiting time.
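The two reliability criteria can be written down as a minimal Python sketch. The function and argument names are hypothetical and chosen only for this illustration; the 35-minute replanning time is taken from the text, and treating an activity that finishes exactly at the prognosed time as precise is an assumption.

```python
from datetime import datetime, timedelta

# Replanning time stated in the text; all names below are illustrative assumptions.
REPLANNING_TIME = timedelta(minutes=35)


def is_in_time(given_at: datetime, actual_end: datetime) -> bool:
    """In time: the final prognosis is given at least 35 minutes before the actual end."""
    return actual_end - given_at >= REPLANNING_TIME


def is_precise(prognosed_end: datetime, actual_end: datetime) -> bool:
    """Precise: the work is finished no later than the prognosed end time.

    Counting an exact match as precise is an assumption of this sketch.
    """
    return actual_end <= prognosed_end


def is_reliable(given_at: datetime, prognosed_end: datetime, actual_end: datetime) -> bool:
    """Reliable: the final prognosis is both in time and precise."""
    return is_in_time(given_at, actual_end) and is_precise(prognosed_end, actual_end)
```

For example, a final prognosis given at 12:00 for an end time of 12:50 is reliable under these definitions when the incident actually ends at 12:45, but not when it ends at 12:30 (not in time) or at 12:55 (not precise).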

To give a final prognosis that is both in time and precise is complex because multiple features influence the incident duration. A feature is an attribute of the incident, such as the location or severity of the incident. The problem cluster in Figure 1 shows that the influence that these features have on the reliability of the prognosis is currently not known.

Figure 1. Problem cluster

In this project, prognosis is used for the prognosis given by the CQM decision tree and the AL. Prediction is used for the prediction generated by the data-based model developed in this project.

A data-based model for prediction that incorporates the influence of features on the incident end time can provide more reliable predictions. A reliable prediction can be used to determine a prognosis which meets the desired level of overprediction. This would make the prognoses less dependent on the expertise of the individual AL.

The problem statement for this project is defined as:

ProRail does not use a data-based method to support in time and precise prognoses for the incident end time.

Therefore, the goal of this project is:

Create a data-based model, based on literature and previous research at ProRail that provides predictions of the incident end time to support in time and precise prognoses.

To develop a data-based model, the features that influence the reliability of the predictions will be identified by data analysis of the current situation. The model will be tested for one incident type to contribute to the question of whether a more reliable prediction of the incident end time can be determined using a data-based prediction model.

1.4 Methodology

The Managerial Problem-Solving Method (MPSM) has been shown to be a useful method for solving business problems systematically while being able to include creative journeys to identify new and valuable alternatives (Heerkens & Winden, 2017). To solve the problem systematically, the MPSM consists of the following 7 steps (Heerkens & Winden, 2017):

1. Defining the problem

2. Formulating the problem approach

3. Analysing the problem

4. Formulating (alternative) solutions

5. Choosing a solution

6. Implementing the solution

7. Evaluating the solution

These steps serve as a guideline. It is also possible to make a loop and return to previous steps if a review is required. The structure of this report, outlined in Figure 2, follows the steps of the MPSM.

Figure 2. Report Structure

1.5 Research Framework

To solve the main problem, research questions are defined. These research questions structure the research process and are the questions that this project aims to answer.

The current process for incident recovery

Before focussing on incident end times, understanding the process of an incident, which parties are involved, and which steps are taken is important. Afterwards, it is also necessary to understand the process of prognoses and to define what makes a prognosis reliable. An analysis of the current reliability identifies the status and where an improvement is possible.

Based on the reliability and the impact on train operations, one incident type will be selected to focus on in this project. This results in the following four research questions:

• How is the process for incident recovery organized?

• How are incident end time prognoses currently determined?

• How reliable is the final prognosis currently?

• For which incident type does an improvement in the reliability have the biggest impact on train operations?

Previous research on incident duration prediction

To develop a data-based model for incident end time prediction, knowledge from previous research that has been performed at ProRail and from international studies on incident duration prediction has to be considered. This leads to the following research questions:

• What previous research on incident duration prediction has been performed at ProRail?

• What research about incident duration prediction has been performed internationally?

Modelling methods

The aim of this master thesis is to build a model that can be used to provide a more reliable prognosis of the incident end time. From the prediction methods described in literature, the best method will be selected for incident end time prediction. To select the best method, performance measurement metrics and validation techniques to evaluate the performance are necessary. This results in the following research questions:

• What methods are proposed in the literature to develop a model for incident end time prediction?

• What metrics can be used to evaluate the performance of the identified methods?

• How can the performance of the developed model be validated and measured?

• Which method has the highest performance for the selected incident type?

Model development and performance

The best performing method for prediction will be used to develop a model for incident end time prediction. This model is applied during an incident to determine the prediction performance and which features influence the prediction. This leads to the following research questions:

• What is the prediction performance of the developed model during incidents?

• Which features are important for prediction during incidents?

• Does the model support a more in time and precise prognosis?

1.6 Research scope

The scope of this project is the prediction of the end time of an incident during the incident.

The initial prognosis will not be researched, because previous studies at ProRail (see Section 3.1) showed that minimal improvement is possible in the reliability of the initial prognosis.

Interviews with an AL and other employees at ProRail working on prognoses suggested that the reliability of the final prognosis can still be improved.

Incidents that have an initial prognosis of less than 60 minutes are not considered because a final prognosis cannot be given in time. After the incident has been reported, time is needed to assign an AL to the incident and for the AL to communicate with the TRDL to receive more information on which to base a new prognosis. Therefore, if the prognosis has to be given 35 minutes before the end of the incident, the incident should last at least 60 minutes.

This project will focus on data-based decision support. The solution should serve as a support to the AL to make more substantiated decisions. The working routine of the AL will be used as a fixed process and will not be in the scope of this project.

2 Incident recovery process

In this chapter, the ideal incident recovery process from the start of an incident to the restart of trains is explained in Section 2.1. Then, the impact of unreliable prognoses is described in Section 2.2. In Section 2.3, remarks on the current incident recovery process are outlined. An analysis of the current reliability of the prognoses is performed in Section 2.4.

2.1 The ideal incident recovery process

The bathtub model can be used to represent the train traffic level during an incident (Ghaemi et al., 2017). This model consists of three phases for train traffic (see Figure 3). The first phase starts with the intake. The intake is when an incident is reported. Based on the information that is available from the intake, trains to the incident location get cancelled or are instructed to change tracks. The second phase starts after the schedules of trains to the incident location are adapted and ends when the plan to restart the train schedule is ready. The third phase starts with executing the restart plan and ends when the trains at the incident location are driving according to the original schedule again.

Throughout the incident recovery process, prognoses about the incident end time are made. There are three types of prognoses: initial, updated and final. The time a prognosis is determined will be referred to as the time a prognosis is given. In this section, the prognoses are expected to be perfect. In practice, this is not always the case. These situations and their effects on the incident recovery process will be described in Section 2.2.

Figure 3. Incident recovery process bathtub model

Intake

The incident recovery process starts when an incident is reported to the Railway Alarm Room (MKS, Meldkamer Spoor). This is the central point at which all information about the incident is collected. With the information that the reporter has about the incident, an intake form is filled in. Based on this information, the MKS has to decide about the urgency, schedule a time for repair, alert government emergency services and determine the initial prognosis. The initial prognosis serves as a first indication to the railway operators and the travellers to act upon.

In the first phase of the bathtub model, the trains on or next to the track where the incident occurred have to be cancelled, redirected or instructed to drive slowly. This is the responsibility of the TRDL. The initial prognosis is used by railway operators to communicate the travel information with their travellers and change the planning for rolling stock and crew. Currently, railway operators decide to cancel trains until the time of the initial prognosis.

The initial prognosis is given in SpoorWeb, the information system of ProRail for handling incidents, by a decision tree made by the consultancy company Consultants in Quantitative Methods (CQM). For every incident type, both technical and non-technical, a decision tree is constructed (see example in Figure 4). Based on the incident type, features of the incident are selected to create a split in the data. A feature is selected for a split if the 65th percentile of the distribution of each option of the feature is not within the 95% confidence interval of the other options. For example, for the feature rain the 95% confidence interval of “no” is [45,49] and that of “yes” is [39,53]. The 65th percentiles are 47 and 45 respectively. Because the 65th percentile of “no” (47) is within the confidence interval of “yes” [39,53], this feature is not used for a split. If multiple features can be used for a split, the feature with the highest difference in duration for the leaves of the tree is selected. The tree stops at the leaf in which no feature satisfies this criterion (CQM, 2019). The decision tree is implemented in SpoorWeb and automatically gives the MKS an initial prognosis during the intake based on the features of the incident that are entered in the intake form.
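The split criterion can be illustrated with a minimal Python sketch. The function name `qualifies_for_split`, the dictionary layout and the inclusive interval bounds are assumptions made for this illustration and are not taken from the CQM implementation. The rain values are those of the example above; the second feature uses hypothetical numbers.

```python
def qualifies_for_split(options):
    """Check the split criterion for one feature.

    `options` maps each option of the feature to a tuple
    (65th percentile, (lower bound, upper bound) of the 95% confidence interval).
    The feature qualifies only if no option's 65th percentile lies inside the
    confidence interval of another option (bounds treated as inclusive, an assumption).
    """
    for name, (p65, _) in options.items():
        for other, (_, (low, high)) in options.items():
            if other != name and low <= p65 <= high:
                return False
    return True


# Values from the rain example in the text.
rain = {"no": (47, (45, 49)), "yes": (45, (39, 53))}

# Hypothetical feature whose options are clearly separated.
separated = {"a": (30, (25, 35)), "b": (60, (50, 70))}

print(qualifies_for_split(rain))       # False: 47 lies within [39, 53]
print(qualifies_for_split(separated))  # True
```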

Figure 4. Decision tree section malfunction (CQM, 2019)

The initial prognosis is the 65th percentile of the distribution at the leaves of the decision tree. The 65th percentile is used for the initial prognosis because a pessimistic expectation is better than an optimistic expectation. An optimistic expectation can result in an underprediction of the incident duration. This leads to prognoses that often have to be extended, which makes them unreliable. The 65th percentile means that 65% of the incidents with the same features have been finished before the time of the initial prognosis. For the construction of the decision trees, multiple data sources from 2014 to 2018 were used.
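As a small illustration of how a 65th percentile at a leaf translates into an initial prognosis, the sketch below uses Python's standard library. The durations are hypothetical, the helper name `initial_prognosis` is an assumption, and the interpolation method ("inclusive") is one of several possible percentile definitions; the text does not state which one CQM uses.

```python
from statistics import quantiles


def initial_prognosis(durations, percentile=65):
    """Return the given percentile of the incident durations at a leaf (in minutes)."""
    # quantiles(..., n=100) returns the 99 cut points between the percentiles;
    # index percentile - 1 is the requested percentile.
    cut_points = quantiles(durations, n=100, method="inclusive")
    return cut_points[percentile - 1]


# Hypothetical durations (minutes) of past incidents ending up in the same leaf.
durations = list(range(10, 210, 10))  # 10, 20, ..., 200

prognosis = initial_prognosis(durations)
share_finished = sum(d <= prognosis for d in durations) / len(durations)

print(prognosis)       # 133.5
print(share_finished)  # 0.65
```

With these hypothetical durations, 13 of the 20 incidents were finished within the prognosed 133.5 minutes, which matches the interpretation of the 65th percentile given above.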

Incident recovery

After the intake, an AL is assigned to the incident. The AL can update the expected incident end time with an updated prognosis, e.g., when reading the intake form or when receiving new information from the assigned contractor or other parties. When the incident is considered to have a high impact, the MKS requests the AL to go to the location of the incident and the TRDL starts instructing or changing the trains that are affected by the incident. When all the trains on the track where the incident occurred or on the tracks next to the incident track have received driving instructions or are changed, phase 1 of the bathtub model is ended. In phase 2 the affected trains drive according to an adapted schedule to ensure the safety of the people at the location of the incident.

The parties involved with the incident differ per incident type. For technical incidents, a contractor has to come and perform repairs. For other incidents, the police or fire department might be involved. With incidents that involve a stranded train, the incident response team of ProRail also goes to the location and takes care of the passengers and the train.

From the moment the AL is assigned to the incident, the AL communicates with the parties involved in the incident recovery process, determines the actions that have to be performed and records the duration of these actions. An AL can decide to update the expected incident end time prognosis with information gained during the incident with his expertise and experience.

With the current model of CQM, it is not possible to use new information to determine updated prognoses. This is because the decision tree of CQM is static. This means that the decision tree is the same for every incident of that type and based only on information at the intake. A dynamic tree could consider information after the intake. As new information becomes available, a new prognosis could be made with a dynamic tree that is specific to the incident.

In this project, the decision tree of CQM will only be used to identify important features of incident types.

End of the incident and after the incident

At the latest 35 minutes before the expected incident end time, a new prognosis for the expected incident end time should be given by the AL. This can be an updated prognosis if the duration of the incident is uncertain, or a final prognosis when the AL is highly certain that the incident will be finished by that time. The moment in which a final prognosis is given should be at least 35 minutes before the end of the incident for a prognosis to be in time, because this time is needed for the replanning of the rolling stock and crew by the railway operators and traffic control (VL, verkeersleiding). The effects of a prognosis that is not given in time, or the effects when the prognosed incident end time is not correct, will be explained in Section 2.2.

The moment in which an incident is finished, the AL marks the end of the incident recovery activities. This is called End ICB (Einde Incidenten Bestrijding). At this moment, phase 2 has ended and this is the signal for the TRDL to allow trains to start driving according to the restart schedule at the location of the incident, phase 3. When all the trains are back to the original schedule, the restart is completed and phase 3 is ended. An overview of the actions by the different parties is displayed in a swimlane diagram in Figure 5. All parties involved in the process and their roles are summarized in Table 1.


Figure 5. Swimlane diagram of the incident recovery process

Table 1 Tasks per role in the incident recovery process

Role | Tasks
MKS | Central point of contact, fill intake form, generate initial prognosis, inform involved parties.
TRDL and VL | Cancel, redirect, or instruct trains when an incident occurs and restart the train traffic after the incident.
Railway operators | Communicate travel information, change planning rolling stock and crew.
AL | Coordinate incident recovery process, update prognosis, and give a final prognosis.
Incident response team ProRail | Take care of passengers and stranded trains.

2.2 Unreliable prognoses

A final prognosis that is not given in time can result in a delay before trains can be restarted because the restart plan is not finished when the incident is finished. Figure 6 shows the delay between the actual end of the incident and the start of the restart.

Figure 6. Final prognosis not in time


If the prognosed incident end time is before the actual incident end time, the prognosis is not precise. When the actual end time of the incident is later than the prognosed incident end time, the restart plan has to be adapted because the incident is not resolved, and trains cannot start driving at the expected time. Then, the planners will wait until the end of the incident before creating a new restart plan. Thus, a prognosis that is not precise will result in a delay after the incident is resolved (Figure 7).

Figure 7. Prognosed end time before the actual end time

If the actual incident end time is before the prognosed incident end time, the restart plan does not have to be changed. However, the time between the actual incident end time and the prognosed incident end time is additional waiting time until the restart begins (Figure 8).

Therefore, a prognosis that overestimates the incident end time, a pessimistic prognosis, can lead to additional waiting time.

Figure 8. Actual incident end time before prognosed end time

The goal is to have a final prognosis that is given in time (i.e. at least 35 minutes before the actual end of the incident) and is precise (i.e. the prognosed end of an incident is not before the actual end of the incident). In practice, providing a reliable final prognosis is difficult. In Section 2.4, a preliminary analysis is performed to identify how in time and precise the prognoses currently are.


2.3 Remarks about the incident process

The initial prognosis is the time that follows from the leaves of the CQM decision tree. This expected time is a point estimate and does not communicate the distribution of the end time of the incidents. For the parties involved, the initial prognosis serves as an indication of the time to aim for. Because the initial prognosis is based on the 65th percentile instead of the median of the distribution of the duration of incidents at the same leaf of the decision tree, the ALs are in 65% of the cases aiming for an incident end time that is later than needed.

Data stored from the actions that are performed at the incident location for technical incidents is limited. This is due to the employment of contractors. Contractors store the causes of the incident in their own databases, which are not shared with ProRail. Another reason for minimal data is that logging performed actions takes extra time and could cause more delay.

Prognoses currently change often when new information becomes available (e.g., when a repair takes longer than expected, or when fixing the identified cause does not solve the problem). When an incident is resolved shortly after the time of the current prognosis, the planners have to change the planning for the restart again. In multiple interviews, it was stated that, because of this, planners currently wait until the incident is completely resolved, so that no new changes can occur, before they start planning the restart. This results in a delay equal to the duration of the restart planning process.

To avoid frequent changes of the restart plans, the railway operators currently cancel trains during the incident until the longest known prognosis: initial, updated or final prognosis. Since the initial prognosis is pessimistic, it should only be used as an indication and not as a fixed time. Reliable final prognoses would make it possible for the railway operators to focus more on the final prognosis instead of longest prognosis. This would change the role of updated and final prognoses in the recovery process and could lead to less additional waiting time before trains can start driving after an incident.

2.4 Current reliability of final prognoses

The incident type of an incident is determined during the intake. In this section, an analysis of the current reliability of the final prognosis is performed. This analysis will compare the differences in reliability per incident type.

Data analysis for problem identification

Data from 1st of January of 2020 to the 30th of September 2020 is used for the problem identification analysis. The data includes (1) the times the prognoses are given, (2) the prognoses times themselves and (3) the time in which the incident is resolved.

Incidents are first filtered on an initial prognosis of ≥ 60 minutes because incidents that have an initial prognosis of < 60 minutes are out of the scope of this project.

When no final prognosis is given manually, the system automatically gives a final prognosis, equal to the resolution time, at the moment the incident is marked as resolved in SpoorWeb. To accurately analyse how many of the prognoses are in time and precise, the incidents with an automatic final prognosis are filtered out.


Overview analysis

This analysis gives an overview of how in time and precise the prognoses of the 9 most occurring incident types are. When a prognosis is changed (updated or final), the corresponding value in the database is overwritten. Therefore, the following timelines of the prognoses of an incident are based on the last stored prognoses.

In time

A final prognosis is in time if the time the final prognosis is given is 35 minutes or more before the end time of an incident. Thus, a prognosis is in time if [End incident] - [Final Prognosis Given] ≥ 35 minutes. An overview of how in time the prognoses are for the 9 most occurring incident types is shown in Figure 9.

Figure 9. Overview incident types in time prognoses

The in time overview shows that for most incident types, the final prognoses are not in time. Some final prognoses are given more than 50 minutes before the end of the incident, but most are given less than 35 minutes before.


Precise

A final prognosis is precise if the predicted final prognosis time is equal to or greater than the incident end time. So, a prognosis is precise if [Final prognosis] - [End incident] ≥ 0 minutes.
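The definitions of in time and precise can be expressed as two simple checks. The sketch below is only an illustration: the function names and the timestamps are hypothetical, while the 35-minute margin and the two inequalities follow the definitions in the text.

```python
from datetime import datetime, timedelta

IN_TIME_MARGIN = timedelta(minutes=35)


def is_in_time(final_prognosis_given, end_incident):
    """[End incident] - [Final Prognosis Given] >= 35 minutes."""
    return end_incident - final_prognosis_given >= IN_TIME_MARGIN


def is_precise(final_prognosis, end_incident):
    """[Final prognosis] - [End incident] >= 0 minutes."""
    return final_prognosis - end_incident >= timedelta(0)


# Hypothetical incident: prognosis given at 14:00, prognosed end 14:50, actual end 14:40.
given = datetime(2020, 3, 1, 14, 0)
prognosis = datetime(2020, 3, 1, 14, 50)
end = datetime(2020, 3, 1, 14, 40)

in_time = is_in_time(given, end)      # given 40 minutes before the end
precise = is_precise(prognosis, end)  # prognosed end is not before the actual end
waiting_time = prognosis - end        # additional waiting time of a precise prognosis

print(in_time, precise, waiting_time)
```

In this example the prognosis is both in time and precise, at the cost of 10 minutes of additional waiting time.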

An overview of how precise the prognoses are for the 9 most occurring incident types is shown in Figure 10.

Figure 10. Overview incident types precise prognoses

The precise overview shows that almost all of the prognosed incident times are after the actual incident end time. The green area shows the additional waiting time, which is between 0 and 30 minutes for all incident types. This means that the final prognoses give an overprediction of the actual time that is needed till the end of the incident. According to the definition at the beginning of this paragraph, a final prognosis that is past the actual end of the incident is called precise.


Individual incident types

The incident types defect material and collision with a person will be compared here. These incident types were marked by ProRail as the most interesting incident types because they occur often and do not involve a third-party contractor. The incident type for which a model will be constructed will be selected in Chapter 4.

Defect material

Figure 11. Defect material plots.

Top left: Distribution duration. Top right: In time. Bottom left: Precise. Bottom right: In time vs Precise

From the duration plot of defect material in Figure 11, it can be seen that most incidents are shorter than 100 minutes. The in time plot shows that only 11% of the prognoses of the incidents longer than 60 minutes are given in time and that most final prognoses are given just before the end of an incident. The precise plot shows a 90% overestimation and a peak between 0 and 5 minutes. This means that most final prognoses were very close to the actual end time. From the in time vs precise plot, it can be seen that most prognoses that are precise are not given in time. From discussing these plots with people from ProRail and reading logging information about defect material incidents, it becomes clear that only shortly before the end of an incident enough information is available to predict the incident end time.


Collision with a person

Figure 12. Collision with a person plots

Top left: Distribution duration. Top right: In time. Bottom left: Precise. Bottom right: In time vs Precise

The duration plot in Figure 12 shows that the duration of a collision with a person incident is more spread out compared to defect material. Most collision with a person incidents last between 2.5 and 4 hours. From the in time plot, it can be seen that only 26.5% of the prognoses are given more than 35 minutes before the incident end. The final prognoses are precise, since almost all incidents have an overestimation of 10 to 20 minutes, as indicated by the green area in the precise plot. The in time vs precise plot shows once again that most of the final prognoses are precise but are not in time.


2.5 Summary

In this chapter, the incident recovery process is explained with the parties involved. In the process, a good estimation of the end time is crucial to minimize the waiting time before trains can start running again. Currently, only the initial prognosis is given based on a unique decision tree per incident type. Based on the features of an incident, the 65th percentile of the distribution at the leaf of the tree is taken for the initial incident duration prediction. During the incident recovery process, an updated prognosis can be given by the AL. When the incident end time is highly certain, a final prognosis is given. The reliability of this final prognosis depends on the factors in time and precise. An analysis of the factor in time for the most occurring incident types showed that most prognoses are given less than 35 minutes before the end of the incident. So, they are not given in time. Most of the predicted incident end times were after the actual end times. The analysis shows relatively high overestimation, which results in extra waiting time. The analysis of incidents concerning defect material and collision with a person shows that for defect material incidents the final prognoses are given at the very last moment and, therefore, these are also precise. For collision with a person, most prognoses are given more than 10 minutes before the end of the incident. The analysis showed an overestimation of 10 to 20 minutes, which results in additional waiting time. This shows that the reliability of the prognoses can still be improved. In Chapter 3, a literature review will be performed to identify models to make a reliable prediction of the end time of an incident.


3 Literature review

In this chapter, a literature review is performed. First, previous research at ProRail about prognoses is described in Section 3.1. In Section 3.2, methods for selecting features, to be included in a prediction model, are described. Methods proposed in the literature to develop a model for incident time prediction are discussed in Section 3.3. This chapter ends with model validation in Section 3.4 and a summary in Section 3.5.

3.1 Previous prognosis research at ProRail

Four research projects about prognosis have been performed at ProRail from 2015 to 2019. In this section, the focus, methods and outcomes of these projects are summarized.

De Wit (2016) focused on the initial prognosis. Four methods for initial prognosis were proposed: confidence intervals of probability distributions, regression analysis, nearest neighbour and prediction by an expert. For these methods, features were determined that result in a more accurate prediction. To communicate the reliability of a prediction, confidence intervals were proposed. Probability distributions and regression analysis showed the best prediction performance. The average success percentage for 25-minute intervals was only 5 percent off from the actual time.

The projects of Zilko (2017), DataLab ProRail (2019) and Wemelsfelder (2019) at ProRail focused on both the initial and updated prognosis. After an initial prognosis was determined, an improved updated prognosis was supported when new data became available.

Technical incidents

Zilko et al. (2016) proposed a Bayesian Network model to predict the length of an incident based on the statistical dependencies of variables. This model can give a prediction for the incident duration when information is still missing. When new information becomes available, the distributions are updated and this results in a new prediction.

The length of an incident was split into the latency time and the repair time. Features that influenced the latency time were time, location and weather. Features that influenced the repair time were contract type and the cause. An example model was created and resulted in a better prediction compared to the initial prognosis. The model represented the data well, however, the R² was low. They concluded that the data used was of poor quality and that expanding the model with more influential features could have a potential benefit (Zilko, 2017).

In 2019 the DataLab of ProRail focused on the initial prognosis of section malfunctions (DataLab ProRail, 2019). The project attempted to determine the cause of the incident with text mining. As also identified by Zilko (2017), the cause influences the incident duration. With the results from text mining, new decision trees were constructed with the features: time, location, contract type, Train Incident Scenario (TIS, Trein Incident Scenario), equipment type and the cause.

These decision trees showed that the distribution of incident end time changes per cause, but still had large deviations. To inform about the uncertainty of a prognosis, the project proposed to communicate the point estimate at the 65th percentile and also the 35th and 85th percentile of the prediction distribution.


The impact of new information during the incident on the width of the prediction distribution intervals was also investigated. This showed that the later a prediction is given, the higher the certainty. The recommendation from this project was to identify moments when new information becomes available to give a new prognosis. However, the project showed that the incident recovery process is difficult to predict. New prognoses still have uncertainty, for which intervals can be a good method of communication.

Non-technical incidents

The research of Zilko (2017) and DataLab ProRail (2019) was limited to technical incidents. Therefore, Wemelsfelder (2019) researched a dynamic model for prognosis that can determine an updated prognosis when new data becomes available, also for non-technical incidents. The methods used were Bayesian Networks (BN) and k-Nearest Neighbour (kNN). A decision tree was also identified as a suitable method but was excluded because the current CQM model (see Section 2.1.1) was already a decision tree. The feature selection for the model of Wemelsfelder combined features from Zilko, De Wit and CQM. As an example, features selected for three incident types are displayed in Table 2.

Table 2 Features per incident type (Wemelsfelder, 2019)

Rolling stock | Section TOBS | Collision/Hindrance
HSL/Betuwe | Day/Night | Randstad
Driving characteristics | Working hours | Working hours
Rolling stock type | Randstad | Day/Night
Freight train | Contract type | Thing train collided with
Day/Night | Temperature | Nature of incident
TIS | Overlapping incidents | Location of base
Train table adjusted | Rush hour | Train table affected
Train company | Year of replacement |
Shunting point | Location of base |
Activity | Contractor |
Tao indicator | Wind direction | Cause

The performance of the kNN and BN model was measured with RMSE and MAE (see Section 3.7.1) and showed similar performance to CQM for the initial prognosis. For the updated prognosis only the performance of the collision/hindrance incidents improved. This evaluation has however been performed on a small number of data points, 20 and 10 respectively. An analysis where extra time was added to the prediction, showed that the impact of overprediction on the prediction errors was minimal. Therefore, adding time to the predicted time from the model would decrease the probability of underpredicting and results in a minimal increase in prediction errors.

3.2 Machine learning

Studies in China (Huang et al., 2020), Sweden (Corman & Kecman, 2018; Nilsson & Henning, 2018), Denmark (Grandhi, 2019) and the Netherlands (Wemelsfelder, 2019; Zilko, 2017) showed the use of Machine Learning (ML) models for prediction in railway. ML can help solve problems that are complex and contain large amounts of data (Mehryar et al., 2019).


The goal of ML is to find a balance between bias and variance. Bias occurs when the model cannot capture the complexity of the real-life situation. Variance is the amount by which the prediction changes when different historical data is used. Obtaining low bias and low variance is the goal.
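The balance between bias and variance can be made explicit with the standard decomposition of the expected squared prediction error. The notation is an assumption introduced here for illustration: the output is written as y = f(x) + ε with noise variance σ², the fitted model is denoted by \hat{f}, and the expectation is taken over different training sets.

```latex
\mathbb{E}\left[\big(y - \hat{f}(x)\big)^2\right]
  = \underbrace{\big(\mathbb{E}[\hat{f}(x)] - f(x)\big)^2}_{\text{bias}^2}
  + \underbrace{\mathbb{E}\left[\big(\hat{f}(x) - \mathbb{E}[\hat{f}(x)]\big)^2\right]}_{\text{variance}}
  + \sigma^2
```

The last term is the irreducible noise, so only the bias and variance terms can be influenced by the choice of model.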

Before explaining the models used, the different types of ML models will be presented.

There are three types of ML: (i) supervised learning, where historical data is used to train the model to predict the output variable, (ii) unsupervised learning, where the output variable is unknown and the model has to find the structure on its own (Hastie et al., 2008), and (iii) reinforcement learning, where the actions to maximize a reward have to be found. In reinforcement learning, the model is not given the correct answer but has to decide itself which actions to perform (Abu-Mostafa et al., 2012).

Because of the previous use of machine learning methods for prediction, only machine learning methods will be researched. Other, more statistical, methods can also be used but will not be researched in this project.

Machine learning process

For this project, previous incidents will be used for prediction. Therefore, this project focusses on supervised learning methods. The process of developing a supervised machine learning model consists of multiple steps (Akinsola, 2017). An overview of these steps is given in Figure 13.

Figure 13. Supervised machine learning process

Methods for feature selection will be researched in Section 3.3. Different models for prediction in railway and hyperparameter tuning will be researched in Section 3.4. Performance metrics to compare models for selection will be explained in Section 3.5. The other steps of the supervised machine learning process will be discussed and applied to the incident data in Chapter 4.


3.3 Data pre-processing

Data pre-processing is an important step because raw data is prone to anomalies, missing information and inconsistencies. Data pre-processing aims at improving the quality of raw data and, consequently, the quality of mining results. It also prepares the data to enable further analysis (Jambhorkar & Jondhale, 2015).

After the data collection, first, the data has to be analysed for anomalies and inconsistencies. When the data is cleaned, missing data can be dealt with.

Missing data

Some machine learning methods can deal with missing data. When a machine learning model is not able to deal with missing data, the data has to be deleted or imputed. There are three common reasons why data is missing in a data set:

1. Missing Completely at Random

The data that is missing has no relation with the other data in the data set.

2. Missing at Random

The missing data can be explained by data from other features.

3. Missing not at Random

The reason why the data is missing is related to the data itself.

When the data is missing completely at random, or at random, it can be deleted without impacting the bias of the model. For data that is not missing at random, deleting data would increase the bias of the model. In this case, imputation can be an option (Allison, 2001).

The imputation method depends on the type of data. If the data is categorical, possible imputation methods are adding “missing” as a category, selecting the most frequent category, or using prediction models to predict the missing values. For numerical data, the mean, mode, median or a regression model can be used to impute the missing data. k-Nearest Neighbour can also be used for both categorical and numerical data, by imputing the most frequent value (categorical) or the average (numerical) of the neighbours.

Imputation can result in higher bias because imputed data may be too similar to the other data. Also, imputation does not necessarily lead to better results than deletion, but when data is sparse and the model cannot deal with missing data, it can be useful.
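The simple imputation options mentioned above can be sketched as follows. The function names, the use of None to mark a missing value and the example columns are assumptions for illustration, and the median is chosen here as one of the listed numerical options.

```python
from collections import Counter
from statistics import median


def impute_numeric(values):
    """Replace missing numerical values (None) by the median of the observed values."""
    observed = [v for v in values if v is not None]
    fill = median(observed)
    return [fill if v is None else v for v in values]


def impute_most_frequent(values):
    """Replace missing categorical values (None) by the most frequent category."""
    observed = [v for v in values if v is not None]
    fill = Counter(observed).most_common(1)[0][0]
    return [fill if v is None else v for v in values]


def impute_missing_category(values, label="missing"):
    """Treat missingness as its own category."""
    return [label if v is None else v for v in values]


# Hypothetical columns of an incident data set.
temperature = [12.0, None, 15.0, 9.0, None]
contract_type = ["A", "B", None, "A", "A"]

print(impute_numeric(temperature))             # [12.0, 12.0, 15.0, 9.0, 12.0]
print(impute_most_frequent(contract_type))     # ['A', 'B', 'A', 'A', 'A']
print(impute_missing_category(contract_type))  # ['A', 'B', 'missing', 'A', 'A']
```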

3.4 Feature selection

Selecting which features to include in the model is an essential step in the model creation process. Feature selection can result in better model performance by reducing overfitting, decreasing the training time and providing an understanding of the data. There are three different methods for feature selection: filter methods, wrapper methods and embedded methods (Guyon & Elisseeff, 2003). These three methods can be used either separately or combined for a more robust feature selection.

Filter methods are statistical techniques to evaluate the relationship between features and the output variable. Filter methods mostly compare one feature with the output variable. Because of this, the interaction between features is not evaluated. When the features are numerical, correlation techniques like Pearson’s correlation can be used. When the features are categorical, ANOVA can be used (Kuhn & Johnson, 2013).
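A filter method based on Pearson’s correlation can be sketched as below. The feature names and values are hypothetical, and ranking features by the absolute correlation with the output variable is one possible way to apply the filter.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def rank_features(features, target):
    """Order feature names by the absolute correlation with the target, highest first."""
    scores = {name: abs(pearson(column, target)) for name, column in features.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical numerical features and incident durations (minutes).
duration = [60, 90, 120, 150, 180]
features = {
    "temperature_c": [10, 12, 9, 11, 10],
    "distance_to_base_km": [5, 10, 15, 20, 25],
}

ranking = rank_features(features, duration)
print(ranking)  # distance_to_base_km is ranked first
```

Each feature is scored on its own, which is why interactions between features are not captured by this approach.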


Wrapper methods use a machine learning algorithm at the core of the method, and its performance is used as the evaluation criterion to select features. Many models are created that add or remove features to find the combination of features with the best performance. Common wrapper methods are forward selection, backward elimination and stepwise selection.

Forward selection starts with an empty model and adds the features that result in the highest increase in the performance measure. Stepwise selection makes forward selection less greedy by re-evaluating all features in the model for elimination after a feature is added. Backward elimination starts with all features and removes the features that result in the smallest decrease of the performance measure.
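Forward selection can be sketched as a greedy loop around a scoring function. In practice the score would be the validation performance of a model trained on the selected features; here a hypothetical toy score (usefulness per feature minus a small penalty per added feature) is used so that the example is self-contained. All names and numbers are assumptions.

```python
def forward_selection(candidates, score, min_gain=1e-9):
    """Greedily add the feature with the largest score improvement until none helps."""
    selected = []
    best = score(selected)
    remaining = list(candidates)
    while remaining:
        gains = {f: score(selected + [f]) - best for f in remaining}
        feature = max(gains, key=gains.get)
        if gains[feature] <= min_gain:
            break  # no remaining feature improves the score
        selected.append(feature)
        remaining.remove(feature)
        best = score(selected)
    return selected


# Hypothetical stand-in for a validation score of a model trained on a feature subset.
USEFULNESS = {"cause": 0.30, "contract_type": 0.10, "wind_direction": 0.00}


def toy_score(subset):
    return sum(USEFULNESS[f] for f in subset) - 0.02 * len(subset)


chosen = forward_selection(USEFULNESS, toy_score)
print(chosen)  # ['cause', 'contract_type']
```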

Recursive Feature Elimination (RFE) performs a greedy search by iteratively removing features from the model and creating models on the remaining features. In each iteration, the feature that contributes least to the performance is removed. When all features have been evaluated, the feature ranking is given by the order of elimination.

Embedded methods perform the feature selection during the training of a model. Common embedded methods are regularization methods that penalize additional features. During the optimization, constraints penalize extra features, leading to a model with higher bias but with fewer features and lower variance.

There are two types of regularization, L1 (Lasso Regression) and L2 (Ridge Regression). Lasso Regression penalizes the absolute value of the coefficients of the features, whereas Ridge Regression penalizes the square of the coefficients. Lasso Regression can shrink the coefficients of features to exactly 0, whereas Ridge Regression keeps all features in the model. Therefore, Lasso Regression excludes useless features, whereas Ridge Regression is better when most features are useful (Hastie et al., 2008).
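The difference between the two penalties is easiest to see in the objective functions. The notation below is an assumption introduced for illustration: N observations, p features, coefficients β_j, intercept β_0 and a penalty weight λ ≥ 0 that would be chosen during hyperparameter tuning.

```latex
\begin{align*}
\hat{\beta}^{\text{ridge}} &= \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \\
\hat{\beta}^{\text{lasso}} &= \arg\min_{\beta} \sum_{i=1}^{N} \Big( y_i - \beta_0 - \sum_{j=1}^{p} x_{ij}\beta_j \Big)^2 + \lambda \sum_{j=1}^{p} |\beta_j|
\end{align*}
```

With λ = 0 both reduce to ordinary least squares. A larger λ shrinks the coefficients more strongly, and only the absolute-value penalty can set coefficients exactly to zero.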

Other embedded methods are tree-based methods (see Section 3.5.2) which calculate feature importance during the training of the model.

3.5 Modelling methods

In the next two paragraphs on modelling and evaluation methods for prediction, projects on both incident duration and train delay prediction are reviewed, because the prediction methods used in these projects are very similar (Ghofrani et al., 2018).

Two types of variables can be predicted: discrete and continuous variables. The prediction of a discrete output variable is a classification. The prediction of a continuous output variable, such as the duration of an incident, is done with regression.

Linear Regression

The simplest type of regression model is a linear regression. In linear regression, the output variable is predicted by a linear combination of the input variables with weights. The weights are optimized to minimize the squared error between the prediction and the actual value of the output variable. Linear regression is built on the assumption that there is a linear relationship between the input variables and the output variable. When there is no linear relationship between these variables, or when the variance of the errors is unequal (heteroscedasticity), linear regression shows low performance (Hastie et al., 2008).
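For a single input variable, the weights that minimize the squared error have a closed form. The sketch below illustrates this with hypothetical data; the function name and the example values are assumptions.

```python
def fit_simple_linear_regression(x, y):
    """Least-squares intercept and slope for one input variable."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance = sum((a - mean_x) ** 2 for a in x)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: number of involved parties versus incident duration (minutes).
parties = [1, 2, 3, 4]
duration = [35, 45, 55, 65]

intercept, slope = fit_simple_linear_regression(parties, duration)
prediction = intercept + slope * 5  # predicted duration for five involved parties

print(intercept, slope, prediction)
```

Because the example data lie exactly on a line, the fitted model reproduces them without error; with real incident data the fit would leave residuals.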


Tree-based models

Trees are used to partition the feature space into groups. For partitioning, the feature and split point that give the largest reduction in prediction error are selected. To prevent a tree from splitting on too many features, which results in overfitting, a minimum number of observations in a group can be set, or a maximum number of splits can be defined (Hastie et al., 2008). Advantages of decision trees are that they are easy to interpret by users and that nonlinear relationships between the features and the output variable do not hurt the performance.
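How a regression tree could choose a split on one numerical feature is sketched below. The sum of squared errors is used as the error measure, which is a common choice but an assumption here; the function names, the min_leaf parameter and the data are hypothetical.

```python
def sse(values):
    """Sum of squared errors around the mean of the group."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y, min_leaf=1):
    """Return (threshold, reduction) for the split x <= threshold with the largest SSE reduction."""
    parent = sse(y)
    best = (None, 0.0)
    for threshold in sorted(set(x))[:-1]:
        left = [b for a, b in zip(x, y) if a <= threshold]
        right = [b for a, b in zip(x, y) if a > threshold]
        if len(left) < min_leaf or len(right) < min_leaf:
            continue  # respect the minimum number of observations per group
        reduction = parent - sse(left) - sse(right)
        if reduction > best[1]:
            best = (threshold, reduction)
    return best


# Hypothetical feature values and incident durations (minutes).
x = [1, 2, 3, 10, 11, 12]
y = [60, 62, 64, 120, 122, 124]

threshold, reduction = best_split(x, y)
print(threshold, reduction)  # the split at x <= 3 separates short from long incidents
```

A full tree would apply this search over all features and repeat it recursively within each resulting group.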

Ensemble learning is the combination of multiple individual models which together give a more accurate model. For instance, Random Forest is an ensemble of decision trees. With Random Forest, many decision trees are constructed in parallel based on subsets of the dataset. The subsets are created randomly by selection with replacement, which is called bootstrapping. In addition, only a random subset of the features is considered at each split, so the available features differ per tree. The final prediction of the Random Forest is determined by averaging the predicted values of the trees. Since many random trees are built, a Random Forest is less prone to overfitting and is usually more accurate than a single decision tree (Breiman, 2001).
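The bootstrapping and averaging steps can be illustrated with a small sketch. It uses one-split trees (stumps) on a single feature as base models, so the random selection of features is left out; it shows bootstrap aggregation only and is not a full Random Forest. All names and data are assumptions.

```python
import random


def fit_stump(x, y):
    """One-split regression tree on a single feature; returns a predict function."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [b for a, b in zip(x, y) if a <= t]
        right = [b for a, b in zip(x, y) if a > t]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        err = sum((b - mean_l) ** 2 for b in left) + sum((b - mean_r) ** 2 for b in right)
        if best is None or err < best[0]:
            best = (err, t, mean_l, mean_r)
    if best is None:  # all x values equal in this sample: predict the mean
        mean = sum(y) / len(y)
        return lambda v: mean
    _, t, mean_l, mean_r = best
    return lambda v: mean_l if v <= t else mean_r


def fit_bagged_stumps(x, y, n_trees=25, seed=0):
    """Fit each stump on a bootstrap sample and average their predictions."""
    rng = random.Random(seed)
    n = len(x)
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # selection with replacement
        trees.append(fit_stump([x[i] for i in idx], [y[i] for i in idx]))
    return lambda v: sum(tree(v) for tree in trees) / len(trees)


# Hypothetical feature values and incident durations (minutes).
x = [1, 2, 3, 10, 11, 12]
y = [60, 62, 64, 120, 122, 124]

predict = fit_bagged_stumps(x, y)
print(predict(2), predict(11))
```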

Another method to build forests is boosting. In a boosting method, subsets are selected from data sequentially. A first subset is selected randomly and points that have low prediction performance are included in the next sample with new random points. This helps models to improve wrongly predicted points by focusing on them. However, this can increase overfitting and variance (Hastie et al., 2008).

Different boosting techniques exist. AdaBoost and gradient boosting are the most common. In AdaBoost, the weights of each tree can be different and trees are based on the error in previous trees. AdaBoost is used by Nilsson & Henning (2018) for train delay prediction and showed reasonable performance. Gradient boosting is a greedy method that sequentially adds, at each step, the tree that minimizes the loss function. XGBoost is an extension of gradient boosting and uses a more regularized model formalization to control overfitting (Chen & Guestrin, 2016). Grandhi (2019) showed that XGBoost performed well for incident duration prediction on training data, but showed overprediction due to the data used, causing a lower test performance.
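The sequential nature of gradient boosting can be sketched for the squared error loss, where each new tree is fitted to the residuals of the current ensemble. The sketch uses one-split trees on a single feature and a fixed learning rate; it is a simplified illustration, not XGBoost, and all names, parameter values and data are assumptions.

```python
def fit_stump(x, y):
    """One-split regression tree on a single feature; returns a predict function."""
    best = None
    for t in sorted(set(x))[:-1]:
        left = [b for a, b in zip(x, y) if a <= t]
        right = [b for a, b in zip(x, y) if a > t]
        mean_l, mean_r = sum(left) / len(left), sum(right) / len(right)
        err = sum((b - mean_l) ** 2 for b in left) + sum((b - mean_r) ** 2 for b in right)
        if best is None or err < best[0]:
            best = (err, t, mean_l, mean_r)
    if best is None:  # no split possible: predict the mean
        mean = sum(y) / len(y)
        return lambda v: mean
    _, t, mean_l, mean_r = best
    return lambda v: mean_l if v <= t else mean_r


def fit_gradient_boosting(x, y, n_rounds=50, learning_rate=0.1):
    """Start from the mean and repeatedly fit a stump to the current residuals."""
    base = sum(y) / len(y)
    stumps = []
    current = [base] * len(y)
    for _ in range(n_rounds):
        # For the squared error loss, the negative gradient equals the residual.
        residuals = [b - p for b, p in zip(y, current)]
        stump = fit_stump(x, residuals)
        stumps.append(stump)
        current = [p + learning_rate * stump(a) for p, a in zip(current, x)]
    return lambda v: base + learning_rate * sum(s(v) for s in stumps)


def mse(predict, x, y):
    return sum((predict(a) - b) ** 2 for a, b in zip(x, y)) / len(y)


# Hypothetical feature values and incident durations (minutes).
x = [1, 2, 3, 4, 5, 6]
y = [60, 65, 80, 110, 115, 130]

boosted = fit_gradient_boosting(x, y)
mean_y = sum(y) / len(y)
print(mse(lambda v: mean_y, x, y), mse(boosted, x, y))
```

The training error of the boosted model is lower than that of the constant starting model. As noted in the text, continuing to reduce the training error in this way can lead to overfitting.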

Bayesian Networks

A Bayesian Network (BN) is a directed acyclic graph. The nodes represent the random variables, and the edges represent the conditional dependencies between the nodes. A node first holds the probability distribution of the random variable independent of the other nodes. When the value of a random variable is known, the BN updates the probabilities of the connected nodes based on the conditional probabilities. Figure 14 shows the BN of Zilko (2017) where the contract type is known, but the cause of the incident is not yet known. Therefore, the independent probabilities of the different causes are displayed and the distribution of repair time is determined based on the statistical dependency between these variables. When the cause is known, the distribution of the repair time changes and therefore also the distribution of the disruption length, as shown in Figure 15.
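The updating mechanism can be illustrated with a two-node network in which the cause influences a discretised repair time. This is not the model of Zilko (2017): the structure, the categories and all probabilities below are hypothetical.

```python
# Prior probability of each cause (hypothetical values).
p_cause = {"cable": 0.3, "switch": 0.7}

# Conditional probability of the repair time class given the cause (hypothetical values).
p_repair_given_cause = {
    "cable": {"short": 0.2, "long": 0.8},
    "switch": {"short": 0.6, "long": 0.4},
}


def repair_distribution(known_cause=None):
    """Distribution of the repair time, with or without knowledge of the cause."""
    if known_cause is not None:
        return dict(p_repair_given_cause[known_cause])
    dist = {}
    for cause, pc in p_cause.items():
        for repair, pr in p_repair_given_cause[cause].items():
            dist[repair] = dist.get(repair, 0.0) + pc * pr
    return dist


def cause_posterior(observed_repair):
    """Bayes' rule: update the cause probabilities after observing the repair time class."""
    joint = {c: p_cause[c] * p_repair_given_cause[c][observed_repair] for c in p_cause}
    total = sum(joint.values())
    return {c: v / total for c, v in joint.items()}


print(repair_distribution())         # cause unknown: mixture over both causes
print(repair_distribution("cable"))  # cause known: the distribution shifts to long repairs
print(cause_posterior("long"))       # evidence also flows from repair time back to cause
```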


Figure 14. Bayesian Network with unknown cause (Zilko, 2017)

Figure 15. Bayesian Network with known cause (Zilko, 2017)

Corman & Kecman (2018) showed that a BN is an appropriate method to model the complex interdependencies in train delays. Lessan et al. (2019) showed that a BN, built with domain knowledge and experts’ judgements, can achieve a good prediction performance for train delays.

Artificial Neural Networks

Artificial Neural Networks (ANN) are inspired by the neural network of the brain. An ANN consists of layers and each layer consists of nodes. There is an input layer for the features, one or multiple hidden layers and an output layer for the prediction. The way nodes are connected between layers depends on the architecture and influences the ability of the nodes to retain information. The connections between the nodes have a weight and each node has a bias. The activation of a node is determined by an activation function. The activation function receives as input the weighted sum of the values of the connected nodes plus the bias. During model training, the weights are changed to minimize a loss function with gradient descent. The gradients needed for this are computed with backpropagation (Hastie et al., 2008).
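The activation of a node and one gradient descent update can be sketched for a single node. The sigmoid activation, the squared error loss, the learning rate and all values are assumptions chosen for illustration; a full network repeats this computation over all layers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def node_activation(inputs, weights, bias):
    """Activation function applied to the weighted sum of the inputs plus the bias."""
    z = sum(w * v for w, v in zip(weights, inputs)) + bias
    return sigmoid(z)


def gradient_step(inputs, target, weights, bias, lr=0.5):
    """One gradient descent step on the loss L = 1/2 (output - target)^2 for one node."""
    out = node_activation(inputs, weights, bias)
    # Chain rule: dL/dz = (out - target) * sigmoid'(z), with sigmoid'(z) = out * (1 - out).
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - lr * delta * v for w, v in zip(weights, inputs)]
    new_bias = bias - lr * delta
    return new_weights, new_bias


# Hypothetical inputs, target and starting weights.
inputs, target = [1.0, 0.5], 1.0
weights, bias = [0.1, -0.2], 0.0

before = node_activation(inputs, weights, bias)
weights, bias = gradient_step(inputs, target, weights, bias)
after = node_activation(inputs, weights, bias)

print(before, after)  # the output moves towards the target after the update
```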