
Using predicted incident resolution time as a predictor for an incident going out-of-time

Roos E.M. Riemersma 11004401

Bachelor thesis, Credits: 18 EC
Bachelor Bèta Gamma, major Artificial Intelligence

University of Amsterdam, Faculty of Science
Science Park 904, 1098 XH Amsterdam

Supervisor: Dr. Sander van Splunter

Informatics Institute, Faculty of Science
University of Amsterdam
Science Park 904, 1098 XH Amsterdam

July 10th, 2019


Abstract

Large scale organisations have to deal with thousands of incidents a day. Each incident has to be solved within a certain maximum resolution time, or the incident will go out-of-time, which causes fines. Making sure incidents are solved in time is a complex and time-consuming task that can be optimized in several different ways. An aspect that may help this process is knowing the resolution time of an incident beforehand. This thesis explores how the incident resolution time prediction can be a predictor for an incident going out-of-time. The possibility of predicting incident resolution time and the possibility of using resolution time as a certainty rate for an incident going out-of-time are explored using data provided by ABN AMRO.

The best performing Artificial Neural Network predicted the resolution time of high priority incidents with a mean absolute error of 4275.05 seconds, which corresponds to an error of 1.19 hours. The predicted resolution times can be used to provide a certainty rate for incidents going out-of-time. Further research could improve the performance on predicting incident resolution time.


Contents

1 Introduction
    1.1 Blue Student Lab
2 Theoretical Background
    2.1 Artificial Neural Network
    2.2 Related work
3 Method
    3.1 Data retrieval
    3.2 Data preprocessing
        3.2.1 One-hot-encoding
        3.2.2 Split data in training set and test set
    3.3 Evaluation metrics
        3.3.1 MAE and MSE
        3.3.2 Out-of-time evaluation
    3.4 Training architecture
        3.4.1 Tools
        3.4.2 Architecture Artificial Neural Network
4 Experiments
    4.1 Experiment setup
    4.2 Experiment on all priorities
    4.3 Experiment on low priority incidents
    4.4 Experiment on high priority incidents
5 Results and analysis
    5.1 Experiment on all priorities
    5.2 Experiment on low priority incidents
    5.3 Experiment on high priority incidents
6 Conclusion
7 Discussion and future work
Appendices
A Training columns


1 Introduction

In large scale organisations, thousands of incidents occur daily. These incidents can be reported by clients or users, but can also be machine-generated. Dealing with thousands of incidents a day is a complex and time-consuming task. Optimizing this process could save time, costs, and man-hours. One of the difficulties of dealing with a great number of incidents is that they differ in seriousness. Each incident has its own priority, which means all incidents have a different maximum resolution time, varying from two hours to two working days. This maximum resolution time is, amongst other agreements, documented in the Service Level Agreement (SLA). The SLA is a contract between a service provider and a customer. It specifies what services will be provided and what penalties will be imposed if the service provider cannot meet the goals (Marilly et al., 2002). Often, the SLA is used to ensure incidents within the company are solved as quickly as possible. Unfortunately, it often happens that an incident cannot be solved within the resolution time. This is when an incident is registered as out-of-time. Companies are fined for their out-of-time incidents, so a quick and efficient solving method is desired.

An aspect that may help this process is knowing the resolution time of an incident beforehand and whether an incident will go out-of-time. This way, workload can be estimated more effectively and work hours can be spent more efficiently. This problem yields the following research question: How is the incident resolution time prediction a predictor for an incident going out-of-time? To answer this question, two sub-questions arise:

• Can an Artificial Neural Network be used to predict the resolution time of incidents?
• Can the resolution time be a tool to predict if an incident will go out-of-time?

This research is conducted as a case study at the ABN AMRO bank. The data used consists of incidents occurring in a time span of three months at ABN AMRO, from January 2018 till March 2018. These incidents all get a priority ranging from 0 to 3: respectively emergency, high, medium and low priority. Emergency priority incidents have a maximum resolution time of two hours. High priority incidents have to be resolved in four working hours and medium priority incidents in eight working hours. The incidents with low priority have a maximum resolution time of two working days.

The next section, section 2, discusses Artificial Neural Networks and their applications in previous related research. In sections 3.1 and 3.2, an overview is given of the retrieved data and the preprocessing steps that were taken. Next, in section 3.3, several evaluation methods are discussed and in section 3.4 the architecture of the used model is described. In section 4, the conducted experiments are described, after which the obtained results are analysed in section 5. Finally, the conclusion is given in section 6, followed by a discussion and future work in section 7.

1.1 Blue Student Lab

This thesis is part of the Blue Student Lab, a collaboration between the University of Amsterdam and the ABN AMRO bank. The aim of this project is for students to find a scientific solution to a real world problem within a large scale organization. ABN AMRO provided the context and data for five students, each of whom set out to find a different method to improve incident management. Knigge (2019) designed a method that uses event correlation and root cause analysis to support incident management. Niewenhuis (2019)


provided a method for improving feature selection using Weight of Evidence and XGBoost. Wiggerman (2019) provided an approach to predict the first assignment group for a smooth incident resolution process. Velez Vasquez (2019) designed a method to predict causal relations between incidents and changes within IT Service Management.

2 Theoretical Background

In this section relevant literature is discussed briefly to set up the required theoretical background.

2.1 Artificial Neural Network

Artificial Neural Networks are a strategy to develop simplified mathematical models of brain-like systems (Alpaydin, 2014). These models can be used to solve various computational problems (Rumelhart et al., 1994). The ANN has neurons as the basic processor, which are characterized by an activity level (representing a neuron’s polarization state), an output value (representing a neuron’s firing rate), a set of input connections (representing synapses on the cell and its dendrite), a bias value (representing a neuron’s internal resting level), and a set of output connections (representing a neuron’s axonal projections) (Rumelhart et al., 1994). These characteristics are all mathematically represented by real numbers (Schmidhuber, 2015). Each connection has a weight which determines the effect of the incoming input on the activation level of the artificial neuron. The output of a unit is a function of its activation value. A possible choice for this function is the ReLU:

ReLU(x) = max(0, x)  (1)

An ANN consists of a large network of these units, which are richly connected to each other.

Learning in a neural network is equal to finding a set of connection strengths that allow the network to carry out the desired computation (Rumelhart et al., 1994). A basic architecture for a feedforward neural network is given in Figure 1. Feedforward implies that the activity of a unit cannot influence its own input, in contrast to Recurrent Neural Networks, where it can (Rumelhart et al., 1988). The network consists of a set of input neurons that are connected, through a set of hidden neurons, to an output neuron. In practice the number of neurons and hidden layers may vary.
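As a concrete illustration, here is a minimal NumPy sketch of the forward pass of a network like the one in Figure 1; the layer sizes and the random weights are invented for demonstration and are not the thesis's trained network.

```python
import numpy as np

def relu(x):
    # ReLU activation from Equation 1: max(0, x), applied element-wise
    return np.maximum(0, x)

def forward(x, w_hidden, b_hidden, w_out, b_out):
    # Feedforward: activity flows strictly from input to output;
    # each unit's output is a function of its weighted inputs plus a bias.
    hidden = relu(x @ w_hidden + b_hidden)
    return hidden @ w_out + b_out

rng = np.random.default_rng(0)
x = np.array([1.0, 0.5, -0.2, 0.7])   # four input units, as in Figure 1
w_hidden = rng.normal(size=(4, 3))    # input -> hidden connection weights
b_hidden = np.zeros(3)                # hidden bias values
w_out = rng.normal(size=(3, 1))       # hidden -> output connection weights
b_out = np.zeros(1)

y = forward(x, w_hidden, b_hidden, w_out, b_out)
print(y.shape)  # (1,): a single output value
```

Training would then consist of adjusting `w_hidden`, `b_hidden`, `w_out` and `b_out` so that the output approximates the targets.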

The network is provided with a set of examples of input-output pairs, the training set, which is used to train the network. This set is used to modify the connections of the network to approximate the function from which the input-output pairs have been collected (Rumelhart et al., 1994). After training, the network is tested on new data. The training of the network uses an error-correction procedure. During training an input is put into the network and flows through it, generating a set of values on the output units. Hereafter, the actual output is compared with the desired target and the error is computed. If the output differs from the target, a change is made to some of the connections in the network. Essentially, a measure of the overall performance of the system is defined and this performance is optimized. The performance of the network can be defined by the Mean Absolute Error:

MAE = (1/n) Σ_{i=1}^{n} |y_i − x_i|  (2)

where y_i is the predicted value and x_i the true value. The goal is to minimize this function. This is achieved by changing the weights of the system in proportion to the derivative of the error with respect to the weights (Rumelhart et al., 1988). In a neural network, learning is essentially estimating the parameters.


Figure 1: Representation of an artificial neural network with an input layer, one hidden layer and an output layer.

2.2 Related work

Valenti et al. (2010) compared several methods to predict traffic incident resolution time, including an ANN. In this particular research various ANN architectures were trained, after which the best performing architecture was chosen. This research states that the ANN model gives the best results predicting resolution time for long duration incident cases.

Wang et al. (2005) developed two models to predict the vehicle breakdown duration; one model based on ANNs and one based on fuzzy logic. Although the research demonstrated that ANN and fuzzy logic can provide reasonable estimates for the breakdown duration with few variables, both models had difficulties predicting the outliers.

Many studies have demonstrated that Artificial Neural Networks have the potential to accurately predict incident conditions on freeways (Wei & Lee, 2007). Guan et al. (2010) deployed an Artificial Neural Network approach to study the prediction problem of freeway incident duration. They state an ANN can be used to predict incident duration, but the accuracy of the prediction depends on the quality of the incident data. According to Guan et al. (2010) the ANN can help improve the incident management process.

Based on these previous studies, it was decided to use an ANN to predict the resolution time of incidents within ABN AMRO’s IT environment.

3 Method

In this section the used data and methods are explained. First, a description of the data retrieval and data preprocessing is given. Thereafter, several evaluation methods are described. Finally, the training architecture for the ANN is given.


3.1 Data retrieval

The data used in this research is retrieved in a similar manner as in the research of Ten Kaate (2018). The data consists of incident data of the ABN AMRO, extracted from the documentation service ServiceNow, where incidents are documented and described in roughly two hundred fields, which can be filled in or left blank. Each field contains information about the incident, such as the opening and closing date, which are filled in automatically, or a description of the incident, which is filled in by hand.

The description fields can be classified into three groups: unique fields, categorical fields and text fields. Unique fields consist of values bound to a specific incident, such as its incident number and date of creation. Fields from this group are left out of this research, as these fields do not contain useful information on the similarity between incidents. Categorical fields have a fixed set of values to choose from. Because of the fixed set of values, the number of unique instances will not grow linearly with the number of incidents. This is important during the processing of the data, as computation time will rise rapidly when the number of unique values increases. The assignment group and the priority of an incident are examples of categorical fields. Fields that can be true or false, such as incidents being active or not, also belong to the categorical group. The third type of fields are text fields. Text fields do not consist of any given instances a user can choose from, but are filled in with extra information by the employee that documented the incident. An example is the description field, where all the additional information on the incident is given. The text fields are all very different in length and content, which heightens the complexity of preprocessing the data; therefore these particular fields have been left out.

To summarize, this thesis focuses on using categorical fields only. The unique fields and text fields are left out, because without additional domain specific knowledge and interpretations, these do not contribute valuable information about the incidents and the processing lies beyond the scope of this research.

3.2 Data preprocessing

After extraction from ServiceNow, the data set consists of 151,240 incidents and 95 fields, based on a period of 3 months. The data set is exported to a Pandas data frame; Pandas is a Python library commonly used for analysing and working with data sets (McKinney, 2011). The data is treated similarly as in the research of Ten Kaate (2018). The rows of this data frame represent the incidents and the columns represent the fields. First, the fields mentioned in section 3.1 and fields that were left blank at first documentation of the incident are excluded from the data frame. Thereafter, rows are excluded based on several characteristics. Rows where direct_close is True or active is True are deleted. Rows in which any of the following columns are empty are removed: due_date, priority, resolved_at, u_sla_start_date (the first documentation date) and u_sla_breached (whether the incident was solved in time).

The resulting columns used overlap with the selected columns used in Ten Kaate (2018). Appendix A provides the exact fields that were used for training the neural network. Added to the columns used by Ten Kaate (2018) are business_duration, calendar_duration and u_sla_breached. The first two columns were used to create the targets for the neural network: business_duration for incidents with high or medium priority, calendar_duration for incidents with emergency or low priority.

The u_sla_breached column is used to exclude rows where u_sla_breached did not correspond with the resolution time of an incident. Thus, if u_sla_breached indicated the incident is out-of-time, but the resolution time does not exceed the limit established in the SLA, or if the resolution time does exceed the SLA limit, but u_sla_breached indicated the incident is not out-of-time, the incident is removed from the data set. The number of incidents is now reduced to 99,180 and the number of fields to 36.
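The row filters described above can be sketched in Pandas as follows. The frame and its values are invented toy data; only the column names follow the ServiceNow fields mentioned in the text.

```python
import pandas as pd

# Toy stand-in for the ServiceNow extract: four invented incidents.
df = pd.DataFrame({
    "direct_close":   [False, True,  False, False],
    "active":         [False, False, True,  False],
    "priority":       [3,     2,     1,     None],
    "u_sla_breached": [False, True,  False, True],
})

# Delete rows that were closed directly or are still active ...
df = df[(df["direct_close"] == False) & (df["active"] == False)]
# ... and rows where key fields were left empty.
df = df.dropna(subset=["priority", "u_sla_breached"])

print(len(df))  # only the first toy incident survives all filters
```

The same pattern extends to the remaining empty-field checks (due_date, resolved_at, u_sla_start_date) by adding those columns to the `dropna` subset.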

After the preparation of the data set, it is encoded and split into a test set and a training set before the data is ready to use for machine learning.

3.2.1 One-hot-encoding

The retrieved data set needs to be encoded to transform it into usable input for a neural network, as the data mostly consists of categorical values. One-hot-encoding is a method to transform categorical values into a one-hot numeric array. It creates new binary columns, indicating whether each possible value is present or not. Figure 2 shows a simple example of the data set converted to a matrix using one-hot-encoding. To prevent over-fitting, and because of computational considerations, a column is created only for values occurring more than 50 times.

Figure 2: Simple example of a one-hot-encoded data frame

The Pandas function get_dummies² is used to create the encoded data frame. This makes it possible to create a sparse vector, which needs less data storage space. The data frame now consists of 99,180 rows and 1,763 columns.
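One possible implementation of this frequency cutoff is sketched below on a toy frame; the threshold here is 2 instead of 50 so the effect is visible, and the mechanism (mapping rare values to a shared placeholder before encoding) is an assumption, since the thesis does not state how the cutoff was applied.

```python
import pandas as pd

THRESHOLD = 2  # the thesis uses 50; 2 keeps the toy example readable
df = pd.DataFrame({"assignment_group": ["net", "net", "db", "net", "app"]})

# Map values occurring fewer than THRESHOLD times to a shared placeholder,
# so no dedicated one-hot column is created for them.
counts = df["assignment_group"].value_counts()
rare = counts[counts < THRESHOLD].index.tolist()
df["assignment_group"] = df["assignment_group"].replace(rare, "other")

# sparse=True stores the dummies as sparse columns, saving storage space.
encoded = pd.get_dummies(df, columns=["assignment_group"], sparse=True)
print(sorted(encoded.columns))
```

Only the frequent value "net" and the placeholder "other" receive columns; "db" and "app" do not.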

3.2.2 Split data in training set and test set

After encoding, the data is split into a training and a test set. The data is randomized and split on a 0.8-0.2 train-test ratio. To validate the neural network while training, 20 percent of the training set is used as a validation set. The data set is split into three different train-test sets (Table 1): one containing incidents of every kind of priority, one only low priority incidents and one only high priority incidents. Appendix B shows the distribution of targets for each train-test set.

² Pandas get_dummies documentation: https://pandas.pydata.org/pandas-docs/stable/reference/

             Size train set   Size test set
All prio     (79344, 1796)    (19836, 1796)
Low prio     (41334, 1796)    (10334, 1796)
High prio    (7764, 1796)     (1942, 1796)

Table 1: Size of all three train-test data sets, where All prio stands for the data set containing incidents of all four priority types and Low prio and High prio respectively stand for the data sets containing only low priority and high priority incidents.
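The randomised split can be sketched with index permutations; the 80/20 ratio reproduces the All prio sizes in Table 1 (the seed is arbitrary).

```python
import numpy as np

rng = np.random.default_rng(42)
n = 99180                      # incidents in the full prepared data set
idx = rng.permutation(n)       # randomize before splitting

n_train = int(0.8 * n)         # 0.8-0.2 train-test ratio
train_idx, test_idx = idx[:n_train], idx[n_train:]

# 20 percent of the training portion is held out for validation.
n_val = int(0.2 * len(train_idx))
val_idx, fit_idx = train_idx[:n_val], train_idx[n_val:]

print(len(train_idx), len(test_idx))  # 79344 19836, as in Table 1
```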

3.3 Evaluation metrics

To evaluate the performance of the neural network, the Mean Absolute Error (MAE) and the Mean Squared Error (MSE) are calculated. The performance is also evaluated on how accurate the out-of-time prediction would be, given the predicted resolution time.

3.3.1 MAE and MSE

The MAE and the MSE are used to calculate the difference between the predicted value and the true values. These are used as a measurement of how good the predictions are; the smaller the error, the better the prediction.

The MSE penalises outliers much more heavily than the MAE, since squaring a large error makes it larger still.

Mean Squared Error is calculated by:

MSE = (1/n) Σ_{i=1}^{n} (y_i − x_i)^2  (3)

where y_i is the predicted value and x_i the true value. See Section 2.1 for the formula of the Mean Absolute Error.
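Both metrics are straightforward to express in code; the example values below are invented, in seconds as the resolution times are.

```python
import numpy as np

def mae(y_pred, y_true):
    # Equation 2: mean absolute difference between prediction and truth
    return np.mean(np.abs(y_pred - y_true))

def mse(y_pred, y_true):
    # Equation 3: squaring makes the MSE far more outlier-sensitive
    return np.mean((y_pred - y_true) ** 2)

y_pred = np.array([3600.0, 7200.0, 14400.0])   # predicted seconds (invented)
y_true = np.array([3000.0, 7200.0, 20000.0])   # actual seconds (invented)
print(mae(y_pred, y_true), mse(y_pred, y_true))
```

The single 5600-second outlier dominates the MSE while only shifting the MAE moderately, which is the contrast noted above.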

3.3.2 Out-of-time evaluation

To evaluate the results of this research in a broader perspective, they are also evaluated on how accurate the out-of-time prediction would be, given the predicted resolution time. This out-of-time evaluation uses the predicted resolution time and the SLA resolution time restriction to calculate whether an incident would be out-of-time or not. This out-of-time prediction is then compared to the actual out-of-time value, after which the precision (Formula 4) is calculated for incidents that are predicted as not out-of-time and the negative prediction rate (NPR, Formula 5) is calculated for incidents that are predicted as out-of-time.


Precision = TP / (TP + FP)  (4)

NPR = TN / (TN + FN)  (5)

Where TP = True Positive, TN = True Negative, FP = False Positive and FN = False Negative. Figure 3 is a schematic representation of when a result is correctly predicted (TP and TN) and when it is not (FP and FN).

                 Actual Pos   Actual Neg
Predicted Pos       TP           FP
Predicted Neg       FN           TN

Figure 3: Representation of when a result is a TP, TN, FP or FN
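A small sketch computing both rates from predicted and actual out-of-time labels. The labels are invented, and taking out-of-time as the positive class is an assumption for illustration; the thesis does not state the convention explicitly.

```python
import numpy as np

def precision_npr(predicted, actual):
    # Counts from the confusion matrix in Figure 3
    # (True = out-of-time, taken here as the positive class).
    predicted, actual = np.asarray(predicted), np.asarray(actual)
    tp = np.sum(predicted & actual)
    fp = np.sum(predicted & ~actual)
    tn = np.sum(~predicted & ~actual)
    fn = np.sum(~predicted & actual)
    return tp / (tp + fp), tn / (tn + fn)   # Formulas 4 and 5

pred   = np.array([True, True, False, False, False])  # invented predictions
actual = np.array([True, False, False, False, True])  # invented outcomes
prec, npr = precision_npr(pred, actual)
print(prec, npr)  # precision 0.5, NPR 2/3
```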

3.4 Training architecture

In this section the main aspects of the training architecture are discussed. In Section 3.4.1 the implementation language, used packages, and machine used for training are discussed. In Section 3.4.2 the Neural Network architecture used is given.

3.4.1 Tools

The neural network is implemented in Keras³, a Python library that uses TensorFlow⁴ as a back-end, with Python version 3.5. The Keras sequential model is used to build a model with dense layers that constitute the architecture. The model was trained in a Jupyter Notebook⁵ on a Dell laptop with an Intel Core i5 processor.

3.4.2 Architecture Artificial Neural Network

In this research various configurations of neural networks are tested. All configurations are basic sequential models from Keras using an Adam optimizer⁶, which all train with the mean absolute error as loss function. All hidden layers are dense layers with a ReLU activation function. The architectures of the networks differ in the number of hidden layers, the number of nodes in each layer, and the number of epochs. A description of all the tested network configurations is given in Section 4. The number of hidden layers ranges from 2 to 5. The number of nodes per hidden layer ranges from 512 to 64. The number of epochs ranges from 20 to 200. The input layer consists of 1,796 nodes, one for each column of the encoded data sets (see Section 3.2.2). The output layer always consists of 1 node.

³ Keras documentation: https://keras.io/models/sequential/
⁴ TensorFlow documentation: https://www.tensorflow.org/api_docs/python
⁵ Jupyter documentation: https://jupyter.org/documentation


The targets used for training are the actual resolution times of the incidents. For incidents of high, medium and low priority the targets are in business time. For incidents of the emergency priority the targets are in real time.

The model is fit using the training data described in subsection 3.2.2, of which 20 percent is used as a validation set during training. A batch size of 32 is used; this is the number of samples that is propagated through the network before the network's weights are adjusted.
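The batch-wise weight updates can be illustrated on a toy linear model trained with the MAE as loss function. This is a simplified NumPy stand-in for the Keras fit loop, not the thesis's actual network; the data, learning rate and epoch count are invented, while the batch size of 32 matches the text.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.normal(size=(320, 4))            # invented training inputs
true_w = np.array([1.0, -2.0, 0.5, 3.0]) # invented target function
y = X @ true_w                           # targets (the "resolution times")

w = np.zeros(4)   # model weights, to be estimated
lr = 0.05
BATCH = 32        # weights are adjusted after every 32 samples
for epoch in range(50):
    for start in range(0, len(X), BATCH):
        xb, yb = X[start:start + BATCH], y[start:start + BATCH]
        err = xb @ w - yb
        # Gradient of the MAE loss w.r.t. w: the sign of each error,
        # averaged over the batch (cf. Section 2.1).
        grad = xb.T @ np.sign(err) / len(xb)
        w -= lr * grad

print(np.round(w, 2))  # should approach true_w
```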

4 Experiments

To predict the resolution time of incidents, three different experiments are conducted: on incidents of all priorities, on solely low priority incidents and on solely high priority incidents. Preceding the description of the experiments, the experiment setup is discussed in section 4.1.

4.1 Experiment setup

The research is split into three different experiments, as the resolution time of each incident differs greatly between the different types of priority. When split into data sets consisting of only one type of priority, it is more likely the neural network predicts the correct resolution time, as the target values lie in a much smaller range.

After the resolution time is predicted using the model described in section 3.4, these results are used to provide a certainty rate on whether an incident will be resolved in time or go out-of-time. This certainty rate is represented as a probability. To create this probability the results are split into bins and the NPR and precision (see section 3.3) are calculated over these bins.

For each bin where the predicted resolution time is lower than the SLA maximum resolution time, the NPR is calculated. For each bin where the predicted resolution time is higher than the SLA maximum resolution time, the precision is calculated. It is expected that the further away the predicted resolution time is from the SLA resolution time, the higher the certainty rate (Figure 4).

Figure 4: Example of an idealised hypothesised certainty rate. The further away the predicted resolution time lies from the maximum SLA time, the higher the certainty rate.


The certainty rate provides a score on how likely it is that an incident with a low predicted resolution time will be resolved in time and on how likely it is that an incident with a high predicted resolution time will go out-of-time.
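The binning scheme can be sketched as follows, using the four-working-hour SLA limit of high priority incidents (14,400 seconds). The predictions, labels and the 4,800-second bin width are all invented for illustration; the thesis does not specify its bin edges.

```python
import numpy as np

SLA = 14400  # four working hours, in seconds (high priority limit)
pred = np.array([2000, 5000, 9000, 13000, 15000, 20000, 30000])
out_of_time = np.array([False, False, False, True, False, True, True])

edges = np.arange(0, 33601, 4800)   # invented bin edges, aligned on the SLA
rates = {}
for lo, hi in zip(edges[:-1], edges[1:]):
    mask = (pred >= lo) & (pred < hi)
    if not mask.any():
        continue                    # skip empty bins
    if hi <= SLA:
        # Predicted in time: fraction actually in time (NPR-style rate)
        rate = np.mean(~out_of_time[mask])
    else:
        # Predicted out-of-time: fraction actually out-of-time (precision-style)
        rate = np.mean(out_of_time[mask])
    rates[(int(lo), int(hi))] = float(rate)

print(rates)
```

Plotting `rates` against the bin centres gives a curve of the kind hypothesised in Figure 4, lowest near the SLA boundary and rising towards either extreme.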

The first experiment was conducted on a data set containing every type of priority. The second experiment was conducted on solely low priority data. Finally, the third experiment was conducted on solely high priority incidents.

4.2 Experiment on all priorities

The resolution time of incidents is predicted on a test set with 19,836 incidents and 1,796 fields. The network is trained on a training set containing 79,344 incidents and 1,796 fields. Table 2 represents all the configurations of the neural network used to predict the resolution time of incidents from all priorities.

Config.   # hidden layers   Nodes per hidden layer   # epochs   Learning rate
A         5                 512, 256, 128, 64, 32    100        0.0005
B         5                 512, 256, 128, 64, 32    30         0.0005
C         4                 512, 512, 256, 256       50         0.0005
D         4                 512, 512, 256, 256       50         0.001
E         4                 512, 512, 256, 256       80         0.0002
F         3                 512, 512, 512            80         0.0005
G         3                 512, 512, 512            20         0.0008

Table 2: Configurations of the neural network predicting resolution time on incidents of every type of priority

4.3 Experiment on low priority incidents

The resolution time of incidents with low priority is predicted on a test set with 10,334 incidents and 1,796 fields. The network is trained on a training set containing 41,334 incidents and 1,796 fields. Table 3 represents all the configurations of the neural network used to predict the resolution time of low priority incidents.

Config.   # hidden layers   Nodes per hidden layer   # epochs   Learning rate
A         3                 512, 256, 64             50         0.0002
B         3                 512, 512, 512            50         0.0005
C         3                 512, 512, 256            50         0.0003
D         2                 512, 512                 50         0.001
E         2                 512, 256                 80         0.001
F         4                 512, 256, 128, 64        100        0.0005
G         3                 512, 512, 256            80         0.0002

Table 3: Configurations of the neural network predicting resolution time on low priority incidents

4.4 Experiment on high priority incidents

The resolution time of incidents with high priority is predicted on a test set with 1,942 incidents and 1,796 fields. The network is trained on a training set containing 7,764 incidents and 1,796 fields. Table 4 represents all the configurations of the neural network used to predict the resolution time of high priority incidents.

Config.   # hidden layers   Nodes per hidden layer   # epochs   Learning rate
A         3                 512, 512, 512            50         0.005
B         3                 512, 512, 512            80         0.0002
C         3                 512, 256, 64             100        0.0001
D         3                 512, 256, 64             200        0.0001
E         3                 512, 256, 64             50         0.0005
F         4                 512, 512, 256, 256       100        0.0001

Table 4: Configurations of the neural network predicting resolution time on high priority incidents

5 Results and analysis

In the following section the obtained results are presented and analysed. For every experiment a table is provided containing the applied configuration, the resulting mean absolute error, the MAE in hours, and the mean squared error. In addition, a graph showing the course of the MAE for each configuration and a graph of the MAE of the best and the worst performing configuration is supplied. The predicted resolution times of the best configuration are used to construct the certainty rate on whether an incident will go out-of-time or not. This certainty is represented in a graph as a probability.

5.1 Experiment on all priorities

Config.   MAE (s)   MAE in hours   MSE
A         2.88e+5   80.04          4.76e+11
B         3.05e+5   84.67          5.19e+11
C         2.88e+5   80.06          4.64e+11
D         2.90e+5   80.61          4.90e+11
E         2.95e+5   81.82          4.89e+11
F         2.88e+5   80.11          4.67e+11
G         3.03e+5   84.22          5.10e+11

Table 5: Results predicting resolution time on incidents from all priorities with several configurations of a neural network

Table 5 shows the results of running the configurations mentioned in section 4.2 on incidents of every type of priority. The maximum SLA resolution time for this data set ranges from two hours to two working days. This variation makes predicting the resolution time challenging, which becomes clear in the third column of Table 5. The resolution time predictions have an error of eighty to eighty-five hours, which is fairly high considering some incidents have to be solved within two hours. The MAE of every configuration is shown in Figure 5 and the MAE of the best and the worst performing configurations are highlighted in Figure 6.


Figure 5: MAE for several configurations of a neural network predicting resolution time on all priority incidents.

Figure 6: MAE for the best and worst configurations of a neural network predicting resolution time on all priority incidents.

After testing every configuration, configuration A has the lowest error and the best predictions on the resolution time. These predicted resolution times are used to calculate the certainty rate described in section 4.1. For each of the priorities in the test set the certainty on whether an incident will go out-of-time or not is calculated and represented in a graph (Figure 7). As expected, the certainty of an incident going out-of-time or not is the lowest around the maximum SLA boundary and grows as the predicted resolution time gets further away from the maximum SLA resolution time. The certainty rate for the low priority incidents, displayed in Figure 7 on the bottom right, deviates from the hypothesis. It seems incidents with a high resolution time do not have a very high certainty rate. The hypothesis on the certainty does seem to apply for the emergency and high priority incidents, in particular for the incidents going out-of-time (the right side of the SLA line in Figure 7).


Figure 7: Probability on whether an incident will go out-of-time or not, given a predicted resolution time. The maximum SLA resolution time associated with the level of priority is displayed as a vertical grey line.

5.2 Experiment on low priority incidents

Table 6 shows the results of running the configurations mentioned in section 4.3 on the test data set containing only low priority incidents. The maximum SLA resolution time on this data set is two working days. The error of the predicted resolution times is shown in Table 6. The error is lower than that of the configurations on the full data set (Table 5), but an error of around 24 hours on a maximum resolution time of two days is still quite high. The MAE of every configuration is shown in Figure 9 and the MAE of the best and the worst performing configurations are highlighted in Figure 10.

Config.   MAE (s)    MAE in hours   MSE
A         85914.38   23.87          9.34e+10
B         87852.25   24.40          9.12e+10
C         84286.45   23.41          9.02e+10
D         84097.43   23.36          9.10e+10
E         83780.66   23.27          9.08e+10
F         90128.09   25.04          1.04e+11
G         84376.74   23.44          9.02e+10

Table 6: Results predicting resolution time on low priority incidents with several configurations of a neural network

Figure 8: Probability on whether a low priority incident will go out-of-time or not, given a predicted resolution time using configuration E


Configuration E has the lowest error on the low priority incidents, thus the predicted resolution times resulting from this configuration are used to calculate the certainty rate on low priority incidents. The incidents are divided into bins over which the certainty on whether an incident will go out-of-time or not is calculated and represented in Figure 8. This data set also behaves as expected; the certainty of an incident going out-of-time or not is the lowest around the maximum SLA boundary and grows as the predicted resolution time gets further away from the maximum SLA resolution time. Unlike the certainty rate on the low priority incidents in the full data set (Figure 7), the certainty rate on low priority incidents going out-of-time in this data set does reach one hundred percent as the predicted resolution time gets higher.

Figure 9: MAE for several configurations of a neural network predicting resolution time on low priority incidents.

Figure 10: MAE for the best and worst configurations of a neural network predicting resolution time on low priority incidents.

5.3 Experiment on high priority incidents

Table 7 shows the results of running the configurations mentioned in section 4.4 on the test data set containing only high priority incidents. The maximum SLA resolution time on this data set is four working hours. The error of the predicted resolution times is shown in Table 7. The error on this data set is the lowest of all three. The best configuration has an error of 1.19 hours, roughly one hour and eleven minutes. The MAE of every configuration is shown in Figure 12 and the MAE of the best and the worst performing configurations are highlighted in Figure 13.


Config.   MAE       MAE in hours   MSE
A         4383.68   1.22           7.21e+08
B         4275.05   1.19           7.18e+08
C         7694.68   2.13           1.20e+09
D         4338.22   1.21           7.29e+08
E         4811.01   1.34           7.25e+08
F         4790.24   1.33           7.34e+08

Table 7: Results predicting resolution time on high priority incidents with several configurations of a neural network
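The relation between the two error columns in Table 7 can be checked with a few lines. The MAE column appears to be reported in seconds; this is an assumption, but it is consistent with the hours column (e.g. 4275.05 / 3600 ≈ 1.19).

```python
# MAE per configuration, taken from Table 7 (assumed to be in seconds).
mae_seconds = {"A": 4383.68, "B": 4275.05, "C": 7694.68,
               "D": 4338.22, "E": 4811.01, "F": 4790.24}

# Convert to hours and pick the best-performing configuration.
mae_hours = {cfg: s / 3600 for cfg, s in mae_seconds.items()}
best = min(mae_hours, key=mae_hours.get)  # configuration B
```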

Figure 11: Probability on whether a high incident will go out-of-time or not, given a predicted resolution time using configuration B

Configuration B has the lowest error on the high priority incidents, so the predicted resolution times resulting from this configuration are used to calculate the certainty rate on high priority incidents. The incidents are divided into bins, over which the certainty on whether an incident will go out-of-time or not is calculated; the result is shown in Figure 11. This data set also behaves as expected: the certainty of an incident going out-of-time or not is lowest around the maximum SLA boundary and grows as the predicted resolution time moves further away from the maximum SLA resolution time. Comparable to the certainty rate on the high priority incidents in the full data set (Figure 7), the certainty rate on high priority incidents going out-of-time in this data set reaches a hundred percent as the predicted resolution time gets higher. The difference with the full data set is that here the certainty rate rises faster and in a smoother manner.

Figure 12: MAE for several configurations of a neural network predicting resolution time on high priority incidents.

Figure 13: MAE for the best and worst configurations of a neural network predicting resolution time on high priority incidents.


6 Conclusion

This thesis examined how a predicted incident resolution time can serve as a predictor for an incident going out-of-time. To that end, the possibility of predicting incident resolution time and the possibility of using resolution time as a certainty rate for an incident going out-of-time were explored. Experiments showed that, with information obtained from ServiceNow, it is possible to predict the incident resolution time to a certain degree. The best results were obtained on the data set consisting solely of high priority incidents, with an MAE of 4275.05 seconds (1.19 hours). The neural network performed better on data sets consisting of a single priority type than on the set containing incidents of every priority type. The conducted experiments also showed that the predicted resolution times can be used to provide a certainty rate for incidents going out-of-time. An organisation can use this certainty rate to estimate workload more effectively and deploy employees more efficiently. This way, out-of-time incidents will be detected sooner and can be resolved before the due date, which will decrease the number of fines and lower the costs. The performance on predicting the incident resolution time could be improved through further research, as there are several avenues for improvement.

7 Discussion and future work

The aim of the experiment was to examine whether it is possible to predict the resolution time of incidents and whether this resolution time can be used to construct a certainty rate. It appears that with a relatively simple neural network a rough estimation of the resolution time can be made. The network performs better on a data set containing only one priority type; the set containing only high priority incidents performed best of all three, with an average MAE over all configurations of 5048.81 seconds (1.40 hours). The best performing configuration on high priority incidents is configuration B (Table 7), with an MAE of 4275.05 seconds (1.19 hours). Although this is the configuration with the lowest error, a difference of 1.19 hours between the predicted and the true resolution time is not a very good score, as the maximum SLA time for high priority is two hours. This leaves room for improvement.

One decision made during the encoding of the data was to only create a field for variables that occur fifty times or more. This means a vast amount of variables that could have further improved the prediction performance were not taken into account while training the model. Other improvements during training could be made by varying the number of epochs, using a different loss function or different activation functions, or training with more data.
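The fifty-occurrence cut-off could be implemented along the following lines. This is a sketch, not the thesis implementation: the column name and the decision to fold rare values into a single "other" bucket are illustrative assumptions.

```python
import pandas as pd

def one_hot_frequent(df, column, min_count=50):
    # Create an indicator field only for values occurring at least
    # `min_count` times; rarer values share a single "other" indicator.
    counts = df[column].value_counts()
    frequent = counts[counts >= min_count].index
    reduced = df[column].where(df[column].isin(frequent), other="other")
    return pd.get_dummies(reduced, prefix=column)

# Toy example: "x" occurs 60 times, "y" 50 times, "z" only 3 times,
# so "z" falls below the threshold and is folded into "other".
df = pd.DataFrame({"contact_type": ["x"] * 60 + ["y"] * 50 + ["z"] * 3})
encoded = one_hot_frequent(df, "contact_type", min_count=50)
```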

Upon exploration of the data, there appeared to be some abnormalities in the data set. In some cases, the registered resolution time did not correspond with whether the incident was registered as out-of-time or not. Thus, if an incident had been registered as out-of-time but its resolution time did not exceed the limit established in the SLA, or if the resolution time did exceed the SLA limit but the incident was registered as not out-of-time, that incident was removed from the data set. Although hopefully all abnormalities have been removed using this method, the data could still be corrupted, which could have had an impact on the prediction performance.
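The consistency filter described above amounts to one boolean comparison; a hedged sketch, assuming a resolution time column in seconds and a boolean out-of-time flag (both column names are assumptions):

```python
import pandas as pd

def drop_inconsistent(df, sla_seconds):
    # Keep only incidents whose out-of-time flag agrees with the
    # registered resolution time relative to the SLA limit.
    exceeded = df["resolution_time"] > sla_seconds
    return df[exceeded == df["out_of_time"]].reset_index(drop=True)

# Toy example with a 4-hour (14400 s) SLA limit: the last two rows
# contradict their own resolution times and are removed.
df = pd.DataFrame({
    "resolution_time": [10000, 20000, 10000, 20000],
    "out_of_time":     [False, True,  True,  False],
})
clean = drop_inconsistent(df, sla_seconds=14400)
```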

The targets used for training on the data set containing all priorities consisted of business time and real time. It could be that this inconsistency in the targets has an impact on the prediction performance. In future research, a choice could be made to convert all targets into one consistent time.

Only categorical fields have been taken into account in this research, which means a vast amount of information has been lost. The open text fields, for example the description field, could store a lot of valuable information. Future research might use natural language processing to explore these fields.

For the calculation of the certainty rate, the data was split into bins. In this thesis the bins were sorted by resolution time, so the number of incidents per bin differs from bin to bin. Another option would be to allocate a fixed number of incidents to each bin. This could alter the certainty rate.
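The two binning strategies can be compared directly with pandas; the skewed toy distribution below is an assumption chosen to make the contrast visible.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
pred = pd.Series(rng.exponential(2 * 3600, size=1000))  # skewed predictions

# Equal-width bins: bin edges are evenly spaced, so counts differ per bin.
width_counts = pd.cut(pred, bins=10).value_counts().sort_index()

# Equal-count bins (the alternative suggested above): each bin holds
# approximately the same number of incidents.
count_counts = pd.qcut(pred, q=10).value_counts().sort_index()
```

With skewed predictions, equal-width bins leave the outer bins nearly empty (and their certainty rates noisy), whereas quantile bins keep every estimate based on the same number of incidents.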

To evaluate the certainty rate, an assumption is made on how this certainty should behave. The certainty rate shown in Figure 4 is used as a reference to evaluate the computed certainty rates. It is assumed that every priority behaves the same way, but it is also possible that the certainty rate used as reference should be different for each priority.

Acknowledgments

Throughout the writing of this thesis I have received a great deal of support and assistance. In the first place, I would like to thank my supervisor, Dr. Sander van Splunter, who was a great help during the whole process of writing this thesis.

I would also like to thank Ronald van der Veer and Monique Gerrets, my supervisors at ABN AMRO. They made me feel very welcome and part of the team, and helped to overcome every obstacle we faced while conducting this research.


List of Tables

1 Size of all three train-test data sets, where All prio stands for the data set containing incidents of all four priority types and Low prio and High prio respectively stand for the data sets containing only low priority and high priority incidents.
2 Configurations neural network predicting resolution time on incidents of every type of priority
3 Configurations neural network predicting resolution time on low priority incidents
4 Configurations neural network predicting resolution time on high priority incidents
5 Results predicting resolution time on incidents from all priorities with several configurations of a neural network
6 Results predicting resolution time on low priority incidents with several configurations of a neural network
7 Results predicting resolution time on high priority incidents with several configurations of a neural network

Appendices

A Training columns

The following list of columns was included in the training/test data set:

columns = [u_business_value, cmdb_ci, caller_id, caused_by, contact_type,
u_control_group, u_controlled_by, sys_created_by, u_dv_bs_contract_domain,
u_euc_list, u_english_language, u_ci_environment, u_existing_knowledge_article,
u_external_initiated, u_it_business_service, u_it_business_service_status,
u_it_product, u_it_product_status, u_bs_contract_domain, impact,
u_affected_item, u_initial_contract_domain, u_input_method, u_knowledge_article,
u_number_users_impacted, opened_by, u_open_group, u_open_provider_service,
u_opened_in_incident_window, priority, u_reasoncode, u_sla_breached,
u_security_related, urgency, u_vip]

Number of columns = 35

B Targets training set and test set per priority

Figure 14, Figure 15 and Figure 16 respectively show the targets of the training set and test set of incidents of all priorities, low priority incidents and high priority incidents.

Figure 14: Targets training set and test set, all priorities.

