
3.5 Production Step

Finally, predictions are applied to the ongoing business process cases. In this regard, the same stages as in the data preparation and the modeling steps are carried out, with two differences: a) incomplete cases are selected, since predictions are made for ongoing cases; b) there is no split into training, validation, and test sets. Instead, the suitable classifier per process step is picked from the modeling step and applied to the ongoing cases according to their case lengths. In the end, the final results are sent to the Celonis software, as shown in Table 3.3.

Case ID   Prediction ("Change Price" occurrence)

106883    Yes
344371    No
576623    No
754202    Yes
...       ...

Table 3.3. Results of predictions on ongoing cases.
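The routing of ongoing cases to their per-step classifiers can be sketched as follows. This is a minimal illustration, not the actual production code: the classifiers, the feature vectors, and all names (`classifiers_by_step`, `predict_ongoing`, `ongoing_cases`) are stand-in assumptions, with simple dummy models in place of the classifiers selected in the modeling step.

```python
# Sketch of the production step: route each ongoing (incomplete) case
# to the classifier trained for its current case length.
# All names and models here are illustrative stand-ins.
import numpy as np
from sklearn.dummy import DummyClassifier

rng = np.random.default_rng(0)

# Stand-in classifiers: one per process step, as selected in the modeling step.
classifiers_by_step = {}
for step in range(1, 4):
    X_train = rng.normal(size=(50, 4))
    y_train = rng.integers(0, 2, size=50)
    clf = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    classifiers_by_step[step] = clf

def predict_ongoing(case_features, case_length):
    """Pick the classifier matching the case length and predict
    the 'Change Price' occurrence for one incomplete case."""
    clf = classifiers_by_step[case_length]
    label = clf.predict(case_features.reshape(1, -1))[0]
    return "Yes" if label == 1 else "No"

# Two hypothetical ongoing cases: feature vector plus current case length.
ongoing_cases = {106883: (rng.normal(size=4), 2), 344371: (rng.normal(size=4), 3)}
predictions = {cid: predict_ongoing(x, n) for cid, (x, n) in ongoing_cases.items()}
print(predictions)
```

In the actual pipeline, the resulting predictions would then be pushed back to Celonis rather than printed.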

In this chapter, the methodology applied in this study was elaborated. In the next chapter, the empirical results obtained by applying this methodology are explained.

Chapter 4

Results

“The truth, however ugly in itself, is always curious and beautiful to seekers after it.”

—Agatha Christie, The Murder of Roger Ackroyd (1926)

An Amazon Web Services instance is used as the computational infrastructure in this study (see Table 4.1). The programming libraries used with Python 3.6 and Celonis 4.3 are listed in Table 4.2.

Operating System      CPU                                         RAM     Storage

Windows Server 2012   Intel(R) Xeon(R) CPU E5–2686 v4, 2.30 GHz   122 GB  235 GB

Table 4.1. The hardware configuration used in this study.

Python Libraries Version

Anaconda 5.0.1

Seaborn 0.8.1

Graphviz 0.8.2

Scikit-learn 0.19.1

LightGBM 2.1.0

Mlxtend 0.11.0

Table 4.2. The utilized Python libraries in this study.

Exploratory data analysis helps us to understand the dataset better by visualizing it. For example, Table 4.3 shows statistics of the preprocessed dataset at each process step. As shown in Table 4.3, the dataset is imbalanced: the ratio of the "Change Price" class to the non–"Change Price" class is less than 0.1 in all process steps.

This ratio decreases until step 15, where it reaches 0.0 and the number of instances drops below 1,000. Finally, a heat map showing the correlations among the different features and the target labels is drawn in Figure 4.1. This map shows that the correlations between the features and the target labels are not significant.
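The per-step ratios in Table 4.3 can be reproduced with a simple group-by computation. The following is a hedged sketch assuming the preprocessed event data sits in a pandas DataFrame with one row per (case, process step) and a binary target column; the column names are illustrative assumptions, and the tiny inline dataset stands in for the real event log.

```python
# Sketch of the ratio computation behind Table 4.3, assuming one row
# per (case, process step) and a binary 'change_price' target.
# Column names and the toy data are illustrative assumptions.
import pandas as pd

df = pd.DataFrame({
    "process_step": [1, 1, 1, 1, 2, 2, 2, 2],
    "change_price": [1, 0, 0, 0, 0, 0, 0, 1],
})

stats = (
    df.groupby("process_step")["change_price"]
      .agg(cases="size", positives="sum")   # cases and 'Change Price' counts
)
stats["non_change_price"] = stats["cases"] - stats["positives"]
stats["ratio"] = stats["positives"] / stats["non_change_price"]
print(stats)
```

The correlation heat map of Figure 4.1 would then follow from `df.corr()` rendered with seaborn's `heatmap` function.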


Process step Cases “Change Price” non–“Change Price” Ratio

1 52261 4109 48152 0.0853

2 52261 3994 48267 0.0827

3 52218 2173 50045 0.0434

4 51557 980 50577 0.019

5 51229 688 50541 0.0136

6 50793 444 50349 0.0088

7 47045 275 46770 0.0058

8 32750 145 32605 0.0044

9 14801 51 14750 0.0034

10 6451 24 6427 0.0037

11 3291 12 3279 0.0036

12 1825 7 1818 0.0038

13 1046 3 1043 0.0028

14 567 1 566 0.0017

15 281 0 281 0.0000

16 114 0 114 0.0000

Table 4.3. The ratios of the target class over different process steps in the dataset.


Figure 4.1. A heat map showing the correlations among the features and the target labels.

After applying the modeling step, the results of the different classifiers from process step (1) to (9) are stored. There is no classifier for process steps 10 to 16, due to the lack of sufficient instances of the target class in these steps. Statistics of the training sets up to process step (9) are shown below:

Statistics             Step 1  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7  Step 8  Step 9

Training set size      41,808  41,808  41,774  41,245  40,983  40,634  37,636  26,200  11,840
Number of the features  3,363   6,700  10,124  13,602  17,098  20,549  24,099  28,232  33,027

Table 4.4. Statistics of the training sets up to process step (9).

Table 4.4 shows that, over time, the training set size decreases while the number of features increases. Finally, the results of the different classifiers over the process steps on the training sets are shown below:

Classifier          Step 1  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7  Step 8  Step 9

Dummy Classifier    0.5     0.5     0.5     0.5     0.5     0.5     0.5     0.5     0.5
Naive Bayes         0.9075  0.9076  0.8723  0.8737  0.8742  0.8565  0.8611  0.8847  0.8676
Decision Tree       0.9051  0.9085  0.8939  0.8603  0.8579  0.9029  0.8403  0.9254  0.9328
Random Forest       0.9069  0.9123  0.8816  0.8880  0.9225  0.9394  0.9665  0.9920  0.9979
LGBM                0.9596  0.9679  0.9795  0.9646  0.9860  0.9960  0.9994  0.9999  0.9998
Stacked Classifier  0.9596  0.9692  0.9801  0.9638  0.9853  0.9957  0.9994  0.9999  0.9997

Table 4.5. The results of different classifiers over different process steps on the training sets.
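The per-step comparison behind Tables 4.5 and 4.6 can be sketched as a loop over candidate models scored with AUC ROC. This is only an illustration on synthetic imbalanced data: the thesis additionally uses LightGBM and mlxtend's stacked classifier, which are omitted here, and all hyperparameters below are assumptions rather than the study's settings.

```python
# Sketch of the per-step model comparison scored with AUC ROC.
# Synthetic imbalanced data; LightGBM and the stacked classifier
# from the thesis are omitted in this illustration.
from sklearn.datasets import make_classification
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.tree import DecisionTreeClassifier

# ~8% positive class, mimicking the imbalance in Table 4.3.
X, y = make_classification(n_samples=2000, n_features=20,
                           weights=[0.92], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=0)

models = {
    "Dummy": DummyClassifier(strategy="prior"),
    "Naive Bayes": GaussianNB(),
    "Decision Tree": DecisionTreeClassifier(random_state=0),
    "Random Forest": RandomForestClassifier(random_state=0),
}
scores = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    proba = model.predict_proba(X_te)[:, 1]   # probability of the target class
    scores[name] = roc_auc_score(y_te, proba)
print(scores)
```

The constant-prediction dummy classifier yields an AUC of exactly 0.5, which is why it serves as the baseline row in the tables above.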

Figure 4.2. The AUC ROC scores over different process steps on the training sets.

Figure 4.2 shows that the LGBM and the stacked classifier fit the training sets better than the other classifiers. Surprisingly, the random forest gets close to the LGBM and the stacked classifier after process step (4). The results of the different classifiers over the process steps on the test sets are shown below:

Classifier          Step 1  Step 2  Step 3  Step 4  Step 5  Step 6  Step 7  Step 8  Step 9

Dummy Classifier    0.5     0.5     0.5     0.5     0.5     0.5     0.5     0.5     0.5
Naive Bayes         0.8924  0.9003  0.8491  0.8478  0.8702  0.8151  0.7939  0.7893  0.6474
Decision Tree       0.8832  0.8816  0.8442  0.8244  0.8080  0.7596  0.8010  0.7942  0.8946
Random Forest       0.8853  0.8965  0.8448  0.8563  0.8791  0.8433  0.8393  0.8779  0.7854
LGBM                0.9207  0.9248  0.9076  0.8878  0.8915  0.8681  0.8859  0.9053  0.7891
Stacked Classifier  0.9201  0.9211  0.8991  0.8702  0.8570  0.8132  0.8631  0.8955  0.7908

Table 4.6. The results of different classifiers over different process steps on the test sets.

Figure 4.3. The AUC ROC scores over different process steps on the test sets.

Figure 4.3 shows that the LGBM outperforms the other classifiers on the test sets in almost all process steps. The decrease of the AUC ROC score on the test sets over time is an interesting pattern; it could be due to a decrease in both the test set size and the target class ratio.


For each classifier, two plots are drawn as explained below:

1. The TPR and the FPR over different threshold values:

This plot shows the TPR and the FPR over different threshold values for a given classifier on both the training and the test sets, which helps us to choose the cut–off threshold value for the classifier. As an example, the plot for the LGBM at process step (5) on the training set is shown in Figure 4.4. The figure shows that by increasing the cut–off threshold value, both the TPR and the FPR decrease.

Figure 4.4. The TPR and The FPR over different thresholds of LGBM at process step (5) on the training set.
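The quantities plotted in such a figure follow directly from the confusion-matrix counts at each threshold. The sketch below uses synthetic labels and scores (all values are illustrative, not the LGBM's outputs) and shows why both curves decrease as the threshold grows.

```python
# Sketch of computing the TPR and FPR over candidate cut-off thresholds
# from predicted probabilities; labels and scores are synthetic.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=200)
# Positives tend to score higher, imitating a reasonable classifier.
y_score = np.clip(y_true * 0.4 + rng.random(200) * 0.6, 0, 1)

def tpr_fpr(y_true, y_score, threshold):
    """True/false positive rates at one cut-off threshold."""
    pred = y_score >= threshold
    tp = np.sum(pred & (y_true == 1))
    fp = np.sum(pred & (y_true == 0))
    tpr = tp / np.sum(y_true == 1)
    fpr = fp / np.sum(y_true == 0)
    return tpr, fpr

for t in (0.2, 0.5, 0.8):
    tpr, fpr = tpr_fpr(y_true, y_score, t)
    print(f"threshold={t:.1f} TPR={tpr:.2f} FPR={fpr:.2f}")
```

Raising the threshold can only shrink the set of predicted positives, so both the TPR and the FPR are non-increasing in the threshold, exactly the behavior visible in Figure 4.4.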


2. The AUC ROC Plot:

This plot shows the area under the ROC curve. For example, the AUC ROC of the decision tree algorithm at process step (3) on the test set is shown in Figure 4.5.

Figure 4.5. The AUC ROC plot of the decision tree at process step (3) on the test set.
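Such a plot is typically produced from scikit-learn's `roc_curve` and `auc` functions, which the study's scikit-learn 0.19.1 setup provides. The following minimal sketch uses synthetic scores (the distributions are illustrative assumptions, not the decision tree's outputs).

```python
# Sketch of the AUC ROC computation behind an ROC plot, using
# scikit-learn's roc_curve and auc; labels and scores are synthetic.
import numpy as np
from sklearn.metrics import auc, roc_curve

rng = np.random.default_rng(1)
y_true = rng.integers(0, 2, size=300)
# Positives drawn from a higher-scoring distribution than negatives.
y_score = np.where(y_true == 1, rng.beta(4, 2, 300), rng.beta(2, 4, 300))

fpr, tpr, thresholds = roc_curve(y_true, y_score)
roc_auc = auc(fpr, tpr)
print(f"AUC ROC = {roc_auc:.3f}")
```

Plotting `fpr` against `tpr` (e.g. with matplotlib) gives the curve itself; `roc_auc` is the single number reported in Tables 4.5 and 4.6.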

Training the classifiers over all process steps took 19 hours and 56 minutes on the IT infrastructure described in Table 4.1. During the training, the execution time of each classifier was logged; the obtained results are shown in Figure 4.6. This figure shows that the decision trees spend a huge amount of computational time until step (6), as the number of features grows at each process step, e.g. process step (6) uses all features from process steps (1) to (6) (see Table 4.4). The execution time is reduced after process step (6) due to the decrease in the number of instances in the dataset (see Table 4.4). The time logged for the stacked classifier includes the time needed to train both its base–learners and the meta–classifier. This explains the peak of the stacked classifier at process step (3): the decision tree is one of the two top classifiers the stacked classifier uses in this process step, which is not the case in the other process steps. It should be noted that the Naive Bayes and Dummy classifiers are among the fastest classifiers.

Figure 4.6. The training execution time of different classifiers over all process steps.
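The per-classifier timing behind such a figure can be logged with a simple wall-clock measurement around each `fit` call. This is a hedged sketch with a stand-in model and synthetic data, not the study's logging code.

```python
# Sketch of logging per-classifier training time, as collected for the
# execution-time comparison; model and data are stand-ins.
import time
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 20))
y = rng.integers(0, 2, size=1000)

timings = {}
start = time.perf_counter()
DecisionTreeClassifier(random_state=0).fit(X, y)
timings["Decision Tree"] = time.perf_counter() - start
print(timings)
```

Repeating this per classifier and per process step yields the data points plotted over the process steps.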

In this chapter, the obtained empirical results were shown. In the next chapter, the final discussions are presented.


Chapter 5

Discussion

“There are questions that you don’t ask because you’re afraid of the answers to them.”

—Agatha Christie, The Moving Finger (1942)

In this study, as in any other project, we faced different challenges. In the self–evaluation part, numerous trials and errors were performed, with most of the time spent on the data preparation step. This study focused only on non–neural–network machine learning classifiers combined with intensive feature engineering. Accomplishing this task demands strong business domain knowledge.

To avoid this tedious step, some experts prefer using more complex algorithms, such as deep neural networks, which capture the interactive relations among the features automatically. However, these techniques are black boxes that are not easy to interpret.

By investing in intensive feature engineering, we generated results that are, in the end, acceptable and interpretable from the industrial partner's perspective. This study can easily be applied to other similar tasks, which are listed in the next chapter. In the rest of this chapter, we discuss different aspects of the results obtained in the previous chapter in more detail.

5.1 The Cut–off threshold Value

When it comes to the cut–off threshold value, the best possible method to choose it is to apply a financial cost function over the confusion matrix (see Section 2.4). This approach clearly satisfies business needs, but it requires business domain knowledge to define the cost function. Due to the lack of this information from the customer side, this method was not applied in this study.
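Had the cost information been available, the approach could look like the following sketch: assign a monetary cost to each confusion-matrix outcome and pick the threshold minimizing the total cost. The cost values below are purely illustrative assumptions, not figures from the industrial partner.

```python
# Hedged sketch of the financial cost-function approach over the
# confusion matrix: choose the cut-off threshold minimizing an
# assumed per-outcome cost. Costs, labels, and scores are illustrative.
import numpy as np

rng = np.random.default_rng(0)
y_true = rng.integers(0, 2, size=500)
y_score = np.clip(y_true * 0.3 + rng.random(500) * 0.7, 0, 1)

COST_FP, COST_FN = 1.0, 5.0  # assumed business costs per error type

def total_cost(threshold):
    """Total financial cost of the errors at one cut-off threshold."""
    pred = y_score >= threshold
    fp = np.sum(pred & (y_true == 0))   # false alarms
    fn = np.sum(~pred & (y_true == 1))  # missed 'Change Price' cases
    return COST_FP * fp + COST_FN * fn

thresholds = np.linspace(0.0, 1.0, 101)
best = min(thresholds, key=total_cost)
print(f"cost-minimizing threshold = {best:.2f}")
```

With a missed "Change Price" assumed five times as costly as a false alarm, the search naturally favors lower thresholds than the default 0.5; in practice the two cost constants would come from the customer's domain experts.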