
In this chapter, the validation and verification of the results are described. The goal of this phase is to test whether the performance models meet the requirements stated in the previous phases, and whether the tool supports BTS consultants in evaluating the performance of P2P process execution paths qualitatively and consistently. The realized framework is applied to a new data set (P2P data from a company that was not used in the analysis phase) and tested for validity (i.e. does the performance on the dimensions as shown by the framework correspond with the consultants' assessment of those dimensions), and a verification was done to test to what extent the framework matches the requirements.

The conceptual framework was also tested, through interviews and against the requirements. Based on these tests, adjustments were made to the framework. These adjustments are described in detail, and the models that were selected in this phase are presented.

Validation of the framework

In order to find out whether the framework is capable of giving a valid performance score on all dimensions, a new data set was assessed by two consultants who both took part in identifying the list of performance indicators (one during the brainstorm, one by checking the list for completeness) and who also have experience with that particular company. They received exactly the same survey as the one created for the first companies, and were asked to rank the performance of the five variants.

The performance values that they assigned to the dimensions were then tested to see whether they fit into the 95% confidence intervals of the three models that were selected for each dimension in section 4.2.5. For each model, the mean absolute prediction error (MAPE) was calculated as well, to test which model has the smallest deviation between the predicted performance and the performance according to the validation surveys. Table 10 shows the results of these tests. For all dimensions, at least one model has a MAPE lower than two, indicating that the average prediction according to that model is less than 2 points from the value assigned by the consultants in the validation phase. As the rating takes place on a scale from 1 to 10, this deviation is still considerable. Especially for the flexibility dimension, the other models have a MAPE that far exceeds the 1-10 range on which respondents were asked to rate the process variants.

As generalizability of the framework is an essential part of this research, it is more important that a model is capable of predicting the performance of a process outside the data sets that were used to generate the models than that it has a slightly higher adjusted R² (since this value only indicates the fit of the model on the data set analyzed in the first phase). Therefore, the model with the smallest MAPE was selected as the most valid model based on the validation phase. For time, quality and flexibility, model 2 is selected; for cost, model 3. These models are marked bold in table 10.
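The selection rule described above can be sketched as follows. This is a minimal sketch, not code from the thesis; the model names and scores are hypothetical:

```python
# Minimal sketch (not code from the thesis): selecting, per dimension, the
# candidate regression model with the smallest mean absolute prediction
# error (MAPE) on the validation data, as described above.
def mape(predicted, assessed):
    """Mean absolute prediction error on the 1-10 performance scale."""
    return sum(abs(p - a) for p, a in zip(predicted, assessed)) / len(assessed)

def select_model(candidates, assessed):
    """candidates maps model name -> predicted scores; returns (name, MAPE)."""
    errors = {name: mape(preds, assessed) for name, preds in candidates.items()}
    return min(errors.items(), key=lambda item: item[1])

# Hypothetical numbers for illustration only:
models = {"model 1": [7.0, 5.5], "model 2": [6.2, 6.4], "model 3": [9.0, 3.0]}
best_name, best_error = select_model(models, assessed=[6.0, 6.0])
```

Note that this deliberately ranks on out-of-sample error rather than adjusted R², matching the generalizability argument made above.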

Appendix K depicts the upper and lower bounds of the 95% confidence intervals for each dimension, the expected performance predicted by all models, and the performance values from the validation surveys.

Table 10: Validation data

[Table: MAPE per candidate model for the Time, Cost, Quality and Flexibility dimensions; the selected models are marked in bold.]

6.1.1 Conclusion

This section has shown that the models that were created for each dimension are capable of predicting performance on that dimension, which supports the external validity of the framework. Because this validation consists of 10 observations, the expected number of observations outside the 95% confidence interval is 0.5, so either zero or one observation outside the confidence interval meets this expectation, translating to 90% or 100% of observations within the confidence interval. The selected models all have 90% or 100% of cases within the confidence interval, so they meet the expectation. Since the 95% confidence intervals are quite wide, the MAPE was used as an additional prediction-accuracy measure, and this showed that the average prediction by the selected models is less than two points (on a 1-10 scale) from the assessed value. Model 3 for the cost dimension is the most accurate predictor of all models, with a MAPE of only 1.23.
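The coverage arithmetic in this conclusion can be checked directly:

```python
# Worked check of the coverage argument above: with n = 10 validation
# observations and a 95% confidence interval, the expected number of
# observations outside the interval is n * 0.05 = 0.5, so observing
# zero or one case outside the interval (100% or 90% coverage) is
# consistent with that expectation.
n = 10
expected_outside = n * 0.05             # 0.5
coverage_if_zero_outside = (n - 0) / n  # 1.0, i.e. 100%
coverage_if_one_outside = (n - 1) / n   # 0.9, i.e. 90%
```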

By adding more data to the analysis, the confidence intervals will narrow and the prediction error is expected to decrease; that is, the models are expected to become better predictors of performance. This is discussed in more detail in section 7.4 (further research).

Verification of the framework

The verification of the framework was done based on the requirements stated in section 5.1, to find out whether the framework that was designed can actually be used by BTS consultants to assess performance. The requirements are stated below, with, for each requirement, an indication of whether it was met.

In order to find out whether the framework is usable and helps consultants in assessing process performance, the consultants who completed the validation surveys measured the time it took them to do so.

Both consultants stated that they needed approximately 20 minutes to assess the five variants. Using the framework, a consultant is able to assess the performance of a number of process variants within a matter of minutes, by analyzing the visualized quadrangles and by looking at the significant performance indicators.

The following conservative assumption is made: the time used to complete a survey equals the time a consultant needs to assess the execution variants in the 'traditional' way. This assumption is justified by three aspects. First, when a process is analyzed using process mining in the traditional way, generally more than five process variants are evaluated. Second, performance indicators have to be looked up, or defined, and programmed into a dashboard.

Finally, as became apparent during the brainstorm sessions, no consultant used all 43 identified performance indicators to assess performance; each used their own subset of performance measures. The framework would therefore actually enable a consultant to assess performance faster, once a dashboard with the relevant performance indicators has been created.

The usability of the significant performance indicators per dimension, the Celonis dashboard and the Excel-tool was confirmed by multiple consultants as being helpful in assessing performance and creating insight into which performance indicators are truly important. In addition, the Devil's quadrangle was repeatedly confirmed as an excellent tool to assess process performance, especially in combination with process mining, by SAP consultants, Celonis data scientists and even Celonis' CEO. The visualization of the quadrangle in particular, which provides a multidimensional view on performance and includes the interaction between the different dimensions, was described as an extremely useful improvement.


The following list presents a clear overview of the requirements with a statement of whether they were met, and an explanation motivating this conclusion:

• The framework should show the ideal shape of the Devil's quadrangle. Since the Excel-tool that was designed shows the ideal quadrangle, this requirement is met.

• For each dimension, a list of performance indicators that are significant predictors of performance has to be presented. This list is present in the framework (and can be seen both in the Celonis-dashboard and the Excel-tool), so the framework satisfies this requirement.

• For each dimension, a model that calculates performance on that dimension (i.e. the Beta values for the significant performance indicators) has to be shown. The formulas presented show the Betas for the performance indicators, and they are integrated in the Excel-tool. During the validation it became apparent that adding the performance formulas to the tool would improve its usability; after the formulas were added, this requirement is met.

• The framework has to show the shape of the Devil's quadrangle for each variant. The Excel-tool shows the quadrangle automatically based on the performance per dimension, so this requirement is met.
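The last requirement, drawing the quadrangle automatically from the four dimension scores, can be sketched as follows. This is illustrative only: the thesis realizes the quadrangle in an Excel-tool, and the coordinate mapping below is an assumption:

```python
import math

# Illustrative sketch only (the actual tool is built in Excel): an assumed
# coordinate mapping that places the four dimension axes of the Devil's
# quadrangle at 90-degree intervals and scales each corner by its 1-10
# performance score.
def quadrangle_corners(time, cost, quality, flexibility):
    """Return the (x, y) corner per dimension, in the order given."""
    corners = []
    for i, score in enumerate([time, cost, quality, flexibility]):
        angle = i * math.pi / 2  # 0, 90, 180, 270 degrees
        corners.append((score * math.cos(angle), score * math.sin(angle)))
    return corners

ideal = quadrangle_corners(10, 10, 10, 10)  # the 'ideal shape'
```

Connecting the four corners yields the quadrangle shape for a variant; overlaying it on the ideal shape gives the multidimensional comparison described above.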

Veracity and validity of the framework

During the validation, for each dimension a model with a MAPE of less than 2.0 was found. As stated before, this is still a fairly large value relative to the 1-10 range that was used to assess performance. The models with the lowest MAPE per dimension all have at least 90% of the predicted values within the 95% confidence interval. Since only ten observations were analyzed per model, the expected number of observations outside the confidence interval is 0.5, so zero or one observation outside the interval both meet the expected value. Since the models all have sufficient internal consistency, as described in section 4.2.5, the models with the lowest MAPE are chosen as the models best suited to predict performance.

The models in formulas 5 through 8 represent the models with the highest external validity, and are the preferred models based on this phase. They are also the final models of this research. Table 11 gives an overview of the significant performance indicators according to these models, as well as the abbreviations used in the formulas.

An overview of the updated Excel-tool is given in figure 13; the updated Celonis-dashboard is shown in figure 14.


Figure 13: Updated Excel tool

Figure 14: Updated Celonis dashboard

[Figure content: for the ideal shape and variants 1 through 5, the Excel-tool shows the performance score per dimension (Cost, Time, Quality, Flexibility), the values of the underlying performance indicators (e.g. duration in days, % payments done late, % unautomated activities, average spend per supplier, vendor delivery performance, case coverage, relative percentage of PO value, average PO value, presence of the 'Goods Receipt' and 'Create PR' activities), the formulas for performance, and the resulting Devil's quadrangle with its Time, Cost, Quality and Flexibility axes.]


Table 11: Significant performance indicators in the validated models

Dim.         Performance indicator                            Abbreviation (used in formulas below)
Time         Duration (days)                                  E2E
Cost         # of users per € bln spent⁸                      Users_BLN
Quality      % payments done late (vs contract conditions)    Paym_Late
Quality      % of manually executed activities                Manual
Quality      Vendor timely delivery performance               Vend_Perf
Quality      Average spend per supplier                       Avg_Sup
Flexibility  % cases handled⁸                                 Cases
Other        Relative percentage of PO value in this variant  Rel_PO
Other        Total number of execution variants               VAR
Other        'Goods receipt' activity present?                Goods
Other        'Create PR' activity present?                    Cr_PR

E(Time) = 10.015 − 0.058 · E2E − 0.0004 · VAR (Formula 5)

E(Cost) = 10.100 − 1.476 · log(Users_BLN) (Formula 6)

E(Quality) = 6.770 − 3.263·10⁻⁹ · Avg_Sup + 2.158 · Paym_Late + 1.934 · Vend_Perf − 3.971 · Manual (Formula 7)

E(Flexibility) = 4.497 − 23.334 · Cases² + 4.054 · Rel_PO − 1.351 · Cr_PR − 1.656 · Goods (Formula 8)

Conclusion

During the validation phase, some small adjustments had to be made for the P2P framework to meet the requirements. The conceptual framework met all requirements and remained unchanged. This indicates that the designed solution helps BTS consultants in assessing the performance of processes. Furthermore, the fact that I was able to create models that are accurate enough to place 90% or 100% of the validation cases within the 95% confidence interval, while also having a relatively low MAPE, shows that the framework not only meets the usability requirements, but is also able to give a valid performance assessment.

6.4.1 Limitations

As stated before, the confidence intervals for the various models are quite wide, and the MAPE is relatively high. So, although the models can significantly predict performance, there is room for improvement. The most straightforward way to improve accuracy and narrow the confidence intervals is to analyze more data.

After the framework was updated based on the validation, it could not be validated again, as no more data was available and time constraints did not allow extra data collection. As described before, the restricted amount of available data is a limitation of this research, since a single additional data set could again change which model is best, as happened in this chapter.

⁸ To calculate the performance of a dimension, this performance indicator has to undergo a mathematical transformation; the formula used to calculate performance shows the specific transformation.
