Future work - Eindhoven University of Technology MASTER Discovery and analysis of field service

in the next section.

6.2 Future work

The current approach has several limitations that it cannot yet handle properly and that require future work.

• Dealing with parallelism and multi-choice: One limitation of the current approach is that the replay algorithm used for performance measurement does not properly cover parallel activities when mapping the sequence of activities in an event log to a series of nodes on a flow path in a graph. This limitation arises because the model used has no execution semantics, and the flow paths are considered in isolation when they are compared to the base patterns. For a model with rich/flexible semantics, such as a Petri net or a heuristics net, the complete execution traces can be obtained from the model itself.

Consider the trace t = abcders and the process map depicted in Figure 6.1.

Let us assume that the set of base patterns starting with the current activity a is {abcde, abc, ar}. The chosen base pattern at the current position of activity a is abcde, which has a continuous manifestation in the trace.

The set of possible flow paths in the graph of the sub-process of the possible abstraction X is {abe, ace, ade}. Currently, only one flow path is considered during replay for mapping the activities in the base pattern: either abe, ace or ade. For example, if the chosen flow path is abe, the activities a, b and e are mapped to X, while the remaining activities of the base pattern (i.e., c and d) are mapped in later iterations.

Figure 6.1: Process map example

However, this situation can be overcome using the concept of a shortest common supersequence, i.e., the shortest sequence that contains every sequence of a given set as a subsequence. The solution is not to enumerate or generate all possible shortest common supersequences, but to check whether the current base pattern forms a shortest common supersequence of the flow paths.

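For the running example, the solution is to determine whether the current base pattern abcde forms a shortest common supersequence of the set of flow paths {abe, ace, ade}. This is indeed the case: each flow path is a subsequence of abcde, and no shorter common supersequence exists, since any common supersequence must contain all five activities a, b, c, d and e.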
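The check described above can be sketched as follows. This is a minimal illustration only, assuming that activities are single characters (as in the running example) and that the set of flow paths is small. The function `scs_length` computes the length of a shortest common supersequence by exhaustive dynamic programming over tuples of positions. This is exponential in the number of flow paths, and the general problem is NP-hard. All function names are hypothetical and are not part of the thesis implementation.

```python
from functools import lru_cache


def is_subsequence(small: str, big: str) -> bool:
    """True if `small` can be obtained from `big` by deleting activities."""
    remaining = iter(big)
    # `ch in remaining` advances the iterator, so the order is respected
    return all(ch in remaining for ch in small)


def scs_length(paths: list[str]) -> int:
    """Length of a shortest common supersequence of `paths` (exact, exponential)."""
    paths_t = tuple(paths)

    @lru_cache(maxsize=None)
    def solve(pos: tuple[int, ...]) -> int:
        # Activities that may be emitted next: the next unread symbol of each path
        candidates = {p[i] for p, i in zip(paths_t, pos) if i < len(p)}
        if not candidates:
            return 0  # every flow path is fully covered
        best = None
        for ch in candidates:
            # Emitting `ch` advances every path whose next symbol is `ch`
            nxt = tuple(i + 1 if i < len(p) and p[i] == ch else i
                        for p, i in zip(paths_t, pos))
            length = 1 + solve(nxt)
            if best is None or length < best:
                best = length
        return best

    return solve(tuple(0 for _ in paths_t))


def forms_scs(base_pattern: str, flow_paths: list[str]) -> bool:
    """True if the base pattern is a shortest common supersequence of the flow paths."""
    return (all(is_subsequence(p, base_pattern) for p in flow_paths)
            and len(base_pattern) == scs_length(flow_paths))


print(forms_scs("abcde", ["abe", "ace", "ade"]))   # True
print(forms_scs("abcdde", ["abe", "ace", "ade"]))  # False: a supersequence, but not a shortest one
```

Because only the length of a shortest supersequence is needed, none of the supersequences has to be generated explicitly.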

Figure 6.2 depicts an example of discovering shortest common supersequences using trace alignment. For the set of possible flow paths in the running example ({abe, ace, ade}), we show a possible alignment and find that the corresponding base pattern forms a shortest common supersequence of the flow paths.

Figure 6.2: Example of discovery of shortest common supersequence using trace alignment

In this way, the parallelism of activities b, c and d in our example is covered during the mapping of activities.
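To make the alignment idea concrete, the sketch below aligns each flow path against the columns of the base pattern. It writes a gap symbol `-` wherever the flow path contributes no activity. The sketch assumes single-character activities and uses a simple left-to-right (greedy) embedding. It does not reproduce the trace alignment technique of [7], and the function name is hypothetical.

```python
from typing import Optional


def align_to_pattern(pattern: str, path: str, gap: str = "-") -> Optional[str]:
    """Place each activity of `path` under the matching column of `pattern`.

    Returns None if `path` is not a subsequence of `pattern`.
    """
    row = []
    j = 0  # index of the next unmatched activity in `path`
    for ch in pattern:
        if j < len(path) and path[j] == ch:
            row.append(ch)
            j += 1
        else:
            row.append(gap)
    return "".join(row) if j == len(path) else None


base = "abcde"
print(base)
for flow_path in ["abe", "ace", "ade"]:
    print(align_to_pattern(base, flow_path))
# abcde
# ab--e
# a-c-e
# a--de
```

In this alignment all flow paths share the columns for a and e. Each of the activities b, c and d occurs in exactly one flow path, which reflects their parallel execution.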

As with parallelism, the replay algorithm has limitations in handling multi-choice when mapping a sequence of activities in the event log to the flow paths in a graph. When the base patterns are compared to find a corresponding flow path, the paths are considered in isolation. As a result, multi-choice constructs (i.e., OR) are not identified in the graph, and the activities in the base pattern cannot be mapped properly.

Figure 6.3: Process map example

For example, consider the trace t = abcdef and the process map depicted in Figure 6.3. Let us assume that the set of base patterns starting with the current activity a is {abcd, abd}. The chosen base pattern is abcd. The set of possible flow paths is {abc, abd}. Currently, only one of the flow paths is considered for mapping the activities in the base pattern. That is, the multi-choice construct in the graph is not identified, so the sequence abcd cannot be mapped properly. For example, if the chosen flow path is abc, activities a, b and c in the base pattern abcd are mapped to X, while activity d is mapped at the next step.

There are multiple ways to overcome this situation. One solution is to use information already provided by the Fuzzy miner on a hierarchical process model: the edge (flow) significance can be used to identify the semantics of a certain construct.

For example, in the graph of the process map presented in Figure 6.3, node b has two outgoing flows: (b → c) and (b → d). If the significance of the first flow is 0.6 and that of the second flow is 0.9, we can identify a multi-choice construct and consider both flow paths abc and abd for the mapping.
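The text does not fix the exact decision rule. One possible reading, which is purely an assumption made here for illustration, is to treat the significance of each outgoing flow as a rough estimate of the fraction of cases that follow it. Under an exclusive choice these fractions should sum to at most 1. A sum clearly above 1 (0.6 + 0.9 = 1.5 in the example) would therefore suggest that both branches are often taken together, i.e., a multi-choice. Fuzzy miner significance values are not in general branching probabilities, so this rule is only a heuristic. The `tolerance` parameter and the function name are hypothetical.

```python
def looks_like_multi_choice(branch_significances: list[float], tolerance: float = 0.1) -> bool:
    """Heuristic (assumed, not from the thesis): flag an OR-split when the
    significances of the outgoing flows of a node sum to clearly more than 1."""
    if len(branch_significances) < 2:
        return False  # a single outgoing flow is not a choice
    return sum(branch_significances) > 1.0 + tolerance


print(looks_like_multi_choice([0.6, 0.9]))  # True: 1.5 > 1.1
print(looks_like_multi_choice([0.4, 0.6]))  # False: consistent with an exclusive choice
```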

In addition to the flow significance metrics, we can again use the concept of a shortest common supersequence over the set of flow paths. Instead of checking whether the base pattern is a shortest common supersequence, we can check for an approximate manifestation of the base pattern in the set of shortest common supersequences over the set of flow paths.
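For small sets of flow paths, all shortest common supersequences can simply be enumerated and the base pattern compared against them. The sketch below assumes single-character activities. It uses exact membership as a stand-in for the approximate-manifestation test described above, which would relax the comparison. The function name is hypothetical, and the enumeration is exponential in the worst case.

```python
from functools import lru_cache


def all_scs(paths: list[str]) -> set[str]:
    """All shortest common supersequences of `paths` (exhaustive, for small inputs)."""
    paths_t = tuple(paths)

    @lru_cache(maxsize=None)
    def solve(pos: tuple[int, ...]) -> tuple[int, frozenset]:
        # Returns (length, set of shortest completions) from positions `pos`
        candidates = sorted({p[i] for p, i in zip(paths_t, pos) if i < len(p)})
        if not candidates:
            return 0, frozenset({""})
        best_len, best = None, set()
        for ch in candidates:
            # Emitting `ch` advances every path whose next symbol is `ch`
            nxt = tuple(i + 1 if i < len(p) and p[i] == ch else i
                        for p, i in zip(paths_t, pos))
            tail_len, tails = solve(nxt)
            length = tail_len + 1
            completions = {ch + t for t in tails}
            if best_len is None or length < best_len:
                best_len, best = length, completions
            elif length == best_len:
                best |= completions
        return best_len, frozenset(best)

    return set(solve(tuple(0 for _ in paths_t))[1])


supersequences = all_scs(["abc", "abd"])
print(sorted(supersequences))    # ['abcd', 'abdc']
print("abcd" in supersequences)  # True: the base pattern abcd covers both flow paths
```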

Another solution for handling parallelism and multi-choice is to construct the execution semantics based on the discovery of frequent episodes/correlated events in the event log [18], [4]. Episodes are collections of events that occur in a partial order within time intervals of a given size. They can also be described as directed acyclic graphs and can be either serial or parallel, as presented in detail in [18]. If we assume that the event log and the process map used for replay are strongly connected, we can identify the execution semantics by discovering episodes in the event log. Thus, when choosing the flow paths for mapping, we should take the execution semantics into account rather than consider the flow paths in isolation.
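The sketch below illustrates how episode occurrences could hint at parallelism. It is a simplified, trace-level variant of window-based episode counting, assumed here for illustration, and is not the method of [18] or [4]. An episode occurs in a trace if some time window of the given width either contains all of its activities (parallel episode) or contains them in the given order (serial episode). Suppose the parallel episode {b, c, d} is frequent while none of its serial orderings is. In that case the three activities are good candidates for parallel execution. The toy log and all names are hypothetical.

```python
def occurs(trace: list[tuple[str, int]], episode: str, width: int, serial: bool) -> bool:
    """Does `episode` occur in `trace` within a time window of `width` time units?"""
    events = sorted(trace, key=lambda e: e[1])
    for _, start in events:
        # Activities inside the window [start, start + width), in time order
        window = [a for a, t in events if start <= t < start + width]
        if serial:
            remaining = iter(window)
            if all(a in remaining for a in episode):
                return True
        elif set(episode) <= set(window):
            return True
    return False


def episode_frequency(log, episode, width, serial=False) -> float:
    """Fraction of traces in the log in which the episode occurs."""
    return sum(occurs(t, episode, width, serial) for t in log) / len(log)


log = [
    [("a", 0), ("b", 1), ("c", 2), ("d", 3), ("e", 4)],
    [("a", 0), ("c", 1), ("d", 2), ("b", 3), ("e", 4)],
    [("a", 0), ("d", 1), ("b", 2), ("c", 3), ("e", 4)],
]
print(episode_frequency(log, "bcd", width=3))               # 1.0 (parallel episode)
print(episode_frequency(log, "bcd", width=3, serial=True))  # 0.3333333333333333 (serial b, c, d)
```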

• Dealing with loops: Another limitation of the current performance measurement approach is that it does not cover loops in the sub-process of a process map. For example, consider the trace t = abcbcdrs and the corresponding process map depicted in Figure 6.4. Assume that the set of base patterns starting with the current activity a is {abcd, abc, ars}. The chosen base pattern at the current position of activity a is abcd, which has an intermittent manifestation in the trace. The set of possible flow paths in the graph of the sub-process of the possible abstraction M is {abcd}. Currently, the loop bc in the graph of the sub-process of the possible abstraction M is not identified during replay. Since the base pattern abcd has an intermittent manifestation, the trace will be adjusted to abcdbcrs and activities in the

Figure 6.4: Process map example

This can be overcome by first discovering cycles in the graph [14] of the sub-processes. During replay, at the current position in the trace, we then look ahead in the trace to identify tandem arrays [6] (i.e., consecutive repetitions of the same sequence of activities) corresponding to the graph cycles.
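A depth-first search is one standard way to find cycles in the graph of a sub-process. Any edge that points back to a node still on the DFS stack closes a cycle. The sketch below reports one cycle per such back edge. This is sufficient for simple loops such as the one in the running example, but it does not enumerate all elementary cycles. The adjacency list for the sub-process of M is an assumption, since the text does not list the edges of Figure 6.4.

```python
def find_cycles(graph: dict[str, list[str]]) -> list[list[str]]:
    """Return one cycle (as a list of nodes) for each back edge found by DFS."""
    cycles: list[list[str]] = []
    stack: list[str] = []      # nodes on the current DFS path
    visited: set[str] = set()

    def dfs(u: str) -> None:
        visited.add(u)
        stack.append(u)
        for v in graph.get(u, []):
            if v in stack:
                # Back edge u -> v: the stack segment from v to u closes a cycle
                cycles.append(stack[stack.index(v):])
            elif v not in visited:
                dfs(v)
        stack.pop()

    for node in graph:
        if node not in visited:
            dfs(node)
    return cycles


# Hypothetical sub-process of M: a -> b -> c -> d with a loop back from c to b
sub_process_m = {"a": ["b"], "b": ["c"], "c": ["b", "d"], "d": []}
print(find_cycles(sub_process_m))  # [['b', 'c']]
```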

For the running example, while identifying the set of flow paths in the sub-process of the possible abstraction, we also detect cycles in the graph of the sub-process. At this step, we identify the cycle in the sub-process of abstraction M and its activities b and c. Next, from the current position in the trace, we look ahead for tandem arrays and identify the tandem array bcbc, whose repeating unit is bc. Before proceeding with the disambiguation, we first map the sequence bcbc to a temporary activity X. The trace then becomes aXdrs, and we proceed with the disambiguation. Finally, we update the performance metrics for the activities mapped to the temporary abstraction X. In this way, the loop construct is handled properly.
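The replacement of a tandem array by a temporary activity can be sketched as follows. Starting at the current position, the function scans the trace for the first place where the activity sequence of a cycle repeats at least twice in a row. It then replaces the whole repetition with a placeholder. Single-character activities, the placeholder name `X` and the function name are assumptions made for illustration.

```python
from typing import Optional


def collapse_tandem_array(trace: str, start: int, unit: str,
                          placeholder: str = "X") -> tuple[str, Optional[tuple[int, int]]]:
    """Replace the first tandem array unit^k (k >= 2) at or after `start`.

    Returns the new trace and (position, k), or (trace, None) if none is found.
    """
    if not unit:
        raise ValueError("the cycle's activity sequence must be non-empty")
    n = len(unit)
    for pos in range(start, len(trace) - n + 1):
        k = 0
        while trace[pos + k * n: pos + (k + 1) * n] == unit:
            k += 1  # count consecutive repetitions of the cycle
        if k >= 2:
            return trace[:pos] + placeholder + trace[pos + k * n:], (pos, k)
    return trace, None


print(collapse_tandem_array("abcbcdrs", 0, "bc"))  # ('aXdrs', (1, 2))
```

The returned position and repetition count identify which original events the temporary activity stands for. The performance metrics of those events can then be updated after the disambiguation.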

• Dealing with discrepancies in abstraction levels: Another limitation of the current performance measurement approach is that the control flow in the sub-process of an abstract node is not updated properly if the nodes belonging to that control flow are at different hierarchical levels in the process map. For the example process map depicted in Figure 6.5, consider the trace t = abcdef. Let us also assume that the set of base patterns starting with the current activity a is {abd, abc, ay}. The chosen base pattern at the current position of activity a is abc, which has a continuous manifestation in the trace. The set of possible flow paths in the graph of the sub-process of the possible abstraction X is {aM}. Currently, in the first iteration of replay, activity a is mapped to X and activity M is considered missing. The flow (a, M) is also considered skipped.

This issue can be solved as follows: when parsing a trace during replay, at each iteration we consider for mapping only the activities that map to nodes at the same level in the process map as the level of the trace.

For a better understanding, we define the level of the trace based on the level of the process map (analogous to the level of the log defined in Section 3.4). It is the deepest possible level in the hierarchy at which an activity in the trace is encountered in the process map. With the suggested improvement of the replay algorithm, we first determine the level of the trace in the running example. The node mapping to activity a is at level 1 in the process map, and the nodes mapping to activities b and c are at level 0. Thus, the level of the trace is 0. In the first iteration of replay, all activities in the trace at level 0 are mapped to their possible abstractions, i.e., b and c are mapped to M. In the next iteration, all activities in the trace (i.e., the abstract trace aM, at level 1) are at the same level, and the control flows can be updated properly: a and M are mapped to X and the flow (a, M) is updated.
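One iteration of this level-by-level replay can be sketched as below, applied to the base pattern abc of the running example. The level of the trace is the minimum (deepest) node level among its activities. Each maximal run of consecutive activities at that level that share a parent is replaced by that parent. The flows inside the run are recorded for the parent's sub-process. The `level` and `parent` dictionaries are hypothetical encodings of Figure 6.5, reconstructed only from the description in the text.

```python
from itertools import groupby


def lift_trace(trace, level, parent):
    """Map the deepest-level activities of `trace` to their parent abstractions.

    Returns (trace level, lifted trace, flows updated in this iteration).
    """
    trace_level = min(level[x] for x in trace)
    lifted, flows = [], []
    runs = groupby(trace, key=lambda x: parent[x] if level[x] == trace_level else None)
    for abstraction, run in runs:
        run = list(run)
        if abstraction is None:
            lifted.extend(run)  # higher-level activities wait for a later iteration
        else:
            lifted.append(abstraction)
            flows.extend(zip(run, run[1:]))  # control flows inside `abstraction`
    return trace_level, lifted, flows


level = {"a": 1, "b": 0, "c": 0, "M": 1}
parent = {"a": "X", "b": "M", "c": "M", "M": "X"}
step1 = lift_trace(["a", "b", "c"], level, parent)
print(step1)  # (0, ['a', 'M'], [('b', 'c')])
step2 = lift_trace(step1[1], level, parent)
print(step2)  # (1, ['X'], [('a', 'M')])
```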

An activity in the event log may map to multiple nodes in the process map at different hierarchical levels. To choose the level of the trace in this case, we first choose the best possible level, based on the context in the event log, before proceeding to the disambiguation phase.

For this, we can consider the base pattern of the current activity. We then choose the candidate node (and thus the corresponding level) whose containing graph path best fits the base pattern. This approach is similar to Step 24 of the replay algorithm (Algorithm 1).
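The text leaves the fit measure open. As one possible choice, assumed here for illustration, the sketch below scores each candidate node by the longest common subsequence between the base pattern and the graph path containing the node. It then picks the highest-scoring candidate. The candidate encoding and all names are hypothetical, and the sketch does not reproduce Step 24 of Algorithm 1.

```python
def lcs_length(x: str, y: str) -> int:
    """Length of the longest common subsequence of x and y (dynamic programming)."""
    prev = [0] * (len(y) + 1)
    for a in x:
        cur = [0]
        for j, b in enumerate(y, start=1):
            cur.append(prev[j - 1] + 1 if a == b else max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def choose_node(candidates: dict[str, tuple[int, str]], base_pattern: str) -> str:
    """Pick the candidate node whose graph path best fits the base pattern.

    `candidates` maps a node id to (hierarchy level, graph path containing it).
    """
    return max(candidates, key=lambda node: lcs_length(candidates[node][1], base_pattern))


# Hypothetical: activity c maps to a level-0 node on path "bc" and a level-1 node on path "cy"
candidates = {"c@level0": (0, "bc"), "c@level1": (1, "cy")}
best = choose_node(candidates, "abc")
print(best, "-> trace level", candidates[best][0])  # c@level0 -> trace level 0
```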

Figure 6.5: Process map example

Besides the limitations of the replay algorithm mentioned above, several points still need to be addressed. The computational complexity of the replay algorithm has not yet been analyzed. Moreover, the empirical evaluation of the performance analysis plugin using real-life event logs is left for future work.

Complex event logs, with variation in both the number of events/traces and the manifestation of process model constructs, need to be used both for generating complex hierarchical process models and for replaying event logs onto process maps.

Definition A.0.1 (Event Logs)

An event log L is defined as $L = (\Sigma, (T, f), E, time)$, where:

• $\Sigma$ denotes the set of distinct activities/event classes;

• $E$ is the set of all events in the log;

• $time : E \to \mathbb{R}^{+}_{0}$ is a function assigning a timestamp to each event, i.e., $time(e)$ is the time of event $e$;

• $T$ is the set of traces;

• $f : T \to \mathbb{N}_{\geq 1}$ gives the number of occurrences of a trace $t \in T$.

Definition A.0.2 (Trace)

A trace t is a finite sequence of events $t \in E^{*}$ such that each event appears only once and time is non-decreasing, i.e., for $1 \le i < j \le |t|$: $t(i) \neq t(j)$ and $time(t(i)) \le time(t(j))$.

Moreover:

• $t(i)$ denotes the name of the $i$-th activity in $t$ (unless specified otherwise);

• $t(i, j)$, $i < j$, denotes the subsequence from the $i$-th to the $j$-th position in the trace;

• $mapped : E \to N$ is a function mapping an event to a node in the process map (it is obtained after replaying the log onto the process map).

Let $L = (\Sigma, (T, f), E, time)$ be an event log. Moreover, let $PM$ be a process map and let $N$ be the set of nodes in the map.

A.1 Process KPIs

Process KPIs refer either to performance metrics which are measured at the level of process instances (i.e., traces) or to performance metrics which are measured on a process level and can be computed solely from event logs without a process model.

• Performance metrics computed from event logs without a process model:

1. Number of traces

The total number of traces in L: $\sum_{t \in T} f(t)$

2. Number of unique traces

The total number of unique traces in L: $|T|$

3. Arrival rate of traces

The number of traces that arrive per unit time in L: $\dfrac{\sum_{t \in T} f(t)}{\max_{e \in E} time(e) - \min_{e' \in E} time(e')}$

• Performance metrics computed from event logs with a process model:

1. Activities in the model but not in the event log: $S_{PM} = N \setminus \Sigma$

2. Activities in the event log but not in the process model: $S_L = \Sigma \setminus N$
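The process KPIs above can be computed directly from a log, as the following sketch shows. It assumes that the event log is given as a list of cases, each a list of (activity, timestamp) pairs. The distinct activity sequences play the role of T, and their counts play the role of f. The function name and the input format are assumptions made for illustration.

```python
from collections import Counter


def process_kpis(cases: list[list[tuple[str, float]]], model_nodes: set[str]) -> dict:
    # f : T -> N>=1, counting how often each activity sequence (trace) occurs
    f = Counter(tuple(a for a, _ in case) for case in cases)
    times = [t for case in cases for _, t in case]
    sigma = {a for case in cases for a, _ in case}  # activities in the log
    n_traces = sum(f.values())
    span = max(times) - min(times)
    return {
        "number_of_traces": n_traces,
        "number_of_unique_traces": len(f),
        # undefined when all events share one timestamp
        "arrival_rate": n_traces / span if span > 0 else float("nan"),
        "in_model_not_in_log": model_nodes - sigma,  # S_PM = N \ Sigma
        "in_log_not_in_model": sigma - model_nodes,  # S_L = Sigma \ N
    }


cases = [
    [("a", 0), ("b", 2), ("c", 3)],
    [("a", 4), ("b", 6), ("c", 8)],
    [("a", 5), ("c", 9)],
]
print(process_kpis(cases, {"a", "b", "c", "d"}))
```

For this toy log there are three traces, of which two are unique. The arrival rate is 3/9 traces per time unit, and activity d occurs in the model but not in the log.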

A.2 Node KPIs

Let $t(i, j)$, $i \le j$, be a sequence of activities such that $\forall k, i \le k \le j: mapped(t(k)) = n$, $n \in N$. Let $T_I$ be the set of all sequences $t(i, j)$ in L that are mapped to $n$.

1. Execution time of a node

The execution time is computed for abstract activities only. The execution time of an instance $t(i, j)$ mapped to $n$ is $exectime(n) = t(j) - t(i)$, where $t(i)$ and $t(j)$ here denote the timestamps of the first and last events of the instance.

For this KPI several statistics are calculated:

• The average execution time of all instances of node n in L: $\dfrac{\sum_{t(i,j) \in T_I} (t(j) - t(i))}{|T_I|}$

• The minimum execution time of all instances of node n in L: $\min_{t(i,j) \in T_I} (t(j) - t(i))$

• The maximum execution time of all instances of node n in L: $\max_{t(i,j) \in T_I} (t(j) - t(i))$

2. Number of executions

The number of times all instances of node n are executed in L: $|T_I|$

3. Number of times a node is skipped

Let $N_s$ be the set of flow paths $p_n = (v, v', [e_1 \ldots e_k])$ that have a skipped node $n$. The number of times node $n$ is skipped is $|N_s|$.

4. Activity initialization frequency

The total number of traces in L that start with an instance of node n: $\sum_{t \in Init_n} f(t)$, where $Init_n = \{t \in T \mid mapped(t(1)) = n\}$ is the set of traces that start with n;

5. Activity termination frequency

The total number of traces in L that end with an instance of node n: $\sum_{t \in Term_n} f(t)$, where $Term_n = \{t \in T \mid mapped(t(|t|)) = n\}$ is the set of traces that end with n.
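The node KPIs can be sketched as follows. The sketch assumes that the replayed log is given as a list of cases whose events are (activity, timestamp, mapped node) triples. It also assumes that the instances t(i, j) of a node are the maximal runs of consecutive events mapped to that node. Each case is counted once, so repeated traces must be listed separately (equivalent to weighting by f(t)). Skipped-node counts are omitted because they depend on the replay's flow-path bookkeeping. The function name is hypothetical.

```python
from itertools import groupby
from statistics import mean


def node_kpis(cases, node):
    durations = []  # execution time t(j) - t(i) of each instance t(i, j) of `node`
    for case in cases:
        for mapped, run in groupby(case, key=lambda ev: ev[2]):
            if mapped == node:
                run = list(run)
                durations.append(run[-1][1] - run[0][1])
    return {
        "executions": len(durations),  # |T_I|
        "avg_exec_time": mean(durations) if durations else None,
        "min_exec_time": min(durations, default=None),
        "max_exec_time": max(durations, default=None),
        "init_frequency": sum(1 for c in cases if c and c[0][2] == node),
        "term_frequency": sum(1 for c in cases if c and c[-1][2] == node),
    }


cases = [
    [("a", 0, "X"), ("b", 2, "X"), ("c", 5, "Y")],
    [("a", 1, "X"), ("b", 4, "X"), ("c", 6, "Y"), ("d", 9, "X")],
]
print(node_kpis(cases, "X"))
```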

mapped to the source node n and t(j) is mapped to the target node $n'$. For this KPI several statistics are calculated:

• The average execution time of all instances of an edge f: $\dfrac{\sum_{(t(i),t(j)) \in M_T} (t(j) - t(i))}{|M_T|}$

• The minimum execution time of all instances of an edge f: $\min_{(t(i),t(j)) \in M_T} (t(j) - t(i))$

• The maximum execution time of all instances of an edge f: $\max_{(t(i),t(j)) \in M_T} (t(j) - t(i))$

where an instance of an edge refers to the situation in which control is passed from the source node to the target node of the edge. That is, the activity corresponding to the source node and the activity corresponding to the target node are executed consecutively in L.

2. Number of executions

The number of times edge f is executed, i.e., the number of times control is passed from the source node to the target node of the edge: $|M_T|$

3. Number of times an edge is missing

Let $E_m$ be the set of flow paths $p_n = (v, v', [e_1 \ldots e_k])$ that have a missing edge $e$. The number of times edge $e$ is missing is $|E_m|$.
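The edge KPIs can be sketched in a similar way. The sketch assumes that the replayed log is a list of cases whose events are (activity, timestamp, mapped node) triples. An instance of an edge (n, n') is taken to be a pair of consecutive events whose mapped nodes are n and n'. Missing-edge counts are omitted, since they depend on the replay's flow-path bookkeeping. The function name is hypothetical.

```python
from statistics import mean


def edge_kpis(cases, source, target):
    # Durations t(j) - t(i) over all instances (t(i), t(j)) in M_T
    durations = [
        nxt[1] - cur[1]
        for case in cases
        for cur, nxt in zip(case, case[1:])
        if cur[2] == source and nxt[2] == target
    ]
    return {
        "executions": len(durations),  # |M_T|
        "avg_exec_time": mean(durations) if durations else None,
        "min_exec_time": min(durations, default=None),
        "max_exec_time": max(durations, default=None),
    }


cases = [
    [("a", 0, "X"), ("b", 2, "X"), ("c", 5, "Y")],
    [("a", 1, "X"), ("b", 4, "X"), ("c", 6, "Y"), ("d", 9, "X")],
]
print(edge_kpis(cases, "X", "Y"))
```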

Bibliography

[1] ProM framework. http://prom.win.tue.nl/tools/prom/.

[2] ProM6 framework. http://prom.win.tue.nl/tools/prom6/.

[3] A. Adriansyah. Performance Analysis of Business Processes from Event Logs and Given Process Models. Master's thesis, Eindhoven University of Technology, Eindhoven, 2009.

[4] R. Agrawal and R. Srikant. Mining Sequential Patterns. In Data Engineering, 1995. Proceedings of the Eleventh International Conference on, pages 3-14. IEEE, 2002.

[5] Christian Blum, Carlos Cotta, Antonio J. Fernandez, and Jose E. Gallardo. A Probabilistic Beam Search Approach to the Shortest Common Supersequence Problem. In EvoCOP, pages 36-47, 2007.

[6] R.P. Jagadeesh Chandra Bose and Wil M. P. van der Aalst. Abstractions in Process Mining: A Taxonomy of Patterns. In BPM, pages 159-175, 2009.

[7] R.P. Jagadeesh Chandra Bose and Wil M. P. van der Aalst. Trace Alignment in Process Mining: Opportunities for Process Diagnostics. In BPM, pages 227-242, 2010.

[8] M. Bozkaya. Business Process Analysis with Semantic Dotted Chart. Master's thesis, Eindhoven University of Technology, Eindhoven, 2009.

[9] B. F. van Dongen. Process Mining and Verification. PhD thesis, Eindhoven University of Technology, Eindhoven, 2007.

[10] B. F. van Dongen and A. Adriansyah. Process Mining: Fuzzy Clustering and Performance Visualization. In BPM (Workshops), pages 158-169, 2009.

[11] Christian W. Günther. Process Mining in Flexible Environments. PhD thesis, Eindhoven University of Technology, Eindhoven, 2009.

[12] Christian W. Günther and Wil M. P. van der Aalst. Fuzzy Mining - Adaptive Process Simplification Based on Multi-perspective Metrics. In BPM, pages 328-343, 2007.

[13] P.T.G. Hornix. Performance Analysis of Business Processes through Process Mining. Master's thesis, Eindhoven University of Technology, Eindhoven, 2007.

[14] A. Kamil. Graph Algorithms, 2003.

[15] Jiří Kubalík. Efficient stochastic local search algorithm for solving the shortest common supersequence problem. In GECCO, pages 249-256, 2010.

[16] J. Li, R.P. Jagadeesh Chandra Bose, and Wil M. P. van der Aalst. Mining Context-Dependent and Interactive Business Process Maps using Execution Patterns. In BPM (Workshops), 2010.

[20] M. Song and Wil M. P. van der Aalst. Supporting Process Mining by Showing Events at a Glance. In 17th Annual Workshop on Information Technologies and Systems (WITS), pages 139-145, 2007.

[21] K.J.F.R. van Uden. Extracting User Profiles with Process Mining at Philips Medical Systems. Master's thesis, Eindhoven University of Technology, Eindhoven, 2008.

[22] Wil M. P. van der Aalst, H. T. de Beer, and B. F. van Dongen. Process Mining and Verification of Properties: An Approach Based on Temporal Logic. In OTM Conferences (1), pages 130-147, 2005.

[23] Wil M. P. van der Aalst, M.H. Schonenberg, and M. Song. Time prediction based on process mining. BPM Center Report BPM-09-04, 2009.

[24] Wil M. P. van der Aalst, A.J.M.M. Weijters, and A.K. Alves de Medeiros. Process Mining with the Heuristic Miner Algorithm. Technical report, Eindhoven University of Technology, Eindhoven, 2006. BETA Working Paper Series, WP 166.

[25] Wil M. P. van der Aalst, Ton Weijters, and Laura Maruster. Workflow Mining: Discovering Process Models from Event Logs. IEEE Trans. Knowl. Data Eng., 16(9):1128-1142, 2004.

In document: Eindhoven University of Technology MASTER, Discovery and analysis of field service engineer process using process mining, Rusu, S.M. (pages 91-100)