
model. We represent flow paths as the sequences of nodes in a path.

For our previous example, the flow path will be efgh.

The activity-process node association graph provides a flat structure of the process map. This data structure is used in the replay algorithm presented in the following section.

3.4 Replay algorithm

In order to obtain performance information from an event log and project it onto a process map, we replay the log on the given process map. The approach is generic and does not make any assumption about the structure of the process.

For the replay algorithm, we first need to identify the number of hierarchical levels in the process map, which represents the depth of the process map. The level of the log is computed based on the levels of the process map: it is the deepest level in the hierarchy at which an activity in the log is encountered in the process map.

For the example in Figure 3.3, the number of hierarchical levels in the map is equal to 2. If the event log contains any of the activities a complete, b complete, c complete, q complete, l complete, m complete, n complete, p complete, or s complete, then the level of the log will be 0; if the event log contains any of the activities i complete, j complete, e complete, f complete, g complete, h complete, r complete, Abs2 complete, or Abs0 complete, then the level of the log will be 1; otherwise the level of the log will be 2. If the event log contains activities at level 0 in the map and also activities at level 1 in the map, the level of the log will be 0, being the deepest level in the hierarchy.
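The rule above can be illustrated with a minimal Python sketch. The dictionary `ACTIVITY_LEVEL` and the function `level_of_log` are hypothetical names introduced here for illustration; the level assignments simply transcribe the example for Figure 3.3, with 0 as the deepest level and 2 as the top level.

```python
# Hypothetical lookup table: activity name -> level in the process map.
# Level 0 is the deepest level; activities not listed are assumed to sit
# at the top level of the map (level 2 in the Figure 3.3 example).
ACTIVITY_LEVEL = {
    **{a: 0 for a in "abcqlmnps"},
    **{a: 1 for a in ["i", "j", "e", "f", "g", "h", "r", "Abs2", "Abs0"]},
}


def level_of_log(traces, activity_level, map_depth):
    """Return the deepest (numerically smallest) level of any activity in the log."""
    levels = [
        activity_level[activity]
        for trace in traces
        for activity in trace
        if activity in activity_level
    ]
    # If no activity is found below the top level, the log is at the top level.
    return min(levels, default=map_depth)
```

For instance, a log containing the trace abcdeghfi mixes level-0 and level-1 activities, so the sketch reports level 0.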

The replay algorithm is a multi-phase iterative approach. The logic of the algorithm relies on identifying the common sub-sequences of activities in the event log that have a corresponding (exact/approximate) flow path in the process map. These sequences are identified based on base patterns [16] in the event log. The basic idea is that the algorithm starts from the level of the log and first tries to map the continuous/intermittent/approximate manifestation of each pattern to its abstract activity at the appropriate level in the process map. New abstract traces are created by replacing the sequence of activities with the abstract activity. In the next iteration, the base patterns are identified in the new traces, and these traces are parsed using the same algorithm. This is repeated until all activities in the initial log are mapped up to the highest possible level of the process map.

Each trace in the log is treated independently; therefore, we iterate over all traces and scan each trace from left to right. Iterating and parsing all traces will result in a new event log L′, which consists of the transformed traces in which sequences of activities are mapped to an abstract/primitive node in the map and are replaced

in the log up to the highest possible level in the process map, and breadth first because of the mapping of sequences in the event log to a corresponding control flow in the process map.

Algorithm 1 presents the replay of a given event log on a process map. Steps 1-14 in the algorithm deal with the discovery of common patterns of invocation of activities in traces proposed in [6]. We first construct a single sequence, obtained by concatenating the traces in the event log with a distinct delimiter between traces.

Then base patterns (maximal repeats and loop patterns) are identified across all traces in the event log using the concatenated sequence (Step 1, Algorithm 1). Let us consider the process map given in Figure 3.3 and the corresponding activity-process node association graph in Figure 3.4. We present two examples of replaying a trace in an event log on the process map. These two traces belong to an event log L in which, at the first step of the replay algorithm, we identify a total of 39 base patterns. A few of these base patterns are {abcde, abc, d, efgh, egh, fij, jxy, k}.

After base pattern discovery, each trace in the event log is treated independently and we scan each trace from left to right to identify the longest pattern that has a manifestation in the trace at the current position. The length of the sequence, being the length of the base pattern, is not a fixed value, but is a characteristic of the log. Thus, for our running example, Figure 3.5 shows a sequence of events in a single trace t = abcdeghfi.

Figure 3.5: An example eliciting the approach to replay a trace in an event log on a process map
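The left-to-right scan for the longest pattern can be sketched as follows. This is only an illustration under simplifying assumptions: the function name `longest_continuous_match` is hypothetical, activities are single characters, and only continuous manifestations are checked (intermittent and approximate manifestations are not handled here).

```python
def longest_continuous_match(trace, j, base_patterns):
    """Return the longest base pattern that occurs contiguously in `trace` at index j.

    Candidates are the patterns starting with the current activity trace[j],
    tried in descending order of length. Returns None if none matches exactly.
    """
    candidates = sorted(
        (p for p in base_patterns if p and p[0] == trace[j]),
        key=len,
        reverse=True,
    )
    for pattern in candidates:
        if trace[j:j + len(pattern)] == pattern:
            return pattern
    return None


# A few of the base patterns quoted in the text.
BASE_PATTERNS = ["abcde", "abc", "d", "efgh", "egh", "fij", "jxy", "k"]
```

On the running example t = abcdeghfi, the sketch selects abcde at index 0 and egh at index 4; at index 7 it finds no exact match, because fij is only partially present in the trace.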

This trace is at level 0, so we need to perform three iterations: the first two iterations map all activities in the log up to the highest level in the map, level 2 (there are two hierarchical levels in the process map), while the third iteration updates the control flow at the highest level in the map. Each activity in the trace can be associated with zero, one, or more possible nodes in the map. The replay starts by scanning the trace from left to right.


5: Get SN, the set of corresponding nodes in G of the current activity t(j)
6: if SN ≠ {} then
7:   Let LPs be the list of patterns in PLb starting with t(j), ordered in descending order of their length
8:   for every pattern α ∈ LPs do
9:     if there exists a continuous/intermittent/approximate manifestation of α at index j in t then
10:      if intermittent manifestation then
11:        Re-adjust the intermittent manifestation in t
12:      else if approximate manifestation then
13:        Re-adjust the base pattern α
14:      end if
15:      Let PA be the set of possible abstractions of t(j)
16:      Let GP be the set of all flow paths: GP = {}
24:      Choose the abstraction a and the corresponding flow path β that best fits the base pattern α
25:      Let α′ be the adjusted pattern, i.e. the maximum common subsequence between the base pattern α and β
26:      UpdatePerformance(α′, β, a); mapped(α′) = a

• The first activity in the trace is t(0) = a.

Taking the list of base patterns starting with the current activity (Step 7, Algorithm 1), ordered in descending order of pattern length, we look for a continuous/intermittent/approximate manifestation of the base pattern in the trace (Step 9, Algorithm 1).

The list of base patterns starting with a is {abcde, abc} (Step 7, Algorithm 1). The longest fitting subsequence at this position in the trace is abcde. The chosen base pattern is α = abcde, where α has a continuous manifestation in the trace (Step 9, Algorithm 1).

The sequence of activities in the base pattern starting with the current activity can have several possible abstract nodes to be mapped to. In order to choose an abstraction, we first get the set of flow paths in the sub-process of all possible abstractions (Step 19, Algorithm 1). If there is no possible abstraction, the set of flow paths will be the node corresponding to the individual activity (Step 21, Algorithm 1).

For the current activity a, the possible abstraction is node Abs2 (Step 15, Algorithm 1). The set of all flow paths of the abstraction node is {abc}, so we choose the flow path β = abc. In this case the base pattern is adjusted to be α′ = abc (Step 25, Algorithm 1) and all activities in α′ will be mapped to the abstract node Abs2, as depicted in Figure 3.6.

Figure 3.6: Mapping of sequence abc to abstract node Abs2

Further on, we update performance for the activities in the mapped base pattern α′ = abc (Step 26, Algorithm 1):

- the nodes corresponding to activities a, b, and c have the number of executions incremented by 1
- the node corresponding to activity a has the initialization frequency incremented by 1
- the abstraction Abs2 has execution time incremented by c − a
- the flows (a,b) and (b,c) have number of executions incremented by 1 and execution times incremented by b − a and c − b, respectively
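The bookkeeping listed above can be sketched in Python. This is not the UpdatePerformance procedure of Algorithm 1, whose details are not given here; the names `new_stats` and `update_performance`, the counter keys, and the use of numeric timestamps are all assumptions made for illustration. Expressions such as c − a are read as differences between the timestamps of the corresponding events.

```python
from collections import defaultdict


def new_stats():
    """Create an empty, hypothetical performance store: counter name -> key -> value."""
    return defaultdict(lambda: defaultdict(float))


def update_performance(stats, events, abstraction):
    """Update counters for one mapped pattern.

    `events` is the ordered list of (activity, timestamp) pairs of the mapped
    pattern; `abstraction` is the node the pattern was mapped to.
    """
    first_activity, first_ts = events[0]
    last_ts = events[-1][1]

    for activity, _ in events:
        stats["node_executions"][activity] += 1
    stats["node_initializations"][first_activity] += 1
    stats["abstraction_time"][abstraction] += last_ts - first_ts

    # Each pair of consecutive events corresponds to one flow (edge).
    for (src, src_ts), (dst, dst_ts) in zip(events, events[1:]):
        stats["flow_executions"][(src, dst)] += 1
        stats["flow_time"][(src, dst)] += dst_ts - src_ts
```

With illustrative timestamps 0, 2, and 5 for a, b, and c, the abstraction Abs2 accumulates an execution time of 5, and the flows (a,b) and (b,c) accumulate 2 and 3, respectively.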

• Next activity under inspection t(3) = d. There exists only one base pattern starting with d in L, i.e., α = d, so the possible abstraction is {d}. The flow path is also d. Finally, activity d is mapped to the primitive node d, as depicted in Figure 3.7.

Figure 3.7: Mapping of sequence d to primitive node d

- the node d has the number of executions incremented by 1

• Next activity under inspection t(4) = e. The list of base patterns starting with e is {efgh, egh}. The longest fitting subsequence is the base pattern α = egh. The possible abstraction is the abstract node Abs4 with the corresponding set of flow paths {efgh, egh}.

If the current activity has several possible flow paths, at Step 24 of Algorithm 1 we identify the abstraction and the corresponding flow path that best fits the base pattern. The approach is presented in detail in Algorithm 2. The algorithm first checks whether there exists a flow path that directly matches the base pattern, i.e., the base pattern equals the flow path. If this is the case, then we have a direct match and the algorithm returns the flow path and the corresponding abstraction.

Therefore, we choose β = egh directly matching α (Step 1, Algorithm 2).

Activities e, g, and h are directly mapped to Abs4, as depicted in Figure 3.8. Further on, we update performance for the activities in the mapped base pattern:

- the nodes corresponding to activities e, g, and h have the number of executions incremented by 1
- the abstraction Abs4 has execution time incremented by h − e
- the flows (e,g) and (g,h) have number of executions incremented by 1 and execution times incremented by g − e and h − g, respectively

pattern

1: if there is a path p ∈ GP such that p = α then
2:   Let n′ be the abstract node, the sub-process of which has a flow path p
3:   β = p; a = n′; return;
4: else
5:   Let ld_min = min_{p ∈ GP} LD(α, p) be the minimum distance between the base pattern α and the set of paths in GP
6:   Let GP_min be the set of all paths with minimal distance ld_min
7:   if |GP_min| > 1 then
8:     Let PMN be the list of previous mapped nodes, in reverse order (first element is the last mapped node)
9:     for all p ∈ GP_min do
10:      Let n′ be the abstract node, the sub-process of which has a flow path p
16:      Choose the path with the maximum intersection of activities between p and α, and the corresponding abstraction a; return;
17:      Randomly choose β from GP_min and the corresponding abstraction a; return;
18:    end for
19:  else
20:    Let n′ be the abstract node, the sub-process of which has a flow path p ∈ GP_min
21:    β = p; a = n′; return;
22:  end if
23: end if
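The direct-match test and the minimal-distance selection (Steps 1 to 6 of Algorithm 2) can be sketched in Python as follows. The function names `levenshtein` and `best_fitting_paths` are hypothetical, activities are assumed to be single characters, and the tie-breaking on previously mapped nodes is deliberately left out: the sketch simply returns every path at minimal distance.

```python
def levenshtein(s, t):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def best_fitting_paths(pattern, flow_paths):
    """Return (distance, candidate paths) for a base pattern.

    `flow_paths` maps each flow path to its abstraction node. A direct match
    gives distance 0; otherwise all paths at minimal distance are returned.
    """
    if pattern in flow_paths:
        return 0, [pattern]
    distances = {path: levenshtein(pattern, path) for path in flow_paths}
    ld_min = min(distances.values())
    return ld_min, [path for path, d in distances.items() if d == ld_min]
```

When more than one path is returned, a further criterion is needed to pick β, which is what Steps 7 to 18 of Algorithm 2 address.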

Figure 3.8: Mapping of sequence egh to abstract node Abs4

• Next activity under inspection t(7) = f. The list of base patterns starting with f is {fij}. The pattern has an approximate manifestation in the trace and will be adjusted to α = fi (Step 13, Algorithm 1). An approximate manifestation refers to the situation where the execution of the subsequence corresponding to a pattern is not complete; it captures optional activities in a process model. The manifestation of the base pattern fij in the trace abcdeghfi is approximate because only fi, not the complete fij, occurs. Thus, we adjust the base pattern to α = fi.

The set of possible abstractions of f is {Abs1, Abs4} and the sets of flow paths in these abstract nodes are Abs1: {fij} and Abs4: {fghe}.

If there is no direct match between the base pattern and the possible flow paths, we propose an approach to determine the best fitting flow path. The approach is based on the minimal edit distance between the flow path and the base pattern. To obtain the edit distance, we use the Levenshtein distance, which seeks the minimum number of edit operations (insertions, deletions, and substitutions) needed to transform one sequence into another. We consider both the flow path and the base pattern and compute the Levenshtein distance (LD) between these two sequences (Step 5, Algorithm 2).

The Levenshtein distance between the adjusted base pattern fi and the flow path is computed for all flow paths: {fij} and {fghe}. The minimal distance is 1 for β = fij (Steps 5 and 6, Algorithm 2), and activities f and i are mapped to the Abs1 node, as presented in Figure 3.9.

Figure 3.9: Mapping of sequence fi to abstract node Abs1

Further on, we update performance for the activities in the mapped base pattern:

- the nodes corresponding to activities f and i have the number of executions incremented by 1
- the flow (i,j) has the number of missing edges incremented by 1

→ The resulting abstract trace is t′ = Abs2 d Abs4 Abs1. All traces in L are abstracted in this way; thus L is transformed into L′.
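The construction of the abstract trace at the end of an iteration can be sketched as a simple span replacement. The function `abstract_trace` and the span representation (start index, end index, node) are assumptions made for illustration; the spans below transcribe the mappings obtained in the first iteration of the running example.

```python
def abstract_trace(trace, spans):
    """Replace each mapped span of `trace` with the node it was mapped to.

    `spans` is a list of non-overlapping (start, end, node) triples with
    `end` exclusive. Events not covered by any span are kept unchanged.
    """
    result, pos = [], 0
    for start, end, node in sorted(spans):
        result.extend(trace[pos:start])  # keep unmapped events as they are
        result.append(node)
        pos = end
    result.extend(trace[pos:])
    return result


# Mappings from the first iteration: abc -> Abs2, d -> d, egh -> Abs4, fi -> Abs1.
FIRST_ITERATION_SPANS = [(0, 3, "Abs2"), (3, 4, "d"), (4, 7, "Abs4"), (7, 9, "Abs1")]
```

Applied to abcdeghfi, the sketch yields the sequence Abs2, d, Abs4, Abs1, which is the input of the second iteration.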

2. Second iteration

For the second iteration, we identify the set of base patterns in the abstract log L′; {Abs2, Abs4, Abs1} are a few of them. After we have discovered the set of base patterns, we scan each trace from left to right.

• The first activity in the trace is t′(0) = Abs2. The set of base patterns starting with Abs2 in L′ is {Abs2 d, Abs2}, and we choose α = Abs2 d.

The possible abstraction is Abs20 (Step 15, Algorithm 1). The set of all flow paths of the abstraction node is {Abs2 Abs0}, so we choose the flow path β = Abs2 Abs0 (Step 21, Algorithm 2). The base pattern will be adjusted to α′ = Abs2 and all activities in α′ will be mapped to the abstract node Abs20 (Step 26, Algorithm 1), as depicted in Figure 3.10.

Figure 3.10: Mapping of sequence Abs2 to abstract node Abs20

- the node corresponding to Abs2 has the number of executions incremented by 1
- the node corresponding to Abs2 has the initialization frequency incremented by 1
- the node corresponding to Abs0 has the number of times skipped incremented by 1
- the flow corresponding to (Abs2,Abs0) has the number of missing edges incremented by 1


• Next activity under inspection t′(1) = d. The activity has already been mapped to the highest possible level in the map.

• Next activity under inspection t′(2) = Abs4. There exists only one base pattern starting with Abs4 in L′, i.e., α = Abs4. As presented in Figure 3.11, the possible abstraction is the abstract node Abs4, the corresponding set of flow paths is {Abs4}, and we choose β = Abs4.

Figure 3.11: Mapping of sequence Abs4 to abstract node Abs4

Further on, we update performance for the activities in the mapped base pattern:

- the node corresponding to Abs4 has the number of executions incremented by 1

• Next activity under inspection t′(3) = Abs1, with the possible mapping node Abs1. There exists only one base pattern starting with Abs1 in L′, i.e., α = Abs1. The possible abstraction is the abstract node Abs1. As presented in Figure 3.12, the corresponding set of flow paths is {Abs1} and we choose β = Abs1.

Figure 3.12: Mapping of sequence Abs1 to abstract node Abs1

3. Third iteration

• The first activity in the trace t″ is t″(0) = Abs20. There exists only one base pattern starting with Abs20 in L′, i.e., α = Abs20. The possible abstraction is the abstract node Abs20. As depicted in Figure 3.13, the corresponding set of flow paths is {Abs20} and we choose β = Abs20.

Figure 3.13: Mapping of sequence Abs20 to abstract node Abs20

Further on, we update performance for the activities in the mapped base pattern:

- the node corresponding to Abs20 has the number of executions incremented by 1
- the node corresponding to Abs20 has the initialization frequency incremented by 1

→ As depicted in Figure 3.14, the entire trace t″ is mapped at the highest level in the process map, so we update the control flow at the highest level (Step 37, Algorithm 1). The flow path that contains the nodes Abs20, d, Abs4, and Abs1 is Abs20 d Abs4 Abs1:

- the flows (Abs20,d), (d,Abs4), and (Abs4,Abs1) have the number of executions and the execution time incremented

Figure 3.14: Update control flow at the highest level in the process map

All activities in the trace are unambiguously mapped to some node in the process map. Each time a sequence of activities is mapped to a node, the sequence is replaced in the trace with the corresponding abstract node. Once trace t is parsed, the result is a new abstract trace t′. This trace is parsed using the same algorithm, resulting in the abstract trace t″. Since the level of the event log (considering the example trace) is 0, the number of iterations is equal to 3. The final trace t″ is at the highest level of abstraction. When parsing the trace t and updating performance, if the trace has duplicates in the event log (i.e., f(t) ≥ 2), the performance is updated for all the duplicate traces.

Let us look at another example trace, illustrated in Figure 3.15. The trace t is represented as the sequence of events t = abckefgjhxy. The performance update follows the same lines as in the previous example and is not presented in detail here.

Figure 3.15: Another example eliciting the approach to replay a trace in an event log on a process map

1. First iteration

• For the first activity in the trace, t(0) = a, the list of base patterns starting with a is {abcde, abc} (Step 7, Algorithm 1). The longest fitting subsequence at this position in the trace is abc. The chosen base pattern is α = abc, where α has a continuous manifestation in the trace (Step 9, Algorithm 1). The possible abstraction is node Abs2 (Step 15, Algorithm

Figure 3.16: Mapping of sequence abc to abstract node Abs2

• Next activity under inspection t(3) = k. There exists only one base pattern starting with k in L, i.e., α = k, so the possible abstraction is {k}. The flow path is also k. Finally, activity k is mapped to the primitive node k, as depicted in Figure 3.17.

Figure 3.17: An example eliciting the approach to replay a trace in an event log on a process map

• Next activity under inspection t(4) = e. The list of base patterns starting with e is {efgh, egh}. The longest fitting subsequence is the base pattern α = efgh. This pattern has an intermittent manifestation in the trace.

An intermittent manifestation refers to the situation where the execution of the subsequence corresponding to a pattern is interrupted by other activities. For the base pattern efgh, the manifestation in the trace abckefgjhxy is intermittent because efgh is interrupted by j.

Before proceeding, the trace has to be adjusted. As we can see in Figure 3.18, in the resulting trace t_adjusted, activity h is moved before activity j. Further on, the possible abstraction is the abstract node Abs4.

The corresponding set of flow paths is {efgh, egh}. We choose the flow path β = efgh. Activities e, f, g, and h are directly mapped to Abs4, as depicted in Figure 3.19.


Figure 3.18: Adjusted trace due to intermittent manifestation of the base pattern in the trace

Figure 3.19: Mapping of sequence efgh to abstract node Abs4
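The re-adjustment of a trace for an intermittent manifestation can be sketched as follows. This is a greedy illustration only, not the procedure behind Step 11 of Algorithm 1: the function name `adjust_intermittent` is hypothetical, activities are assumed to be single characters, and no bound is placed on how many interrupting activities are tolerated, which a real implementation would likely restrict.

```python
def adjust_intermittent(trace, j, pattern):
    """Move the events of `pattern` together, starting at index j.

    Scans the trace from index j, collecting the pattern's activities in order
    and setting aside any interrupting activities, which are re-inserted right
    after the completed pattern. Returns the adjusted trace as a list, or None
    if the pattern cannot be completed from index j.
    """
    trace = list(trace)
    matched, interrupting = [], []
    k, i = 0, j  # k: position in the pattern, i: position in the trace
    while i < len(trace) and k < len(pattern):
        if trace[i] == pattern[k]:
            matched.append(trace[i])
            k += 1
        else:
            interrupting.append(trace[i])
        i += 1
    if k < len(pattern):
        return None  # pattern is not fully present after index j
    return trace[:j] + matched + interrupting + trace[i:]
```

For the trace abckefgjhxy and the pattern efgh at index 4, the sketch moves h before j and returns the adjusted trace abckefghjxy, after which j sits at index 8.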

• Next activity under inspection is t(8) = j. The list of base patterns starting with j is {jxy}. We choose the base pattern α = jxy, which has a continuous manifestation in the adjusted trace. The set of possible abstractions is {Abs1, Abs3}. The set of flow paths of the abstractions is {fij, rj}. The Levenshtein distance between the base pattern and the flow path is computed for all flow paths. The minimal distance is 3, and the set of paths with minimal distance is {fij, rj}.

If several flow paths have the minimal distance (Step 7, Algorithm 2), the next step is to search for a previous mapped node that passed control to any of the current possible abstractions. This is done by checking in the activity-process node association graph whether there is a flow path from the previous mapped nodes to any of the possible abstractions (Steps 8-12, Algorithm 2). If found, the possible mapping abstraction is chosen
