
Process mining in flexible environments

Citation for published version (APA):

Günther, C. W. (2009). Process mining in flexible environments. Technische Universiteit Eindhoven. https://doi.org/10.6100/IR644335

DOI:

10.6100/IR644335

Document status and date: Published: 01/01/2009

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)



Process Mining in

Flexible Environments


A catalogue record is available from the Eindhoven University of Technology Library.

Günther, Christian W.

Process Mining in Flexible Environments / by Christian W. Günther. Eindhoven: Technische Universiteit Eindhoven, 2009. Proefschrift.

ISBN 978-90-386-1964-4

NUR 982

Keywords: Process Mining / Flexibility / Business Process Management

The work in this thesis has been carried out under the auspices of Beta Research School for Operations Management and Logistics.

This research was supported by the Technology Foundation STW, applied science division of NWO and the technology programme of the Dutch Ministry of Economic Affairs, under project number EIT.6447.

Beta Dissertation Series D117

Printed by University Press Facilities, Eindhoven

Cover illustration by Felix M. Günther


Process Mining in

Flexible Environments

PROEFSCHRIFT

to obtain the degree of doctor at the Eindhoven University of Technology, by authority of the Rector Magnificus, prof.dr.ir. C.J. van Duijn, to be defended in public before a committee appointed by the Doctorate Board on Tuesday 22 September 2009 at 16.00

by

Christian Walter Günther


prof.dr.ir. W.M.P. van der Aalst

Copromotor:


To Anne, my family, and the Commodore Amiga 500

— Were it not for you, this would not be here.


Contents

Part I Introduction and Problem Domain

1 Introduction . . . 3

1.1 Business Process Management . . . 5

1.2 Process Mining . . . 9

1.2.1 ProM – A Process Mining Framework . . . 13

1.3 Flexibility – A Challenging Necessity . . . 17

1.4 Contributions . . . 19

1.5 Roadmap . . . 20

2 Flexible Environments . . . 23

2.1 Explicit Flexibility . . . 27

2.1.1 Adaptive Process Management . . . 28

2.2 Restricted Flexibility . . . 30

2.2.1 Case Handling . . . 31

2.2.2 Declarative Workflow . . . 34

2.3 Implicit Flexibility . . . 36

2.3.1 Loosely-controlled Processes . . . 37

2.3.2 Ad-hoc Processes . . . 39

2.4 Summary . . . 40

3 Event Logs . . . 43

3.1 General Process and Log Taxonomy . . . 43

3.2 MXML . . . 46

3.3 Structural Log Metrics . . . 50

3.3.1 Preliminaries . . . 50

3.3.2 Magnitude . . . 52

3.3.3 Support . . . 53

3.3.4 Variety . . . 53

3.3.5 Level of Detail . . . 53

3.3.6 Time Granularity . . . 54


3.3.7 Structure . . . 54

3.3.8 Affinity . . . 55

3.3.9 Balance . . . 55

3.3.10 Exemplary Application . . . 56

3.4 Elicitation and Transformation . . . 57

3.4.1 Data Sources . . . 58

3.4.2 Mapping and Semantic Transformation . . . 60

3.4.3 ProMimport: Technical Considerations for a Log Conversion Framework . . . 61

3.5 Artificial Log Synthesis . . . 66

3.6 Efficient Storage and Management of Event Log Data . . . 68

3.6.1 Strategies for Reducing the Base Operating System Footprint (Layer 1) . . . 71

3.6.2 Providing Arbitrary-Size Storage With Random Access (Layer 2) . . . 71

3.6.3 Enabling Efficient Modification (Layer 3) . . . 72

3.6.4 Enabling Efficient Duplication (Layer 4) . . . 73

3.6.5 Performance Benchmark . . . 74

3.6.6 Concluding Remarks . . . 77

3.7 Summary . . . 77

4 Process Mining . . . 81

4.1 Overview and Vectors . . . 81

4.2 Perspectives . . . 84

4.3 Related Approaches . . . 87

4.3.1 Grammatical Inference . . . 88

4.3.2 Cook et al. . . . 88

4.3.3 Agrawal et al. . . . 89

4.3.4 Pinter et al. . . . 89

4.3.5 Datta . . . 89

4.3.6 Herbst et al. . . . 90

4.3.7 Schimm et al. . . . 90

4.3.8 Greco et al. . . . 90

4.3.9 Van der Aalst et al. . . 91

4.3.10 Weijters et al. . . 91

4.3.11 Alves de Medeiros et al. . . 92

4.3.12 Van Dongen et al. . . 92

4.3.13 Wen et al. . . 93

4.3.14 Region-based Approaches . . . 93

4.4 Summary . . . 94

5 Problems in Mining Flexible Processes . . . 95

5.1 Data Quality and Noise . . . 96

5.1.1 The Classic Notion of Noise . . . 97


5.2 Precision – The Wrong Goal . . . 103

5.2.1 Precision – A Recipe for Spaghetti . . . 104

5.2.2 Precision of Behavior . . . 106

5.2.3 Precision of Scope . . . 109

5.3 Entitlement – We Know What Is Best For You . . . 112

5.3.1 Singularity . . . 114

5.3.2 Immutability . . . 116

5.3.3 Non-Interactivity . . . 118

5.4 Purity – New Road, Same Old Shoes . . . 121

5.5 Summary . . . 122

Part II Approaches

6 Event Log Schema Transformation . . . 127

6.1 Event Class Projection . . . 128

6.2 Trace Segmentation . . . 130

6.2.1 Activity Discovery and Trace Discovery . . . 131

6.2.2 Local Trace Segmentation . . . 132

6.2.3 Global Trace Segmentation . . . 150

6.3 Process Type Discovery . . . 156

6.3.1 Trace Profiles . . . 158

6.3.2 Distance Measures . . . 160

6.3.3 Clustering Algorithms . . . 160

6.3.4 Implementation . . . 165

6.4 Summary . . . 169

7 Adaptive Process Simplification . . . 171

7.1 Idea and Related Work . . . 173

7.2 Log-based Process Metrics . . . 177

7.2.1 Metrics Framework . . . 177

7.2.2 Unary Significance . . . 178

7.2.3 Binary Significance . . . 179

7.2.4 Binary Correlation . . . 180

7.2.5 Trace Magnification . . . 182

7.3 Fuzzy Models . . . 183

7.3.1 Model Elements . . . 183

7.3.2 Relaxed Executional Semantics . . . 184

7.3.3 Visual Metaphors . . . 190

7.4 Metrics-based Adaptive Graph Simplification . . . 191

7.4.1 Binary Conflict Resolution . . . 192

7.4.2 Edge Filtering . . . 194

7.4.3 Node Aggregation and Abstraction . . . 196

7.5 Quality and Authority Metrics . . . 197


7.5.2 Conformance . . . 199

7.6 Implementation . . . 202

7.7 Interfacing with Related Approaches . . . 209

7.7.1 Conversion to Fuzzy Models . . . 210

7.7.2 Log Projection of Fuzzy Models . . . 212

7.8 Summary . . . 213

8 Exploration and Visualization . . . 215

8.1 Event Log Visualization . . . 216

8.1.1 An Introduction to Dotplots . . . 217

8.1.2 Discovering Self-similarity with Dotplots . . . 218

8.1.3 Using Dotplots for Log Visualization . . . 222

8.2 Fuzzy Model Animation . . . 226

8.2.1 Visual Metaphors for Animation . . . 227

8.2.2 Implementation . . . 233

8.3 Summary . . . 237

Part III Application and Conclusions

9 Applications and Case Studies . . . 241

9.1 Mining Deployed Application Usage . . . 243

9.1.1 Distributed Application Usage Logging . . . 244

9.1.2 Technical Challenges . . . 245

9.1.3 Domain-aware Process Mining . . . 247

9.1.4 Process Mining in Philips Healthcare . . . 253

9.1.5 Summary . . . 255

9.2 Mining Test Processes . . . 257

9.2.1 Related Work . . . 259

9.2.2 Context and Preparation . . . 259

9.2.3 Process Mining Results . . . 263

9.2.4 Evaluation and Improvement Suggestions . . . 271

9.2.5 Flexibility-aware Analysis . . . 274

9.2.6 Summary . . . 287

9.3 Mining Software Development Processes . . . 288

9.3.1 Related Work . . . 289

9.3.2 Process Mining for Software Engineering Environments . . . 290

9.3.3 A Two-step Mining Approach for the Analysis of Software Processes . . . 292

9.3.4 Evaluation and Applications . . . 293

9.3.5 Summary . . . 297

9.4 Mining Process Changes . . . 297

9.4.1 Adaptive Process Management . . . 299

9.4.2 A Framework for Integrating Process Mining and Adaptive Process Management . . . 300


9.4.3 Anatomy of Change . . . 302

9.4.4 Applying Existing Process Mining Techniques . . . 305

9.4.5 Change Logs . . . 307

9.4.6 Change Mining . . . 311

9.4.7 Summary . . . 315

9.5 Summary . . . 316

10 Conclusion . . . 319

10.1 A New Horizon for Process Mining . . . 319

10.2 Contributions . . . 323

10.2.1 Methodology . . . 324

10.2.2 Approaches . . . 326

10.2.3 Implementation . . . 329

10.3 Limitations and Future Work . . . 332

10.3.1 Event Log Storage and Management . . . 332

10.3.2 Trace Segmentation . . . 333

10.3.3 Process Type Discovery . . . 333

10.3.4 Adaptive Process Simplification . . . 334

10.3.5 Exploration and Visualization . . . 335

Bibliography . . . 337

Summary . . . 349

Samenvatting . . . 353

Acknowledgements . . . 357


Part I


1

Introduction

Both persons and organizations are constantly involved in a large number of complex and long-running activities. While the human mind is generally not capable of thinking about these complex activities at once, we have a powerful and very fundamental principle at our disposal, which is applied to tackle problems of arbitrary complexity. This principle is called divide and conquer.

Cooking is a simple example of the application of the divide and conquer principle. Recipes first divide the meal to be prepared into its major components, e.g., the meat, vegetables, and sauce. For each of these components, the way they are prepared is further subdivided into ever-smaller steps. The final recipe is a hierarchy of activities, which successively decomposes the potentially long and complex task of preparing a meal down to small, atomic activities. These activities can be understood by any person, executed in a short timeframe, and need minimal context knowledge to be performed (e.g., one does not need to think about the meat and sauce components of a meal when chopping vegetables).

The successive division of complex tasks into smaller units makes them manageable, and allows even untrained persons to perform small parts of a much larger endeavor. To make these pieces come together in a meaningful manner, one further needs to define how their execution is organized. For example, some tasks may depend on the results of other tasks, which implies an ordering relation. In general, there are some constraints on which subset of tasks needs to be executed, and in which order. The combination of a set of small, manageable tasks with a set of execution constraints is what we call a process (e.g., the cooking recipe).

Processes play a fundamental role in our society, and their importance is steadily increasing. Already in the Middle Ages, craftsmen used explicitly defined (and anxiously guarded) processes to describe and guide their work, e.g., the part-wise production and assembly of a chair, or of a cathedral. Especially for creating complex, large-scale products, a well-defined process is the fundamental precondition for collaboration, i.e., sharing the total workload among a large number of helping hands.

During the industrial revolution, the divide and conquer principle and processes ultimately became the centerpiece in operating an enterprise. This is especially true for assembly-line production, which was perfected by Henry Ford. The meticulous planning of production processes allowed industry to draw upon the masses of unskilled workers. Each one of these workers would execute only one small part of production in a highly repetitive manner, i.e., individual workers would specialize to efficiently perform particular tasks (e.g., putting one screw in a car part). While there are certainly many drawbacks to this approach for production, most of them of a sociological nature, it provides the utmost efficiency, enabling low-cost production and a controlled, steady quality of products.

In modern times, we have witnessed the transformation of our society, from one that is primarily based on industrial production, towards a service-oriented and knowledge-intensive society. Most enterprises, such as banks or insurance companies, do not produce any physical items. Instead of cars, the items flowing through their production lines are of an informational nature, e.g., they “produce” loans or insurance cases. Nonetheless, the fundamental principle of divide and conquer, and collaborative work guided by defined business processes, persists.

Another dramatic change was triggered by the advent of affordable computers, enabling companies to support their workforce with information systems. Today, the majority of business processes are supported by information systems. These systems control, guide, and observe the correct and efficient execution of business processes, and distribute tasks automatically to appropriate workers in the organization.

In contrast to processes concerned with the production of a physical item, business processes are harder to observe and monitor. Since one cannot physically follow the product along its stages of production, analysts and managers need to rely on information about process executions provided by the supporting information system. One particularly powerful method for analyzing and monitoring business processes in execution is process mining, which can extract a wealth of information from event logs recorded by the information system. Among others, there are process mining techniques that can derive a process model describing the control flow of the process, or construct a social network describing the interactions between employees.

In this thesis we focus on the application of process mining to flexible processes. In contrast to a well-defined production process, the processes found in most knowledge-intensive environments exhibit a much larger variety of behavior. Since businesses constantly have to react to changes in their environment, e.g., changing market demands or legislation, they depend on flexible process support in order to be efficient. Flexible processes are an interesting and important application area for process mining. Since these flexible processes allow for a wide variety of possible behavior, process owners typically have limited knowledge about their actual behavior.

While process mining can deliver useful insight into the actual behavior of processes, the current set of process mining approaches has a number of problems dealing with flexible processes. In this thesis, we analyze the root causes of these problems, and propose an adjusted set of goals and guidelines for process mining in flexible environments. We further present a number of novel process mining approaches that are based upon these insights, and are thus more suitable for the analysis of flexible processes.


In the next section, we will introduce the domain of Business Process Management (BPM) [173], which is concerned with information systems and the business processes they execute, in more detail. In Section 1.2, we introduce the field of process mining, which makes it possible to analyze the execution of processes supported by information systems. Many environments for executing business processes are flexible and unstructured. This poses a number of challenges to process mining, which are introduced in Section 1.3. The work presented in this thesis is mainly concerned with analyzing these challenges, and providing solutions for them. Our main contributions are introduced in Section 1.4. Finally, Section 1.5 provides a roadmap, outlining the contents of this thesis.

1.1 Business Process Management

Especially in very small organizations, business processes may be implicit and undocumented. An office with three travel agents, for example, may have an implicitly agreed-upon workflow, based on the preferences of each agent for specific types of tasks. However, once the process is sufficiently complex and long-running, and involves many people, this implicit approach is no longer feasible.

Consider the following example scenario. Jack has a small company that provides IT support for small companies and offices. Having just started his business, he has support contracts with only a handful of customers, and can thus do all the work himself. The typical business process for Jack is as follows.

• A customer calls Jack’s office with a support request, describing an IT problem he has.

• If the problem cannot be solved on the phone, Jack drives to the customer to provide on-site support, e.g. for solving hardware issues.

• After solving the problem, Jack sends the customer a bill.

• Every seven days after sending the bill, Jack sends reminders to customers who have not yet paid.

• After receiving the payment of a bill, Jack files the completed case in a drawer.

As long as Jack is working alone, there is no need for him to define or write down this simple process, as he has it memorized. However, a few months down the line, his business has started to grow, and he needs to hire additional personnel to help him out. Jack decides to have the billing and filing part of the process handled by a secretary for all cases. Further, when he needs to train new personnel, it becomes cumbersome to explain the process again every time. In short, Jack has a real need to explicitly design and document his process.

The design and documentation of business processes is an important part of BPM. For any goal, there may be many possible process designs. Some of these processes may perform better than others, some may be more resilient to unexpected exceptions, and so forth. Also, once a process has been properly designed and documented, i.e. modeled, it can serve as a point of reference, both for executing the process, and for communicating about it.


Ideally, a well-defined process model is unambiguous, i.e. it clearly describes the process without leaving uncertainties. There are a number of process modeling paradigms in the BPM field, each having its specific up- and downsides. However, almost all of these modeling paradigms are graph-based, i.e. they model the process as a directed network of nodes, connected by edges. Nodes typically represent the tasks in the process, states, events that may occur, or describe split or join points for the described behavior. Arcs, connecting the nodes, represent ordering relations on the tasks in a process (i.e., an arc from node A to node B in a model graph may represent that task A needs to be executed before task B). Examples of graph-based business process modeling formalisms are Petri nets, Event-driven Process Chains (EPCs), and the Business Process Modeling Notation (BPMN).

Graph-based process modeling formalisms are an excellent means to define and document processes. Their graphical nature allows for a very efficient communication of even complex behavior. In contrast to, e.g., a verbal description of the process, a graph has a much higher density of information, and it typically contains fewer distractions. As proverbial knowledge has it, “one picture is worth a thousand words”. The human brain is highly optimized for the effortless analysis of visual artifacts, i.e. a well-designed graph with an appropriate layout can convey information in a very short time.

Process modeling formalisms vary in the strictness of their executional semantics, so formalisms that have a more relaxed semantics may still allow for ambiguity. The benefit of modeling formalisms with clearly-defined semantics is thus that they are unambiguous. The information they contain is clear, and thus provides a solid foundation for communication. An important property of unambiguous, graph-based process models is that they can easily be translated into machine-readable forms. Thus, besides being a powerful tool for communication among humans, they can also be interpreted by software tools in an unambiguous, high-level manner.

Figure 1.1 shows the example process of Jack’s company in Petri net notation. Tasks are represented by rectangular transition nodes, while the circular place nodes encode the state of the process. Places can contain tokens (visualized as dots), which represent the current state of the process. A transition (i.e., a task) can be executed when all preceding places (i.e., places from which there is an arc to the transition) hold a token. The execution of a transition removes a token from each preceding place, and puts a token into each place succeeding it. The solid black rectangle in Figure 1.1 represents a silent transition, i.e., executing this transition will change the state of the process, but will not be observable (i.e., does not correspond to any explicit action).
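The firing rule just described is easy to make concrete. The following sketch encodes a simplified, purely sequential fragment of Jack’s process (the choices and the silent transition of Figure 1.1 are omitted); the dictionaries of input and output places and the place names are an illustrative encoding, not a fixed format.

```python
# Minimal Petri-net token game: a transition is enabled when each of its
# input places holds a token; firing consumes one token per input place
# and produces one token per output place.

def enabled(marking, transition, pre):
    """True if every input place of the transition is marked."""
    return all(marking.get(p, 0) >= 1 for p in pre[transition])

def fire(marking, transition, pre, post):
    """Fire an enabled transition and return the resulting marking."""
    assert enabled(marking, transition, pre), "transition not enabled"
    new = dict(marking)
    for p in pre[transition]:
        new[p] -= 1
    for p in post[transition]:
        new[p] = new.get(p, 0) + 1
    return new

# Illustrative sequential fragment of the example process.
pre  = {"Phone support": ["start"], "On-site support": ["p1"], "Send bill": ["p2"]}
post = {"Phone support": ["p1"], "On-site support": ["p2"], "Send bill": ["p3"]}

m = fire({"start": 1}, "Phone support", pre, post)
print(enabled(m, "On-site support", pre))  # True: the token moved to p1
print(enabled(m, "Send bill", pre))        # False: p2 is still empty
```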

Instead of explaining his process to employees again and again, Jack can thus simply give them this process model, and potentially quickly explain the fundamental semantics of Petri nets. This may appear to be an overly complicated solution to his particular problem. However, consider that most businesses have hundreds of business processes, each consisting of hundreds of tasks. Thus, in a more realistic situation, the set-up costs for learning a modeling formalism quickly pay off.

Fig. 1.1. Model of the example process in Petri net notation.

Jack could also print out this process model for each case he is currently handling. By marking the places in the model with tokens, according to the current state of the case, he can keep track of overall progress. Furthermore, he now always knows which tasks are currently waiting to be executed. This means that Jack can use the process model to control and monitor the execution of his process.

However, the largest benefit of process models is that they allow the process to be controlled and monitored automatically. Rather than executing and tracing the process manually, an information system can be used for this, based on the defined process model. Let us say that Jack’s business has grown tremendously, and he is thinking of opening branches also in other cities far away. However, he wants to keep the bill handling and accounting centrally at his original location. Jack could set up an information system for supporting his process, and configure it with the process model he has designed.

For every new support case, his employees on the phone will start a new case in the system. The system will keep track of the progress of each case, and whenever one of Jack’s employees checks the system for work items, it will present him with tasks that are suitable for his role in the company. There is no need for support people to call his secretary about sending the bill, since all process-related communication and data exchange is conveniently provided by the information system.


The BPM field is concerned with business processes supported by some kind of Process-Aware Information System (PAIS) [58]. A PAIS can thus be any system that supports the execution of business processes. In older systems, or those which have been designed for a specific task, the process may be hard-coded into the system. Another type of PAIS are Workflow Management Systems (WfMSs), e.g. Staffware or IBM WebSphere. WfMSs can be configured with a process model, which they subsequently use to execute the process. A third type of PAIS are Enterprise Resource Planning (ERP) systems, e.g. SAP R/3 or PeopleSoft. ERP systems only provide access to common tasks, leaving the coordination and control of the process to the people involved.

We can thus see that different types of PAISs provide different levels of flexibility. Systems that are rather rigid and inflexible, however, provide the highest level of support for users. Flexible PAISs require the involved workers to be well-aware of, and educated about, the process. What all of these systems have in common is that they ease collaboration, by providing means for communication and data sharing, and that they make the execution of business processes observable. Every execution of a task in the process is noticed by the PAIS, and can be recorded in an event log. Thus, the system can document the execution of the process for legal proof, and for later analysis (e.g., debugging, performance measurement, auditing, etc.).
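Conceptually, the recording described here amounts to appending one event per executed task to a log, from which per-case traces can later be reassembled. The field names below (case id, task, performer, timestamp) are illustrative only; actual log formats are the subject of Chapter 3.

```python
# Sketch of a PAIS appending events to a log as tasks are executed.
# Field names are illustrative; real systems use richer schemas.
from dataclasses import dataclass, field
from datetime import datetime
from typing import List

@dataclass
class Event:
    case_id: str        # which process instance the task belongs to
    task: str           # name of the executed task
    performer: str      # who executed it
    timestamp: datetime

@dataclass
class EventLog:
    events: List[Event] = field(default_factory=list)

    def record(self, case_id, task, performer, timestamp):
        self.events.append(Event(case_id, task, performer, timestamp))

    def trace(self, case_id):
        """All task names for one case, in order of execution."""
        return [e.task for e in sorted(
            (e for e in self.events if e.case_id == case_id),
            key=lambda e: e.timestamp)]

log = EventLog()
log.record("case-17", "Phone support", "Jack", datetime(2009, 3, 2, 9, 15))
log.record("case-17", "Send bill", "secretary", datetime(2009, 3, 2, 11, 0))
print(log.trace("case-17"))  # ['Phone support', 'Send bill']
```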

[Figure: the BPM life cycle as a loop of process design, process configuration, process enactment, and process diagnosis]

Fig. 1.2. BPM life cycle [58].

Figure 1.2 shows the BPM life cycle, i.e., the meta-process that a business process repeatedly undergoes over its lifetime in an organization. At first, the organization needs to design the business process. For this task, consultants traditionally use queries, or conduct interviews with the workforce, in order to understand the requirements and constraints. Process design is typically performed by skilled experts, who can ensure that the process is modeled in an appropriate, efficient, and correct manner (i.e. that there are no potential deadlocks or the like).

After the design phase, the process model is used for the configuration of a PAIS. Potentially, the process model may need to be translated into the modeling formalism of the PAIS. In every case, the PAIS will need to be configured with knowledge about the organization and its structure, i.e., the involved people and their roles. This information needs to be mapped to the process model, by defining possible roles for each task in the process.

Once the PAIS is configured, it can start the enactment of cases for that process. The system will keep track of the progress for each case, and offer each employee the current set of tasks he can perform, according to his roles. During enactment, the PAIS can record the sequence of tasks that have been executed for each case in a so-called trace, and store these traces in an event log for the process.

These event logs are essential for the subsequent diagnosis of the business process. One may, for example, be interested in the slowest, or most expensive, 10% of all cases. By studying the traces for these problematic cases in the event log, an analyst may gain insight into what causes these problems, and pinpoint problematic bottlenecks in the process model. Process designers can then use this knowledge as a starting point for a process redesign, which can remedy the experienced problems.
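A diagnosis like “the slowest 10% of all cases” reduces to sorting cases by duration, as computed from the first and last timestamp of each trace. A minimal sketch over hypothetical data:

```python
# Rank cases by duration and pick the slowest fraction of them.
from datetime import datetime

def slowest_cases(log, fraction=0.10):
    """Return case ids of the slowest `fraction` of cases (at least one)."""
    durations = {
        case: (events[-1][1] - events[0][1]).total_seconds()
        for case, events in log.items()
    }
    ranked = sorted(durations, key=durations.get, reverse=True)
    k = max(1, int(len(ranked) * fraction))
    return ranked[:k]

# Hypothetical log: case id -> chronological list of (task, timestamp) events.
log = {
    "case1": [("Phone support", datetime(2009, 1, 5, 9, 0)),
              ("File case", datetime(2009, 1, 8, 12, 0))],
    "case2": [("Phone support", datetime(2009, 1, 6, 9, 0)),
              ("File case", datetime(2009, 2, 20, 17, 0))],
}
print(slowest_cases(log))  # ['case2']
```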

However, a planned process redesign is not the only reason for performing process diagnosis. Especially in PAISs that allow for greater degrees of freedom, i.e., where the process is not as strictly enforced, one may find that the actual behavior of the process, as observed from event logs, differs significantly from the idealized behavior, as prescribed by the process model. Studying the actual behavior can pinpoint misalignments between the process model and the organization. The results from such an analysis can be used to greatly improve the performance and quality of the process, and to increase the satisfaction of users involved in its execution.

One of the most powerful tools for process diagnosis is provided by the field of process mining. The next section introduces this topic in more detail.

1.2 Process Mining

Observing and analyzing an industrial production process is typically fairly straightforward. You can follow the product along the assembly lines, as it is being produced step by step. You can time the duration of production steps, see where work items are queueing up because of a bottleneck, and identify points where failure is likely to occur.

For business processes, this kind of observation is not as simple. When a process is implicit, the knowledge about how to execute it is tacitly spread among all workers involved, i.e., each person knows where he gets what kind of work from, what to do with it, and where to pass on the results. However, an explicitly modeled process, supported by an information system, is also tricky to analyze. There is an explicit process model one may study, but this model is, most of the time, an idealized and simplified description of reality.

Even processes that are modeled in a very strictly defined, unambiguous manner are susceptible to modification as they are being executed. This is because, once executed, the process model clashes with the realities inside the executing organization, and with unexpected stimuli from the environment. Some workers involved in the process may feel patronized by the defined process, and continue to work in a conflicting manner that they deem more suitable. Further, there may be unforeseen exceptions during execution, such as special customer requests, deadlines to be met, a whole office falling ill, and the like.

In these situations, people tend to override the prescribed process in a number of ways. They may trigger the administrator of the PAIS to change the process state on the fly, or they may simply work “behind the system’s back”. The latter means that people will perform their work in their own ways, while simulating the correct execution of the process towards the PAIS, e.g. by entering bogus information to advance the process.¹

Process mining is a field of research that is concerned with the a posteriori analysis of business processes, based on event logs of their execution. It aims to extract aggregate, high-level information about several aspects of the process from these logs. For example, a process mining technique may generate a process model from an event log, describing the observed control flow of the process.
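As a taste of how control-flow information can be extracted from a log — this illustrates the common starting point of many discovery techniques, not any particular algorithm from the literature — one can simply count which task directly follows which across all traces:

```python
# Count directly-follows pairs: how often does task b immediately
# follow task a in some trace? These counts are the raw material from
# which a control-flow model can be constructed.
from collections import Counter

def directly_follows(traces):
    df = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            df[(a, b)] += 1
    return df

# Two hypothetical traces of the example support process.
traces = [
    ["Phone support", "Send bill", "Receive payment", "File case"],
    ["Phone support", "On-site support", "Send bill", "Receive payment", "File case"],
]
df = directly_follows(traces)
print(df[("Phone support", "Send bill")])    # 1
print(df[("Send bill", "Receive payment")])  # 2
```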

In contrast to other analysis techniques, process mining can deliver accurate and factual information about the process, rather than having to rely on an idealized model of reality. This makes it a powerful tool for an as-is analysis of processes, i.e., for painting an accurate picture of the current situation, in order to draw the appropriate conclusions.

For an illustration of process mining, let us return to the previously mentioned example of Jack and his IT support business. Jack has hired additional staff to help him out with fielding support calls and handling on-site support, and a secretary to handle the billing of cases. He is using the process model he has designed to instruct his co-workers. While, initially, he was sure that this model was an accurate description of his workflow, lately, he has been having doubts about whether this assumption is true. Overhearing his staff at the water cooler, he has the impression that their style of working may differ in some aspects. Further, he has noticed himself that there are situations where he needs to deviate from the prescribed process.

In order to get an impression of how accurate his process model actually is, and of how the process is performed in reality, Jack wants to start monitoring its execution. Whenever a new case is started, the phone agent receiving the call now needs to record his steps in the process on a sheet of paper. As the process continues, with a support agent driving to the customer site, or later with the secretary, the paper is handed on to each person involved, who adds the steps they have performed. Thus, for every support case Jack’s company has handled, he now has a complete record of all the steps performed. These papers are essentially traces, forming an event log of his process.

Figure 1.3 shows the paper traces from four cases of Jack’s support process. Using his defined process model, as shown in Figure 1.1, Jack now starts to analyze

¹ Note that, when the PAIS is bypassed by employees or administrators, these effective changes to the process are not directly visible in event logs, since the PAIS has no notion of them. However, when analyzing event logs, e.g., by process mining, such deviations can be detected from the atypical patterns they leave in the logs.


[Figure 1.3 shows four hand-written traces:
Case 1: Phone support, Send bill, Receive payment, File case
Case 2: Phone support, On-site support, Send bill, Send reminder, Send reminder, Receive payment, File case
Case 3: Phone support, On-site support, File case
Case 4: Phone support, On-site support, Order hardware, On-site support, Send bill, Receive payment, File case]

Fig. 1.3. Event log observed from the example process.

these traces. He quickly recognizes “Case 1” as a simple support request, which has been resolved on the phone, and which has been paid for in time. In “Case 2”, a support agent had to drive to the customer to resolve the problem, and the customer only paid after two reminders had been sent to him.

Jack is, however, puzzled at first when he looks at “Case 3”. After performing on-site support, the case had been filed without any billing whatsoever. The trace for “Case 4” even contains an activity that had not been mentioned in his process model: After performing on-site support, the support agent had to “order hardware”, which was followed by another support visit at the customer site. Jack quickly realizes that, in his original process model, he had forgotten to include these two, relatively rare, situations.

In the third case, the support call was from a good, long-time customer, who had a simple problem. For such cases, Jack has the policy to handle them free of charge, i.e., on the basis of good-will. Further, while Jack’s support agents usually carry all kinds of replacement hardware to a customer site, there are situations where exotic hardware fails. In these cases (e.g., Case 4), the replacement hardware needs to be ordered first, which requires a follow-up visit to the customer site for installation.

Jack is surprised by these results, having thought that his model was complete and accurate. He decides to model an updated version of his process, but now based strictly on his observations, i.e., the paper traces. He looks at all the ordering relations between tasks that are recorded in his traces, and tries to find a model that explains the process as faithfully to his observations as possible.

After a long night of reading traces, sketching process models, and checking them against the observations again, Jack has finally come up with the process model shown in Figure 1.4. He now has the benefits of a process model, i.e. a high level, unambiguous description of his business process. But now, he has the added benefit of his model being an accurate depiction of reality, i.e., the model does not show how he would like the process to be executed, but how it is actually executed in reality.
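Jack’s manual procedure, reading off which tasks directly follow each other in the traces, is exactly what control flow discovery algorithms automate. As a minimal illustrative sketch (not an algorithm defined in this thesis), the following snippet extracts the direct-succession relation from the four paper traces of the example:

```python
# Sketch: deriving the "directly follows" relations that Jack reads off his
# paper traces. The trace contents follow the running example; the code
# itself is purely illustrative.
traces = [
    ["Phone support", "Send bill", "Receive payment", "File case"],
    ["Phone support", "On-site support", "Send bill", "Send reminder",
     "Send reminder", "Receive payment", "File case"],
    ["Phone support", "On-site support", "File case"],
    ["Phone support", "On-site support", "Order hardware", "On-site support",
     "Send bill", "Receive payment", "File case"],
]

def directly_follows(log):
    """Return the set of task pairs (a, b) where b directly follows a."""
    return {(a, b) for trace in log for a, b in zip(trace, trace[1:])}

relations = directly_follows(traces)
print(sorted(relations))
```

A discovery algorithm would then generalize such ordering relations into a model, e.g., a Petri net; note that real algorithms must also distinguish choice, concurrency, and loops, which this sketch does not attempt.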


[Figure 1.4 shows a Petri net with the tasks Phone support, On-site support, Order hardware, Send bill, Send reminder, Receive payment, and File case, and the conditions Problem solved, Good-will, and Timeout.]

Fig. 1.4. Actual process model in Petri net notation.

The way Jack has analyzed his process is obviously very cumbersome. Once his process is sufficiently complex, and he needs to record a large number of cases, he will find it impossible to come up with a faithful process model in a reasonable time frame. However, once a business process is supported by an information system, the event logs for that process can be recorded by that information system, and are then available in a machine-readable format.

Most business processes today are supported by information systems, and every day an overwhelming number of process cases are being executed. Even for one individual process in a single organization, this results in large amounts of event log data that are impossible to analyze without computer support. Business processes are, however, not the only source of event log data. Increasingly, complex appliances and systems (e.g., x-ray appliances, or railway control systems) rely on internal processes for their operation, and are recording detailed event logs about their usage. Even more, people are involved in many large web-based systems on a daily basis. Contributing to social web communities, submitting search queries, and communicating via email can all be considered parts of larger processes, which are meticulously recorded in central event logs (i.e., the web server access logs, or application server logs, of web sites providing these services).



The phenomenon that many of our daily activities are implicitly being recorded in databases and event logs, and the sheer amount of data created, is often referred to as the data explosion. With every year, the number of processes, however trivial, whose execution is recorded in event logs is increasing. Thus, there is a real and pressing need for techniques like process mining, which can help in the analysis of process-related information recorded in event logs.

For the actual discovery of a process model based on such event logs, the field of process mining provides a large number of techniques. Rather than requiring active thought and reasoning on the side of the user, process mining algorithms can generate a process model in an automatic and unsupervised manner. Further, most process mining algorithms can analyze large amounts of event log data in a relatively short time, so that process mining can be performed interactively, and can thus become part of the daily routine for process diagnosis.

In the above example, we have illustrated control flow mining. However, process mining also addresses other perspectives of the analyzed process, e.g., the data perspective or the organizational perspective. Most event logs contain the originator for each event, i.e., the person (or resource) that has executed each task. Given this information, it is possible for process mining to generate a social network of the actors involved, describing who collaborates with whom, and who has the most central role in the organization.
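To sketch how such a social network could be derived, the snippet below counts “handovers of work”, i.e., how often one originator’s event is directly followed by another originator’s event. The event log and the originator names are invented for illustration only:

```python
from collections import Counter

# Illustrative sketch: mining a handover-of-work relation from originators.
# Each event is a (task, originator) pair; the data is invented.
log = [
    [("Phone support", "Jack"), ("Send bill", "Mary"),
     ("Receive payment", "Mary"), ("File case", "Mary")],
    [("Phone support", "Tom"), ("On-site support", "Jack"),
     ("Send bill", "Mary"), ("Receive payment", "Mary"),
     ("File case", "Mary")],
]

# Count directed handovers between distinct originators.
handovers = Counter(
    (a, b)
    for trace in log
    for (_, a), (_, b) in zip(trace, trace[1:])
    if a != b
)
print(handovers.most_common())
```

The resulting weighted, directed relation can be drawn as a graph of actors; centrality measures on this graph then indicate who occupies a central role.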

While these examples address the discovery of original process models, generated only from an event log, process mining can also address conformance and extension. Conformance measures and quantifies how closely a given process model conforms to reality, as observed in event logs. Jack could have used a conformance algorithm, e.g., to see how accurately his originally designed process model (cf. Figure 1.1) describes the work in his company. Extension algorithms can be used to augment a given process model with additional information found in event logs. One example is analyzing the performance of a process, and projecting this information onto a control flow model of the process. This can be a very useful tool for pinpointing problematic parts (e.g., bottlenecks) in the control flow or in the organization.
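To make the idea of conformance concrete, the following deliberately naive sketch (not the token-replay technique implemented in ProM) scores a trace by the fraction of its direct successions that a model allows; the model relation shown is a hypothetical simplification of Jack’s original design:

```python
# Naive conformance sketch: which direct successions does the model allow?
# This toy relation approximates Jack's original model; real conformance
# checking (e.g., token replay on a Petri net) is considerably more involved.
model_allows = {
    ("Phone support", "Send bill"),
    ("Phone support", "On-site support"),
    ("On-site support", "Send bill"),
    ("Send bill", "Receive payment"),
    ("Receive payment", "File case"),
}

def naive_fitness(trace):
    """Fraction of observed direct successions permitted by the model."""
    pairs = list(zip(trace, trace[1:]))
    ok = sum(1 for p in pairs if p in model_allows)
    return ok / len(pairs)

print(naive_fitness(["Phone support", "Send bill",
                     "Receive payment", "File case"]))   # conforming trace
print(naive_fitness(["Phone support", "On-site support",
                     "File case"]))                      # deviating trace
```

A trace like Case 3, which skips billing entirely, scores below 1.0, flagging exactly the kind of deviation Jack discovered by hand.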

The field of process mining has created a large number of mining techniques, addressing many perspectives of a process from a discovery, conformance, and extension point of view. Many of these approaches have been implemented in the tool ProM, a process mining framework. In the next subsection, this framework will be introduced in more detail.

1.2.1 ProM – A Process Mining Framework

Process mining has been a very active field of research in the past years, producing new approaches, refining existing techniques, and maturing from experimental setups to industrial applications. This astonishing growth and success has been partially enabled by the existence of a common basis for the implementation and application of process mining techniques: the process mining framework ProM.



ProM is made available free of charge, and under an open source license, with convenient installation packages for the MS Windows, Mac OS X, and general UNIX platforms. ProM is implemented in the Java™ programming language, and is thus largely platform independent. The development of ProM is coordinated by the process mining group at Eindhoven University of Technology, and most development has also been performed by this group thus far. However, there is now an active community of researchers emerging, contributing new process mining approaches to the framework, and applying the wealth of mining techniques provided by ProM in practice.

Fig. 1.5. Screenshot of the ProM framework.

Figure 1.5 shows a screenshot of ProM, Version 5.1 (as of late 2008). The user interface of ProM is based on the MDI (Multiple Document Interface) paradigm, i.e., the main window of the application hosts a set of internal windows that present views on logs, models, and configuration options for mining techniques.

The main benefits of the ProM framework for the development and application of process mining techniques can be summarized as follows.

• ProM provides a common application framework, which simplifies the development of user interfaces for mining techniques, and which makes a large set of functionality available in one place.

• ProM is a plug-able framework, which means that while the framework provides a common UI and base functionality, all process mining and analysis functionality is provided by optional plugins. Plugins can be integrated into the framework



both from source and from binary packages, allowing for proprietary or commercial extensions.

• The framework provides a wide variety of model type implementations, which can be used as input or output for plugins. These model types include Petri nets, Heuristics nets, EPCs, Social networks, and YAWL. Plugin developers using these model types can leverage common and frequently-used functionality, such as reading, storing, accessing, and modifying models. Further, the framework provides default visualizations for provided model types, making the task of displaying, e.g., a Petri net very straightforward.

• ProM is based on the idea of a shared, common object pool which can be accessed by every plugin in the framework. A plugin can be applied to any subset of objects in that pool as input data, and can transfer the objects it creates back into that pool. This enables plugins to leverage the functionality of other plugins, by being executed in sequence.

[Figure 1.6 depicts the ProM architecture: an object pool (event logs, Petri nets, EPCs, Heuristics nets, ...) at the center, surrounded by import, export, conversion, log filter, mining, and analysis plugins; files in formats such as MXML, PNML, and EPML are read and written via import and export plugins, and ProMimport feeds event logs extracted from systems such as WebSphere, Apache, and SAP R/3 into the framework.]

Fig. 1.6. Overview of the ProM framework architecture.

The architecture of the ProM framework is sketched in Figure 1.6. In the center of this model the object pool is shown, which can contain any number of objects from a variety of types (e.g., event logs, Petri nets, or EPCs). Plugins in the ProM framework can belong to one out of six possible types.

Import plugins are used to read a specific model type (e.g., a Petri net) from a specific serialization format (e.g., PNML). This object is then added to the object pool, in order to make it available within the framework.


Export plugins take the opposite route, i.e., they can take objects of a specific model type from the framework’s object pool and store them in a specific serialization format.

Conversion plugins are used to transform models of a specific type (e.g., EPCs) to another model type (e.g., Petri nets). After conversion, the resulting model is added to the object pool. This makes it possible to, e.g., perform a Petri-net-based analysis technique also on an EPC model, by converting it to the appropriate model type first³.

Log filter plugins can be considered a specific type of conversion plugins, transforming any event log to another event log. These plugins are highly useful to pre-process event log data before the actual analysis, e.g., by removing superfluous event types that are not considered interesting.

Mining plugins are characterized by the fact that they use one event log from the object pool as input. Based on this log, they may create an arbitrary number of resulting models, which are added back into the object pool. In contrast to the previously mentioned plugin types, which typically do not need a user interface, mining plugins usually present their configuration options, and visualizations of the mining results, in a user interface.

Analysis plugins are the most flexible type of plugins in the framework, and can be considered a generalized form of mining plugins. They can use any combination of model types from the object pool as input, and return an equally arbitrary set of resulting model types. Like mining plugins, analysis plugins typically provide a user interface for setting the plugin’s parameters, and for presenting results.

Based on this generic architecture, ProM has grown into a viable ecosystem of process-related analysis functionality. It supports a large set of model types, containing the most popular process modeling formalisms, and provides import and export functionality. Further, ProM currently supports a mostly complete set of conversions, so that a model of any type can be converted to virtually any other type (potentially by using intermediary conversions, e.g., from Heuristics nets to EPCs via Petri nets). A large number of mining plugins allows users to analyze event logs from many angles, with an equally large number of analysis plugins to explore their results further. Note that ProM also contains functionality that is not strictly related to process mining, e.g., the Woflan plugin can be used to check the correctness (i.e., soundness) of Petri nets.
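The interplay of object pool and plugins can be sketched in a few lines. The following is a simplified illustration in Python, not ProM’s actual Java API; all class and function names are invented:

```python
# Sketch of the shared object-pool idea: plugins take typed objects from a
# common pool and put their results back, so that plugins can be chained.
class ObjectPool:
    def __init__(self):
        self.objects = []

    def add(self, obj):
        self.objects.append(obj)

    def find(self, cls):
        # Return all pool objects of the requested model type.
        return [o for o in self.objects if isinstance(o, cls)]

class EventLog:
    def __init__(self, traces):
        self.traces = traces

class PetriNet:
    def __init__(self, places, transitions):
        self.places, self.transitions = places, transitions

def mining_plugin(pool):
    """A toy 'mining plugin': consumes an EventLog, adds a model to the pool."""
    log = pool.find(EventLog)[0]
    tasks = {t for trace in log.traces for t in trace}
    pool.add(PetriNet(places=set(), transitions=tasks))

pool = ObjectPool()
pool.add(EventLog([["Phone support", "File case"]]))
mining_plugin(pool)
print(pool.find(PetriNet)[0].transitions)
```

Because the mined Petri net lands back in the pool, a subsequent analysis or export plugin can pick it up without any direct coupling to the mining plugin that produced it.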

While the Mining XML (MXML) format, defined in the context of ProM, has emerged as the standard format for storing event logs for the purpose of process mining analysis, it is currently not supported by many PAIS implementations used in practice. To remedy this shortcoming, the ProM ecosystem has been extended externally by the ProMimport framework. This framework provides a convenient layer of abstraction for the implementation of PAIS-specific plugins for event log extraction.

³ Note that conversion plugins may be lossy, i.e., some information may be lost or distorted during conversion.



These event logs can then be easily written to standards-compliant MXML so that they can be analyzed with ProM (or other MXML-compliant tools).
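For illustration, an MXML document nests audit trail entries (events) within process instances (cases). The fragment below is an abridged sketch: the identifiers, timestamp, and values are invented, and optional elements (e.g., source and data attributes) are omitted:

```xml
<WorkflowLog>
  <Process id="support">
    <ProcessInstance id="case1">
      <AuditTrailEntry>
        <WorkflowModelElement>Phone support</WorkflowModelElement>
        <EventType>complete</EventType>
        <Timestamp>2008-11-03T10:15:00.000+01:00</Timestamp>
        <Originator>Jack</Originator>
      </AuditTrailEntry>
      <!-- further AuditTrailEntry elements, one per recorded event -->
    </ProcessInstance>
  </Process>
</WorkflowLog>
```

Each audit trail entry records which task was executed, its event type, when it happened, and by whom, which is exactly the information the mining perspectives described above rely on.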

The ProM ecosystem has proven to be a viable platform for the advancement of process mining, both in research and in practice. Researchers and developers can easily implement their techniques, and leverage other functionality in the framework for supporting their results. With the addition of the ProMimport framework, the functionality of ProM has been made accessible also for real-life application in practical settings. In combination with an extensive redesign of essential functionality in ProM, such as the user interface or the log management subsystem, process mining has now become applicable to actual problems in industry, and can also be applied by non-experts.

1.3 Flexibility – A Challenging Necessity

Workflow management systems are one of the most popular options for implementing business processes. One reason for their success is that they allow process owners complete control over the business process execution. This is especially important for application domains where the strict adherence to a well-defined process is paramount, e.g., a court of law needs to process cases precisely according to legislation. Since a WfMS enforces the execution of cases exactly according to the predefined process model it is configured with, ensuring this conformance is trivial. One benefit of this approach is that the correct execution of all cases can be guaranteed, given that the process model does not contain any structural errors. Further, it is straightforward to monitor the progress of cases, since their current state can always be mapped unambiguously onto the process model.

However, the strength of the WfM approach, i.e., its rigidity that can guarantee correct execution of a process, is also the source of serious problems. Most enterprises are constantly facing the need to change their business processes over time. A change in the environment of the business, e.g., changing market demands or new legislation, may force the modification of core processes, in order to adapt to these new requirements. Triggers for change can also be internal, e.g., it may become necessary to modify a business process in order to improve its performance, or to remove an error.

In a traditional WfMS, it is very cumbersome to implement such changes. The only possibility for modifying a business process in that kind of PAIS is to re-design the process, and use the updated process model for newly started cases. Transferring the currently-running cases to the new process model is typically not supported. This may incur serious problems, especially for long-running cases (e.g., in the banking or insurance domain) that need to react to a mandatory requirement for change (e.g., in order to conform with updated legislation).

Therefore, many organizations find the inflexibility of WfMSs, i.e., their inability to respond to change, to be incompatible with the requirements of their domain. These organizations have turned to alternative solutions that offer more flexibility in process execution. Some of these alternatives, e.g., Case Handling [2, 17], attempt


to provide flexibility by reducing the strictness of their process modeling formalism, thus allowing for more potential behavior. Other solutions, e.g., ERP systems, largely eschew the notion of an explicitly defined process altogether. In these systems, merely the integrity of data is ensured, while the actors involved in a process are mostly free to execute any activity at any given point in time. In general, more flexible PAIS paradigms gradually shift the burden of correct process execution, and knowledge about the process, from the PAIS towards the users involved.

It is obvious that, given more flexibility, the people involved in executing a process will make use of it. Thus, the greater the flexibility a PAIS allows during the execution of a process, the greater the diversity of behavior will be for that process. A strictly-defined process typically only allows for a small number of possible traces. However, in a completely flexible process, which places no constraints on the execution of tasks, the number of different traces is potentially unlimited.

Processes with a large diversity of behavior provide a great opportunity for the field of process mining. For rigidly-structured workflows, a mining algorithm can only re-discover the process as it had been designed⁴. Given a process allowing for more flexibility, however, process mining can be used to discover originally new knowledge. This is not only an interesting challenge from a scientific point of view, but there is also a real need for techniques that can analyze flexible processes in practice. High-level knowledge about the structure and behavior of flexible processes is often not available, since no person in the organization has a complete view of how the process is actually executed, as opposed to what is theoretically allowed.

Process mining has the potential to fill that gap, by providing powerful means for analyzing and understanding flexible business processes. However, business process management is not the only domain that can benefit from process mining analysis. There are many other fields where complex, flexible processes are executed, and need to be monitored and analyzed. One example is development or testing processes for complex systems. Further, using any sufficiently complex system can be considered a flexible process, constrained only by the user interface, technical constraints, and the common sense of users. Thus, there is a wide gamut of flexible processes, for which the application of process mining techniques can be highly beneficial.

However, most traditional process mining techniques have been developed with the analysis of well-structured processes, like workflows, in mind. When they are applied to event logs from an unstructured, flexible process, the results are complex, confusing, and hard to understand. These resulting process models, containing a large number of nodes and entangled arcs, are often called spaghetti models.

Figure 1.7 shows an example spaghetti model, which has been mined from some usage logs of an x-ray appliance. While these models are certainly correct, in the sense that they model a complex reality in a complex but systematic manner, they are largely useless for any real analysis. Their size and complexity makes them hard

⁴ Note that process mining is still useful in well-structured settings. It can be used to extend the model based on event log data, e.g., to show bottlenecks, the handover of work, etc.


[Figure 1.7 shows a process model mined from the usage logs of an x-ray appliance, containing hundreds of densely connected tasks such as StartFluoroscopy, StopViewing, and MoveDetectorFrontal.Move; its contents are illegible at this scale.]

Fig. 1.7. Example excerpt of a spaghetti model.

to read, and understanding what is going on, or deriving any useful knowledge about the process, is essentially futile.

We argue that spaghetti models are merely the symptoms of an underlying problem that exists when traditional process mining approaches are applied to event logs from flexible processes. Since these techniques had been developed with structured, well-defined processes from WfMSs in mind, they have been optimized and tailored towards that use case. However, many assumptions that can be made with respect to a well-structured process are no longer true for more flexible, unstructured processes. It is this mismatch of assumptions that leads to a number of problems, eventually resulting in the complexity and confusion exhibited by spaghetti models.

The work presented in this thesis is concerned with the application of process mining to processes in flexible environments. We will revise some of the foundations for process mining under these changed circumstances, develop requirements that are more appropriate, and present a set of new approaches that are more suitable for flexible environments. All new process mining approaches presented in this thesis have been implemented in the ProM framework, and have been applied in real-life case studies.

The following section summarizes the main contributions of this thesis.

1.4 Contributions

The work presented in this thesis is characterized by the following five main contributions.

• A complete framework for event logs from a process mining perspective. This framework includes (i) a general taxonomy, defining a generic meta-model for event log data, (ii) a set of metrics to describe the structural properties of a log,


(iii) a framework for the elicitation, transformation, and synthesis of event logs, and (iv) a complete, efficient framework for the storage, access, and management of event logs.

• An in-depth analysis of the problems and challenges associated with the application of process mining to flexible processes. Based on this analysis, a comprehensive set of requirements and goals has been derived. These can be used to guide the design of new process mining techniques that are more suitable for flexible processes.

• A set of techniques for event log schema transformation, including (i) event class projection for log simplification, (ii) trace segmentation for activity and trace discovery, and (iii) process type discovery by trace clustering.

• A complete framework for adaptive process simplification, which constitutes the main contribution of this thesis. This technique allows for the analysis and visualization of complex, unstructured processes on arbitrary levels of abstraction, based on a set of log-based metrics. It can be employed in an interactive, explorative manner, is supplemented by quality and authority metrics to guide this exploration, and is compatible with other process mining techniques.

• A set of techniques for the exploration and visualization of complex event log data from flexible environments. This includes (i) the application of the dotplot visualization to raw event log data, and (ii) the projection of event log data onto process models for interactive animation of actual process behavior.
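To give a concrete flavor of log simplification, the following is a minimal, hypothetical sketch of event class projection, i.e., removing all events whose event class occurs only rarely in the log. The list-based log format, the activity names, and the support threshold are illustrative assumptions, not the representation used in this thesis.

```python
# Hypothetical sketch of event class projection for log simplification:
# keep only events whose class (activity name) covers at least
# `min_support` of all events in the log; trace order is preserved.
from collections import Counter

def project_log(log, min_support=0.05):
    """Project a log (list of traces, each a list of event class names)
    onto its frequent event classes."""
    counts = Counter(event for trace in log for event in trace)
    total = sum(counts.values())
    frequent = {cls for cls, n in counts.items() if n / total >= min_support}
    return [[e for e in trace if e in frequent] for trace in log]

log = [
    ["A", "B", "C", "A", "D"],
    ["A", "C", "A", "X"],   # "B", "D", and "X" are rare noise events
]
print(project_log(log, min_support=0.2))  # [['A', 'C', 'A'], ['A', 'C', 'A']]
```

Projecting away infrequent classes in this way shrinks the alphabet of the log before any model is discovered, which is one simple means of combating spaghetti models.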

Additional contributions of our work are:

• A discussion of flexible environments from a process mining point of view, analyzing the factors that lead to diverse and unstructured behavior.

• The application of our newly introduced process mining techniques in practice, as demonstrated by four case studies.

• The design and implementation of all introduced process mining techniques in an open source framework (ProM).

• The design and implementation of a complete toolkit for event log elicitation and conversion (ProMimport), and for efficient event log storage, access, and management (integrated into ProM).

The following section provides an overview of the structure of this thesis.

1.5 Roadmap

This section briefly describes each chapter of this thesis, in order to give a rough overview of its structure.

Chapter 1 The current chapter. This chapter introduces the general domain of BPM and process mining, and motivates the need for new mining approaches that are more suitable for flexible processes.


Chapter 2 Introduces the field of flexible environments, i.e., environments that demand the flexible execution of processes. This chapter provides an overview of different types of flexible environments, and positions these with respect to the factors leading to diverse, unstructured behavior.

Chapter 3 Provides a comprehensive overview of event logs, from a process mining point of view. This chapter introduces a general taxonomy and structural metrics for event logs, and discusses their elicitation, transformation, synthesis, and efficient storage and management.

Chapter 4 Introduces the field of process mining. This chapter also gives an overview of related work in this field.

Chapter 5 Presents an in-depth discussion on the problems faced by process mining when applied to flexible processes. This chapter identifies several misalignments of traditional approaches, and presents a set of requirements and goals for mining techniques that are useful in flexible environments.

Chapter 6 Presents a set of approaches for event log schema transformation, i.e., the reorganization of event log structures to provide a preprocessed log for subsequent analysis from a particular point of view.

Chapter 7 Introduces an approach for adaptive process simplification, which provides a means for arbitrary abstraction, and is highly suitable for mining flexible processes. This chapter also introduces a new, adaptive type of process model, with quality metrics and means for conversion.

Chapter 8 Discusses the use of visualization for the exploration and analysis of large amounts of unstructured behavior, as recorded in event logs. This chapter introduces the application of dotplots, a well-known visualization technique, to event logs, and a new method for the animation of event log data with a process model.

Chapter 9 Shows how the approaches introduced in this thesis can be applied in practical settings, by means of four select, real-life case studies.

Chapter 10 Concludes this thesis and points out directions for future work.


2

Flexible Environments

In most large-scale organizations, e.g. enterprises or governmental bodies, the majority of work is performed in the context of the business processes at hand. These processes implement both core functions of an organization and supporting aspects, such as managing human resources, accounting, and so forth. Our intuitive idea of a process is that of a complex function, which is broken down into a partial order of smaller steps that can more easily be handled by the resources (i.e., employees, systems) involved in the process. A process is usually further equipped with a set of rules, which may govern the control flow (i.e., precedence relations and routing between tasks), resource assignment (i.e., which person may execute which set of tasks), data flow (i.e., information dependencies), etc.

It has become commonplace that complex business processes are supported by some kind of Process-Aware Information System (PAIS). The first family of such systems to receive significant adoption in practice were Workflow Management Systems (WfMSs), which are specifically constructed to support business processes. Given a suitable process definition, containing the set of tasks and corresponding business logic, a WfMS can completely control the correct and efficient execution of the business process. Users involved in the process interact with the system through a so-called worklist interface, which the WfMS uses to offer tasks for which the person is authorized. After a task is finished, it is committed back to the WfMS, which then adjusts the state of the process, potentially enabling a new set of tasks.
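As a minimal illustration of this enablement mechanism, the following sketch offers a task on the worklist once all of its predecessors have completed, and advances the process state as tasks are committed back. The process definition and task names are hypothetical, and a real WfMS would of course also handle routing conditions, resources, and data.

```python
# Hypothetical sketch of WfMS-style task enablement over a process
# definition given as precedence relations (task -> prerequisite tasks).
PROCESS = {
    "register": set(),
    "check":    {"register"},
    "decide":   {"check"},
    "archive":  {"decide"},
}

def worklist(completed):
    """Tasks currently offered: not yet done, all predecessors done."""
    return {t for t, pre in PROCESS.items()
            if t not in completed and pre <= completed}

done = set()
while worklist(done):
    task = worklist(done).pop()  # a user performs an offered task ...
    done.add(task)               # ... and commits it back to the WfMS
print(sorted(done))  # ['archive', 'check', 'decide', 'register']
```

Note how the process state alone determines which tasks appear on the worklist; this is the sense in which a WfMS "completely controls" execution, and it is exactly this rigidity that the remainder of the chapter contrasts with flexible environments.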

The reason why WfMSs are able to efficiently control the correct execution of processes is that they are based on the notion of rigid and strictly-defined processes. This entails that the set of tasks and resource classes (i.e., groups and roles), all possible routings between tasks and assignment rules, and so forth, are known beforehand, and can be completely and correctly encoded in the process definition. Also, the manner in which workflows are modeled is very precise, e.g. using the well-known set of workflow patterns [6, 1, 146, 144, 145], which ensures that every process can properly terminate and does not enter illegal states.

In an ever-faster evolving world, where businesses need to respond quickly to changes of the environment, such as market demand or legal regulations, it is obvious that the rigid, well-structured nature of workflow management makes it a problematic
