A Process Mining approach for redesigning Enterprise
Information Systems for a Service-Oriented Architecture
University of Groningen August 2007
N.R.T.P. van Beest 1333410
Supervisors:
Index
1. INTRODUCTION
1.1. PRELIMINARIES
1.2. CONTEXT
1.3. CONTRACT MANAGEMENT PROCESS
2. RESEARCH DESIGN
2.1. RESEARCH OBJECTIVE
2.2. RESEARCH QUESTIONS
2.3. RESEARCH METHODOLOGY
3. ANALYSIS OF THE BUSINESS PROCESS
3.1. OPERATIONALIZING PERFORMANCE
3.2. INTRODUCTION TO PROCESS MINING
3.3. GLOBAL OUTLINE OF DESIGN METHODOLOGY
3.4. PROM FRAMEWORK
3.5. CONVERSION AND MERGING OF RAW DATA
3.6. CREATING A PROCESS MODEL BASED ON THE LOG DATA
3.7. PROCESS MODEL AND PERFORMANCE
3.8. INITIAL SETTINGS FOR SIMULATING PETRI NETS
3.9. DISTRIBUTIONS
3.10. COMPARING THE PETRI NET MODEL WITH THE MODEL OF REALITY FOR VALIDATION
4. OVERVIEW OF APPLIED DESIGN METHODOLOGY
5. CONCLUSION
5.1. LIMITATIONS
6. GLOSSARY
1. Introduction
1.1. Preliminaries
Nowadays, organizations operate in a fast-changing environment, driven by new markets, changing government rules, new competitors and emerging technologies. Organizations therefore have to adjust their business processes to this changing environment in order to maintain a competitive advantage. Because organizations depend heavily on IT support for the execution of their business processes, the required agility of those processes demands that their IT systems be flexible as well.
Standard enterprise systems, like ERP, BIS and CRM, are positioned by their vendors as very flexible. Many heterogeneous organizations have adopted these systems, and in many cases the implementations can be regarded as more or less successful. Apparently, flexibility at implementation time is readily available. However, after the initial implementation, it appears to be very difficult to change these systems, especially without hiring external specialists. Once implemented, these systems are renowned for their lack of changeability.
To obtain the required flexibility, one possibility is to replace the system with a new system that fulfils the requirements concerning flexibility and changeability. However, these standard enterprise systems, which are often referred to as legacy systems, involved considerable investments in the past, and as much value as possible should be derived from those investments. Although these systems have been in use for many years and are based on standards that were common at the time (for instance, performance standards), it is therefore desirable to reuse as much of them as possible.
To support this need for integration, several integration technologies and techniques have emerged over time. Examples include component model infrastructure standards such as CORBA [5] and technologies such as DCOM and JavaBeans, in which services are grouped into coherent contractual units called interfaces [7]. By themselves, integration technologies do not make monolithic software more changeable. They do, however, enable the componentisation of enterprise information systems, which may allow “piecewise” upgrading. Moreover, integration technology is often combined with workflow functionality, which in principle makes an explicit representation of the business process in enterprise information systems possible ([9], [10], [12]).
As a potential solution, service-oriented architecture (SOA) advocates changeability by creating a coherent set of services that together provide enterprise functionality. SOA supports both flexible business processes, through the use of web services [13], and the integration of disparate applications, including the reuse of legacy applications. The emergence of XML-based web services promises a set of standards for application integration. These standards should make integration simpler through common transports, data formats and associated services (security, transactions, etc.) [8]. The service-oriented vision offers many benefits to enterprises by creating services that are modular, accessible, well-described, implementation-independent and interoperable [5].
Vendors are currently adopting SOA as the new paradigm. Because they are bringing many new technologies to the market (for instance workflow, interactive forms, web services and XML), the question of changeability needs further study. These technologies offer customers (such as GasUnie) many possibilities to create computerized support for their business processes, including the reuse of legacy systems.
However, the flexibility available when architecting a system composed of web services can easily lead to an architectural mess inside the system, resulting in reliability and maintainability issues. Architectural guidelines for the development of service-oriented systems are therefore necessary to create a coherent and solid system.
1.2. Context
N.V. Nederlandse GasUnie is a gas infrastructure company that owns one of the largest high-pressure gas pipeline grids in Europe. Through its subsidiary Gas Transport Services B.V., GasUnie manages and develops the gas transport network. GasUnie also offers other services in the gas infrastructure field, including gas storage [15]. After liberalization (July 1st, 2005), N.V. Nederlandse GasUnie was split. GasUnie Trade & Supply was separated, and the remaining activities were bundled in a gas transport company with three main divisions: Gas Transport Services (GTS), GasUnie Construction & Maintenance and GasUnie Participations & Business Development. This resulted in many changes to the Enterprise Information System (EIS), especially for GTS, which supplies its services in this liberalizing market. Because of the unbundling, new types of market players, new business processes and new roles appeared, and with them new information needs. As a result of these changes, GTS has to redesign its EIS.
One important factor in these changes is increased externalization. The information needs of external parties, like GasUnie Trade & Supply, have become as important as internal needs, which affects the current IT solutions and infrastructure. Furthermore, as responsiveness to the customer becomes increasingly important, performance emerges as an issue as well. There is, therefore, a strong demand for a flexible, high-performance system that reuses as much functionality from older systems as possible.
1.3. Contract management process
GTS uses an entry-exit system, in which the gas enters the grid at entry points and leaves it at exit points. Shippers can book transport capacity at entry points and at exit points. Before awarding transport capacity to a shipper, GTS performs an availability and a technical transport test. Tariffs have been set for all individual entry and exit points. Entry capacity gives the right to inject a specific volume of gas per hour into the national gas transmission grid at an entry point; exit capacity gives the right to extract a specific volume of gas per hour from the national gas transmission grid at an exit point.
Each shipper must keep a balance between the volume of gas that he feeds into the system and the volume that he extracts from it. To achieve this, GTS has introduced a balancing regime.
The GasUnie system consists of a front-end system called GEA Click & Book (GEA) and a software package in which GTS performs contracting and billing. Both systems are connected by middleware systems. GEA offers on-line booking for most entry and exit points. Only in special circumstances can specific entry or exit points (or clusters of points) not be handled on-line. In that case, the booking request should still be made through GEA, but it is transferred to a manual evaluation, with feedback provided through GEA. GEA functionality includes entry/exit booking, transfer of capacity and (firm) wheeling. Other standard services, which are not provided through GEA, have to be requested by e-mail or fax [www.gastransportservices.nl].
2. Research design
The main purpose of an Enterprise Information System is to support the business process of an organization. When an information system is redesigned based on an SOA, the redesigned system consists, at least partially, of components that originate from legacy software. This section provides an overview of the field of service-oriented redesign and explains the focus of this thesis. It also describes the objective and methodology of this research project.
Motives for adopting an SOA based on legacy software include the need for flexibility, reusability and increased performance, while retaining a fit with the organization's business process. Service-oriented redesign therefore concerns three categories of objectives: (1) creating an application with high flexibility and reusability, (2) creating an application that fits the business process, and (3) streamlining the business process. The service-oriented redesign can be regarded as a process that consists of these three parts. Although the parts are strongly connected, their methodologies differ. Figure 1 provides an overview of service-oriented redesign, including the three parts discussed above.
Figure 1. Perspective of this thesis in service-oriented redesign
Furthermore, the application should fit the organizational business process. The process of establishing a fit between the business services required by the business process and the services available in the legacy system is referred to as Top-Down Analysis. The top-down analysis examines the business services that the organization provides to its customers. Starting from that perspective, we drill down from the customer level into the system in order to identify the services that are consequently required from the system. Based on the business services defined at the customer level, we can then determine which services are already available in the system and which services are additionally required.
Process mining concerns the identification of the process flow that is embedded in the system of an organization, which can be used to determine the performance of the current system. Based on the current system, improvements are proposed for the redesign. This perspective is also referred to as Bottom-Up Analysis. When a legacy system is redesigned towards a system based on a service-oriented architecture, potential performance issues can be identified using process mining techniques and resolved accordingly. Streamlining the business process is, therefore, covered by this perspective.
Figure 1 shows the three perspectives included in a service-oriented redesign, along with the grey area that marks the focus of this thesis. In this thesis, a process mining methodology is developed. This is a generalized design methodology for identifying and resolving performance issues in the business process that is embedded in enterprise applications.
2.1. Research Objective
The main objective of this thesis is to develop a design methodology using process mining with the aim of redesigning the embedded business process of an enterprise application.
2.2. Research Questions
Based on the above, the following research questions are stated, which will be answered in this thesis.
1. Which performance criteria are relevant for service-oriented redesign?
Based on the definition of these criteria, the performance of the contract management process in the case study can be assessed.
2. How can the business process be modelled, in order to be able to define potential performance issues?
In order to obtain an overview of the process that is suitable to draw conclusions with respect to performance, the business process should be captured in a process model. Based on the process model a performance analysis can be executed, to assess the performance based on the criteria that are defined by the first research question.
3. How can a business process be simulated for redesign purposes?
In order to derive conclusions about the performance of the business process along with potential improvements, the original model and the redesigned model should be simulated. This way we are able to compare the performance of the old situation with the redesigned situation, which enables us to draw conclusions on the performance gain.
2.3. Research Methodology
In this section the research methodology concerning the development of the design methodology will be discussed. It is important to clearly distinguish both types of methodology, as the research methodology describes the way the design methodology will be developed and validated.
In this research project, information about the contract management process at GasUnie is gathered in a systematic way, in order to create an understanding of the actual process flow and its performance-related deficiencies. According to Berg [21], this implies the use of a case study method. The case is used as a supportive environment in which the actual research is performed. The case study method used in this thesis can therefore be categorized as an instrumental case study [21].
Several techniques are combined in order to answer the research questions stated in section 2.2. The combination of process mining techniques and simulation techniques, which is explained in detail in section 5, constitutes the main contribution of this thesis. This combination is applied to the case of the Contract Management Process at GasUnie. A global graphical representation of the design methodology is shown in figure 2.
Figure 2. Graphical representation of research design.
As shown in figure 2, during the process mining analysis, a process model is generated from the log files, which is supported by the process mining tool that is used (ProM [14]).
A Petri Net is a graphical and mathematical modelling tool that is well suited to representing process models formally. Petri Nets were invented by Carl Adam Petri to model concurrent systems and the network protocols used with these systems. A Petri Net is a directed bipartite graph with two types of nodes: "places" (represented by circles) and "transitions" (represented by rectangles). Arcs only connect places to transitions or transitions to places. Details about Petri Nets can be found in [18], [19] and [20].
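The behaviour of a Petri Net follows from its firing rule. Tokens reside in places. A transition is enabled when each of its input places holds a token. Firing the transition consumes one token from each input place and produces one token in each output place. The following is a minimal Python sketch of that rule. It assumes all arc weights are 1, and the class and place names are hypothetical. It illustrates the concept only and is not how ProM or CPN Tools represent nets.

```python
from collections import Counter

class PetriNet:
    """Minimal place/transition net; every arc is assumed to have weight 1."""

    def __init__(self, transitions, marking):
        self.transitions = transitions   # name -> (input places, output places)
        self.marking = Counter(marking)  # place -> number of tokens

    def enabled(self, t):
        inputs, _ = self.transitions[t]
        return all(self.marking[p] >= 1 for p in inputs)

    def fire(self, t):
        if not self.enabled(t):
            raise ValueError(f"transition {t!r} is not enabled")
        inputs, outputs = self.transitions[t]
        for p in inputs:    # consume one token per input place
            self.marking[p] -= 1
        for p in outputs:   # produce one token per output place
            self.marking[p] += 1

# Hypothetical sequential process: start -> A -> p1 -> B -> end
net = PetriNet(
    transitions={"A": (["start"], ["p1"]), "B": (["p1"], ["end"])},
    marking={"start": 1},
)
net.fire("A")
net.fire("B")
print(dict(+net.marking))  # {'end': 1}
```

The unary `+` on a `Counter` drops places that no longer hold tokens, so only the final marking remains.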
In order to be able to assess performance, a business process should be represented as a Petri Net. The process model should, therefore, be converted to a Petri Net as an intermediate step, again using the process mining tool (M1). Based on this generated Petri Net, which is directly based on the log files, the performance of the business process is measured and analyzed (P1). In figure 2, the generated Petri Net is represented as Reality (M1).
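For the redesign, a second Petri Net model (M2) is created in a simulation tool, CPN Tools, which is both adjustable and able to simulate a business process. Based on that simulation, a performance analysis is possible (P2).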
The model used for simulation (M2) should resemble the model based on reality (M1) as closely as possible. It is therefore important to validate M2 against a number of criteria and to check the conformance of the model to the real data. The details of this procedure are described in section 3.
3. Analysis of the business process
The process mining starts with an analysis of the system from a performance point of view. To do so, the concept ‘performance’ is first operationalized. Next, the contract management process is modelled. Finally, a model of the contract management process is created that can be simulated and adjusted to implement the changes. This section, therefore, provides an elaborate description of the design methodology used in this project.
3.1. Operationalizing performance
In this context, the term performance refers to operational performance, which can be quantified, for instance, as response time. Performance can be defined as a quantitative measure characterizing a physical or functional attribute relating to the execution of an operation or function [16]. In this research project, the main concern is the throughput time of the entire process, because shippers want to be able to book capacity as fast as possible. Throughput time can be defined as the average time required to perform an activity or a process.
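At process level, throughput time can be computed directly from an event log. For each case, take the time between its first and last event, and then average these times over all cases. The following Python sketch assumes a hypothetical log format of (case id, activity, timestamp in seconds). The activity names and timestamps are illustrative and do not come from the GasUnie logs.

```python
from statistics import mean

# Hypothetical event log: (case id, activity, timestamp in seconds).
events = [
    (1, "Request for booking", 0), (1, "Booking processed", 340),
    (2, "Request for booking", 100), (2, "Booking processed", 260),
]

def case_throughput_times(events):
    """Throughput time per case: last timestamp minus first timestamp."""
    first, last = {}, {}
    for case, _activity, ts in events:
        first[case] = min(ts, first.get(case, ts))
        last[case] = max(ts, last.get(case, ts))
    return {case: last[case] - first[case] for case in first}

per_case = case_throughput_times(events)
print(per_case)                 # {1: 340, 2: 160}
print(mean(per_case.values()))  # 250
```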
A bottleneck can be defined as the step in a process with the highest utilization in the entire process [17]. High utilization easily results in high throughput times caused by long queuing times [17]. Performance can therefore be influenced by the location and severity of the bottlenecks present in the process. For this reason, the bottlenecks that occur and their severity are used as a performance measure. In addition, because the contract management process consists of many activities, its performance is assessed by throughput time, both per activity and for the entire process.
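Utilization can be expressed as the busy time of an activity divided by the resource time available to it (capacity multiplied by the observation period). Following the definition above, the activity with the highest utilization is the bottleneck. The Python sketch below uses hypothetical busy times, capacities and a one-hour observation period. The activity names are taken from section 1.3 only for illustration, and the figures are not measurements.

```python
OBSERVATION_PERIOD = 3600  # seconds; hypothetical observation window

# Hypothetical figures: total busy (service) time and number of parallel resources.
activities = {
    "Availability test": {"busy_time": 1800, "capacity": 1},
    "Technical transport test": {"busy_time": 5400, "capacity": 2},
    "Contracting": {"busy_time": 900, "capacity": 1},
}

def utilization(busy_time, capacity, period=OBSERVATION_PERIOD):
    """Fraction of the available resource time that is actually used."""
    return busy_time / (capacity * period)

util = {name: utilization(**a) for name, a in activities.items()}
bottleneck = max(util, key=util.get)  # step with the highest utilization
print(util)        # {'Availability test': 0.5, 'Technical transport test': 0.75, 'Contracting': 0.25}
print(bottleneck)  # Technical transport test
```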
RQ1: Which performance criteria are relevant for service-oriented redesign?
Both the bottlenecks and the throughput times are taken into account, because this introduces an additional variable for validating the similarity between the model and reality.
3.2. Introduction to Process Mining
Using different sources of information, a process model can be constructed with the aid of process mining tools. These tools can also validate the resulting process models, which is a form of conformance testing between the model and the transaction logs. In addition, mining provides insight into quantitative usage data (e.g. throughput times of activities). The result of the mining can take the form of Petri Nets or Event-driven Process Chains (EPCs). In this project, we have used Petri Nets, because they can be simulated with CPN Tools, which is explained later on.
Both the process perspective and the case perspective are mined. In other words, the analysis covers both the control flow (process perspective) and the content/data level (case perspective) of the booking process. The mining case should be defined at a high level of granularity. When a shipper places a request for capacity, this results in an exchange of messages between systems. A message key should therefore be found that links these messages together and thereby determines the mining case.
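Once a message key has been found, each distinct key value becomes one mining case. The Python sketch below groups messages by key and orders each case by time. The message format, the key values (such as "BK-17") and the activity names are hypothetical, because the actual key used at GasUnie is not specified here.

```python
from collections import defaultdict

# Hypothetical messages exchanged between systems: (message key, system, activity, timestamp).
messages = [
    ("BK-17", "GEA", "Request for booking", 0),
    ("BK-18", "GEA", "Request for booking", 20),
    ("BK-17", "system", "Booking confirmed", 80),
    ("BK-17", "system", "Availability test", 45),
]

def group_into_cases(messages):
    """Use the message key as case id and sort each case's events by timestamp."""
    cases = defaultdict(list)
    for key, system, activity, ts in messages:
        cases[key].append((activity, system, ts))
    return {key: sorted(evts, key=lambda e: e[2]) for key, evts in cases.items()}

cases = group_into_cases(messages)
print(list(cases))           # ['BK-17', 'BK-18']
print(cases["BK-17"][1][0])  # Availability test
```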
Process mining targets the automatic discovery of information from an event log. The discovered information can be used to deploy new systems that support the execution of business processes. It can also serve as a feedback tool that helps in auditing, analyzing and improving business processes that are already enacted. Process mining techniques are helpful because they capture what is actually happening according to an organization's event log, rather than what people think is happening in that organization. There may, therefore, be a difference between the process identified from the logs and the formal description of the process in the available documentation.
3.3. Global outline of design methodology
As pointed out earlier, simulations based on the logged data of the current situation are executed to compare the performance of the current situation with the predicted performance of the redesigned situation. The methodology of process mining and simulation consists of several steps. The global outline of the design methodology is enumerated below:
1. Execute a performance analysis using ProM.
2. Create a Petri Net model of the current situation using the simulation tool (CPN Tools).
3. Validate the Petri Net model by executing a performance analysis based on the Petri Net model and comparing it with the results of step 1.
4. Create a redesigned model based on the performance issues identified in step 1.
5. Modify the Petri Net model according to the proposed redesign.
6. Execute a performance analysis using the redesigned Petri Net model.
7. Compare the results of the redesign with the original performance.
Before these steps are discussed, some explanation is required about the tools used to perform the process mining and simulation.
3.4. ProM Framework
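ProM [14] is an extensible framework that supports a wide variety of process mining techniques in the form of plug-ins (e.g. process discovery, process verification, converting between different modelling notations, etc.). ProM plug-ins can be used to answer questions about processes in organizations.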
Figure 3. ProM plug-ins overview.
For our process mining purposes, we have chosen two plug-ins: the Heuristics Miner and the Genetic Miner. These algorithms are suitable because they capture the most frequent behaviour of the process and are robust to noise. However, the Genetic Miner takes more time to run. Because time was limited, we primarily use the Heuristics Miner.
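The Heuristics Miner's robustness to noise comes from how it builds its model. Instead of accepting every observed ordering, it uses a dependency measure computed from "directly follows" counts. For two different activities a and b, the measure is (|a>b| - |b>a|) / (|a>b| + |b>a| + 1). Here |a>b| is the number of times a is directly followed by b. Rare orderings therefore yield low values and can be cut off by a threshold. The Python sketch below computes this measure on hypothetical traces. It shows only that one ingredient of the algorithm and is not ProM's implementation.

```python
from collections import Counter

def directly_follows(traces):
    """Count how often activity a is directly followed by activity b (|a>b|)."""
    counts = Counter()
    for trace in traces:
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts

def dependency(a, b, df):
    """Dependency measure a => b for a != b; the value lies in (-1, 1)."""
    ab, ba = df[(a, b)], df[(b, a)]
    return (ab - ba) / (ab + ba + 1)

# Hypothetical log: 40 traces ABC and 10 traces ACB.
traces = [["A", "B", "C"]] * 40 + [["A", "C", "B"]] * 10
df = directly_follows(traces)
print(round(dependency("A", "B", df), 3))  # 0.976
print(round(dependency("B", "C", df), 3))  # 0.588
```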
3.5. Conversion and merging of raw data
The data provided by the organization or the system is often not in the format that ProM requires. Furthermore, the raw data may contain more fields and details than ProM strictly needs. The raw data should, therefore, be converted to a format that ProM accepts. The raw data and its additional details may still be very useful for other purposes, as explained below. For that reason, the restructuring proposed in this section should be performed on a copy rather than on the original data.
Often, several levels of aggregation are present in the raw data. The entries or activities recorded in the logs must be at the same level of aggregation in order to be comparable and consistent. A redesigned model, on the other hand, may very well have a very different aggregation level than the activities in the original log data, which are used for modelling the current situation. The level of aggregation within the redesigned model is, however, constant (this is a design property). This fundamental difference is caused by the way most systems create their event logs. Most systems do not maintain a constant level of aggregation while generating event logs, because in many cases the application developer decides ad hoc, at development time, to log a certain activity. The logged activities may, therefore, vary from a high-level entry like Booking processed by Gas Transport Planning to a very low-level entry like a database entry. Clearly, the levels of aggregation of these activities differ greatly. Nevertheless, the redesigned model has to be compared to the current situation. This section explains the methodology for merging the raw data to a consistent level of aggregation and converting the logs to the ProM data format. Essentially, two basic conversions should be performed on the raw data: from raw data to merged data, and from merged data to the ProM data format. Figure 4 provides a graphical representation of these conversions and their sequence.
Figure 4. Sequence of conversion of data
As shown in figure 4, the raw data first has to be merged with a merging algorithm, to create a consistent level of aggregation in the logs. Second, the merged data has to be converted to a format that ProM accepts.
Furthermore, because this log data will later be used to create a process model, keeping it at a high level of aggregation increases readability and, therefore, the ability to determine the bottlenecks in the process. Once bottlenecks are identified at a high level, the merging algorithm can be adjusted so that the level of aggregation of those specific activities is decreased. This way, only activities related to bottlenecks are shown at a low level of aggregation. The exact cause of a bottleneck can therefore still be identified, while only the relevant activities are examined in detail.
Figure 5. Methodology for identifying exact locations of bottlenecks.
This introduces a systematic methodology for identifying bottlenecks while retaining a readable model. For the merging, the start and end of each high-level activity should be identified. The start of a high-level activity is inserted in the merged log file as a high-level start event, and its end as a high-level end event. The start and end of a high-level activity can be identified in the following ways:
• Based on the content of activities. Some activities can be recognized as part of one higher-level activity. This can be a clue that activities occurring in a particular order should be merged. For example, the entries
    Check A required complete
    Perform check A complete
    Return result check A complete
  can be merged into
    Check A start
    Check A complete
• Based on changes in the originator. The process can be divided into parts executed by various users. For example, the activities performed by a certain user can be seen as one high-level activity.
• Based on changes in logtype. The logtype may indicate, for example, the system in which the particular activity is executed.
Figure 6. Sources of logfiles in our case
In our case, the logfiles originate from the sources shown in figure 6. In the case described in section 2, three different logtypes could be identified: GEA, system and interface. The highest level of aggregation only distinguishes between the systems and the interface. For that reason, the first merging loop was based solely on the logtype.
For a given process instance, a logfile may look as shown in figure 7.
Figure 7. Example of a logfile
Figure 8. Example of a merged logfile
Figures 7 and 8 clearly show that the attribute Logtype was used for merging. The merged logfile represents the processes of each system at a very high level: only the start and end timestamps are recorded. We chose not to merge the interface log entries, because in our case the throughput times between the systems were of particular interest.
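A merging loop based on logtype can be sketched as follows. Each run of consecutive entries with the same logtype collapses into one high-level start event and one high-level complete event, while interface entries are copied unmerged. The Python sketch below assumes the raw entries of one process instance are already sorted by timestamp, and it uses a hypothetical tuple format and activity names. Because `groupby` only groups consecutive entries, GEA appears twice when the process returns to GEA after the system. This matches the per-system start and end timestamps described above.

```python
from itertools import groupby

# Hypothetical raw entries of one process instance: (activity, logtype, timestamp).
raw = [
    ("Request received", "GEA", 0),
    ("Request validated", "GEA", 5),
    ("Message sent", "interface", 7),
    ("Availability test", "system", 30),
    ("Technical test", "system", 90),
    ("Message sent", "interface", 95),
    ("Booking confirmed", "GEA", 120),
]

def merge_by_logtype(entries, keep_detailed=("interface",)):
    """Collapse each run of consecutive entries with the same logtype into a
    high-level start event and a high-level complete event.
    Entries whose logtype is in keep_detailed are copied unmerged."""
    merged = []
    for logtype, run in groupby(entries, key=lambda e: e[1]):
        run = list(run)
        if logtype in keep_detailed:
            merged.extend((activity, "complete", ts) for activity, _, ts in run)
        else:
            merged.append((logtype, "start", run[0][2]))
            merged.append((logtype, "complete", run[-1][2]))
    return merged

for event in merge_by_logtype(raw):
    print(event)
```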
As shown in figure 4, the merged data should subsequently be converted to a format that ProM accepts. This conversion can be performed with a standard database macro that is provided with the ProM Import framework [14]. Because this step only concerns rearranging attributes in a table, it is not covered in further detail in this thesis.
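For readers without the database macro at hand, the rearrangement can be sketched in a few lines. The Python sketch below writes merged events as an MXML-style document using the standard library's `xml.etree.ElementTree`. The element names (WorkflowLog, Process, ProcessInstance, AuditTrailEntry, WorkflowModelElement, EventType, Timestamp) reflect the MXML format as we understand it and should be checked against the ProM documentation. The process id and the reference start time are hypothetical.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

LOG_START = datetime(2007, 1, 1, tzinfo=timezone.utc)  # hypothetical reference time

def to_mxml(cases, process_id="contract_management"):
    """Build an MXML-style document from merged events.
    cases maps a process instance id to [(activity, event_type, seconds), ...]."""
    log = ET.Element("WorkflowLog")
    process = ET.SubElement(log, "Process", id=process_id)
    for case_id, entries in cases.items():
        instance = ET.SubElement(process, "ProcessInstance", id=str(case_id))
        for activity, event_type, seconds in entries:
            entry = ET.SubElement(instance, "AuditTrailEntry")
            ET.SubElement(entry, "WorkflowModelElement").text = activity
            ET.SubElement(entry, "EventType").text = event_type
            timestamp = LOG_START + timedelta(seconds=seconds)
            ET.SubElement(entry, "Timestamp").text = timestamp.isoformat()
    return ET.tostring(log, encoding="unicode")

print(to_mxml({1: [("GEA", "start", 0), ("GEA", "complete", 5)]}))
```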
The merging and conversion described in this subsection constitute the first part of step 1 of the design methodology (section 3.3).
3.6. Creating a process model based on the log data
Building on the preliminaries described in the previous subsection, this subsection answers the second research question:
RQ2: How can the business process be modelled, in order to be able to define potential performance issues?
The methodology for creating a model of the business process with the aim of assessing its performance consists of two steps. First, a process model is created. Second, the model is converted to a Petri Net, so that a performance analysis can be executed. These steps are described in turn in this subsection.
The log data is first converted with ProM Import to the Mining XML format (MXML), which can be read by ProM. The functionality of this tool is described extensively in [14] and is not discussed further in this thesis.
In the log file, process instances rarely share the same start activity. Nearly every process instance starts with a different activity and ends with a different activity. It is therefore important to add an artificial start task and an artificial end task, which ProM supports. This way, each process instance is equipped with a compulsory start and end event, which enables us to assess the throughput time of the entire process later on. Next, the Heuristics Miner is executed. Leaving all settings at their defaults results in a process model based on the log data.
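Adding artificial tasks amounts to wrapping every trace in one fixed start event and one fixed end event. Once this is done, the throughput time of each case is simply the end timestamp minus the start timestamp. The Python sketch below assumes each trace is a time-ordered list of (activity, timestamp) pairs. The name "Art Start Task" matches the CPN statement in section 3.8, while "Art End Task" and the activity names are hypothetical. ProM performs this step itself, so the sketch only illustrates the idea.

```python
def add_artificial_tasks(trace, start="Art Start Task", end="Art End Task"):
    """Wrap a time-ordered trace of (activity, timestamp) pairs in artificial start
    and end tasks, timestamped at the first and last real event."""
    if not trace:
        return []
    first_ts, last_ts = trace[0][1], trace[-1][1]
    return [(start, first_ts)] + list(trace) + [(end, last_ts)]

# Hypothetical traces that start and end with different activities.
traces = {
    1: [("Request for booking", 10), ("Booking confirmed", 70)],
    2: [("Transfer of capacity", 20), ("Contract updated", 50), ("Invoice sent", 95)],
}
wrapped = {case: add_artificial_tasks(t) for case, t in traces.items()}
for case, trace in wrapped.items():
    print(case, trace[0][0], "->", trace[-1][0], "throughput:", trace[-1][1] - trace[0][1])
# 1 Art Start Task -> Art End Task throughput: 60
# 2 Art Start Task -> Art End Task throughput: 75
```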
Conformance of the model
At first sight, however, the process model may reveal some strange process flows with loose ends and irregular loops. This may indicate that outliers are present in the log data. These outliers, at least those with the most irregular process flows, should be eliminated from the log data, because they make it difficult to determine the most frequent process flow.
Figure 9 shows an example of two transitions that result from the presence of outliers. Because of the merging algorithm, Request for booking (start) should always be followed by Request for booking (complete). This is, however, not the case. The bold, red arrows show how ProM tries to create a coherent model out of a set of process instances that contain rare process flows. As a result, the model even contains errors.
A metric provided by ProM, “Continuous semantics fitness”, shows the fitness of the model, that is, the extent to which the model is able to describe reality. If this value is low (in our case the initial value was 0.33, which is very low), the noise should be reduced in order to obtain a better-fitting model. To reduce the influence of outliers, ProM provides a plug-in that identifies these irregular process instances.
Figure 10. Conformance of specific process instances.
Figure 10 shows an example of the results obtained with the fuzzy miner. The red bars represent invalid transitions that occurred during a process instance. The irregular process instances can then be removed from the table containing the log data. In this case, we eliminated from the log file all process instances that contained more than 75% invalid transitions. As a result, the model attains a much higher continuous semantics fitness. Furthermore, the irregular process flows disappear immediately, leaving a tight and clean model that can be used to execute a performance analysis, as shown in figure 11.
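The filtering rule (remove every process instance with more than 75% invalid transitions) can be sketched as follows. As a simplification, the Python code treats a transition as a pair of consecutive activities and counts it as invalid when the model does not allow that pair. ProM's conformance analysis replays traces on the Petri Net instead, so the fractions may differ. The allowed pairs and the traces are hypothetical.

```python
def invalid_fraction(trace, allowed):
    """Fraction of consecutive activity pairs in the trace that the model does not allow."""
    pairs = list(zip(trace, trace[1:]))
    if not pairs:
        return 0.0
    return sum(p not in allowed for p in pairs) / len(pairs)

def filter_outliers(traces, allowed, threshold=0.75):
    """Keep only traces whose share of invalid transitions does not exceed the threshold."""
    return [t for t in traces if invalid_fraction(t, allowed) <= threshold]

# Hypothetical model: the directly-follows pairs the model allows.
allowed = {("Request (start)", "Request (complete)"), ("Request (complete)", "Booking")}
traces = [
    ["Request (start)", "Request (complete)", "Booking"],                    # 0% invalid
    ["Request (start)", "Booking", "Request (complete)", "Request (start)"],  # 100% invalid
    ["Request (start)", "Request (complete)", "Request (start)",
     "Request (complete)", "Booking"],                                        # 25% invalid
]
kept = filter_outliers(traces, allowed)
print(len(kept))  # 2
```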
Figure 11. Erroneous process flows are eliminated after removing the most irregular transitions.
process instances. Because the number of outliers is small, the performance-related predictions described in the next subsection can achieve a relatively high level of reliability, as relatively few uncertainties were identified.
The methods described in this subsection comprise the next part of step 1 as stated in the design methodology.
3.7. Process model and performance
Next, based on this data, a process model can be generated using ProM. This model can then be converted to a Petri Net, still within ProM. Using this Petri Net, an analysis can be executed to calculate the throughput times and determine the bottlenecks. ProM supports this procedure, so it can be executed automatically. Figures 12 and 13 show examples of individual analyses of bottlenecks and of throughput times at activity level.
Figure 13. Example of assessing throughput time per activity
To eliminate bottlenecks, for instance by doubling the capacity of a certain activity, the performance should be simulated again. This tests whether the proposed change in the process results in the desired performance gain. For that reason, the Petri Net should be adjustable. However, ProM does not allow the Petri Net model to be changed. This requires a simulation tool that is capable of both creating and simulating Petri Nets.
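The effect of doubling capacity can be illustrated with a small queueing simulation of a single activity. Cases arrive, wait for the earliest free resource (first come, first served) and are then served. The Python sketch below assumes hypothetical exponential interarrival and service times (means of 100 s and 90 s) and compares one resource with two. It only conveys the idea; in this thesis the simulation itself is done with CPN Tools.

```python
import heapq
import random

def mean_throughput(arrivals, services, servers):
    """First-come-first-served queue with several parallel resources.
    Each case takes the earliest free resource; returns the mean
    throughput time (waiting time plus service time)."""
    free_at = [0.0] * servers  # time at which each resource becomes free
    heapq.heapify(free_at)
    total = 0.0
    for arrive, service in zip(arrivals, services):
        start = max(arrive, heapq.heappop(free_at))
        finish = start + service
        heapq.heappush(free_at, finish)
        total += finish - arrive
    return total / len(arrivals)

rng = random.Random(42)
arrivals, t = [], 0.0
for _ in range(1000):
    t += rng.expovariate(1 / 100)  # hypothetical mean interarrival time: 100 s
    arrivals.append(t)
services = [rng.expovariate(1 / 90) for _ in arrivals]  # hypothetical mean service time: 90 s

one = mean_throughput(arrivals, services, servers=1)
two = mean_throughput(arrivals, services, servers=2)
print(f"1 resource: {one:.0f} s, 2 resources: {two:.0f} s")
```

With one resource the utilization is about 0.9, so queues build up. Doubling capacity halves the utilization and sharply reduces the mean throughput time, which is the kind of gain a redesign simulation is meant to quantify.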
3.8. Initial settings for simulating Petri Nets
This subsection describes the initial settings required for simulating Petri Nets in CPN Tools, and how specific modelling issues should be solved. It therefore provides an answer to the third research question:
RQ3. How can a business process be simulated for redesign purposes?
The following structure is required at the start of the process in the Petri Net. Regardless of the case or the process to be modelled, this structure should always be present at the beginning of the model. In this part, the case ids (individual process instances) are generated and put into the system with an interarrival time of 100 seconds. A function can also be used instead of a constant, so that arrivals follow a certain distribution. In this example, the process instances are created at a fixed arrival rate. In practice, some business processes can start or end with different activities, so they do not have a fixed start and end event. It is therefore recommended to introduce an artificial start task and an artificial end task, because these allow a simple analysis of the overall throughput time, for the same reasons as given in section 3.6.
Statements in the CPN net and their explanation:

if id < 1000 then 1`(id + 1) else empty
In this sequence the case IDs are 'generated'. That is, this place and the transition Generator create tokens with a unique number (id), which represents the process instance number. Generator puts a token back into the initial place under the following condition: if id < 1000, a new token is created with value id + 1; otherwise, no token is returned. Therefore, a maximum of 1000 case IDs are created and put into the system.

input(id); output(); action(createCaseFile(id));
During simulation, a new case file is created for each new case id. Each id generated by Generator is put into a new XML file, which can later be converted to an MXML file using ProMimport.

1`(id + 1)@+100
This statement consists of several parts. A' is the anti-place of A; that is, A' serves as a capacity constraint. A' contains only 1 token, so A can only contain 1 case id at a time. Transition Arrival puts a token with id + 1 back into A', thereby releasing a new case id from Generator. The part @+100 means that every time a new case id is released, the token receives a time stamp 100 time units later than the current model time. This statement, therefore, implements the interarrival time.

input (id, role0); output (); action (addATE(id, "Art Start Task", ["complete"], calculateTimeStamp(), role0, []));
A new entry is added to the XML file, containing id as process instance, "Art Start Task" as activity, "complete" as event type, calculateTimeStamp() as timestamp and role0 as originator.
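The behaviour of these statements can be mimicked outside CPN Tools, which may help when checking the generated log. The following Python sketch assumes case ids 1 to 1000 arriving with a fixed interarrival time of 100 model time units (optionally exponentially distributed instead), and gives each case one "Art Start Task" complete event. The functions create_case and add_ate are hypothetical analogues of createCaseFile and addATE, not the actual ProMimport logging functions. The element names follow the MXML format; the intermediate format that ProMimport reads from CPN Tools is not reproduced, and for brevity all cases are collected in one XML tree instead of one file per case.

```python
import random
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

SIM_START = datetime(2007, 1, 1)  # hypothetical date mapped to model time 0


def arrival_times(n_cases=1000, interarrival=100.0, distribution="constant",
                  seed=None):
    """(case_id, arrival_time) pairs for case ids 1..n_cases.

    'constant' mirrors 1`(id + 1)@+100; 'exponential' is one possible
    distributed alternative (an assumption, not part of the CPN model).
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    for case_id in range(1, n_cases + 1):
        arrivals.append((case_id, t))
        if distribution == "constant":
            t += interarrival
        elif distribution == "exponential":
            t += rng.expovariate(1.0 / interarrival)
        else:
            raise ValueError(f"unknown distribution: {distribution}")
    return arrivals


def create_case(process, case_id):
    """Analogue of createCaseFile(id): one ProcessInstance per case."""
    return ET.SubElement(process, "ProcessInstance", id=str(case_id))


def add_ate(instance, activity, event_type, model_time, originator):
    """Analogue of addATE(...): append one audit trail entry to a case."""
    entry = ET.SubElement(instance, "AuditTrailEntry")
    ET.SubElement(entry, "WorkflowModelElement").text = activity
    ET.SubElement(entry, "EventType").text = event_type
    # Model time units are read as seconds (assumption).
    stamp = SIM_START + timedelta(seconds=model_time)
    ET.SubElement(entry, "Timestamp").text = stamp.isoformat()
    ET.SubElement(entry, "Originator").text = originator
    return entry


process = ET.Element("Process", id="simulation")
for case_id, t in arrival_times():
    case = create_case(process, case_id)
    add_ate(case, "Art Start Task", "complete", t, "role0")
```

The resulting tree can be written out with ET.tostring(process, encoding="unicode") for inspection.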
Figure 15. Example of high level activity modelled in CPN Tools
Figure 16 below shows the details of the high-level activity A from Figure 15:
Figure 16. Example of the details of a high level activity modelled in CPN Tools
A process may contain many XOR-splits. For example, two activity sequences may occur: ABC and ACB. Suppose that the sequence ABC occurs 100 out of 500 times and the sequence ACB occurs 400 out of 500 times. This implies a relative frequency of 20% versus 80%. In a standard Petri Net, such a choice is modelled as a conflict between transitions, which is resolved nondeterministically; the net itself offers no means to impose the observed relative frequencies on the branches. Therefore, a way to model a weighted XOR-split has to be devised. In CPN Tools, a data type should be created that consists of the data type required in the token and a probability integer. Suppose, for instance, that the token contains an integer representing the process instance. Before the XOR-split, this data type should be converted to a pair (a 2-tuple): the first component contains the original value, the second a random value between 0 and 999. Based on this value, a decision can be made as to which place the token should be produced in.
Figure 17. Example of XOR-split modelled in CPN Tools.
In the example above, an XOR-split is made. Before place A, the token id is converted to a data type containing an integer and a random value. After place Silent XOR the token is directed to either place B (20.0%) or place C (80.0%).
CPN declarations (explanations given as comments):
colset INT = int; (* colour set of integers *)
colset TIMEDINT = int timed; (* colour set of timed integers *)
colset PRBINT = product INT * INT; (* colour set of pairs of integers: value and probability *)
var id : INT; (* variable id of type INT *)
var prbint : PRBINT; (* variable prbint of type PRBINT *)
fun getValue(t: PRBINT) = #1(t); (* first component: the original value *)
fun getP(t: PRBINT) = #2(t); (* second component: the random probability value *)
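The routing logic based on the PRBINT pair can be illustrated with a small Python sketch. The helpers to_prbint and get_p are hypothetical counterparts of the conversion before place A and of getP; the threshold of 200 out of 1000 values corresponds to the 20% branch. In CPN Tools this comparison would presumably be placed in the guards of the two transitions following Silent XOR, but the sketch only mirrors the idea, not the exact net of Figure 17.

```python
import random


def to_prbint(case_id, rng):
    """Turn a token (case id) into an (id, p) pair with p uniform in 0..999."""
    return (case_id, rng.randint(0, 999))  # randint includes both ends


def get_p(t):
    """Counterpart of getP: the second component (the random value)."""
    return t[1]


def silent_xor(t, threshold=200):
    """Route to B when p < threshold (200/1000 = 20%), else to C (80%)."""
    return "B" if get_p(t) < threshold else "C"


rng = random.Random(42)
counts = {"B": 0, "C": 0}
for case_id in range(1, 10001):
    counts[silent_xor(to_prbint(case_id, rng))] += 1
print(counts)
```

Because p takes 1000 equally likely values, other relative frequencies are obtained simply by changing the threshold, for instance 350 for a 35% branch.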
3.9. Distributions
The average throughput times of the activities, along with some other statistics, can be obtained from ProM. In many cases the throughput times can be assumed to be exponentially distributed. However, business processes may in some cases have extreme outliers. Therefore, especially for activities with large outliers, it is advisable to analyse the distribution of the throughput times in more detail, in order to model the process as accurately as possible.
If a certain distribution is not provided as a standard function by CPN Tools, it can be approximated using combinations of other functions. However, CPN Tools does not support variables containing real numbers. For that reason, it is not possible to create a custom function to describe the distribution, which implies that such an approximation has to be composed of multiple standard supported functions.
A more thorough analysis of the exact distribution might show that, for a certain activity, the distribution consists of several standard distributions, each defined for a certain domain. For instance, 70% of the cases may be normally distributed with mean A and 30% exponentially distributed with mean B. Approximations of that kind can be modelled in CPN Tools.
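Such a mixture can be sampled by first drawing which component applies and then drawing from that component. The Python sketch below illustrates this for the 70%/30% example; the values chosen for mean A (50) and mean B (200), the standard deviation of 10 and the clipping of negative values to 0 are hypothetical choices, not figures from the case. In CPN Tools, the same idea would be expressed by combining built-in random distribution functions in an if-then-else expression.

```python
import random
from statistics import mean


def mixture_sample(rng, p_normal=0.7, mean_a=50.0, sd_a=10.0, mean_b=200.0):
    """Draw one throughput time from a normal/exponential mixture.

    With probability p_normal the value is normal(mean_a, sd_a), otherwise
    exponential with mean mean_b. All parameter values are hypothetical.
    Negative normal draws are clipped to 0, since a throughput time
    cannot be negative.
    """
    if rng.random() < p_normal:
        return max(0.0, rng.gauss(mean_a, sd_a))
    return rng.expovariate(1.0 / mean_b)


rng = random.Random(7)
samples = [mixture_sample(rng) for _ in range(10000)]
print(round(mean(samples), 1))  # close to 0.7 * 50 + 0.3 * 200 = 95
```

The long exponential tail of the 30% component is what produces the occasional extreme outliers that a single exponential distribution with the same mean would underrepresent.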
Distribution approximations can be obtained using the following three steps:
1. Create a list of throughput times (using a query).
For the activity to be approximated, create a list of all throughput times for that specific activity. Throughput times can be obtained by creating a list of the differences between the start and complete events of the activity. Usually, a short algorithm has to be written for this purpose.
2. Determine the distribution, mean, etc. (using graphs).
Next, the type of distribution can be determined by plotting the data in a graph. This way, the family of functions can be identified. Furthermore, the mean, maximum and outliers have to be determined, in order to obtain the required parameters.
3. Create the approximation in CPN Tools.
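Steps 1 and 2 can be supported by a short script such as the following Python sketch. It assumes that the log has been flattened into (case id, activity, event type, timestamp) tuples with numeric timestamps, and that each case contains at most one start/complete pair per activity; this flattening and the function names are hypothetical. Plotting the resulting list, for instance as a histogram, would then reveal the family of distributions.

```python
from statistics import mean, median


def throughput_times(events, activity):
    """Step 1: complete minus start timestamp for each case of `activity`.

    `events` holds (case_id, activity, event_type, timestamp) tuples.
    Assumes at most one start/complete pair per case and activity.
    """
    starts, durations = {}, []
    for case_id, act, event_type, ts in sorted(events, key=lambda e: e[3]):
        if act != activity:
            continue
        if event_type == "start":
            starts[case_id] = ts
        elif event_type == "complete" and case_id in starts:
            durations.append(ts - starts.pop(case_id))
    return durations


def summarize(durations):
    """Step 2: parameters needed to choose and fit a distribution."""
    return {"n": len(durations), "mean": mean(durations),
            "median": median(durations), "max": max(durations)}


log = [(1, "A", "start", 0), (1, "A", "complete", 30),
       (2, "A", "start", 10), (2, "A", "complete", 100),
       (2, "B", "start", 100)]
durations = throughput_times(log, "A")
print(durations, summarize(durations))
```

A mean that lies well above the median, or a maximum far above the mean, is a first hint of a skewed distribution or of outliers that deserve the closer inspection described in step 2.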
3.10. Comparing the Petri Net model with the model of reality for validation
Once the distributions closely approximate the real values, the Petri Net model is nearly finished. A final point of interest concerns the location and severity of the bottlenecks that occur when the model is simulated.
To be certain that the CPN Tools model is a close approximation of the model based on the original log files, replicating the bottlenecks is an important aspect. In summary, the following indicators of similarity between the two models (M1 and M2) can be listed:
Semantics and process flow. First of all, both models (M1 and M2) should have the same process flow and the same semantics. This is the first indicator of similarity.
Throughput times. Throughput times are an important criterion for assessing performance. For that reason, the throughput times per activity and the throughput time of the entire process should correspond closely for M1 and M2.
Bottlenecks. Bottlenecks are another important criterion for assessing performance. Therefore, the bottlenecks of M1 and M2 should be at the same locations and comparable in severity.
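Part of this comparison can be automated once the per-activity throughput times of both models are available. The Python sketch below takes M1 to be the model based on the log data and M2 the CPN Tools model, both summarized as dictionaries of mean throughput time per activity. The 10% relative tolerance and the use of the activity with the largest throughput time as a crude stand-in for the main bottleneck are assumptions made for illustration only; a full check of the process flow and of bottleneck severity still requires the models themselves.

```python
def compare_models(m1, m2, tolerance=0.10):
    """Compare per-activity mean throughput times of M1 and M2.

    m1, m2: dicts mapping activity name to mean throughput time.
    Returns the activities deviating more than `tolerance` (relative to
    M1) and whether the activity with the largest throughput time, used
    here as a crude bottleneck proxy, is the same in both models.
    """
    if set(m1) != set(m2):
        # Different activity sets already indicate a different process flow.
        raise ValueError("models contain different activities")
    deviating = sorted(a for a in m1
                       if abs(m2[a] - m1[a]) > tolerance * m1[a])
    same_bottleneck = max(m1, key=m1.get) == max(m2, key=m2.get)
    return deviating, same_bottleneck


m1 = {"A": 120.0, "B": 300.0, "C": 80.0}   # hypothetical log-based values
m2 = {"A": 125.0, "B": 360.0, "C": 82.0}   # hypothetical simulated values
print(compare_models(m1, m2))
```

In this hypothetical example the bottleneck is located at the same activity in both models, but its throughput time is overestimated by 20%, which would suggest adjusting the distribution used for activity B.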
4. Overview of applied Design Methodology
In the previous section, the steps included in the outline of the methodology were applied to the GasUnie case. During the application of the methodology, several issues arose, which have been dealt with. This section provides an overview of the design methodology developed in this research project. The design methodology presented in this section is, therefore, a more general description that is not tied to details of the GasUnie case.
1. Execute performance analysis.
1a. Merge data to a consistent high level of abstraction.
1b. Convert data to a format that is accepted by ProM.
1c. Create process model.
1d. Execute fuzzy miner to identify irregular transitions, with the aim of removing outliers in the log data.
1e. Create the process model again and verify the conformance.
1f. Generate Petri Net and execute performance analysis.
2. Create Petri Net model based on the current situation.
The current situation has been shown already in ProM. However, ProM is not capable of creating a Petri Net that can be modified. Therefore, a Petri Net of the process model should be created using CPN Tools.
3. Validate Petri Net model by executing a performance analysis based on the Petri Net model, which is to be compared with step 1.
The correctness of the model when compared to reality is based on equivalence in process flow, throughput times and location of bottlenecks. Once a Petri Net model has been created that matches these criteria, the process described by the model is assumed to behave similarly to reality.
3a. Execute performance analysis of Petri Net model.
3b. Compare semantics and process flow with the model based on log data.
3c. Compare performance criteria with the model based on log data.
3d. If necessary, adjust the Petri Net model to improve the match.
4. Create redesign based on performance problems in the current situation.
The process can be redesigned to address the issues that were identified during the performance analysis.
5. Modify Petri Net model according to the proposed redesign.
6. Execute a performance analysis using the redesigned Petri Net model.
7. Compare the results of the redesign with the original performance.
5. Conclusion
Organizations have to operate in fast changing environments. In order to retain their competitive advantage, organizations should have a flexible business process, which allows for fast adaptation to the environment. Furthermore, organizations have to reduce their response time to the customer. As most organizations support their business process by means of an enterprise information system, this system should be flexible as well. Currently, a new software architecture, called service-oriented architecture, is emerging, which promises both flexibility and reuse of legacy systems.
Legacy systems are built according to standards that were common when they were developed, but that are outdated compared to current standards. Moreover, as organizations are forced to change along with their environment, a growing mismatch between the business process of the organization and the software system may cause inefficiencies in the way the business process is supported by that software system. When redesigning a system, it is important that potential performance inefficiencies are eliminated along the way.
In this research project, we have developed a design methodology, which describes how a business process can be assessed with respect to performance. Moreover, the methodology provides guidelines on how to simulate the current process along with a redesigned process, in order to be able to predict the potential performance gain after implementation of the redesign.
process represented in the redesigned model. By comparing both models, the potential performance gains of the redesign can be predicted.
Finally, in section 4 the design methodology has been formulated in a generalized format, independent of the case. The step-by-step description provides a clear methodology for applying process mining with the aim of redesigning the embedded business process of an enterprise application.
5.1. Limitations
The research presented in this thesis is subject to a number of limitations. First of all, the design methodology has been applied to a single case, which limits the generalizability of the design methodology. The research performed in this thesis is, however, able to present an outline of a methodology, along with suggestions on how to overcome issues that can be expected to be common to other cases as well.
6. Glossary
In this research project several techniques and concepts are used in both software development and process mining. For that reason, it is necessary to provide an explicit definition of each of the relevant concepts that are discussed or used in the remaining part of this thesis. This section provides an overview of the relevant concepts along with the working definition that is used in this thesis.
Legacy Software. Software built in the past which, although outdated, has not yet been replaced by the organization. The term is commonly associated with a system that is inflexible and not fully compatible with current business requirements, but difficult to replace.
Service-Oriented Architecture (SOA). A Service-Oriented Architecture is a software architecture consisting of loosely coupled components called web services that incorporate and expose parts of business functionality. SOA supports both flexible business processes with the use of web services [13] and integration of disparate applications, including the reuse of legacy applications. By wrapping existing legacy functionality into web services, this functionality can be re-used for new purposes. New functionality can be added gradually, in the form of new services.
Business Process. A collection of activities which has an input and creates an output that is of value to a customer.
Workflow. A workflow is a pattern of activities, executed upon triggering by an external event or case. A workflow is a special type of business process that is not only well described, but also characterized by a clear beginning and end (Wortmann, Szirbik).
Methodology. In this research project, ‘methodology’ is used for two specific purposes. The
Petri Nets. Petri Nets were invented by Carl Adam Petri to model concurrent systems and the network protocols used with these systems. The Petri Net is a directed bipartite graph with nodes representing either "places" (represented by circles) or "transitions" (represented by rectangles). A Petri Net is a graphical and mathematical modelling tool. Details about Petri Nets can be found in [18], [19] and [20].
Process Model. A process model is a classification of processes (usually business processes) in a model which represents the flow of activities within the process.
Process Mining. Process mining is the process of creating and verifying process models.
Process mining techniques allow for extracting information from event logs. For example, the audit trails of a workflow management system or the transaction logs of an enterprise resource planning system can be used to discover models describing processes, organizations, and products.
Producer. Any party owning, controlling, managing, or leasing any gas well and/or any party who produces natural gas in any manner by taking it from the earth or waters.
Trader. A company that buys and sells a commodity (gas) with the goal of profiting from short-term price swings.
Transporter. A legal entity that has the capability of providing the service of transporting gas. This includes gathering companies, pipeline companies and local distribution companies.
Shipper. Owner of the transportation contract, for whom gas is transported.
Supplier. A company that buys gas from shippers to sell it to consumers and / or industries.
End user. An entity which is the ultimate consumer for natural gas. An end-user purchases the
7. References
[1] Arsanjani, A., 2004. Service-oriented modeling and architecture: How to identify, specify, and realize services for your SOA. IBM White Paper.
[2] Souder, T., Mancoridis, S., 1999. A Tool for Securely Integrating Legacy Systems into a Distributed Environment. Sixth Working Conference on Reverse Engineering, IEEE Computer Society.
[3] Alda, S., Won, M., Cremers, A.B., 2004. Managing Dependencies in Component-Based Distributed Applications. Scientific Engineering for Distributed Java Applications: International Workshop, FIDJI 2002, Luxembourg-Kirchberg. Lecture Notes in Computer Science.
[4] Li, M., Qi, M., 2004. Leveraging legacy codes to distributed problem-solving environments: a Web services approach. Software Practice and Experience 34.
[5] Fremantle, P., Weerawarana, S., Khalaf, R., 2002. Enterprise Services. Examining the emerging field of Web Services and how it is integrated into existing enterprise infrastructures. Communications of the ACM, Vol. 45, No. 10.
[6] Olsen, K.A., Sætre, P., 2007. IT for niche companies: is an ERP system the solution? Information Systems Journal, 17.
[7] Kontogiannis, K., Smith, D., O'Brien, L., 2002. On the Role of Services in Enterprise Application Integration. 10th International Workshop on Software Technology and Engineering Practice, IEEE Computer Society.
[8] Gorton, I., Thurman, D., Thomson, J., 2003. Next generation application integration: challenges and new approaches. Proceedings of 27th Annual International Computer Software and Applications Conference, COMPSAC 2003.
[9] Wimmer, M., Albutiu, M.C., Kemper, A., 2006. Optimized Workflow Authorization in Service Oriented Architectures. Proceedings of the International Conference on Emerging Trends in Information and Communication Security (ETRICS), vol. 3995 of LNCS, Freiburg, Germany.
[10] Papazoglou, M.P., 2003. Service-oriented computing: concepts, characteristics and directions. Proceedings of the Fourth International Conference on Web Information Systems Engineering, WISE 2003.
[11] Papazoglou, M.P., Van Den Heuvel, W.J., 2006. Service-Oriented Design and Development Methodology. International Journal of Web Engineering and Technology (IJWET).
[12] Phippen, A.D., Taylor, J., 2005. Issues in moving from web services to service
[13] Moitra, D., Ganesh, J., 2005. Web services and flexible business processes: towards the adaptive enterprise. Information & Management 42.
[14] www.processmining.org
[15] www.gasunie.nl
[16] http://sparc.airtime.co.uk/users/wysywig/gloss.htm
[17] Hopp, W.J., Spearman, M.L., 2001. Factory Physics. New York: Irwin / McGraw-Hill.
[18] Petri, C., 1962. Kommunikation mit Automaten. PhD Thesis, Faculty of Mathematics and Physics at the Technische Universität Darmstadt, Germany.
[19] Berkhahn, V., Klinger, A., Rueppel, U., Meissner, U.F., Greb, S., Wagenknecht, A. Processes Modelling in Civil Engineering based on Hierarchical Petri Nets. Institute of Computer Science in Civil Engineering, Universität Hannover. Institute for Numerical Methods and Informatics in Civil Engineering, Technische Universität Darmstadt.
[20] Cooper, K., Introduction to Petri Nets. University of Texas at Dallas.
[21] Berg, B.L., 2003. Qualitative research methods for the social sciences. Allyn &