

Eindhoven University of Technology

MASTER

Data-driven product design

discovering potential for automation by analyzing user behavior

Uku, R.

Award date:

2019

Link to publication

Disclaimer

This document contains a student thesis (bachelor's or master's), as authored by a student at Eindhoven University of Technology. Student theses are made available in the TU/e repository upon obtaining the required degree. The grade received is not published on the document as presented in the repository. The required complexity or quality of research of student theses may vary by program, and the required minimum study period may vary in duration.

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.


Data-Driven Product Design:

Discovering potential for automation by analyzing user

behavior

Master Thesis

R. (Raisa) Uku

Department of Mathematics and Computer Science

Supervisors:

dr. N. (Natalia) Sidorova ir. A. (Angelique) Brosens-Kessels, PDEng (Philips Healthcare)

Final Version

Eindhoven, July 2019


Abstract

The need to understand users lies at the core of the product design process, but traditional approaches to studying user behavior have potential limitations. Making use of the data logged from user interaction with the system therefore provides factual insights into user behavior, allowing one to correctly identify and understand user needs and the actual use of the system. However, these data are often not logged for the purpose of studying user behavior, which poses a challenge for extracting relevant insights. This thesis therefore proposes an approach that aims to contribute to the product design process by analyzing user interaction with the system from a data-driven perspective. The focus is on discovering user behavior that, by being repetitive in a specific context, might represent a potential for automation. As such, the research goal that is addressed is: given a system event log, how can the product design process be improved by identifying user behavior which might represent potential for automation in a specific occurrence context?

The presented approach proposes a relevant way to transform the data in order to enable the discovery of the user behavior we aim for. Process instances are defined and the concept of high-level activities is introduced, enabling the analysis of events and contextual information logged at different levels of abstraction. For this step we make use of the artifact-centric approach. Having the data in the appropriate format allows us to discover the use patterns that might represent potential for automation. Moreover, the data are further mined to detect the contextual factors related to the occurrence of a discovered use pattern, enabling one to identify what leads to a specific user behavior.

Finally, using the detected context, insights are revealed by performing cross-analysis and conformance checking to detect expected occurrences of the use pattern or potential deviations.

The approach is primarily developed for obtaining insights from the event logs of an interventional X-Ray system of Philips Healthcare. However, its steps are clearly formalized, and a Python framework is designed to enable applying this approach to systems and data of the same or similar nature.

The evaluation showed that this approach can be successfully used to discover user behavior that might represent potential for automation, despite the complexity and fine-grained nature of the event logs.

Based on one of the use patterns discovered in the presented case study, configuration changes were performed by Philips Healthcare on the affected systems in one hospital in the Netherlands.


Preface

I would like to thank my family, for being there for me whenever I needed them most, for helping me to succeed, and for the unlimited love and support during all these years. Thank you for believing in me, for being my cheerleaders, and for teaching me to never give up.

I am extremely grateful to my supervisor, Natalia Sidorova, for the weekly guidance, the constructive feedback, the invaluable ideas and the extensive reviews. In particular, thank you for being critical and inspiring at the same time. I really appreciate that you offered me the chance to work on this project and that you put me in contact with Philips Healthcare.

Moreover, none of this would have been possible without the regular meetings and discussions with my Philips Healthcare supervisor, Angelique Brosens-Kessels. Thank you Angelique for introducing me to the wonderful and complex world of Image Guided Therapy system design, for your suggestions and discussions, and for always being positive, helpful and encouraging. In addition, I would like to thank doctor Lukas Dekker for his great assistance during our visits to a hospital in the Netherlands.

Finally, I would like to express my gratitude to the Big Data Management and Analytics (BDMA) Erasmus Mundus Joint Master Degree Programme, for giving me this amazing opportunity, and to my friends with whom I shared this experience. This two-year journey, which began at Université Libre de Bruxelles (ULB), continued at the Polytechnic University of Catalonia (UPC) and ended at Eindhoven University of Technology (TU/e), helped me to grow both personally and professionally.


Contents

List of Figures

List of Tables

1 Introduction
1.1 Project context and motivation
1.2 Research goals
1.3 Approach
1.4 Outline

2 Preliminaries
2.1 Event logs and process mining
2.2 Artifact-centric approach and composite state machine
2.3 Association rule mining and sequential pattern mining
2.4 Contextual information

3 Case Study
3.1 An overview of the system
3.2 Current approaches of studying user behavior
3.3 Data understanding and preparation
3.3.1 Data gathering and description
3.3.2 Initial challenges resulting from the data exploration
3.3.2.1 Fine-grained nature of the data
3.3.2.2 Mapping relational data to event data
3.3.2.3 Making use of contextual information
3.3.2.4 Including domain knowledge and qualitative research methods in the loop
3.3.3 Data preparation

4 The Discovery of Use Patterns
4.1 General description of the proposed approach
4.2 An overview of the related work
4.3 Enabling the discovery of the use patterns representing potential for automation
4.3.1 The definition of cases and high-level activities
4.3.2 Use pattern discovery
4.3.2.1 Making use of contextual information
4.3.2.2 Trace abstraction and activity filtering
4.4 Sequential pattern mining for discovering interesting sub-sequences in event data
4.5 Making use of qualitative research instruments
4.6 Case Study: Discovering potential for automation for interventional X-Ray systems
4.6.1 Detector is resized manually after a StentBoost acquisition
4.6.1.1 Initial insights from qualitative research methods
4.6.1.2 Defining cases and high-level activities
4.6.1.3 Discovering the use patterns
4.6.2 Investigating Cardiac Swing, Rotational Scan and 3D EPX protocols

5 Analyzing the Context of the Use Pattern Occurrence
5.1 An overview of the related work
5.2 Mining for detecting contextual factors related to the use pattern occurrence
5.2.1 Assigning severity levels to user behavior and applying density estimation
5.2.2 Using association rule mining to discover correlation
5.2.3 Results and discussions for the case study
5.2.4 Quantifying the importance of the discovered use pattern

6 Evaluation
6.1 Data-driven evaluation of user reaction to system changes
6.2 Qualitative evaluation

7 Conclusions
7.1 Summary of the research goals and contributions
7.2 Limitations and future work

Bibliography

Appendix
A Insights from the data and observations at a hospital in the Netherlands
A.1 A high-level data analysis
A.2 Insights from interviews and observations
B The use of RapidMiner to perform Association Rule Mining
C Aggregating events of specific artifacts, for sequential pattern mining
D An overview of activities logged in iXR systems


List of Figures

1.1 A visual representation of the research goals

2.1 A model M of a simple healthcare process and its two artifact models A1 and A2. Every state in the process is a combination of a state from each artifact. [39]

3.2 An overview of data gathering (the area marked with dashed lines), adapting the high-level data architecture from [11]

4.1 Detailed steps of the approach

4.2 Defining the high-level activity based on the Start and End State: (a) no parallel high-level activities; (b) when a group of artifacts defines the HLAs, there might be parallel HLAs, but this depends on the artifacts that are involved

4.3 Different types of high-level activities might be defined, and events from the same event classes might be part of different HLAs

4.4 A fragment of the UML class diagram for the event log meta-model, as defined in [13] and [38], adapted with the concept of high-level activities (identified in green)

4.5 Integrating qualitative research methods with the proposed data-driven approach

4.7 An excerpt of a spaghetti-like process model from the raw traces of the event logs of Azurion systems

4.8 An overview of the artifact states and the events that trigger each state, used to define the high-level activity

4.9 Pressing two pedals at the same time illustrates the need for considering multiple artifacts when defining the high-level activities

4.10 The percentage of use-pattern occurrence in three Azurion systems in one hospital in the Netherlands

5.1 The percentage of use-pattern and anti-pattern occurrence, according to their severity level

5.2 The KDE plot, illustrating the density distribution of use-pattern occurrence over the normalized exam duration, according to severity level

5.3 Correlation between the system unit from which the user action is performed and use-pattern occurrence, according to severity level

5.5 The relation of a detector size greater than 190 to the occurrence of Anti-Pattern 4

5.6 The calculation of the severity rating, as a combination of impact on the user experience and problem frequency [21]

5.7 Kernel density plot of the manual actions performed by users. Colors indicate different systems

A.1 What is the total number of exams per system and EPX protocol?

A.2 What is the hourly distribution of exam durations?

B.1 Generating association rules using RapidMiner


List of Tables

2.1 An illustration of a sequence database, as an ordered list of itemsets [10]

3.1 A subset of event logs from an Azurion system, after adding the case identifier

4.1 Fragment of frequent closed subsequence mining for the Wedge artifact [10]

4.2 Activities that trigger state changes of the lab exam artifact, as shown in Figure 4.8a

5.1 What is happening in the 22% of cases for which the expected use pattern does not occur?

6.1 How did the users react to the configuration changes?


Listings

4.1 Artifact states used to define the Start State of the high-level activity

4.2 A subset of available metadata, logged at the level of each acquisition

D.1 A subset of activities logged in Philips Azurion systems


Chapter 1

Introduction

This chapter gives an overview of the master's thesis, carried out as part of the Big Data Management and Analytics Erasmus Mundus Joint Master Degree, with a specialization performed in the Information Systems research group of the Department of Mathematics and Computer Science at Eindhoven University of Technology (TU/e).

It starts by describing the motivation and the context of the project, performed at Philips Healthcare, in Section 1.1. Moreover, an overview of the current approaches used to study user behavior is provided, and the added value of a data-driven approach toward the correct identification and understanding of user needs is highlighted. Sections 1.2 and 1.3 introduce and formalize the research goals and the proposed approach, explaining why they are important and how we approach them, respectively. The chapter concludes with the outline of the thesis in Section 1.4.

1.1 Project context and motivation

This research project is performed within the System Design Department of Image Guided Therapy (IGT) Systems at Philips Healthcare. As part of the IGT business cluster, this department is responsible for developing integrated systems which use advanced imaging guidance to help physicians perform minimally invasive treatments and procedures more effectively. These image-guided interventional solutions enable doctors to guide miniature tools through a small opening in the patient's skin. Treatments can be provided for a variety of cardio, vascular and neuro diseases.

In broad terms, this project aims to improve and contribute to the design process of these and similar systems, by using the potential of data-driven approaches to correctly study and understand user behavior, experience and needs. The system analyzed in this project is an interventional X-ray (iXR) system named Philips Azurion, an image-guided therapy platform first launched by Philips in 2017. As an iXR solution, Azurion is able to support a wide range of image-guided and surgery-like procedures, using X-ray as image guidance to enable patient treatment. It is a complex system, composed of several units and sub-units, as elaborated in more detail in Section 3.1.

Currently, the approaches used for studying user behavior include usability studies, physical product monitoring, Flying Quality Squad tours and customer complaints. A more detailed description of them is provided in Section 3.1. The outcomes of these approaches serve as input for possible usability improvements or changes to the system, which must be further analyzed by the usability engineers in order to assign a priority level to each potential improvement. However, the current approaches are prone to potential limitations and risks. For instance, the scale of impact is small, as the analysis is carried out per single instance of a problem; no cross-analysis is performed; it is often not known how other users are experiencing a specific issue; and it is therefore difficult to set priorities and decide how serious a problem is. Moreover, these approaches (e.g. usability studies) make use of the knowledge that usability engineers already have, so there is quite a small probability of discovering something they did not know. The risks of each approach are further elaborated in Section 3.1, but this brief description indicates that improvements could be made, and as such a data-driven approach might enhance and add value to the understanding of user behavior.

In view of this, we propose a data-driven approach which targets the above-mentioned limitations and risks, and which at the same time might represent a way of using resources more efficiently. It enables us to perform cross-analysis, for instance by analyzing how users in other hospitals and other countries are experiencing or reacting to the same or similar problems. Furthermore, this data-driven approach might allow one to reveal unexpected insights and discover previously unknown behaviors. In addition, by analyzing the context in which a specific user behavior occurs, one might be able to detect what leads to it and to evaluate its potential for automation.

Moreover, two previous master's thesis projects have been conducted in the same field, contributing to the data-driven design of iXR systems. Their main goals were: giving insights into the complex workflow of iXR systems and generating usability testing scenarios [18]; and specifying and detecting usage anti-patterns and investigating how users respond when an anti-pattern occurs [23]. The outcomes of these projects were proof that data-driven approaches have the potential to contribute to product design by providing insights into actual use. However, several challenges still exist; therefore, our approach aims to further investigate and understand user needs and user interaction with the system, with the purpose of detecting whether there is potential for automating certain user behavior.

The proposed data-driven approach makes use of the logged data from Azurion systems. Currently, the events generated by the interaction of users with the system, and the ones generated by different system units during its operation, are logged in daily log files. The nature of the logging is very fine-grained and complex, influenced among others by: the diversity and the number of the logged events; the fact that their origin is not easily obvious (i.e. whether they are events generated by the system itself or by user interaction); and the potential of the system to allow clinicians to work on different cases, from different parts of the system, at the same time, without explicitly distinguishing between them. Therefore, we cannot simply rely on traditional process mining techniques to reveal insights into process instances from the available data. That being the case, better logging could positively contribute to and facilitate the analysis of the processes and the discovery of insights. However, the existence of real-life situations for which this remains a problem that better logging cannot solve motivates us to propose and develop a data-driven approach able to deal with the tricky nature of real-life event logs, i.e. the very fine-grained logging. This approach offers a tremendous potential for design engineers to obtain valuable and actionable insights into user behavior, in addition to traditional techniques, as it examines the actual interactions of users with the system.

It is also important to mention that we took advantage of qualitative research instruments, such as observations in hospitals and interviews with doctors, clinical marketing specialists and usability engineers, to enable us to make use of the relevant domain knowledge.

To sum up, this thesis aims to evaluate the potential for automation of behavior that users might perform while interacting with the system. The importance of detecting these sets of actions lies not only in enhancing the user experience, but also in ensuring safety requirements. For instance, the detection of user behavior that might be partially or totally automated in the future might not only minimize annoyance and frustration, but also reduce possible negative influences on patient or staff health, such as the amount of emitted X-ray radiation. In order to evaluate the potential for automation, one must also analyze the context in which a specific behavior takes place, as each user behavior might be interpreted differently in different contexts. Furthermore, this enables understanding what causes or leads to a specific user behavior. Therefore, we can further narrow down the goal of this thesis as: how to improve the product design process by identifying behavior which, in a specific occurrence context, represents a potential for automation.

1.2 Research goals

The main research goal of this thesis can be formulated as: given a system event log, how can the product design process be improved by identifying user behavior which might represent a potential for automation in a specific occurrence context? Detecting and evaluating the context and the predictors that might lead to the occurrence of a specific set of actions or use pattern marks the initial step toward increasing the awareness of product designers about potentially small, but significant, task automations that might enhance the user experience.

The research goal can be narrowed down into the following sub-goals, for each of which it is also described why it is significant to address it:

Research Goal 1: Find a relevant way to transform the data, such that it allows making use of contextual information and discovering user behavior that potentially represents a need for automation.

The first step toward identifying potential for automation is to detect specific behavior or sets of actions performed by users in a repetitive way, in a particular context. Real-life event logs have shown that in practice the concept of a process instance is often not explicit, and events are not always logged at the same level of abstraction. The same might be expected for contextual information, represented, for instance, in the form of metadata (e.g. the EPX protocol that describes the clinical situation: Cardio, Left Coronary) or surrounding events. Moreover, in reality the data logging is quite often performed at a very low level. Therefore, it is essential to find a way to make the data from the event logs appropriate for further analysis, which in our case includes pattern discovery and the detection of contextual information. Domain knowledge undoubtedly plays an important role in this process and contributes positively to making the data meaningful.

Research Goal 2: Discover use patterns that might represent a potential for automation and identify the context in which a given use pattern occurs.

A specific repetitive user behavior that is observed in most of the cases is important to detect, as it might represent potential for future automation, or it might be an indication of a misconfiguration or faulty implementation of the system. Thus, one might think of cases when, for instance, task A is performed in 70% of the cases and task B in only 30% of the cases, and performing A or B is not associated with any safety issue. At present, the system is configured to automatically perform B, or it does not have any default configuration. Therefore, detecting these cases and automating the system to perform A reduces the share of cases in which users need to perform a repetitive manual action from 70% to only 30%. This identification might contribute to ensuring that the product is used in conformance with the safety requirements and that it is user-friendly.
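The arithmetic behind this reasoning can be sketched as a toy computation. The 70/30 split and the task names are purely illustrative, not data from the case study:

```python
from collections import Counter

# Toy observations: task "A" was performed manually in 70 cases, "B" in 30.
observed_tasks = ["A"] * 70 + ["B"] * 30

counts = Counter(observed_tasks)
best_default, freq = counts.most_common(1)[0]

# Manual interventions required if the system defaulted to each option:
# users intervene exactly in the cases where the default is not what they want.
manual_if_default = {t: len(observed_tasks) - c for t, c in counts.items()}

print(best_default)            # 'A'
print(manual_if_default["B"])  # 70: defaulting to B costs 70 manual actions
print(manual_if_default["A"])  # 30: defaulting to A reduces this to 30
```

Choosing the most frequent behavior as the default minimizes the expected number of repetitive manual actions, which is exactly the trade-off described above.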

If the case mentioned above occurs in a clearly distinguishable context, the system might be implemented in such a way that, depending on the situation, it is smart enough to decide on the coming steps. Taking into account that users, as human beings, might perform tasks without thinking (i.e. as a habit they tend to perform task B), this approach mitigates the risks in critical situations. However, as human behavior is complex, the occurrence context is often not clearly distinguishable, and as such there is a need to mine the data and detect correlations between the occurrence of specific user behavior and contextual factors. Moreover, identifying the context in which a specific use pattern occurs might enable us to determine what leads to this behavior, allowing us in this way to evaluate its potential for automation.

Especially in the case of healthcare systems, attention is being paid to adaptive intelligence, named as such because it must be contextual to the situation. In addition, observations performed in hospitals and interviews with clinicians and domain experts made us aware of the fact that the discovery and the analysis of a use pattern must be seen as related to specific contexts.

Research Goal 3: Analyze the identified context, and detect and evaluate the occurrence of the same or similar use patterns.

This goal will help in answering the following questions:

• Is the potential for automation detected in a specific use case also valid for similar contexts? For instance, is the hypothesis true that different doctors in different hospitals are performing the same repetitive sequences? Does it depend on the procedure, doctor or hospital, or is there no clear dependency discoverable from the available event logs?

• What are the deviations from the described use pattern in a specific context? What are the alternatives? For instance, have other doctors found a work-around to deal with a specific case?

• Can we identify cases when the expected repetitive behavior is totally missing in a specific context? For instance, this might indicate cases when doctors skip these actions due to frustration or annoyance, or because they forget.

1.3 Approach

This section details the approach that is followed to fulfill the research sub-goals presented in Section 1.2 and visually represented in Figure 1.1. This approach will enable one not simply to gain insights into user actions, but also to better understand the factors underlying a specific user behavior, in order to evaluate its potential for automation.

Figure 1.1: A visual representation of the research goals

As illustrated by the first step in Figure 1.1, data understanding, exploration and preparation are performed. Moreover, we make use of domain knowledge and feedback from qualitative research methods throughout the project.

Firstly, to define an appropriate approach for transforming the original data into a form that can be further used for our analysis, not only is the nature of the logging considered, but we also make use of domain knowledge. Qualitative research instruments, such as observations in a hospital in the Netherlands and interviews with doctors and domain specialists, were used to understand the system and the way it operates, and to further grasp the form of the user behavior that might represent potential for automation when occurring in a specific context. Taking advantage of the above, a mapping is performed between the low-level event logs and the high-level behavior performed by the users. For specific user actions which could not be recorded during the observations in the hospital, the systems in the test labs were used to map them to the corresponding machine events. Moreover, the logged data, which originally has a relational nature, is transformed into event data by defining a process instance and assigning a case identifier to each event.
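The last step, assigning a case identifier to relational log rows, can be sketched roughly as follows. The event names and the rule that a new case starts at an "ExamStarted" event are hypothetical simplifications for illustration; the thesis derives its actual case notion via the artifact-centric approach:

```python
# Minimal sketch: turn flat, relational log rows into event data by
# assigning a case identifier to each event. Assumes (hypothetically)
# that a new process instance begins whenever a case-start activity occurs.
def assign_case_ids(rows, case_start_activity="ExamStarted"):
    events, case_id = [], 0
    for timestamp, activity in rows:
        if activity == case_start_activity:
            case_id += 1  # a new case (process instance) begins
        events.append({"case": case_id, "timestamp": timestamp, "activity": activity})
    return events

rows = [
    ("08:00", "ExamStarted"),
    ("08:05", "AcquisitionStarted"),
    ("09:00", "ExamStarted"),
]
events = assign_case_ids(rows)
print([e["case"] for e in events])  # [1, 1, 2]
```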

Taking into account the fine-grained nature of the event data and the fact that events, as well as contextual information, are not always logged at the same level of abstraction, we proceed by defining high-level activities, each represented as a sub-sequence of events within a trace. Each high-level activity is defined by a Start and an End State. The artifact-centric approach is used in combination with the notion of a Composite State Machine, based on the reasoning provided in Section 2.2 and the outcome of the previous work [23] [18], which demonstrates that this is an appropriate approach for this type of event logs. The Start and End States are defined as logical expressions on a composite state. For instance, for our case study, we found that most of the available metadata are logged per individual acquisition; therefore, we decided to consider as a high-level activity the subset of the events occurring during and after a specific acquisition. However, depending on the user behavior being analyzed, different types of high-level activities might be defined.

The above-described approach allows one to first transform the original event logs of a relational nature into event data, and then to interpret the events and the corresponding metadata at the abstraction level of interest, by defining the so-called high-level activities. These steps are formally defined in Section 4.3 and thus can be used for transforming data of the same or similar nature.
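The idea of cutting a trace into high-level activities via Start and End State predicates can be illustrated with a rough sketch. It assumes, purely for illustration, that each event carries a composite state represented as a dictionary of artifact states; all event and state names below are invented:

```python
# Sketch: segment a trace into high-level activities (HLAs), where an HLA
# opens when the Start State predicate holds on the composite state and
# closes when the End State predicate holds.
def extract_hlas(trace, is_start, is_end):
    """trace: list of (event, composite_state) pairs."""
    hlas, current = [], None
    for event, state in trace:
        if current is None and is_start(state):
            current = []  # Start State reached: open a new HLA
        if current is not None:
            current.append(event)
            if is_end(state):  # End State reached: close the HLA
                hlas.append(current)
                current = None
    return hlas

trace = [
    ("PedalPressed",  {"acquisition": "Running"}),
    ("MoveDetector",  {"acquisition": "Running"}),
    ("PedalReleased", {"acquisition": "Idle"}),
    ("TableMoved",    {"acquisition": "Idle"}),
]
hlas = extract_hlas(
    trace,
    is_start=lambda s: s["acquisition"] == "Running",
    is_end=lambda s: s["acquisition"] == "Idle",
)
print(hlas)  # [['PedalPressed', 'MoveDetector', 'PedalReleased']]
```

In the thesis the predicates are logical expressions over the full composite state of several artifacts, not a single artifact as in this toy example.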

Secondly, we realized that identifying user behaviors that might be repetitive and represent potential for automation blindly from the data, without considering domain knowledge, is a challenge. We first tried applying frequent sequential pattern mining to detect repetitive sequences of events. However, as this yielded no useful insights, we once more took advantage of qualitative research instruments, such as observations and interviews, which proved to be an efficient starting point for the discovery, as they can provide indications of potentially repetitive use patterns.

Armed with these initial findings, the following steps of the approach were defined. To enable the discovery of the use patterns, trace abstraction is first performed, using a set or bag abstraction approach over the sub-sequences of events belonging to the high-level activities; activity filtering is then applied over the abstracted traces described above, in order to detect occurrences of specific user behavior. After a pattern is discovered, one might interpret the context factors as variables of the predefined high-level activities and further mine for relations between these variables and the use-pattern occurrence. This enables detecting what leads to a particular use pattern, and understanding the context factors underlying a specific user behavior, in order to further evaluate its potential for automation.
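The two steps of bag abstraction and activity filtering can be sketched minimally, assuming a high-level activity is available as a list of event-class names (the names below, echoing the StentBoost example from the case study chapter, are illustrative only):

```python
from collections import Counter

def bag_abstraction(hla_events):
    # Bag (multiset) abstraction: keep activity multiplicities, drop ordering.
    return Counter(hla_events)

def pattern_occurred(abstracted, required):
    # Activity filtering: the use pattern occurs in this HLA if every
    # required activity appears at least the required number of times.
    return all(abstracted[a] >= n for a, n in required.items())

hla = ["StentBoostAcquisition", "ResizeDetector", "ResizeDetector"]
abstracted = bag_abstraction(hla)

use_pattern = {"StentBoostAcquisition": 1, "ResizeDetector": 1}
print(pattern_occurred(abstracted, use_pattern))  # True
```

A set abstraction would keep only the distinct activity names; the bag variant is needed when the multiplicity of an action matters for the pattern.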

Finally, by combining all the context factors that are related to a particular user behavior, one might construct the overall contextual information in which this behavior occurs. This context is further mined, and conformance checking is performed for other systems and similar use patterns, to detect expected occurrences or deviations. Moreover, the importance of the discovered use pattern is evaluated using metrics from user research.

A Python framework is constructed to practically implement the proposed approach, from obtaining the event logs from the data lake, to applying the necessary transformations, and then discovering use patterns and detecting the context of their occurrence. The output of this framework is used to derive general statistics and construct plots using Jupyter notebooks, built in such a way that they can be reused later for other discovered use patterns.

1.4 Outline

The rest of the thesis is organized as follows:

Chapter 2 provides an introduction to the main existing concepts and definitions that will be used throughout the rest of this thesis. It starts with the notion of event logs used in the context of process mining, and the concepts of the artifact-centric approach and the composite state machine. In addition, sequence mining and association rules are discussed.

Chapter 3 presents the case study used in this thesis, first providing an overview of the iXR system that is analyzed and then detailing the current approaches for studying user behavior. In addition, the main insights from the data exploration are described, covering the nature of the data, the main challenges and some preprocessing steps.

Chapter 4 formalizes the proposed approach for enabling the discovery of use-patterns that represent potential for automation. A running example is used to illustrate the approach, which is then detailed for the selected case study. Chapter 5 describes further mining for detecting context factors related with the occurrence of the discovered use-pattern.

The proposed approach is evaluated from both a data and a qualitative perspective and the results of the evaluation are shown in Chapter 6. Lastly, Chapter 7 presents the limitations, conclusions and potential future work related with this project.


Chapter 2

Preliminaries

This chapter aims to introduce the basic terms and concepts used throughout this thesis, before going into the details of our approach. As we will focus on analyzing instances of processes, Sections 2.1 and 2.2 provide a basic understanding of the main concepts of process mining and event logs, and of the artifact-based approach and the composite state machine used to define the process instances. Section 2.3 describes data mining concepts, such as sequence mining and association rules, whereas Section 2.4 describes the concept of context. At the end of each section, a short overview of how these preliminaries are used in this project is given.

2.1 Event logs and process mining

The notion of an event and an event log or event data, form the basics of process mining. Van der Aalst [34] defines process mining as a discipline which aims to use event data to extract process-related information and which can be viewed as the missing link between data science and process science.

Moreover, as stated in [32], this discipline focuses on analyzing and extracting insights into processes from event logs, and it includes a combination of computational intelligence, data mining, process modeling and process analysis techniques and knowledge.

The main concepts of process mining are defined by van der Aalst [34] as follows:

An activity is a well-defined step in the process. A case represents a process instance, and a trace is the sequence of events recorded for a case. Each event refers to an activity and belongs to a particular case. Activity names are used to identify events. An event log is a set of cases, such that each event appears at most once in the entire log.
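As a minimal sketch of these notions (case identifiers, timestamps and activity names here are invented), an event log can be grouped into traces by ordering each case's events by timestamp:

```python
from collections import defaultdict

# Each event refers to an activity and belongs to exactly one case.
events = [
    ("exam1", 2, "Command: StartFluoroscopy"),
    ("exam1", 1, "Lab: lock"),
    ("exam2", 1, "Command: StartExposure"),
    ("exam1", 3, "Command: StopFluoroscopy"),
]

# A trace is the time-ordered sequence of events recorded for one case.
traces = defaultdict(list)
for case_id, timestamp, activity in sorted(events, key=lambda e: e[1]):
    traces[case_id].append(activity)

# traces["exam1"] == ['Lab: lock', 'Command: StartFluoroscopy',
#                     'Command: StopFluoroscopy']
```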

Process mining has three main types, as described by van der Aalst [34]: process discovery, conformance checking and enhancement. Process discovery aims at deriving models that describe the process, based on the event logs; conformance checking relates events in the event log to activities in the process model and compares both, with the goal of finding commonalities and discrepancies between the modeled behavior and the observed behavior; enhancement aims to extend or improve an existing process model using information about the actual process recorded in some event log. In addition, conformance checking is described by Carmona et al. [7] as the analysis of the relation between the intended behaviour of a process, as described in a process model, and event logs that have been recorded during the execution of the process.

In the remainder, we will use the notion of event logs and the other concepts of process mining as defined above. We make use of them in this thesis because our approach aims at analyzing process instances, and we propose the concept of high-level activities in order to enable the discovery of use-patterns that might represent potential for automation. Moreover, the concept of conformance checking will be used in this thesis to enable comparing the expected user behavior, represented in the form of


use-patterns, with the one observed in the log. The goal will be to detect whether or not the discovered use-pattern occurs as expected. In case of deviations, the allowed degree of flexibility is discussed, severity levels are assigned to use-patterns, and further analysis is performed to find the root causes of these deviations.

2.2 Artifact-centric approach and composite state machine

In this project we use the concepts of the artifact-centric approach and the composite state machine as a way to abstract from the complexity of the system and the logged data. Moreover, the fact that the system itself can be seen as composed of a specific number of units and subunits makes it natural to propose an approach that takes advantage of the artifact notion.

The combination of both approaches is used in this thesis to enable defining the process instances and the high-level activities. The term artifact is used in this thesis as defined by Fahland et al. [9], who describe artifact-centric process models based on the concept of proclets [35]:

Definition 1: An artifact describes a class of similar objects, together with the life cycle of states and possible transitions that each of these objects follows in a process execution. An artifact instance is an object that participates in the process, and an artifact's life cycle describes when an instance of the artifact is created, which actions may occur in which state of the instance to advance it to another state, and which goal state the instance has to reach to complete a case. Finally, an artifact-centric process model describes how several artifact instances interact with each other in their respective life cycles.

For instance, in the case of an iXR system, the X-ray pedal for exposure is an artifact, which might be in states PedalPressed or PedalReleased, and the corresponding events that enable this artifact to advance from one state to the other are Command: Start Exposure and Command: Stop Exposure.

The definition presented above clearly implies that when working with artifact systems, we need to consider the states of the artifacts. Therefore, to model these state-based processes, we use the notion of a Composite State Machine (CSM) as proposed by van Eck et al. [38], [39] and more recently extended by Pietraru [23]. The composition of the individual artifact states defines the state of a CSM, which is formally defined as:

Definition 2: A Composite State Machine M = (S, T, b, f) is a model of a process with n artifacts, where S ⊆ (S1 × · · · × Sn) is a set of states, with S1, ..., Sn the sets of artifact states; b = (b1, ..., bn) is the initial source state; f = (f1, ..., fn) is the final sink state; and T ⊆ (S ∪ {b}) × (S ∪ {f}) × L is the set of transitions, with ∀(s, s′) ∈ T : s ≠ s′. We define S̄ = S ∪ {b, f} and S̄i = Si ∪ {bi, fi} for i ∈ 1, ..., n.

As previously mentioned, this definition is extended by [23] with L, a set of transition labels corresponding to events from E = {e1, e2, ..., ez}, a set containing z events, where each event ei triggers a state change for at least one artifact.

To clearly illustrate the concept of a Composite State Machine, Figure 2.1, taken from [39], depicts a simple healthcare process M, which has two distinct artifacts: A1, the status of the patient being treated, and A2, the status of the lab tests of the patient. The initial states (b1, b2) are marked with an incoming arrow and the final states (f1, f2) are marked with an outgoing arrow. The healthcare process starts with the patient registration, after which a lab test is planned to diagnose the patient. In case the patient misses their appointment or the results are inconclusive, a new test is planned. Conversely, if the test results are ready, then the treatment can proceed, during which additional tests may be required, until the patient is healthy again and the process ends.


Figure 2.1: A model M of a simple healthcare process and its two artifact models A1 and A2. Every state in the process is a combination of a state from each artifact. [39]
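The idea can be sketched in a few lines of code: the composite state is a tuple of artifact states, and an event advances every artifact whose life cycle defines a transition for it. The state and event names below are invented, loosely following the healthcare example; this is an illustrative sketch, not the models of [38], [39].

```python
# A minimal sketch of a composite state machine with two artifacts.
# (artifact_index, current_state, event) -> next_state; names are invented.
transitions = {
    (0, "registered", "plan test"): "awaiting results",
    (1, "no test", "plan test"): "test planned",
    (1, "test planned", "results ready"): "results in",
    (0, "awaiting results", "results ready"): "in treatment",
}

def step(composite, event):
    """Advance every artifact that has a transition for this event."""
    new, moved = list(composite), False
    for i, state in enumerate(composite):
        nxt = transitions.get((i, state, event))
        if nxt is not None:
            new[i], moved = nxt, True
    if not moved:
        raise ValueError(f"event {event!r} enables no artifact transition")
    return tuple(new)

s = ("registered", "no test")
s = step(s, "plan test")       # both artifacts advance
s = step(s, "results ready")   # -> ("in treatment", "results in")
```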

Definition 2 remains suitable for our research as well, considering that the system we will analyze has a natural division into artifacts. However, unlike previous research that makes use of the artifact-centric approach and the composite state machine to discover process models [38], [39] or to define patterns and anti-patterns [23], we use the combination of these approaches to define process instances and what we call high-level activities. In addition, we have modified the notion of pattern and anti-pattern from the one defined in [23] and used only the part of it stating that a pattern depicts expected, desired use of the system, whereas an anti-pattern depicts expected but undesired use of the system; we propose, however, a different approach for discovering them.

2.3 Association rule mining and sequential pattern mining

We use the concept of association rule mining as a pattern mining technique which does not take into account the sequential ordering of the events. However, as we will use it to mine for correlations between the contextual factors and the discovered use-patterns, and not to discover the use-patterns themselves, this limitation does not negatively influence the results. Agrawal et al. [2]
were the first to introduce the problem of mining for association rules, based on the concept of strong rules, for discovering relations between sets of items in a large database of customer transactions, with some minimum specified confidence. We will use the definition of association rules as adapted by Hornik et al. in [19]:

Definition 3: Let the itemset I = {i1, i2, ..., in} be a set of n binary attributes called items. Let D = {t1, t2, ..., tm} be a set of transactions called the database. Each transaction in D consists of items purchased by a customer in a visit, has a unique transaction ID, and contains a subset of the items in I. An association rule is defined as an implication of the form X ⇒ Y, where X, Y ⊆ I and X ∩ Y = ∅. The itemsets X and Y are called the antecedent (left-hand side) and the consequent (right-hand side) of the rule.

In the context of customer transaction data, a well-known example of an association rule is the following: {bread, butter} ⇒ {milk}, with the itemset I = {milk, bread, butter, beer}, meaning that if bread and butter (antecedent) are purchased, milk (consequent) is also purchased.

Several measures of interest exist, aimed at selecting interesting rules from all possible ones. Among them, the most frequently used are support and confidence. The support of a rule is defined as the fraction of transactions in D that contain the union of the items in the antecedent and the consequent of the rule, corresponding in this way to statistical significance: supp(X ⇒ Y) = supp(X ∪ Y).

Usually, a minimum threshold of support is defined based on domain knowledge, which is then used


to retain only rules with support above this threshold. Confidence is a measure of a rule's strength, as it shows the percentage of the transactions containing the antecedent for which the rule is correct, and is computed as conf(X ⇒ Y) = supp(X ∪ Y)/supp(X). Like support, association rules must also satisfy a minimum confidence level. To deal with cases when a large number of rules satisfy both support and confidence thresholds, a popular additional interest measure is lift, a measure of a rule's importance, computed as lift(X ⇒ Y) = supp(X ∪ Y)/(supp(X) supp(Y)) and interpreted as the deviation of the support of the whole rule from the support expected under independence. Greater lift values indicate stronger associations [19].
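The three measures can be computed directly from their definitions, as the following sketch over an invented toy transaction database shows:

```python
def supp(itemset, transactions):
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """conf(X => Y) = supp(X u Y) / supp(X)."""
    return supp(X | Y, transactions) / supp(X, transactions)

def lift(X, Y, transactions):
    """lift(X => Y) = supp(X u Y) / (supp(X) * supp(Y))."""
    return supp(X | Y, transactions) / (supp(X, transactions) * supp(Y, transactions))

# Toy transaction database for the rule {bread, butter} => {milk}
db = [
    {"bread", "butter", "milk"},
    {"bread", "butter", "milk", "beer"},
    {"bread", "beer"},
    {"milk"},
]
X, Y = {"bread", "butter"}, {"milk"}
print(supp(X | Y, db))             # 0.5
print(confidence(X, Y, db))        # 1.0
print(round(lift(X, Y, db), 2))    # 1.33
```

Since the lift is greater than 1, buying bread and butter is positively associated with buying milk in this toy database.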

In addition, as the focus of this thesis is the discovery of potential for automation, represented by user behavior that is repetitive across process instances, this might be seen as related to the concept of sequential pattern mining. Fournier-Viger et al. [10], in a survey of recent advances and research opportunities in sequential pattern mining, define it as follows:

Definition 4: Sequential pattern mining aims at the discovery of interesting subsequences in a sequence database. A sequence database is defined as a list of sequences SDB = < s1, s2, ..., sp >, having sequence identifiers (SIDs) 1, 2, ..., p. A sequence is an ordered list of itemsets, s = < I1, I2, ..., Im >, with an itemset I = {i1, i2, ..., in} being a set of items, as previously defined in Definition 3.

SID   Sequence
1     < {a, b}, {c}, {f, g}, {g, e} >
2     < {a, d}, {c}, {b}, {a, b, e, f} >
3     < {a}, {b}, {f, g}, {e} >
4     < {b}, {f, g} >

Table 2.1: An illustration of a sequence database, as an ordered list of itemsets. [10]

Table 2.1 illustrates a sequence database; for instance, the sequence < {a, b}, {c}, {f, g}, {g, e} > represents the successive transactions made by a customer in a store. Unlike frequent itemset mining and association rule mining, sequential pattern mining takes into account the sequential ordering of the events. Therefore, it is frequently used with time-series and sequence data. Moreover, various criteria exist for measuring the interestingness of a subsequence, among which support.
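Support in this sequential setting counts the sequences that contain a pattern, where containment means the pattern's itemsets occur, in order, as subsets of the sequence's itemsets. A minimal sketch over the database of Table 2.1:

```python
def contains(sequence, pattern):
    """True if the pattern's itemsets occur, in order, as subsets of
    the sequence's itemsets."""
    i = 0
    for itemset in sequence:
        if i < len(pattern) and pattern[i] <= itemset:
            i += 1
    return i == len(pattern)

def seq_support(pattern, sdb):
    """Number of sequences in the database that contain the pattern."""
    return sum(contains(s, pattern) for s in sdb)

# The sequence database of Table 2.1
sdb = [
    [{"a", "b"}, {"c"}, {"f", "g"}, {"g", "e"}],
    [{"a", "d"}, {"c"}, {"b"}, {"a", "b", "e", "f"}],
    [{"a"}, {"b"}, {"f", "g"}, {"e"}],
    [{"b"}, {"f", "g"}],
]
print(seq_support([{"b"}, {"f", "g"}], sdb))  # 3 (sequences 1, 3 and 4)
```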

We will recall the key concepts of association rules while mining for correlations between the discovered use-patterns and potential contextual factors. Moreover, measures of interest such as support, confidence and lift will be discussed when making use of these rules. In addition, the use of some popular sequential pattern mining algorithms will be discussed, even though there were no successful results when trying to discover sequential patterns for the system under investigation.

2.4 Contextual information

Detecting patterns that might represent potential for automation requires taking into account contextual information. Context-awareness has been extensively addressed in the literature as a challenge for providing better and more valuable computational services. Oxford Dictionaries¹ define context as

"the circumstances that form the setting for an event, statement, or idea, and in terms of which it can be fully understood".

The definition provided above is quite general, and we see context as a very subjective concept, depending heavily on the field it is applied to. Hence, to determine what context is in a computing environment, we will first provide an overview of context definitions that already exist in the literature, and further adjust them according to our research focus in Chapter 5.

¹ https://en.oxforddictionaries.com/


To start with, according to Abowd et al. [1], context is any information that can be used to characterize the situation of an entity, where an entity can be a person, or a physical or computational object, while context-awareness is the use of context to provide task-relevant information and/or services to a user. Similarly, Brown [6], who presents a framework for creating context-aware applications, describes context as a combination of different context elements, such as location, adjacency of other objects, critical states, computer states and time. A definition of context which is quite close to our research focus is given by Chen and Kotz [8], who performed a survey of context-aware mobile computing research. According to them, context is the set of environmental states and settings that either determines an application's behavior or in which an application event occurs and is interesting to the user. They also proposed adding time, as a natural and important context element, to the computing, user and physical context proposed by [29].


Chapter 3

Case Study

A better understanding of the system and the available data not only enables one to select the appropriate data-driven approach, but also contributes to a better interpretation of the findings and outcomes of the chosen approach. Therefore, before diving into details, we will introduce our case study, focused on one of the iXR systems, named Philips Azurion. It illustrates the practical application of the proposed approach and its potential to be used in similar systems or products with logged data of the same or similar nature.

This chapter is a journey through the system and its logged data, starting with Section 3.1, where an overview of the Azurion system and its main components is provided. The current approaches of studying user behavior are discussed in Section 3.2. Then, in Section 3.3, the nature of the data that will be used for this project and some preparation steps are described. First, we provide an overview of data gathering, as well as a general data description, in Section 3.3.1. The initial challenges resulting from the data exploration are discussed in Section 3.3.2. Finally, in Section 3.3.3, some descriptive statistics for the data used in this project are provided, along with the preparation steps.

3.1 An overview of the system

Interventional X-ray systems like Azurion enable clinicians to guide miniature tools such as catheters, balloons and wires through a small opening in the patient's skin. A variety of cardio, vascular and neuro diseases can be treated, such as coronary disease (i.e. stenting), structural heart disease (i.e. valve placement or repair), vascular disease (i.e. aortic aneurism), stroke treatment (i.e. stenosis, aneurisms) and a variety of other less known treatments [11].

iXR systems are quite complex. To illustrate this, the main components of the Azurion system are shown in Figure 3.1a and a detailed description of them is provided below [16]:

1. C-arm: it can be monoplane or biplane (Figure 3.1a shows a biplane, with a frontal and a lateral channel) and it is used to capture the required images at various angles and locations, by moving around the patient. It has two main components, one on each side of the arm: the detector and the X-ray tube.

2. Detector: can be thought of as "a camera" which captures images during different procedures, using an X-ray based technique. X-ray is turned on and off by using the foot pedals. The size of the detector varies with the series of the system; however, the field-of-view can be easily changed according to the region of interest.

3. X-ray tube: it generates the X-ray beam, which produces a detailed X-ray image of the patient's blood vessels and organs. These images are then captured by the detector. Two important


components related to the X-ray tube are shutters and wedges. Shutters are collimators, used to limit the width and the height of the irradiated area and to improve the quality of the image. Wedges are filters which reduce the X-ray intensity over parts of the irradiated area and also improve the quality of the image. Shutters and wedges can be adjusted from both the Touch Screen Module and the Control Module.

4. Patient table: the table on which the patient lies during the procedures. The patient is transported onto the table using moving beds.

5. Control Module (TSO): it provides the necessary controls required to adjust the position of the table, arm, detector, X-ray tube, shutters, wedges and perform specific image functions during acquisitions. The type of the module depends on the type of the C-arm (monoplane or biplane).

6. Touch Screen Module (TSM): it can be used to control acquisition settings, review and process images. In addition, a remote control can also be used to control viewing functions on the system.

7. Foot Switch: it is used to control fluoroscopy and exposure. Each pedal of the foot switch may be assigned to a different function, based on the configuration. Foot switches can also be monoplane or biplane.

8. Monitors and FlexVision: monitors are used to display live or referenced images. Their display is configurable. In case of FlexVision, individual monitors are replaced by a single large monitor that displays all the necessary views depending on the chosen configuration.

9. Catheter: it is not a component of the Azurion system, but it is important to mention when talking about minimally invasive treatments, as it enables clinicians, for instance, to look at real-time visuals on the monitors, easily manoeuvre through blood vessels, and place a stent to open them in case they have been narrowed.

(a) Azurion system in an Exam Room (catheterization lab) (b) Control Room

Figure 3.1: An overview of the Azurion system

Figure 3.1a also illustrates the hospital environment where Azurion is located, called a catheterization lab.

In addition, Figure 3.1b shows a Control Room, which has monitors and controls parallel to those in the Exam Room, allowing the staff to interact with the system in parallel, without interrupting each other.

For instance, while fluoroscopy/exposure is taking place, a technician can perform image processing, reviewing, selection and analysis for the same or another patient from the Control Room.


3.2 Current approaches of studying user behavior

Currently, to study and understand user behavior, their interaction with the system, and their needs and motivations, and to identify potential usability issues, a combination of the following traditional approaches is used:

• Usability Studies: specific usability scenarios and sets of tasks are prepared by usability engineers and are then completed by representative users while being observed, in order to test and evaluate the product.

• Physical Product Monitoring: instances of products are monitored on-site to identify any potential usability issue and observe the user interaction with them.

• Flying Quality Squad: sites all over the world are visited in one- or two-week tours. The tours consist of interviewing customers to collect their direct feedback on the performance of the product, and of a technical inspection of the product.

• User Complaints: complaints filed by users, mainly in case of system-related critical events. The complaints include a description of the experienced problem or usability issue.

The input from the approaches listed above is then evaluated by the domain experts and priorities are assigned to the identified usability issues. However, there are some potential limitations or risks related to these traditional approaches of studying user behavior. For instance, usability studies only make use of the knowledge the engineers already have; therefore, they can only observe situations they predefined and check what they were aware of as potential risks, and the chances to discover something they did not know are quite small. Moreover, the number of scenarios and users participating in usability testing is limited, leading to potentially biased results. In addition, physical product monitoring and flying quality squads are mainly focused on problem instances, as they take place for specific systems, used by specific users, in specific locations, and are carried out by specific domain experts. Hence, the knowledge is scattered and it is not known how others are experiencing an issue, as no cross-analysis is performed. Regarding the customer complaints, despite the fact that they are reported mainly for system-related critical issues, even when reported the scale of impact is not known, as there is no evidence of how others are experiencing it. Moreover, as complaints are handled by different domain experts and might occur over a long time scale, the knowledge is scattered and this might lead to a missed pattern.

Therefore, we propose a data-driven approach, which in combination with the traditional approaches will bring more insights and contribute to correctly understanding user behavior. Firstly, it will enable us to perform cross-analysis, by checking how other users, in other hospitals and in other countries, are experiencing the same or a similar issue; hence a bigger scale of impact is ensured. Secondly, it allows us to look at the reasons for a specific behavior: what leads to it, which events precede it, and in which context it occurs. This data-driven approach aims to contribute to improving the product design, by enriching the overall picture of current approaches for studying and understanding user behavior and experience. To analyze user behavior from a data-driven perspective, we intend to make use of the event logs generated daily by the system, described in the following section.

3.3 Data understanding and preparation

As previously mentioned, the event logs of iXR systems are characterized by a high degree of complexity due to the very fine-grained nature of the logging. In addition, they are not primarily logged for the study of user behavior and for getting insights into the process execution, and as such process mining techniques are not directly applicable. In the following sections we describe the findings from the data understanding and exploration and the initial preparation steps.


3.3.1 Data gathering and description

Daily log files from iXR systems, including both the events generated by the interaction of the users with the system and the ones generated by different system units during its operation, are loaded into Vertica SQL Data Warehouse, a distributed column-based storage solution, as illustrated in Figure 3.2. To collect the data for this project, the data warehouse is queried using SQL queries integrated in our Python application, constructed based on the available data models. Queries are performed per system and per date, to optimize its use, considering that Vertica is a shared resource, and daily event logs, which will serve as input for our analysis, are generated.

Figure 3.2: An overview of data gathering (the area marked with dashed lines), adapting the high-level data architecture from [11].
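Such a per-system, per-date query might look as follows. The table and column names are invented for illustration, and sqlite3 merely stands in for the Vertica connection used in practice:

```python
import sqlite3  # stand-in for a Vertica connection; the schema below is hypothetical

def fetch_daily_log(conn, system_id, day):
    """Fetch one system's events for one day, keeping the load on the
    shared warehouse small by filtering on both system and date."""
    query = """
        SELECT timestamp, activity
        FROM event_log
        WHERE system_id = ? AND date(timestamp) = ?
        ORDER BY timestamp
    """
    return conn.execute(query, (system_id, day)).fetchall()
```

Filtering on both system and date, as described above, keeps each result set small and avoids long-running scans on the shared warehouse.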

A subset of the event logs gathered from an Azurion system is provided in Table 3.1 below, as an illustration. Originally, there is no case identifier for the events logged from this system, as the data is relational in nature. However, to enable the analysis of process instances, cases must be defined and a case identifier must be assigned to the events, apart from the timestamp and the activity attribute. This step is further elaborated in Section 4.6.1.2, and is also applicable to similar cases. In the case of Azurion systems, a better logging might solve this issue. However, in reality there are cases where this problem will not be solved with better logging, as the notion of process instance is often not explicit, and that is why this step is significant.

Case Id                                  Timestamp                Activity
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.040  Command: StartFluoroscopy
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.044  Command: StartFluoroscopy
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.051  User output: LED on UI module is set on
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.062  Command: BLDisplayOrientation
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.062  Start prepare generator
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.068  Acquisition -> Prepare devices for Acquisition
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.145  Viewpad: UiActivity detected
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.154  From Generator device: PoInt: Shunt test and calibration data
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.156  XrayService: Prepare
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.202  XrayService: acquisition parameters
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.390  XrayService: Prepared
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.393  Lab: lock
4226704_2018-04-06_15:12:48.169000_DLE   2018-04-06 15:47:05.406  Xray indication inside: On

Table 3.1: A subset of event logs from an Azurion system, after adding the case identifier


Several minimally invasive procedures are performed daily using Azurion systems. Hereinafter, we will refer to these procedures as lab examinations or lab exams. When it is time for a new lab examination, the clinical workflow might be described as follows: the clinicians create a new lab exam or open an existing one, analyze the patient details obtained from the RIS application, position the patient and the system, and then start the examination, during which several acquisitions take place. When the examination is finalized, the patient is transferred from the table and the system is re-positioned.

To perform the mapping from relational to event data, we need to assign a case identifier to each event that is logged. The notion of a process instance might differ, depending on the data or case study, but our approach consists of considering each lab exam as a unique case or process instance, in order to understand and get insights into the user behavior during or between the lab examinations. Therefore, in Section 4.6.1.2, we have proceeded with assigning a case identifier to each event that belongs to a specific lab exam.
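This mapping step can be sketched as follows. The marker activity "Start lab exam" and the rows below are invented for illustration (the actual identification of exam boundaries is detailed in Section 4.6.1.2); each event is stamped with a case identifier built from the system number and the start time of the lab exam it falls in.

```python
def assign_case_ids(system_id, events, start_activity="Start lab exam"):
    """Stamp each (timestamp, activity) row with a case identifier derived
    from the system number and the start time of its lab exam.
    'Start lab exam' is a hypothetical marker activity."""
    case_id, labeled = None, []
    for ts, activity in events:
        if activity == start_activity:
            case_id = f"{system_id}_{ts}_DLE"  # suffix mirrors the ids of Table 3.1
        if case_id is not None:
            labeled.append((case_id, ts, activity))
    return labeled

rows = [
    ("15:12:48.169", "Start lab exam"),
    ("15:47:05.040", "Command: StartFluoroscopy"),
    ("16:02:11.500", "Start lab exam"),
    ("16:03:00.000", "Lab: lock"),
]
labeled = assign_case_ids("4226704", rows)
# labeled[1][0] == '4226704_15:12:48.169_DLE'
```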

Before describing the approach that is used, let us first illustrate the process mining terms defined in Section 2.1, using the subset of the event log given in Table 3.1:

• A case, which is defined as a process instance, is represented by a lab exam. The Case Id is an identifier for the lab exam; for instance, 4226704_2018-04-06_15:12:48.169000_DLE is used to identify events that are performed during the lab exam which starts at 15:12:48.169 on 06.04.2018, on the system with number 4226704.

• An activity, which is a well-defined step in the process, is for instance Command: StartFluoroscopy or Acquisition -> Prepare devices for Acquisition.

• An event is, for instance, the activity Command: StartFluoroscopy related with the case 4226704_2018-04-06_15:12:48.169000_DLE.

• An example of a trace is the following sequence of events: <Command: StartFluoroscopy, Command: StartFluoroscopy, User output: LED on UI module is set on, Command: BLDisplayOrientation, Start prepare generator, Acquisition -> Prepare devices for Acquisition, Viewpad: UiActivity detected, From Generator device: PoInt: Shunt test and calibration data, XrayService: Prepare, XrayService: acquisition parameters, XrayService: Prepared, Lab: lock, Xray indication inside: On>

3.3.2 Initial challenges resulting from the data exploration

The logging nature of the available data is one of the main challenges of this project. The diverse range of activities logged by a wide variety of system components without a clear distinction between user and system activities, the low-level granularity of the logging and its relational nature make the original logs of such systems complex and tricky. Therefore, one must employ a number of preprocessing steps before applying mining techniques. Gijsbers et al. [11] have provided an overview of use cases led by Philips Healthcare, which aim at mining large amounts of data to offer predictive maintenance instead of maintenance at fixed time intervals. As clearly mentioned by the authors, the biggest challenge is to optimize the logged data for performing further analysis. This is one of the main challenges for our project as well, as it is evident that the data from iXR systems are not logged for the purpose of mining for processes and gaining visible insights into user behavior. In the following sections we discuss some of the challenges one might face when dealing with data of the same or similar nature.

3.3.2.1 Fine-grained nature of the data

In practice, real-life system logs contain events generated by different system units and their corresponding sub-units, in a very fine-grained way. Moreover, often these data are not logged with the aim of studying user behavior and the interactions of users with the system, and as such there is no explicit distinction between the events performed by the users and the ones generated by the system during its operation. In addition, the complexity of the logged data is also affected by the possibility that systems

(33)

CHAPTER 3. CASE STUDY

offer to the users to perform actions in parallel. For instance, Azurion system is specifically designed to enable people to work in parallel on different activities, such as for instance, from either Exam Room, Control Room or FlexSpot, without clearly distinguishing whether these events belong to the same lab examination or it is the case that the exam of another patient is being reviewed in parallel. Furthermore, Table3.1, illustrates that there are 13 events logged for a time period of around 0.4 seconds, indicating in this way the fine grained nature of the data. The number of distinct activities for the analyzed logs reaches 13,097 activities and these activities might be related to the following categories: managing the lab examinations and patient details, user commands, user guidance or output, Xray, acquisitions, col- lisions, calibrations, application errors, services, software versions, system communications with the network and other tools, system startup and shutdown, etc.

The fine-granularity of the data also leads to another challenge: performing the mapping between the low-level logged activities and their high-level interpretation.
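As an illustration, such a mapping can be sketched as a simple lookup table that lifts low-level logged activities to coarser, user-level categories and collapses consecutive duplicates. The activity names and category labels below are illustrative choices, not the actual Azurion event vocabulary or the mapping used in this thesis:

```python
# Hypothetical mapping from fine-grained log activities to high-level
# categories; the names are illustrative.
LOW_TO_HIGH = {
    "Command: StartFluoroscopy": "Acquisition",
    "XrayService: Prepare": "Acquisition",
    "XrayService: Prepared": "Acquisition",
    "Lab: lock": "Exam management",
    "Viewpad: UiActivity detected": "User interaction",
}

def lift(events):
    """Replace each low-level activity with its high-level category,
    collapsing consecutive duplicates to shorten the trace."""
    lifted = []
    for event in events:
        high = LOW_TO_HIGH.get(event, "Other")
        if not lifted or lifted[-1] != high:
            lifted.append(high)
    return lifted

trace = ["Command: StartFluoroscopy", "XrayService: Prepare",
         "XrayService: Prepared", "Lab: lock"]
print(lift(trace))  # ['Acquisition', 'Exam management']
```

Collapsing duplicates is one possible design choice; depending on the analysis, repeated occurrences of the same high-level activity may also be kept to preserve frequency information.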

3.3.2.2 Mapping relational data to event data

To enable the analysis of process instances, one must first transform the logged data into event data, in case they are originally relational in nature. In reality, quite often the system logs are first stored in data lakes, and then extracted, transformed and loaded into data warehouses in the form of relational databases.

The data warehouses can then be queried to obtain the data of interest. However, to be able to gain insights into processes, one is expected to perform a mapping of the relational data to event data. For instance, in the case of the Azurion system logs, the original logged data do not contain a case identifier for the process instances; hence, it is necessary to assign one to each event occurring in the logs.
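One simple way to assign such a case identifier can be sketched as follows: if some marker activity is known to open a new process instance, the cumulative count of that marker over the time-ordered events yields a case id per event. The marker activity ("Lab: lock") and the sample events below are illustrative assumptions, not prescribed by the Azurion logs:

```python
from itertools import accumulate

# Time-ordered (timestamp, activity) pairs extracted from the relational
# store; the values are invented for illustration.
events = [
    ("09:00:00", "Lab: lock"),
    ("09:00:01", "XrayService: Prepare"),
    ("10:30:00", "Lab: lock"),
    ("10:30:02", "XrayService: Prepare"),
]

# Each occurrence of the start marker opens a new process instance,
# so the running count of markers is the case id of each event.
case_ids = list(accumulate(int(act == "Lab: lock") for _, act in events))
print(case_ids)  # [1, 1, 2, 2]
```

In practice the choice of marker, and whether a case should instead be bounded by a start/stop pair or by idle time between events, depends on the semantics of the logs at hand.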

3.3.2.3 Making use of contextual information

One of the key requirements of an approach used to enable the discovery and the analysis of use-patterns that might represent repetitive user behavior is that it must be contextual to specific situations. For instance, in the case of iXR systems it must be contextual to the clinical and operational workflow. Therefore, the ability to access and analyze the available contextual information is an essential factor toward obtaining meaningful insights into user behavior.

The following situations might be faced regarding contextual information:

• Contextual factors that are already known might serve as input for defining a specific situation, which will later be mined for discovering potential repetitive user behavior.

• The discovered use-patterns might be further mined to detect other contextual information, which might clearly indicate when a specific use-pattern occurs and what leads to it.

The challenge lies in the fact that, in reality, contextual information is quite often logged at different levels of abstraction; therefore, it must be interpreted as such.

Contextual information might include: (1) available metadata representing different system settings (e.g., the clinical situation, such as the EPX protocol; positions and dimensions of different components of the machine, such as table, beam, shutter and wedge positions; detector size; image dimensions; radiation amount, etc.); (2) the events that surround a specific pattern; (3) the point in time (i.e., the relative time of pattern occurrence, considering the exam duration).
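The latter two kinds of context can be sketched for a single pattern occurrence as follows. The function below is a minimal illustration, assuming the pattern occupies positions `start..end` in a trace and using the event index as a crude proxy for time; metadata (kind 1) would additionally be joined in from the acquisition-level records:

```python
def pattern_context(trace, start, end, exam_length, window=2):
    """Collect context for a pattern occurrence: the events surrounding
    it and its relative position within the exam (0..1)."""
    return {
        "before": trace[max(0, start - window):start],
        "after": trace[end:end + window],
        # Index-based proxy for the relative time of occurrence.
        "relative_time": start / exam_length,
    }

trace = ["A", "B", "P1", "P2", "C", "D"]   # P1, P2 form the pattern
ctx = pattern_context(trace, start=2, end=4, exam_length=len(trace))
print(ctx["before"], ctx["after"])  # ['A', 'B'] ['C', 'D']
```

With timestamps available, the relative time would of course be computed from the actual exam start and duration rather than from event positions.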

The metadata are logged for each acquisition, as additional information of specific events. For instance, the Time Information event is used to log the starting and stopping time of X-ray radiation for each acquisition; the XrayService: channel independent irradiation data event is used to log the EPX protocol, some component positioning (e.g., of the table), duration and radiation details, etc. The following shows a fragment of the available metadata logged at the level of individual acquisitions for an Azurion system:

ExaminationId, RunTag, DetectorX, DetectorY, DetectorDiagonal, ShutterLeft, ShutterRight, ShutterTop,
