Modeling a Diabetes Guideline using the TMR model for interaction detection


MSc Artificial Intelligence

Track: Machine Learning

Master Thesis

Modeling a Diabetes Guideline

using the TMR model for interaction detection

by

Roelof van der Heijden

10906827

December 27, 2017

42 EC 2016–2017

Supervisor:

Dr V. Zamborlini

Dr A. ten Teije

Assessor:

Dr M. van Someren

Prof Dr F. van Harmelen


Abstract

Clinical guidelines are used to standardize medical treatments and help doctors with best practices in their field. The TMR model is used to detect conflicts between different guidelines. We have performed a case study of a real world diabetes guideline using the TMR model. We found 51 conflicts between 21 selected recommendations and discuss their implications. We also extend the TMR model to be able to handle preconditions of recommendations. For this we have developed a new representation of propositional logic formulas in RDF. With our extension we are able to determine if the preconditions of a supposed conflict prevent a real conflict from occurring. Our results indicate that 3 of the 51 detected conflicts can indeed be ruled out because of this.

1 Introduction

Clinical guidelines are a standard in the medical field. There is a separate clinical guideline available for almost every major disease. They serve as a framework for treating doctors, to put their actions in perspective and point them in new directions. Of course, they are only guidelines and doctors can deviate from them when they deem it necessary.

The recommendations in clinical guidelines can also be used to create a treatment plan for a patient. Because they are often long and complex documents, tools have been developed to help with this process. One of these tools is the modeling language Asbru [1], which allows practitioners to precisely formulate each step in a treatment plan and the conditions that need to be satisfied in order to perform the next step.

One feature that Asbru does not support however, is conflict detection between recommendations. When a patient suffers from multiple diseases, multiple guidelines might apply, whose recommendations might contradict or in other ways interact with each other. It might even be the case that within a single guideline, some recommendations interact with some other recommendations. These interactions might result in errors in the treatment of patients.

The TMR model, designed by Zamborlini et al. [2], is another approach to describe clinical guideline recommendations. It is designed to detect conflicts between recommendations. It has already been used in this regard for a constructed set of recommendations. However, the TMR model is also limited, for example in its ability to handle preconditions of recommendations.

We want to know if the TMR model is able to detect interactions between recommendations from real life clinical guidelines. In this work we test the TMR model using the Scottish diabetes guideline SIGN116 [3] and report on our experiment. Furthermore, we extend the work of Zamborlini and implement some logic to allow the TMR model to handle preconditions.

The structure of this paper is as follows: first we explain the research question in section 2 and discuss the related research in this field in section 3. Then we explain which steps are necessary to answer it in section 4. We analyse the components of a typical recommendation in section 5 and explain the TMR model in section 6. The actual experiment is described in sections 7, 7.1 and 8 and the results are presented and discussed in section 8.2. Section 9 describes our extension to the TMR model that focuses on preconditions. Section 11 provides a general conclusion to this research and describes the lessons we have learned from it, as well as pointers for future research.

Acknowledgements

This work is part of a master project at the University of Amsterdam. The supervisors are Veruska Zamborlini and Annette ten Teije. We would like to thank Jan Wielemaker for his help with coding in Prolog.

2 Research question

The main research question of this thesis is as follows:

Is the TMR model sufficiently powerful to detect interactions within a single real world guideline?

To answer this question, we first need to look at the information present in clinical guidelines. Is the information available to us in clinical guidelines sufficient to allow formal representation and detecting interaction in the TMR model? Is the help of a medical expert required? Are the recommendations clear and unambiguous?

After we have found answers to these questions, we can look at the limitations of the TMR model more closely. Which recommendations can be described using the TMR model and what choices need to be made in doing so? Are there recommendations that can’t be described by the TMR model and what extensions would be required to be able to?

The next step in our research is to focus on the preconditions in recommendations. How can we implement them and what new insights does this bring to our previous results? Does the handling of preconditions improve the detection of true interactions?


3 Related work

In this section we describe some literature and how it relates to our research.

Riano and Ortega [4] have performed a literature survey on computer systems that deal with multi-morbidity. They classify the recent research in this area according to several systems, one of them being a classification system developed by Abidi et al. [5] and extended by Jafarpour [6], where the categorization is based upon the combination point of the different guidelines. Five distinct categories are defined: guidelines can be combined and then computerized, the computerized guidelines can be combined, combination can occur of the individual treatment plans, or during the process of computerization. Finally, the knowledge from guidelines can be combined based on the stored records of patients that match the multi-morbid criteria. Riano and Ortega identify the work of Zamborlini et al. as belonging to the category where guidelines are computerized and then combined. The extension we describe in this thesis does not change this categorization.

Riano and Ortega also mention the strengths and weaknesses of the used techniques. They identify “reusable knowledge, conceptual simplicity, decremental costs” as the strengths of transition fitting, the technique used by the TMR model. Decremental costs in this context means that the required effort of adding more guidelines to the system’s knowledge base diminishes with each additional guideline, since the concepts shared between guidelines can be reused. We concur with this analysis, though we want to point out that the reuse of knowledge also brings its own dangers, as we will describe in more detail later. The weaknesses are identified as follows: “not completely automatic, premature, only suitable for short-term treatments.” Again, we must agree with this analysis. The TMR model is indeed premature, since it is not yet a fully functional system. It is also not completely automatic in the sense that it does not provide a user with a treatment plan for a specific patient. However, this was never part of the design goals of the TMR system: it focuses on conflict detection rather than conflict resolution. Because of this a certain level of expertise is always required from the user. For the same reason it can be called suitable only for short-term treatments. Further extensions and an adaptation of the TMR design goals can perhaps address these weaknesses.

A study with many similarities to our work is that of López-Vallverdú et al. [7], though many differences exist as well. To start, both describe rule-based systems and both are applied to DM guidelines. However, their approach is completely different: they have created a system that takes a treatment plan for a specific patient as input and analyzes the drugs that are or aren’t prescribed. With the help of a medical expert, combination rules have been designed that can add, remove or replace certain drug prescriptions. These rules are executed based on the applicable diseases and presence of drugs. In a trial of 20 multi-morbid patients, physicians have agreed with all presented replacements and found the system to be valuable.

The TMR system is very different in these aspects: a partial CIG has been created without consultation of a medical expert. Besides drug interactions, interactions between different types of medical actions can also be detected. The TMR system aims to detect interactions within and between guidelines, rather than individual treatment plans. Perhaps the biggest difference is the scalability of the systems: the complexity of adding a new guideline in the TMR system diminishes slightly with each new guideline, whereas the complexity increases dramatically when four or more guidelines are to be combined using the methodology of López-Vallverdú et al. This is because new combination rules have to be constructed that take into account all the different combinations of diseases.

Another group that has been working on machine readable clinical guidelines is OpenClinical.net [8]. They have designed a repository where users can upload “workflows” derived from CGs that they have written using a process formalization language called PROforma. By allowing the users to upload, peer review and edit the workflows, they hope to keep the workflows up to date and allow users to adjust workflows for their specific needs. Over 50 applications have been created for a wide variety of medical fields. Over time, this could become a large repository for machine readable clinical guidelines. By completely distributing the work load to the system's users, the combinatoric problem of constructing guidelines for combinations of diseases might become more manageable.

A study by Peleg et al. [9] gives a comparison between six CIG modeling languages: Asbru, Eon, GLIF, GUIDE, PRODIGY and PROforma. It asked experienced modelers of each language to model two guidelines and compared the resulting models syntactically and semantically. Their aim was to identify common components between the different languages to try and establish some standards, as well as providing a starting point for discussions on comparing CG modeling languages. They have concluded that there are indeed major components that are very similar between the languages; however, they note that these are implemented in different ways because of the different goals of each language. They find that it is important to allow each research group behind a language to pursue their own goals rather than try to constrain them by imposing a standard.

GLARE [10] is another structured language for describing clinical guidelines. It consists of two modules: a guideline acquisition module, and a guideline execution module. The acquisition module provides a user friendly interface to load a clinical guideline. While the doctor is entering the guideline, it already detects many forms of semantic or syntactic inconsistencies: name and range checking, to ensure standard nomenclature is being used; logical consistency, to ensure that each set of alternatives is preceded by a decision and each decision is preceded by a data query and to prevent circular dependencies; and temporal consistency, to ensure the entered guideline is still executable. This process ensures only high quality guidelines are entered into the system.

However, when applied to the area of multi-morbidity, we see some potential problems. All these different forms of consistency checking apply to a single guideline. When multiple guidelines are relevant for a single patient, as is the case for multi-morbid patients, their simultaneous execution could lead to conflicts. Moreover, in the results of our studies we have found that even within a single guideline conflicts and inconsistencies exist. This suggests that imposing rigorous restrictions on entered guidelines might not be practical.

Furthermore, in our research we have found that certain interactions between recommendations can be excluded based on their preconditions. In GLARE, preconditions are stored as plain text, so an automated analysis of preconditions is difficult.

In [4], Anselma et al. describe their extensions to the GLARE system. They note that their research is the first to focus on the temporal interactions between guideline actions with the intent to detect conflicts that actually occur, rather than conflicts that might occur when their effects happen to be active simultaneously. They construct a constraint satisfaction problem based on three sources of information: a user-provided log that describes when certain actions have been executed, temporal constraints extracted from the guidelines as they are expressed in GLARE, and information present in their knowledge base. Once the problem is solved using a standard STP constraint propagation network framework, they interpret the results and provide the user with a clear “YES,” “NO” or “MAYBE” to indicate whether an interaction occurs. A graphical representation of the solution is also provided, where a timeline is sketched for each time-constrained event in such a way that it is clear to a user whether an interaction occurs and what steps need to be taken in order to prevent such an interaction.

Their work is a great addition to conflict detection between clinical guidelines. However, a user is still required to select the recommendations from the guidelines for the system to analyze. This means the user needs to be aware of possible interactions before using the system. In this thesis we present a method to present these possible interactions to a user, requiring only a selection of clinical guidelines to analyze, rather than the actual recommendations. This means that it could be useful for a doctor to use both our program and the system of Anselma et al. in conjunction, to determine whether guideline recommendations could interact for a specific patient.

Another similarity is that both studies use a form of constraint satisfaction programming to detect conflicts. We have used SAT, which is a special kind of constraint satisfaction programming, more specialized than the one used in [4]. This raises the question of whether it would be possible to incorporate our work into the system of Anselma et al. A more in-depth analysis will need to be conducted to conclude whether this is indeed possible.

4 Method and Contributions

In this section, we describe our methods to answer our research questions.

4.1 Step 1

To test the abilities of the TMR model to describe recommendations from real life guidelines, we first chose a real life guideline. We were looking for a guideline that based its statements on verifiable scientific research. It also needed to be written in English, so that the international scientific community can take part in the discussion. We also preferred a guideline that had been updated recently. Next, we went through the guideline and identified several recommendations that could possibly interact with each other. Only if we know that interactions are supposed to exist between the recommendations can we verify any results the TMR model might give us.

After that, we examined the recommendations closely, to determine if there is any missing or ambiguous data in the selected set. It is important to note these problems already, for it might influence the choices we will make during the modeling. Furthermore, clinical guidelines are written with a target audience of medical experts in mind. We need to determine if we are able to understand the guideline before starting to model the recommendations.

Results Selection of recommendations, their analysis and expected interactions.

Contribution Discussion about the ambiguities and missing information found in the guideline.

4.2 Step 2

The next step we took in our research was to try and model the recommendations. This means we fit each recommendation in the format that the TMR model requires. Any problems we encountered or decisions we had to make are discussed in this section.

With our models of the recommendation in hand, we executed the algorithm to detect the interactions. We compared these results with our initial expectations and analysed any differences.


Results Modeled recommendations, calculated interactions.

Contribution Discussion about (i) the benefits and difficulties faced while modeling, (ii) analysis of the resultant interactions compared against the expected ones, and (iii) how the TMR model could be improved.

4.3 Step 3

Subsequently, we extend the TMR model to allow for the handling of preconditions. To do this we have identified a common structure of preconditions and designed an implementation for it. Using these observations, we performed the actual encoding of the preconditions. We ran the algorithm a second time and discuss the results compared to our expectations as well as the previous results to determine whether we have been successful in extending the TMR model.

Results The TMR model extended with preconditions, new calculated interactions.

Contribution Discussion about the proposed solution, advantages & limitations.

5 Recommendation analysis

Before we explain the TMR model, we analyse the components of a typical recommendation in the diabetes guideline.

5.1 Recommendation components through an example

In this section we break down the example recommendation in Figure 1 to its core components. It is taken from SIGN116 [3], the Scottish CG on diabetes.

Figure 1: Example of a CG recommendation. Taken from SIGN116 (2013), the Scottish CG on diabetes.

5.1.1 Recommendation strength

Let’s start with the recommendation strength presented in the left margin of the guideline, in this case strength “B.” It is a one-letter scale ranging from “A” to “D” used to indicate the quality of the supporting research of this recommendation [3, p. 2]. In this scale “A” means that the recommendation is supported by at least one meta-analysis or systematic review, directly applicable to the target population; or multiple high quality studies directly applicable to the target population and an overall consistency in the results. Levels “B” and “C” indicate less and less support, and level “D” is used to indicate that there is only support from non-analytic studies or expert opinions.

Besides this scale, there is one more value the recommendation strength can take on, which is the check-mark. It denotes a recommendation that the guideline developers think is a best practice.

This scale is used by multiple guidelines, for example the Spanish [11] guidelines use the same scale. Since no interactions can occur between guidelines based on the recommendation strength, and the TMR model has no implementation that uses the recommendation strength, we will ignore the recommendation strength from all future recommendations.

5.1.2 Deontic strength

The deontic strength of the recommendation is an indication of how strictly the recommendation should be followed. In our case study, we have seen uses of “may,” “should” and “must.” Their negative counterparts are also an option: “may not,” “should not” and “must not.” Most recommendations use the “should” strength level, including our example recommendation.

Note the difference between the recommendation strength and the deontic strength. It is possible for a recommendation to have a weak deontic strength (“may”) but a high recommendation strength (“A”) and vice versa. Examples of this would be: “A target HbA1c of 6.5% (48 mmol/mol) may be appropriate at diagnosis” and “Careful clinical judgment must be applied in relation to people with long duration of type 2 diabetes on established oral glucose-lowering drugs with poor glycaemic control (>10 years, these individuals being poorly represented in published studies) to ensure insulin therapy is not delayed inappropriately for the perceived benefits of GLP-1 agonists.”

5.1.3 Action

Virtually all recommendations recommend a specific action. In our example recommendation this is participate in physical activity or structured exercise. Note that this action can again be broken down in two separate parts by the use of the word “or.” We call every recommendation that does not have a single atomic action a complex recommendation.

An example of a recommendation that does not mention any action is the following: “Target diastolic blood pressure in people with diabetes is ≤80 mm Hg.”

5.1.4 Transition

Each action has some effect and this effect can be described by a transition from the initial state to the resulting state. In our example, the resulting state is minimal cardiovascular risk factors. The initial state is not explicitly stated, but we can assume it can be described as increased cardiovascular risk factors.

5.1.5 Precondition

Many recommendations describe some sort of conditions that need to be satisfied in order for the recommendation to apply. These conditions are called preconditions.

In our example recommendation it is people with type 1 diabetes. Again basically all recommendations contain a precondition. The few that don’t, need to be examined critically. As these recommendations are part of the clinical guideline on diabetes, it could be assumed that the listed recommendations only apply to people with diabetes (except where noted otherwise). A lack of a precondition could indicate that this assumption is made implicitly.

Preconditions can again be composed of several parts, being combined by conjunctions or disjunctions. An example of this is Sulphonylureas should be considered as first line oral agents in patients who are not overweight, who are intolerant of, or have contraindications to, metformin.
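To illustrate how such a composite precondition could be treated formally, the sketch below encodes preconditions as nested propositional formulas and checks by brute force whether two preconditions can hold for the same patient. The tuple encoding, the variable names and the chosen reading of the sulphonylurea precondition (not overweight, and intolerant of or contraindicated for metformin) are illustrative assumptions, not the representation used in this thesis.

```python
from itertools import product

# Formulas are nested tuples (an illustrative encoding only):
#   ("var", name), ("not", f), ("and", f, g), ("or", f, g)


def variables(formula):
    """Collect the variable names that occur in a formula."""
    if formula[0] == "var":
        return {formula[1]}
    return set().union(*(variables(sub) for sub in formula[1:]))


def evaluate(formula, assignment):
    """Evaluate a formula under a dict mapping variable names to booleans."""
    op = formula[0]
    if op == "var":
        return assignment[formula[1]]
    if op == "not":
        return not evaluate(formula[1], assignment)
    if op == "and":
        return evaluate(formula[1], assignment) and evaluate(formula[2], assignment)
    if op == "or":
        return evaluate(formula[1], assignment) or evaluate(formula[2], assignment)
    raise ValueError(f"unknown operator: {op}")


def jointly_satisfiable(f, g):
    """Return True if some truth assignment makes both formulas true."""
    names = sorted(variables(f) | variables(g))
    for values in product([False, True], repeat=len(names)):
        assignment = dict(zip(names, values))
        if evaluate(f, assignment) and evaluate(g, assignment):
            return True
    return False


# Assumed reading of the sulphonylurea precondition quoted above.
sulphonylurea = (
    "and",
    ("not", ("var", "overweight")),
    ("or", ("var", "metformin_intolerant"), ("var", "metformin_contraindicated")),
)
# Hypothetical precondition of some other recommendation.
overweight_only = ("var", "overweight")

print(jointly_satisfiable(sulphonylurea, overweight_only))
print(jointly_satisfiable(sulphonylurea, ("var", "metformin_intolerant")))
```

When two preconditions are not jointly satisfiable, no single patient can meet both, so a conflict detected between the two recommendations could never occur in practice. Brute-force enumeration is only practical for a handful of variables; a SAT solver would be the natural replacement for larger formulas.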

5.1.6 Causation belief

The part of the recommendation that describes what this recommendation aims to achieve is called the causation belief. In our example recommendation this is to improve cardiovascular risk factors. It is often the case that the causation belief is not mentioned as part of the recommendation, but instead can be extracted from the text around the recommendation.
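The components identified in this section can be summarized in a small record type. The following Python sketch is only an illustration of the breakdown of the example recommendation; the class name, the field names and the split of the action into two atomic actions are our own assumptions.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class RecommendationComponents:
    recommendation_strength: str      # "A" to "D", or the check-mark
    deontic_strength: str             # e.g. "may", "should", "must"
    actions: Tuple[str, ...]          # atomic actions named in the text
    initial_state: Optional[str]      # often implicit in the guideline
    resulting_state: Optional[str]
    precondition: Optional[str]
    causation_belief: Optional[str]

    @property
    def is_complex(self) -> bool:
        """A recommendation is complex when it has no single atomic action."""
        return len(self.actions) != 1


example = RecommendationComponents(
    recommendation_strength="B",
    deontic_strength="should",
    # Assumed split of "physical activity or structured exercise".
    actions=("participate in physical activity",
             "participate in structured exercise"),
    initial_state="increased cardiovascular risk factors",  # assumed, not stated
    resulting_state="minimal cardiovascular risk factors",
    precondition="people with type 1 diabetes",
    causation_belief="to improve cardiovascular risk factors",
)

print(example.is_complex)
```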

6 The TMR model

In this section we discuss the architecture of the TMR model in order to provide the necessary information to understand our modeling decisions. This information is based on papers published by Zamborlini [2].

One of the goals of the TMR model was to allow for detection of interactions among recommendations. It does this by matching rules with the recommendations. These rules can be written down in FOL. For example, two recommendations are repeating each other when they recommend the same action. Besides actions and recommendations, the system also takes into account transitions and the reasons for performing an action, the so-called causation beliefs. A UML class diagram of the TMR model can be found in Figure 2 and its components are explained in section 6.1.
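As an illustration, the repetition rule mentioned above could be written in first-order logic roughly as follows. The predicate names are hypothetical and chosen for readability; they are not the vocabulary of the TMR implementation.

```latex
\forall r_1, r_2, a \;
\bigl( \mathit{recommends}(r_1, a) \land \mathit{recommends}(r_2, a) \land r_1 \neq r_2
\rightarrow \mathit{repeating}(r_1, r_2) \bigr)
```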

An example recommendation and its encoding in the TMR model can be found in Figure 1 and Listing 1.

6.1 TMR components

The TMR model contains entities to describe each of the core components of the recommendations (except preconditions). It can also describe transitions from a starting state to an end state. Figure 2 pictures a UML class diagram of the TMR model. Each component is described below.

6.1.1 Clinical Guideline

Each recommendation is part of a clinical guideline. Since the TMR model supports comparisons between recommendations of multiple guidelines, it is necessary to associate each recommendation object with a specific guideline object. In our case, since all our recommendations come from the same guideline, we only have to define a single guideline object and can reference it in each recommendation.


Figure 2: UML class diagram of the TMR model.

6.1.2 Recommendation

Each recommendation is described by its own recommendation object. It specifies the strength of the recommendation and which clinical guideline it is a part of. It also references the action object and causation belief object that describe the recommendation. We also give it a textual label to describe it for humans. An example label would be “Type 1 should exercise regularly.”

6.1.3 Action Types

An action object describes an action in natural language through its attached label. This is often the administration of drugs, but can also be some other medical action. In our example the action describes exercising, so we label it with “Perform exercise.”

6.1.4 Transition Types

A transition has two objects associated with it: the transformable situation and the expected situation. These describe the beginning and ending state of a transition respectively. What these states are, can be found in the clinical guideline. If we perform exercise, we reduce our risk of cardiovascular diseases. So in our example the transformable situation is “increased cardiovascular risk” and the expected situation is “reduced cardiovascular risk.”

6.1.5 Situation types

Both transformable and expected situations are instances of situation types. They are used to describe states of the patient. This allows the system to detect if two actions have the same transition, or if two actions have the opposite effect of each other. In a situation we always talk about a specific variable that has some value, for example “cardiovascular risk is reduced.”

6.1.6 Causation Belief

A causation belief object is used to describe what transition is caused by a specific action. It answers the question: why do we perform this action? For example, we perform exercise to reduce the cardiovascular risks. Each causation belief has a frequency and a strength, which is defined by the quality of the evidence of the recommendation. The frequency information is currently not used by the TMR model, but future implementations may use this information to indicate the frequency with which the referenced action causes the referenced transition. As this information is currently not used by the TMR model, we have used a frequency value of “always” for all our causation beliefs.

6.1.7 Interaction

An interaction relates two recommendations. Each interaction is of a specific type, like contradiction or repetition. The different types of interaction are detailed in section 6.2. The detected interactions form the output of the system.

6.1.8 Missing features: Preconditions

One concept that isn’t implemented in the TMR model is preconditions. A precondition describes the conditions a patient needs to satisfy in order for a recommendation to be applied to them. Because the TMR model does not reason with this information, any conditions of recommendations are simply ignored. As a result, we expect that the TMR model detects more interactions than actually exist. We will discuss this further in sections 9 and 10.

Listing 1: The example recommendation from Figure 1 as modeled in the TMR model.

@prefix : <http://anonymous.org/data/> .
@prefix vocab: <http://anonymous.org/vocab/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix data: <http://anonymous.org/data/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

# Action Types
:ActExercise rdf:type vocab:DrugAdministrationType ;
    rdfs:label "Perform exercise"@en .

# Transition Types
:TrExerciseType1 rdf:type vocab:TransitionType ;
    vocab:hasTransformableSituation :SitIncreasedCardiovascularRisks ;
    vocab:hasExpectedSituation :SitReducedCardiovascularRisks .

:SitIncreasedCardiovascularRisks rdf:type vocab:SituationType ;
    rdfs:label "Cardiovascular risk is increased"@en .

:SitReducedCardiovascularRisks rdf:type vocab:SituationType ;
    rdfs:label "Cardiovascular risk is reduced"@en .

# Causation belief
data:CBExercise1 {
    data:ActExercise vocab:causes data:TrExerciseType1 .
    data:CBExercise1 a vocab:CausationBelief ;
        vocab:strength "L2"^^xsd:string ;
        vocab:frequency "always"^^xsd:string .
}

# Guideline
:CIG-DB rdf:type vocab:ClinicalGuideline ;
    rdfs:label "CIG for Diabetes Mellitus"@en .

# Recommendation
:RecDB-ExerciseB1 {
    :RecDB-ExerciseB1 a vocab:ClinicalRecommendation ;
        rdfs:label "Type 1 should exercise regularly"@en ;
        vocab:strength "should"^^xsd:string ;
        vocab:partOf :CIG-DB ;
        vocab:aboutExecutionOf :ActExercise ;
        vocab:basedOn :CBExercise1 .
}

Since we are using RDF to describe the recommendations and an RDF fact can also be interpreted as an annotated directed arrow in a graph, it is possible to visualize the recommendations as a graph. The TMR system can automatically generate these graphs. The graph of our example recommendation can be found in Figure 3. On the left in the figure we can see the recommendation. A blue arrow points from it to the action, on the right. The color of the arrow indicates the modality of the recommendation, in this case “should.” A red arrow would be used for “should not.” In the action, we can distinguish several components: at the top we can see the attribute that is being transformed by the action. On the left we can find the transformable situation, on the right the expected situation. These are connected by the action, in this case “Perform exercise.”
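The reading of RDF facts as labelled arrows can be made concrete with a few lines of Python. The sketch below copies a handful of triples from Listing 1 by hand (writing every name with the ":" prefix, since ":" and "data:" denote the same namespace there) and follows the arrows from the recommendation. It is a simplified illustration that ignores named graphs and literals, and it is not how the TMR system generates its figures.

```python
from collections import defaultdict

# A few (subject, predicate, object) facts taken from Listing 1.
triples = [
    (":RecDB-ExerciseB1", "vocab:partOf", ":CIG-DB"),
    (":RecDB-ExerciseB1", "vocab:aboutExecutionOf", ":ActExercise"),
    (":RecDB-ExerciseB1", "vocab:basedOn", ":CBExercise1"),
    (":ActExercise", "vocab:causes", ":TrExerciseType1"),
    (":TrExerciseType1", "vocab:hasTransformableSituation",
     ":SitIncreasedCardiovascularRisks"),
    (":TrExerciseType1", "vocab:hasExpectedSituation",
     ":SitReducedCardiovascularRisks"),
]


def build_graph(facts):
    """Map each subject to its outgoing (predicate, object) arrows."""
    graph = defaultdict(list)
    for subject, predicate, obj in facts:
        graph[subject].append((predicate, obj))
    return graph


def reachable(graph, start):
    """Return every node that can be reached from start by following arrows."""
    seen = set()
    stack = [start]
    while stack:
        node = stack.pop()
        for _predicate, target in graph.get(node, []):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen


graph = build_graph(triples)
for predicate, target in graph[":RecDB-ExerciseB1"]:
    print(f":RecDB-ExerciseB1 --{predicate}--> {target}")
```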

6.2 Definitions of TMR interaction types

There are currently four types of interactions that can be detected by the TMR model. These are: contradiction, repetition, alternative actions and repairable transitions. These interaction types are strictly defined. Other types of interactions, such as interactions in time or dosage are not yet supported.

Figure 3: Example graph of a CG recommendation.

To define a contradiction between recommendations, we first need to define what it means for two transitions to be inverse. Informally speaking, they are inverse when their effects counteract each other. The formal definition can be found in Definition 1.

Definition 1 (Inverse) Two transitions T1 and T2 are inverse if the expected end state of T1 is the initial state of T2 and the initial state of T1 is the expected end state of T2.
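Definition 1 translates almost directly into code. The following is a minimal sketch in Python; the Transition type and its field names are our own, and the second transition is a hypothetical counterpart of the exercise example rather than one taken from the guideline.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    initial: str   # transformable situation
    expected: str  # expected situation


def are_inverse(t1: Transition, t2: Transition) -> bool:
    """Definition 1: each transition ends where the other one starts."""
    return t1.expected == t2.initial and t1.initial == t2.expected


reduce_risk = Transition("cardiovascular risk is increased",
                         "cardiovascular risk is reduced")
# Hypothetical transition with the opposite effect.
increase_risk = Transition("cardiovascular risk is reduced",
                           "cardiovascular risk is increased")

print(are_inverse(reduce_risk, increase_risk))
print(are_inverse(reduce_risk, reduce_risk))
```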

Using the definition of inverse, we can define the interaction types contradiction, repeating and alternative.

Definition 2 (Contradiction) Two recommendations R1, R2 are considered contradicting if one of the following is true:

• Recommendation R1 recommends action A1, whereas R2 recommends not performing A1.

• Recommendation R1 recommends action A1 to achieve transition T, whereas R2 recommends performing action A2 in order to prevent T from occurring.

• The recommendations R1, R2 recommend actions A1, A2 that promote transitions T1, T2 that are inverse to each other.
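The three cases of Definition 2 can be checked one after another. The sketch below is one possible reading of the definition in Python, not the rule set of the TMR implementation; the Recommendation type, the `positive` and `promotes` flags and the placeholder actions are assumptions made for illustration.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    initial: str   # transformable situation
    expected: str  # expected situation


class Recommendation(NamedTuple):
    action: str
    positive: bool         # True for "should", False for "should not"
    transition: Transition
    promotes: bool = True  # False when the action is meant to prevent the transition


def are_inverse(t1, t2):
    return t1.expected == t2.initial and t1.initial == t2.expected


def contradicting(r1, r2):
    # Case 1: the same action is both recommended and advised against.
    if r1.action == r2.action and r1.positive != r2.positive:
        return True
    # The remaining cases concern two actions that are both recommended.
    if not (r1.positive and r2.positive):
        return False
    # Case 2: one action should achieve T, the other should prevent T.
    if r1.transition == r2.transition and r1.promotes != r2.promotes:
        return True
    # Case 3: both actions promote transitions that are inverse to each other.
    return r1.promotes and r2.promotes and are_inverse(r1.transition, r2.transition)


# Placeholder data, not taken from the guideline.
lower = Transition("value is high", "value is low")
higher = Transition("value is low", "value is high")
do_x = Recommendation("action X", True, lower)
avoid_x = Recommendation("action X", False, lower)
do_y = Recommendation("action Y", True, higher)
do_z = Recommendation("action Z", True, lower, promotes=False)
do_w = Recommendation("action W", True, lower)

print(contradicting(do_x, avoid_x), contradicting(do_x, do_y),
      contradicting(do_x, do_z), contradicting(do_x, do_w))
```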

Definition 3 (Repeating) Recommendations R1, R2, ..., Rn are considered repeating when they all recommend the action A.

Definition 4 (Alternative) Recommendations R1, R2, ..., Rn are considered alternatives to each other when their causation beliefs C1, C2, ..., Cn reference the same transition T, while the actions A1, A2, ..., An are different.

A recommendation is considered repairable when it is meant to prevent a transition while another recommendation recommends an action that promotes an inverse transition, i.e. it indicates that the undesired effect could be repaired. It is defined as follows:

Definition 5 (Repairable) Two recommendations are considered repairable when one recommends and the other does not recommend actions whose transitions are inverse.
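To make the definitions more concrete, the following Python sketch classifies a pair of recommendations. It is a simplified reading under stated assumptions, not the TMR implementation: all names are hypothetical, every recommendation carries exactly one transition, a negative recommendation is a boolean flag, and the second case of Definition 2 (an action recommended in order to prevent a transition) is not covered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Transition:
    initial: str
    expected_end: str


@dataclass(frozen=True)
class Recommendation:
    name: str
    action: str
    transition: Transition  # simplification: one transition per causation belief
    positive: bool          # True = "should", False = "should not"


def are_inverse(t1: Transition, t2: Transition) -> bool:
    return t1.expected_end == t2.initial and t1.initial == t2.expected_end


def classify(r1: Recommendation, r2: Recommendation) -> Optional[str]:
    """Return the interaction type for a pair of recommendations, or None."""
    same_action = r1.action == r2.action
    inverse = are_inverse(r1.transition, r2.transition)
    if same_action and r1.positive != r2.positive:
        return "contradiction"   # Definition 2, first case
    if same_action and r1.positive and r2.positive:
        return "repeating"       # Definition 3
    if r1.positive and r2.positive and inverse:
        return "contradiction"   # Definition 2, third case
    if r1.positive and r2.positive and r1.transition == r2.transition:
        return "alternative"     # Definition 4: different actions, same transition
    if r1.positive != r2.positive and inverse:
        return "repairable"      # Definition 5
    return None


# Placeholder situations and actions, not taken from any guideline.
lower = Transition("high", "normal")
raise_ = Transition("normal", "high")

give_x = Recommendation("R1", "administer X", lower, True)
avoid_x = Recommendation("R2", "administer X", lower, False)
give_y = Recommendation("R3", "administer Y", lower, True)
give_z = Recommendation("R4", "administer Z", raise_, True)
avoid_z = Recommendation("R5", "administer Z", raise_, False)

print(classify(give_x, avoid_x))  # contradiction
print(classify(give_x, give_y))   # alternative
print(classify(give_x, give_z))   # contradiction
print(classify(give_x, avoid_z))  # repairable
```

The order of the checks matters in this sketch: a pair that shares an action is classified on the action alone, before any transitions are compared.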

7 Step 1: Experiment - Diabetes case study

This section contains the results and contributions of step 1 of our research, the study of the recommendations. We started this research with a study of the recommendations of SIGN116 (2013), the Scottish CG on diabetes [3]. We selected a guideline on diabetes because of our existing knowledge of diabetes. We selected the Scottish guideline on diabetes because it matches the criteria we set in section 3. This CG also lists both the recommendations and their supporting arguments close together. This allows us to look for additional information in the accompanying text relatively easily, if needed. Other guidelines that were considered are the Spanish guideline on diabetes [11] and the English guideline on diabetes [12].

We designed a case-study using this guideline where 21 recommendations were selected in such a way that we believe that they have interactions according to the TMR model. The process for selecting the recommendations was straightforward. We simply scanned through the guideline several times, looking only at the recommendation texts. Whenever we suspected that a recommendation could have interactions with other recommendations, we wrote it down in a separate list. After doing this a couple of times, we had constructed a list of around forty recommendations. From this list, we selected the 21 recommendations that actually had interactions. It is noteworthy that this does not guarantee that these 21 recommendations are the only recommendations in SIGN116 that have interactions. We might have excluded some because detecting their interactions requires implicit or medical knowledge, or because their interactions are of a kind that the TMR model can’t detect (timing or dosage interactions). Although the utmost care was taken to prevent this, some interacting recommendations might not have been selected due to human error. Nonetheless, we believe that this selection displays a wide variety of recommendations and is sufficient for us to answer our research questions.

We divided the recommendations in 7 groups, A through G, in such a way that we expect interactions to occur within a single group. There is no specific ordering to these groups.


7.1 Information present in DB CG

In this section we discuss the findings of our analysis of the selected recommendations. We are mainly interested in the following questions: Is the information available to us in clinical guidelines sufficient to allow formal representation and detecting interaction in the TMR model? Is the help of a medical expert required? Are the recommendations clear and unambiguous?

To answer these questions, we take a closer look at some of the recommendations from recommendation group C.

C3 Pioglitazone can be added to metformin and sulphonylurea therapy, or substituted for either in cases of intolerance.

C5 When intensifying insulin therapy by addition of rapid-acting insulin, sulphonylurea therapy should be stopped.

7.1.1 Ambiguity

Natural language is full of ambiguity and recommendations are in this sense no exception. See for example recommendation C3, where it mentions “metformin and sulphonylurea therapy.” It is not clear whether this means therapy that includes both metformin and sulphonylurea, or it means metformin therapy and also sulphonylurea therapy. Without a medical expert, it is impossible to say which of the two options is meant.

7.1.2 Implicit causation

For recommendation C5 we were at first unable to identify the underlying motivation. The text accompanying the recommendation provided no help in this regard, at least not for someone without medical skills. It might be the case that for a medical expert the causation belief of this recommendation is common knowledge and requires no further explanation, but this does illustrate that it is nigh impossible to fully encode this clinical guideline accurately in the TMR model without the assistance of a medical expert.

Ultimately we have found evidence in the clinical guideline that sulphonylurea therapy increases the risk of weight gain and hypoglycaemia. The main reason to take sulphonylureas — to lower the HbA1c levels of the patient — is also an effect of rapid-acting insulin. We believe that these three transitions form the causation belief for taking sulphonylureas.

7.1.3 Implicit preconditions

Another observation is that all recommendations in this group except recommendation C5 are taken from a chapter called “Pharmacological management of glycaemic control in people with type 2 diabetes.” Therefore we assume that all recommendations only apply for patients with type 2 diabetes. This however is not at all clear from the recommendations alone: it is an observation that can only be made when reading the clinical guideline from top to bottom. Because of this, one could easily miss the important fact that these recommendations only apply to patients with type 2 diabetes; a mistake that could have serious consequences.

This notion is especially relevant for recommendation C3. Without it, this recommendation would apply to most of the general public. We suspect it is not intended by the guideline developers to recommend pioglitazone to most of the general populace.

7.1.4 Levels of abstraction

In many recommendations it is difficult to concisely describe the motivation of the recommendation. When it is not explicitly included in the actual recommendation, it needs to be extracted from the accompanying text — a process which is best left to medical experts. However even when this information is provided by experts, errors might still occur due to a difference in abstraction level. For example, take a look at recommendation C4, reproduced below.

C4 Pioglitazone should not be used in patients with heart failure.

We have studied the accompanying text in the guideline and found that the motivation of this recommendation is that research has shown that pioglitazone increases the risk of hospitalizations in patients with existing heart problems [3, p. 44]. This is a high level causation belief: it mentions the end result, but not the actual effects of the drug.

Now suppose some other recommendation R recommends some drug D because it decreases the risk of hospitalizations in patients with existing heart problems. An automated system will conclude that these effects might cancel each other out. However in reality it might be the case that both effects can occur simultaneously: they might influence different parts of the heart for example.

Now imagine that the causation belief of recommendation C4 was written using a lower level of abstraction, i.e. it describes in more detail the effects of the drug. In that case, no interaction would be detected between C4 and R. Unless, of course, recommendation R was also rewritten with a lower level of abstraction, in which case an interaction may or may not be detected, depending upon the actual effects of the drug.

The bottom line here is that the TMR system is unable to detect interactions between recommendations that use a different level of abstraction. Furthermore it is extremely difficult to guarantee that multiple medical experts use the correct levels of abstraction, especially when working with multiple guidelines.

7.1.5 No preconditions

Recommendation D1 is interesting as it mentions no preconditions.

D1 Patients’ retinas should be screened at least annually.

This recommendation does not mention any conditions. When taken out of context, this would indicate that this recommendation holds for all patients in all situations. However, we think that we have to include the condition “has diabetes type 1 or 2,” based on the fact that this CG is about diabetes and the text surrounding the recommendation. This does raise the question if perhaps the condition “has diabetes type 1 or 2” has to be added to every recommendation from the diabetes guideline that does not specifically mention this, such as recommendation C4.

7.1.6 Summary of findings

In this section we have discussed the information that we can find in the diabetes clinical guideline. We have found five issues that make the modeling of these recommendations difficult. Issues like ambiguity in recommendations, implicit causation beliefs or implicit preconditions can be prevented by training guideline developers in this regard. On the other hand, recommendations that have no preconditions might not even be regarded as a problem by the developers: the implicit knowledge that medical practitioners have is enough to prevent any mistakes from happening. On a different scale is the issue of abstraction levels. This issue might never be fully resolved.

With these issues in mind, we describe our process of modeling the selected recommendations in the next section.

8 Step 2: Modeling recommendations using TMR

In this section we discuss any decisions we had to make while modeling the selected guidelines in the TMR model. These are the results of step 2 of our research. The complete model can be found at https://github.com/l0ft3r/SIGN116Model.

8.1 Modeling choices per recommendation group

For each group we describe our modeling choices.

8.1.1 Recommendation group A

This group consists of two recommendations.

A1 Self monitoring of blood glucose (SMBG) is recommended for patients with type 1 or type 2 diabetes who are using insulin where patients have been educated in appropriate alterations in insulin dose.

A2 Routine SMBG in people with type 2 diabetes who are using oral glucose-lowering drugs (with the exception of sulphonylureas) is not recommended.

As the current version of the TMR system is not yet able to handle preconditions, we have modeled them as plain text objects attached to the recommendations. This is effectively the same as assuming that all the conditions have been satisfied for all recommendations. Since both recommendations talk about the same action, they reference the same action object. Therefore, we expect to detect a contradiction interaction between them. However, they do reference different causation beliefs. This is necessary since the effect of the action is different depending on the situation of the patient. This is not immediately apparent from the recommendations themselves, but the accompanying text says that only a minor improvement that is not cost effective is detected for patients that satisfy the conditions of recommendation A2, and a bigger advantage is observed for patients for whom recommendation A1 is applicable.

To model this recommendation group, we defined two recommendation objects, two causation beliefs, two transitions, two situations and a single action.

In short, this recommendation group shows that recommendations can require separate causation beliefs even though they reference the same action.

8.1.2 Recommendation group B

Recommendation group B consists of three recommendations about physical exercise.

B1 People with type 2 diabetes should be encouraged to participate in physical activity or structured exercise to improve glycaemic control and cardiovascular risk factors.

B2 People with type 1 diabetes should be encouraged to participate in physical activity or structured exercise to improve cardiovascular risk factors.

B3 Patients with existing complications of diabetes should seek medical review before embarking on exercise programmes.

The interaction within this group is easily distinguishable: recommendations B1 and B2 recommend physical exercise, whereas recommendation B3 advises patients to seek medical review before starting with exercise programmes. The important element here is the fact that in general, physical exercise is advised by B3, but only if a medical review has been conducted. This means that there is a temporal aspect hidden in this recommendation: first perform medical review, then perform exercise programs. Unfortunately, it is not yet possible to capture this temporal relationship precisely in the TMR model. Instead, we have modeled it as a negative recommendation for the action “exercise” with the attached precondition “has existing complications and did not have medical review.” This way we have removed the temporal aspect from the recommendation.

Another subtlety is that recommendation B2 does not mention “glycaemic control” as an end state of the transition. As a result, both recommendations require their own distinct causation belief and transitions. Moreover, it would be incorrect to model the causation belief associated with recommendation B1 as having two transitions, one describing the transformation of the glycaemic control and a second one to describe the transformation of cardiovascular risk factors. This is because we can’t say for certain that the expected endstate “good glycaemic control and cardiovascular risks” is equal to the endstate “good glycaemic control” combined with the endstate “good cardiovascular risks.” Instead, we need to define two transitions, one with endstate “good glycaemic control and cardiovascular risks” and another with just “good cardiovascular risks.” If a third recommendation would list only “good glycaemic control” as an endstate, then a third transition would be required.

This illustrates the fact that it is not clear if composition of situations behaves nicely, i.e. is the situation “A and B” equal to situation “A” and situation “B” simultaneously? Intuitively one would expect this to be the case, however without a medical expert we are unable to say if this holds in general.

In the TMR model, this set of recommendations can be modeled using three recommendations, all associated with the same action “exercise,” but with three different causation beliefs. Each causation belief references its own transition caused by this action. The transitions describe what happens when the action is executed:

• Recommendation B1 references transition T1, which for patients with DM2 transforms a poor cardiovascular risk factor and glycaemic control into a good cardiovascular risk factor and glycaemic control.

• Recommendation B2 references transition T2, which for patients with DM1 transforms a poor cardiovascular risk factor into a good cardiovascular risk factor.

• Recommendation B3 references transition T3, which for patients with existing complications and no medical review transforms a good blood glucose into hyperglycaemia.

From this recommendation group we have learned: (i) temporal relations can exist within a single recommendation; (ii) the help of a medical expert is required to handle conjunctions of transitions.
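The consequence of storing preconditions as plain text can be illustrated with a small Python sketch. It is not the TMR implementation: the tuple layout and the function `interaction` are hypothetical, and the precondition strings for B1 and B2 are our own paraphrases. The check looks only at the action and the strength of each recommendation, which mirrors the assumption that all preconditions are satisfied.

```python
from itertools import combinations

# (name, action, strength, precondition kept as plain text and never inspected)
group_b = [
    ("B1", "exercise", "should", "has type 2 diabetes"),
    ("B2", "exercise", "should", "has type 1 diabetes"),
    ("B3", "exercise", "should not",
     "has existing complications and did not have medical review"),
]


def interaction(r1, r2):
    """Pairwise check on action and strength only; preconditions are ignored."""
    if r1[1] != r2[1]:
        return None
    if r1[2] != r2[2]:
        return "contradiction"
    return "repetition" if r1[2] == "should" else None


detected = {
    (r1[0], r2[0]): interaction(r1, r2) for r1, r2 in combinations(group_b, 2)
}
for pair, kind in detected.items():
    print(pair, kind)
# ('B1', 'B2') repetition
# ('B1', 'B3') contradiction
# ('B2', 'B3') contradiction
```

Because the preconditions never enter the comparison, B1 and B2 are reported as repeating each other even though they address different patient groups.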

8.1.3 Recommendation group C

Group C is a group of five recommendations.

C1 Metformin should be considered as the first line oral treatment option for overweight patients with type 2 diabetes.

C2 Sulphonylureas should be considered as first line oral agents in patients who are not overweight, who are intolerant of, or have contraindications to, metformin.


C3 Pioglitazone can be added to metformin and sulphonylurea therapy, or substituted for either in cases of intolerance.

C4 Pioglitazone should not be used in patients with heart failure.

C5 When intensifying insulin therapy by addition of rapid-acting insulin, sulphonylurea therapy should be stopped.

Recommendation C3 raises the question if it is advised to substitute pioglitazone for metformin and again for sulphonylurea if it turns out that the patient is intolerant for both. And what if pioglitazone was already added to the therapy? This recommendation also uses the word “can” instead of “should” to denote its recommendation strength. The TMR model currently only supports the strengths “should” and “should not” so therefore we have modeled this recommendation using the “should” strength. As a result, this means we are unable to fully encode this recommendation in the TMR model, but instead we have to make adjustments to make it fit.

This recommendation also features composition of situations, using the words “metformin and sulphonylurea therapy.” In this context “and” means simultaneously: the patient is simultaneously using the drugs metformin and sulphonylurea for some therapy. This recommendation says that, when taken simultaneously, these drugs produce an effect that is desirable in this case. However, we can’t say if this is different from “metformin therapy and sulphonylurea therapy.” In the latter situation, there are two separate therapies, each having their own causation belief.

This subtle difference becomes apparent if a patient is metformin intolerant. If a patient is under “metformin and sulphonylurea therapy” (1 therapy) and metformin intolerant, a doctor might conclude to stop with the therapy. When the therapy has ended, the patient should no longer take metformin nor sulphonylurea. It is different when a patient is under “metformin therapy and sulphonylurea therapy” (2 separate therapies) and metformin intolerant. If a doctor decides to stop administering metformin, there is no reason to assume that the administration of sulphonylurea should be stopped. We call this phenomenon “composition of situations.” The causation belief of two combined situations can be different from the combination of causation beliefs of the same situations.

Furthermore, the second part of this recommendation introduces additional complexity. To “substitute” a drug with another means to stop administering one and start administering the other. This requires two actions and therefore two recommendation objects. The recommendation states that this applies to both drugs, so we have modeled this recommendation using five separate recommendation objects. So if a patient is intolerant for metformin for example, we know that the use of the drug metformin must be stopped and pioglitazone treatment must be started. However, this doesn’t tell us anything about “metformin and sulphonylurea therapy,” since this is different from “metformin therapy.”

In short, we have interpreted this recommendation as the following five recommendations:

C3a If under (metformin and sulphonylurea) therapy, pioglitazone should be added.

C3b If intolerant for metformin, pioglitazone should be used.

C3c If intolerant for sulphonylureas, pioglitazone should be used.

C3d If intolerant for metformin, metformin should not be used.

C3e If intolerant for sulphonylureas, sulphonylureas should not be used.

Another question is the following: if sulphonylurea therapy is stopped as per recommendation C5, but pioglitazone was used instead of sulphonylurea as per recommendation C3, should pioglitazone be stopped or not? These recommendations do not provide a clear answer.

While modeling the transition of administering sulphonylureas, we found that the guideline also mentions side effects of sulphonylureas, including increased weight gain and increased rate of major hypoglycaemia [3, p. 43]. To capture this in the TMR model, we have associated multiple transitions with the action of administering sulphonylureas.

Ultimately, we have modeled this recommendation group using nine recommendations, six causation beliefs, three actions (administer pioglitazone, administer metformin, and administer sulphonylureas), and six transitions.

These recommendations show that recommendation strengths other than “should” are also used. We expect to see many interactions from this group, as many of them share effects and preconditions.


8.1.4 Recommendation group D

The recommendation group D has two recommendations with mainly temporal interactions.

D1 Patients’ retinas should be screened at least annually.

D2 Examination of the retina prior to conception and during each trimester is advised in women with type 1 and type 2 diabetes. More frequent assessment may be required in those with poor glycaemic control, hypertension or pre-existing retinopathy.

Note that recommendation D2 contains some ambiguity: does this recommendation apply only to women with both type 1 and type 2 diabetes, or to women who have type 1 diabetes or type 2 diabetes? This is a case of the classic example of “and” possibly meaning “or.”

However, more interesting is the notion that this recommendation has interactions with itself! This recommendation is of a rare complexity, which means that in the TMR model more than one recommendation object is required to completely model it. These different recommendations have repeating interactions. As a result, one could say that recommendation D2 has a repeating interaction with itself.

A third problem is also illustrated by this recommendation. For a human, it is obvious that only women can get pregnant. However, computers do not have access to this implicit knowledge, so any such knowledge needs to be stated explicitly. In general, when building knowledge based applications it is important to avoid this common pitfall. The medical field is no exception to this phenomenon.

Because of recommendation D2, we have created three recommendation objects to describe these two recommendations. Similarly, we have three causation beliefs and three transitions. We need only two situation types, since recommendation D1 requires none, and only a single action object.

Because we have three recommendation objects, we expect to see three repeated action interactions.

8.1.5 Recommendation group E

Group E is a group with 5 recommendations.

E1 People with type 1 diabetes and microalbuminuria should be treated with an ACE inhibitor irrespective of blood pressure.

E2 Patients with diabetes requiring antihypertensive treatment should be commenced on an ACE inhibitor (angiotensin receptor blocking (ARB) medications if ACE inhibitor intolerant), or a calcium channel blocker, or a thiazide diuretic.

E3 Patients with clinical myocardial infarction should be commenced on long term ACE inhibitor therapy within the first 36 hours.

E4 All patients with stable angina should be considered for treatment with ACE inhibitors.

E5 ACE inhibitors and ARB medications should be avoided as they may adversely affect the fetus.

Each of these recommendations has its own peculiarities.

First off, recommendation E1 mentions that it should be followed “irrespective of blood pressure.” To us, this seems like a detail that is completely unnecessary: it does not add any new information. Therefore we have not included this information in the model.

Next, recommendation E2 mentions four different options. In the TMR model we have to create four separate recommendation objects to capture this information. We expect this recommendation to be another example of a recommendation that interacts with itself.

The recommendation E3 contains another bit of information that we currently cannot include in the TMR model. It mentions that treatment should be started “within the first 36 hours.” This timing information is not yet supported, so we have to ignore this information when modelling the recommendation.

Now recommendation E4, when taken literally, appears to apply to “all patients.” However, we have assumed that this still only applies to “all patients with diabetes type 1 or 2” for this case study. This assumption is based on the fact that the guideline text accompanying the recommendation specifically mentions research that used data from patients with diabetes.

Finally, recommendation E5 is also noteworthy, since the causation belief and expected transition are mentioned in the recommendation. The drugs “may adversely affect the fetus.” In this case the word “may” is used to indicate the frequency of the occurrence of this transition. The TMR model currently is able to store this data, however it does not yet utilize it. This recommendation also requires two recommendation objects to describe it completely.

This recommendation group of five recommendations has resulted in a model of nine recommendation objects, seven causation beliefs, four actions and three transitions.

There are multiple lessons that can be learned from this recommendation group. Sometimes useless information is present in recommendations; sometimes up to four recommendation objects are necessary to completely describe a single recommendation; sometimes the word “may” is used to describe the frequency of the transition.

8.1.6 Recommendation group F

It is immediately apparent that if we ignore the preconditions of the recommendations, the two recommendations in group F repeat each other.

F1 Patients with type 1 diabetes should be screened from age 12 years.

F2 Patients with type 2 diabetes should be screened from diagnosis.

We have selected these recommendations, since recommendation F1 uses numbers in the preconditions. This recommendation group is straightforward to model using two recommendation objects, one causation belief, two situations, a single action and a single transition.

8.1.7 Recommendation group G

Recommendation group G contains two interacting recommendations.

G1 Lipid-lowering drug therapy with simvastatin 40 mg should be considered for primary prevention in patients with type 1 diabetes aged > 40 years.

G2 Patients under 40 years with type 1 or type 2 diabetes and other important risk factors, e.g. microalbuminuria, should be considered for primary prevention lipid-lowering drug therapy with simvastatin 40 mg.

There are a couple of things worth mentioning about these two recommendations. To start, both recommendations mention an interval of some quantity in their precondition, in this case “aged > 40 years” and “under 40 years.” Strictly speaking, the latter of these two has to be translated into the age interval [0, 40), which means that there is a surprising gap in the recommendations for patients who are exactly 40 years old.
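The gap at age 40 becomes visible when both age conditions are written as predicates. The following Python sketch uses hypothetical function names and takes the two phrases literally; it is an illustration only and not part of the TMR model.

```python
def g1_age_condition(age: float) -> bool:
    """G1: 'aged > 40 years', read strictly as the open interval (40, inf)."""
    return age > 40


def g2_age_condition(age: float) -> bool:
    """G2: 'under 40 years', read strictly as the interval [0, 40)."""
    return 0 <= age < 40


for age in (39, 40, 41):
    covered = g1_age_condition(age) or g2_age_condition(age)
    print(age, covered)
# 39 True
# 40 False
# 41 True
```

Under this strict reading neither recommendation applies to a patient who is exactly 40 years old.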

Recommendation G2 presents the most serious case of ambiguity we have seen thus far. It arises from the use of ‘or’ and ‘and’ simultaneously without any use of brackets. Should any patient under 40 and with type 1 diabetes be considered, or only those with “other important risk factors, e.g. microalbuminuria”?
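The two readings of this precondition can be written down as propositional formulas and compared over all truth assignments. The Python sketch below does this. The function names and the four boolean variables are assumptions made for the illustration, and the two readings shown are only the ones discussed above.

```python
from itertools import product


def reading_a(under_40, type1, type2, risk):
    """(under 40) and (type 1 or type 2) and (other risk factors)."""
    return under_40 and (type1 or type2) and risk


def reading_b(under_40, type1, type2, risk):
    """(under 40) and (type 1 or (type 2 and other risk factors))."""
    return under_40 and (type1 or (type2 and risk))


# Enumerate all 16 assignments and keep those where the readings disagree.
disagreements = [
    values
    for values in product([False, True], repeat=4)
    if reading_a(*values) != reading_b(*values)
]
for under_40, type1, type2, risk in disagreements:
    print(under_40, type1, type2, risk)
# True True False False
# True True True False
```

The readings differ exactly for patients under 40 with type 1 diabetes and no other risk factors, which is the case the question above is about.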

Both recommendations use the words “should be considered” to indicate the deontic strength. Since the current state of the TMR model can only reason with a deontic strength of “should,” we model these recommendations with this deontic strength and incorporate the “consider” into the label of the action: “consider lipid-lowering drug therapy with simvastatin 40 mg.” Several recommendations in our selection have used “should be considered” to indicate the deontic strength (C1, C2, E4). Because of this, we think that either a new deontic strength value needs to be added to the system, or we should uniformly model this value using the strength “can.”

Furthermore, it is interesting to note that recommendation G1 considers ‘therapy for patients’, whereas recommendation G2 considers ‘patients for therapy’. We assume that in both cases the same is meant: to give the drugs to the patients. However this does indicate that, as long as this kind of confusion exists in recommendations in clinical guidelines, it will be virtually impossible to automate the process of modeling guideline recommendations in any formal language.

Lastly, these two recommendations are two of the three recommendations in SIGN116 that mention a drug together with a specific dosage. Apparently the dosage of this drug is irrespective of sex, age or body weight. We do not store this specific information separately, as we have no use for it. Instead it is included in the description of the action as plain text. This means that another recommendation that is associated with simvastatin, but with a different dosage, will not be detected as having an interaction with these recommendations. The system treats it as a completely different action. This is not ideal: we would rather have a separate variable to describe this information and appropriate logic to reason with it.
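The difference between a dosage embedded in the action label and a dosage stored as a separate variable can be sketched as follows. The class `DrugAction`, both comparison functions and the 20 mg label are hypothetical and serve only as an illustration; no such recommendation was taken from SIGN116.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class DrugAction:
    """An action with the dosage kept apart from the drug name."""
    drug: str
    dose_mg: Optional[int] = None


def same_action_by_label(label1: str, label2: str) -> bool:
    """Plain-text comparison, as when the dosage is part of the label."""
    return label1 == label2


def same_drug(a1: DrugAction, a2: DrugAction) -> bool:
    """Structured comparison that ignores the dosage."""
    return a1.drug == a2.drug


label_g = "consider lipid-lowering drug therapy with simvastatin 40 mg"
# Hypothetical label with a different dosage, invented for this example.
label_other = "consider lipid-lowering drug therapy with simvastatin 20 mg"

print(same_action_by_label(label_g, label_other))  # False
print(same_drug(DrugAction("simvastatin", 40), DrugAction("simvastatin", 20)))  # True
```

With a separate dosage field, a system could first match on the drug and then apply dedicated logic to the two dosages.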

The model of this recommendation group consists of two recommendations, one causation belief, one action, two situations and one transition.

It teaches us that (i) guideline developers are not always precise when using numbers; (ii) perhaps more deontic strength values are required; (iii) it is impossible to automatically model guidelines unless recommendations take on a more structured form themselves; (iv) the inclusion of drug dosage in recommendations could lead to missed interactions.

8.2 Results: detected interactions and comparison with expectations

In this section we use the constructed TMR model from the previous section for detecting interactions. Note that it is possible to find interactions between recommendations from different groups. In practice, this did not occur. In total we found 51 pairwise interactions. The results are presented in groups in Table 1. Most of our results were in line with our expectations. In this section we discuss our results, focusing on the results that did not match our expectations.

Recommendations          Expected interaction   Detected interaction
A1–A2                    interaction            contradiction
B1–B2                    interaction            repetition
B1–B3                    interaction            contradiction
B2–B3                    interaction            contradiction
C1–C2–C3                 interaction            alternative
C1–C5                    unknown                contradiction
C3–C4                    interaction            contradiction
C3–C5                    unknown                contradiction
C2–C5                    interaction            contradiction
D1–D2a                   interaction            repetition
D1–D2b                   interaction            repetition
D2a–D2b                  interaction            repetition
E1–E2a–E3–E4             interaction            repetition
E2a–E2b–E2c–E2d–E3–E4    interaction            alternative
E5a–E1                   interaction            contradiction
E5a–E2a                  interaction            contradiction
E5b–E2b                  interaction            contradiction
F1–F2                    interaction            repetition
G1–G2                    interaction            repetition

Table 1: The groups of interactions with their expected and actual detection.

8.2.1 Comparisons with expectations

For recommendation group B we found two contradictions: between recommendations B1 and B3 and between recommendations B2 and B3. However, we also found a third interaction of type RepeatedAction between recommendations B1 and B2. This interaction is detected because the system is unable to make use of the preconditions. If we were to ignore the filters for type 2 and type 1 diabetes, as the system currently does, it is easy to see that indeed, there is a repeated action between these two recommendations. Therefore we can say that the system behaves correctly and as expected.

In section 8.1.3, regarding recommendation group C, we mentioned that it is unclear without the help of a medical expert if recommendation C4 also applies to metformin and sulphonylurea, since recommendation C3 mentions these two drugs as alternatives. The system however is very clear in this regard: it detects contradictions between C1 and C5, C2 and C5, and C3 and C5. Upon closer inspection, we can see this is caused by the fact that the causation beliefs associated with all these recommendations reference the same transition and C5 recommends that this transition should not occur.

Finally, the system also correctly detects three interactions of the type AlternativeActions between recommendations C1, C2 and C3. These results match our expectations. The fact that this interaction is reported three times is due to the system’s architecture. First C2 is seen as an alternative to C1, then C3 is seen as an alternative to C1 and finally C3 is seen as an alternative to C2.

Many interactions were found in recommendation group E, mainly of the types repeated action and alternative actions: six interactions of the type RepeatedAction and twelve interactions of the type AlternativeActions.

The reason that so many interactions are found in this group lies in the implementation of the TMR model. The six RepeatedAction interactions all involve the same four recommendations: E1, E3, E4 and the first part of E2, which talks about ACE inhibitors in case of antihypertensive treatment on diabetic patients. This interaction is reported six times, since the system only detects pairwise interactions. This means it finds interactions between recommendation pairs E1-E3, E1-E4, E1-E2a, E3-E4, E3-E2a and E4-E2a. In a subsequent step in the TMR model, these recommendations are grouped together with the other recommendations involved to provide a better overview of what is actually happening.

There exists a closed-form formula for the number of interactions that are found for a given number of recommendations. This formula is f(n) = ½ n(n − 1), where n is the number of involved recommendations.

This means that in the case of, say, 20 recommendations that are all alternatives to each other, the system will produce f(20) = 190 interactions, each mentioning all 20 recommendations. It is clear that this is undesirable behaviour: in such a situation it would be difficult to find any of the other detected conflicts. We would like to estimate how bad this problem could possibly get when using multiple real life guidelines. However, it is difficult to put an estimate on the upper bound of the number of alternatives to a single action, especially when performing comparisons between complete guidelines. This in turn means that it is difficult to predict if a situation could occur where 20 recommendations interact.
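The pairwise counting behind this formula can be illustrated with a minimal Python sketch. It is not part of the TMR implementation; the function name `pair_count` is our own, and the recommendation labels are those of the RepeatedAction interactions in group E.

```python
from itertools import combinations


def pair_count(n: int) -> int:
    """Closed-form number of unordered pairs among n recommendations."""
    return n * (n - 1) // 2


# The four recommendations involved in the RepeatedAction interactions of group E.
repeated = ["E1", "E3", "E4", "E2a"]

# Enumerate every unordered pair, as a purely pairwise detector would.
pairs = list(combinations(repeated, 2))

# The enumeration and the closed form agree.
assert len(pairs) == pair_count(len(repeated))

for left, right in pairs:
    print(f"{left}-{right}")
```

The enumeration yields exactly the six recommendation pairs listed for group E, and the count grows quadratically with the number of recommendations involved.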

However, Zamborlini et al. are aware of this behaviour and have found a solution to this problem [13]. It simply has not yet been implemented at the time of this writing. The proposed solution completely solves the issue and makes the system report to the user only a single interaction that references all involved recommendations.

The same procedure is responsible for the twelve AlternativeActions interactions. In this case the recommendations E3, E4, E2a, E2b, E2c and E2d are involved. Note, however, that f(6) = 15 and not 12. The number of interactions found in this case is lower because E3, E4 and E2a are involved in a RepeatedAction interaction, so there cannot be AlternativeActions interactions between them as well. A better way to describe what is going on here is that there are twelve AlternativeActions interactions, caused by the recommendations E2b, E2c, E2d and the group {E2a, E3, E4}. This way we can clearly see that no AlternativeActions are detected within the group, resulting in f(6) − f(3) = 15 − 3 = 12 detected interactions.
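The count of twelve can be checked with a small Python sketch. This is only an illustration of the counting argument, not the TMR detection rules; the variable names `involved` and `same_action` are our own.

```python
from itertools import combinations

# Recommendations involved in the AlternativeActions interactions of group E.
involved = ["E3", "E4", "E2a", "E2b", "E2c", "E2d"]

# These three recommend the same action (RepeatedAction), so they cannot
# be alternatives to each other.
same_action = {"E3", "E4", "E2a"}

# Keep every unordered pair except those that lie entirely inside the group.
alternative_pairs = [
    (a, b)
    for a, b in combinations(involved, 2)
    if not (a in same_action and b in same_action)
]

print(len(alternative_pairs))
```

Of the f(6) = 15 possible pairs, the f(3) = 3 pairs inside the group are dropped, which leaves the twelve reported interactions.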

All other interactions that were detected were also expected and vice versa.

8.2.2 Suggestions to extend the TMR model

Having modeled the recommendations in the TMR model and discussed the results, we can identify some features that would make this process easier.

First, a nice feature to have would be the ability to express recommendations that recommend multiple actions using a single recommendation object. Recommendations like B1, which recommends physical activity and structured exercise, currently require two recommendation objects, each with their own causation belief and recommendation strength. If we can combine this information in a single organized entity, we can create a 1-to-1 mapping of recommendations and recommendation objects. It also ensures that we don’t have to provide the same information twice in two different places.

Another great addition would be the handling of other strength levels than “should.” Our case study has shown that other strengths are used in recommendations. Currently we are required by the model to specify a strength level, which means we sometimes are forced to enter wrong data into our model.

A more difficult feature to implement would be the identification of the dosages of recommended drugs. Currently this extra information is ignored and stored as part of the text describing the action. Because of this, no interaction will be detected with a different recommendation that recommends the same drug but with a different dosage. Only in rare cases is this specific information present in a recommendation, which means that defining the specific steps to take will be difficult.

Finally, we have found several recommendations that the TMR model detects as interacting, but that upon closer inspection can’t possibly interact because of their preconditions. If the TMR model were extended to include this information, we could identify and remove these interactions from the results. This is discussed in more detail in section 9.

9 Step 3 - Extending the TMR model with preconditions

When we examine the interactions that were detected by the TMR model, we find that some detected interactions are not relevant, because their preconditions are mutually exclusive. This means no situation can occur where both recommendations are applicable.
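The notion of mutually exclusive preconditions can be made concrete with a minimal Python sketch. It is independent of the RDF-based design described in this section: preconditions are modeled as plain Boolean functions, and the checker simply tries every truth assignment. The function `can_both_apply` and the variables `uses_insulin` and `is_adult` are hypothetical and are not taken from SIGN116.

```python
from itertools import product
from typing import Callable, Dict, Sequence

Assignment = Dict[str, bool]
Formula = Callable[[Assignment], bool]


def can_both_apply(p: Formula, q: Formula, variables: Sequence[str]) -> bool:
    """Return True if some truth assignment satisfies both preconditions."""
    for values in product([False, True], repeat=len(variables)):
        patient = dict(zip(variables, values))
        if p(patient) and q(patient):
            return True
    return False


# Hypothetical propositional variables describing a patient.
variables = ["uses_insulin", "is_adult"]


def on_insulin(v: Assignment) -> bool:
    return v["uses_insulin"]


def not_on_insulin(v: Assignment) -> bool:
    return not v["uses_insulin"]


def adult(v: Assignment) -> bool:
    return v["is_adult"]


# Mutually exclusive: no patient satisfies both, so a detected
# interaction between such recommendations could be ruled out.
print(can_both_apply(on_insulin, not_on_insulin, variables))

# Compatible: some patient satisfies both, so the interaction stays.
print(can_both_apply(on_insulin, adult, variables))
```

Exhaustive enumeration is exponential in the number of variables, which is acceptable only for small sets of conditions such as those in a single case study.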

In this section we detail our steps to extend the TMR system so that it is able to reason correctly with preconditions. First we take a closer look at the structure of the preconditions, in order to establish the requirements of the system. Next we describe our design and implementation.

9.1 Precondition analysis

9.1.1 Precondition HasDiabetes and empty precondition

Almost all the recommendations in SIGN116 talk about some or all types of Diabetes, so the precondition of recommendation B1 is very common. It can be easily described using first order logic, where HasDiabetesType2(x) is a predicate that returns True when x has Diabetes type 2 and False otherwise.

The precondition of recommendation D1 is not as common, but it is very fundamental. It does not mention any conditions. When taken out of context, this would indicate that this recommendation holds for all patients in all situations. However, we think that we have to include the condition “has Diabetes type 1 or 2,” based on the fact that this CG is about Diabetes and the text surrounding the recommendation.

9.1.2 Ambiguity in preconditions

The precondition of recommendation A1 is long, complex and ambiguous. The basic structure is simple: some action X is recommended for people with characteristics Y. However, ambiguity comes from the precondition Y. We can identify these separate components:

A  patients with type 1
B  patients with type 2
C  who are using insulin
D  where patients have been educated in appropriate alterations in insulin dose

It can be encoded in predicate logic in a number of ways, where each encoding has a different meaning.

A ∨ (B ∧ C ∧ D) (1)

(A ∨ (B ∧ C)) ∧ D (2)

(A ∨ B) ∧ C ∧ D (3)

Without the help of a medical expert, we can’t say for certain which encoding is the correct encoding. We have chosen to model it as option (3) as it is the easiest to model.
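That the three encodings really differ in meaning can be verified by comparing their truth tables. The Python sketch below is our own illustration; the function names `encoding_1` to `encoding_3` simply follow the equation numbers, and the arguments a, b, c, d stand for the components A to D.

```python
from itertools import product


def encoding_1(a: bool, b: bool, c: bool, d: bool) -> bool:
    return a or (b and c and d)


def encoding_2(a: bool, b: bool, c: bool, d: bool) -> bool:
    return (a or (b and c)) and d


def encoding_3(a: bool, b: bool, c: bool, d: bool) -> bool:
    return (a or b) and c and d


# Collect every patient profile on which the three encodings do not all agree.
disagreements = [
    (a, b, c, d)
    for a, b, c, d in product([False, True], repeat=4)
    if len({encoding_1(a, b, c, d), encoding_2(a, b, c, d), encoding_3(a, b, c, d)}) > 1
]

for profile in disagreements:
    print(profile, [f(*profile) for f in (encoding_1, encoding_2, encoding_3)])
```

For example, a patient with type 1 diabetes who has been educated but does not use insulin (A and D true, B and C false) satisfies encodings (1) and (2) but not encoding (3), so the choice of encoding changes who receives the recommendation.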

Recommendation A2 is less ambiguous, but not less complicated. The precondition has a negation in it: a condition that is an exception, where patients who satisfy this condition are not to be included in the filter.

It is clear that a patient who satisfies the preconditions of both recommendations will receive contradicting recommendations. The only way to resolve this ambiguity is with the help of a medical expert.

9.1.3 References to other preconditions

The second part of recommendation D2 references the conditions in the first part and adds some more conditions to them. Note that in section 8.1.4 we said that we have implemented this recommendation using two recommendation objects. Now the link between them becomes more clearly defined. Patients that satisfy the conditions of the first part and also some additional conditions can benefit from more frequent medical actions. The patients to whom the second part applies form a strict subset of those to whom the first part applies. The actual reference is not very explicit: the word “those” is used to describe “those patients that satisfy the conditions of the first part.”

To model this precondition, we will need to design a structure that is flexible enough to handle references to other preconditions.
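One possible shape for such a structure is sketched below in Python. This is not the representation developed in this thesis; it only illustrates the idea of a precondition that points to another precondition and adds conditions to it. The class `Precondition`, its field `extends`, and the condition names are all hypothetical placeholders, and conditions are assumed to be purely conjunctive.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, Optional


@dataclass(frozen=True)
class Precondition:
    """A conjunction of named conditions that may build on another precondition."""

    conditions: FrozenSet[str]
    extends: Optional["Precondition"] = None  # the referenced precondition, if any

    def all_conditions(self) -> FrozenSet[str]:
        """Own conditions plus everything inherited through the reference."""
        inherited = self.extends.all_conditions() if self.extends else frozenset()
        return inherited | self.conditions

    def applies_to(self, patient_facts: Iterable[str]) -> bool:
        """True if the patient satisfies every (inherited or own) condition."""
        return self.all_conditions() <= set(patient_facts)


# Placeholder condition names; the real conditions of D2 are not modeled here.
first_part = Precondition(frozenset({"first_part_condition"}))
second_part = Precondition(frozenset({"additional_condition"}), extends=first_part)

patient = {"first_part_condition", "additional_condition"}
print(first_part.applies_to(patient), second_part.applies_to(patient))
```

Because the second precondition inherits every condition of the first, any patient it applies to is also covered by the first, which matches the subset relation described above.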

9.1.4 Numbers in preconditions

The precondition of recommendation F1 is one of several in SIGN116 that use numbers. Most of the time the numbers are used to denote an interval, in this case the age interval [12, ∞). Although it is not impossible, predicate logic is not well suited to describing intervals. Techniques like discretization can be used to turn an interval of numbers into multiple Boolean variables, which can then be used to compare two intervals.

However, when dealing with data measured in many different units (clinical guidelines can use years, weeks, months, milligrams, milliliters, kilograms and many others), discretization can cause a lot of complexity and confusion, and could introduce errors into the system. For large scale implementations, predicate logic is therefore not recommended. For our case study, however, it is acceptable.
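The discretization idea can be sketched in Python as follows. This is a simplified illustration under our own assumptions: intervals are half-open, every interval endpoint is also a cut point, and the second and third age ranges are hypothetical and not taken from SIGN116. Only the interval [12, ∞) comes from recommendation F1.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]  # half-open interval [low, high)


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one point."""
    return max(a[0], b[0]) < min(a[1], b[1])


def discretize(interval: Interval, cut_points: Sequence[float]) -> List[bool]:
    """One Boolean variable per bin [c_i, c_{i+1}): does the interval cover it?"""
    bins = zip(cut_points, cut_points[1:])
    return [overlaps(interval, b) for b in bins]


def share_a_bin(x: List[bool], y: List[bool]) -> bool:
    """Two discretized intervals are compatible if some bin is true in both."""
    return any(p and q for p, q in zip(x, y))


# Age in years; the cut points must contain every endpoint used below.
cut_points = [0, 12, 18, math.inf]

f1_age = (12, math.inf)   # age interval of recommendation F1
under_18 = (0, 18)        # hypothetical precondition of another recommendation
under_12 = (0, 12)        # hypothetical precondition of another recommendation

print(share_a_bin(discretize(f1_age, cut_points), discretize(under_18, cut_points)))
print(share_a_bin(discretize(f1_age, cut_points), discretize(under_12, cut_points)))
```

Every additional unit or threshold adds cut points and therefore Boolean variables, which is where the complexity mentioned above comes from.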

Another interesting note here is that, since it is possible that a patient has both type 1 and type 2 diabetes [14], we expect to detect interactions among the recommendations in this group even when we take preconditions into account. However, as these recommendations are presented right after one another in the clinical guideline [3, p. 98], it looks like the guideline developers have assumed that it is not possible for a patient to have both type 1 and type 2 diabetes at the same time; or at least those patients are not discussed by this guideline. This assumption likely comes from the fact that the co-occurrence of type 1 and type 2 diabetes is very unlikely and