
Data-Driven Modular Clinical Decision Support

Academic year: 2021


Data-Driven Modular Clinical Decision Support

Daniel Dropuljić, BSc

Tutor: Stephanie Medlock, PhD*

Mentor: Lilian Minne, PhD**

*Amsterdam UMC - Department of Medical Informatics

**Firely B.V.

a thesis submitted in partial fulfilment of the requirements for the degree of

Master of Science in Medical Informatics


Project Information

Student: Daniel Dropuljić (10002263), danieldropuljic@fire.ly

Tutor: Stephanie Medlock

Department of Medical Informatics

Amsterdam University Medical Center, Loc. AMC Meibergdreef 15, 1105 AZ Amsterdam 020-566-4543

s.k.medlock@amc.uva.nl

Mentor: Lilian Minne

Firely

Bos en Lommerplein 280, 1055 RW Amsterdam 020-346-7171

lilian@fire.ly


Abstract

Introduction: Healthcare is currently experiencing technological innovations at an increasing rate. The usefulness of new technologies depends on how well they integrate with existing systems. The recent rise of artificial intelligence based systems in healthcare for diagnostic and prognostic prediction has shown promising results, but are these systems also suitable for improving interoperability in healthcare? Data and Methods: A scoping review was conducted to map current knowledge on the interoperability of clinical decision support systems and electronic health record systems, and the role of web technologies therein. A decision tree and an artificial neural network were created using the Wisconsin breast cancer diagnosis dataset. Two convolutional neural networks were created and trained on the EyePacs-1 retinopathy image dataset. A prototype FHIR resource was constructed to facilitate model sharing. Results: The scoping review identified 38 publications after applying the exclusion criteria; it revealed a lack of publications on using transfer learning as a pillar for improving clinical decision support interoperability. The decision tree (F1 score 0.94) and neural network (F1 score 0.89) showed similar performance on breast cancer diagnosis; the decision tree performed slightly better in classification and offers better insight into the inner workings of the model. The two convolutional neural networks likewise performed comparably, with a categorical cross-entropy loss of 0.77 for the baseline model and a slightly better 0.75 for the transfer-learning-initialised model. Conclusion: This study revealed a gap in knowledge on transfer learning as a means to increase CDSS interoperability. Further research is required to explore whether sharing models through FHIR resources is a viable step towards improving CDSS interoperability using EHR data. Modern CDSS are shown to be modular, interoperable, and relatively accurate tools for healthcare professionals.
The development of traditional and expert-based systems should not be abandoned; these systems have proved their worth over the years. Modern CDSS methods can, however, analyse complex data directly, without transcription of features into numerical formats, and they can reveal links between concepts that were previously not considered.

Keywords: Clinical Decision Support, Interoperability, Deep Learning, Transfer Learning, Fast Healthcare Interoperability Resources


Abstract (translated from the Dutch)

Introduction: Healthcare is currently undergoing many technological innovations. The usefulness of new technologies depends on the extent to which they can be integrated into systems. The recent rise of artificial intelligence based systems in healthcare for diagnosis and prognosis has produced promising results, but are these also suitable for improving interoperability in healthcare? Data and Methods: A scoping review was conducted of current knowledge on clinical decision support and the interoperability of electronic health record systems, as well as the role of web technologies in this regard. A decision tree and an artificial neural network were created using the Wisconsin breast cancer diagnosis dataset. Two convolutional neural networks were created and trained on the EyePacs-1 retinopathy image dataset. A prototype FHIR resource was created to facilitate model sharing. Results: The scoping review found few publications on transfer learning as a pillar for improving the interoperability of clinical decision support. The decision tree (F1 score 0.94) and the neural network (F1 score 0.89) showed similar performance on breast cancer diagnosis; the decision tree performed slightly better in classification and offers better insight into the inner workings of the model. The convolutional neural networks showed comparable results, with a categorical cross-entropy loss of 0.77 for the baseline model and a slightly lower 0.75 for the transfer-learning-initialised model. Conclusion: This study showed that insufficient research has been done into the role of transfer learning in improving CDSS interoperability.
Further research is needed to determine whether sharing models via FHIR is a feasible step towards improving CDSS interoperability using EHR data. Modern CDSS prove to be modular, interoperable, and relatively accurate tools for healthcare professionals. Modern CDSS methods can lead to new insights.

Keywords: Clinical Decision Support, Interoperability, Deep Learning, Transfer Learning, Fast Healthcare Interoperability Resources


Table of contents

List of figures
List of tables
Nomenclature

1 Introduction
    1.1 Context
    1.2 Problem
    1.3 Goal
    1.4 Research Questions
    1.5 Approach
    1.6 Expected Results
    1.7 Structure of this thesis

2 Preliminaries
    2.1 Interoperability
    2.2 Electronic Health Records
    2.3 Clinical Decision Support Systems
    2.4 Representational State Transfer
    2.5 HL7 FHIR
    2.6 Artificial Intelligence in Healthcare

3 Data and Methods
    3.1 Scoping Review
        3.1.1 Search Strategy and In/Exclusion Criteria
        3.1.2 Data Extraction and Analysis
    3.2 Intelligent Decision Support Models
        3.2.2 Model Types
    3.3 Model Creation
        3.3.1 Wisconsin Data Model Creation
        3.3.2 EyePacs Model Creation
    3.4 Differences between the datasets
    3.5 FHIR Resource

4 Results
    4.1 Scoping Review
    4.2 Clinical Decision Support Models
        4.2.1 Wisconsin Model Architecture and Performance
        4.2.2 EyePacs Model Architecture and Performance
    4.3 FHIR Resource

5 Discussion
    5.1 Main Findings
    5.2 Strengths and Weaknesses
    5.3 Comparison to other studies
    5.4 Interpretation, Implications, and Impact
    5.5 Future Developments
    5.6 Conclusion

References

Appendix A Full Search Strings and Included Publications
Appendix B Tools and Code
Appendix C Full Description of Datasets
    C.1 Hyperparameters
Appendix D FHIR Resource


List of figures

2.1 Overview of the role of standards in connecting disparate systems in healthcare settings.
2.2 FHIR Patient Resource Structure definition in XML showing the basic structure sequence of a Resource.
2.3 ImageNet benchmark classification performance over time.
3.1 Features of the Breast Cancer Wisconsin set.
3.2 Random sample of 3 images from every category of the retinopathy dataset [1]. The images show different levels of illumination and orientation. Clear cases of retinopathy can be seen by the presence of clusters of white dots, such as in 1391_left.
3.3 Graph version of the model creation and evaluation pipeline for the Wisconsin dataset.
3.4 Graph version of the model creation and evaluation pipeline for the Retinopathy dataset.
4.1 PRISMA diagram of the scoping review.
4.2 Pie chart showing the percentage of published articles by discipline for the included publications.
4.3 Number of publications per funder type.
4.4 Decision Tree classifier for the Wisconsin Breast Cancer dataset.
4.5 Data-driven Neural Network classifier Architecture for the Wisconsin Breast Cancer dataset.
4.6 Data-driven 'Generic' Convolutional Neural Network Classifier Architecture for the EyePacs Retinopathy data.
4.7 Transfer learning VGG16-based Convolutional Neural Network Classifier Architecture for the EyePacs Retinopathy data.
4.9 Training accuracy per epoch for the baseline model and transfer-learned model.
4.10 Validation set losses per epoch for the baseline model and transfer-learned model.
4.11 Validation set accuracy per epoch for the baseline model and transfer-learned model.
4.12 Model losses and accuracies for both the baseline model and the transfer-learning-initialised model.
4.13 Model testing performance in wall time for increasing image batch sizes (log scaled).
4.14 Visualisation of the 64 filters in the first convolutional layer of the transfer learning initialisation model.
4.15 Visualisation of the 32 filters in the second-to-last layer of the transfer learning initialisation model.
4.16 Using the gradient of the output category with respect to the input image provides pixel-specific influence on the output. The images on the right show the salient image regions for the images on the left.
D.1 Rendered snapshot of the model architecture as a FHIR resource showing cardinality and datatypes per field.
D.2 Rendered snapshot of the model weights as a FHIR resource showing cardinality and data types per field.
E.1 Confusion matrix of the baseline model showing correct classifications along the diagonal axis in blue and incorrect classifications off the diagonal in red.
E.2 Confusion matrix of the transfer-learning-initialised model showing correct classifications along the diagonal axis in blue and incorrect classifications off the diagonal in red.


List of tables

3.1 Search table for the research questions of this study (^ marks terms used additionally in non-medical databases).
4.1 Included Publications per Cluster.
4.2 Performance metrics for the Wisconsin dataset decision tree and neural network models.
A.1 Publications found in the search.
A.2 Additional Publications used.
B.1 Frameworks used for the creation of the models.
C.1 Description of dataset features.
C.2 Optimal Hyperparameters for the Decision Tree.
C.3 Optimal Hyperparameters for the Artificial Neural Network Model (* marks default values).


Nomenclature

Acronyms / Abbreviations

API Application Programming Interface

CDA Clinical Document Architecture

CDSS Clinical Decision Support System

DNS Domain Name System

EHR Electronic Health Record

FHIR Fast Healthcare Interoperability Resources

FTP File Transfer Protocol

HIS Health Information System

HL7 Health Level 7

HTML Hypertext Markup Language

HTTP Hypertext Transfer Protocol

HTTPS Hypertext Transfer Protocol Secure

ICDSS Intelligent Clinical Decision Support Systems

ICPC International Classification of Primary Care

ISO International Organisation for Standardisation

IT Information Technology

REST Representational State Transfer

SDO Standard Developing Organisation

SMART Substitutable Medical Applications and Reusable Technologies

SNOMED CT Systematized Nomenclature of Medicine Clinical Terms

SOAP Simple Object Access Protocol

TCP/IP Transmission Control Protocol / Internet Protocol

TL Transfer Learning

URI Uniform Resource Identifier

US United States

VIPP Versnellingsprogramma Informatie-uitwisseling Patiënt en Professional


Chapter 1

Introduction

1.1 Context

Healthcare is currently experiencing technological innovations at an increasing rate [2–4]. Recent progress in artificial intelligence applications for healthcare is of particular interest for diagnostics and prevention [2]. The usefulness of new technologies and methods such as big data and artificial intelligence depends on how well they integrate with existing hospital systems. Interoperability enables communication and data exchange between various information systems within one institution, but also between information systems of separate institutions.

Wide-scale interoperability has been possible in various sectors (financial, telecommunication, aviation) for many years [5]. In the healthcare sector, hospitals have had data worth sharing for a long time, yet exchanging data has not taken off, partly due to conflicting interests of competing health Information Technology (IT) vendors. Having interoperable electronic information systems would make it easier for hospitals to migrate to other, more competitive vendors; this conflict of interest has led to a business model of vendor lock-in [6]. In addition, privacy and security concerns have been mentioned as a reason for the resistance to change [4].

Recently, interoperability has become a more important consideration for vendors. Federal programs in the United States have included incentive payment models for healthcare institutions that adopt 'meaningful use' technology. Meaningful use emphasises care coordination and the exchange of patient information [6]. This led to major United States (US)-based Electronic Health Record (EHR) vendors founding the CommonWell Health Alliance, which aims to make interoperability a key feature of health IT systems [7]. In the Netherlands, a similar programme called Versnellingsprogramma Informatie-uitwisseling Patiënt en Professional (VIPP) was launched in 2017 [8]. This programme includes incentive payment models for hospitals that standardise information exchange between patient and professional, and between healthcare institutions, by 2020.


Interoperability is facilitated through the adoption of standards. Many health information exchange protocols are considered weak standards: standards that are not widely adopted. Standards for the transmission of clinical documents (CDA) and for medical knowledge representation (Arden syntax) are not widely adopted among healthcare institutions [5]. Strong standards are better-known standards that are used on a large scale. Examples of strong standards are the Hypertext Transfer Protocol Secure (HTTPS) protocol for secure web browsing, the Domain Name System (DNS) for domain-name-to-IP-address translation, and device standards such as International Organization for Standardization (ISO) / International Electrotechnical Commission (IEC) 9995 keyboard layouts. Developing a strong standard requires many attempts at communication between different implementations of standards, rather than the creation of new standards. Out of the numerous existing health information standards, the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT) and Health Level 7 (HL7) standards are considered key standards [5]. HL7 standards such as Fast Healthcare Interoperability Resources (FHIR) provide the messaging structure, and SNOMED CT provides the clinical terminology. These standards offer interoperability through REST and web technology Application Programming Interfaces (APIs). APIs that follow Representational State Transfer (REST) principles have shown that it is possible to build large integration-based ecosystems with services that scale well [5]. FHIR uses the RESTful paradigm to define data elements as a set of services that can be manipulated through resource Uniform Resource Identifiers (URIs). FHIR shows the most promise for universal healthcare interoperability; it has recently been adopted by major US health vendors and US-based technology companies in the Argonaut Project as the main standard for increased healthcare interoperability [9, 10].
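As a concrete illustration of the RESTful paradigm FHIR builds on, each resource lives at a predictable URI and the standard HTTP verbs map onto create/read/update/delete interactions. The server base URL below is a hypothetical placeholder, not an endpoint used in this thesis:

```python
# Sketch of how FHIR maps RESTful HTTP verbs onto resource URIs.
# BASE is a made-up example server, not a real FHIR endpoint.

BASE = "https://fhir.example.org/r4"

def fhir_url(resource_type, resource_id=None):
    """Build the URI for a FHIR resource type or a specific instance."""
    if resource_id is None:
        return f"{BASE}/{resource_type}"            # type-level: create, search
    return f"{BASE}/{resource_type}/{resource_id}"  # instance-level: read, update, delete

# Verb-to-interaction mapping defined by the FHIR RESTful API:
interactions = {
    "read":   ("GET",    fhir_url("Patient", "123")),
    "update": ("PUT",    fhir_url("Patient", "123")),
    "delete": ("DELETE", fhir_url("Patient", "123")),
    "create": ("POST",   fhir_url("Patient")),
    "search": ("GET",    fhir_url("Patient") + "?name=Smith"),
}

for name, (verb, url) in interactions.items():
    print(f"{name:6s} {verb:6s} {url}")
```

An actual read would issue, for example, `GET {BASE}/Patient/123` with an `Accept: application/fhir+json` header, and the server would return the Patient resource serialised as JSON or XML.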

1.2 Problem

Historically, the majority of third-party Health Information Systems (HIS) have been insufficient for integrating external modules. These integration problems were sometimes solved by replicating medical records into separate databases, or by duplicating streams of HL7 messages at their communication servers [11].

Recent government initiatives such as the 'meaningful use' regulation have stimulated major EHR vendors to move towards more interoperable EHRs. Combined with recent developments in building prediction models in healthcare [12], clinical decision support development is becoming an increasingly important factor in optimal care delivery. The degree of interoperability of these decision support applications remains to be seen; the expectation, however, is that it will increase due to governmental regulations and their financial reimbursements.


The adoption of EHRs by hospitals around the world has led to a large increase in data available for secondary use. Additionally, new data-driven AI methods (Intelligent CDSS) are being implemented in place of the more traditional expert systems and guideline-formalisation-based CDSS, after having shown their effectiveness in a wide range of use cases. These methods tend to work better than traditional methods when combining heterogeneous types of data or when analysing complex data types such as images and recordings [2]. Depending on the complexity of the data, creating such models can require substantial computational resources. Transfer learning (TL) [13] has shown potential in offering a scalable solution for classification-based tasks. Transfer learning is based on the principle of reusing previously learnt information on new use cases. Similar to how humans learn new tasks by partly relying on previously learnt skills or information, transfer learning can improve the learning capability and speed of algorithms [14].
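The models in this thesis are built with the frameworks listed in Appendix B; purely to illustrate the transfer-learning mechanism, the self-contained numpy sketch below treats a set of "pretrained" weights as a frozen feature extractor and fits only a new output head on a synthetic target task. All names and data here are illustrative, not the thesis's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)

def init_layer(n_in, n_out):
    """Small random weight matrix for a dense layer."""
    return rng.normal(0, 0.1, size=(n_in, n_out))

# --- 'Source' task: pretend these weights were learnt on a large dataset ---
pretrained_hidden = init_layer(4, 8)   # feature extractor from task A

# --- 'Target' task: reuse the feature extractor, train only a new head ---
hidden = pretrained_hidden.copy()      # transfer: initialise from task A
head = init_layer(8, 1)                # new, randomly initialised output layer

def forward(X):
    h = np.tanh(X @ hidden)                   # frozen feature extractor
    return 1 / (1 + np.exp(-(h @ head)))      # sigmoid output

# Tiny synthetic target-task dataset
X = rng.normal(size=(64, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float).reshape(-1, 1)

# Train only the head; the transferred layer stays frozen,
# so its activations can be computed once up front.
h = np.tanh(X @ hidden)
for _ in range(500):
    p = 1 / (1 + np.exp(-(h @ head)))
    head -= 0.5 * (h.T @ (p - y)) / len(X)    # cross-entropy gradient step

acc = ((forward(X) > 0.5) == y).mean()
print(f"training accuracy after head-only fine-tuning: {acc:.2f}")
assert np.array_equal(hidden, pretrained_hidden)  # extractor untouched
```

In the thesis's convolutional models the same idea applies at a larger scale: the early convolutional layers are initialised from a network trained on a previous image classification task, and only the later layers are fitted to the new data.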

1.3 Goal

The primary goal of this SRP is to investigate the status of interoperability between EHRs and CDSS, and how it can be improved using REST architecture. The focus lies on whether these standards facilitate the integration of applications using modern self-learning AI methods. The secondary goals are to create a self-learning knowledge base through modern AI methods that offers decision support in the form of disease prediction, to investigate whether these models can be made interoperable through transfer learning, and to determine what the advantage of this interoperability is.

1.4 Research Questions

Several research questions were formulated to explore the goals of this study. The research questions are:

• How can interoperability of CDSS on EHR data be improved?

– What current standards for CDSS interoperability exist?

– What is the role of REST in improving interoperability?

* What is the role of FHIR in improving interoperability?

– What are the differences between traditional decision support systems and Intelligent decision support systems?


• What is the best way to build a prototype predictive model based on clinical knowledge?

• Can we make the pre-trained generalised model interoperable?

– How can REST be applied to the model?

1.5 Approach

A scoping review will be conducted to map key concepts, developments, and gaps in the interoperability of EHRs and CDSS by systematically searching, selecting, aggregating, and synthesising existing knowledge.

Additionally, a prototype application of a self-learning CDSS for the classification of the presence of breast cancer will be created using both traditional and modern self-learning AI methods: a decision tree and a neural network will be created and compared. A second prototype model that classifies the degree of diabetic retinopathy from medical imaging data will be created. Two convolutional neural networks will be made: one trained without any prior knowledge within the network, and one using transfer learning with a knowledge base from a previous, related image classification task. The possibilities of transfer learning as a data-driven version of a decision support knowledge base will be explored. After discussion with experts from the FHIR core group, a prototype resource for sharing decision support models will be made. The validation of this resource is outside the scope of this study.
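To give a flavour of the first prototype: the Wisconsin diagnostic breast cancer data ships with scikit-learn, so a baseline decision tree takes only a few lines. This is a default-settings sketch for illustration, not the tuned model whose hyperparameters are reported in Appendix C:

```python
# Hedged sketch: a default-settings decision tree on the Wisconsin
# diagnostic breast cancer dataset, not the tuned model from Chapter 4.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import f1_score

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)

tree = DecisionTreeClassifier(random_state=42).fit(X_train, y_train)
f1 = f1_score(y_test, tree.predict(X_test))
print(f"decision tree F1 on held-out data: {f1:.2f}")
```

Even without hyperparameter tuning, a tree on this dataset typically reaches an F1 score in the low-to-mid 0.9 range, which is consistent with the tuned result reported in the abstract.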

1.6 Expected Results

The results will contain an overview of past, current, and ongoing developments in CDSS and EHR interoperability. Key concepts, developments, and gaps in knowledge will be identified. Lastly, data-driven decision support models will be created and compared to traditional decision support models. These models will be used to investigate the possibilities of a transferable knowledge base and whether such models can be made interoperable as a modular FHIR-based resource.

1.7 Structure of this thesis

Preliminaries: The preliminaries chapter describes the background knowledge on key concepts that are required for understanding this thesis. Data and Methods: The data and methods section will describe the origin of the datasets that are used in this study. Furthermore, the exact


methodology of both the literature search and the scoping review synthesis will be described, along with the methodology for prediction model creation, validation, and increasing interoperability. Results: The results section will describe the scoping review findings, as well as the outcomes of the self-learning prediction models. Additionally, the results of making the models more interoperable through REST technology will be described. Discussion: The discussion section will contain a discussion of the findings of the scoping review. Limitations and strengths of the design choices, strategies, execution, and practical matters of this thesis will be discussed, and implications for future research will be given. Appendix: The appendices include the detailed search strings, the source code for the self-learning models, and tables and graphs that are not directly relevant to the matter discussed in the previous chapters.


Chapter 2

Preliminaries

The goal of this research is to describe the state of the art of interoperability between electronic patient records and clinical decision support systems, in particular recent efforts to improve interoperability through the use of representational state transfer, such as the HL7 FHIR standard, and the impact of these standards on the integration of AI-based methods such as transfer learning in healthcare. Each of these concepts will be explained in turn.

2.1 Interoperability

Healthcare interoperability, the extent to which different systems can exchange data and interpret the shared data, depends on the use of standards. Standards are defined by ISO as documents that provide rules, guidelines, or characteristics for activities or their results, optimising the degree of order in a given context [15]. Two main types of standards can be defined: exact specifications and minimum thresholds. In healthcare, mainly exact specifications are needed at the technical level [5].

In healthcare interoperability, a distinction can be made between technical, syntactic, semantic, and process interoperability. Technical interoperability provides the protocols for exchanging data on a system. Syntactic interoperability allows for the exchange of data between different systems. Semantic interoperability adds a layer to the exchanged information so that its meaning is preserved when it moves between systems. Lastly, process interoperability allows health data users to leverage technologies and processes in ways that improve the efficiency and cost-effectiveness of information exchange [16]. An example of technical interoperability is Hypertext Transfer Protocol Secure (HTTPS). As previously mentioned, it is a strong standard for secure communication between computer networks and is widely used on the Internet, where it is replacing the HTTP protocol. HTTPS offers bi-directional encryption, which protects the privacy and integrity of the data exchange


while it is transmitted. Syntactic interoperability is provided by standards such as the Extensible Markup Language (XML). XML encodes a document in a format that can be read by both humans and machines, and can be used to store and transport data independently of software and hardware [5]. Semantic interoperability is delivered by standards such as the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT). SNOMED CT defines static clinical concepts identified by unique concept identifiers, each associated with a human-readable definition [5]. Process interoperability incorporates all categories of standards [16]; it refers to the ability of different workflow processes in different systems to cooperate.
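To make the layers concrete, consider a hypothetical blood-pressure reading. The JSON serialisation below provides the syntactic layer, while the SNOMED CT concept identifier provides the semantic layer (271649006 is commonly cited as the SNOMED CT code for systolic blood pressure; the surrounding structure is illustrative only and is not a conforming FHIR resource):

```json
{
  "observation": {
    "code": {
      "system": "http://snomed.info/sct",
      "concept": "271649006",
      "display": "Systolic blood pressure"
    },
    "value": 120,
    "unit": "mm[Hg]"
  }
}
```

Any system that can parse JSON can read this message (syntactic interoperability), but only a system that resolves the SNOMED CT concept can know that the value is a systolic pressure rather than some other measurement (semantic interoperability).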

Fig. 2.1 Overview of the role of standards in connecting disparate systems in healthcare settings (adapted from [17])

Development, maintenance, and propagation of standards are usually done by Standard Developing Organisations (SDOs). An example is the Unicode Consortium, which develops and maintains the Unicode Standard used for representing text in software products and standards. SDOs can also provide standards at a national level: Nictiz develops and maintains Dutch standards for communication within healthcare. The AORTA infrastructure is an example of such a standard; it provides a network for data exchange between healthcare institutions and healthcare professionals.


ISO is an international organisation comprised of representatives from national SDOs. ISO provides international standards, technical specifications, and guides. Examples of common ISO standards are ISO/IEC 27001 for information security management, ISO 13485 for medical devices, and ISO 639 for language codes. Conformance to ISO standards is, however, voluntary, and because ISO charges for its documents, they are not resources that every developer will consider necessary.

2.2 Electronic Health Records

EHRs, the systems containing digital health data, are one of the main contributors to the rapid digitalisation and computerisation of the healthcare industry. In the USA, the HITECH Act of 2009 provided monetary incentives for hospitals to use EHR systems; since the act, a ninefold increase in basic EHR system adoption has been reported [18]. EHR systems can have various functionalities with regard to what types of data are stored. The most basic EHR systems support administrative tasks and relevant patient medical information such as physical characteristics, blood values, diagnoses, histories, and treatments. More complex systems allow for the storage of clinical notes and offer clinical decision support modules for prescriptions, diagnoses, and follow-ups.

The increasing adoption of EHRs has increased the amount of data they contain. The primary use of EHR data is to improve the efficiency and effectiveness of the medical processes carried out by medical professionals. Results of studies using EHR data for research (secondary use) have become increasingly powerful. Because of the varying types of data stored in the EHR, research can be done on patient characteristics, medical codes, text entries by healthcare professionals, laboratory tests, and medical images. The EHR has thus taken on an additional role as a patient data provision system for research.

EHRs have existing ISO standards that specify the requirements for electronic health records, their content, clinical models, and their architecture (ISO 13940, 13606, 13972, 18308). Conformance to these standards is not enforced, leading many EHR vendors to create products in which interoperability is not highly important. Moreover, the EHR marketing model was historically one of vendor lock-in; vendors found little financial incentive to provide interoperable systems. The 'Meaningful Use' and VIPP initiatives, in the USA and the Netherlands respectively, have caused developers to reconsider interoperability as an important factor in their products.


2.3 Clinical Decision Support Systems

Clinical Decision Support System (CDSS) is an umbrella term comprising systems with a large variety of tasks: managing clinical data, alerting clinicians to conflicting situations, and supporting decision making for patient diagnosis and treatment. A CDSS can be any computer program designed to help health professionals make clinical decisions, ranging from data management tools to drug prescription interaction conflict tools [19].

Interaction with other clinical systems is desirable due to the dependency on clinical information. CDSS have been shown to encourage guideline adherence for prevention and treatment and to reduce medication errors [20]. However, after more than 30 years of development, most CDSS are still stand-alone systems or small components embedded within the EHR [19]. In recent years, more EHR vendors have begun to incorporate CDSS modules and the ability to connect a CDSS module to an EHR more easily [21].

There are various frameworks that describe the dimensions of CDSS. When looking at the method of decision making, one can divide decision support systems into knowledge-based and non-knowledge-based CDSS.

The increased adoption of EHRs by healthcare institutions has increased the amount of patient data available. However, the heterogeneity of clinical data sources, in the form of text, image, numerical, and categorical data, has made it difficult to represent the data. Furthermore, keeping a knowledge base up to date with developments in the medical literature is a tedious process.

Novel approaches to CDSS leverage recent developments in the AI domain of deep learning. Deep learning, as opposed to 'shallow learning', involves successive layers of linked computational nodes. These models provide increased accuracy at the cost of decreased computational performance [2]. Self-learning CDSS circumvent some of the issues of knowledge-based CDSS by not requiring a pre-defined knowledge base and by accepting various combinations of data types as input. The reasoning mechanism can learn features and relations regardless of the data type of the input, and the knowledge base does not require periodic manual updating.

2.4 Representational State Transfer

In the early 1990s, the Web consisted of client and server components that were united through common protocols for addressing documents (URI), displaying documents (HTML), and transferring documents (HTTP). An HTTP object model was introduced as a test case for understanding how changes to the HTTP protocol might impact applications on the Web. The HTTP object model was reconsidered as an architecture and was given the name REST [22].


The original description of Representational State Transfer (REST) defines it as an architectural style for network-based applications with five associated constraints and properties. The constraints include resources being identified by one identifier mechanism, access methods having the same semantics for all resources, resources being manipulated through the exchange of representations, representations being exchanged via self-descriptive messages, and hypertext acting as the engine of the application state. REST constraints do not constrain Web architecture; rather, developers choose whether to constrain their architectures with REST constraints [22].

REST offered an alternative to the server-side web services offered by SOAP. SOAP is the underlying protocol layer upon which web services were built and is still widely used by enterprises due to the security it offers. REST, however, scales better than SOAP on the server side while being less complex to implement, as it was specifically designed with internet-scale usage in mind [23]; web services could now be accessed directly from Web browsers. As a consequence, major service platform providers have migrated from SOAP/WSDL to REST architectures [22]. Examples of widely used REST-based web applications are Google Docs, WhatsApp Web, and Office 365. REST also offers the potential for increased interoperability. Through APIs, it is possible to set up methods of communication between various technologies in a ’plug-and-play’ fashion. APIs simplify the development process by abstracting the low-level implementation away and exposing only the operations a developer needs; developers do not need to understand exactly what happens at a lower level to use an API.

2.5 HL7 FHIR

HL7 is an SDO that focusses on standards for hospital information systems. Both information exchange standards and document structure and content standards are maintained by HL7. HL7 V2 was created to integrate disparate hospital systems. Because the V2 standard was created by clinical interface specialists, it lacked modelling capabilities and suffered from complexity and inconsistency. HL7 V3 was created to address these drawbacks. V3 introduced a reference information model that defined the structure of all semantic elements. Implementing V3 required a deep understanding of the reference model, making it relatively difficult to master. The semantic interoperability that HL7 V3 theoretically offers is hard to implement due to the syntactic complexity of the reference model [24].

The high entry barrier and low implementation levels for V3 made HL7 reconsider their approach to healthcare information representation and exchange. HL7 V3 evolved into HL7 FHIR. FHIR is a relatively new standards framework that builds upon the previous HL7

standards. This approach is based on the REST architectural constraints as described by their creator Fielding [22]. A true RESTful API should have:

• a uniform interface

• client-server separation without interdependency

• statelessness

• cacheability

• a layered system

• code on demand (optional)
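The uniform-interface constraint can be illustrated with a toy in-memory handler in which the same access methods carry the same semantics for every resource, whatever its URI identifies. This is only a sketch; the function and store names are invented for illustration.

```python
# Toy sketch of REST's uniform interface: GET/PUT/DELETE have the same
# semantics for every resource, and resources are manipulated through
# exchanged representations. The server keeps no client session state.
store = {}  # in-memory stand-in for server-side resource state

def handle(method, uri, representation=None):
    if method == "GET":
        return store.get(uri)
    if method == "PUT":
        store[uri] = representation  # exchange of a representation
        return representation
    if method == "DELETE":
        return store.pop(uri, None)
    raise ValueError("unsupported method")

handle("PUT", "/Patient/1", {"name": "Doe"})
print(handle("GET", "/Patient/1"))  # {'name': 'Doe'}
```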

FHIR defines core components called Resources (fig 2.2). Each Resource represents a concept defined through associated fields. Concepts such as Patient, Observation, and MedicationPrescription are defined among the core resources, which can be customised (dubbed profiling in the FHIR API) to fit specific use-cases.
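As an illustration, a minimal JSON instance of the core Patient resource might look as follows. The field values are invented, and real deployments would typically constrain the resource further through profiling.

```python
import json

# Hypothetical instance of the FHIR Patient resource with a handful of core fields
patient = {
    "resourceType": "Patient",
    "id": "example",
    "name": [{"family": "Doe", "given": ["Jane"]}],
    "gender": "female",
}
print(json.dumps(patient, indent=2))
```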

The reinterpretation of HL7 V3 into FHIR was executed by the ’fresh look task force’. This task force also included the participation of the Boston Children’s Hospital SMART team. The SMART team focusses on creating interoperable, vendor-independent apps [25]. SMART has extended FHIR to allow for a connection between health apps and EHR systems. This gives developers the capability to create apps in which specific EHR data are represented in a more intuitive way.

Over the last year, several major EHR vendors have adopted FHIR technology in their systems. The US-based vendors Epic and Cerner, while historically major rivals in the US healthcare market, have initiated the Argonaut project, which strives to advance the industry-wide adoption of interoperability standards through the development of the FHIR standard [26]. Participants include key industry actors such as the Mayo Clinic, McKesson, Cerner, Epic, Google, and Apple.

2.6 Artificial Intelligence in Healthcare

Traditionally, data analysis of heterogeneous EHR data was done using statistical techniques such as logistic regression, decision trees, and multivariate regression. Advances in both available data and processing hardware have contributed to breakthroughs in artificial intelligence through the creation of deep learning algorithms. These algorithms have recently gained popularity in applied artificial intelligence fields. Deep learning has been applied to EHR data, where it often showed better model performance.

Fig. 2.2 FHIR Patient Resource Structure definition in XML showing the basic structure sequence of a Resource [27].

Both traditional and deep learning techniques fall within the artificial intelligence sub-domain called machine learning. Machine learning can be subdivided into supervised, unsupervised, and reinforcement learning. Supervised learning attempts to discover a function f(x) that transforms an input variable x to an output y, where both input and output variables are known. An example would be to infer a function that maps a tumour as benign or malignant given the characteristics of the biopsy. Unsupervised learning attempts to discover information about the input variable itself. An example would be to discover similarities in patients with specific symptoms that would suggest a disease they could have in common. Reinforcement learning is based on the idea of classical conditioning, where a software agent is trained to reach a predefined goal while receiving rewards for nearing or completing the goal, and lower or negative rewards for failing to complete it.
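The supervised case can be sketched with a deliberately minimal learner: a nearest-centroid classifier fitted on toy, invented biopsy-like features, inferring f(x) → y from labelled examples.

```python
import numpy as np

# Toy supervised learning: infer f(x) -> y from labelled biopsy-like features.
# Features (hypothetical): [radius, concavity]; labels: 0 = benign, 1 = malignant.
X = np.array([[1.0, 0.1], [1.2, 0.2], [3.0, 0.9], [3.2, 1.1]])
y = np.array([0, 0, 1, 1])

# "Training": compute one centroid per class from the labelled data
centroids = np.array([X[y == c].mean(axis=0) for c in (0, 1)])

def predict(x):
    """Classify by nearest class centroid (a minimal supervised learner)."""
    return int(np.argmin(np.linalg.norm(centroids - x, axis=1)))

print(predict(np.array([1.1, 0.15])))  # near the benign centroid -> 0
print(predict(np.array([3.1, 1.0])))   # near the malignant centroid -> 1
```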

Fig. 2.3 Imagenet Benchmark classification performance over time

A large part of creating a well-functioning model depends on feature selection. With traditional models, features would need to be selected or created through domain-specific knowledge. Because not every relationship between features is known even by experts, it is often a combination of scientific evidence, intuition, and even luck that contributes to the selection and the importance of features [2]. With deep learning techniques, this process can be automated to a certain degree. Domain-specific knowledge is still required to select relevant data for analysis. What sets deep learning apart from traditional, or ’shallow’, learning is the depth of the layers of the model. Successive layers provide increasingly meaningful relationships between the features [28]. Figure 2.3 shows a steeper decrease in classification error for successive model architectures after the introduction of AlexNet [29], widely considered to be the first deep learning model for image classification.

Deep learning models can also be transferred to new data without a need for creating and training a new model entirely. The layers of weights and biases of a model can be exported and reused for different problems that are at least roughly related. Similar to how humans use prior experiences as a starting point when discovering or learning a new task or skill, the field of transfer learning seeks to discover methods to improve model accuracy by transferring previously learnt relationships onto new problems. Transfer learning has been named the ’next frontier’ of machine learning [30].


Chapter 3

Data and Methods

This chapter describes the methodology of:

1. The scoping review study (including the search strategy and literature relevancy filtering).

2. The descriptions of the datasets used to create the models and the processes of model creation.

3. The transfer learning of the model knowledge and capture of the knowledge in a FHIR Resource.

All code used in the creation of the models can be found in Appendix C. The tools used to create the models, benchmark tests, and graphs can be found in Appendix B.

3.1

Scoping Review

A scoping review can be seen as a combination of a narrative review and evidence mapping, adding a narrative integration of the relevant evidence [31].

The scoping review in this study is based on the framework described in Pham et al. [32]. This framework describes a five-stage process consisting of:

• Identification of research questions

• Searching for relevant studies

• Filtering through relevant studies

• Charting the data

• Collating, summarising, and reporting the results

3.1.1 Search Strategy and In/Exclusion Criteria

The systematic search strategy was created according to the steps defined in Bartels [33]. The research questions provided several keywords that were used to create a search table (table 3.1).

The inclusion criteria were divided between search and relevancy filters. The search filter (appendix A) included a criterion that publications be no older than 5 years at the time of performing the search. This was done to minimise the inclusion of outdated publications, as health information technology, and especially artificial intelligence, are fast-moving fields. Relevancy was determined by checking for the presence of at least three of the keywords "Interoperable", "Electronic Health Record", "REST", "Clinical Decision Support System", and "FHIR".

Table 3.1 Search table for the research questions of this study. Terms within a column were combined with OR; the columns were combined with AND. ^ marks terms used additionally in non-medical databases.

Interoperability | Clinical Decision Making | Health Record Systems
Interoperab* | DSS | Electronic Health Record
Representational State Transfer | Decision Support System | EHR
REST | Decision Support | Electronic Medical Record
RESTful | CDS (^) | EMR
Fast Interoperable Health Resources | Clinical Decision Support (^) | Electronic Patient Record
FHIR | Clinical Decision Support System (^) | EPR
HL7 FHIR | CDSS (^) | Health Information System
Web Technology | | HIS
Standardi* | |

The search table was created by merging keywords with a logical AND on the horizontal axis and a logical OR on the vertical axis. An example string would have been "(Interoperab* OR REST) AND (DSS OR CDS) AND (EHR OR EMR)". The * is a wildcard character that matches any sequence of letters following the given prefix, ensuring that various forms of the same term are found and accounting for possible differences between American and British English spellings. The full search strings and MeSH terms per database search engine can be found in Appendix A.
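The merging scheme can be sketched in a few lines, using the abbreviated keyword lists from the example string:

```python
# Build a boolean search string: OR within each column, AND across columns
columns = [
    ["Interoperab*", "REST"],
    ["DSS", "CDS"],
    ["EHR", "EMR"],
]

def search_string(cols):
    """Join each column's terms with OR, then join the columns with AND."""
    return " AND ".join("(" + " OR ".join(terms) + ")" for terms in cols)

print(search_string(columns))
# (Interoperab* OR REST) AND (DSS OR CDS) AND (EHR OR EMR)
```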

Several databases were selected to maximise the number of articles found. Google Scholar was used for its ranking algorithm based on the number of citations. ArXiv and IEEE were used to include recent papers with a focus on computer science and artificial intelligence. ScienceDirect, Web of Science, Scopus, and Pubmed were used for their focus on health science and medicine.

The inclusion criteria were:

• Published between 2012 and December 2017 (after the publication of the first deep learning architecture ’AlexNet’ [29]).

• Main subject contains at least two of the keywords in the header of table 3.1 (Interoperability, Clinical decision making, Health record systems).

• Full text available in English

Exclusion criteria were:

• Unpublished Master Thesis

• Article not retrievable

• Editorials

3.1.2 Data Extraction and Analysis

The resulting articles were filtered in three passes. The first pass included reading the title and subtitles. The second pass included reading the full abstract. The third and final pass included reading the introduction and reference lists.

Data extraction was done through full-text analysis of the content of each record. The data extraction included titles, authors, publication years, publication journals, publication journal sectors, the number of times the publication was cited, funders, research questions, methods, results, self-reported strengths and weaknesses, and conclusions. The publications were classified based on their subject matter using the keywords defined by the authors. Additionally, they were clustered by discipline based on the focus sector of the journals. Publication funder types were also extracted. The methodology and results of publications focussing on more than one of the categories given in table 4.1 were fully extracted.

3.2 Intelligence Decision Support Models

The second step was to create self-learning intelligent decision support models. The algorithms created in this study were data-driven: a decision tree and a shallow neural network were created for the Wisconsin Breast Cancer dataset. For the EyePacs dataset, two versions of a convolutional neural network were created, one trained using randomly initialised weights and one using transfer learning as a base.

3.2.1 Datasets

The medical datasets mentioned in the previous section were used to create the models. The first dataset was the Breast Cancer Wisconsin (Diagnostic) dataset [34]. This dataset contains numerical features derived from images of breast biopsy masses, describing characteristics of the cell nuclei present in the images. The set contained data on 569 individuals, with 32 attributes for every individual. The attributes included features such as biopsy radius, texture, area, and concavity, and, because this is a supervised classification dataset, the diagnosis label for every instance. The diagnosis label was a binary categorical value that takes on either the value M for malignant or B for benign. For a full description of all the attributes see appendix C table C.1.

The second dataset used was the EyePacs diabetic retinopathy dataset [1]. These images contained both left and right eyes with either a visible (degree of) presence or an absence of diabetic retinopathy (Figure 3.2), as judged by healthcare professionals. The images displayed varying degrees of brightness and orientation and were made using a funduscope, which takes images of the interior surface of the eye. Rather than numerical data describing image data, as in the Breast Cancer Wisconsin dataset, this dataset contained the images themselves. A clinician rated the presence of diabetic retinopathy in every image by assigning a value ranging from 0 to 4. The value 0 represented a lack of retinopathy, 1 mild retinopathy, 2 moderate retinopathy, 3 severe retinopathy, and 4 proliferative retinopathy [1].


Fig. 3.2 Random sample of 3 images from every category of the retinopathy dataset [1]. The images show different levels of illumination and orientation. Clear cases of retinopathy can be seen by the presence of clusters of white dots such as in 1391_left.

3.2.2 Model Types

A decision tree is a machine learning decision support tool that can be created manually or using data-driven methods. As such, it can be viewed as a method for both expert system creation (figure 3.1) and data-driven algorithm creation. A data-driven decision tree learns rules that are inferred from the structure and relations within the features in the data. These rules can be visualised in a tree graph and can also be expressed using boolean logic. The high level of interpretability makes decision trees white box models. In contrast, neural networks are considered black box models. Partly inspired by the biological mechanisms of the human brain, artificial neural networks can be used to ’learn’ relationships between data. The network engineers features within the data without requiring explicit feature engineering prior to the modelling. Neural networks can be used with numerical data, as is the case in the Wisconsin dataset, as well as with more complex data such as the images in the EyePacs dataset.
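As a sketch of this white-box property, the rules of a small tree fitted on the scikit-learn copy of the Wisconsin dataset can be printed directly as boolean threshold logic (assuming scikit-learn is available; the depth of 2 is arbitrary, chosen only to keep the printout short):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.tree import DecisionTreeClassifier, export_text

# Fit a shallow tree and print its learnt rules as boolean threshold logic
data = load_breast_cancer()
tree = DecisionTreeClassifier(max_depth=2, random_state=0)
tree.fit(data.data, data.target)
print(export_text(tree, feature_names=list(data.feature_names)))
```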

3.3 Model Creation

3.3.1 Wisconsin Data Model Creation

The Wisconsin Breast Cancer dataset was a preprocessed dataset: it contains no missing values for any of the features. A single exception was the feature Unnamed: 32, which does not contain any numerical values (figure 3.1). This column was removed. The ’diagnosis’ column was separated from the rest of the data, as this column contained the correct labels for the classification of each participant. The remaining 31 columns were used as training data.

Figure 3.3 shows a graph version of the data splitting, model creation, and evaluation stages. Because of the relatively low number of instances (569), the dataset was split using stratified 5-fold cross-validation. The data was randomly shuffled and divided into 5 folds of equal size, preserving the class ratio in each fold. Each combination of four folds was then used as training data and validated on the remaining fold. The ’diagnosis’ column was separated from both sets for later model evaluation. The decision tree model and neural network models were trained using the hyperparameters described in Tables C.2 and C.3 of appendix C respectively. The scores were evaluated using precision, recall, f1, and loss scores.
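The splitting step might be sketched as follows, using placeholder labels with an uneven class balance to show the stratification (the sizes here are invented, not the real dataset):

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold

# Placeholder data: 56 instances with an uneven class balance
y = np.array([0] * 36 + [1] * 20)
X = np.zeros((len(y), 5))

# Stratification keeps the class ratio roughly equal in each fold;
# in every iteration 4 folds train and the remaining fold validates
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
for train_idx, val_idx in skf.split(X, y):
    print(len(train_idx), len(val_idx))
```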

The dataset was used to create both a decision tree classifier and a neural network classifier. Preprocessing included the removal of the unnamed column (figure 3.2). The creation of the models requires tuning the hyperparameters. Hyperparameters are parameters of a model that are assigned a specific value before the training of the model. These values are either randomly selected or picked through processes such as an exhaustive grid search


Fig. 3.3 Graph version of the model creation and evaluation pipeline for the Wisconsin dataset.

which tries out various combinations of pre-defined settings and registers the best-performing combination of values. For the decision tree, these include the maximum tree depth, the minimum samples for splitting, and the minimum leaf samples. The neural network hyperparameters include layer size, network depth, epochs, and the batch size (the hyperparameters are explained in depth in Appendix B). The metrics used to assess the predictive strength of the models included accuracy, loss, f1, and receiver operating characteristic scores.
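An exhaustive grid search over the decision-tree hyperparameters named above could look like this sketch, where scikit-learn's GridSearchCV tries every combination of the candidate values (the grid values are arbitrary, not the thesis's actual settings):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)

# Arbitrary candidate values for the three tree hyperparameters from the text
param_grid = {
    "max_depth": [3, 5, 7],
    "min_samples_split": [2, 10],
    "min_samples_leaf": [1, 5],
}
# cv=5 evaluates every combination with 5-fold cross-validation
search = GridSearchCV(DecisionTreeClassifier(random_state=0), param_grid, cv=5)
search.fit(X, y)
print(search.best_params_)  # best-performing combination
```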

The hyperparameters for the decision tree and the neural network were selected through an exhaustive grid search using scikit-learn and through hyperas (distributed asynchronous hyperparameter optimisation), respectively. The models were trained on a training set and tested on a held-out test set containing 33% of the data, created using a shuffled random split. Transfer learning was performed by splitting the original dataset in half, training the neural network on the first half, and reusing the trained network on the second half of the dataset. The strength of the transfer learning model was assessed using the same metrics as for the non-transfer learning model.
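The split-and-transfer idea can be imitated with a simple linear model: with warm_start=True, scikit-learn reuses the coefficients learnt on the first half as the starting point when fitting on the second half. This is a loose analogue of transferring network weights, not the thesis's actual neural network setup.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression

X, y = load_breast_cancer(return_X_y=True)
X_a, y_a = X[:284], y[:284]  # first half: "source" task
X_b, y_b = X[284:], y[284:]  # second half: "target" task

# warm_start=True makes the second fit start from the first fit's coefficients
clf = LogisticRegression(max_iter=5000, warm_start=True)
clf.fit(X_a, y_a)  # pre-training on the first half
clf.fit(X_b, y_b)  # fine-tuning from the transferred coefficients
print(round(clf.score(X_b, y_b), 2))
```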

3.3.2 EyePacs Model Creation

The dataset was shuffled and split into a training set of 10000 images, a validation set of 1000 images, and a test set of 2000 images. All images were resized to a height and width of 224 by 224 pixels, both for computational reasons and to comply with the requirements of the transfer learning model that was used later on. The images were first vectorised into tensors of shape (224, 224, 3) for the dimensions (height, width, RGB channels). A batch dimension was added to allow batch processing of the vectorised images, creating a 4D tensor of shape (None, 224, 224, 3), where None represents an arbitrary batch number. As shown in figure 3.4, the training set was used to train the model. The validation set was used to test the predictive accuracy of the model at every epoch of the training process. The test set was used upon completion of the model to indicate its degree of generalisation. The metrics used for this model were categorical cross-entropy loss and accuracy during training, with additional recall, precision, and f1 scores for the evaluation of the training set. Hyperparameter optimisation was not possible due to a lack of computational resources.
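The vectorisation-and-batching step amounts to stacking the image tensors along a new leading axis, for example (using zero-filled stand-ins rather than real fundus images):

```python
import numpy as np

# Stand-ins for three resized fundus images of shape (height, width, channels)
images = [np.zeros((224, 224, 3), dtype=np.float32) for _ in range(3)]

# Stacking adds the batch dimension, yielding the 4D tensor shape
# (batch, 224, 224, 3) expected by convolutional networks
batch = np.stack(images, axis=0)
print(batch.shape)  # (3, 224, 224, 3)
```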

Fig. 3.4 Graph version of the model creation and evaluation pipeline for the Retinopathy dataset

3.4 Differences between the datasets

A major difference between these datasets was the type of data used. The Wisconsin dataset indirectly describes medical images through computed variables of the breast mass. The data was fed into the neural network model as a 2D tensor of shape (None, 30), with None representing an arbitrary batch number and 30 representing the number of features of the dataset. The EyePacs dataset was used in its original image form, without an intermediate step of creating features, other than the diagnosis given by the healthcare professional. Additionally, this model displayed validation accuracies and losses per epoch, showing the progress of the model ’learning’ the features within the data. Another difference was that the metrics of this model were measured against the performance of a healthcare professional who labelled the image data.

3.5 FHIR Resource

In cooperation with a FHIR core team expert, a logical model of a prototype FHIR Resource was created to store the knowledge learnt by the intelligent decision support systems. The logical model represents an abstracted data model of a non-core FHIR resource. This resource would enable self-learnt models to be interoperable, making it possible to share both the model architecture and the knowledge stored in the model weights. The structure of the resource was based on the native Keras model output format of the models created for the retinopathy dataset, the hierarchical data format (HDF5). The logical model was created using the tool Forge [35]. The model contained a hierarchical data format with a succession of layers representing the order of the network.
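To illustrate the idea, an instance of such a resource could capture the layer sequence and point to the serialised weights. Every field name below is an assumption for illustration only; it is not the actual logical model from this study nor part of the FHIR specification.

```python
import json

# Hypothetical instance of a non-core "model" resource; all field names here
# are invented for illustration and are not FHIR-defined
model_resource = {
    "resourceType": "NeuralNetworkModel",  # assumed name for the profiled resource
    "framework": "keras",
    "weightsFormat": "hdf5",
    "layers": [
        {"name": "dense_1", "type": "Dense", "units": 64, "activation": "relu"},
        {"name": "dense_2", "type": "Dense", "units": 5, "activation": "softmax"},
    ],
    "weights": "Binary/model-weights",  # reference to the serialised weight data
}
print(json.dumps(model_resource, indent=2))
```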


Chapter 4

Results

This chapter describes the results of both the scoping review and the decision support model creation. The scoping review results are split into quantitative and qualitative results. The decision support model results are exclusively quantitative. Additional tables and figures can be found in Appendix D.

4.1 Scoping Review

Figure 4.1 shows the results of the searching and screening process. The Scopus search included two articles that could not be retrieved after screening of the abstract; the Web of Science results similarly contained two articles that could not be retrieved after screening of the abstract. Ultimately, 38 articles were included. A full list of all the publications selected in the search and the additional publications can be found in Appendix A, figures A.1 and A.2. The relevant publications could be clustered by their scopes (table 4.1). These groups were selected using the keywords provided by the authors of the articles.

The number of publications per discipline could be quantified based on the type of journal a particular publication was published in. Figure 4.2 shows this in the form of a pie-chart as percentages of published articles by discipline. The field of Medical Informatics encompasses 36.8% of the publications. Biomedical Science and Health information technology publications are both equally represented at 13.2%. Computer Science related publications are present at 10.5%. The fields of Medicine, Health Informatics, Systems engineering, Biomedical Engineering, Computational Science and Public Health make up the remaining 26.3%.

The type of stakeholders could be identified by categorising the funder per study (figure 4.3). This was done to illustrate the possibility of bias as well as to gain a sense of the type of stakeholders. Fifteen publications were funded by public sector stakeholders. Three

Table 4.1 Included publications per cluster.

Category | Publications
Interoperability | Ali 2017; Legaz-Garcia 2016; Marcos 2013; Martinez-Salvador 2017; Padgham 2016; Staudigel 2017
EHR | Camara 2016; Demski 2016; Evans 2016; Meehan 2016
CDS | Castaneda 2015; Duggal 2015; James 2017; Menekse 2015; Middleton 2016; Oh 2015; Samal 2017
(SMART on) FHIR | Bender 2013; Bloomfield 2017; Bosl 2013; Clotet 2017; Kasthurirathne 2015; Mandel 2016; Sinha 2017; Ulrich 2016
AI in Medicine | Dewan 2015; Hung 2017; Li 2015; Yang 2017; Yoon 2017
Ethics | Powles 2017
Standards | Ash 2015; Blobel 2013; Cornet 2016
Meaningful use | Anani 2016; Delaney 2015; Ethier 2016; Haarbrandt 2016

Fig. 4.2 Pie chart showing percentage of published articles by discipline for the included publications.

publications were funded by private sector stakeholders. Fourteen publications did not report funding data. Six publications reported not having received special funding.

Of the included publications, most focus only on the concept they are listed under in table 4.1. Several publications incorporated multiple categories in their scope. Bosl et al. [37] created a web application using the SMART on FHIR platform to provide scalable decision support for medication adherence. Hung et al. [38] compared traditional machine learning algorithms with deep learning algorithms for predicting future stroke occurrence using EHR data, showing that deep learning based algorithms improve predictive power over more traditional algorithms. Castaneda et al. [39] investigated the effects of improved data integration via CDSS on patient outcomes, showing a need for AI systems that are able to integrate data and information into clinically relevant knowledge. Marcos et al. [19] used a dual-model methodology with a reference model and archetypes to deal with interoperability issues of CDSS and EHRs, showing that this provided a satisfactory approach to achieving interoperability between CDSS and EHRs. Duggal et al. [40] studied patient matching from disparate systems through scalable fuzzy matching algorithms for improved clinical decision support. None of the included publications investigated the possibility of creating reusable and interoperable intelligent decision support systems.

4.2 Clinical Decision Support Models

4.2.1 Wisconsin Model Architecture and Performance

The decision tree that was generated for classification of the Wisconsin Breast Cancer dataset can be seen in figure 4.4. This shows the general architecture of the tree that was generated, as well as the features that were most important for inducing branches in the tree. The tree contained 12 branches and 24 nodes. It was colour-coded, with orange representing a node with predominantly malignant instances, blue representing predominantly benign instances, and beige representing a (nearly) equal amount of malignant and benign instances.

The neural network that was created for the same dataset can be seen in figure 4.5. The key difference between this model and the tree model was that its architecture was predetermined and not self-generated. Starting from the bottom, the first layer is the dark blue ’dense_5’ (dense meaning fully connected) layer, which functions as the input layer. The green second and third fully connected layers are the hidden layers that perform the bulk of the computations. The turquoise top fully connected layer was the output layer, with a sigmoid activation function that forced a binary output for every input instance, i.e. M for malignant or B for benign.

Fig. 4.4 Decision Tree classifier for the Wisconsin Breast Cancer dataset.

Fig. 4.5 Data-driven Neural Network classifier Architecture for the Wisconsin Breast Cancer dataset.

Table 4.2 Performance metrics for the Wisconsin dataset decision tree and neural network models.

Metric | Decision Tree | Neural Network
Accuracy | 0.94 (±0.1) | 0.92 (±0.1)
Precision | 0.94 (±0.1) | 0.95 (±0.1)
Recall | 0.94 (±0.1) | 0.83 (±0.3)
F1-Score | 0.94 (±0.1) | 0.89 (±0.3)

Table 4.2 shows the performance metrics for the decision tree and the neural network.

4.2.2 EyePacs Model Architecture and Performance

The architecture of the baseline model for the EyePacs retinopathy dataset is shown in figure 4.6. This architecture was selected after finding optimal values for the hyperparameters. This network was more complex than the network shown in figure 4.5 due to the nature of the EyePacs image data. It contains multiple convolutional layers (with their activation functions shown as separate layers), pooling layers to sample the output of the convolutions, drop-out layers to prevent over-fitting, and the EyePacs-specific classification block. This block consisted of a Flatten layer to resize the data to a 1D vector for easier, more efficient processing, another drop-out layer to prevent over-fitting, a fully connected hidden layer, and an output layer with a softmax activation that forces an output of 5 categories.

The architecture of the VGG16 model (figure 4.7) is larger and more complex than the previous networks seen in figures 4.5 and 4.6. It contained 5 blocks of either 2 or 3 convolutional layers, each followed by pooling layers. The use-case-specific classification layer was built on top of the 5 blocks. The top right of the figure shows the original classification layer with a flattening layer, two hidden dense layers ’fc1’ and ’fc2’ with 4096 units each, and a prediction layer with a softmax activation function that forces the output into 1000 classes. The retinopathy classification block on the top-left side contained a ’basic’ sequence of reshaping (in this case similar to flattening into a 1D vector), hidden, drop-out, and output layers.

The performance of the ’Generic’ convolutional neural network model (Baseline) and the VGGNet-16 based model (Transfer Learning Initialisation) is shown in terms of the training set losses and accuracies, the validation set losses and accuracies, the test set losses and accuracies, and the evaluation times. The accuracy was defined as the fraction of correct classifications over the total number of classifications. The loss, a measure of the difference between the state of the model and the labelled truth, was the categorical cross-entropy. This is a generalised variant of the log loss where the loss decreases as the predicted probability approaches the actual labelled truth.

Fig. 4.6 Data-driven ’Generic’ Convolutional Neural Network Classifier Architecture for the EyePacs Retinopathy data.

Fig. 4.7 Transfer learning VGG16-based Convolutional Neural Network Classifier Architecture for the EyePacs Retinopathy data.
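For one-hot labels, the categorical cross-entropy can be written out in a few lines of NumPy. This is the generic definition, not the exact Keras implementation, and the sample predictions are invented.

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """Mean categorical cross-entropy over a batch of one-hot labels."""
    y_pred = np.clip(y_pred, eps, 1.0)  # avoid log(0)
    return float(-np.mean(np.sum(y_true * np.log(y_pred), axis=1)))

# Two samples, three classes: one confident and one less confident prediction
y_true = np.array([[1, 0, 0], [0, 1, 0]])
y_pred = np.array([[0.9, 0.05, 0.05], [0.3, 0.6, 0.1]])
print(round(categorical_cross_entropy(y_true, y_pred), 3))  # 0.308
```

Note how the loss shrinks as the predicted probability of the true class approaches 1, matching the description above.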

Figures 4.8 and 4.9 show the losses and accuracies for both the Baseline and the Transfer Learning model. The loss decreased and the accuracy increased with each subsequent epoch. The Baseline model started with a higher loss, which quickly dropped below the loss of the transfer learning initialised model. The transfer learned model started with a loss nearly half that of the Baseline model, which then remained relatively constant. The accuracies of both models showed a similar shape during the training phase.

Fig. 4.8 Training losses per epoch for the baseline model and the transfer learned model.

The validation loss and accuracy, evaluated upon the completion of every training epoch, show similar curves in figures 4.10 and 4.11. The loss value fluctuated for both models. The transfer learning initialised model started at a slightly lower loss but eventually had a slightly higher loss value upon completion of training. Both models achieved equal validation accuracies after 8 epochs.

The final evaluation of both loss and accuracy on the test set data can be seen in figure 4.12. The models showed very similar results, with near equal accuracies: 74.78% (± 0.02, 95% CI) for the baseline model versus 74.84% (± 0.02, 95% CI) for the transfer learning initialised model. The model losses were also similar, with a categorical cross-entropy of 0.77 for the baseline model and 0.75 for the transfer learning initialised model. A more detailed breakdown of the performance can be found in Appendix E.
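The ± 0.02 margin around the test accuracies can be reproduced with the normal (Wald) approximation to the binomial confidence interval. The sketch below is illustrative; the test set size used here (2000 images) is an assumption, not a figure taken from the thesis:

```python
import math

def accuracy_ci95(accuracy, n):
    """Half-width of the 95% Wald confidence interval for a proportion."""
    return 1.96 * math.sqrt(accuracy * (1 - accuracy) / n)

# Hypothetical test set size of 2000 images.
half_width = accuracy_ci95(0.7478, 2000)
print(f"74.78% ± {half_width:.3f}")  # ~± 0.019
```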

Performance wise, the total training time was 13 minutes for the baseline model and 20.5 minutes for the transfer learning model. Both models were trained using the same Tesla K80 GPU. The effects of the test batch size on the duration of the classification can be seen in figure 4.13.

Fig. 4.9 Training accuracy per epoch for the baseline model and transfer learned model.

Fig. 4.11 Validation set accuracy per epoch for the baseline model and transfer learned model.

Fig. 4.12 Model losses and accuracies for both the baseline model and the transfer learning initialised model.

Fig. 4.13 Model testing performances in wall time for increasing image batch sizes (log scaled).
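A batch-size timing measurement of this kind can be sketched generically. The snippet below uses a toy single-layer "classifier" as a stand-in for the trained network (it is not the thesis model) to show how wall time is recorded per batch size:

```python
import time
import numpy as np

rng = np.random.default_rng(0)
weights = rng.standard_normal((512, 5))  # toy stand-in for a trained model

def classify(batch):
    """Forward pass followed by argmax over the class scores."""
    return (batch @ weights).argmax(axis=1)

# Time the classification for increasing batch sizes.
for batch_size in (1, 10, 100, 1000):
    batch = rng.standard_normal((batch_size, 512))
    t0 = time.perf_counter()
    predictions = classify(batch)
    wall = time.perf_counter() - t0
    print(f"batch={batch_size:5d}  wall={wall:.6f}s")
```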

The progression of ’learning’ by the models could be visualised by plotting the input image that maximises the output of a given layer in the network. Figure 4.14 shows the visualisation of the 64 filters present in the first 2D convolutional layer of the model (see fig. 4.7, ’block1_conv’). As one of the earliest layers of the retinopathy-specific classifier, its understanding of the fundus images was abstract, but concepts of pixel clusters and edge detection were already present given the pre-trained nature of this network. Figure 4.15 shows the visualisation of the 32 filters in the last layer prior to the classification layer. Here several filters containing distinct eye shapes can be seen, indicating that the model has captured the visual features that make up a funduscopy image in multiple filter outputs.
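The activation-maximisation idea behind these filter visualisations can be shown in a deliberately simplified form. In the sketch below a single linear unit stands in for a convolutional filter; gradient ascent on the input drives the "image" toward the pattern the unit responds to most strongly (for a real network the gradient would be obtained by backpropagation):

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy "filter": one linear unit over a flattened 8x8 input.
w = rng.standard_normal(64)

def response(x):
    """Activation of the toy filter for input x."""
    return float(w @ x)

# Gradient ascent from a small random image.
# The gradient of (w @ x) with respect to x is simply w.
x = rng.standard_normal(64) * 0.1
lr = 0.1
history = [response(x)]
for _ in range(50):
    x = x + lr * w
    history.append(response(x))

# The activation grows monotonically; x now resembles the filter pattern.
print(history[0], history[-1])
```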

In contrast to the perception of neural networks as black boxes, visualisation techniques that provide insight into the decisions made by neural networks have recently surfaced. Saliency[41] is one of these techniques; when applied to the model it provides the image shown in figure 4.16. Saliency looks at the gradient of the output categories in the final layer of the model with respect to the given input image, to assess the pixel-by-pixel influence on the output. A heat map of sorts can then be made for every input image, showing the most influential pixel regions that determine the classification output.
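The saliency computation reduces to a gradient of the predicted class score with respect to the input. The toy sketch below uses a linear classifier, for which that gradient is available in closed form (it is the weight row of the predicted class); in a deep network the same quantity would come from backpropagation:

```python
import numpy as np

rng = np.random.default_rng(7)

n_pixels, n_classes = 64, 5
W = rng.standard_normal((n_classes, n_pixels))  # toy linear classifier
x = rng.standard_normal(n_pixels)               # flattened "input image"

logits = W @ x
predicted = int(np.argmax(logits))

# Saliency: |gradient of the predicted class score w.r.t. the input|.
# For a linear model this is the absolute weight row of that class.
saliency = np.abs(W[predicted])

# Indices of the five most influential "pixels" for this prediction;
# reshaped back to the image grid these would form the heat map.
top5 = np.argsort(saliency)[-5:]
print(predicted, top5)
```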


Fig. 4.14 Visualisation of the 64 filters in the first convolutional layer of the transfer learning initialisation model.

Fig. 4.15 Visualisation of the 32 filters in the second-to-last layer of the transfer learning initialisation model.


Fig. 4.16 Using the gradient of the output category with respect to the input image provides pixel-specific influence on the output. The images on the right show the salient image regions for the images on the left.


4.3 FHIR Resource

A logical model of a prototype FHIR resource was made to capture both the architecture and the knowledge (weights) of the created models [42]. It is based on the structure of the hierarchical data format [43]. A logical model represents an abstract data model that describes extensions to the core FHIR resources. By mapping models to this resource, structured data exchange of trained artificial neural networks could theoretically be achieved. The rendered version of this model can be found in figures D.1 and D.2 in appendix D. This FHIR resource could capture both the architecture of a model and its knowledge in the form of the weight values. The model was not implemented or tested using the models created in this study, as this lay outside the defined scope of this study.
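To give a feel for what such a resource could look like on the wire, the sketch below serialises a model as JSON in a FHIR-like shape. The element names, extension URLs, and the use of the Basic resource are illustrative assumptions, not the thesis's actual StructureDefinition:

```python
import json

# Hypothetical FHIR-style carrier for a trained neural network:
# one extension holds the architecture, one holds the weights payload.
model_resource = {
    "resourceType": "Basic",  # assumed carrier resource
    "id": "example-ann-model",
    "extension": [
        {
            "url": "http://example.org/fhir/ann-architecture",
            "valueString": json.dumps({
                "layers": [
                    {"type": "Dense", "units": 16, "activation": "relu"},
                    {"type": "Dense", "units": 2, "activation": "softmax"},
                ]
            }),
        },
        {
            "url": "http://example.org/fhir/ann-weights",
            "valueString": "base64-encoded HDF5 payload would go here",
        },
    ],
}

print(json.dumps(model_resource, indent=2))
```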


Chapter 5

Discussion

The Discussion section presents the main findings of the scoping review and the model creation. The strengths and weaknesses of both the scoping study and the model creation and evaluation process are discussed and considered in relation to other works, together with the clinical impact of the models. Finally, the implications that arise from this study are discussed.

5.1 Main Findings

A scoping review was conducted through a systematic search that included 38 publications, with 19 additional publications used for supporting information. The results show an increasing number of publications on CDSS in recent years, a trend further supported by the creation of an ISO standard for CDSS that is currently under development[44]. The publications in the scoping review give several examples of improved interoperability through the incorporation of new technologies. A case is made for web technologies, for modular and shareable CDSS, and for incorporating data-driven methods for the automated creation of clinical knowledge.

The scoping review reveals that there are large differences in the acceptance of standards in the healthcare domain[45,5,46,4]. HL7 project workgroups such as the Argonaut project[26] appear promising in potentially popularising one standard by having major technology companies such as Apple, Microsoft, and Google on board, together with major healthcare vendors such as Epic and Cerner. These workgroups have selected HL7 FHIR as the interoperability standard for healthcare data exchange, and together they create profiles based on the available Resources. Apple has recently published its Health Record API, which allows patients to retrieve health record data via FHIR Resources[47], offering health record data access directly on iOS mobile devices. REST-style architecture can improve interoperability through the creation of APIs to
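FHIR's REST style means that such data retrieval reduces to ordinary HTTP requests against a server base URL. The sketch below builds a FHIR search URL; the server base is a hypothetical placeholder, while the `[base]/ResourceType?parameters` shape follows the FHIR REST specification:

```python
from urllib.parse import urlencode

# Hypothetical FHIR server base URL.
BASE = "https://fhir.example.org"

def fhir_search(resource_type, **params):
    """Build a FHIR search URL, e.g. GET [base]/Patient?name=..."""
    query = urlencode(params)
    return f"{BASE}/{resource_type}?{query}" if query else f"{BASE}/{resource_type}"

# Search for a patient's haemoglobin observations (LOINC 718-7).
url = fhir_search("Observation",
                  patient="Patient/123",
                  code="http://loinc.org|718-7")
print(url)
```

An HTTP GET on such a URL would return a FHIR Bundle of matching resources, which is the mechanism APIs like Apple's Health Record API build upon.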
