
Development and evaluation of a semi-automated annotation process for electronic Case Report Forms

Martijn Gerard Kersloot

Master thesis
Medical Informatics
August 2019


Martijn Gerard Kersloot
December 2018 - August 2019
Student no. 10718559
m.g.kersloot@amsterdamumc.nl

Places of the scientific research project

Castor EDC
Paasheuvelweg 25-5D
1105 BP Amsterdam

Amsterdam UMC, location AMC
Department of Medical Informatics
Meibergdreef 9
1105 AZ Amsterdam

Mentors

Sebastiaan Knijnenburg, PhD
Chief Technology Officer
Castor EDC
sebastiaan@castoredc.com

Derk Arts, PhD
Chief Executive Officer
Castor EDC
derk@castoredc.com

Tutors

Ronald Cornet, PhD
Associate professor
Amsterdam UMC, location AMC
r.cornet@amsterdamumc.nl

prof. Ameen Abu-Hanna
Professor of Medical Informatics and PI
Amsterdam UMC, location AMC
a.abu-hanna@amsterdamumc.nl

Cover and title page design


Development and evaluation of a semi-automated annotation process for electronic Case Report Forms

Master thesis
Master of Medical Informatics
University of Amsterdam


Summary

Background

Medical research contributes to the knowledge, cure, and prevention of diseases. Therefore, the results from clinical studies need to be shared and compared to one another. It is, however, estimated that 80 percent of the data that researchers collect and share are ‘re-useless’, since most datasets and their metadata are not Findable, Accessible, Interoperable, and Reusable (FAIR). This emphasizes the need for software platforms to support researchers in the FAIRification of their research data at the source: the electronic Case Report Form (eCRF). This study aims to develop and evaluate a web-based application that allows researchers to annotate their eCRF with ontology concepts (terminology binding) suggested by a Natural Language Processing (NLP) system and assesses the quality of the suggestions and final annotations based on methods found in the literature.

Methods

To determine the methods available for performing and evaluating the task of terminology binding, two state-of-the-art reviews were carried out. A suggestion application was developed that extracts eCRF questions from the Electronic Data Capture (EDC) platform Castor EDC and provides SNOMED CT concept and LOINC code suggestions using BioPortal and local databases. Researchers in the EDC platform with projects containing more than five of the most-used questions were invited to use the system and annotate their eCRF questions. These annotations were assessed using the evaluation methods found in the literature and by the rating of two evaluators.

Results

There are three different methods for performing terminology binding in clinical research, ranked in order of preference: the reuse of annotations, semi-automated annotation, and manual annotation. The outcomes of terminology binding can be evaluated using binary evaluation and hierarchical evaluation. Sixteen research projects from four centers were included in the evaluation study, with a total of 225 eCRF questions. The quality of the suggestions provided by our suggestion tool is average to low, with an overall F1 score of 0.616. However, the final annotations made by researchers are of good quality, with 80.63% of value-ranked annotations rated as good.

Conclusion

Medical researchers are willing to make their research data FAIR, and our methods have shown that semi-automated terminology binding is an effective and efficient method to assist them in doing so. While the suggestions given by the suggestion algorithm are of average to low quality, researchers still select concepts that can be qualified as good. Future research is needed to improve the quality of the suggestions.

Keywords

Terminology binding, electronic Case Report Forms, FAIR Data, Ontology concepts, Natural Language Processing


Samenvatting

Background

Medical research contributes to the knowledge, cure, and prevention of diseases. The results of clinical studies therefore need to be shared and compared with one another. It is estimated, however, that 80 percent of the data that researchers collect and share is not reusable, since most datasets and their metadata are not Findable, Accessible, Interoperable, and Reusable (FAIR). Software platforms should therefore support researchers in making their research data FAIR at the source: the electronic Case Report Form (eCRF). This study aims to develop and evaluate a web-based application that allows researchers to annotate their eCRF with ontology concepts (terminology binding) suggested by a Natural Language Processing (NLP) system. The quality of the suggestions and of the final annotations is assessed using methods found in the literature.

Methods

To determine the methods with which the task of terminology binding is performed and evaluated, two state-of-the-art reviews were carried out. In addition, an application was developed that extracts eCRF questions from the Electronic Data Capture (EDC) platform Castor EDC and provides SNOMED CT concept and LOINC code suggestions using BioPortal and local databases. Researchers in the EDC platform with projects containing more than five of the most-used questions were invited to use the system and annotate their eCRF questions. These annotations were assessed using the evaluation methods from the literature and through the ratings of two evaluators.

Results

There are three different methods for performing terminology binding in clinical research, ranked in order of preference: the reuse of annotations, semi-automated annotation, and manual annotation. The outcomes of terminology binding can be evaluated using binary evaluation and hierarchical evaluation. Sixteen research projects from four centers were included in the evaluation study, with a total of 225 eCRF questions. The quality of the suggestions provided by our suggestion tool is average to low, with an overall F1 score of 0.616. The annotations made by researchers, however, are of good quality, with 80.63% of the annotations rated as good.

Conclusion

Medical researchers are willing to make their research data FAIR, and our methods have shown that semi-automated terminology binding is an effective and efficient method to assist them in doing so. Although the suggestions from the suggestion algorithm are of average to low quality, researchers still select concepts that can be qualified as good. Future research is needed to improve the quality of the suggestions.


Contents

List of abbreviations

1 General introduction
  1.1 Background
    1.1.1 Medical research data
    1.1.2 Electronic Case Report Forms
    1.1.3 FAIR Data
    1.1.4 Terminological systems
    1.1.5 Terminology binding
    1.1.6 Retrospective and prospective FAIRification
    1.1.7 Empowering researchers in the FAIRification process
    1.1.8 Natural Language Processing
  1.2 Objectives and research questions
  1.3 Outline
  References

2 Terminology binding in clinical research
  2.1 Introduction
  2.2 Methods
  2.3 Results
    2.3.1 Selecting an ontology
    2.3.2 Finding appropriate terminology concepts
    2.3.3 Selecting a concept
  2.4 Discussion and conclusion
  References

3 Measures to assess the quality of ontology concept annotations
  3.1 Introduction
  3.2 Methods
  3.3 Results
    3.3.1 Binary evaluation
    3.3.2 Hierarchical evaluation
  3.4 Discussion and conclusion
  References

4 Ontology concept suggestion generation for free-text eCRF questions
  4.1 Introduction
  4.2 Methods
    4.2.1 Data source
    4.2.2 Reference standard
    4.2.3 Participants
    4.2.4 Suggestion tool
    4.2.5 Outcome measures or evaluation criteria
    4.2.6 Methods for data analysis
  4.3 Results
    4.3.1 Demographic and other study coverage data
    4.3.2 Reference standard
    4.3.3 Suggestion tool
    4.3.4 Activity log
    4.3.5 Binary outcomes
    4.3.6 Ratings
    4.3.7 Semantic similarity
    4.3.8 Researchers' comments
    4.3.9 Unexpected events during the study
  4.4 Discussion
    4.4.1 Summary
    4.4.2 Answers to study questions
    4.4.3 Strengths and limitations of the study
    4.4.4 Results in relation to other studies
    4.4.5 Meaning and generalizability of the study
    4.4.6 Unanswered and new questions
  4.5 Conclusion
  References

5 General discussion and conclusion
  5.1 Main findings
  5.2 Practical relevance
  5.3 Future research
  5.4 Conclusion
  References

A R scripts used for the calculation of semantic similarity


List of abbreviations

API   Application Programming Interface
CRF   Case Report Form
DMP   Data Management Plan
eCRF  electronic Case Report Form
EDC   Electronic Data Capture
FAIR  Findable, Accessible, Interoperable, and Reusable
IC    information content
LCH   Leacock and Chodorow
lcs   Least Common Subsumer
NLP   Natural Language Processing


1 General introduction

1.1. Background

1.1.1. Medical research data

Medical research contributes to the knowledge, cure, and prevention of illnesses and diseases, and impacts daily clinical practice. Therefore, the results from clinical studies need to be shared and compared to one another to support efficient, evidence-based medicine [1] and to ensure progress in science [2]. Researchers and knowledge consumers are increasingly aware of this need [3], and a recent survey from Springer Nature among researchers from a variety of fields shows that 61 percent of the medical researchers share data in some way [4]. However, it is estimated that 80 percent of the data that researchers collect and share are ‘re-useless’ since most datasets are not machine-actionable (i.e., data that can be resolved on the web by web services [5]), nor machine-readable (i.e., data in a data format that can be automatically read and processed by a computer [6]) [7]. Making datasets machine-readable by standardizing them with metadata (i.e., data that describes and gives information about other data, for example the medical specialty to which the dataset applies), and sharing them, has many advantages. It can, for example, help to identify selective reporting and fraud, calculate and interpret pooled effect estimates, and understand and use datasets in the future [8, 9]. Moreover, standardized research data allows for combining datasets for single analysis studies, meta-analyses, and parameter modeling [9].

1.1.2. Electronic Case Report Forms

Data collection in clinical research is mostly structured and facilitated by the use of Case Report Forms (CRFs): printed, optical, or electronic documents designed to record all of the protocol-required information to be reported on each subject participating in the research project (e.g., demographic information and lab measurements) [10]. CRFs consist of multiple questions and are often subdivided into different sections. The development of CRFs forms a significant part of clinical trials and can affect study success, especially since the structure of the data that is collected on the CRFs has an impact on the interpretation and analysis of the collected data [11, 12]. Nowadays, the majority of the CRFs are implemented in Electronic Data Capture (EDC) systems as electronic Case Report Forms (eCRFs), making eCRFs the preferred method of data collection in clinical research [13–15]. An example eCRF is presented in Figure 1.1. The use of eCRFs over paper-based CRFs has many advantages, among which the improvement of the quality of clinical trials, halting the development of ineffective or unsafe drugs earlier, reducing unnecessary work, reducing cost, and accelerating time to market of new drugs [16–23].


Figure 1.1 | An example electronic Case Report Form (eCRF) in Electronic Data Capture (EDC) platform Castor EDC

1.1.3. FAIR Data

In 2016, a diverse group of researchers defined guidelines to enhance the reusability of research data: the FAIR Data Principles [24]. These principles state that (research) data should be Findable, Accessible, Interoperable, and Reusable (FAIR), both for researchers and machines. Funders, such as the European Union's Horizon 2020 [25] and The Dutch Research Council (NWO) [26], require researchers to put effort into making their data FAIR (which includes the standardization of their data) and require them to include their methodology in a Data Management Plan (DMP) when they apply for grants. FAIR Data accelerates innovation due to the primary use and secondary reuse of data and can, for example, reduce the time from drug discovery to market value by shortening the time for performing clinical trials [27]. Moreover, it allows for the development of more-segmented or more-personalised medicines by exploiting FAIR real-world data to match the best treatment to relevant patient cohorts, and it enables data sharing and collaborations across institutions and companies, in academia and in industry [27].

The process of making data FAIR (FAIRification) consists of seven main steps [28], as shown in Figure 1.2. First, one retrieves the data that needs to be made FAIR (1) and analyses the content of the data (2) in order to determine the concepts and relationships present in the data. Next, one finds a machine-readable interpretation of the concepts and relationships and transforms this into a semantic model that is machine-readable and machine-actionable (3). This resulting semantic model is then used to make the data linkable (4), to ensure machine-readability. After that, the researcher assigns a license for the usage of the data (5) and assigns metadata (e.g., information about the research project and dataset) to it (6). Lastly, the researcher deploys their data to a data repository (7) by uploading the dataset with the corresponding metadata. This research project focuses on steps 2 and 3 of the FAIRification process.


Figure 1.2 | The process of making data FAIR


1.1.4. Terminological systems

Data on eCRFs can be made readable and actionable, and thus FAIR, by using machine-readable definitions provided in medical terminological systems and attaching those definitions to questions on the eCRF. This process is known as terminology binding.

1.1.4.1. Types of terminological systems

A terminological system relates the concepts of a particular domain among themselves and provides their terms and possibly their definitions and codes [30]. The different types of terminological systems are described in Box 1.1. Often the term ontology is used to describe a terminological system. Ontologies provide, besides explicit formal specifications of terms, formalized relationships among terms in a domain [31].

Terminology List of terms referring to concepts in a defined particular domain

Thesaurus Terms are ordered, e.g., alphabetically and concepts are described by more than one (synonymous) term

Vocabulary Concepts have definitions, either formally or in free text

Nomenclature A set of rules for composing new complex concepts or the terminological system resulting from this set of composition rules

Classification Concepts are arranged using generic (is a) relationships

Coding system Codes designate concepts

Box 1.1 | Types of terminological systems, as described in [30]. Each terminological system is a terminology and possibly one of the other types.

This research project focuses on the terminological systems SNOMED CT [32], a coding system, vocabulary, classification, and thesaurus, and LOINC [33], a coding system and vocabulary.

1.1.4.2. Classes and codes

Classes, referred to as concepts in SNOMED CT, are the focus of most ontologies and describe the concepts in the domain of the ontology [34]. A class can have subclasses that represent concepts that are more specific than the superclass [34]. Figure 1.3 shows the hierarchical concept diagram of SNOMED CT concept 271649006 | Systolic blood pressure (observable entity) |. The concept Systolic blood pressure (b) is defined by its superclasses or ancestors (e.g., Cardiovascular observable (observable entity), a) and defines several subclasses or children (e.g., Lying systolic blood pressure, c).

[Figure: part of the SNOMED CT hierarchy from Observable entity, via Clinical history / examination observable, Cardiovascular observable, Cardiovascular measurement, and Blood pressure, down to Systolic blood pressure (the class), with subclasses such as Lying, Sitting, and Standing systolic blood pressure.]

Figure 1.3 | The hierarchical concept diagram of SNOMED CT concept 271649006 | Systolic blood pressure (observable entity) |, adapted from a diagram generated by [35].

LOINC consists of codes for each test, measurement, or observation that has a clinically different meaning [36]. Each code is distinguished across six dimensions called Parts. Box 1.2 shows the definition of the different Parts, accompanied by example Parts of LOINC code 806-0 (Leukocytes [#/volume] in Cerebral spinal fluid by Manual count).

Component The substance or entity being measured or observed. e.g., Leukocytes (white blood cells)

Property The characteristic or attribute of the analyte. e.g., NCnc (Number concentration)

Time The interval of time over which an observation was made. e.g., Pt (Point in time)

System The specimen or thing upon which the observation was made. e.g., CSF (Cerebral spinal fluid)

Scale How the observation value is quantified or expressed: quantitative, ordinal, nominal. e.g., Qn (Quantitative)

Method Optional | A high-level classification of how the observation was made.

Only needed when the technique affects the clinical interpretation of the results. e.g., Manual Count

Box 1.2 | Definitions and examples of LOINC Parts using LOINC code 806-0 (Leukocytes [#/volume] in Cerebral spinal fluid by Manual count), as described in [36].

1.1.5. Terminology binding

Terminology binding, often referred to as annotating, is the task of providing links (annotations) between the information model, in this case, the model describing the definitions of and relationships between eCRF questions, and the terminological system [37]. It is an important part of supporting data capture, retrieval and querying, and semantic interoperability [7, 37]. During the annotation process of eCRF questions, one aims to find one or more terms from terminological systems that match the description and context of the question, in order to provide the question with an unambiguous machine-readable definition. For example, the eCRF question Leukocyte count can be annotated with LOINC code 806-0 (Leukocytes [#/volume] in Cerebral spinal fluid by Manual count). Terminology binding is discussed in detail in Chapter 2.
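To make this binding concrete, an annotation can be stored as a small machine-readable record that links the question to the selected concept. The sketch below is illustrative only; the field names are our own assumptions and do not reflect the data model of Castor EDC or of the suggestion tool described later in this thesis.

```python
# Minimal, hypothetical representation of two terminology bindings for eCRF
# questions; field names are illustrative assumptions, the codes come from the
# examples used in this chapter.
annotations = [
    {
        "ecrf_question": "Leukocyte count",
        "terminology": "LOINC",
        "code": "806-0",
        "display": "Leukocytes [#/volume] in Cerebral spinal fluid by Manual count",
    },
    {
        "ecrf_question": "Systolic blood pressure",
        "terminology": "SNOMED CT",
        "code": "271649006",
        "display": "Systolic blood pressure (observable entity)",
    },
]
```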

1.1.6. Retrospective and prospective FAIRification

Currently, when researchers aim to make their research data FAIR, they annotate the final dataset, for example, in the form of a spreadsheet, with ontology concepts. This retrospective annotation process takes place after the medical research project is completed (Figure 1.4). A researcher designs their project, defines and submits a study protocol, builds a Case Report Form, conducts their study to collect data, and analyses the results. This results in a non-FAIR dataset that can be made FAIR by going through the FAIRification process as described in 1.1.3.


Figure 1.4 | Retrospective FAIRification: the process of a medical research project (gray), followed by the process of making the resulting dataset FAIR (white)

FAIR: Findable, Accessible, Interoperable, and Reusable.

However, since most steps contain coinciding tasks, it would be more efficient to incorporate the FAIRification process into the research project, thus making it a prospective process (Figure 1.5).


While a researcher designs their project and defines and submits the study protocol, the metadata of the study are already available. Moreover, during the definition and submission of the study protocol and the building of the Case Report Form, the concepts and relationships between concepts are already known and can be defined in a semantic model. While the study is conducted and data is being collected in, preferably, the eCRF, the data could be made linkable by the eCRF system. This ensures that at the end of the data collection period, the data is already FAIR. The researcher can then analyze the results of the study and also deploy the dataset, allowing other researchers to reuse their data.


Figure 1.5 | Prospective FAIRification: the process of making a dataset FAIR (white) throughout the process of a medical research project (gray)

FAIR: Findable, Accessible, Interoperable, and Reusable.

1.1.7. Empowering researchers in the FAIRification process

While prospective annotation is the most efficient solution for making data FAIR, the FAIRification process in general is still time-consuming and health care professionals need proper training to use ontologies and to make their data FAIR. A recent study by Leiden University and Elsevier shows that only 25 percent of the researchers state that they have received sufficient training in research data sharing [38]. In addition, 40 percent of the researchers have problems in organizing their data for sharing, according to Springer Nature's survey [4]. This emphasizes the need for software platforms to support researchers in the standardization and sharing of their research data [39]. One of the techniques that could be leveraged for this purpose is Natural Language Processing (NLP).

1.1.8. Natural Language Processing

NLP is a technique to process free text and can, for example, be used to identify ontology concepts relating to a free-text phrase. In a study by Christen et al., NLP is demonstrated to be effective in generating valuable ontology concept recommendations for questions in medical forms [40]. These recommendations can then be used to annotate the question in the form. Integrating such a recommendation plugin in existing eCRF software would enable researchers to annotate their CRFs while building them, saving extra work and time. There is no research, to our knowledge, that integrates such a recommendation plugin with an eCRF system.

1.2. Objectives and research questions

This study aims to identify the methods and evaluation measures used for terminology binding in clinical research. These methods and measures were used to develop and evaluate a web-based application that allows researchers to annotate their eCRF with ontology concepts recommended by an NLP system, thus making their eCRF data more FAIR. We address the following four research questions.

1. What is the preferred method for performing terminology binding in clinical research?

2. What measures can be used to assess the quality of ontology concept annotations?

3. What is the quality of ontology concept suggestions provided by a developed algorithm to encode free-text eCRF questions?

4. What is the quality of the annotations of eCRF questions made by researchers who used the ontology concept suggestion algorithm?


1.3. Outline

This thesis is structured in three main parts. In the first two chapters, the state-of-the-art methods for the task of and evaluation of terminology binding were assessed. Chapter 2 covers the state-of-the-art methods for terminology binding and aims to determine which method is preferred. Chapter 3 discusses the evaluation methods for the results of the task of terminology binding. The results of these two chapters are then used in Chapter 4 for the development and evaluation of a web-based annotation application. Chapter 5 discusses our main findings in relation to the literature and provides a conclusion.

References

[1] Valkenhoef GV, Tervonen T, Brock BD, Hillege H. Deficiencies in the transfer and availability of clinical trials evidence: A review of existing systems and standards. BMC Medical Informatics and Decision Making. 2012;12(1).
[2] Tenopir C, Dalton ED, Allard S, Frame M, Pjesivac I, Birch B, et al. Changes in Data Sharing and Data Reuse Practices and Perceptions Among Scientists Worldwide. PLoS ONE. 2015;10(8):e0134826.
[3] Borgman CL. The conundrum of sharing research data; 2012.
[4] Stuart D, Baynes G, Hrynaszkiewicz I, Allin K, Penny D, Lucraft M, et al. Practical Challenges for Researchers in Data Sharing. Whitepaper. 2018. p. 30. Available from: https://figshare.com/articles/Whitepaper_Practical_challenges_for_researchers_in_data_sharing/5975011.
[5] Starr J, Castro E, Crosas M, Dumontier M, Downs RR, Duerr R, et al. Achieving human and machine accessibility of cited data in scholarly publications. PeerJ Computer Science. 2015.
[6] Open Knowledge Foundation. Open Data Handbook - Machine Readable; 2018. Available from: http://opendatahandbook.org/glossary/en/terms/machine-readable/.
[7] Mons B, Neylon C, Velterop J, Dumontier M, Da Silva Santos LOB, Wilkinson MD. Cloudy, increasingly FAIR; Revisiting the FAIR Data guiding principles for the European Open Science Cloud. Information Services and Use. 2017.
[8] Chan AW, Song F, Vickers A, Jefferson T, Dickersin K, Gøtzsche PC, et al. Increasing value and reducing waste: Addressing inaccessible research. The Lancet. 2014;383(9913):257–266.
[9] Pasquetto IV, Randles BM, Borgman CL. On the Reuse of Scientific Data. Data Science Journal. 2017;16:1–9.
[10] International Council for Harmonisation of Technical Requirements for Pharmaceuticals for Human Use (ICH). Good Clinical Practice: Integrated Addendum to ICH E6(R1); 2016. Available from: https://www.ich.org/fileadmin/Public_Web_Site/ICH_Products/Guidelines/Efficacy/E6/E6_R2__Step_4_2016_1109.pdf.
[11] Nahm M, Shepherd J, Buzenberg A, Rostami R, Corcoran A, McCall J, et al. Design and implementation of an institutional case report form library. Clinical Trials. 2010;8(1):94–102. Available from: https://doi.org/10.1177/1740774510391916.
[12] Bellary S, Krishnankutty B, Latha MS. Basics of case report form designing in clinical research. Perspectives in Clinical Research. 2014;5(4):159–66. Available from: http://www.ncbi.nlm.nih.gov/pubmed/25276625.
[13] El Fadly A, Rance B, Lucas N, Mead C, Chatellier G, Lastic PY, et al. Integrating clinical research with the Healthcare Enterprise: From the RE-USE project to the EHR4CR platform. Journal of Biomedical Informatics. 2011;44:S94–S102. Available from: http://www.ncbi.nlm.nih.gov/
[14] Aiello EJ, Taplin S, Reid R, Hobbs M, Seger D, Kamel H, et al. In a randomized controlled trial, patients preferred electronic data collection of breast cancer risk-factor information in a mammography setting. Journal of Clinical Epidemiology. 2006;59(1):77–81. Available from: http://www.ncbi.nlm.nih.gov/pubmed/16360564.
[15] Pyke-Grimm KA, Kelly KP, Stewart JL, Meza J. Feasibility, Acceptability, and Usability of Web-Based Data Collection in Parents of Children With Cancer. Oncology Nursing Forum. 2011;38(4):428–435. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21708533.
[16] Le Jeannic A, Quelen C, Alberti C, Durand-Zaleski I, CompaRec Investigators. Comparison of two data collection processes in clinical studies: electronic and paper case report forms. BMC Medical Research Methodology. 2014;14(1):7. Available from: http://www.ncbi.nlm.nih.gov/pubmed/24438227.
[17] Mitchel JT, Kim YJ, Choi J, Park G, Cappi S, Horn D, et al. Evaluation of Data Entry Errors and Data Changes to an Electronic Data Capture Clinical Trial Database. Drug Information Journal. 2011;45(4):421–430. Available from: http://www.ncbi.nlm.nih.gov/pubmed/24058221.
[18] Thriemer K, Ley B, Ame SM, Puri MK, Hashim R, Chang NY, et al. Replacing paper data collection forms with electronic data entry in the field: findings from a study of community-acquired bloodstream infections in Pemba, Zanzibar. BMC Research Notes. 2012;5(1):113. Available from: http://www.ncbi.nlm.nih.gov/pubmed/22353420.
[19] Walther B, Hossin S, Townend J, Abernethy N, Parker D, Jeffries D. Comparison of Electronic Data Capture (EDC) with the Standard Data Capture Method for Clinical Trial Data. PLoS ONE. 2011;6(9):e25348. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21966505.
[20] Wildeman MA, Zandbergen J, Vincent A, Herdini C, Middeldorp JM, Fles R, et al. Can an online clinical data management service help in improving data collection and data quality in a developing country setting? Trials. 2011;12(1):190. Available from: http://www.ncbi.nlm.nih.gov/pubmed/21824421.
[21] Thwin SS, Clough-Gorr KM, McCarty MC, Lash TL, Alford SH, Buist DS, et al. Automated inter-rater reliability assessment and electronic data collection in a multi-center breast cancer study. BMC Medical Research Methodology. 2007;7(1):23. Available from: http://www.ncbi.nlm.nih.gov/pubmed/17577410.
[22] Kinnula S, Renko M, Tapiainen T, Pokka T, Uhari M. Post-discharge follow-up of hospital-associated infections in paediatric patients with conventional questionnaires and electronic surveillance. Journal of Hospital Infection. 2012;80(1):13–16. Available from: http://www.ncbi.nlm.nih.gov/pubmed/22036627.
[23] Fleischmann R, Decker AM, Kraft A, Mai K, Schmidt S. Mobile electronic versus paper case report forms in clinical trials: a randomized controlled trial. BMC Medical Research Methodology. 2017;17(1):153. Available from: https://doi.org/10.1186/s12874-017-0429-y.
[24] Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data. 2016;3:160018. Available from: http://www.pubmedcentral.nih.gov/articlerender.fcgi?artid=4792175&tool=pmcentrez&rendertype=abstract.
[25] European Commission. Guidelines on Data Management in Horizon 2020. 2013;(December):6. Available from: http://ec.europa.eu/research/participants/data/ref/h2020/grants_manual/hi/oa_pilot/h2020-hi-oa-data-mgt_en.pdf.
[26] The Netherlands Organisation for Scientific Research (NWO). Data management protocol. Available from: https://www.nwo.nl/en/policies/open+science/data+management.
[27] Wise J, de Barron AG, Splendiani A, Balali-Mood B, Vasant D, Little E, et al. Implementation and relevance of FAIR data principles in biopharmaceutical R&D. Drug Discovery Today. 2019. Available from: https://www.sciencedirect.com/science/article/pii/S1359644618303039?via%3Dihub.
[28] GO FAIR. FAIRification Process. Available from: https://www.go-fair.org/fields-of-action/go-build/fairification-process/.
[29] GO FAIR. FAIRification Process [Image]. Available from: https://www.go-fair.org/wp-content/uploads/2017/11/FAIRificationProcess-1.png.
[30] de Keizer NF, Abu-Hanna A, Zwetsloot-Schonk JH. Understanding terminological systems. I: Terminology and typology. Methods of Information in Medicine. 2000;39(1):16–21. Available from: http://www.ncbi.nlm.nih.gov/pubmed/10786065.
[31] Gruber TR. A translation approach to portable ontology specifications. Knowledge Acquisition. 1993;5(2):199–220. Available from: https://www.sciencedirect.com/science/article/pii/S1042814383710083.
[32] SNOMED International. SNOMED CT; 2019. Available from: http://www.snomed.org/snomed-ct.
[33] Regenstrief Institute, Inc. LOINC; 2019. Available from: https://loinc.org/.
[34] Noy NF, McGuinness DL. Ontology Development 101: A Guide to Creating Your First Ontology. Available from: www.unspsc.org.
[35] Valentia Technologies Limited. SnoChillies Browser. Available from: https://snochillies.com.
[36] Regenstrief Institute, Inc. LOINC Term Basics — LOINC. Available from: https://loinc.org/get-started/loinc-term-basics/.
[37] SNOMED International. 2.1 Terminology Binding - Expression Constraint Language - SNOMED Confluence; 2019. Available from: https://confluence.ihtsdotools.org/display/DOCECL/2.1+Terminology+Binding.
[38] Wouters P, Haak W. Open data: The researcher perspective. A study by Leiden University and Elsevier. 2017. p. 48. Available from: https://www.elsevier.com/__data/assets/pdf_file/0004/281920/Open-data-report.pdf.
[39] Mennes M, Biswal BB, Castellanos FX, Milham MP. Making data sharing work: The FCP/INDI experience. NeuroImage. 2013;82:683–691. Available from: http://dx.doi.org/10.1016/j.neuroimage.2012.10.064.
[40] Christen V, Groß A, Varghese J, Dugas M, Rahm E. Annotating Medical Forms Using UMLS. In: Ashish N, Ambite JL, editors. Data Integration in the Life Sciences. Cham: Springer International Publishing; 2015. p. 55–69.


2 Terminology binding in clinical research

2.1. Introduction

To achieve interoperability, the ’I’ in Findable, Accessible, Interoperable, and Reusable (FAIR), researchers should perform terminology binding and map questions that are used in their electronic Case Report Forms (eCRFs) to ontology concepts. This chapter focuses on the methods available for terminology binding in clinical research and aims to determine which method is preferred.

2.2. Methods

A state-of-the-art review (i.e., a review of the most current research in a given area or concerning a given topic [1]) was conducted in order to find literature related to the task of terminology binding. MEDLINE and Google Scholar were searched on January 3, 2019. In addition, we performed a general internet search using the Google search engine in order to find any gray literature. Only English articles and web pages were included. Combinations of the following search terms were used.

• Terminology
• Ontology
• Concept
• Controlled Vocabulary
• Medicine
• Case Report Form
• Medical Research
• Clinical Research

One reviewer assessed the titles and abstracts of the search results. Next, the selected full-text publications were assessed for relevance. Moreover, the reference lists of the retrieved publications were explored and publications that matched our criteria were included.

2.3. Results

2.3.1. Selecting an ontology

Before one can bind concepts to free-text terms, such as eCRF questions, a selection of one or more ontologies has to be made. The National Center for Biomedical Ontology's BioPortal [2], a portal listing the most used ontologies and their domains, helps researchers to find ontologies that match their field. BioPortal now contains 774 ontologies that include nearly 9,000,000 concepts. Choosing the right ontology from all the ontologies available is the first step in the terminology binding process. Best practice is to use ontologies that are most popular in the field of research and that contain most concepts matching the free-text terms [3]. Malone et al. [4] list ten rules to select the right biomedical ontology, as shown in Box 2.1. Their rules focus on the (research) domain, the definition of the concepts, and the development of the ontology.


1. The ontology should be about a specific domain of knowledge
2. The ontology should reflect current understanding of biological systems
3. The ontology classes and relationships should persist
4. Classes should contain textual definitions
5. Textual definitions should be written for domain experts
6. The ontology should be developed by the community but not incapacitated by it
7. The ontology should be under active development
8. Previous versions should be available
9. Open data requires open ontologies
10. Sometimes an ontology is not needed at all

Box 2.1 | Malone et al.'s ten rules for selecting a biomedical ontology
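BioPortal, mentioned above, also exposes a REST web service that can be queried to shortlist candidate ontologies for a set of free-text terms. The sketch below is a hedged illustration: the endpoint, parameters, and response fields follow our reading of the public NCBO BioPortal API (data.bioontology.org with apikey authentication) and should be verified against the current API documentation; the API key is a placeholder.

```python
import requests

BIOPORTAL_API = "https://data.bioontology.org"
API_KEY = "YOUR-BIOPORTAL-API-KEY"  # placeholder; request a key from BioPortal

def recommend_ontologies(text: str) -> list[str]:
    """Ask the BioPortal Recommender which ontologies best cover the given text.
    Endpoint and response field names are assumptions based on the public
    BioPortal REST API and may need adjusting."""
    response = requests.get(
        f"{BIOPORTAL_API}/recommender",
        params={"input": text, "apikey": API_KEY},
        timeout=30,
    )
    response.raise_for_status()
    acronyms = []
    for result in response.json():
        for ontology in result.get("ontologies", []):
            acronyms.append(ontology.get("acronym"))
    return acronyms

# Example usage with an eCRF-style phrase
print(recommend_ontologies("systolic blood pressure measured while lying down"))
```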

2.3.2. Finding appropriate terminology concepts

The amount of data to be encoded, as well as the large complexity of biomedical ontologies, makes it challenging for both automatic approaches and human experts to find correct annotations [5]. It is therefore recommended to reuse annotations of fellow researchers [6, 7]. In case these annotations are not available, several tools are used to automatically populate a list of candidate annotations [5, 8–10]. If these candidate annotations do not match the term, one can resort to manual annotation. These steps are visualized in Figure 2.1 and further described in the sections below.


Figure 2.1 | Annotation workflow, containing the reuse of annotations, lexical matching, and manual annotation

eCRF: electronic Case Report Form

2.3.2.1. Reusing annotations

Because most annotations are not accessible to the scientific community, researchers re-invent the wheel every time they have to find an ontology concept matching their eCRF question [7]. Initiatives such as the Portal of Medical Data Models [7] and the NIH Common Data Elements [11] enable researchers to share, find, and reuse annotated forms or their individual data items (questions) through repositories. In order to reuse an eCRF question, one performs a search in the repository. Depending on the results, the researcher is presented with relevant eCRF questions and mapped concepts [8]. If no match was found, if the mappings are incorrect, or if the mappings do not capture the semantics and context of


the question, the researcher can perform automated lexical matching to find matching concepts [8].

2.3.2.2. Semi-automated annotation

Automated lexical matching (a form of Natural Language Processing (NLP)) allows for the generation of a list of candidate annotations. First, one aims to find an exact match between the free-text question and the ontology concept’s description. If the free-text question is not in the same language as the ontology, (automatic) translators can be used to translate the label [12]. If no exact match is found in the ontology, an approximate search will be performed by normalizing the original search string (i.e., eliminating underscores, hyphen variations, and word order), as well as adding a wildcard (*) to the beginning and end of the string [8, 9]. A concept can be an exact match (same name, unit of measure and type), an ambiguous but likely match (similar or same name, issues with unit of measure or type), or there could be no match (no available concept) [10].
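The normalization and approximate-search steps described above can be sketched in a few lines. The snippet below is a simplified illustration under our own assumptions (function names and the toy concept list are not taken from the cited tools), showing an exact match on a normalized string with a substring-based fallback.

```python
import re

def normalize(label: str) -> str:
    """Normalize a free-text question or concept label: lowercase, replace
    underscores and hyphens with spaces, and sort the words to neutralize
    word-order differences."""
    label = re.sub(r"[_\-]+", " ", label.lower())
    return " ".join(sorted(label.split()))

def lexical_match(question: str, concept_labels: list[str]) -> list[str]:
    """Return candidate concept labels: exact matches on the normalized string
    first, otherwise an approximate (substring) match, comparable to adding a
    wildcard (*) to the beginning and end of the search string."""
    q = normalize(question)
    exact = [c for c in concept_labels if normalize(c) == q]
    if exact:
        return exact
    return [c for c in concept_labels if q in normalize(c) or normalize(c) in q]

# Toy concept list for illustration; a real run would use SNOMED CT or LOINC labels.
labels = ["Systolic blood pressure", "Lying systolic blood pressure", "Blood pressure"]
print(lexical_match("systolic_blood pressure", labels))
# ['Systolic blood pressure'] (exact match after normalization)
```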

Systems such as MetaMap [13], cTAKES [14], and NCBO BioPortal Recommender [15] also implement these lexical methods. Either the output of one tool can be used, or the output of multiple tools can be processed. A final list of candidate annotations of multiple systems can be obtained by taking the union (all concepts), intersection (concepts found by all tools), or majority (concepts found by a majority of tools) [5].
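The three ways of combining the output of multiple tools can be expressed as simple set operations over the candidate lists each tool returns. The sketch below is illustrative: the per-tool candidate sets are hard-coded stand-ins, not real MetaMap, cTAKES, or BioPortal Recommender output.

```python
from collections import Counter

def combine(candidates_per_tool: list[set[str]], strategy: str = "majority") -> set[str]:
    """Combine candidate concept identifiers from multiple annotation tools:
    'union' keeps concepts found by any tool, 'intersection' keeps concepts
    found by all tools, and 'majority' keeps concepts found by more than half
    of the tools."""
    if strategy == "union":
        return set().union(*candidates_per_tool)
    if strategy == "intersection":
        return set.intersection(*candidates_per_tool)
    counts = Counter(c for tool in candidates_per_tool for c in tool)
    return {c for c, n in counts.items() if n > len(candidates_per_tool) / 2}

# Illustrative stand-ins for the output of three tools; 271649006 is the SNOMED CT
# concept Systolic blood pressure, the other identifiers are placeholders.
tool_a = {"271649006", "concept-B"}
tool_b = {"271649006"}
tool_c = {"271649006", "concept-C"}

print(combine([tool_a, tool_b, tool_c], "union"))         # all three identifiers
print(combine([tool_a, tool_b, tool_c], "intersection"))  # {'271649006'}
print(combine([tool_a, tool_b, tool_c], "majority"))      # {'271649006'}
```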

2.3.2.3. Manual annotation

When there are no matching concepts found using lexical matching, one can use online terminology browsers to find concepts matching the eCRF question. Browsers such as the UTS Metathesaurus Browser [16] and the SNOMED browser [17] offer search features, as well as functions for showing the concept in its hierarchical position and the concept’s description.

2.3.3. Selecting a concept

Mapping concepts to terminologies is not trivial; it requires an understanding of the definition, purpose, and context of the described entity [18]. In order to find the concept that matches the eCRF question best, one has to take the definition of the concept and its position in the hierarchy into account [18, 19]. Moreover, one should also exploit the relationships with other concepts. Some ontology concepts may represent the same medical term, but with partially differing characteristics [20]; one example is "Systolic blood pressure" and "Lying systolic blood pressure". All these factors have to be taken into account to select a concept that is reused, is part of a list of candidate concepts, or is found using a browser. In situations where a concept is not found, one can attempt to create a post-coordinated expression [8].

2.4. Discussion and conclusion

The task of terminology binding is a non-trivial task, for the selection of ontologies and their concepts has to happen accurately. One should take the research domain, the definition of the concepts, the location of concepts in the hierarchy, and the relationships with other concepts into account. This ensures that the definition of the ontology concept matches the definition of the term.

The task of terminology binding can be performed in a semi-automatic manner. In an ideal situation, all researchers would share and reuse annotations from their colleagues. However, when there are no annotations available for reuse, a list of candidate annotations can be obtained using NLP systems that perform lexical matching. The output of these systems can be combined using the union, intersection, or majority of the concepts to increase accuracy and precision. A researcher can subsequently select the concept from the list that matches the free-text term best. In the case that no matching concepts are found, the researcher can always resort to manual coding using an ontology browser.

We aim to develop a web-based application for the annotation of eCRF questions with input from the literature, and the results of this review contribute to this evidence-based method of software development. One limitation of this study should be noted. Since we focused on a state-of-the-art review instead of a systematic review of the literature, we could have missed articles that were potentially relevant.


This research project focuses on the semi-automated method of terminology binding; the results of this review will be used in the methodology for the development of an ontology concept suggestion application. Chapter 4 covers the development and evaluation of the application and includes the terminology binding methods found in this chapter.

References

[1] Dochy F. A guide for writing scholarly articles or reviews for the Educational Research Review. Educational Research Review. 2006;4:1–2.
[2] Whetzel PL, Noy NF, Shah NH, Alexander PR, Nyulas C, Tudorache T, et al. BioPortal: enhanced functionality via new Web services from the National Center for Biomedical Ontology to access and use ontologies in software applications. Nucleic Acids Research. 2011;39(suppl_2):W541–W545.
[3] Han L, Yesha Y, Salkeld R, Martineau J, Ding L, Joshi A. Finding Appropriate Semantic Web Ontology Terms from Words. 2009.
[4] Malone J, Stevens R, Jupp S, Hancocks T, Parkinson H, Brooksbank C. Ten Simple Rules for Selecting a Bio-ontology. PLoS Computational Biology. 2016;12(2):e1004743. Available from: https://www.ncbi.nlm.nih.gov/pubmed/26867217.
[5] Lin YC, Christen V, Groß A, Domingos Cardoso S, Pruski C, Da Silveira M, et al. Evaluating and Improving Annotation Tools for Medical Forms. Available from: https://doi.org/10.1007/978-3-319-69751-2_1.
[6] Musen MA, Bean CA, Cheung KH, Dumontier M, Durante KA, Gevaert O, et al. The center for expanded data annotation and retrieval. Journal of the American Medical Informatics Association. 2015;22(6):1148–1152.
[7] Dugas M, Neuhaus P, Meidt A, Doods J, Storck M, Bruland P, et al. Portal of medical data models: information infrastructure for medical research and healthcare. Database. 2016;2016.
[8] Pathak J, Wang J, Kashyap S, Basford M, Li R, Masys DR, et al. Mapping clinical phenotype data elements to standardized metadata repositories and controlled terminologies: the eMERGE Network experience. Journal of the American Medical Informatics Association. 2011;18(4):376–386.
[9] McCray AT, Srinivasan S, Browne AC. Lexical methods for managing variation in biomedical terminologies. Proceedings Symposium on Computer Applications in Medical Care. 1994. p. 235–239. Available from: https://www.ncbi.nlm.nih.gov/pubmed/7949926.
[10] Bonney W, Doney A, Jefferson E. Standardizing biochemistry dataset for medical research. In: Proceedings of the International Joint Conference on Biomedical Engineering Systems and Technologies - Volume 5. SCITEPRESS - Science and Technology Publications, Lda; 2014. p. 205–210.
[11] Rubinstein YR, McInnes P. NIH/NCATS/GRDR® Common Data Elements: A leading force for standardized data collection. Contemporary Clinical Trials. 2015;42:78–80.
[12] Van Mulligen EM, Afzal Z, Akhondi SA, Vo D, Kors JA. Erasmus MC at CLEF eHealth 2016: Concept Recognition and Coding in French Texts.
[13] Aronson AR. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program. In: Proceedings of the AMIA Symposium. American Medical Informatics Association; 2001. p. 17.
[14] Savova GK, Masanz JJ, Ogren PV, Zheng J, Sohn S, Kipper-Schuler KC, et al. Mayo clinical Text Analysis and Knowledge Extraction System (cTAKES): architecture, component evaluation and applications. Journal of the American Medical Informatics Association. 2010;17(5):507–513.
[15] Jonquet C, Musen MA, Shah NH. Building a biomedical ontology recommender web service. In: Journal of Biomedical Semantics. vol. 1. BioMed Central; 2010. p. S1.
[16] U.S. National Library of Medicine. UMLS Terminology Browser – Metathesaurus. Available from: https://uts.nlm.nih.gov/metathesaurus.html.
[17] SNOMED International. SNOMED International Browser. Available from: https://browser.ihtsdotools.org/.
[18] Leroux H, McBride S, Lefort L, Kemp M, Gibson S. A method for the semantic enrichment of clinical trial data. Studies in Health Technology and Informatics. 2012;178:111–116.
[19] Lee DH, Lau FY, Quan H. A method for encoding clinical datasets with SNOMED CT. BMC Medical Informatics and Decision Making. 2010;10(1):53.
[20] Mucke R, Lobe M, Knuth M, Loebe F. A semantic model for representing items in clinical trials. In: Computer-Based Medical Systems, 2009. CBMS 2009. 22nd IEEE International Symposium on. IEEE; 2009. p. 1–8.


3 Measures to assess the quality of ontology concept annotations

3.1. Introduction

This research project focuses on the semi-automated method of terminology binding applied to electronic Case Report Form (eCRF) questions. To assess whether the ontology concept suggestions provided by a Natural Language Processing (NLP) system are correct and whether the suggestions selected by a researcher are correct, one should evaluate the quality of the system's and researchers' annotations. This chapter aims to determine which measures are used for assessing the quality of ontology concept annotations.

3.2. Methods

Equivalent to the methods in Chapter 2, a state-of-the-art review was conducted in order to find literature related to the quality measures of annotations. MEDLINE and Google Scholar were searched on January 3, 2019. In addition, we performed a general internet search using the Google search engine in order to find any gray literature. Only English articles and web pages were included. Combinations of the following search terms were used.

• Ontology
• Annotation
• Encoding
• Evaluate
• Metric
• Measure
• Semantic
• Coding
• Accuracy
• User
• Quality

We assessed the titles and abstracts of the search results. Next, the selected full-text publications were assessed for relevance. Moreover, we explored the reference lists of the retrieved publications and included publications that matched our criteria.

3.3. Results

Annotating clinical text with ontology concepts can be considered a classification task. Classification tasks are evaluated by comparing the task's results to a reference standard, where the majority of the standards is created by domain experts. They manually perform the classification task and their responses are combined to generate a reference standard [1]. There are two different types of classification tasks that focus on the classification of one single item or concept: binary and hierarchical classification [2]. Both tasks are evaluated with different outcome measures.

3.3.1. Binary evaluation

The results of the binary evaluation of ontology concept annotations can be classified into four main outcomes: true positives, true negatives, false positives, and false negatives (Box 3.1) [2].


True positive (TP) Clinical text annotated with ontology concept when the ontology concept is present in the reference standard

True negative (TN) Clinical text not annotated with ontology concept when the ontology concept is absent in the reference standard

False positive (FP) Clinical text annotated with ontology concept when the ontology concept is absent in the reference standard

False negative (FN) Clinical text not annotated with ontology concept when the ontology concept is present in the reference standard

Box 3.1 | Main evaluation outcomes for ontology concept annotations, adapted from [2].

Figure 3.1 visualizes the relationship between the four main outcomes. The circle with the dotted line represents the annotations made during the annotation task. The circle with the solid line represents the annotations included in the reference standard. If the results overlap, i.e., an annotation is made during the annotation task and is included in the reference standard, the middle part of the diagram (proportion true positives) becomes larger. In an ideal situation, there is a small proportion of false negative and false positive classifications and a large proportion of true positive classifications.

Figure 3.1 | A schematic overview of classification results, adapted from [3].

The four main evaluation outcomes can be visualized in a confusion matrix, a 2 × 2 contingency table [4] (Figure 3.2.1). If the system performs perfectly, there will be scores only in the diagonal positions, for any misclassification will be placed in the off-diagonal cells [5]. Confusion matrices allow for easy determination of confusion of classes [5] and thus show if the outcomes are skewed (e.g., a large number of true negatives and a high number of true positives). Several performance measures can be calculated based on the values of the confusion matrix, with precision (Equation 3.1, Figure 3.2.2), recall (Equation 3.2, Figure 3.2.3), F1 score (Equation 3.3), and accuracy (Equation 3.4, Figure 3.2.4) as measures that are used most in practice [2].

[Figure 3.2 | Confusion matrices for binary evaluation: panel (1) shows the 2 × 2 confusion matrix of task results (detected / not detected) against the reference standard (present / absent); panels (2), (3), and (4) highlight the cells involved in precision, recall, and accuracy.]


$$\mathit{Precision} = \frac{TP}{TP + FP} \tag{3.1}$$

$$\mathit{Recall} = \frac{TP}{TP + FN} \tag{3.2}$$

$$F_1\ \mathit{score} = 2 \times \frac{\mathit{Precision} \times \mathit{Recall}}{\mathit{Precision} + \mathit{Recall}} \tag{3.3}$$

$$\mathit{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \tag{3.4}$$

Precision focuses on the agreement of the ontology concepts with the annotations resulting from the annotation task, and recall focuses on the effectiveness of the annotation task in identifying ontology concepts [2]. The F1 score focuses on the relation between the ontology concepts in the reference standard and those resulting from the annotation task, and accuracy focuses on the overall effectiveness of the annotation task [2].
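As a concrete illustration of Equations 3.1–3.4, the short sketch below computes the four measures from the counts of a confusion matrix. The example counts are invented for illustration and are not results from this thesis's evaluation study.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict[str, float]:
    """Compute precision, recall, F1 score, and accuracy (Equations 3.1-3.4)
    from the four outcomes of a binary annotation evaluation."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {"precision": precision, "recall": recall, "f1": f1, "accuracy": accuracy}

# Illustrative counts only (not the study's actual results)
print(binary_metrics(tp=60, tn=20, fp=30, fn=15))
# precision ≈ 0.667, recall = 0.8, F1 ≈ 0.727, accuracy = 0.64
```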

3.3.2. Hierarchical evaluation

The hierarchical evaluation of ontology concept annotations is performed by calculating the semantic similarity between two concepts [6]. Figure 3.3 shows a part of the hierarchical concept diagram of SNOMED CT concept 271649006 | Systolic blood pressure (observable entity) |. Systolic blood pressure is similar to Blood pressure, for it is a type of Blood pressure, just as Lying systolic blood pressure is similar to Systolic blood pressure. However, Lying systolic blood pressure is less similar to Blood pressure, for Blood pressure is placed higher in the ontology's hierarchy. Concepts that are lower in the hierarchy have a higher information content (IC) (i.e., contain more information) than concepts higher in the hierarchy [6], for their definition is formed by all their superclasses. Thus, in Figure 3.3, Lying systolic blood pressure is the concept that contains the most information.

[Figure: part of the SNOMED CT hierarchy from Observable entity, via Blood pressure and Vascular measurements, down to Systolic blood pressure (the class) and its subclass Lying systolic blood pressure.]

Figure 3.3 | A part of the hierarchical concept diagram of SNOMED CT concept 271649006 | Systolic blood pressure (observable entity) |, adapted from a diagram generated by [7].

The types of semantic similarity in the biomedical domain can be roughly divided into knowledge-based and distributional-based similarity metrics [8–10]. Knowledge-based methods utilize existing knowledge sources, among which terminological systems, and can be divided into two measures: path finding measures and intrinsic IC measures [9–12]. Distributional methods utilize the concepts within a corpus and a knowledge source to compute similarity [9]. Since this project focused on classes from terminological systems, we focused on IC-based measures, for these metrics significantly and meaningfully outperform other knowledge-based semantic similarity metrics [6].

3.3.2.1. Intrinsic information content (IC) measures

Garla and Brandt [6] describe two intrinsic IC measures as the most meaningful measures: a similarity measure based on Leacock and Chodorow (LCH)'s measure [13] and a measure based on Pedersen et al.'s Path measure [9].

IC-based Leacock and Chodorow (LCH) measure

The LCH measure is based on the ratio of path length to the depth of the taxonomy on a logarithmic scale [13]. Equation 3.5 shows the original measure, where p is the number of concepts in the shortest path separating two concepts and d the maximum depth of the taxonomy. The adapted, IC-based, measure (Equation 3.6) takes the semantic distance ($\mathit{dist}_{jc}$) and the maximum IC present in the taxonomy ($ic_{max}$) into account, instead of the path length (p) and depth (d) [6]. The semantic distance is calculated using Jiang and Conrath's semantic distance (Equation 3.7) [14] and Sanchez et al.'s intrinsic IC measure (Equation 3.8) [10]. The Least Common Subsumer (lcs) in the semantic distance measure is defined as the closest common parent of both concepts. Sanchez et al.'s intrinsic IC measure focuses on the leaves and subsumers of concept c, where leaves(c) is the number of leaves (concepts without children) that are subclasses of the concept c and subsumers(c) the number of c's superclasses + 1 (c itself) [10]. max_leaves is defined as the maximum number of leaves in the taxonomy. The measure is expressed as a rate between 0 (not similar at all) and 1 (completely similar).

$$\mathit{sim}_{lch}(c_1, c_2) = -\log\left(\frac{p}{2d}\right) \tag{3.5}$$

$$\mathit{sim}_{lch\_ic}(c_1, c_2) = 1 - \frac{\log(\mathit{dist}_{jc}(c_1, c_2) + 1)}{\log(2 \times ic_{max} + 1)} \tag{3.6}$$

$$\mathit{dist}_{jc}(c_1, c_2) = IC_{intrinsic}(c_1) + IC_{intrinsic}(c_2) - 2 \times IC_{intrinsic}(\mathit{lcs}(c_1, c_2)) \tag{3.7}$$

$$IC_{intrinsic}(c) = -\log\left(\frac{\frac{\mathit{leaves}(c)}{\mathit{subsumers}(c)} + 1}{\mathit{max\_leaves} + 1}\right) \tag{3.8}$$

Box 3.2 (part 2) shows an example calculation of the IC-based LCH measure for Lying systolic blood pressure and Systolic blood pressure.

IC-based Path measure

The Path measure (Equation 3.9) calculates the similarity between two concepts by taking the inverse of the length of the path separating them (p) [9]. Similar to the IC-based LCH measure, the adapted, IC-based Path measure (Equation 3.10) replaces the length of the path (p) with Jiang and Conrath's semantic distance (dist_jc, Equation 3.7) [6, 14]. A value of 1 is added to the distance to avoid division by zero [6].

$$\mathrm{sim}_{path}(c_1, c_2) = \frac{1}{p} \tag{3.9}$$

$$\mathrm{sim}_{path\_ic}(c_1, c_2) = \frac{1}{\mathrm{dist}_{jc}(c_1, c_2) + 1} \tag{3.10}$$

Box 3.2 (part 3) shows the calculation of the IC-based Path measure for Lying systolic blood pressure and Systolic blood pressure.
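To make the measures above concrete, the following minimal Python sketch implements Equations 3.6 through 3.8 and 3.10 and reproduces the worked example in Box 3.2. It assumes natural logarithms and that the maximum IC in SNOMED CT equals the IC of a leaf concept; it is an illustration only, not the evaluation code used in this study.

```python
import math

def ic_intrinsic(leaves: int, subsumers: int, max_leaves: int) -> float:
    """Sánchez et al.'s intrinsic IC (Equation 3.8), from counts of leaves and subsumers."""
    return -math.log((leaves / subsumers + 1) / (max_leaves + 1))

def dist_jc(ic_c1: float, ic_c2: float, ic_lcs: float) -> float:
    """Jiang and Conrath's semantic distance (Equation 3.7)."""
    return ic_c1 + ic_c2 - 2 * ic_lcs

def sim_lch_ic(distance: float, ic_max: float) -> float:
    """IC-based LCH similarity (Equation 3.6)."""
    return 1 - math.log(distance + 1) / math.log(2 * ic_max + 1)

def sim_path_ic(distance: float) -> float:
    """IC-based Path similarity (Equation 3.10)."""
    return 1 / (distance + 1)

# Worked example: Systolic blood pressure (c1) vs. Lying systolic blood pressure (c2)
MAX_LEAVES = 240705                   # leaf concepts in SNOMED CT (see Box 3.2)
IC_MAX = math.log(MAX_LEAVES + 1)     # ≈ 12.39, assumed equal to the IC of a leaf concept

ic_c1 = ic_intrinsic(leaves=9, subsumers=9, max_leaves=MAX_LEAVES)    # ≈ 11.70
ic_c2 = ic_intrinsic(leaves=0, subsumers=11, max_leaves=MAX_LEAVES)   # ≈ 12.39
distance = dist_jc(ic_c1, ic_c2, ic_lcs=ic_c1)                        # lcs(c1, c2) = c1, so ≈ 0.69

print(round(sim_lch_ic(distance, IC_MAX), 2))   # 0.84
print(round(sim_path_ic(distance), 2))          # 0.59
```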

3.4. Discussion and conclusion

The outcomes of a terminology binding task can be evaluated using two different methods: binary evaluation and hierarchical evaluation. Binary evaluation methods classify the results into four main outcomes: true positives, true negatives, false positives, and false negatives. Various performance measures can be calculated from these classes, with precision, recall, F1 score, and accuracy as the measures that are used most in practice. The hierarchical evaluation of ontology concept annotations is performed by calculating the semantic similarity between two concepts, with the IC-based LCH and IC-based Path measures as the most meaningful performance measures.

Several strengths of this state-of-the-art review should be noted. First, a clear overview of terminology binding evaluation methods enables other studies to find appropriate methods for the evaluation of their annotations. Second, our review included studies that evaluated the evaluation methods and thus demonstrated their value for practical use. Third, we believe that the use of visualizations and examples in this review helps to understand the subject matter better in comparison to an abstract listing of the formulas used. A limitation is that this review used the state-of-the-art review methodology instead of a systematic approach, so potentially relevant publications could have been missed.

The evaluation methods described in this chapter are used in Chapter 4 for the evaluation of the web-based application that suggests ontology concepts to researchers. However, we believe that future studies covering the task of terminology binding could benefit from our overview of evaluation methods as well.

(1) Parameters

• max_leaves: The maximum number of leaves in SNOMED CT is 240705 (the number of SNOMED CT concepts without children).

• ic_max: The maximum information content (IC) in SNOMED CT is ≈ 12.39.

• lcs(c1, c2): The Least Common Subsumer (lcs) of both concepts is Systolic blood pressure, since Lying systolic blood pressure is a type of Systolic blood pressure.

• IC_intrinsic(c1) = IC_intrinsic(lcs(c1, c2)): Systolic blood pressure has 9 leaves and 9 subsumers (including itself), so the IC is $-\log\left(\frac{9/9 + 1}{240705 + 1}\right) \approx 11.70$.

• IC_intrinsic(c2): Lying systolic blood pressure has 0 leaves and 11 subsumers (including itself), so the IC is $-\log\left(\frac{0/11 + 1}{240705 + 1}\right) \approx 12.39$. Lying systolic blood pressure is also a subclass of Lying blood pressure, hence 11 subsumers instead of 10 (9 of Systolic blood pressure + 1 (Lying blood pressure) + 1 (itself)).

• dist_jc(c1, c2): The semantic distance is based on the calculated ICs, and is $11.70 + 12.39 - 2 \times 11.70 \approx 0.69$.

(2) IC-based Leacock and Chodorow (LCH) measure

• sim_lch_ic(c1, c2): The calculated semantic distance and the maximum IC of SNOMED CT can be filled into the adapted LCH formula (Equation 3.6), which results in $1 - \frac{\log(0.69 + 1)}{\log(2 \times 12.39 + 1)} \approx 0.84$.

(3) IC-based Path measure

• sim_path_ic(c1, c2): The calculated semantic distance can be filled into the adapted Path formula (Equation 3.10), which results in $\frac{1}{0.69 + 1} \approx 0.59$.

Box 3.2 | Example calculation of the IC-based LCH and Path measures



References

[1] Hripcsak G, Wilcox A. Reference standards, judges, and comparison subjects: roles for experts in evaluating system performance. Journal of the American Medical Informatics Association. 2002;9(1):1–15. Available from: https://www.ncbi.nlm.nih.gov/pubmed/11751799.

[2] Sokolova M, Lapalme G. A systematic analysis of performance measures for classification tasks. Information Processing & Management. 2009;45(4):427–437. Available from: https://www.sciencedirect.com/science/article/abs/pii/S0306457309000259.

[3] Klintberg A. Explaining precision and recall - Andreas Klintberg - Medium; 2017. Available from: https://medium.com/@klintcho/explaining-precision-and-recall-c770eb9c69e9.

[4] Pearson K. On the theory of contingency and its relation to association and normal correlation; On the general theory of skew correlation and non-linear regression. Cambridge University Press; 1904.

[5] Meyer-Baese A, Schmid V. Foundations of Neural Networks. In: Pattern Recognition and Signal Analysis in Medical Imaging; 2014. p. 197–243. Available from: https://www.sciencedirect.com/science/article/pii/B9780124095458000078.

[6] Garla VN, Brandt C. Semantic similarity in the biomedical domain: an evaluation across knowledge sources; 2012. Available from: http://code.google.com/p/ytex.

[7] Valentia Technologies Limited. SnoChillies Browser. Available from: https://snochillies.com.

[8] Agirre E, Alfonseca E, Hall K, Kravalova J, Paşca M, Soroa A. A Study on Similarity and Relatedness Using Distributional and WordNet-based Approaches. Available from: http://www.lsi.upc.es/.

[9] Pedersen T, Pakhomov SVS, Patwardhan S, Chute CG. Measures of semantic similarity and relatedness in the biomedical domain. Journal of Biomedical Informatics. 2007.

[10] Sánchez D, Batet M. Semantic similarity estimation in the biomedical domain: An ontology-based information-theoretic perspective. Journal of Biomedical Informatics. 2011.

[11] Al-Mubaid H, Nguyen HA. Measuring semantic similarity between biomedical concepts within multiple ontologies. IEEE Transactions on Systems, Man and Cybernetics Part C: Applications and Reviews. 2009.

[12] Caviedes JE, Cimino JJ. Towards the development of a conceptual distance metric for the UMLS. Journal of Biomedical Informatics. 2004.

[13] Leacock C, Chodorow M. Combining Local Context and WordNet Similarity for Word Sense Identification. In: WordNet: An Electronic Lexical Database. vol. 49; 1998. p. 265.

[14] Jiang JJ, Conrath DW. Semantic Similarity Based on Corpus Statistics and Lexical Taxonomy. In: Proceedings of the 10th Research on Computational Linguistics International Conference (ROCLING X); 1997.

4

Ontology concept suggestion generation for free-text eCRF questions

4.1. Introduction

As mentioned in Chapter 2, terminology binding in medicine can be performed using three methods: the reuse of annotations [1, 2], semi-automated annotation [3–6], and manual annotation. When there are no annotations available for reuse, or when the quality of existing annotations is low, researchers will resort to finding concepts themselves, either using semi-automated methods or through manual annotation [4].

The semi-automated and manual methods of terminology binding require a good understanding not only of the data, but also of the structure of terminological systems. For example, when a researcher aims to annotate the eCRF question Creatinine, they search for matching concepts in terminological systems, such as SNOMED CT. The SNOMED CT International Browser returns 199 matches and 76 unique concepts for creatinine. Since the question is listed in an electronic Case Report Form (eCRF), the concept 365756002 | Finding of creatinine level (finding) | would be the best fit: it relates both to the measured substance (creatinine) and to the finding (level). However, the first concept in the list is 15373003 | Creatinine (substance) |, which only relates to the substance. Without proper training or guidance, a researcher would presumably pick the first concept without looking at the position of the concept in the hierarchy, for the label of the concept (Creatinine) fully matches the question. We therefore hypothesize that researchers would benefit from a list of ontology concept suggestions, ordered by their relevance. Concepts that are unlikely to be related to eCRFs (e.g., Creatinine (substance)) would be positioned lower in the list than concepts that are likely to appear on eCRFs (e.g., Finding of creatinine level (finding)). Recent research has shown that Natural Language Processing (NLP) algorithms are able to provide such lists of recommendations for medical forms [7]. However, to our knowledge, there are no prior studies that have examined the application of an NLP algorithm to eCRF questions.

This chapter focuses on the method of semi-automated annotation of eCRF questions. We aim to determine the quality of ontology concept suggestions provided by an NLP algorithm and the quality of the resulting annotations made by researchers. These quality assessments allow us to examine whether semi-automated annotation by researchers results in annotations with a quality similar to annotations by experts.
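To illustrate how such a ranked list of candidate concepts can be retrieved programmatically, the sketch below queries the NCBO BioPortal search API for SNOMED CT matches of a question label. This is an illustration only, not the suggestion tool developed in this study; the placeholder API key is hypothetical, and the parameters and ranking behaviour shown should be verified against the current BioPortal documentation.

```python
import requests

BIOPORTAL_API_KEY = "your-api-key"  # hypothetical placeholder; obtain a key at https://bioportal.bioontology.org

def search_snomed(term: str, max_results: int = 5) -> list[dict]:
    """Query the BioPortal search endpoint for SNOMED CT concepts matching a question label."""
    response = requests.get(
        "https://data.bioontology.org/search",
        params={"q": term, "ontologies": "SNOMEDCT", "pagesize": max_results},
        headers={"Authorization": f"apikey token={BIOPORTAL_API_KEY}"},
    )
    response.raise_for_status()
    # Each hit carries a preferred label and a concept URI; the API returns them ranked by match score.
    return [
        {"label": item.get("prefLabel"), "uri": item.get("@id")}
        for item in response.json().get("collection", [])
    ]

for candidate in search_snomed("creatinine"):
    print(candidate["label"], candidate["uri"])
```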

4.2. Methods

This study consists of three phases. In the first phase, we extracted the most commonly used eCRF questions from an Electronic Data Capture (EDC) platform. Next, these eCRF questions were annotated, resulting in a reference standard. In the second phase, we developed a web-based tool that allows researchers to annotate their eCRF questions using ontology concept suggestions. Lastly, in the third phase, we invited users of the EDC platform to use the tool and assessed the tool's performance and the researchers' annotation accuracy using the developed reference standard and expert input.

4.2.1. Data source

We used metadata originating from research projects in Castor EDC [8], an Electronic Data Capture platform. Metadata about research projects (identifier, name, and contact person), their eCRFs (identifier, name), and questions on the eCRFs (label, variable name, type) were extracted from the system.
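For illustration, project and question metadata such as this could be retrieved through the Castor EDC REST API. The sketch below is not the extraction pipeline used in this study; the base URL, OAuth token endpoint, resource paths, and response field names are assumptions based on the publicly documented API and should be verified against the current Castor EDC API documentation.

```python
import requests

BASE_URL = "https://data.castoredc.com"  # assumed API host; check the Castor EDC documentation
CLIENT_ID = "your-client-id"             # hypothetical credentials
CLIENT_SECRET = "your-client-secret"

def get_token() -> str:
    """Obtain an OAuth2 access token via the client-credentials flow (assumed token endpoint)."""
    response = requests.post(
        f"{BASE_URL}/oauth/token",
        data={"grant_type": "client_credentials", "client_id": CLIENT_ID, "client_secret": CLIENT_SECRET},
    )
    response.raise_for_status()
    return response.json()["access_token"]

def list_study_fields(study_id: str, token: str) -> list[dict]:
    """Fetch question (field) metadata for one study; path and field names are assumptions."""
    response = requests.get(
        f"{BASE_URL}/api/study/{study_id}/field",
        headers={"Authorization": f"Bearer {token}"},
    )
    response.raise_for_status()
    fields = response.json().get("_embedded", {}).get("fields", [])
    return [
        {"label": f.get("field_label"), "variable": f.get("field_variable_name"), "type": f.get("field_type")}
        for f in fields
    ]
```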

4.2.2. Reference standard

In order to generate the reference standard, we included questions from eCRFs in research projects with the following characteristics:

• Actual research project (not labeled as test or pilot project)
• Running research project (start date was not empty)
• Research project has collected > 1000 data points
  (research projects with this number of data points can be classified as running clinical trials or registries)

Box 4.1 | Inclusion criteria for the reference standard

The language of all questions was detected and the questions were translated into English using the Google Translate Application Programming Interface (API) [9], if needed. Afterwards, the questions were transformed into lower case and all non-alphanumeric characters were removed. To exclude questions that were used in only a minor part of the research projects, we generated a frequency table of all questions and only included questions in the reference standard that occurred in more than 1% of the projects. We furthermore excluded questions that were part of the platform's example eCRF and elements on the eCRF that were not questions (e.g., section headers).
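The sketch below illustrates the normalization and frequency-filtering step described above, assuming the questions have already been translated into English; the helper names and toy data are hypothetical, and the actual implementation may differ.

```python
import re
from collections import Counter

def normalize(question: str) -> str:
    """Lower-case a (translated) question label and strip non-alphanumeric characters."""
    return re.sub(r"[^a-z0-9 ]+", "", question.lower()).strip()

def frequent_questions(projects: dict[str, list[str]], threshold: float = 0.01) -> list[str]:
    """Return normalized questions that occur in more than `threshold` of the projects."""
    counts = Counter()
    for questions in projects.values():
        # count each question at most once per project
        counts.update({normalize(q) for q in questions})
    minimum = threshold * len(projects)
    return [q for q, n in counts.items() if n > minimum]

# Toy example with three hypothetical projects
projects = {
    "study-a": ["Date of birth", "Gender", "Creatinine"],
    "study-b": ["Date of birth", "Weight (kg)"],
    "study-c": ["Date of birth", "Gender"],
}
print(frequent_questions(projects))
```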

Three researchers in the field of semantic data (PvD, AJ, MK) manually annotated the resulting questions with SNOMED CT concepts and LOINC codes. After the annotation task, a meeting was set up to reach consensus on the annotations.

Out of the list of most-used questions, a new list was formed with the ten most-used eCRF questions. Duplicates, such as questions in both Dutch and English (e.g., geboortedatum and date of birth), and ambiguous questions (e.g., comments) were removed. A selection of five most-used questions and five questions that were not among the most used was used as the development set for the suggestion tool.

4.2.3. Participants

We included researchers who were using the Castor EDC platform and were coordinating research projects with the following characteristics:

Inclusion criteria from Box 4.1
+ Research project includes eCRFs that use ≥ 5 out of the 10 questions that are most used in the EDC platform
+ Research project is coordinated by a Dutch researcher
  (this allowed for fast communication and troubleshooting)

Box 4.2 | Inclusion criteria for participants

Researchers received an email with information about the study and a link to a form that allowed them to register themselves as participants of the study using Zoho Campaigns, Forms, and Survey [10].
