Ontology Matching in Practice: Facilitating Ontology Alignments for Interoperable Research Data


Author:

Philip van Damme

Master’s thesis

Master of Medical Informatics


Author: Philip van Damme

Student number: 10742549

Email: p.vandamme@amsterdamumc.nl

Tutors:

Ronald Cornet, PhD
Associate professor
r.cornet@amsterdamumc.nl

Nicolette de Keizer, PhD
Full Professor
n.f.keizer@amsterdamumc.nl

Mentor:

Jesualdo Tomás Fernández Breis, PhD
Full Professor

jfernand@um.es

Location:

Universidad de Murcia
Facultad de Informática
Campus de Espinardo
CP 30100 Murcia, Spain

Amsterdam UMC - AMC
Dept. of Medical Informatics
Meibergdreef 9
1105 AZ Amsterdam, The Netherlands

Period:


Abstract

Objective

This study analyzes the performance of automated ontology matching techniques in the domain of rare diseases. Current efforts in the rare disease community focus on integrating data distributed over multiple sources, which creates a need for automated ontology matching. Additionally, this study analyzes how these techniques can be applied to a practical use case: creating a machine-readable classification of vascular anomalies. An ontology based on this classification was matched to the Orphanet Rare Disease Ontology (ORDO), SNOMED CT, and the NCI Thesaurus (NCIt). The key rationale of this study is to contribute to interoperable data, which is part of the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles.

Methods

Three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0) were used in an experimental study. Pairwise alignments containing equivalence mappings between NCIt-ORDO, NCIt-SNOMED CT, and ORDO-SNOMED CT were created. Modularization techniques were applied to extract a locality-based module from each ontology, using seed signatures based on rare disease data items. The performance of the matching systems was evaluated against reference alignments from BioPortal and the Unified Medical Language System (UMLS). In addition, mappings were evaluated by examining their top-level hierarchies. Finally, the matching systems were applied to a new ontology based on a classification of vascular anomalies.

Results

Evaluation of the NCIt-ORDO pair resulted in F1-scores of 0.53 (UMLS) and 0.42 (BioPortal). The NCIt-ORDO pair had F1-scores of 0.43 (UMLS) and 0.58 (BioPortal). The NCIt-SNOMED CT pair had the highest F1-scores of 0.60 (UMLS) and 0.78 (BioPortal). AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, and 0.55 for BioPortal and 0.66, 0.53, and 0.58 for the UMLS, respectively. Evaluation with manually created top-level hierarchy mappings showed that, on average, 10% of mappings contained classes that belonged to an incorrect hierarchy. Application of the matching systems to the ontology based on the classification of vascular anomalies resulted in a coverage of 44% with ORDO, 35% with NCIt, and 36% with SNOMED CT. The precision of those alignments varied between 0.11 and 0.31.

Discussion

Measuring the performance of ontology matching systems is not a trivial task, as the requirements strongly depend on the practical application. This study found that currently available matching systems can produce meaningful mappings without user intervention. Moreover, combining universal reference alignments (BioPortal, UMLS) and analyzing top-level hierarchies seems to be promising for the automatic selection of useful mappings. This study concludes that available ontology matching systems can contribute to achieving interoperable data, considering limitations regarding the evaluation of mappings. Future research should focus on the automatic selection of useful mappings following real-life use cases.

Keywords


Summary

Objective

This study examines the performance of systems that can automatically match concepts from different ontologies. The motivation for this study is the ambition within rare disease research to combine data from different sources. In addition, this study analyzes how ontology matching can be applied to a practical example, by matching an ontology for vascular anomalies to the Orphanet Rare Disease Ontology (ORDO), SNOMED CT, and the NCI Thesaurus (NCIt). The aim of this study is to contribute to the realization of interoperable and reusable data, as part of the FAIR principles (Findable, Accessible, Interoperable, Reusable).

Methods

An experimental study was carried out with three ontologies (NCIt, SNOMED CT, ORDO) and three matching systems (AgreementMakerLight 2.0, FCA-Map, LogMap 2.0). The pairs NCIt-ORDO, NCIt-SNOMED CT, and ORDO-SNOMED CT were matched. A smaller module was also generated from each of the three ontologies. The performance of the systems was assessed against a reference standard obtained from BioPortal and the Unified Medical Language System (UMLS). In addition, the matched concepts were evaluated based on their hierarchy. Finally, the matching systems were applied to a new ontology for vascular anomalies.

Results

Evaluation of NCIt-ORDO resulted in F1-scores of 0.53 (UMLS) and 0.42 (BioPortal). NCIt-ORDO had F1-scores of 0.43 (UMLS) and 0.58 (BioPortal). NCIt-SNOMED CT had the highest F1-scores, namely 0.60 (UMLS) and 0.78 (BioPortal). The systems AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 had F1-scores of 0.55, 0.46, and 0.55, respectively, based on BioPortal. The F1-scores for the UMLS were 0.66, 0.53, and 0.58. Evaluation with the manually matched top-level hierarchies showed that, on average, 10% of the total number of matched concepts could be marked as incorrect. Applying the systems to the ontology for vascular anomalies showed that it had an overlap of 44% with ORDO, 35% with NCIt, and 36% with SNOMED CT. The precision of these matched concepts was between 0.11 and 0.31.

Discussion

Measuring the performance of matching systems is not straightforward, because the interpretation of the results depends strongly on the practical application of the matched concepts. This study shows that available matching systems are able to find useful matches automatically. In addition, the use of reference standards and the analysis of hierarchies appear to be promising. This study concludes that matching systems can contribute to interoperable and reusable data. A remaining challenge is the automatic selection of useful matches based on their application; future research should focus on this.

Keywords

Chapter 1

Introduction

1.1 Context of this study

1.1.1 Reusable research data

The generation, collection, and usage of data are crucial for scientific research. Consequently, the sharing and reuse of research data have become increasingly important. In Europe, a report published in 2010 by the European High-Level Expert Group on Scientific Data urged immediate action towards an open approach to science and the related digital infrastructure [1]. Furthermore, the European Union’s Horizon 2020 programme focuses on improving access to scientific publications and research data [2]. Data sharing can be as simple as researcher A sending a dataset to researcher B, but it becomes more challenging as soon as researchers do not know each other personally or do not know which data is available. Data reuse additionally requires information about the data: researcher A could give such information to researcher B, who can then reuse the data shared by A. However, when considering multiple research groups, universities, countries, or even continents, specific arrangements are needed to make data reuse possible. Such arrangements can be included in a data management plan (DMP); DMPonline is an example of a tool for researchers to create and share DMPs [3]. To support the harmonization of data management in Europe, Science Europe and the Dutch Research Council (NWO) published core requirements for DMPs [4]. Such requirements help individual researchers, research organizations, funding agencies, and research communities to align their DMPs.

Reusable data does not only relate to humans but also to machines, as machine-readable data enables both humans and machines to browse and discover information. In 2016, a group of researchers and other stakeholders with an interest in the findability and reuse of research data published a set of principles to promote the reuse of research data by machines and humans [5]. These principles were presented as the FAIR Guiding Principles for scientific data management and stewardship (Findable, Accessible, Interoperable, and Reusable). The FAIR acronym has since been widely adopted by the research community to refer to reusable data. The process of making data FAIR is often referred to as FAIRification, which has been described in seven steps by the GO FAIR initiative (see Figure 1.1) [6]. The FAIRification process can also be described as ‘pre-FAIRification’ (identify objective, analyze (meta)data), FAIRification (define semantic model, make (meta)data linkable, and host (meta)data), and ‘post-FAIRification’ (assess the FAIR data) [7]. It is important to note that FAIR data, and thus reusable data, does not equal open data. Hence, accessibility is part of the FAIR Guiding Principles [5].

1.1.2 Ontologies and ontology matching

In information science, an ontology can be defined as a ‘formal, explicit specification of a shared conceptualization’ [8]. An ontology describes concepts and the relationships between those concepts of a domain and can be made available in multiple languages. BioPortal is an open repository of biomedical ontologies and includes at the time of writing 835 ontologies [9]. An example of a biomedical ontology is SNOMED CT [10], which includes healthcare terminology on multiple domains and in multiple languages. Another example is the Orphanet Rare Disease Ontology (ORDO) [11], which focuses on rare diseases. When making data FAIR, a semantic model of the data needs to be created (step 3), and the data needs to be made linkable (step 4) (Figure 1.1). Ontologies are useful for both steps, as they provide a consensus about some domain.

Figure 1.1. Seven steps of the FAIRification process. Adapted from GO FAIR [6]. This study focuses on step 3 (define the semantic model) and step 4 (make data linkable).

Different ontologies can contain classes describing the same domain or concepts, which makes them overlap. Using an ontology should improve the interoperability of data; however, no interoperability is achieved if researchers use different ontologies to describe the same concept(s). Ontology matching addresses the problem of having to deal with multiple ontologies. It aims to make ontologies interoperable by matching semantically related concepts from different ontologies, resulting in alignments between ontologies. Such ontology alignments allow for interoperability even when multiple ontologies are involved.

A specific community (e.g. the rare disease community) could define standards to which the entire community should adhere, including specific ontologies. Ontology matching would then be of use to achieve interoperability within a community and between communities, as different communities might use different standards. Examples of achieving interoperability by defining community standards, adhering to the FAIR data principles, can already be found [12].

1.1.3 Combining FAIR data

Combining FAIR data, thus creating the ability to ask questions across multiple sources (e.g. datasets), should be the outcome of the FAIRification process [6]. Ontology matching can make data interoperable when datasets are annotated using different ontologies [13]. Several initiatives embrace the FAIR principles to create an environment where existing data becomes reusable, such as the Personal Health Train [14]. The Personal Health Train focuses on health data and assumes data to be FAIR. The data stays at its source (‘stations’), while users send queries or use cases (‘trains’) to interrogate data across multiple sources.

In January 2019 the European Joint Programme on Rare Diseases (EJP RD) started, a large Europe-wide project focusing on creating an ecosystem for rare disease research and care [15]. One of the objectives of the EJP RD is to build a FAIR-compliant virtual platform that describes rare disease resources and enables researchers to interrogate data from multiple resources at different locations. These sources can use different ontologies, which requires ontology matching in order to enable interoperability.


1.2 Problem statement

Making data interoperable involves annotating the data with ontology classes which, as mentioned in Section 1.1.2, introduces the need for ontology alignments when multiple (overlapping) ontologies are involved. Ontology alignments could be created manually, but that rapidly becomes infeasible due to the large size of biomedical ontologies (e.g. SNOMED CT contains over 350,000 classes). Moreover, datasets change over time, as do ontologies, which creates the need for dynamic ontology matching services [13]. Many matching techniques and systems have been developed, but there is a lack of papers describing real-life applications [16]. An example of a real-life application could be the implementation of ontology matching for a web service that queries data over multiple sources. Most matching systems, as a consequence, do not consider a specific real-life use case but rather focus on tasks concerning the alignment of specific ontologies. Consider two datasets containing data about rare diseases, where the first dataset annotates diseases using SNOMED CT and the second one using ORDO. An alignment between ORDO and SNOMED CT would then be necessary to achieve interoperability without human intervention. A systematic analysis of term overlap and term reuse across biomedical ontologies in BioPortal found an approximate overlap of over 25% of ontology concepts and less than 9% of reuse [17]. Another work studied the reuse of logical axioms in biomedical ontologies and discovered that 49 out of 123 ontologies did not apply any type of reuse [18]. Hence, even when data is annotated with ontology classes it is not by definition interoperable, as such an ontology would need to be matched to other (similar) ontologies. Matching classes from different ontologies is a non-trivial task to which this study aims to contribute.

1.2.1 Use case: a classification for vascular anomalies

European Reference Networks (ERNs) are networks (24 in total) that aim to connect medical expertise across Europe and exchange information [19]. Each ERN focuses on a different group of rare diseases. VASCERN [20] is the ERN on rare multisystemic vascular diseases and aims to build a Registry of Vascular Anomalies (VASCA) based on the FAIR data principles. The VASCA registry is being built using the common data elements for rare disease registries, part of which is the set of data elements for the diagnosis of a patient. For the latter, VASCA has to register vascular tumors and malformations according to a classification of the International Society for the Study of Vascular Anomalies (ISSVA) [21]. Currently, this classification is only available as a PDF and is not machine-readable. Therefore, efforts were made by VASCA to transform this classification into an ontology. Ultimately, the classes in this ontology should be mapped to ORDO and other relevant ontologies, to which this study aims to contribute using ontology matching. Ontology matching should help the ISSVA ontology to be interoperable with other ontologies, and ensure that existing ontologies are reused instead of building a new ontology from scratch.


1.3 Objective and research questions

This study aims at analyzing the performance of existing automated ontology matching techniques, and their practical application in the domain of rare disease research. In addition, it studies the usage of ontology matching techniques for combining and/or relating data for querying over multiple sources. See Figure 1.2 for a simplified example. Hence, this project contributes to the facilitation of interoperable, thus FAIR, research data. The following research questions will be discussed:

1. What is the performance of automated ontology matching techniques to expose mappings between ontologies used in the rare disease research domain?

2. To what extent are currently available techniques for ontology matching useful for implementation in FAIR-related projects that focus on the integration of multiple data sources?

3. How can ontology matching be used in practice while creating new and using existing ontologies to create machine-readable data, with regard to term overlap and reuse?

Figure 1.2. Example of an ontology matching service. Two sources are annotated with different ontologies (SNOMED CT and ORDO). An ontology matching service can make these two sources interoperable by matching equivalent annotations. Questions like ‘Which countries have data available about multiple sclerosis?’ could then be answered.
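The idea behind Figure 1.2 can be illustrated with a minimal sketch. It assumes two in-memory sources and a hand-written alignment; all records and class identifiers below are hypothetical placeholders and not real SNOMED CT or ORDO codes.

```python
# Hypothetical records from two sources; all identifiers are placeholders.
source_a = [  # annotated with SNOMED CT
    {"country": "Netherlands", "diagnosis": "SNOMEDCT:multiple-sclerosis"},
    {"country": "Spain", "diagnosis": "SNOMEDCT:cystic-fibrosis"},
]
source_b = [  # annotated with ORDO
    {"country": "Germany", "diagnosis": "ORDO:multiple-sclerosis"},
]

# The alignment: a set of equivalence mappings stored as pairs of class identifiers.
alignment = {("SNOMEDCT:multiple-sclerosis", "ORDO:multiple-sclerosis")}


def equivalents(term, alignment):
    """Return the term plus every class mapped to it as equivalent."""
    result = {term}
    for left, right in alignment:
        if left == term:
            result.add(right)
        elif right == term:
            result.add(left)
    return result


def countries_with(term, sources, alignment):
    """List the countries that hold records annotated with the term or an equivalent class."""
    wanted = equivalents(term, alignment)
    return sorted(
        {record["country"] for source in sources for record in source
         if record["diagnosis"] in wanted}
    )


countries = countries_with("SNOMEDCT:multiple-sclerosis", [source_a, source_b], alignment)
print(countries)
```

Without the alignment, the same query would only reach the source that happens to use the ontology of the query term.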


1.4 Outline

This thesis is divided into five chapters. The first chapter introduces the context of the study, states the problem, and describes the objective and research questions. The second chapter provides the necessary background information. The third chapter explains the methods that were used. The fourth chapter presents the results. The fifth chapter discusses the results, puts the research in the context of related work, and presents the conclusions.


Chapter 2

Background

2.1 The Semantic Web and interoperability

Making data machine-readable is the goal of the Semantic Web [22]. The Semantic Web describes a set of technologies to create a collection of interlinked data (Linked Data) on the web; those technologies are set as standards by the World Wide Web Consortium (W3C) [23]. All things on the Semantic Web need to be unambiguously identified by Uniform Resource Identifiers (URIs). W3C standards for the Semantic Web include:

• Resource Description Framework (RDF), which represents data as subject/predicate/object triples. RDF allows for graph representations.

• RDF Schema (RDFS), an extension of RDF providing a vocabulary to create data models of RDF data.

• SPARQL, an RDF query language that allows querying RDF represented data.

• Web Ontology Language (OWL), a Semantic Web language designed to specify knowledge in a precise way. OWL extends the RDF and RDFS standards and allows for reasoning.

Semantic interoperability means that machines can exchange data unambiguously and with the same meaning [23]. The mentioned techniques of the Semantic Web support semantic interoperability. The FAIRification process as described by GO FAIR (Figure 1.1) includes creating a semantic model and making data linkable, which can be done using Semantic Web technologies.
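To make the triple model concrete, the following minimal sketch stores subject/predicate/object triples as plain Python tuples and answers a simple pattern query, loosely mimicking a SPARQL basic graph pattern. It does not use an RDF library, and the `ex:` identifiers are made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple; the ex: identifiers are made up.
triples = {
    ("ex:patient1", "ex:hasDiagnosis", "ex:CysticFibrosis"),
    ("ex:patient2", "ex:hasDiagnosis", "ex:MultipleSclerosis"),
    ("ex:CysticFibrosis", "rdfs:subClassOf", "ex:RareDisease"),
}


def match(pattern, triples):
    """Return the triples matching a pattern; None acts as a wildcard, like a SPARQL variable."""
    return sorted(
        triple for triple in triples
        if all(part is None or part == value for part, value in zip(pattern, triple))
    )


# Roughly corresponds to: SELECT ?s ?o WHERE { ?s ex:hasDiagnosis ?o }
diagnoses = match((None, "ex:hasDiagnosis", None), triples)
for subject, _, obj in diagnoses:
    print(subject, "is diagnosed with", obj)
```

Because every statement has the same three-part shape, data from different sources can be merged by taking the union of their triple sets, provided the identifiers refer to the same things.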

2.2 Ontologies

As mentioned in Chapter 1, ontologies provide a formal description of concepts and the relationships between those concepts. OWL is the international standard of W3C to represent ontologies. Ontologies form an important part of the Semantic Web and the FAIRification process (Figure 1.1). Creating a semantic model of data includes the usage of appropriate ontological entities and making data linkable involves the application of Semantic Web technologies such as RDF and OWL. Figure 2.1 shows an example of the concept Cystic fibrosis screening (procedure) in the hierarchy of SNOMED CT, which is defined as a screening of respiratory disease and has three properties.

2.2.1 Module extraction

Figure 2.1. Example of the concept Cystic fibrosis screening (procedure) in SNOMED CT, from the SNOMED CT Browser [10]. The concept (171191008) is a subclass of ‘Respiratory disease screening (procedure)’ (171228002) and has three properties (called attributes in SNOMED CT): 1) ‘Method (attribute)’ (260686004) is ‘Evaluation - action (qualifier value)’ (129265001), 2) ‘Has focus (attribute)’ (363702006) is ‘Cystic fibrosis (disorder)’ (190905008), and 3) ‘Has intent (attribute)’ (363703001) is ‘Screening - procedure intent (qualifier value)’ (360156006).

Due to the typically large size of (biomedical) ontologies, it can be useful to extract a smaller module. Such a smaller part of a larger ontology makes it easier to use and understand the ontology. The OWL API includes a syntactic locality module extractor [24]. This module extractor takes a seed signature as input, which contains a list of classes from the parent ontology on which the module(s) should be based. Using this seed signature, the module extractor can extract three different types of modules: star, bottom, and top. A top module includes all subclasses and (sub)properties (e.g. the properties mentioned in Figure 2.1) of the classes in the seed signature; a bottom module does the opposite by including the superclasses and (super)properties. Disjointness between classes is also included, e.g. if class A is disjoint with B, then B will be included if A is included and vice versa. A star module combines both strategies by including the intersection of the top and bottom modules. Both the asserted and inferred versions of the ontologies are used. The difference between a top, bottom, and star module is shown as a simplified example in Figure 2.2.
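The three module types described above can be approximated with a small sketch. This is a deliberate simplification and not the syntactic locality algorithm of the OWL API: it only follows subclass relations in a hypothetical single-inheritance hierarchy and ignores properties and disjointness axioms.

```python
# Hypothetical hierarchy, stored as child -> parent (single inheritance for brevity).
parent = {"B": "A", "C": "A", "D": "B", "E": "B", "F": "D", "G": "F"}


def ancestors(cls):
    """All superclasses of cls, found by walking up the hierarchy."""
    found = set()
    while cls in parent:
        cls = parent[cls]
        found.add(cls)
    return found


def descendants(cls):
    """All subclasses of cls, found recursively."""
    children = {child for child, par in parent.items() if par == cls}
    found = set(children)
    for child in children:
        found |= descendants(child)
    return found


def top_module(seed):
    """Seed classes plus all of their subclasses."""
    return set(seed).union(*(descendants(cls) for cls in seed))


def bottom_module(seed):
    """Seed classes plus all of their superclasses."""
    return set(seed).union(*(ancestors(cls) for cls in seed))


def star_module(seed):
    """Intersection of the top and bottom modules."""
    return top_module(seed) & bottom_module(seed)


seed = {"B", "F"}
print(sorted(top_module(seed)), sorted(bottom_module(seed)), sorted(star_module(seed)))
```

In this toy hierarchy the star module keeps only the seed classes and the class that lies on the path between them, which shows why star modules tend to be the smallest of the three.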

2.3 Ontology matching

Ontology matching aims to make ontologies interoperable by matching semantically related concepts from two or more ontologies, resulting in alignments between ontologies. Such ontology alignments allow for interoperability even when multiple ontologies are involved, as mentioned in Chapter 1. For example, the Unified Medical Language System (UMLS) [25] integrates over 100 vocabularies and several key biomedical ontologies including SNOMED CT and ORDO. Some basic definitions are given below, adapted from [26] and modified where needed for the scope of this work.

Definition 2.3.1. (Ontology matching) Matching is the process of finding relationships between concepts of different ontologies. The matching process can be defined as a function $f$:

$A' = f(O, O', A, p, r)$

where $A'$ is the output alignment of a pair of ontologies $O$ and $O'$, $A$ is an optional reference alignment that would be extended or completed, $p$ is a set of parameters (e.g. weights) used during the matching process, and $r$ denotes the possible external resources that are used to match the ontologies (e.g. vocabularies).

Definition 2.3.2. (Alignment) An alignment $A$ is a set of mappings between classes of two ontologies $O$ and $O'$ and is the output of the matching process.

Figure 2.2. Example of different module types: (a) whole ontology, (b) top module, (c) bottom module, (d) star module. The classes B and F (red) are used as seed signature; classes marked in blue are included in a module. The top module contains all subclasses of the seed signature classes (and properties), the bottom module contains all superclasses (and properties), and the star module contains the intersection of top and bottom. Sibling classes are disjoint, meaning that C is also included because B has a disjointness axiom with C.

Definition 2.3.3. (Mapping) A mapping $m$ is the relation, according to an alignment, between different classes of two ontologies. Some papers refer to a mapping as a correspondence. Formally, a mapping can be defined as a triple over a pair of ontologies $O$ and $O'$ and a set of mapping relations $\Theta$:

$m = \langle e, e', r \rangle$

where $e \in O$, $e' \in O'$, and $r \in \Theta$.

Moreover, a mapping can include metadata such as a confidence value and identifiers, although metadata is not used in this work.

Example 2.3.3. (Mapping) The concepts ‘Physician’ in SNOMED CT and ‘Doctor’ in LOINC share the same Concept Unique Identifier (CUI) in the UMLS, which means that they are semantically equivalent. An equivalence mapping m between ‘Physician’ and ‘Doctor’ could then look like:

$m = \langle \texttt{http://purl.bioontology.org/ontology/SNOMEDCT/309343006}, \texttt{http://purl.bioontology.org/ontology/LNC/LA18968-0}, \equiv \rangle$

The English word ‘doctor’ is a homonym that can refer to either a medical practitioner or an academic degree for someone who has obtained a doctorate. This example shows that context of the domain is important for ontology matching. That is, whether or not the ontologies describe the same domain (e.g. medicine).
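The definitions of a mapping and an alignment translate directly into a small data structure. The sketch below is one possible representation, not the format used by any particular matching system; the class name `Mapping` and its field names are chosen here for illustration, and metadata such as a confidence value is left out because it is not used in this work.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mapping:
    """A mapping m = <e, e', r> between a class of O and a class of O'."""
    source: str    # e, the IRI of a class in ontology O
    target: str    # e', the IRI of a class in ontology O'
    relation: str  # r, a mapping relation such as "≡" for equivalence


physician_doctor = Mapping(
    "http://purl.bioontology.org/ontology/SNOMEDCT/309343006",
    "http://purl.bioontology.org/ontology/LNC/LA18968-0",
    "≡",
)

# An alignment is a set of mappings, so adding the same mapping twice has no effect.
alignment = {
    physician_doctor,
    Mapping(physician_doctor.source, physician_doctor.target, "≡"),
}
print(len(alignment))
```

Making the dataclass frozen keeps mappings hashable, which is what allows an alignment to be modeled as a set and compared with other alignments using set operations.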

2.3.1 Matching biomedical ontologies

A work from 2011 [27] studied the state of biomedical ontologies and concluded that half of the ontologies in BioPortal fit the manageable OWL 2 EL profile (e.g. SNOMED CT). Classes in biomedical ontologies are often described using labels and several synonyms, which makes their vocabulary important for ontology matching [28]. Hence, lexical matching is often the main approach for matching biomedical ontologies. Structural matching is less common [28]. Most state-of-the-art matching systems depend on lexical similarity to either discover or evaluate mappings [29].

It is important to know certain requirements before performing ontology matching (description of the matching problem), as described by [26]. Such requirements involve:

• The type of input that is available for the matching system, for example whether the input is available as XML, RDF/OWL, or another format. The matching system needs to support the available input type.

• Whether the matching system needs to be fully automatic or can receive feedback from the user, whether its mappings always need to be correct, whether it needs to find all possible mappings, and whether there is a limit on its run time.

• How the output alignment is going to be used.

2.4 Matching techniques and state-of-the-art systems

2.4.1 Classification of matching techniques

Ontology matching techniques can be classified using a classification model, which organizes techniques based on how they interpret the input [26]. Figure 2.3 shows an adapted version of this classification model. This model can be used to classify the matching systems based on their granularity and input interpretation. The granularity of a matching system can be defined at two levels: element level and structure level. Element-level matching techniques focus on a class without considering its relations to other classes (e.g. comparing the label strings using some similarity measure). Structure-level matching techniques focus on a class within the structure of the ontology (e.g. super- or subclass similarities, path distance in the hierarchy, or property similarity). At each level, the model makes a distinction between semantic and syntactic matching techniques. Syntactic matching techniques use only the information of a class without interpretation, for example the textual label or synonyms. Semantic matching techniques add meaning to the structural information (e.g. using a reasoner or external resources). The input of a matching system can then be interpreted using nine different techniques:

• Formal resource-based (e.g. an upper-level ontology).

• Informal resource-based (e.g. external knowledge from encyclopedias).

• String-based (e.g. some string similarity metric).

• Language-based (e.g. usage of a lexicon).

• Constraint-based (e.g. comparing types, attributes, or the cardinality of two classes).

• Taxonomy-based (e.g. if class A is a subclass of B, then their neighbors may also be similar).

• Graph-based (e.g. comparing the depth of a class in the graph).

• Instance-based (e.g. formal concept analysis: creating a concept hierarchy from a group of classes and their properties).

• Model-based (e.g. reasoning techniques using description logic).
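As an illustration of an element-level, string-based technique, the following sketch compares two class labels after simple normalization. It uses `difflib.SequenceMatcher` from the Python standard library as a stand-in similarity measure; this is an assumption made for the example and not the metric used by any of the matching systems discussed in this work.

```python
from difflib import SequenceMatcher


def normalize(label):
    """Lower-case a label and collapse runs of whitespace."""
    return " ".join(label.lower().split())


def string_similarity(label_a, label_b):
    """Element-level, string-based similarity score between 0 and 1."""
    return SequenceMatcher(None, normalize(label_a), normalize(label_b)).ratio()


same = string_similarity("Cystic Fibrosis", "cystic  fibrosis")
different = string_similarity("muscle", "lung")
print(same, different)
```

A structure-level technique would additionally look at the neighbors of each class, for example by comparing the labels of their superclasses, rather than scoring the two labels in isolation.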

2.4.2 Matching systems

A matching system can be described using three dimensions [26]: input, process, and output. The input dimension refers to the information a matching system uses, for example which information is extracted from the ontology and whether the system uses any external resources. The process dimension includes the algorithms and techniques that are used (the basic matching techniques), and how the system interprets the input (semantically and syntactically). The output dimension concerns the type of mappings that the system produces, the relationship(s) that are exploited, whether the system matches classes one-to-one, one-to-many, or many-to-many, and how mappings are delivered (with a confidence value, a probability, or binary as true/false). Matching systems often combine several basic matching techniques in order to be effective [26]. Table 2.1 shows an overview, based on the classification of matching approaches, of three state-of-the-art matching systems used in this study. These three systems are explained in more detail below.

Figure 2.3. Matching techniques classifications. Adapted from [26]. Below each category an example of a possible implementation is given.

A literature review from 2014 [16] included 694 articles about ontology matching, of which 302 were related to the development of matching systems and the enhancement of existing systems. Accordingly, many matching systems have been developed over the years. Some matching systems implement several basic matching techniques and offer customization of the matching process [30,31]. Others use a two-step approach and use the output of an element-level matcher as the input for a structural-level matcher [32]. Some systems focus on automating the aggregation of different matching techniques, as the selection of effective techniques is known as a major problem in ontology matching [33]. Machine learning techniques are also used, such as neural networks [34], and word embedding combined with a random forest classifier [35]. Moreover, the authors of [35] argue that standard machine learning approaches fail, because 1) feature engineering fails due to features that cannot be generalized, and 2) supervised learning suffers a class imbalance problem (the number of true mappings between two ontologies is much smaller than the number of all possible mappings).
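The class imbalance problem mentioned above follows from simple arithmetic: the number of candidate pairs grows with the product of the ontology sizes, while the number of true mappings can be at most the size of the smaller ontology in a one-to-one alignment. The sketch below uses the SNOMED CT size quoted in Chapter 1; the size of the second ontology and the number of true mappings are hypothetical values chosen only for illustration.

```python
# SNOMED CT size as quoted in Chapter 1; the other two numbers are hypothetical.
classes_a = 350_000     # classes in the first ontology
classes_b = 10_000      # classes in the second ontology (assumption)
true_mappings = 5_000   # number of correct equivalence mappings (assumption)

# Every class of the first ontology can be paired with every class of the second.
candidate_pairs = classes_a * classes_b
positive_fraction = true_mappings / candidate_pairs

print(f"{candidate_pairs:,} candidate pairs")
print(f"{positive_fraction:.2e} of them are true mappings")
```

Under these assumptions fewer than two in a million candidate pairs are positive examples, which is the kind of skew that makes naive supervised training difficult.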

Automatically matching large ontologies raised the need for ontology alignment repair, since alignments can be logically incoherent [36]. For example, a set of mappings may imply that two disjoint classes (classes that cannot share an instance) overlap. Moreover, the authors of [36] state that it is impossible to confirm the correctness of mappings automatically. Some matching systems include mapping repair algorithms [37].
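One simple pattern of incoherence can be detected without a reasoner, as the sketch below shows. It only checks whether a single source class is mapped as equivalent to two target classes that are declared disjoint; real repair algorithms rely on logical reasoning and cover many more cases. All class names and axioms here are hypothetical.

```python
from itertools import combinations

# Hypothetical equivalence mappings as (source class, target class) pairs.
mappings = [
    ("s:Doctor", "t:Physician"),
    ("s:Doctor", "t:AcademicDegree"),
    ("s:Lung", "t:Lung"),
]
# Hypothetical disjointness axioms declared in the target ontology.
disjoint = {frozenset({"t:Physician", "t:AcademicDegree"})}


def conflicting_pairs(mappings, disjoint):
    """Return pairs of mappings that equate one source class with two disjoint target classes."""
    conflicts = []
    for (source_1, target_1), (source_2, target_2) in combinations(mappings, 2):
        if source_1 == source_2 and frozenset({target_1, target_2}) in disjoint:
            conflicts.append(((source_1, target_1), (source_2, target_2)))
    return conflicts


conflicts = conflicting_pairs(mappings, disjoint)
print(conflicts)
```

Detecting a conflict does not reveal which of the two mappings is wrong, which is why repair algorithms typically remove or weaken mappings based on additional evidence such as confidence values.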

AgreementMakerLight 2.0

AgreementMakerLight 2.0 (AML) matches a pair of ontologies (target and source) based on the following steps [30,31,38]. AML uses only element-level matching techniques.

1. Ontology loading: the creation of a lexicon data structure that includes the names of all classes, their labels, and synonyms. This lexicon also includes a weight system that can be used by the matching algorithms.

2. Primary matching: lexical matcher (literal name matches), mediating matcher (same as the lexical matcher, but matches each input ontology to an external ontology first, using the Human Disease Ontology, and the Uber-anatomy ontology), word matcher (word-based string similarity algorithm, using a weighted Jaccard index between the words of the class names).


Table 2.1. Classification of three state-of-the-art matching systems: AgreementMakerLight 2.0 [38], FCA-Map [39], and LogMap 2.0 [40].

| Matching technique | AgreementMakerLight 2.0 | FCA-Map | LogMap 2.0 |
| --- | --- | --- | --- |
| Element level | | | |
| Semantic: Formal resource-based | X | - | - |
| Syntactic: Informal resource-based | - | - | - |
| Syntactic: String-based | X | X | X |
| Syntactic: Language-based | X | X | X |
| Syntactic: Constraint-based | - | X | - |
| Structure level | | | |
| Semantic: Model-based | - | X | X |
| Syntactic: Instance-based | - | X | - |
| Syntactic: Graph-based | - | - | X |
| Syntactic: Taxonomy-based | - | - | X |

(all-to-all string similarity algorithm, using the ISub similarity metric).

4. Selection and repair: combines the output of all matchers. In the case of duplicate mappings, it discards the mapping with the lowest similarity score. It makes sure that each class appears in at most one mapping.
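The word matcher of step 2 and the selection of step 4 can be illustrated with a minimal Python sketch. It is not taken from the AML source code: the function names, the optional per-word weight dictionary (defaulting to a weight of 1 per word), and the greedy selection strategy are assumptions made for illustration only.

```python
def weighted_jaccard(name_a, name_b, weights=None):
    """Weighted Jaccard index between the word sets of two class names.

    `weights` maps a word to its weight; words without an entry get weight 1.0
    (an assumed default, not AML's actual weighting scheme).
    """
    weights = weights or {}
    words_a = set(name_a.lower().split())
    words_b = set(name_b.lower().split())
    shared = sum(weights.get(w, 1.0) for w in words_a & words_b)
    total = sum(weights.get(w, 1.0) for w in words_a | words_b)
    return shared / total if total else 0.0


def select_one_to_one(candidates):
    """Greedy selection: visit candidate mappings (source, target, similarity)
    from highest to lowest similarity and keep a mapping only if neither of
    its classes already appears in a kept mapping."""
    used_sources, used_targets, selected = set(), set(), []
    for source, target, sim in sorted(candidates, key=lambda m: -m[2]):
        if source not in used_sources and target not in used_targets:
            selected.append((source, target, sim))
            used_sources.add(source)
            used_targets.add(target)
    return selected
```

With uniform weights, the names "left lung" and "right lung" share one of three distinct words, so their similarity is 1/3.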

FCA-Map

FCA-Map matches a pair of ontologies based on the following steps [39].

1. Token-based formal context: uses the names, labels, and synonyms of the classes and performs normalization. Synonyms are retrieved using the UMLS Sub-Term Mapping Tools, lexical variations from the UMLS Lexicon. Then, two types of initial mappings are created.

(a) Exact match (e.g. SNOMED:muscle, ORDO:muscle).

(b) Partial match (e.g. SNOMED:left lung disease, ORDO:right lung disease).

2. Relation-based formal context: the initial mapping set from step 1 is validated with structural relations. It considers positive evidence (when a class of one mapping shares a relationship with a class in another mapping), and negative evidence (disjointness relationships between two classes).

(a) Conflicts between mappings are repaired by removing all conflicted mappings based on negative evidence.

(b) Mappings with no structural evidence are considered to be structurally isolated or incorrect. All partial matches from step 1 without positive evidence are removed.

3. Positive relation-based formal context: new structural mappings are identified based on the already discovered mappings.

(a) One-to-one structural mappings, e.g. ‘humerus bone’ - ‘upper arm bone’.

(b) One-to-many (e.g. ‘ear’ - ‘left ear’, ‘right ear’) and many-to-many mappings (vertebra 1,2,3,4,5 - L1,L2,L3,L4,L5 vertebra).

4. Property-based formal context: only the structural information of the class properties is used to match properties between the source and target ontology.


5. Restriction-based formal context: based on the property mappings an ‘anonymous ancestors’ formal concept analysis is created. For example: if SNOMED:Hemangioma of liver and NCIt:Hepatic Hemangioma are matched, and SNOMED CT has the property ‘finding site → blood vessel structure’ and NCIt ‘has associated anatomic site → blood vessel’, then the classes SNOMED:blood vessel structure and NCIt:blood vessel are anonymous ancestors of the original classes.

LogMap 2.0

LogMap 2.0 (LogMap) matches a pair of ontologies based on the following steps [37, 40].

1. Overlapping estimation: an overestimated set of candidate concept pairs is computed based on the lexical similarity of their labels. Then, a module is extracted from each input ontology based on this set of overestimated mappings. This should reduce the number of classes that need to be matched.

2. Lexical indexation: indexation of the labels of the two extracted modules. Performs tokenization and uses the UMLS Lexicon to find synonyms and lexical variations.

3. Computation of candidate mappings: the lexical index from each module is intersected to get a set of initial mappings. Then, two groups are created using the ISub string similarity metric on the neighbors of the initial mappings. If the neighbors of the classes in a mapping are similar, then the mapping is probably correct.

(a) Fixed mappings, mappings that have similar neighbors (e.g. ‘stenosis’ and ‘stenosis’ that are both classified as ‘disease’) and are therefore probably correct.

(b) Active mappings, mappings that probably need expert curation (e.g. ‘stenosis’ and ‘stenosis’ where the first is a ‘disease’ and the second a ‘body structure’).

4. Mapping repair: mappings in the two groups (fixed and active) are represented in Horn propositional logic. A reasoning algorithm (Dowling-Gallier) is used to detect unsatisfiable classes in the ‘active’ group. A diagnosis algorithm then deletes the active mappings that lead to those unsatisfiable classes. Leftover active mappings are finally included as fixed mappings.

5. Structural indexation: the inferred versions of the ontologies are used for structural indexation. Two directed acyclic graphs (DAGs) are created.

(a) Descendants DAG (subclass relationships)

(b) Ancestors DAG (superclass relationships)

Each class is represented as a node in the DAG and has information about the location of the class in the graph.

6. Conflict detection using the structural index: mappings that are disjoint are removed from the active mappings set.

7. (Optional: remaining active mappings can be presented to the user for evaluation, otherwise the matching process is finalized automatically using heuristics.)
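The intersection of lexical indexes in step 3 can be illustrated with a small Python sketch. This is a simplification under stated assumptions: labels are normalized by lowercasing and whitespace tokenization only, the index key is the unordered set of tokens, and the function names are hypothetical. The synonym lookup in the UMLS Lexicon, the ISub-based split into fixed and active mappings, and the repair steps are not covered.

```python
from collections import defaultdict


def lexical_index(labels_by_class):
    """Inverted index: set of label tokens -> classes carrying such a label."""
    index = defaultdict(set)
    for cls, labels in labels_by_class.items():
        for label in labels:
            key = frozenset(label.lower().split())
            index[key].add(cls)
    return index


def candidate_mappings(index_a, index_b):
    """Intersect the keys of two lexical indexes and pair up the classes
    that are stored under each shared key."""
    pairs = set()
    for key in index_a.keys() & index_b.keys():
        for cls_a in index_a[key]:
            for cls_b in index_b[key]:
                pairs.add((cls_a, cls_b))
    return pairs
```

Because only the keys of the two indexes are intersected, the number of comparisons depends on the number of distinct labels rather than on the number of all possible class pairs.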

2.4.3 Evaluation

After obtaining an alignment between two ontologies, the mappings should be evaluated. This evaluation can be done manually (often by domain experts) or automatically. The results of an evaluation are often expressed using measures such as precision and recall, which provide information about the correctness and completeness of an alignment [26]. The evaluation of ontology matching systems has been coordinated internationally by the Ontology Alignment Evaluation Initiative (OAEI) since 2004 [41]. The OAEI facilitates the comparison of the performance of different matching systems.


Automatic evaluation of an ontology matching system can be done using a reference alignment that could be identified as [42]:

• Gold standard (complete alignments that are created manually by domain experts)

• Silver standard (alignments that are not necessarily complete or correct)

• Baseline (alignments that are highly incomplete)

Whether the evaluation results of an alignment and/or its mappings are good or bad depends on the situation in which the alignment will be used. For example, if an application requires that mappings are 100% correct, precision is more important than recall. On the other hand, if all mappings must be retrieved, recall is more important. Moreover, apart from the alignment itself, other factors might also be considered, such as the run time of the matching system, the required computational resources, and whether the system is fully automatic or needs intervention from the user.


Chapter 3

Methods

An experimental study was performed to measure the performance of ontology matching systems and techniques using two types of reference alignments, including an analysis of the top-level hierarchies of classes in the mappings. The matching systems were also applied to a practical use case in the rare disease domain. The following steps were carried out in chronological order: selecting rare disease data elements, selecting relevant biomedical ontologies, module extraction, selecting available ontology matching systems, calculating alignments between ontology pairs, and a two-part evaluation of the alignments that were obtained. The first part of the evaluation measured the performance using two types of reference alignments; the second part consisted of a top-level hierarchy analysis and the application of the acquired alignments to a practical use case. Figure 3.1 shows a visual overview of these steps. All development work was done using Java 8, and data analysis was done in R version 4.0.1 [43].

Figure 3.1. Visual overview of the performed steps. Highlighted blocks represent tools that were used from BioPortal (blue) or the UMLS (orange).


3.1 Ontologies and matching systems

3.1.1 Selection of ontologies

The ontologies used for this study needed to be appropriate for usage in the rare disease domain. This implies that their content should be useful for annotating datasets in this domain. Therefore, a set of rare disease data items was used as input for the BioPortal Recommender [44]. This tool receives free text or keywords as input and outputs a list of recommended ontologies based on annotations of the input text. The BioPortal Recommender uses a ranking algorithm that takes into account the coverage (which ontology has the best coverage after annotating the textual input), the acceptance (based on the number of visits on BioPortal and the presence of the ontology in the UMLS), the detail of knowledge (how many synonyms, definitions, and properties the annotations provide), and lastly the specialization (using the position of the annotations in the hierarchy of the ontology). For retrieving the annotations based on the input, the Recommender uses the BioPortal Annotator [45]. The input text was extracted from the set of common data elements for rare disease registries (the element and coding names) [46], and the classifications of rare diseases from Orphanet (all categories and one random disease per category) [47]. Items, i.e. one or multiple words, could have multiple annotations. The Recommender was run using the default configuration and the first two ontologies were selected from the list: SNOMED CT (International Edition, release 26-02-2020) and NCIt (version 20.02d) [48]. ORDO (version 2.9.1) was then added as a third ontology as it specifically targets the rare disease domain.

3.1.2 Module extraction

After selecting the input ontologies, the module extractor included in the OWL API, as explained in Subsection 2.2.1, was used for creating modules based on the original ontologies. The module extractor takes the whole ontology and a seed signature as input, and outputs a module based on the signature. The modularization of the ontologies firstly served to obtain manageable chunks of data in terms of available computational resources, and secondly to perform the experiments with parts of the ontologies based on the rare disease data items described above. Each seed signature, one per ontology, contained the annotations of the data elements described in Subsection 3.1.1, as returned by the BioPortal Annotator. To make sure the modules included the entire top-level hierarchy of the original ontology, the seed signature also included all ancestors of those annotated classes.
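The extension of a seed signature with all ancestors of the annotated classes can be sketched as follows. This is an illustrative Python fragment, not the Java code used in this study: the `parents` dictionary (class to direct superclasses) and the function name are hypothetical, and the actual module extraction by the OWL API is not reproduced here.

```python
def with_ancestors(seed, parents):
    """Return the seed classes together with all of their ancestors.

    `parents` maps each class to its direct superclasses; the hierarchy is
    walked upwards until no new class is found.
    """
    signature = set(seed)
    stack = list(seed)
    while stack:
        cls = stack.pop()
        for parent in parents.get(cls, ()):
            if parent not in signature:
                signature.add(parent)
                stack.append(parent)
    return signature
```

The resulting set would then be passed to the module extractor as the seed signature, so that the top-level class of every annotated class ends up in the module.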

3.1.3 Selection of matching systems

This study aimed to include existing ontology matching systems that had a proven history of matching biomedical ontologies. Hence, the matching systems were chosen from the list of participants of the OAEI disease and phenotype track [42]. This track focuses on using disease and phenotype ontologies for practical use cases, such as data integration. The matching systems were chosen based on whether they were actively developed, published with an open license, and had available source code. Following those criteria, three matching systems were selected: AgreementMakerLight 2.0 (AML) [38], FCA-Map [39], and LogMap 2.0 (LogMap) [40]. A description of each system and the underlying matching techniques is given in Section 2.4.2.

3.2 Alignments

All matching systems were run with their default configuration and no changes were made to the systems’ parameters. The output of the matching systems, the alignments, was saved in the general Alignment format of the Alignment API [49]. Each run was assigned 64GB of RAM. The matching systems did not need any user input during the matching process, i.e. they provided automated ontology matching. All possible ontology pairs were used as input: ORDO-SNOMED CT, NCIt-ORDO, NCIt-SNOMED CT (matching A to B is equivalent to matching B to A). All output alignments contained pairwise equivalence mappings and included the URI of each class. Figure 3.2 shows an example of a mapping between NCIt and ORDO.

Figure 3.2. Example mapping of the class Polyploidy between NCIt and ORDO. Shown are a chunk of the RDF output from the alignment and a visual representation of the mapping.
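To give an impression of how such RDF output can be processed, the following Python sketch reads the mappings from an alignment document. It assumes the usual layout of the Alignment format (a `Cell` element with `entity1`, `entity2`, `measure`, and `relation` children); the sample document, its namespace declaration, and the example.org URIs are illustrative placeholders and are not output of this study.

```python
import xml.etree.ElementTree as ET

RDF_RESOURCE = "{http://www.w3.org/1999/02/22-rdf-syntax-ns#}resource"


def local(tag):
    """Strip the '{namespace}' prefix that ElementTree puts before tag names."""
    return tag.rsplit("}", 1)[-1]


def read_mappings(xml_text):
    """Return (entity1 URI, entity2 URI, relation, measure) for every Cell."""
    mappings = []
    for cell in ET.fromstring(xml_text).iter():
        if local(cell.tag) != "Cell":
            continue
        fields = {local(child.tag): child for child in cell}
        mappings.append((
            fields["entity1"].get(RDF_RESOURCE),
            fields["entity2"].get(RDF_RESOURCE),
            fields["relation"].text.strip(),
            float(fields["measure"].text),
        ))
    return mappings


# Hypothetical sample; the class URIs are placeholders.
SAMPLE = """<rdf:RDF xmlns="http://knowledgeweb.semanticweb.org/heterogeneity/alignment"
         xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <Alignment>
    <map>
      <Cell>
        <entity1 rdf:resource="http://example.org/ncit#ClassA"/>
        <entity2 rdf:resource="http://example.org/ordo#ClassB"/>
        <measure rdf:datatype="http://www.w3.org/2001/XMLSchema#float">1.0</measure>
        <relation>=</relation>
      </Cell>
    </map>
  </Alignment>
</rdf:RDF>"""

mappings = read_mappings(SAMPLE)
```

Matching on local tag names keeps the sketch independent of the exact namespace URI declared in a given alignment file.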

3.3 Evaluation

3.3.1 Reference alignments

The alignments produced by the matching systems were evaluated using two reference alignments, based on mappings from BioPortal and the UMLS Metathesaurus [50]. Both were chosen because they are used as reference alignments in the OAEI disease and phenotype track and large BioMed track respectively. The BioPortal mappings for ORDO, NCIt, and SNOMED CT are lexical mappings based on the Lexical OWL Ontology Matcher (LOOM). LOOM is a simple string matching algorithm that compares the preferred names and synonyms of classes in both ontologies. BioPortal mappings were retrieved using the BioPortal API.

Unlike the matching systems and BioPortal, the UMLS does not provide pairwise mappings. Instead, the UMLS groups all classes from all included ontologies with the same meaning. Classes with the same meaning share a code: the concept unique identifier (CUI). Moreover, a single class can have multiple CUIs. The UMLS reference alignment was extracted from a subset of the UMLS Metathesaurus (version 2020AA), as was done earlier in [51]. This subset was obtained using the MetamorphoSys tool of the UMLS by retrieving the MRCONSO.RRF file. Then, the subset was installed locally using MySQL Community Server version 5.6.48. The MRCONSO subset contains every concept of the UMLS and specifies the CUI, its language, its code in the source ontology, the term status, and other details that are not relevant for extracting the reference alignment. Pairwise mappings were retrieved by first getting all available CUIs for every class in each ontology. Then, every pair of classes from ontology A and ontology B that shared at least one CUI was included as a mapping in the reference alignment. ORDO is not present in the UMLS but does include CUI mappings as annotations in the ontology. Hence, ORDO CUIs were not retrieved from the UMLS but directly from the ontology itself.
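The derivation of pairwise mappings from shared CUIs can be sketched in Python as follows. The sketch works on simplified (CUI, source, code) tuples rather than on the MySQL installation used in this study; the function names, the example codes, and the CUI values are hypothetical.

```python
from collections import defaultdict


def cuis_by_class(rows, source):
    """Collect the CUIs of every class code of one source ontology.

    `rows` are simplified (cui, source_abbreviation, code) tuples, i.e. a
    projection of three MRCONSO columns.
    """
    result = defaultdict(set)
    for cui, sab, code in rows:
        if sab == source:
            result[code].add(cui)
    return result


def reference_alignment(cuis_a, cuis_b):
    """Pair every class of A with every class of B sharing at least one CUI."""
    classes_b_by_cui = defaultdict(set)
    for cls_b, cuis in cuis_b.items():
        for cui in cuis:
            classes_b_by_cui[cui].add(cls_b)
    return {
        (cls_a, cls_b)
        for cls_a, cuis in cuis_a.items()
        for cui in cuis
        for cls_b in classes_b_by_cui.get(cui, ())
    }
```

For ORDO, the class-to-CUI dictionary would be filled from the CUI annotations in the ontology itself instead of from MRCONSO rows, after which `reference_alignment` can be applied unchanged.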

Reference alignments for the modules were derived from their full-size counterparts: mappings containing classes that were not present in the module were deleted. Alignments were evaluated by classifying each mapping as true positive (TP, present in both the alignment and the reference alignment), false positive (FP, only present in the alignment) or false negative (FN, only present in the reference alignment). See Figure 3.3. True negatives were not considered in this study, as there was no gold standard available that contained 100% of all possible correct mappings.


Figure 3.3. Categories for evaluation with BioPortal and UMLS reference alignments, adapted from [26].

Three performance measures were calculated: precision (Equation 3.1), recall (Equation 3.2), and F-measure (F1-score) (Equation 3.3). Precision shows the proportion of mappings in the alignment that are classified as true positive. Recall shows the proportion of mappings in the reference alignment that are also present in the alignment. F-measure combines precision and recall by calculating their harmonic mean.

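\[
\text{Precision} = \frac{TP}{TP + FP} \tag{3.1}
\]

\[
\text{Recall} = \frac{TP}{TP + FN} \tag{3.2}
\]

\[
\text{F-measure} = 2 \times \frac{\text{precision} \times \text{recall}}{\text{precision} + \text{recall}} \tag{3.3}
\]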
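The three measures can be computed directly from two sets of mappings, as the following Python sketch shows. It assumes that each mapping is represented as a pair of class identifiers written in the same order in both sets; the function name is hypothetical, and the analysis in this study was carried out in R.

```python
def evaluate(alignment, reference):
    """Return (precision, recall, F-measure) of an alignment against a
    reference alignment, both given as collections of (class_a, class_b)."""
    alignment, reference = set(alignment), set(reference)
    tp = len(alignment & reference)   # present in both
    fp = len(alignment - reference)   # only in the alignment
    fn = len(reference - alignment)   # only in the reference alignment
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure
```

For an alignment of four mappings of which two occur in a reference alignment of three mappings, this gives a precision of 0.5, a recall of 2/3, and an F-measure of 4/7.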

3.3.2 Hierarchy analysis

For the use case of this study, equivalence mappings containing classes of the same type are of special interest. For example, a matching system should detect that two classes are both diseases. Each top-level hierarchy of an ontology contains classes of similar types, and every descendant of a top-level class shares the IS-A relationship with its ancestor(s). Biomedical ontologies carry a large amount of information in their lexical labels; hence, ontology matching systems often primarily use lexical matching techniques [28]. However, a mapping containing classes that originate from different top-level hierarchies can be incorrect, even if the classes have lexically similar labels. An example is a mapping between two classes each labeled bone fracture, where the first is part of the hierarchy clinical finding and the second of body structure. Therefore, the content of the alignments was analyzed by comparing the top-level hierarchies of matched classes. Mappings between the top-level hierarchies of NCIt-ORDO, NCIt-SNOMED CT, and ORDO-SNOMED CT were created manually. This was done by inspecting the top-level hierarchies of true positive mappings for the UMLS and UMLS + BioPortal. The class descriptions were also used to determine whether or not two top-level classes should be matched. True positives for only BioPortal were not utilized, as the BioPortal mappings were already known to be highly incomplete [42]. Selected mappings for the top-level hierarchies were considered to be semantically equivalent. Figure 3.4 shows an example of how such a manual mapping was created.

False positive mappings (present in neither the UMLS nor BioPortal) were marked as incorrect if their top-level hierarchy classes were not present in the set of manually created mappings. The precision and F-measure were recalculated after discarding such incorrect mappings from the alignments.
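The discarding step can be sketched in Python as follows. The sketch assumes that every class has exactly one top-level class, which is a simplification, and all names and example values are hypothetical.

```python
def discard_hierarchy_conflicts(alignment, reference, top_a, top_b, allowed):
    """Drop false positive mappings whose top-level classes are not matched.

    alignment, reference: sets of (class_a, class_b) pairs
    top_a, top_b:         class -> top-level class in ontology A or B
    allowed:              manually created set of (top_a, top_b) pairs
    True positives are always kept, whatever their top-level classes are.
    """
    kept = set()
    for cls_a, cls_b in alignment:
        false_positive = (cls_a, cls_b) not in reference
        if false_positive and (top_a[cls_a], top_b[cls_b]) not in allowed:
            continue  # marked as incorrect and discarded
        kept.add((cls_a, cls_b))
    return kept
```

Since only false positives are removed, the number of true positives and false negatives stays the same, which is why recall is unaffected while precision and F-measure can only stay equal or increase.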


Figure 3.4. Example of manually created top-level hierarchy mappings. The four classes from NCIt and SNOMED CT were matched as equivalent by the matching system, and all four mappings were present in the reference alignment of the UMLS and BioPortal (true positive). Analyzing the top-level hierarchies revealed that all NCIt classes were descendants of Anatomic Structure, System, or Substance and all SNOMED CT classes of Body structure. A manual mapping between those top-level classes was then created for NCIt-SNOMED CT.

3.3.3 Rare disease use case

Finally, the aforementioned experiments were applied to a use case from the VASCA registry (Section 1.2.1). Alignments for the ISSVA ontology to ORDO, NCIt, and SNOMED CT were created using AML, LogMap, and FCA-Map. Then, the previously created alignments between ORDO, NCIt, and SNOMED CT, together with their evaluations, were used to mark mappings in the ISSVA-ORDO alignments as true mappings. This practical evaluation served two purposes.

1. Determine the overlap between the ISSVA ontology and ORDO/SNOMED CT/NCIt, and evaluate based on the prior matching and evaluation between ORDO and NCIt + SNOMED CT which mappings would be marked as true positives. Figure 3.5 shows an example of this process, which is based on transitivity.

2. The top-level hierarchies of the ISSVA mappings in ORDO, NCIt, and SNOMED CT can be used by VASCA to evaluate the structure of their ontology and hence improve future iterations.

When the classes a, b, c are considered to be in a set of classes S, with the equivalence relationship R: a, b, c ∈ S, if (a, b) ∈ R and (a, c) ∈ R, then (b, c) ∈ R (Figure 3.5).
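One possible reading of this transitivity-based evaluation is sketched below in Python: a mapping between an ISSVA class a and a class b is confirmed when a is also mapped to a class c in a third ontology and (b, c) is a true positive mapping between the two other ontologies. The function name and the example identifiers are hypothetical, and the exact rule applied in this study may differ in detail.

```python
def confirmed_by_transitivity(a_to_b, a_to_c, true_b_c):
    """Return the (a, b) mappings that are supported by a true (b, c) mapping.

    a_to_b:   mappings from the new ontology to ontology B
    a_to_c:   mappings from the new ontology to ontology C
    true_b_c: true positive (b, c) mappings between ontologies B and C
    """
    targets_c = {}
    for a, c in a_to_c:
        targets_c.setdefault(a, set()).add(c)
    return {
        (a, b)
        for a, b in a_to_b
        if any((b, c) in true_b_c for c in targets_c.get(a, ()))
    }
```

Dividing the number of confirmed mappings by the total number of mappings in the alignment then gives the share of mappings flagged as true positive.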

Figure 3.5. Example evaluation of an ISSVA-ORDO mapping, using transitivity. Class A is mapped to B in ORDO and C in SNOMED CT (by either one or more matching systems). If a true


Chapter 4

Results

4.1 Rare disease data elements and modules

A total of 117 data items were extracted from the set of common data elements for rare disease registries and the Orphanet rare disease classifications (full list included in Appendix A). The BioPortal Recommender annotated 42% of the input with ORDO classes, 52% with SNOMED CT classes, and 65% with classes from NCIt. The seed signatures contained 471 classes for SNOMED CT, 74 for ORDO, and 547 for NCIt. Table 4.1 shows the details of the extracted modules.

Table 4.1. Details of the ontologies and extracted modules.

| Ontology (classes in seed signature) | Module type | Classes (% of total) | Axioms (% of total) | Object properties (% of total) |
| --- | --- | --- | --- | --- |
| ORDO (74) | Star | 299 (2%) | 2,227 (0.9%) | 7 (39%) |
| ORDO (74) | Bottom | 306 (2%) | 2,288 (1%) | 8 (44%) |
| ORDO (74) | Top | 13,436 (93%) | 216,295 (92%) | 17 (94%) |
| ORDO (74) | Whole ontology (total) | 14,502 | 234,982 | 18 |
| SNOMED CT (471) | Star | 1,408 (0.4%) | 7,105 (0.4%) | 16 (13%) |
| SNOMED CT (471) | Bottom | 1,410 (0.4%) | 7,146 (0.4%) | 18 (15%) |
| SNOMED CT (471) | Top | 345,169 (98%) | 1,597,246 (98%) | 93 (78%) |
| SNOMED CT (471) | Whole ontology (total) | 352,449 | 1,629,354 | 120 |
| NCIt (547) | Star | 1,014 (0.7%) | 19,017 (0.7%) | 40 (41%) |
| NCIt (547) | Bottom | 1,014 (0.7%) | 19,017 (0.7%) | 40 (41%) |
| NCIt (547) | Top | 156,172 (100%) | 2,542,770 (99.9%) | 97 (100%) |
| NCIt (547) | Whole ontology (total) | 156,172 | 2,543,710 | 97 |

For each ontology, the star and bottom modules had the same or nearly the same size. The star and bottom modules contained between 0.4% and 2% of the total number of classes and axioms of the original ontologies. The top modules were much larger and contained over 90% of the classes and axioms of the original ontologies. The NCIt top module had virtually the same size as the whole NCIt ontology (100% of the classes and 99.9% of the axioms).

4.2 Alignments

Alignments were created between the ontology pairs ORDO-SNOMED CT, NCIt-ORDO, and NCIt-SNOMED CT. A total of 12 OWL files were used as input for each matching system, four files per ontology (three modules and the whole ontology). Hence, 36 alignments were created (12 per matching system). All alignments contained mappings with an equivalence relationship. In terms of run time, AML and LogMap were the fastest (NCIt-SNOMED CT full alignment within a few hours), whereas FCA-Map was slower (6-8 hours for NCIt-SNOMED CT). Table 4.2 shows the number of mappings per alignment, and the overlap of equivalent mappings between the star/bottom modules and the top module/whole ontology.

The star and bottom modules yielded a similar number of mappings, as did the top module in comparison with the whole ontology. Furthermore, the overlap between similar-sized alignments, i.e. star/bottom and top/whole ontology, was at least 92% and in most cases 99-100%. Therefore, only results from the star modules (being the smallest modules) and the whole ontologies are presented from here on. The full lists of results for the whole ontologies and star modules are included in Appendix B.


Table 4.2. Details of the created alignments. The alignments for the star modules/bottom modules and top modules/whole ontologies are shown together. The percentage of equal mappings (overlap, percentage of the smallest alignment that is present in the other) between the star/bottom and top/whole alignments is also included.

Star / bottom

| Ontology pair | AML mappings (star) | AML mappings (bottom) | AML overlap | FCA-Map mappings (star) | FCA-Map mappings (bottom) | FCA-Map overlap | LogMap mappings (star) | LogMap mappings (bottom) | LogMap overlap |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ORDO - SNOMED CT | 42 | 42 | 100% | 46 | 45 | 100% | 53 | 53 | 100% |
| NCIt - ORDO | 36 | 36 | 100% | 47 | 47 | 100% | 31 | 31 | 100% |
| NCIt - SNOMED CT | 193 | 194 | 99% | 220 | 225 | 100% | 214 | 215 | 100% |

Top / whole

| Ontology pair | AML mappings (top) | AML mappings (whole) | AML overlap | FCA-Map mappings (top) | FCA-Map mappings (whole) | FCA-Map overlap | LogMap mappings (top) | LogMap mappings (whole) | LogMap overlap |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| ORDO - SNOMED CT | 6,373 | 6,463 | 99% | 4,784 | 4,973 | 99% | 3,854 | 5,742 | 92% |
| NCIt - ORDO | 2,544 | 2,543 | 99% | 4,556 | 4,663 | 98% | 2,717 | 2,679 | 95% |
| NCIt - SNOMED CT | 18,481 | 18,887 | 99% | 26,078 | 26,630 | 99% | 23,444 | 23,885 | 99% |

4.3 Evaluation

4.3.1 Reference alignments: BioPortal and UMLS

Table 4.3 shows the sizes of the reference alignments that were extracted from BioPortal and the UMLS. All reference alignments from the UMLS contained more mappings than the ones from BioPortal. For the whole ontologies, the overlap between the UMLS and BioPortal reference alignments was largest for NCIt-ORDO and NCIt-SNOMED CT, with a harmonic mean overlap of 45% and 57% respectively. The lowest overlap was 14% (star modules) and 28% (whole ontologies) for the ORDO-SNOMED CT pair.

Table 4.3. Reference alignments size, BioPortal and UMLS. The overlap and harmonic mean of the overlap between the alignments is also shown.

| Ontology pair | Ontology type | Mappings UMLS | Mappings BioPortal | Overlap | Harmonic mean overlap |
| --- | --- | --- | --- | --- | --- |
| ORDO-SNOMED CT | Star module | 35 | 7 | 3 | 14% |
| NCIt-ORDO | Star module | 27 | 18 | 12 | 53% |
| NCIt-SNOMED CT | Star module | 127 | 90 | 56 | 52% |
| ORDO-SNOMED CT | Whole ontology | 3,861 | 1,750 | 776 | 28% |
| NCIt-ORDO | Whole ontology | 1,484 | 1,450 | 656 | 45% |
| NCIt-SNOMED CT | Whole ontology | 19,309 | 16,290 | 10,195 | 57% |
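The harmonic mean overlap in Table 4.3 can be reproduced with a few lines of Python, assuming it is the harmonic mean of the two fractions "shared mappings / UMLS mappings" and "shared mappings / BioPortal mappings". This reading is an assumption, but it yields the percentages listed in the table; the function name is hypothetical.

```python
def harmonic_mean_overlap(size_a, size_b, shared):
    """Harmonic mean of shared/size_a and shared/size_b."""
    frac_a = shared / size_a
    frac_b = shared / size_b
    if frac_a + frac_b == 0:
        return 0.0
    return 2 * frac_a * frac_b / (frac_a + frac_b)


# ORDO-SNOMED CT, whole ontologies: 3,861 UMLS and 1,750 BioPortal mappings,
# of which 776 are shared.
whole_ordo_snomed = harmonic_mean_overlap(3861, 1750, 776)
```

The expression simplifies to 2 × shared / (size_a + size_b), so for ORDO-SNOMED CT (whole ontologies) the value is 1,552 / 5,611, which is approximately 0.28.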

4.3.2 Evaluation results

Table 4.4 shows the evaluation results for the whole ontologies. For all ontology pairs, the recall with regard to BioPortal is higher than the recall with regard to the UMLS. The precision with regard to the UMLS is higher for ORDO-SNOMED CT (0.45) than that of BioPortal (0.28). The opposite is true for NCIt-ORDO and NCIt-SNOMED CT. AML had the highest F1-score for BioPortal (0.66), and all matching systems had a higher F1-score for BioPortal than for the UMLS. LogMap had a higher precision for the UMLS (0.47) than for BioPortal (0.45). AML had a higher precision for BioPortal (0.54) than for the UMLS (0.47). Among the ontology pairs, NCIt-SNOMED CT had the highest precision for both reference alignments (0.55 and 0.67) and the highest BioPortal recall (0.94).

Table 4.5 shows the results for using the star modularization algorithm. The recall for all ontology pairs was higher for BioPortal than for the UMLS, which corresponds to the results of the whole ontologies. The UMLS precision was higher than the BioPortal precision for all ontology pairs. Overall, all matching systems had a higher F1-score for the UMLS than for BioPortal.


Table 4.4. Evaluation results for using the whole ontologies, mean precision/recall/F1-score for both the UMLS and BioPortal. The scores for the ontology pairs indicate the mean of all matching systems, the scores for the matching systems indicate the mean of all ontology pairs.

| Pair or matching system | Precision UMLS | Precision BioPortal | Recall UMLS | Recall BioPortal | F1-score UMLS | F1-score BioPortal |
| --- | --- | --- | --- | --- | --- | --- |
| ORDO - SNOMED CT | 0.45 | 0.28 | 0.66 | 0.89 | 0.53 | 0.42 |
| NCIt - ORDO | 0.33 | 0.44 | 0.67 | 0.91 | 0.43 | 0.58 |
| NCIt - SNOMED CT | 0.55 | 0.67 | 0.66 | 0.94 | 0.60 | 0.78 |
| AgreementMakerLight 2.0 | 0.47 | 0.54 | 0.66 | 0.96 | 0.55 | 0.66 |
| FCA-Map | 0.39 | 0.39 | 0.64 | 0.90 | 0.46 | 0.53 |
| LogMap 2.0 | 0.47 | 0.45 | 0.69 | 0.88 | 0.55 | 0.58 |

Table 4.5. Evaluation results using the star modularization algorithm, mean precision/recall/F1-score for both the UMLS and BioPortal. The precision/recall/F1-scores for the ontology pairs indicate the mean of all matching systems, the scores for the matching systems indicate the mean of all ontology pairs.

| Pair or matching system | Precision UMLS | Precision BioPortal | Recall UMLS | Recall BioPortal | F1-score UMLS | F1-score BioPortal |
| --- | --- | --- | --- | --- | --- | --- |
| ORDO - SNOMED CT | 0.45 | 0.14 | 0.60 | 0.95 | 0.51 | 0.25 |
| NCIt - ORDO | 0.49 | 0.47 | 0.67 | 0.96 | 0.56 | 0.62 |
| NCIt - SNOMED CT | 0.51 | 0.42 | 0.84 | 0.98 | 0.64 | 0.59 |
| AgreementMakerLight 2.0 | 0.49 | 0.37 | 0.67 | 0.97 | 0.57 | 0.51 |
| FCA-Map | 0.42 | 0.31 | 0.68 | 1.00 | 0.52 | 0.46 |
| LogMap 2.0 | 0.53 | 0.35 | 0.77 | 0.92 | 0.62 | 0.49 |

4.3.3 Hierarchy analysis

The manual top-level hierarchy mappings are shown in Table 4.6. A total of three mappings were created for ORDO-SNOMED CT, six mappings for NCIt-ORDO, and 13 mappings for NCIt-SNOMED CT. Table 4.7 shows the results of the analysis. On average, 10% (whole ontologies) of the mappings in an alignment had classes whose top-level hierarchy was not present in the manual top-level mappings set. Mappings that were true positive for BioPortal and/or the UMLS were not discarded from the alignments. Consequently, only the false positives among them were discarded: on average, 4.6% of the mappings in the alignments contained classes whose top-level hierarchy was not present in the manual mappings set and were false positive. The star module alignments had an average of 19% of incorrect hierarchy mappings, and 8.7% of the mappings in the alignments were false positives with an incorrect top-level hierarchy.

The results of the whole ontologies for recalculating the precision and F1-score, after discarding false positive mappings with an incorrect top-level hierarchy, are shown in Table 4.8. Precision and F1-score values rose by between 0.01 and 0.05 for all ontology pairs and matching systems, except for FCA-Map, for which the BioPortal precision increased from 0.39 to 0.45 (+0.06) and the BioPortal F1-score from 0.53 to 0.59 (+0.06). Table 4.9 shows the new precision and F1-scores for the star modules. The scores rose by between 0 and 0.06 points overall, except the NCIt-SNOMED CT precision and F1-score for BioPortal (+0.29 and +0.22 respectively).


Table 4.6. Manual top-level hierarchy mappings.

| ORDO | SNOMED CT |
| --- | --- |
| clinical entity | Clinical finding (finding) |
| genetic material | Substance (substance) |
| geography | Environment or geographical location (environment / location) |

| NCIt | ORDO |
| --- | --- |
| Disease, Disorder or Finding | clinical entity |
| Gene Product | genetic material |
| Conceptual Entity | geography |
| Conceptual Entity | inheritance |
| Property or Attribute | age of onset |

| NCIt | SNOMED CT |
| --- | --- |
| Anatomic Structure, System, or Substance | Body structure (body structure) |
| Disease, Disorder or Finding | Clinical finding (finding) |
| Property or Attribute | Qualifier value (qualifier value) |
| Anatomic Structure, System, or Substance | Substance (substance) |
| Activity | Procedure (procedure) |
| Organism | Organism (organism) |
| Drug, Food, Chemical or Biomedical Material | Substance (substance) |
| Drug, Food, Chemical or Biomedical Material | Pharmaceutical / biologic product (product) |
| Manufactured Object | Physical object (physical object) |
| Property or Attribute | Observable entity (observable entity) |
| Conceptual Entity | Environment or geographical location (environment / location) |
| Conceptual Entity | Social context (social concept) |
| Conceptual Entity | Observable entity (observable entity) |

Table 4.7. Hierarchy analysis results. For each system and ontology pair, the number of mappings is shown that contain classes whose top-level hierarchy is not included as a manual mapping (Table 4.6). The number and percentage of false positives (FP) refer to the mappings that were discarded from the alignment for recalculation of both the precision and F1-score.

| Matching system | Ontology pair | Whole ontology: incorrect hierarchy mappings (of which FP) | Whole ontology: proportion of total alignment (FP) | Star module: incorrect hierarchy mappings (of which FP) | Star module: proportion of total alignment (FP) |
| --- | --- | --- | --- | --- | --- |
| AgreementMakerLight 2.0 | ORDO-SNOMED CT | 494 (318) | 8% (5%) | 9 (6) | 21% (14%) |
| FCA-Map | ORDO-SNOMED CT | 489 (310) | 10% (6%) | 11 (8) | 24% (17%) |
| LogMap 2.0 | ORDO-SNOMED CT | 193 (106) | 3% (2%) | 5 (3) | 9% (6%) |
| AgreementMakerLight 2.0 | NCIt-SNOMED CT | 3,055 (252) | 16% (1%) | 46 (13) | 24% (7%) |
| FCA-Map | NCIt-SNOMED CT | 6,868 (3,299) | 26% (12%) | 60 (23) | 27% (10%) |
| LogMap 2.0 | NCIt-SNOMED CT | 3,790 (1,180) | 16% (5%) | 42 (9) | 20% (4%) |
| AgreementMakerLight 2.0 | NCIt-ORDO | 127 (102) | 5% (4%) | 4 (1) | 11% (3%) |
| FCA-Map | NCIt-ORDO | 1,229 (1,170) | 3% (3%) | 12 (8) | 26% (17%) |
| LogMap 2.0 | NCIt-ORDO | 130 (92) | 5% (3%) | 3 (0) | 10% (0%) |

Table 4.8. Evaluation results using the whole ontologies after removing false positive mappings with an incorrect top-level hierarchy, mean precision/F1-score for both the UMLS and BioPortal. The scores for the ontology pairs indicate the mean of all matching systems, the scores for the matching systems indicate the mean of all ontology pairs. Recall has not changed and is therefore not shown.

| Pair or matching system | Precision UMLS | Precision BioPortal | F1-score UMLS | F1-score BioPortal |
| --- | --- | --- | --- | --- |
| ORDO - SNOMED CT | 0.47 (+0.02) | 0.29 (+0.01) | 0.55 (+0.02) | 0.44 (+0.02) |
| NCIt - ORDO | 0.36 (+0.03) | 0.48 (+0.04) | 0.46 (+0.03) | 0.62 (+0.04) |
| NCIt - SNOMED CT | 0.59 (+0.04) | 0.71 (+0.04) | 0.62 (+0.02) | 0.81 (+0.03) |
| AgreementMakerLight 2.0 | 0.49 (+0.02) | 0.56 (+0.02) | 0.56 (+0.01) | 0.68 (+0.02) |
| FCA-Map | 0.44 (+0.05) | 0.45 (+0.06) | 0.51 (+0.05) | 0.59 (+0.06) |
| LogMap 2.0 | 0.48 (+0.01) | 0.47 (+0.02) | 0.56 (+0.01) | 0.60 (+0.02) |


Table 4.9. Evaluation results star modules after removing false positive mappings with an incor-rect top-level hierarchy, mean precision/F1-score for both the UMLS and BioPortal. The scores for the ontology pairs indicate the mean of all matching systems, the scores for the matching sys-tems indicate the mean of all ontology pairs. Recall has not changed and is therefore not mentioned.

Pair or matching system | Precision UMLS | Precision BioPortal | F1-score UMLS | F1-score BioPortal
ORDO - SNOMED CT | 0.51 (+0.06) | 0.16 (+0.02) | 0.55 (+0.04) | 0.28 (+0.03)
NCIt - ORDO | 0.52 (+0.03) | 0.50 (+0.03) | 0.58 (+0.02) | 0.65 (+0.02)
NCIt - SNOMED CT | 0.59 (+0.08) | 0.71 (+0.29) | 0.62 (+0.02) | 0.81 (+0.22)
AgreementMakerLight 2.0 | 0.54 (+0.05) | 0.39 (+0.02) | 0.59 (+0.02) | 0.52 (+0.01)
FCA-Map | 0.49 (+0.07) | 0.37 (+0.06) | 0.57 (+0.05) | 0.52 (+0.06)
LogMap 2.0 | 0.55 (+0.02) | 0.36 (+0.01) | 0.64 (+0.02) | 0.49 (+0.00)

4.3.4 Rare disease use case

Table 4.10 shows the results of aligning the ISSVA ontology to ORDO, NCIt, and SNOMED CT. The total coverage, i.e. the percentage of ISSVA classes that has been mapped to one or more classes in the other ontology, is 44% for ORDO, 35% for NCIt, and 36% for SNOMED CT. These percentages represent the union of the alignments of all matching systems.

Table 4.10. Alignments by AgreementMakerLight 2.0, FCA-Map, and LogMap 2.0 for the ISSVA ontology to ORDO, NCIt, and SNOMED CT. Shown are the number of mappings in each alignment, the number of unique ISSVA classes in the alignment, and the percentage of the total number of ISSVA classes that is present in the alignment. The total coverage is the number of unique ISSVA classes over all systems divided by the total number of classes in the ISSVA ontology.

Ontology pair | AgreementMakerLight 2.0: ISSVA coverage | AgreementMakerLight 2.0: total mappings (unique) | FCA-Map: ISSVA coverage | FCA-Map: total mappings (unique) | LogMap 2.0: ISSVA coverage | LogMap 2.0: total mappings (unique) | Total coverage
ISSVA - ORDO | 23% | 43 (40) | 35% | 77 (61) | 15% | 26 (26) | 44% (75/172)
ISSVA - NCIt | 19% | 37 (33) | 30% | 58 (52) | 22% | 37 (37) | 35% (61/172)
ISSVA - SNOMED CT | 23% | 52 (40) | 31% | 64 (53) | 24% | 45 (42) | 36% (62/172)
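The coverage figures in Table 4.10 come down to simple set arithmetic. The following is a minimal sketch, not the code used in this study: it assumes each alignment is available as a list of (ISSVA class, target class) pairs, and the function names and class identifiers in the toy data are hypothetical.

```python
def unique_source_classes(mappings):
    """Return the set of distinct source (ISSVA) classes in one alignment."""
    return {source for source, _target in mappings}


def coverage(mappings, total_classes):
    """Share of source classes that occur in at least one mapping."""
    return len(unique_source_classes(mappings)) / total_classes


def total_coverage(alignments, total_classes):
    """Coverage of the union of source classes over all matching systems."""
    union = set()
    for mappings in alignments.values():
        union |= unique_source_classes(mappings)
    return len(union) / total_classes


# Toy alignments with hypothetical identifiers (not taken from the study).
toy_alignments = {
    "system_a": [("issva:1", "ordo:a"), ("issva:1", "ordo:b"), ("issva:2", "ordo:c")],
    "system_b": [("issva:2", "ordo:c"), ("issva:3", "ordo:d")],
    "system_c": [("issva:1", "ordo:a")],
}
TOY_TOTAL = 10  # hypothetical number of classes in the source ontology

# system_a has 3 mappings but only 2 unique source classes.
print(coverage(toy_alignments["system_a"], TOY_TOTAL))
print(total_coverage(toy_alignments, TOY_TOTAL))

# Counts reported in Table 4.10 for ISSVA - ORDO (172 ISSVA classes in total).
print(round(100 * 40 / 172))  # AgreementMakerLight 2.0: 40 unique classes
print(round(100 * 75 / 172))  # union over all systems: 75 unique classes
```

With the counts from Table 4.10, 40/172 rounds to 23% and 75/172 to 44%, matching the reported ISSVA - ORDO values. Because the total coverage is based on the union of classes, it is lower than the sum of the per-system coverages whenever systems map the same ISSVA classes.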

Table 4.11 shows the precision per matching system for the ISSVA-ORDO and ISSVA-NCIt alignments. Evaluated against the NCIt-ORDO alignment, the ISSVA-ORDO alignments had a mean precision of 0.19, meaning that on average 19% of the mappings were flagged as true positives. The ISSVA-NCIt alignments had the same mean precision. The ISSVA-ORDO alignments contained 49 mappings on average, whereas the ISSVA-NCIt alignments contained 44 mappings. Per matching system, between 10 and 13 mappings found in the ISSVA-ORDO or ISSVA-NCIt alignments were also present in the NCIt-ORDO alignment.

Table 4.11. ISSVA-ORDO and ISSVA-NCIt results (NCIt-ORDO as reference alignment).

Matching system | Precision ISSVA-ORDO (true positive mappings/total) | Precision ISSVA-NCIt (true positive mappings/total) | False positive (UMLS and/or BioPortal) | True positive (UMLS and/or BioPortal) | Correct hierarchy
AgreementMakerLight 2.0 | 0.19 (8/43) | 0.22 (8/37) | 2 out of 10 | 8 out of 10 | 10 out of 10
FCA-Map | 0.12 (9/77) | 0.16 (9/58) | 4 out of 13 | 9 out of 13 | 13 out of 13
LogMap 2.0 | 0.27 (7/26) | 0.19 (7/37) | 2 out of 11 | 7 out of 11 | 11 out of 11
Mean precision | 0.19 | 0.19
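The precision values in Table 4.11 follow directly from the counts in parentheses (true positive mappings divided by the total number of mappings). As a minimal sketch, the figures can be recomputed as shown below. The names are illustrative only, and the sketch assumes that the mean is taken over the unrounded per-system precisions, which the text does not state explicitly.

```python
from statistics import mean

# (true positive mappings, total mappings) per matching system, from Table 4.11.
issva_ordo = {
    "AgreementMakerLight 2.0": (8, 43),
    "FCA-Map": (9, 77),
    "LogMap 2.0": (7, 26),
}
issva_ncit = {
    "AgreementMakerLight 2.0": (8, 37),
    "FCA-Map": (9, 58),
    "LogMap 2.0": (7, 37),
}


def precision(true_positives, total):
    """Fraction of the mappings in an alignment that are true positives."""
    return true_positives / total


def mean_precision(results):
    """Unweighted mean of the per-system precisions (an assumption, see above)."""
    return mean(precision(tp, total) for tp, total in results.values())


for name, results in (("ISSVA-ORDO", issva_ordo), ("ISSVA-NCIt", issva_ncit)):
    for system, (tp, total) in results.items():
        print(f"{name} {system}: {precision(tp, total):.2f}")
    print(f"{name} mean precision: {mean_precision(results):.2f}")
```

Under this assumption both mean precisions round to 0.19, in line with the table. Averaging the already rounded per-system values gives the same result here.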

Table 4.12 shows the results for the ISSVA-ORDO and ISSVA-SNOMED CT alignments, evaluated using the ORDO-SNOMED CT alignment. The ISSVA-ORDO alignment had a mean precision of 0.14, the ISSVA-SNOMED CT alignment a mean precision of 0.11. The ISSVA-ORDO and ISSVA-SNOMED CT alignments contained 49 and 54 mappings on average respectively. Between 10 and 13 mappings were also present in the ORDO-SNOMED CT alignment.

Table 4.12. ISSVA-ORDO and ISSVA-SNOMED CT (ORDO-SNOMED CT as reference alignment).

Matching system | Precision ISSVA-ORDO (true positive mappings/total) | Precision ISSVA-SNOMED CT | True positive
Mean precision | 0.14 | 0.11

Table 4.13 shows the results for the ISSVA-NCIt and ISSVA-SNOMED CT alignments, evaluated using the NCIt-SNOMED CT alignment. The ISSVA-NCIt alignment had a precision of 0.25 and the ISSVA-SNOMED CT alignment a precision of 0.33. On average, the alignments contained 44 and 54 mappings, respectively. Between 13 and 21 mappings were present in the NCIt-SNOMED CT alignment.

Table 4.13. ISSVA-NCIt and ISSVA-SNOMED CT (NCIt-SNOMED CT as reference alignment).

Matching system | Precision ISSVA-NCIt (true positive mappings/total) | Precision ISSVA-SNOMED CT | True positive
