Comparing information models to FAIRify
ICU quality registry data sources and
implementing the optimal model
A FAIRly NICE Thesis
Student
Rowdy de Groot
Email: Rowdy.degroot@amsterdamumc.nl
Student number: 11036508
Title
Comparing information models to FAIRify ICU quality registry data sources and
implementing the optimal model
Supervision
Dr. Nirupama Benis (mentor)
Dr. Ferishta Raiez (mentor)
Dr. ir. Ronald Cornet (tutor)
Prof. dr. Nicolette de Keizer (formal mentor)
Location
Department of Medical Informatics, Amsterdam UMC & NICE quality registry, Meibergdreef 15, 1105 AZ Amsterdam
Period
Index
Abstract
Introduction
Background
Aims
Methods
Overview of the three models
SARI subset OMOP CDM representation
SARI subset ContSys representation
OMOP to ContSys mapping
MDS in OMOP CDM representation
Implementing the MDS ETL
Results
Overview of the three models
SARI subset OMOP CDM representation
SARI subset ContSys representation
Comparison of time taken for mapping to both models
OMOP to ContSys mapping
MDS in OMOP CDM representation
Implementing the MDS ETL
Discussion
OMOP CDM and ContSys experience comparison
OMOP CDM and ContSys interoperability
Discussion of the findings in relation to existing literature
Recommendations
Strength and weaknesses
Future research questions
Conclusion
References
Abstract
Introduction
NICE (National Intensive Care Evaluation) is a Dutch quality registry that collects data on Intensive Care Unit (ICU) admissions from all ICUs in the Netherlands. Databases from different sources are likely to have varied data structures, which can impede efficient data analysis. Making the data FAIR (Findable, Accessible, Interoperable, Reusable) can help with this, and the information model is an important component in that process. Three commonly used models, the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) and the International Organization for Standardization's ISO 13940:2015, or ContSys, were examined to find the most suitable model for ICU quality registries and individual ICUs.
Methods
Based on a literature search, the three models were compared on a set of characteristics and on the overlap of their domains. Based on this comparison, two modules of the NICE dataset were mapped to OMOP CDM and one module to ContSys. For the module mapped to both models, the OMOP representation was mapped to ContSys to determine the interoperability between the two models. One of the OMOP representations was implemented.
Results
The models were compared on characteristics found in the literature and on characteristics considered important in the context of ICUs. An overview of the overlap of domains between the three models showed which domains were alike. Of the two chosen models, OMOP CDM was the easier to map to: a guidebook, a forum and tools are available to help users implement it, whereas ContSys, being a more conceptual model, lacks all of these. In OMOP CDM, 94.6% of data values were mapped; in ContSys, 93.5%. The results of the OMOP to ContSys mapping indicate that the two models are interoperable. One of the OMOP representations was implemented and a test was successfully performed.
Conclusion
The evaluation of the three information models shows that OMOP CDM is the most usable for ICU quality registries. NICE is recommended to use the OMOP CDM information model to make the NICE data FAIR. ICUs could potentially benefit most from ContSys; however, the lack of implementation guidelines could make it difficult to make data FAIR with ContSys. Both OMOP and ContSys could be considered for hospital data.
Introduction
More observational data is becoming available in the form of electronic health care data and insurance claims data. As a consequence, more observational research in outcomes and (pharmaco)epidemiology is performed (1). All these databases, e.g. the databases underlying Electronic Healthcare Records (EHRs) in hospitals or the databases of national quality registries, are likely to have varied data structures. This makes data analysis difficult, because observational studies have to deal with different table structures, missing data, free text and a lack of consistent data definitions (2, 3). The NICE (National Intensive Care Evaluation) registry is one such collector of observational data. NICE is a Dutch quality registry which collects data on Intensive Care Unit (ICU) admissions from all ICUs in the Netherlands (4). NICE provides feedback reports, which ICUs can use to monitor and compare their performance with similar ICUs and national averages in order to improve quality of care. Participating ICUs can also use data from the NICE registry for clinical research (5). Analyses are performed by, or under the supervision of, the NICE team to make sure that the data is correctly analyzed and interpreted. Other countries have ICU quality registries similar to NICE, and there is a strong intention to conduct joint research. However, this is hampered because every registry has its own dataset and data definitions.
With the increasing need to improve the infrastructure supporting the reuse of data, more attention is being given to making data FAIR (Findable, Accessible, Interoperable, Reusable) (6). OHDSI (Observational Health Data Sciences and Informatics) is an initiative that invites data sources to become partners in a network and make their data FAIR. This makes it possible to answer large multisite research and policy questions by reusing data (7).
The information model is an important component in the process of making data FAIR. It is a
representation of concepts, relationships, constraints, rules and operations which are used to specify data semantics in a certain domain (8). Information models clearly define the data items in the database and the relationships between them. Therefore, information models are useful for interoperability (the ‘I’ of FAIR) between databases.
NICE wants to make its data FAIR to support a wider use of the data, not only nationally for the participating Dutch ICUs but also internationally with other quality registries; therefore, NICE data should be represented using a common information model. Ideally, there would be no central data collection by NICE: the ICUs would make their own data FAIR, and NICE could gather information with a Personal Health Train (PHT) (9, 10), provided it has permission to do so. Unfortunately, that does not seem realistic in the short term, and therefore NICE has to collect data in a central database.
Three commonly used information models are the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), the Clinical Data Interchange Standards Consortium (CDISC) Study Data Tabulation Model (SDTM) and the International Organization for Standardization's ISO 13940:2015, also known as ContSys (11-13). The question is which model is most suitable for ICU quality registries and the individual ICUs.
Background
OHDSI is an initiative that is heavily involved in leading and organizing the evolution and adoption of the OMOP CDM (14). The original focus of OMOP was drug safety, but over the years the OMOP CDM evolved to also support analytical use cases, including the comparative effectiveness of medical interventions and health system policies (15). The OMOP CDM is a person-centric relational model that is mostly suitable for observational data (2, 7). OMOP contains classes, classes contain domains and domains contain columns. Domains and columns all have their own definitions. For example, OMOP has a class “Person” that contains the domain “Observation period”, and that domain has a column “observation_period_id”. The definition for the domain “Observation period” in OMOP is “The OBSERVATION_PERIOD table contains records which uniquely define the spans of time for which a Person is at-risk to have clinical events recorded within the source systems, even if no events in fact are recorded (healthy patient with no healthcare interactions)” (16). The definition for the column “observation_period_id” is “A unique identifier for each observation period” (16).
CDISC is a non-profit organization that develops data standards to transform unstructured data into a framework for generating clinical research data (17). Over the years, CDISC has attracted increasing interest from regulatory authorities such as the U.S. Food and Drug Administration and the Japanese Pharmaceutical and Medical Devices Agency. One of its standards is SDTM, which defines a standard structure for study data tabulations (18). CDISC SDTM is meant to support clinical trials and therefore mostly supports clinical data (18). CDISC SDTM contains classes, which contain domains, and the domains contain variables.
ISO 13940, or ContSys, is an information model made by the International Organization for Standardization (ISO) that defines a system of concepts for different aspects of the provision of healthcare. The model was designed to include professional healthcare, self-care, care by third parties and all aspects of social care, so it was intentionally kept broad (19). Owing to this exceptionally broad scope, ContSys is meant to support all (health)care-related data. Because ContSys is a conceptual model, it is not oriented towards implementation, in contrast to OMOP CDM and CDISC SDTM. ContSys contains clauses, which contain concepts. Figure 1 shows the hierarchical structure of, and the relationship between, the models.
The NICE database contains many modules, such as the Minimal Dataset (MDS), Sequential Organ Failure Assessment (SOFA) and Severe Acute Respiratory Infection (SARI). The MDS is the core dataset of NICE and is used to record the demographics, admission and discharge details, physiological reasons for admission and severity of illness during the first 24 hours of IC admission, together with the outcome measures IC and hospital mortality and length of stay. The SARI dataset is used to report to the RIVM (Dutch National Institute for Public Health and the Environment); its goal is to monitor epidemics of SARI and influenza-like illnesses. SOFA is a daily data collection for recording organ failure: physiological parameters are recorded daily for each organ system, from which dysfunction or failure is established (20).
Figure 1. Hierarchical structure of, and relationship between, the data models. The OMOP CDM classes, CDISC SDTM classes and ContSys clauses are on the same class level. The OMOP CDM domains, CDISC SDTM domains and ContSys concepts are on the same domain level. The OMOP CDM columns and CDISC SDTM variables are on the same detail level. ContSys has no equivalent of the OMOP CDM columns and CDISC SDTM variables at the detail level; ISO 13606 EHRCom (21) or ISO 12967 HISA (22) could be used to complement ContSys at that level.
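To make the level correspondence of Figure 1 concrete, it can be written as a small lookup table. The following Python sketch is only an illustration: the dictionary layout and the helper `equivalent_level` are hypothetical and not part of any of the three standards.

```python
from typing import Optional

# Level names per model, following Figure 1. None means the model itself
# defines nothing at that level (ContSys would need e.g. ISO 13606 there).
LEVELS = {
    "OMOP CDM": {"class": "class", "domain": "domain", "detail": "column"},
    "CDISC SDTM": {"class": "class", "domain": "domain", "detail": "variable"},
    "ContSys": {"class": "clause", "domain": "concept", "detail": None},
}


def equivalent_level(model: str, term: str, target_model: str) -> Optional[str]:
    """Return the name target_model uses at the level where `model` uses `term`."""
    for level, name in LEVELS[model].items():
        if name == term:
            return LEVELS[target_model][level]
    raise KeyError(f"{term!r} is not a level name in {model}")


print(equivalent_level("OMOP CDM", "domain", "ContSys"))      # concept
print(equivalent_level("CDISC SDTM", "variable", "ContSys"))  # None
```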
Aims
The aim of this research project is to evaluate three information models, i.e. OMOP CDM, CDISC SDTM and ContSys, on their usability for registry data and hospital data in general. In addition, it will be investigated how suitable these models are for modeling (a subset of) the NICE dataset. The subset will be represented in the two most suitable models. Based on the results, the entire NICE dataset will be represented in the most suitable model(s) to check the integrity of that model for longitudinal ICU data. The integrity of a data model is the extent to which the associations in the evaluated data model match the specific project needs (23). Furthermore, this research project investigates the data interoperability between the three models, including the possibility of transforming hospital data in one data model into NICE data in another model. Since not everyone is expected to use the same data model, it is important to know how interoperable data models are. This gives an indication of the extent to which data can be queried from one FAIR data source to another.
Methods
Overview of the three models
A literature search was performed with PubMed and Google Scholar to find characteristics of the information models that could be compared, and methods to compare them. The following terms, or combinations of these terms, were used for searches: ‘OMOP’, ‘CDM’, ‘CDISC’, ‘ISO 13940’, ‘ContSys’, ‘comparing’ and ‘comparison’. The title of each search result was checked for relevance; titles were considered relevant when they contained at least one of the names of the information models. The abstract was read when the title was considered relevant. The whole article was read if the abstract mentioned the evaluation of one of the information models used as a search term, or if the information models in question were compared with other information models.
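The screening steps above can be summarized as a three-stage filter. The Python sketch below is a hypothetical illustration: the list of name spellings and the case-insensitive substring matching are assumptions, and because the abstract and full-text judgments were made by a human reader, they appear here only as a boolean input.

```python
from dataclasses import dataclass

# Assumed spellings of the model names; a title is relevant if it contains one.
MODEL_NAMES = ("OMOP", "CDISC", "SDTM", "ISO 13940", "ISO13940", "ContSys")


@dataclass
class SearchResult:
    title: str
    # Human judgment: does the abstract evaluate one of the models,
    # or compare it with other information models?
    abstract_is_relevant: bool


def title_is_relevant(title: str) -> bool:
    lowered = title.lower()
    return any(name.lower() in lowered for name in MODEL_NAMES)


def screen(result: SearchResult) -> str:
    """Return the stage at which a search result leaves the screening."""
    if not title_is_relevant(result.title):
        return "excluded on title"
    if not result.abstract_is_relevant:
        return "excluded on abstract"
    return "full text read"


print(screen(SearchResult("Mapping registry data to the OMOP CDM", True)))  # full text read
```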
To compare the data models, the OMOP domains, CDISC domains and ContSys concepts were compared with each other. These levels were chosen because they are the lowest levels that are not too model-specific (see Figure 1). For the comparison of the three data models, the OMOP definition of a domain was taken as the basis definition (24). If that definition was too specific to OMOP, it was generalized. For example, the definition for the domain “Observation period” in OMOP is “The OBSERVATION_PERIOD table contains records which uniquely define the spans of time for which a Person is at-risk to have clinical events recorded within the source systems, even if no events in fact are recorded (healthy patient with no healthcare interactions)” (16). This definition was generalized to “Period of observation when person is at risk of recording clinical events” in the comparison.
Next, the list of CDISC SDTM domains (25) was checked for domains whose descriptions matched or overlapped with a (generalized) definition taken from OMOP. If a CDISC domain definition matched or overlapped with the OMOP definition, the domains were matched. The same process was applied to the ContSys concepts (13) and the OMOP domains. The whole process was then repeated, comparing the definitions of the CDISC domains with the ContSys definitions. More than one domain definition from a model could fit a single generalized OMOP definition.
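Because one generalized OMOP definition could match several domains in another model, the outcome of this comparison is a many-to-many relation. The Python sketch below shows one way to record the manually judged matches and collect them per OMOP domain; the function name and the placeholder domain identifiers are hypothetical and do not reproduce the actual matches in Appendix A.

```python
from collections import defaultdict


def build_overlap(matches):
    """Group manually judged matches by OMOP domain.

    `matches` holds (omop_domain, other_model, other_domain) triples, one per
    match that a reviewer accepted. Returns {omop_domain: {model: [domains]}}.
    """
    table = defaultdict(lambda: defaultdict(list))
    for omop_domain, model, other_domain in matches:
        targets = table[omop_domain][model]
        if other_domain not in targets:  # ignore duplicate judgments
            targets.append(other_domain)
    # Convert to plain dicts so that missing keys raise instead of appearing.
    return {domain: dict(per_model) for domain, per_model in table.items()}


# Placeholder identifiers, for illustration only.
overlap = build_overlap([
    ("Observation period", "CDISC SDTM", "sdtm_domain_a"),
    ("Observation period", "ContSys", "contsys_concept_x"),
    ("Observation period", "ContSys", "contsys_concept_y"),
    ("Observation period", "ContSys", "contsys_concept_x"),
])
print(overlap["Observation period"]["ContSys"])  # ['contsys_concept_x', 'contsys_concept_y']
```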
SARI subset OMOP CDM representation
The SARI dataset is a subset of the NICE Minimal Data Set and was used in the initial mappings to OMOP CDM and ContSys.
The SARI dataset was first represented in the OMOP CDM. Figure 2 shows the steps to map a dataset to the OMOP CDM. The guide from OHDSI was used to conduct the process of making the Extract, Transform, Load (ETL) design for the OMOP CDM (15). The process started by scanning the SARI dataset file using White Rabbit version 0.8.0, a tool provided by OHDSI (26). White Rabbit generated a scan report containing information about the fields and the frequency distributions of their values. The scan report was loaded into Rabbit-in-a-Hat version 0.9.0, also provided by OHDSI (26). Rabbit-in-a-Hat provides a graphical user interface that helps the user create an ETL specification document according to the OMOP template. In Rabbit-in-a-Hat, the SARI dataset and its columns were mapped to the OMOP CDM tables and columns. Usagi is the application provided by OHDSI to create mappings between source terms and the standard concepts of the Vocabulary (27). Figure 2 also shows the steps to implement the ETL and to run ATLAS; however, those steps were not performed at this stage. ATLAS is used to design and execute observational analyses to generate evidence from patient-level observational data.
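As a rough illustration of what the scan step produces, the Python sketch below counts, for each field of a delimited source file, how often each value occurs, including empty values. It is not White Rabbit's implementation, which reports further statistics; the example rows are hypothetical, and only the column name `nice_age` comes from the SARI dataset.

```python
import csv
import io
from collections import Counter


def scan(csv_text: str) -> dict:
    """Return {field: Counter(value -> frequency)} for a CSV with a header row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    report = {field: Counter() for field in reader.fieldnames}
    for row in reader:
        for field in reader.fieldnames:
            # Empty cells (or missing trailing cells, read as None) are counted together.
            value = row[field] or "<empty>"
            report[field][value] += 1
    return report


# Hypothetical extract; only the column name nice_age comes from the SARI set.
sample = "nice_age,gender\n67,M\n54,F\n71,M\n,F\n"
report = scan(sample)
print(report["gender"]["M"], report["nice_age"]["<empty>"])  # 2 1
```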
A file with source codes and definitions from the SARI dataset was loaded into Usagi. Usagi suggests mappings based on textual similarities or code descriptions and gives each suggestion a matching score from 0.00 to 1.00, where 0.00 means no match and 1.00 a certain match. An automated suggestion for a target concept was approved when it was an appropriate target match. If the automated suggestion was not correct, a manual search was performed for an appropriate target concept. When no suitable match existed for a source concept, the concept was mapped to CONCEPT_ID = 0. If a source term was too complex to map to one concept, it was mapped to multiple concepts. The SARI dataset contains a column “APACHE 4 diagnosis” which includes 445 possible values for admission diagnosis. All of these values were mapped to SNOMED CT codes. Usagi can filter on vocabularies, so when the SNOMED CT vocabulary filter was applied, Usagi suggested only SNOMED CT concepts for the data values. A reviewer checked all mappings, and revisions were made when both the mapper and the reviewer agreed on the changes. Once approved, the mappings could be used in the ETL design. To assess the accuracy of Usagi, it was counted how many suggestions from Usagi were accepted, how many suggested mappings had a matching score of 1.0 and how many terms could not be mapped. Usagi suggests only one concept, so when more than one concept is needed, the user has to choose the others manually. For example, for the value “Thyroidectomy and parathyroidectomy”, Usagi suggested only “Parathyroidectomy”, and a concept for “Thyroidectomy” had to be added manually to represent the source term completely. Such cases were therefore treated as if the user had manually picked (other suggested) concepts, and they were not counted as accepted suggestions. The SARI set contains many of these complex data values.
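The counting rules above can be made explicit in code. In the Python sketch below, a suggestion counts as accepted only if the final mapping consists of exactly that one suggested concept, so a multi-concept mapping such as the thyroidectomy example does not count, and a mapping to concept 0 counts as unmapped. The record layout, the example terms and all concept ids are hypothetical placeholders, not actual OMOP vocabulary ids or Usagi export fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TermMapping:
    source_term: str
    suggested_concept_id: int     # Usagi's top suggestion
    match_score: float            # 0.00 (no match) to 1.00 (certain match)
    final_concept_ids: List[int]  # after review; [0] means unmapped


def tally(mappings):
    """Count accepted suggestions, score-1.0 suggestions and unmapped terms."""
    accepted = sum(
        1 for m in mappings
        if m.suggested_concept_id != 0 and m.final_concept_ids == [m.suggested_concept_id]
    )
    score_one = sum(1 for m in mappings if m.match_score == 1.0)
    unmapped = sum(1 for m in mappings if m.final_concept_ids == [0])
    return {"accepted": accepted, "score_1.0": score_one, "unmapped": unmapped}


# Placeholder terms and concept ids for illustration.
mappings = [
    TermMapping("Example exact term", 101, 1.00, [101]),
    TermMapping("Thyroidectomy and parathyroidectomy", 102, 0.72, [102, 103]),
    TermMapping("Example unmappable term", 104, 0.35, [0]),
]
print(tally(mappings))  # {'accepted': 1, 'score_1.0': 1, 'unmapped': 1}
```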
Figure 2. Steps to map a dataset to the OMOP CDM. The blue boxes represent actions that are needed to map a dataset to the OMOP CDM and the purple boxes are deliverables in the process. The yellow star is a Data Partner milestone. Source: OHDSI, EHDEN. The Book of OHDSI.
SARI subset ContSys representation
The contsys.org website (13) was used to guide the process of representing the SARI dataset in ContSys. Figure 3 shows the steps to map to ContSys. First, it was checked which clauses (which contain the concepts) best represented the SARI dataset. Next, concepts within these clauses were chosen to represent the column names from the SARI dataset. The description and examples of each ContSys concept were checked to see whether the column name and its use fitted the description.
Concepts with a description that matched the column name were added to a list of considered concepts. Once all concepts had been checked and the list of considered concepts was complete, it was decided which of the considered concepts represented the column name best, and the reason for that choice was recorded. For each mapping, an Electronic Health Record Communication (EHRCOM) (21) data value type was noted that represents the type of the data value in the SARI dataset. Comments or logic were added to the mapping when necessary. The ContSys representation was also checked by a reviewer, and revisions were made when both the mapper and the reviewer agreed on the changes.
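Each ContSys mapping decision thus records the considered concepts, the chosen concept, the reason and a data value type. The Python sketch below shows one hypothetical way to store such a record and to check that the chosen concept was among the considered ones and that a reason was given. The considered-concept list and the reason in the example are invented for illustration, and the data value type is a placeholder rather than an actual EHRCOM type name; only the pairing of “Hospital admission date” with “Initial contact” comes from the results.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ContSysMapping:
    sari_column: str
    considered_concepts: List[str]
    chosen_concept: str
    reason: str
    value_type: str  # placeholder for the EHRCOM data value type
    comments: List[str] = field(default_factory=list)

    def __post_init__(self):
        # The chosen concept must come from the list of considered concepts,
        # and the reason for the choice must be recorded.
        if self.chosen_concept not in self.considered_concepts:
            raise ValueError(f"{self.chosen_concept!r} was not a considered concept")
        if not self.reason.strip():
            raise ValueError("the reason for the choice must be recorded")


admission = ContSysMapping(
    sari_column="Hospital admission date",
    considered_concepts=["Initial contact", "Healthcare activity period"],  # illustrative
    chosen_concept="Initial contact",
    reason="illustrative reason: marks the start of the care episode",
    value_type="date (placeholder)",
)
print(admission.chosen_concept)  # Initial contact
```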
SNOMED CT is closely aligned with ContSys: SNOMED CT can represent clinical statements within a record, while ContSys establishes the relationships within the healthcare business (19). Therefore, all SARI data values of the data items were mapped to SNOMED CT codes. ContSys does not provide any tools for the mapping process, so Usagi from OHDSI was used for the vocabulary mapping, with the SNOMED CT vocabulary filter. The OMOP CDM term mapping was largely reused for the ContSys term mapping. Only the SARI terms (column names and values) that had not been mapped to a SNOMED CT code were remapped to a suitable SNOMED CT code, to make the whole term mapping compliant with ContSys. For both the OMOP and the ContSys representation, the time needed to complete the mapping was recorded to compare the mapping experience. The models were also compared on how they handle negative findings, an important part of the NICE database.
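The reuse step can be sketched as splitting the existing OMOP term mapping into terms whose targets are all SNOMED CT concepts, which can be reused as-is, and terms that need a SNOMED CT remapping. The Python sketch below assumes each target is recorded as a (vocabulary_id, concept_code) pair, using the OMOP vocabulary identifier `SNOMED` for SNOMED CT; the SNOMED codes shown are placeholders, and terms without any target are kept apart as unmapped.

```python
def split_for_contsys(term_mapping):
    """Split {source_term: [(vocabulary_id, concept_code), ...]} for ContSys reuse.

    Returns (reusable, to_remap, unmapped): terms whose targets are all
    SNOMED CT, terms with at least one non-SNOMED target, and terms without
    any target.
    """
    reusable, to_remap, unmapped = {}, [], []
    for term, targets in term_mapping.items():
        if not targets:
            unmapped.append(term)
        elif all(vocabulary == "SNOMED" for vocabulary, _ in targets):
            reusable[term] = targets
        else:
            to_remap.append(term)
    return reusable, to_remap, unmapped


omop_mapping = {
    "Thyroidectomy and parathyroidectomy": [("SNOMED", "<code-1>"), ("SNOMED", "<code-2>")],
    "male": [("Gender", "M")],  # OMOP standard gender concept, not SNOMED CT
    "Example unmappable term": [],
}
reusable, to_remap, unmapped = split_for_contsys(omop_mapping)
print(to_remap)  # ['male']
```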
OMOP to ContSys mapping
Both the SARI representation in OMOP and the one in ContSys were used to determine the interoperability between OMOP and ContSys, i.e. to check whether hospital data in ContSys can be transformed into NICE data in OMOP. First, the OMOP domains and columns used in the SARI-in-OMOP representation, and their definitions, were recorded. Then the mapped SARI terms from that representation and their data value types were recorded. Lastly, the ContSys concepts to which each SARI term was mapped in the SARI-in-ContSys representation, and their definitions, were recorded. Logic for mapping data values from OMOP to ContSys, and comments, were added where necessary. Figure 4 shows the steps to map from OMOP to ContSys.
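The recorded information amounts to a crosswalk table keyed by OMOP table and column. The Python sketch below shows a hypothetical row structure and a lookup that returns the ContSys concept together with the (optionally transformed) value. The OMOP location assumed for the hospital admission date is for illustration only; “Initial contact” is the concept used for that SARI column in the ContSys representation.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Tuple


@dataclass(frozen=True)
class CrosswalkRow:
    omop_table: str
    omop_column: str
    sari_term: str
    value_type: str
    contsys_concept: str
    logic: Optional[Callable[[Any], Any]] = None  # optional value transformation


def to_contsys(crosswalk, omop_table: str, omop_column: str, value: Any) -> Tuple[str, Any]:
    """Look up the ContSys concept for an OMOP field and transform its value."""
    for row in crosswalk:
        if (row.omop_table, row.omop_column) == (omop_table, omop_column):
            return row.contsys_concept, (row.logic(value) if row.logic else value)
    raise LookupError(f"no crosswalk row for {omop_table}.{omop_column}")


# Assumed OMOP location, for illustration only.
crosswalk = [
    CrosswalkRow("visit_occurrence", "visit_start_date",
                 "Hospital admission date", "date", "Initial contact"),
]
print(to_contsys(crosswalk, "visit_occurrence", "visit_start_date", "2020-03-01"))
# ('Initial contact', '2020-03-01')
```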
MDS in OMOP CDM representation
The MDS is the core dataset of the NICE database and was represented in the OMOP CDM. The same methods that were used for the SARI in OMOP CDM representation were also applied to the MDS in OMOP CDM representation. The only difference is that White Rabbit version 0.9.0 was used instead of version 0.8.0. The MDS also contains the 445 APACHE 4 values. The APACHE 4 source value mapping from the SARI dataset was reused for the MDS source value mapping. The APACHE 4 values were not taken into account when evaluating the accuracy of the Usagi suggestions. No vocabulary filter was applied for this mapping.
Implementing the MDS ETL
The process was guided by the Extract, Transform and Load course from the EHDEN Academy (28). PostgreSQL was chosen for the implementation: it is one of the databases supported by Automated Characterization of Health Information at Large-scale Longitudinal Evidence Systems (ACHILLES) and will be the only database supported by later versions of ATLAS. The ETL designed with Rabbit-in-a-Hat served as the design for the implementation in PostgreSQL. Instead of ATLAS, as shown in Figure 2, ACHILLES can also be implemented. ACHILLES was implemented in RStudio and connected to the PostgreSQL database. ACHILLES was used to characterize the data in the database, and ACHILLESHEEL, which is part of ACHILLES, was used to assess data quality.
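The load step of such an ETL can be illustrated on a small scale. The Python sketch below uses the standard-library `sqlite3` module as a stand-in for PostgreSQL, creates a reduced PERSON table with a subset of the OMOP CDM columns, loads hypothetical rows and runs a simple characterization query of the kind ACHILLES automates. It is neither the thesis implementation nor ACHILLES itself. The gender concept ids 8507 (male) and 8532 (female) are the OMOP standard gender concepts; race and ethnicity are set to 0, following the workaround for missing required fields.

```python
import sqlite3

# Reduced OMOP CDM PERSON table (subset of columns) for illustration.
PERSON_DDL = """
CREATE TABLE person (
    person_id            INTEGER PRIMARY KEY,
    gender_concept_id    INTEGER NOT NULL,
    year_of_birth        INTEGER NOT NULL,
    race_concept_id      INTEGER NOT NULL,
    ethnicity_concept_id INTEGER NOT NULL
)
"""


def load_persons(conn, rows):
    """Insert already-transformed rows into the person table."""
    conn.executemany("INSERT INTO person VALUES (?, ?, ?, ?, ?)", rows)
    conn.commit()


def persons_per_gender(conn):
    """Count persons per gender concept, a minimal characterization query."""
    query = ("SELECT gender_concept_id, COUNT(*) FROM person "
             "GROUP BY gender_concept_id ORDER BY gender_concept_id")
    return dict(conn.execute(query).fetchall())


conn = sqlite3.connect(":memory:")
conn.execute(PERSON_DDL)
load_persons(conn, [(1, 8507, 1950, 0, 0), (2, 8532, 1962, 0, 0), (3, 8507, 1971, 0, 0)])
print(persons_per_gender(conn))  # {8507: 2, 8532: 1}
```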
Results
Overview of the three models
The literature search yielded three articles on the comparison of data models. The characteristics these articles defined for data model comparisons were used and complemented with characteristics that were considered important. See Table 1 for the characteristics.
As part of this study, the three models were compared to decide which models were most suitable for the NICE registry and the hospitals that deliver data to NICE. Suitability was determined from the results for these characteristics. Table 1 provides an overview of the three information models with their evaluation for each characteristic; the characteristics are defined in Appendix Table 1. Based on the results, it was decided to map the SARI dataset to OMOP CDM and ContSys, and not to CDISC SDTM. This decision was mostly based on the results for the “purpose of model” and “strengths” characteristics and on the characteristics defined by Garza et al. (29), Kahn et al. (30) and Moody et al. (23).
Table 1 Overview of the characteristics of information models.

| Characteristic | OMOP CDM V6 | CDISC SDTM | ISO13940 - ContSys |
|---|---|---|---|
| Type of data | Observational data (7): administrative claims and electronic health records (2) | (Clinical) study data (31) | (Health)care data (32) |
| Purpose of model | Observational research purposes (15) | Collecting, preparing and analyzing study data (31) | Support continuity of care (33) |
| Structure of model | Person-centric relational model (2) | A standard structure for representing the planned sequence of activities and the treatment plan for the trial | System of concepts |
| Strengths | Tools, clear guide, supports cohorts | Support for clinical trials | Also supports self-care, healthcare third parties and extends to include all aspects of social care (19) |
| Weaknesses | Has required fields which need to be filled in | Some domains are for very specific purposes | Issues with concepts, terms and definitions (34); very generic concepts |
| Number of domains or concepts | 38 domains (15) | 46 domains (25) | 173 concepts |
| Ontology/vocabulary | The OMOP Standardized Vocabularies, which contain 111 vocabularies of which 78 have been externally adopted, like SNOMED CT (15) | CDISC Controlled Terminology (25) | SNOMED CT has a relationship with ContSys (19) |
| Integrity (23) | 100% (Garza et al. (29)); 100% for SARI | 100% (86% incorporates the SUPPQUAL domain) (29) | 100% for SARI |
| Extensibility/scalability | | | |
| Ease of querying the model: a. number of nested queries, b. number of table joins, c. estimated query performance (29) | a. 1; b. 2-5; c. faster (29) | a. 1; b. 4-12; c. slower (29) | Not applicable |
| Ease of anonymization and de-identification (29) | Medium (29) | Difficult (29) | Easy |
| Integration (23, 29, 30) | 100% (Garza et al. (29)); 94.6% (452 of 478) for SARI | 67% (29) | 93.5% (447 of 478) for SARI |
| Field experience (30) | Release 2009 | Release 2004 | Release 2007 |
| Stability: model updates in the last 2 years; a major update is a new version, a minor update is an updated version (30) | 1 major, 3 minor | 0 major, 2 minor | 0 major, 0 minor |
| Adoption (number of adopters) (30) | 5* (29); the network consists of over 100 standardized databases (15) | >348 (29) | In the Netherlands, a few adopters (35) |
| Tooling | Athena: look up concepts in the vocabulary; White Rabbit: perform a scan of the source data; Rabbit-in-a-Hat: define logic from data source to OMOP; Usagi: create code mapping; ATLAS: analysis tool; ACHILLES: database characterization; ARACHNE: facilitates distributed network analyses (15) | Not available | Not available |
| Time spent on SARI dataset modeling | 8 hours (does not include mapping with Usagi) | Not applicable | 8 hours (does not include mapping with Usagi) |

* There is no list of OMOP adopters available. Garza et al. found only 5 adopters; there are most likely many more, so this number should be seen as a minimum.
Appendix A shows the overlap in definitions between the domains of OMOP CDM, CDISC SDTM and ContSys (in ContSys, the equivalents of domains are called concepts). In each of the three models, classes are divided into domains (in ContSys, classes are called clauses). The clauses of ContSys act in the same way as the classes of OMOP CDM and CDISC SDTM; hence, the ContSys concepts were treated as being on the same level as the OMOP CDM and CDISC SDTM domains. Appendix A was made to investigate how similar the three models are and on which domains they overlap most.
SARI subset OMOP CDM representation
All of the variables from the SARI dataset were mapped to the OMOP CDM tables and columns. Appendix B contains the scan report generated by White Rabbit, which was loaded into Rabbit-in-a-Hat. All variables but one fitted the OMOP CDM columns without adjustments. The SARI subset contains the age of a patient (nice_age) at the time of ICU admission. OMOP CDM does not use age but day, month and year of birth, of which only the year of birth is required. The year of birth can be calculated by subtracting the age from the year of the hospital admission date. Consequently, the derived year of birth is one year too high if, in that calendar year, the admission date falls before the patient's birthday. Birth month and birth day will be missing, but those are not essential for OMOP CDM. Appendix C contains the Word file generated by Rabbit-in-a-Hat with the SARI to OMOP CDM representation.
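The derivation of the year of birth, and the reason it can be one year off, can be shown with a short Python sketch. The function names are hypothetical, and the sketch assumes the age was recorded in whole years on the admission date.

```python
from datetime import date
from typing import Tuple


def year_of_birth_from_age(age: int, admission: date) -> int:
    """Estimate YEAR_OF_BIRTH as the admission year minus the age in years."""
    return admission.year - age


def possible_years_of_birth(age: int, admission: date) -> Tuple[int, int]:
    """Return both years consistent with the recorded age.

    The estimate is correct if the birthday had already passed (or falls on the
    admission day) in the admission year; otherwise the true year is one earlier.
    """
    estimate = year_of_birth_from_age(age, admission)
    return estimate - 1, estimate


admission = date(2020, 2, 1)
print(year_of_birth_from_age(50, admission))   # 1970
print(possible_years_of_birth(50, admission))  # (1969, 1970)
```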
OMOP CDM, on the other hand, has required fields, and the SARI dataset does not include the data to fill all of them. Table 2 shows which required fields in the OMOP CDM tables used could not be filled with data from the SARI dataset. A workaround for these missing required fields was found on a community forum. For instance, the SARI dataset does not contain data on the race or ethnicity of patients, while these fields are required in the OMOP CDM; in that case, the workaround is to set RACE_CONCEPT_ID and ETHNICITY_CONCEPT_ID to zero (36). Required date fields for which NICE provides no data were set to 01-01-1700.
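The workaround can be expressed as filling missing required values with fixed defaults: concept 0 for required concept ids such as RACE_CONCEPT_ID and ETHNICITY_CONCEPT_ID, and 01-01-1700 for required dates. The Python sketch below is a hypothetical helper; which fields count as required is passed in by the caller rather than read from the OMOP CDM specification.

```python
from datetime import date

UNKNOWN_CONCEPT_ID = 0               # "no matching concept" in OMOP
PLACEHOLDER_DATE = date(1700, 1, 1)  # used when NICE provides no date


def fill_required(record, concept_fields=(), date_fields=()):
    """Return a copy of `record` with missing required fields defaulted."""
    filled = dict(record)
    for name in concept_fields:
        if filled.get(name) is None:
            filled[name] = UNKNOWN_CONCEPT_ID
    for name in date_fields:
        if filled.get(name) is None:
            filled[name] = PLACEHOLDER_DATE
    return filled


person = fill_required(
    {"person_id": 1, "gender_concept_id": 8532, "year_of_birth": 1962},
    concept_fields=("race_concept_id", "ethnicity_concept_id"),
)
print(person["race_concept_id"], person["ethnicity_concept_id"])  # 0 0
```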
Table 2 Required fields from OMOP CDM which could not be filled with data from the SARI dataset, listed per domain. The type concept ids could not be filled in with SARI data; however, these were filled in with a standard concept id from OMOP.

| Domain | Required columns |
|---|---|
| Person | Person id; Race concept id; Ethnicity concept id |
| Observation period | Person id; Period type concept id |
| Visit occurrence | Person id; Visit concept id |
| Condition occurrence | Condition occurrence id; Person id; Condition start date; Condition concept type id |
| Fact relationship | Domain concept id 1; Domain concept id 2; Relationship concept id |
| Measurement | Measurement id; Person id; Measurement date; Measurement type concept id |
| Observation | Observation id; Person id; Observation date; Observation type concept id |
| Procedure occurrence | Procedure occurrence id; Person id; Procedure date; Procedure type concept id |
In Usagi, only one of the 33 source terms was left unmapped (97% mapped). Eight source terms (24.24%) were mapped to two concept terms and two source terms (6.06%) were mapped to three concept terms. Appendix D contains the source terms to concept terms mapping. The APACHE 4 diagnosis values were mapped separately, as these would also be used for the ContSys representation. There were 445 APACHE diagnoses, of which 26 could not be mapped.
With 33 source terms and 445 APACHE diagnoses together, a total of 478 terms were used for mapping and of these 26 terms were unmapped (94.6% mapped). In total 96 suggestions from Usagi were accepted. Out of the 478 terms, 58 suggestions had a matching score of 1.0 and 56 of these suggestions were accepted. The only two suggestions with a matching score of 1.0 that were not accepted were for “male” and “female”. Usagi suggested LOINC codes for these, but the OHDSI guide states that both “male” and “female” must be mapped to the standard OMOP codes.
Therefore, the Usagi suggestions were rejected and the terms were manually mapped to the OMOP standard codes. The suggestion with the lowest matching score that was accepted had a matching score of 0.47.
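The reported percentages follow directly from the counts. The short Python check below recomputes them from the numbers given in the text and in Table 1; the helper name `pct` is illustrative.

```python
def pct(part: int, whole: int, digits: int = 1) -> float:
    """Percentage of `whole` represented by `part`, rounded to `digits` decimals."""
    return round(100 * part / whole, digits)


print(pct(33 - 1, 33))     # 97.0  source terms mapped
print(pct(8, 33, 2))       # 24.24 source terms mapped to two concepts
print(pct(2, 33, 2))       # 6.06  source terms mapped to three concepts
print(pct(478 - 26, 478))  # 94.6  all terms mapped for OMOP
print(pct(447, 478))       # 93.5  all terms mapped for ContSys (Table 1)
```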
SARI subset ContSys representation
All SARI column names were mapped to a ContSys concept (see Figure 5). The APACHE 4 diagnosis column was mapped to two ContSys concepts, because APACHE 4 diagnoses include both diagnoses and procedures. The “Died in hospital” and “Chronic renal insufficiency” columns in SARI were mapped to “Excluded condition” and “Working diagnosis” to represent the Boolean value of those columns: if the Boolean value is zero, it is an “Excluded condition”; if it is one, it is a “Working diagnosis”. In addition, “Died in hospital” is mapped to “Observable condition” and “Chronic renal insufficiency” to “Health issue”, to distinguish between these two mappings: “Died in hospital” is more an observable condition, whereas “Chronic renal insufficiency” is more a health issue. Each mapping was made unique whenever possible. However, that was not always possible, since certain SARI column names are too similar for ContSys, which uses very general concepts. Table 3 shows pairs of SARI columns whose two elements were mapped to the same ContSys concept. Appendix E contains the complete ContSys representation. Of the 478 data columns or values mapped for OMOP, sixteen had not been mapped to SNOMED CT codes and were therefore remapped to a SNOMED CT code to make the mapping compliant with ContSys.
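The Boolean rule described above can be written as a small function. The Python sketch below is a hypothetical illustration covering only the two columns named in the text; it returns the concept selected by the Boolean value together with the additional concept that distinguishes the two columns.

```python
from typing import Tuple

# Additional concept that distinguishes the two Boolean columns.
ADDITIONAL_CONCEPT = {
    "Died in hospital": "Observable condition",
    "Chronic renal insufficiency": "Health issue",
}


def contsys_concepts(column: str, value: int) -> Tuple[str, str]:
    """Map a SARI Boolean column value (0 or 1) to its two ContSys concepts."""
    if value not in (0, 1):
        raise ValueError(f"expected 0 or 1, got {value!r}")
    status = "Working diagnosis" if value == 1 else "Excluded condition"
    return status, ADDITIONAL_CONCEPT[column]


print(contsys_concepts("Chronic renal insufficiency", 0))
# ('Excluded condition', 'Health issue')
```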
Table 3 SARI columns which are mapped to the same ContSys concepts
SARI column names | ContSys concept
Hospital number & Intensive care number | Healthcare organization
Date of birth & Gender | Demographic element
Hospital admission date & Intensive care admission date | Initial contact
Hospital discharge date & Intensive care discharge date | Healthcare activity period
Maximum ventricle heart rate & Maximum creatinine | Healthcare information
Figure 5 shows the SARI dataset represented in ContSys and the relations between the concepts used. Most concepts could be linked to each other with at most one unmapped concept in between. The “Healthcare treatment” concept was too far removed from the other mapped concepts and is therefore shown without any relations. The “Healthcare information” concept has no relationships in ContSys. The “Element” and “Demographic element” concepts are separate, as they are concepts from ISO 13606-1 (Health informatics, Electronic health record communication, Part 1: Reference model) (21), a model complementary to ContSys.
Comparison of time taken for mapping to both models
The OMOP and ContSys representations of the SARI set each took about eight hours to complete. These eight hours do not include the value mapping with Usagi, as ContSys is not supported by the Usagi tool. Nevertheless, many of the hours for the ContSys representation were spent reading the descriptions of the concepts and deciding which definitions fitted, and a significant amount of time was then spent deciding which concepts fitted best. The OMOP process was more straightforward. For both OMOP and ContSys, the mapping of SARI values to the OMOP vocabulary and SNOMED CT was also timed. However, the value mapping from the OMOP representation was largely reused for ContSys, so Usagi was effectively also used for ContSys. In reality, the time spent on the ContSys representation would therefore be higher. Mapping the APACHE 4 values with Usagi took twelve hours. Manually mapping half of the APACHE 4 values had previously taken one dedicated month, and this would likely also be the case for ContSys if Usagi were not used. A dataset with many data values is thus mapped much more quickly to OMOP with the help of the OHDSI tools, which ContSys lacks.
A notable difference between OMOP and ContSys is how they handle negative findings. OMOP is not designed to handle negative findings. ContSys has specific concepts to represent negative findings such as “excluded condition” and “considered condition”.
Figure 5 SARI represented in ContSys. This figure contains the SARI to ContSys mapping and the relationships between the ContSys concepts used. The white boxes below a ContSys concept box with a continuous border show the SARI column names mapped to that ContSys concept. Boxes with discontinuous borders are concepts that are not used in the SARI mapping but link concepts that are mapped. The colors of the boxes represent the clauses of the ContSys model: light blue is “Responsibility”, purple is “Healthcare actor”, light green is “Healthcare matter”, dark green is “Time”, dark blue is “EHRCOM Reference model”, red is “Activity” and “Process”, and yellow is “Electronic health record component”; white with a yellow border means that the concept does not belong to a clause. The larger white arrows point to the superclass of a concept. The edges represent the relationships between the concepts, with cardinality specifications taken from the ContSys website.
OMOP to ContSys mapping
All SARI terms in the SARI in OMOP representation were mapped to the corresponding SARI terms in the SARI in ContSys representation. Through these shared SARI terms, the OMOP columns of the OMOP representation were also mapped to the ContSys concepts of the ContSys representation. The OMOP column definitions fit the ContSys concept definitions. Appendix F contains the OMOP to ContSys mapping. The table is structured around OMOP, as the mapping runs from OMOP to ContSys. Most of the transformation logic concerns converting OMOP concept IDs to SNOMED CT codes. Most comments note that certain OMOP columns are mapped to the same ContSys concept and are therefore redundant in ContSys: these OMOP columns fulfil different roles in OMOP but become redundant in ContSys.
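Converting an OMOP concept ID to a SNOMED CT code amounts to a lookup in the OMOP CONCEPT table: for concepts from the SNOMED vocabulary, `concept_code` holds the SNOMED CT identifier. The sketch below assumes a reduced in-memory copy of four CONCEPT columns; the SNOMED row uses hypothetical placeholder values, while the MALE row (8507, Gender vocabulary) is a real OMOP concept. Concepts from other vocabularies return None and, like the sixteen values mentioned in the results, would need a separate SNOMED CT mapping.

```python
from typing import NamedTuple, Optional


class Concept(NamedTuple):
    """Subset of the columns of the OMOP CONCEPT table."""
    concept_id: int
    concept_name: str
    vocabulary_id: str
    concept_code: str


def to_snomed_code(concept_id: int, concepts: dict[int, Concept]) -> Optional[str]:
    """Return the SNOMED CT code behind an OMOP concept id, or None.

    None means the concept is unknown or not from the SNOMED vocabulary,
    so it still has to be mapped to SNOMED CT separately.
    """
    concept = concepts.get(concept_id)
    if concept is None or concept.vocabulary_id != "SNOMED":
        return None
    return concept.concept_code


CONCEPTS = {
    # Hypothetical SNOMED row: id, name and code are placeholders only.
    1000001: Concept(1000001, "Example SNOMED concept", "SNOMED", "1000001000"),
    # Real OMOP standard gender concept, but not from SNOMED CT.
    8507: Concept(8507, "MALE", "Gender", "M"),
}

print(to_snomed_code(1000001, CONCEPTS))  # 1000001000
print(to_snomed_code(8507, CONCEPTS))     # None
```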
MDS in OMOP CDM representation
Appendix G contains the scan report generated by White Rabbit for the MDS, which was loaded into Rabbit-in-a-Hat. All 204 variables from the MDS were mapped to OMOP CDM tables and columns. As in the SARI in OMOP representation, only nice_age needed an adjustment to fit the year_of_birth column in OMOP. The same workaround as in the SARI in OMOP representation was applied to deal with required OMOP columns that could not be filled with data from the MDS. Appendix H contains the Word file generated by Rabbit-in-a-Hat with the MDS to OMOP CDM mapping.
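One plausible form of the nice_age adjustment is to derive year_of_birth from the age and the admission date. This is a hedged sketch that assumes nice_age is the age in whole years at ICU admission; the text does not specify the exact adjustment, and the function name is hypothetical.

```python
from datetime import date


def year_of_birth_from_age(age_years: int, admission_date: date) -> int:
    """Estimate OMOP year_of_birth from age (in whole years) at admission.

    Without the exact birthday the result can be one year too late: a
    patient aged 67 on 14 March 2019 was born in 1952, or in 1951 if the
    birthday falls later in the year.
    """
    if age_years < 0:
        raise ValueError("age must be non-negative")
    return admission_date.year - age_years


print(year_of_birth_from_age(67, date(2019, 3, 14)))  # 1952
```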
The file with all source terms that was loaded into Usagi contained 817 values that needed to be mapped. The APACHE 4 values were mapped automatically using the SARI source term mapping, which left 372 values. Another 25 values were excluded because they are no longer used in the MDS or have no meaning outside the NICE dataset. In total, 347 values therefore had to be mapped in Usagi. For 35 values no fitting concept was found, and these were left unmapped (89.9% mapped). Usagi gave 69 (19.9%) values a matching score of 1.0. Again, the suggestions for “male” and “female” were rejected, as Usagi did not suggest the mappings recommended by OHDSI. In total, 124 (35.7%) suggestions from Usagi were accepted. The accepted suggestion with the lowest matching score had a score of 0.38. Appendix I contains the source term to concept mapping for the MDS.
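The counts above can be checked with a few lines of arithmetic. The sketch simply recomputes the percentages from the numbers reported in this section; the function name is illustrative.

```python
def mds_usagi_summary(total: int, auto_mapped: int, excluded: int,
                      unmapped: int, score_one: int, accepted: int) -> dict:
    """Recompute the MDS Usagi statistics from the raw counts."""
    to_map = total - auto_mapped - excluded  # values left for Usagi

    def pct(n: int) -> float:
        return round(100 * n / to_map, 1)

    return {
        "to_map": to_map,
        "mapped_pct": pct(to_map - unmapped),
        "score_one_pct": pct(score_one),
        "accepted_pct": pct(accepted),
    }


# 817 values, 445 APACHE 4 values mapped automatically (817 - 372), 25 excluded.
print(mds_usagi_summary(817, 445, 25, 35, 69, 124))
# {'to_map': 347, 'mapped_pct': 89.9, 'score_one_pct': 19.9, 'accepted_pct': 35.7}
```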
Implementing the MDS ETL
The implementation of the NICE MDS in the OMOP CDM was completed. Appendix J contains a written guide with details and personal experiences that are not covered in The Book of OHDSI. The required IDs were filled in with the corresponding concept IDs from the Usagi source-to-concept mapping. Usagi produced the SOURCE_TO_CONCEPT_MAP table, which is used to look up the concept ID for a given source code. However, the NICE database does not use source codes. An extra table, CODES, was therefore created that links source values in the MDS to the corresponding source codes assigned in Usagi. With the CODES table, the OMOP CDM tables could be filled with the correct concept IDs.
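The lookup chain from MDS source value via CODES to a concept ID can be sketched as two left joins. This is an illustrative sqlite3 example: the CODES column names, the source values "m" and "v" and the source codes are hypothetical, SOURCE_TO_CONCEPT_MAP is reduced to the two columns needed, and unmatched values fall back to concept ID 0, the OMOP convention for "no matching concept".

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical CODES table: links MDS source values to Usagi source codes.
    CREATE TABLE codes (source_value TEXT PRIMARY KEY, source_code TEXT NOT NULL);
    -- Reduced SOURCE_TO_CONCEPT_MAP: only the two columns needed here.
    CREATE TABLE source_to_concept_map (source_code TEXT, target_concept_id INTEGER);
    -- Illustrative MDS values as they might arrive from the registry.
    CREATE TABLE mds_values (source_value TEXT);
""")
conn.executemany("INSERT INTO codes VALUES (?, ?)",
                 [("m", "GENDER_1"), ("v", "GENDER_2")])
conn.executemany("INSERT INTO source_to_concept_map VALUES (?, ?)",
                 [("GENDER_1", 8507), ("GENDER_2", 8532)])
conn.executemany("INSERT INTO mds_values VALUES (?)", [("m",), ("v",), ("x",)])

# Source value -> source code -> concept id; missing links become 0.
rows = conn.execute("""
    SELECT v.source_value, COALESCE(s.target_concept_id, 0) AS concept_id
    FROM mds_values AS v
    LEFT JOIN codes AS c ON c.source_value = v.source_value
    LEFT JOIN source_to_concept_map AS s ON s.source_code = c.source_code
    ORDER BY v.source_value
""").fetchall()
print(rows)  # [('m', 8507), ('v', 8532), ('x', 0)]
```

Using left joins keeps every MDS value in the result, so values without a code surface as 0 instead of silently disappearing.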
Required fields in the OMOP CDM for which the MDS did not contain data were set to 0 or set to 01-01-1700 if it regarded a date. Many of the OMOP CDM tables are defined to have a concept type id. Table 4 shows the chosen concept type id for each used table in the OMOP CDM. The ACHILLES and ACHILLESHEEL tests were also completed. Results can be found in Appendix K for ACHILLES and in Appendix L for ACHILLESHEEL.
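The workaround for required fields, together with the type concept IDs of Table 4, can be sketched as a row-completion step. This is a minimal illustration assuming rows are plain dictionaries; the type concept column names follow OMOP CDM v5.3, while the choice of required columns, the function name and the example visit are hypothetical.

```python
from datetime import date

# Placeholders used in this study for required fields without MDS data.
PLACEHOLDER_ID = 0
PLACEHOLDER_DATE = date(1700, 1, 1)

# Type concept ids from Table 4, keyed by table: (type concept column, id).
TYPE_CONCEPT = {
    "condition_occurrence": ("condition_type_concept_id", 44786627),
    "measurement": ("measurement_type_concept_id", 5001),
    "observation": ("observation_type_concept_id", 38000280),
    "observation_period": ("period_type_concept_id", 44814724),
    "procedure_occurrence": ("procedure_type_concept_id", 44786630),
    "visit_occurrence": ("visit_type_concept_id", 32024),
}


def complete_row(table: str, row: dict, required: dict[str, str]) -> dict:
    """Fill missing required columns and set the table's type concept id.

    `required` maps each required column to "date" or "id".
    """
    filled = dict(row)
    for column, kind in required.items():
        if filled.get(column) is None:
            filled[column] = PLACEHOLDER_DATE if kind == "date" else PLACEHOLDER_ID
    if table in TYPE_CONCEPT:
        column, type_concept_id = TYPE_CONCEPT[table]
        filled[column] = type_concept_id
    return filled


visit = complete_row(
    "visit_occurrence",
    {"visit_occurrence_id": 1, "person_id": 42,
     "visit_start_date": date(2019, 5, 2), "visit_end_date": None},
    {"visit_concept_id": "id", "visit_start_date": "date", "visit_end_date": "date"},
)
print(visit["visit_end_date"], visit["visit_concept_id"], visit["visit_type_concept_id"])
# 1700-01-01 0 32024
```

Care site is left out of the sketch because the text does not say which column holds its Table 4 concept ID.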
Table 4 Chosen concept type id for each used OMOP CDM table in the implementation of the MDS ETL
OMOP CDM table | Concept type id
Care site | 706367 Intensive care unit
Condition occurrence | 44786627 Primary Condition
Measurement | 5001 Test ordered through EHR
Observation | 38000280 Observation recorded from EHR
Observation period | 44814724 Period covering healthcare encounters
Procedure occurrence | 44786630 Primary Procedure
Visit occurrence | 32024 Visit derived from encounter on medical professional claim
Discussion
As the amount of observational data grows, so does the variety of data structures. Supporting the reuse of such data requires improved and more complex infrastructure, and making data FAIR can help here. An information model is essential for data sources to become FAIR. However, it was unknown which information model is most suitable for quality registries such as NICE, and to what extent information models that contain hospital data or ICU quality registry data are interoperable. Identifying the optimal information models and determining their interoperability will help data sources become FAIR, which in turn makes it possible to answer large, multisite research and policy questions.
The evaluation of the three information models, i.e. OMOP CDM, CDISC SDTM and ContSys, shows that OMOP CDM is the most usable for registries and that ContSys is the most suitable for hospital data, based on the results in Table 1. Both OMOP CDM and ContSys proved usable as an information model for the SARI dataset, although OMOP CDM is, in general, the more usable for registries. The MDS was also represented in OMOP CDM, and the implementation showed that it is compatible with ACHILLES. Based on the results, it is concluded that OMOP CDM and ContSys are interoperable, so data can be transformed from ContSys to OMOP CDM with minimal data loss.
OMOP CDM and ContSys experience comparison
Of the two models, OMOP CDM was the easier one to map to. OHDSI provides a guidebook, a forum and tools that help users apply the OMOP CDM (15). In contrast, ContSys is not implementation-oriented, because it is a conceptual model; consequently, it provides no guidebook or tools. The OMOP classes and domains are also more specific than the clauses and concepts of ContSys, which made it much clearer to which OMOP column a SARI item should be connected. This was sometimes problematic in ContSys, because its concept definitions are very general, and in many cases a SARI column name would fit several concepts. For the SARI representation in ContSys an effort was made to use unique concepts, but in certain cases this was impossible, which could cause problems when implementing the ContSys representation. On the other hand, because ContSys provides no implementation guidelines, users are free to decide how to implement the model and how to deal with this and other problems. This is, for example, also why “ease of anonymization and de-identification” was rated “easy” for ContSys in Table 1. However, this freedom is problematic when ContSys is used for FAIR data, as every user could implement it differently. As stated in the results section, the SARI representations in OMOP and ContSys each took about eight hours to complete. However, the ContSys representation also relied on the OHDSI tools, as its value mapping was reused from the OMOP representation. Without the OHDSI tools, the ContSys representation would have taken much more time.
Another difference between OMOP and ContSys is how they handle negative findings. ContSys has concepts to represent negative findings, such as “excluded condition” and “considered condition”. It could also be argued that “healthcare information” could represent negative findings: healthcare information is information that is relevant to a person’s healthcare, and in certain cases negative findings are relevant to a person’s healthcare. In principle, OMOP does not collect negative findings. First, it is debated whether negative findings should be collected in OMOP CDM at all (37). If they are collected, queries potentially have to be doubled, because a query needs to include the patients with a specific disease and to exclude those recorded as not having it. Second, if it is decided to include negative findings, it is unclear how this should be done. In case negative findings are judged to add valuable information, the best solution seems to be to store them in OBSERVATION with observation_concept_id 4132135 (Absent) and a SNOMED CT concept for the pertinent negative as value_as_concept_id (38). For some data sources, such as the NICE registry, it is necessary to represent negative findings.
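The OBSERVATION-based convention for negative findings (38) can be illustrated with a single row. This is a hedged sketch: only the columns relevant here are shown, the condition concept ID 999999 is a hypothetical placeholder, and value_as_concept_id holds the OMOP concept ID of the SNOMED CT concept rather than the raw SNOMED CT code. The default type concept 38000280 is the one chosen for OBSERVATION in Table 4.

```python
from dataclasses import dataclass
from datetime import date

ABSENT = 4132135  # observation_concept_id for "Absent" (38)


@dataclass(frozen=True)
class ObservationRow:
    """Subset of the OMOP OBSERVATION columns needed for a negative finding."""
    observation_id: int
    person_id: int
    observation_concept_id: int
    observation_date: date
    observation_type_concept_id: int
    value_as_concept_id: int


def negative_finding(observation_id: int, person_id: int,
                     condition_concept_id: int, when: date,
                     type_concept_id: int = 38000280) -> ObservationRow:
    """Record that a condition was explicitly found to be absent."""
    return ObservationRow(observation_id, person_id, ABSENT, when,
                          type_concept_id, condition_concept_id)


def has_documented_absence(rows: list[ObservationRow], person_id: int,
                           condition_concept_id: int) -> bool:
    """Extra check needed next to the usual positive-condition query."""
    return any(r.person_id == person_id
               and r.observation_concept_id == ABSENT
               and r.value_as_concept_id == condition_concept_id
               for r in rows)


rows = [negative_finding(1, 42, 999999, date(2020, 1, 10))]
print(has_documented_absence(rows, 42, 999999))  # True
print(has_documented_absence(rows, 7, 999999))   # False
```

The separate `has_documented_absence` check illustrates why queries may have to be doubled once negative findings are stored.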
OMOP CDM and ContSys interoperability
Based on the results of the OMOP to ContSys mapping, it can be concluded that the two models are interoperable. This interoperability is mostly due to the broad definitions of the ContSys concepts and to the fact that ContSys does not prescribe how it should be implemented. The broad definitions were somewhat problematic in the SARI representation in ContSys but proved very useful in the OMOP to ContSys mapping, although the decision-making process of choosing fitting concepts had already been done in the SARI to ContSys phase. The fact that ContSys does not guide its users in implementing the model should also benefit the interoperability of the two models: users are free to decide how to implement ContSys, and are therefore also free to choose how to implement interoperability with other data models. This is only useful when two sites want to make their data interoperable. The downside is that this freedom of interpretation could give rise to new problems and errors, especially when ContSys is used to make data FAIR. The implementation guidelines of OMOP could provide a good starting point here. For the OMOP to ContSys mapping it was convenient that all APACHE 4 diagnoses had already been mapped to SNOMED CT codes; otherwise, the mapping would have required more work. In that case, Usagi from OHDSI could be used to remap the data values that were not mapped to a SNOMED CT code in the OMOP representation and thus speed up the mapping process.
Although the OMOP to ContSys mapping was done to study interoperability from OMOP to ContSys, certain conclusions can also be drawn for a ContSys to OMOP mapping. The OMOP to ContSys mapping indicates no problems for the reverse direction, as the mapping table can also be read from right to left as a ContSys to OMOP mapping (Appendix F). For example, “gender_concept_id” in OMOP is mapped from the SARI column “gender”, and “gender” is mapped to “demographic element” in ContSys. Read in reverse, “demographic element” in ContSys is mapped from the SARI column “gender”, and “gender” is mapped to “gender_concept_id” in OMOP, indicating that a ContSys to OMOP mapping is possible. When mapping from ContSys to OMOP, the SNOMED CT codes of the ContSys representation could be left untouched or complemented with concepts from the OMOP vocabulary. This is especially relevant for source values that are unmapped in the ContSys representation.
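Reading the mapping table from right to left can be sketched as inverting a list of (OMOP column, SARI column, ContSys concept) triples. Only the gender row is taken from the text; the date-of-birth row is a hypothetical example consistent with Table 3, which maps both SARI columns to “Demographic element”. The sketch shows why the SARI column is needed as the link: one ContSys concept can correspond to several OMOP columns.

```python
from collections import defaultdict

# (OMOP column, SARI column, ContSys concept); the second row is hypothetical.
MAPPING = [
    ("gender_concept_id", "gender", "Demographic element"),
    ("year_of_birth", "date of birth", "Demographic element"),
]


def contsys_to_omop(rows: list[tuple[str, str, str]]) -> dict[str, list[tuple[str, str]]]:
    """Invert OMOP -> ContSys triples into ContSys concept -> [(SARI, OMOP)]."""
    reverse: defaultdict[str, list[tuple[str, str]]] = defaultdict(list)
    for omop_column, sari_column, contsys_concept in rows:
        reverse[contsys_concept].append((sari_column, omop_column))
    return dict(reverse)


print(contsys_to_omop(MAPPING)["Demographic element"])
# [('gender', 'gender_concept_id'), ('date of birth', 'year_of_birth')]
```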
Discussion of the findings in relation to existing literature
Si et al. and Lima et al. applied the same procedure to map to the OMOP CDM as was used in this study (39, 40). A small difference is that Lima et al. did not use Usagi but instead looked up the concept IDs for source terms in Athena, a website by OHDSI for searching concept IDs. That the same process is used throughout the OMOP community is to be expected, given the clear guides OHDSI provides. Notably, Lima et al. mapped electronic patient records to the OMOP CDM. This may imply that OMOP CDM is also suitable for patient records, so ICUs could consider using OMOP CDM instead of ContSys, taking into account how negative findings are represented in OMOP. One reason to do so is that OMOP CDM is a mature information model, as is also mentioned by Lima et al. If an ICU decides to use OMOP CDM, as the quality registry does, no interoperability problems should arise when extracting data from the ICU for transfer to the quality registry. However, this study assumed that ContSys is the better information model for continuity of healthcare, and Lima et al. did not compare information models.
Liyanage et al. (41), Guo et al. (42) and Garza et al. (29) all compared CDMs and concluded that OMOP CDM is the best CDM for cohort studies and longitudinal community registries. Integration percentages in these studies range from 88.7% to 98.8% for Guo et al. and are 100% for Garza et al. The SARI set scored 94.6% and the MDS 89.9%, so the integration rate for NICE in OMOP CDM is in line with the other studies. It is remarkable that the SARI set scored at the high end, as it contains a significant number of complex source terms and a vocabulary filter was applied.
Recommendations
NICE is recommended to use the OMOP CDM information model to make the NICE data FAIR. OMOP CDM, ContSys and CDISC SDTM were developed for specific purposes and their fit for other purposes depends on how closely the information model matches the intended use (29). The NICE registry is focused on observational data and the results show that both the integrity and integration are high. It is therefore likely that the OMOP CDM is the most suitable information model to make NICE data FAIR.
However, NICE receives data from ICUs, which are more focused on continuity of care, as their data come from EHRs. Hospitals or ICUs would therefore probably benefit most from using ContSys to make their data FAIR. Nevertheless, ContSys could be problematic for making data FAIR, and, following Lima et al., OMOP can also be considered.
For the implementation process, NICE is recommended to start with OMOP, as OHDSI provides tools and clear guides for implementing the OMOP CDM, which form a good starting point. If an ICU chooses ContSys, the ContSys implementation for the ICUs should start once the OMOP implementation for NICE is completed, so that it can be assessed whether certain decisions need to be made in the ContSys implementation to optimize interoperability with the NICE OMOP data source.
Strength and weaknesses
This study has several strengths. First, the choice of information models to map data to was based on an evaluation of three information models. Second, two datasets were represented in the OMOP CDM, so conclusions on the usability of OMOP CDM are based on two examples, although these overlap. The study also has limitations. First, in the source term to concept mapping with Usagi, the APACHE 4 diagnoses were mapped only to SNOMED CT codes. This most likely reduced the number of source terms that could be mapped and could limit generalizability, as other data sources could decide to use different vocabularies in the OMOP CDM. On the other hand, no vocabulary filter was applied in the MDS source term to concept mapping. Second, the ContSys representation is open to interpretation: because the descriptions of ContSys concepts are so general, different people might make different mappings. In this respect, ContSys is noticeably less mature than OMOP.
Future research questions
Further research is needed on how to optimally make a representation in ContSys. If different sites use ContSys differently, this may affect interoperability with other information models. More research is therefore needed on how to optimally map using ContSys and how this influences interoperability with OMOP CDM. The interoperability between OMOP CDM and other information models should also be researched further: hospitals could decide that ContSys is not the optimal information model for them and choose a different one, in which case it is unknown to what extent that model is interoperable with OMOP.
Conclusion
Information models are essential for data sources to become FAIR. This research shows that OMOP CDM is the optimal choice among information models for national ICU quality registries that need to make their data FAIR. Hospitals could opt for ContSys or OMOP. Results from this research make apparent that the information models OMOP CDM and ContSys are interoperable.
References
1. Makadia R, Ryan PB. Transforming the Premier Perspective® Hospital Database into the Observational Medical Outcomes Partnership (OMOP) Common Data Model. Egems. 2014;2(1).
2. Overhage JM, Ryan PB, Reich CG, Hartzema AG, Stang PE. Validation of a common data model for active safety surveillance research. Journal of the American Medical Informatics Association. 2011;19(1):54-60.
3. Reich C, Ryan PB, Stang PE, Rocca M. Evaluation of alternative standardized terminologies for medical conditions within a network of observational healthcare databases. Journal of biomedical informatics. 2012;45(4):689-96.
4. NICE. Introductie 2020 [cited 2020 10th of January]. Available from: https://www.stichting-nice.nl/index.jsp.
5. NICE. Wat we doen 2020 [cited 2020 10th of January]. Available from: https://www.stichting-nice.nl/watwedoen.jsp.
6. Wilkinson MD, Dumontier M, Aalbersberg IJ, Appleton G, Axton M, Baak A, et al. The FAIR Guiding Principles for scientific data management and stewardship. Scientific data. 2016;3.
7. Hripcsak G, Duke JD, Shah NH, Reich CG, Huser V, Schuemie MJ, et al. Observational Health Data Sciences and Informatics (OHDSI): opportunities for observational researchers. Studies in health technology and informatics. 2015;216:574.
8. Lee YT, editor Information modeling: From design to implementation. Proceedings of the second world manufacturing congress; 1999: International Computer Science Conventions Canada/Switzerland.
9. GO-FAIR. Personal Health Train 2020 [cited 2020 11th of June]. Available from: https://www.go-fair.org/implementation-networks/overview/personal-health-train/.
10. Beyan O, Choudhury A, van Soest J, Kohlbacher O, Zimmermann L, Stenzhorn H, et al. Distributed analytics on sensitive medical data: The Personal Health Train. Data Intelligence. 2020:96-107.
11. OHDSI. OMOP Common Data Model 2020 [cited 2020 29th of January]. Available from: https://www.ohdsi.org/data-standardization/the-common-data-model/.
12. CDISC. SDTM 2020 [cited 2020 29th of January]. Available from: https://www.cdisc.org/standards/foundational/sdtm.
13. O N. A system of concepts for the continuity of care 2019 [cited 2020 29th of January]. Available from: https://contsys.org/page/default.
14. OHDSI. Data standardization 2020 [cited 2020 10th of January]. Available from: https://www.ohdsi.org/data-standardization/.
15. OHDSI. The Book of OHDSI 2019 [cited 2020 10th of January]. Available from: https://ohdsi.github.io/TheBookOfOhdsi/.
16. OHDSI. OBSERVATION_PERIOD 2018 [updated 30 November 2018; cited 2020 25th May]. Available from: https://github.com/OHDSI/CommonDataModel/wiki/OBSERVATION_PERIOD.
17. CDISC. About CDISC 2020 [cited 2020 16th of January]. Available from: https://www.cdisc.org/about.
18. CDISC. Study Data Tabulation Model. 2019.
19. O N. Background of this system of concepts 2019 [cited 2020 20th of January]. Available from: https://contsys.org/page/Background.
20. NICE S. Our registries 2020 [cited 2020 14th of April]. Available from: https://www.stichting-nice.nl/dd/#modules.
21. ISO. ISO 13606-1:2019 Health informatics — Electronic health record communication — Part 1: Reference model 2019 [cited 2020 8th of June]. Available from:
22. ISO. ISO 12967-1:2009 Health informatics — Service architecture — Part 1: Enterprise viewpoint 2009 [cited 2020 11th of June]. Available from: https://www.iso.org/standard/50500.html.
23. Moody DL, Shanks GG. Improving the quality of data models: empirical validation of a quality management framework. Information systems. 2003;28(6):619-50.
24. OHDSI. CommonDataModel 2019 [cited 2020 3rd of March]. Available from: https://github.com/OHDSI/CommonDataModel/wiki.
25. CDISC. Study Data Tabulation Model Implementation Guide: Human Clinical Trials 2019 [cited 2020 10th of January].
26. OHDSI. WhiteRabbit 2020 [cited 2020 27th of January]. Available from: https://github.com/OHDSI/WhiteRabbit.
27. OHDSI. Usagi 2020 [cited 2020 27th of January]. Available from: https://github.com/OHDSI/Usagi.
28. EHDEN. EHDEN academy 2020 [cited 2020 27th May]. Available from: https://academy.ehden.eu/.
29. Garza M, Del Fiol G, Tenenbaum J, Walden A, Zozus MN. Evaluating common data models for use with a longitudinal community registry. Journal of biomedical informatics. 2016;64:333-41.
30. Kahn MG, Batson D, Schilling LM. Data model considerations for clinical effectiveness researchers. Medical care. 2012;50.
31. CDISC. CDISC Standards in the Clinical Research Process 2019 [cited 2019 12th of December]. Available from: https://www.cdisc.org/standards.
32. ISO. ISO 13940:2015(en) Health informatics — System of concepts to support continuity of care: ISO; 2015 [cited 2020 13th of February]. Available from:
https://www.iso.org/obp/ui/#iso:std:iso:13940:ed-1:v1:en.
33. TC251 C. EN 13940-1: Health Informatics-System of Concepts to Support Continuity of Care-Part 1: Basic Concepts. European Committee for Standardization. 2006:105.
34. O N. Issues with the current model 2019 [cited 2020 20th of January]. Available from: https://contsys.org/page/ModelIssues.
35. Nictiz. CONTSYS 2016 [cited 2020 4th of February]. Available from: https://www.nictiz.nl/standaarden/contsys/.
36. Reich C. Conventions of Race and Ethnicity 2019 [updated 19th of August 2019; cited 2020 13th of February]. Available from: https://forums.ohdsi.org/t/conventions-of-race-and-ethnicity/7654.
37. Reich C. Negative information in OMOP CDM 2018 [cited 2020 9th of March]. Available from: https://forums.ohdsi.org/t/negative-information-in-omop-cdm/4923/8.
38. Sholle E. Negative information in OMOP CDM 2018 [cited 2020 9th of March]. Available from: https://forums.ohdsi.org/t/negative-information-in-omop-cdm/4923/10.
39. Si Y, Weng C. An OMOP CDM-based relational database of clinical research eligibility criteria. Studies in health technology and informatics. 2017;245:950.
40. Lima DM, Rodrigues-Jr JF, Traina AJ, Pires FA, Gutierrez MA. Transforming two decades of ePR data to OMOP CDM for clinical research. Stud Health Technol Inform. 2019;264:233-7.
41. Liyanage H, Liaw S-T, Jonnagaddala J, Hinton W, de LUSIGNAN S, editors. Common Data Models (CDMs) to Enhance International Big Data Analytics: A Diabetes Use Case to Compare Three CDMs. EFMI-STC; 2018.
42. Guo GN, Jonnagaddala J, Farshid S, Huser V, Reich C, Liaw S-T. Comparison of the cohort selection performance of Australian Medicines Terminology to Anatomical Therapeutic Chemical mappings. Journal of the American Medical Informatics Association. 2019;26(11):1237-46.
Appendix
Appendix table 1
Eight of the seventeen characteristics in Appendix Table 1 were used in research by Garza et al. (29), in which they compared CDMs to determine which CDM is most suitable for sharing data from an Electronic Health Record (EHR)-based community registry.
Appendix Table 1 Characteristics and their definition(s).
Characteristic | Conceptual definition | Operational definition
Type of data | Type of data suitable for the information model |
Purpose of model | The purpose for which the information model is intended to be used |
Structure of model | Design of the model |
Strengths | Strengths of the information model in comparison with other information models |
Weaknesses | Weaknesses of the information model in comparison with other information models |
Number of domains / concepts | Absolute number of domains or concepts represented in the model |
Ontology/vocabulary | Which ontology/vocabulary is used in the information model |
Integrity (23) | The extent to which associations in the evaluated data model match the specific project needs | The percentage of associations in the data model held by the evaluated model
Extensibility/Scalability (30) | How much the information model can accommodate addition of new data elements |
Ease of querying the model (29) | The ease of querying the evaluated model for cohort identification | 1. Number of table joins required for each query; 2. Number of nested queries needed for each query; 3. Qualitative estimate (faster or slower) of the overall query performance over views of the data model, based on the complexity of the query used for cohort identification. (These assume a relational database implementation.)
Ease of anonymization and de-identification (29) | The ease of de-identification and anonymization of the data captured in the evaluated model | Qualitative estimate (easy, medium or difficult) of the complexity of the de-identification and anonymization process
Integration (23, 29, 30) | The extent to which the model supports controlled terminologies | Each model was evaluated on the integration and use of a controlled vocabulary. The terminologies supported by the models were compared.
Field experience (30) | The release year of the model, which indicates the number of years of experience |
Stability (30) | The number of changes to the data model | Number of model updates in the last two years. A major update means a change in version number; a small update means that the current version of the model was updated.
Adoption (30) | The size of the community using and supporting the data model | The number of adopters for the model
Tooling | Is tooling available to map the data source to the information model |
Time spent on modeling | Time in hours spent on modeling the data to the information model |
Appendix A
*Yellow boxes have overlap with other model(s)
OMOP
Standardized Vocabularies: CONCEPT, VOCABULARY, DOMAIN, CONCEPT_CLASS, CONCEPT_RELATIONSHIP, RELATIONSHIP, CONCEPT_SYNONYM, CONCEPT_ANCESTOR, SOURCE_TO_CONCEPT_MAP, DRUG_STRENGTH
Standardized Metadata: CDM_SOURCE, METADATA
Standardized Clinical Data Tables: PERSON, OBSERVATION_PERIOD, VISIT_OCCURRENCE, VISIT_DETAIL, CONDITION_OCCURRENCE, DEATH, DRUG_EXPOSURE, PROCEDURE_OCCURRENCE, DEVICE_EXPOSURE, MEASUREMENT, NOTE, NOTE_NLP, SURVEY_CONDUCT, OBSERVATION, SPECIMEN, FACT_RELATIONSHIP
Standardized Health System Data Tables: LOCATION, LOCATION_HISTORY, CARE_SITE, PROVIDER
Standardized Health Economics Data Tables: PAYER_PLAN_PERIOD, COST
Standardized Derived Elements: DRUG_ERA, DOSE_ERA, CONDITION_ERA
Results Schema: COHORT, COHORT_DEFINITION
CDISC SDTM
Special-Purpose Domains: Comments (CO), Demographics (DM), Subject Elements (SE), Subject Visits (SV)
Interventions General Observation Class: Concomitant Medications (CM), Exposure as Collected (EC), Exposure (EX), Substance Use (SU), Procedures (PR)
Events General Observation Class: Adverse Events (AE), Clinical Events (CE), Disposition (DS), Protocol Deviations (DV), Healthcare Encounters (HO), Medical History (MH)
Findings General Observation Class: Drug Accountability (DA), Death Details (DD), ECG Test Results (EG), Inclusion/Exclusion Criterion Not Met (IE), Immunogenicity Specimen Assessments (IS), Laboratory Test Results (LB), Microbiology Specimen (MB), Microscopic Findings (MI), Morphology (MO), Microbiology Susceptibility Test (MS), PK Concentrations (PC), PK Parameters (PP), Physical Examination (PE), Questionnaires (QS), Reproductive System Findings (RP), Disease Response (RS), Subject Characteristics (SC), Subject Status (SS), Tumor Identification (TU), Tumor Results (TR), Vital Signs (VS)
Findings About: Findings About (FA), Skin Response (SR)
Trial Design Domains: Trial Arms (TA), Trial Disease Assessment (TD), Trial Elements (TE), Trial Visits (TV), Trial Inclusion/Exclusion Criteria (TI), Trial Summary (TS)
Relationship Datasets: Supplemental Qualifiers (SUPP-- datasets), Related Records (RELREC)
ISO13940 - ContSys
Healthcare actor: healthcare actor, healthcare employment, healthcare organization, healthcare personnel, healthcare professional, healthcare professional entitlement, healthcare provider, healthcare supporting organization, healthcare third party, next of kin, organization role, other carer, subject of care, subject of care proxy
Healthcare matter: clinical process interest, considered condition, excluded condition, health condition, health condition evolution, health issue, health need, health problem, health problem list, health state, health thread, healthcare matter, input health state, observed condition, output health state, potential health condition, professionally assessed condition, prognostic condition, resultant condition, risk condition, target condition, working diagnosis
Activity: automated healthcare, automatic medical device, clinical process outcome evaluation, healthcare activity, healthcare activity directory, healthcare activity element, healthcare activity management, healthcare assessment, healthcare communication, healthcare documenting, healthcare evaluation, healthcare funds, healthcare investigation, healthcare needs assessment, healthcare planning, healthcare process evaluation, healthcare provider activity, healthcare resource, healthcare resource management, healthcare third party activity, healthcare treatment, prescribed third party activity, self-care activity
Process: adverse event, adverse event management, clinical process, healthcare administration, healthcare process, healthcare quality management, healthcare service, healthcare service directory
Healthcare planning: care plan, clinical guideline, clinical pathway, core care plan, health objective, healthcare activities bundle, healthcare goal, multi-professional care plan, needed healthcare activity, protocol, uniprofessional care plan
Time: clinical process episode, contact, contact period, episode of care, episodes of care bundle, health approach, health condition delay, health condition period, health related period, healthcare activity delay, healthcare activity period, healthcare activity period element, healthcare appointment, indirect healthcare activity period, initial contact, mandated period of care, resource delay, self-care period, subject of care preference delay
Responsibility: authorization by law, care period mandate, clinical process mandate, consent competence, continuity facilitator mandate, demand for care, demand for initial contact, demand mandate, dissent, healthcare activity mandate, healthcare commitment, healthcare mandate, informed consent, mandate to export personal information, reason for demand for care, referral request, subject of care desire
OMOP CDISC SDTM ISO13940 - ContSys
certificate related to a healthcare matter clinical report
discharge report
electronic health record component electronic health record extract electronic patient summary health concern
health record
health record component health record extract
healthcare information for import healthcare information request medium (duplicate)
non-ratified healthcare information personal health record
professional health record sharable data repository
summarized healthcare information repository
EHRCOM Data value
attachment value boolean value coded simple value coded value CV
data time value date value duration value
instance identifier value integer value
physical quantity value point in time value real value
simple text value string value time value URI value
EHRCOM Reference model
attestation information audit information base component cluster compostion content data value demographic cluster demographic element demographic entity demographic extract demographic folder demographic item element entry external link
extracted component set folder
item link section