Computational methods for data discovery, harmonization and integration: Using lexical and semantic matching with an application to biobanking phenotypes

Academic year: 2021

University of Groningen

Computational methods for data discovery, harmonization and integration

Pang, Chao

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2018

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Pang, C. (2018). Computational methods for data discovery, harmonization and integration: Using lexical and semantic matching with an application to biobanking phenotypes. University of Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Propositions

1. The fact that we use human language when capturing scientific data inevitably introduces heterogeneity.

2. To realize the promise of personalized medicine, we need to bridge heterogeneity and enable large-scale integrated analysis ... but

3. Manually harmonizing biobank data to enable integrated analysis is (too) complex and time-consuming (BioSHaRE consortium).

4. Full automation of data harmonization is not yet possible because the computational representation of knowledge is incomplete ... however

5. Semi-automatic systems allow users to harmonize data more efficiently and to generate high-quality training data for machine learning approaches.

6. Machine learning promises the ultimate solution, enabling full automation of data harmonization.

7. Healthcare data needs to be coded using standard vocabularies or ontologies to unlock its value.

8. Implementation of the FAIR principles is essential to enable discovery and reuse of scientific knowledge and data as a basis for reproducible science.

9. The difference between a data scientist and a data engineer is an understanding of domain knowledge.

10. “If we want to harmonize data, we need to harmonize people first.” (BioSHaRE consortium)
