University of Groningen Integration techniques for modern bioinformatics workflows Kanterakis, Alexandros

Academic year: 2021

University of Groningen

Integration techniques for modern bioinformatics workflows

Kanterakis, Alexandros

IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.

Document Version

Publisher's PDF, also known as Version of record

Publication date: 2018

Link to publication in University of Groningen/UMCG research database

Citation for published version (APA):

Kanterakis, A. (2018). Integration techniques for modern bioinformatics workflows. University of Groningen.

Copyright

Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).

Take-down policy

If you believe that this document breaches copyright please contact us providing details, and we will remove access to the work immediately and investigate your claim.

Downloaded from the University of Groningen/UMCG research database (Pure): http://www.rug.nl/research/portal. For technical reasons the number of authors shown on this cover page is limited to 10 maximum.

Integration techniques for modern bioinformatics workflows. Thesis, University of Groningen, with summary in English, Greek and Dutch.

The work in this thesis was financially supported by the University of Groningen Ubbo Emmius Fund; BBMRI-NL, a research infrastructure financed by the Netherlands Organization for Scientific Research (NWO project 184.021.007); and the European Union Seventh Framework Programme (FP7/2007-2013) research project BioSHaRE-EU (261433).

Printing of this thesis was financially supported by Rijksuniversiteit Groningen, University Medical Center Groningen, Groningen University Institute for Drug Exploration (GUIDE) and NWO VIDI grant number 917.164.455.

The front cover features artwork from artist Theo van Doesburg (30 August 1883 – 7 March 1931) titled “Tekening”. The layout is based on the “classicthesis” template, Copyright ©2012 André Miede http://www.miede.de.

Printed by Netzodruk Groningen.

©2018 Alexandros Kanterakis. All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means without permission of the author.

Integration techniques for modern

bioinformatics workflows

PhD thesis

to obtain the degree of PhD at the University of Groningen

on the authority of the Rector Magnificus Prof. E. Sterken

and in accordance with the decision by the College of Deans.

This thesis will be defended in public on

Wednesday 11 July 2018 at 9:00 hours

by

Alexandros Kanterakis

born on 30 July 1978 in Moschato, Greece

Prof. T.N. Wijmenga

Assessment committee:
Prof. E.A. Valentijn
Prof. J. Heringa
Prof. H. Snieder

Paranymphs:
Freerk van Dijk
Dimitris Gakis

Contents

1 Introduction and Outline
1.1 Background
1.2 Bioinformatics done right
1.3 Integration as a service in genetics research
1.4 Outline

2 Creating transparent and reproducible pipelines: Best practices for tools, data and workflow management systems
2.1 Introduction
2.2 Existing workflow environment
2.3 What software should be part of a scientific workflow?
2.4 Preparing data for automatic workflow analysis
2.5 Quality criteria for modern workflow environments
2.6 Benefits from integrated workflow analysis in Bioinformatics
2.7 Discussion

3 Population-specific genotype imputations using minimac or IMPUTE2
3.1 Introduction
3.2 Methods
3.3 Materials
3.4 Procedure

4 Molgenis-impute: imputation pipeline in a box
4.1 Background
4.2 Methods
4.3 Implementation
4.4 Results and Discussion
4.5 Supplementary information

5 PyPedia: using the wiki paradigm as crowd sourcing environment for bioinformatics protocols
5.1 Introduction
5.3 Results
5.4 Discussion
5.5 Conclusions
5.6 Supplementary Information

6 MutationInfo: a tool to automatically infer chromosomal positions from dbSNP and HGVS genetic variants
6.1 Introduction
6.2 Existing tools for resolving HGVS position
6.3 Methods
6.4 PharmGKB as a testing platform
6.5 Analysis of HGVS variants with gene names
6.6 Conclusions

7 Discussion
7.1 Final notes in Imputation
7.2 Integration as a vehicle towards clinical genetics

Summary
Samenvatting
Περίληψη
Acknowledgments
List of publications
Curriculum vitae

Related documents

Genotype imputation allows the estimation of genotypes in a target data set, based on one or more available reference sets of SNPs, and it is based on searching common

The steps covered are aligning markers to the same genomic reference as the reference panel (by default hg19), applying quality control to check for genomic strand

Along with the source code, each article has sections that provide documentation, user parameters, under development code, unit tests and edit permissions of the method (See

To evaluate the ability of MutationInfo to locate the chromosomal positions of dbSNP and HGVS variants with sequence accession numbers, we analyzed the highly curated content

Finally, improved imputation accuracy was also measured for population-specific reference panels for the Ashkenazi Jewish [40], Sardinian (Italy) and Minnesotan (USA)

Without being exhaustive, we can place these tools in categories like simple scripts in modern interpreted languages, data visualization, tools for data annotation, validation

In addition, I describe the four most important practical considerations that must be addressed to make a bioinformatics component (i.e., tools, data) as usable as possible

In chapter 2, I also present the expected benefits of adopting these guidelines, the most important of which are the increased