University of Groningen
Integration techniques for modern bioinformatics workflows
Kanterakis, Alexandros
IMPORTANT NOTE: You are advised to consult the publisher's version (publisher's PDF) if you wish to cite from it. Please check the document version below.
Document Version
Publisher's PDF, also known as Version of record
Publication date: 2018
Link to publication in University of Groningen/UMCG research database
Citation for published version (APA):
Kanterakis, A. (2018). Integration techniques for modern bioinformatics workflows. University of Groningen.
Copyright
Other than for strictly personal use, it is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), unless the work is under an open content license (like Creative Commons).
Integration techniques for modern bioinformatics workflows. Thesis, University of Groningen, with summary in English, Greek and Dutch.
The work in this thesis was financially supported by the University of Groningen Ubbo Emmius Fund, BBMRI-NL, a research infrastructure financed by the Netherlands Organization for Scientific Research (NWO project 184.021.007), and the European Union Seventh Framework Programme (FP7/2007-2013) research project BioSHaRE-EU (261433).
Printing of this thesis was financially supported by the Rijksuniversiteit Groningen, the University Medical Center Groningen, the Groningen University Institute for Drug Exploration (GUIDE) and NWO VIDI grant number 917.164.455.
The front cover features the artwork “Tekening” by Theo van Doesburg (30 August 1883 – 7 March 1931). The layout is based on the “classicthesis” template, Copyright ©2012 André Miede http://www.miede.de.
Printed by Netzodruk Groningen.
©2018 Alexandros Kanterakis. All rights reserved. No part of this book may be reproduced or transmitted in any form or by any means without permission of the author.
Integration techniques for modern
bioinformatics workflows
PhD thesis
to obtain the degree of PhD at the University of Groningen
on the authority of the Rector Magnificus Prof. E. Sterken
and in accordance with
the decision by the College of Deans. This thesis will be defended in public on
Wednesday 11 July 2018 at 9:00 hours
by
Alexandros Kanterakis
born on 30 July 1978 in Moschato, Greece
Prof. T.N. Wijmenga
Assessment committee: Prof. E.A. Valentijn, Prof. J. Heringa, Prof. H. Snieder
Paranymphs: Freerk van Dijk, Dimitris Gakis
Contents
1 Introduction and Outline
1.1 Background
1.2 Bioinformatics done right
1.3 Integration as a service in genetics research
1.4 Outline
2 Creating transparent and reproducible pipelines: Best practices for tools, data and workflow management systems
2.1 Introduction
2.2 Existing workflow environment
2.3 What software should be part of a scientific workflow?
2.4 Preparing data for automatic workflow analysis
2.5 Quality criteria for modern workflow environments
2.6 Benefits from integrated workflow analysis in Bioinformatics
2.7 Discussion
3 Population-specific genotype imputations using minimac or IMPUTE2
3.1 Introduction
3.2 Methods
3.3 Materials
3.4 Procedure
4 Molgenis-impute: imputation pipeline in a box
4.1 Background
4.2 Methods
4.3 Implementation
4.4 Results and Discussion
4.5 Supplementary information
5 PyPedia: using the wiki paradigm as crowd sourcing environment for bioinformatics protocols
5.1 Introduction
5.3 Results
5.4 Discussion
5.5 Conclusions
5.6 Supplementary Information
6 MutationInfo: a tool to automatically infer chromosomal positions from dbSNP and HGVS genetic variants
6.1 Introduction
6.2 Existing tools for resolving HGVS position
6.3 Methods
6.4 PharmGKB as a testing platform
6.5 Analysis of HGVS variants with gene names
6.6 Conclusions
7 Discussion
7.1 Final notes in Imputation
7.2 Integration as a vehicle towards clinical genetics
Summary
Samenvatting (Dutch summary)
Περίληψη (Greek summary)
Acknowledgments
List of publications
Curriculum vitae