UvA-DARE is a service provided by the library of the University of Amsterdam (https://dare.uva.nl)
UvA-DARE (Digital Academic Repository)
Scientific workflow design : theoretical and practical issues
Terpstra, F.P.
Publication date
2008
Citation for published version (APA):
Terpstra, F. P. (2008). Scientific workflow design : theoretical and practical issues.
General rights
It is not permitted to download or to forward/distribute the text or part of it without the consent of the author(s) and/or copyright holder(s), other than for strictly personal, individual use, unless the work is under an open content license (like Creative Commons).
Disclaimer/Complaints regulations
If you believe that digital publication of certain material infringes any of your rights or (privacy) interests, please let the Library know, stating your reasons. In case of a legitimate complaint, the Library will make the material inaccessible and/or remove it from the website. Please Ask the Library: https://uba.uva.nl/en/contact, or a letter to: Library of the University of Amsterdam, Secretariat, Singel 425, 1012 WP Amsterdam, The Netherlands. You will be contacted as soon as possible.
Contents
1 Introduction 1
1.0.1 Grid . . . 1
1.1 Virtual Laboratory . . . 3
4
1.3 Sharing Resources . . . 5
2 Methodology 9
2.1 Introduction . . . 9
2.2 Methodology of Science . . . 9
2.2.1 Philosophical issues . . . 11
2.3 Methodology of e-Science . . . 13
2.3.1 Definition of e-Science . . . 13
2.3.2 Empirical Cycle for e-Science . . . 15
2.4 Differences Science and e-Science . . . 19
2.5 Future Scenario e-Science . . . 22
2.6 Conclusion . . . 23
3 Workflow Design Space 25
3.1 Introduction . . . 25
3.2 Related Work . . . 25
3.3 Workflow Design . . . 26
3.3.1 Workflow design using abstraction . . . 27
3.4 Theoretical limits of Workflow design . . . 30
3.4.1 Building blocks . . . 30
3.4.2 Workflow construction . . . 31
3.4.3 Complex workflow construction . . . 32
3.4.4 Workflow design limits . . . 35
3.5 Discussion . . . 38
3.6 Conclusions . . . 40
4 Workflow formalisms 41
4.1 Introduction . . . 41
4.2 Problem domains . . . 42
4.3 Formalisms . . . 43
4.3.1 Overview . . . 45
4.4 Discussion . . . 49
4.5 Conclusions & future work . . . 50
5 Workflow Systems Analysis 53
5.1 Introduction . . . 53
5.2 Scientific Workflow Management Systems . . . 54
5.2.1 Workflow lifecycle . . . 54
5.2.2 Workflow model . . . 55
5.2.3 Workflow engine . . . 56
5.2.4 User support . . . 57
5.3 State of the art . . . 61
5.4 Towards a shared software resource . . . 62
6 Data Assimilation 65
6.1 Introduction . . . 65
6.2 Weather Prediction . . . 65
6.3 Data Assimilation Algorithms . . . 67
6.3.1 Observation Data . . . 68
6.3.2 Computational Model . . . 69
6.3.3 State Estimate . . . 69
6.3.4 Prediction . . . 69
6.3.5 Estimator . . . 69
6.3.6 Use of ensembles . . . 70
6.4 Data assimilation toolkits . . . 70
6.4.1 Overview of Toolkits . . . 71
6.4.2 Toolkits in detail . . . 71
6.4.3 Grid use . . . 73
6.5 Conclusion . . . 74
7 Data Assimilation Case Studies 75
7.1 Introduction . . . 75
7.2 Bird migration model . . . 75
7.2.1 Data . . . 76
7.2.2 Model . . . 76
7.2.3 Estimator . . . 77
7.2.4 Experiment . . . 77
7.2.5 Conclusions bird migration . . . 81
7.3 Traffic Forecasting . . . 84
7.3.1 Intelligent Transport Systems . . . 84
7.3.2 Prediction for ITS . . . 85
7.3.3 Current solution . . . 88
7.3.5 Conclusions for traffic prediction . . . 103
7.4 Conclusion . . . 103
8 Ideal Workflow for Data Assimilation 105
8.1 Introduction . . . 105
8.2 Workflow representation . . . 105
8.3 Workflow composition . . . 107
8.3.1 Defining data . . . 107
8.3.2 Defining resources . . . 108
8.3.3 Defining goals . . . 109
8.3.4 Provenance . . . 109
8.3.5 Partial workflows . . . 110
8.3.6 Dissemination . . . 110
8.3.7 Meta workflows . . . 111
8.4 Workflow Design methodology for Data Assimilation . . . 111
8.4.1 Shared software resource . . . 113
8.4.2 Methodology . . . 113
8.4.3 Data preparation . . . 114
8.4.4 State Estimation . . . 117
8.4.5 Model . . . 118
8.4.6 Workflow Patterns . . . 120
8.5 Optimization . . . 122
8.6 Requirements for Scientific Workflow Management Systems . . . 122
8.6.1 Meta-data . . . 124
8.6.2 Expressivity . . . 124
8.6.3 Composition . . . 125
8.6.4 Grid support . . . 125
8.7 Overview of features in existing SWMS . . . 126
8.8 Discussion & conclusion . . . 128
9 Conclusions & Future work 131
9.1 Introduction . . . 131
9.2 Work Performed . . . 131
9.3 Role of Workflow in e-Science . . . 132
9.3.1 Resource Sharing . . . 132
9.3.2 Dissemination and publishing . . . 133
9.3.3 Reproducibility . . . 133
9.3.4 Workflow design . . . 133
9.4 Current state of Workflow in e-Science . . . 134
9.4.1 Workflow design . . . 134
9.4.2 Formalisms for Workflow . . . 134
9.4.3 Scientific Workflow Management Systems . . . 134
9.4.4 Sharing of resources . . . 135
9.5 Future Work . . . 136
9.5.1 Standardization . . . 136
9.5.2 Connectivity . . . 137
9.5.3 Data assimilation in SWMS . . . 137
9.5.4 Formalisms . . . 138
A List of abbreviations 139
B Turing Completeness I/O Automata 143
Acknowledgements 155
Summary 157