AIMS

This course aims to introduce bioengineers to the field of bioinformatics. The field is very broad and encompasses a wide range of research topics: sequence analysis, analysis of large volumes of experimental data (high-throughput data), database management, etc. In a set of theoretical lectures, the statistical insights and principles underlying bioinformatics methods will be explained and illustrated by describing a selected set of tools in detail.

An overview of the tools currently available will be given, with clear reference to the underlying statistical principles.

Based on these theoretical insights, the bioengineer should be able to understand and make use of new tools. Applying the theory to a new tool is part of the course: theoretical analysis of the tool, practical use on a dataset, and evaluation of the result.

This course is a good introduction to more specialised courses in genome analysis, molecular model building, drug design, etc.

CONTENT

Combinatorial methods:

Introduction: backtracking, tree searching, heuristic methods

Application: sequence alignment (Needleman-Wunsch, Smith-Waterman), BLAST, PSI-BLAST
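As an illustration of the dynamic-programming idea behind global sequence alignment, the following is a minimal sketch in the style of Needleman-Wunsch. The function name and the scoring values (match +1, mismatch -1, linear gap penalty -1) are assumptions chosen for illustration and are not taken from the course material; the sketch returns only the optimal score, not the alignment itself.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-1):
    """Return the optimal global alignment score of sequences a and b.

    Scoring values are illustrative assumptions: a linear gap penalty
    and a simple match/mismatch score instead of a substitution matrix.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score for aligning a[:i] with b[:j]
    score = [[0] * cols for _ in range(rows)]

    # Aligning a prefix against the empty sequence costs one gap per symbol.
    for i in range(1, rows):
        score[i][0] = i * gap
    for j in range(1, cols):
        score[0][j] = j * gap

    for i in range(1, rows):
        for j in range(1, cols):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + substitution,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,               # gap in b
                score[i][j - 1] + gap,               # gap in a
            )
    return score[-1][-1]


if __name__ == "__main__":
    print(needleman_wunsch_score("ACGT", "AGT"))
```

A local alignment in the style of Smith-Waterman would differ mainly in clamping each cell at zero and taking the maximum over the whole table rather than the bottom-right cell.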

Statistical Methods:

Introduction

Multivariate statistics

Bayesian statistics

HMM (hidden Markov models)

NN (neural networks)

EM (expectation maximization)

Optimization techniques

Examples: DNA and proteins

Repeats and CpG islands (HMM)

Analyzing high throughput data (Clustering)

Phylogenetic analysis: general introduction and overview of methods

Bio-databases:

Architecture, types (relational, object-oriented and mixture models), database administration systems, introduction to SQL, problems applied to biological databases, database querying (overview of the distinct databases on the internet).
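To give a flavour of SQL querying on a relational database, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `genes` table, its columns, and all rows are hypothetical toy data invented for this example; they do not correspond to any real biological database.

```python
import sqlite3


def build_toy_database():
    """Create an in-memory database with a hypothetical 'genes' table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE genes ("
        " gene_id TEXT PRIMARY KEY,"
        " organism TEXT NOT NULL,"
        " length_bp INTEGER NOT NULL)"
    )
    # Toy rows: names and lengths are made up for illustration only.
    conn.executemany(
        "INSERT INTO genes (gene_id, organism, length_bp) VALUES (?, ?, ?)",
        [
            ("geneA", "organism_1", 900),
            ("geneB", "organism_2", 1200),
            ("geneC", "organism_1", 1500),
        ],
    )
    return conn


def genes_by_length(conn, organism):
    """Return (gene_id, length_bp) for one organism, longest gene first."""
    cursor = conn.execute(
        "SELECT gene_id, length_bp FROM genes"
        " WHERE organism = ? ORDER BY length_bp DESC",
        (organism,),  # parameter binding avoids building SQL by string pasting
    )
    return cursor.fetchall()


if __name__ == "__main__":
    connection = build_toy_database()
    print(genes_by_length(connection, "organism_1"))
```

The same `SELECT ... WHERE ... ORDER BY` pattern carries over to larger relational databases, although real biological databases typically spread this information over several linked tables.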

Applications:

1. Choice of one of the following topics:

1. phylogenetic analysis (substitution model building, tree building and evaluation)

2. gene prediction, promoter prediction, prediction of intron-exon boundaries

3. retrieval of protein domains, families (HMM)

4. retrieval of motifs (protein, DNA) (HMM, EM)

5. prediction of secondary protein structures (NN, HMM)

6. analysis of ligand binding sites and protein-protein interfaces (in silico two-hybrid)

7. classification (NN, PCA, SVM)

8. genetic network inference (Bayesian networks)

9. querying databases

2. Use of an integrated web tool with the emphasis on the importance of linking distinct algorithms to obtain biologically relevant conclusions.

In silico exercises

Teaching activities:

Lectures and exercises on PC

Material:

Bioinformatics: a practical guide to the analysis of genes and proteins. Baxevanis, A.D. and Ouellette, B.F.F. (1998), John Wiley & Sons, Inc., New York.

Biological sequence analysis. Durbin, R., Eddy, S., Krogh, A. and Mitchison, G. (1998), Cambridge University Press.

Bioinformatics: the machine learning approach. Baldi, P. and Brunak, S. (1998), MIT Press.