An overview of the different tools currently available, with clear reference to the underlying statistical principles, will be given

Academic year: 2021


AIMS

This course aims to introduce bioengineers to the field of bioinformatics. The field is very broad and encompasses a wide range of research topics: sequence analysis, analysis of large volumes of experimental (high-throughput) data, database management, etc. In a set of theoretical lectures, the statistical insights and principles underlying bioinformatics methods will be explained and illustrated by describing a selected set of tools in detail.

An overview of the different tools currently available, with clear reference to the underlying statistical principles, will be given.

Based on these theoretical insights, the bioengineer should be able to understand and make use of new tools. Applying the theory by analysing the principles of a new tool will be part of the course (theoretical analysis of the tool, practical use on a dataset, evaluation of the result).

This course is a good introduction to more specialised courses in genome analysis, molecular model building, drug design, etc.

CONTENT

Combinatorial methods:

Introduction: backtracking, tree searching, heuristic methods

Application: sequence alignment (Needleman-Wunsch, Smith-Waterman), BLAST, PSI-BLAST
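As an illustration of the dynamic-programming idea behind the global alignment method named above, here is a minimal sketch in Python. The scoring values (match +1, mismatch -1, linear gap penalty -2) and the function name `needleman_wunsch` are assumptions chosen for the example, not values prescribed by the course.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of strings a and b with a linear gap penalty.

    Returns (score, aligned_a, aligned_b). Scoring values are illustrative.
    """
    n, m = len(a), len(b)

    def sub(i, j):
        # Substitution score for aligning a[i-1] with b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right cell to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return score[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))


if __name__ == "__main__":
    print(needleman_wunsch("ACGT", "AGT"))
```

Smith-Waterman local alignment follows the same recurrence, except that cell values are clamped at zero and the traceback starts from the highest-scoring cell rather than the corner.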

Statistical Methods:

Introduction

Multivariate statistics

Bayesian statistics

HMM (hidden Markov models)

NN (neural networks)

EM (expectation maximization)

Optimization techniques
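Of the methods listed above, hidden Markov models lend themselves to a short sketch. The Python example below runs Viterbi decoding on a two-state model of the kind often used to illustrate CpG-island detection. All probabilities, the state names and the function name `viterbi` are invented for illustration; a real model would estimate them from annotated sequence.

```python
import math

# Illustrative two-state HMM; every number here is an assumption.
STATES = ("island", "background")
START = {"island": 0.5, "background": 0.5}
TRANS = {
    "island": {"island": 0.9, "background": 0.1},
    "background": {"island": 0.1, "background": 0.9},
}
EMIT = {
    "island": {"A": 0.1, "C": 0.4, "G": 0.4, "T": 0.1},      # C/G rich
    "background": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},  # A/T rich
}


def viterbi(seq):
    """Return the most probable state path for a DNA string over ACGT."""
    if not seq:
        return []
    # v[t][s] = log-probability of the best path ending in state s at position t.
    v = [{s: math.log(START[s]) + math.log(EMIT[s][seq[0]]) for s in STATES}]
    back = []  # back[t][s] = best predecessor of state s at position t + 1
    for symbol in seq[1:]:
        row, ptr = {}, {}
        for s in STATES:
            prev = max(STATES, key=lambda p: v[-1][p] + math.log(TRANS[p][s]))
            row[s] = v[-1][prev] + math.log(TRANS[prev][s]) + math.log(EMIT[s][symbol])
            ptr[s] = prev
        v.append(row)
        back.append(ptr)
    # Follow the back-pointers from the best final state.
    state = max(STATES, key=lambda s: v[-1][s])
    path = [state]
    for ptr in reversed(back):
        state = ptr[state]
        path.append(state)
    path.reverse()
    return path


if __name__ == "__main__":
    print(viterbi("ATATATATCGCGCGCGCGATATATAT"))
```

Working in log space avoids numerical underflow on long sequences, which is why the sketch adds log-probabilities instead of multiplying probabilities.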

Examples: DNA and proteins

Repeats and CpG islands (HMM)

Analyzing high throughput data (Clustering)

Phylogenetic analysis: general introduction and overview of methods

Bio-databases:

Architecture, types (relational, object-oriented and mixture models), database administration systems, introduction to SQL, problems applied to biological databases, database querying (overview of the distinct databases on the internet).
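To make the relational model and SQL querying mentioned above concrete, the following Python sketch builds a tiny in-memory SQLite database and runs a join query against it. The schema (`organism`, `gene`), the gene symbols and the lengths are placeholder data invented for the example and do not correspond to any public biological database.

```python
import sqlite3


def build_demo_db():
    """Create an in-memory database with two related tables (placeholder data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE organism (id INTEGER PRIMARY KEY, name TEXT NOT NULL)"
    )
    conn.execute(
        "CREATE TABLE gene ("
        " id INTEGER PRIMARY KEY,"
        " symbol TEXT NOT NULL,"
        " organism_id INTEGER NOT NULL REFERENCES organism(id),"
        " length_bp INTEGER NOT NULL)"
    )
    conn.executemany(
        "INSERT INTO organism VALUES (?, ?)",
        [(1, "Escherichia coli"), (2, "Saccharomyces cerevisiae")],
    )
    # Hypothetical gene symbols and lengths, for illustration only.
    conn.executemany(
        "INSERT INTO gene VALUES (?, ?, ?, ?)",
        [(1, "geneA", 1, 900), (2, "geneB", 1, 1500), (3, "geneC", 2, 1200)],
    )
    conn.commit()
    return conn


def genes_longer_than(conn, min_length_bp):
    """Return (symbol, organism name, length) for genes above a length cutoff."""
    query = (
        "SELECT g.symbol, o.name, g.length_bp "
        "FROM gene AS g JOIN organism AS o ON o.id = g.organism_id "
        "WHERE g.length_bp > ? "
        "ORDER BY g.length_bp DESC"
    )
    # The ? placeholder lets sqlite3 bind the value safely.
    return conn.execute(query, (min_length_bp,)).fetchall()


if __name__ == "__main__":
    db = build_demo_db()
    for row in genes_longer_than(db, 1000):
        print(row)
```

Splitting organisms and genes into separate tables linked by a foreign key is the basic relational design choice; the join in `genes_longer_than` recombines them at query time.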

Applications:

1. Choice of one of the following topics:

   1. phylogenetic analysis (substitution model building, tree building and evaluation)
   2. gene prediction, promoter prediction, prediction of intron-exon boundaries
   3. retrieval of protein domains, families (HMM)
   4. retrieval of motifs (protein, DNA) (HMM, EM)
   5. prediction of secondary protein structures (NN, HMM)
   6. analysis of ligand binding sites and protein-protein interfaces (in silico two-hybrid)
   7. classification (NN, PCA, SVM)
   8. genetic network inference (Bayesian networks)
   9. querying databases

2. Use of an integrated web tool with the emphasis on the importance of linking distinct algorithms to obtain biologically relevant conclusions.

In silico exercises

Teaching activities:

Lectures and exercises on PC

Material:

Bioinformatics: a practical guide to the analysis of genes and proteins. Baxevanis, A.D. and Ouellette, B.F.F. (1998), John Wiley & Sons, Inc., New York.

Biological sequence analysis. Durbin, R., Eddy, S., Krogh, A., Mitchison, G. (1998), Cambridge University Press.

Bioinformatics: the machine learning approach. Baldi, P. and Brunak, S. (1998), MIT Press.
