
Multiple Alignment


Academic year: 2021


(1)

Bioinformatics

Multiple Alignment

(2)

Overview

• Introduction Multiple Alignments

• Global multiple alignment

– Introduction

– Scoring

– Algorithms

(3)

Algorithms

[Diagram of algorithm classes: Multiple Alignment, HMM, Pattern recognition, Dynamic Programming, Heuristic Searches, Motif Searches, Database searches; labelled Chapter 2]

(4)

Introduction

• Global multiple alignment (ClustalW)

– Proteins, nucleotides

– Long stretches of conservation essential

– Identification of protein family profiles

– Score gaps

• Local multiple alignments (Motif Detection, Profile construction)

– Proteins, nucleotides

– Short stretches of conservation (12 NT, 6 AA)

– Identification of regulatory motifs (DNA, protein)

– No explicit gap scoring

– Explicit use of a profile

(5)

Introduction

Evolution

• duplication

• speciation

Primary sequence

Homologs in related organisms

Families of proteins

Multiple sequence alignment

Features characteristic of the whole family

(6)

Introduction

Multiple sequence alignment

Features characteristic for the protein family

Profile (HMM)

Detect remote members of the family

Phylogeny

Reconstruct phylogenetic relationships

(7)

Scoring a multiple alignment

Assumptions:

– Independence between columns

– Residues within a column are independent (i.e. representative members of a sequence family should be chosen, and all evolutionary subfamilies should be represented)

– Alignment score: the sum of the scores of all columns plus a score for the gaps

$$S(m) = G + \sum_i S(m_i)$$

(8)

Scoring

Sum-of-pairs (SP) score of a column:

$$S(m_i) = \sum_{k<l} s(m_i^k, m_i^l) \tag{1}$$

where $s(a,b)$ comes from a substitution scoring matrix such as PAM or BLOSUM.

The sum-of-pairs score is an approximation. For a three-way alignment the score should be

$$\log\frac{p_{abc}}{q_a q_b q_c} \tag{2}$$

instead of

$$\log\frac{p_{ab}}{q_a q_b} + \log\frac{p_{bc}}{q_b q_c} + \log\frac{p_{ac}}{q_a q_c} \tag{3}$$

SP problem:

– N sequences with L in a column (score L-L is 5): SP score $5 \times N(N-1)/2$

– N−1 sequences with L and one with G (score L-G is −4): SP score $5 \times N(N-1)/2 - 9(N-1)$

$$\frac{9(N-1)}{5N(N-1)/2} = \frac{18}{5N}$$

The relative difference in score between the correct and the incorrect alignment decreases with the number of sequences in the alignment.

Counterintuitive!

[Figure: example sequences RAL, RTL, CAL, RAG; residues labelled a, b, c]
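The counting argument on this slide can be checked numerically. The sketch below is only an illustration: the function names are hypothetical, and the toy scoring function knows just the two values quoted on the slide (5 for L-L, −4 for L-G).

```python
from itertools import combinations


def toy_score(a, b):
    """Toy substitution scores taken from the slide: s(L,L)=5, s(L,G)=-4."""
    if a == "L" and b == "L":
        return 5
    if {a, b} == {"L", "G"}:
        return -4
    raise ValueError("toy_score only knows the pairs L-L and L-G")


def sp_column_score(column, score=toy_score):
    """Sum-of-pairs score of one alignment column (all pairs k < l)."""
    return sum(score(a, b) for a, b in combinations(column, 2))


def relative_difference(n):
    """Relative score drop when one of n aligned L residues is replaced by G."""
    correct = sp_column_score("L" * n)
    incorrect = sp_column_score("L" * (n - 1) + "G")
    return (correct - incorrect) / correct


if __name__ == "__main__":
    for n in (3, 5, 10, 20):
        print(n, relative_difference(n), 18 / (5 * n))
```

The two printed columns should agree, and both shrink as more sequences are added, which is the counterintuitive behaviour noted above.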

(9)

Algorithm

Multidimensional dynamic programming

Tedious formalism (optimal alignment)

• Computation of the whole dynamic programming matrix, with $L_1 L_2 \cdots L_N$ entries

• Maximize over all $2^N - 1$ combinations of gaps in a column

• Time complexity $O(2^N L^N)$

Clever algorithm: Carrillo & Lipman (MSA)
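To make the cost of the exact approach concrete, here is a minimal sketch of multidimensional dynamic programming that returns only the optimal sum-of-pairs score. It is not the Carrillo & Lipman method. The match, mismatch and linear gap scores are assumptions chosen for illustration, and the function names are hypothetical.

```python
from functools import lru_cache
from itertools import combinations, product


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Assumed toy scores; a gap paired with a gap costs nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def exact_sp_score(seqs):
    """Optimal sum-of-pairs score of a global alignment of all sequences."""
    n = len(seqs)
    # Each column consumes a residue (1) or places a gap (0) per sequence;
    # the all-gap column is excluded, leaving 2^n - 1 moves.
    moves = [m for m in product((0, 1), repeat=n) if any(m)]

    @lru_cache(maxsize=None)
    def best(pos):
        if not any(pos):
            return 0
        candidates = []
        for move in moves:
            prev = tuple(p - d for p, d in zip(pos, move))
            if min(prev) < 0:
                continue
            column = [seqs[k][prev[k]] if move[k] else "-" for k in range(n)]
            col = sum(pair_score(a, b) for a, b in combinations(column, 2))
            candidates.append(best(prev) + col)
        return max(candidates)

    return best(tuple(len(s) for s in seqs))
```

The memo table has one entry per cell of the N-dimensional matrix and each cell tries every move, which is where the exponential cost comes from. This is practical only for a handful of short sequences.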

(10)

Algorithm

$N(N-1)/2$ pairwise sequence alignments

Similarity matrix

      A     B     C
B   142
C    95   101
D    60    62    55

Progressive clustering

Guide tree (leaf order): D C B A

Progressive alignment: “once a gap always a gap”

Multiple sequence alignment
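The step from the similarity matrix to a guide tree can be sketched as follows. The slide does not say which clustering rule is used, so average linkage on similarities (UPGMA-style) is an assumption here, and `guide_tree` is a hypothetical name.

```python
from itertools import combinations


def guide_tree(sim):
    """Build a nested-tuple guide tree by repeatedly merging the two
    clusters with the highest average pairwise similarity."""
    leaves = sorted({name for pair in sim for name in pair})
    # Each cluster is (subtree, list of member leaves).
    clusters = [(leaf, [leaf]) for leaf in leaves]

    def average(c1, c2):
        values = [sim[frozenset((a, b))] for a in c1[1] for b in c2[1]]
        return sum(values) / len(values)

    while len(clusters) > 1:
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]), clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0][0]


# Similarity matrix from the slide.
SIMILARITY = {
    frozenset("AB"): 142,
    frozenset("AC"): 95,
    frozenset("BC"): 101,
    frozenset("AD"): 60,
    frozenset("BD"): 62,
    frozenset("CD"): 55,
}

if __name__ == "__main__":
    print(guide_tree(SIMILARITY))
```

With the values from the slide, A and B are joined first, then C, then D, which agrees with the leaf order of the guide tree.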

(11)

Algorithm

Progressive alignment methods

Hierarchical (heuristic): succession of pairwise alignments

• Two sequences are aligned by standard pairwise alignment

• This alignment is fixed

• Align next sequence

Different algorithms

– Order of the alignment

– Progression:

» Alignment of a new sequence to a growing alignment

» Subfamilies are built up on a tree structure and alignments are aligned to alignments

– Process used to align and score sequences to alignments

Heuristic approach:

– Align most similar pairs of sequences first

– Most similar is based on a guide tree (quick and dirty and unsuitable for phylogenetic inference)
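The first step of the hierarchical scheme, standard pairwise alignment, can be sketched as global alignment by dynamic programming (Needleman-Wunsch). The match, mismatch and linear gap scores below are assumed toy values, not those of any particular program, and a real implementation would use a substitution matrix such as PAM or BLOSUM.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global pairwise alignment; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning a[:i] with b[:j].
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s, F[i - 1][j] + gap, F[i][j - 1] + gap)

    # Trace back from the bottom-right corner, preferring diagonal moves.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        s = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return F[n][m], "".join(reversed(out_a)), "".join(reversed(out_b))
```

In a progressive method the aligned pair returned here would then be fixed, and the next sequence aligned against it.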

(12)

Algorithm

Disadvantage

But it is advantageous to use position-specific information from an existing alignment

e.g. mismatches at highly conserved positions should be penalized more than mismatches at variable positions

e.g. gap penalties might be higher in regions which do not contain gaps than in regions which do contain gaps

PROFILE ALIGNMENT

(hidden Markov models, frequency matrices)

[Figure: example alignment of nucleotide sequences and the corresponding position frequency matrix φ, with entries 0, 0.25, 0.5, 0.75 and 1]
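A frequency-matrix profile of the kind referred to here can be computed column by column. In the sketch below the four input sequences are illustrative only, and `frequency_matrix` is a hypothetical helper name.

```python
from collections import Counter


def frequency_matrix(alignment, alphabet="ACGT"):
    """Return {residue: [frequency in column 0, column 1, ...]}."""
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    profile = {residue: [] for residue in alphabet}
    for column in zip(*alignment):
        counts = Counter(column)
        for residue in alphabet:
            profile[residue].append(counts[residue] / len(column))
    return profile


if __name__ == "__main__":
    # Illustrative alignment of four sequences of length five.
    example = ["CTTGT", "CATGT", "CACTT", "CATTG"]
    for residue, row in frequency_matrix(example).items():
        print(residue, row)
```

Each column of the resulting matrix sums to 1 when the alignment contains no gaps. A conserved column, such as the first one here, puts all its weight on a single residue, which is what allows mismatches there to be penalized more heavily.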

(13)

Algorithm

Profile-based progressive multiple alignment: CLUSTALW

– Construct distance matrix by pairwise dynamic programming

– Convert similarity scores to evolutionary distances

– Construct a guide tree (clustering, neighbour-joining clustering)

– Progressively align in order of decreasing similarity

– Sequence-sequence

– Sequence-profile

– Profile-profile

» Weighting to compensate for defects in SP

» Closely related sequences: hard matrices (BLOSUM80); distantly related sequences: soft matrices (BLOSUM50)

» Gap penalties adapted

– To hydrophobicity of the residue

– Gap-open and gap-extend penalties increased if there are no gaps in a column
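As a rough illustration of turning pairwise alignments into distances, the sketch below uses one minus the fractional identity over ungapped positions. This is a simplifying assumption made for the example and is not necessarily the exact conversion CLUSTALW applies. Both function names are hypothetical.

```python
def fractional_identity(row_a, row_b):
    """Fraction of identical residues among positions without a gap."""
    pairs = [(a, b) for a, b in zip(row_a, row_b) if a != "-" and b != "-"]
    if not pairs:
        raise ValueError("no ungapped positions to compare")
    return sum(a == b for a, b in pairs) / len(pairs)


def identity_distance(row_a, row_b):
    """Simple distance in [0, 1]: 0 for identical, 1 for no shared residues."""
    return 1.0 - fractional_identity(row_a, row_b)


if __name__ == "__main__":
    print(identity_distance("ACGT", "ACGA"))
    print(identity_distance("AC-T", "ACGT"))
```

The inputs are two rows of an existing pairwise alignment, so they must have equal length. Filling a matrix with these values for every pair gives the input needed for guide-tree construction.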

(14)

Algorithm

• Further improvement

– Iterative refinement

• Problem: in progressive alignment, subalignments are frozen

• Solution:

– Iterative alignment: remove a sequence from the alignment and realign it

– Repeat realignment until the alignment score converges
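The remove-and-realign loop can be sketched on a toy scale. In this example the realignment step is a brute-force search over all gap placements at fixed alignment width, and the score is a sum-of-pairs score with assumed match, mismatch and gap values. Real programs realign with dynamic programming against a profile, and all names here are hypothetical.

```python
from itertools import combinations


def pair_score(a, b, match=1, mismatch=-1, gap=-2):
    """Assumed toy scores; a gap paired with a gap costs nothing."""
    if a == "-" and b == "-":
        return 0
    if a == "-" or b == "-":
        return gap
    return match if a == b else mismatch


def sp_score(rows):
    """Sum-of-pairs score over all columns of the alignment."""
    return sum(pair_score(a, b) for col in zip(*rows) for a, b in combinations(col, 2))


def best_placement(residues, others, width):
    """Try every way to spread the residues over `width` columns."""
    candidates = []
    for cols in combinations(range(width), len(residues)):
        row = ["-"] * width
        for c, ch in zip(cols, residues):
            row[c] = ch
        candidates.append("".join(row))
    return max(candidates, key=lambda row: sp_score(others + [row]))


def refine(rows, max_rounds=10):
    """Remove each sequence in turn, realign it, keep it if the score improves;
    stop when a full round brings no improvement."""
    rows = list(rows)
    for _ in range(max_rounds):
        improved = False
        for i in range(len(rows)):
            others = rows[:i] + rows[i + 1:]
            candidate = best_placement(rows[i].replace("-", ""), others, len(rows[i]))
            if sp_score(others + [candidate]) > sp_score(rows):
                rows[i] = candidate
                improved = True
        if not improved:
            break
    return rows
```

Because a change is accepted only when the score strictly increases, the loop cannot cycle and the score converges, although possibly to a local optimum.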
