
(1)

Gibbs sampling for motif finding

Yves Moreau

(2)

2

Overview

Markov Chain Monte Carlo

Gibbs sampling

Motif finding in cis-regulatory DNA

Biclustering microarray data

(3)

3

Markov Chain Monte Carlo

Markov chain with transition matrix $T$:

$$T_{ij} = P(X_{t+1} = j \mid X_t = i)$$

         A       C       G       T
A      0.0643  0.8268  0.0659  0.0430
C      0.0598  0.0484  0.8515  0.0403
G      0.1602  0.3407  0.1736  0.3255
T      0.1507  0.1608  0.3654  0.3231

(rows: current state $X_t$; columns: next state $X_{t+1}$)

(4)

4

Markov Chain Monte Carlo

Markov chains can sample from complex distributions

ACGCGGTGTGCGTTTGACGA ACGGTTACGCGACGTTTGGT ACGTGCGGTGTACGTGTACG ACGGAGTTTGCGGGACGCGT ACGCGCGTGACGTACGCGTG AGACGCGTGCGCGCGGACGC ACGGGCGTGCGCGCGTCGCG AACGCGTTTGTGTTCGGTGC ACCGCGTTTGACGTCGGTTC ACGTGACGCGTAGTTCGACG ACGTGACACGGACGTACGCG ACCGTACTCGCGTTGACACG ATACGGCGCGGCGGGCGCGG ACGTACGCGTACACGCGGGA ACGCGCGTGTTTACGACGTG ACGTCGCACGCGTCGGTGTG ACGGCGGTCGGTACACGTCG ACGTTGCGACGTGCGTGCTG ACGGAACGACGACGCGACGC ACGGCGTGTTCGCGGTGCGG

[Figure: per-position frequencies (%) of A, C, G, T across the sampled sequences]
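As a concrete illustration of sampling from this chain, here is a minimal sketch (Python with numpy; not part of the original slides) that generates sequences like those above by repeatedly drawing the next nucleotide from the row of $T$ indexed by the current one:

```python
import numpy as np

# Transition matrix from the previous slide:
# rows = current state, columns = next state, order A, C, G, T.
T = np.array([
    [0.0643, 0.8268, 0.0659, 0.0430],  # from A
    [0.0598, 0.0484, 0.8515, 0.0403],  # from C
    [0.1602, 0.3407, 0.1736, 0.3255],  # from G
    [0.1507, 0.1608, 0.3654, 0.3231],  # from T
])
ALPHABET = "ACGT"

def sample_sequence(length, start=0, rng=None):
    """Walk the Markov chain, emitting one nucleotide per step."""
    rng = rng or np.random.default_rng()
    states = [start]
    for _ in range(length - 1):
        states.append(rng.choice(4, p=T[states[-1]]))
    return "".join(ALPHABET[s] for s in states)

for _ in range(5):
    print(sample_sequence(20))   # e.g. ACGCGGTG...
```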

(5)

5

Markov Chain Monte Carlo

Let us look at the transition after two steps:

$$P(X_{t+2} = j \mid X_t = i) = \sum_k P(X_{t+2} = j \mid X_{t+1} = k, X_t = i)\, P(X_{t+1} = k \mid X_t = i)$$

$$= \sum_k P(X_{t+2} = j \mid X_{t+1} = k)\, P(X_{t+1} = k \mid X_t = i) = \sum_k T_{ik} T_{kj} = (T^2)_{ij}$$

Similarly, after $n$ steps:

$$P(X_{t+n} = j \mid X_t = i) = (T^n)_{ij}$$

(6)

6

Markov Chain Monte Carlo

Stationary distribution $\pi$

If the samples are generated according to the distribution $\pi$, the samples at the next step will also be distributed according to $\pi$:

$$\pi T = \pi$$

that is, $\pi$ is a left eigenvector of $T$ with eigenvalue 1.

Equilibrium distribution

The rows of $\lim_{n \to \infty} T^n$ are all equal to the stationary distribution: from an arbitrary initial condition and after a sufficient number of steps (the burn-in), the successive states of the Markov chain are samples from the stationary distribution.

For the transition matrix above, $\pi = (0.1188,\ 0.2788,\ 0.3905,\ 0.2119)$:

$$\begin{pmatrix} 0.1188 & 0.2788 & 0.3905 & 0.2119 \end{pmatrix}
\begin{pmatrix}
0.0643 & 0.8268 & 0.0659 & 0.0430 \\
0.0598 & 0.0484 & 0.8515 & 0.0403 \\
0.1602 & 0.3407 & 0.1736 & 0.3255 \\
0.1507 & 0.1608 & 0.3654 & 0.3231
\end{pmatrix}
= \begin{pmatrix} 0.1188 & 0.2788 & 0.3905 & 0.2119 \end{pmatrix}$$
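The claimed limit behavior is easy to verify numerically. A minimal sketch, assuming numpy, with the transition matrix and stationary distribution from these slides:

```python
import numpy as np

T = np.array([
    [0.0643, 0.8268, 0.0659, 0.0430],
    [0.0598, 0.0484, 0.8515, 0.0403],
    [0.1602, 0.3407, 0.1736, 0.3255],
    [0.1507, 0.1608, 0.3654, 0.3231],
])

# Repeated squaring: after 10 squarings this is T^(2^10),
# effectively the limit T^infinity.
P = T.copy()
for _ in range(10):
    P = P @ P
print(np.round(P, 4))    # every row ~ [0.1188 0.2788 0.3905 0.2119]

# pi is a left eigenvector of T with eigenvalue 1: pi T = pi.
pi = P[0]
print(np.allclose(pi @ T, pi, atol=1e-6))   # True
```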

(7)

7

Detailed balance

A sufficient condition for the Markov chain to converge to the stationary distribution $\pi$ is that $\pi$ and $T$ satisfy the detailed balance condition:

$$\pi_i T_{ij} = \pi_j T_{ji}, \quad \text{for all } i, j$$

Proof:

$$(\pi T)_j = \sum_i \pi_i T_{ij} = \sum_i \pi_j T_{ji} = \pi_j \sum_i T_{ji} = \pi_j,$$

since each row of $T$ sums to one.

Problem: disjoint regions in probability space. If the support of the distribution splits into regions that the chain cannot move between, convergence holds only within the region where the chain starts.
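The condition itself is a one-liner to check numerically. A small sketch (assuming numpy; the function name is illustrative). Note that detailed balance is sufficient but not necessary: the four-state chain above, for instance, has a stationary distribution yet fails this check (e.g. $\pi_A T_{AC} \neq \pi_C T_{CA}$):

```python
import numpy as np

def satisfies_detailed_balance(pi, T, tol=1e-9):
    """Check pi_i * T_ij == pi_j * T_ji for all pairs (i, j)."""
    flow = pi[:, None] * T      # flow[i, j] = pi_i * T_ij
    return np.allclose(flow, flow.T, atol=tol)
```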

(8)

8

Gibbs sampling

Markov chain for Gibbs sampling

Suppose we want samples from a joint distribution $P(A, B, C)$. Starting from initial values $(a_0, b_0, c_0)$, repeatedly sample each variable conditioned on the current values of the other two:

$$a_{i+1} \sim P(A \mid B = b_i, C = c_i)$$
$$b_{i+1} \sim P(B \mid A = a_{i+1}, C = c_i)$$
$$c_{i+1} \sim P(C \mid A = a_{i+1}, B = b_{i+1})$$

After burn-in, the triples $(a_i, b_i, c_i)$ are samples from $P(A, B, C)$.
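As a worked two-variable analogue of this scheme (an illustrative textbook example, not from the slides), consider Gibbs sampling from a standard bivariate normal with correlation rho, where both full conditionals are available in closed form:

```python
import numpy as np

def gibbs_bivariate_normal(rho, n_samples, burn_in=500, rng=None):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Full conditionals:  X | Y=y ~ N(rho*y, 1 - rho^2)
                        Y | X=x ~ N(rho*x, 1 - rho^2)
    """
    rng = rng or np.random.default_rng()
    x = y = 0.0
    sd = np.sqrt(1.0 - rho**2)
    samples = []
    for i in range(n_samples + burn_in):
        x = rng.normal(rho * y, sd)    # resample X given current Y
        y = rng.normal(rho * x, sd)    # resample Y given current X
        if i >= burn_in:               # discard the burn-in phase
            samples.append((x, y))
    return np.array(samples)

s = gibbs_bivariate_normal(0.8, 20_000)
print(np.corrcoef(s.T)[0, 1])   # close to 0.8
```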

(9)

9

Gibbs sampling

Detailed balance

Detailed balance for the Gibbs sampler: the step that resamples coordinate $i$ moves from $x = (x_1, \dots, x_n)$ to $y = (x_1, \dots, x_{i-1}, y_i, x_{i+1}, \dots, x_n)$ with probability $P(y_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$.

Prove detailed balance:

$$P(x_1, \dots, x_n)\, P(y_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n) \stackrel{?}{=} P(x_1, \dots, x_{i-1}, y_i, x_{i+1}, \dots, x_n)\, P(x_i \mid x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$$

Write $x_{\setminus i} = (x_1, \dots, x_{i-1}, x_{i+1}, \dots, x_n)$. By the chain rule,

$$P(x_1, \dots, x_n) = P(x_i \mid x_{\setminus i})\, P(x_{\setminus i})$$

(this is Bayes' rule, $P(x)\,P(y \mid x) = P(y)\,P(x \mid y)$, rearranged). The left-hand side then becomes

$$P(x_i \mid x_{\setminus i})\, P(x_{\setminus i})\, P(y_i \mid x_{\setminus i}),$$

which is symmetric under exchanging $x_i$ and $y_i$, so it equals the right-hand side. Q.E.D.

(10)

10

Data augmentation Gibbs sampling

Introducing unobserved variables often simplifies the expression of the likelihood

A Gibbs sampler can then be set up

Samples from the Gibbs sampler can be used to estimate parameters

With $M$ the model parameters, $D_m$ the missing data, and $D$ the data, the Gibbs sampler alternates

$$M^{(i+1)} \sim P(M \mid D_m^{(i)}, D)$$
$$D_m^{(i+1)} \sim P(D_m \mid M^{(i+1)}, D)$$

and produces samples from the joint posterior $P(M, D_m \mid D)$.

Posterior mean estimate (PME):

$$E(M \mid D) = \int M\, P(M, D_m \mid D)\, dM\, dD_m \approx \frac{1}{N} \sum_{k=1}^{N} M_k$$
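To make the $M$ / $D_m$ / $D$ roles concrete, here is an illustrative toy instance (not from the slides): a two-component Gaussian mixture with unit variances and equal weights, where the hidden component labels play the role of the missing data $D_m$ and the component means play the role of the parameters $M$. The sampler alternates the two conditional draws and returns the posterior mean estimate of the means:

```python
import numpy as np

def gibbs_mixture(data, n_iter=2000, burn_in=500, rng=None):
    rng = rng or np.random.default_rng()
    mu = np.array([data.min(), data.max()])   # crude initialization
    kept = []
    for it in range(n_iter):
        # Dm | M, D: resample hidden labels; with equal weights and
        # unit variances, p(z=1 | x) is a logistic function.
        log_odds = 0.5 * ((data - mu[0])**2 - (data - mu[1])**2)
        z = rng.random(data.size) < 1.0 / (1.0 + np.exp(-log_odds))
        # M | Dm, D: with a flat prior, mu_k ~ N(sample mean, 1/n_k).
        for k, mask in enumerate((~z, z)):
            n_k = max(int(mask.sum()), 1)     # guard against empty component
            loc = data[mask].mean() if mask.any() else mu[k]
            mu[k] = rng.normal(loc, 1.0 / np.sqrt(n_k))
        if it >= burn_in:
            kept.append(mu.copy())
    return np.mean(kept, axis=0)              # posterior mean estimate (PME)

rng = np.random.default_rng(0)
data = np.concatenate([rng.normal(-2, 1, 150), rng.normal(3, 1, 150)])
print(gibbs_mixture(data))                    # roughly [-2, 3]
```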

(11)

11

Pros and cons

Pros

Clear probabilistic interpretation

Bayesian framework

“Global optimization”

Cons

Mathematical details not easy to work out

Relatively slow

(12)

12

Motif finding

(13)

13

Gibbs sampler

Gibbs sampling for motif finding

Set up a Gibbs sampler for the joint probability of the motif matrix and the alignment given the sequences

Sequence by sequence

Lawrence et al.

One motif of fixed length

One occurrence per sequence

Background model based on single nucleotides

Too sensitive to noise

Lots of parameter tuning

With $\theta$ the motif matrix, $A$ the alignment (the motif start positions), and $S$ the sequences, the sampler targets $P(\theta, A \mid S)$ by alternating

$$\theta \sim P(\theta \mid A, S), \qquad A \sim P(A \mid \theta, S)$$

where the alignment is updated sequence by sequence:

$$P(A \mid \theta, S) = \prod_{i=1}^{K} P(a_i \mid \theta, S_i)$$

(14)

14

Motif model: NCACGTGN

Position    1     2     3     4     5     6     7     8
A          0.2   0.05  0.9   0.05  0.04  0.05  0.05  0.1
C          0.4   0.9   0.05  0.9   0.03  0.05  0.04  0.2
G          0.2   0.02  0.01  0.01  0.8   0.1   0.9   0.4
T          0.2   0.03  0.04  0.04  0.03  0.8   0.01  0.3

Background model

A: 0.32   C: 0.16   G: 0.24   T: 0.28

For a sequence $S = b_1 \cdots b_L$ with one motif occurrence of width $W$ starting at position $a$, the likelihood factorizes into background and motif segments:

$$P(S \mid a, W, \theta, q_0) = P_{bg,1} \cdot P_{Motif} \cdot P_{bg,2}$$

$$P_{Motif} = \prod_{j=1}^{W} q_{j,\, b_{a+j-1}} \qquad P_{bg,1} = \prod_{j=1}^{a-1} q_{0,\, b_j} \qquad P_{bg,2} = \prod_{j=a+W}^{L} q_{0,\, b_j}$$

where $q_{j,b}$ is the motif matrix entry for base $b$ at motif position $j$ and $q_{0,b}$ is the background frequency of base $b$.

[Figure: a motif occurrence within the 500 bp region upstream of the translation start]
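A small sketch of how a single candidate window could be scored under this motif model relative to the background (Python; the matrix above rewritten position-major, and the names MOTIF, BG, and window_score are illustrative):

```python
import numpy as np

# Motif matrix from the slide, one row per motif position (1..8),
# columns in the order A, C, G, T.
MOTIF = np.array([
    [0.2,  0.4,  0.2,  0.2 ],   # position 1 (N)
    [0.05, 0.9,  0.02, 0.03],   # position 2 (C)
    [0.9,  0.05, 0.01, 0.04],   # position 3 (A)
    [0.05, 0.9,  0.01, 0.04],   # position 4 (C)
    [0.04, 0.03, 0.8,  0.03],   # position 5 (G)
    [0.05, 0.05, 0.1,  0.8 ],   # position 6 (T)
    [0.05, 0.04, 0.9,  0.01],   # position 7 (G)
    [0.1,  0.2,  0.4,  0.3 ],   # position 8 (N)
])
BG = {"A": 0.32, "C": 0.16, "G": 0.24, "T": 0.28}
IDX = {"A": 0, "C": 1, "G": 2, "T": 3}

def window_score(window):
    """Likelihood ratio motif/background for one length-8 window."""
    num = np.prod([MOTIF[j, IDX[b]] for j, b in enumerate(window)])
    den = np.prod([BG[b] for b in window])
    return num / den

print(window_score("CCACGTGG"))   # near-consensus site: large ratio
print(window_score("ATATATAT"))   # background-like window: tiny ratio
```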

(15)

15

Gibbs motif finding

Initialization

Sequences

Random motif matrix

Iteration

Sequence scoring

Alignment update

Motif instances

Motif matrix

Termination

Convergence of the alignment

and of the motif matrix


(17)

17

Gibbs motif finding

Initialization

Sequences

Random motif matrix

Iteration

Sequence scoring

Alignment update

Motif instances

Motif matrix

Termination

Convergence of the alignment and of the motif matrix

Sequence scoring: the probability that the motif starts at position $l$ of sequence $S_i$, given the current motif matrix, is proportional to the likelihood ratio of motif versus background over that window:

$$P(x_l = 1 \mid S_i, W, \theta, q_0) \propto \prod_{j=1}^{W} \frac{q_{j,\, b_{l+j-1}}}{q_{0,\, b_{l+j-1}}}$$
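One way the alignment-update step that follows could look in code: score every candidate start position with the ratio above, normalize, and sample the new motif position from the resulting distribution (an illustrative sketch reusing the window_score helper from the earlier sketch):

```python
import numpy as np

def sample_motif_start(seq, window_score, W=8, rng=None):
    """Resample the motif start position for one sequence."""
    rng = rng or np.random.default_rng()
    scores = np.array([window_score(seq[l:l + W])
                       for l in range(len(seq) - W + 1)])
    probs = scores / scores.sum()      # normalize the likelihood ratios
    return rng.choice(len(probs), p=probs)
```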


(22)

22

Gibbs motif finding

Initialization

Sequences

Random motif matrix

Iteration

Sequence scoring

Alignment update

Motif instances

Motif matrix

Termination

Stabilization of the motif matrix

(not of the alignment)

(23)

23

Motif Sampler (extended Gibbs sampling)

Model

One motif of fixed length per round

Several occurrences per sequence

Sequences have a discrete probability distribution over the number of copies of the motif (up to a maximum bound)

Multiple motifs found in successive rounds by masking occurrences of previous motifs

Improved background model based on oligonucleotides

Gapped motifs

(24)

24

Motif model: NCACGTGN (the same position weight matrix as before)

Background model: an order-$m$ Markov model on oligonucleotides,

$$P_{bg} = \prod_j P(b_j \mid b_{j-1} \cdots b_{j-m})$$

For a sequence with $c$ motif occurrences starting at positions $a_1, \dots, a_c$, the likelihood again factorizes into motif and background segments:

$$P(S \mid a, c, B_m, \theta) = P_{bg}^{(0)} \prod_{i=1}^{c} P_{Motif}^{(i)}\, P_{bg}^{(i)}$$

$$P_{Motif}^{(i)} = \prod_{j=1}^{W} q_{j,\, b_{a_i+j-1}}$$

$$P_{bg}^{(0)} = P(b_1, \dots, b_m) \prod_{j=m+1}^{a_1-1} P(b_j \mid b_{j-1} \cdots b_{j-m}) \qquad P_{bg}^{(i)} = \prod_{j=a_i+W}^{a_{i+1}-1} P(b_j \mid b_{j-1} \cdots b_{j-m})$$

(with the convention $a_{c+1} - 1 = L$).

[Figure: multiple motif occurrences within the 500 bp region upstream of the translation start]
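For intuition, here is a minimal sketch of an order-m Markov background model (plain maximum-likelihood counts; illustrative only, not the actual Motif Sampler implementation, and train_background / background_prob are hypothetical names):

```python
from collections import defaultdict

def train_background(sequences, m=2):
    """Estimate P(b_j | previous m bases) by counting m-mer contexts."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for j in range(m, len(seq)):
            counts[seq[j - m:j]][seq[j]] += 1
    return {ctx: {b: n / sum(nxt.values()) for b, n in nxt.items()}
            for ctx, nxt in counts.items()}

def background_prob(segment, model, m=2):
    """Probability of a segment under the order-m background model
    (conditioning on its first m bases)."""
    p = 1.0
    for j in range(m, len(segment)):
        ctx = segment[j - m:j]
        p *= model.get(ctx, {}).get(segment[j], 1e-9)  # floor for unseen contexts
    return p

model = train_background(["ACGCGGTGTGCGTTTGACGA", "ACGGTTACGCGACGTTTGGT"], m=2)
print(background_prob("ACGCGT", model))
```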
