Gibbs biclustering of microarray data

Academic year: 2021

(1)

Yves Moreau

(2)

From genome projects to transcriptome projects

• Microarray cost per expression measurement decreasing
• Budgets and expertise increasing
• Publicly available microarray data increasing
• Need for exchange standards & repositories
• Big consortia set up big microarray projects
• Genome projects → "transcriptome" projects (= compendia)
• Change in microarray projects (cf. sequence analysis)
  - Analyze public data first to generate a hypothesis
  - Design and perform your own microarray experiment

(3)

Why biclustering?

• Data becomes more heterogeneous
• Gene clustering: group genes that behave similarly over all conditions
• Gene biclustering: group genes that behave similarly over a subset of conditions ("feature selection")
  - More suitable for a heterogeneous compendium

(4)

Discretizing microarray data

• Microarray data is continuous
• Discretize by equal frequency

[Figure: distribution of expression values for a given gene, divided into High, Medium, and Low levels; the resulting discretized microarray data set (genes × conditions) containing a bicluster]
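A minimal sketch of equal-frequency discretization follows. It assumes that each gene is discretized separately over its conditions into three levels (0 = Low, 1 = Medium, 2 = High), as in the per-gene distribution on this slide, and that ties are broken by column order; the function names are hypothetical.

```python
def discretize_equal_frequency(values, n_levels=3):
    """Map each value to a level 0..n_levels-1 (0 = Low) so that every level
    receives (as nearly as possible) the same number of values."""
    n = len(values)
    # Column indices sorted from lowest to highest expression (stable for ties).
    order = sorted(range(n), key=lambda k: values[k])
    levels = [0] * n
    for rank, k in enumerate(order):
        # Rank r falls into bin floor(r * n_levels / n).
        levels[k] = rank * n_levels // n
    return levels


def discretize_matrix(matrix, n_levels=3):
    """Discretize every gene (row) separately over its conditions (columns)."""
    return [discretize_equal_frequency(row, n_levels) for row in matrix]


expr = [[2.1, -0.3, 0.8, 1.5, -1.2, 0.1],
        [0.0, 0.4, -0.9, 0.2, 1.1, -0.5]]
print(discretize_matrix(expr))  # [[2, 0, 1, 2, 0, 1], [1, 2, 0, 1, 2, 0]]
```

Because the bins are defined by rank rather than by value, each gene contributes roughly one third of its conditions to each level, whatever its dynamic range.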

(5)

Bicluster

(6)

Likelihood

[Figure: likelihood on a scale from 0 to 1, for the background and for the pattern]

(7)

Likelihood

[Figure: per-cell likelihoods on a scale from 0 to 1, mostly 0.9 with a few cells at 0.05]

$P(D \mid g, c, \theta)$, where $D$ is the discretized data, $g$ the selected genes, $c$ the selected conditions, and $\theta$ the pattern frequencies

(8)

Get the right genes

[Figure: per-cell likelihoods on a scale from 0 to 1 for a different gene selection; many cells drop to 0.05]

$P(D \mid g', c, \theta) < P(D \mid g, c, \theta)$

(9)

Get the right conditions

[Figure: per-cell likelihoods on a scale from 0 to 1 for a different condition selection; many cells drop to 0.05]

$P(D \mid g, c', \theta) < P(D \mid g, c, \theta)$

(10)

Get the right frequency pattern

[Figure: per-cell likelihoods on a scale from 0 to 1 for a flatter pattern; cells now reach only 0.6 or 0.2]

$P(D \mid g, c, \theta') < P(D \mid g, c, \theta)$
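The likelihood comparisons on the preceding slides can be reproduced with a small sketch. It assumes a pattern-versus-background model: a cell inside the bicluster (gene in $g$, condition in $c$) is scored with the pattern frequencies $\theta_j$ of its condition, and every other cell with a background distribution, here taken to be uniform over the three levels. The toy data and the name `log_likelihood` are hypothetical.

```python
import math


def log_likelihood(data, genes, conds, theta, background):
    """log P(D | g, c, theta) for a discretized data matrix.

    data: list of gene rows with levels 0..K-1; genes, conds: sets of row and
    column indices in the bicluster; theta: {column: [K probabilities]};
    background: [K probabilities] used for every cell outside the bicluster.
    """
    total = 0.0
    for i, row in enumerate(data):
        for j, level in enumerate(row):
            if i in genes and j in conds:
                total += math.log(theta[j][level])    # explained by the pattern
            else:
                total += math.log(background[level])  # explained by the background
    return total


# Toy data: genes 0 and 1 share level 2 (High) in conditions 0 and 1.
data = [[2, 2, 0, 1],
        [2, 2, 1, 0],
        [0, 1, 2, 2],
        [1, 0, 2, 1]]
background = [1 / 3] * 3
sharp = {j: [0.05, 0.05, 0.9] for j in range(4)}  # pattern theta
flat = {j: [0.2, 0.2, 0.6] for j in range(4)}     # flatter pattern theta'

best = log_likelihood(data, {0, 1}, {0, 1}, sharp, background)
print(best > log_likelihood(data, {0, 2}, {0, 1}, sharp, background))  # True: wrong genes g'
print(best > log_likelihood(data, {0, 1}, {0, 2}, sharp, background))  # True: wrong conditions c'
print(best > log_likelihood(data, {0, 1}, {0, 1}, flat, background))   # True: flatter theta'
```

Only the cells whose assignment changes affect the difference: swapping gene 1 for gene 2 replaces two cells scored at 0.9 by two cells scored at 0.05, so the log-likelihood drops by $2 \log(0.9 / 0.05) = 2 \log 18$.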

(11)

Optimizing the bicluster

• Find the right bicluster: genes, conditions, and pattern
• For a given choice of genes and conditions, the "best" pattern is given by the level frequencies observed in the extracted submatrix
  - No more need to optimize over the pattern
• Maximum likelihood: find the genes and conditions that maximize $P(D \mid g, c)$
• Gibbs sampling: sample genes and conditions from the posterior $P(g, c \mid D)$
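The statement that the best pattern is given by the observed frequencies can be checked directly: for fixed genes and conditions, the per-condition level frequencies maximize the likelihood of the cells inside the bicluster. A minimal sketch with hypothetical function names; the optional `pseudocount` is an added assumption, and its default of 0 reproduces the plain frequencies.

```python
import math


def estimate_pattern(data, genes, conds, n_levels=3, pseudocount=0.0):
    """Per-condition level frequencies among the selected genes (the ML pattern)."""
    theta = {}
    for j in conds:
        counts = [pseudocount] * n_levels
        for i in genes:
            counts[data[i][j]] += 1
        total = sum(counts)
        theta[j] = [c / total for c in counts]
    return theta


def block_log_likelihood(data, genes, conds, theta):
    """Log-likelihood of the cells inside the bicluster under pattern theta.
    With pseudocount 0 only observed levels are looked up, so log(0) never occurs."""
    return sum(math.log(theta[j][data[i][j]]) for i in genes for j in conds)


data = [[2, 2, 0],
        [2, 2, 1],
        [1, 0, 2]]
genes, conds = {0, 1, 2}, {0, 1}
theta_ml = estimate_pattern(data, genes, conds)
print(theta_ml[0])  # column 0 holds levels 2, 2, 1
other = {j: [0.2, 0.2, 0.6] for j in conds}
print(block_log_likelihood(data, genes, conds, theta_ml)
      > block_log_likelihood(data, genes, conds, other))  # True
```

Any other $\theta'$ scores the same cells lower, which is why only the genes and conditions remain to be searched.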

(12)

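Gibbs sampling

Current configuration
$P(g_1 = 1 \mid g_{-1}, c, D)$?
$P(g_2 = 1 \mid g_{-2}, c, D)$?
Next gene configuration
$P(g_3 = 1 \mid g_{-3}, c, D)$?

Here $g_i = 1$ means that gene $i$ belongs to the bicluster, and $g_{-i}$ denotes the memberships of all other genes.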

(13)

Updated gene configuration

Next complete configuration

• Iterate many times

(14)

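Gibbs biclustering

$P(g, c \mid D)$ is sampled by alternately drawing each gene membership and each condition membership from its full conditional:

$$g_i \sim P(g_i \mid g_{-i}, c, D), \qquad c_j \sim P(c_j \mid c_{-j}, g, D)$$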

(15)

Simulated data

(16)

Remarks

• Gibbs biclustering allows noisy patterns
• The optimized configuration is obtained by averaging successive iterated configurations
• Biclustering is oriented
  - Find a subset of samples for which a subset of genes is consistently expressed across these samples
  - Find a subset of genes that are consistently expressed across a subset of samples
• Searching for multiple patterns
  - For gene biclustering, remove the data of the genes in the current bicluster
  - Search for a new pattern
  - Stop when only the empty pattern is found repeatedly
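The search for multiple patterns can be sketched as a loop around any single-bicluster search. The search routine is passed in as `find_bicluster`, a hypothetical callable (for example a Gibbs sampler followed by thresholding); repeated attempts matter for stochastic searches, and `max_empty` (an assumption) sets how many consecutive empty results end the search. `toy_finder` is a deterministic stand-in used only to make the example runnable.

```python
def find_multiple_biclusters(data, find_bicluster, max_empty=3, max_biclusters=10):
    """Search for biclusters one after another, removing the genes of each one.

    find_bicluster(rows) receives the remaining data as {gene_id: row} and
    returns (gene_ids, condition_ids); an empty result is the empty pattern.
    """
    remaining = dict(enumerate(data))
    found, empty_runs = [], 0
    while remaining and len(found) < max_biclusters and empty_runs < max_empty:
        genes, conds = find_bicluster(remaining)
        if not genes or not conds:
            empty_runs += 1        # empty pattern: retry, stop after max_empty
            continue
        empty_runs = 0
        found.append((sorted(genes), sorted(conds)))
        for g in genes:
            remaining.pop(g, None)  # remove these genes' data before the next search
    return found


def toy_finder(rows):
    """Hypothetical stand-in: groups the remaining genes that share the first
    remaining gene's level in condition 0 (at least two genes required)."""
    first = next(iter(rows))
    level = rows[first][0]
    genes = [g for g, row in rows.items() if row[0] == level]
    return (genes, [0]) if len(genes) >= 2 else ([], [])


data = [[2, 0], [2, 1], [0, 2], [0, 1], [1, 1]]
print(find_multiple_biclusters(data, toy_finder))  # [([0, 1], [0]), ([2, 3], [0])]
```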

(17)

Multiple biclusters

(18)

Leukemia fingerprints

(19)

Mixed-Lineage Leukemia

• Armstrong et al., Nature Genetics, 2002
• Mixed-lineage leukemia (MLL) is a subtype of acute lymphoblastic leukemia (ALL)
• Caused by a chromosomal rearrangement involving the MLL gene
• Poorer prognosis than ALL
• Microarray analysis shows that MLL is distinct from ALL
• FLT3 tyrosine kinase distinguishes most strongly between MLL, ALL, and acute myeloid leukemia (AML)
  - Candidate drug target

(20)

PCA features

(21)

Biclustering leukemia data

• Bicluster patients
  - Find patients for which a subset of genes has a consistent expression profile across this group of patients
• Discovery set: 21 ALL, 17 MLL, 25 AML
• Validation set: 3 ALL, 3 MLL, 3 AML

(22)

Discovering ALL

• Bicluster 1: 18 out of 21 ALL patients

(23)

Discovering MLL

• Bicluster 2: 14 out of 17 MLL patients

(24)

Discovering AML

• Bicluster 3: 19 out of 25 AML patients

(25)

Rescoring ALL

(26)

Rescoring MLL

(27)

Rescoring AML
