Gibbs biclustering of microarray data
Yves Moreau
Microarray cost per expression measurement
Budgets and expertise
Publicly available microarray data
Need for exchange standards & repositories
Big consortia set up big microarray projects
Genome projects → “transcriptome” projects (= compendia)
Change in microarray projects (as in sequence analysis)
Analyze public data first to generate a hypothesis
Design and perform your own microarray experiment
From genome projects to transcriptome projects
Data becomes more heterogeneous
Gene clustering
Group genes that behave similarly over all conditions
Gene biclustering
Group genes that behave similarly over a subset of conditions
“Feature selection”
More suitable for a heterogeneous compendium
Why biclustering?
Distribution of expression values for a given gene
High Medium Low
Bicluster
Discretized microarray data set
Discretizing microarray data
Microarray data is continuous
Discretize by equal frequency
[Figure: discretized data matrix, genes × conditions, with a bicluster]
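As an illustration of equal-frequency discretization, here is a minimal Python sketch. It assumes three levels (low, medium, high) and discretizes each gene separately, following the per-gene distribution mentioned earlier. The function name `discretize_equal_frequency` and the tie-breaking by position are assumptions made for this example.

```python
import numpy as np

def discretize_equal_frequency(expr, n_bins=3):
    """Discretize each gene (row) into n_bins equal-frequency levels.

    Returns an integer array of the same shape with labels 0..n_bins-1
    (0 = low, n_bins-1 = high).
    """
    expr = np.asarray(expr, dtype=float)
    # Rank every value within its own row (0 = smallest); ties are broken by position.
    order = np.argsort(expr, axis=1, kind="stable")
    ranks = np.argsort(order, axis=1, kind="stable")
    n_conditions = expr.shape[1]
    # Map ranks to bins so that each bin receives about the same number of values.
    return (ranks * n_bins) // n_conditions

# Two genes measured under six conditions (made-up values).
expr = np.array([[0.1, 2.3, -1.0, 0.7, 1.5, -0.2],
                 [5.0, 4.0, 3.0, 2.0, 1.0, 0.0]])
levels = discretize_equal_frequency(expr)
print(levels)
```

With six conditions and three levels, every gene ends up with two values per level, whatever the original scale of its expression values.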
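Likelihood
[Figure: panels Background and Pattern; likelihood color scale from 0 to 1]
0.9 0.9 0.9 0.9 0.9
0.9 0.05 0.9 0.9 0.9
0.9 0.9 0.9 0.9 0.9
0.05 0.9 0.9 0.9 0.9
0.9 0.9 0.9 0.9 0.05
$P(D \mid g, c, \mathrm{pattern})$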
Likelihood (scale from 0 to 1)
0.9 0.05 0.05 0.05 0.9
0.05 0.9 0.9 0.05 0.05
0.05 0.05 0.05 0.05 0.05
0.05 0.05 0.9 0.9 0.05
$P(D \mid g', c, \mathrm{pattern}) < P(D \mid g, c, \mathrm{pattern})$
Get the right genes
Likelihood (scale from 0 to 1)
0.9 0.9 0.05 0.05 0.9
0.9 0.05 0.05 0.9 0.9
0.9 0.9 0.05 0.05 0.9
0.05 0.9 0.05 0.05 0.9
0.9 0.9 0.05 0.05 0.05
$P(D \mid g, c', \mathrm{pattern}) < P(D \mid g, c, \mathrm{pattern})$
Get the right conditions
Likelihood (scale from 0 to 1)
0.6 0.6 0.2 0.2 0.6
0.6 0.2 0.2 0.2 0.6
0.6 0.6 0.2 0.2 0.6
0.2 0.6 0.2 0.2 0.6
0.2 0.6 0.2 0.2 0.2
$P(D \mid g, c, \mathrm{pattern}') < P(D \mid g, c, \mathrm{pattern})$
Get the right frequency pattern
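The comparison of likelihoods can be sketched in Python. The model below is an assumption made for illustration, since the slides do not spell it out: cells inside the bicluster follow a per-condition frequency pattern, and all other cells follow a single background distribution. The names `log_likelihood`, `pattern` and `background` are hypothetical.

```python
import numpy as np

def log_likelihood(data, genes, conds, pattern, background):
    """log P(D | g, c, pattern) under an assumed pattern/background model.

    data:       (n_genes, n_conds) integer levels in 0..n_levels-1
    genes:      (n_genes,) boolean mask, True for genes in the bicluster
    conds:      (n_conds,) boolean mask, True for conditions in the bicluster
    pattern:    (n_conds, n_levels) level probabilities per condition
    background: (n_levels,) level probabilities outside the bicluster
    """
    data = np.asarray(data)
    genes = np.asarray(genes, dtype=bool)
    conds = np.asarray(conds, dtype=bool)
    pattern = np.asarray(pattern, dtype=float)
    background = np.asarray(background, dtype=float)
    inside = genes[:, None] & conds[None, :]          # cells covered by the bicluster
    cond_index = np.broadcast_to(np.arange(data.shape[1]), data.shape)
    p_pattern = pattern[cond_index, data]             # P(level | pattern of that condition)
    p_background = background[data]                   # P(level | background)
    return float(np.log(np.where(inside, p_pattern, p_background)).sum())

# Toy data: genes 0-1 show "high, low" in conditions 0-1 (made-up values).
data = np.array([[2, 0, 1, 2],
                 [2, 0, 0, 1],
                 [0, 1, 2, 0],
                 [1, 2, 1, 0]])
pattern = np.array([[0.05, 0.05, 0.9],
                    [0.9, 0.05, 0.05],
                    [1/3, 1/3, 1/3],
                    [1/3, 1/3, 1/3]])
background = np.full(3, 1/3)
conds = np.array([True, True, False, False])
ll_right = log_likelihood(data, [True, True, False, False], conds, pattern, background)
ll_wrong = log_likelihood(data, [False, False, True, True], conds, pattern, background)
print(ll_right > ll_wrong)
```

Choosing the wrong genes places low-probability cells inside the bicluster, which lowers the likelihood. The same happens with the wrong conditions or the wrong pattern.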
Optimizing the bicluster
Find the right bicluster
Genes
Conditions
Pattern
For a given choice of genes and conditions, the “best” pattern is given by the frequencies found in the extracted pattern
No more need to optimize over the pattern
Maximum likelihood: find genes and conditions that maximize $P(D \mid g, c)$
Gibbs sampling: find genes and conditions that optimize $P(g, c \mid D)$
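The statement that the best pattern is given by the observed frequencies can be illustrated with a short Python sketch. For a fixed choice of genes and conditions it counts how often each level occurs in each selected condition. The function name `estimate_pattern` and the optional `pseudocount` argument are assumptions added for this example; with `pseudocount=0` the result is the plain frequency estimate, which is undefined when no gene is selected.

```python
import numpy as np

def estimate_pattern(data, genes, conds, n_levels=3, pseudocount=0.0):
    """Frequency pattern of the extracted bicluster.

    Returns an array of shape (n_selected_conds, n_levels) whose rows sum to 1.
    """
    data = np.asarray(data)
    genes = np.asarray(genes, dtype=bool)
    conds = np.asarray(conds, dtype=bool)
    sub = data[genes][:, conds]                      # extracted bicluster
    # Count the occurrences of each level in each selected condition.
    counts = np.stack([(sub == k).sum(axis=0) for k in range(n_levels)], axis=1)
    counts = counts + pseudocount
    return counts / counts.sum(axis=1, keepdims=True)

data = np.array([[2, 0, 1, 2],
                 [2, 0, 0, 1],
                 [0, 1, 2, 0],
                 [1, 2, 1, 0]])
genes = [True, True, False, False]
conds = [True, True, False, False]
pattern = estimate_pattern(data, genes, conds)
smoothed = estimate_pattern(data, genes, conds, pseudocount=0.5)
print(pattern)
```

A small pseudocount keeps every level probability strictly positive, which avoids taking the logarithm of zero when the pattern is plugged into a likelihood.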
Gibbs sampling
Current configuration
$P(g_1 = 1 \mid g_{\neq 1}, c, D)$?
$P(g_2 = 1 \mid g_{\neq 2}, c, D)$?
Next gene configuration
$P(g_3 = 1 \mid g_{\neq 3}, c, D)$?
Updated gene configuration
Next complete configuration
iterate many times
Gibbs biclustering
Sample from $P(g, c \mid D)$ by iterating over the conditionals $P(g_i \mid g_{\neq i}, c, D)$ and $P(c_j \mid c_{\neq j}, g, D)$
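The alternation between gene updates and condition updates can be sketched as follows. This is not the authors' implementation. It is a minimal collapsed Gibbs sampler under assumptions that the slides do not state: a uniform background, a Dirichlet prior (`alpha`) on each condition's pattern, and Bernoulli priors (`prior_gene`, `prior_cond`) on membership. All names are hypothetical. Averaging the sampled configurations after a burn-in gives per-gene and per-condition inclusion frequencies.

```python
import math
import numpy as np

def _sigmoid(x):
    # Clamp the argument so math.exp cannot overflow.
    return 1.0 / (1.0 + math.exp(-min(max(float(x), -700.0), 700.0)))

def gibbs_bicluster(data, n_levels=3, n_iter=200, burn_in=50,
                    alpha=0.5, prior_gene=0.2, prior_cond=0.2, seed=0):
    """Return inclusion frequencies (genes, conditions) averaged after burn-in."""
    rng = np.random.default_rng(seed)
    data = np.asarray(data)
    n_genes, n_conds = data.shape
    log_bg = np.log(np.full(n_levels, 1.0 / n_levels))   # uniform background (assumption)
    onehot = np.eye(n_levels)[data]                       # (n_genes, n_conds, n_levels)
    genes = rng.random(n_genes) < 0.5                     # random initial configuration
    conds = rng.random(n_conds) < 0.5
    gene_freq = np.zeros(n_genes)
    cond_freq = np.zeros(n_conds)
    prior_g = math.log(prior_gene / (1.0 - prior_gene))
    prior_c = math.log(prior_cond / (1.0 - prior_cond))

    for it in range(n_iter):
        # Gene updates: sample g_i given all other genes, the conditions and the data.
        for i in range(n_genes):
            genes[i] = False
            counts = onehot[genes].sum(axis=0)            # level counts from the other genes
            pred = (counts + alpha) / (counts.sum(axis=1, keepdims=True) + n_levels * alpha)
            log_pred = np.log(pred[np.arange(n_conds), data[i]])
            log_odds = prior_g + np.sum((log_pred - log_bg[data[i]])[conds])
            genes[i] = rng.random() < _sigmoid(log_odds)
        # Condition updates: sample c_j given the genes and the data.
        for j in range(n_conds):
            col = data[genes, j]
            counts = np.bincount(col, minlength=n_levels)
            # Dirichlet-multinomial marginal likelihood of the selected cells in column j.
            log_marg = (math.lgamma(n_levels * alpha)
                        - math.lgamma(counts.sum() + n_levels * alpha)
                        + sum(math.lgamma(n + alpha) - math.lgamma(alpha) for n in counts))
            log_odds = prior_c + log_marg - log_bg[col].sum()
            conds[j] = rng.random() < _sigmoid(log_odds)
        if it >= burn_in:
            gene_freq += genes
            cond_freq += conds
    n_kept = n_iter - burn_in
    return gene_freq / n_kept, cond_freq / n_kept

# Simulated data: uniform background with a planted bicluster
# (first 12 genes, first 6 conditions, all "high").
sim_rng = np.random.default_rng(1)
data = sim_rng.integers(0, 3, size=(30, 12))
data[:12, :6] = 2
gene_freq, cond_freq = gibbs_bicluster(data)
print(np.round(gene_freq, 2))
print(np.round(cond_freq, 2))
```

Genes and conditions with an inclusion frequency close to 1 are the candidates for the bicluster. A threshold such as 0.5 would turn the frequencies into a final configuration, and that threshold is a further assumption.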
Simulated data
Remarks
Gibbs biclustering allows noisy patterns
Optimized configuration is obtained by averaging successive iterated configurations
Biclustering is oriented
Find subset of samples for which a subset of genes is consistently expressed across genes
Find subset of genes that are consistently expressed across a subset of samples
Searching for multiple patterns
For gene biclustering, remove the data of the genes from the current bicluster
Search for a new pattern
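The search for multiple patterns can be sketched as a loop that removes the genes of each bicluster before searching again. The sketch assumes a user-supplied `find_bicluster` callable that returns a gene mask and a condition mask for the data it is given. Both that callable and the toy finder used in the example are hypothetical stand-ins for a real biclustering routine.

```python
import numpy as np

def search_multiple_patterns(data, find_bicluster, max_patterns=3):
    """Repeatedly find a bicluster, then remove its genes and search again.

    Returns a list of (gene_indices, condition_indices) in original numbering.
    """
    data = np.asarray(data)
    remaining = np.arange(data.shape[0])     # original indices of genes still in play
    found = []
    for _ in range(max_patterns):
        if remaining.size == 0:
            break
        gene_mask, cond_mask = find_bicluster(data[remaining])
        gene_mask = np.asarray(gene_mask, dtype=bool)
        if not gene_mask.any():
            break                             # nothing found, stop searching
        found.append((remaining[gene_mask], np.flatnonzero(cond_mask)))
        remaining = remaining[~gene_mask]     # remove the genes of this bicluster
    return found

def toy_finder(sub):
    """Hypothetical finder: genes identical to the first row, all conditions."""
    gene_mask = (sub == sub[0]).all(axis=1)
    cond_mask = np.ones(sub.shape[1], dtype=bool)
    return gene_mask, cond_mask

data = np.array([[0, 0], [0, 0], [1, 1], [2, 2], [1, 1]])
found = search_multiple_patterns(data, toy_finder)
for gene_idx, cond_idx in found:
    print(gene_idx, cond_idx)
```

Keeping track of the original gene indices means the reported biclusters still refer to the full data set after genes have been removed.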