GIBBS SAMPLING
The posterior distribution for which we obtain samples via the Gibbs sampling procedure can be written as:
) ,
| , ,
( g c θ X ψ p
Here, g is the vector of binary gene labels, c the vector of binary condition labels, θ the vector of model parameters, ψ the total set of hyperparameters (κ, ν, s, ϕ, ξ
g0, ξ
g1, ξ
c0, ξ
c1
), X the expression data matrix (with rows referring to genes, and columns to conditions), Β(ξ
g1, ξ
g0) the Beta prior distribution on the probability that a gene belongs to the bicluster, Β(ξ
c1, ξ
c0) the Beta prior distribution on the probability that a condition belongs to the bicluster,
Derivation of full conditional Bernouilli distribution for gene and condition labels
1. Assume that the prior probability that a gene (condition) label equals one is given by a Beta distribution:
)
, and )
(
g1 g0B ξ ξ B ( ξ
c1, ξ
c02. The probability distribution of a collection g of n binary gene labels is then found by integrating out over this (conjugate) Beta prior distribution:
) (
) (
) (
) , , ( )
1 (
) , , ( ) , (
) , , ( )
| (
) , (
1 0
1 1 0 1
0 1
0 1 1
0 1 0
1
1 1
n n
dx x
B x
x
dx x
B x g Bern
dx x
B x p p
g g
g g
g g n
g g n
i
i g g g
g
+ + Γ
+ Γ
− +
= Γ
−
=
=
=
∫
∫ ∏
∫
−
=
ξ ξ
ξ ξ
ξ ξ
ξ ξ ξ ξ ξ
ξ
g g
g
| g
g g
n refers to the total number of genes and ║g║
1to the one norm of the current (binary) gene label vector (number of genes currently in the bicluster).
3. The derivation for the length m vector of binary condition labels c is analogous, and leads to:
) (
) (
) ) (
, (
1 0
1 1 0 1
0
1
m
p m
c c
c c
c
c
Γ + +
+ Γ
− +
= Γ
ξ ξ
ξ ξ ξ
ξ c c
|
c
4. Now consider the full conditional distributions of the gene labels. The conditional for gene label i is modeled by a Bernouilli distribution with parameter α
i, that gives the probability that the label equals one:
∏
∏
∏
≠
≠
=
=
i i g g ij ij ni k
x p x
p g
p ( 1 , |
1,
0)
bcl( )
bgd,0( )
1 g ξ ξ
In the above, X
k.indicates the k-th row in the expression matrix X.
∏
≠=
=
=
=
≠
≠
≠
=
=
=
=
=
=
=
i k k
k c
j c
j
n
k
k i
g g i i
g g i
i i
i
g N p
g p
g p g
N p
N p
g p g
p
j
j 1 | 0 1
|
1 0
1
0 1
) , ,
| (
) , ,
| ( ) , , 1
| ( ) ,
| , 1 (
) )
, , ,
| , (
) , , , , ,
| 1 ( ) , , , ,
| 1 (
θ c X
θ c X
θ c X
g θ c g X
θ c g X ψ
θ c g X
k.
k.
ξ
i.ξ ξ ξ
ξ ξ
≠
≠
=
= =
=
i i g g i i g gi
g p g
p ( X , 1 , g | c , θ , ξ
1, ξ
0) ( X , 1 , g | c , θ , ξ
1, ξ
0α
≠i g1 g0
1
5. Similarly, an expression for 1-α
ican be derived:
∏
∏
∏
≠=
=
=
≠
≠
=
=
=
=
−
n
i k k
k c
j
ij c
j
ij g
g i i
i i
i
g p
x p x
p g
N p g p
j
j | 0 1
0 , bgd 1
|
1 , bgd 0
1
, ) ( ) ( ) ( | , , )
| , 0 1 (
) , , , ,
| 0 (
θ c X
g
ψ θ c g X
ξ
k.ξ α
1
ow, the ratio of those two expressions has a simple form and an elegant rpretation, as discussed in some detail in the main text:
6. N te in
= ∏
l
b
( )
) ,
| , 1
(
ijc
x
g p
p g ξ ξ
α
∏
=≠
1
|
1 , 0 bgd
1
( )
cj
j
ij g
g i i
i
p x
∏
+
≠=
l b
1 i 1 c
(
ij)
g
g p x
ξ
≠ =
≠ =
− +
= =
−
1
| bgd,1 0 1
1 0 | 1
) ( ) ,
| , 0 1 (
j
j
c
j ij
i g
c g j g i i i
x p n
g p
g g
ξ
ξ α ξ
s for the Bernouilli parameters of the condition label conditionals can be The last equation is a direct consequence of the result described under 2.
7. The odd
obtained similarly:
≠
≠
= =
=
=
j j j i c cj
p ( c 1 | X , c , g , θ , ψ ) p ( c 1 | X , c , g , θ , ξ
1, ξ
0) β
∏
∏
∏
≠=
=
=
≠
≠=
≠
=
=
mi k k
k g
i
ij g
i
ij c c
c j j
i k k
k j
c c j j
c p
x p x
p c
M p M
i
i | 0 1
1 , bgd 1
| l b 0
1
1 0
1
) , ,
| ( ) ( )
( )
,
| , 1 1 (
θ g X
c
.k.k .j
ξ ξ
∏
≠
≠
=
=
=
= =
= =
m
c c j
j c
c i
j
c p
c p c
p
M c
p p
c
p
1 0 1 0) , ,
| ( ) , , 1
| ( ) ,
| , 1 (
) , , ,
| , 1 , ( )
, , ,
| , (
) , , ,
| , 1 , (
θ g X
θ g X
c
θ g c X
θ g c X
θ g c X
ξ ξ
ξ ξ ξ
ξ
ξ ξ
≠j c1 c0
1
∏
∏
∏
= ==
≠=
kg i
ij g
i
ij c
c j
j
p x p x p c
c
M p
i | i 00 , bgd 1
|
0 , bgd 0
1
, ) ( ) ( ) ( | , ,
| , 0 1 (
g X
c ξ ξ
.k≠=
=
≠m
i k k c
c g g j
j
1 0
1 0 1
) )
, , , , , , , ,
| 0
θ ψ
θ g c
X ξ ξ ξ ξ
=
−
jp( c
1 β
∏
∏
∏
≠
=
≠
+
= +
1 , bgd l
b
0 1 1
1 0 , 0 bgd
1
) ( )
(
) (
ij ij
c
c
j c
i
ij c
c j j
j
x p x
p
x p c
ξ ξ
∏
∏
=
≠ =
=
=
≠
−
− =
0
| bgd,0 1
| bgd,0 1
0
|
1 , bgd 1
| l b 0
1
) ( )
(
) ( )
( )
,
| , 0 (
) ,
| 1
i i
i i
g
i ij
g
i ij
j
n g i
ij g
i
ij c c
c j
x p x
m p
x p x
p c
p
c
c ξ ξ ξ ξ β
The extra factor is a result of the asymmetrical nature of the patterns we are
= (
j= 1 ,
j
p c c
β
looking for (‘striped’ biclusters rather than constant-value biclusters).
Derivation of full conditional Bernouilli distribution for model parameters ue to th
D e conjugacy, the full conditional distributions for the model are in the same form
mal distribution for the mean.
bined with a normal likelihood, the Inverse χ
2prior on σ
2gives rise to an Inverse χ
2posterior on σ
2.
This interpretation that directly motivates the parameterization of the prior, as it allows to f the corresponding formulas, as the priors.
1. One can easily derive that for the me t of the normal likelihood with the normal prior (conditioned on σ) leads to a po an, the produc
sterior nor
2. Similarly, com
interpret both κ and ν as a number of prior observations (sometimes referred to as
pseudocounts) associated with the prior. For a derivation o
we refer the reader to standard textbooks on Bayesian statistics (Gelman A.B. et al., 2004).
Note on the use of two background models
Strictly speaking there are two background models rather than one single model, as shown in figure REF. If condition label j is zero, the entire j th column is captured by one background model (bgd,0). On the other hand, if the j th column label equals one, the background model that describes the expression of those background genes should only be constructed with the background genes (bgd,1). The difference between those models only shows up in the derivations for the full conditional distributions of the condition labels (see above).
In most cases this distinction is primarily a technical issue and the difference between bgd,0 and bgd,1 only matters when very large biclusters occur.
SCORING MEASURES
Biclustering results were scored with module recovery and bicluster relevance scores as described in (Prelic et al., 2006). The module recovery score indicates how well the gene content of the ‘ideal’ modules is on average reflected in the (best matching bicluster of the) bicluster results:
2 1
2 1 )
,
( ( , )
1 2
1
1 1
1 2 2 2
1 max )
,
( G G
G G M M
M S
M C
G G C M
MR
∪
= ∑ ∩
∈ ∈