
BIOINFORMATICS
Vol. 00 no. 00 2007, Pages 1–2

Supplement to:

Kernel-based data fusion for gene prioritization

Tijl De Bie$^{a,b}$, Léon-Charles Tranchevent$^{c}$, Liesbeth M. M. van Oeffelen$^{c}$, Yves Moreau$^{c}$

$^{a}$Dept. of Engineering Mathematics, University of Bristol, University Walk, BS8 1TR, Bristol, UK
$^{b}$OKP Research Group, Katholieke Universiteit Leuven, Tiensestraat 102, 3000 Leuven, Belgium
$^{c}$ESAT-SCD, Katholieke Universiteit Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium

PROOF OF THEOREM 1

Generalizing the problem

It is convenient to consider here a slightly more general algorithm. In particular, we consider the optimization problem
$$\max_{K}\ \max_{M,\mathbf{w},\boldsymbol{\xi}}\ p(M,\boldsymbol{\xi}) = M - \frac{1}{\nu n}\mathbf{1}'\boldsymbol{\xi}, \qquad (1)$$
$$\text{s.t.}\quad \mathbf{w}'\mathbf{w} \le 1,\qquad \mathbf{x}_i'\mathbf{w} \ge M - \xi_i\ (\forall i),\qquad \xi_i \ge 0\ (\forall i),\qquad K \in \left\{\sum_j \mu_j \frac{K_j}{\beta_j} \,:\, \boldsymbol{\mu}'\mathbf{1} = 1,\ \boldsymbol{\mu} \ge 0\right\}.$$

The difference with the problem introduced in the paper is that slack variables $\xi_i$ are used here, which allow small mistakes for individual data objects. These mistakes are penalized more strongly for small values of $\nu$, and for $\nu \to 0$ the simpler optimization problem explained in the main part of the paper is recovered. Using duality theory, this problem can be shown to be equivalent to

$$\min_{t,\boldsymbol{\alpha}}\ t \qquad \text{s.t.}\quad \frac{1}{\nu n} \ge \alpha_i \ge 0\ (\forall i),\qquad \mathbf{1}'\boldsymbol{\alpha} = 1,\qquad t \ge \boldsymbol{\alpha}'\frac{K_j}{\beta_j}\boldsymbol{\alpha}\ (\forall j).$$
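This dual is a quadratically constrained linear program (QCLP), so it can be handled by off-the-shelf convex solvers. The following is a minimal sketch of how one might solve it numerically; the helper name `solve_dual`, the CVXPY formulation, and the toy kernels are illustrative assumptions, not part of the paper.

```python
# Minimal sketch (not from the paper): solving the dual QCLP
#   min_{t, alpha} t
#   s.t. 0 <= alpha_i <= 1/(nu*n), 1'alpha = 1,
#        t >= alpha' (K_j / beta_j) alpha for every kernel K_j.
import numpy as np
import cvxpy as cp

def solve_dual(Ks, betas, nu):
    n = Ks[0].shape[0]
    alpha = cp.Variable(n)
    t = cp.Variable()
    cons = [alpha >= 0, alpha <= 1.0 / (nu * n), cp.sum(alpha) == 1]
    # one quadratic constraint per kernel in the convex combination
    cons += [t >= cp.quad_form(alpha, K / b) for K, b in zip(Ks, betas)]
    cp.Problem(cp.Minimize(t), cons).solve()
    return alpha.value, t.value

# toy example: two random PSD kernels on n = 5 objects, beta_j = trace(K_j)
rng = np.random.default_rng(0)
Ks = [(lambda A: A @ A.T)(rng.standard_normal((5, 5))) for _ in range(2)]
alpha, t = solve_dual(Ks, betas=[np.trace(K) for K in Ks], nu=0.5)
```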

For any value of the margin $M$, optimization problem (1) minimizes
$$\mathbf{1}'\boldsymbol{\xi} = \sum_{i=1}^n \max(M - f(\mathbf{x}_i), 0) = \gamma\, \hat{E}_X\!\left(g_{M,\gamma}(\mathbf{x}'\mathbf{w})\right) = \gamma\, \hat{E}_X\!\left(g_{M,\gamma}(f(\mathbf{x}))\right)$$
for $f$ belonging to the function class $\mathcal{F}_K$ defined as
$$\mathcal{F}_K = \left\{ f : \mathbf{x} \to \left.\sum_{i=1}^n \alpha_i\, k(\mathbf{x}, \mathbf{x}_i) \right/ \sqrt{\boldsymbol{\alpha}'K\boldsymbol{\alpha}} \ \text{ with } k \in \mathcal{K} \right\}.$$
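Concretely, a member of $\mathcal{F}_K$ is a kernel expansion normalized by $\sqrt{\boldsymbol{\alpha}'K\boldsymbol{\alpha}}$. A small sketch of evaluating such a function, assuming an illustrative linear kernel and hypothetical helper names:

```python
# Hypothetical sketch: f(x) = sum_i alpha_i k(x, x_i) / sqrt(alpha' K alpha)
# for f in F_K; the linear kernel and all names are illustrative only.
import numpy as np

def linear_kernel(A, B):
    """Gram matrix of k(a, b) = a'b between the rows of A and B."""
    return A @ B.T

def f_value(x, X, alpha, kernel=linear_kernel):
    K = kernel(X, X)                      # Gram matrix on the sample
    kx = kernel(X, x[None, :]).ravel()    # vector of k(x_i, x)
    return float(alpha @ kx) / np.sqrt(alpha @ K @ alpha)
```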

A more general Theorem

We prove a more general theorem, from which Theorem 1 follows immediately. First we give a few definitions. Given values $M$ and $\gamma$, define the function $\phi_{M,\gamma}$ as
$$\phi_{M,\gamma}(a) = \min\left(\max\left(\frac{M-a}{\gamma},\, 0\right),\, 1\right) = \begin{cases} 1 & a \le M - \gamma, \\ \frac{M-a}{\gamma} & M - \gamma < a \le M, \\ 0 & M \le a. \end{cases}$$
Furthermore, define $g_{M,\gamma}(a) = \max\left(\frac{M-a}{\gamma},\, 0\right)$. Then, with $I$ the indicator function:
$$g_{M,\gamma}(a) \ge \phi_{M,\gamma}(a) \ge I(a \le M - \gamma).$$
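The sandwich inequality above is easy to verify numerically; a minimal check, with arbitrary illustrative values for $M$ and $\gamma$:

```python
# Illustrative sanity check of g_{M,gamma} >= phi_{M,gamma} >= I(a <= M - gamma).
import numpy as np

M, gamma = 1.0, 0.5  # arbitrary positive values

def g(a):    # g_{M,gamma}(a) = max((M - a)/gamma, 0)
    return np.maximum((M - a) / gamma, 0.0)

def phi(a):  # phi_{M,gamma}(a) = min(g(a), 1), the clipped version
    return np.minimum(g(a), 1.0)

a = np.linspace(-2.0, 3.0, 1001)
assert np.all(g(a) >= phi(a))
assert np.all(phi(a) >= (a <= M - gamma).astype(float))
```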

THEOREM 2. Given a set $X$ of $n$ objects (genes) $\mathbf{x}_i$ sampled i.i.d. from an unknown distribution $\mathcal{D}$. Let $\lambda(K_j)$ denote the largest eigenvalue of $K_j$. Then, for any $M, \gamma \in \mathbb{R}^+$ and for any $\delta \in (0,1)$, with probability of at least $1 - \delta$ the following holds for all functions $f \in \mathcal{F}$:
$$P_{\mathcal{D}}(f(\mathbf{x}) \le M - \gamma) \le \hat{E}_X\!\left(\phi_{M,\gamma}(f(\mathbf{x}))\right) + \frac{4}{n\gamma}\sqrt{\min\left(n \max_j \frac{\lambda(K_j)}{\beta_j},\ \sum_{j=1}^m \frac{\operatorname{trace}(K_j)}{\beta_j}\right)} + \sqrt{\frac{2}{n}\ln\frac{2}{\delta}}.$$
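Both capacity terms inside the minimum are directly computable from the kernel matrices. A sketch of how one might evaluate the two parts of the bound for given kernels; the function names here are hypothetical, not from the paper:

```python
# Hypothetical sketch: the capacity term of Theorem 2,
#   (4/(n*gamma)) * sqrt(min(n * max_j lambda(K_j)/beta_j, sum_j trace(K_j)/beta_j)),
# and the McDiarmid confidence term sqrt((2/n) * ln(2/delta)).
import numpy as np

def capacity_term(Ks, betas, gamma):
    n = Ks[0].shape[0]
    eig_bound = n * max(np.linalg.eigvalsh(K)[-1] / b for K, b in zip(Ks, betas))
    trace_bound = sum(np.trace(K) / b for K, b in zip(Ks, betas))
    return (4.0 / (n * gamma)) * np.sqrt(min(eig_bound, trace_bound))

def confidence_term(n, delta):
    return np.sqrt((2.0 / n) * np.log(2.0 / delta))
```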


Proof of Theorem 2

The proof has the same structure as the Rademacher complexity proofs in Bartlett and Mendelson (2002); Shawe-Taylor and Cristianini (2004); Lanckriet et al. (2004). For any $M$ and $\gamma$:

$$P_{\mathcal{D}}(f(\mathbf{x}) \le M - \gamma) = E_{\mathcal{D}}\!\left(I(f(\mathbf{x}) \le M - \gamma)\right) \le E_{\mathcal{D}}\!\left(\phi_{M,\gamma}(f(\mathbf{x}))\right)$$
$$\le \hat{E}_X\!\left(\phi_{M,\gamma}(f(\mathbf{x}))\right) + \sup_{f \in \mathcal{F}}\left(E_{\mathcal{D}}\!\left(\phi_{M,\gamma}(f(\mathbf{x}))\right) - \hat{E}_X\!\left(\phi_{M,\gamma}(f(\mathbf{x}))\right)\right).$$

We now make use of the fact that $\hat{E}_X(\phi_{M,\gamma}(f(\mathbf{x})))$ is close to its expectation, as shown by McDiarmid's inequality for functions with bounded differences. With probability $1 - \frac{\delta}{2}$ over $X$:

$$\sup_{f \in \mathcal{F}}\left(E_{\mathcal{D}}(\phi_{M,\gamma}(f(\mathbf{x}))) - \hat{E}_X(\phi_{M,\gamma}(f(\mathbf{x})))\right) \le E_{X \sim \mathcal{D}^n}\, \sup_{f \in \mathcal{F}}\left(E_{\mathcal{D}}(\phi_{M,\gamma}(f(\mathbf{x}))) - \hat{E}_X(\phi_{M,\gamma}(f(\mathbf{x})))\right) + \sqrt{\frac{1}{2n}\ln\frac{2}{\delta}}.$$

Now consider a new sample $Z$ of $n$ data objects, sampled i.i.d. from $\mathcal{D}$. From linearity, $E_{\mathcal{D}}(\phi_{M,\gamma}(f(\mathbf{x}))) = E_{Z \sim \mathcal{D}^n}\!\left(\hat{E}_Z(\phi_{M,\gamma}(f(\mathbf{z})))\right)$. Furthermore, the supremum of an expectation is smaller than or equal to the expectation of a supremum, such that:

$$E_{X \sim \mathcal{D}^n}\, \sup_{f \in \mathcal{F}}\left(E_{\mathcal{D}}(\phi_{M,\gamma}(f(\mathbf{x}))) - \hat{E}_X(\phi_{M,\gamma}(f(\mathbf{x})))\right) \le E_{X,Z \sim \mathcal{D}^n}\, \sup_{f \in \mathcal{F}}\left(\hat{E}_Z(\phi_{M,\gamma}(f(\mathbf{z}))) - \hat{E}_X(\phi_{M,\gamma}(f(\mathbf{x})))\right)$$
$$= \frac{1}{n}\, E_{X,Z \sim \mathcal{D}^n,\, \sigma}\, \sup_{f \in \mathcal{F}} \left(\sum_{i=1}^n \sigma_i\left(\phi_{M,\gamma}(f(\mathbf{z}_i)) - \phi_{M,\gamma}(f(\mathbf{x}_i))\right)\right) \le \frac{2}{n}\, E_{X \sim \mathcal{D}^n,\, \sigma}\, \sup_{f \in \mathcal{F}} \left|\sum_{i=1}^n \sigma_i\, \phi_{M,\gamma}(f(\mathbf{x}_i))\right|,$$

where we take $\sigma \in \{-1, 1\}^n$ a so-called Rademacher random variable with a uniform distribution; the first equality holds since $\mathbf{x}_i$ and $\mathbf{z}_i$ are identically distributed, so that multiplying the $i$th difference by $\sigma_i$ (which amounts to randomly swapping $\mathbf{x}_i$ and $\mathbf{z}_i$) does not change the expectation. Finally, we can use McDiarmid's inequality again to show that, with probability $1 - \frac{\delta}{2}$ over $X$, this is upper bounded by $\frac{2}{n} E_{\sigma} \sup_{f \in \mathcal{F}} \left|\sum_{i=1}^n \sigma_i\, \phi_{M,\gamma}(f(\mathbf{x}_i))\right| + \sqrt{\frac{1}{2n}\ln\frac{2}{\delta}}$. The first term in this expression is twice the so-called empirical Rademacher complexity of the function class $\mathcal{H}_{M,\gamma} = \{h = \phi_{M,\gamma} \circ f \text{ with } f \in \mathcal{F}\}$:

$$\hat{R}_X(\mathcal{H}_{M,\gamma}) = \frac{1}{n}\, E_{\sigma} \sup_{h \in \mathcal{H}_{M,\gamma}} \left|\sum_{i=1}^n \sigma_i\, h(\mathbf{x}_i)\right|.$$
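For intuition, this quantity can be estimated by Monte Carlo when the function class is finite (or finitely sampled); a minimal sketch with hypothetical names, not part of the paper:

```python
# Hypothetical Monte Carlo estimate of (1/n) E_sigma sup_h |sum_i sigma_i h(x_i)|,
# given the values h(x_i) of finitely many candidate functions on the sample.
import numpy as np

def empirical_rademacher(H_values, n_draws=10000, seed=0):
    # H_values: array of shape (num_functions, n); row h holds h(x_1), ..., h(x_n)
    rng = np.random.default_rng(seed)
    _, n = H_values.shape
    sigmas = rng.choice([-1.0, 1.0], size=(n_draws, n))
    # sup over functions of |sum_i sigma_i h(x_i)|, averaged over the sign draws
    return np.abs(H_values @ sigmas.T).max(axis=0).mean() / n
```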

Since $\exists a : \phi_{M,\gamma}(a) = 0$ and $\phi_{M,\gamma}$ is an $L$-Lipschitz function with $L = \frac{1}{\gamma}$, we can invoke Lemma 3 from Bartlett et al. (2002) to obtain that $\hat{R}_X(\mathcal{H}_{M,\gamma}) \le \frac{2}{\gamma}\hat{R}_X(\mathcal{F})$. Hence it suffices to bound $\hat{R}_X(\mathcal{F})$:

$$n\hat{R}_X(\mathcal{F}) = E_{\sigma} \sup_{f \in \mathcal{F}} \left|\sum_{i=1}^n \sigma_i\, f(\mathbf{x}_i)\right| = E_{\sigma} \sup_{k \in \mathcal{K}}\, \sup_{\boldsymbol{\alpha}} \left|\frac{\boldsymbol{\sigma}'K\boldsymbol{\alpha}}{\sqrt{\boldsymbol{\alpha}'K\boldsymbol{\alpha}}}\right| = E_{\sigma} \sup_{k \in \mathcal{K}}\, \sup_{\boldsymbol{\alpha}} \left|\left(\sqrt{K}\boldsymbol{\sigma}\right)'\left(\frac{\sqrt{K}\boldsymbol{\alpha}}{\sqrt{\boldsymbol{\alpha}'K\boldsymbol{\alpha}}}\right)\right|$$
$$\le E_{\sigma} \sup_{k \in \mathcal{K}} \sqrt{\boldsymbol{\sigma}'K\boldsymbol{\sigma}} \le \sqrt{E_{\sigma} \sup_{k \in \mathcal{K}} \boldsymbol{\sigma}'K\boldsymbol{\sigma}},$$

where the first inequality follows from the Cauchy-Schwarz inequality (the second factor has unit norm) and the second from Jensen's inequality. The value of $E_{\sigma} \sup_{k \in \mathcal{K}} \boldsymbol{\sigma}'K\boldsymbol{\sigma}$ can be upper bounded in two ways. From $\boldsymbol{\sigma}'K_j\boldsymbol{\sigma} \le \boldsymbol{\sigma}'\boldsymbol{\sigma}\,\lambda(K_j) = n\lambda(K_j)$, it is clear that $n \max_j \lambda(K_j)/\beta_j$ is an upper bound. Alternatively, observe that $E_{\sigma} \sup_{k \in \mathcal{K}} \boldsymbol{\sigma}'K\boldsymbol{\sigma} \le E_{\sigma}\, \boldsymbol{\sigma}'\left(\sum_{j=1}^m \frac{K_j}{\beta_j}\right)\boldsymbol{\sigma} = \sum_{j=1}^m \operatorname{trace}(K_j)/\beta_j$.
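The expectation in the trace-based bound is a one-line computation, included here for completeness:
$$E_{\sigma}\, \boldsymbol{\sigma}'K_j\boldsymbol{\sigma} = \sum_{i,l} (K_j)_{il}\, E_{\sigma}[\sigma_i \sigma_l] = \sum_{i} (K_j)_{ii} = \operatorname{trace}(K_j),$$
since $E[\sigma_i \sigma_l] = 1$ if $i = l$ and $0$ otherwise.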

Putting all pieces together completes the proof. □

REFERENCES

Bartlett, P., Bousquet, O., and Mendelson, S. (2002). Localized Rademacher complexities. In Proceedings of the 15th Annual Conference on Computational Learning Theory (COLT 2002), pages 44–58.

Bartlett, P. L. and Mendelson, S. (2002). Rademacher and Gaussian complexities: risk bounds and structural results. Journal of Machine Learning Research, 3, 463–482.

Lanckriet, G. R. G., Cristianini, N., Bartlett, P., El Ghaoui, L., and Jordan, M. I. (2004). Learning the kernel matrix with semidefinite programming. Journal of Machine Learning Research, 5, 27–72.

Shawe-Taylor, J. and Cristianini, N. (2004). Kernel Methods for Pattern Analysis. Cambridge University Press, Cambridge, UK.
