
Some properties of kernel matrices

Citation for published version (APA):

Hoekstra, Æ. H. (1981). Some properties of kernel matrices. (Memorandum COSOR; Vol. 8105). Technische Hogeschool Eindhoven.

Document status and date: Published: 01/01/1981

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)



Department of Mathematics and Computer Science

PROBABILITY THEORY, STATISTICS, OPERATIONS RESEARCH AND SYSTEMS THEORY GROUP

Memorandum COSOR 81-05

Some properties of kernel matrices

by

A.H. Hoekstra

Eindhoven, June 1981
The Netherlands


Some properties of kernel matrices

1. Definition and elementary properties

We consider discrete-time Markov processes of the type discussed in [4]: the processes have stationary transition probabilities and the transition distribution function H(y|x) = P(X_{n+1} ≤ y | X_n = x) is given by

(1.1)   H(y|x) = Σ_{j=1}^{r} a_j(x) B_j(y)    (x, y ∈ ℝ),

where the a_j and B_j are real-valued functions, the a_j are measurable and the B_j are of bounded variation and continuous from the right.

Definition 1.1: The r × r matrix C with entries

(1.2)   c_ij = ∫ a_j(x) dB_i(x)    (i, j = 1, ..., r)

is called the kernel matrix corresponding to (1.1).

We shall denote a (column) vector with j-th component v_j by v and its transpose by v^t.
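As a small illustration of Definition 1.1, the sketch below computes a kernel matrix numerically for a hypothetical two-term representation (a logistic weight a_1, a_2 = 1 − a_1, and two normal distribution functions B_1, B_2; these choices are illustrative and not taken from the memorandum).

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# Hypothetical rank-2 representation H(y|x) = a_1(x)B_1(y) + a_2(x)B_2(y):
# a_1(x) = 1/(1+exp(-x)), a_2(x) = 1 - a_1(x), B_1 = N(-1,1) cdf, B_2 = N(1,1) cdf.
a = [lambda x: 1.0 / (1.0 + np.exp(-x)),
     lambda x: 1.0 - 1.0 / (1.0 + np.exp(-x))]
B_density = [lambda x: stats.norm.pdf(x, loc=-1.0),
             lambda x: stats.norm.pdf(x, loc=1.0)]

# (1.2): c_ij = integral of a_j(x) dB_i(x); here dB_i(x) = phi_i(x) dx.
C = np.array([[quad(lambda x, i=i, j=j: a[j](x) * B_density[i](x),
                    -np.inf, np.inf)[0]
               for j in range(2)] for i in range(2)])

print(C)                      # the 2x2 kernel matrix
print(C.sum(axis=1))          # row sums are 1, since the B_j are distribution functions
print(np.linalg.eigvals(C))   # contains the eigenvalue 1
```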

Proposition 1.2: The n-step transition distribution function H^(n)(y|x) is given by

(1.3)   H^(n)(y|x) = a^t(x) C^{n-1} B(y)    (x, y ∈ ℝ, n ≥ 1),

with C^0 = I, the unit matrix.

Proof: see [4].

Proposition 1.3: The representation (1.1) of the transition function H is minimal, i.e. H cannot be expressed as a sum of fewer than r terms, if and only if the functions a_1, ..., a_r are linearly independent and the functions B_1, ..., B_r are linearly independent.

Proof: see [4].

Proposition 1.4: Two minimal representations of a transition function H can only differ by a nonsingular linear transformation T, i.e. if H(y|x) = Σ_{j=1}^{r} a_j(x)B_j(y) = Σ_{j=1}^{r} a_j*(x)B_j*(y) and r is minimal, then there is a nonsingular r × r matrix T such that a*^t(x) = a^t(x)T and B*(y) = T^{-1}B(y).

Proof: for the case of a finite Markov chain a proof, which can easily be generalized, will be given in section 2.

Proposition 1.5: Among the minimal representations of a transition function H there is at least one in which the B_j are distribution functions. The corresponding kernel matrix C has all its row sums equal to 1. There is also one in which the a_j are bounded between 0 and 1.

Proof: see [4].

Corollary 1.6: In every fixed minimal representation of H the functions a_j are bounded.

From here on we assume that r in (1.1) is minimal, and we then say that H is of rank r.

Let 𝒞_r(H) be the set of all r × r kernel matrices that correspond to a fixed H of rank r. If C ∈ 𝒞_r(H) then also T^{-1}CT ∈ 𝒞_r(H) for every nonsingular r × r matrix T, since T^{-1}CT corresponds to the representation H(y|x) = (a^t(x)T)(T^{-1}B(y)) if C corresponds to H(y|x) = a^t(x)B(y). Combining this with Proposition 1.4 we find that 𝒞_r(H) is a complete class of similar r × r matrices. Let 𝒞_r be the set of all r × r kernel matrices.

Proposition 1.7: If C ∈ 𝒞_r then C^n ∈ 𝒞_r for n ≥ 2.

Proof: C is the kernel matrix for a transition function H(y|x) = a^t(x)B(y). Take H^(n)(y|x) as a new one-step transition function H̃(y|x) = ã^t(x)B(y) with ã^t(x) = a^t(x)C^{n-1}. We obtain the kernel matrix C̃ = (∫ ã_j(x) dB_i(x))_{i,j} = (∫ a_j(x) dB_i(x))_{i,j} C^{n-1} = C·C^{n-1} = C^n. □

Proposition 1.8: If C ∈ 𝒞_r then every convex combination of C, C², ..., C^n is an element of 𝒞_r.

Proof: the matrix α_1C + α_2C² + ... + α_nC^n (α_i ≥ 0, α_1 + ... + α_n = 1) is a kernel matrix for the transition function α_1H(y|x) + α_2H^(2)(y|x) + ... + α_nH^(n)(y|x), as can easily be checked. □

As we shall see in section 3, convex combinations of the C^k including C^0 = I do not necessarily belong to 𝒞_r. In this respect kernel matrices differ from Markov transition matrices.

Proposition 1.9: If C ∈ 𝒞_r, then C has an eigenvalue 1 and |λ| ≤ 1 for all eigenvalues λ of C.

Proof: see [4].

Proposition 1.10: If there exists a matrix C ∈ 𝒞_r with eigenvalue λ_0, then for each p with 0 ≤ p ≤ 1 there exists a matrix C_p ∈ 𝒞_r with eigenvalue pλ_0.

Proof: Let C, corresponding to H(y|x) = Σ_{j=1}^{r} a_j(x)B_j(y) with the B_j distribution functions, have an eigenvalue λ_0. For 0 ≤ p ≤ 1 define

(1.5)   H_p(y|x) = Σ_{j=1}^{r} p a_j(x)B_j(y) + (1 − p)B_r(y) = Σ_{j=1}^{r−1} p a_j(x)B_j(y) + [p a_r(x) + 1 − p]B_r(y).

This is a transition distribution function with kernel matrix

(1.6)   C_p = pC + (1 − p)E,

where E is the r × r matrix whose last column consists of ones and whose remaining entries are 0.

We have

(1.7)   det(C_p − λI) = ((1 − λ)/(p − λ)) det(pC − λI)    (λ ≠ p),

since C_p − λI and pC − λI differ only in their last columns, and adding the other columns to the last column (the row sums of C_p and of C are 1) turns these last columns into (1 − λ)(1, ..., 1)^t and (p − λ)(1, ..., 1)^t respectively. The eigenvalues of pC are p times those of C, from which it follows that C_p has an eigenvalue pλ_0. □
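A quick numerical check of the construction (1.5)-(1.6); the 3 × 3 cyclic transition matrix used for C is an illustrative choice (any kernel matrix with row sums 1 would do).

```python
import numpy as np

# Illustrative kernel matrix with row sums 1 (a full-rank transition matrix
# is its own kernel matrix).
C = np.array([[0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [1.0, 0.0, 0.0]])
p = 0.6

# (1.6): C_p = p*C + (1-p)*E, where E has ones in its last column.
E = np.zeros((3, 3))
E[:, -1] = 1.0
C_p = p * C + (1 - p) * E

# Spectrum of C: 1 and exp(+-2*pi*i/3); the spectrum of C_p should be
# 1 together with p times the remaining eigenvalues of C.
print(sorted(np.linalg.eigvals(C), key=lambda z: z.real))
print(sorted(np.linalg.eigvals(C_p), key=lambda z: z.real))
```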

2. The finite case

A finite Markov chain with s × s transition matrix M = (m_ij) is a special case of the general Markov process considered in section 1. Let the state space be S = {x_1, ..., x_s} and take

(2.1)   a_j(x) = m_ij  for x = x_i ∈ S (j = 1, ..., s);   a_j(x) = 0  for x ∉ S and j = 1, ..., s − 1;   a_s(x) = 1  for x ∉ S,

(2.2)   B_j(y) = 1  for y ≥ x_j,   B_j(y) = 0  for y < x_j.

These functions satisfy (1.1) with r = s, the B_j being distribution functions, but the representation is not necessarily minimal. To get a minimal one, suppose that M has rank r. Then M can be written as the product of an s × r matrix A and an r × s matrix B, where A and B both have rank r. This is trivial if r = s (take A = I_r and B = M, which is actually (2.1)); if r < s we can choose r independent rows of M, say (after re-ordering) the first r rows. Since every row of M is now a linear combination of these r rows, we have

(2.3)   m_ij = Σ_{k=1}^{r} α_ik m_kj,

with α_ik = δ(i,k) for i = 1, ..., r and Σ_{k=1}^{r} α_ik = 1 for i = 1, ..., s. Now let A = (α_ik)_{i=1,...,s; k=1,...,r} and B = (m_kj)_{k=1,...,r; j=1,...,s}. We evidently have

(2.4)   M = AB.

The rows of B are probability distributions. The upper r × r part of A is the unit matrix I_r; in the lower (s − r) × r part some of the entries may be negative or larger than 1.

The kernel matrix C now takes the simple form

(2.5)   C = BA.
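A minimal sketch of the factorisation M = AB and of (2.5), for a hypothetical 4 × 4 transition matrix of rank 3 (the matrix is illustrative, not one appearing in the text).

```python
import numpy as np

# Hypothetical 4x4 transition matrix of rank 3 (its last row repeats the first).
M = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.5, 0.0, 0.0]])
r = np.linalg.matrix_rank(M)       # here r = 3

B = M[:r, :]                       # r independent rows of M; rows of B are
                                   # probability distributions
# coefficient matrix A with M = A B, cf. (2.3)-(2.4); its upper r x r part is I_r
A = np.linalg.lstsq(B.T, M.T, rcond=None)[0].T
assert np.allclose(A @ B, M)

C = B @ A                          # the kernel matrix (2.5)
print(A)
print(C)
```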

We shall now give a proof of Proposition 1.4 in this special case. We write 𝒞_r(M) for the set of all kernel matrices corresponding to a given transition matrix M of rank r.

"Proof" of Proposition 1.4: Suppose M

=

AB

*

* *

C :: B A E 'e (M) • r

* *

= A B ; C

=

BA E ~ (M) and r

A has rank r, so we can choose r independent rows of A, say a1, ••• ,ar, that ,.., "'*

together form a nonsingular r x r matrix A. Let A be the r x r matrix

con-*

sisting of the corresponding rows of A , and define

(2.6)

F rom AB '" A B 1t allows that AB

*

* , f

=

""* A B and hence B *

=

TB • We now have

*

*

* *

*

*

(8)

Since Band B* both have rank r, T is nonsingular, and we may conclude that

(2.7)

For a fixed M all matrices C ∈ 𝒞_r(M) have the same characteristic polynomial p_C(λ) = det(λI_r − C), as they are similar. If p_M(λ) denotes the characteristic polynomial of M, we have

Proposition 2.1:

(2.8)   p_M(λ) = λ^{s−r} p_C(λ),

where s is the order of M.

Proof: Let A and B be the matrices that produce C, and define matrices A_0 and B_0 by

(2.9)   A_0 = (A  R),   B_0 = (B; O),

where R is any s × (s − r) matrix that gives A_0 rank s, and O is the (s − r) × s matrix with all entries equal to 0. Then A_0B_0 = AB = M, and since A_0 is nonsingular, M is similar to B_0A_0, which has the block form (C  BR; O  O). Hence

(2.10)   p_M(λ) = det(λI_s − B_0A_0) = det(λI_r − BA)·det(λI_{s−r}) = λ^{s−r} p_C(λ). □

Corollary 2.2: All nonzero eigenvalues of M are also eigenvalues of C ∈ 𝒞_r(M), with the same multiplicity.

Corollary 2.3: C ∈ 𝒞_r(M) has an eigenvalue 1 and |λ| ≤ 1 for all eigenvalues λ of C (since this is true for the transition matrix M; cf. Proposition 1.9).

Corollary 2.4: The trace tr(C) of C ∈ 𝒞_r(M), i.e. the sum of the eigenvalues of C, is nonnegative.
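A sketch checking Proposition 2.1 and Corollaries 2.2-2.4 numerically; the rank-r transition matrix is generated from a random factorisation, which is an assumption of this sketch rather than the construction used in the proof.

```python
import numpy as np

rng = np.random.default_rng(0)
s, r = 6, 3

# Random s x s transition matrix of rank r: M = A B with B r x s (rows are
# probability distributions) and A s x r (rows are convex weights).
B = rng.random((r, s)); B /= B.sum(axis=1, keepdims=True)
A = rng.random((s, r)); A /= A.sum(axis=1, keepdims=True)
M = A @ B
C = B @ A              # a kernel matrix corresponding to M (any rank-r
                       # factorisation yields a similar matrix)

# (2.8): p_M and p_C differ only by a power of lambda, i.e. the nonzero
# eigenvalues coincide with multiplicity.
pM = np.poly(M)        # monic characteristic polynomial coefficients of M
pC = np.poly(C)
print(np.allclose(pM, np.concatenate([pC, np.zeros(s - r)])))
print(np.linalg.eigvals(C))   # contains 1, all moduli <= 1
print(np.trace(C))            # nonnegative trace
```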

It does not follow that all eigenvalues of a matrix C ∈ 𝒞_r(M) are nonzero. In fact, C can even have an eigenvalue 0 of multiplicity r − 1. Take e.g. the (r + 1) × (r + 1) transition matrix

(2.11)   M = [ 0 1 0 ... 0 0 ]
             [ 0 0 1 ... 0 0 ]
             [ ............. ]
             [ 0 0 0 ... 0 1 ]
             [ 0 0 0 ... 0 1 ],

for which the factorisation procedure of this section gives

(2.12)   C = [ 0 1 0 ... 0 ]
             [ 0 0 1 ... 0 ]
             [ ........... ]
             [ 0 0 0 ... 1 ]
             [ 0 0 0 ... 1 ].

M is an (r + 1) × (r + 1) transition matrix of rank r, C is an r × r matrix of rank r − 1, and p_C(λ) = λ^{r−1}(λ − 1).

If, as in the example above, C has an eigenvalue 0, the rank of C is less than r. We can then apply the factorisation procedure to C instead of M, and find C = A_1B_1 with C_1 = B_1A_1 and M^n = AC^{n−1}B = AA_1C_1^{n−2}B_1B. If we go on we eventually obtain, after k steps say, a matrix C_{r_0} of order r_0 with only nonzero eigenvalues and

(2.13)   M^n = Ã C_{r_0}^{n−1−k} B̃    (n > k),

where Ã = AA_1···A_k and B̃ = B_k···B_1B. In the above example k = r − 1, r_0 = 1 and C_{r_0} = (1); indeed, M^n is constant for n ≥ r in this case.
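The example (2.11)-(2.12) can be checked numerically; the sketch below uses the matrices as reconstructed above (with r = 4).

```python
import numpy as np

r = 4
s = r + 1
# (2.11): an (r+1) x (r+1) transition matrix that moves every state one step
# towards the absorbing last state.
M = np.zeros((s, s))
for i in range(r):
    M[i, i + 1] = 1.0
M[r, r] = 1.0

B = M[:r, :]                                  # r independent rows
A = np.linalg.lstsq(B.T, M.T, rcond=None)[0].T
C = B @ A                                     # the r x r kernel matrix (2.12)

print(np.poly(C))                             # roots: 0 (multiplicity r-1) and 1
print(np.linalg.matrix_rank(C))               # r - 1
# M^n is constant from n = r on (all mass sits in the absorbing state):
print(np.allclose(np.linalg.matrix_power(M, r),
                  np.linalg.matrix_power(M, r + 5)))
```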

Proposition 2.5: If C ∈ 𝒞_r(M) then M^∞ := lim_{n→∞} M^n exists if and only if C^∞ := lim_{n→∞} C^n exists.

Proof: the statement follows immediately from M^n = AC^{n−1}B and C^n = BM^{n−1}A. □

Proposition 2.6: Let C = BA ∈ 𝒞_r(M) and let the rows of B be probability distributions. If C has a single eigenvalue 1 and no other eigenvalues λ with |λ| = 1, then C^∞ exists, has identical rows with sums 1, and

(2.14)   C^∞ = BM^∞A.

Proof: under the conditions of the proposition M^∞ exists and has identical rows with sums 1 (probability distributions). Hence C^∞ = lim_{n→∞} BM^{n−1}A exists and equals BM^∞A, and C^∞ also has identical rows with sums 1. □
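A numerical check of Propositions 2.5-2.6 for an illustrative 3 × 3 transition matrix of rank 2 (a hypothetical example, not one from the memorandum).

```python
import numpy as np

# 3x3 transition matrix of rank 2 (third row repeats the second).
M = np.array([[0.2, 0.8, 0.0],
              [0.5, 0.0, 0.5],
              [0.5, 0.0, 0.5]])
r = np.linalg.matrix_rank(M)                     # 2
B = M[:r, :]                                     # rows are probability distributions
A = np.linalg.lstsq(B.T, M.T, rcond=None)[0].T
C = B @ A

M_inf = np.linalg.matrix_power(M, 200)           # numerically, M^infinity
C_inf = np.linalg.matrix_power(C, 200)           # numerically, C^infinity
print(np.allclose(C_inf, B @ M_inf @ A))         # (2.14): C^inf = B M^inf A
print(C_inf.sum(axis=1))                         # identical rows with sums 1
```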

Proposition 2.7: Every kernel matrix C is the limit of a sequence of kernel matrices corresponding to finite transition matrices.

Proof: Let C correspond to H(y|x) = Σ_{j=1}^{r} a_j(x)B_j(y). It is no restriction to assume that the B_j are distribution functions. By Corollary 1.6 the a_j are bounded, say |a_j(x)| ≤ L for all x and all j ∈ {1, ..., r}. For each fixed k ∈ ℕ and j ∈ {1, ..., r} the sets

(2.15)   {x ∈ ℝ : (m − 1)/k ≤ a_j(x) < m/k}    (m ∈ ℤ)

form a measurable partition of ℝ (since a_j is a measurable function), and only finitely many of them are nonempty. Keep k fixed and let A_1^(k), ..., A_{N(k)}^(k) be the nonempty sets of the measurable partition generated by all these sets. Take an arbitrary fixed x_n^(k) ∈ A_n^(k) for each n ∈ {1, ..., N(k)}, define the function a_j^(k) by

(2.16)   a_j^(k)(x) = a_j(x_n^(k))    if x ∈ A_n^(k),

and let B_j^(k) be the discrete distribution function with jumps B_j{A_n^(k)} at the points x_n^(k). The a_j^(k) are measurable step functions and for all x and j

(2.17)   lim_{k→∞} |a_j(x) − a_j^(k)(x)| ≤ lim_{k→∞} 1/k = 0.

Furthermore

(2.18)   H^(k)(y|x) = Σ_{j=1}^{r} a_j^(k)(x)B_j^(k)(y) = Σ_{j=1}^{r} a_j(x_n^(k)) B_j{ ∪ {A_m^(k) : x_m^(k) ≤ y} }    if x ∈ A_n^(k),

is a transition function concentrated on the finite set {x_1^(k), ..., x_{N(k)}^(k)}. Let C^(k) = (c_ij^(k)) be the corresponding kernel matrix. We have

(2.19)   c_ij^(k) = ∫ a_j^(k)(x) dB_i^(k)(x) = Σ_{n=1}^{N(k)} a_j(x_n^(k)) B_i{A_n^(k)} = ∫ a_j^(k)(x) dB_i(x).

Using Lebesgue's dominated convergence theorem (|a_j^(k)(x)| ≤ L, ∫ L dB_i < ∞) we find for all i, j ∈ {1, ..., r}

(2.20)   lim_{k→∞} c_ij^(k) = lim_{k→∞} ∫ a_j^(k)(x) dB_i(x) = ∫ a_j(x) dB_i(x) = c_ij. □
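A sketch of the discretisation used in the proof, applied to the hypothetical two-term representation introduced after Definition 1.1; since a_1 is increasing and a_2 = 1 − a_1, the partition (2.15) reduces to intervals, which keeps the sketch short.

```python
import numpy as np
from scipy import stats
from scipy.integrate import quad

# a_1 = logistic, a_2 = 1 - a_1, B_1 = N(-1,1), B_2 = N(1,1) (illustrative choices).
a1 = lambda x: 1.0 / (1.0 + np.exp(-x))
a = [a1, lambda x: 1.0 - a1(x)]
Bcdf = [lambda x: stats.norm.cdf(x, loc=-1.0),
        lambda x: stats.norm.cdf(x, loc=1.0)]
Bpdf = [lambda x: stats.norm.pdf(x, loc=-1.0),
        lambda x: stats.norm.pdf(x, loc=1.0)]
logit = lambda p: np.log(p / (1.0 - p))

c_exact = np.array([[quad(lambda x, i=i, j=j: a[j](x) * Bpdf[i](x),
                          -np.inf, np.inf)[0] for j in range(2)] for i in range(2)])

for k in (5, 20, 80):
    # cells A_n^(k): intervals on which a_1 lies between (m-1)/k and m/k
    edges = np.array([-np.inf] + [logit(m / k) for m in range(1, k)] + [np.inf])
    reps = np.array([logit((m - 0.5) / k) for m in range(1, k + 1)])   # x_n^(k)
    c_k = np.zeros((2, 2))
    for i in range(2):
        mass = Bcdf[i](edges[1:]) - Bcdf[i](edges[:-1])                # B_i{A_n^(k)}
        for j in range(2):
            c_k[i, j] = np.sum(a[j](reps) * mass)                      # (2.19)
    print(k, np.max(np.abs(c_k - c_exact)))    # error tends to 0, as in the proof
```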

Proposition 2.8: The trace of a kernel matrix is nonnegative.

Proof: Combine Proposition 2.7 and Corollary 2.4.

3. Eigenvalues

A. Stochastic matrices

Let M_n denote the set of all complex numbers λ that are eigenvalues of stochastic matrices of order n. The problem of determining M_n (or, slightly more generally, the set of all eigenvalues of nonnegative matrices of order n) was posed by Kolmogorov, partly solved by Dmitriyev and Dynkin in 1946 [1], and finally completely solved by Karpelewitsj in 1951 [3].

M_n turns out to be a closed, star-shaped subset of the unit disk. The only points of M_n on the unit circle are the points e^{2πik/t} (t ≤ n, k = 0, 1, ..., t − 1). M_n is symmetrical with respect to the real axis (figure 3.1).

The boundary of M_n between 1 and e^{2πi/n} is a straight line, and further consists of polynomial arcs; see [3] for the explicit formulas. The basic observation in [1] and [3] is that λ ∈ M_n if and only if there exists a convex k-angular polygon Q (k ≤ n) which is mapped into itself when multiplied by λ.

Kernel matrices C are in general not nonnegative, and it is not clear how the arguments used in [1] and [3] can be extended to get information about C_r, the set of all complex numbers λ that are eigenvalues of kernel matrices C ∈ 𝒞_r. If we go back to the transition distribution function H(y|x) we have to study eigenfunctions φ instead of eigenvectors (φ is called an eigenfunction of H(y|x) if ∫ φ(y) dH(y|x) = λφ(x); it can be shown that all eigenfunctions have the form φ(x) = Σ_{j=1}^{r} e_j a_j(x), where e is an eigenvector of C). From section 2 it is clear that C_r is also contained in the unit disk and that M_r ⊂ C_r.

B. 𝒞_2

Proposition 3.1: 𝒞_2 is the set of all 2 × 2 matrices that are similar to 2 × 2 transition matrices.

Proof: Let C ∈ 𝒞_2. By Proposition 1.9, C has an eigenvalue 1 and one other real eigenvalue λ with |λ| ≤ 1. If λ < 1 then C is similar to its Jordan normal form (λ 0; 0 1), and this matrix in its turn is similar (via the transformation matrix T = (−1 1; 1 1)) to the matrix

( (1 + λ)/2   (1 − λ)/2 )
( (1 − λ)/2   (1 + λ)/2 ),

which is a transition matrix. If λ = 1 the Jordan normal form cannot be (1 1; 0 1), since then C^n would be unbounded for n → ∞; so it is (1 0; 0 1) = I, which is itself a transition matrix. □

Corollary 3.2: C_2 = M_2 = [−1, 1] ⊂ ℝ.
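The proof of Proposition 3.1 is constructive; the sketch below builds the transition matrix similar to diag(λ, 1) for an illustrative value of λ.

```python
import numpy as np

# For a given eigenvalue lam in [-1, 1) this 2x2 transition matrix has
# eigenvalues 1 and lam, and is similar to diag(lam, 1) via T = [[-1, 1], [1, 1]].
lam = -0.4                                        # illustrative value
P = 0.5 * np.array([[1 + lam, 1 - lam],
                    [1 - lam, 1 + lam]])
T = np.array([[-1.0, 1.0], [1.0, 1.0]])
print(np.linalg.eigvals(P))                       # 1 and lam
print(np.allclose(T @ np.diag([lam, 1.0]) @ np.linalg.inv(T), P))
```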

If M is a finite transition matrix of order n and rank 2, the rows of M considered as points in ℝ^n lie on a straight line in ℝ^n. Each of these n points is a convex combination of the two extreme points on this line. If we take these extreme points to compose the 2 × n matrix B (cf. the beginning of section 2), all entries of the corresponding matrix A are nonnegative (and at most 1). It is clear that now BA is a transition matrix. This provides another proof for the finite case of Proposition 3.1 (problem 99 in Statistica Neerlandica 34 (1980), solution by J.Th. Runnenburg).

Only nonreal eigenvalues are of interest to us if we want information about C_3. Let 1, x + iy and x − iy (y > 0) be the three eigenvalues of a kernel matrix C ∈ 𝒞_3. The sum of the eigenvalues is nonnegative (Corollary 2.4), hence x ≥ −½. This leads to

Proposition 3.3: If u + iv is a complex number with v ≠ 0 such that Re(u + iv)^n < −½ for some n ≥ 1, then u + iv ∉ C_3.

Proof: If u + iv is an eigenvalue of a matrix C ∈ 𝒞_3, then (u + iv)^n is an eigenvalue of the matrix C^n ∈ 𝒞_3, so that Re(u + iv)^n is at least −½. □

Proposition 3.3 enables us to exclude a part of the unit disk in searching the area of C_3.
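A small sketch of the exclusion test of Proposition 3.3 (the search over n is of course finite here, so it can only confirm exclusion, never membership).

```python
import numpy as np

def excluded_from_C3(z, nmax=200):
    """True if Proposition 3.3 rules z out as an eigenvalue of a kernel matrix
    in C_3, i.e. Re(z^n) < -1/2 for some n <= nmax (finite search only)."""
    return any((z ** n).real < -0.5 for n in range(1, nmax + 1))

print(excluded_from_C3(0.95j))                         # True (n = 2)
print(excluded_from_C3(0.8 * np.exp(1j * np.pi / 3)))  # True (n = 3)
print(excluded_from_C3(0.5 + 0.5j))                    # False: not excluded this way
```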

figure 3.2. C_3 is contained in the nonshaded area (except the part [−1, −½) of the real axis) and, on the other hand, contains the triangle M_3 (M_3 ⊂ C_3).

Let z_n = 2^{−1/n} e^{iπ/n} (n = 1, 2, ...) be the solution of the equation z^n = −½ with smallest positive argument. Set z_n = x_n + iy_n and t = π/n; then x_t = 2^{−t/π} cos t and y_t = 2^{−t/π} sin t. To see the behaviour of the sequence (z_n) near 1 we take t as a continuous parameter and let t tend to 0 (from above). We find

(dy/dx)_{x=1} = (dy/dt · dt/dx)_{t=0} = −π/log 2 = −4.53...

So, in particular, we see that (dy/dx)_{x=1} is finite.
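A numerical check of the slope computed above, using secant slopes through the points z_n.

```python
import numpy as np

# Secant slopes through z_n = 2**(-1/n) * exp(i*pi/n) approach -pi/log 2.
for n in (10, 100, 1000, 10000):
    z = 2.0 ** (-1.0 / n) * np.exp(1j * np.pi / n)
    print(n, z.imag / (z.real - 1.0))
print(-np.pi / np.log(2.0))
```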

In order to obtain an inner bound for C_3 we considered the following problem: find a transition matrix M of order 4 and rank 3, with eigenvalues x ± itx for a specified t, such that x is maximal (for t > 0) or minimal (for t < 0). For every matrix (p_ij)_{i,j=1}^{n} with eigenvalues λ_1, ..., λ_n the following relations hold for the first two invariants of the matrix (see e.g. [2], sec. 4.3):

(3.2)   Σ_{i=1}^{n} λ_i = Σ_{i=1}^{n} p_ii,

(3.3)   Σ_{i<j} λ_i λ_j = Σ_{i<j} (p_ii p_jj − p_ij p_ji).

The problem is thus to maximize x = ½(p_11 + p_22 + p_33 + p_44 − 1) under the conditions

Σ_{j=1}^{4} p_ij = 1,   p_ij ≥ 0    (i = 1, ..., 4),

and the requirement that the eigenvalues of (p_ij) are 1, 0 and x ± itx. The maximization has been carried out by computer. The result is given in figure 3.3. The arc between the points e^{2πi/3} and ½ + ½i is given by (for −½ ≤ x ≤ ½)

(3.4)   y² = ½(x − ½)² + ¼.

To see this, let the eigenvalues be 1, 0 and x ± iy, and take x fixed (i.e. p_11 + p_22 + p_33 + p_44 is fixed). From (3.2) and (3.3) we now deduce

(3.5)   y² ≤ Σ_{i<j} p_ii p_jj − x² − 2x ≤ 6p̄² − x² − 2x,

where p̄ = ¼(p_11 + p_22 + p_33 + p_44) = ¼(2x + 1). This leads directly to (3.4).

For −½ ≤ x ≤ ¼ the upper bound for y² is attained if we take for M a matrix whose 2nd, 3rd and 4th rows are (0, p, 1−p, 0), (0, 0, p, 1−p) and (1−p, 0, 0, p), and whose first row is a suitable linear combination of these three rows, so that M has rank 3. This fails for x > ¼ (then p > ½ and one of the coefficients in the first row becomes negative). And, in general, no transition matrix with all main diagonal entries greater than ½ can have an eigenvalue 0 (see e.g. [2], Sec. 6.8, Gershgorin's theorem).

figure 3.3.
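The relations (3.2) and (3.3), on which the maximization rests, are easy to check numerically for a random transition matrix.

```python
import numpy as np
from itertools import combinations

rng = np.random.default_rng(1)
P = rng.random((4, 4))
P /= P.sum(axis=1, keepdims=True)          # a random 4x4 transition matrix
lam = np.linalg.eigvals(P)

lhs_32, rhs_32 = lam.sum(), np.trace(P)                      # (3.2)
lhs_33 = sum(lam[i] * lam[j] for i, j in combinations(range(4), 2))
rhs_33 = sum(P[i, i] * P[j, j] - P[i, j] * P[j, i]           # (3.3)
             for i, j in combinations(range(4), 2))
print(np.isclose(lhs_32, rhs_32), np.isclose(lhs_33, rhs_33))
```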

At present this is about all the information we have about C_3. We are trying to get more numerical information by considering the situation of a 5 × 5 transition matrix of rank 3. Starting from an n × n transition matrix M we obtain, instead of (3.5), the inequality

(3.7)   y² ≤ ½n(n − 1)p̄² − x² − 2x,

where now p̄ = (1/n)(2x + 1), so that we find

(3.8)   y² ≤ x² + ½ − (2x + 1)²/(2n).

For n → ∞ this yields the inequality Re(x + iy)² ≥ −½. It is doubtful whether these values can actually be attained.

Using the foregoing results about C_3 we can show that several of the pleasant properties of the set of stochastic matrices are not inherited by the set 𝒞_r of kernel matrices.

First, 𝒞_r is not closed under matrix multiplication. Take for example

(3.9)   C_1 = [ ½   ½   0 ]
              [ 0   ½   ½ ]
              [ ½  −½   1 ]

and

(3.10)  C_2 = [ 0  1  0 ]
              [ 0  0  1 ]
              [ 1  0  0 ].

C_1 is a kernel matrix corresponding to the 4 × 4 transition matrix

M = [ ½  ½  0  0 ]
    [ 0  ½  ½  0 ]
    [ 0  0  ½  ½ ]
    [ ½  0  0  ½ ]

of rank 3, and C_2 is a kernel matrix as it is itself a transition matrix of full rank. We have however

C_1C_2 = [ 0  ½  ½ ]
         [ ½  0  ½ ]
         [ 1  ½ −½ ]

with a negative trace, so that (by Proposition 2.8) C_1C_2 ∉ 𝒞_3.

The same two kernel matrices may serve as a counterexample to see that 𝒞_r is not convex. The eigenvalues of C_α = αC_1 + (1 − α)C_2 are λ_1 = 1 and

(3.12)   λ_{2,3} = (α − ½) ± ½i√(3 − 2α).

Relation (3.4) now reads

(3.13)   ¼(3 − 2α) ≤ ½(α − 1)² + ¼,

which fails for every α with 0 < α < 1.
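A numerical check of the two counterexamples, with C_1, C_2 and M as reconstructed above.

```python
import numpy as np

C1 = np.array([[0.5, 0.5, 0.0],
               [0.0, 0.5, 0.5],
               [0.5, -0.5, 1.0]])
C2 = np.array([[0.0, 1.0, 0.0],
               [0.0, 0.0, 1.0],
               [1.0, 0.0, 0.0]])
M = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.5]])

# C1 is the kernel matrix (2.5) obtained from the first three rows of M.
B = M[:3, :]
A = np.linalg.lstsq(B.T, M.T, rcond=None)[0].T
print(np.allclose(B @ A, C1))

print(np.trace(C1 @ C2))         # -1/2 < 0, so C1C2 is not a kernel matrix (Prop. 2.8)

for alpha in (0.1, 0.5, 0.9):
    C_a = alpha * C1 + (1 - alpha) * C2
    lam = np.linalg.eigvals(C_a)
    pair = lam[np.argsort(np.abs(lam - 1.0))][1:]   # the two eigenvalues != 1
    x, y = pair[0].real, abs(pair[0].imag)
    # (3.12): x = alpha - 1/2, y^2 = (3 - 2*alpha)/4; the bound (3.13) then fails:
    print(alpha, x, y ** 2, y ** 2 <= 0.5 * (x - 0.5) ** 2 + 0.25)
```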


4. References

[1] Dmitriyev, N. and E. Dynkin, Characteristic roots of stochastic matrices. Izvestiya, Ser. Mat. 10 (1946) 167-184.

[2] Franklin, J.N., Matrix theory, Prentice-Hall, 1968.

[3] Karpelewitsj, F.I., On the eigenvalues of matrices with non-negative elements. Izvestiya, Ser. Mat. 15 (1951) 361-383.

[4] Runnenburg, J.Th. and F.W. Steutel, On Markov chains, the transition function of which is a finite sum of products of functions of one variable. M.C.-Report S304, 1962.
