Some properties of kernel matrices
Citation for published version (APA):
Hoekstra, A. H. (1981). Some properties of kernel matrices. (Memorandum COSOR; Vol. 8105). Technische Hogeschool Eindhoven.
Document status and date: Published: 01/01/1981
Department of Mathematics and Computer Science
PROBABILITY THEORY, STATISTICS, OPERATIONS RESEARCH AND SYSTEMS THEORY GROUP
Memorandum COSOR 81-05
Some properties of kernel matrices

by

A.H. Hoekstra

Eindhoven, The Netherlands
June 1981
Some properties of kernel matrices
1. Definition and elementary properties
We consider discrete-time Markov processes of the type discussed in [4]: the processes have stationary transition probabilities, and the transition distribution function H(y|x) = P(X_{n+1} ≤ y | X_n = x) is given by

(1.1)    H(y|x) = Σ_{j=1}^{r} a_j(x) B_j(y)        (x, y ∈ ℝ),

where the a_j and B_j are real-valued functions, the a_j are measurable, and the B_j are of bounded variation and continuous from the right.
Definition 1.1: The r × r matrix C with entries

(1.2)    c_ij = ∫ a_j(x) dB_i(x)        (i, j = 1, ..., r)

is called the kernel matrix corresponding to (1.1).
We shall denote a (column) vector with j-th component v_j by v, and its transpose by ᵗv.
Proposition 1.2: The n-step transition distribution function H^(n)(y|x) is given by

(1.3)    H^(n)(y|x) = ᵗa(x) C^(n-1) B(y)        (x, y ∈ ℝ, n ≥ 1),

with C⁰ = I, the unit matrix.
Proof: see [4].
Proposition 1.3: The representation (1.1) of the transition function H is minimal, i.e. H cannot be expressed as a sum of fewer than r terms, if the functions a_1, ..., a_r are linearly independent and the functions B_1, ..., B_r are linearly independent.
Proof: see [4].
Proposition 1.4: Two minimal representations of a transition function H can only differ by a nonsingular linear transformation T, i.e. if

    H(y|x) = Σ_{j=1}^{r} a_j(x) B_j(y) = Σ_{j=1}^{r} a*_j(x) B*_j(y)

and r is minimal, then there is a nonsingular r × r matrix T such that ᵗa*(x) = ᵗa(x) T and B*(y) = T⁻¹ B(y).
Proof: for the case of a finite Markov chain a proof, which can easily be generalized, will be given in section 2.
Proposition 1.5: Among the minimal representations of a transition function H there is at least one in which the B_j are distribution functions. The corresponding kernel matrix C then has all its row sums equal to 1. There is also one in which the a_j are bounded between 0 and 1.
Proof: see [4].

Corollary 1.6: In every fixed minimal representation of H the functions a_j are bounded.
From here on we assume that r in (1.1) is minimal, and we then say that H is of rank r.
Let C_r(H) be the set of all r × r kernel matrices that correspond to a fixed H of rank r. If C ∈ C_r(H), then also T⁻¹CT ∈ C_r(H) for every nonsingular r × r matrix T, since T⁻¹CT corresponds to the representation H(y|x) = (ᵗa(x)T)(T⁻¹B(y)) if C corresponds to H(y|x) = ᵗa(x)B(y). Combining this with Proposition 1.4 we find that C_r(H) is a complete class of similar r × r matrices. Let C_r be the set of all r × r kernel matrices.
Proposition 1.7: If C ∈ C_r then C^n ∈ C_r for n ≥ 2.
Proof: C is a kernel matrix for a transition function H(y|x) = ᵗa(x)B(y). Take H^(n)(y|x) = ᵗa(x)C^(n-1)B(y) as a new one-step transition function H̃(y|x), with ã = a and B̃ = C^(n-1)B. We obtain the kernel matrix

    C̃ = (∫ ã_j(x) dB̃_i(x))_{i,j} = C^(n-1) (∫ a_j(x) dB_i(x))_{i,j} = C^n.  □
Proposition 1.8: If C ∈ C_r then every convex combination of C, C², ..., C^n is an element of C_r.
Proof: the matrix α₁C + α₂C² + ... + α_nC^n (α_i ≥ 0; α₁ + ... + α_n = 1) is a kernel matrix for the transition function α₁H(y|x) + α₂H^(2)(y|x) + ... + α_nH^(n)(y|x), as can easily be checked.  □
As we shall see in section 3, convex combinations of the C^k including C⁰ = I do not necessarily belong to C_r. In this respect kernel matrices differ from Markov transition matrices.

Proposition 1.9: If C ∈ C_r, then C has an eigenvalue 1 and |λ| ≤ 1 for all eigenvalues λ of C.
Proof: see [4].
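For a concrete illustration of Definition 1.1 and of Propositions 1.5 and 1.9, a kernel matrix can be computed numerically. The sketch below (Python with NumPy) uses a rank-2 representation of our own illustrative choosing, not taken from [4]: a₁(x) = exp(−x²), a₂ = 1 − a₁, with B₁ and B₂ the uniform distribution functions on [0,1] and [0,2]. It approximates c_ij = ∫ a_j dB_i by midpoint quadrature and checks the row sums and the eigenvalues:

```python
import numpy as np

# Hypothetical rank-2 representation (our own choice, for illustration):
#   a_1(x) = exp(-x^2), a_2(x) = 1 - a_1(x)   (measurable, values in [0,1])
#   B_1 = uniform distribution on [0,1], B_2 = uniform on [0,2]
# Then H(y|x) = a_1(x)B_1(y) + a_2(x)B_2(y) is a transition distribution function.
def a1(x):
    return np.exp(-x**2)

# c_ij = ∫ a_j(x) dB_i(x); dB_1 = dx on [0,1], dB_2 = dx/2 on [0,2].
g1 = np.linspace(0.0, 1.0, 4001)
m1 = (g1[:-1] + g1[1:]) / 2                 # midpoints on [0,1]
g2 = np.linspace(0.0, 2.0, 4001)
m2 = (g2[:-1] + g2[1:]) / 2                 # midpoints on [0,2]
c11 = a1(m1).mean()                         # ∫_0^1 a_1 dx
c21 = a1(m2).mean()                         # (1/2) ∫_0^2 a_1 dx
C = np.array([[c11, 1.0 - c11],
              [c21, 1.0 - c21]])

# Proposition 1.5: with the B_j distribution functions, the row sums are 1.
print(C.sum(axis=1))                        # -> [1. 1.]

# Proposition 1.9: eigenvalue 1, and all eigenvalues in the unit disk.
ev = np.linalg.eigvals(C)
print(np.isclose(ev, 1.0).any(), (np.abs(ev) <= 1 + 1e-12).all())
```

The second eigenvalue here equals c₁₁ − c₂₁ ≈ 0.31, comfortably inside the unit disk.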
Proposition 1.10: If there exists a matrix C ∈ C_r with eigenvalue λ₀, then for each p with 0 ≤ p ≤ 1 there exists a matrix C_p ∈ C_r with eigenvalue pλ₀.

Proof: Let C, corresponding to H = Σ_{j=1}^{r} a_j(x)B_j(y) with the B_j distribution functions, have an eigenvalue λ₀. For 0 ≤ p ≤ 1 define

(1.5)    H_p(y|x) = Σ_{j=1}^{r} p a_j(x) B_j(y) + (1 − p) B_r(y)
                  = Σ_{j=1}^{r−1} p a_j(x) B_j(y) + [p a_r(x) + 1 − p] B_r(y).

This is a transition distribution function with kernel matrix

(1.6)    C_p = pC + (1 − p)E,

where E is the r × r matrix whose last column has all entries equal to 1 and whose other entries are 0 (here ∫ dB_i(x) = 1 is used). We have

(1.7)    det(C_p − λI) = det(pC − λI + (1 − p)E).

Adding the first r − 1 columns of this determinant to the last one replaces the last column by (1 − λ, ..., 1 − λ)ᵗ, since the row sums of C, and hence of C_p, are equal to 1. Thus det(C_p − λI) = (1 − λ)d_p(λ), where d_p(λ) is the determinant obtained when the last column is replaced by ones. Multiplying the first r − 1 columns of d_p by 1/p shows that d_p(λ) = p^(r−1) d₁(λ/p), while det(C − λI) = (1 − λ)d₁(λ); from this it follows that C_p has an eigenvalue pλ₀.  □

2. The finite case
A finite Markov chain with s × s transition matrix M = (m_ij) is a special case of the general Markov process considered in section 1. Let the state space be S = {x₁, ..., x_s} and take

(2.1)    a_j(x_i) = m_ij    (x_i ∈ S, j = 1, ..., s);
         for x ∉ S:  a_j(x) = 0  (j = 1, ..., s − 1)  and  a_s(x) = 1,

(2.2)    B_j(y) = 1  for y ≥ x_j,    B_j(y) = 0  for y < x_j.

These functions satisfy (1.1) with r = s, the B_j being distribution functions, but the representation is not necessarily minimal. To get a minimal one, suppose that M has rank r. Then M can be written as the product of an s × r matrix A and an r × s matrix B, where A and B both have rank r. This is trivial if r = s (take A = I_r and B = M, which is actually (2.1)); if r < s we can choose r independent rows of M, say (after re-ordering) the first r rows. Since every row of M is now a linear combination of these r rows, we have

(2.3)    m_ij = Σ_{k=1}^{r} α_ik m_kj,

with α_ik = δ(i,k) for i = 1, ..., r, and Σ_{k=1}^{r} α_ik = 1 for i = 1, ..., s. Now let A = (α_ik)_{i=1,...,s; k=1,...,r} and B = (m_kj)_{k=1,...,r; j=1,...,s}. We evidently have

(2.4)    M = AB.

The rows of B are probability distributions. The upper r × r part of A is the unit matrix I_r; in the lower (s − r) × r part some of the entries may be negative or larger than 1.
The kernel matrix C now takes the simple form

(2.5)    C = BA.

We shall now give a proof of Proposition 1.4 in this special case. We write C_r(M) for the set of all kernel matrices corresponding to a given transition matrix M of rank r.
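The factorisation (2.1)–(2.5) is easy to carry out numerically. The sketch below (Python with NumPy; the particular 4 × 4 matrix of rank 3 is our own illustrative choice) builds A and B as above, checks (2.4), forms C = BA, and verifies the identity C^n = BM^(n−1)A that is used in the proof of Proposition 2.5 below:

```python
import numpy as np
from numpy.linalg import matrix_power, eigvals

# Illustrative 4x4 transition matrix of rank 3 (our own choice):
# row 4 is the average of rows 1 and 2, so the first three rows are independent.
M = np.array([[0.50, 0.50, 0.00, 0.00],
              [0.00, 0.50, 0.50, 0.00],
              [0.00, 0.00, 0.50, 0.50],
              [0.25, 0.50, 0.25, 0.00]])

B = M[:3, :]                       # r independent rows of M (probability vectors)
A = np.array([[1.0, 0.0, 0.0],     # upper r x r part is the unit matrix;
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.5, 0.5, 0.0]])    # coefficients of row 4 in terms of rows 1-3

assert np.allclose(M, A @ B)       # (2.4)
C = B @ A                          # (2.5), the 3 x 3 kernel matrix

# Row sums of C are 1: rows of B are distributions and rows of A sum to 1.
print(C.sum(axis=1))               # -> [1. 1. 1.]

# The nonzero eigenvalues of M are exactly the eigenvalues of C (Corollary 2.2),
# and C^n = B M^(n-1) A (cf. the proof of Proposition 2.5).
print(np.sort_complex(eigvals(C)))
assert np.allclose(matrix_power(C, 5), B @ matrix_power(M, 4) @ A)
```

Here C has eigenvalues 1 and 0.25 ± 0.25i, while M has the same eigenvalues together with the extra eigenvalue 0 coming from its rank deficiency.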
"Proof" of Proposition 1.4: Suppose M = AB = A*B*, and let C = BA ∈ C_r(M) and C* = B*A* ∈ C_r(M). A has rank r, so we can choose r independent rows of A that together form a nonsingular r × r matrix Ã. Let Ã* be the r × r matrix consisting of the corresponding rows of A*. The corresponding rows of M equal ÃB = Ã*B*; since B and B* both have rank r, Ã* is nonsingular as well. Define

(2.6)    T = (Ã*)⁻¹ Ã.

From AB = A*B* it follows that ÃB = Ã*B*, and hence B* = TB, with T nonsingular. Furthermore A*TB = A*B* = AB, and since B has a right inverse this gives A* = AT⁻¹. We may conclude that

(2.7)    C* = B*A* = TBAT⁻¹ = TCT⁻¹.  □
For a fixed M all matrices C ∈ C_r(M) have the same characteristic polynomial p_C(λ) = det(C − λI_r), as they are similar. If p_M(λ) denotes the characteristic polynomial of M, we have

Proposition 2.1:

(2.8)    p_M(λ) = (−λ)^(s−r) p_C(λ),

where s is the order of M.
Proof: Let A and B be the matrices that produce C, and define matrices A₀ and B₀ by

(2.9)    A₀ = ( A  R ),    B₀ = ( B )
                                ( 0 ),

where R is any s × (s − r) matrix that gives A₀ rank s, and 0 has all entries equal to 0. From M = AB = A₀B₀ we find (using B₀A₀ = A₀⁻¹(A₀B₀)A₀ and the block-triangular form of B₀A₀)

(2.10)    p_M(λ) = det(A₀B₀ − λI_s) = det(B₀A₀ − λI_s)
                 = det(BA − λI_r) det(−λI_{s−r}) = (−λ)^(s−r) p_C(λ).  □
Corollary 2.2: All nonzero eigenvalues of M are also eigenvalues of C ∈ C_r(M), with the same multiplicity.

Corollary 2.3: C ∈ C_r(M) has an eigenvalue 1 and |λ| ≤ 1 for all eigenvalues λ of C (since this is true for the transition matrix M; cf. Proposition 1.9).

Corollary 2.4: The trace tr(C) of C ∈ C_r(M), i.e. the sum of the eigenvalues of C, is nonnegative.
It does not follow that all eigenvalues of a matrix C ∈ C_r(M) are nonzero. In fact, C can even have an eigenvalue 0 of multiplicity r − 1. Take e.g. the (r + 1) × (r + 1) matrix

(2.11)    M = ( 0  1  0  ...  0  0 )
              ( 0  0  1  ...  0  0 )
              ( .               .  )
              ( 0  0  0  ...  0  1 )
              ( 0  0  0  ...  0  1 ),

with corresponding r × r kernel matrix

(2.12)    C = ( 0  1  0  ...  0 )
              ( 0  0  1  ...  0 )
              ( .            .  )
              ( 0  0  ...  0  1 )
              ( 0  0  ...  0  1 ).

M is an (r + 1) × (r + 1) transition matrix of rank r, C is an r × r matrix of rank r − 1, and p_C(λ) = (−λ)^(r−1)(1 − λ).
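A quick numerical check of this example (Python with NumPy; we take r = 4, so that M is 5 × 5) confirms the ranks, the eigenvalues, and the fact — used below — that M^n is constant for n ≥ r:

```python
import numpy as np
from numpy.linalg import matrix_rank, matrix_power, eigvals

r = 4
# (2.11): state i moves deterministically to state i+1; state r+1 is absorbing.
M = np.zeros((r + 1, r + 1))
for i in range(r):
    M[i, i + 1] = 1.0
M[r, r] = 1.0

# (2.12): the corresponding r x r kernel matrix.
C = np.zeros((r, r))
for i in range(r - 1):
    C[i, i + 1] = 1.0
C[r - 1, r - 1] = 1.0

assert matrix_rank(M) == r and matrix_rank(C) == r - 1
# Eigenvalues of C: 0 with multiplicity r-1, and a single eigenvalue 1.
ev = np.sort(eigvals(C).real)
assert np.allclose(ev, [0.0] * (r - 1) + [1.0])
# M^n is constant for n >= r: after r steps every state has been absorbed.
assert np.allclose(matrix_power(M, r), matrix_power(M, r + 3))
print("ranks, eigenvalues and M^n all as claimed")
```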
If, as in the example above, C has an eigenvalue 0, the rank of C is less than r. We can then apply the factorisation procedure to C instead of M, and find C = A₁B₁ with C₁ = B₁A₁ and M^n = AC^(n−1)B = AA₁C₁^(n−2)B₁B. If we go on we eventually obtain a matrix C_{r₀}, of order r₀, with only nonzero eigenvalues, and

(2.13)    M^n = Ã C_{r₀}^(n−k−1) B̃        (n > k),

where Ã = AA₁ ⋯ A_k and B̃ = B_k ⋯ B₁B, k being the number of factorisation steps. In the above example r₀ = 1 and C_{r₀} = (1). Indeed, M^n is constant for n ≥ r in this case.
Proposition 2.5: If C ∈ C_r(M), then M^∞ := lim_{n→∞} M^n exists if and only if C^∞ := lim_{n→∞} C^n exists.

Proof: the statement follows immediately from M^n = AC^(n−1)B and C^n = BM^(n−1)A.  □
Proposition 2.6: Let C = BA ∈ C_r(M) and let the rows of B be probability distributions. If C has a single eigenvalue 1 and no other eigenvalues λ with |λ| = 1, then C^∞ exists, has identical rows with sums 1, and

(2.14)    C^∞ = B M^∞ A.

Proof: under the conditions of the proposition M^∞ exists and has identical rows with sums 1 (probability distributions). Hence C^∞ exists, with C^∞ = BM^∞A, and C^∞ also has identical rows with sums 1.  □

Proposition 2.7: Every kernel matrix C is the limit of a sequence of kernel matrices corresponding to finite transition matrices.
Proof: Let C correspond to H(y|x) = Σ_{j=1}^{r} a_j(x)B_j(y). It is no restriction to assume that the B_j are distribution functions. By Corollary 1.6 the a_j are bounded, say |a_j(x)| ≤ L for all x and all j ∈ {1, ..., r}. For each fixed k ∈ ℕ and j ∈ {1, ..., r} the sets

(2.15)    {x : (n − 1)/k < a_j(x) ≤ n/k}        (n = 0, ±1, ±2, ...)

form a measurable partition of ℝ (since a_j is a measurable function); only the sets with |n| ≤ Lk + 1 are nonempty. Keep k fixed and let A₁^(k), ..., A_{N(k)}^(k) be the (finite) common refinement of the partitions (2.15) for j = 1, ..., r. Take an arbitrary fixed x_n^(k) ∈ A_n^(k) for each n ∈ {1, ..., N(k)}, define the functions a_j^(k) by

(2.16)    a_j^(k)(x) = a_j(x_n^(k))        if x ∈ A_n^(k),

and let B_j^(k) be the discrete distribution function with jumps B_j{A_n^(k)} at the points x_n^(k). The a_j^(k) are measurable step functions, and for all x and j

(2.17)    lim_{k→∞} |a_j(x) − a_j^(k)(x)| ≤ lim_{k→∞} 1/k = 0.

Furthermore

(2.18)    Σ_{j=1}^{r} a_j^(k)(x) B_j^(k)(y) = Σ_{j=1}^{r} a_j(x_n^(k)) B_j{ ⋃ {A_m^(k) : x_m^(k) ≤ y} }        if x ∈ A_n^(k)

is a transition function concentrated on the finite set {x₁^(k), ..., x_{N(k)}^(k)}. Let C^(k) = (c_ij^(k)) be the corresponding kernel matrix. We have

(2.19)    c_ij^(k) = ∫ a_j^(k)(x) dB_i^(k)(x) = Σ_{n=1}^{N(k)} a_j(x_n^(k)) B_i{A_n^(k)} = ∫ a_j^(k)(x) dB_i(x).

Using Lebesgue's dominated convergence theorem (|a_j^(k)(x)| ≤ L, ∫ L dB_i < ∞) we find for all i, j ∈ {1, ..., r}

(2.20)    lim_{k→∞} c_ij^(k) = lim_{k→∞} ∫ a_j^(k)(x) dB_i(x) = ∫ a_j(x) dB_i(x) = c_ij.  □

Proposition 2.8: The trace of a kernel matrix is nonnegative.
Proof: Combine Proposition 2.7 and Corollary 2.4.
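The discretisation in the proof of Proposition 2.7 can be imitated numerically. The sketch below (Python with NumPy) uses a hypothetical rank-2 representation of our own choosing — a₁(x) = exp(−x²), a₂ = 1 − a₁, with B₁, B₂ uniform on [0,1] and [0,2] — replaces a₁ by a step function constant on the sets where a₁ lies in a bin of width 1/k (cf. (2.16) and (2.19)), and checks that the discretised kernel matrices approach C at rate 1/k:

```python
import numpy as np

# Hypothetical rank-2 representation (our own illustrative choice):
# a_1(x) = exp(-x^2), a_2 = 1 - a_1; B_1 uniform on [0,1], B_2 uniform on [0,2].
a1 = lambda x: np.exp(-x**2)

def kernel_col(f):
    """First column (∫ f dB_1, (1/2)∫_0^2 f dx) by midpoint quadrature."""
    g1 = np.linspace(0.0, 1.0, 4001); m1 = (g1[:-1] + g1[1:]) / 2
    g2 = np.linspace(0.0, 2.0, 4001); m2 = (g2[:-1] + g2[1:]) / 2
    return np.array([f(m1).mean(), f(m2).mean()])

col = kernel_col(a1)
C = np.column_stack([col, 1.0 - col])               # the limit kernel matrix

ks = (5, 10, 20, 40)
errs = []
for k in ks:
    # Step-function version of a_1 as in (2.16): constant wherever a_1 lies in
    # a bin ((n-1)/k, n/k]; we use the bin endpoint n/k as representative
    # value, so |a_1^(k) - a_1| <= 1/k everywhere.
    a1k = lambda x, k=k: np.ceil(a1(x) * k) / k
    colk = kernel_col(a1k)                          # = ∫ a_j^(k) dB_i, cf. (2.19)
    Ck = np.column_stack([colk, 1.0 - colk])
    errs.append(np.abs(Ck - C).max())

print(errs)                                         # decreasing, bounded by 1/k
assert all(e <= 1.0 / k for e, k in zip(errs, ks))
```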
3. Eigenvalues
A. Stochastic matrices
Let M_n denote the set of all complex numbers λ that are eigenvalues of stochastic matrices of order n. The problem of determining M_n (or, slightly more generally, the set of all eigenvalues of nonnegative matrices of order n) was posed by Kolmogorov, partly solved by Dmitriyev and Dynkin in 1946 [1], and finally completely solved by Karpelewitsj in 1951 [3]. M_n turns out to be a closed, star-shaped subset of the unit disk. The only points of M_n on the unit circle are the roots of unity e^(2πik/t) (t ≤ n). M_n is symmetrical with respect to the real axis; see figure 3.1. The boundary of M_n between 1 and e^(2πi/n) is a straight line, and further consists of polynomial arcs; see [3] for the explicit formulas. The basic observation in [1] and [3] is that λ ∈ M_n if and only if there exists a convex k-angular polygon Q (k ≤ n) which is mapped into itself when multiplied by λ.

Kernel matrices C are in general not nonnegative, and it is not clear how the arguments used in [1] and [3] can be extended to get information about C_r, the set of all complex numbers λ that are eigenvalues of kernel matrices C ∈ C_r. If we go back to the transition distribution function H(y|x) we have to study eigenfunctions φ instead of eigenvectors (φ is called an eigenfunction of H(y|x) if ∫ φ(y) dH(y|x) = λφ(x)). It can be shown that all eigenfunctions have the form

    φ(x) = Σ_{j=1}^{r} e_j a_j(x),

where e is an eigenvector of C. From section 2 it is clear that C_r is contained in the unit disk and that M_r ⊂ C_r.
B. C_2

Proposition 3.1: C_2 is the set of all 2 × 2 matrices that are similar to 2 × 2 transition matrices.

Proof: Let C ∈ C_2. By Proposition 1.9, C has an eigenvalue 1 and one other real eigenvalue λ with |λ| ≤ 1. If λ < 1 then C is similar to

    ( 1  0 )
    ( 0  λ ),

its Jordan normal form, and this matrix in its turn is similar (via the transformation matrix T = ( −1 1 ; 1 1 )) to the matrix

    ( (1+λ)/2  (1−λ)/2 )
    ( (1−λ)/2  (1+λ)/2 ),

which is a transition matrix. If λ = 1 then the Jordan normal form cannot be ( 1 1 ; 0 1 ) (since then C^n would be unbounded for n → ∞), so it is ( 1 0 ; 0 1 ), i.e. C = I, which is itself a transition matrix.  □

Corollary 3.2: C_2 = M_2 = [−1, 1] ⊂ ℝ (as sets of eigenvalues).

If M is a finite transition matrix of order n and rank 2, the rows of M considered as points in ℝ^n lie on a straight line in ℝ^n. Each of these n points is a convex combination of the two extreme points on this line. If we take these extreme points to compose the 2 × n matrix B (cf. the beginning of section 2), all entries of the corresponding matrix A are nonnegative (and at most 1). It is clear that now BA is a transition matrix. This provides another proof for the finite case of Proposition 3.1 (problem 99 in Statistica Neerlandica 34 (1980), solution by J.Th. Runnenburg).
C. C_3

Only nonreal eigenvalues are of interest to us if we want information about C_3. Let 1, x + iy and x − iy (y > 0) be the three eigenvalues of a kernel matrix C ∈ C_3. The sum of the eigenvalues is nonnegative (Corollary 2.4), hence x ≥ −½. This leads to

Proposition 3.3: If u + iv is a complex number with v ≠ 0 such that Re(u + iv)^n < −½ for some n ≥ 1, then u + iv ∉ C_3.

Proof: If u + iv is an eigenvalue of a matrix C ∈ C_3, then (u + iv)^n is an eigenvalue of the matrix C^n ∈ C_3, so that Re(u + iv)^n is at least −½.  □

Proposition 3.3 enables us to exclude a part of the unit disk in searching the area of C_3; see figure 3.2. C_3 is contained in the nonshaded area (except the part [−1, −½) of the real axis) and, on the other hand, contains the triangle with vertices 1, e^(2πi/3) and e^(−2πi/3) (M_3 ⊂ C_3).
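Proposition 3.3 gives a computable exclusion test: a point with nonzero imaginary part is certainly outside C_3 as soon as some power of it has real part below −½. A small sketch (plain Python; the cut-off N is harmless, because once |z^n| < ½ no further power can violate the bound):

```python
def excluded_from_C3(z, N=200):
    """True if Proposition 3.3 certifies that z (with z.imag != 0) is not in C_3."""
    w = z
    for n in range(1, N + 1):
        if w.real < -0.5:
            return True
        if abs(w) < 0.5:          # from here on |z^n| < 1/2, so Re z^n > -1/2
            return False
        w *= z
    return False

print(excluded_from_C3(0.1 + 0.75j))   # -> True  (its square has real part < -1/2)
print(excluded_from_C3(0.5 + 0.5j))    # -> False (this point lies on the arc (3.4))
```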
Let z_n = 2^(−1/n) e^(iπ/n) (n = 1, 2, ...) be the solution of the equation z^n = −½ with smallest positive argument. Set z_n = x_n + iy_n and t = π/n; then

    x = 2^(−t/π) cos t,    y = 2^(−t/π) sin t.

To see the behaviour of the sequence (z_n) near 1 we take t as a continuous parameter and let t tend to 0 (from above). We find

    (dy/dx)_{x=1} = ( (dy/dt) / (dx/dt) )_{t=0} = −π / log 2 = −4.53...

So, in particular, we see that (dy/dx)_{x=1} is finite.

In order to obtain an inner bound for C_3 we considered the following problem: find a transition matrix M of order 4 and rank 3, with eigenvalues x ± itx for a specified t, such that x is maximal (for t > 0) or minimal (for t < 0). Since for every matrix (p_ij)_{i,j=1}^{n} with eigenvalues λ₁, ..., λ_n the following relations hold for the first two invariants of the matrix (see e.g. [2], sec. 4.3)

(3.2)    Σ_{i=1}^{n} λ_i = Σ_{i=1}^{n} p_ii,

(3.3)    Σ_{i<j} λ_i λ_j = Σ_{i<j} (p_ii p_jj − p_ij p_ji),

we have to maximize x = ½(p₁₁ + p₂₂ + p₃₃ + p₄₄ − 1) under the conditions

    p_ij ≥ 0,    Σ_{j=1}^{4} p_ij = 1        (i = 1, ..., 4),

together with (3.3) applied to the eigenvalues 1, 0, x ± itx. The maximization has been carried out by computer. The result is given in figure 3.3. The arc between the points e^(2πi/3) and ½ + ½i is given by (for −½ ≤ x ≤ ½)

(3.4)    y² = ½(x − ½)² + ¼.

To see this, let the eigenvalues be 1, 0 and x ± iy, and take x fixed (i.e. p₁₁ + p₂₂ + p₃₃ + p₄₄ is fixed). From (3.3) we now deduce

(3.5)    y² ≤ Σ_{i<j} p_ii p_jj − x² − 2x ≤ 6p̄² − x² − 2x,

where p̄ = ¼(p₁₁ + p₂₂ + p₃₃ + p₄₄) = ¼(2x + 1). This leads directly to (3.4). For −½ ≤ x ≤ ½ the upper bound for y² is attained if we take

(3.6)    M = ( p    α    β    0   )
              ( 0    p    1−p  0   )
              ( 0    0    p    1−p )
              ( 1−p  0    0    p   ),

with p = ¼(2x + 1), α = p(1 − 3p + 3p²)/(1 − p)² and β = (1 − 2p)(1 − 2p + 2p²)/(1 − p)². The first row is now a linear combination of the 2nd, 3rd and 4th rows, so that M has rank 3. This fails for x > ½ (then p > ½ and β becomes negative). And, in general, no transition matrix with all main diagonal entries greater than ½ can have an eigenvalue 0 (see e.g. [2], Sec. 6.8, Gershgorin's theorem).
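The one-parameter family of 4 × 4 matrices exhibited above can be checked numerically: for each admissible p the matrix is stochastic of rank 3, and its complex pair of eigenvalues lies exactly on the arc (3.4). A sketch in Python with NumPy (the closed-form expressions for α and β are as derived in the text):

```python
import numpy as np

def M_of(p):
    """The 4x4 matrix attaining the bound (3.4); stochastic of rank 3 for 0 <= p <= 1/2."""
    a = p * (1 - 3*p + 3*p**2) / (1 - p)**2        # alpha
    b = (1 - 2*p) * (1 - 2*p + 2*p**2) / (1 - p)**2  # beta (negative for p > 1/2)
    return np.array([[p,     a,   b,     0.0  ],
                     [0.0,   p,   1 - p, 0.0  ],
                     [0.0,   0.0, p,     1 - p],
                     [1 - p, 0.0, 0.0,   p    ]])

for p in (0.1, 0.25, 0.4, 0.5):
    M = M_of(p)
    assert np.allclose(M.sum(axis=1), 1.0) and (M >= -1e-12).all()
    ev = np.linalg.eigvals(M)
    assert np.isclose(np.abs(ev).min(), 0.0, atol=1e-8)   # rank 3: eigenvalue 0
    x = (4*p - 1) / 2                                     # from (3.2)
    pair = ev[np.abs(ev.imag) > 1e-8]                     # the pair x +- iy
    assert np.allclose(pair.real, x, atol=1e-8)
    y2 = pair.imag[0]**2
    assert np.isclose(y2, 0.5 * (x - 0.5)**2 + 0.25)      # on the arc (3.4)
print("arc (3.4) attained for every p in [0, 1/2]")
```

For p = ½ this M reduces to the cyclic matrix ½(I + P), with P the 4-cycle permutation matrix, and the pair of eigenvalues is ½ ± ½i, the endpoint of the arc.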
At present this is about all the information we have about C_3. We are trying to get more numerical information by considering the situation of a 5 × 5 transition matrix of rank 3. Starting from an n × n transition matrix M we obtain, instead of (3.5), the inequality

(3.7)    y² ≤ Σ_{i<j} p_ii p_jj − x² − 2x ≤ (n(n − 1)/2) p̄² − x² − 2x,

where now p̄ = (2x + 1)/n, so that we find

(3.8)    y² ≤ ((n − 1)/(2n)) (2x + 1)² − x² − 2x.

For n → ∞ this yields the inequality Re(x + iy)² ≥ −½. It is doubtful whether these values can actually be attained.
It is doubtful whether these values can actually be attained.Using the foregoing results about C
3 we can show that several of the pleasant properties of the set
M
of stochastic matrices are notr inheri ted bye.
First,
t
is not closed under matrix multiplication. Take for example r (3.9) C 1 is (3. 10)c -
Io
!
!
-!
o
!
anda kernel matrix corresponding
!
1 0 0 2 0 1 1 0 M- 2 2 0 0!
!
!
0 0 2 I too
o o
o
o
o
and C2 is a kernal matrix as it is itself a transition matrix of full rank. We have however
o
!
o
!
-!
with a negative trace, so that (by Proposition 2.8) C1C2
¢
~3.The same two kernel matrices may serve as a counterexample to see that
e
r is not convex. The eigenvalues of Ca=
aC1 + (1 - a)C2 are AI
=
1 and(3.12) a
-Relation (3.4) now reads
(3.13) !(3 - 2a) ~ !(a - 1) 2
+!
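Both counterexamples are easy to verify numerically (Python with NumPy):

```python
import numpy as np

C1 = np.array([[0.5,  0.5, 0.0],
               [0.0,  0.5, 0.5],
               [0.5, -0.5, 1.0]])
C2 = np.array([[0.0, 1.0, 0.0],     # a cyclic permutation: a transition
               [0.0, 0.0, 1.0],     # matrix of full rank, hence a kernel matrix
               [1.0, 0.0, 0.0]])

# C1 = BA for the 4x4 rank-3 transition matrix M: B = rows 2-4 of M.
M = np.array([[0.5, 0.5, 0.0, 0.0],
              [0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.5, 0.5],
              [0.5, 0.0, 0.0, 0.5]])
B = M[1:, :]
A = np.array([[1.0, -1.0, 1.0],     # row 1 of M = row 2 - row 3 + row 4
              [1.0,  0.0, 0.0],
              [0.0,  1.0, 0.0],
              [0.0,  0.0, 1.0]])
assert np.allclose(M, A @ B) and np.allclose(C1, B @ A)

# The product has negative trace, so C1 @ C2 is not a kernel matrix (Prop. 2.8).
print(np.trace(C1 @ C2))            # -> -0.5

# (3.12): eigenvalues of the convex combination a*C1 + (1-a)*C2.
a = 0.5
ev = np.linalg.eigvals(a * C1 + (1 - a) * C2)
pair = ev[np.abs(ev.imag) > 1e-8]
assert np.allclose(pair.real, a - 0.5)
assert np.allclose(pair.imag**2, (3 - 2*a) / 4)
```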
4. References

[1] Dmitriyev, N. and E. Dynkin, Characteristic roots of stochastic matrices. Izvestiya, Ser. Mat. 10 (1946) 167-184.
[2] Franklin, J.N., Matrix theory, Prentice-Hall, 1968.
[3] Karpelewitsj, F.I., On the eigenvalues of matrices with non-negative elements. Izvestiya, Ser. Mat. 15 (1951) 361-383.
[4] Runnenburg, J.Th. and F.W. Steutel, On Markov chains, the transition function of which is a finite sum of products of functions of one variable. M.C.-Report S304, 1962.