
On Markov chains of finite rank

Citation for published version (APA):

Hoekstra, Æ. H. (1983). On Markov chains of finite rank. Stichting Mathematisch Centrum.

https://doi.org/10.6100/IR108664

DOI:

10.6100/IR108664

Document status and date:

Published: 01/01/1983

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)

Please check the document version of this publication:

• A submitted manuscript is the version of the article upon submission and before peer review. There can be important differences between the submitted version and the official published version of record. People interested in the research are advised to contact the author for the final version of the publication, or visit the DOI to the publisher's website.

• The final author version and the galley proof are versions of the publication after peer review.

• The final published version features the final layout of the paper including the volume, issue and page numbers.

Link to publication

General rights

Copyright and moral rights for the publications made accessible in the public portal are retained by the authors and/or other copyright owners, and it is a condition of accessing publications that users recognise and abide by the legal requirements associated with these rights.

• Users may download and print one copy of any publication from the public portal for the purpose of private study or research.
• You may not further distribute the material or use it for any profit-making activity or commercial gain.
• You may freely distribute the URL identifying the publication in the public portal.

If the publication is distributed under the terms of Article 25fa of the Dutch Copyright Act, indicated by the “Taverne” license above, please follow below link for the End User Agreement:

www.tue.nl/taverne

Take down policy

If you believe that this document breaches copyright please contact us at openaccess@tue.nl, providing details, and we will investigate your claim.


THESIS

submitted for the degree of Doctor in the Technical Sciences at the Eindhoven University of Technology, on the authority of the Rector Magnificus, Prof. dr. S.T.M. Ackermans, to be defended in public before a committee appointed by the Board of Deans

on Tuesday 1 February 1983 at 16.00 hours

by

ÆGLE HJALMAR HOEKSTRA

born in Naarden

1983

This thesis has been approved by the promotors

Prof.dr. F.W. Steutel

and

CONTENTS

0. INTRODUCTION
   0.1. Motivation
   0.2. Summary
   0.3. Notations

1. THE MODEL
   1.1. Definition, basic properties and linear transformations
   1.2. The kernel matrix
   1.3. Interpretation
   1.4. Two special cases
        1.4.1. Markov operators
        1.4.2. Morgenstern's bivariate distribution function
   1.5. Examples

2. EIGENVALUES
   2.1. General results
   2.2. The eigenvalue 1; classification of states
   2.3. The eigenvalues of modulus 1; periodicity
   2.4. The location of eigenvalues
   2.5. Examples

3. LIMIT DISTRIBUTIONS
   3.1. The invariant distribution
   3.2. The central limit theorem
   3.3. Expectation and variance of S_n
   3.4. Extreme values
   3.5. Renewal theory

4.
   4.1.1. Multidimensional central limit theorem
   4.1.2. Continuous-time analogue
   4.2. Approximation by kernels of lower rank
        4.2.1. Approximation and invariant distribution
        4.2.2. Correlation coefficients
   4.3. Applicability

APPENDIX
   A.1. The spectral decomposition of a matrix
   A.2. Miscellaneous

REFERENCES
SAMENVATTING
CURRICULUM VITAE

0. INTRODUCTION

0.1. Motivation

There exists a vast literature on Markov chains. A Markov chain describes a process which moves from state to state in discrete steps. The transition to a next state is stochastic, but depends only on the present state and not on "the past".

The best known and most widely studied case is the (time-homogeneous) finite Markov chain, where the state space S, i.e. the set of possible states, is finite, e.g. S = {1,2,...,s}. The probability of a transition from state j to state k is denoted by p_jk, and the matrix P = (p_jk)_{j,k=1}^{s} is called the transition matrix of the chain. In general, any square matrix with nonnegative elements and row sums 1 is called a transition matrix. An essential property of the finite Markov chains is that the n-step transitions (n ≥ 1) are given by the matrix P^n. An extension of the finite Markov chain is the denumerable chain, where the state space S is countably infinite.

If one wants a more general state space, e.g. an interval of the real line, there is no theoretical problem. Let (S,S) be the measurable space. The probability of a transition from x ∈ S to A ∈ S is denoted by P(A|x), and now P(·|x) is for each x ∈ S a probability measure on (S,S). It is called the transition kernel of the chain. The n-step transition probabilities P^{(n)}(A|x) are defined recursively by

(0.1)   P^{(n)}(A|x) = ∫_S P^{(n-1)}(A|y) P(dy|x)   for n ≥ 2.

There is, however, the practical problem that in general P^{(n)}(A|x) cannot easily be computed. Hence it is difficult to obtain detailed results about these general Markov chains.

The same practical problem of computational difficulties often necessitates the assumption of independence, when a sequence of random variables X_0, X_1, X_2, ... is considered. For instance, in waiting-time and renewal theory it is usually assumed that the intervals between successive

renewals are independent and that they have the same distribution. RUNNENBURG [28] and [29] considered renewal theory for Markov-dependent random variables. In RUNNENBURG [28] as an example the following type of transition kernel was introduced:

(0.2)   P(A|x) = Σ_{j=1}^{r} a_j(x) B_j(A),

where r is finite. This example arises as a natural extension of the independent case. For r = 1 the kernel in (0.2) describes a sequence of independent random variables, while for r ≥ 2, a_j(x) ≥ 0 and the B_j probability measures, the kernel simply is a convex combination of distributions. One may, however, allow the a_j and B_j to be arbitrary functions (measures), as long as P(A|x) is a transition kernel for each x.

The kernels of type (0.2) are said to be of finite rank. This term was in the present context first used by KINGMAN [19]. A justification for this expression is given at the end of section 1.1. Markov chains of finite rank have the advantage of being more general than finite Markov chains (which are included as a special case) while having comparable computational accessibility; their transition probabilities are also governed by powers of finite matrices.

0.2. Summary

In chapter 1 the chains of finite rank are formally introduced and the kernel matrix, which takes over the position of the transition matrix of a finite Markov chain, is defined. The eigenvalues of kernel matrices are studied in chapter 2; just as in the case of finite Markov chains this leads to a classification of the state space. In section 4 of this chapter some numerical results about the eigenvalues, which were obtained with the generous help of Dr. J. de Jong and R. Kool, are presented. Limit theorems are the topic of chapter 3: the (existence of an) invariant distribution is studied, a central limit theorem for chains of finite rank is proved, two extreme-value theorems are derived, and finally some elementary renewal

theory is considered. All proofs are essentially based on simple matrix theory, in particular on spectral decompositions. Chapter 4 contains some generalizations and possible approximations and applications. Several results in matrix theory, especially on spectral behaviour, which are used in the chapters 2, 3 and 4, are proved in an appendix.

0.3. Notations

Random variables are denoted by capitals. The expectation of a random variable X is denoted by EX (in the spectral decomposition of a matrix we write E_ℓ for the idempotent matrix corresponding to the eigenvalue λ_ℓ; confusion seems unlikely).

Underlined symbols denote column vectors; dimensions, when not indicated, are clear from the context. If v ∈ ℂ^r, then v_j is the j-th component of v (j = 1,...,r) and ᵀv is the transpose of v, i.e. the row vector with j-th component v_j. Hence, if u, v ∈ ℂ^r, then ᵀu v = Σ_{j=1}^r u_j v_j is the inner product of the vectors, whereas u ᵀv is an r × r matrix. Special vectors that often occur are 0 (with all components equal to 0) and 1 (with all components equal to 1). If v is an eigenvector of a matrix C corresponding to the eigenvalue λ, then v is said to be a λ-eigenvector of C (see further the beginning of chapter 2).

I_r is the r × r unit matrix (the index r may be omitted). The matrix O is the matrix with all elements equal to 0.

The symbols o and O have their usual meaning: if g(x) > 0, then

(0.3)   f(x) = o(g(x)) (x → a)  means that  f(x)/g(x) → 0 as x → a;
        f(x) = O(g(x)) (x → a)  means that  f(x)/g(x) is bounded for x → a.

Statements about vectors and matrices, such as e.g. |v| ≤ 1 and C^{(n)} = o(ε^n) (n → ∞), are always supposed to hold elementwise; i.e., in the given examples we would have |v_j| ≤ 1 for all j and c_{jk}^{(n)} = o(ε^n) (n → ∞) for all j, k.

The open square □ marks the end of a definition, an assumption, a remark, an example, or the statements of a lemma or a theorem. The closed square ■ marks the end of a proof.

Concerning our notation for vectors we make the following remark. Several mathematicians in The Netherlands use underlined symbols to denote random variables. Efforts have been made in the past to propagate this notation, which had been introduced by Van Dantzig, to other countries. Unfortunately, these efforts have not been very successful. Our use of underlined symbols for vectors may certainly not be taken for a dislike of the "Dutch notation", which has indeed many advantages. Yet we have chosen here to follow the common usage in English literature on probability theory, denoting random variables by (non-underlined) capitals. Thereupon the present vector notation was adopted as being the most practical in our situation.

CHAPTER 1. THE MODEL

The motivation for our model has already been given in the previous introductory chapter. In section 1.1 we shall properly define the model and prove some of its basic properties. In section 1.2 the kernel matrix C is introduced, which plays a central role in the rest of our investigations, and some elementary properties of this matrix are determined. The possibility of an interpretation is considered in section 1.3. Two special cases are discussed in section 1.4, and finally some simple numerical examples are given in section 1.5.

Many of the results in the sections 1.1, 1.2 and 1.3 can also be found in RUNNENBURG AND STEUTEL [30J.

1.1. Definition, basic properties and linear transformations

DEFINITION 1.1.1. Let (S,S) be a measurable space. A time-homogeneous Markov chain X_0, X_1, ..., taking values in S, is said to be of finite rank if there are complex-valued S-measurable functions a_1,...,a_r and complex-valued measures B_1,...,B_r on S, such that the transition kernel P(·|x) has the form

(1.1.1)   P(A|x) = Σ_{j=1}^{r} a_j(x) B_j(A)   for all x ∈ S, A ∈ S.

Also the kernel itself is then said to be of finite rank. □

In a Markov chain of finite rank the conditional probabilities P(A|x) = P(X_{n+1} ∈ A | X_n = x) are defined for all x ∈ S, see (1.1.1). In a general Markov chain this does not have to be so. □

We shall often be concerned with the special case where S = ℝ and S = B(ℝ). We then write for x, y ∈ ℝ

(1.1.2)   P(y|x) := P((−∞,y] | x),   B_j(y) := B_j((−∞,y]);

P(y|x) is now a distribution function and so is B_j(y) if B_j is a probability measure.

Consistent with our notations listed on page 3 we write a and B for the (column) vectors with elements a_j and B_j, respectively. In this vector notation (1.1.1) becomes

(1.1.3)   P(A|x) = ᵀa(x) B(A),

where ᵀa denotes the transpose of a. Dimensions, when not indicated, will be clear from the context.

We now further investigate (1.1.1), and we shall show that no generality is lost by assuming that the B_j are probability measures and that the a_j(x) are real-valued. First we give one more definition.

DEFINITION 1.1.2. The representation (1.1.1) of P(A|x) is called minimal if P(A|x) cannot be represented as a similar sum of less than r terms. In that case r is called the rank of the chain. □

Clearly, if the a_j or the B_j are linearly dependent, then P(A|x) can be written as a sum of less than r terms, and the representation is not minimal. For the moment we assume linear independence, and in theorem 1.1.6 we shall prove that linear independence is equivalent to minimality of the representation.

REMARK 1.1.3. The measures B_j are said to be linearly independent if from

(1.1.4)   Σ_{j=1}^{r} β_j B_j(A) = 0   for all A ∈ S

it follows that β_1 = ... = β_r = 0.

Here S can sometimes be replaced by a smaller class of measurable sets. For instance, if S = ℝ, S = B(ℝ) and if B_j(ℝ) < ∞ for all j, then (1.1.4) follows from

(1.1.5)   Σ_{j=1}^{r} β_j B_j(y) = 0   for all y ∈ ℝ,

since the σ-algebra B(ℝ) is generated by the sets of the form (−∞,y]. So the linear independence of the measures B_j on B(ℝ) is then equivalent to the linear independence of the functions B_j(y) on ℝ. □

THEOREM 1.1.4. It is no restriction to assume that the a_j(x) are real-valued and that the B_j are probability measures. In that case the a_j(x) satisfy

(1.1.6)   Σ_{j=1}^{r} a_j(x) ≡ 1. □

PROOF. By the assumed linear independence of the a_j(x) there exists at least one r-tuple x_1,...,x_r such that det[a_j(x_k)] ≠ 0 (this is proved in the appendix as lemma A.2.4). If we define the matrix T = (t_kℓ) by t_kℓ := a_ℓ(x_k), then T is a nonsingular r × r matrix. Write

(1.1.7)   ᵀa*(x) := ᵀa(x) T^{-1},   B*(A) := T B(A);

then

(1.1.8)   P(A|x) = ᵀa*(x) B*(A),

where

(1.1.9)   B*_k = Σ_{j=1}^{r} t_kj B_j = Σ_{j=1}^{r} a_j(x_k) B_j = P(·|x_k),

so that B*_k is a probability measure for each k. By the assumed linear independence of the B_j it follows that the B*_k are linearly independent, and hence that the a*_j(x) are real-valued (consider the imaginary parts).

Relation (1.1.6) follows by substituting A = S in (1.1.8). ■

LEMMA 1.1.5. In every representation (1.1.1) where the a_j and the B_j are linearly independent, the functions a_j(x) are bounded and the measures B_j are finite. □

PROOF. We have seen in the proof of theorem 1.1.4 that a linear transformation T exists such that B = T^{-1} B*, with B* a vector of probability measures. Hence the B_j are finite.

As an analogue to lemma A.2.4 it can be shown that there exist A_1,...,A_r ∈ S such that det[B_j(A_k)] ≠ 0. Now replace T^{-1} by the transformation U with elements u_jk := B_j(A_k). Then we obtain functions a**_j(x) and measures B**_j, where as a counterpart of (1.1.9) we get

(1.1.10)   a**_k(x) = Σ_{j=1}^{r} a_j(x) u_jk = Σ_{j=1}^{r} a_j(x) B_j(A_k) = P(A_k|x).

So the a**_k are bounded, and hence the a_j are bounded. ■

THEOREM 1.1.6. The representation (1.1.1) is minimal if and only if both the a_j(x) and the B_j are linearly independent. □

PROOF. The "only if"-part has already been dealt with.

Now suppose both the a_j(x) and the B_j are linearly independent, and that

(1.1.11)   Σ_{j=1}^{r} a_j(x) B_j = Σ_{j=1}^{r-1} a*_j(x) B*_j   for all x ∈ S.

By lemma 1.1.3 there exist x_1,...,x_r such that det[a_j(x_k)] ≠ 0. Consider the r vectors a*(x_1),...,a*(x_r) ∈ ℝ^{r-1}. Even if the functions a*_j are linearly independent, these vectors are linearly dependent, so there exist c_1,...,c_r, not all zero, such that Σ_{k=1}^{r} c_k a*_j(x_k) = 0 for j = 1,...,r−1. But then we have

(1.1.12)   Σ_{j=1}^{r} ( Σ_{k=1}^{r} c_k a_j(x_k) ) B_j = Σ_{k=1}^{r} c_k Σ_{j=1}^{r} a_j(x_k) B_j
           = Σ_{k=1}^{r} c_k Σ_{j=1}^{r-1} a*_j(x_k) B*_j = Σ_{j=1}^{r-1} ( Σ_{k=1}^{r} c_k a*_j(x_k) ) B*_j = 0.

By the linear independence of the B_j we conclude that Σ_{k=1}^{r} c_k a_j(x_k) = 0 for j = 1,...,r, hence c_k = 0 for k = 1,...,r. Thus a contradiction is obtained, which means that there are no a*_j and B*_j satisfying (1.1.11). ■

J J

Two minimal representations of the same kernel p(.lx) can only differ by a nonsingular linear transformation T, i.e. we have

LEMMA 1.1.7. Suppose

r r

(1.1.13) P(A!x) =

L

a.(x)B.(A)

L

a~ (x)B~(A) for aU AE: S

,

j=1 J J j=l J J

where r is minimal. Then there is a nonsingular r x r matrix T, such that

(1.1.14)   ᵀa*(x) = ᵀa(x) T^{-1},   B* = T B. □

PROOF. Choose any r-tuple (x_1,...,x_r), such that the matrix D with elements d_jk := a_k(x_j) is nonsingular, and let D* be the matrix with elements d*_jk := a*_k(x_j). Further, choose an r-tuple (A_1,...,A_r), such that the matrix E with elements e_jk := B_j(A_k) is nonsingular, and let E* be the matrix with elements e*_jk := B*_j(A_k). Let T := E*E^{-1}. From ᵀa(x)B = ᵀa*(x)B* it follows that DE = D*E*. As D and E are nonsingular, it follows that D* and E*, and hence T, are nonsingular. For all x we have ᵀa(x)E = ᵀa*(x)E*, or ᵀa*(x) = ᵀa(x) E(E*)^{-1} = ᵀa(x)T^{-1}. For all A we have DB = D*B*, or B* = (D*)^{-1}DB = TB. ■

DEFINITION 1.1.8. Representation (1.1.1) is called standard if it is minimal and the B_j are probability measures. □

ASSUMPTION. Unless otherwise stated, the representation of the kernel is assumed to be standard. □

It should be noted, however, that even standard representations are nonunique. To a given standard representation one can always apply a linear transformation T, as in (1.1.7), where T is any nonsingular r × r transition matrix. The resulting new representation is then again standard.

The set S, appearing in definition 1.1.1, is usually called the state space of the chain. The process may in fact be concentrated on a subset S_0 of S, i.e. we may have

(1.1.15)   P(S_0|x) = 1   for all x ∈ S.

The following lemma states that then every B_j is concentrated on S_0.

LEMMA 1.1.9. If (1.1.15) holds, then

(1.1.16)   B_j(S \ S_0) = 0   (j = 1,...,r). □

PROOF. For all x ∈ S we have, with S̄_0 := S \ S_0,

(1.1.17)   0 = P(S̄_0|x) = ᵀa(x) B(S̄_0),

hence B(S̄_0) = 0, by the linear independence of the a_j. ■

If the process is concentrated on S_0, the values of a_j(x) for x ∉ S_0 are unimportant. Therefore we shall sometimes specify the a_j(x) only for x ∈ S_0. In such a case one may assign arbitrary values to a_j(x) for x ∈ S \ S_0.

The Markov chains of finite rank include all finite Markov chains as a special case. Let P = (p_jk) be the transition matrix of a Markov chain with state space S = {s_1,...,s_N}. We simply take

(1.1.18)   a_j(s_k) := p_kj,   B_j(A) := I_A(s_j)   (j,k = 1,...,N),

where I_A is the indicator function of the set A. The representation thus obtained is standard if and only if P is nonsingular.

If P has rank r we obtain a standard representation as follows. Select r linearly independent rows of P and let B be the r × N matrix consisting of these rows. Suppose for simplicity that the rows are the first r ones of P, so that B is equal to the upper r × N part of P. Let A = (a_jk) be the N × r matrix given by

(1.1.19)   A = [ I_r ; α_{r+1,1} α_{r+1,2} ... α_{r+1,r} ; ... ; α_{N,1} α_{N,2} ... α_{N,r} ],

where the α_jk are such that Σ_{k=1}^{r} α_jk p_kℓ = p_jℓ for j = r+1,...,N and all ℓ. Then

(1.1.20)   P = AB,

and if we take a_j(s_k) := a_kj and

(1.1.21)   B_j(A) := Σ_{k: s_k ∈ A} b_jk,

we obtain a standard representation.

This construction is easily generalized for infinite transition matrices P of finite rank. This justifies the term "Markov chain of finite rank" for the model under consideration.
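As an illustration of (1.1.18)–(1.1.21), the factorization P = AB and the matrix C = BA can be carried out numerically. The Python sketch below uses an invented 4-state transition matrix of rank 2 (all numbers are assumptions chosen only for this illustration) and checks that the n-step probabilities are governed by powers of the small r × r matrix, cf. section 1.2.

```python
import numpy as np

# Two hypothetical "row profiles"; every row of P is a mixture of them.
b1 = np.array([0.5, 0.3, 0.2, 0.0])
b2 = np.array([0.1, 0.1, 0.4, 0.4])
B = np.vstack([b1, b2])                     # r x N matrix of selected rows

# A expresses each of the N = 4 rows of P in terms of the r = 2 rows of B;
# the first two rows form the identity block, cf. (1.1.19).
A = np.array([[1.0, 0.0],
              [0.0, 1.0],
              [0.7, 0.3],
              [0.2, 0.8]])                  # N x r, rows sum to 1

P = A @ B                                   # N x N transition matrix of rank 2
C = B @ A                                   # r x r kernel matrix C = BA

# n-step transition probabilities: P^n = A C^(n-1) B, so only the small
# r x r matrix has to be raised to a power.
n = 6
lhs = np.linalg.matrix_power(P, n)
rhs = A @ np.linalg.matrix_power(C, n - 1) @ B
print(np.allclose(lhs, rhs))                # True
print(C.sum(axis=1))                        # row sums 1, cf. lemma 1.2.4
```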

REMARK 1.1.10. On the measurable space (S,S) a measure m can be defined such that each measure B_j is absolutely continuous with respect to m. If, for example, we take

(1.1.22)   m := (1/r) Σ_{j=1}^{r} B_j,

then m is even a probability measure on S. Obviously P(·|x) also is absolutely continuous with respect to m for all x ∈ S. The measure m will play a role in section 4 of this chapter and in some other places.

The definition (1.1.22) of m depends on the choice of the (standard) representation of the kernel. However, any such m will do for all standard representations. To see this, suppose ᵀa*(x)B*(A) is such a representation; if m(A) = 0, then 0 = P(A|x) = ᵀa*(x)B*(A) for all x ∈ S, hence B*(A) = 0. □

1.2. The kernel matrix

Now consider the n-step transition distribution

(1.2.1)   P^{(n)}(A|x) = P(X_n ∈ A | X_0 = x),

defined by

(1.2.2)   P^{(1)}(A|x) := P(A|x),   P^{(n+1)}(A|x) := ∫_S P^{(n)}(A|z) P(dz|x)   for all n ≥ 1.

For P^{(2)}(A|x) we find

(1.2.3)   P^{(2)}(A|x) = ∫_S P(A|z) P(dz|x) = Σ_{k=1}^{r} Σ_{j=1}^{r} a_j(x) ∫_S a_k(z) B_j(dz) B_k(A).

If we write

(1.2.4)   c_jk := ∫_S a_k(z) B_j(dz)   for j,k ∈ {1,...,r},

relation (1.2.3) becomes, in vector notation,

(1.2.5)   P^{(2)}(A|x) = ᵀa(x) C B(A),

where C is the r × r matrix with elements c_jk. By iteration we find

THEOREM 1.2.1. The n-step transition probability can be written in the form

(1.2.6)   P^{(n)}(A|x) = ᵀa(x) C^{n-1} B(A)   (n ≥ 1),

where the r × r matrix C = (c_jk) is given by (1.2.4), and C^0 := I, the unit matrix. □

DEFINITION 1.2.2. The matrix C with elements given by (1.2.4) is called the kernel matrix corresponding to the kernel with representation (1.1.1). C is called minimal or standard if the representation of the corresponding kernel is minimal or standard. □

Theorem 1.2.1 represents the essential feature of Markov chains of finite rank: the behaviour of the n-step transition functions is governed by the behaviour of the n-th power of the finite kernel matrix C, just as in the case of a finite Markov chain, where the n-th power of the transition matrix provides the n-step probabilities.

For finite Markov chains, theorem 1.2.1 follows directly from (1.1.20), since

(1.2.7)   P^n = (AB)^n = A(BA)^{n-1}B.

The kernel matrix now takes the simple form C = BA.

For the analysis of a Markov chain with N states there is an advantage in our approach, using C^n rather than P^n, if the rank r of the N × N matrix P is less than N.

The kernel matrix of a Markov chain of finite rank is no more unique than the representation of its kernel. Even if we restrict to minimal representations there is a whole set of corresponding kernel matrices. This set is given by

THEOREM 1.2.3. All minimal kernel matrices of a fixed kernel are similar, i.e. for any two such matrices C and C* there exists a nonsingular T, such that C* = T C T^{-1}. □

PROOF. If C corresponds to ᵀa(x)B and C* to ᵀa*(x)B*, then by lemma 1.1.7 ᵀa*(x) = ᵀa(x)T^{-1} and B* = TB. It follows from (1.2.5) that ᵀa(x)T^{-1}C*TB(A) = ᵀa*(x)C*B*(A) = ᵀa(x)CB(A); hence C* = TCT^{-1} by the linear independence of the a_j and the B_j. ■

If C is a standard kernel matrix of a kernel P, then by definition C is minimal for P. Further we have

LEMMA 1.2.4. A standard kernel matrix has row sums 1. □

PROOF. Apply (1.1.6) of theorem 1.1.4 to definition (1.2.4) of c_jk. ■

The set of all r × r matrices that can occur as kernel matrices for Markov chains of rank r or less will be denoted by K(r). We have

LEMMA 1.2.5. Every r × r transition matrix P is an element of K(r). □

PROOF. Consider the representation given by (1.1.18). We obtain C = P, hence P ∈ K(r). ■

In general, a matrix C ∈ K(r) is the kernel matrix of many different transition kernels. C may be minimal for one kernel, nonminimal for another. Consider, for example, the matrix

C = [ 0  1 ; 0  1 ].

For the finite (2-state) Markov chain with [ 0  1 ; 0  1 ] as transition matrix, C is a kernel matrix, but not minimal; the minimal kernel matrix in this instance is the 1 × 1 matrix (1). But the finite (3-state) Markov chain with transition matrix

(I.2.8) P

1

o

is of rank 2 and has C as a minimal (even standard) kernel matrix.

A matrix C E K(r) will be called minimal if it is minimal for at least one kernel.

In view of (1.2.6) and (1.2.7) we want to compare kernel matrices, in particular standard ones, with transition matrices. With this in mind we derive some elementary properties.

(19)

LEMMA I.2 .6. en is bounded for n E IN.

PROOF. For the n-step transition kernel in standard representation we have

o

(1.2.9)

T~(X)

f

p(n-I)(Aly)~(dy)

=:

T~(x)~(n)(A)

S

where the

B~n)

are probability measures. Hence we obtain the relation J

(1.2.10) for all XES

From the left hand side of (1.2.10) it is seen that en-Ie

=

en is a kernel matrix corresponding to the kernel p(n)(. Ix), and that en-IB

=

B(n). It follows that

(1.2.11) I

f

ak(x)Bjn) (dx)I

~

S

~ sup I~(x)

I

< 00

XES since by lemma 1.1.5 the a.(x) are bounded.

J for all n ::: I , I LEMMA 1.2.7. If

e

E K(r), n 2 I, a~ 2 0 (~ n ~ then l:~=1 a~

e

E K(r). n 1,2, ...,n) and l:~=1 a~ I,

o

PROOF. Suppose e is a kernel matrix corresponding to the kernel

PC. Ix)

=

T~(x)~.

Define i by

T~(x)

:=

T~(x)e~-I,

and consider

p(~)(.lx)

as a new kernel p(.lx) with

(1.2.12) p(.lx) = Ta(x) e~-IB = Ti(x)~ We then obtain the kernel matrix

(1.2.13)

c

= (

J

~(X)B.(dX»).

J 'J k

S '

We can do this for ~

=

1,2, .•• ,n, and one now easily verifies that the

. n ~ .

matr~x l:~=1 a~e ~s a kernel matrix for the chain with kernel

n

(n

I

(20)

Lemma 1.2.7 might suggest that the set K(r) is convex. However, in chapter 2 it will be shown by an example that K(3) is not convex. In this

respect kernel matrices differ from transition matrices.

THEOREM 1.2.8. Every minimal C ∈ K(r) is the limit of a sequence of matrices C^{(ℓ)} (ℓ = 1,2,...), where the C^{(ℓ)} ∈ K(r) are kernel matrices for finite Markov chains of rank r. □

PROOF. There is no loss of generality in assuming that C is standard, as we can apply the same linear transformation to all

C(~).

Suppose Ccorresponds

r

to the kernel L. I a.(x)B .• By lemma 1.1.5 the a. are bounded, say

J= J J J

I

a . (x)

I

< L E

:rn

for all x and all j.

J

For each fixed k E

:rn

and j E {I, ••• ,r} the sets

(1.2.14) A(n

jk {x I

I

k ~ aj( )x < k+1}~ (k = -L~,-L~+I,•••,L~-I)

form a measurable partition of S.

. (i) (i)

Keep k f1xed and let A1 , •••,AN(i)

A~~).

Take an arbitrary, fixed x(i)

J (i) n

each j define the function a. by

J

be a measurable partition containing the E A(i) for each n E {1, •.•,N(i)}, for

n

(1.2.15)

a~n

(x)

J

for x E A(~)

n n = I, •••,N(~) ,

and let

B~~)

be the probability measure with mass B.{A(i)} at x(i)

J J n n

are measurable stepfunctions, and for all x and j we have

(1.2.16)

Furthermore the measure Pi(.lx), defined by

(1.2.17)

n

I

j=l

(i) (n

is a transition kernel concentrated on the finite set {xJ ' .••'~(t)}'

corresponding therefore to a finite Markov chain. Let C(~) be the correspond-ing kernel matrix. The elements

cj~)

of

C(~) s~tisfy

(1.2.18)

J

~~)(X)Bj~)(dX)

S

J

~~)(X)Bj(dX)

(21)

Using Lebesgue's dominated convergence theorem

(Ia~~)(x)

I < L,

Sf

LdB. < 00)

J • J

together with (1.2.16) we find for all j and k

(1.2.19)

in other words:

limt~ C(~) =

c.

The following simple property will prove to be useful.

LEMMA 1.2.9. The trace of a kernel matrix is nonnegative, i.e.

(1.2.20)   tr(C) = Σ_{j=1}^{r} c_jj ≥ 0   for all C ∈ K(r). □

PROOF. A transition matrix has a nonnegative trace, since all its elements are nonnegative. Now (1.2.20) follows from theorem 1.2.8. ■

The product of two transition matrices is again a transition matrix. The set K(r), however, is not closed under matrix multiplication, as the following example shows. The two matrices

0

~]

~l

(1.2.21) C I 0 0 C2 0 0 -~

d

I

are both elements of K(3); C

I is a transition matrix (see lemma 1.2.5), and

for C2 we refer to example 1.5.3. The product matrix

(1.2.22)

o

-~

however, has a negative trace, so C

(22)

1.3. Interpretation

A Markov chain of rank 1 is simply a sequence of independent random variables: we have a_1(x) ≡ 1 and C = (1), hence

(1.3.1)   P^{(n)}(A|x) = B_1(A)   for all x ∈ S and n ≥ 1,

so that e.g., with 1 ≤ k < ℓ and F_0 the distribution of X_0,

(1.3.2)   P(X_0 ∈ A_0, X_k ∈ A_k, X_ℓ ∈ A_ℓ) = ∫_{A_0} ( ∫_{A_k} P^{(ℓ-k)}(A_ℓ|y) P^{(k)}(dy|x) ) F_0(dx).

Here X_1, X_2, ... are identically distributed with distribution B_1.

Now consider a chain of rank r ≥ 2. In a standard representation of the kernel the a_j(x) need not be nonnegative (see example 1.5.1), but suppose that they are. In that case the a_j(x) can be considered as probabilities, while the kernel matrix C is a transition matrix. We introduce random variables J_0, J_1, ..., taking values in S^{(r)} := {1,...,r}, such that J_0, X_0, J_1, X_1, J_2, ... is a Markov chain with

(1.3.3)   P(X_k ∈ A | J_k = j) = B_j(A),   P(J_{k+1} = j | X_k = x) = a_j(x);

for formal correctness one can take (S × {0}) ∪ ({0} × S^{(r)}) as state space for this Markov chain. Considering only the X_n we recover our original chain of rank r, whereas the J_n form a finite Markov chain with

(1.3.4)   P(J_{n+1} = k | J_n = j) = c_jk,

i.e. with transition matrix C. Furthermore it follows that

(I.3.5) j)

k IJ j)

(23)

This means that we actually have an example of a semi-Markov process (cf. PYKE [26]). If S = [0,∞) then X_n can be considered as the sojourn time of the process in state J_n.

If the a_j(x) can take on negative values, they have no probabilistic interpretation. But the Markov chain as such can still be described in terms of a semi-Markov process as done above, if there exists at least one other representation of the kernel with nonnegative a_j(x) and probability measures B_j. In section 2.4 it will be seen that for chains of rank r ≥ 3 a standard representation with nonnegative a_j(x) does not necessarily exist. For rank 2 we have

THEOREM 1.3.1. For every chain of rank 2 there is a standard representation of P(A|x) with nonnegative functions a_1(x) and a_2(x). □

PROOF. Let

(I.3.6)

be an arbitrary standard representation of p(Alx). Define

(1.3.7)

*

As al(x) is bounded, ~ and L are finite; moreover ~

f

L, since otherwise the chain would be of rank I.

It is easily verified that the measures B] and B2, defined by

(] .3.8)

are probability measures. Applying the transformation

(1.3.9) T

I-L]

I-~

[

I-~

L-I]

-~ L

*

we find TB

=

Band (1.3.10)

(24)

I

*

*

= L_!/, (a

l(x) -!/, , L - al(x)) •

There is, of course, in this situation no reason for requiring a standard (which means minimal) representation. Whether a (possibly non-minimal) representation with nonnegative a. always exists if r ~ 3,we do not

J

know. We remark that it is always possible by a linear transformation to obtain a.(x) between 0 and I: if in the proof of theorem 1.1.4 we replace

J T-1 by (1.3.11)

where the ~ E S are chosen such that U is nonsingular, then the resulting

*

*

~ (x) is a vector of probabilities. However, the resulting B. are now not

J

necessarily probability measures.

The relations P = AB and C= BA for finite (or denumerable) Markov chains remind one of the technique of Lumping of states, as described in KEMENY AND SNELL [18J. One might suspect that lumpability is equivalent to having small rank. The following example shows that this is not true. EXAMPLE 1.3.2. Consider the transition matrices

o

o

o

and

o

o

PI has full rank 3, but the first two states can be lumped together, yielding

as the transition matrix for the lumped chain. P

2, on the other hand, has rank 2 and kernel matrix

C = [:

:J,

(25)

1.4. Two special cases

1.4.1. Markov operators

For Markov chains of finite rank the theory of Markov operators becomes very simple and rather attractive. Here we exhibit the special form the operator and its adjoint take. For general definitions, proofs and details we refer to FOGUEL [9].

Let U+ be the set of (equivalence classes of) nonnegative measurable functions on (S,S). For a fixed kernel P(A|x) let m be a measure on (S,S), such that P(·|x) is absolutely continuous with respect to m for every x. Now P is considered as an operator on U+, i.e. as a mapping P: U+ → U+ defined by

(1.4.1)   (Pf)(x) := ∫_S f(y) P(dy|x)   for all f ∈ U+.

For fixed x, let p(·|x) be the Radon-Nikodym derivative of P(·|x) with respect to m. Then the adjoint operator, applied to f and denoted by fP, can be defined by

(1.4.2)   (fP)(y) := ∫_S f(x) p(y|x) m(dx)   for all f ∈ U+.

Now if P(·|x) = ᵀa(x)B is the kernel of a Markov chain of rank r, then for m we may take the measure defined by (1.1.22). We obtain

(1.4.3)   (Pf)(x) = ∫_S f(y) ᵀa(x) B(dy) = ᵀa(x) B[f],

with B[f] := ∫_S f(y) B(dy), i.e. Pf is a linear combination of the a_j. For the adjoint operator we find, putting b_j := dB_j/dm,

(1.4.4)   (fP)(y) = ∫_S f(x) ᵀa(x) b(y) m(dx) = ᵀa[f] b(y),

with a[f] := ∫_S f(x) a(x) m(dx), i.e. fP is a linear combination of the b_j.

1.4.2. Morgenstern's bivariate distribution function

As a simple example of a two-dimensional distribution function MORGENSTERN [24] considered

(1.4.5)   H(x,y) = G_1(x) G_2(y) {1 + p(1 − G_1(x))(1 − G_2(y))},   |p| ≤ 1,

having G_1(x), G_2(y) as marginal distribution functions, see also GUMBEL [13]. We note that p is not the corresponding coefficient of correlation (see the end of this section).

We assume that all distributions are absolutely continuous with respect to Lebesgue measure. The bivariate density is then given by

(1.4.6)   h(x,y) = g_1(x) g_2(y) {1 + p(1 − 2G_1(x))(1 − 2G_2(y))}.

It follows that

(1.4.7)   p(y|x) = g_2(y) {1 + p(1 − 2G_1(x))(1 − 2G_2(y))},

and hence the transition distribution function is

(1.4.8)   P(y|x) = G_2(y) + p(1 − 2G_1(x)) (G_2(y) − G_2²(y)).

Unless p = 0, P(y|x) is a transition kernel of rank 2 for all p with |p| ≤ 1. An easy computation shows that, with

(1.4.9)   a_1 := ∫_ℝ G_1(y) G_2(dy),   a_2 := ∫_ℝ G_1(y) G_2²(dy),

the kernel matrix C_p is given by

(1.4.10)   C_p = [ 1+p−2pa_1   2pa_1−p ; 1+p−2pa_2   2pa_2−p ],

and, for all n ≥ 1,

           C_p^n = E_0 + [2p(a_2 − a_1)]^n (I − E_0),

where E_0 is the matrix with both rows equal to (1+p−2pa_2, 2pa_1−p)/(1 + 2p(a_1 − a_2)).

If G_1 = G_2 = G, then a_1 = 1/2 and a_2 = 2/3; we find

(1.4.11)   C_p^n = [ 1  0 ; 1  0 ] + (p/3)^n [ 0  0 ; −1  1 ]

and

(1.4.12)   P^{(n)}(y|x) = G(y) + (p/3)^{n−1} p (2G(x) − 1)(G²(y) − G(y)).

Define

(1.4.13)   μ := ∫_ℝ x G(dx),   S := ∫_ℝ (1 − G(x)) G(x) dx.

Then

(1.4.14)   ∫_ℝ y G²(dy) = −∫_{−∞}^{0} G²(x) dx + ∫_{0}^{∞} (1 − G²(x)) dx
                        = −∫_{−∞}^{0} [G(x) − (1−G(x))G(x)] dx + ∫_{0}^{∞} [(1−G(x)) + (1−G(x))G(x)] dx = μ + S.

Using (1.2.6) and (1.4.11) we find, putting ψ(x) := 2G(x) − 1,

(1.4.15)   E X_0 X_n = ∫_ℝ x G(dx) ∫_ℝ y P^{(n)}(dy|x)
                    = ∫_ℝ x G(dx) (1 − pψ(x), pψ(x)) C_p^{n−1} (μ, μ + S)^T
                    = ∫_ℝ x G(dx) (1 − pψ(x), pψ(x)) (μ, μ + S(p/3)^{n−1})^T
                    = ∫_ℝ x {μ + pS(p/3)^{n−1} ψ(x)} G(dx) = μ² + (p^n/3^{n−1}) S²,

where in the last step ∫_ℝ x ψ(x) G(dx) = ∫_ℝ x G²(dx) − μ = S by (1.4.14). It follows that

(1.4.16)   cov(X_0, X_n) = (p^n/3^{n−1}) S² = 3S² (p/3)^n,

in agreement with theorem 3 of LAI [20], and it follows that the correlation coefficient ρ(X_0, X_1) = p/3 (we return to this example in chapter 4).
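These formulas are easy to check by simulation. The Python sketch below is only an illustration: it takes G uniform on [0,1] (so that μ = 1/2 and S = 1/6, an assumption made for the example), samples X_1 from the conditional distribution function (1.4.8) by inverse transformation, and estimates the correlation of (X_0, X_1), which should be close to p/3.

```python
import numpy as np

rng = np.random.default_rng(1)
p = 0.8                      # Morgenstern dependence parameter, |p| <= 1
n = 500_000

# Stationary start: X_0 ~ G = uniform(0,1).
x = rng.random(n)

# Conditional df (1.4.8) with G_1 = G_2 = uniform:
#   P(y|x) = y + p(2x - 1)(y^2 - y) = c*y^2 + (1 - c)*y,  c = p(2x - 1).
# Inverse transformation: solve c*y^2 + (1 - c)*y - u = 0 for y in [0,1].
u = rng.random(n)
c = p * (2.0 * x - 1.0)
c_safe = np.where(np.abs(c) < 1e-12, 1.0, c)          # avoid division by ~0
y_quad = (-(1.0 - c_safe)
          + np.sqrt((1.0 - c_safe) ** 2 + 4.0 * c_safe * u)) / (2.0 * c_safe)
y = np.where(np.abs(c) < 1e-12, u, y_quad)

print(np.corrcoef(x, y)[0, 1])   # approximately p/3 (about 0.267 here)
```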

1.5. Examples

We give some simple explicit examples of kernels of finite rank. Some of the examples will be referred to in the following chapters.

EXAMPLE 1.5.1. The chain with state space S = [0,1] ⊂ ℝ and kernel

(1.5.1)   P(y|x) = (1+x)y − xy²   for x,y ∈ [0,1]

(i.e. a_1(x) = 1+x, a_2(x) = −x, B_1(y) = y, B_2(y) = y²) is of rank r = 2. The representation is standard and the kernel matrix is

(1.5.2)   C = [ 3/2  −1/2 ; 5/3  −2/3 ].

Another representation, also standard, for the same kernel is

(1.5.3)   P(y|x) = (1−x)y + x(2y − y²),

and now the kernel matrix is a transition matrix:

(1.5.4)   C* = [ 1/2  1/2 ; 2/3  1/3 ] = T C T^{-1},

where T = [ 1  0 ; 2  −1 ]. □
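The kernel matrices of this example can be reproduced numerically from definition (1.2.4), c_jk = ∫ a_k(z) B_j(dz). The Python sketch below is an illustrative computation only: it evaluates these integrals by quadrature for both representations and checks the similarity relation C* = T C T^{-1} of (1.5.4).

```python
import numpy as np
from scipy.integrate import quad

# Representation (1.5.1): a_1(x) = 1 + x, a_2(x) = -x; B_1, B_2 have densities 1 and 2y.
a_first  = [lambda x: 1.0 + x, lambda x: -x]
b_first  = [lambda y: 1.0,     lambda y: 2.0 * y]

# Representation (1.5.3): a_1(x) = 1 - x, a_2(x) = x; B_1, B_2 have densities 1 and 2(1 - y).
a_second = [lambda x: 1.0 - x, lambda x: x]
b_second = [lambda y: 1.0,     lambda y: 2.0 * (1.0 - y)]

def kernel_matrix(a_funcs, b_densities):
    """c_jk = integral over [0,1] of a_k(z) B_j(dz), cf. (1.2.4)."""
    r = len(a_funcs)
    C = np.empty((r, r))
    for j in range(r):
        for k in range(r):
            C[j, k] = quad(lambda z: a_funcs[k](z) * b_densities[j](z), 0.0, 1.0)[0]
    return C

C      = kernel_matrix(a_first,  b_first)    # [[3/2, -1/2], [5/3, -2/3]]
C_star = kernel_matrix(a_second, b_second)   # [[1/2,  1/2], [2/3,  1/3]]
T      = np.array([[1.0, 0.0], [2.0, -1.0]])

print(np.round(C, 4))
print(np.round(C_star, 4))
print(np.allclose(C_star, T @ C @ np.linalg.inv(T)))   # True
```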

EXAMPLE 1.5.2. Let the infinite transition matrix P

o

(Pjk)j,k=1 be given by (1.5.5) i.e. , I -k j - 1 , Pjk =

T

2 + - j - °ik

(29)

1 I 1 I

-4

8

16

2 3 1 1 I

4

8

16

32 1.5.6) P 5 1 I I

6"

12

24 48 7 I 1 1

8

16

32 64

Each row of P is a linear combination of

(2'4'8'16"")

I 1 I 1 and (1,0,0,0, ••• ) ,

hence P is of rank 2. With

I 1 I 1 1 I I

"2

4

8"

16

...

]

"2

"3

4

(1.5.7) B AT

0 0 0

...

0 I 2 3

"2

"3

4

the relation P

=

AB holds, and the kernel matrix C BA is standard:

(1.5.8)

o

EXAMPLE 1.5.3. The transition matrix

0 0 0 0 0

:]

(1.5.9) P 0 0 0 0 0

0 0 0 0 0 0

0 0 -1

has rank 3, and a standard kernel matrix is

(1.5.10) C

[

o

~

0]

!

!

-~ I

As we shall see in the next chapter no standard kernel matrix with

(30)

EXAMPLE 1.5.4. Let the distribution functions B

1, .•. ,B5 be given by their respective densities (which all are "0 elsewhere"):

b 1(y) el -y for y ~

b

2(y) 2e2- 2y for y ~ (1.5.11) b 3(y) eY for y $ 0 b 4(y) 2e 2y for y $ 0 b 5(y) for 0 $ Y$ 1

Let the function a. be given by

J (1.5.12)

F

for x < 0 for

o

s x < 1 -x for x ~ e

{o

for x < -x for 1 - e x ~

{

~X_1X

for x < 0 for 0 $ x < 1 for x ~

{:

x for 0 -e x < a 4(x) for x ~ 0

{:

for

o

$ x s a 5(x) = otherwise.

These functions define a chain of rank 5 in standard representation. The kernel matrix is

(31)

I -] 1 -I 0 0 0

2

e

-2

e 2 -] 2 -I 0 0 0

3

e

-3

e (1.5. ]3) C 0 0 I I 0

"2

2

0 0 2 I 0

3

3

I 0 I 0 I

"4

"4

2

We here have a chain with two "absorbing" sets: (-00,0] and [1,00), and C is a "reducible" transition matrix. We shall discuss this situation in the next chapter.

EXAMPLE 1.5.5. The finite Markov chain given by the transition matrix

0 0 0 0

[~

0

~]

(1.5.14) p 0 0 0 0 I 0 0 0 0 0 0 0 0 0 0 0 0 0

is of rank 3 with kernel matrix

(1.5.15) C

[:

0

:]

0

This chain is cyclic with period 2.

o

(32)

CHAPTER 2. EIGENVALUES

In chapter 1 we have seen that to each Markov chain of finite rank there corresponds an equivalence class of similar kernel matrices. Since all matrices in such an equivalence class have the same eigenvalues, it is quite natural to study the eigenvalues of kernel matrices. It turns out that in many respects kernel matrices behave like the well-studied transition matrices of finite Markov chains. For the eigenvalue structure of the latter we refer to IOSIFESCU [16] and to FRITZ, HUPPERT AND WILLEMS [12]. The eigenvectors of the kernel matrix have natural analogues in the eigenfunctions of the kernel P(·|x); their close relationship plays an essential role in our considerations.

General results are given in the first section. In section 2 we consider the eigenvalue λ_0 = 1, which leads to a classification of (sets of) states, similar to that for finite Markov chains. The eigenvalues of modulus 1, but unequal to 1, determine "cyclically moving subsets"; this is the topic of section 3. Section 4 is devoted to a further comparison of transition and kernel matrices, and to the location of eigenvalues in the complex plane. In the final section some examples are given.

We use the following notation and terminology: a vector v ∈ ℂ^r with v ≠ 0, satisfying Cv = λv for some λ ∈ ℂ, will be called a right λ-eigenvector of C; the word right is sometimes omitted. A vector u ∈ ℂ^r with u ≠ 0, satisfying ᵀu C = λ ᵀu for some λ ∈ ℂ, will be called a left λ-eigenvector of C.

Throughout the chapter we assume a standard representation.

2.1. General results

DEFINITION 2.1.1. A function v is called a λ-eigenfunction of the kernel P(A|x) if v(x) ≢ 0 and

(2.1.1)   λ v(x) = ∫_S v(y) P(dy|x)   for all x ∈ S   (λ ∈ ℂ). □

LEMMA 2.1.2. If v is a λ-eigenfunction of the kernel P(A|x) = ᵀa(x)B(A) and λ ≠ 0, then

(2.1.2)   v(x) = Σ_{j=1}^{r} v_j a_j(x) = ᵀa(x) v,

with v a right λ-eigenvector of the kernel matrix C. □

PROOF. We have

(2.1.3)   v(x) = (1/λ) ∫_S v(y) P(dy|x) = Σ_{j=1}^{r} a_j(x) (1/λ) ∫_S v(y) B_j(dy),

hence v(x) is of the form (2.1.2). Substituting (2.1.2) in (2.1.1) we find

(2.1.4)   ᵀa(x) C v = λ ᵀa(x) v,

and by the linear independence of the a_j(x) it follows that v is a λ-eigenvector of C. ■

LEMMA 2.1.3. The eigenfunctions v_1(x),...,v_k(x) are linearly independent if and only if the corresponding eigenvectors v_1,...,v_k are linearly independent. □

PROOF. Follows immediately from the linear independence of the a_j(x). ■

THEOREM 2.1.4. Every kernel matrix C satisfies

(2.1.5)   C1 = 1;
(2.1.6)   Cv = λv, v ≠ 0  ⟹  |λ| ≤ 1. □

PROOF. Relation (2.1.5) only restates that C has unit row sums. Now let v(x) be given by (2.1.2); then v(x) satisfies (2.1.1). As the a_j(x) are bounded, so is v(x), and

(2.1.7)   |λ| · |v(x)| ≤ ∫_S |v(y)| P(dy|x) ≤ sup_{y∈S} |v(y)|   for all x ∈ S.

Since sup_{y∈S} |v(y)| ≠ 0, it follows that |λ| ≤ 1. ■

An eigenvalue λ is called nondegenerate if its algebraic multiplicity equals its geometric multiplicity, i.e. if the multiplicity of λ as a root of the characteristic polynomial of the matrix is equal to the number of linearly independent λ-eigenvectors; the geometric multiplicity cannot exceed the algebraic multiplicity, but it may be smaller.

For the eigenvalues of a kernel matrix we have

LEMMA 2.1.5. If |λ| = 1, then λ is nondegenerate. □

This result is a simple generalization of the corresponding result for finite Markov chains; it is proved in the appendix as lemma A.2.3. By lemma 1.2.6 and theorem 2.1.4 the conditions of lemma A.2.3 are satisfied here.

REMARK 2.1.6. Although, as we shall see in section 4, a kernel matrix C is not necessarily similar to a nonnegative matrix, it has a Perron-Frobenius eigenvalue, as we have shown by theorem 2.1.4. This is related to the fact that a convex cone is left invariant by C (compare KINGMAN [19]). However, in our case there is not necessarily a nonnegative left 1-eigenvector, as is seen from the example

that a convex cone is left invariant by C (compare KINGMAN [19J). However, in our case there is not necessarily a nonnegative left I-eigenvector, as is seen from the example

2 0 I 1

3

6

6"

0 2 0 1 0 2 0 I I

3

3

0

3

6

6

(2. I .8) 2 0 0 1 2 -I 2 0 1 0

3

:J

3

3

2 -I 2 0 0

3

where 4 I

3

-3

[:]

[~

-~}

(2.1.9) C 4 I

3

-3

In this and in the following chapters we often use the so-called spectral decomposition of a matrix. Details will be given in the appendix, theorem A.1.1 ff. The matrix theory result that we exploit is

LEMMA 2.1.7. Let M be a complex r × r matrix with distinct eigenvalues λ_0, λ_1, ..., λ_s. Then

(2.1.10)   M = Σ_{ℓ=0}^{s} (λ_ℓ E_ℓ + N_ℓ),

where E_ℓ E_k = N_ℓ N_k = E_ℓ N_k = N_ℓ E_k = 0 if ℓ ≠ k, E_ℓ N_ℓ = N_ℓ E_ℓ = N_ℓ, E_ℓ² = E_ℓ, and N_ℓ^{m_ℓ} = 0 for an m_ℓ ≤ r. Furthermore, if λ_ℓ is a nondegenerate eigenvalue, then N_ℓ = 0.

If λ_ℓ has algebraic multiplicity 1 and if u_ℓ and v_ℓ are left and right λ_ℓ-eigenvectors of M with ᵀu_ℓ v_ℓ = 1, then

(2.1.11)   E_ℓ = v_ℓ ᵀu_ℓ. □

We apply lemma 2.1.7 to a kernel matrix C. Let 1 = λ_0, λ_1, ..., λ_{d-1} denote the distinct eigenvalues of modulus 1. From (2.1.10) we find, using lemma 2.1.5,

(2.1.12)   C = Σ_{ℓ=0}^{d-1} λ_ℓ E_ℓ + Σ_{ℓ≥d} (λ_ℓ E_ℓ + N_ℓ),

and more generally for all n ∈ ℕ

(2.1.13)   C^n = Σ_{ℓ=0}^{d-1} λ_ℓ^n E_ℓ + Σ_{ℓ≥d} Σ_{k≥0} (n choose k) λ_ℓ^{n-k} N_ℓ^k E_ℓ.

We note that the inner sum has less than r terms, since N_ℓ^r = 0. Hence only the first sum matters if n is large.

LEMMA 2.1.8. There exists an ε, 0 < ε < 1, such that

(2.1.14)   C^n = Σ_{ℓ=0}^{d-1} λ_ℓ^n E_ℓ + O(ε^n)   (n → ∞). □

PROOF. Define ρ := max_{ℓ≥d} |λ_ℓ|; then 0 ≤ ρ < 1 and from (2.1.13) we obtain

(2.1.15)   C^n = Σ_{ℓ=0}^{d-1} λ_ℓ^n E_ℓ + O(n^r ρ^n)   (n → ∞).

Now for any ε with ρ < ε < 1 relation (2.1.14) is satisfied. See also lemma A.1.13 in the appendix. ■
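The content of (2.1.14) is easily observed numerically. The Python sketch below is a minimal illustration: it takes the kernel matrix C* of example 1.5.1, whose eigenvalues are 1 and −1/6, constructs the idempotent E_0 belonging to the eigenvalue 1 as in (2.1.11), and checks that C^n − E_0 decays at the geometric rate (1/6)^n.

```python
import numpy as np

C = np.array([[0.5, 0.5],
              [2.0 / 3.0, 1.0 / 3.0]])     # kernel matrix C* of example 1.5.1

eigvals, right = np.linalg.eig(C)          # eigenvalues 1 and -1/6
left = np.linalg.inv(right)                # rows are the matching left eigenvectors

i0 = int(np.argmax(eigvals.real))          # index of the eigenvalue 1
# E_0 = v_0 u_0^T with u_0^T v_0 = 1, cf. (2.1.11); the normalisation holds
# because left @ right = I by construction.
E0 = np.outer(right[:, i0], left[i0, :])

for n in (1, 5, 10, 20):
    remainder = np.linalg.matrix_power(C, n) - E0
    print(n, np.abs(remainder).max(), (1.0 / 6.0) ** n)
# The remainder shrinks at the geometric rate (1/6)^n, as (2.1.14) predicts.
```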

In the next section we need the following results:

LEMMA 2. I .9. Let :!..be any vector such that

(2.1.16) lim Cn v = O. n-+oo

Then, with the £ of (2.1.14),

(2. I • 17) for aU sufficiently large n.

o

PROOF. In the appendix, lemma A.I.IS, it is proved that (2.1.16) yields

Relation (2.1.17) is a trivial consequence of (2.1.18).

(2. I.18) Cn:!.. = 0(£n) (n .... 00) •

In section 2.3 it will be proved, independently of lemma 2.1.9, that all eigenvalues of modulus I are (integer) roots of unity. Using that result we can give a simple straightforward proof of lemma 2.1.9, based on (2.1.14): S.~nce l.:.Q,=O A.Q,d-J n E.Q,:!.. ~s. a per~o ~c. d' funct~on 0. f 'n, ~t must b ' de ~ ent~ca. 11y zero, as the remainder term of Cn v tends to 0 as n .... 00.

LEMMA 2. I.10. If for some subset T of S

(2.1.19) then (2.1.20) lim p(n)(Tlx) = 0 n-+oo lim Cn~(T) '" 0 n-+oo for aU XES,

o

PROOF. Put

I)~n)

:'" (CuB(T)) .• By lemma 1.2.6 the set

{I)~n)

Ij

E {I, ...,d,

J J J

n E IN} is bounded. Let a

l be an arbitrary limit point of the sequence

(I) (2) (3)

a

1 ,al ,al , . . . . Then there are limit points a2, ...,ar and a subsequence

(~)k=100 such that l~~-+oo• a(n.Q,) { } (2 ) .

j aj for all j E I, ..., r . From .1.19 ~t

follows that

(2.1.21)

r

La.

a. (x)

j=1 J J

o

(37)

the linear independence of the a.(x). Thus we

J

=

°

for j

=

I, and clearly this also holds for all

Hence a

l

= ...

=

ar

=

0, by have found that lim

a~n)

n->oo J

other j. But that is equivalent to (2.1.20).

I

As a consequence of lemma 2.1.10, if T satisfies (2.1.19), then the vector v = ~(T) satisfies the condition of lemma 2.1.9, so that B(T) satis-fies (2.1.17). As a special consequence we get

LEMMA 2.1.11. If the set T satisfies reration (2.1.19), then

(2.I.22)

L

I

en

~(T)

I

< a> • n=O

o

2.2. The eigenvalue 1; classification of states

As in the case of finite Markov chains the eigenvalues λ with |λ| = 1 are of special interest. They correspond directly to the structure of the chain.

In this section we consider the eigenvalue λ_0 = 1. We shall see that its multiplicity, which by theorem 2.1.4 is at least 1, determines the number of absorbing subsets. Most of the results about 1-eigenfunctions that we need here hold more generally for λ-eigenfunctions with |λ| = 1. They are formulated as such for later use, particularly in the next section.

All subsets of the state space S occurring in the sequel of this chapter are supposed to be measurable.

We start by giving some definitions.

DEFINITION 2.2.1. If A is a nonempty subset of S such that

(2.2.1)   P(A|x) = 1   for all x ∈ A,

then A is said to be closed. If a closed set cannot be split into two disjoint closed sets, it is called absorbing.

If a subset T of S of positive m-measure (see (1.1.22)) satisfies

(2.2.2)   lim_{n→∞} P^{(n)}(T|x) = 0   for all x ∈ T,

then T is called transient. □

(38)

In the literature there is no uniform terminology for classifying sub-sets of the state space of a Markov process. The terms introduced above are based on those used by FELLER [6J for finite Markov chains. In DOOB [5J

transient sets are defined by (2.2.2) with x E T replaced by XES; see lemmas 2.2.2 and 2.2.3 below.

First entrance probabilities will be very useful in our considerations. For n ∈ ℕ and arbitrary A ∈ S let

(2.2.3)   f^{(n)}(A|x) := P(X_n ∈ A, X_k ∉ A for k = 1,...,n−1 | X_0 = x);

f^{(n)}(A|x) is determined recursively by

(2.2.4)   f^{(1)}(A|x) = P(A|x),   f^{(n)}(A|x) = ∫_{S\A} f^{(n-1)}(A|y) P(dy|x)   for n ≥ 2.

We further define

(2.2.5)   f(A|x) := Σ_{n=1}^{∞} f^{(n)}(A|x),

the probability that the set A is ever entered after a positive number of transitions.

For formal reasons we introduce

(2.2.6)   f^{(0)}(A|x) := I_A(x),

the indicator of A.

LEMMA 2.2.2. A subset of positive ID-Pleasure of a transient set is transient.n PROOF. Follows immediately from the definition (2.2.2).

In the following lemma we apply the Fatou-Lebesgue lemma: if g,f

l,f2, ••• are measurable functions, such that Ifni ~ g for all nand

f

g dll < "', then limsupn--

J

f dlln ~

f

limsup. n-+co nf dll. A proof can be found in LOEVE [23J.

(39)

LEMMA 2.2.3. If T is a transient set, then

(2.2.7) lim p(n)(Tlx) = 0 for

a~~

X E S .

n-+«>

o

PROOF. For XES \ T we have, taking pO) (TIY) := 0 if j < 0,

p (n) (T

I

x) n

J

f(k) (dylx)p(n-k) (Tly) (2.2.8)

I

k=O T

I

J

f(k) (dylx)p(n-k) (T!y) $ k=O T $

L

f(k)(Tlx) f(Tlx) $ I . k=O

Applying the Fatou-Lebesgue lemma twice (with majorants provided by (2.2.8» we obtain

(2.2.9) limsup p(n)(Tlx) limsup

I

r

f(k) (dylx)p(n-k) (T!y) $

n+00 n+00 k=O J T s

I

limsup

J

f(k) (dy!x)p(n-k) (TIY) $ k=O n+00 T

COROLLARY 2.2.4. The union of a finite nwriber of transient sets is transient.D

k

PROOF. Let T1, ..• ,T

k be transient sets and put T:= Uj=1 Tj• Then

(2.2.10) limsup p(n)(T!x) n +00 k $

I

limsup p(n)(T.!x) = 0 . j=1 n+0:> J I

(40)

LEMMA 2.2.5. The union of a countable number of transient sets

is

transient.D PROOF. Let T

I,T2, •.. be transient sets. Define

(2.2. II) T:= lJ j=1 T. J k U j=I T. J for k 1,2, • .. .

By corollary 2.2.4 the sets T(k) are transient for all k ~ I. Choose an arbitrary E > 0. Since li~~ T(k) = T and since the B

j are probability measures, there is a k

O= kO(E) such that (2.2.12)

As en is bounded for n E lN, there exists a constant c such that

Ici~)1 ~

c for all j,k E {l, •.. ,r} and all n E IN. Applying lemma 2.1.10 we find

(2.2.13) limsup p(n) (TI x) = limsup Ta(x)

en~(T) ~

n-+00 n-+00

~ limsup[T~(x) Cn~(T(ko»

+

T~(x) en~(T

\T(ko»J

~

°

+ rCE • n -+00

As E was chosen arbitrarily, it follows than T satisfies the relation (2.2.7).

REMARK 2.2.6. The lemmas 2.2.2 and 2.2.3, as well as corollary 2.2.4, are valid for general Markov chains; in their proofs no use is made of the special form of transition kernels of finite rank. Lemma 2.2.5, however, does not hold for general Markov chains. Take, for example, the denumerable Markov chain on S = {0,I,2, .•. } with transition matrix P given by

(2.2.14)

{;

for k = j +1

otherwise. Here the sets T

j

:=

{O,I, .•• ,j} are transient for all j ~ I, but

uj=l

Tj

=

= {0,I,2, •.. } S is not transient. The chain determined by (2.2.14) is not of finite rank.

Most of the results in the sequel of this section are not valid for general

(41)

THEOREM 2.2.7. A set Tis transient i f and only i f

(2.2.15)

L

P(n)(TIx)

n=O

< 00 for aU X E S ,

point x. If T is transient this number is p(n)(Tlx) is the expected number of visits in which case the sum is even bounded.

PROOF. The implication (2.2.15)

*

(2.2.2) is trivial. Now suppose T is transient. Lemma 2.2.3 applies and hence by lemma 2.1.11 we get

(2.2.16)

As the a.(x) are bounded, the above expression is bounded.

J

00

For any set T the sum Ln=O

to T, the process starting at the finite, even bounded, for all X E S .

Here again a similarity appears between Markov chains of finite rank and finite Markov chains: there are no null-states, i.e. the situation

o

(2.2.17) lim p(n)(Tlx)

n--o

does not occur.

We remark that (2.2.2) and (2.2.15) are satisfied for any m-nullset T, p(n)(Tlx) being equal to 0 for all n

~

I. For this reason in definition 2.2.1 it is required that a transient set has positive m-measure; in general transient sets are taken modulo m-nullsets.

We now turn to closed sets. By definition, once the process has entered a closed set it remains there with probability I. A closed set may contain two or more absorbing sets; the process actually stays within the absorbing set that it has entered. An absorbing set cannot have arbitrarily small m-measure as in shown in

LEMMA 2.2.8. If Ais a closed set, and if sup Ala.(x)1 ~ L, then L > 0 and

XE J

(2.2.18) meA) ~ I

(42)

PROOF. By definition (2.2.19)

hence

I = p(Alx) T~(x)~(A) for all x E A ,

(2.2.20) r :<; L

I

j=1 B. (A) = L rm(A) • J I

COROLLARY 2.2.9. The nwr0eT' of disjoint absoT'bing sets is finite.

o

Many properties of closed and transient sets follow from eigenfunction considerations. The next lemma gives an important property of A-eigenfunctions with

I AI

= I.

LEMMA 2.2.10. Let A be a nonempty suhset of sand vex) a bounded function on

Asuch that, fOT' a

A

with

IAI

= I,

(2.2.21) AV(x)

f

v(y)p(dylx) A

fOT' aU x EA.

Then theT'e is at least one X

o

E A such that

(2.2.22) sup Iv(x)

I

= Iv(x

o

)I

xEA

FUT'theY'1TloT'e, i f this supT'emum is positive then

(2.2.23) A

O := {xO E A Ilv(x

o

)

I = sup !v(x)I}

xEA

is a closed set.

o

PROOF. If v(x)

=

0 on A, the assertions are trivial. Otherwise, vex) can be normed such that sUPXEAlv(x)

I

= I. Assume this has been done and let (un):=I' with u E A for all n, be a sequence with lim Iv(u)1 = I.

n n-- n

As the aj(x) are bounded, there is a subsequence

(xR,)~=1

=

(unR,)~=l'

such that lim. a.(x.) =: ~. exists for all j. One easily verifies that

x,-><X> J x, J (2.2.24) r F:=

I

j=1 ~.B. J J

(43)

(2.2.25) Iv(x) I $: rlv(y) Ip(dYlx)

=

J A

T~(X)

f

Iv(y)

I~(dy)

A for all x E A

one deduces, by substituting x x

t and letting t + 00,

(2.2.26) $:

f

Iv(y) \F(dy)

A It follows that for A

O = {x E A I [v(x) I = I} we must have F(Aa) = 1. Hence

Aa is nonempty and the first assertion of the lemma is proved. Furthermore we obtain from (2.2.21) for x E A

O

(2.2.27) $: p(Aolx) +

J

Iv(Y)lp(dYlx)

A\A O

I f the integral does not vanish, it is less than peA \A

OI x), but that contradicts (2.2.27). Hence p(Aolx)

=

1 for all x E A

O' in other words: AO

is a closed set.

The conditions of lemma 2.2.10 are satisfied if vex) is a A-eigen-function with IAI

=

1 and A is a closed set. The absolute value of every such eigenfunction therefore takes on a maximum on each closed set. Note that, trivially, S itself is a closed set.

If for a fixed closed set A the function vex) satisfies (2.2.21) with a positive supremum over A and is normed in such a way that

(2.2.28) max Iv(x) I

=

1

=

v(x O) XEA

for some X

o E A ,

we say that vex) is normed on A.

In the special case A

=

1 another closed subset of A can be found, which is contained in the subset A

O given by (2.2.23). This is proved in LEMMA 2.2.11. I f A

nomed on A, then

1, i f A and vex) satisfy (2.2.21), and i f vex) is

(2.2.29) A

O

:= {x E A I vex) 1 }

(44)

PROOF. As v(x) is normed on A, the set A

O

is nonempty. Rather than (2.2.25) we now have

(2.2.30) v(x) Ta(X)

J

v(y)~(dy)

A

for all x E A,

from which we obtain

(2.2.31)

f

v(y)F(dy)

A

It follows that

AD

is closed, similarly to the last part of the proof of the

previous lemma.

I

o

I

LEt1MA 2.2.12. If A is an absorbing set then every I-eigenfunction is constant on A.

PROOF. Suppose v(x) is a I-eigenfunction, nonconstant and normed on A. Then ~(x) := 1 -v(x) is also a I-eigenfunction, nonconstant on A. Applying lemma 2.2.11 we find two disjoint closed subsets

AD

and

AD

of A, both defined by

(2.2.29) for v(x) and ~(x), respectively. This contradicts the assumption that A is absorbing.

The following theorem is the main result of this section.

THEOREM 2.2.13. The number of disjoint absorbing sets equals the multiplicity of the eigenvalue λ_0 = 1 of C. □

PROOF. Let to be the multiplicity of A

O= 1. Suppose AI, ••• ,At are disjoint absorbing sets, and put A

O := S

\U~=I ~.

With f(Alx) as defined by (2.2.5) we put

(2.2.32) vk(x) := f(~lx) for all XES (k 1, ..•,t)

.

Clearly,

{:

for all x E~, (2.2.33) v

k(x)

for all x E A. j

f-

k ,

J

hence the vk(x) are linearly independent. Furthermore,

(45)

this is trivial for x E U£ A_. while for x E A

Owe have k=1 -lc

(2.2.35)

L

n=J

Thus we have constructed £ linearly independent I-eigenfunctions. hence £0 ~ £ by lemma 2.1.3.

Now suppose that £0 > £. Then by lemma 2.1.5 there exists a I-eigen-function w(x). independent of vI (x) •••.• v£(x). By lemcra 2.2.12 the I-eigen-function w(x) is constant on each set ' \ . say w(x) = c

k for x E ' \ . Consider the function (2.2.36) £ vex) := w(x) -

L

c kvk(x) k=l for XES •

This is a I-eigenfunction. vanishing outside A

O' From lemma 2.2.10 it follows that AOcontains at least one closed set. hence at least one absorbing set A£+I' which is obviously disjoint from AJ •.••• A£. This

completes the proof.

I
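Theorem 2.2.13 can be illustrated numerically. By lemma 1.2.5 every transition matrix is a kernel matrix, so the Python sketch below builds an invented 5-state transition matrix with two absorbing sets, {0,1} and {2,3}, and one transient state, and checks that the eigenvalue 1 of C = P has multiplicity 2, equal to the number of absorbing sets. The matrix is an assumption made only for this illustration.

```python
import numpy as np

# Hypothetical 5-state chain: {0,1} and {2,3} are absorbing sets, state 4 is transient.
P = np.array([[0.6, 0.4, 0.0, 0.0, 0.0],
              [0.3, 0.7, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 0.9, 0.1, 0.0],
              [0.2, 0.1, 0.3, 0.2, 0.2]])

C = P                                       # a transition matrix is a kernel matrix
eigvals = np.linalg.eigvals(C)

multiplicity = int(np.sum(np.abs(eigvals - 1.0) < 1e-9))
print(sorted(np.round(eigvals.real, 4)))    # 1, 1, 0.3, -0.4, 0.2
print("multiplicity of eigenvalue 1:", multiplicity)   # 2 = number of absorbing sets
```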

When we have determined the disjoint absorbing sets AI ••••• A£o. what can be said about the remainder set? The following theorem answers this question.

THEOREM 2.2.14. Let AI ••••• A£o be disjoint absorbing sets, where £0 is the multiplicity of A

o

= I. Then the set TO := S \

IJ~~I

' \ is transient Or'

m-nuZZ.

0

£0

PROOF. Put A := Uk=1 ' \ . The function vex)

=

J is a I-eigenfunction. which by theorem 2.2.13 is linearly dependent of the vk(x). defined by (2.2.32). From (2.2.33) it follows that we simply have

(2.2.37) Hence (2.2.38) vex) f(A!x) £0 \' vk(x) . k~1

(46)

so that for all N~ and XES C2.2.39) £CAlx) N

1.

n=1 fCn)CAlx) +

1.

n=N+l

J

fCn-N)CAly)pCN)Cdylx) TO

I

f Cn) CAlx) + pCN)CTOlx) n=1 This yields

hence by definition TO is transient if m(T O) > O. C2.2.40) lim pCN)CTolx) = 0

N--for all XES ,

We have shown that S consists of to disjoint absorbing sets A1, ••• ,Ato and a transient remainder set TO' Each of the absorbing sets may, however, still contain a transient part. If one removes a transient set from a closed set, the remaining set is not necessarily closed, but we have

LEMMA 2.2.15. If A is a closed set and TeA is transient, then A \ T contains a closed set AO' and AOcan be chosen in such a way that A \AOis transient.n

PROOF. Define A O :

=

{x

E A

I

1.

pCn) (T

I

x)

=

O} , n=O C2.2.41) TO := {x E AI

Y

p(n)cTlx) > O} n=O

For all x E A and fixed N~ 1 we have

pCn)cTlx) N-I p(n)cTlx) +

I

p(n-N)CTly)pCN)Cdy!x) C2.2.42)

1.

I

I

n=O n=O n=N TO N-I

J

00 p(n) CT/y)pCN) Cdylx) =

I

pCn)cTlx) +

1.

.

n=O TO n=O I f we take N I (and x E A O) this yields

(47)

(2.2.43) for all x E A O •

By letting N

~

00 in (2.2.42) and noting that

~:=O

p(n)(Tlx) < 00 by theorem 2.2.6, it follows that (2.2.44) lim P (N) (A O

I

x) N~ I - lim p(N) (Tolx) N~

for all x EA.

From (2.2.44) we see that A

O is nonempty, hence by (2.2.43)

Aa

is closed, whereas TO = A \AOis transient (with m(T

O) ~ m(T) > 0). I

THEORElf 2.2.16. Let A be a closed set. There exist sets TeA and R := A \ T,

such that T is transient (or rrr-nuU) and Ris recurrent. These sets are uniquely determined modulo rrr-nullsets.

o

PROOF. Put (2.2.45) and (2.2.46)

T

:= {U c A

I

U is transient or m-null} , t := sup m(U) UET

I

The case t

=

0 is trivial. Suppose t > 0 and let T

1,T2, •.• be a sequence of sets in

T,

such that limn_~ m(T

n)

=

t. By lemma 2.2.5 the set T

:=

U

OO

T

~ n=1 n

is transient, while m(T)

=

t.

Now apply lemma 2.2.15: R = A\T contains a closed set A

Owith A\AO transient. Since T is a transient set of maximal m-measure, we have

Aa

=

R (modulo an rrr-nullset) and R is recurrent.

As for uniqueness: suppose T* and R* also satisfy the conditions. Then T UT* is an element of

T,

and by the construction of T it follows that T* \ T is an rrr-nullset. On the other hand, also T \ r* is an m-nullset, otherwise R* would contain a transient set. Hence T

=

T* and R

=

R* modulo an rrr-nullset.

Applying theorem 2.2.16 to the closed set S, we see that the state space of a Markov chain of finite rank consists of a recurrent part R and a transient part T, unique modulo a nullset.

In general, the state space of any lfarkov process can be split into a so-called conservative and a dissipative part. This is known as. the Hopf-decomposition, see e.g. FOGUEL [9].

(48)

By definition a set A is conservative if meA) > 0 and if for every subset A' c A of positive m-measure one has f(A'lx) for m-almost all x E A'.

In order to complete the Hopf-decomposition for a Markov chain of finite rank, we prove

LEMMA 2.2.17. Ais recurrent if and onLy if Ais conservative and cLosed. 0 PROOF. Suppose A is recurrent, and suppose there is a subset A' c A with meA') > 0 such that f(A'lx) < I for all x E A'. For a sufficiently small

a

> 0 the set

(2.2.47) T:= {xlf(A'lx) ~ I-a} cA' has positive m-measure. From

(2.2.48) for all x E A

we obtain for x E T, putting p(x) (2.2.9) , := limsup p(n)(Tlx), cf. (2.2.8) and n--(2.2.49) p(x)

~

J

sup p(y)f(dYlx) T YET sup p(y)f(Tlx) ~ yET (J -0) sup p(y) . yET

It follows that p(x)

o

for x E T, so that T is transient by definition. This contradicts the assumption that A is recurrent.

Conversely suppose A is conservative and closed, and let A' c A be any subset of positive m-measure. For m-almost all x E A' we have

(2.2.50)

L

p(n)(A'

I

x)

n=O

1+

L

p(n)(A'Ix)

n=1

I + I +

L

L

J

f(n-k) (A' Iy)p(k) (dylx) k=1 n=k+! A'

(49)

1 + 1 +

I

p(k) (A' Ix)

k=1

1 +

I

p(n)(A'lx) n=O

from which it follows that

~:=O

p(n) (A' Ix) 00 and hence, by theorem 2.2.6, that A' is not transient. ke conclude that A is a closed set, containing no transient subsets, and therefore is recurrent by definition. •

2.3. The eigenvalues of modulus 1; periodicity

If A is a closed set (not necessarily absorbing) we can construct a new Markov chain of finite rank by restricting the original chain to A. On A the a_j(x) may be linearly dependent, in which case the rank of the new chain is less than the rank of the original chain; if so, we re-define the a_j(x) and the B_j(y) in order to obtain a standard representation for the new chain.

How about the eigenvalues and eigenfunctions? Suppose v(x) is a λ-eigenfunction of the original chain, and suppose that its restriction v̄(x) to the closed set A is not identically 0. Then v̄(x) is a λ-eigenfunction of the new chain, i.e.

(2.3.1)   λ v̄(x) = ∫_A v̄(y) P(dy|x)   for all x ∈ A,   v̄(x) ≢ 0 on A.

In this way eigenvalues are inherited by the new chain. On the other hand, if v̄ satisfies (2.3.1) with |λ| = 1, it can be extended to a function on S by

(2.3.2)   v(x) := Σ_{n=1}^{∞} λ^{-n} ∫_A v̄(y) f^{(n)}(dy|x)   for x ∈ S \ A.

The extended function v is a λ-eigenfunction of the original kernel, as one easily verifies. This means that no new eigenvalues of modulus 1 are introduced by restricting the Markov chain to A. Trivially, the multiplicity of λ in the new chain cannot exceed the multiplicity of λ in the original chain. In particular, the extension of v̄ given by (2.3.2) is not necessarily the only one (unless S \ A is transient).
