Nonnegative matrices, generalized eigenvectors and dynamic programming

Citation for published version (APA):

Zijm, W. H. M. (1980). Nonnegative matrices, generalized eigenvectors and dynamic programming. (Memorandum COSOR; Vol. 8013). Technische Hogeschool Eindhoven.

Document status and date: Published: 01/01/1980

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)


EINDHOVEN UNIVERSITY OF TECHNOLOGY

Department of Mathematics

PROBABILITY THEORY, STATISTICS AND OPERATIONS RESEARCH GROUP

Memorandum COSOR 80-13

Nonnegative matrices, generalized eigenvectors and dynamic programming

by

W.H.M. Zijm

Eindhoven, September 1980
The Netherlands

Nonnegative matrices, generalized eigenvectors and dynamic programming

by W.H.M. Zijm *

Abstract

In this paper we present a detailed analysis of the structure of a set of nonnegative matrices (not necessarily stochastic) which plays a role in several dynamic programming recursions (Markov decision processes, Leontief substitution systems). We investigate the asymptotic behaviour of these recursions and give an application, arising from the study of sensitive optimality criteria in Markov decision processes.

Summary

In this paper we give a detailed analysis of the structure of a set of nonnegative matrices (not necessarily stochastic) that plays a role in various recursion equations in dynamic programming problems (Markov decision processes, Leontief substitution systems). We treat the asymptotic behaviour of value iteration in these problems and give an application that arises in the study of sensitive optimality criteria in Markov decision processes.

* University of Technology Eindhoven, Department of Mathematics, P.O. Box 513


1. Introduction

In this précis we present results of the author's work concerning the theory of sets of nonnegative matrices and its applications to dynamic programming recursions which appear in growth systems. The theorems will be stated without proof; a more detailed treatment can be found in [16], [17], [18] and [19]. We conclude with an application, arising from the theory of Markov decision processes.

Consider a set $M$ of matrices, which is generated by all possible interchanges of corresponding rows, taken from a fixed finite set of nonnegative $N \times N$ matrices. In this paper we study the asymptotic behaviour of the utility vector $x(n)$ (a column $N$-vector), defined by the following dynamic programming recursion

(1)   $x(n+1) = \max_{P \in M} P\, x(n), \qquad n = 0, 1, 2, \ldots$

where $x(0)$ is a fixed, strictly positive vector. From the structure of $M$ it is obvious that we may take the maximum component-wise in (1).
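The component-wise maximization can be checked on a small instance. The following Python sketch is illustrative only: the two generating matrices and the helper names `dp_step` and `dp_step_bruteforce` are assumptions made for this example and do not come from the paper. It compares the row-by-row maximum with an explicit enumeration of every matrix obtained by interchanging rows.

```python
from itertools import product

def row_times(row, x):
    """Inner product of one matrix row with the vector x."""
    return sum(a * b for a, b in zip(row, x))

def dp_step(generators, x):
    """One step of recursion (1), maximizing each component separately."""
    n = len(x)
    return [max(row_times(A[i], x) for A in generators) for i in range(n)]

def dp_step_bruteforce(generators, x):
    """Same step, enumerating every matrix of M (all row interchanges)."""
    n = len(x)
    images = []
    for choice in product(range(len(generators)), repeat=n):
        # Row i of P is taken from the generator with index choice[i].
        P = [generators[choice[i]][i] for i in range(n)]
        images.append([row_times(P[i], x) for i in range(n)])
    return [max(img[i] for img in images) for i in range(n)]

# Hypothetical generating set of two nonnegative 2 x 2 matrices.
A1 = [[1, 2], [0, 1]]
A2 = [[2, 0], [1, 1]]
x0 = [1, 1]
x1 = dp_step([A1, A2], x0)
```

Because every combination of rows is itself a member of M, a single matrix attains all component-wise maxima at once, which is why both functions return the same vector here.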

Nonnegative matrices, and especially recursion (1), play an important role in a wide range of applications, e.g. additive Markov decision theory (Bellman [1], Howard [4]), risk-sensitive (multiplicative) Markov decision chains (Howard and Matheson [5], Rothblum [7]), controlled multitype branching processes (Pliska [6]) and Leontief substitution systems (Burmeister and Dobell [2]). Compare also Sladky [10], [11].

After introducing some notational conventions and recalling some well-known results about nonnegative matrices, we will state the main results and point out a number of important special cases. In the final section we give an application, dealing with sensitive optimality criteria in Markov decision chains.

2. Preliminaries

A matrix $A$ is called nonnegative (positive) - denoted by $A \geq 0$ ($A \gg 0$) - if all its coordinates are nonnegative (positive). $A$ is called semi-positive - denoted by $A > 0$ - if $A \geq 0$ and $A \neq 0$. Similar definitions apply to vectors. By $[A]_i$ we denote the $i$-th row of $A$, by $[A]$ …

We will refer to the indices $1, \ldots, N$ as states; $S = \{1, \ldots, N\}$ is called the state space. The set $M$ of nonnegative matrices is defined as follows:

Let $V \subset S$ and $P_1, P_2 \in M$; then $P$, defined by $[P]_i = [P_1]_i$ for $i \in V$ and $[P]_i = [P_2]_i$ for $i \notin V$, is also an element of $M$.

Let $\sigma(P)$ be the spectral radius of $P \in M$. Then $\sigma(P)$ is the largest positive eigenvalue of $P$ and we can choose the corresponding eigenvector $u(P) > 0$. If $P$ is irreducible then even $u(P) \gg 0$ and $\sigma(P)$ is a simple eigenvalue (Perron-Frobenius theorem). If $P$ is reducible then, possibly after permuting the states, we may write

$$P = \begin{pmatrix} P_{11} & P_{12} & \cdots & P_{1s} \\ & P_{22} & \cdots & P_{2s} \\ & & \ddots & \vdots \\ & & & P_{ss} \end{pmatrix}$$

with $P_{ii}$ irreducible with spectral radius $\sigma_i(P)$, for $i = 1, \ldots, s$. We say that $P_{ii}$ has access to $P_{\ell\ell}$ if we have $P_{k_{j-1},k_j} > 0$, $j = 1, \ldots, n$, for some sequence of integers $k_0 = i < k_1 < \cdots < k_n = \ell$. The sequence $\{P_{k_j,k_j};\ j = 0, \ldots, n\}$ is called a chain. Furthermore $P_{ii}$ is called basic, resp. non-basic, if $\sigma_i(P) = \sigma(P)$, resp. $\sigma_i(P) < \sigma(P)$. It is well-known (compare e.g. Gantmacher [3]) that $u(P) \gg 0$ if and only if each non-basic class of $P$ has access to some basic class, whereas no basic class has access to any other irreducible class of $P$.

The length of a chain is the number of basic classes it contains. The index $\nu(P)$ of $P$ is defined as the length of its longest chain of irreducible classes (compare Rothblum [8]).
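The notions of irreducible class, basic class and index can be made concrete for a single small matrix. The Python sketch below is an illustration under stated assumptions, not a method taken from the paper: the function names, the example matrix, the power iteration on $B + I$ used to approximate the spectral radius of an irreducible block, and the tolerance used to decide which classes are basic are all choices made here.

```python
from functools import lru_cache

def reachability(P):
    """Transitive closure of the graph with an arc i -> j whenever P[i][j] > 0."""
    n = len(P)
    reach = [[i == j or P[i][j] > 0 for j in range(n)] for i in range(n)]
    for k in range(n):
        for i in range(n):
            if reach[i][k]:
                for j in range(n):
                    if reach[k][j]:
                        reach[i][j] = True
    return reach

def irreducible_classes(P, reach):
    """Group the states that have access to each other."""
    seen, classes = set(), []
    for i in range(len(P)):
        if i not in seen:
            cls = [j for j in range(len(P)) if reach[i][j] and reach[j][i]]
            seen.update(cls)
            classes.append(cls)
    return classes

def spectral_radius(B, iters=500):
    """Power iteration on B + I, which is primitive when B is irreducible."""
    x = [1.0] * len(B)
    lam = 1.0
    for _ in range(iters):
        y = [x[i] + sum(b * v for b, v in zip(B[i], x)) for i in range(len(B))]
        lam = max(y)
        x = [v / lam for v in y]
    return lam - 1.0  # undo the shift by the identity

def index_of(P, tol=1e-9):
    """Largest number of basic classes on a chain of irreducible classes."""
    reach = reachability(P)
    classes = irreducible_classes(P, reach)
    radii = [spectral_radius([[P[i][j] for j in c] for i in c]) for c in classes]
    sigma = max(radii)
    basic = [abs(r - sigma) <= tol for r in radii]

    @lru_cache(maxsize=None)
    def longest(a):
        # Best continuation among the other classes that class a has access to.
        tails = [longest(b) for b in range(len(classes))
                 if b != a and reach[classes[a][0]][classes[b][0]]]
        return int(basic[a]) + max(tails, default=0)

    return max(longest(a) for a in range(len(classes)))

# Hypothetical example: three one-state classes with spectral radii 2, 2 and 1.
P = [[2, 1, 0],
     [0, 2, 1],
     [0, 0, 1]]
nu = index_of(P)
```

In this example the first two classes are basic and the first has access to the second, so the longest chain contains two basic classes.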

Having these concepts, we are ready to formulate our results in the next section; the generality of these results will be discussed in the remarks after the theorems and will be illustrated in section 4.

3. The main results

Our first theorem deals with a decomposition result for sets of nonnegative matrices. In fact it implies a far-reaching generalization of the Perron-Frobenius theory for nonnegative matrices.

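Theorem 1: There exists a matrix $\hat{P} \in M$, with spectral radius $\hat{\sigma}$ and index $\hat{\nu}$, and a partition $\{D_0, D_1, \ldots, D_{\hat{\nu}}\}$ of the state space $S$, such that, after possibly permuting the states, we may write

(2)   $$P = \begin{pmatrix} P_{\hat{\nu},\hat{\nu}} & P_{\hat{\nu},\hat{\nu}-1} & \cdots & P_{\hat{\nu},1} & P_{\hat{\nu},0} \\ & P_{\hat{\nu}-1,\hat{\nu}-1} & \cdots & P_{\hat{\nu}-1,1} & P_{\hat{\nu}-1,0} \\ & & \ddots & \vdots & \vdots \\ & & & P_{1,1} & P_{1,0} \\ & & & & P_{0,0} \end{pmatrix}$$

where $P_{i,j}$ is defined on $D_i \times D_j$ and $P_{i,j} = 0$ for $i < j$; $i, j = 0, \ldots, \hat{\nu}$, and for all $P \in M$. Furthermore there exist vectors $\mu(1), \ldots, \mu(\hat{\nu})$ such that

(3)   $\max_{P \in M} P \mu(k) = \hat{P} \mu(k) = \hat{\sigma} \mu(k) + \mu(k+1), \qquad k = 1, \ldots, \hat{\nu} - 1$

(4)   $\max_{P \in M} P \mu(\hat{\nu}) = \hat{P} \mu(\hat{\nu}) = \hat{\sigma} \mu(\hat{\nu})$

Let $\mu_i(k)$ denote the restriction of $\mu(k)$ to $D_i$ ($k = 1, \ldots, \hat{\nu}$; $i = 0, \ldots, \hat{\nu}$). The vectors $\mu(k)$ can be chosen in such a way that for $k = 1, \ldots, \hat{\nu}$

(5)   $\mu_i(k) \gg 0$ for $i \geq k$, $\qquad \mu_i(k) = 0$ for $i < k$.

Finally we have

(6)   $\max_{P \in M} \sigma(P_{0,0}) < \hat{\sigma}$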

The proof of this theorem may be found in [8]. For a precise description of the nature of the decomposition we refer to [16].

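Remark 1: Notice that (3), (4) and (5) together imply

$\max_{P \in M} P_{k,k}\, \mu_k(k) = \hat{\sigma} \mu_k(k), \qquad k = 1, \ldots, \hat{\nu}$

and since $\mu_k(k) \gg 0$ we find

$\max_{P \in M} \sigma(P_{k,k}) = \hat{\sigma}, \qquad k = 1, \ldots, \hat{\nu}$   □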

Remark 2: Suppose all $P \in M$ are irreducible. Then $\hat{\nu} = 1$ and $D_0 = \emptyset$, and we conclude from (4) and (5)

$\max_{P \in M} P \mu(1) = \hat{P} \mu(1) = \hat{\sigma} \mu(1), \qquad \mu(1) \gg 0$

a well-known extension of the Perron-Frobenius results to sets of irreducible nonnegative matrices (compare e.g. Sladky [10]).
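In the irreducible case the common growth rate can be bracketed directly from recursion (1). The Python sketch below is illustrative and rests on assumptions: the generating matrices, the starting vector and the helper names are hypothetical. It uses the standard observation that for a strictly positive vector $x$, $\max_P P x \geq a x$ implies that some $P$ has spectral radius at least $a$, while $\max_P P x \leq b x$ implies that every $P$ has spectral radius at most $b$, so the smallest and largest component ratios of successive iterates enclose the maximal spectral radius.

```python
def dp_step(generators, x):
    """One step of recursion (1), taken component-wise."""
    n = len(x)
    return [max(sum(a * b for a, b in zip(A[i], x)) for A in generators)
            for i in range(n)]

def growth_bounds(generators, x, steps):
    """Iterate (1) and return the last pair (min_i, max_i) of x(n+1)_i / x(n)_i."""
    lo = hi = None
    for _ in range(steps):
        y = dp_step(generators, x)
        ratios = [yi / xi for yi, xi in zip(y, x)]
        lo, hi = min(ratios), max(ratios)
        x = y
    return lo, hi

# Hypothetical generators; every row combination is an irreducible matrix,
# and the largest spectral radius among them equals 2.
A1 = [[1, 1], [1, 0]]
A2 = [[0, 2], [1, 1]]
first = growth_bounds([A1, A2], [1, 3], steps=1)
lo, hi = growth_bounds([A1, A2], [1, 3], steps=3)
```

For this particular instance the iterates become proportional to the vector (1, 1) after two steps, so the bracket closes exactly; in general one would only expect the two bounds to approach each other.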

Remark 3: Suppose $M$ contains only one (reducible) matrix $P$ (hence $P = \hat{P}$). Then $P$ is characterized by a set of nonnegative vectors $\mu(k)$; $k = 1, \ldots, \hat{\nu}$, which are called generalized eigenvectors in this case. If in addition every non-basic class of $P$ has access to some basic class of $P$ (i.e. $D_0 = \emptyset$) then the generalized eigenvector of highest order ($\mu(1)$) can be chosen strictly positive. These results (extension of the Perron-Frobenius theory to generalized eigenvectors) are also proved by Rothblum [8].
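A two-state example may help to fix ideas about generalized eigenvectors. The Python sketch below is an illustration with hypothetical data: the matrix and the vectors `mu1` and `mu2` are chosen here, not taken from the paper. Both states form basic classes with spectral radius 2, and the first state has access to the second, so the index is 2, $D_2$ is the first state, $D_1$ is the second state and $D_0$ is empty.

```python
def matvec(P, x):
    """Matrix-vector product for a matrix stored as a list of rows."""
    return [sum(a * b for a, b in zip(row, x)) for row in P]

# Hypothetical single matrix with two basic classes and index 2.
P = [[2, 1],
     [0, 2]]
sigma = 2
mu1 = [1, 1]  # generalized eigenvector of highest order, strictly positive
mu2 = [1, 0]  # ordinary eigenvector: positive on D_2, zero on D_1

# Equation (3) for k = 1: P mu(1) = sigma mu(1) + mu(2)
chain_ok = matvec(P, mu1) == [sigma * a + b for a, b in zip(mu1, mu2)]
# Equation (4): P mu(2) = sigma mu(2)
eigen_ok = matvec(P, mu2) == [sigma * a for a in mu2]
```

The sign pattern of the two vectors matches (5): `mu1` is positive on both blocks, while `mu2` vanishes on the block with the lower index.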

Remark 4: It will be clear that we may formulate similar results for the set of matrices $P_{0,0}$, defined on $D_0 \times D_0$, and with respect to $\sigma_0 = \max_{P \in M} \sigma(P_{0,0})$. Continuing in this way we obtain the complete block-triangular decomposition as it is formulated in [18].

The results of theorem 1 are exploited in [17] and [19] to study the asymptotic behaviour of $x(n)$, defined by (1). To avoid complexity in the notations we make one additional assumption, namely: all matrices which occur infinitely often as a maximizer in the dynamic programming recursion (1) are aperiodic (i.e. $\sigma(P)$ is the only eigenvalue on the spectral circle of $P$).

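Theorem 2: Let $\hat{\nu}$, $\hat{\sigma}$ and $\{D_0, D_1, \ldots, D_{\hat{\nu}}\}$ be defined as in theorem 1. There exist vectors $x^{(1)}, \ldots, x^{(\hat{\nu})}$ such that

(7)

with $\rho < \hat{\sigma}$. Let furthermore $x_i^{(k)}$ denote the restriction of $x^{(k)}$ to $D_i$ for $k = 1, \ldots, \hat{\nu}$; $i = 0, \ldots, \hat{\nu}$. We have

(8)   $x_k^{(k)} \gg 0, \qquad x_i^{(k)} = 0$ for $i < k$; $\qquad k = 1, \ldots, \hat{\nu}$; $i = 0, \ldots, \hat{\nu}$

Finally the vectors $x^{(k)}$, $k = 1, \ldots, \hat{\nu}$, satisfy the functional equations

(9)   $\max_{P \in M} P x^{(\hat{\nu})} = \hat{\sigma} x^{(\hat{\nu})}$

(10)   $\max_{P \in M} P x^{(k)} = \hat{\sigma} \left( x^{(k)} + x^{(k+1)} \right), \qquad k = 1, \ldots, \hat{\nu} - 1$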

Remark 5: Since $P_{0,i} = 0$ for $i > 0$ and for all $P$, the behaviour of $x_0(n)$ (the restriction of $x(n)$ to $D_0$) is only influenced by the matrices $P_{0,0}$. In view of remark 4 it follows that the behaviour of $x_0(n)$ can be described more precisely by an asymptotic expansion in $\sigma_0 = \max_{P \in M} \sigma(P_{0,0})$. Continuing in this way we get the complete result, described in [19].

Remark 6: Analogous, but more complex, results can be formulated if we drop the aperiodicity assumption stated above.

Remark 7: Once having the block-triangular decomposition, equations (9) and (10) can be used to develop policy iteration procedures. Furthermore (7) makes it possible to estimate $\hat{\sigma}$ and $\hat{\nu}$. These subjects will be treated in a forthcoming paper.

4. An application: sensitive optimality

Suppose we have a discrete time Markov decision process (M.D.P.) with finite state space and finite action space. In other words: each $P \in M$ is row-stochastic ($\sum_{j=1}^{N} [P]_{ij} = 1$), whereas with each $P$ there is associated a reward-vector $r(P)$. Assume $r(P) \geq 0$. Consider (for fixed $k \in \{0, 1, 2, \ldots\}$) the following dynamic programming recursion

(11)   $v_{n+1}^{(k)} = \max_{P \in M} \left\{ \binom{n}{k} r(P) + P v_n^{(k)} \right\}, \qquad n = 0, 1, 2, \ldots$

Van der Wal [14] showed that these recursions play an important role in the study of so-called k-average optimality criteria in M.D.P., a concept introduced by Sladky [9] as an extension of Veinott's [13] overtaking optimality criterion (compare also Sladky [12], van der Wal and Zijm [15]). We are interested in the asymptotic expansion of $v_n^{(k)}$. However, by a simple trick we may reformulate (11) into

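$$\begin{pmatrix} v_{n+1}^{(k)} \\ \binom{n+1}{k} \\ \binom{n+1}{k-1} \\ \vdots \\ n+1 \\ 1 \end{pmatrix} = \max_{P \in M} \begin{pmatrix} P & r(P) & & & & \\ & 1 & 1 & & & \\ & & 1 & 1 & & \\ & & & \ddots & \ddots & \\ & & & & 1 & 1 \\ & & & & & 1 \end{pmatrix} \begin{pmatrix} v_n^{(k)} \\ \binom{n}{k} \\ \binom{n}{k-1} \\ \vdots \\ n \\ 1 \end{pmatrix}$$

hence theorem 2 gives the complete asymptotic behaviour of $v_n^{(k)}$.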

Note that equations (9), (10) turn into the well-known policy-iteration equations for k-order average optimal policies (for $k = 0$ we deal with Howard's equations for average optimal policies).
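For $k = 0$ the reformulation can be tried out on a toy problem. The Python sketch below is an illustration under stated assumptions: it takes the $k = 0$ recursion to be the usual value iteration $v_{n+1} = \max_P \{ r(P) + P v_n \}$, and the two-state M.D.P., its rewards and all function names are hypothetical. Each transition row is extended with its reward, and one extra state whose value stays equal to 1 is appended, so that the reward recursion becomes an instance of recursion (1) for a set of nonnegative matrices.

```python
def dp_step(rows_per_state, x):
    """x(n+1)_i = max over the rows available in state i of <row, x(n)>."""
    return [max(sum(a * b for a, b in zip(row, x)) for row in rows)
            for rows in rows_per_state]

# Hypothetical 2-state M.D.P.: each action is (transition row, reward).
actions = [
    [([0.5, 0.5], 1.0), ([1.0, 0.0], 0.0)],   # actions in state 0
    [([0.0, 1.0], 2.0), ([0.5, 0.5], 3.0)],   # actions in state 1
]

def value_iteration(v, steps):
    """Direct recursion v(n+1) = max { r(P) + P v(n) } (assumed k = 0 case)."""
    for _ in range(steps):
        v = [max(r + sum(p * w for p, w in zip(row, v)) for row, r in acts)
             for acts in actions]
    return v

# Augmented nonnegative rows (transition row, reward), plus one extra
# state with the single row (0, 0, 1) so that its value remains 1.
augmented = [[row + [r] for row, r in acts] for acts in actions]
augmented.append([[0.0, 0.0, 1.0]])

x = [1.0, 1.0, 1.0]   # x(0) = (v(0), 1), strictly positive
for _ in range(5):
    x = dp_step(augmented, x)
v = value_iteration([1.0, 1.0], 5)
```

The first two components of `x` reproduce the directly computed values, and the last component stays at 1. The augmented matrices have spectral radius 1 and are no longer stochastic, which is the setting covered by recursion (1).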

References

[1] Bellman, R., Dynamic Programming, Princeton Univ. Press, Princeton, New Jersey, 1957.

[2] Burmeister, E. and R. Dobell, Mathematical Theories of Economic Growth, MacMillan, New York, 1970.

[3] Gantmacher, F.R., Applications of the theory of matrices, translated from Russian by J.L. Brenner, Interscience Publishers, Inc., New York, 1959.

[4] Howard, R.A., Dynamic Programming and Markov Processes, Wiley, New York, 1960.

[5] Howard, R.A. and J.E. Matheson, Risk-sensitive Markov decision processes, Man. Science 18, p. 356-369 (1972).

[6] Pliska, S.R., Optimization of Multitype Branching Processes, Man. Science 23, 2, p. 117-124 (1976).

[7] Rothblum, U.G., Multiplicative Markov Decision Chains, Ph.D. Dissertation, Dept. Oper. Res., Stanford Univ., Stanford, Calif., 1974.

[8] Rothblum, U.G., Algebraic Eigenspaces of nonnegative matrices, Lin. Alg. and Appl. 12, p. 281-292 (1975).

[9] Sladky, K., On the set of optimal controls for Markov chains with rewards, Kybernetika 10, p. 350-367 (1974).

[10] Sladky, K., On dynamic programming recursions for multiplicative Markov decision chains, Math. Progr. Study 6, p. 216-226 (1976).

[11] Sladky, K., Successive approximation methods for dynamic programming models, Proc. Third Formator Symposium on Mathematical Methods for the Analysis of Large Scale Systems, Prague, p. 171-189 (1979).

[12] Sladky, K., On successive approximation algorithms for Markov Decision Chains, presented at the 6th Conference on Prob. Theory, Brasov, Romania, September 1979.

[13] Veinott, A.F., On finding optimal policies in discrete dynamic programming with no discounting, Ann. Math. Stat. 37, 5, p. 1284-1294 (1966).

[14] van der Wal, J., Stochastic dynamic programming, Math. Centre Tract, Mathematisch Centrum, Amsterdam, 1980.

[15] van der Wal, J. and W.H.M. Zijm, Note on a dynamic programming recursion, Mem. COSOR 79-12, Eindhoven University of Technology (1979).

[16] Zijm, W.H.M., On nonnegative matrices in dynamic programming, Mem. COSOR 79-10, Eindhoven University of Technology (1979).

[17] Zijm, W.H.M., Maximizing the growth of the utility vector in a dynamic programming model, Mem. COSOR 79-11, Eindhoven University of Technology (1979).

[18] Zijm, W.H.M., Generalized eigenvectors and sets of nonnegative matrices, Mem. COSOR 80-03, Eindhoven University of Technology (1980).

[19] Zijm, W.H.M., Asymptotic behaviour of the utility vector in a dynamic programming model, Mem. COSOR 80-04, Eindhoven University of Technology (1980).