Successive approximation for average reward Markov games

Citation for published version (APA):
Wal, van der, J. (1977). Successive approximation for average reward Markov games. (Memorandum COSOR; Vol. 7710). Technische Hogeschool Eindhoven.

Document status and date: Published: 01/01/1977

Document version: Publisher's PDF, also known as Version of Record (includes final page, issue and volume numbers)


(2)

EINDHOVEN UNIVERSITY OF TECHNOLOGY

Department of Technology

PROBABILITY THEORY, STATISTICS AND OPERATIONS'RESEARCH GROUP

Memorandum-COSOR 77-]0

Successive approximations for average reward Markov games

by

J. van der Wal

Eindhoven,April 1977 The Netherlands ;

(3)

J. van der Wal

Abstract. This paper considers two-person zero-sum Markov games with finitely many states and actions under the criterion of average reward per unit time. Two special situations are treated, and it is shown that in both cases the method of successive approximations yields an ε-band for the value of the game as well as stationary ε-optimal strategies. In the first case all underlying Markov chains of pure stationary strategies are assumed to be unichained. In the second case it is assumed that the functional equation Uv = v + g·e has a solution.

1. Introduction, Notations and some Preliminary Results

In this paper we deal with two aspects of average reward two-person zero-sum Markov games with finitely many states and actions. We will not go into the important and still unsolved question whether these games have a value within the class of all strategies. What we know is that in general they will neither have a value within the class of stationary strategies, nor in the class of Markov strategies (cf. Gillette [4], Blackwell and Ferguson [1]).

It has been shown by Gillette [4] and afterwards by Hoffman and Karp [5] that the game has a value within the class of stationary strategies if for each pair of stationary strategies the underlying Markov chain is irreducible. This condition has been weakened by Rogers [6] and Sobel [9], who still demanded that the underlying Markov chain be unichained but allowed for some transient states. Federgruen [3] has shown that the unichain restriction may be replaced by the condition that the underlying Markov chains corresponding to a pair of (pure) stationary strategies all have the same number of irreducible subchains.

Here we give two results obtained from a successive approximations approach. In section 2 we show that in the situation of Rogers and Sobel (one recurrent chain and possibly some transient states) successive approximations yield ε-optimal stationary strategies and an ε-band for the value of the game. In order to find them we first apply Schweitzer's [7] data transformation to obtain an equivalent Markov game. In section 3 we consider games that do have a value independent of the starting state. Using again Schweitzer's transformation we show that successive approximations yield ε-optimal stationary strategies.

First we specify the game and give a number of notations. We are concerned with a dynamic system with finite state space S := {1,2,...,N} which is observed at times t = 0,1,.... The behaviour of the system is influenced by two players, P1 and P2, having completely opposite aims. For each i ∈ S two finite nonempty sets of actions exist, one for each player, denoted by K_i for P1 and L_i for P2. As a joint result of the state i and the two selected actions, k for P1 and ℓ for P2, the system moves to a new state j with probability p(j|i,k,ℓ), where Σ_{j∈S} p(j|i,k,ℓ) = 1, and P1 receives a (possibly negative) amount r(i,k,ℓ) from P2.

Following Zachrisson [11] we shall call these two-person zero-sum games Markov games. Many authors, following Shapley [8], use the term stochastic games. A strategy π for P1 in this game is any function that specifies for each time n = 0,1,... and for each state i ∈ S the probability π(k|i,n,h_n) that action k ∈ K_i will be taken, as a function of i, n and the history h_n. By the history h_n up to time n we mean the sequence h_n = (i_0,k_0,ℓ_0,...,i_{n−1},k_{n−1},ℓ_{n−1}) of prior states and actions (h_0 is the empty sequence). If all π(k|i,n,h_n) are independent of h_n, the strategy is called a Markov strategy; if moreover π(k|i,n,h_n) does not depend on n, the strategy is called stationary. A policy f for P1 is any function such that f(i) is a probability distribution on K_i for all i ∈ S. Thus a Markov strategy prescribes a policy f_n for each time n and will be denoted by (f_0,f_1,...). And a stationary strategy is a Markov strategy with f_n = f_0, n ≥ 1, and will be denoted by f_0^(∞). Similarly we define strategies ρ and policies h for P2.

We define V_n(π,ρ) as the total expected reward vector for the first n periods when strategies π and ρ are used, and g(π,ρ) by

g(π,ρ) := liminf_{n→∞} n^{−1} V_n(π,ρ).

We say that the game has a value G if

inf_ρ sup_π g(π,ρ) = sup_π inf_ρ g(π,ρ) = G.

(Instead of taking liminf in the definition we might just as well have taken limsup.)

A strategy π_ε for P1 is called ε-optimal, ε ≥ 0, if

inf_ρ g(π_ε,ρ) ≥ G − ε·e,

and a strategy ρ_ε for P2 is called ε-optimal if

sup_π g(π,ρ_ε) ≤ G + ε·e.

A 0-optimal strategy is called optimal.

For convenience we define the vector r(f,h) and the matrix P(f,h) by

r(f,h)(i) := Σ_{k∈K_i} Σ_{ℓ∈L_i} f(i,k)h(i,ℓ)·r(i,k,ℓ)

(where f(i,k) [h(i,ℓ)] denotes the probability with which action k [ℓ] is selected according to policy f [h]), and

P(f,h)_{ij} := Σ_{k∈K_i} Σ_{ℓ∈L_i} f(i,k)h(i,ℓ)·p(j|i,k,ℓ).

Since we will be engaged with successive approximations it is of notational convenience to introduce the operators L(f,h) and U on ℝ^N by

L(f,h)v := r(f,h) + P(f,h)v,
Uv := max_f min_h L(f,h)v

(max min is taken componentwise). For a vector w ∈ ℝ^N we define

Δw := max_{i∈S} w(i),  ∇w := min_{i∈S} w(i)  and  sp(w) := Δw − ∇w.
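For a small example the objects r(f,h), P(f,h), L(f,h)v and sp(w) can be computed directly. The following Python sketch is our own illustration, not part of the memorandum; the data layout (nested lists indexed as r[i][k][l] and p[i][k][l][j]) is an assumption we make for concreteness.

```python
# Basic objects of a finite Markov game (hypothetical data layout):
# r[i][k][l]    = reward r(i,k,l) paid by P2 to P1,
# p[i][k][l][j] = transition probability p(j|i,k,l).

def r_fh(r, f, h):
    """Vector r(f,h): expected one-step reward under policies f and h."""
    return [
        sum(f[i][k] * h[i][l] * r[i][k][l]
            for k in range(len(f[i])) for l in range(len(h[i])))
        for i in range(len(r))
    ]

def P_fh(p, f, h):
    """Matrix P(f,h): one-step transition probabilities under f and h."""
    N = len(p)
    return [
        [sum(f[i][k] * h[i][l] * p[i][k][l][j]
             for k in range(len(f[i])) for l in range(len(h[i])))
         for j in range(N)]
        for i in range(N)
    ]

def L_fh(r, p, f, h, v):
    """Operator L(f,h)v := r(f,h) + P(f,h)v."""
    rv, P = r_fh(r, f, h), P_fh(p, f, h)
    return [rv[i] + sum(P[i][j] * v[j] for j in range(len(v)))
            for i in range(len(v))]

def sp(w):
    """Span sp(w) := max_i w(i) - min_i w(i)."""
    return max(w) - min(w)
```

Given policies f and h as per-state probability vectors over K_i and L_i, r_fh and P_fh average the data exactly as in the definitions above; in particular each row of P_fh(p, f, h) sums to one, and sp vanishes precisely on constant vectors.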

Now we start with some preliminary results.

Lemma 1. If a policy f_v satisfies L(f_v,h)v ≥ Uv for all h, then we have

inf_ρ g(f_v^(∞),ρ) ≥ ∇(Uv − v)·e,

and similarly, if h_v satisfies L(f,h_v)v ≤ Uv for all f, then

sup_π g(π,h_v^(∞)) ≤ Δ(Uv − v)·e.

Proof. We only prove the first statement; the proof of the second is similar. Let P1 play strategy f_v^(∞); then P2 may restrict himself to stationary strategies. (This may be shown using a games version of Derman and Strauch's result [2]; see for example the proof of theorem 2.3 in Stern [10].) So it suffices to show, for all h,

V_n(f_v^(∞),h^(∞)) ≥ n·∇(Uv − v)·e + ∇v·e − Δv·e.

This follows immediately with

V_n(f_v^(∞),h^(∞)) = L(f_v,h)^n 0 ≥ L(f_v,h)^n v − Δv·e ≥ ... ≥ n·∇(Uv − v)·e + ∇v·e − Δv·e,

using L(f_v,h)v ≥ Uv ≥ v + ∇(Uv − v)·e and the monotonicity of L(f_v,h). Dividing by n and taking the liminf yields the statement. □

An immediate consequence of this lemma is the following

Corollary 1.

i) If Uv − v = g·e for some g ∈ ℝ, then the game has value g·e and the policies f_v and h_v from lemma 1 constitute stationary optimal strategies. I.e., if there exists a solution (v*,g*), v* ∈ ℝ^N, g* ∈ ℝ, of the functional equation

(1.1) Uv = v + g·e,

then the game has value g*·e, and strategies f*^(∞), h*^(∞) satisfying for all f and h

L(f*,h)v* ≥ L(f*,h*)v* (= Uv*) ≥ L(f,h*)v*

are optimal.

ii) If there exists a sequence {v_n} of vectors in ℝ^N with sp(Uv_n − v_n) → 0 (n → ∞), then the game has a value and the strategy f_n^(∞) with L(f_n,h)v_n ≥ Uv_n for all h will be sp(Uv_n − v_n)-optimal.

Proof. i) is straightforward. The only difficulty in ii) is that we must show that there exists a g* ∈ ℝ such that Uv_n − v_n → g*·e (n → ∞). But this is immediate once we realize that, by lemma 1, for any two vectors v,w ∈ ℝ^N

∇(Uv − v) ≤ Δ(Uw − w). □

Part ii) of corollary 1 is the basis for the method of successive approximations.
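Part ii) of the corollary translates directly into an algorithm: iterate v_{n+1} := Uv_n and stop as soon as sp(Uv_n − v_n) is small, since by lemma 1 the numbers ∇(Uv_n − v_n) and Δ(Uv_n − v_n) bracket the value. The Python sketch below is our illustration, not the memorandum's, and carries one loud simplification: Uv is computed as a pure-action maxmin per state, which equals the matrix-game value only when each state's matrix game has a pure saddle point (for instance when one player has a single action there); in general each component of Uv requires solving a matrix game by linear programming.

```python
# Successive approximations v_{n+1} = Uv_n, stopping when sp(Uv_n - v_n) < eps.
# Simplification (ours): Uv is a pure-action maxmin per state, exact only if
# every state's matrix game has a pure saddle point; otherwise each state
# needs a matrix-game LP solve.

def U(r, p, v):
    """Componentwise (Uv)(i) = max_k min_l [ r(i,k,l) + sum_j p(j|i,k,l) v(j) ]."""
    out = []
    for i in range(len(v)):
        payoffs = [[r[i][k][l] + sum(p[i][k][l][j] * v[j] for j in range(len(v)))
                    for l in range(len(r[i][k]))]
                   for k in range(len(r[i]))]
        out.append(max(min(row) for row in payoffs))
    return out

def successive_approximation(r, p, eps=1e-8, max_iter=100000):
    """Iterate until sp(Uv - v) < eps; return v and the bracketing bounds."""
    v = [0.0] * len(r)
    for _ in range(max_iter):
        uv = U(r, p, v)
        diff = [uv[i] - v[i] for i in range(len(v))]
        if max(diff) - min(diff) < eps:
            # By lemma 1, min(diff) and max(diff) bracket the game's value.
            return uv, min(diff), max(diff)
        v = uv
    raise RuntimeError("no convergence (game may be periodic or multichained)")
```

On an aperiodic unichain instance in which P2 has a single action per state the pure maxmin is exact, and the returned bounds squeeze the constant value g* of the game from below and above.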

If we are going to apply the method of successive approximations then, as in the case of Markov decision problems, periodic behaviour will be unpleasant. In order to eliminate any periodicity one may apply Schweitzer's transformation [7]. Before we proceed to section 2 we will consider this transformation and we will show that it gives rise to an equivalent problem.

Assume we have a specific Markov game with data r(·,·,·) and p(·|·,·,·). Now we may transform the data using Schweitzer's transformation as follows (0 < α < 1):

r̃(·,·,·) := (1 − α)r(·,·,·),
p̃(j|i,·,·) := α·δ_ij + (1 − α)p(j|i,·,·).

That these two games are (in many respects) equivalent follows from the following lemma. The operators L(f,h) and U and the function g(·,·) for the transformed problem are denoted by L̃, Ũ and g̃.

Lemma 2.

i) L̃(f,h)v − v = (1 − α)[L(f,h)v − v] for all f and h.

ii) Ũv − v = (1 − α)(Uv − v).

Proof. i) From r̃(f,h) = (1 − α)r(f,h) and P̃(f,h) = αI + (1 − α)P(f,h) we have

L̃(f,h)v − v = r̃(f,h) + P̃(f,h)v − v
 = (1 − α)r(f,h) + αIv + (1 − α)P(f,h)v − v
 = (1 − α)[r(f,h) + P(f,h)v − v]
 = (1 − α)[L(f,h)v − v].

ii) Since 1 − α ≥ 0 the second statement now follows immediately. □

So if the functional equation of the original problem has a solution, say g*,v*, then the functional equation of the transformed problem Ũv = v + g̃·e has the solution (1 − α)g*, v*. Similarly, if Ũv = v + g̃·e, then also Uv = v + (1 − α)^{−1}g̃·e.

And if, for example, L̃(f,h)v − v ≥ g̃·e − ε·e for all h, which implies by lemma 1 that g̃(f^(∞),h^(∞)) ≥ g̃·e − ε·e, i.e. f^(∞) is ε-optimal in the transformed problem, then f^(∞) is also (1 − α)^{−1}ε-optimal in the original problem, as follows from lemma 2 i).

Thus we see that both problems are equivalent with respect to those features which are important when we apply successive approximations. Therefore we will consider in the remainder of this paper only games satisfying, for some 0 < α < 1 and all f and h,

(1.2) P(f,h) − αI ≥ 0.
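Schweitzer's transformation is a one-line change per data entry. The sketch below (again our own code, with the same assumed nested-list data layout as before) produces the transformed game, whose transition matrices automatically satisfy condition (1.2) with the chosen α:

```python
# Schweitzer's data transformation (0 < alpha < 1):
#   r~(i,k,l)   := (1 - alpha) * r(i,k,l)
#   p~(j|i,k,l) := alpha * delta_ij + (1 - alpha) * p(j|i,k,l)
# Afterwards p~(i|i,k,l) >= alpha for every i, k, l, i.e. condition (1.2) holds.

def schweitzer(r, p, alpha):
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must lie in (0, 1)")
    N = len(r)
    r_t = [[[(1.0 - alpha) * r[i][k][l] for l in range(len(r[i][k]))]
            for k in range(len(r[i]))] for i in range(N)]
    p_t = [[[[(alpha if i == j else 0.0) + (1.0 - alpha) * p[i][k][l][j]
              for j in range(N)]
             for l in range(len(p[i][k]))]
            for k in range(len(p[i]))] for i in range(N)]
    return r_t, p_t
```

The transformed matrices remain stochastic (each row still sums to one), so the transformed data define a Markov game again, now strongly aperiodic.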

2. The Unichained Model

In this section we consider Markov games satisfying the following assumption:

Unichain assumption. For all pure stationary strategies the underlying Markov chain will consist of one recurrent subchain and possibly some transient states.

We will approximate the value (which is independent of the starting state) and find ε-optimal strategies by means of the method of successive approximations. We do this by showing that, under the extra assumption P(f,h) − αI ≥ 0 for some α > 0 and all f and h, for any v ∈ ℝ^N the span sp(U^{n+1}v − U^n v) tends to zero geometrically as n → ∞.

First we will derive some inequalities from which it will be clear that the main theorem of this section guarantees the convergence of successive approximations.

Let v ∈ ℝ^N be given arbitrarily and let the policies f_k, h_k satisfy

L(f_k,h)U^{k−1}v ≥ U^k v ≥ L(f,h_k)U^{k−1}v

for all policies f and h, k = 1,2,.... Then for all n

(2.1) U^{n+1}v − U^n v ≤ P(f_{n+1},h_n)(U^n v − U^{n−1}v) ≤ Δ(U^n v − U^{n−1}v)·e,

and similarly

(2.2) U^{n+1}v − U^n v ≥ P(f_n,h_{n+1})(U^n v − U^{n−1}v) ≥ ∇(U^n v − U^{n−1}v)·e.

Let us denote for an arbitrary pair of strategies π,ρ by ℙ_i^{π,ρ}(S_n = k) the probability that at time n the system occupies state k, given that the state at time 0 is i and strategies π and ρ are used. Let further π_1 [ρ_1] denote the strategy (f_n,...,f_2,f_1,f_1,...) [(h_n,...,h_2,h_1,h_1,...)] and π_2 [ρ_2] the strategy (f_{n+1},...,f_3,f_2,f_2,...) [(h_{n+1},...,h_3,h_2,h_2,...)].

Then for example, iterating (2.1) and (2.2),

(U^{n+1}v − U^n v)(i) ≤ Σ_{k∈S} ℙ_i^{π_2,ρ_1}(S_n = k)(Uv − v)(k),
(U^{n+1}v − U^n v)(j) ≥ Σ_{k∈S} ℙ_j^{π_1,ρ_2}(S_n = k)(Uv − v)(k).

Now we have for all i,j ∈ S

(2.3) (U^{n+1}v − U^n v)(i) − (U^{n+1}v − U^n v)(j)
 ≤ Σ_{k∈S} [ℙ_i^{π_2,ρ_1}(S_n = k) − ℙ_j^{π_1,ρ_2}(S_n = k)](Uv − v)(k)
 ≤ [1 − Σ_{k∈S} min{ℙ_i^{π_2,ρ_1}(S_n = k), ℙ_j^{π_1,ρ_2}(S_n = k)}]·sp(Uv − v).

Hence

sp(U^{n+1}v − U^n v) ≤ [1 − min_{i,j} Σ_{k∈S} min{ℙ_i^{π_2,ρ_1}(S_n = k), ℙ_j^{π_1,ρ_2}(S_n = k)}]·sp(Uv − v).

In theorem 2 we will prove that under the assumptions of this section there exists a γ > 0 such that for all Markov strategies π,ρ,π',ρ'

(2.4) min_{i,j} Σ_{k∈S} min{ℙ_i^{π,ρ}(S_{N−1} = k), ℙ_j^{π',ρ'}(S_{N−1} = k)} ≥ γ.

Then we have from (2.3) and (2.4)

sp(U^{n+1}v − U^n v) ≤ (1 − γ)^{⌊n/(N−1)⌋}·sp(Uv − v).

Thus sp(U^{n+1}v − U^n v) tends to zero geometrically. Combining this with (2.1) and (2.2) we get the following

Theorem 1. If (2.4) holds for some γ > 0, then for some g* ∈ ℝ and v* ∈ ℝ^N:

i) U^{n+1}v − U^n v = g*·e + O((1 − γ)^{n/(N−1)}) (n → ∞),

ii) U^n v = ng*·e + v* + O((1 − γ)^{n/(N−1)}) (n → ∞),

iii) Uv* = v* + g*·e.

Proof. i) follows immediately from the geometric convergence of sp(U^{n+1}v − U^n v) and from (2.1) and (2.2), which give for all n and v

∇(U^n v − U^{n−1}v) ≤ ∇(U^{n+1}v − U^n v) ≤ Δ(U^{n+1}v − U^n v) ≤ Δ(U^n v − U^{n−1}v).

Now ii) follows from i), and iii) from ii). □

It is a direct consequence of theorem 1 i) and lemma 1 that in step n of a successive approximation procedure we will also find C(1 − γ)^{n/(N−1)}-optimal stationary strategies, where C is some constant depending on the scrap vector v.
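As an aside, the geometric decay of sp(U^{n+1}v − U^n v) asserted by theorem 1 is easy to observe numerically. The sketch below is our own illustration with made-up data; P2 is given a single action per state so that the pure-action maxmin computes Uv exactly, and every transition matrix has diagonal entries bounded away from zero, so condition (1.2) holds.

```python
# Track sp(U^{n+1}v - U^n v) on a small unichain game satisfying (1.2).
# P2 has one action per state here, so a pure-action maxmin computes Uv exactly.

def U(r, p, v):
    """(Uv)(i) = max_k min_l [ r(i,k,l) + sum_j p(j|i,k,l) v(j) ]."""
    return [max(min(r[i][k][l] + sum(p[i][k][l][j] * v[j] for j in range(len(v)))
                    for l in range(len(r[i][k])))
                for k in range(len(r[i])))
            for i in range(len(v))]

def span_history(r, p, n_steps):
    """Return the list of spans sp(U^{n+1}v - U^n v), n = 0..n_steps-1, v_0 = 0."""
    v, history = [0.0] * len(r), []
    for _ in range(n_steps):
        uv = U(r, p, v)
        diff = [uv[i] - v[i] for i in range(len(v))]
        history.append(max(diff) - min(diff))
        v = uv
    return history
```

On such an instance the recorded spans are non-increasing and collapse towards zero at a geometric rate, exactly as the bound with factor (1 − γ)^{⌊n/(N−1)⌋} predicts.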

It remains to prove our main theorem.

Theorem 2. If both the unichain assumption and assumption (1.2) are satisfied, then there is a γ > 0 such that for all Markov strategies π,ρ,π',ρ' and states i,j ∈ S

(2.5) Σ_{k∈S} min{ℙ_i^{π,ρ}(S_{N−1} = k), ℙ_j^{π',ρ'}(S_{N−1} = k)} ≥ γ.

Proof. First we prove that the left-hand side in (2.5) is positive for all i,j ∈ S and pure Markov strategies.

We will sometimes use that π,ρ,π',ρ' may be denoted by (f_0,f_1,...), (h_0,h_1,...), (f'_0,f'_1,...) and (h'_0,h'_1,...) respectively. Now fix the pure Markov strategies π,ρ,π',ρ' and define V_{N−1} := {(s,s) | s ∈ S} and, for n = N−2,...,0,

V_n := {(t_1,t_2) | t_1,t_2 ∈ S and there exists a pair (s_1,s_2) ∈ V_{n+1} such that min{ℙ_{t_1}^{f_n,h_n}(S_1 = s_1), ℙ_{t_2}^{f'_n,h'_n}(S_1 = s_2)} > 0}.

Then V_0 becomes the set of all pairs (i,j) for which the left-hand side of (2.5) is positive. So what we have to show is V_0 = S². This may be done as follows. Define

V_n(s) := {t ∈ S | (t,s) ∈ V_n}.

Now we prove by induction, writing |V_n(s)| for the number of elements in V_n(s),

(2.6) |V_n(s)| ≥ N − n,  n = N−1,...,0.

First observe V_{n+1} ⊂ V_n because of assumption (1.2), thus

(2.7) V_{n+1}(s) ⊂ V_n(s).

By definition we have |V_{N−1}(s)| = |{s}| = 1. Now, assuming (2.6) holds for k = N−1,...,n, we prove it to hold for n − 1. We distinguish three cases:

a) |V_n(s)| > N − n; then from (2.7), |V_{n−1}(s)| ≥ N − n + 1;

b) |V_n(s)| = N − n and |V_{n−1}(s)| ≥ N − n + 1;

c) |V_n(s)| = N − n = |V_{n−1}(s)|.

It is clear that in order to prove (2.6) it is sufficient to show that c) leads to a contradiction. Assume c).

We will write V̄ for the complement of a subset V of S. Clearly, for all t ∈ V̄_{n−1}(s) and all (s_1,s_2) ∈ V_n,

(2.8) min{ℙ_t^{f_{n−1},h_{n−1}}(S_1 = s_1), ℙ_s^{f'_{n−1},h'_{n−1}}(S_1 = s_2)} = 0,

since otherwise (t,s) ∈ V_{n−1} and thus t ∈ V_{n−1}(s).

Specifying (2.8) for (s_1,s_2) = (s_1,s) and using ℙ_s^{f'_{n−1},h'_{n−1}}(S_1 = s) ≥ α > 0, we see that

ℙ_t^{f_{n−1},h_{n−1}}(S_1 = s_1) = 0 for all s_1 ∈ V_n(s) and t ∈ V̄_{n−1}(s).

This implies that under the pair of stationary policies f_{n−1},h_{n−1} the set V̄_n(s) = V̄_{n−1}(s) is closed, and hence contains a recurrent subchain. Thus, by the unichain assumption, there must exist for any pair of pure policies f and h a state s* ∈ V_{n−1}(s) and a state t ∈ V̄_{n−1}(s) such that ℙ_{s*}^{f,h}(S_1 = t) > 0. Otherwise it would be possible to construct from f_{n−1},h_{n−1} and f,h a third pair of policies with at least two different recurrent subchains.

So we see that there exists a state s_{N−2} ∈ V_{n−1}(s) and a state t_{N−2} ∈ V̄_{n−1}(s) such that ℙ_{s_{N−2}}^{f'_{N−2},h'_{N−2}}(S_1 = t_{N−2}) > 0, hence (t_{N−2},s_{N−2}) ∈ V_{N−2}. But then again there must be a state s_{N−3} ∈ V_{n−1}(s)\{s_{N−2}} and a state t ∈ V̄_{n−1}(s) ∪ {s_{N−2}} such that ℙ_{s_{N−3}}^{f'_{N−3},h'_{N−3}}(S_1 = t) > 0, and also a state t_{N−3} ∈ V̄_{n−1}(s) such that (t_{N−3},s_{N−3}) ∈ V_{N−3}. Continuing to reason in this way, one shows that there are at least N − n different elements s_1 ∈ V_{n−1}(s) satisfying (t,s_1) ∈ V_{n−1} for some t ∈ V̄_{n−1}(s). Hence, since |V_{n−1}(s)| = N − n and s ∈ V_{n−1}(s), also (t,s) ∈ V_{n−1} for some t ∈ V̄_{n−1}(s). But this would imply t ∈ V_{n−1}(s). Contradiction. So we conclude |V_{n−1}(s)| ≥ N − n + 1. Hence V_0(s) = S for all s ∈ S, and therefore V_0 = S².

Since S, K_i and L_i are all finite, there must exist a γ > 0 such that for all pure Markov strategies π,ρ,π',ρ' and all i,j ∈ S

(2.9) Σ_{k∈S} min{ℙ_i^{π,ρ}(S_{N−1} = k), ℙ_j^{π',ρ'}(S_{N−1} = k)} ≥ γ.

Moreover, it is fairly obvious that the minimum of the left-hand side of (2.9) within the set of all Markov strategies equals the minimum within the set of pure Markov strategies. And this completes the proof. □

So we see that if the unichain assumption is satisfied and (1.2) holds as well (for example as a result of Schweitzer's transformation), then the method of successive approximations yields an ε-band for the value of the game as well as stationary ε-optimal strategies. Moreover, the convergence is exponentially fast.

3. The Functional Equation Uv = v + g·e has a Solution

In this section we assume that the functional equation Uv = v + g·e has a solution v*,g*. Moreover we assume throughout this section that (1.2) holds, i.e. P(f,h) − αI ≥ 0 for some α > 0 and all f and h. We will show that in this case the method of successive approximations again yields an ε-band for the value g* and stationary ε-optimal strategies.

First we show that if we take an arbitrary scrap vector v ∈ ℝ^N, then the sequences {U^n v − ng*e} and {U^{n+1}v − U^n v} are bounded. Hence they both have convergent subsequences with limits, say ṽ and g̃. Then, essentially using (1.2), we show g̃ = g*·e. And finally we conclude that the above sequences are convergent.

Choose v_0 ∈ ℝ^N arbitrarily and define v_{n+1} := Uv_n and g_n := v_{n+1} − v_n, n = 0,1,....

As in the proof of theorem 1 one easily argues

(3.1) ∇g_0 ≤ ∇g_n ≤ ∇g_{n+1} ≤ Δg_{n+1} ≤ Δg_n ≤ Δg_0 for all n.

As a consequence the sequence {g_n}_{n∈ℕ} contains a convergent subsequence. We also need the following inequalities, formulated in

Lemma 3.

ng*e + v* + ∇(v_0 − v*)·e ≤ v_n ≤ ng*e + v* + Δ(v_0 − v*)·e,

where g*,v* is the solution of (1.1).

Proof. v_0 ≥ v* + ∇(v_0 − v*)·e, hence

v_n = U^n v_0 ≥ U^n(v* + ∇(v_0 − v*)·e) = ng*e + v* + ∇(v_0 − v*)·e.

Similarly one proves the second inequality. □

So any subsequence of {v_n − ng*e}_{n∈ℕ} also has a convergent subsequence. Now let ṽ be a limit point of the sequence {v_n − ng*e}_{n∈ℕ} and let {v_{n_k} − n_k g*e} be a subsequence converging to ṽ. Let further g̃ ∈ ℝ^N be a limit point of the sequence {g_{n_k}}_{k∈ℕ} and let {g_{n_ℓ}}_{ℓ∈ℕ} be a subsequence of {g_{n_k}}_{k∈ℕ} converging to g̃.

Lemma 4. Uṽ = ṽ + g̃.

Proof. U(v_{n_ℓ} − n_ℓ g*e) = (v_{n_ℓ} − n_ℓ g*e) + g_{n_ℓ}. Taking the limit for ℓ → ∞, and using the finiteness of S, K_i and L_i to interchange maxima, minima and limits in U(v_{n_ℓ} − n_ℓ g*e), we get Uṽ = ṽ + g̃. □

Now we must show g̃ = g*·e. For this we need some additional lemmas.

Lemma 5. For all ε > 0 and all M ∈ ℕ there exists an n > M such that

‖U^n ṽ − ṽ − ng*e‖ < ε.

Proof. Let L be sufficiently large such that ‖v_{n_ℓ} − n_ℓ g*e − ṽ‖ < ε/2 for ℓ ≥ L. Then with n = n_{ℓ+k} − n_ℓ, ℓ ≥ L, and k such that n > M,

‖U^n ṽ − ṽ − ng*e‖ ≤ ‖U^n ṽ − U^n(v_{n_ℓ} − n_ℓ g*e)‖ + ‖v_{n_{ℓ+k}} − n_{ℓ+k} g*e − ṽ‖ < ε,

where we used ‖U^n v − U^n w‖ ≤ ‖v − w‖ for all n,v,w. □

Lemma 6. For all ε > 0 and all M ∈ ℕ there exists an n > M such that

‖U^{n+1}ṽ − U^n ṽ − g̃‖ ≤ 2ε.

Proof. Choose n as in lemma 5. Then, since U(ṽ + ng*e) = Uṽ + ng*e = ṽ + g̃ + ng*e by lemma 4,

‖U^{n+1}ṽ − U^n ṽ − g̃‖ = ‖U(U^n ṽ) − U(ṽ + ng*e) + ṽ + ng*e − U^n ṽ‖ ≤ 2‖U^n ṽ − ṽ − ng*e‖ ≤ 2ε. □

Lemma 7. Δ(U^{k+1}ṽ − U^k ṽ) = Δg̃ for all k = 0,1,....

Proof. Obviously

Δ(U^{k+1}ṽ − U^k ṽ) ≤ Δ(Uṽ − ṽ) = Δg̃,

so it remains to show Δ(U^{k+1}ṽ − U^k ṽ) ≥ Δg̃. By lemma 6 there exists for every ε > 0 an n > k with

‖U^{n+1}ṽ − U^n ṽ − g̃‖ ≤ 2ε,

hence with

Δ(U^{n+1}ṽ − U^n ṽ) ≤ Δ(U^{k+1}ṽ − U^k ṽ)

we have

Δ(U^{k+1}ṽ − U^k ṽ) ≥ Δg̃ − 2ε

for all ε > 0, so Δ(U^{k+1}ṽ − U^k ṽ) ≥ Δg̃, which completes the proof. □

This lemma is especially important since we may combine it with lemmas 8 and 9 below.

Define ΔI_n := {i | g_n(i) = Δg_n}.

Lemma 8. If Δg_{n+1} = Δg_n then ΔI_{n+1} ⊂ ΔI_n.

Proof. Let f_{n+1} and h_n satisfy L(f_{n+1},h)v_n ≥ v_{n+1} and L(f,h_n)v_{n−1} ≤ v_n for all f and h. Since P(f_{n+1},h_n) = (1 − α)Q(f_{n+1},h_n) + αI, where Q(f_{n+1},h_n) is a stochastic matrix as well, we have (cf. (2.1))

g_{n+1} ≤ P(f_{n+1},h_n)g_n = (1 − α)Q(f_{n+1},h_n)g_n + αg_n,

hence for i ∈ ΔI_{n+1}

Δg_n = Δg_{n+1} = g_{n+1}(i) ≤ (1 − α)Δg_n + αg_n(i).

So with Δg_{n+1} = Δg_n we get g_n(i) = Δg_n, which implies i ∈ ΔI_n. □

Lemma 9. If Δg_n = Δg_N for all n ≥ N, then Δg_N = g*.

Proof. From ΔI_{n+1} ⊂ ΔI_n and ΔI_n ≠ ∅ we see that there must exist an i ∈ ∩_{n≥N} ΔI_n. For this i we have

v_n(i) = v_N(i) + (n − N)·Δg_N, n ≥ N.

With lemma 3 this implies Δg_N = g*. □

Now we are able to prove the main result of this section.

Theorem 3. g̃ = g*·e.

Proof. Define ṽ_0 := ṽ. Then we obtain, with lemma 6 and the reasoning of lemmas 8 and 9, Δg̃ = g*. Adapting lemmas 7, 8 and 9 one may also show ∇g̃ = g*. Hence g̃ = g*·e. □

Since the limit point g̃ is a constant vector, g̃ is also the limit of the sequence g_n (because of the monotonicity of Δg_n and ∇g_n). And ṽ is also the limit of the sequence {v_n − ng*e}, as follows from

Theorem 4. v_n = ṽ + ng*e + o(1) (n → ∞).

Proof. Obviously, since g̃ = g*e, we have U^p ṽ − ṽ = pg*e for all p ∈ ℕ. Now fix ε > 0. Let k satisfy ‖v_{n_k} − n_k g*e − ṽ‖ < ε. Then for all n > n_k, writing p = n − n_k,

‖v_n − ng*e − ṽ‖ = ‖U^p v_{n_k} − pg*e − n_k g*e − ṽ‖
 ≤ ‖U^p v_{n_k} − U^p(ṽ + n_k g*e)‖ + ‖U^p ṽ − ṽ − pg*e‖
 ≤ ‖v_{n_k} − n_k g*e − ṽ‖ < ε.

Hence for all ε > 0 there exists a K such that for all n > K: ‖v_n − ng*e − ṽ‖ < ε. This completes our proof. □

So we see that if the functional equation (1.1) has a solution, then the method of successive approximations yields an ε-band for this value and stationary ε-optimal strategies for both players. Probably the convergence is again exponentially fast, but we have not proved this yet.

References

[1] D. Blackwell and T.S. Ferguson, The big match, Ann. Math. Statist. 39 (1968), pp. 159-163.

[2] C. Derman and R. Strauch, A note on memoryless rules for controlling sequential control processes, Ann. Math. Statist. 37 (1966), pp. 276-278.

[3] A. Federgruen, On N-person stochastic games with denumerable state space, Mathematical Centre Report BW 67/76, Mathematical Centre, Amsterdam, 1976.

[4] D. Gillette, Stochastic games with zero stop probabilities, in: Contributions to the Theory of Games, Vol. III, M. Dresher, A.W. Tucker, P. Wolfe, eds., Princeton University Press, Princeton, New Jersey, pp. 179-187.

[5] A. Hoffman and R. Karp, On non-terminating stochastic games, Management Science 12 (1966), pp. 359-370.

[6] P.D. Rogers, Non-zero sum stochastic games, Ph.D. thesis, University of California, Berkeley, 1969.

[7] P.J. Schweitzer, Iterative solution of the functional equations of undiscounted Markov renewal programming, J. Math. Anal. Appl. 34 (1971), pp. 495-501.

[8] L.S. Shapley, Stochastic games, Proc. Nat. Acad. Sci. USA 39 (1953), pp. 1095-1100.

[9] M. Sobel, Noncooperative stochastic games, Ann. Math. Statist. 42 (1971), pp. 1930-1935.

[10] M.A. Stern, On stochastic games with limiting average payoff, Ph.D. thesis, University of Illinois, Chicago, 1975.

[11] L.E. Zachrisson, Markov games, in: Advances in Game Theory, M. Dresher, L.S. Shapley, A.W. Tucker, eds., Princeton University Press.
