
EINDHOVEN UNIVERSITY OF TECHNOLOGY

Department of Mathematics

PROBABILITY THEORY, STATISTICS AND OPERATIONS RESEARCH GROUP

Stochastic games with metric state space

by

H.A.M. Couwenbergh

Memorandum COSOR 78-05

Eindhoven, February 1978

H.A.M. Couwenbergh

Abstract

In this paper the stochastic two-person zero-sum game of Shapley is considered, with metric state space and compact action spaces. It is proved that both players have stationary optimal strategies, under conditions which are weaker than those of Maitra and Parthasarathy (among others, no compactness of the state space). This is done in the following way: we show the existence of optimal strategies first for the one-period game with general terminal reward, then for the n-period games (n = 1,2,...); further we prove that the game over the infinite horizon has a value v, which is the limit of the n-period game values. Finally the stationary optimal strategies are found as optimal strategies in the one-period game with terminal reward v.

1. Introduction

The stochastic games we consider are non-cooperative two-person zero-sum games with discrete time parameter (originally introduced by Shapley, ref. [9]). This means that a system is given with a set of states S, and two so-called action spaces: A for player I, B for player II. The system is started at time t = 0 in a state s ∈ S; both players choose an action: a ∈ A, b ∈ B. As a consequence of these actions, player I receives a "reward" r(s,a,b) (this may be negative) from II, and the system moves to a new state s' according to a (sub-) probability measure p(·|s,a,b) on S (in case p(·|s,a,b) is defective, the game has a positive stopping probability). Then this process is repeated at time t = 1 from the new starting state s', and so forth. The reward at time t is multiplied by β^t, where β > 0 is called the discount factor. The object of player I is to maximize the total expected discounted reward (over the infinite horizon), the object of player II is to minimize this same amount. Sometimes these opposite ambitions can be met simultaneously, namely if there exists an optimal pair of strategies; then for each player it is not profitable to deviate from his optimal strategy. In this case the game also has a "value" (in fact a real function on S, see section 2 for a definition). In the literature several sets of conditions have been given for the existence of "stationary" optimal strategies. We mention results of Vrieze [12] and Wessels [13] in case of a countable state space, and of Maitra and Parthasarathy [5] and Parthasarathy [8] when S is a compact metric space.
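To make the protocol just described concrete, here is a minimal simulation sketch for a finite analogue of the model (two states, two actions per player). It is an illustration only: the arrays r and p, the discount factor and the helper play are our assumptions, not data or notation from the memorandum; a defective row of p is read as a positive stopping probability, exactly as above.

import numpy as np

rng = np.random.default_rng(0)

# Hypothetical finite instance: r[s, a, b] is the payoff from player II to player I,
# p[s, a, b, s'] is a sub-probability transition law (rows may sum to less than 1;
# the missing mass is the probability that the play stops).
r = np.array([[[1.0, -2.0], [0.0, 3.0]],
              [[-1.0, 0.5], [2.0, -0.5]]])
p = np.array([[[[0.4, 0.5], [0.6, 0.3]],
               [[0.2, 0.7], [0.5, 0.4]]],
              [[[0.8, 0.1], [0.3, 0.6]],
               [[0.1, 0.8], [0.45, 0.45]]]])
beta = 0.9  # discount factor

def play(s, f, g, horizon=500):
    # One play under stationary strategies f, g (f[s] and g[s] are probability
    # vectors over the action sets); returns the realised sum_t beta^t r(X_t, K_t, L_t).
    total, discount = 0.0, 1.0
    for _ in range(horizon):
        a = rng.choice(2, p=f[s])
        b = rng.choice(2, p=g[s])
        total += discount * r[s, a, b]
        q = p[s, a, b]
        s_next = rng.choice(3, p=np.append(q, 1.0 - q.sum()))  # index 2 = stopped
        if s_next == 2:
            break
        s, discount = s_next, discount * beta
    return total

f = np.full((2, 2), 0.5)  # both players randomize uniformly in every state
g = np.full((2, 2), 0.5)
print(np.mean([play(0, f, g) for _ in range(2000)]))  # estimates v(0, f, g)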

The theorems presented in this paper are obtained by combining and generalizing the bounding function method of Wessels, and Maitra and Parthasarathy's approach for a compact state space. The first theorem (section 3) makes use of the continuity on S of the reward function r and the transition law p, and produces a continuous value function; the second one (section 4) uses measurability of r and p on S, causing the value function to be measurable. In the proofs of both results we do not need a contraction theorem in order to find the value. Section 5 contains an extension and some remarks; we start in section 2 with a formal model and preparations.

2. Model and prerequisites

First we have to give a definition of (sub-) transition probability. Let X and Y be metric spaces; denote by 𝔅(Y) the Borel σ-algebra on Y. The map q : 𝔅(Y) × X → [0,1] (closed unit interval) is called a (sub-) transition probability (abbreviation: (sub-) trpr) X → Y if q(·|x) is a measure on Y with q(Y|x) ≤ 1 for all x ∈ X, and q(Y_0|·) is a measurable function on X for all Y_0 ∈ 𝔅(Y) (by measurability we shall always mean Borel measurability). If, moreover, q(Y|x) = 1, x ∈ X, then q is called a (nondefective) trpr.

In this section we require that the state space S and the action spaces A and B are nonempty metric spaces, that the transition law p is a (sub-) trpr S × A × B → S, and that the reward r is a measurable map S × A × B → ℝ (set of reals); β is a positive number. Define for t ≥ 1, H_t := S × A × B × S × ··· × B (t times S × A × B); an element h_t = (s_0,a_0,...,b_{t-1}) of H_t is called a history. Let F be the set of all trpr's f : S → A, and G the set of all trpr's g : S → B; for t ≥ 1, Π_t(A) is the set of all trpr's π_t : H_t × S → A. Then Π(A) := F × Π_1(A) × Π_2(A) × ··· defines the set of general strategies π for player I. A strategy π is called Markov if each π_t is independent of history, so π = (f_0,f_1,...) with f_t ∈ F, t ≥ 0. The set of Markov strategies is denoted by R(A). If π = (f,f,...) for some f ∈ F, then π is said to be stationary; we write π = f^∞. Similarly for player II the set Π(B) of general strategies, the set R(B) of Markov strategies, and the stationary strategies g^∞ are defined. Take Π := Π(A) × Π(B), R := R(A) × R(B).

Let s_0 ∈ S and (π,ρ) ∈ Π be given. According to a theorem of Ionescu Tulcea (see Neveu [6], Prop. V.1.1; in order to apply this result p has to be nondefective, which may be accomplished by extending S with an extra state *; take p({*}|s,a,b) := 1 - p(S|s,a,b) and extend all real-valued functions h on S by h(*) := 0) this starting state and pair of strategies determine a (sub-) probability measure P_{s_0,π,ρ} with the following property. If E = S_0 × A_0 × ··· × B_t is a measurable rectangle in H_{t+1} then

P_{s_0,π,ρ}(E) = 1_{S_0}(s_0) ∫_{A_0} π_0(da_0|s_0) ∫_{B_0} ρ_0(db_0|s_0) ∫_{S_1} p(ds_1|s_0,a_0,b_0) ··· ∫_{A_t} π_t(da_t|h_t,s_t) ∫_{B_t} ρ_t(db_t|h_t,s_t),

and this expression is measurable in s_0. This implies that P_{π,ρ} is a (sub-) trpr S → H_{t+1} for all t ≥ 0. We introduce the random variables X_t, K_t and L_t (t ≥ 0): these are respectively the state of the system, player I's action and player II's action, all at time t.

Define for t ≥ 0, s ∈ S and (π,ρ) ∈ Π

r_t^+(s,π,ρ) := E_{s,π,ρ} r^+(X_t,K_t,L_t)   and   r_t^-(s,π,ρ) := E_{s,π,ρ} r^-(X_t,K_t,L_t)

(E_{s,π,ρ} denotes expectation w.r.t. P_{s,π,ρ}; if a ∈ ℝ then a^+ = max{0,a}, a^- = max{0,-a}). These expectations exist, but may equal infinity. From now on we assume that for all s, π and ρ

Σ_{t=0}^∞ β^t r_t^+(s,π,ρ) < ∞   or   Σ_{t=0}^∞ β^t r_t^-(s,π,ρ) < ∞,

so that r_t(s,π,ρ) := E_{s,π,ρ} r(X_t,K_t,L_t) (expected reward at time t),

v_n(s,π,ρ) := Σ_{t=0}^{n-1} β^t r_t(s,π,ρ)   (n-period total expected reward, n = 1,2,...)

and

v(s,π,ρ) := lim_{n→∞} v_n(s,π,ρ)   (total expected reward)

are well defined. Since P_{π,ρ} is a (sub-) trpr, r_t(s,π,ρ) is measurable in s (the integral of a quasi-integrable measurable function with a trpr is measurable in the remaining variables, see e.g. Neveu [6], Prop. III.2.1); so v_n(s,π,ρ) and v(s,π,ρ) are measurable too.
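For a finite analogue these definitions can be evaluated in closed form. The sketch below is our illustration only (the arrays and names are hypothetical, not from the memorandum): under a pair of stationary strategies the expected reward at time t is P_{f,g}^t r_{f,g}, where r_{f,g} and P_{f,g} are the reward vector and (sub-stochastic) transition matrix obtained by averaging over the two mixed actions.

import numpy as np

# Hypothetical finite instance (2 states, 2 x 2 actions).
r = np.array([[[1.0, -2.0], [0.0, 3.0]],
              [[-1.0, 0.5], [2.0, -0.5]]])          # r[s, a, b]
p = np.array([[[[0.4, 0.5], [0.6, 0.3]],
               [[0.2, 0.7], [0.5, 0.4]]],
              [[[0.8, 0.1], [0.3, 0.6]],
               [[0.1, 0.8], [0.45, 0.45]]]])        # p[s, a, b, s'], sub-stochastic
beta = 0.9
f = np.full((2, 2), 0.5)                            # stationary strategy of player I
g = np.full((2, 2), 0.5)                            # stationary strategy of player II

# One-step averages under (f, g):
#   r_fg(s)     = sum_{a,b} f(a|s) g(b|s) r(s,a,b)
#   P_fg(s, s') = sum_{a,b} f(a|s) g(b|s) p(s'|s,a,b)
r_fg = np.einsum('sa,sb,sab->s', f, g, r)
P_fg = np.einsum('sa,sb,sabt->st', f, g, p)

def v_n(n):
    # v_n(., f^inf, g^inf) = sum_{t=0}^{n-1} beta^t r_t, with r_t = P_fg^t r_fg
    total, r_t = np.zeros(2), r_fg.copy()
    for t in range(n):
        total += beta**t * r_t
        r_t = P_fg @ r_t
    return total

v = np.linalg.solve(np.eye(2) - beta * P_fg, r_fg)  # v(., f^inf, g^inf) = lim v_n
print(v_n(50), v)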

Note: in the remainder, if we omit the index for the state space in some formula, the function on S is meant; except in some obvious cases, all statements concerning such functions are supposed to hold pointwise. Define for n ∈ ℕ (set of positive integers) and s ∈ S

v_n^L(s) := sup_{π∈Π(A)} inf_{ρ∈Π(B)} v_n(s,π,ρ),   v_n^H(s) := inf_{ρ∈Π(B)} sup_{π∈Π(A)} v_n(s,π,ρ)

and

v^L(s) := sup_{π∈Π(A)} inf_{ρ∈Π(B)} v(s,π,ρ),   v^H(s) := inf_{ρ∈Π(B)} sup_{π∈Π(A)} v(s,π,ρ).

It is easily checked that always v^L ≤ v^H and v_n^L ≤ v_n^H. If v^L = v^H, the game is said to have a value v, where v = v^L (similarly for the n-period game values v_n). If there exists a pair (π*,ρ*) ∈ Π with v(π,ρ*) ≤ v(π*,ρ*) ≤ v(π*,ρ) for all π and ρ (this implies v^L = v^H), the strategies π* and ρ* are called optimal. When player I uses as criterion ψ(π) := inf_ρ v(π,ρ) (minimal total expected reward) then evidently π* maximizes ψ, while the criterion sup_π v(π,ρ) for II is minimized by ρ*. So in this respect π* and ρ* are really optimal.

In the following sections we shall prove that a pair of stationary optimal strategies exists; in order to achieve this, we need some more definitions and auxiliary results.

Let X be a metric space. If q is a measure on X and f is a quasi-integrable measurable map X → ℝ, we define qf := ∫_X q(dx)f(x). By P_X the set of all probability measures on the Borel subsets of X is denoted. On P_X the weak topology may be constructed (see also below), which has the following properties (see e.g. Blackwell, Freedman and Orkin [1]).

Lemma 2.1. (Without proof). Let X be a metric space.

i) (Definition) q_n → q in the weak topology on P_X iff q_n f → qf for all bounded continuous f : X → ℝ.

ii) Let X be separable. Then P_X is separable metric, and the Borel σ-algebra on P_X is the σ-algebra generated by the weak topology. This σ-algebra is also generated by the functions q → qf for all bounded continuous f, or by all indicator functions f of an arbitrary class of subsets of X generating the Borel σ-algebra on X.

iii) If X is a compact metric space, then P_X is also compact metric.

If Y is a separable metric space, then a trpr q : X → Y may be regarded as a Borel map X → P_Y (since q(·|x) ∈ P_Y, x ∈ X; Borel measurability from 2.1 ii)), and conversely.

We introduce a generalization of the weak topology on P_S. Let μ be a measurable map S → (0,∞). Define V_μ := {u : S → ℝ | |u(s)|/μ(s) is bounded}, M_μ := {u ∈ V_μ | u is measurable}, C_μ := {u ∈ M_μ | u is continuous}, and Π_μ := {q | q is a measure on the Borel subsets of S and qμ < ∞}. We define the weak μ-topology on Π_μ as the smallest topology which makes the maps q → qf (q ∈ Π_μ) continuous for every f ∈ C_μ. So q_n → q in the weak μ-topology if and only if q_n f → qf (n → ∞) for all f ∈ C_μ. The strong μ-topology on Π_μ is defined as the smallest topology in which the maps q → qf are continuous for all f ∈ M_μ.

Lemma 2.2. Let X and Y be metric spaces. Assume Y is compact, and t is a continuous map X × Y → ℝ. Then t is continuous in x uniformly in y.

Proof. Straightforward. □

Lemma 2.3. Let X and Y be metric spaces. Let Γ be a compact-valued function from X to Y (i.e. for all x, Γ(x) is a compact subset of Y). Suppose Γ is continuous (that is: {x | Γ(x) ∩ G' ≠ ∅} and {x | Γ(x) ⊂ G'} are open in X whenever G' is open) and t : X × Y → ℝ is continuous. Then max_{y∈Γ(x)} t(x,y) is a continuous function of x.

Proof. This lemma is Lemma 3.1 of Parthasarathy [8], see there. □

3. Existence of optimal strategies, first case: continuity on S

In order to prove the existence of stationary optimal strategies, we proceed in the following way: first we prove that the one-period game with a terminal reward which is continuous and bounded with respect to

the bounding function μ on S, has optimal one-step strategies; with this we construct optimal strategies for the n-period games; finally the

stationary optimal strategies of the game over an infinite horizon are found, again by using our one-period game knowledge.

We state the main theorem first, and prove it via a series of lemmas. Consider the game model determined by S, A, B, p, r, and β, and assume that a function μ with corresponding sets V_μ, M_μ, C_μ and Π_μ is given (see the previous section).

Theorem 3.1. Make the following conditions:

a) S, A and B are metric spaces, A and B compact;

b) μ is continuous on S;

c) r is continuous on S × A × B;

d) p is continuous on S × A × B with respect to the weak μ-topology on Π_μ;

e) for r̄, defined by r̄(s) := inf_{γ∈P_B} sup_{φ∈P_A} ∫_A ∫_B r(s,a,b)φ(da)γ(db), it holds that r̄ ∈ V_μ;

f) there exists a β' > 0 such that for all s, a and b: ∫_S p(ds'|s,a,b)μ(s') ≤ β'μ(s);

g) for all (π,ρ) ∈ Π and s ∈ S: Σ_{t=0}^∞ β^t r_t^+(s,π,ρ) < ∞ or Σ_{t=0}^∞ β^t r_t^-(s,π,ρ) < ∞;

h) Σ_{t=N}^∞ β^t |r_t(s,π,ρ)|/μ(s) → 0 (N → ∞) uniformly on S × Π.

Then the game has a value v with v = lim_{n→∞} v_n and v ∈ C_μ; moreover, there exist stationary optimal strategies.

From the conditions of this theorem it follows that p is a (sub-) transition probability:

Lemma 3.2. Suppose a, b, d and f of theorem 3.1 hold; then p is a (sub-) trpr S × A × B → S.

Proof. Define for all s, a and b, q(·|s,a,b) ∈ Π_e (e is the unit function on S, e(s) = 1) by

q(Z|s,a,b) := [β'μ(s)]^{-1} ∫_Z p(ds'|s,a,b)μ(s')   (Z Borel in S).

If m is a nonnegative measurable function on S then

∫_S p(ds'|s,a,b)m(s')μ(s') = β'μ(s) ∫_S q(ds'|s,a,b)m(s'),

so we may express p in q:

p(Z|s,a,b) = β'μ(s) ∫_Z q(ds'|s,a,b)/μ(s').

This implies that if q is a (sub-) trpr, then p is one also. Take 𝒜 := {K = O ∩ C | O open in S, C closed}; 𝒜 is a Boole semi-algebra which generates the Borel σ-algebra on S, so in order to prove that q is a (sub-) trpr, it suffices to prove that q(K|s,a,b) is measurable on S × A × B for every K ∈ 𝒜 (see Neveu [6], p. 74).

Let K = O ∩ C, O open and C closed (* denotes the complement w.r.t. S). Let d denote the metric distance on S; define G_n := {s ∈ S | d(s,C) < n^{-1}} (here d(s,C) = inf{d(s,s') | s' ∈ C}) and H_n := {s ∈ S | d(s,O*) < n^{-1}}, n ∈ ℕ. According to K.R. Parthasarathy [7], Theorem 1.6, there exists for all n a uniformly continuous map f_n : S → [0,1] such that f_n(s) = 1 if s ∈ C, 0 if s ∈ G_n*; similarly we have g_n : S → [0,1] uniformly continuous with g_n(s) = 1 if s ∈ O*, 0 if s ∈ H_n*. Let g_n' := e - g_n; then [f_n g_n' μ] ∈ C_μ, so

∫_S p(ds'|s,a,b) f_n(s') g_n'(s') μ(s')

is continuous on S × A × B. Since

q(K|s,a,b) = lim_{n→∞} [β'μ(s)]^{-1} ∫_S p(ds'|s,a,b) f_n(s') g_n'(s') μ(s'),

q(K|s,a,b) is measurable on S × A × B, q.e.d. □
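For concreteness, one explicit choice of the separating functions (ours, not necessarily the one behind [7], Theorem 1.6) is f_n(s) := max(0, 1 - n·d(s,C)) and g_n(s) := max(0, 1 - n·d(s,O*)). Both are Lipschitz with constant n, hence uniformly continuous; f_n equals 1 on C and vanishes outside G_n, g_n equals 1 on O* and vanishes outside H_n. With this choice f_n g_n' converges pointwise to the indicator function of K = O ∩ C, so that ∫_S p(ds'|s,a,b) f_n(s') g_n'(s') μ(s') → ∫_K p(ds'|s,a,b)μ(s') by dominated convergence (the integrand is dominated by μ, which is p-integrable by condition f).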

We use the following result on measurable selections.

Lemma 3.3. Let X and Y be metric spaces, Y compact; let for all x ∈ X, D(x) ⊂ Y be nonempty such that K := {(x,y) | x ∈ X, y ∈ D(x)} is closed. Suppose w : K → ℝ is continuous; then there exists a measurable map f : X → Y with f(x) ∈ D(x) and w(x,f(x)) = sup_{y∈D(x)} w(x,y) (x ∈ X).

Proof. If X were also complete and separable, lemma 3.3 would be a simple application of Theorem 2.1 in Parthasarathy [8]. However, one may check that these two conditions are not necessary for the proof of Parthasarathy's theorem. □

For convenience we introduce an operator notation. Let Q denote integration with respect to p(·|s,a,b), and C(f,g) integration w.r.t. f(·|s) and g(·|s), if f ∈ F and g ∈ G.

Define for all π ∈ Π(A) and h_1 ∈ H_1 a strategy π[h_1] ∈ Π(A) by π[h_1] := (π_1(·|h_1,·), π_2(·|h_1,·), ...) (if π = (f_0,f_1,...) ∈ R(A), this gives π[h_1] = (f_1,f_2,...)). Using a similar notation for player II we find by means of the Ionescu Tulcea integral evaluation (see section 2)

r_{t+1}(π,ρ) = C(π_0,ρ_0) Q r_t(π[·],ρ[·])   (t ≥ 0, (π,ρ) ∈ Π).

From this follows

v_{n+1}(π,ρ) = C(π_0,ρ_0)[r + βQ v_n(π[·],ρ[·])]   (n = 1,2,..., (π,ρ) ∈ Π).

First we consider the one-period game with certain terminal reward w; if the players choose f ∈ F and g ∈ G then the total expected reward in this game is given by L(f,g)w := C(f,g)[r + βQw] (here w has to be a measurable map from S to the extended reals which is bounded from below or above by some element of V_μ). We define the "optimization" operator U by (Uw)(s) := inf_g sup_f (L(f,g)w)(s) (s ∈ S). The next result shows that Uw is indeed the value of this game if w ∈ C_μ; then also a pair of optimal one-step strategies (f*,g*) exists, with Uw = L(f*,g*)w ∈ C_μ.

Lemma 3.4. Let w ∈ C_μ. Under the assumptions a through f of theorem 3.1 there exist f* ∈ F and g* ∈ G such that L(f*,g*)w ∈ C_μ, and for all f and g

L(f,g*)w ≤ L(f*,g*)w ≤ L(f*,g)w.

Proof. Define l_w(s,a,b) := r(s,a,b) + β ∫_S p(ds'|s,a,b)w(s'). On account of c and d, l_w is continuous. Define on S × P_A × P_B

K_w(s,φ,γ) := ∫_A ∫_B l_w(s,a,b)φ(da)γ(db);

then for all f ∈ F, g ∈ G we have (L(f,g)w)(s) = K_w(s,f(s),g(s)) (here f and g are considered as Borel maps S → P_A and S → P_B respectively). K_w is continuous in (s,φ,γ) (reasoning almost as in Maitra and Parthasarathy [5], Lemma 2.1; use our lemma 2.2). For fixed s ∈ S we may apply the minimax theorem of Sion ([10], Theorem 3.4) on K_w(s,·,·) and find

sup_φ inf_γ K_w(s,φ,γ) = inf_γ sup_φ K_w(s,φ,γ).

Define for s ∈ S, γ ∈ P_B: t(s,γ) := max_φ K_w(s,φ,γ). According to lemma 2.3, t is continuous on S × P_B. By virtue of lemma 3.3 there exists a measurable map g* : S → P_B such that t(s,g*(s)) = inf_γ t(s,γ) (s ∈ S). So now we have

inf_γ sup_φ K_w(s,φ,γ) = max_φ K_w(s,φ,g*(s)),

both expressions continuous on S (apply lemma 2.3 on t). Similarly a Borel map f* : S → P_A is found, such that sup_φ inf_γ K_w(s,φ,γ) = min_γ K_w(s,f*(s),γ).

When we combine these results and state them in terms of L(f,g)w, we see that L(f*,g*)w is continuous on S and that L(f,g*)w ≤ L(f*,g*)w ≤ L(f*,g)w (f ∈ F, g ∈ G). Finally (L(f*,g*)w)(s) = inf_γ sup_φ K_w(s,φ,γ) ≤ r̄(s) + ββ'M'μ(s) on account of conditions e and f (M' is a bound for |w(s)|/μ(s)); analogously L(f*,g*)w ≥ r̄ - ββ'M'μ, so L(f*,g*)w ∈ V_μ. □

Now we show the existence of optimal Markov strategies in the n-period games.

Lemma 3.5. For all n ∈ ℕ there exist n-step Markov strategies π*_{(n)} and ρ*_{(n)} such that v_n := v_n(π*_{(n)},ρ*_{(n)}) ∈ C_μ and

v_n(π,ρ*_{(n)}) ≤ v_n ≤ v_n(π*_{(n)},ρ)   for all (π,ρ) ∈ Π.

Proof. With induction. Let n = 1; we have v_1(π,ρ) = L(π_0,ρ_0)0, so the desired one-step optimal strategies are found by taking w = 0 in lemma 3.4. Suppose the statement to prove holds for some n. Applying lemma 3.4 with w = v_n we find f_n* and g_n* such that L(f_n*,g_n*)v_n has the properties mentioned there. Take π*_{(n+1)} := f_n* π*_{(n)} and ρ*_{(n+1)} := g_n* ρ*_{(n)}; let ρ ∈ Π(B), then

v_{n+1}(π*_{(n+1)},ρ) = C(f_n*,ρ_0)[r + βQ v_n(π*_{(n)},ρ[·])] ≥ L(f_n*,ρ_0)v_n ≥ L(f_n*,g_n*)v_n = v_{n+1}(π*_{(n+1)},ρ*_{(n+1)}).

The rest of the proof is evident. □

From lemma 3.5 and the conditions g and h we deduce the existence of the game value v and some properties of it.

Theorem 3.6. The game has a value v, v = lim_{n→∞} v_n, v ∈ C_μ and Uv = v.

Proof. Let ε > 0. If n is large enough, we have by condition h, for all s ∈ S,

v^H(s) ≤ sup_π v(s,π,ρ) ≤ sup_π [v_n(s,π,ρ) + εμ(s)],

where ρ is arbitrary in Π(B); taking ρ = ρ*_{(n)} and using lemma 3.5 gives v^H ≤ v_n + εμ. Analogously, for the same n, v^L ≥ v_n - εμ. So v := v^L = v^H = lim_{n→∞} v_n. Obviously v ∈ C_μ; it remains to prove Uv = v. Take N so large that |v - v_N| ≤ εμ; now

Uv ≤ inf_g sup_f C(f,g)[r + βQ(v_N + εμ)] ≤ Uv_N + εββ'μ ≤ v + εμ(1 + ββ')

since Uv_N = v_{N+1}. Similarly Uv ≥ v - εμ(1 + ββ'), q.e.d. (Note that U is not necessarily a contraction, in contrast with a similar operator used in Maitra and Parthasarathy [5].) □
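A simple example of the phenomenon mentioned in parentheses (our illustration, not from the memorandum): take S = ℕ, one action for each player, the deterministic transition s → s+1, r ≡ 1, β = 1/2 and μ(s) = 2^s. Condition f then holds only with β' = 2, so ββ' = 1, and indeed ‖Uw - Uw'‖_μ ≤ ‖w - w'‖_μ with equality for w - w' = μ, where ‖u‖_μ := sup_s |u(s)|/μ(s): U is not a contraction with respect to the μ-norm. Nevertheless all conditions of theorem 3.1 are satisfied (Σ_{t≥N} β^t |r_t(s,π,ρ)|/μ(s) ≤ 2^{-N+1} → 0 uniformly), and v_n = Σ_{t=0}^{n-1} 2^{-t} indeed converges to the value v ≡ 2, with Uv = v.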

In order to prove the remaining part of theorem 3.1, we need the following lemma.

Lemma 3.7. Let q be a finite measure on the Borel subsets of a metric space X. If t_n, n ∈ ℕ, and t are measurable functions X → ℝ, q|t| < ∞ and t_n ≤ t for all n, then q[lim sup_n t_n] ≥ lim sup_n q t_n.

Proof. Directly from Fatou's lemma, applied to the nonnegative functions t - t_n. □

Theorem 3.8. There exist stationary optimal strategies.

Proof. The one-step strategies f* and g* which determine the desired optimal strategies are found by applying lemma 3.4 with w = v; we have v = Uv = L(f*,g*)v and L(f,g*)v ≤ v ≤ L(f*,g)v (f ∈ F, g ∈ G). Let ε > 0 and let K ∈ ℕ be such that |v - v_n| ≤ εμ if n ≥ K (see above); for all π ∈ Π(A) define π^N := (π_0,...,π_{N-1}) (N ∈ ℕ) and

w_π(n,N) := v_{n+N}(π^N π*_{(n)}, g*···g* ρ*_{(n)})   (player II using g* for the first N steps).

First we prove with induction for all N ∈ ℕ the statement B_N: w_π(n,N) ≤ v + (ββ')^N εμ for all π ∈ Π(A) if n ≥ K, and lim sup_{n→∞} w_π(n,N) ≤ v.

For N = 1: let π ∈ Π(A); then w_π(n,1) = L(π_0,g*)v_n ≤ C(π_0,g*)[r + βQ(v + εμ)] ≤ v + ββ'εμ (n ≥ K), and using lemma 3.7, lim sup_{n→∞} w_π(n,1) ≤ L(π_0,g*) lim sup_n v_n ≤ v. Assume that B_N holds for some N. We have for all π ∈ Π(A)

w_π(n,N+1) = C(π_0,g*)[r + βQ w_{π[·]}(n,N)] ≤ v + (ββ')^{N+1} εμ   (n ≥ K),

and by the second part of the induction assumption, lim sup_{n→∞} w_π(n,N+1) ≤ v. So B_{N+1} has been proved.

Let now π ∈ Π(A). It follows from B_N that

v ≥ lim sup_{n→∞} w_π(n,N) = lim sup_{n→∞} [v_N(π,g*^∞) + Σ_{t=N}^{n+N-1} β^t r_t(π^N π*_{(n)}, g*···g* ρ*_{(n)})] ≥ v(π,g*^∞) - 2εμ,

if N is large enough, on account of condition h. So, ε being arbitrary, we may conclude that v ≥ v(π,g*^∞). Analogously v ≤ v(f*^∞,ρ) for all ρ ∈ Π(B) can be proved. It follows that (f*^∞, g*^∞) is optimal. □

4. Existence of optimal strategies, second case: measurability on S

This section provides an analogue of the previous section for the case in which all relevant functions on S are measurable, where previously they were continuous. Again we assume that S, A, B, p, r, β, μ, V_μ, M_μ, C_μ and Π_μ as in section 2 are given. We impose a stronger condition on S, namely that S is a nonempty Borel subset of a Polish (that is: complete separable metric) space (also named standard Borel space, abbr. SB-space); this is necessary in view of the different selection theorem we use here.

Theorem 4.1. Suppose:

a) S is an SB-space, A and B are compact metric;

b) μ is measurable;

c) r(s,a,b) is measurable in s for fixed (a,b) and continuous in (a,b) for fixed s;

d) p is a (sub-) trpr S × A × B → S, and for fixed s, p(·|s,a,b) is continuous in (a,b) w.r.t. the strong μ-topology on Π_μ;

e, f, g and h as in theorem 3.1.

Then the game possesses a value v, v = lim_{n→∞} v_n and v ∈ M_μ; there exist stationary optimal strategies.

Note: here we could not deduce from the other conditions that p is a (sub-) trpr, as we did in lemma 3.2.

The next result is useful for the proof of theorem 4.1.

Lemma 4.2. Let X and Y be metric spaces, Y separable; suppose that the map f : X × Y → ℝ has the following properties: f(·,y) is measurable on X for all y ∈ Y, and f(x,·) is continuous on Y for all x ∈ X. Then f is measurable on X × Y.

Proof. Let {a_n | n ∈ ℕ} be a countable subset which lies dense in Y. For each y ∈ Y we may construct a subsequence (b_n(y))_n of (a_n)_n such that b_n(y) → y (n → ∞) for all y, and b_n(y) is a Borel measurable function of y (n ∈ ℕ). Define f_n(x,y) := f(x,b_n(y)). Now f = lim_{n→∞} f_n (pointwise), and f_n is measurable on X × Y. Consequently, f is measurable also. □
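One explicit construction of such a sequence (ours): let b_n(y) := a_k, where k is the smallest index k ≤ n with d(a_k,y) = min_{j≤n} d(a_j,y). Each set {y | b_n(y) = a_k} is Borel, being described by finitely many inequalities between the continuous functions y → d(a_j,y), so b_n is Borel measurable; and d(b_n(y),y) = min_{j≤n} d(a_j,y) → 0 (n → ∞) because {a_n} is dense in Y.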

As in the previous section, we need a selection theorem.

Lemma 4.3. Let X be an SB-space and Y a compact metric space. Let for all x ∈ X, D(x) ⊂ Y be nonempty such that K := {(x,y) | x ∈ X, y ∈ D(x)} is closed in X × Y. If w is a measurable map K → ℝ and w(x,y) is continuous in y for fixed x ∈ X, then there exists a measurable function f : X → Y with f(x) ∈ D(x) and w(x,f(x)) = sup_{y∈D(x)} w(x,y) (x ∈ X).

Proof. Application of Corollary 1 of Brown and Purves [2]. □

Note that lemma 4.3 is not a generalization of lemma 3.3 since there X is only metric. Using the same operator notation as in section 3, we proceed with the analogue of lemma 3.4.

Lemma 4.4. If w ∈ M_μ and if the conditions a through f of theorem 4.1 are fulfilled, then there exist f* ∈ F and g* ∈ G with L(f*,g*)w ∈ M_μ and, for all f and g,

L(f,g*)w ≤ L(f*,g*)w ≤ L(f*,g)w.

Proof. By lemma 4.2, l_w, as defined in lemma 3.4, is measurable on S × A × B and continuous on A × B for fixed s ∈ S. It follows that

K_w(s,φ,γ) = ∫_A ∫_B l_w(s,a,b)φ(da)γ(db)

is measurable in s for all φ ∈ P_A and γ ∈ P_B, and continuous in (φ,γ) for fixed s (proof of the latter as in Maitra and Parthasarathy [5], Lemma 2.1). Applying Sion's minimax theorem, we get for all s ∈ S

sup_φ inf_γ K_w(s,φ,γ) = inf_γ sup_φ K_w(s,φ,γ).

Define on S × P_B: t(s,γ) := max_φ K_w(s,φ,γ). Fix γ ∈ P_B. We apply lemma 4.3 on T(s,φ) := K_w(s,φ,γ) and find a measurable f_0 : S → P_A with T(s,f_0(s)) = t(s,γ), s ∈ S; T is measurable in (s,φ) and (s,f_0(s)) is a measurable function of s, so T(s,f_0(s)) is measurable. Hence, t(s,γ) is measurable in s for fixed γ. Using lemma 2.3 with s fixed, we see that t(s,·) is continuous on P_B. Lemma 4.3 now provides a measurable function g* : S → P_B satisfying t(s,g*(s)) = inf_γ t(s,γ) (s ∈ S). Since t is measurable on S × P_B, t(s,g*(s)) is measurable in s. So

inf_γ sup_φ K_w(s,φ,γ) = max_φ K_w(s,φ,g*(s)),

and both expressions are measurable in s. Analogously an f* ∈ F is found with sup_φ inf_γ K_w(s,φ,γ) = min_γ K_w(s,f*(s),γ). If we combine these results and remember that (L(f,g)w)(s) = K_w(s,f(s),g(s)) for all s, f and g, we see that

L(f,g*)w ≤ L(f*,g*)w ≤ L(f*,g)w   (f ∈ F, g ∈ G).

The proof of L(f*,g*)w ∈ V_μ is the same as in lemma 3.4. □

The proof of theorem 4.1 proceeds in a perfectly similar way as the proof of theorem 3.1; we only have to replace C_μ by M_μ in the text of section 3, starting with lemma 3.5.

Note that the conditions of theorem 3.1 are not stronger than those of theorem 4.1: in this section S is an SB-space, and the two conditions d are incomparable.

5. Extension and other remarks

The results of the foregoing sections may be extended in the following direction (as is done in Parthasarathy [8]). Let for all s ∈ S be given a nonempty Borel subset A(s) of A (the set of "admissible" actions) and a nonempty Borel subset B(s) of B. We denote by Π̄(A) the set of strategies π which restrict themselves to admissible actions, so π_t(A(s)|h_t,s) = 1 for all t ≥ 0, h_t ∈ H_t and s ∈ S. Analogously Π̄(B) is defined; Π̄ := Π̄(A) × Π̄(B).

We can extend theorem 3.1 to the game with admissible actions:

Theorem 5.1. Replace in theorem 3.1, in e, P_A by P_{A(s)} and P_B by P_{B(s)}, and in g and h, Π by Π̄. Make the following additional assumptions: 1) A(s) and B(s) are compact for all s ∈ S; 2) the maps Q_A and Q_B, defined by Q_A(s) := P_{A(s)} and Q_B(s) := P_{B(s)}, are continuous (that is: {s | Q_A(s) ∩ G' ≠ ∅} and {s | Q_A(s) ⊂ G'} are open in S if G' is open in P_A, similarly for Q_B). Then, within the class Π̄, the game possesses a value v with v = lim_{n→∞} v_n and v ∈ C_μ, and there exist stationary optimal strategies.

Proof. If lemma 3.4 produces an f* and a g* which satisfy f*(s) ∈ P_{A(s)}, g*(s) ∈ P_{B(s)}, s ∈ S, then we can restrict lemma 3.5 and theorems 3.6 and 3.8 to Π̄. And this is not difficult to check, since the lemmas 2.3 and 3.3 are already stated in the proper form for this extension (when applying lemma 3.3, K is closed because Q_A and Q_B are continuous, so upper semi-continuous; see Parthasarathy [8]). □

Similarly of course, theorem 4.1 may be extended.

We conclude with three remarks.

Remark 5.2. A slightly different version of theorem 3.1 can be proved, where S is supposed to be an SB-space, but in condition h, Π is replaced by R. We use the following theorem, which is based on an idea of Strauch [11]. Let (π,ρ) ∈ Π; if one of these strategies, say ρ, is Markov, then for each starting state s ∈ S there exists a Markov strategy for the other player, say π*, such that the (sub-) probability measure on the process remains the same, P_{s,π,ρ} = P_{s,π*,ρ} (for a proof, see the author's master's thesis [3]).

Now because of the weakening of condition h, the function v found in theorem 3.6 represents the value of the game within the class R only. However, by applying the above-mentioned theorem, we see that v is also the value of the game on Π:

v^H = inf_{Π(B)} sup_{Π(A)} v(π,ρ) ≤ inf_{R(B)} sup_{Π(A)} v(π,ρ) = inf_{R(B)} sup_{R(A)} v(π,ρ) = sup_{R(A)} inf_{R(B)} v(π,ρ) ≤ v^L.

In the same way theorem 3.8 is proved first for Markov strategies (π,ρ); then the transition to general strategies can be made. So also in theorem 4.1, in condition h, Π can be replaced by R.

Remark 5.3. If we take μ = e (μ(s) = 1) in the previous sections, then with β' = 1 conditions b and f are satisfied. If the reward r is bounded w.r.t. μ and ββ' < 1, then e, g and h are fulfilled. Now it is easy to see that theorem 5.1 is a generalization of Parthasarathy's Theorem 3.1 [8], and theorem 4.1 a generalization of his Theorem 3.2.
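A short verification of the last claim (ours): suppose |r(s,a,b)| ≤ Mμ(s) for all s, a, b and ββ' < 1. Condition f gives E_{s,π,ρ} μ(X_{t+1}) ≤ β' E_{s,π,ρ} μ(X_t), hence E_{s,π,ρ} μ(X_t) ≤ β'^t μ(s) by induction, and so |r_t(s,π,ρ)| ≤ M E_{s,π,ρ} μ(X_t) ≤ M β'^t μ(s). Consequently Σ_{t=N}^∞ β^t |r_t(s,π,ρ)|/μ(s) ≤ M(ββ')^N/(1 - ββ') → 0 (N → ∞) uniformly on S × Π, which yields g and h; e follows from |r̄| ≤ Mμ.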

Remark 5.4. In the author's master's thesis [3] assumption f of theorem 3.1 is investigated more closely, and in particular the stronger assumption f*: there exists a β' ∈ (0,1) such that for all s, a and b

∫_S p(ds'|s,a,b)μ(s') ≤ β'μ(s).

The Markov processes {X_t | t = 0,1,...} (X_t is the state at time t) generated by choosing a starting state s ∈ S and a pair of Markov strategies (π,ρ) ∈ R are said to be contracting if the transition law p satisfies f* for some measurable μ : S → (0,∞). Now if for all s, a and b, p(·|s,a,b) is absolutely continuous with respect to a fixed σ-finite measure q on S, two characterizations can be given of this contraction property (actually of a somewhat weaker property, "q-contraction"): one in the form of an enforced drift through a partitioning of the state space, and one in terms of an exponentially bounded lifetime (these results are a generalization of corresponding results for countable S, presented by Van Hee and Wessels [4]).

References

[1] Blackwell, D., D. Freedman, and M. Orkin, The Optimal Reward Operator in Dynamic Programming, Annals of Probability 2 (1974), p. 926-941.

[2] Brown, L.D., and R. Purves, Measurable Selections of Extrema, Annals of Statistics 1 (1973), p. 902-912.

[3] Couwenbergh, H.A.M., Stochastic Games with General State Space, Eindhoven University of Technology (The Netherlands), Dept. of Math., Master's Thesis, 1977.

[4] Hee, K.M. van, and J. Wessels, Markov Decision Processes and Strongly Excessive Functions, Eindhoven University of Technology (The Netherlands), Dept. of Math., Memorandum COSOR 77-11, 1977.

[5] Maitra, A., and T. Parthasarathy, On Stochastic Games, Journal of Optimization Theory and Applications 5 (1970), p. 289-300.

[6] Neveu, J., Mathematical Foundations of the Calculus of Probability, Holden-Day, San Francisco, London, Amsterdam, 1965.

[7] Parthasarathy, K.R., Probability Measures on Metric Spaces, Academic Press, New York, 1967.

[8] Parthasarathy, T., Discounted, Positive and Noncooperative Stochastic Games, International Journal of Game Theory 2 (1973), p. 25-37.

[9] Shapley, L.S., Stochastic Games, Proceedings of the National Academy of Sciences of the U.S.A. 39 (1953), No. 10, p. 1095-1100.

[10] Sion, M., On General Minimax Theorems, Pacific Journal of Mathematics 8 (1958), p. 171-176.

[11] Strauch, R.E., Negative Dynamic Programming, Annals of Mathematical Statistics 37 (1966), p. 871-890.

[12] Vrieze, O.J., The Stochastic Noncooperative Countable-Person Game with Countable State Space and Compact Action Spaces under the Discounted Pay-off Criterion, Mathematical Centre, Amsterdam, Report 66/76, 1976.

[13] Wessels, J., Markov Games with Unbounded Rewards, p. 133-147 in Dynamische Optimierung, Bonner Mathematische Schriften 98, edited by M. Schäl, Bonn, 1977.
