The policy iteration method for the optimal stopping of a Markov chain with an application


Citation for published version (APA):

Hee, van, K. M. (1975). The policy iteration method for the optimal stopping of a Markov chain with an application. (Memorandum COSOR; Vol. 7504). Technische Hogeschool Eindhoven.

Document status and date: Published: 01/01/1975

Document Version:

Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)



TECHNOLOGICAL UNIVERSITY EINDHOVEN

Department of Mathematics

STATISTICS AND OPERATIONS RESEARCH GROUP

Memorandum COSOR 75-04

The policy iteration method for the optimal stopping of a Markov chain with an application

by

K.M. van Hee


0. Summary

In this paper we study the problem of the optimal stopping of a Markov chain with a countable state space. In each state $i$ the controller receives a reward $r(i)$ if he stops the process, or he must pay the cost $c(i)$ otherwise. We show that, under the condition that there exists an optimal stopping rule, the policy iteration method, introduced by Howard, produces a sequence of stopping rules for which the expected return converges to the value function. For random walks on the integers with a special reward and cost structure, we show that the policy iteration method gives the solution of a discrete two-point boundary value problem with a free boundary. We give a simple algorithm for the computation of the optimal stopping rule.

1. Introduction

Consider a Markov chain $\{X_n \mid n = 0, 1, 2, \ldots\}$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The state space $S$ is countable. We suppose that $\mathbb{P}[X_0 = i] > 0$ for all $i \in S$. Hence $\mathbb{P}_i[A]$, the conditional probability of $A \in \mathcal{F}$ given $X_0 = i$, is defined for all $i \in S$.

On $S$ real functions $r$ and $c$ are defined, where $r(i)$ is the reward if the process is stopped in state $i$ and $c(i)$ is the cost if the process goes on. We consider stopping times $T$ (for a definition see [7]). For a nonnegative function $g$ on $S$ we define

$$\mathbb{E}_i[g(X_T)] := \int_{\{T < \infty\}} g(X_T)\, d\mathbb{P}_i .$$

Condition A. Suppose that the reward function r satisfies

for all i E S and all stopping times T.

(Note that $r^+(i) := \max\{0, r(i)\}$ and $r^-(i) := -\min\{0, r(i)\}$.)

Let $P$ be the transition matrix of the Markov chain, with components $P(i,j)$ for $i, j \in S$. If the function $c$ on $S$ is integrable for all $\mathbb{P}_i[\cdot]$, we define the function $Pc$ by

$$Pc(i) := \sum_{j \in S} P(i,j)\, c(j)$$

and, by induction, if $P^{n-1}c$ is integrable for all $\mathbb{P}_i[\cdot]$,

$$P^n c := P(P^{n-1} c) .$$

We call a function $c$ on $S$ a charge (see [3]) if

$$\sum_{n=0}^{\infty} P^n |c| < \infty .$$

(Note that for functions $v$ and $w$ on $S$: $v \le w$ if $v(i) \le w(i)$ for all $i \in S$, and $v < w$ if $v(i) < w(i)$ for all $i \in S$. Further, $|v|$ is defined by $|v|(i) := |v(i)|$.)

Condition B. Either the cost function $c$ is a charge, or $r$ and $c$ are both nonnegative.

Throughout this paper we shall suppose that conditions A and B hold.

We call a function $w$ on $S$ c-excessive with respect to the cost function $c$ if

1) $w \ge -c + Pw$,

2) $w \ge -\sum_{n=0}^{\infty} P^n c$.

For a stopping time $T$ the expected return $v_T(i)$, given the starting state $i$, is defined by

$$v_T(i) := \mathbb{E}_i\Big[r(X_T) - \sum_{n=0}^{T-1} c(X_n)\Big] .$$

The existence of the expected return $v_T(i)$ is guaranteed for all $T$ since $|\mathbb{E}_i[r(X_T)]| < \infty$ for all $i$ and $c$ is either a charge or a nonnegative function. Note that $v_T(i) = -\infty$ is permitted.

The value function $v(i)$ is the supremum over all stopping times $T$:

$$v(i) := \sup_T v_T(i) .$$

Sometimes we need the following assumption.

Assumption C. There exists an optimal stopping time $T^*$, i.e. $v_{T^*}(i) = v(i)$ for all $i \in S$.

In the rest of this section we summarize some properties of stopping problems.

1.1. The value function $v$ satisfies the functional equation

$$v(i) = \max\Big\{r(i),\ -c(i) + \sum_{j \in S} P(i,j)\, v(j)\Big\}$$

(see [2], [3] or [7]).

1.2. The value function $v$ is the smallest c-excessive function dominating the reward function $r$ (see [2] and [3]).

1.3. If an optimal stopping time exists, the entrance time $T_\Gamma$ in the set $\Gamma := \{i \mid r(i) = v(i)\}$ is optimal (see [2] and [6]).

1.4. If $\sup_{i \in S} |r(i)| < \infty$ and $\inf_{i \in S} c(i) > 0$, then there exists an optimal stopping time (see [2] and [7]).

2. Some preparations

A stopping rule $f$ is a mapping from $S$ to $\{0, 1\}$, where $f(i) = 0$ means that the process is stopped in $i$ and $f(i) = 1$ means that the process goes on in state $i$. The stopping rule $f$ is equivalent to the entrance time $T_f$ in the set $\Gamma_f := \{i \mid f(i) = 0\}$. The expected return under a stopping rule $f$ is indicated by $v_f(i)$.

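For a stopping rule $f$ we define

2.1. $D_f := \{i \in S \mid f(i) = 1\}$, the go-ahead set; $\Gamma_f := S \setminus D_f$, the stopping set.

2.2. $P_f$ is the matrix with components

$$P_f(i,j) := \begin{cases} P(i,j) & \text{if } i \in D_f, \\ 0 & \text{otherwise.} \end{cases}$$

2.3. $d_f$ is a function on $S$ with

$$d_f(i) := \begin{cases} r(i) & \text{if } i \in \Gamma_f, \\ -c(i) & \text{otherwise.} \end{cases}$$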

If assumption C holds, property 1.3 guarantees that the entrance time $T_\Gamma$ in the set $\Gamma$ is also optimal. In that case

2.4. $v(i) = \mathbb{E}_i\Big[r(X_{T_\Gamma}) - \sum_{n=0}^{T_\Gamma - 1} c(X_n)\Big]$.

According to the stopping time $T_\Gamma$ we define the stopping rule $f^*$ by

2.5. $f^*(i) = 0$ if and only if $i \in \Gamma$.

Further let

Lemma 1. For each stopping rule $f$ with $v_f \ge r$ we have

1) $|v_f(i)| < \infty$,

2) $v_f = \sum_{n=0}^{\infty} P_f^n d_f$,

3) $\lim_{n \to \infty} P_f^n |d_f| = 0$ (pointwise convergence),

4) $v_f = d_f + P_f v_f$,

5) $\lim_{n \to \infty} P_f^n |v_f| = 0$ (pointwise convergence).

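Proof. If $r$ and $c$ are nonnegative we have

$$0 \le r(i) \le v_f(i) \le \mathbb{E}_i[r(X_{T_f})] < \infty \quad \text{for all } i \in S .$$

Since

$$v_f(i) = \mathbb{E}_i[r(X_{T_f})] - \mathbb{E}_i\Big[\sum_{n=0}^{T_f - 1} c(X_n)\Big]$$

we may conclude

$$\mathbb{E}_i\Big[\sum_{n=0}^{T_f - 1} |c(X_n)|\Big] < \infty \quad \text{for all } i \in S .$$

Note that if $c$ is a charge this is also true. Define:

2.6. $w_f(i) := \mathbb{E}_i[|r(X_{T_f})|] + \mathbb{E}_i\Big[\sum_{n=0}^{T_f - 1} |c(X_n)|\Big]$.

So we have for both cases of B that $w_f(i) < \infty$ for all $i \in S$ (statement 1, because $|v_f| \le w_f$). We have the following representation

$$w_f = \sum_{n=0}^{\infty} P_f^n |d_f|$$

(note that $P_f^0(i,j) = 1$ if and only if $i = j$) and in the same way, by absolute convergence,

$$v_f = \sum_{n=0}^{\infty} P_f^n d_f . \quad \text{(Statement 2)}$$

Because $w_f < \infty$ we may conclude $P_f^n |d_f| \to 0$ for $n \to \infty$ (statement 3).

Because $\sum_{n=0}^{\infty} P_f^n |d_f|$ is finite we may change the summation order, hence

$$v_f = d_f + P_f \sum_{n=0}^{\infty} P_f^n d_f = d_f + P_f v_f . \quad \text{(Statement 4)}$$

In the same way

$$w_f = |d_f| + P_f w_f .$$

By iterating this equation we get

$$w_f = \sum_{k=0}^{n-1} P_f^k |d_f| + P_f^n w_f ,$$

from which it follows that $P_f^n w_f$ tends to 0 if $n$ tends to $\infty$. Because $|v_f| \le w_f$ we have also

$$\lim_{n \to \infty} P_f^n |v_f| = 0 . \quad \text{(Statement 5)}$$

□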
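As a numerical illustration of statements 2 and 4 of lemma 1, the following Python sketch evaluates $v_f$ on a small finite chain by truncating the series $\sum_n P_f^n d_f$ and then compares it with $d_f + P_f v_f$. The three-state chain, the rewards, the costs and the helper names `restrict` and `expected_return` are assumptions made for this example only; they do not come from the memorandum.

```python
# Hypothetical 3-state example (not from the paper): states 0, 1, 2.
P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
r = [0.0, 0.0, 2.0]   # reward received when stopping
c = [0.1, 0.1, 0.1]   # cost paid when going on


def restrict(P, r, c, f):
    """Build P_f (2.2) and d_f (2.3) for a stopping rule f with values in {0, 1}."""
    n = len(P)
    P_f = [[P[i][j] if f[i] == 1 else 0.0 for j in range(n)] for i in range(n)]
    d_f = [-c[i] if f[i] == 1 else r[i] for i in range(n)]
    return P_f, d_f


def expected_return(P, r, c, f, terms=500):
    """Approximate v_f by the partial sum of P_f^n d_f for n = 0..terms."""
    P_f, d_f = restrict(P, r, c, f)
    n = len(P)
    term = list(d_f)   # current term P_f^n d_f, starting with n = 0
    v = list(d_f)      # running partial sum
    for _ in range(terms):
        term = [sum(P_f[i][j] * term[j] for j in range(n)) for i in range(n)]
        v = [v[i] + term[i] for i in range(n)]
    return v


f = [1, 1, 0]   # go on in states 0 and 1, stop in state 2
v_f = expected_return(P, r, c, f)
P_f, d_f = restrict(P, r, c, f)
# Right-hand side of statement 4: d_f + P_f v_f
rhs = [d_f[i] + sum(P_f[i][j] * v_f[j] for j in range(3)) for i in range(3)]
print(v_f, rhs)
```

For this chain the two printed vectors agree up to rounding, with values close to 1.4, 1.6 and 2. The truncation is only reasonable here because the process leaves the go-ahead set with probability one; for a rule that never stops on a closed class the partial sums would not settle.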

Corollary 1. If C holds we have from 2.4 and lemma 1 that $|v(i)| < \infty$ for all $i \in S$ and

$$\lim_{n \to \infty} P^n |v| = 0 .$$

Define:

$$w := \sum_{n=0}^{\infty} P^n |d| .$$

By lemma 1 we have

2.7. $\lim_{n \to \infty} P^n w = 0$.

In the next section we study expressions like $P_g^k v_f$, where $f$ and $g$ are stopping rules. We shall give sufficient conditions in lemma 2 for the finiteness of these expressions.

Lemma 2. Let $f$ and $g$ be stopping rules. Suppose $v_f \ge r$. Then $P_g^k |v_f|$ is finite for $k = 1, 2, 3, \ldots$

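Proof. Let $T := T_f + k$. Using the same arguments as in lemma 1, we derive for $c$ a charge:

$$\mathbb{E}_i\Big[|r(X_T)| + \sum_{n=0}^{T-1} |c(X_n)|\Big] < \infty .$$

Note that

$$\mathbb{E}_i\Big[|r(X_T)| + \sum_{n=0}^{T-1} |c(X_n)|\Big] = \sum_{n=0}^{k-1} P^n |c|(i) + P^k w_f(i)$$

($w_f$ is defined in 2.6). Hence

$$P_g^k |v_f| \le P^k w_f < \infty .$$

Now let $r$ and $c$ be nonnegative. $P^k v_f$ is defined because $v_f \ge r \ge 0$. Hence $P^k v_f \ge P^k r \ge 0$ and

$$\mathbb{E}_i\Big[\sum_{n=0}^{T_f - 1} c(X_n)\Big] \le \mathbb{E}_i[r(X_{T_f})] .$$

Define vectors $c_f$ and $r_f$ by

$$r_f(i) := \begin{cases} r(i) & \text{if } i \in \Gamma_f, \\ 0 & \text{otherwise,} \end{cases} \qquad c_f(i) := \begin{cases} c(i) & \text{if } i \in D_f, \\ 0 & \text{otherwise.} \end{cases}$$

Note that $|d_f| = r_f + c_f$. It is easy to verify that

$$\sum_{j \in S} P^k(i,j)\, \mathbb{E}_j[r(X_{T_f})] = P^k \sum_{n=0}^{\infty} P_f^n r_f(i)$$

and

$$\sum_{j \in S} P^k(i,j)\, \mathbb{E}_j\Big[\sum_{n=0}^{T_f - 1} c(X_n)\Big] = P^k \sum_{n=0}^{\infty} P_f^n c_f(i) .$$

Hence $P^k w_f < \infty$. Reasoning like before, we see that $P_g^k |v_f| < \infty$. □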

3. Policy iteration method

Let $f$ be a stopping rule such that $\sum_{j \in S} P(i,j)\, v_f(j)$ is defined. For $f$ we define the improved stopping rule $g$ by

3.1. $$g(i) := \begin{cases} 0 & \text{if } r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\, v_f(j), \\ 1 & \text{otherwise.} \end{cases}$$
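A single improvement step in the sense of 3.1 can be sketched in a few lines of Python. The chain, rewards and costs below are hypothetical illustration data, and `improve` is a name chosen here, not one used in the memorandum. The starting rule is $f(i) = 0$ for all $i$, for which $v_f = r$.

```python
def improve(P, r, c, v_f):
    """Improved stopping rule g of 3.1.

    g(i) = 0 (stop) if r(i) >= -c(i) + sum_j P(i,j) v_f(j), else g(i) = 1.
    """
    n = len(P)
    g = []
    for i in range(n):
        go_on = -c[i] + sum(P[i][j] * v_f[j] for j in range(n))
        g.append(0 if r[i] >= go_on else 1)
    return g


# Hypothetical 3-state example (not from the paper).
P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
r = [0.0, 0.0, 2.0]
c = [0.1, 0.1, 0.1]

v_f = list(r)              # f stops everywhere, so its expected return is r
g = improve(P, r, c, v_f)
print(g)                   # state 1 switches to "go on", states 0 and 2 keep stopping
```

Note that ties are resolved in favour of stopping, exactly as the inequality in 3.1 prescribes.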

Lemma 3. Let $g$ be the improved stopping rule of $f$ and let $v_f \ge r$. Then

1) $D_g \subset D$,

2) $v_f \le d_g + P_g v_f$.

Proof. We first prove 1). If $g(i) = 1$ then

$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\, v_f(j) \le -c(i) + \sum_{j \in S} P(i,j)\, v(j) \le v(i),$$

hence

$$D_g = \{i \mid g(i) = 1\} \subset \{i \mid v(i) > r(i)\} = D .$$

We proceed with 2). Note that $P_g v_f$ is finite (by lemma 2). Let $i \in D_g$; then $g(i) = 1$, $d_g(i) = -c(i)$, $P_g(i, \cdot) = P(i, \cdot)$ and so

$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\, v_f(j) = d_g(i) + P_g v_f(i) .$$

Since either

$$v_f(i) = -c(i) + \sum_{j \in S} P(i,j)\, v_f(j)$$

or $v_f(i) = r(i)$, the statement is true for $i \in D_g$.

If $i \in \Gamma_g$ then $g(i) = 0$, $d_g(i) = r(i)$ and $P_g(i, \cdot) = 0$, and since

$$r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\, v_f(j)$$

it is true for $i \in \Gamma_g$. □

Lemma 4. Assume C. If $g$ is the improved stopping rule of $f$ and if $v_f \ge r$ then $v_g \ge v_f$.

Proof. From lemma 2 it follows that $P_g^k |v_f|$ exists and is finite for all $k$. By lemma 3, $v_f \le d_g + P_g v_f$. Hence

$$\sum_{k=0}^{N} P_g^k v_f \le \sum_{k=0}^{N} P_g^k d_g + \sum_{k=1}^{N+1} P_g^k v_f$$

and therefore

$$v_f \le \sum_{k=0}^{N} P_g^k d_g + P_g^{N+1} v_f .$$

We shall prove that $P_g^N v_f \to 0$ for $N \to \infty$. Consider first the case that $r \ge 0$ and $c \ge 0$. Since $0 \le r \le v_f \le v$ and $D_g \subset D$,

$$0 \le P_g^N v_f \le P_g^N v \le P^N v ;$$

by corollary 1, $P^N v \to 0$ for $N \to \infty$.

Suppose now that $c$ is a charge: $|v_f| \le w$ ($w$ is defined in corollary 1), hence $P_g^N |v_f| \le P^N w$. By 2.7, $P^N w \to 0$ for $N \to \infty$. Therefore $v_g \ge v_f$. □

We define a sequence of stopping rules $\{f_0, f_1, f_2, \ldots\}$ by

3.2. $f_0$ is a stopping rule with $v_{f_0} \ge r$ (for example $f_0(i) = 0$ for all $i \in S$); $f_n$ is the improved stopping rule of $f_{n-1}$, $n \ge 1$ (see 3.1).

The method of approximating the optimal stopping rule and its expected return by the sequence 3.2 is called the policy iteration method. This method was introduced by Howard [4] for decision processes with a finite state space and discounted rewards.

In theorem 1 some properties of the sequence $\{f_0, f_1, f_2, \ldots\}$ are derived. In theorem 2 we study the convergence of $v_{f_n}$ to $v$. Call

1) $v_n := v_{f_n}$,

2) $d_n := d_{f_n}$,

3) $D_n := D_{f_n}$,

4) $\Gamma_n := \Gamma_{f_n}$.
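The sequence 3.2 can be tried out on a finite chain with a short Python sketch. It starts from $f_0(i) = 0$ for all $i$, evaluates each rule by a truncated series $\sum_n P_f^n d_f$, and stops as soon as the improved rule equals the current one. The example chain, the truncation length `terms`, the cap `max_rounds` and all function names are assumptions for illustration; the memorandum itself treats a countable state space and gives no code.

```python
def expected_return(P, r, c, f, terms=500):
    """Approximate v_f by the partial sum of P_f^n d_f, n = 0..terms."""
    n = len(P)
    d = [-c[i] if f[i] == 1 else r[i] for i in range(n)]
    term, v = list(d), list(d)
    for _ in range(terms):
        # Rows of P_f vanish on the stopping set (f[i] == 0).
        term = [sum(P[i][j] * term[j] for j in range(n)) if f[i] == 1 else 0.0
                for i in range(n)]
        v = [v[i] + term[i] for i in range(n)]
    return v


def improve(P, r, c, v):
    """Improved rule of 3.1: stop iff r(i) >= -c(i) + sum_j P(i,j) v(j)."""
    n = len(P)
    return [0 if r[i] >= -c[i] + sum(P[i][j] * v[j] for j in range(n)) else 1
            for i in range(n)]


def policy_iteration(P, r, c, max_rounds=100):
    """Generate f_0, f_1, ... as in 3.2 until two consecutive rules coincide."""
    f = [0] * len(P)                      # f_0: stop everywhere, v_0 = r
    for _ in range(max_rounds):
        v = expected_return(P, r, c, f)
        g = improve(P, r, c, v)
        if g == f:
            return f, v
        f = g
    return f, expected_return(P, r, c, f)


# Hypothetical 3-state example (not from the paper).
P = [[0.5, 0.5, 0.0],
     [0.5, 0.0, 0.5],
     [0.0, 0.5, 0.5]]
r = [0.0, 0.0, 2.0]
c = [0.1, 0.1, 0.1]

f_opt, v_opt = policy_iteration(P, r, c)
print(f_opt, v_opt)
```

In this example the rules visited are $(0,0,0)$, $(0,1,0)$ and $(1,1,0)$, after which no further change occurs. Since $r \ge 0$, $c > 0$ and the state space is finite, property 1.4 gives assumption C, so the setting of theorem 2 applies.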

Theorem 1. Assume C. The following assertions hold

Proof. It follows from lemma 4 that $v_{n+1} \ge v_n$ for $n \ge 0$, since $v_0 \ge r$. If $f_n(i) = 1$ then

$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\, v_{n-1}(j) \le -c(i) + \sum_{j \in S} P(i,j)\, v_n(j), \quad \text{for } n \ge 1,$$

hence $f_{n+1}(i) = 1$, which proves assertion 1. Suppose $f_n(i_0) = 0$ and $f_{n+1}(i_0) = 1$, then

$$v_n(i_0) = r(i_0) < -c(i_0) + \sum_{j \in S} P(i_0,j)\, v_n(j) \le -c(i_0) + \sum_{j \in S} P(i_0,j)\, v_{n+1}(j) = v_{n+1}(i_0) .$$

□

Theorem 2. Assume C.

1) If either $v_{n_0} \ge -\sum_{k=0}^{\infty} P^k c$ or $v_{n_0} \ge 0$ for some $n_0$, then $\lim_{n \to \infty} v_n = v$.

2) If, in addition to 1, $f_n = f_{n+1}$ for some $n \ge n_0$, then $v_n$ is optimal.

Proof. Since $D_n \subset D$ for all $n$ (lemma 3) and since $f_n(i)$ is nondecreasing in $n$ (theorem 1), there exists a set $E \subset S$ such that $\lim_{n \to \infty} D_n = E \subset D$.

And, in the same way, since $v_n(i) \le v(i)$ for all $n$ and since $v_n(i)$ is nondecreasing in $n$, there exists a function $z$ such that $z(i) = \lim_{n \to \infty} v_n(i)$.

Fix some $i \in E$. For all $n$ sufficiently large, $i \in D_n$ and so

$$r(i) \le v_n(i) = -c(i) + \sum_{j \in S} P(i,j)\, v_n(j) \le -c(i) + \sum_{j \in S} P(i,j)\, v(j) = v(i) .$$

Since $v_n(i) \uparrow z(i)$ we have by monotone convergence

$$-c(i) + \sum_{j \in S} P(i,j)\{v_n(j) - r(j)\} \uparrow -c(i) + \sum_{j \in S} P(i,j)\{z(j) - r(j)\},$$

hence

$$z(i) = -c(i) + \sum_{j \in S} P(i,j)\, z(j) \le v(i) .$$

Fix some $i \in S \setminus E$. For all $n$ it holds that $i \in \Gamma_n$, hence

$$v_n(i) = r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\, v_n(j)$$

and therefore (again by monotone convergence)

$$z(i) = r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\, z(j) .$$

So $z$ satisfies the functional equation:

$$z(i) = \max\Big\{r(i),\ -c(i) + \sum_{j \in S} P(i,j)\, z(j)\Big\} .$$

Now, suppose $v_{n_0} \ge -\sum_{n=0}^{\infty} P^n c$. Then $z \ge -\sum_{n=0}^{\infty} P^n c$ and since $z$ satisfies the functional equation, $z$ is a c-excessive function dominating $r$. Because $v$ is the smallest function with this property it must hold that $v = z$.

If $v_{n_0} \ge 0$ it must hold that $z \ge 0$ and $v \ge 0$. We now prove that $v = z$ on $\Gamma$. Let $i \in \Gamma$:

$$r(i) \le z(i) \le v(i) = r(i) .$$

Let now $i \in D$:

$$0 \le v(i) - z(i) \le \sum_{j \in S} P(i,j)\{v(j) - z(j)\} .$$

Hence $0 \le v - z \le P(v - z)$. Iterating this inequality gives

$$0 \le v - z \le P^n (v - z) \le P^n v \to 0 \quad \text{for } n \to \infty,$$

which proves $v = z$. The first assertion is proved.

Suppose $f_n = f_{n+1}$ for some $n \ge n_0$. Then $v_n = v_{n+1}$ and therefore $f_{n+2} = f_{n+1}$. By induction it follows that $z = v_n$, which proves the theorem. □

Lemma 5. Let $c$ be a charge. Let $f$ be the stopping rule defined by $f(i) = 1$ for all $i \in S$ and let $g$ be the improved stopping rule. Then $v_g \ge v_f$ and $v_g \ge r$. If $v_g = v_f$ then $f$ is optimal.

Proof. Since $v_f = -\sum_{n=0}^{\infty} P^n c$ it holds that $P v_f$ and $P_g^n v_f$ are finite. Following exactly the proof of lemma 3 we have $v_f \le d_g + P_g v_f$, and from the proof of lemma 4 it follows, since $P_g^n v_f$ is finite, that

$$v_f \le \sum_{k=0}^{n} P_g^k d_g + P_g^{n+1} v_f .$$

Note that $P_g^n |v_f| \le P^n |v_f|$. Since $c$ is a charge:

$$w_f := \sum_{n=0}^{\infty} P^n |c| < \infty .$$

Hence $w_f = |c| + P w_f$ and therefore $P^n w_f$ tends to 0 if $n$ tends to $\infty$. Because $w_f \ge |v_f|$ we may conclude

$$\lim_{n \to \infty} P^n |v_f| = 0 .$$

Hence

$$v_f \le \sum_{k=0}^{\infty} P_g^k d_g = v_g .$$

If $g(i) = 0$ then $v_g(i) = r(i)$, and if $g(i) = 1$ then

$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\, v_f(j) \le v_g(i) .$$

Hence $v_g \ge r$.

Now, suppose $v_g = v_f$; then $v_f \ge r$ and $v_f = -c + P v_f$, hence $v_f$ is c-excessive and dominates $r$. Because $v_f \le v$ and the fact that $v$ is the least function with this property, we have $v = v_f$. □

Corollary 2.

1) If $r$ is nonnegative, we have for $f_0 = 0$: $v_{f_0} \ge r \ge 0$, hence the sequence $v_n$ converges to $v$.

2) If $c$ is a charge we may start with $f_{-1}(i) := 1$ for all $i \in S$ and try to improve this stopping rule by $f_0$. If no improvement is possible (i.e. $v_{f_0} = v_{f_{-1}}$) we have already the optimal stopping rule. Otherwise $f_0$ satisfies

a) $v_0 = v_{f_0} \ge r$,

b) $v_0 \ge -\sum_{n=0}^{\infty} P^n c$,

hence $v_n$ converges to $v$.

Examples.

1) There exists a stopping problem satisfying assumptions A, B and C where the policy iteration method does not converge to the optimal stopping rule.

Let $S = \{1, 2\}$; $r(1) = r(2) = -1$, $c(1) = c(2) = 0$ and $P(1,1) = \alpha = 1 - P(1,2)$, $P(2,2) = \beta = 1 - P(2,1)$. The optimal stopping rule is $f(1) = f(2) = 1$ and $v(1) = v(2) = 0$. The cost function is a charge and $\mathbb{E}_i[|r(X_T)|] \le 1$. Note that $r(1) = \alpha r(1) + (1 - \alpha) r(2)$ and $r(2) = \beta r(2) + (1 - \beta) r(1)$, so that $r \ge -c + Pr$; hence the improved stopping rule of $f_0(i) = 0$ for all $i$ is $f_0$ itself.
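Example 1 can be checked numerically with a small Python sketch. The values chosen for $\alpha$ and $\beta$ and the helper name `improve` are assumptions for illustration; the example itself leaves $\alpha$ and $\beta$ unspecified. Starting from the rule that stops everywhere, the improvement step of 3.1 returns the same rule, so the iteration stays at the expected return $-1$ although never stopping yields $0$.

```python
def improve(P, r, c, v_f):
    """Improved rule of 3.1: stop iff r(i) >= -c(i) + sum_j P(i,j) v_f(j)."""
    n = len(P)
    return [0 if r[i] >= -c[i] + sum(P[i][j] * v_f[j] for j in range(n)) else 1
            for i in range(n)]


# Hypothetical values for alpha and beta (dyadic, so the sums below are exact).
alpha, beta = 0.5, 0.25
P = [[alpha, 1 - alpha],      # state 1: P(1,1) = alpha
     [1 - beta, beta]]        # state 2: P(2,2) = beta
r = [-1.0, -1.0]
c = [0.0, 0.0]

f0 = [0, 0]                   # stop in both states
v0 = list(r)                  # expected return of f0 equals r
f1 = improve(P, r, c, v0)
print(f1)                     # the rule does not change
```

The tie $r(i) = -c(i) + \sum_j P(i,j) r(j)$ is resolved in favour of stopping by 3.1, which is what keeps the iteration at $f_0$.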

2) There exists a stopping problem satisfying assumptions A and B where the improved policy of $f_0$ is not at least as good as $f_0$. Let $S = \{0, 1, 2, 3, \ldots\} \cup \{x\}$ and $1 > \varepsilon > 0$. For $i = 0, 1, 2, 3, \ldots$:

$$P(i, i+1) = 1 - \varepsilon, \quad P(i, x) = \varepsilon, \quad r(i) = \frac{1}{(1 - \varepsilon)^i}, \quad c(i) = 0 .$$

Further: $P(x, x) = 1$, $r(x) = 1$, $c(x) = 1$.

Note that $r$ and $c$ are both nonnegative (condition B).

We shall examine the stopping time $T_n$. Hence

$$w(i) := \sup_n v_{T_n}(i) = \frac{1}{(1 - \varepsilon)^i} + 1 .$$

This function $w$ satisfies the functional equation

$$w(i) = \max\Big\{r(i),\ -c(i) + \sum_{j \in S} P(i,j)\, w(j)\Big\}$$

and $w \ge -\sum_{n=0}^{\infty} P^n c$, hence $w = v$, so that $v(i) < \infty$, from which it follows that $\mathbb{E}_i[|r(X_T)|] < \infty$ for all $i$ and all $T$ (condition A).

For $i = 0, 1, 2, 3, \ldots$:

$$r(i) = \frac{1}{(1 - \varepsilon)^i} < (1 - \varepsilon)\frac{1}{(1 - \varepsilon)^{i+1}} + \varepsilon = -c(i) + \sum_{j \in S} P(i,j)\, r(j)$$

and $r(x) = 1 > -c(x) + r(x)$.

Hence $f_1(i) = 1$ for $i \in \{0, 1, 2, 3, \ldots\}$ and $f_1(x) = 0$, so that $v_1(i) = 1$ for all $i$, but $v_0(i) = \frac{1}{(1 - \varepsilon)^i} > 1$ for $i = 1, 2, 3, \ldots$

4. An application

We shall study in this section the optimal stopping of a random walk on the integers with a special cost and reward structure, to illustrate the computational aspects of the policy iteration method. For simplicity we shall not formulate the results as generally as possible.

Definition of the decision process.

Consider a random walk on the set of integers ($\mathbb{Z}$). Let the transition matrix $P$ be defined by

4.1. $P(i, i+1) := p_i$, $P(i, i) := s_i$, $P(i, i-1) := q_i$ with $p_i, q_i > 0$, $s_i \ge 0$ and $p_i + q_i + s_i = 1$.

The reward function

4.2. $0 \le r(i) \le M$, $i \in \mathbb{Z}$.

The cost function

4.3. $c(i) \ge \delta > 0$, $i \in \mathbb{Z}$.

Further we assume the existence of integers $d$, $e$ such that:

4.4. $r(i) < -c(i) + p_i r(i+1) + q_i r(i-1) + s_i r(i)$ if and only if $d \le i \le e$.

Call $H := \{i \in \mathbb{Z} \mid d \le i \le e\}$.

Assumption 4.4 says that for $i \in \mathbb{Z} \setminus H$ immediate stopping is more profitable than making one more transition. In statistical sequential analysis there are examples of random walks where this assumption is fulfilled in a natural way (compare [5]). In lemma 6 we collect some properties of this process.

Lemma 6. For the sequence of stopping rules $f_0, f_1, f_2, \ldots$ defined in 3.2 with $f_0(i) = 0$ for all $i \in \mathbb{Z}$ it holds that

1) there exist numbers $k_n, \ell_n \in \mathbb{Z}$ such that $D_n = \{i \in \mathbb{Z} \mid k_n \le i \le \ell_n\}$, $n = 0, 1, 2, \ldots$,

2) $k_n \ge k_{n+1} \ge k_n - 1$ and $\ell_n \le \ell_{n+1} \le \ell_n + 1$,

3) for some $n$, $f_n$ is optimal.

Proof. Since $0 \le r(i) \le M$ and $c \ge 0$, A and B are satisfied. By 1.4 we know that the entrance time in $\Gamma$ is optimal, hence assumption C is fulfilled. By theorem 1 we have $D_n \subset D_{n+1}$ for $n = 0, 1, 2, 3, \ldots$ and by theorem 2 we have $\lim_{n \to \infty} v_n(i) = v(i)$. We shall prove 1 and 2 with induction.

$D_0$ is empty. It is easy to verify that $f_1(i) = 1$ if and only if $i \in H$, hence $k_1 = d$ and $\ell_1 = e$. Suppose 1 holds for $n = m$. For $i < k_m - 1$ and $i > \ell_m + 1$, $f_{m+1}(i) = 0$ because $v_m(i) = r(i)$ and $i \in \mathbb{Z} \setminus H$. Therefore it can happen only in the points $i = k_m - 1$ and $i = \ell_m + 1$ that $f_{m+1}(i) > f_m(i)$. Since it holds that $D_m \subset D_{m+1}$, 1 and 2 are proved. Now the last assertion.

Note that $0 \le r(i) \le M$ and $c(i) \ge \delta > 0$ for all $i \in \mathbb{Z}$. Choose $1 > \varepsilon > 0$ and a natural number $k$ such that $(1 - \varepsilon) k > M / \delta$. Let $f$ be the optimal stopping rule. We shall prove $\mathbb{P}_i[T_f \le k] \ge \varepsilon$. Suppose the contrary, i.e. let $\mathbb{P}_i[T_f \le k] < \varepsilon$. Then

$$v_f(i) \le M - (1 - \varepsilon) k \delta < 0 \le r(i),$$

which is a contradiction.

Hence for all $i \in \mathbb{Z}$, $\Gamma$ must be reachable in at most $k$ steps, so that $D \subset \{i \mid d - k \le i \le e + k\}$. Since $D_{n-1} \subset D_n \subset D$ and because $D_{n-1}$ is a proper subset of $D_n$ if $f_{n-1}(i) \ne f_n(i)$ for at least one $i$, we may conclude that $f_{n-1} = f_n$ for some $n$. □

Computational aspects

In our case $v$ is the smallest solution of

$$v(i) = \max\{r(i),\ -c(i) + p_i v(i+1) + s_i v(i) + q_i v(i-1)\} .$$

Because we know the structure of $D$ we may say $v$ is the smallest function $x$ which has the following properties. For some $k \le d$ and some $\ell \ge e$, $i, k, \ell \in \mathbb{Z}$:

1) $x(i) = -c(i) + p_i x(i+1) + s_i x(i) + q_i x(i-1)$, $k \le i \le \ell$,

2) $x(i) = r(i)$, $i > \ell$, $i < k$,

3) $x(i) \ge -c(i) + p_i x(i+1) + s_i x(i) + q_i x(i-1)$, $i = k - 1$, $i = \ell + 1$.

This is a two-point boundary value problem with a free boundary. We shall show that for fixed $k$ and $\ell$ the function $x$ is completely determined by 1 and 2.

Define, for functions on $\mathbb{Z}$, the difference operator $\Delta$ as usual by

4.5. $\Delta x(i) := x(i+1) - x(i)$.

Consider the difference equation, derived from 1,

4.6. $p_i \Delta x(i) - q_i \Delta x(i-1) = c(i)$.

Call $z_i := \Delta x(i)$, $a_i := q_i / p_i$ and $b_i := c(i) / p_i$. Hence 4.6 becomes

$$z_i = a_i z_{i-1} + b_i .$$

With induction on $m$ it is easy to verify that for $k \le m \le \ell$

4.7. $$z_m = z_{k-1} \prod_{i=k}^{m} a_i + \sum_{i=k}^{m} \Big\{ b_i \prod_{j=i+1}^{m} a_j \Big\}$$

(an empty product has the value 1, an empty sum the value 0).

Because $x(\ell + 1) = r(\ell + 1)$ and $x(k - 1) = r(k - 1)$ it holds that

$$r(\ell + 1) - r(k - 1) = \sum_{m=k-1}^{\ell} z_m ,$$

hence

4.8. $$z_{k-1} = \frac{r(\ell + 1) - r(k - 1) - \sum_{m=k-1}^{\ell} \sum_{i=k}^{m} \Big\{ b_i \prod_{j=i+1}^{m} a_j \Big\}}{\sum_{m=k-1}^{\ell} \prod_{i=k}^{m} a_i} .$$

From 4.7 and 4.8 one can compute $z_k, z_{k+1}, \ldots, z_\ell$ and likewise $x(k), x(k+1), \ldots, x(\ell)$, which shows that the function $x$ is completely determined.

The boundary conditions 3 can be formulated as follows

4.9. $z_{k-1} - a_{k-1} \Delta r(k - 2) \le b_{k-1}$ and $\Delta r(\ell + 1) - a_{\ell+1} z_\ell \le b_{\ell+1}$,

which shows that we only have to compute the differences $z_{k-1}$ and $z_\ell$ to check 3 and not the function $x$ itself.

It is easy to verify that the sums and products in 4.7 and 4.8 can be computed recursively. We shall formulate an algorithm to compute the optimal stopping rule and the value function $v$.

Algorithm

1. set $k := d$ and $\ell := e$,

2. compute $z_{k-1}$ (by 4.8) and $z_\ell$ (by 4.7), set $i := 0$,

3. if $z_{k-1} - a_{k-1} \Delta r(k - 2) > b_{k-1}$ then $k := k - 1$ and $i := 1$,

4. if $\Delta r(\ell + 1) - a_{\ell+1} z_\ell > b_{\ell+1}$ then $\ell := \ell + 1$ and $i := 1$,

5. if $i = 0$ then goto 6, else goto 2,

6. $D$ is the set $\{i \in \mathbb{Z} \mid k \le i \le \ell\}$ and $v$ can be computed by 4.7.
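The steps of the algorithm can be sketched in Python as follows. This is an illustration under stated assumptions rather than the author's program: the set $H = \{d, \ldots, e\}$ is taken to be nonempty and is used as the starting interval, the coefficients of $z_{k-1}$ in 4.7 are built recursively, and both boundary tests use the differences computed for the current interval. The function name `optimal_stopping_interval` and the example data (a symmetric walk with constant cost) are hypothetical.

```python
def optimal_stopping_interval(r, c, p, q, d, e):
    """Return (k, l, v): the go-ahead set D = {k, ..., l} and the value
    function v as a dict on {k-1, ..., l+1}, for the random walk of section 4.

    r, c, p, q are functions on the integers; {d, ..., e} is the set H of 4.4.
    """
    a = lambda i: q(i) / p(i)          # a_i = q_i / p_i
    b = lambda i: c(i) / p(i)          # b_i = c(i) / p_i
    dr = lambda i: r(i + 1) - r(i)     # difference operator applied to r

    k, l = d, e
    while True:
        # Write z_m = A_m * z_{k-1} + B_m (4.7); build A_m, B_m recursively.
        A, B = [1.0], [0.0]            # entries for m = k-1
        for m in range(k, l + 1):
            A.append(a(m) * A[-1])
            B.append(a(m) * B[-1] + b(m))
        # 4.8: the differences z_{k-1}, ..., z_l sum to r(l+1) - r(k-1).
        z_first = (r(l + 1) - r(k - 1) - sum(B)) / sum(A)
        z = [A_m * z_first + B_m for A_m, B_m in zip(A, B)]

        # Boundary tests of steps 3 and 4, both with the current z.
        grow_left = z_first - a(k - 1) * dr(k - 2) > b(k - 1)
        grow_right = dr(l + 1) - a(l + 1) * z[-1] > b(l + 1)
        if not (grow_left or grow_right):
            break
        if grow_left:
            k -= 1
        if grow_right:
            l += 1

    # Recover x from x(k-1) = r(k-1) and the differences z_{k-1}, ..., z_l.
    v = {k - 1: r(k - 1)}
    for offset, z_m in enumerate(z):
        v[k + offset] = v[k - 1 + offset] + z_m
    return k, l, v


# Hypothetical data: symmetric walk, cost 0.1 per step, H = {0}.
def reward(i):
    return {0: 0.0, 1: 0.5, -1: 0.5}.get(i, 1.0)

k, l, v = optimal_stopping_interval(reward, lambda i: 0.1,
                                    lambda i: 0.5, lambda i: 0.5, d=0, e=0)
print(k, l, v)
```

In this example the interval grows once on each side and then stays at $\{-1, 0, 1\}$. Outside the returned interval the value function equals $r$, as in property 2.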

Acknowledgement

The author wishes to express his gratitude to Dr. A. Hordijk for pointing out a serious mistake in an earlier version of this paper.

Literature

[1] Dynkin, E.B., Juschkewitsch, A.A.; Sätze und Aufgaben über Markoffsche Prozesse. Springer-Verlag (1969).

[2] Hordijk, A., Potharst, R., Runnenburg, J.Th.; Optimaal stoppen van Markov ketens. MC-syllabus 19 (1973).

[3] Hordijk, A.; Dynamic programming and Markov potential theory. MC tract (1974).

[4] Howard, R.A.; Dynamic programming and Markov processes. Technology Press, Cambridge Massachusetts (1960).

[5] van Hee, K.M., Hordijk, A.; A sequential sampling problem solved by optimal stopping. MC-rapport SW 25/73 (1973).

[6] van Hee, K.M.; Note on memoryless stopping rules. COSOR-notitie R-73-12, T.H. Eindhoven (1974).

[7] Ross, S.; Applied probability models with optimization applications. Holden-Day (1970).