
Department of Mathematics
STATISTICS AND OPERATIONS RESEARCH GROUP

Memorandum COSOR 75-21

The technique of optimal stopping applied to a sequential sampling problem

by

K.M. van Hee and A. Hordijk

Eindhoven, The Netherlands
September 1975


1. Introduction

A psychologist, investigating human decision processes, is interested in an experiment we shall describe below.

The following facts are known to the testee. An urn contains N balls with n different colours. Each colour group i has its own proportion p_i (Σ_{i=1}^n p_i = 1). It is unknown which colour belongs to a group. The testee has to guess which colour has the highest proportion. To acquire information he may draw a ball from the urn, which is replaced afterwards. Each observation costs an amount c (c > 0). If the testee feels that he has enough information to make a decision, he stops sampling and tells his conclusion to the leader of the experiment. When his answer is correct he receives a reward r, otherwise he must pay a penalty ℓ (r ≥ ℓ ≥ 0). What is the optimal behaviour?

The psychologist is also interested in this experiment when the proportions p_i are unknown. In this paper we give a first approach to this problem by discussing the situation where n = 2 and p_1, p_2 are known to the testee.

To compare the strategies of testees it is interesting to know a "best" strategy. The word "best" depends on the criterion the testee chooses. We shall discuss the Bayesian criterion and the minimax criterion in the next section. First we shall transform the decision problem into a (statistical) sequential sampling problem.

Suppose the balls are coloured red and black and let the fraction of red balls be t. The two hypotheses concerning t are H_0 : t = θ_1 and H_1 : t = θ_2 (0 ≤ θ_1, θ_2 ≤ 1). (Note that we do not restrict θ_1 and θ_2 to the rational numbers, so that we may also apply this model to the flipping of a coin with probability t on heads, where t = θ_1 or t = θ_2. Note also that in the original problem θ_2 = 1 - θ_1.) The results of this paper are: the minimax decision rule for this problem is equivalent to the optimal stopping rule of a Markov chain, and we provide an explicit solution for the stopping problem.
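Before formalizing this, it may help to see the game concretely. The following is a minimal simulation sketch (not from the paper; all names and parameter choices are ours) of the two-colour game under a rule of the type that will be derived later: stop as soon as the difference between the two colour counts reaches a threshold k.

    import random

    # One play of the two-colour game (a sketch; names are ours).  theta is
    # the true fraction of red balls, r the reward, ell the penalty, c the
    # cost per draw, k the stopping threshold.
    def play_once(theta, r, ell, c, k, rng):
        z = 0                              # (number of red) - (number of black)
        n = 0                              # number of draws so far
        while abs(z) < k:
            z += 1 if rng.random() < theta else -1
            n += 1
        guess_red = z > 0                  # guess "red has the highest proportion"
        correct = guess_red == (theta > 0.5)
        return (r if correct else -ell) - n * c

    def estimate_return(theta, r, ell, c, k, plays=200_000):
        rng = random.Random(0)
        return sum(play_once(theta, r, ell, c, k, rng) for _ in range(plays)) / plays

With θ = 3/7, r = 100, ℓ = 5, c = 1 and k = 3 the estimate comes out close to 60.3, in agreement with the table of section 6 (θ = 3/7 corresponds to α = θ/(1-θ) = 3/4 in the notation introduced in section 5).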

2. The sequential decision problem

Using the terminology of [2], chapter 7, we shall formulate our problem as a sequential decision problem.

The elements are:

1. Θ := {θ_1, θ_2}, the parameter space; the states of nature.

2. A := Θ, the space of terminal actions; the testee chooses the parameter value t ∈ Θ if he stops sampling.

3. L(θ,a), the loss function, a real valued function on Θ × A representing the loss when the testee makes his decision. L is defined by

   L(θ,a) := -r   if a = θ ,
   L(θ,a) :=  ℓ   if a ≠ θ .

4. x_1, x_2, x_3, ..., a sequence of random variables observable by the testee. They are Bernoulli distributed with parameter t, and x_n represents the outcome of the n-th drawing.

The testee will choose a decision rule, which may be divided into two parts: a stopping rule and a terminal decision rule.

5. A stopping rule ψ is a sequence of functions ψ = (ψ_0, ψ_1(x_1), ψ_2(x_1,x_2), ψ_3(x_1,x_2,x_3), ...), where x_i is the realization of the i-th sample and ψ_i(x_1,...,x_i) = 0 or 1. If ψ_i(x_1,...,x_i) = 1, then ψ_n(x_1,...,x_n) = 1 for all n ≥ i. The interpretation is as follows. If x_1, x_2, ..., x_i are observed, the testee stops sampling if ψ_i(x_1,...,x_i) = 1, and he buys another observation if ψ_i(x_1,...,x_i) = 0. If ψ_0 = 1, then he makes his decision without sampling.

Using ψ we can define a random variable τ, called a stopping time, by

   τ := min{i | ψ_i(x_1,...,x_i) = 1} .

Hence τ is the time at which the sampling is stopped using the stopping rule ψ.

6. A terminal decision rule δ is a sequence of functions δ = (δ_0, δ_1(x_1), δ_2(x_1,x_2), δ_3(x_1,x_2,x_3), ...), where δ_i(x_1,...,x_i) is a function with values in A. For each sequence of realizations x_1,...,x_i the testee specifies by δ_i the decision he will make under the condition that he stops sampling at time i.

So a decision rule is a pair (ψ,δ).

7. The risk function R(θ,(ψ,δ)) of a decision rule (ψ,δ) is the expected loss plus the expected sampling costs when θ is the true parameter:

(1)    R(θ,(ψ,δ)) := E_θ[L(θ, δ_τ(x_1,...,x_τ)) + τc]

(where τ is the stopping time defined by ψ).

In general, and also in our case, there does not exist a decision rule (ψ*,δ*) such that

   R(θ,(ψ*,δ*)) ≤ R(θ,(ψ,δ))   for all θ ∈ Θ and all (ψ,δ) .

Hence we have to choose another criterion to determine a best decision rule. We discuss minimax and Bayes decision rules. A decision rule (ψ*,δ*) is called a minimax (decision) rule if

(2)    max_{θ∈Θ} R(θ,(ψ*,δ*)) = inf_{(ψ,δ)} max_{θ∈Θ} R(θ,(ψ,δ)) .

A decision rule (ψ*,δ*) is called a Bayes (decision) rule with respect to the distribution π over Θ if

(3)    Σ_{θ∈Θ} R(θ,(ψ*,δ*)) π(θ) = inf_{(ψ,δ)} Σ_{θ∈Θ} R(θ,(ψ,δ)) π(θ) .

In the next section we shall study the last criterion extensively.

3. The Bayesian approach

In the Bayesian approach to the sequential decision problem sketched in the foregoing section, it is assumed that the parameter t is a random variable which takes on the value θ_1 with probability π(θ_1) and the value θ_2 with probability π(θ_2) (π(θ_1) = π, π(θ_2) = 1 - π, 0 ≤ π ≤ 1). It is assumed that the sequence of random variables x_1, x_2, x_3, ... is, given t = θ, independently Bernoulli distributed with density

   P[x_i = x | t = θ] = θ^x (1-θ)^{1-x} ,   x = 0 or 1 .

Note that the simultaneous distribution of t, x_1, x_2, x_3, ... is completely determined by these assumptions. The distribution of t on Θ is called the prior distribution, and it is completely determined by π.

Let us return to the sequential decision problem. The Bayes risk of a decision rule (ψ,δ) depends on π and is defined analogously to (1) as the expected loss plus the expected sampling costs:

(4)    r(π,(ψ,δ)) := E[L(t, δ_τ(x_1,...,x_τ)) + τc] .

Conditioning on t = θ gives

(5)    r(π,(ψ,δ)) = Σ_{θ∈Θ} E[L(θ, δ_τ(x_1,...,x_τ)) + τc | t = θ] π(θ) .

Compare this formula with (3). Hence a Bayes decision rule (ψ*,δ*) satisfies

   r(π,(ψ*,δ*)) = inf_{(ψ,δ)} r(π,(ψ,δ)) .

The choice of the prior distribution of t (or, equivalently, the choice of π) depends on the (subjective) opinion of the decision maker. Sometimes he has prior information about the two hypotheses which he can translate into a prior distribution. It is arbitrary whether we consider the Bayes risk as the average with respect to π of the risk function, as in section 2, or whether we consider it from the point of view of the subjectivist with prior knowledge indicated by π.

It will be shown in the appendix that the Bayes rule belonging to the prior distribution with π = ½ is also a minimax rule. We shall restrict our attention to the Bayesian approach, and we shall compute the Bayes rule for π = ½.

4. Properties of the sequential decision process

We now introduce the posterior distribution of t given the sequence of observations x_1 = x_1, ..., x_n = x_n (x_i = 0 or 1):

   y_n := P[t = θ_1 | x_1 = x_1, ..., x_n = x_n] ,   n = 1,2,3,... .

With the rule of Bayes it is easy to verify that

   y_n = P[x_1 = x_1, ..., x_n = x_n | t = θ_1] P[t = θ_1] / Σ_{θ∈Θ} P[x_1 = x_1, ..., x_n = x_n | t = θ] P[t = θ] ,

hence

(6)    y_n = π θ_1^{Σ x_i} (1-θ_1)^{n-Σ x_i} / [π θ_1^{Σ x_i} (1-θ_1)^{n-Σ x_i} + (1-π) θ_2^{Σ x_i} (1-θ_2)^{n-Σ x_i}] ,

where Σ x_i stands for Σ_{i=1}^n x_i. Define y_0 := π.

The value y_{n+1} can be computed from y_n and x_{n+1} by

   y_{n+1} = θ_1^{x_{n+1}} (1-θ_1)^{1-x_{n+1}} y_n / [θ_1^{x_{n+1}} (1-θ_1)^{1-x_{n+1}} y_n + θ_2^{x_{n+1}} (1-θ_2)^{1-x_{n+1}} (1-y_n)] ,   n = 0,1,2,... .

So y_n is the prior probability of t = θ_1 for the (n+1)-th experiment.

n

Considering the sequence Yl'Y2'Y3' ... as functions of the random variables instead of there realizations we get a sequence random variables Io'YI'Y2' .. · recursively defined by ]PC

1.0

= 11

J

= I (7) I. n+l := x I-x x I-x -n+l(I_0) -n+l -n+l( ) -n+J(1 ) OJ 1 -nY + 82 1-02 -y~

Without proof we state some well-known facts from statistical decision theory (see [2]).

A. Consider the sequence y_0, y_1, y_2, ... . For each sequential decision rule (ψ,δ) based on x_1, x_2, x_3, ... there is a sequential decision rule (ψ°,δ°) based on y_0, y_1, y_2, ... which is as good as (ψ,δ), since there is a one-one correspondence between the two sequences. So we only have to consider y_0, y_1, y_2, ... when we are searching for the Bayes rule.

B. If, for n = 1,2,3,..., δ_n(y_1,...,y_n) is a Bayes rule with respect to π for the decision problem based on the fixed sample size of n observations, then for any stopping rule ψ the risk, defined in (4), is minimized by δ = (δ_0, δ_1, δ_2, ...).

C. When we also consider randomized action rules and stopping rules, there does not exist a randomized pair (ψ,δ) with a lower risk, if the loss function is bounded and if there exists for each n a fixed sample size Bayes rule.

D. It is easy to verify that for the fixed sample size of n observations the Bayes risk with respect to π is

   E_π[min{-r y_n + ℓ(1-y_n), -r(1-y_n) + ℓ y_n}] + nc

(where the subscript π indicates the dependence on the prior distribution).

Hence we may formulate our sequential decision problem in the following way: search for the stopping time τ such that

(8)    E_π[max{r y_τ - ℓ(1-y_τ), r(1-y_τ) - ℓ y_τ} - τc]

attains its maximum.

5. The equivalent stopping problem

We restrict ourselves here to the case that θ_1 = θ and θ_2 = 1 - θ (0 ≤ θ ≤ 1). In [3] we showed that the sequence y_0, y_1, y_2, ... forms a stationary Markov chain with state space [0,1] and discrete time parameter. We called it a Bayes process.

The transition probabilities are given by

(9)    P[y_{n+1} = g_y(x) | y_n = y] = p_y(x)   for x = 0 or 1 and y ∈ [0,1] .

For notational convenience we shall use the following notations:

(10)   g_y(x) := θ^x (1-θ)^{1-x} y / p_y(x) ,

(11)   p_y(x) := θ^x (1-θ)^{1-x} y + θ^{1-x} (1-θ)^x (1-y) .

From (6) we see

(12)   y_n = [1 + ((1-π)/π) (θ/(1-θ))^{Σ_{i=1}^n (1-2x_i)}]^{-1} .

We define

(13)   z_n := 2 Σ_{i=1}^n x_i - n ,   n = 1,2,3,... ,

so that the exponent in (12) equals -z_n. Note that z_n is the difference between the number of successes and the number of failures in n trials.

Note that -nz 1.S the difference between the number of successes and failures 1.n n trials.

There exists a one-one correspondence between the random variables y and z ,

-n -n

hence the sequence ~1'~2'~3"" forms also a Markov chain. We use the following notation

(9)

Without loss of generality we shall suppose 0 < a < I. In the case where

a = 0 or a = I the optimal decision rules are trivial. If a = 0 we are certain after one trial, if a = I the hypotheses are identically.

The transition probabilities of the Markov chain ~J '~2'~3'... are easily derived from (9) by the transformation (use (12), (13) and (14))

(15) y

-Z

1+l3a

for Z E Z

where Z is the set of integers. The transition probabilities are

13 I z+ I 13 (16) Pz,z+1 := P(z I = z+ II z z) a + a+1 -n+ -n z a +13 13 z-I+13 p(z I z-II z z) a a Pz z-I := -n+ = -n a+1 z +13

,

a Note that 13 + 13 = I. Pz,z+1 Pz z-I

,

for z E Z for z E Z .

For each state y ∈ [0,1] of the state space of the chain y_0, y_1, y_2, ... we define a reward (compare (8))

(17)   s(y) := max{ry - ℓ(1-y), r(1-y) - ℓy} .

Obviously

   s(y) = ry - ℓ(1-y)    if y ≥ ½ ,
   s(y) = r(1-y) - ℓy    otherwise.

For the transformed Markov chain z_1, z_2, z_3, ... we find

(18)   s^β(z) = (r+ℓ)(1 + βα^{-z})^{-1} - ℓ    for z ≤ log_α β , z ∈ Z ,
       s^β(z) = r - (r+ℓ)(1 + βα^{-z})^{-1}    for z > log_α β , z ∈ Z .

In the case β = 1 we omit the superscript β. (We may distinguish s(y) and s(z) by the domains of the functions.) We state an important property of s(z):

(19)   s(z) = p_{z,z+1} s(z+1) + p_{z,z-1} s(z-1) ,   z ∈ Z , z ≠ 0 .

Relation (19) says that s(z) is harmonic, except in z = 0 (cf. [3]). We are searching for a stopping time τ_0 such that for all integers z

(20)   v^β(z) := sup_τ E_z[s^β(z_τ) - τc] = E_z[s^β(z_{τ_0}) - τ_0 c] ,

where the subscript z means that we start the chain in state z. Of course we are only interested in v^β(0).

The determination of τ_0 and v^β(z) is known as an optimal stopping problem in a Markov chain. Note that it is equivalent to the sequential decision problem of section 2.
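The objects defined in (14)-(19) are straightforward to compute. The following sketch (all names ours) implements the transition probabilities and the reward on the z-chain and checks numerically that the probabilities sum to one and that s is harmonic except in 0:

    # Transition probabilities (16) and reward (17)/(18) on the z-chain (sketch).
    def p_up(z, alpha, beta=1.0):        # p_{z,z+1}
        return (alpha**(z + 1) + beta) / ((alpha + 1) * (alpha**z + beta))

    def p_down(z, alpha, beta=1.0):      # p_{z,z-1}
        return alpha * (alpha**(z - 1) + beta) / ((alpha + 1) * (alpha**z + beta))

    def s(z, alpha, r, ell, beta=1.0):   # reward via (15), (17) and (18)
        y = 1.0 / (1.0 + beta * alpha**(-z))
        return max(r * y - ell * (1 - y), r * (1 - y) - ell * y)

    alpha, r, ell = 0.75, 100.0, 5.0
    for z in range(-8, 9):
        assert abs(p_up(z, alpha) + p_down(z, alpha) - 1.0) < 1e-12
        if z != 0:                       # harmonicity (19) fails only in z = 0
            lhs = s(z, alpha, r, ell)
            rhs = (p_up(z, alpha) * s(z + 1, alpha, r, ell)
                   + p_down(z, alpha) * s(z - 1, alpha, r, ell))
            assert abs(lhs - rhs) < 1e-9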

Without proof we state some results from the theory of optimal stopping (see [1],[4]). We formulate the properties for the chain z_1, z_2, z_3, ... .

A. If sup_z s^β(z) < ∞ and if c > 0, which is true in our case, then there exists an optimal stopping time τ_0 satisfying (20).

B. τ_0 is the entry time in the set Γ^β, where

(21)   Γ^β := {z | v^β(z) = s^β(z) , z ∈ Z} .

C. v^β(z) is the minimal element in the set of solutions of the functional equation

(22)   w(z) = max{s^β(z), -c + p^β_{z,z+1} w(z+1) + p^β_{z,z-1} w(z-1)} .

D. We define recursively

(23)   v^β_0(z) := s^β(z) ,
       v^β_n(z) := max{s^β(z), -c + p^β_{z,z+1} v^β_{n-1}(z+1) + p^β_{z,z-1} v^β_{n-1}(z-1)} .

The functions v^β_n(z) are nondecreasing in n, and

(24)   v^β(z) = lim_{n→∞} v^β_n(z) .

This approximation of v^β(z) is called the method of successive approximations.

For π = ½, or equivalently β = 1, the solution of the optimal stopping problem will be given explicitly in section 8. In section 6 we provide an algorithm to compute v(0) and the set Γ in this case. In section 8 we prove that the prior distribution represented by π = ½ is least favourable and that the decision rule derived for π = ½ is a minimax rule.
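For comparison with the algorithm of the next section, the method of successive approximations (23)-(24) can be run directly; the truncation of the state space and all names in this sketch are ours (s, p_up and p_down are from the sketch above):

    # Successive approximations (23): value iteration on a truncated z-range.
    def successive_approximations(alpha, r, ell, c, z_max=60, sweeps=5000):
        v = {z: s(z, alpha, r, ell) for z in range(-z_max, z_max + 1)}
        for _ in range(sweeps):
            v = {z: (v[z] if abs(z) == z_max else
                     max(s(z, alpha, r, ell),
                         -c + p_up(z, alpha) * v[z + 1]
                             + p_down(z, alpha) * v[z - 1]))
                 for z in range(-z_max, z_max + 1)}
        return v

    # successive_approximations(0.75, 100, 5, 1)[0] is close to 60.31, the
    # value v(0) that the algorithm of section 6 yields exactly.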

6. The algorithm for computing the optimal stopping time

Of course we could use the method of successive approximations, defined in (23), to approximate v(z) by v_n(z), z ∈ Z. Because the optimal stopping time τ_0, defined in (20), is the entry time of Γ, defined in (21), we also have to be sure that Γ_n := {z | v_n(z) = s(z), z ∈ Z} is equal to Γ. In general it may happen that the method of successive approximations requires an infinite number of iterations to achieve this. Even if it is guaranteed that the method of successive approximations provides the optimal stopping set Γ in a finite number of iterations, one needs a criterion which says from which n on Γ_n = Γ. Our algorithm leads in a finite number of simple steps to the determination of v(z) and Γ. In section 8 we show that the number of steps is less than ½(r+ℓ)/c.

We know that v(z) satisfies the functional equation (22). In section 8 it is proved that v(z) has the following properties:

(25)   v(z) is symmetric around z = 0 .

There exists an integer k > 0 such that

(26)   v(z) = s(z)                                          for |z| ≥ k ,
       v(z) = -c + p_{z,z+1} v(z+1) + p_{z,z-1} v(z-1)      for |z| < k .

We shall prove that for each k there is only one function f^k (on the integers) which satisfies (25) and (26). Moreover we prove that v(z) is the unique function which satisfies (22), (25) and (26) simultaneously.

Our method searches in the class of all functions {f^k}_{k=1}^∞ which satisfy (25) and (26) for some k, for the one that also satisfies (22). It will be shown that to check whether the function f^k satisfies (22) it is only necessary to inspect this function in the point k-1. Indeed, if

   s(k-1) < f^k(k-1)   and   s(k) ≥ -c + p_{k,k+1} s(k+1) + p_{k,k-1} f^k(k-1) ,

then f^k satisfies (22). Also a simple recursive relation to compute f^k(k-1) will be derived.

For readers familiar with Markovian decision processes we note that our method is similar to Howard's policy-iteration algorithm (see [1],[6]) in the sense that this algorithm, when started with the policy that prescribes stopping everywhere, will find, sequentially, policies of the type "stop if |z| ≥ k" for k = 1,2,3,... . However, our method does not need the value determination at each iteration. As is well known, the value determination step in Howard's algorithm is rather time consuming. The only thing we have to do is to check a simple recursive relation.

We shall now formulate our algorithm:

1) if α ≥ (r + ℓ - 2c)/(r + ℓ + 2c), then stopping immediately is optimal and the expected return is v(0) = ½(r - ℓ); otherwise:

2) compute for k = 1,2,3,... the numbers b_k, defined by b_0 := 1 and

   b_k := (1 + p_{k,k-1} b_{k-1}) / p_{k,k+1} ,

and check whether

   b_k ≥ (s(k+1) - s(k)) / c ;

the first k which satisfies this inequality gives the decision points ±k;

3) the expected return v(0) is found by the iteration procedure

   v(k) = s(k) ,   v(n) = v(n+1) - b_n c ,   n = k-1, k-2, ..., 0 .
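In code the three steps read as follows (a sketch with our names, reusing s, p_up and p_down from section 5, with β = 1 throughout); lemma 8 of the appendix guarantees that the loop in step 2 terminates after fewer than ½(r+ℓ)/c iterations:

    # The algorithm of this section (sketch).  Returns (k, v(0)).
    def optimal_rule(alpha, r, ell, c):
        # step 1: immediate stopping
        if alpha >= (r + ell - 2 * c) / (r + ell + 2 * c):
            return 0, 0.5 * (r - ell)
        # step 2: recursion (30); first k with b_k >= (s(k+1) - s(k)) / c
        b = [1.0]                               # b_0 = 1
        k = 0
        while True:
            k += 1
            b.append((1.0 + p_down(k, alpha) * b[k - 1]) / p_up(k, alpha))
            if b[k] >= (s(k + 1, alpha, r, ell) - s(k, alpha, r, ell)) / c:
                break
        # step 3: fold back from v(k) = s(k) via v(n) = v(n+1) - b_n c
        v = s(k, alpha, r, ell)
        for n in range(k - 1, -1, -1):
            v -= b[n] * c
        return k, v

For example, optimal_rule(0.75, 100, 5, 1) yields k = 3 and v(0) ≈ 60.31, and optimal_rule(0.125, 100, 5, 1) yields k = 3 and v(0) ≈ 95.95, reproducing rows of the table below.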

Only the positive decision points are mentioned.

r = 100, ℓ = 5, c = 1:

   α                   expected return   decision point
   0                        99.00              1
   1/8                      95.95              3
   2/8                      93.53              3
   3/8                      89.50              4
   4/8                      83.23              4
   5/8                      73.35              4
   6/8                      60.30              3
   7/8                      50.40              2
   103/107 ≤ α ≤ 1          47.50              0

r = 1000, ℓ = 500, c = 1:

   α                     expected return   decision point
   0                          999                1
   1/16                       996                3
   2/16                       994                4
   3/16                       992                5
   4/16                       990                5
   5/16                       987                6
   6/16                       983                7
   7/16                       977                7
   8/16                       970                8
   9/16                       959                9
   10/16                      944               11
   11/16                      920               12
   12/16                      879               14
   13/16                      803               16
   14/16                      652               17
   15/16                      389               12
   31/32                      285                6
   63/64                      258                3
   127/128                    251                1
   255/256                    250.46             1
   1498/1502 ≤ α ≤ 1          250.00             0

Table 1: Expected return v(0) and decision point k.

7. Relations with Wald's sequential probability ratio test (SPRT)

Wald's sequential probability ratio test is useful in the same situation: two simple hypotheses H_0 : t = θ_1, H_1 : t = θ_2. In this test the likelihood ratio (see [2])

   λ_n(x_1,...,x_n) := P[x_1 = x_1, ..., x_n = x_n | t = θ_1] / P[x_1 = x_1, ..., x_n = x_n | t = θ_2]

determines, together with the real numbers A and B (0 < A ≤ 1 ≤ B < ∞), the stopping time

   ψ_n(x_1,...,x_n) := 0    if A < λ_n(x_1,...,x_n) < B ,
   ψ_n(x_1,...,x_n) := 1    otherwise,

and the decision rule

   δ_n(x_1,...,x_n) := θ_1     if λ_n(x_1,...,x_n) ≥ B > A ,
   δ_n(x_1,...,x_n) := θ_2     if λ_n(x_1,...,x_n) ≤ A < B ,
   δ_n(x_1,...,x_n) := any     if A = B = 1 .

Since we restrict ourselves to the hypotheses θ_1 = α/(α+1) and θ_2 = 1 - θ_1 (see (14)), λ_n has the following form:

   λ_n(x_1,...,x_n) = α^{2 Σ_{i=1}^n x_i - n} = α^{z_n} ,

so there exists a one-one correspondence between the likelihood ratio and the posterior probability y_n (see (12) and (15)).

Since our optimal stopping rule has the form "stop if |z_n| ≥ k", where k is determined by the algorithm of section 6, our procedure is an SPRT with bounds

   A = α^k   and   B = α^{-k} .

Consider a given SPRT (A,B) and let p_0 and p_1 denote the two error probabilities. Although it is tedious to compute p_0 and p_1 given A and B, it is easy to give good approximations p̃_0 and p̃_1 by

   A = p̃_1 / (1 - p̃_0)   and   B = (1 - p̃_1) / p̃_0 .

(It holds that p_0 + p_1 ≤ p̃_0 + p̃_1 and p_0 ≤ p̃_0 / (1 - p̃_1), p_1 ≤ p̃_1 / (1 - p̃_0).) In our case A = α^k and B = α^{-k}, hence

   p̃_0 = p̃_1 = α^k / (1 + α^k) =: p̃ .

Since the optimal stopping rule always stops the sampling process, we may suppose that the reward r is paid to the decision maker in advance and that he must pay r + ℓ if he stops the process and gives the wrong conclusion. So it is clear that the decision point k only depends on r + ℓ. We call r + ℓ the loss.

Further note that there is a one-one correspondence between k and the approximated error probability p̃. Hence for each loss we can compute p̃. Since k only takes on integer values, p̃ is not a continuous function of the loss. In table 2 p̃ is given as a function of the loss.
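In code the approximation is one line (a sketch; names ours):

    # Approximated error probability of the rule "stop when |z_n| >= k".
    def approx_error_prob(alpha, k):
        return alpha**k / (1 + alpha**k)

    # For theta = 1/4, i.e. alpha = 1/3, and k = 3 this gives
    # 1/28, about 0.036, matching the corresponding entry of table 2.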

θ = 1/8:

   error prob.   loss
   0.500         1 - 2
   0.125         3 - 14
   0.020         15 - 82
   0.0029        83 - 542
   0.0004        ≥ 543

θ = 3/8:

   error prob.   loss
   0.072         115 - 172
   0.045         173 - 266
   0.027         267 - 417
   0.016         418 - 663
   0.010         664 - 1068
   0.006         1069 - 1737
   0.004         ≥ 1738

θ = 1/4:

   error prob.   loss
   0.500         1 - 3
   0.250         4 - 14
   0.100         15 - 36
   0.036         37 - 94
   0.012         95 - 260
   0.004         261 - 750
   0.001         751 - 2212
   0             ≥ 2213

θ = 7/16:

   error prob.   loss
   0.118         325 - 401
   0.094         402 - 494
   0.074         495 - 610
   0.059         611 - 753
   0.046         754 - 933
   0.037         934 - 1159
   0.029         1160 - 1445
   0.023         1446 - 1808
   0.018         1809 - 2271
   0.014         ≥ 2272

Table 2: Approximated error probability p̃ in Wald's SPRT and the range of the loss.

8. Appendix

In section 8.1 we give an explicit solution for the function v(z) defined in (20) and for the entry time τ_0 of the set Γ = {z | s(z) = v(z)}, in the case that the prior distribution is represented by π = ½. In section 8.2 we provide a justification of the algorithm to compute v(z). In section 8.3 we show that the prior distribution π = ½ is least favourable and that our decision rule is a minimax rule.

8.1. The explicit solution

In this section we take π = ½, so that β = 1.

Lemma 1. The function v(z) is symmetric around 0 for z ∈ Z.

Proof. By induction we prove v_n(z) = v_n(-z). From (18) with β = 1 we see that s(z) = s(-z), so v_0(z) = v_0(-z). Suppose v_{m-1}(z) = v_{m-1}(-z) for z ∈ Z. From (16) with β = 1 we see that p_{z,z+1} = p_{-z,-z-1} and p_{z,z-1} = p_{-z,-z+1}. Hence

   v_m(z) = max{s(z), -c + p_{z,z+1} v_{m-1}(z+1) + p_{z,z-1} v_{m-1}(z-1)}
          = max{s(-z), -c + p_{-z,-z-1} v_{m-1}(-z-1) + p_{-z,-z+1} v_{m-1}(-z+1)} = v_m(-z) .

By taking the limit for m → ∞ and using (24) we find that v(z) = v(-z).  □

Lemma 2. The set {z | z ∈ Z, v(z) > s(z)} is the set of integer points of a symmetric interval around 0.

Proof. We first prove this property for v_n(z). Indeed, v_0(z) = s(z) and v_1(0) = max{s(0), -c + s(1)}, hence v_1(0) may be greater than s(0). From (19) we see that v_1(z) = max{s(z), -c + s(z)} = s(z) for z ≠ 0; therefore the statement is proved for n = 1. Suppose it is proved for n = m-1. We may conclude from (24) that {z | z ∈ Z, v_{m-1}(z) > s(z)} ⊂ {z | z ∈ Z, v_m(z) > s(z)}. If, for z ≥ 1, v_{m-1}(z-1) = s(z-1) and v_{m-1}(z+1) = s(z+1), it follows again from (19) that v_m(z) = s(z). Hence there exists a k_m such that v_m(z) > s(z) if |z| < k_m and v_m(z) = s(z) if |z| ≥ k_m. Because v_m(k_m) = v_m(-k_m), the statement is proved for n = m. Note that {z | z ∈ Z, v(z) > s(z)} = ∪_m {z | z ∈ Z, v_m(z) > s(z)}, which proves the lemma.  □

Corollary. We define

(27)   k := min{z | z > 0, z ∈ Z, v(z) = s(z)} ;

then the stopping rule becomes: stop sampling as soon as one of the points +k or -k is reached. Note that k < ∞. Indeed, when k = ∞ the sampling costs are infinite, and this is certainly not optimal. From the functional equation it follows that the function v(z) satisfies

(28)   v(z) = -c + p_{z,z+1} v(z+1) + p_{z,z-1} v(z-1)   for |z| < k ,
       v(z) = s(z)                                        for |z| ≥ k .

Lemma 3. If the function w(z) is a solution of the functional equation (22), and w(z) satisfies for some integer t > 0

   w(z) = -c + p_{z,z+1} w(z+1) + p_{z,z-1} w(z-1)   for |z| < t ,
   w(z) = s(z)                                        for |z| ≥ t ,

then v(z) = w(z).

Proof. Assume k > t. Since v(z) ≥ s(z) = w(z) for |z| ≥ t, and since for |z| < t

   w(z) = -c + p_{z,z-1} w(z-1) + p_{z,z+1} w(z+1)   and   v(z) = -c + p_{z,z-1} v(z-1) + p_{z,z+1} v(z+1) ,

we see that u(z) := v(z) - w(z) satisfies

   u(z) = p_{z,z-1} u(z-1) + p_{z,z+1} u(z+1)   for |z| < t ,
   u(t) ≥ 0   and   u(-t) ≥ 0 .

Let Q be the restriction of the matrix of transition probabilities of our Markov chain to the rows and columns with numbers -t+1, -t+2, ..., 0, ..., t-1. Then we may write the above equation in vector notation:

   u = Qu + d ,   with d(-t+1) = p_{-t+1,-t} u(-t) , d(t-1) = p_{t-1,t} u(t) ,

and d(z) = 0 for |z| < t-1. Hence (I - Q)u = d. Because Q^n → 0 for n → ∞, there exists an inverse of I - Q which has only nonnegative entries, so that u = (I - Q)^{-1} d ≥ 0, which implies

   v(z) ≥ w(z)   for z ∈ Z .

On the other hand, for |z| < k we have

   w(z) ≥ -c + p_{z,z+1} w(z+1) + p_{z,z-1} w(z-1)   and   v(z) = -c + p_{z,z+1} v(z+1) + p_{z,z-1} v(z-1) ,

so that

   u(z) ≤ p_{z,z+1} u(z+1) + p_{z,z-1} u(z-1) .

Since the Markov chain with transition probabilities as defined in (16) (with β = 1) for |z| < k and absorbing barriers in ±k is an absorbing Markov chain and, moreover, u(-k) = u(k) = 0, it follows from the above inequality that u(z) ≤ 0, hence v(z) ≤ w(z). Therefore w(z) = v(z) for z ∈ Z. For k < t the proof proceeds in a similar way.  □

Lemma 4. Let k be defined as in (27); then

(29)   s(k-1) < v(k-1) ≤ s(k-1) + c / p_{k,k-1} .

Proof. The first inequality is immediate from (22) and (27). Since v is a solution of (22) we have

   v(k) = s(k) ≥ -c + p_{k,k+1} s(k+1) + p_{k,k-1} v(k-1) .

Hence, using (19),

   v(k-1) ≤ (s(k) + c)/p_{k,k-1} - s(k+1) p_{k,k+1}/p_{k,k-1} = s(k-1) + c/p_{k,k-1} .  □

We shall now introduce a class of functions {f^n(z) | n = 1,2,3,..., z ∈ Z}, and we prove that v(z) is the unique element of this class which satisfies (29) when k ≥ 1.

Define recursively a sequence {b_n | n = 0,1,2,...} by

(30)   b_0 := 1 ,   b_n := (1 + p_{n,n-1} b_{n-1}) / p_{n,n+1} .

For an arbitrary positive integer n we define a function f^n(z) for z ∈ Z by

(31)   f^n(z) := s(z)                  for z ≥ n ,
       f^n(z) := f^n(z+1) - b_z c      for z = n-1, n-2, ..., 0 ,
       f^n(z) := f^n(-z)               for z < 0 .

Lemma 5. The function f^n(z) satisfies

(32)   f^n(z) = -c + p_{z,z+1} f^n(z+1) + p_{z,z-1} f^n(z-1)   for |z| < n .

Proof. From (30) we see that p_{z,z+1} b_z = 1 + p_{z,z-1} b_{z-1}. By (31) we may state that

   p_{z,z+1} f^n(z) = p_{z,z+1} f^n(z+1) - p_{z,z+1} b_z c ,   0 ≤ z < n .

Also from (31) it follows that

   p_{z,z-1} f^n(z) = p_{z,z-1} f^n(z-1) + p_{z,z-1} b_{z-1} c ,   1 ≤ z < n ,

so that

   f^n(z) = p_{z,z+1} f^n(z) + p_{z,z-1} f^n(z) = -c + p_{z,z+1} f^n(z+1) + p_{z,z-1} f^n(z-1) ,

from which the statement follows for 1 ≤ z < n. For z = 0 we see by (31) and (30) that f^n(0) = f^n(1) - c. Because f^n(1) = f^n(-1), the statement is also proved for z = 0, and for -n < z < 0 it follows by symmetry.  □
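Lemma 5 lends itself to a direct numerical check. A sketch (names ours; s, p_up and p_down are from the sketch in section 5):

    # Build f^n via (30)-(31) and verify relation (32) for |z| < n.
    def f_table(n, alpha, r, ell, c):
        b = [1.0]                                # (30): b_0 = 1
        for m in range(1, n):
            b.append((1.0 + p_down(m, alpha) * b[m - 1]) / p_up(m, alpha))
        vals = {z: s(z, alpha, r, ell) for z in (n, n + 1)}   # (31): f^n = s for z >= n
        for z in range(n - 1, -1, -1):
            vals[z] = vals[z + 1] - b[z] * c
        for z in range(-n - 1, 0):               # symmetric extension
            vals[z] = vals[-z]
        return vals

    alpha, r, ell, c = 0.75, 100.0, 5.0, 1.0
    fn = f_table(3, alpha, r, ell, c)
    for z in range(-2, 3):                       # |z| < n = 3
        assert abs(fn[z] - (-c + p_up(z, alpha) * fn[z + 1]
                               + p_down(z, alpha) * fn[z - 1])) < 1e-9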

Remark. Note that if s(n-1) = s(n) - b_{n-1} c, then f^n(n-1) = s(n-1) = f^{n-1}(n-1), from which it follows by induction that f^n(z) = f^{n-1}(z) for all z.

In lemma 6 we show that if f^n(z) satisfies (29) then f^n(z) is a majorant of s(z).

Lemma 6. If f^n(n-1) ≥ s(n-1), then f^n(z) ≥ s(z) for all z ∈ Z.

Proof. Since f^n(z) = s(z) for |z| ≥ n, and f^n and s are symmetric around 0, it remains to show that

   θ(z) := f^n(z) - s(z)

is nonnegative for 0 ≤ z ≤ n-2. Let t_i := 1/p_{n-i+1,n-i}; then

   1 - t_i = -p_{n-i+1,n-i+2} / p_{n-i+1,n-i} .

From (19) we have

   s(n-i) = t_i s(n-i+1) + (1-t_i) s(n-i+2) ,   1 ≤ i ≤ n ,

and from (32) we see, for i ≥ 2,

(33)   f^n(n-i) = t_i f^n(n-i+1) + (1-t_i) f^n(n-i+2) + t_i c .

Subtracting s(n-i) from f^n(n-i) gives

(34)   θ(n-i) = t_i θ(n-i+1) + (1-t_i) θ(n-i+2) + t_i c .

With θ(n) = 0 and θ(n-1) ≥ 0 by assumption we have θ(n-1) ≥ θ(n). Assume θ(n-i+1) ≥ θ(n-i+2); then, since 1 - t_i < 0, it follows that

   (1-t_i) θ(n-i+2) ≥ (1-t_i) θ(n-i+1) .

Substituting this in (34) gives

   θ(n-i) ≥ t_i θ(n-i+1) + (1-t_i) θ(n-i+1) + t_i c ≥ θ(n-i+1) .

Hence, by induction, θ(z) is nonincreasing in z for 0 ≤ z ≤ n. With θ(n-1) ≥ θ(n) = 0 it follows that θ(z) ≥ 0 for 0 ≤ z ≤ n-2, which proves the lemma.  □

Theorem 1.

1) If α ≥ (r + ℓ - 2c)/(r + ℓ + 2c), then v(z) = s(z) and immediately stopping is optimal.

2) Otherwise, let n be the unique natural number with

(35)   (s(n) - s(n-1))/c - 1/p_{n,n-1} ≤ b_{n-1} < (s(n) - s(n-1))/c ;

then f^n(z) = v(z), and hence n = k, with k defined as in (27).

Proof of 1). s(z) satisfies (19), hence

   s(z) > -c + p_{z,z+1} s(z+1) + p_{z,z-1} s(z-1) ,   z ≠ 0 .

Furthermore s(0) = ½(r-ℓ) and, from (18), s(1) = r - (r+ℓ)α/(1+α). Therefore s(0) ≥ -c + s(1) if and only if α ≥ (r + ℓ - 2c)/(r + ℓ + 2c). Hence if α ≥ (r + ℓ - 2c)/(r + ℓ + 2c), then s(z) satisfies (22) for all z, and since v(z) is the minimal solution of (22), v(z) = s(z) for all z ∈ Z (this is also a consequence of lemma 3).

Proof of 2). Since s(0) < -c + s(1), the function s(z) does not satisfy the functional equation (22), and hence k ≥ 1.

We first prove that f^n(z) is a solution of (22). From b_{n-1} < (s(n) - s(n-1))/c it follows that s(n-1) < s(n) - b_{n-1}c = f^n(n-1). According to lemma 6 and relation (31) we obtain f^n(z) ≥ s(z) for |z| < n. From lemma 5 and from (19) we have

(36)   f^n(z) = max{s(z), -c + p_{z,z+1} f^n(z+1) + p_{z,z-1} f^n(z-1)}

for |z| < n and |z| > n. From the first of the inequalities (35) and relation (31) we find

   f^n(n-1) ≤ s(n-1) + c/p_{n,n-1} .

Hence

   f^n(n) = s(n) = p_{n,n+1} s(n+1) + p_{n,n-1} s(n-1) ≥ -c + p_{n,n+1} s(n+1) + p_{n,n-1} f^n(n-1) .

Consequently (36) holds for all z ∈ Z. Hence f^n(z) is a solution of (22). According to lemma 3 we have f^n(z) = v(z), and in view of lemmas 3 and 4 there is exactly one natural number with property (35).  □

Remark. According to lemmas 3 and 4 and theorem 1 we have that, for n defined in (35), f^n(z) satisfies

(37)   s(n-1) < f^n(n-1) ≤ s(n-1) + c/p_{n,n-1} ,

and that f^n(z) is the only f-function as defined in (31) with this property.
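The characterization (35) can be verified numerically as well; in the following sketch (names ours, again reusing s, p_up and p_down) the two-sided inequality singles out exactly n = k = 3 for the first parameter set of section 6 (α = 3/4, r = 100, ℓ = 5, c = 1):

    # Check that (35) holds for exactly one n (here n = k = 3).
    alpha, r, ell, c = 0.75, 100.0, 5.0, 1.0
    b = [1.0]
    for n in range(1, 10):
        b.append((1.0 + p_down(n, alpha) * b[n - 1]) / p_up(n, alpha))
    hits = [n for n in range(1, 10)
            if (s(n, alpha, r, ell) - s(n - 1, alpha, r, ell)) / c
               - 1.0 / p_down(n, alpha)
               <= b[n - 1]
               < (s(n, alpha, r, ell) - s(n - 1, alpha, r, ell)) / c]
    assert hits == [3]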

8.2. Justification of the algorithm

In this section we suppose again that β = 1. We have seen that to find the value k it is only necessary to compute the numbers b_n, n = 1,2,3,..., and to check the inequality

   (s(n) - s(n-1))/c - 1/p_{n,n-1} ≤ b_{n-1} < (s(n) - s(n-1))/c ,

which is equivalent to

   s(n-1) < f^n(n-1) ≤ s(n-1) + c/p_{n,n-1} .

We shall prove in lemma 7 that it is only necessary to check the left-hand side inequality of (35), or equivalently the right-hand side of (37). The first n ≥ 1 for which this inequality holds is k. This gives, with the check "if α < (r + ℓ - 2c)/(r + ℓ + 2c)", the algorithm. Part 3 of the algorithm follows from theorem 1 (f^n(z) = v(z)) and relation (31).

Lemma 7. If f^n(n-1) > s(n-1) + c/p_{n,n-1}, then f^{n+1}(n) > s(n), for n = 1,2,3,... .

Proof. From f^n(n-1) > s(n-1) + c/p_{n,n-1} and f^n(n-1) = s(n) - b_{n-1}c it follows that

(38)   s(n-1) + b_{n-1}c < s(n) - c/p_{n,n-1} .

From (31) we have

   f^{n+1}(n) = s(n+1) - b_n c = s(n+1) - (1 + p_{n,n-1} b_{n-1}) c / p_{n,n+1} .

Hence, with (19),

   p_{n,n+1} f^{n+1}(n) = p_{n,n+1} s(n+1) - (c + p_{n,n-1} b_{n-1} c)
                        = s(n) - p_{n,n-1} s(n-1) - (c + p_{n,n-1} b_{n-1} c)
                        = s(n) - p_{n,n-1} (s(n-1) + b_{n-1} c) - c .

With (38) we find

   p_{n,n+1} f^{n+1}(n) > s(n) - p_{n,n-1}(s(n) - c/p_{n,n-1}) - c = (1 - p_{n,n-1}) s(n) = p_{n,n+1} s(n) .

Therefore f^{n+1}(n) > s(n).  □

Corollary. Lemma 7 is equivalent to the following assertion:

   if b_{n-1} < (s(n) - s(n-1))/c - 1/p_{n,n-1} , then b_n < (s(n+1) - s(n))/c .

The following lemma gives an upper bound for k, and so for the number of iterations of the algorithm.

Lemma 8. Let n̄ be the least n such that n > ½(r + ℓ)/c; then k < n̄.

Proof. Consider the stopping time τ = 0. Evidently, when we start in z = 0, its expected return is s(0) = ½(r - ℓ). All stopping times τ with P{τ ≥ n̄} = 1 have an expected return not larger than r - n̄c. Because r - n̄c < ½(r - ℓ), these stopping times have an expected return less than that of the stopping time τ = 0, hence they are not optimal. Suppose now that k ≥ n̄. Then the optimal stopping time τ_0, when starting in z = 0, requires at least k steps, so that P{τ_0 ≥ k} = 1. Hence it is not optimal, which is a contradiction. Therefore k < n̄.  □

8.3. The minimax decision rule

In this section we shall show that the prior distribution characterized by π = ½ is least favourable (see [2] for a definition). We return to the original Markov chain y_0, y_1, y_2, ... defined in (7), with state space [0,1]. In the same way as in (20) we define

   w(y) := sup_τ E_y[s(y_τ) - τc] .

Obviously, from (14) and (12), v^β(0) = w(1/(1+β)) = w(π). Moreover, -w(π) equals the expression defined in (5) for the optimal decision rule.

Similar to (23) we define recursively (recall (10) and (11))

   w_0(y) := s(y) ,
   w_n(y) := max{s(y), -c + Σ_{x=0,1} p_y(x) w_{n-1}(g_y(x))} .

The properties (22) and (24) become, for the chain y_0, y_1, y_2, ...,

   w(y) = max{s(y), -c + Σ_{x=0,1} p_y(x) w(g_y(x))}   and   w(y) = lim_{n→∞} w_n(y) .

Lemma 9. The function w(y) on [0,1] is symmetric around y = ½, and w(y) is convex.

Proof. Note that s(½ + y) = s(½ - y) for 0 ≤ y ≤ ½, and that p_{½+y}(x) = p_{½-y}(1-x) for x = 0 or 1. In the same way as in lemma 1 we may prove that w_n(y) is symmetric around ½, and therefore w(y) is as well. Note that s(y) is convex on [0,1]; therefore w_0(y) is convex. Suppose now that w_{n-1}(y) is convex for some n. Then Σ_{x=0,1} p_y(x) w_{n-1}(g_y(x)) is also convex (for a proof see [3]), and we may conclude that

   w_n(y) = max{s(y), -c + Σ_{x=0,1} p_y(x) w_{n-1}(g_y(x))}

is convex. Therefore w(y) = lim_{n→∞} w_n(y) is convex.  □

Corollary. w(½) = min_{y∈[0,1]} w(y).

We shall use the following version of the minimax theorem (see [2]): if for a sequential decision problem the parameter set Θ is finite and the risk set S is bounded from below, then

   inf_{(ψ,δ)} sup_π r(π,(ψ,δ)) = sup_π inf_{(ψ,δ)} r(π,(ψ,δ)) ,

where r(π,(ψ,δ)) is defined in (4). The supremum has to be taken over all prior distributions and the infimum over all sequential decision rules. R(θ,(ψ,δ)) is the risk with respect to the prior distribution which gives probability 1 to θ.

A sequential decision rule (ψ_0,δ_0) is called minimax if

   sup_{θ∈Θ} R(θ,(ψ_0,δ_0)) = inf_{(ψ,δ)} sup_{θ∈Θ} R(θ,(ψ,δ)) .

In our case the risk set S is defined by

   S := {(R(θ,(ψ,δ)), R(1-θ,(ψ,δ))) | (ψ,δ) a sequential decision rule} .

Because R(θ,(ψ,δ)) ≥ -r and R(1-θ,(ψ,δ)) ≥ -r for all (ψ,δ), the risk set S is bounded from below and we may apply the minimax theorem.

Theorem 2. The prior distribution represented by π = ½ is least favourable, and the sequential decision rule (ψ_0,δ_0), with ψ_0 the entry time of the set {z | z ∈ Z, v(z) = s(z)} and δ_0 the rule that chooses H_1 if z > 0 and H_0 if z ≤ 0, is a minimax rule.

Proof. From (8) and (17) it follows that

   -w(π) = inf_{(ψ,δ)} r(π,(ψ,δ)) .

According to lemma 9 and section 2 we have

   -w(½) = sup_π inf_{(ψ,δ)} r(π,(ψ,δ)) .

Therefore the prior distribution represented by π = ½ is least favourable. Because P_θ{deciding H_0} = P_{1-θ}{deciding H_1} for the decision rule (ψ_0,δ_0), we have

   r(π,(ψ_0,δ_0)) = sup_{θ∈Θ} R(θ,(ψ_0,δ_0))   for all π .

Note that

   sup_π r(π,(ψ,δ)) = sup_{θ∈Θ} R(θ,(ψ,δ)) .

Applying the minimax theorem, we obtain

   sup_π inf_{(ψ,δ)} r(π,(ψ,δ)) = inf_{(ψ,δ)} sup_{θ∈Θ} R(θ,(ψ,δ)) .

Since (ψ_0,δ_0) is the Bayes rule for π = ½, we have -w(½) = r(½,(ψ_0,δ_0)) = sup_{θ∈Θ} R(θ,(ψ_0,δ_0)), so that (ψ_0,δ_0) attains inf_{(ψ,δ)} sup_{θ∈Θ} R(θ,(ψ,δ)) and is a minimax rule.  □

References

[1] Derman, C., Finite state Markovian decision processes, Academic Press, New York, 1970.

[2] Ferguson, T.S., Mathematical statistics: a decision theoretic approach, Academic Press, New York, 1967.

[3] Hordijk, A. & K.M. van Hee, A Bayes process, Mathematical Centre Report SW 23, Amsterdam, 1973.

[4] Hordijk, A., R. Potharst & J.Th. Runnenburg, Optimal stopping of Markov chains, MC Syllabus 19, Mathematical Centre, Amsterdam, 1973 (in Dutch).

[5] Ross, S., Probability models with optimization applications, Holden-Day, New York, 1969.

[6] Van Hee, K.M., The policy iteration method for the optimal stopping of a Markov chain with applications, Proceedings of the 7th IFIP conference on optimization techniques, Springer-Verlag, 1975.

[7] Hordijk, A. & K.M. van Hee, A sequential sampling problem solved by
