The policy iteration method for the optimal stopping of a Markov chain with an application
Citation for published version (APA):
Hee, van, K. M. (1975). The policy iteration method for the optimal stopping of a Markov chain with an application. (Memorandum COSOR; Vol. 7504). Technische Hogeschool Eindhoven.
Document status and date: Published: 01/01/1975
Document Version:
Publisher’s PDF, also known as Version of Record (includes final page, issue and volume numbers)
TECHNOLOGICAL UNIVERSITY EINDHOVEN
Department of Mathematics
STATISTICS AND OPERATIONS RESEARCH GROUP
Memorandum COSOR 75-04
The policy iteration method for the optimal stopping of a Markov chain with an application
by
K.M. van Hee
0. Summary
In this paper we study the problem of the optimal stopping of a Markov chain with a countable state space. In each state i the controller receives a reward r(i) if he stops the process, or he must pay the cost c(i) otherwise. We show that, under the condition that there exists an optimal stopping rule, the policy iteration method, introduced by Howard, produces a sequence of stopping rules for which the expected return converges to the value function. For random walks on the integers with a special reward and cost structure, we show that the policy iteration method gives the solution of a discrete two point boundary value problem with a free boundary. We give a simple algorithm for the computation of the optimal stopping rule.
1. Introduction
Consider a Markov chain $\{X_n \mid n = 0,1,2,\dots\}$ defined on the probability space $(\Omega, \mathcal{F}, \mathbb{P})$. The state space S is countable. We suppose that $\mathbb{P}[X_0 = i] > 0$ for all $i \in S$. Hence $\mathbb{P}_i[A]$, the conditional probability of $A \in \mathcal{F}$ given $X_0 = i$, is defined for all $i \in S$.
On S real functions r and c are defined, where r(i) is the reward if the process is stopped in state i and c(i) is the cost if the process goes on. We consider stopping times T (for a definition see [7]). For a nonnegative function g on S we define
$$\mathbb{E}_i[g(X_T)] := \int_{\{T<\infty\}} g(X_T)\, d\mathbb{P}_i .$$
Condition A. Suppose that the reward function r satisfies
$$\mathbb{E}_i[\,|r(X_T)|\,] < \infty$$
for all $i \in S$ and all stopping times T.
(Note that $r^+(i) := \max\{0, r(i)\}$ and $r^-(i) := -\min\{0, r(i)\}$.)
Let P be the transition matrix of the Markov chain, with components P(i,j) for $i,j \in S$. If the function c on S is integrable for all $\mathbb{P}_i[\,\cdot\,]$, we define the function Pc by
$$Pc(i) := \sum_{j \in S} P(i,j)\,c(j)$$
and with induction, if $P^{n-1}c$ is integrable for all $\mathbb{P}_i[\,\cdot\,]$,
$$P^n c := P(P^{n-1} c).$$
We call a function c on S a charge (see [3]) if
$$\sum_{n=0}^{\infty} P^n |c| < \infty .$$
(Note that for functions v and w on S: $v \le w$ if $v(i) \le w(i)$ for all $i \in S$ and $v < w$ if $v(i) < w(i)$ for all $i \in S$. Further $|v|$ is defined by $|v|(i) := |v(i)|$.)
Condition B. Either the cost function c is a charge, or r and c are both nonnegative.
Throughout this paper we shall suppose that conditions A and B hold.
We call a function w on S c-excessive with respect to the cost function c if
1) $w \ge -c + Pw$
2) $w \ge -\sum_{n=0}^{\infty} P^n c$.
For a stopping time T the expected return $v_T(i)$, given the starting state i, is defined by
$$v_T(i) := \mathbb{E}_i\Big[ r(X_T) - \sum_{n=0}^{T-1} c(X_n) \Big].$$
The existence of the expected return $v_T(i)$ is guaranteed for all T since $|\mathbb{E}_i[r(X_T)]| < \infty$ for all i and c is either a charge or a nonnegative function.
Note that $v_T(i) = -\infty$ is permitted.
The value function v(i) is the supremum over all the stopping times T:
$$v(i) := \sup_T v_T(i).$$
Sometimes we need the following assumption.
Assumption C. There exists an optimal stopping time $T^*$, i.e. $v_{T^*}(i) = v(i)$ for all $i \in S$.
In the rest of this section we summarize some properties of stopping problems.
1.1. The value function v satisfies the functional equation
$$v(i) = \max\Big\{ r(i),\ -c(i) + \sum_{j \in S} P(i,j)\,v(j) \Big\}$$
(see [2], [3] or [7]).
1.2. The value function v is the smallest c-excessive function dominating the reward function r (see [2] and [3]).
1.3. If an optimal stopping time exists, the entrance time $T_\Gamma$ in the set $\Gamma := \{ i \mid r(i) = v(i) \}$ is optimal (see [2] and [6]).
1.4. If $\sup_{i \in S} |r(i)| < \infty$ and $\inf_{i \in S} c(i) > 0$ then there exists an optimal stopping time (see [2] and [7]).
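Properties 1.1 and 1.3 can be checked numerically on a small finite chain. The following is only a sketch under assumptions that are not in the text: the six-state chain, its rewards and costs are hypothetical, and v is approximated by successive approximation of the functional equation 1.1 starting from r (a swapped-in technique, not the policy iteration method studied below). The chosen data satisfy the hypotheses of 1.4.

```python
def value_iteration(P, r, c, sweeps=500):
    """Approximate v by iterating v <- max{r, -c + P v}, starting from v = r."""
    n = len(r)
    v = list(r)
    for _ in range(sweeps):
        v = [max(r[i], -c[i] + sum(P[i][j] * v[j] for j in range(n)))
             for i in range(n)]
    return v


# Hypothetical chain: symmetric walk on {0,...,5}, held fixed at both ends.
P = [[0.0] * 6 for _ in range(6)]
P[0][0] = P[5][5] = 1.0
for i in range(1, 5):
    P[i][i - 1] = P[i][i + 1] = 0.5
r = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]   # bounded reward
c = [1.0] * 6                            # cost bounded away from 0

v = value_iteration(P, r, c)
# The set Gamma of property 1.3: states where stopping attains the value.
stopping_set = [i for i in range(6) if abs(v[i] - r[i]) < 1e-9]
```

In this example the process goes on only in states 1 and 2, and the entrance time in `stopping_set` is the candidate optimal rule of property 1.3.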
2. Some preparations
A stopping rule f is a mapping from S to {0,1}, where f(i) = 0 means that the process is stopped in i and f(i) = 1 means that the process goes on in state i. The stopping rule f is equivalent with the entrance time $T_f$ in the set $\Gamma_f := \{ i \mid f(i) = 0 \}$. The expected return under a stopping rule f is indicated by $v_f(i)$.
For a stopping rule f we define
2.1. $D_f := \{ i \in S \mid f(i) = 1 \}$, the go-ahead set.
$\Gamma_f := S \setminus D_f$, the stopping set.
2.2. $P_f$ is the matrix with components
$P_f(i,j) := P(i,j)$ if $i \in D_f$
$P_f(i,j) := 0$ otherwise.
2.3. $d_f$ is a function on S with
$d_f(i) := r(i)$ if $i \in \Gamma_f$
$d_f(i) := -c(i)$ otherwise.
If assumption C holds, property 1.3 guarantees that the entrance time $T_\Gamma$ in the set $\Gamma$ is also optimal. In that case
2.4. $v(i) = \mathbb{E}_i\Big[ r(X_{T_\Gamma}) - \sum_{n=0}^{T_\Gamma - 1} c(X_n) \Big]$.
According to the stopping time $T_\Gamma$ we define the stopping rule $f^*$ by
2.5. $f^*(i) = 0$ if and only if $i \in \Gamma$.
Further let
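Lemma 1. For each stopping rule f with $v_f \ge r$ we have
1) $|v_f(i)| < \infty$ for all $i \in S$
2) $v_f = \sum_{n=0}^{\infty} P_f^n d_f$
3) $\lim_{n\to\infty} P_f^n |d_f| = 0$ (pointwise convergence)
4) $v_f = d_f + P_f v_f$
5) $\lim_{n\to\infty} P_f^n |v_f| = 0$ (pointwise convergence).
Proof. If r and c are nonnegative we have $v_f \ge r \ge 0$. Since
$$v_f(i) = \mathbb{E}_i[r(X_{T_f})] - \mathbb{E}_i\Big[ \sum_{n=0}^{T_f - 1} c(X_n) \Big]$$
we may conclude
$$\mathbb{E}_i\Big[ \sum_{n=0}^{T_f - 1} c(X_n) \Big] < \infty \quad \text{for all } i \in S.$$
Note that if c is a charge this is also true. Define:
2.6. $w_f(i) := \mathbb{E}_i[\,|r(X_{T_f})|\,] + \mathbb{E}_i\Big[ \sum_{n=0}^{T_f - 1} |c(X_n)| \Big]$.
So we have for both cases of B: $w_f(i) < \infty$ for all $i \in S$.
We have the following representation
$$w_f = \sum_{n=0}^{\infty} P_f^n |d_f|$$
(note that $P_f^0(i,j) = 1$ if and only if $i = j$) and in the same way, by absolute convergence,
$$v_f = \sum_{n=0}^{\infty} P_f^n d_f . \quad \text{(Statement 2)}$$
Because $w_f < \infty$ we may conclude $P_f^n |d_f| \to 0$ for $n \to \infty$ (statement 3).
Because $\sum_{n=0}^{\infty} P_f^n |d_f|$ is finite we may change the summation order, hence
$$v_f = d_f + P_f \sum_{n=0}^{\infty} P_f^n d_f = d_f + P_f v_f . \quad \text{(Statement 4)}$$
In the same way
$$w_f = |d_f| + P_f w_f .$$
By iterating this equation we get
$$w_f = \sum_{k=0}^{n-1} P_f^k |d_f| + P_f^n w_f ,$$
from which it follows that $P_f^n w_f$ tends to 0 if n tends to $\infty$. Because $|v_f| \le w_f$ we have also
$$\lim_{n\to\infty} P_f^n |v_f| = 0 . \quad \text{(Statement 5)}$$
□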
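Statements 2 and 4 of lemma 1 suggest a direct way to evaluate $v_f$ on a finite chain: the iteration $v \leftarrow d_f + P_f v$, started at 0, produces the partial sums of $\sum_n P_f^n d_f$. The sketch below assumes a hypothetical three-state chain and uses exact rational arithmetic; the function name and the data are illustrative only.

```python
from fractions import Fraction


def expected_return(P, r, c, f, n_terms=50):
    """Partial sum of sum_n P_f^n d_f, built by iterating v <- d_f + P_f v."""
    n = len(r)
    # d_f as in 2.3: reward on the stopping set, minus the cost elsewhere.
    d = [r[i] if f[i] == 0 else -c[i] for i in range(n)]
    v = [Fraction(0)] * n
    for _ in range(n_terms):
        v = [d[i] if f[i] == 0                      # row of P_f is zero (2.2)
             else d[i] + sum(P[i][j] * v[j] for j in range(n))
             for i in range(n)]
    return v


half = Fraction(1, 2)
P = [[1, 0, 0], [half, 0, half], [0, 0, 1]]   # hypothetical chain on {0, 1, 2}
r = [0, 0, 10]
c = [1, 1, 1]
f = [0, 1, 0]                                  # go on only in state 1

v_f = expected_return(P, r, c, f)              # here v_f(1) = -1 + (0 + 10)/2
```

For this rule $v_f \ge r$ holds, so the lemma applies, and the result satisfies $v_f = d_f + P_f v_f$ exactly.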
Corollary 1. If C holds we have from 2.4 and lemma 1 that $|v(i)| < \infty$ for all $i \in S$ and
$$\lim_{n\to\infty} P_{f^*}^n |d_{f^*}| = 0 .$$
Define:
$$w := \sum_{n=0}^{\infty} P_{f^*}^n |d_{f^*}| .$$
By lemma 1 we have
2.7. $\lim_{n\to\infty} P_{f^*}^n w = 0$.
In the next section we study expressions like $P_g^k v_f$, where f and g are stopping rules. We shall give sufficient conditions in lemma 2 for the finiteness of these expressions.
Lemma 2. Let f and g be stopping rules. Suppose $v_f \ge r$. Then $P_g^k |v_f|$ is finite for k = 1,2,3,…
Proof. Let $T := T_f + k$. Using the same arguments as in lemma 1, we derive for c a charge:
$$\mathbb{E}_i\Big[ |r(X_T)| + \sum_{n=0}^{T-1} |c(X_n)| \Big] < \infty .$$
Note that
$$\mathbb{E}_i\Big[ |r(X_T)| + \sum_{n=0}^{T-1} |c(X_n)| \Big] = \sum_{n=0}^{k-1} P^n |c|(i) + P^k w_f(i)$$
($w_f$ is defined in 2.6). Hence $P_g^k |v_f| \le P^k w_f < \infty$.
Now let r and c be nonnegative. $P^k v_f$ is defined because $v_f \ge r \ge 0$. Hence $P^k v_f \ge P^k r \ge 0$. Define vectors $c_f$ and $r_f$ by
$r_f(i) := r(i)$ if $i \in \Gamma_f$, $:= 0$ otherwise
$c_f(i) := c(i)$ if $i \in D_f$, $:= 0$ otherwise.
Note that $|d_f| = r_f + c_f$. It is easy to verify that
$$\sum_{j \in S} P^k(i,j)\, \mathbb{E}_j[r(X_{T_f})] = P^k \sum_{n=0}^{\infty} P_f^n r_f(i)$$
and
$$\sum_{j \in S} P^k(i,j)\, \mathbb{E}_j\Big[ \sum_{n=0}^{T_f - 1} c(X_n) \Big] = P^k \sum_{n=0}^{\infty} P_f^n c_f(i) .$$
Hence $P^k w_f < \infty$. Reasoning like before, we see that $P_g^k |v_f| < \infty$.
3. Policy iteration method
Let f be a stopping rule such that $\sum_{j \in S} P(i,j)\,v_f(j)$ is defined. For f we define the improved stopping rule g by
3.1. $g(i) := 0$ if $r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\,v_f(j)$
$g(i) := 1$ otherwise.
Lemma 3. Let g be the improved stopping rule of f and let $v_f \ge r$. Then
1) $D_g \subset D$, where $D := \{ i \mid v(i) > r(i) \}$
2) $v_f \le d_g + P_g v_f$.
Proof. We first prove 1). If g(i) = 1 then
$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\,v_f(j) \le -c(i) + \sum_{j \in S} P(i,j)\,v(j) \le v(i),$$
hence
$$D_g = \{ i \mid g(i) = 1 \} \subset \{ i \mid v(i) > r(i) \} = D .$$
We proceed with 2). Note that $P_g v_f$ is finite (by lemma 2). Let $i \in D_g$; then g(i) = 1, $d_g(i) = -c(i)$, $P_g(i,\cdot) = P(i,\cdot)$ and so
$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\,v_f(j) = d_g(i) + P_g v_f(i) .$$
Since either $v_f(i) = -c(i) + \sum_{j \in S} P(i,j)\,v_f(j)$ or $v_f(i) = r(i)$, the statement is true for $i \in D_g$.
If $i \in \Gamma_g$ then g(i) = 0, $d_g(i) = r(i)$ and $P_g(i,\cdot) = 0$, and since
$$r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\,v_f(j)$$
it is true for $i \in \Gamma_g$. □
Lemma 4. Assume C. If g is the improved stopping rule of f and if $v_f \ge r$ then $v_g \ge v_f$.
Proof. From lemma 2 it follows that $P_g^k |v_f|$ exists and is finite for all k. By lemma 3, $v_f \le d_g + P_g v_f$. Hence
$$\sum_{k=0}^{N} P_g^k v_f \le \sum_{k=0}^{N} P_g^k d_g + \sum_{k=1}^{N+1} P_g^k v_f$$
and therefore
$$v_f \le \sum_{k=0}^{N} P_g^k d_g + P_g^{N+1} v_f .$$
We shall prove that $P_g^N v_f \to 0$ for $N \to \infty$. Consider first the case that $r \ge 0$ and $c \ge 0$. Since $0 \le r \le v_f \le v$ and $D_g \subset D$:
$$0 \le P_g^N v_f \le P_g^N v \le P_{f^*}^N v ;$$
by corollary 1, $P_{f^*}^N v \to 0$ for $N \to \infty$.
Suppose now that c is a charge: $|v_f| \le w$ (w is defined in corollary 1), hence
$$P_g^N |v_f| \le P_{f^*}^N w .$$
By 2.7, $P_{f^*}^N w \to 0$ for $N \to \infty$. Therefore
$$\sum_{k=0}^{\infty} P_g^k d_g = v_g \ge v_f .$$
□
We define a sequence of stopping rules $\{f_0, f_1, f_2, \dots\}$ by
3.2. $f_0$ is a stopping rule with $v_{f_0} \ge r$ (for example $f_0(i) = 0$ for all $i \in S$),
$f_n$ is the improved stopping rule of $f_{n-1}$, $n \ge 1$ (see 3.1).
In theorem 1 some properties of the sequence $\{f_0, f_1, f_2, \dots\}$ are derived. In theorem 2 we study the convergence of $v_{f_n}$ to v.
The method of approximating the optimal stopping rule and its expected return by the sequence 3.2 is called the policy iteration method. This method was introduced by Howard [4] for decision processes with a finite state space and discounted rewards.
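For a finite state space the sequence 3.2 can be generated mechanically. The sketch below is one possible rendering, not the author's implementation: it starts from $f_0 \equiv 0$, evaluates $v_f$ by iterating $v \leftarrow d_f + P_f v$ in floating point, and applies rule 3.1 until the stopping rule no longer changes. The chain, rewards, costs and all function names are hypothetical.

```python
def expected_return(P, r, c, f, sweeps=500):
    """Approximate v_f by iterating v <- d_f + P_f v from v = 0."""
    n = len(r)
    v = [0.0] * n
    for _ in range(sweeps):
        v = [r[i] if f[i] == 0
             else -c[i] + sum(P[i][j] * v[j] for j in range(n))
             for i in range(n)]
    return v


def improve(P, r, c, v):
    """Rule 3.1: stop in i iff r(i) >= -c(i) + sum_j P(i,j) v(j)."""
    n = len(r)
    return [0 if r[i] >= -c[i] + sum(P[i][j] * v[j] for j in range(n)) else 1
            for i in range(n)]


def policy_iteration(P, r, c, max_rounds=50):
    f = [0] * len(r)                      # f_0: stop everywhere, so v_{f_0} = r
    for _ in range(max_rounds):
        v = expected_return(P, r, c, f)
        g = improve(P, r, c, v)
        if g == f:                        # no further improvement
            return f, v
        f = g
    return f, expected_return(P, r, c, f)


# Hypothetical chain: symmetric walk on {0,...,5}, held fixed at both ends.
P = [[0.0] * 6 for _ in range(6)]
P[0][0] = P[5][5] = 1.0
for i in range(1, 5):
    P[i][i - 1] = P[i][i + 1] = 0.5
r = [0.0, 0.0, 0.0, 10.0, 10.0, 10.0]
c = [1.0] * 6

f_opt, v_opt = policy_iteration(P, r, c)
```

On this example the go-ahead set grows from the empty set to {2} and then to {1, 2}, after which the rule is stable. The floating-point comparison in `improve` is safe here only because no state is an exact tie.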
We use the abbreviations
1) $v_n := v_{f_n}$
2) $d_n := d_{f_n}$
3) $D_n := D_{f_n}$
4) $\Gamma_n := \Gamma_{f_n}$.
Theorem 1. Assume C. The following assertions hold
Proof. It follows from lemma 4 that $v_{n+1} \ge v_n$ for $n \ge 0$, since $v_0 \ge r$. If $f_n(i) = 1$ then
$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\,v_{n-1}(j) \le -c(i) + \sum_{j \in S} P(i,j)\,v_n(j), \quad \text{for } n \ge 1,$$
hence $f_{n+1}(i) = 1$, which proves assertion 1. Suppose $f_n(i_0) \ne f_{n+1}(i_0) = 1$; then $f_n(i_0) = 0$ and
$$v_n(i_0) = r(i_0) < -c(i_0) + \sum_{j \in S} P(i_0,j)\,v_n(j) \le v_{n+1}(i_0) .$$
□
Theorem 2. Assume C.
1) If either $v_{n_0} \ge -\sum_{k=0}^{\infty} P^k c$ or $v_{n_0} \ge 0$ for some $n_0$, then $\lim_{n\to\infty} v_n = v$.
2) If, in addition to 1, $f_n = f_{n+1}$ for some $n \ge n_0$, then $f_n$ is optimal.
Proof. Since $D_n \subset D$ for all n (lemma 3) and since $f_n(i)$ is nondecreasing in n (theorem 1) there exists a set $E \subset S$ such that $\lim_{n\to\infty} D_n = E \subset D$.
And, in the same way, since $v_n(i) \le v(i)$ for all n and since $v_n(i)$ is nondecreasing in n, there exists a function z such that $z(i) = \lim_{n\to\infty} v_n(i)$.
Fix some $i \in E$. For all n sufficiently large $i \in D_n$ and so:
$$r(i) \le v_n(i) = -c(i) + \sum_{j \in S} P(i,j)\,v_n(j) \le -c(i) + \sum_{j \in S} P(i,j)\,v(j) = v(i) .$$
Since $v_n(i) \uparrow z(i)$ we have by monotone convergence
$$-c(i) + \sum_{j \in S} P(i,j)\{ v_n(j) - r(j) \} \uparrow -c(i) + \sum_{j \in S} P(i,j)\{ z(j) - r(j) \},$$
hence
$$z(i) = -c(i) + \sum_{j \in S} P(i,j)\,z(j) \le v(i) .$$
Fix some $i \in S \setminus E$. For all n it holds that $i \in \Gamma_n$, hence
$$v_n(i) = r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\,v_n(j)$$
and therefore (again by monotone convergence)
$$z(i) = r(i) \ge -c(i) + \sum_{j \in S} P(i,j)\,z(j) .$$
So z satisfies the functional equation:
$$z(i) = \max\Big\{ r(i),\ -c(i) + \sum_{j \in S} P(i,j)\,z(j) \Big\} .$$
Now, suppose $v_{n_0} \ge -\sum_{n=0}^{\infty} P^n c$. Then $z \ge -\sum_{n=0}^{\infty} P^n c$ and since z satisfies the functional equation, z is a c-excessive function dominating r. Because v is the smallest function with this property it must hold that v = z.
If $v_{n_0} \ge 0$ it must hold that $z \ge 0$ and $v \ge 0$. We now prove that v = z on $\Gamma$.
Let $i \in \Gamma$:
$$r(i) \le z(i) \le v(i) = r(i) .$$
Let now $i \in D$:
$$0 \le v(i) - z(i) \le \sum_{j \in S} P(i,j)\{ v(j) - z(j) \} .$$
Hence $0 \le v - z \le P_{f^*}(v - z)$.
Iterating this inequality gives
$$0 \le v - z \le P_{f^*}^n (v - z) \le P_{f^*}^n v \to 0 \quad \text{for } n \to \infty,$$
which proves v = z. The first assertion is proved.
Suppose $f_n = f_{n+1}$ for some $n \ge n_0$. Then $v_n = v_{n+1}$ and therefore $f_{n+2} = f_{n+1}$. By induction it follows that $z = v_n$, which proves the theorem. □
Lemma 5. Let c be a charge. Let f be the stopping rule defined by f(i) = 1 for all $i \in S$ and let g be the improved stopping rule. Then $v_g \ge v_f$ and $v_g \ge r$. If $v_g = v_f$ then f is optimal.
Proof. Since $v_f = -\sum_{n=0}^{\infty} P^n c$ it holds that $P v_f$ and $P_g^n v_f$ are finite. Following exactly the proof of lemma 3 we have $v_f \le d_g + P_g v_f$ and from the proof of lemma 4 it follows, since $P_g^n v_f$ is finite, that
$$v_f \le \sum_{k=0}^{n} P_g^k d_g + P_g^{n+1} v_f .$$
Since c is a charge:
$$w_f := \sum_{n=0}^{\infty} P^n |c| < \infty .$$
Hence $w_f = |c| + P w_f$ and therefore $P^n w_f$ tends to 0 if n tends to $\infty$. Because $w_f \ge |v_f|$ we may conclude
$$\lim_{n\to\infty} P_g^n |v_f| = 0 .$$
Hence
$$v_f \le \sum_{k=0}^{\infty} P_g^k d_g = v_g .$$
If g(i) = 0 then $v_g(i) = r(i)$ and if g(i) = 1 then
$$r(i) < -c(i) + \sum_{j \in S} P(i,j)\,v_f(j) \le v_g(i) .$$
Hence $v_g \ge r$.
Now, suppose $v_g = v_f$. Then $v_f = -c + P v_f$ and $v_f = v_g \ge r$, hence $v_f$ is c-excessive and dominates r. Because $v_f \le v$ and the fact that v is the least function with this property, we have $v = v_f$. □
Corollary 2.
1) If r is nonnegative, we have for $f_0 \equiv 0$: $v_{f_0} \ge r \ge 0$, hence the sequence $v_n$ converges to v.
2) If c is a charge we may start with $f_{-1}(i) := 1$ for all $i \in S$ and try to improve this stopping rule by $f_0$. If no improvement is possible (i.e. $v_{f_0} = v_{f_{-1}}$) we have already the optimal stopping rule. Otherwise $f_0$ satisfies
a) $v_0 = v_{f_0} \ge r$
b) $v_0 \ge -\sum_{n=0}^{\infty} P^n c$,
hence $v_n$ converges to v.
Examples.
1) There exists a stopping problem satisfying assumptions A, B and C where the policy iteration method does not converge to the optimal stopping rule.
Let S = {1,2}; r(1) = r(2) = −1, c(1) = c(2) = 0 and $P(1,1) = \alpha = 1 - P(1,2)$, $P(2,2) = \beta = 1 - P(2,1)$. The optimal stopping rule is f(1) = f(2) = 1 and v(1) = v(2) = 0. The cost function is a charge and $\mathbb{E}_i[\,|r(X_T)|\,] \le 1$. Note that $r(1) = \alpha r(1) + (1 - \alpha) r(2)$ and $r(2) = \beta r(2) + (1 - \beta) r(1)$, so that $r \ge -c + Pr$, hence $f_1 = f_0$ if $f_0(i) = 0$ for all i.
2) There exists a stopping problem satisfying assumptions A and B where the improved policy of $f_0$ is not at least as good as $f_0$. Let $S = \{0,1,2,3,\dots\} \cup \{x\}$ and $1 > \varepsilon > 0$. For i = 0,1,2,3,…:
$$P(i,i+1) = 1 - \varepsilon, \quad P(i,x) = \varepsilon, \quad r(i) = \frac{1}{(1-\varepsilon)^i}, \quad c(i) = 0 .$$
Further: P(x,x) = 1, r(x) = 1, c(x) = 1.
Note that r and c are both nonnegative (condition B).
We shall examine the stopping times $T_n$. Hence
$$w(i) := \sup_n v_{T_n}(i) = \frac{1}{(1-\varepsilon)^i} + 1 .$$
This function w satisfies the functional equation
$$w(i) = \max\Big\{ r(i),\ -c(i) + \sum_{j \in S} P(i,j)\,w(j) \Big\}$$
and $w \ge -\sum_{n=0}^{\infty} P^n c$, hence w = v so that $v(i) < \infty$, from which it follows that $\mathbb{E}_i[\,|r(X_T)|\,] < \infty$ for all i and all T (condition A).
For i = 0,1,2,3,…:
$$r(i) = \frac{1}{(1-\varepsilon)^i} < (1 - \varepsilon)\frac{1}{(1-\varepsilon)^{i+1}} + \varepsilon = -c(i) + \sum_{j \in S} P(i,j)\,r(j)$$
and $r(x) = 1 > -c(x) + r(x)$.
Hence $f_1(i) = 1$ for $i \in \{0,1,2,3,\dots\}$ and $f_1(x) = 0$, so that $v_1(i) = 1$ for all i, but $v_0(i) = \frac{1}{(1-\varepsilon)^i} > 1$ for i = 1,2,3,….
4. An application
We shall study in this section the optimal stopping of a random walk on the integers with a special cost and reward structure, to illustrate the computational aspects of the policy iteration method. For simplicity we shall not formulate the results as generally as possible.
Definition of the decision process.
Consider a random walk on the set of integers ($\mathbb{Z}$). Let the transition matrix P be defined by
4.1. $P(i,i+1) := p_i$, $P(i,i) := s_i$, $P(i,i-1) := q_i$ with $p_i, q_i > 0$, $s_i \ge 0$ and $p_i + q_i + s_i = 1$.
The reward function
4.2. $0 \le r(i) \le M$, $i \in \mathbb{Z}$.
The cost function
4.3. $c(i) \ge \delta > 0$, $i \in \mathbb{Z}$.
Further we assume the existence of integers d, e such that:
4.4. $r(i) < -c(i) + p_i r(i+1) + q_i r(i-1) + s_i r(i)$ if and only if $d \le i \le e$.
Call $H := \{ i \in \mathbb{Z} \mid d \le i \le e \}$.
Assumption 4.4 says that for $i \in \mathbb{Z} \setminus H$ stopping immediately is more profitable than making one more transition. In statistical sequential analysis there are examples of random walks where this assumption is fulfilled in a natural way (compare [5]). In lemma 6 we collect some properties of this process.
Lemma 6. For the sequence of stopping rules $f_0, f_1, f_2, \dots$ defined in 3.2 with $f_0(i) = 0$ for all $i \in \mathbb{Z}$ it holds that
1) there exist numbers $k_n, \ell_n \in \mathbb{Z}$ such that $D_n = \{ i \in \mathbb{Z} \mid k_n \le i \le \ell_n \}$, n = 0,1,2,…
2) $k_n \ge k_{n+1} \ge k_n - 1$ and $\ell_n \le \ell_{n+1} \le \ell_n + 1$
3) for some n, $f_n$ is optimal.
Proof. Since $0 \le r(i) \le M$ and $c \ge 0$, A and B are satisfied. By 1.4 we know that the entrance time in $\Gamma$ is optimal, hence the assumption C is fulfilled. By theorem 1 we have $D_n \subset D_{n+1}$ for n = 0,1,2,3,… and by theorem 2 we have $\lim_{n\to\infty} v_n(i) = v(i)$. We shall prove 1 and 2 with induction.
$D_0$ is empty. It is easy to verify that $f_1(i) = 1$ if and only if $i \in H$, hence $k_1 = d$ and $\ell_1 = e$. Suppose 1 holds for n = m. For $i < k_m - 1$ and $i > \ell_m + 1$, $f_{m+1}(i) = 0$ because $v_m(i) = r(i)$ and $i \in \mathbb{Z} \setminus H$. Therefore it can happen only in the points $i = k_m - 1$ and $i = \ell_m + 1$ that $f_{m+1}(i) > f_m(i)$. Since it holds that $D_m \subset D_{m+1}$, 1 and 2 are proved. Now the last assertion.
Note that $0 \le r(i) \le M$ and $c(i) \ge \delta > 0$ for all $i \in \mathbb{Z}$. Choose $1 > \varepsilon > 0$ and a natural number k such that $(1 - \varepsilon)k > M/\delta$. Let f be the optimal stopping rule. We shall prove $\mathbb{P}_i[T_f \le k] \ge \varepsilon$. Suppose the contrary, i.e. let $\mathbb{P}_i[T_f \le k] < \varepsilon$. Then
$$v_f(i) \le M - (1 - \varepsilon)k\delta < 0 \le r(i),$$
which is a contradiction.
Hence for all $i \in \mathbb{Z}$, $\Gamma$ must be reachable in at most k steps, so that $D \subset \{ i \mid d - k \le i \le e + k \}$. Since $D_{n-1} \subset D_n \subset D$ and because $D_{n-1}$ is a proper subset of $D_n$ if $f_{n-1}(i) \ne f_n(i)$ for at least one i, we may conclude that $f_{n-1} = f_n$ for some n. □
Computational aspects
In our case v is the smallest solution of
$$v(i) = \max\{ r(i),\ -c(i) + p_i v(i+1) + s_i v(i) + q_i v(i-1) \} .$$
Because we know the structure of D we may say v is the smallest function x which has the following properties.
For some $k \le d$ and some $\ell \ge e$, $i, k, \ell \in \mathbb{Z}$:
1) $x(i) = -c(i) + p_i x(i+1) + s_i x(i) + q_i x(i-1)$, $k \le i \le \ell$
2) $x(i) = r(i)$, $i > \ell$, $i < k$
3) $x(i) \ge -c(i) + p_i x(i+1) + s_i x(i) + q_i x(i-1)$, $i = k - 1$, $i = \ell + 1$.
This is a two point boundary value problem with a free boundary. We shall show that for fixed k and $\ell$ the function x is completely determined by 1 and 2.
Define, for functions on $\mathbb{Z}$, the difference operator $\Delta$ as usual by
4.5. $\Delta x(i) := x(i+1) - x(i)$.
Consider the difference equation, derived from 1,
4.6. $p_i \Delta x(i) - q_i \Delta x(i-1) = c(i)$.
Call: $z_i := \Delta x(i)$, $a_i := q_i / p_i$ and $b_i := c(i) / p_i$. Hence 4.6 becomes
$$z_i = a_i z_{i-1} + b_i .$$
With induction on m it is easy to verify that for $k \le m \le \ell$
4.7. $z_m = z_{k-1} \prod_{i=k}^{m} a_i + \sum_{i=k}^{m} \Big\{ b_i \prod_{j=i+1}^{m} a_j \Big\}$
(an empty product has the value 1, an empty sum the value 0).
Because $x(\ell + 1) = r(\ell + 1)$ and $x(k - 1) = r(k - 1)$ it holds that
$$r(\ell + 1) - r(k - 1) = \sum_{m=k-1}^{\ell} z_m ,$$
hence
4.8. $z_{k-1} = \dfrac{ r(\ell + 1) - r(k - 1) - \sum_{m=k-1}^{\ell} \sum_{i=k}^{m} \Big\{ b_i \prod_{j=i+1}^{m} a_j \Big\} }{ \sum_{m=k-1}^{\ell} \prod_{i=k}^{m} a_i }$.
From 4.7 and 4.8 one can compute $z_k, z_{k+1}, \dots, z_\ell$ and so also $x(k), x(k+1), \dots, x(\ell)$, which shows that the function x is completely determined.
The boundary conditions 3 can be formulated as follows
4.9. $\Delta r(\ell + 1) - a_{\ell+1} z_\ell \le b_{\ell+1}$ and $z_{k-1} - a_{k-1} \Delta r(k - 2) \le b_{k-1}$,
which shows that we only have to compute the differences $z_k$ to check 3 and not the function x itself.
It is easy to verify that the sums and products in 4.7 and 4.8 can be computed recursively. We shall formulate an algorithm to compute the optimal stopping rule and the value function v.
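The recursive evaluation of 4.7 and 4.8 and the boundary test 4.9 can be sketched as follows. This is an illustrative rendering under stated assumptions, not the memorandum's own program: the walk data `p`, `q`, `c`, `r` are passed as hypothetical callables on the integers, the search is assumed to start from the interval {d,…,e} of 4.4, and exact rational arithmetic is used so that the inequalities are decided without rounding.

```python
from fractions import Fraction


def differences(k, l, p, q, c, r):
    """z_m = x(m+1) - x(m), m = k-1..l, for the fixed go-ahead set {k,...,l}."""
    # Write z_m = A[m] * z_{k-1} + B[m]; by 4.7, A[m] = a_m * A[m-1] and
    # B[m] = a_m * B[m-1] + b_m, with A[k-1] = 1 and B[k-1] = 0.
    A, B = {k - 1: Fraction(1)}, {k - 1: Fraction(0)}
    for m in range(k, l + 1):
        a_m, b_m = Fraction(q(m)) / p(m), Fraction(c(m)) / p(m)
        A[m] = a_m * A[m - 1]
        B[m] = a_m * B[m - 1] + b_m
    # 4.8: the z_m sum to r(l+1) - r(k-1), which fixes z_{k-1}.
    z_first = (r(l + 1) - r(k - 1) - sum(B.values())) / sum(A.values())
    return {m: A[m] * z_first + B[m] for m in range(k - 1, l + 1)}


def optimal_interval(d, e, p, q, c, r):
    k, l = d, e                       # assumed starting interval H = {d,...,e}
    while True:
        z = differences(k, l, p, q, c, r)
        a_k, b_k = Fraction(q(k - 1)) / p(k - 1), Fraction(c(k - 1)) / p(k - 1)
        a_l, b_l = Fraction(q(l + 1)) / p(l + 1), Fraction(c(l + 1)) / p(l + 1)
        # 4.9 fails at k-1 or l+1 when going on there beats stopping.
        left = z[k - 1] - a_k * (r(k - 1) - r(k - 2)) > b_k
        right = (r(l + 2) - r(l + 1)) - a_l * z[l] > b_l
        if not (left or right):
            break
        k, l = k - int(left), l + int(right)
    x = {k - 1: Fraction(r(k - 1))}   # rebuild x from the differences
    for m in range(k - 1, l + 1):
        x[m + 1] = x[m] + z[m]
    return k, l, x


# Hypothetical data: symmetric walk, unit cost, reward 10 on i >= 3, else 0.
def p(i): return Fraction(1, 2)
def q(i): return Fraction(1, 2)
def c(i): return 1
def r(i): return 10 if i >= 3 else 0

k, l, x = optimal_interval(2, 2, p, q, c, r)   # 4.4 holds here with d = e = 2
```

In this example the interval grows once to the left, from {2} to {1, 2}, and `x` then holds the values on {k−1,…,ℓ+1}.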
Algorithm
1. set k := d and $\ell$ := e,
2. compute $z_{k-1}$ (by 4.8) and $z_\ell$ (by 4.7), set i := 0,
3. if $z_{k-1} - a_{k-1} \Delta r(k - 2) > b_{k-1}$ then k := k − 1 and i := 1,
4. if $\Delta r(\ell + 1) - a_{\ell+1} z_\ell > b_{\ell+1}$ then $\ell$ := $\ell$ + 1 and i := 1,
5. if i = 0 then goto 6, else goto 2,
6. D is the set $\{ i \in \mathbb{Z} \mid k \le i \le \ell \}$ and v can be computed by 4.7.
Acknowledgement
The author wishes to express his gratitude to Dr. A. Hordijk for pointing out a serious mistake in an earlier version of this paper.
Literature
[1] Dynkin, E.B., Juschkewitsch, A.A.; Sätze und Aufgaben über Markoffsche Prozesse. Springer-Verlag (1969).
[2] Hordijk, A., Potharst, R., Runnenburg, J.Th.; Optimaal stoppen van Markov ketens. MC-syllabus 19 (1973).
[3] Hordijk, A.; Dynamic programming and Markov potential theory. MC tract (1974).
[4] Howard, R.A.; Dynamic programming and Markov processes. Technology Press, Cambridge Massachusetts (1960).
[5] van Hee, K.M., Hordijk, A.; A sequential sampling problem solved by optimal stopping. MC-rapport SW 25/73 (1973).
[6] van Hee, K.M.; Note on memoryless stopping rules. COSOR-notitie R-73-12, T.H. Eindhoven (1974).
[7] Ross, S.; Applied probability models with optimization applications. Holden-Day (1970).