for Routing in
Communication Networks
Walter H. Andrag
Thesis presented in partial fulfilment of the requirements for the degree of Master of Science at the University of Stellenbosch
Supervisor: Prof Christian W. Omlin
April 2003
I, the undersigned, hereby declare that the work contained in this thesis is my own original work and has not previously in its entirety or in part been submitted at any university for a degree.
Routing policies for packet-switched communication networks must be able to adapt to changing traffic patterns and topologies. We study the feasibility of implementing an adaptive routing policy using the Q-Learning algorithm which learns sequences of actions from delayed rewards. The Q-Routing algorithm adapts a network's routing policy based on local information alone and converges toward an optimal solution. We demonstrate that Q-Routing is a viable alternative to other adaptive routing methods such as Bellman-Ford. We also study variations of Q-Routing designed to better explore possible routes and to take into consideration limited buffer size and optimize multiple objectives.
Routing in communication networks must be able to adapt to changes in network topology and traffic distributions. We study the usefulness of an adaptive routing algorithm based on the Q-Learning algorithm, which makes it possible to take a sequence of decisions based on delayed rewards. The routing algorithm uses only nearby information to make routing decisions and converges to an optimal solution. We demonstrate that the routing algorithm is a good alternative for adaptive routing, since it performs better than the Bellman-Ford algorithm in many respects. We also study variations of the routing algorithm that can discover better paths, use less memory at network elements, and optimize more than one objective function.
I would like to sincerely thank my supervisor, Prof. C. W. Omlin, for all the inspiration, assistance and funding he provided.
This work was also made possible by funding from the South African National Research Foundation, Telkom-Siemens Centre of Excellence for ATM and Broadband Networks and their Applications and the Harry Crossley Scholarship Fund.
1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Premises
1.4 Hypotheses
1.5 Technical Objectives
1.6 Methodology
1.7 Achievements
1.8 Thesis Organization
2 Routing in Communication Networks
2.1 The Routing Problem
2.1.1 Performance Criterion
2.1.2 Decision Time
2.1.3 Decision Place
2.1.4 Network Information Source
2.1.5 Routing Information Update Timing
2.2 Conventional Routing Strategies
2.2.1 Flooding
2.2.2 Random Routing
2.2.3 Fixed Routing
2.2.4 Adaptive Routing
2.2.5 Link-State Routing
2.2.6 Distance-Vector Routing
2.3 Mobile Agents
2.3.1 Active Networks
2.3.2 Social Insect Metaphors
2.4 Summary
3 Reinforcement Learning
3.1 Value Functions
3.2 Temporal-Difference Learning
3.3 Q-Learning
3.4 TD(λ) Learning
3.5 Q(λ) Learning
3.6 Convergence Properties of Q-Learning
3.7 Exploration vs Exploitation
3.8 Summary
4 Q-Learning for Traffic Routing
4.1 Optimization of Packet Delivery Time
4.1.1 Q-Routing
4.1.2 DRQ-Routing
4.1.5 Probabilistic CDRQ-Routing
4.2 Finite Buffer Size
4.3 Optimization of Multiple Objectives
4.4 Summary
5 Conclusion
5.1 Conclusion
5.2 Future Work
5.2.1 Realistic Simulations
5.2.2 Improved Routing
2.1 Design elements of a routing strategy
4.1 The parameters used in the simulations
3.1 The agent-environment interaction.
3.2 Estimating V^π with TD(0).
3.3 Estimating Q* with Q-Learning.
3.4 Estimating V^π with TD(λ).
3.5 Watkins's Q(λ) algorithm.
4.1 The British Synchronous Digital Hierarchy (SDH) network topology.
4.2 Average packet delivery times for network load 1.2 for the SDH network topology.
4.3 Average packet delivery times for network load 2.2 for the SDH network topology.
4.4 Average packet delivery times for network load 3.2 for the SDH network topology.
4.5 Average packet delivery times of Bellman-Ford for high network load for the SDH network topology. Error bars show standard deviations.
4.6 Average packet delivery times of Q-Routing for high network load for the SDH network topology. Error bars show standard deviations.
4.7 Comparing the average packet delivery times of Q-Routing and DRQ-Routing for network load 2.0 for the SDH network topology.
4.8 Comparing the average packet delivery times of Q-Routing and DRQ-Routing for network load 3.0 for the SDH network topology.
4.10 Comparing the average packet delivery times of Q-Routing and CQ-Routing for network load 2.0 for the SDH network topology.
4.11 Comparing the average packet delivery times of Q-Routing and CQ-Routing for network load 3.0 for the SDH network topology.
4.12 Comparing the average packet delivery times of Q-Routing and CQ-Routing for network load 4.0 for the SDH network topology.
4.13 Comparing the average packet delivery times of Q-Routing, CQ-Routing, DRQ-Routing and CDRQ-Routing for network load 2.0 for the SDH network topology.
4.14 Comparing the average packet delivery times of Q-Routing, CQ-Routing, DRQ-Routing and CDRQ-Routing for network load 3.0 for the SDH network topology.
4.15 Comparing the average packet delivery times of Q-Routing, CQ-Routing, DRQ-Routing and CDRQ-Routing for network load 4.0 for the SDH network topology.
4.16 The variance function of Equation 36 for β of 0.2, 0.4, 0.6 and 0.8.
4.17 The Average Packet Delivery Time for the SDH network for network load 1.5; β of 0.2, 0.4, 0.6 and 0.8.
4.18 The Average Packet Delivery Time for the SDH network for network load 3.0; β of 0.2, 0.4, 0.6 and 0.8.
4.19 The Average Packet Delivery Time for the SDH network for network load 4.5; β of 0.2, 0.4, 0.6 and 0.8.
4.20 The Congestion Risk of Equation 38 for e of 3, 6 and 15.
4.21 The 13 node network topology used for the finite buffer simulation.
4.22 Average packet delivery time for low load.
4.23 Number of packets dropped for low load.
4.26 Average packet delivery time for high load.
4.27 Number of packets dropped for high load.
4.28 The network topology for the 36 node grid.
4.29 The average packet delivery time for single versus multiple objective optimization for the 36 node grid for differing a.
4.30 Details of the steady state behaviour of Figure 4.29.
4.31 The average cost for single versus multiple objective optimization for the 36 node grid for differing a.
4.32 The average packet delivery time for single versus multiple objective optimization for the BT SDH network for differing a.
4.33 Details of the steady state behaviour of Figure 4.32.
4.34 The average cost for single versus multiple objective optimization for the BT SDH network for differing a.
4.35 The average saving of multiple objective optimization of cost and delivery time for the BT SDH network versus a.
Introduction
1.1
Motivation
Modern communication networks must cope with ever increasing demands on network resources. The range of services offered leads to both regular and less predictable traffic patterns. Adaptive routing is able to respond to changing traffic patterns and topology, thus providing efficient use of network resources. In networks characterized by a constantly changing topology, adaptive routing is essential. Adaptation may be necessary in traditional networks due to failures of links or nodes; in mobile ad-hoc networks, mobile routers are able to move randomly, thus constantly and unpredictably changing the network topology.
In order to adapt routing to changing network conditions, a centralized routing strategy needs information about the status of all nodes and links in the network. However, this information transmission overhead consumes valuable network resources. This highlights the need to make distributed routing decisions based on locally available information only.
1.2
Problem Statement
A packet-switched communication network can be modeled as a set of nodes and inter-connecting links. Data is exchanged over these communication links as a sequence of packets. In general, nodes are not fully connected; thus, the packets must pass through intermediate nodes. The route is the sequence of nodes along which a packet travels to its final destination. In most networks, there may be more than one route between pairs of nodes. The routing problem consists of finding the optimal route between source and destination nodes, where the optimal route is the one that delivers packets to their final destination in the shortest time possible.
1.3
Premises
The premises of the packet routing domain which we believe make adaptive routing indispensable are as follows:
1. A network is a highly dynamic environment in which traffic patterns may be unpredictable and links or nodes may fail.
2. A central routing mechanism which has global information about the state of the network is generally not feasible because of the overhead involved.
3. Thus, we need a good routing policy which
(a) uses only local information and
(b) minimizes average packet delivery time.
1.4
Hypotheses
Machine learning covers a broad field of methods concerned with the ability of programs to learn from experience, thereby improving their performance. We wish to test the following hypotheses in this thesis:
1. Machine learning is a viable alternative to static routing because
(a) it can adapt to changing environments, i.e. changes in traffic patterns or network topology;
2. Reinforcement learning is a field in machine learning concerned with programs taking optimal action sequences so as to achieve a goal. Reinforcement learning is well-suited for adaptive routing because
(a) it is goal-oriented, i.e. actions are to be learned with a desired outcome. The goal of routing is to deliver packets with minimum delay.
(b) Reinforcement learning allows a policy to be acquired, i.e. a sequence of actions that leads to a desired outcome. Actions in routing are propagation of packets to a neighbouring node, and the desired outcome is the delivery of packets to their intended final destination.
(c) Reinforcement learning algorithms are able to learn policies from delayed rewards. We only know whether a good route was chosen for a packet once it has reached its destination.
(d) Reinforcement learning algorithms can adapt to changes in the environment. As traffic patterns change, or nodes or links fail, a routing policy will have to adapt.
1.5
Technical Objectives
The objectives we set out to achieve in our investigation are as follows:
1. Implement a distributed adaptive packet routing algorithm which minimizes the average packet delivery time, while using only locally available information.
2. Compare the performance of the adaptive routing policy to standard routing algorithms.
3. Improve the adaptive routing mechanism to discover new routes.
4. Extend and test the algorithms under more realistic scenarios including nodes with finite buffer size and optimization of multiple objectives.
1.6
Methodology
1. Q-Learning is a reinforcement learning algorithm that is able to learn an optimal sequence of actions in an environment which maximizes rewards received from the environment. Q-Routing is an implementation of Q-Learning, which is able to distributively route packets in a network. Each node is able to make a routing decision using only locally available information. The reward received is the packet delivery time; thus, the goal is to minimize the average delivery time of all packets.
2. We will empirically evaluate the routing algorithms by simulation and compare their performance under different traffic loads and network topologies. We will use the average packet delivery time and the average number of dropped packets as the performance measure.
3. We will investigate possible performance improvements to Q-Routing designed to increase the exploration ability of the algorithm, which enables the discovery of new routes in the network. This improvement is made possible by adding a probabilistic component to routing decisions.
4. We will consider more realistic network scenarios: networks with finite buffers and networks where there are multiple objectives to be optimized:
(a) Previous work explored the performance of Q-Routing in networks with infinite packet buffers. We will examine the more realistic case of finite buffers. Congestion control is achieved by avoiding nodes with a high level of congestion.
(b) We will also examine the performance of a new routing algorithm, able to optimize multiple possibly conflicting objectives, e.g. packet delivery time versus cost. We examine the implications of this trade-off.
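The idea in item 1, that each node learns delivery-time estimates from locally available information, can be sketched as follows. This is a minimal illustration based on the commonly published form of the Q-Routing update, not the simulator used in this thesis. The class name `QRouter`, the learning rate `eta` and the initial estimate of zero are assumptions made purely for illustration.

```python
from collections import defaultdict


class QRouter:
    """Per-node delivery-time estimates, learned from local feedback only."""

    def __init__(self, neighbours, eta=0.5, initial_estimate=0.0):
        self.neighbours = list(neighbours)
        self.eta = eta  # learning rate (assumed value, not from the thesis)
        # q[dest][neighbour] = estimated time to deliver to dest via neighbour
        self.q = defaultdict(
            lambda: {n: initial_estimate for n in self.neighbours}
        )

    def best_estimate(self, dest):
        """Smallest estimated remaining delivery time to dest."""
        return min(self.q[dest].values())

    def choose(self, dest):
        """Greedy routing decision: the neighbour with the lowest estimate."""
        return min(self.neighbours, key=lambda n: self.q[dest][n])

    def update(self, dest, neighbour, queue_time, transmit_time,
               neighbour_estimate):
        """Move the stored estimate toward the newly observed sample.

        neighbour_estimate is the value the neighbour reports back,
        i.e. its own best_estimate(dest).
        """
        target = queue_time + transmit_time + neighbour_estimate
        old = self.q[dest][neighbour]
        self.q[dest][neighbour] = old + self.eta * (target - old)
        return self.q[dest][neighbour]
```

For example, a node with neighbours "B" and "C" that observes a slow sample via "B" and a fast sample via "C" for destination "D" will subsequently prefer "C". Because delivery time is a cost rather than a reward, the sketch minimizes the estimate instead of maximizing it.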
1.7
Achievements
We have learned the following from our investigation:
1. Q-Routing is able to route packets minimizing the average delay while only using local information.
2. Q-Routing compares well with standard routing algorithms. In particular, it converges to a more stable routing policy than our version of the distributed Bellman-Ford algorithm.
3. We demonstrated the efficiency of new path discovery of a proposed algorithm based on probabilistic exploration.
4. Two extended algorithms were also shown to perform well in more realistic network scenarios:
(a) In networks with limited buffers, the algorithm is able to perform congestion control, dropping fewer packets at overloaded nodes.
(b) An improved routing algorithm is also able to optimize multiple objectives. However, competing objectives may establish a trade-off.
1.8
Thesis Organization
In Chapter 2, we discuss the routing problem in more detail and look at some of the approaches used to solve it. Chapter 3 discusses the field of reinforcement learning, presenting techniques of solving reinforcement learning problems. In Chapter 4, we present the simulation results of the comparison between different routing algorithms by evaluating performance under various scenarios. Conclusions and directions of future research are presented in Chapter 5.
Routing in Communication Networks
In this chapter, we examine the routing problem and investigate different approaches that have been proposed for solving it. We define the routing problem, and discuss the general requirements of routing algorithms. Network routing is very complex; thus, we discuss some of the characteristics that differentiate between different routing algorithms.
2.1
The Routing Problem
We consider a communication network [27; 15] as an undirected weighted graph G = (N, L) with a set of nodes N, and a set of bidirectional links L, connecting the nodes. Each link has a capacity and a user-defined associated cost. We define a path as a sequence of nodes connecting a source to a destination node. There may be multiple paths between sources and destinations. The general routing problem consists of finding the optimal path between source and destination nodes satisfying some performance criterion.
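As an illustration of this graph model, the sketch below represents the links as (node, node, cost) triples and finds a least-cost path between two nodes. Dijkstra's algorithm is used here purely for illustration and assumes non-negative link costs; the function name `least_cost_path` and the example topology are hypothetical and do not come from the thesis.

```python
import heapq


def least_cost_path(links, source, destination):
    """Return (cost, path) of a least-cost path, or (inf, []) if unreachable.

    links: iterable of (node_a, node_b, cost) describing bidirectional links.
    """
    adjacency = {}
    for a, b, cost in links:
        adjacency.setdefault(a, []).append((b, cost))
        adjacency.setdefault(b, []).append((a, cost))

    best = {source: 0.0}          # lowest known cost to reach each node
    previous = {}                 # predecessor on the best known path
    frontier = [(0.0, source)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == destination:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry, a cheaper path was already found
        for neighbour, link_cost in adjacency.get(node, []):
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(frontier, (new_cost, neighbour))

    if destination not in best:
        return float("inf"), []
    path = [destination]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[destination], path[::-1]


# Hypothetical four-node network with two routes from A to D.
example_links = [("A", "B", 1), ("B", "C", 2), ("A", "C", 5), ("C", "D", 1)]
print(least_cost_path(example_links, "A", "D"))  # (4.0, ['A', 'B', 'C', 'D'])
```

Setting every link cost to 1 turns the same search into a minimum-hop path computation, so the choice of cost is what encodes the performance criterion.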
We will discuss the routing problem in the context of packet switching. In a packet-switched network, data is broken up into a sequence of packets which are sent from node to node until the destination is reached. The routing decision at each node consists of deciding to which neighbouring node to send a packet.
A routing algorithm has the following requirements [27]:
• Correctness
• Simplicity
• Efficiency
• Robustness
• Stability
• Fairness
• Optimality
The correctness of a routing algorithm refers to the fact that it must route all packets to the correct destinations. Simple routing algorithms are also preferred, as they have less routing overhead, which in turn increases the efficiency of the network. All packet routing schemes have a certain amount of processing and transmission overhead, which may negatively impact the efficiency of the network. The benefits of overheads must be balanced with the decrease in efficiency caused.
Some of these requirements are in competition with each other, e.g. robustness and stability. A routing algorithm is said to be robust when it is able to adapt to node or link failures and changes in network load conditions. When an overload is detected in a section of the network, traffic is rerouted to less congested regions. If the routing algorithm responds too quickly, these less congested regions will in turn become congested. The routing algorithm is called unstable if it continually shifts the load between different sections of the network. On the other hand, if the network adapts too slowly, packets may be dropped at congested nodes.
There also exists a trade-off between optimality and fairness: if a certain performance criterion favours the exchange of packets between nearby nodes, the throughput may be increased. This may appear unfair to nodes with a high proportion of long-distance traffic.
We briefly discuss the various design elements that contribute to a routing strategy as presented in [27] (see Table 2.1).
Performance criterion: Number of hops; Cost; Delay; Throughput
Decision time: Packet; Session
Decision place: Each node (distributed); Central node (centralized); Originating node (source)
Network information source: None; Local; Adjacent nodes; Nodes along route; All nodes
Network information update timing: Continuous; Periodic; Major load change; Topology change
Table 2.1: Design elements of a routing strategy
2.1.1
Performance Criterion
A routing policy has to decide to which neighbouring node to forward a packet based on some performance criterion. The simplest choice is to select the neighbour which is on the minimum hop path to the packet's destination. A more general approach is to assign a link cost to each link and to select the minimum cost path. The specific cost metric used determines the optimal path.
If the link cost is inversely proportional to the link capacity, the least-cost path maximizes the throughput whereas it minimizes the average packet delay when the link cost is the measured link delay. Other possible cost metrics are reliability, load and communications cost. The metric can also be a combination of several performance criteria; i.e. the optimal route over multiple objectives.
2.1.2
Decision Time
The decision time of routing decisions refers to two types of packet-switched networks. In a datagram packet switching network, each node makes a routing decision for each incoming packet. However, there is another approach, called virtual-circuit packet switching, where the routing decision is made only once per session. If a source node wants to communicate with a destination node, a virtual-circuit between source and destination is established. After the connection has been set up, each node selects the neighbour based on the virtual-circuit identifier. Thus, all subsequent packets of a session will follow the same route through the network.
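The difference between the two decision times can be made concrete with a small sketch. It assumes a node that holds a next-hop routing table; the class `SwitchNode` and its method names are hypothetical and serve only to contrast a per-packet decision with a per-session decision.

```python
class SwitchNode:
    """Toy node supporting both per-packet and per-session forwarding."""

    def __init__(self, name, routing_table):
        self.name = name
        self.routing_table = routing_table  # destination -> next hop
        self.circuits = {}                  # circuit identifier -> next hop

    def forward_datagram(self, destination):
        """Datagram mode: a fresh routing decision for every packet."""
        return self.routing_table[destination]

    def open_circuit(self, circuit_id, destination):
        """Virtual-circuit mode: decide once, when the session is set up."""
        self.circuits[circuit_id] = self.routing_table[destination]

    def forward_on_circuit(self, circuit_id):
        """Later packets are forwarded by circuit identifier alone."""
        return self.circuits[circuit_id]


node = SwitchNode("A", {"D": "B"})
node.open_circuit(7, "D")          # session established via B
node.routing_table["D"] = "C"      # the routing table changes afterwards
print(node.forward_datagram("D"))  # C: datagrams follow the new decision
print(node.forward_on_circuit(7))  # B: the session keeps its original route
```

The example shows the consequence stated above: once a circuit is open, its packets keep following the same route even if the routing table changes, whereas datagrams pick up the change immediately.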
2.1.3
Decision Place
The decision place refers to where routing decisions are made. In centralized routing, there is a central control node which collects information from the network and computes routing tables which are distributed to all nodes. The problem with this approach is that the controlling node is a single point of failure. Distributed routing algorithms make routing decisions at each node; thus, they are more robust. In source routing algorithms, the originating node selects the route through the network.
2.1.4
Network Information Source
Most routing algorithms utilize some information about the network topology, traffic load or link cost. Distributed routing may utilize information available locally to the node such as the cost of each link. Nodes may also make routing decisions based on information from neighbouring nodes, or all nodes on a path. Centralized routing makes use of information from all nodes. Some algorithms do not use any network state information, e.g. flooding and random routing.
2.1.5
Routing Information Update Timing
If the routing strategy uses locally available information, routing updates are continuous. For all other strategies that make use of network information, routing information updates are made periodically in order to adapt to changing network conditions. The accuracy of information depends on how frequently the information is updated. Thus, with more accurate information, better routing decisions are made. However, informa
2.2
Conventional
Routing Strategies
N e t w o r k r o u t i n g i s a v e r y c o m p l e x p r o b l e m a n d m a n y d i f f e r e n t a p p r o a c h e s t o s o l v i n g i t h a v e b e e n p r o p o s e d . W e b r i e f l y d i s c u s s s o m e o f t h e r o u t i n g s t r a t e g i e s u s e d , r a n g i n g f r o m t h e s i m p l e t o t h e m o r e c o m p l e x a d a p t i v e r o u t i n g s t r a t e g i e s .2.2.1
Flooding
F l o o d i n g [ 2 7 ] i s s i m p l e r o u t i n g s t r a t e g y w h e r e b y e a c h n o d e f o r w a r d s a p a c k e t t o e a c h o f i t s n e i g h b o u r s , e x c e p t t h e n o d e w h e r e t h e p a c k e t c a m e f r o m . N o d e s d o n o t n e e d a n y i n f o r m a t i o n a b o u t t h e n e t w o r k t o p o l o g y b e y o n d t h e i r i m m e d i a t e n e i g h b o u r s . P a c k e t s n e e d a s e q u e n c e n u m b e r a n d t h e d e s t i n a t i o n n o d e e m b e d d e d i n t h e i r h e a d e r s s o t h a t a d e s t i n a t i o n n o d e c a n d i s c a r d d u p l i c a t e p a c k e t s . F o r w a r d e d p a c k e t s w h i c h r e t u r n t o a p r e v i o u s l y v i s i t e d n o d e m u s t a l s o b e d i s c a r d e d ; o t h e r w i s e , t h e n u m b e r o f p a c k e t s i n c i r c u l a t i o n w i l l i n c r e a s e w i t h o u t b o u n d . A n o t h e r w a y t o a c c o m p l i s h t h i s i s f o r e a c h p a c k e t t o h a v e a h o p c o u n t w h i c h i s i n c r e m e n t e d a t e a c h n o d e , a n d d i s c a r d e d w h e n a p r e d e t e r m i n e d l i m i t i s r e a c h e d . S i n c e a l l p o s s i b l e r o u t e s b e t w e e n s o u r c e a n d d e s t i n a t i o n a r e t r i e d , a p a c k e t i s g u a r a n t e e d t o r e a c h t h e d e s t i n a t i o n i f i t i s r e a c h a b l e ; t h u s , f l o o d i n g i s v e r y r o b u s t . I t h a s b e e n u s e d i n m i l i t a r y n e t w o r k s w h e r e l i n k o r n o d e f a i l u r e s m a y f r e q u e n t l y o c c u r [ 1 5 ] . A n o t h e r p r o p e r t y o f f l o o d i n g i s t h a t a t l e a s t o n e p a c k e t w i l l t r a v e l a l o n g t h e s h o r t e s t r o u t e . T h i s m a y b e u s e d i n s o m e n e t w o r k s t o s e t u p v i r t u a l - c i r c u i t s . 
B e c a u s e a l l n o d e s d i r e c t l y o r i n d i r e c t l y c o n n e c t e d t o t h e s o u r c e n o d e a r e v i s i t e d , f l o o d i n g c a n b e u s e d t o d i s t r i b u t e i m p o r t a n t i n f o r m a t i o n ( e . g . r o u t i n g i n f o r m a t i o n ) t o a l l n o d e s . T h e b i g g e s t d i s a d v a n t a g e o f f l o o d i n g i s o f c o u r s e t h e h i g h l e v e l o f n e t w o r k b a n d w i d t h t h a t i s w a s t e d o n d u p l i c a t e p a c k e t s .2.2.2
Random Routing
Another simple, robust routing strategy is that of random routing [27], where each node randomly selects the node to forward a packet to, excluding the node from which the packet came. Although this strategy will in general not select the shortest path, it generates less traffic than flooding. A refinement of this technique is to select an outgoing link with a probability proportional to the data rate of the link. This strategy attempts to ensure a good traffic distribution.
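The refinement of selecting a link with probability proportional to its data rate can be sketched as follows. The function name `choose_next_hop`, the dictionary of link rates and the fallback of returning the packet to its sender at a dead end are assumptions made for this illustration only.

```python
import random


def choose_next_hop(link_rates, came_from, rng=random):
    """Pick a neighbour with probability proportional to its link data rate.

    link_rates maps neighbour -> data rate of the outgoing link (assumed format).
    The node the packet came from is excluded from the choice.
    """
    candidates = [(n, rate) for n, rate in link_rates.items() if n != came_from]
    if not candidates:
        # Assumption: at a dead end the only option is to send the packet back.
        return came_from
    neighbours, rates = zip(*candidates)
    return rng.choices(neighbours, weights=rates, k=1)[0]


rng = random.Random(0)
rates = {"B": 9.0, "C": 1.0, "D": 5.0}
picks = [choose_next_hop(rates, came_from="D", rng=rng) for _ in range(10000)]
```

With the rates above and the packet arriving from D, roughly nine out of ten packets should be forwarded to B.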
2.2.3
Fixed Routing
Fixed routing, also called static shortest path routing, computes least-cost paths for all origin-destination node pairs in the network. From these fixed paths, routing tables are computed and sent to each node. As the least-cost paths are computed once, the link costs cannot be based on dynamic variables such as traffic. Instead, the network is designed based on an anticipated traffic distribution. Fixed routing is simple and it is very effective in reliable networks with stable load. The disadvantage is that it does not react to congestion, node failures, or unforeseen traffic patterns.
2.2.4
Adaptive Routing
In order to increase efficiency, adaptive routing methods dynamically alter routes when node or link failures are detected or when congestion develops. For a network to adapt to these changes, it needs to collect and exchange network state information between nodes, such as delay or throughput [26]. The optimality of the new routes depends on the quality of the network information, which necessitates an increased information exchange. However, there exists a trade-off between the quality of information and the overhead: overhead consumes network resources, which may degrade the overall network performance.
A serious problem with adaptive routing is that it may become unstable if a routing policy reacts too quickly to congestion [15; 27; 14]. If the adaptive routing redirects most traffic away from the congested part of the network, congestion may develop elsewhere; thus, traffic will again shift to a different part of the network. This oscillation will continue indefinitely if not properly managed by the routing algorithm. As it takes time for the network information to reach relevant nodes, there is never a true picture of the network state. Temporary routing loops [11; 7] can develop, where packets circulate through the network until all nodes have consistent routing tables. This looping wastes bandwidth and increases delay.
Although adaptive routing is complex, it is widely used as it improves the network performance and helps in congestion control.
2.2.5
Link-State Routing
Link-state routing [26] is a distributed, adaptive routing algorithm where each node maintains a view of the whole network topology with a cost for each link. To update their view of the current network state, nodes regularly broadcast the link costs of outgoing links to all other nodes using flooding. Each node uses its view to calculate the shortest paths to all destinations with Dijkstra's algorithm. Each node needs storage space of $O(N^2)$, where $N$ is the number of nodes in the network.
Open Shortest Path First (OSPF) is the link-state routing protocol used in the Internet [11]. Instabilities are avoided by disseminating the link cost information quickly, and by representing the link costs by a slowly changing measure of average link utilization [26; 27]. Rapid link cost dissemination can be achieved if routing packets have higher priority than data packets. Routing loops are still possible, but since they disappear in time proportional to the diameter $D$ of the network, they are short-lived.
2.2.6
Distance-Vector Routing
Distance-vector routing is another distributed, adaptive routing approach based on the Bellman-Ford algorithm [10; 26]. Each node maintains a set of distances to all destinations via each of its neighbours. Thus, the storage needed at each node is $O(N \times e)$, where $e$ is the average number of neighbours of each node in the network. Each node routes an incoming packet to the neighbour with the minimum distance to the destination. Nodes update their distance tables by exchanging distance-vectors with their neighbours. The distance-vector a node transmits consists of the current shortest distance from that node to each destination. Upon receiving a distance-vector, a node computes a new distance table by selecting the minimum between the current and received shortest distances. If the distance table changes, the node will again broadcast its newly computed distance-vector to all neighbours. This asynchronous update mechanism converges to the shortest distances for all connected pairs of nodes [7].
The original ARPANET used the distributed Bellman-Ford algorithm; however, it was replaced in 1979 by a brute-force link-state algorithm because of several drawbacks [27; 7]. It was found to react slowly to failures and link cost changes. The problem is that the distances exchanged between nodes may contain paths with loops. The looping of packets wastes bandwidth and is called the bouncing effect. If the network is disconnected, the algorithm does not even terminate; this is also referred to as the counting-to-infinity problem. Mechanisms to overcome these problems have been proposed which use various node coordination techniques, diffusing computations and maintaining only loop-free paths [7; 11; 1; 26]. These techniques all eliminate long-lived loops, and some also eliminate short-lived loops. However, these techniques all have increased communication overhead to differing degrees.
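The distance-table update described in this section can be illustrated with a minimal sketch on a static, connected network. The class name `DVNode`, its methods and the synchronous `run_until_stable` driver are hypothetical and chosen for the example; a real protocol exchanges vectors asynchronously and adds the loop-avoidance mechanisms mentioned above, none of which are modelled here.

```python
class DVNode:
    """One node in a distance-vector network (illustrative sketch only)."""

    def __init__(self, name, link_costs):
        self.name = name
        self.link_costs = dict(link_costs)  # neighbour -> cost of direct link
        self.table = {}  # destination -> {neighbour: distance via that neighbour}

    def vector(self):
        """Current shortest known distance from this node to each destination."""
        vec = {self.name: 0}
        for dest, via in self.table.items():
            vec[dest] = min(via.values())
        return vec

    def receive(self, neighbour, their_vector):
        """Merge a neighbour's vector; return True if our own vector changed."""
        before = self.vector()
        for dest, dist in their_vector.items():
            if dest != self.name:
                via = self.table.setdefault(dest, {})
                via[neighbour] = self.link_costs[neighbour] + dist
        return self.vector() != before

    def next_hop(self, dest):
        """Neighbour with the minimum distance to the destination."""
        via = self.table[dest]
        return min(via, key=via.get)


def run_until_stable(nodes):
    """Exchange vectors between all neighbours until no table changes."""
    changed = True
    while changed:
        changed = False
        for node in nodes.values():
            for neighbour in node.link_costs:
                if nodes[neighbour].receive(node.name, node.vector()):
                    changed = True


nodes = {
    "A": DVNode("A", {"B": 1, "C": 5}),
    "B": DVNode("B", {"A": 1, "C": 1}),
    "C": DVNode("C", {"A": 5, "B": 1}),
}
run_until_stable(nodes)
```

In this three-node example the direct link from A to C costs 5, so after the exchange A should route packets for C via B at a total cost of 2.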
2.3
Mobile Agents
As the network and its traffic form a highly dynamical system, it has been argued that mobile software agents are a good approach for adaptive routing in such a complex, inherently distributed environment [16; 6]. The use of multiple cooperating agents may facilitate a high level of availability, adaptability and fault-tolerance in modern communication networks. Mobile agents may also prove useful in design, abstracting the interactions between entities in a complex system.
2.3.1
Active Networks
The new approach of active networks enables nodes to execute custom code embedded in packets. This allows packets to route themselves and perform computations at network nodes on the route [31; 16]. In addition to routing, this approach also allows flexible incorporation of new services into a network without the need to redesign the network infrastructure [31].
The chief problems facing active networks are ensuring the security and scalability of the networks. Before executing mobile code, the node must trust the code. One way of doing this is with Proof-Carrying Code (PCC) [22]. The mobile code includes a formal proof of its properties, which the processing node can verify. The question is whether the increased flexibility justifies the extra overhead of per-packet execution, and how well this paradigm scales to very large networks.
2.3.2
Social Insect Metaphors
Ant-colony optimization is a method of solving combinatorial optimization problems inspired by the foraging behaviour of ants [6]. In nature, ants are able to find the shortest distance to a food source by laying trails of pheromones. A large collection of ants cooperates on a task by this indirect form of communication through the environment, called stigmergy.
Adaptive distributed route discovery is performed by artificial software ants that explore the network [25; 6]. Throughout the network, ants are launched to randomly selected destination nodes. These ants share the queues at nodes with data packets, and record the experienced delay, which is used for updating the routing tables. Each ant can be thought of as performing a single Monte Carlo experiment on the actual network, and the result is the experienced delay. The system as a whole performs parallel Monte Carlo experiments with exploration biased towards more useful regions of the state space [6].
The resulting routing is very robust as it does not depend on individual ants, but rather on the collective behaviour of the entire ant colony.
2.4
Summary
The aim of packet-switched networks is to make more efficient use of network resources by forwarding packets between nodes in a hop-by-hop fashion. The routing decision at each node consists of deciding which neighbour to send a packet to. We discussed the simple routing strategies of flooding, random routing and fixed routing.
Adaptive routing increases the efficiency of a network by redirecting traffic away from congested areas or dynamically changing routes in networks characterized by a constantly changing topology. Adaptive routing strategies have to avoid oscillations in the network which arise if they adapt too quickly to congestion. We discussed the two distributed adaptive routing approaches, link-state routing and distance-vector routing.
Mobile software agents may prove helpful in managing the complexity of distributed, dynamic networks. We discussed the potential of active networks, where packets route themselves by executing code on a router. The emergent behaviour exhibited by ant colonies also offers valuable insight into optimization of a complex dynamical system. Promising results have already been obtained by routing based on a collection of simple ant-like software agents.
Reinforcement Learning
A broad range of learning problems can be cast into the reinforcement learning framework [13; 20]. Broadly stated, reinforcement learning is the problem of learning to achieve a goal through interaction in a dynamic environment. The learning entity which is responsible for taking actions is called an agent. The agent continually interacts with the environment by taking actions and receiving rewards and state information, as shown in Figure 3.1. The goal of the agent is to experiment with different action sequences in order to maximize the reward received over time. An important aspect of reinforcement learning algorithms is that they are able to learn from delayed rewards. In some problems, an agent has to execute a specific sequence of actions before it receives a reward. To learn such a sequence, an agent has to overcome the problem of temporal credit assignment, i.e. an agent has to decide which states in the action sequence were responsible for the received reward. Reinforcement learning algorithms are therefore concerned with finding, through trial-and-error interactions in an environment, the sequence of actions that maximizes the received reward over time.
[Figure 3.1: The agent-environment interaction. Diagram labels: Agent, Environment, state, reward, action.]
Reinforcement learning algorithms differ from supervised learning algorithms in that they are not trained on input/output pairs specifying which action is the best at each state. Instead, they are guided to the goal by the rewards received. In other words, the reward received after each action fully specifies the problem to be solved. Another difference from supervised learning is that a task often has no separate training and testing phases. Instead, some tasks require continual learning throughout an agent's life.
3.1
Value Functions
We can formulate the reinforcement learning task an agent faces as a Markov decision process (MDP) [13]. A finite Markov decision process is characterized by:
• a finite set of states $S$,
• a finite set of actions $A$,
• a reward function $R : S \times A \to \mathbb{R}$, and
• a state transition function $T : S \times A \times S \to \mathbb{R}$, where $T(s, a, s')$ is the probability of advancing from state $s$ to $s'$ when taking action $a$.
The model is called Markov if the transition probabilities $T$ are independent of previous states and actions. Thus, the next state is specified probabilistically by the transition function $T$ and the current state and action alone. Note that the model is a nondeterministic MDP because the successor state of an action is chosen probabilistically.
At each time step $t$, an agent observes the state $s_t$ and takes action $a_t$. The environment responds by returning a reward $r_{t+1} = R(s_t, a_t)$ and the next state $s_{t+1}$ with probability $T(s_t, a_t, s_{t+1})$. This process is repeated continually until the agent achieves its goal, or indefinitely for non-episodic tasks.
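One interaction step of such an MDP can be sketched in a few lines. The two-state MDP below, with states `idle` and `busy` and actions `wait` and `send`, is entirely hypothetical and serves only to make the roles of $R$ and $T$ concrete; the dictionary encoding and the function name `step` are likewise assumptions.

```python
import random

# Hypothetical two-state MDP used only for illustration.
# R maps (state, action) to the immediate reward R(s, a).
R = {
    ("idle", "wait"): 0.0,
    ("idle", "send"): 1.0,
    ("busy", "wait"): 0.0,
    ("busy", "send"): -1.0,
}
# T maps (state, action) to {successor state: probability T(s, a, s')}.
T = {
    ("idle", "wait"): {"idle": 1.0},
    ("idle", "send"): {"idle": 0.3, "busy": 0.7},
    ("busy", "wait"): {"idle": 0.5, "busy": 0.5},
    ("busy", "send"): {"busy": 1.0},
}


def step(state, action, rng=random):
    """Return (reward, next state) for taking `action` in `state`."""
    successors = T[(state, action)]
    next_state = rng.choices(
        list(successors), weights=list(successors.values()), k=1
    )[0]
    return R[(state, action)], next_state


reward, next_state = step("idle", "send", random.Random(1))
```

The next state depends only on the current state and action, which is exactly the Markov property stated above.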
The policy $\pi(s, a)$ of an agent is a mapping of each state $s$ and action $a$ to the probability of taking action $a$ in state $s$. The goal of an agent is to improve its policy by maximizing the expected return $R_t$.
There are different ways of calculating the expected return $R_t$, based on the specific task the agent has to solve. Some tasks can be broken up into a series of episodes or trials, where each episode ends in a terminal state. At the end of each episode, the agent is reset to a starting state. In such episodic tasks, we obtain the expected return by summing the total received rewards over a finite horizon $h$:
$$R_t = \sum_{k=0}^{h} r_{t+k+1}. \qquad (1)$$
Some tasks never end; thus, the above sum may be infinite. This problem may be solved by discounting future rewards:
$$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad (2)$$
where $\gamma$ is the discount rate and $0 \le \gamma < 1$. In our discussions, we will focus exclusively on this case, which is called the discounted infinite horizon case. Episodic tasks can also be handled by this definition of expected return by introducing an absorbing state which is entered just after the terminal state. The only transition from the absorbing state is to itself, with an associated reward of zero.
Most reinforcement learning algorithms are based on estimating value functions that estimate the utility of states. The value or utility of a state is the future reward, or return, that an agent can expect. As the future rewards depend on which actions an agent takes, the value function depends on the particular policy the agent follows. The value $V^\pi(s)$ of a state $s$ under policy $\pi$ is the expected return by following policy $\pi$ from state $s$:
$$V^\pi(s) = E_\pi\{R_t \mid s_t = s\}, \qquad (3)$$
where $E_\pi\{\}$ denotes the expected return when policy $\pi$ is followed. For the discounted infinite horizon case, we have:
$$V^\pi(s) = E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \;\Big|\; s_t = s\Big\}. \qquad (4)$$
The optimal value function $V^*$ is attained by maximizing $V^\pi$ for all states:
$$V^*(s) = \max_\pi V^\pi(s), \quad \forall s. \qquad (5)$$
The optimal policy is defined as the policy corresponding to the optimal value function in the maximization above:
$$\pi^* = \arg\max_\pi V^\pi(s), \quad \forall s. \qquad (6)$$
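As a small worked illustration of the discounted return in Equation 2, the sketch below sums a finite list of observed rewards. The function name `discounted_return` is an assumption, and a finite reward list stands in for the infinite sum, which is exact whenever all later rewards are zero (as in the absorbing-state construction).

```python
def discounted_return(rewards, gamma):
    """Compute sum_k gamma**k * rewards[k], where rewards[k] is r_{t+k+1}."""
    total = 0.0
    # Work backwards so each step applies one more factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total


episode = discounted_return([1.0, 1.0, 1.0], gamma=0.5)
# Appending zero rewards from an absorbing state leaves the return unchanged.
padded = discounted_return([1.0, 1.0, 1.0, 0.0, 0.0], gamma=0.5)
```

For three unit rewards and $\gamma = 0.5$ the return is $1 + 0.5 + 0.25 = 1.75$, with or without the trailing zeros.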
In an MDP, we have a model of the environment dynamics in the form of state transition probabilities $T$ and the reward function $R$; thus, we can use the dynamic programming technique called value iteration to find the optimal value function. Once we have the optimal value function, we can obtain the optimal policy $\pi^*$ by choosing, in each state, the action that results in the maximum value function of all the immediate successor states:
$$\pi^*(s) = \arg\max_a V^*(s'), \qquad (7)$$
where $s'$ is the successor of state $s$.
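Value iteration itself is not spelled out in the text, so the following is a minimal sketch of the standard formulation rather than a transcription of any algorithm given here. It assumes the model is stored in dictionaries `R[(s, a)]` and `T[(s, a)][s']`, and, unlike the compact form of Equation 7, the greedy step adds the immediate reward to the discounted successor value. The two-state example MDP is hypothetical.

```python
def backup(s, a, R, T, gamma, V):
    """Immediate reward plus discounted expected value of the successor."""
    return R[(s, a)] + gamma * sum(p * V[s2] for s2, p in T[(s, a)].items())


def value_iteration(states, actions, R, T, gamma, tol=1e-9):
    """Sweep over all states until the largest change falls below tol."""
    V = {s: 0.0 for s in states}
    while True:
        delta = 0.0
        for s in states:
            best = max(backup(s, a, R, T, gamma, V) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            return V


def greedy_policy(states, actions, R, T, gamma, V):
    """Choose in each state the action with the largest backed-up value."""
    return {
        s: max(actions, key=lambda a: backup(s, a, R, T, gamma, V))
        for s in states
    }


# Hypothetical deterministic MDP: staying in state "b" pays 1 per step.
states, actions = ["a", "b"], ["stay", "move"]
R = {("a", "stay"): 0.0, ("a", "move"): 0.0,
     ("b", "stay"): 1.0, ("b", "move"): 0.0}
T = {("a", "stay"): {"a": 1.0}, ("a", "move"): {"b": 1.0},
     ("b", "stay"): {"b": 1.0}, ("b", "move"): {"a": 1.0}}
V = value_iteration(states, actions, R, T, gamma=0.9)
policy = greedy_policy(states, actions, R, T, 0.9, V)
```

With $\gamma = 0.9$ the value of state b approaches $1/(1-\gamma) = 10$ and that of state a approaches $0.9 \times 10 = 9$, so the greedy policy moves from a to b and then stays.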
In reinforcement learning problems, an agent generally does not have access to the environment dynamics in the form of the transition probabilities $T$; thus, we cannot use dynamic programming techniques. In the next sections, we examine reinforcement learning methods based on dynamic programming¹, where we do not have access to the environment dynamics. Instead, an agent has to learn from the environment through the rewards experienced by taking different actions.
3.2
Temporal-Difference Learning
We now turn our attention to the problem of learning the optimal policy without perfect knowledge of the environment. The only way we can learn about the environment is to explore it by taking actions, observing the reward and using the experience to update the value function. One way of solving the problem is to incrementally estimate the value function $V^\pi$ as we encounter each new state. We denote this approximate value function by $V$.
The class of temporal-difference learning [28] algorithms updates the current estimate $V(s_t)$ by using the value function estimates of temporally successive states. Temporal-difference methods are called bootstrapping methods, because they update estimates based on other estimates. By looking one step ahead at the value function of the next state, we can update the current value function estimate as follows:
$$V(s_t) \leftarrow V(s_t) + \alpha\,[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)], \qquad (8)$$
where $\alpha$ is the step size parameter.
¹ Barto and Sutton [30] present a unified view relating dynamic programming, Monte Carlo, and temporal-difference methods for solving reinforcement learning problems.
Initialize $V(s)$ arbitrarily, $\pi$ to the policy to be evaluated
repeat for each episode:
    Initialize $s$
    repeat for each step in episode:
        choose action $a$ in state $s$ from policy $\pi$
        take action $a$; observe reward $r$, and next state $s'$
        $V(s) \leftarrow V(s) + \alpha\,[r + \gamma V(s') - V(s)]$
        $s \leftarrow s'$
    until $s$ is terminal
Figure 3.2: Estimating $V^\pi$ with TD(0).
The algorithm, called TD(0) for reasons we will see shortly, is shown in Figure 3.2. Recall that the value function $V^\pi(s)$ is the expected return of following policy $\pi$ from state $s$. Thus, the TD(0) algorithm predicts the reward an agent will receive by following policy $\pi$ from state $s$.
It has been shown that TD(0) converges with probability 1 to $V^\pi$ for any fixed $\pi$ with an appropriate choice of $\alpha$. If we denote $\alpha_k(a)$ as the step size parameter after the $k$th selection of action $a$, a suitable choice is $\alpha_k(a) = 1/k$. This follows from the well-known result in stochastic approximation theory giving the conditions for convergence with probability 1 as:
$$\sum_{k=1}^{\infty} \alpha_k(a) = \infty \quad \text{and} \quad \sum_{k=1}^{\infty} \alpha_k^2(a) < \infty. \qquad (9)$$
Although this is a useful theoretical result, the step size decrease above is seldom used in practice [30]. Instead, a constant step size $\alpha_k(a) = \alpha$ is used. This may be so for two reasons: first, the convergence is often slow or needs considerable tuning for a satisfactory convergence rate; second, in non-stationary environments, convergence is undesirable as the reward function $R$ may change over time; thus, we want our learned policy to continually change in response to the latest received rewards.
3.3
Q-Learning
In the previous section, we saw how TD(0) can be used for predicting the expected reward of a particular policy $\pi$ by estimating the value function. In this section, we turn to the problem of learning an optimal policy.
If the agent knows the transition probabilities $T$ of the environment, it can choose the action that leads to the successor state with the combined maximum value function (Equation 7) and immediate reward. The problem is that we generally do not have a model of the environment; thus, we do not know which actions take us to which states. The solution is to define a new value function $Q^\pi(s, a)$, defined as the value of taking action $a$ in state $s$ while following policy $\pi$. This new value function is called the action-value function, and $V^\pi(s)$ the state-value function.
We define $Q^*(s, a)$ as the expected return of taking action $a$ in state $s$, and following the optimal policy from then on. Thus, we can write $Q^*(s, a)$ in terms of $V^*(s)$:
$$Q^*(s, a) = E\{r_{t+1} + \gamma V^*(s_{t+1}) \mid s_t = s, a_t = a\}. \qquad (10)$$
Recall that $V^*(s)$ is the value of taking the best step initially, so we also have:
$$V^*(s) = \max_a Q^*(s, a), \qquad (11)$$
which enables us to write Equation 10 recursively:
$$Q^*(s, a) = E\{r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \mid s_t = s, a_t = a\}. \qquad (12)$$
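The practical benefit of the action-value function is that both the state value of Equation 11 and the best action can be read directly from a table of Q-values, without consulting $T$. The short sketch below illustrates this; the dictionary encoding and the sample numbers are hypothetical.

```python
def state_value(Q, state, actions):
    """V(s) = max over a of Q(s, a), as in Equation 11."""
    return max(Q[(state, a)] for a in actions)


def greedy_action(Q, state, actions):
    """The action attaining the maximum Q-value in `state`."""
    return max(actions, key=lambda a: Q[(state, a)])


# Hypothetical Q-values for a single state with two outgoing links.
Q = {("n1", "left"): 0.4, ("n1", "right"): 0.7}
value = state_value(Q, "n1", ["left", "right"])
best = greedy_action(Q, "n1", ["left", "right"])
```

No transition model appears anywhere in these two functions, which is why action values are the natural quantity to learn when the environment dynamics are unknown.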
Whereas TD(0) is used to predict the expected return of states while following policy $\pi$, Q-Learning [34] incrementally estimates the optimal action-value function $Q^*(s, a)$. The update rule is given by:
$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)]. \qquad (13)$$
The Q-Learning algorithm shown in Figure 3.3 converges to the optimal action-value function $Q^*$ with probability 1 under the same conditions for $\alpha$ as in TD(0), provided each state-action pair is tried infinitely often. We will prove the convergence results in a later section.
In the Q-Learning algorithm, we must select actions based on a suitable exploration strategy derived from $Q$. Any strategy that guarantees that each state-action pair will be tried infinitely often will suffice. One of the simplest strategies is $\epsilon$-greedy, where an agent chooses the action with maximal Q-value in that state with probability $1 - \epsilon$ and a random action with a small probability $\epsilon$. When an agent chooses an action with maximum Q-value, it is exploiting previously stored information, whereas random actions result in exploration. We will discuss the tradeoff between exploration and exploitation later.
Initialize $Q(s, a)$ arbitrarily
repeat for each episode:
    Initialize $s$
    repeat for each step in episode:
        choose action $a$ in state $s$ using exploration policy derived from $Q$
        take action $a$; observe reward $r$, and next state $s'$
        $Q(s, a) \leftarrow Q(s, a) + \alpha\,[r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$
        $s \leftarrow s'$
    until $s$ is terminal
Figure 3.3: Estimating $Q^*$ with Q-Learning.
Q-Learning is called an off-policy learning algorithm because it converges to the optimal value function independent of the exploration policy being followed. In other words, the details of the particular exploration strategy do not influence the value function, but only the rate of convergence. There is also an on-policy Q-Learning algorithm called SARSA [30], in which the exploration strategy is taken into account. However, both algorithms converge to the same value function when $\epsilon$, the probability of exploration, decreases towards zero.
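The Q-Learning procedure of Figure 3.3, combined with $\epsilon$-greedy exploration, can be sketched as follows. As with the earlier sketches, the interface is an assumption: the environment is a function `env_step(s, a)` returning `(reward, next_state)`, Q-values default to zero, and a seeded random generator keeps the run repeatable. The four-state corridor is a hypothetical test problem, not a network model.

```python
import random
from collections import defaultdict


def epsilon_greedy(Q, state, actions, epsilon, rng):
    """Explore with probability epsilon, otherwise exploit the best Q-value."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])


def q_learning(env_step, actions, start_state, is_terminal,
               episodes, alpha, gamma, epsilon, seed=0):
    """Estimate the optimal action-value function with update rule (13)."""
    rng = random.Random(seed)
    Q = defaultdict(float)  # arbitrary initialisation: all Q-values start at 0
    for _ in range(episodes):
        s = start_state
        while not is_terminal(s):
            a = epsilon_greedy(Q, s, actions, epsilon, rng)
            r, s_next = env_step(s, a)
            best_next = max(Q[(s_next, a2)] for a2 in actions)
            Q[(s, a)] += alpha * (r + gamma * best_next - Q[(s, a)])
            s = s_next
    return Q


# Hypothetical corridor with states 0..3; reaching state 3 pays reward 1.
def corridor_step(state, action):
    nxt = state + 1 if action == "right" else max(0, state - 1)
    return (1.0 if nxt == 3 else 0.0), nxt


actions = ["left", "right"]
Q = q_learning(corridor_step, actions, start_state=0,
               is_terminal=lambda s: s == 3,
               episodes=500, alpha=0.5, gamma=0.9, epsilon=0.3)
```

Because the initial greedy choice is "left", the agent only discovers the reward through exploration, after which the learned Q-values favour "right" in every state.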
3.4
TD($\lambda$) Learning
The TD(0) learning method we studied previously is a special case of a class of temporal-difference learning methods called TD($\lambda$), with $\lambda = 0$. In the update rule of TD(0) (Equation 8), we look ahead one step to the value function of the next state. The update moves the estimate closer to the target value of estimated return:
$$R_t^{(1)} = r_{t+1} + \gamma V_t(s_{t+1}). \qquad (14)$$
We can generalize the target to the case of $n$ steps, also called the corrected n-step truncated return:
$$R_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots + \gamma^{n-1} r_{t+n} + \gamma^n V_t(s_{t+n}). \qquad (15)$$
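Equation 15 can be checked numerically with a short sketch. The indexing convention is an assumption made for the example: `rewards[k]` holds $r_{k+1}$, the reward received after the action at time $k$, and `values[k]` holds the current estimate $V_t(s_k)$.

```python
def n_step_return(rewards, values, t, n, gamma):
    """Corrected n-step truncated return of Equation 15.

    rewards[k] is r_{k+1}; values[k] is the current estimate V_t(s_k).
    """
    # Discounted sum of the next n observed rewards.
    total = sum(gamma ** i * rewards[t + i] for i in range(n))
    # Correct the truncation with the estimated value of the state reached.
    return total + gamma ** n * values[t + n]


rewards = [1.0, 2.0, 3.0]
values = [0.0, 10.0, 20.0, 30.0]
one_step = n_step_return(rewards, values, t=0, n=1, gamma=0.5)
two_step = n_step_return(rewards, values, t=0, n=2, gamma=0.5)
three_step = n_step_return(rewards, values, t=0, n=3, gamma=0.5)
```

With $n = 1$ this reduces to the TD(0) target of Equation 14, here $1 + 0.5 \times 10 = 6$; larger $n$ replaces more of the estimate with observed rewards.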
It can be shown [30] that the expected value of the corrected n-step truncated return is an improvement over the current value function as an approximation to the true value