Reinforcement learning for routing in communication networks


Academic year: 2021


Walter H. Andrag

Thesis presented in partial fulfilment of the requirements for the degree of Master of Science at the University of Stellenbosch

Supervisor: Prof Christian W. Omlin

April 2003

I, the undersigned, hereby declare that the work contained in this thesis is my own original work and has not previously in its entirety or in part been submitted at any university for a degree.

Routing policies for packet-switched communication networks must be able to adapt to changing traffic patterns and topologies. We study the feasibility of implementing an adaptive routing policy using the Q-Learning algorithm which learns sequences of actions from delayed rewards. The Q-Routing algorithm adapts a network's routing policy based on local information alone and converges toward an optimal solution. We demonstrate that Q-Routing is a viable alternative to other adaptive routing methods such as Bellman-Ford. We also study variations of Q-Routing designed to better explore possible routes and to take into consideration limited buffer size and optimize multiple objectives.

Opsomming (Afrikaans abstract, in English translation): Routing in communication networks must be able to adapt to changes in network topology and traffic distribution. We study the usefulness of an adaptive routing algorithm based on the Q-Learning algorithm, which makes it possible to take a sequence of decisions based on delayed rewards. The routing algorithm uses only local information to make routing decisions and converges to an optimal solution. We demonstrate that the routing algorithm is a good alternative for adaptive routing, since in many respects it performs better than the Bellman-Ford algorithm. We also study variations of the routing algorithm that can discover better paths, use less memory at network elements, and optimize more than one objective function.

I would like to sincerely thank my supervisor, Prof. C. W. Omlin, for all the inspiration, assistance and funding he provided.

This work was also made possible by funding from the South African National Research Foundation, Telkom-Siemens Centre of Excellence for ATM and Broadband Networks and their Applications and the Harry Crossley Scholarship Fund.

Contents

1 Introduction
1.1 Motivation
1.2 Problem Statement
1.3 Premises
1.4 Hypotheses
1.5 Technical Objectives
1.6 Methodology
1.7 Achievements
1.8 Thesis Organization

2 Routing in Communication Networks
2.1 The Routing Problem
2.1.1 Performance Criterion
2.1.2 Decision Time
2.1.3 Decision Place
2.1.4 Network Information Source
2.1.5 Routing Information Update Timing
2.2 Conventional Routing Strategies
2.2.1 Flooding
2.2.2 Random Routing
2.2.3 Fixed Routing
2.2.4 Adaptive Routing
2.2.5 Link-State Routing
2.2.6 Distance-Vector Routing
2.3 Mobile Agents
2.3.1 Active Networks
2.3.2 Social Insect Metaphors
2.4 Summary

3 Reinforcement Learning
3.1 Value Functions
3.2 Temporal-Difference Learning
3.3 Q-Learning
3.4 TD(λ) Learning
3.5 Q(λ) Learning
3.6 Convergence Properties of Q-Learning
3.7 Exploration vs Exploitation
3.8 Summary

4 Q-Learning for Traffic Routing
4.1 Optimization of Packet Delivery Time
4.1.1 Q-Routing
4.1.2 DRQ-Routing
4.1.5 Probabilistic CDRQ-Routing
4.2 Finite Buffer Size
4.3 Optimization of Multiple Objectives
4.4 Summary

5 Conclusion
5.1 Conclusion
5.2 Future Work
5.2.1 Realistic Simulations
5.2.2 Improved Routing

List of Tables

2.1 Design elements of a routing strategy
4.1 The parameters used in the simulations

List of Figures

3.1 The agent-environment interaction.
3.2 Estimating V^π with TD(0).
3.3 Estimating Q* with Q-Learning.
3.4 Estimating V^π with TD(λ).
3.5 Watkins's Q(λ) algorithm.
4.1 The British Synchronous Digital Hierarchy (SDH) network topology.
4.2 Average packet delivery times for network load 1.2 for the SDH network topology.
4.3 Average packet delivery times for network load 2.2 for the SDH network topology.
4.4 Average packet delivery times for network load 3.2 for the SDH network topology.
4.5 Average packet delivery times of Bellman-Ford for high network load for the SDH network topology. Error bars show standard deviations.
4.6 Average packet delivery times of Q-Routing for high network load for the SDH network topology. Error bars show standard deviations.
4.7 Comparing the average packet delivery times of Q-Routing and DRQ-Routing for network load 2.0 for the SDH network topology.
4.8 Comparing the average packet delivery times of Q-Routing and DRQ-Routing for network load 3.0 for the SDH network topology.
4.10 Comparing the average packet delivery times of Q-Routing and CQ-Routing for network load 2.0 for the SDH network topology.
4.11 Comparing the average packet delivery times of Q-Routing and CQ-Routing for network load 3.0 for the SDH network topology.
4.12 Comparing the average packet delivery times of Q-Routing and CQ-Routing for network load 4.0 for the SDH network topology.
4.13 Comparing the average packet delivery times of Q-Routing, CQ-Routing, DRQ-Routing and CDRQ-Routing for network load 2.0 for the SDH network topology.
4.14 Comparing the average packet delivery times of Q-Routing, CQ-Routing, DRQ-Routing and CDRQ-Routing for network load 3.0 for the SDH network topology.
4.15 Comparing the average packet delivery times of Q-Routing, CQ-Routing, DRQ-Routing and CDRQ-Routing for network load 4.0 for the SDH network topology.
4.16 The variance function of Equation 36 for β of 0.2, 0.4, 0.6 and 0.8.
4.17 The Average Packet Delivery Time for the SDH network for network load 1.5; β of 0.2, 0.4, 0.6 and 0.8.
4.18 The Average Packet Delivery Time for the SDH network for network load 3.0; β of 0.2, 0.4, 0.6 and 0.8.
4.19 The Average Packet Delivery Time for the SDH network for network load 4.5; β of 0.2, 0.4, 0.6 and 0.8.
4.20 The Congestion Risk of Equation 38 for e of 3, 6 and 15.
4.21 The 13 node network topology used for the finite buffer simulation.
4.22 Average packet delivery time for low load.
4.23 Number of packets dropped for low load.
4.26 Average packet delivery time for high load.
4.27 Number of packets dropped for high load.
4.28 The network topology for the 36 node grid.
4.29 The average packet delivery time for single versus multiple objective optimization for the 36 node grid for differing a.
4.30 Details of the steady state behaviour of Figure 4.29.
4.31 The average cost for single versus multiple objective optimization for the 36 node grid for differing a.
4.32 The average packet delivery time for single versus multiple objective optimization for the BT SDH network for differing a.
4.33 Details of the steady state behaviour of Figure 4.32.
4.34 The average cost for single versus multiple objective optimization for the BT SDH network for differing a.
4.35 The average saving of multiple objective optimization of cost and delivery time for the BT SDH network versus a.

1 Introduction

1.1 Motivation

Modern communication networks must cope with ever-increasing demands on network resources. The range of services offered leads to both regular and less predictable traffic patterns. Adaptive routing is able to respond to changing traffic patterns and topology, thus providing efficient use of network resources. In networks characterized by a constantly changing topology, adaptive routing is essential. Adaptation may be necessary in traditional networks due to failures of links or nodes; in mobile ad-hoc networks, mobile routers are able to move randomly, thus constantly and unpredictably changing the network topology.

In order to adapt routing to changing network conditions, a centralized routing strategy needs information about the status of all nodes and links in the network. However, this information transmission overhead consumes valuable network resources. This highlights the need to make distributed routing decisions based on locally available information only.

1.2 Problem Statement

A packet-switched communication network can be modeled as a set of nodes and interconnecting links. Data is exchanged over these communication links as a sequence of packets. In general, nodes are not fully connected; thus, the packets must pass through intermediate nodes. The route is the sequence of nodes along which a packet travels to its final destination. In most networks, there may be more than one route between pairs of nodes. The routing problem consists of finding the optimal route between source and destination nodes, where the optimal route is the one that delivers packets to their final destination in the shortest time possible.

1.3 Premises

The premises of the packet routing domain which we believe make adaptive routing indispensable are as follows:

1. A network is a highly dynamic environment in which traffic patterns may be unpredictable and links or nodes may fail.

2. A central routing mechanism which has global information about the state of the network is generally not feasible because of the overhead involved.

3. Thus, we need a good routing policy which
(a) uses only local information and
(b) minimizes average packet delivery time.

1.4 Hypotheses

Machine learning covers a broad field of methods concerned with the ability of programs to learn from experience, thereby improving their performance. We wish to test the following hypotheses in this thesis:

1. Machine learning is a viable alternative to static routing because
(a) it can adapt to changing environments, i.e. changes in traffic patterns or network topology;

2. Reinforcement learning is a field in machine learning concerned with programs taking optimal action sequences so as to achieve a goal. Reinforcement learning is well-suited for adaptive routing because
(a) It is goal-oriented, i.e. actions are to be learned with a desired outcome. The goal of routing is to deliver packets with minimum delay.
(b) Reinforcement learning allows a policy to be acquired, i.e. a sequence of actions that leads to a desired outcome. Actions in routing are the propagation of packets to a neighbouring node, and the desired outcome is the delivery of packets to their intended final destination.
(c) Reinforcement learning algorithms are able to learn policies from delayed rewards. We only know whether a good route was chosen for a packet once it has reached its destination.
(d) Reinforcement learning algorithms can adapt to changes in the environment. As traffic patterns change, or nodes or links fail, a routing policy will have to adapt.

1.5 Technical Objectives

The objectives we set out to achieve in our investigation are as follows:

1. Implement a distributed adaptive packet routing algorithm which minimizes the average packet delivery time, while using only locally available information.

2. Compare the performance of the adaptive routing policy to standard routing algorithms.

3. Improve the adaptive routing mechanism to discover new routes.

4. Extend and test the algorithms under more realistic scenarios including nodes with finite buffer size and optimization of multiple objectives.

1.6 Methodology

1. Q-Learning is a reinforcement learning algorithm that is able to learn an optimal sequence of actions in an environment which maximizes rewards received from the environment. Q-Routing is an implementation of Q-Learning, which is able to distributively route packets in a network. Each node is able to make a routing decision using only locally available information. The reward received is the packet delivery time; thus, the goal is to minimize the average delivery time of all packets.

2. We will empirically evaluate the routing algorithms by simulation and compare their performance under different traffic loads and network topologies. We will use the average packet delivery time and the average number of dropped packets as the performance measures.

3. We will investigate possible performance improvements to Q-Routing designed to increase the exploration ability of the algorithm, which enables the discovery of new routes in the network. This improvement is made possible by adding a probabilistic component to routing decisions.

4. We will consider more realistic network scenarios: networks with finite buffers and networks where there are multiple objectives to be optimized:
(a) Previous work explored the performance of Q-Routing in networks with infinite packet buffers. We will examine the more realistic case of finite buffers. Congestion control is achieved by avoiding nodes with a high level of congestion.
(b) We will also examine the performance of a new routing algorithm, able to optimize multiple possibly conflicting objectives, e.g. packet delivery time versus cost. We examine the implications of this trade-off.
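
The Q-Routing idea in item 1 can be sketched in a few lines. This is a minimal illustration and not the implementation used in the thesis: it assumes the standard Q-Routing update of Boyan and Littman, in which a node keeps an estimate Q(d, y) of the delivery time to destination d via neighbour y and moves it toward the observed queueing delay plus link delay plus the neighbour's own best estimate. The class name `QRouter`, the learning rate `eta`, the zero initial estimates, and the `epsilon` parameter (a simple stand-in for the probabilistic component of item 3) are all hypothetical.

```python
import random
from collections import defaultdict


class QRouter:
    """Per-node Q-Routing table (illustrative sketch; names are hypothetical)."""

    def __init__(self, node, neighbours, eta=0.5, epsilon=0.0, rng=None):
        self.node = node
        self.neighbours = list(neighbours)
        self.eta = eta          # learning rate (assumed value)
        self.epsilon = epsilon  # probability of trying a random neighbour
        self.rng = rng or random.Random(0)
        # q[dest][neighbour] = estimated delivery time to dest via neighbour
        self.q = defaultdict(lambda: {n: 0.0 for n in self.neighbours})

    def best_estimate(self, dest):
        """Smallest estimated remaining delivery time from this node to dest."""
        if dest == self.node:
            return 0.0
        return min(self.q[dest].values())

    def choose(self, dest):
        """Pick the next hop: usually greedy, occasionally exploratory."""
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.neighbours)
        return min(self.q[dest], key=self.q[dest].get)

    def update(self, dest, neighbour, queue_delay, link_delay, neighbour_estimate):
        """Move Q(dest, neighbour) toward the newly observed estimate."""
        target = queue_delay + link_delay + neighbour_estimate
        old = self.q[dest][neighbour]
        self.q[dest][neighbour] = old + self.eta * (target - old)
        return self.q[dest][neighbour]


if __name__ == "__main__":
    a = QRouter("A", ["B", "C"], eta=0.5)
    a.update("D", "B", queue_delay=2.0, link_delay=1.0, neighbour_estimate=5.0)
    a.update("D", "C", queue_delay=0.0, link_delay=1.0, neighbour_estimate=1.0)
    print(a.choose("D"))  # the neighbour with the lower estimated delivery time
```

Each update uses only quantities available at the node or reported by the chosen neighbour, which matches the requirement that routing decisions rely on locally available information.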

1.7 Achievements

We have learned the following from our investigation:

1. Q-Routing is able to route packets minimizing the average delay while only using local information.

2. Q-Routing compares well with standard routing algorithms. In particular, it converges to a more stable routing policy than our version of the distributed Bellman-Ford algorithm.

3. We demonstrated the efficiency of new path discovery of a proposed algorithm based on probabilistic exploration.

4. Two extended algorithms were also shown to perform well in more realistic network scenarios:
(a) In networks with limited buffers, the algorithm is able to perform congestion control, dropping fewer packets at overloaded nodes.
(b) An improved routing algorithm is also able to optimize multiple objectives. However, competing objectives may establish a trade-off.

1.8 Thesis Organization

In Chapter 2, we discuss the routing problem in more detail and look at some of the approaches used to solve it. Chapter 3 discusses the field of reinforcement learning, presenting techniques of solving reinforcement learning problems. In Chapter 4, we present the simulation results of the comparison between different routing algorithms by evaluating performance under various scenarios. Conclusions and directions of future research are presented in Chapter 5.

2 Routing in Communication Networks

In this chapter, we examine the routing problem and investigate different approaches that have been proposed for solving it. We define the routing problem, and discuss the general requirements of routing algorithms. Network routing is very complex; thus, we discuss some of the characteristics that differentiate between different routing algorithms.

2.1 The Routing Problem

We consider a communication network [27; 15] as an undirected weighted graph G = (N, L) with a set of nodes N and a set of bidirectional links L connecting the nodes. Each link has a capacity and a user-defined associated cost. We define a path as a sequence of nodes connecting a source to a destination node. There may be multiple paths between sources and destinations. The general routing problem consists of finding the optimal path between source and destination nodes satisfying some performance criterion.
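
To make the graph model concrete, the sketch below stores G = (N, L) as an adjacency map and finds a least-cost path with Dijkstra's algorithm. Dijkstra's algorithm is used here only as a familiar example of optimizing a single performance criterion (total link cost); the four-node topology, its costs, and the function name `least_cost_path` are assumptions made for illustration.

```python
import heapq


def least_cost_path(links, source, dest):
    """Return (cost, path) of a least-cost path, or (inf, []) if unreachable.

    links maps each node to a dict {neighbour: link cost}; because links are
    bidirectional, each link appears in both directions.
    """
    best = {source: 0.0}
    previous = {}
    frontier = [(0.0, source)]
    while frontier:
        cost, node = heapq.heappop(frontier)
        if node == dest:
            break
        if cost > best.get(node, float("inf")):
            continue  # stale queue entry
        for neighbour, link_cost in links[node].items():
            new_cost = cost + link_cost
            if new_cost < best.get(neighbour, float("inf")):
                best[neighbour] = new_cost
                previous[neighbour] = node
                heapq.heappush(frontier, (new_cost, neighbour))
    if dest not in best:
        return float("inf"), []
    path = [dest]
    while path[-1] != source:
        path.append(previous[path[-1]])
    return best[dest], path[::-1]


# Hypothetical four-node network with user-defined link costs.
LINKS = {
    "A": {"B": 1.0, "C": 4.0},
    "B": {"A": 1.0, "C": 2.0, "D": 5.0},
    "C": {"A": 4.0, "B": 2.0, "D": 1.0},
    "D": {"B": 5.0, "C": 1.0},
}

if __name__ == "__main__":
    print(least_cost_path(LINKS, "A", "D"))
```

Changing the numbers attached to the links (hop count, measured delay, monetary cost) changes which path is optimal without changing the search itself.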

We will discuss the routing problem in the context of packet switching. In a packet-switched network, data is broken up into a sequence of packets which are sent from node to node until the destination is reached. The routing decision at each node consists of deciding to which neighbouring node to send a packet.

A routing algorithm has the following requirements [27]:

• Correctness
• Simplicity
• Efficiency
• Robustness
• Stability
• Fairness
• Optimality

The correctness of a routing algorithm refers to the fact that it must route all packets to the correct destinations. Simple routing algorithms are also preferred, as they have less routing overhead, which in turn increases the efficiency of the network. All packet routing schemes have a certain amount of processing and transmission overhead, which may negatively impact the efficiency of the network. The benefits of overheads must be balanced against the decrease in efficiency they cause.

Some of these requirements are in competition with each other, e.g. robustness and stability. A routing algorithm is said to be robust when it is able to adapt to node or link failures and changes in network load conditions. When an overload is detected in a section of the network, traffic is rerouted to less congested regions. If the routing algorithm responds too quickly, these less congested regions will in turn become congested. The routing algorithm is called unstable if it continually shifts the load between different sections of the network. On the other hand, if the network adapts too slowly, packets may be dropped at congested nodes.

There also exists a trade-off between optimality and fairness: if a certain performance criterion favours the exchange of packets between nearby nodes, the throughput may be increased. This may appear unfair to nodes with a high proportion of long-distance traffic.

We briefly discuss the various design elements that contribute to a routing strategy as presented in [27] (see Table 2.1).

Design element                       Options
Performance criterion                Number of hops; Cost; Delay; Throughput
Decision time                        Packet; Session
Decision place                       Each node (distributed); Central node (centralized); Originating node (source)
Network information source           None; Local; Adjacent nodes; Nodes along route; All nodes
Network information update timing    Continuous; Periodic; Major load change; Topology change

Table 2.1: Design elements of a routing strategy

2.1.1 Performance Criterion

A routing policy has to decide to which neighbouring node to forward a packet based on some performance criterion. The simplest choice is to select the neighbour which is on the minimum hop path to the packet's destination. A more general approach is to assign a link cost to each link and to select the minimum cost path. The specific cost metric used determines the optimal path. If the link cost is inversely proportional to the link capacity, the least-cost path maximizes the throughput, whereas it minimizes the average packet delay when the link cost is the measured link delay. Other possible cost metrics are reliability, load and communications cost. The metric can also be a combination of several performance criteria, i.e. the optimal route over multiple objectives.

2.1.2 Decision Time

The decision time refers to when routing decisions are made, and it distinguishes two types of packet-switched networks. In a datagram packet switching network, each node makes a routing decision for each incoming packet. However, there is another approach, called virtual-circuit packet switching, where the routing decision is made only once per session. If a source node wants to communicate with a destination node, a virtual circuit between source and destination is established. After the connection has been set up, each node selects the neighbour based on the virtual-circuit identifier. Thus, all subsequent packets of a session will follow the same route through the network.
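
The difference between the two decision times can be illustrated with a small sketch. It is not taken from the thesis: the `Node` class, its table layouts, and the circuit identifier `vc_id` are hypothetical. In datagram mode the node consults its routing table for every packet; in virtual-circuit mode the choice is fixed when the session is set up, and later packets are forwarded by circuit identifier alone.

```python
class Node:
    """Forwarding state of one node (hypothetical layout)."""

    def __init__(self, routing_table):
        self.routing_table = dict(routing_table)  # destination -> next hop
        self.circuits = {}                        # vc_id -> next hop

    def forward_datagram(self, dest):
        """Datagram mode: a routing decision is made for each packet."""
        return self.routing_table[dest]

    def setup_circuit(self, vc_id, dest):
        """Virtual-circuit mode: decide once, when the session is set up."""
        self.circuits[vc_id] = self.routing_table[dest]

    def forward_circuit(self, vc_id):
        """Later packets follow the stored choice, ignoring table changes."""
        return self.circuits[vc_id]


node = Node({"D": "B"})
node.setup_circuit(vc_id=7, dest="D")
node.routing_table["D"] = "C"   # the routing policy changes mid-session

if __name__ == "__main__":
    print(node.forward_datagram("D"), node.forward_circuit(7))
```

After the table change, datagrams for D follow the new entry while packets on circuit 7 keep the route chosen at setup, which is the behaviour described above.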

2.1.3 Decision Place

The decision place refers to where routing decisions are made. In centralized routing, there is a central control node which collects information from the network and computes routing tables which are distributed to all nodes. The problem with this approach is that the controlling node is a single point of failure. Distributed routing algorithms make routing decisions at each node; thus, they are more robust. In source routing algorithms, the originating node selects the route through the network.

2.1.4 Network Information Source

Most routing algorithms utilize some information about the network topology, traffic load or link cost. Distributed routing may utilize information available locally to the node such as the cost of each link. Nodes may also make routing decisions based on information from neighbouring nodes, or all nodes on a path. Centralized routing makes use of information from all nodes. Some algorithms do not use any network state information, e.g. flooding and random routing.

2.1.5 Routing Information Update Timing

If the routing strategy uses locally available information, routing updates are continuous. For all other strategies that make use of network information, routing information updates are made periodically in order to adapt to changing network conditions. The accuracy of information depends on how frequently the information is updated. Thus, with more accurate information, better routing decisions are made. However, informa-

2.2 Conventional Routing Strategies

Network routing is a very complex problem and many different approaches to solving it have been proposed. We briefly discuss some of the routing strategies used, ranging from the simple to the more complex adaptive routing strategies.

2.2.1 Flooding

Flooding [27] is a simple routing strategy whereby each node forwards a packet to each of its neighbours, except the node from which the packet came. Nodes do not need any information about the network topology beyond their immediate neighbours. Packets need a sequence number and the destination node embedded in their headers so that a destination node can discard duplicate packets. Forwarded packets which return to a previously visited node must also be discarded; otherwise, the number of packets in circulation will increase without bound. Another way to accomplish this is for each packet to carry a hop count which is incremented at each node; the packet is discarded when a predetermined limit is reached. Since all possible routes between source and destination are tried, a packet is guaranteed to reach the destination if it is reachable; thus, flooding is very robust. It has been used in military networks where link or node failures may occur frequently [15]. Another property of flooding is that at least one packet will travel along the shortest route. This may be used in some networks to set up virtual circuits.

Because all nodes directly or indirectly connected to the source node are visited, flooding can be used to distribute important information (e.g. routing information) to all nodes. The biggest disadvantage of flooding is the large amount of network bandwidth that is wasted on duplicate packets.
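The flooding rules described above can be illustrated with a minimal sketch. It assumes unit link delays (so packets are processed in breadth-first order) and an adjacency-list graph; the function name `flood` and its return values are illustrative choices, not part of any protocol discussed here.

```python
from collections import deque


def flood(graph, source, destination, hop_limit):
    """Simulate flooding of one packet.

    graph maps each node to a list of its neighbours.
    Returns (delivered, transmissions, min_hops).
    """
    seen = {source}                     # nodes that have already handled the packet
    queue = deque([(source, None, 0)])  # (node, previous node, hops so far)
    transmissions = 0                   # every copy sent over a link is counted
    min_hops = None
    while queue:
        node, prev, hops = queue.popleft()
        if node == destination:
            if min_hops is None:
                min_hops = hops         # first arrival used the fewest hops
            continue
        if hops == hop_limit:
            continue                    # hop limit reached: discard the packet
        for neighbour in graph[node]:
            if neighbour == prev:
                continue                # never send back to where it came from
            transmissions += 1
            if neighbour in seen:
                continue                # duplicate: discarded on arrival
            seen.add(neighbour)
            queue.append((neighbour, node, hops + 1))
    return min_hops is not None, transmissions, min_hops
```

Comparing `transmissions` with `min_hops` on even a small graph shows how much of the traffic consists of duplicates that are sent only to be discarded.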

2.2.2 Random Routing

Another simple, robust routing strategy is random routing [27], where each node randomly selects the neighbour to which it forwards a packet, excluding the node from which the packet came. Although this strategy will in general not select the shortest path, it generates less traffic than flooding. A refinement of this technique is to select an outgoing link with a probability proportional to the data rate of the link. This strategy attempts to ensure a good traffic distribution.
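The refinement of random routing with rate-proportional link selection could be sketched as follows. The function name `choose_next_hop`, the dictionary layout, and the handling of a node with no eligible neighbour are assumptions made for illustration.

```python
import random


def choose_next_hop(links, came_from, rng=random):
    """Pick a neighbour with probability proportional to its link data rate.

    links maps each neighbour to the data rate of the link leading to it.
    The neighbour the packet came from is excluded.
    """
    candidates = [n for n in links if n != came_from]
    if not candidates:
        # Assumption: a dead end is reported to the caller, which must decide
        # what to do; the text does not prescribe a behaviour for this case.
        return None
    weights = [links[n] for n in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]
```

Setting all data rates equal reduces this to plain random routing.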

2.2.3 Fixed Routing

Fixed routing, also called static shortest path routing, computes least-cost paths for all origin-destination pairs of nodes in the network. From these fixed paths, routing tables are computed and sent to each node. As the least-cost paths are computed once, the link costs cannot be based on dynamic variables such as traffic. Instead, the network is designed based on an anticipated traffic distribution. Fixed routing is simple, and it is very effective in reliable networks with stable load. The disadvantage is that it does not react to congestion, node failures, or unforeseen traffic patterns.

2.2.4 Adaptive Routing

In order to increase efficiency, adaptive routing methods dynamically alter routes when node or link failures are detected or when congestion develops. For a network to adapt to these changes, it needs to collect and exchange network state information between nodes, such as delay or throughput [26]. The optimality of the new routes depends on the quality of the network information, which necessitates an increased information exchange. However, there exists a trade-off between the quality of information and the overhead: overhead consumes network resources, which may degrade the overall network performance. A serious problem with adaptive routing is that it may become unstable if a routing policy reacts too quickly to congestion [15; 27; 14]. If the adaptive routing redirects most traffic away from the congested part of the network, congestion may develop elsewhere; thus, traffic will again shift to a different part of the network. This oscillation will continue indefinitely if not properly managed by the routing algorithm. As it takes time for the network information to reach relevant nodes, there is never a true picture of the network state.

Temporary routing loops [11; 7] can develop, where packets circulate through the network until all nodes have consistent routing tables. This looping wastes bandwidth and increases delay.

Although adaptive routing is complex, it is widely used as it improves the network performance and helps in congestion control.

2.2.5 Link-State Routing

Link-state routing [26] is a distributed, adaptive routing algorithm where each node maintains a view of the whole network topology with a cost for each link. To update their view of the current network state, nodes regularly broadcast the link costs of their outgoing links to all other nodes using flooding. Each node uses its view to calculate the shortest paths to all destinations with Dijkstra's algorithm. Each node needs $O(N^2)$ storage space, where $N$ is the number of nodes in the network.

Open Shortest Path First (OSPF) is the link-state routing protocol used in the Internet [11]. Instabilities are avoided by disseminating the link cost information quickly, and by representing the link costs by a slowly changing measure of average link utilization [26; 27]. Rapid link cost dissemination can be achieved if routing packets have higher priority than data packets. Routing loops are still possible, but since they disappear in time proportional to the diameter $D$ of the network, they are short-lived.
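The per-node computation in link-state routing can be sketched as below: given its own view of the topology, a node runs Dijkstra's algorithm and derives a next-hop table. The data layout and the name `shortest_paths` are assumptions for illustration; this is not the OSPF implementation. Node identifiers are assumed to be mutually comparable (for example, strings).

```python
import heapq


def shortest_paths(topology, source):
    """Dijkstra's algorithm over a node's view of the network.

    topology maps node -> {neighbour: link cost}, with non-negative costs.
    Returns (dist, next_hop): least cost to each reachable node, and the
    neighbour of source to which packets for that node should be forwarded.
    """
    dist = {source: 0.0}
    next_hop = {}
    heap = [(0.0, source, None)]  # (distance, node, first hop on the path)
    done = set()
    while heap:
        d, node, first = heapq.heappop(heap)
        if node in done:
            continue              # stale entry: a shorter path was already settled
        done.add(node)
        if first is not None:
            next_hop[node] = first
        for neighbour, cost in topology[node].items():
            nd = d + cost
            if neighbour not in done and nd < dist.get(neighbour, float("inf")):
                dist[neighbour] = nd
                hop = neighbour if node == source else first
                heapq.heappush(heap, (nd, neighbour, hop))
    return dist, next_hop
```

Only `next_hop` is needed for forwarding; `dist` is returned to make the result easy to inspect.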

2.2.6 Distance-Vector Routing

Distance-vector routing is another distributed, adaptive routing approach based on the Bellman-Ford algorithm [10; 26]. Each node maintains a set of distances to all destinations via each of its neighbours. Thus, the storage needed at each node is $O(N \times e)$, where $e$ is the average number of neighbours of each node in the network. Each node routes an incoming packet to the neighbour with the minimum distance to the destination. Nodes update their distance tables by exchanging distance-vectors with their neighbours. The distance-vector a node transmits consists of the current shortest distance from that node to each destination. Upon receiving a distance-vector, a node computes a new distance table by selecting the minimum between the current and received shortest distances. If the distance table changes, the node will again broadcast its newly computed distance-vector to all neighbours. This asynchronous update mechanism converges to the shortest distances for all connected pairs of nodes [7].
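The exchange of distance-vectors can be illustrated with a small simulation. This is a simplified sketch under stated assumptions: link costs are symmetric and static, each node keeps only its best distance per destination rather than one entry per neighbour, and the name `distance_vector` is hypothetical.

```python
INF = float("inf")


def distance_vector(links):
    """Simulate distance-vector exchange until no table changes.

    links maps node -> {neighbour: link cost}; costs are assumed symmetric.
    Returns dist[node][dest], the shortest distance each node has learned.
    """
    nodes = list(links)
    # Initially each node knows only the zero distance to itself.
    dist = {u: {v: (0.0 if u == v else INF) for v in nodes} for u in nodes}
    changed = set(nodes)  # nodes whose vector changed and must be re-broadcast
    while changed:
        sender = changed.pop()
        vector = dict(dist[sender])  # snapshot sent to every neighbour
        for receiver, cost in links[sender].items():
            for dest, d in vector.items():
                # Keep the minimum of the current and the received distance.
                if cost + d < dist[receiver][dest]:
                    dist[receiver][dest] = cost + d
                    changed.add(receiver)
    return dist
```

Because the topology in this sketch never changes, the looping problems described for real networks do not arise here.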


The original ARPANET used the distributed Bellman-Ford algorithm; however, it was replaced in 1979 by a brute-force link-state algorithm because of several drawbacks [27; 7]. It was found to react slowly to failures and link cost changes. The problem is that the distances exchanged between nodes may contain paths with loops. The looping of packets wastes bandwidth and is called the bouncing effect. If the network is disconnected, the algorithm does not even terminate; this is also referred to as the counting-to-infinity problem. Mechanisms to overcome these problems have been proposed which use various node coordination techniques, diffusing computations and maintaining only loop-free paths [7; 11; 1; 26]. These techniques all eliminate long-lived loops, and some also eliminate short-lived loops. However, these techniques all have increased communication overhead to differing degrees.

2.3 Mobile Agents

As the network and its traffic form a highly dynamical system, it has been argued that mobile software agents are a good approach for adaptive routing in such a complex, inherently distributed environment [16; 6]. The use of multiple cooperating agents may facilitate a high level of availability, adaptability and fault-tolerance in modern communication networks. Mobile agents may also prove useful in design, abstracting the interactions between entities in a complex system.

2.3.1 Active Networks

The new approach of active networks enables nodes to execute custom code embedded in packets. This allows packets to route themselves and perform computations at network nodes on the route [31; 16]. In addition to routing, this approach also allows flexible incorporation of new services into a network without the need to redesign the network infrastructure [31].

The chief problems facing active networks are ensuring the security and scalability of the networks. Before executing mobile code, the node must trust the code. One way of doing this is with Proof-Carrying Code (PCC) [22]. The mobile code includes a formal proof of its properties, which the processing node can verify. The question is whether the increased flexibility justifies the extra overhead of per-packet execution, and how well this paradigm scales to very large networks.

2.3.2 Social Insect Metaphors

Ant-colony optimization is a method of solving combinatorial optimization problems inspired by the foraging behaviour of ants [6]. In nature, ants are able to find the shortest distance to a food source by laying trails of pheromones. A large collection of ants cooperates on a task by this indirect form of communication through the environment, called stigmergy.

Adaptive distributed route discovery is performed by artificial software ants that explore the network [25; 6]. Throughout the network, ants are launched to randomly selected destination nodes. These ants share the queues at nodes with data packets, and record the experienced delay, which is used for updating the routing tables. Each ant can be thought of as performing a single Monte Carlo experiment on the actual network, and the result is the experienced delay. The system as a whole performs parallel Monte Carlo experiments with exploration biased towards more useful regions of the state space [6].

The resulting routing is very robust as it does not depend on individual ants, but rather on the collective behaviour of the entire ant colony.

2.4 Summary

The aim of packet-switched networks is to make more efficient use of network resources by forwarding packets between nodes in a hop-by-hop fashion. The routing decision at each node consists of deciding which neighbour to send a packet to. We discussed the simple routing strategies of flooding, random routing and fixed routing.

Adaptive routing increases the efficiency of a network by redirecting traffic away from congested areas or dynamically changing routes in networks characterized by a constantly changing topology. Adaptive routing strategies have to avoid oscillations in the network which arise if they adapt too quickly to congestion. We discussed the two distributed adaptive routing approaches, link-state routing and distance-vector routing.

Mobile software agents may prove helpful in managing the complexity of distributed, dynamic networks. We discussed the potential of active networks, where packets route themselves by executing code on a router. The emergent behaviour exhibited by ant colonies also offers valuable insight into the optimization of a complex dynamical system. Promising results have already been obtained by routing based on a collection of simple ant-like software agents.


Reinforcement Learning

A broad range of learning problems can be cast into the reinforcement learning framework [13; 20]. Broadly stated, reinforcement learning is the problem of learning to achieve a goal through interaction in a dynamic environment. The learning entity which is responsible for taking actions is called an agent. The agent continually interacts with the environment by taking actions, and receiving rewards and state information, as shown in Figure 3.1. The goal of the agent is to experiment with different action sequences in order to maximize the reward received over time. An important aspect of reinforcement learning algorithms is that they are able to learn from delayed rewards. In some problems, an agent has to execute a specific sequence of actions before it receives a reward. To learn such a sequence, an agent has to overcome the problem of temporal credit assignment, i.e. an agent has to decide which states in the action sequence were responsible for the received reward. Reinforcement learning algorithms therefore are concerned with finding the optimal sequence of actions through trial-and-error interactions in an environment that maximizes the received reward over time.

[Figure 3.1: The agent-environment interaction. The agent sends an action to the environment; the environment returns a state and a reward to the agent.]

Reinforcement learning algorithms differ from supervised learning algorithms in that they are not trained on input/output pairs specifying which action is the best at each state. Instead, they are guided to the goal by the rewards received. In other words, the reward received after each action fully specifies the problem to be solved. Another difference from supervised learning is that a task often has no separate training and testing phases. Instead, some tasks require continual learning throughout an agent's life.

3.1 Value Functions

We can formulate the reinforcement learning task an agent faces as a Markov decision process (MDP) [13]. A finite Markov decision process is characterized by:

• a finite set of states $S$,
• a finite set of actions $A$,
• a reward function $R : S \times A \to \mathbb{R}$, and
• a state transition function $T : S \times A \times S \to \mathbb{R}$, where $T(s, a, s')$ is the probability of advancing from state $s$ to $s'$ when taking action $a$.

The model is called Markov if the transition probabilities $T$ are independent of previous states and actions. Thus, the next state is specified probabilistically by the transition function $T$ and the current state and action alone. Note that the model is a nondeterministic MDP because the successor state of an action is chosen probabilistically.

At each time step $t$, an agent observes the state $s_t$ and takes action $a_t$. The environment responds by returning a reward $r_{t+1} = R(s_t, a_t)$ and the next state $s_{t+1}$ with probability $T(s_t, a_t, s_{t+1})$. This process is repeated continually until the agent achieves its goal, or indefinitely for non-episodic tasks.

The policy $\pi(s, a)$ of an agent is a mapping of each state $s$ and action $a$ to the probability of taking action $a$ in state $s$. The goal of an agent is to improve its policy by maximizing the expected return.

There are different ways of calculating the expected return $R_t$, based on the specific task the agent has to solve. Some tasks can be broken up into a series of episodes or trials, where each episode ends in a terminal state. At the end of each episode, the agent is reset to a starting state. In such episodic tasks, we obtain the expected return by summing the total received rewards over a finite horizon $h$:

$$R_t = \sum_{k=0}^{h} r_{t+k+1}. \qquad (1)$$

Some tasks never end; thus, the above sum may be infinite. This problem may be solved by discounting future rewards:

$$R_t = \sum_{k=0}^{\infty} \gamma^k r_{t+k+1}, \qquad (2)$$

where $\gamma$ is the discount rate and $0 \le \gamma < 1$. In our discussions, we will focus exclusively on this case, which is called the discounted infinite horizon case. Episodic tasks can also be handled by this definition of expected return by introducing an absorbing state which is entered just after the terminal state. The only transition from the absorbing state is to itself, with an associated reward of zero.
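As a small numerical illustration of the discounted return, the sketch below evaluates the sum for a finite list of observed rewards. The function name `discounted_return` is an illustrative choice; truncating the infinite sum to a finite list corresponds to all later rewards being zero, as in the absorbing-state construction.

```python
def discounted_return(rewards, gamma):
    """Compute sum over k of gamma**k * rewards[k].

    rewards[k] stands for r_{t+k+1}; rewards beyond the end of the list are
    taken to be zero, as they would be in an absorbing state.
    """
    total = 0.0
    # Work backwards so that each step applies one factor of gamma.
    for reward in reversed(rewards):
        total = reward + gamma * total
    return total
```

With `gamma = 1` and a finite reward list, the same function yields the undiscounted finite-horizon sum.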

Most reinforcement learning algorithms are based on estimating value functions that estimate the utility of states. The value or utility of a state is the future reward, or return, that an agent can expect. As the future rewards depend on which actions an agent takes, the value function depends on the particular policy the agent follows. The value $V^\pi(s)$ of a state $s$ under policy $\pi$ is the expected return by following policy $\pi$ from state $s$:

$$V^\pi(s) = E_\pi\{R_t \mid s_t = s\}, \qquad (3)$$

where $E_\pi\{\cdot\}$ denotes the expected value when policy $\pi$ is followed. For the discounted infinite horizon case, we have:

$$V^\pi(s) = E_\pi\Big\{\sum_{k=0}^{\infty} \gamma^k r_{t+k+1} \,\Big|\, s_t = s\Big\}. \qquad (4)$$

The optimal value function $V^*$ is attained by maximizing $V^\pi$ for all states:

$$V^*(s) = \max_\pi V^\pi(s) \quad (\forall s). \qquad (5)$$

The optimal policy is defined as the policy corresponding to the optimal value function in the maximization above:

$$\pi^* = \arg\max_\pi V^\pi(s) \quad (\forall s). \qquad (6)$$

In an MDP, we have a model of the environment dynamics in the form of state transition probabilities $T$ and the reward function $R$; thus, we can use the dynamic programming technique called value iteration to find the optimal value function. Once we have the optimal value function, we can obtain the optimal policy $\pi^*$ by choosing, in each state, the action that results in the maximum value function of all the immediate successor states:

$$\pi^*(s) = \arg\max_a V^*(s'), \qquad (7)$$

where $s'$ is the successor of state $s$.
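Value iteration is named above without its update being spelled out. The following is a minimal sketch using the standard Bellman optimality backup, $V(s) \leftarrow \max_a [R(s,a) + \gamma \sum_{s'} T(s,a,s') V(s')]$, which is an assumption about the form intended here. The nested-dictionary layout of `T` and `R`, the tolerance `tol`, and the function name are illustrative. Note that the greedy policy in the sketch combines the immediate reward with the discounted successor value, rather than the successor value alone.

```python
def value_iteration(states, actions, T, R, gamma, tol=1e-9):
    """Compute an approximation of V* and a greedy policy for a known MDP.

    T[s][a] is a dict mapping successor state -> probability.
    R[s][a] is the immediate reward for taking action a in state s.
    """
    V = {s: 0.0 for s in states}

    def backup(s, a):
        # Expected immediate reward plus discounted value of the successor.
        return R[s][a] + gamma * sum(p * V[s2] for s2, p in T[s][a].items())

    while True:
        delta = 0.0
        for s in states:
            best = max(backup(s, a) for a in actions)
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break  # values changed by less than tol in a full sweep

    policy = {s: max(actions, key=lambda a: backup(s, a)) for s in states}
    return V, policy
```

For a two-state example in which staying in state 1 yields reward 1 and every other transition yields 0, with $\gamma = 0.5$, the sketch returns values close to $V(1) = 2$ and $V(0) = 1$.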

In reinforcement learning problems, an agent generally does not have access to the environment dynamics in the form of the transition probabilities $T$; thus, we cannot use dynamic programming techniques. In the next sections, we examine reinforcement learning methods based on dynamic programming¹, where we do not have access to the environment dynamics. Instead, an agent has to learn from the environment through the rewards experienced by taking different actions.

3.2 Temporal-Difference Learning

We now turn our attention to the problem of learning the optimal policy without perfect knowledge of the environment. The only way we can learn about the environment is to explore it by taking actions, observing the reward and using the experience to update the value function. One way of solving the problem is to incrementally estimate the value function $V^\pi$ as we encounter each new state. We denote this approximate value function by $V$. The class of temporal-difference learning [28] algorithms updates the current estimate $V(s_t)$ by using the value function estimates of temporally successive states. Temporal-difference methods are called bootstrapping methods, because they update estimates based on other estimates. By looking one step ahead at the value function of the next state, we can update the current value function estimate as follows:

$$V(s_t) \leftarrow V(s_t) + \alpha\,[r_{t+1} + \gamma V(s_{t+1}) - V(s_t)], \qquad (8)$$

where $\alpha$ is the step size parameter.

¹ Barto and Sutton [30] present a unified view relating dynamic programming, Monte Carlo, and temporal-difference methods for solving reinforcement learning problems.


Initialize $V(s)$ arbitrarily, $\pi$ to the policy to be evaluated
repeat for each episode:
    Initialize $s$
    repeat for each step in episode:
        choose action $a$ in state $s$ from policy $\pi$
        take action $a$; observe reward $r$, and next state $s'$
        $V(s) \leftarrow V(s) + \alpha\,[r + \gamma V(s') - V(s)]$
        $s \leftarrow s'$
    until $s$ is terminal

Figure 3.2: Estimating $V^\pi$ with TD(0).
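The TD(0) procedure of Figure 3.2 might be written in Python roughly as follows. This is a sketch under assumptions: the environment is supplied as a hypothetical function `step(s, a)` returning `(reward, next_state, done)`, unseen states start with value zero, and the value of a terminal state is taken to be zero.

```python
def td0(policy, step, start_state, episodes, alpha, gamma):
    """Estimate the value function of a fixed policy with TD(0).

    policy(s) returns the action to take in state s.
    step(s, a) returns (reward, next_state, done); this interface is an
    assumption made for the sketch.
    """
    V = {}  # value estimates; states not yet seen default to 0.0
    for _ in range(episodes):
        s = start_state
        done = False
        while not done:
            a = policy(s)
            r, s_next, done = step(s, a)
            # Terminal states have no future reward, so their value is zero.
            v_next = 0.0 if done else V.get(s_next, 0.0)
            v = V.get(s, 0.0)
            V[s] = v + alpha * (r + gamma * v_next - v)
            s = s_next
    return V
```

On a deterministic chain of states, the estimates approach the discounted return of each state after a modest number of episodes, even with a constant step size.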

The algorithm, called TD(0) for reasons we will see shortly, is shown in Figure 3.2. Recall that the value function $V(s)$ is the expected return of following policy $\pi$ from state $s$. Thus, the TD(0) algorithm predicts the reward an agent will receive by following policy $\pi$ from state $s$. It has been shown that TD(0) converges with probability 1 to $V^\pi$ for any fixed $\pi$ with an appropriate choice of $\alpha$. If we denote $\alpha_k(a)$ as the step size parameter after the $k$th selection of action $a$, a suitable choice is $\alpha_k(a) = 1/k$. This follows from the well-known result in stochastic approximation theory giving the conditions for convergence with probability 1 as:

$$\sum_{k=1}^{\infty} \alpha_k(a) = \infty \quad \text{and} \quad \sum_{k=1}^{\infty} \alpha_k^2(a) < \infty. \qquad (9)$$

Although this is a useful theoretical result, the decreasing step size above is seldom used in practice [30]. Instead, a constant step size $\alpha_k(a) = \alpha$ is used. This may be so for two reasons: first, the convergence is often slow or needs considerable tuning for a satisfactory convergence rate; second, in non-stationary environments, convergence is undesirable as the reward function $R$ may change over time; thus, we want our learned policy to continually change in response to the latest received rewards.
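As a worked check of the two convergence conditions, assume the harmonic step size $\alpha_k(a) = 1/k$ and compare it with a constant step size $\alpha > 0$. Both series results below are standard facts; the comparison itself is added here for illustration.

```latex
% Harmonic step size: both conditions hold.
\sum_{k=1}^{\infty} \frac{1}{k} = \infty,
\qquad
\sum_{k=1}^{\infty} \frac{1}{k^{2}} = \frac{\pi^{2}}{6} < \infty .

% Constant step size \alpha > 0: the first condition holds, the second fails.
\sum_{k=1}^{\infty} \alpha = \infty,
\qquad
\sum_{k=1}^{\infty} \alpha^{2} = \infty .
```

A constant step size therefore gives up the guarantee of convergence with probability 1, which is the price paid for keeping the estimates responsive to recent rewards.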

3.3 Q-Learning

In the previous section, we saw how TD(0) can be used for predicting the expected reward of a particular policy $\pi$ by estimating the value function. In this section, we turn to the problem of learning the optimal policy itself.

If the agent knows the transition probabilities $T$ of the environment, it can choose the action that leads to the successor state with the combined maximum value function (Equation 7) and immediate reward. The problem is that we generally do not have a model of the environment; thus, we do not know which actions take us to which states. The solution is to define a new value function $Q^\pi(s, a)$, defined as the value of taking action $a$ in state $s$ while following policy $\pi$. This new value function is called the action-value function, and $V^\pi(s)$ the state-value function.

We define $Q^*(s, a)$ as the expected return of taking action $a$ in state $s$, and following the optimal policy from then on. Thus, we can write $Q^*(s, a)$ in terms of $V^*(s)$:

$$Q^*(s, a) = E\{r_{t+1} + \gamma V^*(s_{t+1}) \mid s_t = s, a_t = a\}. \qquad (10)$$

Recall that $V^*(s)$ is the value of taking the best step initially, so we also have:

$$V^*(s) = \max_a Q^*(s, a), \qquad (11)$$

which enables us to write Equation 10 recursively:

$$Q^*(s, a) = E\{r_{t+1} + \gamma \max_{a'} Q^*(s_{t+1}, a') \mid s_t = s, a_t = a\}. \qquad (12)$$

Whereas TD(0) is used to predict the expected return of states while following policy $\pi$, Q-Learning [34] incrementally estimates the optimal action-value function $Q^*(s, a)$. The update rule is given by:

$$Q(s_t, a_t) \leftarrow Q(s_t, a_t) + \alpha\,[r_{t+1} + \gamma \max_a Q(s_{t+1}, a) - Q(s_t, a_t)]. \qquad (13)$$

The Q-Learning algorithm shown in Figure 3.3 converges to the optimal action-value function $Q^*$ with probability 1 under the same conditions for $\alpha$ as in TD(0), provided each state-action pair is tried infinitely often. We will prove the convergence results in a later section.

In the Q-Learning algorithm, we must select actions based on a suitable exploration strategy derived from $Q$. Any strategy that guarantees that each state-action pair will be tried infinitely often will suffice. One of the simplest strategies is $\epsilon$-greedy, where an agent chooses the action with maximal Q-value in that state with probability $1 - \epsilon$ and a random action with a small probability $\epsilon$. When an agent chooses an action with maximum Q-value, it is exploiting previously stored information, whereas random actions result in exploration. We will discuss the tradeoff between exploration and exploitation.

Initialize $Q(s, a)$ arbitrarily
repeat for each episode:
    Initialize $s$
    repeat for each step in episode:
        choose action $a$ in state $s$ using exploration policy derived from $Q$
        take action $a$; observe reward $r$, and next state $s'$
        $Q(s, a) \leftarrow Q(s, a) + \alpha\,[r + \gamma \max_{a'} Q(s', a') - Q(s, a)]$
        $s \leftarrow s'$
    until $s$ is terminal

Figure 3.3: Estimating $Q^*$ with Q-Learning.
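The algorithm of Figure 3.3, combined with $\epsilon$-greedy action selection, might look as follows in Python. This is a sketch under assumptions: the environment is a hypothetical function `step(s, a)` returning `(reward, next_state, done)`, Q-values start at zero, terminal states have value zero, and a seeded random generator is used only to make runs repeatable.

```python
import random


def epsilon_greedy(Q, s, actions, epsilon, rng):
    """With probability epsilon explore at random; otherwise exploit Q."""
    if rng.random() < epsilon:
        return rng.choice(actions)
    return max(actions, key=lambda a: Q.get((s, a), 0.0))


def q_learning(step, actions, start_state, episodes, alpha, gamma, epsilon, seed=0):
    """Estimate Q* with tabular Q-Learning.

    step(s, a) returns (reward, next_state, done); this interface is an
    assumption made for the sketch.
    """
    rng = random.Random(seed)
    Q = {}  # maps (state, action) -> estimate; missing entries count as 0.0
    for _ in range(episodes):
        s = start_state
        done = False
        while not done:
            a = epsilon_greedy(Q, s, actions, epsilon, rng)
            r, s_next, done = step(s, a)
            # The target uses the best action in the next state, regardless
            # of which action the exploration policy will actually take.
            best_next = 0.0 if done else max(Q.get((s_next, b), 0.0) for b in actions)
            old = Q.get((s, a), 0.0)
            Q[(s, a)] = old + alpha * (r + gamma * best_next - old)
            s = s_next
    return Q
```

The maximization over next actions in the target is what separates this update from one that follows the exploration policy's own next action.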

Q-Learning is called an off-policy learning algorithm because it converges to the optimal value function independent of the exploration policy being followed. In other words, the details of the particular exploration strategy do not influence the value function, but only the rate of convergence. There is also an on-policy Q-Learning algorithm called SARSA [30], in which the exploration strategy is taken into account. However, both algorithms converge to the same value function when $\epsilon$, the probability of exploration, decreases towards zero.
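The difference between the two algorithms lies only in the target used by the update. The following sketch contrasts them; the function names and the small example table are assumptions made for illustration. The Q-Learning target uses the maximal Q-value in the next state, whereas the SARSA target uses the Q-value of the action actually selected by the exploration policy.

```python
def q_learning_target(Q, r, s_next, gamma):
    """Off-policy target: bootstrap from the greedy action in s_next."""
    return r + gamma * max(Q[s_next])


def sarsa_target(Q, r, s_next, a_next, gamma):
    """On-policy target: bootstrap from the action a_next actually chosen."""
    return r + gamma * Q[s_next][a_next]


# Hypothetical Q-values for a single next state with two actions.
Q = {"s1": [1.0, 3.0]}
greedy = q_learning_target(Q, r=0.5, s_next="s1", gamma=0.9)
explored = sarsa_target(Q, r=0.5, s_next="s1", a_next=0, gamma=0.9)
followed_greedy = sarsa_target(Q, r=0.5, s_next="s1", a_next=1, gamma=0.9)
```

When the next action happens to be the greedy one, the two targets coincide, which is the situation that becomes typical as the probability of exploration decreases towards zero.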

3.4 TD($\lambda$) Learning

The TD(0) learning method we studied previously is a special case of a class of temporal-difference learning methods called TD($\lambda$), with $\lambda = 0$. In the update rule of TD(0) (Equation 8), we look ahead one step to the value function of the next state. The update moves the estimate closer to the target value of estimated return:

$$R_t^{(1)} = r_{t+1} + \gamma V_t(s_{t+1}). \tag{14}$$

We can generalize the target to the case of $n$ steps, also called the corrected n-step truncated return:

$$R_t^{(n)} = r_{t+1} + \gamma r_{t+2} + \gamma^2 r_{t+3} + \dots + \gamma^{n-1} r_{t+n} + \gamma^n V_t(s_{t+n}). \tag{15}$$
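As a small worked illustration of the corrected n-step truncated return, the sketch below computes it from a list of observed rewards and a bootstrap value. The function name `n_step_return` and the numbers in the example are assumptions chosen for illustration only.

```python
def n_step_return(rewards, bootstrap_value, gamma):
    """Corrected n-step truncated return.

    rewards         -- [r_{t+1}, ..., r_{t+n}], so n = len(rewards)
    bootstrap_value -- V_t(s_{t+n}), the current estimate for the state
                       reached after n steps
    gamma           -- discount factor
    """
    n = len(rewards)
    # Discounted sum of the n observed rewards: gamma^k * r_{t+k+1}.
    discounted = sum(gamma ** k * r for k, r in enumerate(rewards))
    # Correction term: the discounted estimate of the remaining return.
    return discounted + gamma ** n * bootstrap_value


# n = 1 recovers the one-step TD(0) target r_{t+1} + gamma * V_t(s_{t+1}).
one_step = n_step_return([2.0], bootstrap_value=4.0, gamma=0.5)

# n = 3: 1 + 0.5 * 1 + 0.25 * 1 + 0.125 * 10.
three_step = n_step_return([1.0, 1.0, 1.0], bootstrap_value=10.0, gamma=0.5)
```

With a single reward the function reduces to the one-step target, so the one-step case is simply $n = 1$.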

It can be shown [30] that the expected value of the corrected n-step truncated return is an improvement over the current value function as an approximation to the true value
