with a simple neural network

Bachelor's thesis
July 2012
Student: Martien Scheepens
1 Introduction
2 Theory
  2.1 Perceptron
  2.2 Hebbian learning
  2.3 Deliberation versus Voting
3 Building a neural network
  3.1 Creating an ensemble
  3.2 Communication in the ensemble
  3.3 Deliberation in the ensemble
    3.3.1 Rosenblatt
    3.3.2 Hebbian Learning
4 Simulation and Results
  4.1 Effect of the initial overlap
  4.2 Effect of the ensemble size
  4.3 Effect of size of the input pattern
  4.4 Effect of using a limited set of examples
  4.5 Heterogeneous ensembles
  4.6 Reliability
5 Conclusion and Outlook
  5.1 Comparison with Hartmann & Rafiee-Rad
  5.2 Outlook
1 Introduction
Stephan Hartmann and Soroush Rafiee-Rad analyzed the difference between voting and deliberation in various groups. They showed that deliberation and voting perform differently in finding the truth. In general voting is the better method to find the truth, but for small groups with group members who differ in quality, deliberation has a greater chance to arrive at the truth. In this thesis it is investigated whether it is possible to simulate deliberation with perceptrons as group members (a perceptron is a simple neural network that can be trained to make decisions for linear problems) and whether the results are comparable with the results of Hartmann and Rafiee-Rad. Both questions were answered positively. Furthermore a relation was found which makes it possible to compare heterogeneous groups and to predict the result based on the results of other groups.
2 Theory
2.1 Perceptron
A perceptron is a layered feed-forward network created by Frank Rosenblatt in 1957 [1][2]. A perceptron has an input layer, an output layer and zero or more hidden layers. The weight of each connection can be adjusted. The weights of all connections are stored in weight vectors $w_i$. The output of a perceptron is often determined with a threshold function, for example a sign function. Before using a perceptron to classify input, it must be trained by a teacher. To train the perceptron we need a set of examples (input) $\xi$ and the expected outputs $S^\mu = \pm 1$, which are labels. The stability of an example is

$$E^\mu(t) = w(t) \cdot \xi^\mu(t)\, S^\mu(t)$$

The learning rule for a perceptron network (Rosenblatt algorithm) is

$$w(t+1) = \begin{cases} w(t) + \frac{1}{N}\, \xi^\mu(t)\, S^\mu(t) & \text{if } E^\mu(t) \leq 0 \\ w(t) & \text{else} \end{cases} \tag{2.1}$$

The perceptron is not updated if the sign of the output was correct; otherwise a correction term is added to the weight vector $w$.
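The Rosenblatt rule (2.1) can be written as a short Python sketch (the thesis code itself is Matlab; function and variable names here are illustrative):

```python
import numpy as np

def rosenblatt_step(w, xi, S, N):
    """One Rosenblatt update: train weight vector w on example xi with
    label S = +/-1. The weights change only when the stability E <= 0,
    i.e. when the perceptron's output disagrees with the label."""
    E = np.dot(w, xi) * S           # stability E^mu(t) = w(t) . xi^mu(t) S^mu(t)
    if E <= 0:
        w = w + (1.0 / N) * xi * S  # correction term from equation (2.1)
    return w

# toy usage: a 2-dimensional example the initial weights misclassify
w = rosenblatt_step(np.array([1.0, 0.0]), np.array([-1.0, 1.0]), S=+1, N=2)
# E = -1 <= 0, so the weights are corrected to [0.5, 0.5]
```

If the example were already classified correctly ($E > 0$), the function would return the weight vector unchanged.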
2.2 Hebbian learning

Hebbian learning was introduced by Donald Hebb in 1949 [1][3] and originates from neuroscience. It describes a possible way of learning in the human brain and is applied in neural networks, too.

Hebbian learning is local. The output $S^\mu(t)$ of a simple system is the product of the input vector $\xi$ and the weight vector $w$:

$$S^\mu(t) = \xi^T w \tag{2.2}$$

The update of a weight vector $w_i$ is a multiplication of the output $S^\mu(t)$ of the system with the input vector $\xi_i$; $\eta$ is a factor to control the learning rate:

$$\Delta w_i = \eta\, S^\mu(t)\, \xi_i \tag{2.3}$$
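Equations 2.2 and 2.3 together give a one-line update, sketched here in Python with illustrative names. Note the contrast with the Rosenblatt rule: there is no label and no error condition.

```python
import numpy as np

def hebbian_step(w, xi, eta):
    """One Hebbian update (equations 2.2 and 2.3): the weight change is
    the system output S = xi^T w multiplied by the input vector xi and
    scaled by the learning rate eta. The rule is local and unsupervised."""
    S = np.dot(xi, w)        # output of the simple system, equation (2.2)
    return w + eta * S * xi  # weight change Delta w = eta * S * xi, equation (2.3)

w = hebbian_step(np.array([1.0, 0.0]), np.array([1.0, 1.0]), eta=0.1)
# S = 1.0, so the weights move to [1.1, 0.1]
```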
2.3 Deliberation versus Voting

Stephan Hartmann and Soroush Rafiee-Rad compared the results for voting and deliberation in various groups with different sizes [4].

The results for voting are calculated with Condorcet's jury theorem. Condorcet's jury theorem describes the chance of a group arriving at the truth. The voters are independent and have the same probability $p$ of making the correct decision (homogeneous group). If the probability $p > \frac{1}{2}$, the group will probably make the correct decision. Furthermore, increasing the group size will result in a better decision of the group. If the probability $p > \frac{1}{2}$ and $p_1 > p_0$, the result for $p_1$ is better than for $p_0$ [5].
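The jury theorem can be checked numerically: the probability that a majority of $m$ independent voters is correct is a binomial tail sum. A small Python sketch (names are illustrative):

```python
from math import comb

def condorcet(m, p):
    """Probability that a majority of m independent voters, each correct
    with probability p, reaches the correct decision (m odd, so no ties)."""
    k_min = m // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(k_min, m + 1))

# with p > 1/2 the group probability grows with the group size
probs = [condorcet(m, 0.6) for m in (1, 11, 101)]
```

For $p = 0.6$ the probability climbs from 0.6 for a single voter towards 1 as the group grows, and a group with higher individual competence dominates a group with lower competence at every size.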
Deliberation means that the members of the group try to convince other members; the aim is to reach a consensus. The results for deliberation are calculated with a Bayesian model and two variables on the reliability of each member. It is shown by Hartmann and Rafiee-Rad that deliberation is truth-conducive.

The conclusion of Hartmann and Rafiee-Rad was that voting has a higher chance of arriving at the truth for homogeneous and almost homogeneous groups. For heterogeneous groups deliberation performs better than voting, with a maximum for a group size of 50 members.
3 Building a neural network
I want to show that a simple neural network can be used to simulate the deliberation phase. The steps to modify perceptrons for this task are shown in the next sections.
3.1 Creating an ensemble

A requirement for the ensemble is that the students have a predefined distance to the teacher at the start of the simulation. It is possible to make an ensemble with regular perceptron training and a stop condition in the algorithm, but constructing the students mathematically is quicker and models the fact that perceptron training is not completely reliable. Also, comparison of simulation runs is easier since the results of these runs can be compared based on the start value.
To construct the perceptrons, they must fulfill $|w_i| = 1$ and $w_i \cdot B = R$, with $B$ the teacher and $R$ the preset overlap. $R$ is defined as

$$R \equiv w \cdot B \tag{3.1}$$

The first step is to create a normalized vector $\tilde{w}_i$ with random components, so the initial state is $|\tilde{w}_i| = 1$. Then we have to calculate $a$ and $b$ for equation 3.2 from properties 3.3 and 3.4:

$$w_i = a \cdot \tilde{w}_i + b \cdot B \tag{3.2}$$

$$w_i \cdot w_i = 1 \;\Leftrightarrow\; a^2 + 2ab\,(\tilde{w}_i \cdot B) + b^2 = 1 \tag{3.3}$$

$$w_i \cdot B = R \;\Leftrightarrow\; a\,(\tilde{w}_i \cdot B) + b = R \tag{3.4}$$

The solutions for $a$ and $b$ are

$$a = \sqrt{\frac{1 - R^2}{1 - (\tilde{w}_i \cdot B)^2}} \tag{3.5}$$

$$b = R - a\,(\tilde{w}_i \cdot B) \tag{3.6}$$

After calculating $a$ and $b$ and substituting 3.5 and 3.6 in 3.2, we get normalized vectors (students) which are placed on a multidimensional cone with the teacher vector on the central axis. The mean of the students is in an ideal situation identical to the position of the teacher vector.
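The construction in equations 3.2-3.6 translates directly into code. A minimal Python sketch (the thesis code is Matlab; names are illustrative):

```python
import numpy as np

def make_student(B, R, rng):
    """Construct one normalized student vector w with preset overlap
    w . B = R to the normalized teacher B, following equations (3.2)-(3.6):
    w = a*w_tilde + b*B with a random unit vector w_tilde."""
    w_tilde = rng.standard_normal(B.shape)
    w_tilde /= np.linalg.norm(w_tilde)       # |w_tilde| = 1
    c = np.dot(w_tilde, B)                   # overlap of the random vector with B
    a = np.sqrt((1 - R**2) / (1 - c**2))     # equation (3.5)
    b = R - a * c                            # equation (3.6)
    return a * w_tilde + b * B               # equation (3.2)

rng = np.random.default_rng(0)
B = np.zeros(20); B[0] = 1.0                 # teacher vector with |B| = 1
students = [make_student(B, 0.7, rng) for _ in range(20)]
```

Substituting (3.5) and (3.6) back into (3.3) and (3.4) confirms that every student returned this way has $|w_i| = 1$ and $w_i \cdot B = R$ exactly, up to floating-point error.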
3.2 Communication in the ensemble

We want to simulate a phase of deliberation within the ensemble. There are several possibilities for implementing the discussion:

• Random: a student acts as teacher for a randomly chosen student. The student and the teacher are chosen randomly. The roles are shuffled every iteration.

• Consulting: all students act as teacher for one student. One student is randomly appointed to the role of student. All other students will act as teacher for that student. The sequence of teachers is randomized.

• Broadcasting: one student acts as teacher for the entire ensemble. One student is randomly appointed to the role of teacher. This student will fulfill the role during one round and will communicate with every other student once.
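The three schemes above can be sketched as functions that produce the (student, teacher) pairs of one discussion round. This is a Python sketch under my own reading of the scheme descriptions; the pairing details in the thesis code may differ:

```python
import random

def pairs_random(ids, rng):
    """Random: every member is once per round the student of a randomly
    chosen other member; the roles are reshuffled each round."""
    return [(s, rng.choice([t for t in ids if t != s])) for s in ids]

def pairs_consulting(ids, rng):
    """Consulting: one randomly appointed student learns from all other
    members, in a randomized sequence of teachers."""
    s = rng.choice(ids)
    teachers = [t for t in ids if t != s]
    rng.shuffle(teachers)
    return [(s, t) for t in teachers]

def pairs_broadcasting(ids, rng):
    """Broadcasting: one randomly appointed teacher communicates once
    with every other member during the round."""
    t = rng.choice(ids)
    return [(s, t) for s in ids if s != t]
```

Each returned pair `(s, t)` means member `s` plays the student and member `t` the teacher for one interaction.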
To determine which method is the best, the methods have to be compared. The methods can be compared on how well the ensemble grows together and on what the difference with the teacher is.

The overlap of the group with the teacher (see formula 3.1) will be used to determine the best method. The aim is to find the method with the highest overlap $R$,

$$R \equiv \frac{\bar{w} \cdot B}{|\bar{w}|} \tag{3.7}$$

where $\bar{w}$ is the mean of the student vectors. $S$ is the mean of all possible overlap combinations between all students; it can be used to measure the distances between the members of the group. $T$ is the overlap of the individual students with the teacher:

$$S \equiv \frac{w_i \cdot w_j}{|w_i||w_j|} \tag{3.8}$$

$$T \equiv \frac{w_i \cdot B}{|w_i|} \tag{3.9}$$
The values in the table show that all types of communication have the same starting position. Choosing a student randomly generates the best results for the overlap with the teacher ($R_{final}$). Broadcasting performs worst, but all students in the ensemble merge towards one opinion ($S_{final}$). The performance of broadcasting is probably bad because of the importance of one student: he can tell all the other students his opinion, and all other students will adjust their opinion and move towards this student. Why consulting is not working was not investigated. The approach with random roles obviously performs best.
             random   consulting   broadcasting
T_initial     0.7       0.7          0.7
S_initial     0.5       0.5          0.5
R_initial     0.98      0.98         0.98
T_final       0.89      0.69         0.53
S_final       0.90      0.73         0.99
R_final       0.95      0.81         0.51

Table 3.1: Results for different types of communication
3.3 Deliberation in the ensemble

The deliberation phase in the discussion is modeled by the learning step of the ensemble. As shown earlier, the students are placed on a cone with the teacher at its center. The ideal method should contract the circle of the cone on which the students are placed uniformly. The method with random pairing will be used as communication method.

3.3.1 Rosenblatt

In a normal perceptron architecture the Rosenblatt algorithm is used, equation 2.1. If the condition $E^\mu(t) \leq 0$ is true, the perceptron is updated. The condition is based on the angle of the teacher to the students, and the aim is to minimize this angle.

A possible output of the simulation is shown in figure 3.1. The line in the plot is $R(t)$ (see formula 3.7), the overlap of the group with the teacher. The quality of the output of the ensemble decreases with every iteration.
Figure 3.1: Simulation with Rosenblatt learning step (overlap R versus the number of discussion rounds)
In a situation where the omniscient teacher is absent and replaced by a (randomly chosen) student acting as teacher, unwanted behavior occurs. If the ensemble members have a similar opinion on an example, no communication takes place and there is no learning effort. If the ensemble members differ in opinion, the member in the role of student is forced to recalculate his position and is shifted towards his teacher. The effect of these two options is that the difference between the mean of the ensemble and the truth increases. After moving away from the truth, the ensemble will not be able to find the truth anymore. Furthermore, the ensemble cannot merge to one common output, since the members only recalculate positions if the difference is great enough.
3.3.2 Hebbian Learning

The solution to the problem is to manipulate the condition $E^\mu(t) \leq 0$. It is possible to drop the condition, which means that we use Hebbian learning for the deliberation phase [6][7]. Another possibility is to use the opposite condition $E^\mu(t) > 0$. This means that agreements between ensemble members are enhanced, while disagreements are ignored. After changing the condition, the output of the ensemble is stable. The results for both options are similar after a great number of steps: the vectors of the students merge towards their mean. The results are shown in figure 3.2.

The option with the inverted condition was chosen for the next tests since its results are slightly better.
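The inverted-condition variant can be sketched by combining the Rosenblatt-style correction with the condition $E^\mu(t) > 0$. A Python sketch of one student-teacher interaction, with illustrative names (the thesis code is Matlab):

```python
import numpy as np

def deliberation_step(w_student, w_teacher, xi, N):
    """One deliberation update with the inverted condition E > 0: the
    student moves towards the acting teacher only when they already agree
    on the example xi, so agreements are enhanced and disagreements ignored."""
    S_teacher = np.sign(np.dot(w_teacher, xi))  # the acting teacher's opinion
    E = np.dot(w_student, xi) * S_teacher       # stability of the student w.r.t. that opinion
    if E > 0:                                   # inverted condition: reinforce agreement
        w_student = w_student + (1.0 / N) * xi * S_teacher
    return w_student

# agreement on xi: the student is pulled towards the teacher's opinion
w = deliberation_step(np.array([1.0, 0.0]), np.array([1.0, 0.5]),
                      xi=np.array([1.0, 0.0]), N=2)
```

Dropping the `if` statement entirely would give the pure Hebbian variant; in that case every interaction changes the student's weights.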
Figure 3.2: Comparison of Rosenblatt and Hebb (overlap R versus the number of discussion rounds, for the variants without if statement, with inverted if statement, and the original configuration)
4 Simulation and Results

For all simulations and graphics the following parameters were used unless indicated otherwise: $R = 0.7$, $n = 20$, ensemble size = 20, discussion duration = 1000. The discussion length of 1000 rounds (with each round having [ensemble size] interactions) was arbitrarily chosen; the output of the ensemble is usually stable after about 500-600 rounds. The ensemble size was chosen for a good performance/quality ratio.
4.1 Effect of the initial overlap

The initial overlap $R_{initial}$ is a parameter in the procedure to create an ensemble (equation 3.6). The effect of manipulating $R_{initial}$ is shown in figure 4.1. The prediction of the ensemble ($R_{final}$) gets better with an increased $R_{initial}$.
Figure 4.1: Effect of $R_{initial}$ on the final overlap $R_{final}$
Figure 4.2: Effect of the ensemble size on the final overlap $R_{final}$
4.2 Effect of the ensemble size

The ensemble performs better with more members, see figure 4.2. Since the members of the ensemble are placed on a circle with the teacher as center, a sufficient number of members in the ensemble is important. With a few students the chance is low that these students have equal distances to each other; with a high number it is more likely that the distribution is uniform. A second aspect is that unsupervised learning needs data with redundancy, otherwise the network cannot separate noise from patterns.
4.3 Effect of size of the input pattern

The dimensionality of the input pattern has very little or no effect on the final overlap $R_{final}$ (see table below).

dimension                5      10     20     30     50     100
final overlap R_final    0.955  0.943  0.954  0.935  0.942  0.942
4.4 Effect of using a limited set of examples

All results in the preceding tests are made with an ensemble that has access to an unlimited set of examples (examples are generated randomly when needed), rather than a division into a training set and a test set. To simulate a test set, the patterns are generated before starting the algorithm and a random function picks an example every round. As shown in figure 4.3, more examples give a better result. The growth seems to be of a logarithmic nature.
Figure 4.3: Effect of the size of the set with examples on the final overlap $R_{final}$. On the left with linear scale, on the right with logarithmic scale.
4.5 Heterogeneous ensembles

Figure 4.4: Evolution of student overlap $R_i(t)$ in a small simulation with a heterogeneous ensemble, 4 members with $R_{initial} = 0.3$ and 2 members with $R_{initial} = 0.8$
Heterogeneous ensembles are ensembles where a part of the group has a different initial overlap. The question is whether unexpected behavior occurs or whether the heterogeneous ensemble behaves like a homogeneous ensemble.

A simple simulation with six students (four students have $R_{initial} = 0.3$, two students have $R_{initial} = 0.8$) shows that the outputs of the members merge (figure 4.4). The students with the low $R$ improve a lot, while the $R$ value of the other students decreases slightly (see also figure 4.5).
The simulation was repeated with different values for $R_1$ and $R_2$ and varying ratios $R_1 : R_2$. To compare the overlap $R_{final}$ from the different simulations, the mean of the overlap at the start was calculated:

$$R_{heterogeneous} = \frac{1}{n} \sum_{i=1}^{n} R_i \tag{4.1}$$

The result is plotted in figure 4.6.
The plot shows just one line; it appears that a relationship exists between the mean of all $R_i$ in a group and $R_{final}$. If one knows all $R_i$ in an ensemble, it is possible to make a prediction for the final overlap. A heterogeneous group can be simulated with a simpler homogeneous group of the same size.
4.6 Reliability

A student will learn from any other student who acts as teacher (unless the condition discussed in 3.3.2 is not met). In the real world some people believe or disbelieve other people.

The model of Hartmann and Rafiee-Rad uses two reliabilities for the group members. The first reliability is the chance of making the correct decision. The other value is the chance of a group member to see what the first reliability of another group member is. A low value for the second reliability
Figure 4.6: Final overlap of the students as function of the mean of the initial overlap
means that a member will not learn from other members since he cannot track their opinion.

To simulate reliabilities in my model, each student has a static vector with reliability values for the other students. Using the reliability vectors for a homogeneous group is meaningless; the reliability value behaves as a factor which just slows the process down.

A simulation with two (or more) independent groups with different $R_i$ provides no interesting results. If the members of each group are distributed uniformly on the cone, each group should arrive near the truth. The effect is comparable with reducing the ensemble size. The use of reliability vectors for a heterogeneous group is for the same reason not interesting.
5 Conclusion and Outlook

5.1 Comparison with Hartmann & Rafiee-Rad
The simulation of the deliberation process in a group was possible with regular perceptrons. The result of the deliberation process with perceptrons cannot be compared easily with the results of Hartmann & Rafiee-Rad. The figures below look similar, but by adjusting the perceptron parameters the right figure could look the same as the left figure. This might look more impressive, but the mathematical model of Hartmann & Rafiee-Rad is more sophisticated than simple perceptron interactions. Therefore I can only conclude that in both models there is a relationship between the group size and the probability that the group chooses the correct answer. If the group size increases, the probability increases with the value $p_{max} = 1$ as asymptote.

Figure 5.1: Results of Hartmann & Rafiee-Rad (a) and my work, section 4.2 (b), for the group size (homogeneous group); the figures have similar scale.
5.2 Outlook

It was shown that perceptrons are suitable to simulate deliberation in a group, with the best result if the order of students is chosen randomly. It might be interesting to check whether other learning algorithms can complete the task and whether the results are comparable. I looked briefly into two other types of communication and considered broadcasting and consulting as not suitable for the task. It should be possible to find other types of interaction for the ensemble. The interaction could be based on the distance between the two students.

Another improvement is to create a dynamic reliability vector. A reliability value could be increased if the two students influenced each other earlier and decreased if they repel each other. Also the reliability vector could be based on the number of neighbors. In other words, if many other students are near to the chosen student of that moment, the reliability increases.
References

[1] J. Hertz, A. Krogh & R.G. Palmer (1991). Introduction to the theory of neural computation. Boulder: Westview Press.

[2] F. Rosenblatt (1957). The Perceptron: a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory.

[3] D.O. Hebb (1949). The Organization of Behavior. New York: Wiley and Sons.

[4] S. Hartmann & S. Rafiee-Rad (2010). Voting, Deliberation and Truth. http://stephanhartmann.org/HartmannRafieeRad_VDT.pdf. Retrieved on 01-12-2011.

[5] P.J. Boland (1989). Majority systems and the Condorcet Jury Theorem. The Statistician, 38, 181-189.

[6] D. Bolle & G.M. Shim (1995). Nonlinear Hebbian training of the perceptron. Network: Computation in Neural Systems, 6, 619-633.

[7] M. Biehl & A. Mietzner (1993). Statistical Mechanics of Unsupervised Learning. Europhysics Letters, 24, 421-426.
Matlab code

The source code is hosted at http://martien.home.fmf.nl/scriptie/.

Acknowledgement

I would like to thank Soroush Rafiee-Rad for providing the figures of his research.