with a simple neural network

Bachelor's thesis
July 2012
Student: Martien Scheepens
1 Introduction
2 Theory
  2.1 Perceptron
  2.2 Hebbian learning
  2.3 Deliberation versus Voting
3 Building a neural network
  3.1 Creating an ensemble
  3.2 Communication in the ensemble
  3.3 Deliberation in the ensemble
    3.3.1 Rosenblatt
    3.3.2 Hebbian Learning
4 Simulation and Results
  4.1 Effect of the initial overlap
  4.2 Effect of the ensemble size
  4.3 Effect of size of the input pattern
  4.4 Effect of using a limited set of examples
  4.5 Heterogeneous ensembles
  4.6 Reliability
5 Conclusion and Outlook
  5.1 Comparison with Hartmann & Rafiee-Rad
  5.2 Outlook
1 Introduction
Stephan Hartmann and Soroush Rafiee-Rad analyzed the difference between voting and deliberation in various groups. They showed that deliberation and voting perform differently in finding the truth. In general voting is the better method to find the truth, but for small groups with group members who differ in quality, deliberation has a greater chance to arrive at the truth. In this thesis it is investigated whether it is possible to simulate deliberation with perceptrons as group members (a perceptron is a simple neural network that can be trained to make decisions for linear problems) and whether the results are comparable with the results of Hartmann and Rafiee-Rad. Both questions were answered positively. Furthermore a relation was found which makes it possible to compare heterogeneous groups and to predict the result based on the results of other groups.
2 Theory
2.1 Perceptron
A perceptron is a layered feed-forward network created by Frank Rosenblatt in 1957 [1][2]. A perceptron has an input layer, an output layer and zero or more hidden layers. The weight of each connection can be adjusted. The weights of all connections are stored in weight vectors $w_i$. The output of a perceptron is often determined with a threshold function, for example a sign function. Before using a perceptron to classify input, it must be trained by a teacher. To train the perceptron we need a set of examples (input) $\xi$ and the expected outputs $S^\mu = \pm 1$, which are labels. The stability of an example is

$$E^\mu(t) = w(t) \cdot \xi^\mu(t)\, S^\mu(t)$$

The learning rule for a perceptron network (Rosenblatt algorithm) is

$$w(t+1) = \begin{cases} w(t) + \frac{1}{N}\, \xi^\mu(t)\, S^\mu(t) & \text{if } E^\mu(t) \leq 0 \\ w(t) & \text{else} \end{cases} \tag{2.1}$$

The perceptron is not updated if the sign of the output was correct; otherwise a correction term is added to the weight vector $w$.
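The Rosenblatt rule (2.1) can be written as a short Python sketch (the thesis code itself is Matlab; function and variable names here are illustrative):

```python
import numpy as np

def rosenblatt_step(w, xi, S, N):
    """One Rosenblatt update: train weight vector w on example xi with
    label S = +/-1. The weights change only when the stability E <= 0,
    i.e. when the perceptron's output disagrees with the label."""
    E = np.dot(w, xi) * S           # stability E^mu(t) = w(t) . xi^mu(t) S^mu(t)
    if E <= 0:
        w = w + (1.0 / N) * xi * S  # correction term from equation (2.1)
    return w

# toy usage: a 2-dimensional example the initial weights misclassify
w = rosenblatt_step(np.array([1.0, 0.0]), np.array([-1.0, 1.0]), S=+1, N=2)
# E = -1 <= 0, so the weights are corrected to [0.5, 0.5]
```

If the example were already classified correctly ($E > 0$), the function would return the weight vector unchanged.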
2.2 Hebbian learning

Hebbian learning was introduced by Donald Hebb in 1949 [1][3] and originates from neuroscience. It describes a possible way of learning in the human brain and is applied in neural networks, too.

Hebbian learning is local. The output $S^\mu(t)$ of a simple system is the product of the input vector $\xi$ and the weight vector $w$:

$$S^\mu(t) = \xi^T w \tag{2.2}$$

The update of a weight vector $w_i$ is a multiplication of the output $S^\mu(t)$ of the system with the input vector $\xi_i$; $\eta$ is a factor to control the learning rate:

$$\Delta w_i = \eta\, S^\mu(t)\, \xi_i \tag{2.3}$$
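Equations 2.2 and 2.3 together give a one-line update, sketched here in Python with illustrative names. Note the contrast with the Rosenblatt rule: there is no label and no error condition.

```python
import numpy as np

def hebbian_step(w, xi, eta):
    """One Hebbian update (equations 2.2 and 2.3): the weight change is
    the system output S = xi^T w multiplied by the input vector xi and
    scaled by the learning rate eta. The rule is local and unsupervised."""
    S = np.dot(xi, w)        # output of the simple system, equation (2.2)
    return w + eta * S * xi  # weight change Delta w = eta * S * xi, equation (2.3)

w = hebbian_step(np.array([1.0, 0.0]), np.array([1.0, 1.0]), eta=0.1)
# S = 1.0, so the weights move to [1.1, 0.1]
```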
2.3 Deliberation versus Voting

Stephan Hartmann and Soroush Rafiee-Rad compared the results for voting and deliberation in various groups with different sizes [4].

The results for voting are calculated with Condorcet's jury theorem. Condorcet's jury theorem describes the chance of a group arriving at the truth. The voters are independent and have the same probability $p$ of making the correct decision (homogeneous group). If the probability $p > \frac{1}{2}$, the group will probably make the correct decision. Furthermore, increasing the group size will result in a better decision of the group. If the probability $p > \frac{1}{2}$ and $p_1 > p_0$, the result for $p_1$ is better than for $p_0$ [5].
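The jury theorem can be checked numerically: the probability that a majority of $m$ independent voters is correct is a binomial tail sum. A small Python sketch (names are illustrative):

```python
from math import comb

def condorcet(m, p):
    """Probability that a majority of m independent voters, each correct
    with probability p, reaches the correct decision (m odd, so no ties)."""
    k_min = m // 2 + 1  # smallest number of correct votes forming a majority
    return sum(comb(m, k) * p**k * (1 - p)**(m - k) for k in range(k_min, m + 1))

# with p > 1/2 the group probability grows with the group size
probs = [condorcet(m, 0.6) for m in (1, 11, 101)]
```

For $p = 0.6$ the probability climbs from 0.6 for a single voter towards 1 as the group grows, and a group with higher individual competence dominates a group with lower competence at every size.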
Deliberation means that the members of the group try to convince other members; the aim is to reach a consensus. The results for deliberation are calculated with a Bayesian model and two variables on the reliability of each member. It is shown by Hartmann and Rafiee-Rad that deliberation is truth-conducive.

The conclusion of Hartmann and Rafiee-Rad was that voting has a higher chance of arriving at the truth for homogeneous and almost homogeneous groups. For heterogeneous groups deliberation performs better than voting, with a maximum for a group size of 50 members.
3 Building a neural network
I want to show that a simple neural network can be used to simulate the deliberation phase. The steps to modify perceptrons for this task are shown in the next sections.
3.1 Creating an ensemble

A requirement for the ensemble is that the students have a predefined distance to the teacher at the start of the simulation. It is possible to make an ensemble with regular perceptron training and a stop condition in the algorithm, but constructing the students mathematically is quicker and models the fact that perceptron training is not completely reliable. Also, comparison of simulation runs is easier since the results of these runs can be compared based on the start value.
To construct the perceptrons, they must fulfill $|w_i| = 1$ and $w_i \cdot B = R$, with $B$ the teacher and $R$ the preset overlap. $R$ is defined as

$$R \equiv w \cdot B \tag{3.1}$$

The first step is to create a normalized vector $\tilde{w}_i$ with random components, so the initial state is $|\tilde{w}_i| = 1$. Then we have to calculate $a$ and $b$ for equation 3.2 from properties 3.3 and 3.4:

$$w_i = a \cdot \tilde{w}_i + b \cdot B \tag{3.2}$$

$$w_i \cdot w_i = 1 \;\Leftrightarrow\; a^2 + 2ab\,(\tilde{w}_i \cdot B) + b^2 = 1 \tag{3.3}$$

$$w_i \cdot B = R \;\Leftrightarrow\; a\,(\tilde{w}_i \cdot B) + b = R \tag{3.4}$$

The solutions for $a$ and $b$ are

$$a = \sqrt{\frac{1 - R^2}{1 - (\tilde{w}_i \cdot B)^2}} \tag{3.5}$$

$$b = R - a\,(\tilde{w}_i \cdot B) \tag{3.6}$$

After calculating $a$ and $b$ and substituting 3.5 and 3.6 in 3.2, we get normalized vectors (students) which are placed on a multidimensional cone with the teacher vector on the central axis. The mean of the students is in an ideal situation identical to the position of the teacher vector.
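The construction in equations 3.2-3.6 translates directly into code. A minimal Python sketch (the thesis code is Matlab; names are illustrative):

```python
import numpy as np

def make_student(B, R, rng):
    """Construct one normalized student vector w with preset overlap
    w . B = R to the normalized teacher B, following equations (3.2)-(3.6):
    w = a*w_tilde + b*B with a random unit vector w_tilde."""
    w_tilde = rng.standard_normal(B.shape)
    w_tilde /= np.linalg.norm(w_tilde)       # |w_tilde| = 1
    c = np.dot(w_tilde, B)                   # overlap of the random vector with B
    a = np.sqrt((1 - R**2) / (1 - c**2))     # equation (3.5)
    b = R - a * c                            # equation (3.6)
    return a * w_tilde + b * B               # equation (3.2)

rng = np.random.default_rng(0)
B = np.zeros(20); B[0] = 1.0                 # teacher vector with |B| = 1
students = [make_student(B, 0.7, rng) for _ in range(20)]
```

Substituting (3.5) and (3.6) back into (3.3) and (3.4) confirms that every student returned this way has $|w_i| = 1$ and $w_i \cdot B = R$ exactly, up to floating-point error.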
3.2 Communication in the ensemble

We want to simulate a phase of deliberation within the ensemble. There are several possibilities for implementing the discussion:

• Random: a student acts as teacher for a randomly chosen student. The student and the teacher are chosen randomly. The roles are shuffled every iteration.

• Consulting: all students act as teacher for one student. One student is randomly appointed to the role of student. All other students will act as teacher for that student. The sequence of teachers is randomized.

• Broadcasting: one student acts as teacher for the entire ensemble. One student is randomly appointed to the role of teacher. This student will fulfill the role during one round and will communicate with every other student once.
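The three schemes above can be sketched as functions that produce the (student, teacher) pairs of one discussion round. This is a Python sketch under my own reading of the scheme descriptions; the pairing details in the thesis code may differ:

```python
import random

def pairs_random(ids, rng):
    """Random: every member is once per round the student of a randomly
    chosen other member; the roles are reshuffled each round."""
    return [(s, rng.choice([t for t in ids if t != s])) for s in ids]

def pairs_consulting(ids, rng):
    """Consulting: one randomly appointed student learns from all other
    members, in a randomized sequence of teachers."""
    s = rng.choice(ids)
    teachers = [t for t in ids if t != s]
    rng.shuffle(teachers)
    return [(s, t) for t in teachers]

def pairs_broadcasting(ids, rng):
    """Broadcasting: one randomly appointed teacher communicates once
    with every other member during the round."""
    t = rng.choice(ids)
    return [(s, t) for s in ids if s != t]
```

Each returned pair `(s, t)` means member `s` plays the student and member `t` the teacher for one interaction.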
To determine which method is the best, the methods have to be compared. The methods can be compared on how well the ensemble grows together and on what the difference with the teacher is.

The overlap of the group with the teacher (see formula 3.1) will be used to determine the best method. The aim is to find the method with the highest overlap $R$,

$$R \equiv \frac{\bar{w} \cdot B}{|\bar{w}|} \tag{3.7}$$

where $\bar{w}$ is the mean of the student vectors. $S$ is the mean of all possible overlap combinations between all students; it can be used to measure the distances between the members of the group. $T$ is the overlap of the individual students with the teacher:

$$S \equiv \frac{w_i \cdot w_j}{|w_i||w_j|} \tag{3.8}$$

$$T \equiv \frac{w_i \cdot B}{|w_i|} \tag{3.9}$$
The values in the table show that all types of communication have the same starting position. Choosing a student randomly generates the best results for the overlap with the teacher ($R_{final}$). Broadcasting performs worst, but all students in the ensemble merge towards one opinion ($S_{final}$). The performance of broadcasting is probably bad because of the importance of one student: he can tell all the other students his opinion, and all other students will adjust their opinion and move towards this student. Why consulting is not working was not investigated. The approach with random roles obviously performs best.
             random   consulting   broadcasting
T_initial     0.7       0.7          0.7
S_initial     0.5       0.5          0.5
R_initial     0.98      0.98         0.98
T_final       0.89      0.69         0.53
S_final       0.90      0.73         0.99
R_final       0.95      0.81         0.51

Table 3.1: Results for different types of communication
3.3 Deliberation in the ensemble

The deliberation phase in the discussion is modeled by the learning step of the ensemble. As shown earlier, the students are placed on a cone with the teacher at its center. The ideal method should contract the circle of the cone on which the students are placed uniformly. The method with random pairing will be used as communication method.

3.3.1 Rosenblatt

In a normal perceptron architecture the Rosenblatt algorithm is used, equation 2.1. If the condition $E^\mu(t) \leq 0$ is true, the perceptron is updated. The condition is based on the angle of the teacher to the students, and the aim is to minimize this angle.

A possible output of the simulation is shown in figure 3.1. The line in the plot is $R(t)$ (see formula 3.7), the overlap of the group with the teacher. The quality of the output of the ensemble decreases with every iteration.
Figure 3.1: Simulation with Rosenblatt learning step (overlap R versus the number of discussion rounds)
In a situation where the omniscient teacher is absent and replaced by a (randomly chosen) student acting as teacher, unwanted behavior occurs. If the ensemble members have a similar opinion on an example, no communication takes place and there is no learning effort. If the ensemble members differ in opinion, the member in the role of student is forced to recalculate his position and is shifted towards his teacher. The effect of these two options is that the difference between the mean of the ensemble and the truth increases. After moving away from the truth, the ensemble will not be able to find the truth anymore. Furthermore, the ensemble cannot merge to one common output, since the members only recalculate positions if the difference is great enough.
3.3.2 Hebbian Learning

The solution to the problem is to manipulate the condition $E^\mu(t) \leq 0$. It is possible to drop the condition, which means that we use Hebbian learning for the deliberation phase [6][7]. Another possibility is to use the opposite condition $E^\mu(t) > 0$. This means that agreements between ensemble members are enhanced, while disagreements are ignored. After changing the condition, the output of the ensemble is stable. The results for both options are similar after a great number of steps: the vectors of the students merge towards their mean. The results are shown in figure 3.2.

The option with the inverted condition was chosen for the next tests since its results are slightly better.
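The inverted-condition variant can be sketched by combining the Rosenblatt-style correction with the condition $E^\mu(t) > 0$. A Python sketch of one student-teacher interaction, with illustrative names (the thesis code is Matlab):

```python
import numpy as np

def deliberation_step(w_student, w_teacher, xi, N):
    """One deliberation update with the inverted condition E > 0: the
    student moves towards the acting teacher only when they already agree
    on the example xi, so agreements are enhanced and disagreements ignored."""
    S_teacher = np.sign(np.dot(w_teacher, xi))  # the acting teacher's opinion
    E = np.dot(w_student, xi) * S_teacher       # stability of the student w.r.t. that opinion
    if E > 0:                                   # inverted condition: reinforce agreement
        w_student = w_student + (1.0 / N) * xi * S_teacher
    return w_student

# agreement on xi: the student is pulled towards the teacher's opinion
w = deliberation_step(np.array([1.0, 0.0]), np.array([1.0, 0.5]),
                      xi=np.array([1.0, 0.0]), N=2)
```

Dropping the `if` statement entirely would give the pure Hebbian variant; in that case every interaction changes the student's weights.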
Figure 3.2: Comparison of Rosenblatt and Hebb (overlap R versus the number of discussion rounds, for the variants without if statement, with inverted if statement, and the original configuration)
4 Simulation and Results

For all simulations and graphics the following parameters were used unless indicated otherwise: $R = 0.7$, $n = 20$, ensemble size = 20, discussion duration = 1000. The discussion length of 1000 rounds (with each round having [ensemble size] interactions) was arbitrarily chosen; the output of the ensemble is usually stable after about 500-600 rounds. The ensemble size was chosen for a good performance/quality ratio.
4.1 Effect of the initial overlap

The initial overlap $R_{initial}$ is a parameter in the procedure to create an ensemble (equation 3.6). The effect of manipulating $R_{initial}$ is shown in figure 4.1. The prediction of the ensemble ($R_{final}$) gets better with an increased $R_{initial}$.
Figure 4.1: Effect of $R_{initial}$ on the final overlap $R_{final}$
Figure 4.2: Effect of the ensemble size on the final overlap $R_{final}$
4.2 Effect of the ensemble size

The ensemble performs better with more members, see figure 4.2. Since the members of the ensemble are placed on a circle with the teacher as center, a sufficient number of members in the ensemble is important. With a few students the chance is low that these students have equal distances to each other; with a high number it is more likely that the distribution is uniform. A second aspect is that unsupervised learning needs data with redundancy, otherwise the network cannot separate noise from patterns.
4.3 Effect of size of the input pattern

The dimensionality of the input pattern has very little or no effect on the final overlap $R_{final}$ (see table below).

dimension                5      10     20     30     50     100
final overlap R_final    0.955  0.943  0.954  0.935  0.942  0.942
4.4 Effect of using a limited set of examples

All results in the preceding tests are made with an ensemble that has access to an unlimited set of examples (examples are generated randomly when needed), rather than a division into a training set and a test set. To simulate a test set, the patterns are generated before starting the algorithm and a random function picks an example every round. As shown in figure 4.3, more examples give a better result. The growth seems to be of a logarithmic nature.
Figure 4.3: Effect of the size of the set with examples on the final overlap $R_{final}$. On the left with linear scale, on the right with logarithmic scale.
4.5 Heterogeneous ensembles

Figure 4.4: Evolution of student overlap $R_i(t)$ in a small simulation with a heterogeneous ensemble, 4 members with $R_{initial} = 0.3$ and 2 members with $R_{initial} = 0.8$
Heterogeneous ensembles are ensembles where a part of the group has a different initial overlap. The question is whether unexpected behavior occurs or whether the heterogeneous ensemble behaves like a homogeneous ensemble.

A simple simulation with six students (four students have $R_{initial} = 0.3$, two students have $R_{initial} = 0.8$) shows that the outputs of the members merge (figure 4.4). The students with the low $R$ improve a lot, while the $R$ value of the other students decreases slightly (see also figure 4.5).
The simulation was repeated with different values for $R_1$ and $R_2$ and varying ratios $R_1 : R_2$. To compare the overlap $R_{final}$ from the different simulations, the mean of the overlap at the start was calculated:

$$R_{heterogeneous} = \frac{1}{n} \sum_{i=1}^{n} R_i \tag{4.1}$$

The result is plotted in figure 4.6.
The plot shows just one line; it appears that a relationship exists between the mean of all $R_i$ in a group and $R_{final}$. If one knows all $R_i$ in an ensemble, it is possible to make a prediction for the final overlap. A heterogeneous group can be simulated with a simpler homogeneous group of the same size.
4.6 Reliability

A student will learn from any other student who acts as teacher (unless the condition discussed in 3.3.2 is not met). In the real world some people believe or disbelieve other people.

The model of Hartmann and Rafiee-Rad uses two reliabilities for the group members. The first reliability is the chance of making the correct decision. The other value is the chance of a group member to see what the first reliability of another group member is. A low value for the second reliability
Figure 4.6: Final overlap of the students as function of the mean of the initial overlap
means that a member will not learn from other members since he cannot track their opinion.

To simulate reliabilities in my model, each student has a static vector with reliability values for the other students. Using the reliability vectors for a homogeneous group is meaningless; the reliability value behaves as a factor which just slows the process down.

A simulation with two (or more) independent groups with different $R_i$ provides no interesting results. If the members of each group are distributed uniformly on the cone, each group should arrive near the truth. The effect is comparable with reducing the ensemble size. The use of reliability vectors for a heterogeneous group is for the same reason not interesting.
5 Conclusion and Outlook

5.1 Comparison with Hartmann & Rafiee-Rad
The simulation of the deliberation process in a group was possible with regular perceptrons. The result of the deliberation process with perceptrons cannot be compared easily with the results of Hartmann & Rafiee-Rad. The figures below look similar, but by adjusting the perceptron parameters the right figure could look the same as the left figure. This might look more impressive, but the mathematical model of Hartmann & Rafiee-Rad is more sophisticated than simple perceptron interactions. Therefore I can only conclude that in both models there is a relationship between the group size and the probability that the group chooses the correct answer. If the group size increases, the probability increases with the value $p_{max} = 1$ as asymptote.

Figure 5.1: Results of Hartmann & Rafiee-Rad (a) and my work, section 4.2 (b), for the group size (homogeneous group); the figures have similar scale.
5.2 Outlook

It was shown that perceptrons are suitable to simulate deliberation in a group, with the best result if the order of students is chosen randomly. It might be interesting to check whether other learning algorithms can complete the task and whether the results are comparable. I looked briefly into two other types of communication and considered broadcasting and consulting as not suitable for the task. It should be possible to find other types of interaction for the ensemble. The interaction could be based on the distance between the two students.

Another improvement is to create a dynamic reliability vector. A reliability value could be increased if the two students influenced each other earlier and decreased if they repel each other. Also the reliability vector could be based on the number of neighbors. In other words, if many other students are near to the chosen student of that moment, the reliability increases.
References

[1] J. Hertz, A. Krogh & R.G. Palmer (1991). Introduction to the theory of neural computation. Boulder: Westview Press.

[2] F. Rosenblatt (1957). The Perceptron: a perceiving and recognizing automaton. Report 85-460-1, Cornell Aeronautical Laboratory.

[3] D.O. Hebb (1949). The Organization of Behavior. New York: Wiley and Sons.

[4] S. Hartmann & S. Rafiee-Rad (2010). Voting, Deliberation and Truth. http://stephanhartmann.org/HartmannRafieeRad_VDT.pdf. Retrieved on 01-12-2011.

[5] P.J. Boland (1989). Majority systems and the Condorcet Jury Theorem. The Statistician, 38, 181-189.

[6] D. Bolle & G.M. Shim (1995). Nonlinear Hebbian training of the perceptron. Network: Computation in Neural Systems, 6, 619-633.

[7] M. Biehl & A. Mietzner (1993). Statistical Mechanics of Unsupervised Learning. Europhysics Letters, 24, 421-426.
Matlab code

The source code is hosted at http://martien.home.fmf.nl/scriptie/.

Acknowledgement

I would like to thank Soroush Rafiee-Rad for providing the figures of his research.