Niching in derandomized evolution strategies and its applications in quantum control
Shir, O.M.
Citation
Shir, O. M. (2008, June 25). Niching in derandomized evolution strategies and its applications in quantum control. Retrieved from https://hdl.handle.net/1887/12981
Version: Corrected Publisher’s Version
License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden
Downloaded from: https://hdl.handle.net/1887/12981
Note: To cite this publication please use the final published version (if applicable).
A Journey from Organic Diversity to Conceptual Quantum Designs

Ofer Michael Shir
PROEFSCHRIFT

for obtaining
the degree of Doctor at the Universiteit Leiden,
by authority of Rector Magnificus Prof. mr. P.F. van der Heijden,
according to the decision of the College voor Promoties,
to be defended on Wednesday, 25 June 2008
at 16:15

by

Ofer Michael Shir
born in Jerusalem, Israel, in 1978
Prof. dr. Thomas Bäck, Promotor
Prof. dr. Marc Vrakking (AMOLF, Amsterdam), Promotor
Dr. Michael Emmerich, Co-promotor
Prof. dr. Darrell Whitley (Colorado State University), Referent
Prof. dr. Farhad Arbab
Prof. dr. Joost Kok
This work is part of the research programme of the 'Stichting voor Fundamenteel Onderzoek der Materie (FOM)', which is financially supported by the 'Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO)'.
Niching in Derandomized Evolution Strategies and its Applications in Quantum Control.
Ofer Michael Shir.
Thesis, Universiteit Leiden.
ISBN: 978-90-6464-256-2
Contents

Introduction

Part I  Niching in Derandomized Evolution Strategies

1  Evolution Strategies
   1.1  Background
        1.1.1  The Framework: Global Optimization
        1.1.2  Evolutionary Algorithms
   1.2  The Standard Evolution Strategy
        1.2.1  Notation and Terminology
        1.2.2  Motivation: The (1+1) Evolution Strategy
        1.2.3  The Self-Adaptation Principle
        1.2.4  The Canonical (µ/ν +, λ)-ES Algorithm
   1.3  Derandomized Evolution Strategies (DES)
        1.3.1  (1, λ) Derandomized ES Variants
        1.3.2  First Level of Derandomization
   1.4  The Covariance Matrix Adaptation ES
        1.4.1  Preliminary
        1.4.2  The (1, λ) Rank-One CMA
        1.4.3  The (µ_W, λ) Rank-µ CMA
        1.4.4  The (1+λ) CMA
        1.4.5  Constraints Handling
        1.4.6  Discussion

2  Introduction to Niching
   2.1  Motivation: Speciation Theory vs. Conceptual Designs
   2.2  From DNA to Organic Diversity
        2.2.1  Genetic Drift
        2.2.2  Organic Diversity
   2.3  "Ecological Optima": Basins of Attraction
        2.3.1  Classification of Optima: The Practical Perspective
   2.4  Population Diversity within Evolutionary Algorithms
        2.4.1  Diversity Loss in Evolution Strategies
        2.4.2  Point of Reference: Diversity Loss within GAs
        2.4.3  Neutrality in ES Variations: Mutation Drift
   2.5  Classical Niching Techniques
        2.5.1  Fitness Sharing
        2.5.2  Dynamic Fitness Sharing
        2.5.3  Clearing
        2.5.4  Crowding
        2.5.5  Clustering
        2.5.6  The Sequential Niche Technique
        2.5.7  The Islands Model
        2.5.8  Other GA-Based Methods
        2.5.9  Miscellaneous: Mating Schemes
   2.6  Niching in Evolution Strategies
   2.7  Discussion and Mission Statement

3  Niching with Derandomized Evolution Strategies
   3.1  General
   3.2  The Proposed Algorithm
        3.2.1  Niching with (1 +, λ) DES Kernels
   3.3  Niche Radius Calculation
   3.4  Experimental Procedure
        3.4.1  Multi-Modal Test Functions
        3.4.2  Performance Criteria
        3.4.3  New Perspective: MPR vs. Time
        3.4.4  MPR Analysis: Previous Observation
   3.5  Numerical Observation
        3.5.1  Modus Operandi
        3.5.2  Numerical Results
        3.5.3  Discussion

4  Self-Adaptive Niche-Shape Approaches
   4.1  General
        4.1.1  Related Work
        4.1.2  Our Approach
   4.2  New Proposed Approaches
        4.2.1  Self-Adaptive Radius: Step-Size Coupling
        4.2.2  Mahalanobis Metric: Covariance Exploitation
   4.3  Experimental Procedure
        4.3.1  Numerical Observation
        4.3.2  General Behavior

5  Niching-CMA as EMOA
   5.1  Multi-Objective Optimization
        5.1.1  Formulation
        5.1.2  The NSGA-II Algorithm
   5.2  On Diversity in Multi-Objective Optimization
        5.2.1  Related Work
   5.3  Multi-Parent Niching with (µ_W, λ)-CMA
   5.4  Niching-CMA as EMOA
        5.4.1  The Niching Distance Metric
        5.4.2  Selection: Non-dominating Ranking
        5.4.3  Estimation of the Niche Radius
   5.5  Numerical Simulations
        5.5.1  Test Functions: Artificial Landscapes
        5.5.2  Modus Operandi
        5.5.3  Numerical Observation

Part II  Quantum Control

6  Introduction to Quantum Control
   6.1  Optimal Control Theory
        6.1.1  The Quantum Control Framework
        6.1.2  Controllability
        6.1.3  Control Level Sets
        6.1.4  Computational Complexity
   6.2  Optimal Control Experiments
        6.2.1  Femtosecond Laser Pulse Shaping
        6.2.2  Laboratory Realization: Constraints
   6.3  Experimental Procedure
        6.3.1  Numerical Simulations
        6.3.2  Laboratory Experiments

7  Two Photon Processes
   7.1  Introduction
   7.2  Second Harmonic Generation
        7.2.1  Total SHG
        7.2.2  Filtered SHG
   7.3  Numerical Simulations
        7.3.1  Preliminary ES Failure: Stretched Phases
        7.3.2  Numerical Observation
   7.4  Laboratory Experiments
        7.4.1  Performance Evaluations

8  The Rotational Framework
   8.1  Numerical Modeling
        8.1.1  Preliminary: Two Electronic States Systems
        8.1.2  Rotational Levels
   8.2  Population Transfer: Optimization
        8.2.1  Experimental Procedure
        8.2.2  Numerical Observation: J = 0 → J = 4
        8.2.3  Intermediate Discussion
   8.3  Application of Niching
        8.3.1  Preliminary: Distance Measure
        8.3.2  Numerical Observation

9  Dynamic Molecular Alignment
   9.1  Numerical Modeling
        9.1.1  Numerical Simulations: Technical Details
   9.2  Experimental Procedure
        9.2.1  First Numerical Results: Comparison of the Algorithms
        9.2.2  The Complete-Basis-Functions Parameterization
        9.2.3  Further Investigation
   9.3  The Zero Kelvin Case Study
        9.3.1  Conceptual Quantum Structures
        9.3.2  Maximally Attained Yield
        9.3.3  Another Perspective to Optimality: Phasing-Up
   9.4  Evolution of Pulses under Dynamic Intensity
        9.4.1  Evolutionary Algorithms in Dynamic Environments
        9.4.2  Dynamic Intensity Environment: Procedure
   9.5  Scalability: Control Discretization
        9.5.1  Numerical Observation
   9.6  Intermediate Discussion
   9.7  Multi-Objective Optimization
        9.7.1  Choice of Methods
        9.7.2  Numerical Observation
   9.8  Application of Niching
        9.8.1  Numerical Observation

Summary and Outlook

A  Additional Figures

B  Complete-Basis Functions

Bibliography

Samenvatting (Dutch)
Prince Hamlet, Hamlet; William Shakespeare
Introduction

Optimal behavior of natural systems is frequently encountered at all levels of everyday life, and has thus become a major source of inspiration for various fields. The discipline of Natural Computing aims at developing computational techniques that mimic collective phenomena in nature which often exhibit excellent behavior in information processing. Among a long list of natural computing branches, we are particularly interested in the fascinating field of Organic Evolution, and in its computational derivative, the so-called field of Evolutionary Algorithms (EAs). By encoding an optimization problem into an artificial biological environment, EAs mimic certain elements of the Darwinian dynamics and aim at obtaining highly-fit solutions in terms of the problem. A population of trial solutions undergoes artificial variations and survives this simulation according to the criteria posed by the selection mechanism. Analogously, it is suggested that this population would evolve into highly-fit solutions of the optimization problem.
The original goal of this work was to extend specific variants of EAs, called Evolution Strategies (ES), to subpopulations of trial solutions which evolve in parallel toward various solutions of the problem. This idea stems from the evolutionary concept of organic speciation. Essentially, the natural computing way of thinking is required here to delve further into Evolutionary Biology Theory, and to attain creative solutions for the artificial population in light of the desired speciation effect. The so-called niching techniques are the extension of EAs to speciation, forming multiple subpopulations. They have been investigated since the early days of EAs, mainly within the popular variants of Genetic Algorithms (GAs). In addition to the theoretical challenge of designing such techniques, which is well supported by the biologically inspired motivation, there is a real-world incentive for this effort. The discipline of decision making, which benefits directly from the advent of the global optimization field, poses the demand for a multiplicity of different optimal solutions. Ideally, those multiple solutions, as obtained by the optimization routine, would be highly diverse among each other, and would represent different conceptual designs.
While aiming at largely devoting this research to niching in ES, we were also originally interested in applying our proposed algorithms to experimental optimization. More specifically, we were aiming at applications in the emerging field of Quantum Control (QC). The latter offers an enormous variety of high-dimensional continuous optimization problems, at the theoretical as well as the experimental levels. In that respect, it is potentially a heavenly testbed for Evolutionary optimization, and particularly for niching methods. This is due to some remarkable properties of QC landscapes, which typically possess an infinite number of optimal solutions, as proved by QC Theory. We thus find the combination of research on niching and its application to QC landscapes very attractive. After being exposed to this overwhelming treasure of QC landscape richness, we decided to devote an independent part of this dissertation to Quantum Control.

Symbolically, this interdisciplinary study forms a closed natural computing circle, where a biologically oriented investigation of organic evolution and speciation helps to develop methods for solving applications in Physics in general, and in Quantum Control in particular. By our reckoning, this symbolism is strengthened even further upon considering the stochastic nature of Evolutionary Algorithms: this process can thus be considered as throwing dice in order to solve Quantum Mechanics, itself sometimes referred to as the science of dice.

Thus, biologically inspired by organic evolution in general and organic speciation in particular, and armed with the real-world incentive to obtain multiple optimal solutions for better decision making, we hereby begin our journey from diversity in nature to conceptual designs in Quantum Control.
This dissertation therefore consists of two parts: Part I introduces a niching framework to a set of state-of-the-art ES algorithms, namely Derandomized Evolution Strategies (DES), and focuses on testing the proposed algorithms on artificial landscapes. Part II reviews the main aspects of Quantum Control in the general context of global function optimization. It then presents the experimental observation of Derandomized ES as well as the proposed niching algorithms when applied to several QC systems, both at the laboratory and at the numerical simulation levels. As far as we know, this is the first time that Quantum Control search landscapes are comprehensively introduced to the Computer Science community.
Part I begins by presenting the algorithmic kernels of this study, Derandomized Evolution Strategies. This is done in Chapter 1 by providing the reader with the essential terminology of global optimization, reviewing the fundamentals of the ES field, and eventually introducing explicitly, in detail, the derandomized algorithms.
Upon developing a niching framework for Evolution Strategies, some preliminary topics had to be addressed. We properly introduce the real-world incentive for niching, namely the selection of conceptual designs by the decision maker, review the relevant elements of Speciation Theory, discuss the crucial aspect of population diversity within ES, and finally present a short overview of previously introduced niching techniques. Chapter 2 aims at addressing those topics, and therefore constitutes an important preliminary study for the derivation of our niching framework. Due to the highly interdisciplinary nature of the niching research, this chapter presents a particularly high diversity of topics, which are linked by niching.
In Chapter 3 we present our proposed framework of niching within Derandomized ES. We describe it in detail, and thereafter test it on a suite of multimodal artificial landscapes. We analyze the numerical observation, and discuss the algorithmic performance.

Chapter 4 extends the framework of Chapter 3 to self-adaptive niche-shape approaches, for solving the so-called niche radius problem. This is an important topic in the field of niching, as it attempts to treat the challenge of defining a generic basin of attraction without a-priori knowledge of the landscape.

Another extension of our proposed niching framework, this time to the field of Multi-Objective Optimization, is introduced in Chapter 5. As the two fields of niching and multi-criterion optimization, corresponding to multimodal and multiobjective problems, respectively, have many aspects in common, we show the feasibility of utilizing our niching framework in a multi-objective approach. This concludes Part I of the thesis.
The goal that Part II aims to achieve is two-fold: firstly, properly introducing the main optimization aspects of the Quantum Control field, and secondly, presenting our work on the optimization of a specific Quantum Control problem, namely Dynamic Molecular Alignment. We thus begin Chapter 6 with a detailed review of Quantum Control Theory and Experiments. The review outlines fundamental concepts of Quantum Control Theory, and mainly focuses on theorems concerning the critical points of the landscapes, as well as on landscape richness and the multiplicity of optimal solutions. It then presents Quantum Control Experiments, and discusses our experimental setup for Part II.

Chapter 7 describes our investigation of two optimization problems corresponding to Quantum Control systems of Second Harmonic Generation. We conduct experiments on these optimization problems, by means of numerical simulations as well as laboratory experiments, employing specific Derandomized ES variants. It is the only chapter where we report on real-world laboratory experiments, while the following chapters focus on numerical simulations exclusively.
Chapter 8 is devoted to the introduction of the rotational framework, the fundamental framework upon which the Dynamic Molecular Alignment problem is defined, and it is thus an essential preliminary to our work on the alignment problem investigated in Chapter 9. Following a detailed Quantum Mechanical description of the framework, Chapter 8 poses the rotational population transfer optimization problem. It then presents our numerical observation of the Derandomized ES employment on the problem, and finalizes the chapter with applying our proposed niching algorithms.
Chapter 9 reports in detail on our work on Dynamic Molecular Alignment, which constitutes the main application in our research on Quantum Control landscapes. It describes the alignment problem, and then presents the various optimization approaches that we employed in addition to the straightforward application of Derandomized ES. These approaches include a special parameterization method developed for this purpose, an optimality investigation of a simplified variant, optimization subject to a dynamically varying environment, a multi-objective consideration of the problem, and, finally, the application of niching.

We thereafter complete this journey by summarizing our main results and by presenting promising directions for future research.

A Technical Note  Due to technical printing considerations, several plots from various chapters are concentrated in Appendix A. In these particular cases, a plot is referred to in the text as Figure A.x.
Part I

Niching in Derandomized Evolution Strategies

"…successive, slight modifications, my theory would absolutely break down."

Charles Darwin
Chapter 1

Evolution Strategies

1.1 Background

The paradigm of Evolutionary Computation (EC), which is gleaned from the model of organic evolution, studies populations of candidate solutions undergoing variation and selection, and aims at benefiting from the collective phenomena of their generational behavior. The term Evolutionary Algorithms (EAs) essentially refers to the collection of such generic methods, inspired by the theory of natural evolution, that encode complex problems into an artificial biological environment, define its genetic operators, and simulate its propagation in time. Motivated by the basic principles of the Darwinian theory, it is suggested that such a simulation would yield an optimal solution for the given problem.

Evolutionary Algorithms [1] have three main streams, rooted either in the United States or in Germany during the 1960s: Evolutionary Programming (EP), founded by L. Fogel in San Diego [2], Genetic Algorithms (GAs), founded by J. Holland in Ann Arbor [3, 4], and Evolution Strategies (ES), founded by P. Bienert, H.-P. Schwefel and I. Rechenberg, at that time three students at the Technical University of Berlin (see, e.g., [5, 6, 7]).

Evolution Strategies for global parameter optimization, the general framework of this study, are reviewed in this chapter. We start with laying out the basic foundations and definitions.
1.1.1 The Framework: Global Optimization

Let us introduce the elementary terminology of a continuous real-valued parameter optimization problem [8]. The following definition excludes discrete and mixed-integer problems. Given an objective function, also called the target function,

$$f : S \subseteq \mathbb{R}^n \rightarrow \mathbb{R}, \qquad S \neq \emptyset$$

where $S$ is the set of feasible solutions,

$$S = \left\{ \vec{x} \in \mathbb{R}^n \,\middle|\, g_j(\vec{x}) \geq 0 \;\; \forall j \in \{1, \ldots, q\} \right\}, \qquad g_j(\vec{x}) : \mathbb{R}^n \rightarrow \mathbb{R},$$

subject to $q$ inequality constraints $g_j(\vec{x})$, the goal is to find a vector $\vec{x}^* \in S$ which satisfies

$$\forall \vec{x} \in S : \quad f(\vec{x}) \geq f(\vec{x}^*) \equiv f^* \tag{1.1}$$

Then, $f^*$ is defined as the global minimum and $\vec{x}^*$ is the global minimum location. Due to

$$\min \left\{ f(\vec{x}) \right\} = -\max \left\{ -f(\vec{x}) \right\},$$

it is straightforward to convert every minimization problem into a maximization problem. Thus, without loss of generality, we shall assume a minimization problem, unless specified otherwise.

A local minimum $\hat{f} = f(\hat{\vec{x}})$ is defined in the following manner:

$$\exists \epsilon > 0 \;\; \forall \vec{x} \in S : \quad \left\| \vec{x} - \hat{\vec{x}} \right\| < \epsilon \;\Rightarrow\; \hat{f} \leq f(\vec{x})$$
Unimodality vs. Multimodality  A landscape is said to be unimodal if it has only a single minimum, and multimodal otherwise. It is called multiglobal if there are several minima with function values equal to the global minimum.

Global Minimum in Practice: Characterization  While there exists a general criterion for the automatic identification of a local minimum, such as the zero-gradient criterion, in practice there is no equivalent general criterion for the global minimum [8]. The attempt to characterize it is essentially equivalent to posing the multimodal optimization problem and differentiating de facto between global and local minima. We outline here a theoretical attempt to accomplish this characterization, by means of the important concept of level sets [9, 10]. Given a level set,

$$L_f(\alpha) = \left\{ \vec{x} \,\middle|\, \vec{x} \in S, \; f(\vec{x}) \leq \alpha \right\}, \tag{1.2}$$

it is subject to level set mapping, which defines its effective domain:

$$G_f = \left\{ \alpha \,\middle|\, \alpha \in \mathbb{R}, \; L_f(\alpha) \neq \emptyset \right\}. \tag{1.3}$$

Assuming that $G_f$ is compact and closed, $L_f(\alpha)$ is said to be lower semi-continuous (lsc) at the point $\bar{\alpha} \in G_f$ if $\vec{x} \in L_f(\bar{\alpha})$, $\{\alpha_i\} \subset G_f$, $\alpha_i \rightarrow \bar{\alpha}$ imply the existence of $K \in \mathbb{N}$ and a sequence $\{\vec{x}_i\}$ such that $\vec{x}_i \rightarrow \vec{x}$ and $\vec{x}_i \in L_f(\alpha_i)$ for $i \geq K$.

Given this, the following is a sufficient condition for accomplishing this characterization:

Theorem 1.1.1. Let $f$ be a real-valued function on $S \subset \mathbb{R}^n$. If every $\vec{x} \in S$ satisfying $f(\vec{x}) = \bar{\alpha}$ is either a global minimum of $f(\cdot)$ on $S$ or is not a local minimum of $f(\cdot)$, then $L_f(\alpha)$ is lsc at $\bar{\alpha}$.

Törn and Žilinskas concluded that the extension to multimodal domains makes the optimization problem unsolvable in the general case, i.e., there is no efficient solution technique for obtaining the global minimum value (see [8], p. 6).
The Hessian and the Condition Number  Given a real-valued, twice-differentiable $n$-dimensional function $f$, the Hessian matrix of $f(\vec{x})$ is defined as the matrix

$$H\left(f(\vec{x})\right) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix} \tag{1.4}$$

If the second derivatives of $f$ are all continuous, a condition which we shall assume here, the order of differentiation does not matter, and thus the Hessian matrix is symmetric. It is then worthwhile to introduce the condition number of the Hessian, a scalar which characterizes its degree of complexity and typically determines the difficulty of a problem to be solved by optimization methods. Let $\left\{\Lambda_i^H\right\}_{i=1}^n$ denote the eigenvalues of the Hessian $H$, and let $\Lambda_{\min}^H$ and $\Lambda_{\max}^H$ denote its minimal and maximal eigenvalues, respectively. The condition number of the Hessian matrix is defined by:

$$\mathrm{cond}(H) = \frac{\Lambda_{\max}^H}{\Lambda_{\min}^H} \geq 1 \tag{1.5}$$

Ill-conditioned problems are often classified as such due to large condition numbers (e.g., $10^{14}$) of the Hessian on their landscapes.

Separability  Another defining property of problem difficulty is the separability of the objective function (see, e.g., [11]). A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is called separable if it can be optimized by solving $n$ 1-dimensional problems separately:

$$\arg \min_{\vec{x}} f(\vec{x}) = \left( \arg \min_{x_1} f(x_1, \ldots), \; \ldots, \; \arg \min_{x_n} f(\ldots, x_n) \right)$$
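To make the condition number concrete, here is a small sketch (not from the thesis; all names and values are illustrative) for diagonal quadratic landscapes $f(\vec{x}) = \frac{1}{2}\sum_i a_i x_i^2$, whose Hessian is the constant matrix $\mathrm{diag}(a_1, \ldots, a_n)$, so its eigenvalues are simply the diagonal entries:

```python
# Condition number cond(H) = lambda_max / lambda_min of a quadratic
# landscape f(x) = 0.5 * sum(a_i * x_i^2), whose Hessian is diag(a_1..a_n).

def condition_number(eigenvalues):
    """cond(H), assuming all eigenvalues are positive."""
    return max(eigenvalues) / min(eigenvalues)

# Well-conditioned: the sphere model, H = diag(1, 1)
print(condition_number([1.0, 1.0]))   # 1.0

# Ill-conditioned: a narrow elliptic valley, H = diag(1, 1e6)
print(condition_number([1.0, 1e6]))   # 1000000.0
```

The second landscape is far harder for isotropic search methods, since progress along the narrow axis requires steps a thousand times smaller than along the wide one.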
1.1.2 Evolutionary Algorithms

Whereas ES and EP are similar algorithms that share many basic characteristics, GAs differ from them considerably, primarily in the representation of the genetic information. Traditional GAs encode the genome with discrete values (as in nature), whereas ES as well as EP do so with continuous real values. Moreover, ES and EP focused more on the development of mutation operators, while in classical GA research the recombination operator received most attention. Today, GA, ES, and EP are subsumed under the term Evolutionary Algorithms (EAs).

Algorithm 1  An Evolutionary Algorithm
1:  t ← 0
2:  P_t ← Init()                        {P_t ∈ S^µ: set of solutions}
3:  Evaluate(P_t)
4:  while t < t_max do
5:      G_t ← Generate(P_t)             {generate λ variations}
6:      Evaluate(G_t)
7:      P_{t+1} ← Select(G_t ∪ P_t)     {rank and select µ best}
8:      t ← t + 1
9:  end while
Here, we offer an introductory generic description of an EA. The latter considers a population (i.e., set) of individuals (i.e., trial solutions), and models its collective learning process. Each individual in the population is initialized according to an algorithm-dependent procedure, and may carry not only a specific search point in the landscape, but also some environmental information concerning the search. A combination of stochastic as well as deterministic processes, such as mutation, recombination, and selection, dictates the propagation in time towards successively better individuals, corresponding to better regimes of the landscape. The quality of an individual, or alternatively the merit of a trial solution, is determined by a so-called fitness function, which is typically the objective function or a rescaling thereof. Thus, certain individuals are favored over others during the selection phase, which is based upon the fitness evaluation of the population. The selected individuals become the candidate solutions of the next generation, while the others die out.
More explicitly, an EA starts with initializing the generation counter $t$. After generating the initial population with $\mu$ individuals in $S$, a set $G_t$ of $\lambda$ new solutions is generated by means of mutation and possibly recombination. The new candidate solutions are evaluated and ranked in terms of their quality (fitness value). The $\mu$ best solutions in $G_t \cup P_t$ are selected to form the new parent population $P_{t+1}$.
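The generational loop of Algorithm 1 can be sketched as a minimal $(\mu + \lambda)$ EA in Python. This is an illustrative sketch, not taken from the text: Gaussian mutation stands in for Generate, plus-selection for Select, and all parameter values are arbitrary choices.

```python
import random

# Minimal (mu + lambda) EA for minimizing f on S = [-5, 5]^n:
# Init, Generate (Gaussian mutation), Select (keep mu best of G_t ∪ P_t).

def evolutionary_algorithm(f, n, mu=5, lam=20, sigma=0.3, t_max=200, seed=1):
    rng = random.Random(seed)
    # Init(): mu random individuals
    pop = [[rng.uniform(-5, 5) for _ in range(n)] for _ in range(mu)]
    for t in range(t_max):
        # Generate(P_t): lambda variations by Gaussian mutation
        offspring = []
        for _ in range(lam):
            parent = rng.choice(pop)
            offspring.append([x + rng.gauss(0, sigma) for x in parent])
        # Select(G_t ∪ P_t): rank by fitness, keep the mu best (plus-selection)
        pool = pop + offspring
        pool.sort(key=f)
        pop = pool[:mu]
    return pop[0]

sphere = lambda x: sum(xi * xi for xi in x)
best = evolutionary_algorithm(sphere, n=3)
print(sphere(best))  # a small value near 0
```

Note that with a fixed step-size $\sigma$ the search stalls once the distance to the optimum becomes comparable to $\sigma$; adapting the step-size is the subject of the following sections.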
1.2 The Standard Evolution Strategy
Evolution Strategies were originally developed at the Technical University of Berlin as a procedure for automated experimental design optimization, rather than as a global optimizer for continuous landscapes. Following a sequence of successful applications (e.g., shape optimization of a bended pipe, drag minimization of a joint plate, and hardware design of a two-phase flashing nozzle), a diploma thesis [13] and a dissertation [14] laid out the solid foundations of ES as an optimization methodology. There has been extensive work on ES analysis and algorithmic design since then [7, 15, 16].

This section, which is mostly based on [1] and [7], will describe the standard ES in detail. Section 1.2.1 will introduce notation and basic terminology. Section 1.2.2 will present the $(1+1)$ algorithm, which was originally analyzed for theoretical purposes, but continued to play an important role in several aspects of Evolution Strategy design. The self-adaptation principle will be described in Section 1.2.3, while Section 1.2.4 will outline the ES algorithm.
1.2.1 Notation and Terminology

The typical application domain of Evolution Strategies is the minimization of non-linear objective functions of signature $f : S \subseteq \mathbb{R}^n \rightarrow \mathbb{R}$. Given a search problem of dimension $n$, let $\vec{x} := (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$ denote the set of decision parameters or object variables to be optimized: it is defined as an individual associated with a trial solution. In optimization problems, which are of our main interest, it is then straightforward to define the fitness of that individual: it is the objective function value(s) of $\vec{x}$, i.e., $f(\vec{x})$.

Evolution Strategies consider a population of candidate solutions of the given problem. This population undergoes stochastic as well as deterministic variations, with the so-called mutation operator, and possibly with the recombination operator. The mutation operator is typically equivalent to sampling a random variation from a normal distribution. Due to the continuous nature of the parameter space, the biological term mutation rate can be associated here with the actual size of the mutation step in the decision space, also referred to as the mutation strength.

Explicitly, an individual is represented by a tuple of continuous real values, sometimes referred to as a chromosome, which comprises the decision parameters to be optimized, $\vec{x}$, their fitness value, $f(\vec{x})$, as well as a set of endogenous (i.e., evolvable) strategy parameters, $\vec{s} \in \mathbb{R}^m$. The $k$-th individual of the population is thus denoted by:

$$\vec{a}_k = \left( \vec{x}_k, \vec{s}_k, f(\vec{x}_k) \right)$$

The dimension $m$ of the strategy parameter space is subject to the desired adaptation scheme. Strategy parameters are a unique concept for ES, in particular in the context of the mutation operator, and they play a crucial role in the so-called self-adaptation principle (see Section 1.2.3).

Strategy-specific parameters, such as the population characteristic parameters $\mu$, $\lambda$, and the so-called mixing number $\nu$, are called exogenous strategy parameters, as they are kept constant during the simulated evolution. The mixing number determines the number of individuals involved in the application of the recombination operator.
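As a small illustration (not from the thesis; names are illustrative), the tuple representation $\vec{a}_k = (\vec{x}_k, \vec{s}_k, f(\vec{x}_k))$ might be coded as:

```python
from collections import namedtuple

# An ES individual: object variables x, endogenous strategy parameters s
# (here, one step-size per coordinate), and the fitness value f(x).
Individual = namedtuple("Individual", ["x", "s", "fitness"])

sphere = lambda x: sum(xi * xi for xi in x)

x = [1.0, -2.0, 0.5]
a = Individual(x=x, s=[0.3] * len(x), fitness=sphere(x))
print(a.fitness)  # 5.25
```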
1.2.2 Motivation: The (1+1) Evolution Strategy

Rechenberg [6] considered a simple $(1+1)$ Evolution Strategy, with a fixed mutation strength $\sigma$, in order to investigate analytically two basic objective functions, namely the corridor model and the sphere model. From the historical perspective, that study laid out the foundations for the theory of Evolution Strategies.

Rechenberg derived explicitly the expressions for the convergence rate of his $(1+1)$ ES for the two models. By definition, neither self-adaptation nor recombination was employed in this strategy. Given the probability $p(k')$ of the mutation operator to cover a distance $k'$ towards the optimum, the convergence rate $\varphi$ is defined as the expectation of the distance $k'$ covered by the mutation:

$$\varphi = \int_0^\infty p(k') \cdot k' \, dk' \tag{1.6}$$

The expression for the optimal step-size for the two models was first derived. It was observed to depend on the so-called success probability $p_s$,

$$p_s = P \left\{ f\left(\mathrm{Mutate}\{\vec{x}\}\right) \leq f(\vec{x}) \right\}. \tag{1.7}$$

By setting

$$\left. \frac{d\varphi}{d\sigma} \right|_{\sigma^*} = 0, \tag{1.8}$$

the optimal step-sizes for the two models were calculated, yielding also the optimal success probabilities. The obtained values were both close to $1/5$, regardless of the search space dimensionality. This led to the formulation of the well-known $1/5$th-success rule:

    The ratio of successful mutations to all mutations should be $1/5$. If it is greater than $1/5$, increase the standard deviation; if it is smaller, decrease the standard deviation.

For more details see [1]. The implementation of the $1/5$th-success rule within the $(1+1)$-ES is given as Algorithm 2. As practical hints, $p_s$ can be calculated over intervals of $10 \cdot n$ trials, and the adaptation constant should be set within the boundaries $0.817 \leq c \leq 1$.
Algorithm 2  The (1+1) Evolution Strategy
1:  t ← 0
2:  P_t ← Init()                                {P_t ∈ S: set of solutions}
3:  Evaluate(P_t)
4:  while t < t_max do
5:      x(t) := Mutate{x(t−1)} with step-size σ
6:      Evaluate(P′(t) := {x(t)}): {f(x(t))}
7:      Select{P′(t) ∪ P(t)}
8:      t ← t + 1
9:      if t mod n = 0 then
10:         σ(t) = σ(t−n)/c   if p_s > 1/5
            σ(t) = σ(t−n)·c   if p_s < 1/5
            σ(t) = σ(t−n)     if p_s = 1/5
11:     else
12:         σ(t) = σ(t−1)
13:     end if
14: end while
It should be noted that the $1/5$th-success rule has been kept alive, and has continued to play an important role in several aspects, including the construction of the elitist strategy of the Covariance Matrix Adaptation ES algorithm ([17], and also see Section 1.4).
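Algorithm 2 can be sketched as follows on the sphere model. This is an illustrative implementation, not taken from the thesis: the success-probability window of $10 \cdot n$ trials follows the practical hint above, and $c = 0.85$ is one arbitrary choice from the recommended interval.

```python
import random

# (1+1)-ES with the 1/5th-success rule. p_s is estimated over windows of
# 10*n trials; sigma is divided by c (enlarged) when p_s > 1/5 and
# multiplied by c (reduced) when p_s < 1/5.

def one_plus_one_es(f, x, sigma=1.0, c=0.85, t_max=2000, seed=2):
    rng = random.Random(seed)
    n = len(x)
    fx = f(x)
    successes = 0
    for t in range(1, t_max + 1):
        y = [xi + rng.gauss(0, sigma) for xi in x]   # mutate with step-size sigma
        fy = f(y)
        if fy <= fx:                                 # plus-selection: keep the better
            x, fx = y, fy
            successes += 1
        if t % (10 * n) == 0:                        # adapt sigma every 10*n trials
            p_s = successes / (10 * n)
            if p_s > 1 / 5:
                sigma /= c                           # too many successes: larger steps
            elif p_s < 1 / 5:
                sigma *= c                           # too few successes: smaller steps
            successes = 0
    return x, fx

sphere = lambda v: sum(vi * vi for vi in v)
x_best, f_best = one_plus_one_es(sphere, [5.0] * 3)
print(f_best)  # close to 0
```

On the sphere the rule keeps the measured success rate near $1/5$, which is close to the optimal operating point derived by Rechenberg.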
1.2.3 The Self-Adaptation Principle

Section 1.2.2 provided us with the motivation to adapt the endogenous strategy parameters during the course of evolution, e.g., tuning the mutative step-size according to the $1/5$th-success rule. The basic idea of the self-adaptation principle is to consider the strategy parameters as endogenous parameters that undergo an evolutionary process themselves. The idea of coupling endogenous strategy parameters to the object variables can be found in organisms, where self-repair mechanisms exist, such as repair enzymes and mutator genes [18]. This allows an individual to adapt to the changing environment along its trajectory in the landscape, while keeping the potentially harmful effect of mutation within reasonable boundaries. Hence, when mutative self-adaptation is applied, there is no deterministic control in the hands of the user with respect to the mutation strategy.

The crucial claim regarding ES is that self-adaptation of strategy parameters works [19]. It succeeds in doing so by applying the mutation, recombination and selection operators to the strategy parameters themselves, without the use of any exogenous control. The link between strategy and decision parameters is based on their coupled variation and selection; research has identified several boosting conditions for self-adaptation to work, such as recombination on strategy parameters, selection pressure within certain bounds, and others.
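Mutative self-adaptation can be sketched as follows (a minimal, illustrative sketch, not taken from the thesis): each candidate first mutates its own step-size log-normally, then mutates the object variables with that new step-size; selection on fitness alone then implicitly favors well-adapted step-sizes, with no exogenous control.

```python
import math, random

# Minimal (1, lambda)-style ES with mutative step-size self-adaptation on
# the sphere model. The learning rate tau ~ 1/sqrt(n) and all other
# constants are illustrative choices.

def self_adaptive_es(f, x, sigma=1.0, lam=10, t_max=500, seed=4):
    rng = random.Random(seed)
    n = len(x)
    tau = 1.0 / math.sqrt(n)
    for _ in range(t_max):
        offspring = []
        for _ in range(lam):
            s = sigma * math.exp(tau * rng.gauss(0, 1))   # mutate sigma first
            y = [xi + s * rng.gauss(0, 1) for xi in x]    # then mutate x with s
            offspring.append((f(y), y, s))
        fy, x, sigma = min(offspring, key=lambda o: o[0]) # comma-selection: best of lambda
    return x, sigma, fy

sphere = lambda v: sum(vi * vi for vi in v)
x, sigma, fx = self_adaptive_es(sphere, [10.0] * 5)
print(fx, sigma)  # fitness and step-size both shrink as the optimum is approached
```

The step-size rides along with the individual: whenever $\sigma$ is too large or too small for the current region of the landscape, offspring carrying better-suited step-sizes tend to win the selection.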
1.2.4 The Canonical (µ/ν +, λ)-ES Algorithm

We describe here the specific operators of the standard Evolution Strategy, sometimes referred to as the Schwefel approach, and provide the reader with the implementation details.

Mutation

The mutation operator is the dominant variation operator within ES, and thus we choose to elaborate in this section on its characteristics. As a retrospective analysis, we choose to begin with the outline of some general rules for the design of mutation operators, as suggested by Beyer [15]:

1. Reachability. Given the current generation of individuals, any other search point in the landscape should be reachable within a finite number of mutation operations.

2. Unbiasedness. Variation operators in general, and the mutation operator in particular, should not introduce any bias, and should satisfy the maximum entropy principle. In the case of continuous unconstrained landscapes, this suggests the use of the normal distribution.

3. Scalability. The mutation strength should be adaptive with respect to the landscape.
The ES mutation operator considers stochastic continuous variations, which are based on the multivariate normal distribution. Given a normally-distributed random vector, denoted by $\vec{z} = (z_1, z_2, \ldots, z_n)^T$, the mutation operator is then defined as follows:

$$\vec{x}_{NEW} = \vec{x}_{OLD} + \vec{z} \tag{1.9}$$

A multivariate normal distribution is uniquely defined by a covariance matrix, $C \in \mathbb{R}^{n \times n}$, which is a symmetric positive semi-definite matrix, as well as by a mean vector $\vec{m} \in \mathbb{R}^n$. Its probability density function (PDF) is given by:

$$\Phi_{\mathcal{N}}^{pdf}(\vec{z}) = \frac{1}{\sqrt{(2\pi)^n \det C}} \cdot \exp \left( -\frac{1}{2} \left( \vec{z} - \vec{m} \right)^T \cdot C^{-1} \cdot \left( \vec{z} - \vec{m} \right) \right) \tag{1.10}$$

A random vector $\vec{z}$ drawn from a multivariate normal distribution is denoted by $\vec{z} \sim \mathcal{N}(\vec{m}, C)$.

The ES mutation operator always considers a distribution with zero mean, i.e., $\vec{m} = \vec{0}$, and thus the covariance matrix $C$ is the defining component of this operator. It is characterized by its $(n \cdot (n-1))/2$ covariance elements,

$$c_{ij} = \mathrm{cov}(x_i, x_j) = \mathrm{cov}(x_j, x_i) = c_{ji},$$

as well as by its $n$ variances,

$$c_{ii} \equiv \sigma_i^2 = \mathrm{var}(x_i).$$
Overall, we have,
$$C = \begin{pmatrix}
\mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_n) \\
\mathrm{cov}(x_2, x_1) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(x_n, x_1) & \mathrm{cov}(x_n, x_2) & \cdots & \mathrm{var}(x_n)
\end{pmatrix}$$
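In practice, drawing the mutation vector $\vec{z} \sim \mathcal{N}(\vec{0}, C)$ can be done via a Cholesky factorization of $C$. A minimal NumPy sketch (the function name `mutate` is ours, not from the thesis):

```python
import numpy as np

def mutate(x, C, rng):
    """Apply a correlated Gaussian mutation x_new = x + z, z ~ N(0, C).

    C must be symmetric positive definite; its Cholesky factor A
    satisfies A @ A.T == C, so A @ u with u ~ N(0, I) has covariance C.
    """
    A = np.linalg.cholesky(C)
    u = rng.standard_normal(len(x))
    return x + A @ u

rng = np.random.default_rng(0)
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])  # symmetric, positive definite
x = np.zeros(2)
samples = np.array([mutate(x, C, rng) for _ in range(20000)])
# the empirical covariance of the mutation steps approaches C
```

With $A$ the Cholesky factor of $C$, the vector $A\vec{u}$ with $\vec{u} \sim \mathcal{N}(\vec{0}, I)$ has covariance $AA^T = C$, which the empirical covariance of the samples confirms.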
Essentially, the $(n \cdot (n+1))/2$ independent elements of the covariance matrix are the endogenous strategy parameters that evolve along with the individual: $\vec{s} \leftarrow C$, i.e., the strategy parameter vector $\vec{s}$ represents the covariance matrix $C$ in this case.

For the definition of the update rule for the strategy parameters, it is convenient to represent the off-diagonal elements of $C$ by means of the rotational angles between the principal axes of the decision parameters. Let $\alpha_{ij}$ denote these angles,
$$c_{ij} = \mathrm{cov}(x_i, x_j) = \frac{1}{2}\bigl(\mathrm{var}(x_i) - \mathrm{var}(x_j)\bigr) \cdot \tan(2\alpha_{ij}) \qquad (1.11)$$
According to the self-adaptation principle, the covariance matrix elements also evolve every generation. The adaptation of the covariance matrix elements is dictated by non-linear update rules: the diagonal terms, $c_{ii} = \sigma_i^2$, are updated according to the log-normal distribution:
$$\sigma_i^{NEW} = \sigma_i^{OLD} \cdot \exp\bigl(\tau' \cdot \mathcal{N}(0,1) + \tau \cdot \mathcal{N}_i(0,1)\bigr) \qquad (1.12)$$
and the off-diagonal terms are updated through the rotational angles:
$$\alpha_{ij}^{NEW} = \alpha_{ij}^{OLD} + \beta \cdot \mathcal{N}_\ell(0,1) \qquad (1.13)$$
where $\mathcal{N}(0,1)$, $\mathcal{N}_i(0,1)$, and $\mathcal{N}_\ell(0,1)$ $\bigl(\ell = 1, \ldots, (n \cdot (n-1))/2\bigr)$ denote independent random variables, and where $\tau \propto 1/\sqrt{2\sqrt{n}}$, $\tau' \propto 1/\sqrt{2n}$, and $\beta = \frac{5\pi}{180} \approx 0.0873$ are constants. After those two update steps, the covariance matrix is reconstructed from the updated variances and rotational angles.

Figure 1.1: Mutation ellipsoids for $n = 2$, drawn from a general non-singular covariance matrix, with $c_{1,2} \sim \tan(2\alpha_{1,2})$
. Figure courtesy of Thomas Bäck.

Geometrical Interpretation  The equal probability density contour lines of a multivariate normal distribution are ellipsoids, centered about the mean. The principal axes of the ellipsoids are defined by the eigenvectors of the covariance matrix $C$. The lengths of the principal axes are proportionate to the square roots of the corresponding eigenvalues. Figure 1.1 provides an illustration of mutation ellipsoids in the case of $n = 2$.
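The self-adaptive updates of Eqs. 1.12 and 1.13 can be sketched in a few lines of Python (an illustrative sketch; the function and variable names are ours):

```python
import numpy as np

def self_adapt(sigma, alpha, rng):
    """One self-adaptation step for the strategy parameters.

    sigma: vector of n standard deviations (Eq. 1.12, log-normal update).
    alpha: vector of n*(n-1)/2 rotation angles (Eq. 1.13).
    """
    n = len(sigma)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))   # tau  ~ 1/sqrt(2*sqrt(n))
    tau_p = 1.0 / np.sqrt(2.0 * n)          # tau' ~ 1/sqrt(2n)
    beta = 5.0 * np.pi / 180.0              # ~0.0873 rad (5 degrees)
    common = rng.standard_normal()          # one N(0,1) shared by all sigma_i
    sigma_new = sigma * np.exp(tau_p * common + tau * rng.standard_normal(n))
    alpha_new = alpha + beta * rng.standard_normal(len(alpha))
    return sigma_new, alpha_new

rng = np.random.default_rng(1)
n = 5
sigma, alpha = np.ones(n), np.zeros(n * (n - 1) // 2)
sigma, alpha = self_adapt(sigma, alpha, rng)
```

Note the multiplicative log-normal update keeps every $\sigma_i$ strictly positive, while the additive Gaussian step on the angles leaves them unconstrained.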
Correlated Mutations: Strategy Considerations  Given a decision parameter space of dimension $n$, a general mutation-control mechanism considers the covariance matrix $C$, but may apply various different strategies, for computational considerations. There are three common approaches:

1. A covariance matrix proportionate to the identity matrix, i.e., having a single free strategy parameter $\sigma$, often referred to as the global step-size:
$$C_1 = \sigma^2 \cdot I \qquad (1.14)$$

2. A diagonalized covariance matrix, i.e., having a vector of $n$ free strategy parameters, $\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\right)^T$, typically referred to as the individual step-sizes:
$$C_2 = \mathrm{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\right) \qquad (1.15)$$
Figure 1.2: Equidensity probability contours for the three different approaches with respect to a 2D landscape. Left: a single global step-size (circles). Middle: $n$ independent parameters (axis-parallel ellipsoids). Right: $(n \cdot (n+1))/2$ independent parameters (arbitrarily oriented ellipsoids). Figures courtesy of Thomas Bäck [20].

3. A general non-singular covariance matrix, with arbitrary $(n \cdot (n+1))/2$ free strategy parameters:
$$C_3 = (c_{ij}) \qquad (1.16)$$
Thus, the three approaches propose orders of $O(1)$, $O(n)$, or $O(n^2)$ strategy parameters to be learned, respectively, at the cost of different invariance properties. Obviously, a single global step-size approach is very limited in its ability to generate successful moves on a generic landscape. The generalization into individual step-sizes assigns different variances to each coordinate axis, achieving invariance with respect to translation, but still having dependency on the coordinate system (no invariance with respect to rotation). Finally, the most general approach with an arbitrary normal mutation distribution introduces complete invariance with respect to translation and rotation. Figure 1.2 offers an illustration of the three different approaches on a given 2D landscape.

Recombination
Inspired by the organic mechanism of meiotic cell division, where the genetic material is reordered by means of crossover between the chromosomes, the ES recombination operator considers sharing the information from up to $\nu$ parent individuals [21]. When $\nu > 2$, it is usually referred to as multi-recombination. Unlike other Evolutionary Algorithms (e.g., GAs), the ES recombination operator obtains only a single offspring.

Due to the continuous nature of the parameters at hand, two recombination variants are typically considered, given $\nu$ individuals chosen as parents:
• Discrete recombination: each allele is randomly chosen among the $\nu$ parents. Given a parental matrix of the old generation, $A^O = \left(\vec{a}_1^O, \vec{a}_2^O, \ldots, \vec{a}_\nu^O\right)$, the new recombinant $\vec{a}^N$ is constructed by:
$$\vec{a}_i^N := A^O_{m_i, i}, \qquad m_i := \mathrm{rand}\{1, \ldots, \nu\}$$

• Intermediate recombination: the values of the $\nu$ parents are averaged, typically with uniform weights. Essentially, this is equivalent to calculating the centroid of the $\nu$ parent vectors:
$$\vec{a}_i^N := \frac{1}{\nu} \sum_{j=1}^{\nu} \left(\vec{a}_j^O\right)_i \qquad (1.17)$$
The recombination operator in the standard ES could be applied as follows:
1. For each object variable choose $\nu$ parents, and apply discrete recombination on the corresponding variables.

2. For each strategy parameter choose $\nu$ parents, and apply intermediate recombination on the corresponding variables.

It should be noted that there are no generally known best settings of the recombination operator, and the above are typical implementations of it.
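The two recombination variants can be sketched as follows (a minimal NumPy illustration; the function names are ours):

```python
import numpy as np

def discrete_recombination(parents, rng):
    """Each offspring component is copied from a randomly chosen parent.

    parents: (nu, n) matrix, one parent vector per row.
    """
    nu, n = parents.shape
    choice = rng.integers(0, nu, size=n)      # m_i = rand{1, ..., nu}
    return parents[choice, np.arange(n)]

def intermediate_recombination(parents):
    """Offspring = centroid of the nu parent vectors (Eq. 1.17)."""
    return parents.mean(axis=0)

rng = np.random.default_rng(2)
parents = np.array([[0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0]])
child_d = discrete_recombination(parents, rng)   # entries in {0, 1}
child_i = intermediate_recombination(parents)    # (0.5, 0.5, 0.5)
```

Following the typical scheme above, `discrete_recombination` would be applied to the object variables and `intermediate_recombination` to the strategy parameters.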
Within GA research, the building block hypothesis (BBH) (see, e.g., [22]) offered an explanation for the working mechanism of the crossover: the combination of good, yet different, building blocks, i.e., specific portions of the genetic encoding from different parents, is supposed to be the key to propagating high fitness. The debate over this hypothesis has been kept alive. In ES populations, the diversity decreases rapidly; therefore, the BBH is unlikely to fit in a similar way as it does in GA populations.

On the other hand, ES research has given rise to the genetic repair hypothesis [23], stating that the common good properties of the different parents, rather than their differing features, play the key role in the working mechanism of recombination. Moreover, recombination typically decreases the harmful effect of mutation and allows for high step-sizes while achieving the same convergence rates.
Selection

Natural selection is the driving force of organic evolution: clearing out an old generation, and allowing its individuals with the fitness advantage to increase their representation in the genetic pool of future generations.
Algorithm 3  The $(\mu/\nu +, \lambda)$ Evolution Strategy
1: $t \leftarrow 0$
2: $P_t \leftarrow$ Init()   $\{P_t \in S^\mu$: set of solutions$\}$
3: Evaluate($P_t$)
4: while $t < t_{max}$ do
5:   Select $\nu$ mating parents from $P_t$   {Marriage}
6:   $\vec{a}'_k(t) := \mathrm{Recombine}\{P(t)\} \quad \forall k \in \{1, \ldots, \lambda\}$   {Recombination}
7:   $\vec{a}''_k(t) := \mathrm{Mutate}\{\vec{a}'_k(t)\} \quad \forall k \in \{1, \ldots, \lambda\}$   {Mutation}
8:   Evaluate($P'(t) := \{\vec{a}''_1(t), \ldots, \vec{a}''_\lambda(t)\}$)   $\left(\{f(\vec{x}''_1(t)), \ldots, f(\vec{x}''_\lambda(t))\}\right)$
9:   if $(\mu, \lambda)$-ES then
10:    $\mathrm{Select}\{P'(t)\}$
11:  else if $(\mu + \lambda)$-ES then
12:    $\mathrm{Select}\{P'(t) \cup P(t)\}$
13:  end if
14:  $t \leftarrow t + 1$
15: end while
Evolution Strategies adopt this principle, and employ deterministic operators in order to select the $\mu$ individuals with the highest fitness, e.g., minimal objective function values, to be transferred into the next generation. Two selection operators are introduced in the standard ES using an elegant notation due to Schwefel. The notation characterizes the selection mechanism, as well as the number of parents and offspring involved:

• $(\mu + \lambda)$-selection: the next generation of parents will be the best $\mu$ individuals selected out of the union of current parents and $\lambda$ offspring.

• $(\mu, \lambda)$-selection: the next generation of parents will be the best $\mu$ individuals selected out of the current $\lambda$ offspring.

In the case of comma selection, it is rather intuitive that setting $\mu < \lambda$ would be a necessary condition for efficient convergence. In plus selection, however, any $\mu > 0$ can be chosen in principle. In the latter, the so-called elitist selection occurs: the survival of the best individual found so far is guaranteed, leading to a possible scenario of a parent surviving for the entire process.

We are now in a position to introduce a pseudocode of the Standard Evolution Strategy (Algorithm 3).
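Algorithm 3, reduced for brevity to a single global step-size, $\nu = 1$, and comma selection on the sphere function, can be sketched as a toy Python implementation (ours, not the thesis' code):

```python
import numpy as np

def comma_es(f, n, mu=15, lam=100, sigma0=1.0, t_max=200, seed=0):
    """A toy (mu, lam)-ES with log-normal self-adaptation of one step-size."""
    rng = np.random.default_rng(seed)
    tau = 1.0 / np.sqrt(n)
    # each individual is a pair (object variables, step-size)
    pop = [(rng.standard_normal(n), sigma0) for _ in range(mu)]
    for _ in range(t_max):
        offspring = []
        for _ in range(lam):
            x, s = pop[rng.integers(mu)]                       # marriage, nu = 1
            s_new = s * np.exp(tau * rng.standard_normal())    # mutate strategy
            x_new = x + s_new * rng.standard_normal(n)         # mutate object vars
            offspring.append((x_new, s_new))
        # (mu, lam)-selection: best mu offspring only, parents are discarded
        offspring.sort(key=lambda ind: f(ind[0]))
        pop = offspring[:mu]
    return pop[0]

sphere = lambda x: float(np.dot(x, x))
best_x, best_sigma = comma_es(sphere, n=5)
```

The comma scheme discards the parents each generation, which forces $\sigma$ to keep proving itself; switching line "pop = offspring[:mu]" to select from `offspring + pop` would give the elitist plus scheme.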
A Note on Population Sizes  One of the important topics in ES research is the study of optimal population sizes. By definition, the magnitude of $\lambda$ determines the number of function evaluations per generation. Typical population sizes in ES keep a ratio of $1/7$ between the parent and the offspring populations; a popular choice is $\mu = 15$ and $\lambda = 100$ (see, e.g., [1] and [20]).
Based on experimental observations, when individual step-sizes are chosen as strategy parameters (Eq. 1.15), $\lambda$ has to scale linearly with $n$. In the case of arbitrary normal mutations (Eq. 1.16), Rudolph [24] showed that successful adaptation to the landscape (i.e., successfully learning the Hessian matrix) can be achieved with an upper bound of $\mu + \lambda = (n^2 + 3n + 4)/2$, but it is certainly not likely to be achieved with the typical population sizes of $\{\mu = 15, \lambda = 100\}$.

1.3 Derandomized Evolution Strategies (DES)
Mutative step-size control (MSC) tends to work well in the Standard-ES for the adaptation of a single global step-size (Eq. 1.14), but tends to fail when it comes to the individual step-sizes or arbitrary normal mutations (Eq. 1.15 or Eq. 1.16). Schwefel claimed that the adaptation of the strategy parameters in those cases is impossible within small populations [19], and suggested larger populations as a solution to the problem.

Due to the crucial role that the mutation operator plays within Evolution Strategies, its mutative step-size control was investigated intensively. In particular, the disruptive effects to which the MSC is subject were studied at several levels [25, 16], and are reviewed here:
• Indirect selection. By definition, the goal of the mutation operator is to apply a stochastic variation to an object variable vector which will increase its selection probability. The selection of the strategy parameter setting is indirect, i.e., it is not the vector of a successful mutation that is used to adapt the step-size parameters, but rather the parameters of the distribution that led to this mutation vector.

• Realization of parameter variation. Due to the sampling from a random distribution, the realization of the parameter variation does not necessarily reflect the nature of the strategy parameters. Thus, the de facto difference between good and bad settings of strategy parameters is only reflected in the difference between their probabilities to be selected, which can be rather small. Essentially, this means that the selection process of the strategy parameters is strongly disturbed.
• Strategy parameter change rate. The change rate is defined as the difference between strategy parameters of two successive generations. Hansen and Ostermeier [16] argue that the change rate is an important factor, as it gives an indication concerning the adaptation speed, and thus it has a direct influence on the performance of the algorithm. The principal claim is that the change rate depends on the mutation strength to which the strategy parameters are subject. While aiming at attaining the maximal change rate, the latter is subject to an upper bound, due to the finite selection information that can be transferred between generations. Change rates that exceed the upper bound would lead to stochastic behavior. Moreover, the mutation strength that obtains the optimal change rate is typically smaller than the one that obtains good diversity among the mutants, a desired outcome of the mutation operator, often referred to as the selection difference. Thus, the conflict between the objective of an optimal change rate and the objective of an optimal selection difference cannot be resolved at the mutation strength level [25]. A possible solution to this conflict would be to unlink the change rate from the mutation strength.

The so-called derandomized mutative step-size control aims to treat those disruptive effects, regardless of the problem dimensionality, population size, etc.
1.3.1 $(1, \lambda)$ Derandomized ES Variants

The concept of derandomized Evolution Strategies was originally introduced by scholars at the Technical University of Berlin in the beginning of the 1990's. It was followed by the release of a new generation of successful ES variants by Hansen, Ostermeier, and Gawelczyk [26, 27, 28, 29].

The first versions of derandomized ES algorithms introduced a controlled global step-size in order to monitor the individual step-sizes by decreasing the stochastic effects of the probabilistic sampling. The selection disturbance was completely removed in later versions by omitting the adaptation of strategy parameters by means of probabilistic sampling. This was combined with individual information from the last generation (the successful mutations, i.e., of selected offspring), and then adjusted to correlated mutations. Later on, the concept of adaptation by accumulated information was introduced, aiming to use the past information wisely for the purpose of step-size adaptation: instead of using the information from the last generation only, it was successfully generalized to a weighted average of the previous generations.

Note that the different derandomized-ES variants strictly follow a $(1, \lambda)$ strategy, postponing the treatment of recombination or plus-strategies to later stages¹. In this way, the question of how to update the strategy parameters when an offspring does not improve its ancestor is not relevant here.
Moreover, the different variants hold different numbers of strategy parameters to be adapted, and this is a factor in the learning speed of the optimization routine. The different algorithms hold a number of strategy parameters scaling either linearly ($O(n)$ parameters, responsible for individual step-sizes) or quadratically ($O(n^2)$ parameters, responsible for arbitrary normal mutations) with the dimensionality $n$ of the search space.

¹When asked about comma versus plus strategies, Hansen states that with a good enough algorithm at hand, employing the plus strategy is unnecessary, as your algorithm

1.3.2 First Level of Derandomization
The so-called first level of derandomization achieved the following desired effects:

• A degree of freedom with respect to the mutation strength of the strategy parameters.

• Scalability of the ratio between the change rate and the mutation strength.

• Independence of the population size with respect to the adaptation mechanism.

We choose to review the implementation of the first level of derandomization through three particular derandomized ES variants:
DR1

The first derandomized attempt [26] coupled the successful mutations to the selection of decision parameters, and learned the mutation step-size as well as the scaling vector based upon the successful variation. The mutation step is formulated for the $k^{th}$ individual, $k = 1, \ldots, \lambda$:
$$\vec{x}^{(g+1)} = \vec{x}^{(g)} + \xi_k \,\delta^{(g)}\, \vec{\xi}_{scal}^{\,k}\, \vec{\delta}_{scal}^{(g)}\, \vec{z}_k, \qquad \vec{z}_k \in \{-1, +1\}^n \qquad (1.18)$$
Note that $\vec{z}_k$ is a random vector of $\pm 1$, rather than a normally distributed random vector, while $\vec{\xi}_{scal}^{\,k} \sim \vec{\mathcal{N}}(0,1)^+$, i.e., distributed over the positive part of the normal distribution. The evaluation and selection are followed by the adaptation of the strategy parameters (subscripts $sel$ refer to the selected individual):
$$\delta^{(g+1)} = \delta^{(g)} \cdot (\xi_{sel})^{\beta} \qquad (1.19)$$
$$\vec{\delta}_{scal}^{(g+1)} = \vec{\delta}_{scal}^{(g)} \cdot \left(\vec{\xi}_{scal}^{\,sel} + b\right)^{\beta_{scal}} \qquad (1.20)$$
where $P\left(\xi_k = \frac{7}{5}\right) = P\left(\xi_k = \frac{5}{7}\right) = \frac{1}{2}$; $\beta = \sqrt{1/n}$, $\beta_{scal} = 1/n$, $b = 0.35$, and $\xi_k \in \left\{\frac{7}{5}, \frac{5}{7}\right\}$ are constants. Note that the multiplication in Eq. 1.20 is between two vectors and carried out as element-by-element multiplication, yielding a vector of the same dimension $n$.
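The DR1 mutation and adaptation rules (Eqs. 1.18-1.20) can be sketched as follows (an illustrative translation with our own function and variable names):

```python
import numpy as np

def dr1_generate(x, delta, delta_scal, rng):
    """One DR1 mutation (Eq. 1.18): a +/-1 vector, scaled per coordinate."""
    n = len(x)
    xi = 7/5 if rng.random() < 0.5 else 5/7       # two-point step-size factor
    xi_scal = np.abs(rng.standard_normal(n))      # half-normal, N(0,1)^+
    z = rng.choice([-1.0, 1.0], size=n)
    x_new = x + xi * delta * xi_scal * delta_scal * z
    return x_new, xi, xi_scal

def dr1_adapt(delta, delta_scal, xi_sel, xi_scal_sel, n):
    """Derandomized update from the selected offspring (Eqs. 1.19-1.20)."""
    beta, beta_scal, b = np.sqrt(1/n), 1/n, 0.35
    delta_new = delta * xi_sel**beta
    delta_scal_new = delta_scal * (xi_scal_sel + b)**beta_scal
    return delta_new, delta_scal_new

rng = np.random.default_rng(3)
n = 4
x, delta, delta_scal = np.zeros(n), 1.0, np.ones(n)
x1, xi, xi_scal = dr1_generate(x, delta, delta_scal, rng)
delta, delta_scal = dr1_adapt(delta, delta_scal, xi, xi_scal, n)
```

The key point the sketch makes explicit: the step-sizes are updated from the realized factors of the *selected* offspring, not by an independent random variation.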
DR2
The second derandomized ES variant [27] aimed to accumulate information about the correlation or anti-correlation of past mutation vectors in order to adapt the global step-size as well as the individual step-sizes, by introducing a quasi-memory vector. This accumulated information allowed omitting the stochastic element in the adaptation of the strategy parameters, updating them only by means of successful variations, rather than with random steps. The mutation step for the $k^{th}$ individual, $k = 1, \ldots, \lambda$, reads:
$$\vec{x}^{(g+1)} = \vec{x}^{(g)} + \delta^{(g)}\, \vec{\delta}_{scal}^{(g)}\, \vec{z}_k, \qquad \vec{z}_k \sim \vec{\mathcal{N}}(0, 1) \qquad (1.21)$$
Introducing a quasi-memory vector $\vec{Z}$:
$$\vec{Z}^{(g)} = c\,\vec{z}_{sel} + (1 - c)\,\vec{Z}^{(g-1)} \qquad (1.22)$$
The adaptation of the strategy parameters according to the selected offspring:
$$\delta^{(g+1)} = \delta^{(g)} \cdot \left(\exp\left(\frac{\left\|\vec{Z}^{(g)}\right\|}{\sqrt{n}\,\sqrt{\frac{c}{2-c}}} - 1 + \frac{1}{5n}\right)\right)^{\beta} \qquad (1.23)$$
$$\vec{\delta}_{scal}^{(g+1)} = \vec{\delta}_{scal}^{(g)} \cdot \left(\frac{\left|\vec{Z}^{(g)}\right|}{\sqrt{\frac{c}{2-c}}} + b\right)^{\beta_{scal}}, \qquad \left|\vec{Z}^{(g)}\right| = \left(|Z_1^{(g)}|, |Z_2^{(g)}|, \ldots, |Z_n^{(g)}|\right) \qquad (1.24)$$
with $\beta = \sqrt{1/n}$, $\beta_{scal} = 1/n$, $b = 0.35$, and the quasi-memory rate $c = \sqrt{1/n}$ as constants. Note that the multiplication in Eq. 1.24 is between two vectors and carried out as element-by-element multiplication, yielding a vector of the same dimension $n$.
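The accumulation of Eq. 1.22 and the derandomized updates of Eqs. 1.23-1.24 can be sketched as follows (again an illustrative translation with our own variable names):

```python
import numpy as np

def dr2_adapt(delta, delta_scal, Z, z_sel, n):
    """DR2 strategy-parameter update from the selected mutation vector z_sel.

    Z is the quasi-memory (accumulation) vector of Eq. 1.22.
    """
    beta, beta_scal, b, c = np.sqrt(1/n), 1/n, 0.35, np.sqrt(1/n)
    Z = c * z_sel + (1 - c) * Z                      # Eq. 1.22: accumulate
    norm_factor = np.sqrt(c / (2 - c))               # normalization constant
    delta_new = delta * np.exp(
        np.linalg.norm(Z) / (np.sqrt(n) * norm_factor) - 1 + 1 / (5 * n)
    )**beta                                          # Eq. 1.23: global step-size
    delta_scal_new = delta_scal * (
        np.abs(Z) / norm_factor + b
    )**beta_scal                                     # Eq. 1.24: individual sizes
    return delta_new, delta_scal_new, Z

rng = np.random.default_rng(4)
n = 4
delta, delta_scal, Z = 1.0, np.ones(n), np.zeros(n)
z_sel = rng.standard_normal(n)   # the selected offspring's mutation vector
delta, delta_scal, Z = dr2_adapt(delta, delta_scal, Z, z_sel, n)
```

Correlated successive mutation vectors make $\|\vec{Z}\|$ grow, enlarging the step-sizes; anti-correlated ones cancel in the memory and shrink them.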
DR3
This third variant [28], usually referred to as the Generation Set Adaptation (GSA), considered the derandomization of arbitrary normal mutations for the first time, aiming to achieve invariance with respect to the scaling of variables and the rotation of the coordinate system. This naturally came at the cost of a quasi-memory matrix, $B \in \mathbb{R}^{m \times n}$, setting the dimension of the strategy parameter space to $n^2 \leq m \leq 2n^2$. The adaptation of the global step-size is mutative with stochastic variations, just like in the DR1. The mutation step is formulated for the $k^{th}$ individual, $k = 1, \ldots, \lambda$:
$$\vec{x}^{(g+1)} = \vec{x}^{(g)} + \delta^{(g)} \xi_k \vec{y}_k \qquad (1.25)$$
$$\vec{y}_k = c_m B^{(g)} \cdot \vec{z}_k, \qquad \vec{z}_k \sim \vec{\mathcal{N}}(0, 1) \qquad (1.26)$$
The update of the memory matrix is formulated as:
$$B^{(g)} = \left(\vec{b}_1^{(g)}, \ldots, \vec{b}_m^{(g)}\right), \qquad \vec{b}_1^{(g+1)} = (1 - c) \cdot \vec{b}_1^{(g)} + c \cdot \left(c_u \xi_{sel} \vec{y}_{sel}\right), \qquad \vec{b}_{i+1}^{(g+1)} = \vec{b}_i^{(g)} \qquad (1.27)$$
The step-size is updated as follows:
$$\delta^{(g+1)} = \delta^{(g)} (\xi_{sel})^{\beta} \qquad (1.28)$$
where $P\left(\xi_k = \frac{3}{2}\right) = P\left(\xi_k = \frac{2}{3}\right) = \frac{1}{2}$; $\beta = \sqrt{1/n}$, $c_m = (1/\sqrt{m})(1 + 1/m)$, $c = \sqrt{1/n}$, $\xi_k \in \left\{\frac{3}{2}, \frac{2}{3}\right\}$, and $c_u = \sqrt{(2-c)/c}$ are constants.

1.4 The Covariance Matrix Adaptation ES
Following a series of successful derandomized ES variants addressing the first level of derandomization, and a continuous effort at the Technical University of Berlin, the so-called Covariance Matrix Adaptation (CMA) Evolution Strategy was released in 1996 [29], as a completely derandomized Evolution Strategy, the fourth generation of derandomized ES variants.

Second Level of Derandomization  The so-called second level of derandomization targeted the following effects:

• The probability to regenerate the same mutation step is increased.

• The change rate of the strategy parameters is subject to explicit control.

• Strategy parameters are stationary when subject to random selection.

The second level of derandomization was implemented by means of the CMA.

The CMA combines the robust mechanism of ES with powerful statistical learning principles, and thus it is sometimes subject to informal criticism for not being a genuine Evolution Strategy. In short, it aims at satisfying the maximum likelihood principle by applying Principal Components Analysis (PCA) to the successful mutations, and it uses cumulative global step-size adaptation.
1.4.1 Preliminary

One of the goals of the CMA is to achieve a successful statistical learning process of the optimal mutation distribution, which is equivalent to learning a covariance matrix proportional to the inverse of the Hessian matrix (see, e.g., [30]), without calculating the actual derivatives: