Niching in derandomized evolution strategies and its applications in quantum control
Shir, O.M.
Citation
Shir, O. M. (2008, June 25). Niching in derandomized evolution strategies and its applications in quantum control. Retrieved from https://hdl.handle.net/1887/12981
Version: Corrected Publisher’s Version
License: Licence agreement concerning inclusion of doctoral thesis in the Institutional Repository of the University of Leiden
Downloaded from: https://hdl.handle.net/1887/12981
Note: To cite this publication please use the final published version (if applicable).
A Journey from Organic Diversity to Conceptual Quantum Designs

Ofer Michael Shir
PROEFSCHRIFT

for obtaining
the degree of Doctor at the Universiteit Leiden,
by authority of Rector Magnificus Prof. mr. P.F. van der Heijden,
according to the decision of the College voor Promoties,
to be defended on Wednesday, 25 June 2008
at 16:15

by

Ofer Michael Shir
born in Jerusalem, Israel, in 1978
Prof. dr. Thomas Bäck, Promotor
Prof. dr. Marc Vrakking (AMOLF, Amsterdam), Promotor
Dr. Michael Emmerich, Co-promotor
Prof. dr. Darrell Whitley (Colorado State University), Referent
Prof. dr. Farhad Arbab
Prof. dr. Joost Kok
This work is part of the research programme of the 'Stichting voor Fundamenteel Onderzoek der Materie (FOM)', which is financially supported by the 'Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO)'.
Niching in Derandomized Evolution Strategies and its Applications in Quantum Control.
Ofer Michael Shir.
Thesis, Universiteit Leiden.
ISBN: 978-90-6464-256-2
Contents

Introduction

Part I  Niching in Derandomized Evolution Strategies

1  Evolution Strategies
   1.1  Background
        1.1.1  The Framework: Global Optimization
        1.1.2  Evolutionary Algorithms
   1.2  The Standard Evolution Strategy
        1.2.1  Notation and Terminology
        1.2.2  Motivation: The (1+1) Evolution Strategy
        1.2.3  The Self-Adaptation Principle
        1.2.4  The Canonical (µ/ν +, λ)-ES Algorithm
   1.3  Derandomized Evolution Strategies (DES)
        1.3.1  (1, λ) Derandomized ES Variants
        1.3.2  First Level of Derandomization
   1.4  The Covariance Matrix Adaptation ES
        1.4.1  Preliminary
        1.4.2  The (1, λ) Rank-One CMA
        1.4.3  The (µ_W, λ) Rank-µ CMA
        1.4.4  The (1+λ) CMA
        1.4.5  Constraints Handling
        1.4.6  Discussion

2  Introduction to Niching
   2.1  Motivation: Speciation Theory vs. Conceptual Designs
   2.2  From DNA to Organic Diversity
        2.2.1  Genetic Drift
        2.2.2  Organic Diversity
   2.3  "Ecological Optima": Basins of Attraction
        2.3.1  Classification of Optima: The Practical Perspective
   2.4  Population Diversity within Evolutionary Algorithms
        2.4.1  Diversity Loss in Evolution Strategies
        2.4.2  Point of Reference: Diversity Loss within GAs
        2.4.3  Neutrality in ES Variations: Mutation Drift
   2.5  Classical Niching Techniques
        2.5.1  Fitness Sharing
        2.5.2  Dynamic Fitness Sharing
        2.5.3  Clearing
        2.5.4  Crowding
        2.5.5  Clustering
        2.5.6  The Sequential Niche Technique
        2.5.7  The Islands Model
        2.5.8  Other GA-Based Methods
        2.5.9  Miscellaneous: Mating Schemes
   2.6  Niching in Evolution Strategies
   2.7  Discussion and Mission Statement

3  Niching with Derandomized Evolution Strategies
   3.1  General
   3.2  The Proposed Algorithm
        3.2.1  Niching with (1 +, λ) DES Kernels
   3.3  Niche Radius Calculation
   3.4  Experimental Procedure
        3.4.1  Multi-Modal Test Functions
        3.4.2  Performance Criteria
        3.4.3  New Perspective: MPR vs. Time
        3.4.4  MPR Analysis: Previous Observation
   3.5  Numerical Observation
        3.5.1  Modus Operandi
        3.5.2  Numerical Results
        3.5.3  Discussion

4  Self-Adaptive Niche-Shape Approaches
   4.1  General
        4.1.1  Related Work
        4.1.2  Our Approach
   4.2  New Proposed Approaches
        4.2.1  Self-Adaptive Radius: Step-Size Coupling
        4.2.2  Mahalanobis Metric: Covariance Exploitation
   4.3  Experimental Procedure
        4.3.1  Numerical Observation
        4.3.2  General Behavior

5  Niching-CMA as EMOA
   5.1  Multi-Objective Optimization
        5.1.1  Formulation
        5.1.2  The NSGA-II Algorithm
   5.2  On Diversity in Multi-Objective Optimization
        5.2.1  Related Work
   5.3  Multi-Parent Niching with (µ_W, λ)-CMA
   5.4  Niching-CMA as EMOA
        5.4.1  The Niching Distance Metric
        5.4.2  Selection: Non-dominating Ranking
        5.4.3  Estimation of the Niche Radius
   5.5  Numerical Simulations
        5.5.1  Test Functions: Artificial Landscapes
        5.5.2  Modus Operandi
        5.5.3  Numerical Observation

Part II  Quantum Control

6  Introduction to Quantum Control
   6.1  Optimal Control Theory
        6.1.1  The Quantum Control Framework
        6.1.2  Controllability
        6.1.3  Control Level Sets
        6.1.4  Computational Complexity
   6.2  Optimal Control Experiments
        6.2.1  Femtosecond Laser Pulse Shaping
        6.2.2  Laboratory Realization: Constraints
   6.3  Experimental Procedure
        6.3.1  Numerical Simulations
        6.3.2  Laboratory Experiments

7  Two Photon Processes
   7.1  Introduction
   7.2  Second Harmonic Generation
        7.2.1  Total SHG
        7.2.2  Filtered SHG
   7.3  Numerical Simulations
        7.3.1  Preliminary ES Failure: Stretched Phases
        7.3.2  Numerical Observation
   7.4  Laboratory Experiments
        7.4.1  Performance Evaluations

8  The Rotational Framework
   8.1  Numerical Modeling
        8.1.1  Preliminary: Two Electronic States Systems
        8.1.2  Rotational Levels
   8.2  Population Transfer: Optimization
        8.2.1  Experimental Procedure
        8.2.2  Numerical Observation: J = 0 → J = 4
        8.2.3  Intermediate Discussion
   8.3  Application of Niching
        8.3.1  Preliminary: Distance Measure
        8.3.2  Numerical Observation

9  Dynamic Molecular Alignment
   9.1  Numerical Modeling
        9.1.1  Numerical Simulations: Technical Details
   9.2  Experimental Procedure
        9.2.1  First Numerical Results: Comparison of the Algorithms
        9.2.2  The Complete-Basis-Functions Parameterization
        9.2.3  Further Investigation
   9.3  The Zero Kelvin Case Study
        9.3.1  Conceptual Quantum Structures
        9.3.2  Maximally Attained Yield
        9.3.3  Another Perspective to Optimality: Phasing-Up
   9.4  Evolution of Pulses under Dynamic Intensity
        9.4.1  Evolutionary Algorithms in Dynamic Environments
        9.4.2  Dynamic Intensity Environment: Procedure
   9.5  Scalability: Control Discretization
        9.5.1  Numerical Observation
   9.6  Intermediate Discussion
   9.7  Multi-Objective Optimization
        9.7.1  Choice of Methods
        9.7.2  Numerical Observation
   9.8  Application of Niching
        9.8.1  Numerical Observation

Summary and Outlook

A  Additional Figures

B  Complete-Basis Functions

Bibliography

Samenvatting (Dutch)
Prince Hamlet, Hamlet; William Shakespeare
Introduction

Optimal behavior of natural systems is frequently encountered at all levels of everyday life, and has thus become a major source of inspiration for various fields. The discipline of Natural Computing aims at developing computational techniques that mimic collective phenomena in nature which often exhibit excellent behavior in information processing. Among a long list of natural computing branches, we are particularly interested in the fascinating field of Organic Evolution, and in its computational derivative, the so-called field of Evolutionary Algorithms (EAs). By encoding an optimization problem into an artificial biological environment, EAs mimic certain elements of the Darwinian dynamics and aim at obtaining highly-fit solutions in terms of the problem. A population of trial solutions undergoes artificial variations and survives this simulation according to the criteria posed by the selection mechanism. Analogously, it is suggested that this population would evolve into highly-fit solutions of the optimization problem.
The original goal of this work was to extend specific variants of EAs, called Evolution Strategies (ES), to subpopulations of trial solutions which evolve in parallel toward various solutions of the problem. This idea stems from the evolutionary concept of organic speciation. Essentially, the natural computing way of thinking is required here to delve further into Evolutionary Biology Theory, and to attain creative solutions for the artificial population in light of the desired speciation effect. The so-called niching techniques are the extension of EAs to speciation, forming multiple subpopulations. They have been investigated since the early days of EAs, mainly within the popular variants of Genetic Algorithms (GAs). In addition to the theoretical challenge of designing such techniques, which is well supported by the biologically inspired motivation, there is a real-world incentive for this effort. The discipline of decision making, which benefits directly from the advent of the global optimization field, poses the demand for a multiplicity of different optimal solutions. Ideally, those multiple solutions, as obtained by the optimization routine, would be highly diverse among each other, and would represent different conceptual designs.
While aiming at largely devoting this research to niching in ES, we were also originally interested in applying our proposed algorithms to experimental optimization. More specifically, we were aiming at applications in the emerging field of Quantum Control (QC). The latter offers an enormous variety of high-dimensional continuous optimization problems, at the theoretical as well as the experimental levels. In that respect, it is potentially a heavenly testbed for Evolutionary optimization, and particularly for niching methods. This is due to some remarkable properties of QC landscapes, which typically possess an infinite number of optimal solutions, as proved by QC Theory. We thus find the combination of research on niching and its application to QC landscapes very attractive. After being exposed to this overwhelming treasure of QC landscape richness, we decided to devote an independent part of this dissertation to Quantum Control.

Symbolically, this interdisciplinary study forms a closed natural computing circle, where a biologically oriented investigation of organic evolution and speciation helps to develop methods for solving applications in Physics in general, and in Quantum Control in particular. By our reckoning, this symbolism is strengthened even further upon considering the stochastic nature of Evolutionary Algorithms: this process can thus be considered as throwing dice in order to solve Quantum Mechanics, itself sometimes referred to as the science of dice.

Thus, biologically inspired by organic evolution in general and organic speciation in particular, and armed with the real-world incentive to obtain multiple optimal solutions for better decision making, we hereby begin our journey from diversity in nature to conceptual designs in Quantum Control.
This dissertation therefore consists of two parts: Part I introduces a niching framework to a set of state-of-the-art ES algorithms, namely Derandomized Evolution Strategies (DES), and focuses on testing the proposed algorithms on artificial landscapes. Part II reviews the main aspects of Quantum Control in the general context of global function optimization. It then presents the experimental observation of Derandomized ES as well as the proposed niching algorithms when applied to several QC systems, both at the laboratory and at the numerical simulation levels. As far as we know, this is the first time that Quantum Control search landscapes are comprehensively introduced to the Computer Science community.
Part I begins by presenting the algorithmic kernels of this study, Derandomized Evolution Strategies. This is done in Chapter 1 by providing the reader with the essential terminology of global optimization, reviewing the fundamentals of the ES field, and eventually introducing explicitly, in detail, the derandomized algorithms.
Upon developing a niching framework for Evolution Strategies, some preliminary topics had to be addressed. We properly introduce the real-world incentive for niching, namely the selection of conceptual designs by the decision maker, review the relevant elements of Speciation Theory, discuss the crucial aspect of population diversity within ES, and finally present a short overview of previously introduced niching techniques. Chapter 2 aims at addressing those topics, and therefore constitutes an important preliminary study for the derivation of our niching framework. Due to the highly interdisciplinary nature of the niching research, this chapter presents a particularly high diversity of topics, which are linked by niching.
In Chapter 3 we present our proposed framework of niching within Derandomized ES. We describe it in detail, and thereafter test it on a suite of multimodal artificial landscapes. We analyze the numerical observation, and discuss the algorithmic performance.

Chapter 4 extends the framework of Chapter 3 to self-adaptive niche-shape approaches, for solving the so-called niche radius problem. This is an important topic in the field of niching, as it attempts to treat the challenge of defining a generic basin of attraction without a-priori knowledge of the landscape.

Another extension of our proposed niching framework, this time to the field of Multi-Objective Optimization, is introduced in Chapter 5. As the two fields of niching and multi-criterion optimization, corresponding to multimodal and multiobjective problems, respectively, have many aspects in common, we show the feasibility of utilizing our niching framework in a multi-objective approach. This concludes Part I of the thesis.
The goal that Part II aims to achieve is two-fold: firstly, properly introducing the main optimization aspects of the Quantum Control field, and secondly, presenting our work on the optimization of a specific Quantum Control problem, namely Dynamic Molecular Alignment. We thus begin Chapter 6 with a detailed review of Quantum Control Theory and Experiments. The review outlines fundamental concepts of Quantum Control Theory, and mainly focuses on theorems concerning the critical points of the landscapes, as well as on landscape richness and the multiplicity of optimal solutions. It then presents Quantum Control Experiments, and discusses our experimental setup for Part II.

Chapter 7 describes our investigation of two optimization problems corresponding to Quantum Control systems of Second Harmonic Generation. We conduct experiments on these optimization problems, by means of numerical simulations as well as laboratory experiments, employing specific Derandomized ES variants. It is the only chapter where we report on real-world laboratory experiments, while the following chapters focus on numerical simulations exclusively.
Chapter 8 is devoted to the introduction of the rotational framework, the fundamental framework upon which the Dynamic Molecular Alignment problem is defined, and it is thus an essential preliminary to our work on the alignment problem investigated in Chapter 9. Following a detailed Quantum Mechanical description of the framework, Chapter 8 poses the rotational population transfer optimization problem. It then presents our numerical observation of the Derandomized ES employment on the problem, and finalizes the chapter with applying our proposed niching algorithms.
Chapter 9 reports in detail on our work on Dynamic Molecular Alignment, which constitutes the main application in our research on Quantum Control landscapes. It describes the alignment problem, and then presents the various optimization approaches that we employed in addition to the straightforward application of Derandomized ES. These approaches include a special parameterization method developed for this purpose, an optimality investigation of a simplified variant, optimization subject to a dynamically varying environment, a multi-objective consideration of the problem, and, finally, the application of niching.

We thereafter complete this journey by summarizing our main results and by presenting promising directions for future research.

A Technical Note  Due to technical printing considerations, several plots from various chapters are concentrated in Appendix A. In these particular cases, a plot is referred to in the text as Figure A.x.
Part I

Niching in Derandomized Evolution Strategies

"…successive, slight modifications, my theory would absolutely break down."

Charles Darwin
Chapter 1

Evolution Strategies

1.1 Background

The paradigm of Evolutionary Computation (EC), which is gleaned from the model of organic evolution, studies populations of candidate solutions undergoing variation and selection, and aims at benefiting from the collective phenomena of their generational behavior. The term Evolutionary Algorithms (EAs) essentially refers to the collection of such generic methods, inspired by the theory of natural evolution, that encode complex problems into an artificial biological environment, define its genetic operators, and simulate its propagation in time. Motivated by the basic principles of the Darwinian theory, it is suggested that such a simulation would yield an optimal solution for the given problem.

Evolutionary Algorithms [1] have three main streams, rooted either in the United States or in Germany during the 1960s: Evolutionary Programming (EP), founded by L. Fogel in San Diego [2], Genetic Algorithms (GAs), founded by J. Holland in Ann Arbor [3, 4], and Evolution Strategies (ES), founded by P. Bienert, H.-P. Schwefel and I. Rechenberg, at that time three students at the Technical University of Berlin (see, e.g., [5, 6, 7]).

Evolution Strategies for global parameter optimization, the general framework of this study, are reviewed in this chapter. We start with laying out the basic foundations and definitions.
1.1.1 The Framework: Global Optimization

Let us introduce the elementary terminology of a continuous real-valued parameter optimization problem [8]. The following definition excludes discrete and mixed-integer problems. Given an objective function, also called the target function,

$$f : S \subseteq \mathbb{R}^n \rightarrow \mathbb{R}, \qquad S \neq \emptyset$$

where $S$ is the set of feasible solutions,

$$S = \left\{ \vec{x} \in \mathbb{R}^n \,\middle|\, g_j(\vec{x}) \geq 0 \;\; \forall j \in \{1, \ldots, q\} \right\}, \qquad g_j(\vec{x}) : \mathbb{R}^n \rightarrow \mathbb{R},$$

subject to $q$ inequality constraints $g_j(\vec{x})$, the goal is to find a vector $\vec{x}^* \in S$ which satisfies

$$\forall \vec{x} \in S : \quad f(\vec{x}) \geq f(\vec{x}^*) \equiv f^* \tag{1.1}$$

Then, $f^*$ is defined as the global minimum and $\vec{x}^*$ is the global minimum location. Due to

$$\min \left\{ f(\vec{x}) \right\} = -\max \left\{ -f(\vec{x}) \right\},$$

it is straightforward to convert every minimization problem into a maximization problem. Thus, without loss of generality, we shall assume a minimization problem, unless specified otherwise.

A local minimum $\hat{f} = f(\hat{\vec{x}})$ is defined in the following manner:

$$\exists \epsilon > 0 \;\; \forall \vec{x} \in S : \quad \left\| \vec{x} - \hat{\vec{x}} \right\| < \epsilon \;\Rightarrow\; \hat{f} \leq f(\vec{x})$$
Unimodality vs. Multimodality  A landscape is said to be unimodal if it has only a single minimum, and multimodal otherwise. It is called multiglobal if there are several minima with function values equal to the global minimum.

Global Minimum in Practice: Characterization  While there exists a general criterion for the automatic identification of a local minimum, such as the zero-gradient criterion, in practice there is no equivalent general criterion for the global minimum [8]. The attempt to characterize it is essentially equivalent to posing the multimodal optimization problem and differentiating de facto between global and local minima. We outline here a theoretical attempt to accomplish this characterization, by means of the important concept of level sets [9, 10]. Given a level set,

$$L_f(\alpha) = \left\{ \vec{x} \,\middle|\, \vec{x} \in S, \; f(\vec{x}) \leq \alpha \right\}, \tag{1.2}$$

it is subject to level set mapping, which defines its effective domain:

$$G_f = \left\{ \alpha \,\middle|\, \alpha \in \mathbb{R}, \; L_f(\alpha) \neq \emptyset \right\}. \tag{1.3}$$

Assuming that $G_f$ is compact and closed, $L_f(\alpha)$ is said to be lower semi-continuous (lsc) at the point $\bar{\alpha} \in G_f$ if $\vec{x} \in L_f(\bar{\alpha})$, $\{\alpha_i\} \subset G_f$, $\alpha_i \rightarrow \bar{\alpha}$ imply the existence of $K \in \mathbb{N}$ and a sequence $\{\vec{x}_i\}$ such that $\vec{x}_i \rightarrow \vec{x}$ and $\vec{x}_i \in L_f(\alpha_i)$ for $i \geq K$.

Given this, the following is a sufficient condition for accomplishing this characterization:

Theorem 1.1.1. Let $f$ be a real-valued function on $S \subset \mathbb{R}^n$. If every $\vec{x} \in S$ satisfying $f(\vec{x}) = \bar{\alpha}$ is either a global minimum of $f(\cdot)$ on $S$ or is not a local minimum of $f(\cdot)$, then $L_f(\alpha)$ is lsc at $\bar{\alpha}$.

Törn and Žilinskas concluded that the extension to multimodal domains makes the optimization problem unsolvable in the general case, i.e., there is no efficient solution technique for obtaining the global minimum value (see [8], p. 6).
The Hessian and the Condition Number  Given a real-valued, twice-differentiable $n$-dimensional function $f$, the Hessian matrix of $f(\vec{x})$ is defined as the matrix

$$H\left(f(\vec{x})\right) = \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2} & \frac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n} \\ \frac{\partial^2 f}{\partial x_2 \partial x_1} & \frac{\partial^2 f}{\partial x_2^2} & \cdots & \frac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1} & \frac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \frac{\partial^2 f}{\partial x_n^2} \end{pmatrix} \tag{1.4}$$

If the second derivatives of $f$ are all continuous, a condition which we shall assume here, the order of differentiation does not matter, and thus the Hessian matrix is symmetric. It is then worthwhile to introduce the condition number of the Hessian, a scalar which characterizes its degree of complexity and typically determines the difficulty of a problem to be solved by optimization methods. Let $\left\{\Lambda_i^H\right\}_{i=1}^n$ denote the eigenvalues of the Hessian $H$, and let $\Lambda_{\min}^H$ and $\Lambda_{\max}^H$ denote its minimal and maximal eigenvalues, respectively. The condition number of the Hessian matrix is defined by:

$$\mathrm{cond}(H) = \frac{\Lambda_{\max}^H}{\Lambda_{\min}^H} \geq 1 \tag{1.5}$$

Ill-conditioned problems are often classified as such due to large condition numbers (e.g., $10^{14}$) of the Hessian on their landscapes.

Separability  Another defining property of problem difficulty is the separability of the objective function (see, e.g., [11]). A function $f : \mathbb{R}^n \rightarrow \mathbb{R}$ is called separable if it can be optimized by solving $n$ 1-dimensional problems separately:

$$\arg \min_{\vec{x}} f(\vec{x}) = \left( \arg \min_{x_1} f(x_1, \ldots), \; \ldots, \; \arg \min_{x_n} f(\ldots, x_n) \right)$$
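To make the condition number concrete, here is a small sketch (not from the thesis; all names and values are illustrative) for diagonal quadratic landscapes $f(\vec{x}) = \frac{1}{2}\sum_i a_i x_i^2$, whose Hessian is the constant matrix $\mathrm{diag}(a_1, \ldots, a_n)$, so its eigenvalues are simply the diagonal entries:

```python
# Condition number cond(H) = lambda_max / lambda_min of a quadratic
# landscape f(x) = 0.5 * sum(a_i * x_i^2), whose Hessian is diag(a_1..a_n).

def condition_number(eigenvalues):
    """cond(H), assuming all eigenvalues are positive."""
    return max(eigenvalues) / min(eigenvalues)

# Well-conditioned: the sphere model, H = diag(1, 1)
print(condition_number([1.0, 1.0]))   # 1.0

# Ill-conditioned: a narrow elliptic valley, H = diag(1, 1e6)
print(condition_number([1.0, 1e6]))   # 1000000.0
```

The second landscape is far harder for isotropic search methods, since progress along the narrow axis requires steps a thousand times smaller than along the wide one.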
1.1.2 Evolutionary Algorithms

Whereas ES and EP are similar algorithms that share many basic characteristics, GAs differ from them considerably, primarily in the representation of the genetic information. Traditional GAs encode the genome with discrete values (as in nature), whereas ES as well as EP do so with continuous real values. Moreover, ES and EP focused more on the development of mutation operators, while in classical GA research the recombination operator received most attention. Today, GA, ES, and EP are subsumed under the term Evolutionary Algorithms (EAs).

Algorithm 1  An Evolutionary Algorithm
1:  t ← 0
2:  P_t ← Init()                        {P_t ∈ S^µ: set of solutions}
3:  Evaluate(P_t)
4:  while t < t_max do
5:      G_t ← Generate(P_t)             {generate λ variations}
6:      Evaluate(G_t)
7:      P_{t+1} ← Select(G_t ∪ P_t)     {rank and select µ best}
8:      t ← t + 1
9:  end while
Here, we offer an introductory generic description of an EA. The latter considers a population (i.e., set) of individuals (i.e., trial solutions), and models its collective learning process. Each individual in the population is initialized according to an algorithm-dependent procedure, and may carry not only a specific search point in the landscape, but also some environmental information concerning the search. A combination of stochastic as well as deterministic processes, such as mutation, recombination, and selection, dictates the propagation in time towards successively better individuals, corresponding to better regimes of the landscape. The quality of an individual, or alternatively the merit of a trial solution, is determined by a so-called fitness function, which is typically the objective function or a rescaling thereof. Thus, certain individuals are favored over others during the selection phase, which is based upon the fitness evaluation of the population. The selected individuals become the candidate solutions of the next generation, while the others die out.
More explicitly, an EA starts with initializing the generation counter $t$. After generating the initial population with $\mu$ individuals in $S$, a set $G_t$ of $\lambda$ new solutions is generated by means of mutation and possibly recombination. The new candidate solutions are evaluated and ranked in terms of their quality (fitness value). The $\mu$ best solutions in $G_t \cup P_t$ are selected to form the new parent population $P_{t+1}$.
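The generational loop of Algorithm 1 can be sketched as a minimal $(\mu + \lambda)$ EA in Python. This is an illustrative sketch, not taken from the text: Gaussian mutation stands in for Generate, plus-selection for Select, and all parameter values are arbitrary choices.

```python
import random

# Minimal (mu + lambda) EA for minimizing f on S = [-5, 5]^n:
# Init, Generate (Gaussian mutation), Select (keep mu best of G_t ∪ P_t).

def evolutionary_algorithm(f, n, mu=5, lam=20, sigma=0.3, t_max=200, seed=1):
    rng = random.Random(seed)
    # Init(): mu random individuals
    pop = [[rng.uniform(-5, 5) for _ in range(n)] for _ in range(mu)]
    for t in range(t_max):
        # Generate(P_t): lambda variations by Gaussian mutation
        offspring = []
        for _ in range(lam):
            parent = rng.choice(pop)
            offspring.append([x + rng.gauss(0, sigma) for x in parent])
        # Select(G_t ∪ P_t): rank by fitness, keep the mu best (plus-selection)
        pool = pop + offspring
        pool.sort(key=f)
        pop = pool[:mu]
    return pop[0]

sphere = lambda x: sum(xi * xi for xi in x)
best = evolutionary_algorithm(sphere, n=3)
print(sphere(best))  # a small value near 0
```

Note that with a fixed step-size $\sigma$ the search stalls once the distance to the optimum becomes comparable to $\sigma$; adapting the step-size is the subject of the following sections.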
1.2 The Standard Evolution Strategy
Evolution Strategies were originally developed at the Technical University of Berlin as a procedure for automated experimental design optimization, rather than as a global optimizer for continuous landscapes. Following a sequence of successful applications (e.g., shape optimization of a bended pipe, drag minimization of a joint plate, and hardware design of a two-phase flashing nozzle), a diploma thesis [13] and a dissertation [14] laid out the solid foundations of ES as an optimization methodology. There has been extensive work on ES analysis and algorithmic design since then [7, 15, 16].

This section, which is mostly based on [1] and [7], will describe the standard ES in detail. Section 1.2.1 will introduce notation and basic terminology. Section 1.2.2 will present the $(1+1)$ algorithm, which was originally analyzed for theoretical purposes, but continued to play an important role in several aspects of Evolution Strategy design. The self-adaptation principle will be described in Section 1.2.3, while Section 1.2.4 will outline the ES algorithm.
1.2.1 Notation and Terminology

The typical application domain of Evolution Strategies is the minimization of non-linear objective functions of signature $f : S \subseteq \mathbb{R}^n \rightarrow \mathbb{R}$. Given a search problem of dimension $n$, let $\vec{x} := (x_1, x_2, \ldots, x_n)^T \in \mathbb{R}^n$ denote the set of decision parameters or object variables to be optimized: it is defined as an individual associated with a trial solution. In optimization problems, which are of our main interest, it is then straightforward to define the fitness of that individual: it is the objective function value(s) of $\vec{x}$, i.e., $f(\vec{x})$.

Evolution Strategies consider a population of candidate solutions of the given problem. This population undergoes stochastic as well as deterministic variations, with the so-called mutation operator, and possibly with the recombination operator. The mutation operator is typically equivalent to sampling a random variation from a normal distribution. Due to the continuous nature of the parameter space, the biological term mutation rate can be associated here with the actual size of the mutation step in the decision space, also referred to as the mutation strength.

Explicitly, an individual is represented by a tuple of continuous real values, sometimes referred to as a chromosome, which comprises the decision parameters to be optimized, $\vec{x}$, their fitness value, $f(\vec{x})$, as well as a set of endogenous (i.e., evolvable) strategy parameters, $\vec{s} \in \mathbb{R}^m$. The $k$-th individual of the population is thus denoted by:

$$\vec{a}_k = \left( \vec{x}_k, \vec{s}_k, f(\vec{x}_k) \right)$$

The dimension $m$ of the strategy parameter space is subject to the desired adaptation scheme. Strategy parameters are a unique concept for ES, in particular in the context of the mutation operator, and they play a crucial role in the so-called self-adaptation principle (see Section 1.2.3).

Strategy-specific parameters, such as the population characteristic parameters $\mu$, $\lambda$, and the so-called mixing number $\nu$, are called exogenous strategy parameters, as they are kept constant during the simulated evolution. The mixing number determines the number of individuals involved in the application of the recombination operator.
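As a small illustration (not from the thesis; names are illustrative), the tuple representation $\vec{a}_k = (\vec{x}_k, \vec{s}_k, f(\vec{x}_k))$ might be coded as:

```python
from collections import namedtuple

# An ES individual: object variables x, endogenous strategy parameters s
# (here, one step-size per coordinate), and the fitness value f(x).
Individual = namedtuple("Individual", ["x", "s", "fitness"])

sphere = lambda x: sum(xi * xi for xi in x)

x = [1.0, -2.0, 0.5]
a = Individual(x=x, s=[0.3] * len(x), fitness=sphere(x))
print(a.fitness)  # 5.25
```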
1.2.2 Motivation: The (1+1) Evolution Strategy

Rechenberg [6] considered a simple $(1+1)$ Evolution Strategy, with a fixed mutation strength $\sigma$, in order to investigate analytically two basic objective functions, namely the corridor model and the sphere model. From the historical perspective, that study laid out the foundations for the theory of Evolution Strategies.

Rechenberg derived explicitly the expressions for the convergence rate of his $(1+1)$ ES for the two models. By definition, neither self-adaptation nor recombination was employed in this strategy. Given the probability $p(k')$ of the mutation operator to cover a distance $k'$ towards the optimum, the convergence rate $\varphi$ is defined as the expectation of the distance $k'$ covered by the mutation:

$$\varphi = \int_0^\infty p(k') \cdot k' \, dk' \tag{1.6}$$

The expression for the optimal step-size for the two models was first derived. It was observed to depend on the so-called success probability $p_s$,

$$p_s = P \left\{ f\left(\mathrm{Mutate}\{\vec{x}\}\right) \leq f(\vec{x}) \right\}. \tag{1.7}$$

By setting

$$\left. \frac{d\varphi}{d\sigma} \right|_{\sigma^*} = 0, \tag{1.8}$$

the optimal step-sizes for the two models were calculated, yielding also the optimal success probabilities. The obtained values were both close to $1/5$, regardless of the search space dimensionality. This led to the formulation of the well-known $1/5$th-success rule:

    The ratio of successful mutations to all mutations should be $1/5$. If it is greater than $1/5$, increase the standard deviation; if it is smaller, decrease the standard deviation.

For more details see [1]. The implementation of the $1/5$th-success rule within the $(1+1)$-ES is given as Algorithm 2. As practical hints, $p_s$ can be calculated over intervals of $10 \cdot n$ trials, and the adaptation constant should be set within the boundaries $0.817 \leq c \leq 1$.
Algorithm 2  The (1+1) Evolution Strategy
1:  t ← 0
2:  P_t ← Init()                                {P_t ∈ S: set of solutions}
3:  Evaluate(P_t)
4:  while t < t_max do
5:      x(t) := Mutate{x(t−1)} with step-size σ
6:      Evaluate(P′(t) := {x(t)}): {f(x(t))}
7:      Select{P′(t) ∪ P(t)}
8:      t ← t + 1
9:      if t mod n = 0 then
10:         σ(t) = σ(t−n)/c   if p_s > 1/5
            σ(t) = σ(t−n)·c   if p_s < 1/5
            σ(t) = σ(t−n)     if p_s = 1/5
11:     else
12:         σ(t) = σ(t−1)
13:     end if
14: end while
It should be noted that the $1/5$th-success rule has been kept alive, and has continued to play an important role in several aspects, including the construction of the elitist strategy of the Covariance Matrix Adaptation ES algorithm ([17], and also see Section 1.4).
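Algorithm 2 can be sketched as follows on the sphere model. This is an illustrative implementation, not taken from the thesis: the success-probability window of $10 \cdot n$ trials follows the practical hint above, and $c = 0.85$ is one arbitrary choice from the recommended interval.

```python
import random

# (1+1)-ES with the 1/5th-success rule. p_s is estimated over windows of
# 10*n trials; sigma is divided by c (enlarged) when p_s > 1/5 and
# multiplied by c (reduced) when p_s < 1/5.

def one_plus_one_es(f, x, sigma=1.0, c=0.85, t_max=2000, seed=2):
    rng = random.Random(seed)
    n = len(x)
    fx = f(x)
    successes = 0
    for t in range(1, t_max + 1):
        y = [xi + rng.gauss(0, sigma) for xi in x]   # mutate with step-size sigma
        fy = f(y)
        if fy <= fx:                                 # plus-selection: keep the better
            x, fx = y, fy
            successes += 1
        if t % (10 * n) == 0:                        # adapt sigma every 10*n trials
            p_s = successes / (10 * n)
            if p_s > 1 / 5:
                sigma /= c                           # too many successes: larger steps
            elif p_s < 1 / 5:
                sigma *= c                           # too few successes: smaller steps
            successes = 0
    return x, fx

sphere = lambda v: sum(vi * vi for vi in v)
x_best, f_best = one_plus_one_es(sphere, [5.0] * 3)
print(f_best)  # close to 0
```

On the sphere the rule keeps the measured success rate near $1/5$, which is close to the optimal operating point derived by Rechenberg.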
1.2.3 The Self-Adaptation Principle

Section 1.2.2 provided us with the motivation to adapt the endogenous strategy parameters during the course of evolution, e.g., tuning the mutative step-size according to the $1/5$th-success rule. The basic idea of the self-adaptation principle is to consider the strategy parameters as endogenous parameters that undergo an evolutionary process themselves. The idea of coupling endogenous strategy parameters to the object variables can be found in organisms, where self-repair mechanisms exist, such as repair enzymes and mutator genes [18]. This allows an individual to adapt to the changing environment along its trajectory in the landscape, while keeping the potentially harmful effect of mutation within reasonable boundaries. Hence, when mutative self-adaptation is applied, there is no deterministic control in the hands of the user with respect to the mutation strategy.

The crucial claim regarding ES is that self-adaptation of strategy parameters works [19]. It succeeds in doing so by applying the mutation, recombination and selection operators to the strategy parameters themselves, without the use of any exogenous control. The link between strategy and decision parameters is based on their coupled variation and selection; research has identified several boosting conditions for self-adaptation to work, such as recombination on strategy parameters, selection pressure within certain bounds, and others.
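Mutative self-adaptation can be sketched as follows (a minimal, illustrative sketch, not taken from the thesis): each candidate first mutates its own step-size log-normally, then mutates the object variables with that new step-size; selection on fitness alone then implicitly favors well-adapted step-sizes, with no exogenous control.

```python
import math, random

# Minimal (1, lambda)-style ES with mutative step-size self-adaptation on
# the sphere model. The learning rate tau ~ 1/sqrt(n) and all other
# constants are illustrative choices.

def self_adaptive_es(f, x, sigma=1.0, lam=10, t_max=500, seed=4):
    rng = random.Random(seed)
    n = len(x)
    tau = 1.0 / math.sqrt(n)
    for _ in range(t_max):
        offspring = []
        for _ in range(lam):
            s = sigma * math.exp(tau * rng.gauss(0, 1))   # mutate sigma first
            y = [xi + s * rng.gauss(0, 1) for xi in x]    # then mutate x with s
            offspring.append((f(y), y, s))
        fy, x, sigma = min(offspring, key=lambda o: o[0]) # comma-selection: best of lambda
    return x, sigma, fy

sphere = lambda v: sum(vi * vi for vi in v)
x, sigma, fx = self_adaptive_es(sphere, [10.0] * 5)
print(fx, sigma)  # fitness and step-size both shrink as the optimum is approached
```

The step-size rides along with the individual: whenever $\sigma$ is too large or too small for the current region of the landscape, offspring carrying better-suited step-sizes tend to win the selection.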
1.2.4 The Canonical (µ/ν +, λ)-ES Algorithm

We describe here the specific operators of the standard Evolution Strategy, sometimes referred to as the Schwefel approach, and provide the reader with the implementation details.

Mutation

The mutation operator is the dominant variation operator within ES, and thus we choose to elaborate in this section on its characteristics. As a retrospective analysis, we choose to begin with the outline of some general rules for the design of mutation operators, as suggested by Beyer [15]:

1. Reachability. Given the current generation of individuals, any other search point in the landscape should be reachable within a finite number of mutation operations.

2. Unbiasedness. Variation operators in general, and the mutation operator in particular, should not introduce any bias, and should satisfy the maximum entropy principle. In the case of continuous unconstrained landscapes, this suggests the use of the normal distribution.

3. Scalability. The mutation strength should be adaptive with respect to the landscape.
The ES mutation operator considers stochastic continuous variations, which are based on the multivariate normal distribution. Given a normally-distributed random vector, denoted by $\vec{z} = (z_1, z_2, \ldots, z_n)^T$, the mutation operator is then defined as follows:

$$\vec{x}_{NEW} = \vec{x}_{OLD} + \vec{z} \tag{1.9}$$

A multivariate normal distribution is uniquely defined by a covariance matrix, $C \in \mathbb{R}^{n \times n}$, which is a symmetric positive semi-definite matrix, as well as by a mean vector $\vec{m} \in \mathbb{R}^n$. Its probability density function (PDF) is given by:

$$\Phi_{\mathcal{N}}^{pdf}(\vec{z}) = \frac{1}{\sqrt{(2\pi)^n \det C}} \cdot \exp \left( -\frac{1}{2} \left( \vec{z} - \vec{m} \right)^T \cdot C^{-1} \cdot \left( \vec{z} - \vec{m} \right) \right) \tag{1.10}$$

A random vector $\vec{z}$ drawn from a multivariate normal distribution is denoted by $\vec{z} \sim \mathcal{N}(\vec{m}, C)$.

The ES mutation operator always considers a distribution with zero mean, i.e., $\vec{m} = \vec{0}$, and thus the covariance matrix $C$ is the defining component of this operator. It is characterized by its $(n \cdot (n-1))/2$ covariance elements,

$$c_{ij} = \mathrm{cov}(x_i, x_j) = \mathrm{cov}(x_j, x_i) = c_{ji},$$

as well as by its $n$ variances,

$$c_{ii} \equiv \sigma_i^2 = \mathrm{var}(x_i).$$
Overall, we have,
$$C = \begin{pmatrix}
\mathrm{var}(x_1) & \mathrm{cov}(x_1, x_2) & \cdots & \mathrm{cov}(x_1, x_n) \\
\mathrm{cov}(x_2, x_1) & \mathrm{var}(x_2) & \cdots & \mathrm{cov}(x_2, x_n) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{cov}(x_n, x_1) & \mathrm{cov}(x_n, x_2) & \cdots & \mathrm{var}(x_n)
\end{pmatrix}$$
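In practice, drawing the mutation vector $\vec{z} \sim \mathcal{N}(\vec{0}, C)$ can be done via a Cholesky factorization of $C$. A minimal NumPy sketch (the function name `mutate` is ours, not from the thesis):

```python
import numpy as np

def mutate(x, C, rng):
    """Apply a correlated Gaussian mutation x_new = x + z, z ~ N(0, C).

    C must be symmetric positive definite; its Cholesky factor A
    satisfies A @ A.T == C, so A @ u with u ~ N(0, I) has covariance C.
    """
    A = np.linalg.cholesky(C)
    u = rng.standard_normal(len(x))
    return x + A @ u

rng = np.random.default_rng(0)
C = np.array([[2.0, 0.8],
              [0.8, 1.0]])  # symmetric, positive definite
x = np.zeros(2)
samples = np.array([mutate(x, C, rng) for _ in range(20000)])
# the empirical covariance of the mutation steps approaches C
```

With $A$ the Cholesky factor of $C$, the vector $A\vec{u}$ with $\vec{u} \sim \mathcal{N}(\vec{0}, I)$ has covariance $AA^T = C$, which the empirical covariance of the samples confirms.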
Essentially, the $(n \cdot (n+1))/2$ independent elements of the covariance matrix are the endogenous strategy parameters that evolve along with the individual: $\vec{s} \leftarrow C$, i.e., the strategy parameter vector $\vec{s}$ represents the covariance matrix $C$ in this case.

For the definition of the update rule for the strategy parameters, it is convenient to represent the off-diagonal elements of $C$ by means of the rotational angles between the principal axes of the decision parameters. Let $\alpha_{ij}$ denote these angles,
$$c_{ij} = \mathrm{cov}(x_i, x_j) = \frac{1}{2}\bigl(\mathrm{var}(x_i) - \mathrm{var}(x_j)\bigr) \cdot \tan(2\alpha_{ij}) \qquad (1.11)$$
According to the self-adaptation principle, the covariance matrix elements also evolve every generation. The adaptation of the covariance matrix elements is dictated by non-linear update rules: the diagonal terms, $c_{ii} = \sigma_i^2$, are updated according to the log-normal distribution:
$$\sigma_i^{NEW} = \sigma_i^{OLD} \cdot \exp\bigl(\tau' \cdot \mathcal{N}(0,1) + \tau \cdot \mathcal{N}_i(0,1)\bigr) \qquad (1.12)$$
and the off-diagonal terms are updated through the rotational angles:
$$\alpha_{ij}^{NEW} = \alpha_{ij}^{OLD} + \beta \cdot \mathcal{N}_\ell(0,1) \qquad (1.13)$$
where $\mathcal{N}(0,1)$, $\mathcal{N}_i(0,1)$, and $\mathcal{N}_\ell(0,1)$ $\bigl(\ell = 1, \ldots, (n \cdot (n-1))/2\bigr)$ denote independent random variables, and where $\tau \propto 1/\sqrt{2\sqrt{n}}$, $\tau' \propto 1/\sqrt{2n}$, and $\beta = \frac{5\pi}{180} \approx 0.0873$ are constants. After those two update steps, the covariance matrix is reconstructed from the updated variances and rotational angles.

Figure 1.1: Mutation ellipsoids for $n = 2$, drawn from a general non-singular covariance matrix, with $c_{1,2} \sim \tan(2\alpha_{1,2})$
. Figure courtesy of Thomas Bäck.

Geometrical Interpretation  The equal probability density contour lines of a multivariate normal distribution are ellipsoids, centered about the mean. The principal axes of the ellipsoids are defined by the eigenvectors of the covariance matrix $C$. The lengths of the principal axes are proportionate to the square roots of the corresponding eigenvalues. Figure 1.1 provides an illustration of mutation ellipsoids in the case of $n = 2$.
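The self-adaptive updates of Eqs. 1.12 and 1.13 can be sketched in a few lines of Python (an illustrative sketch; the function and variable names are ours):

```python
import numpy as np

def self_adapt(sigma, alpha, rng):
    """One self-adaptation step for the strategy parameters.

    sigma: vector of n standard deviations (Eq. 1.12, log-normal update).
    alpha: vector of n*(n-1)/2 rotation angles (Eq. 1.13).
    """
    n = len(sigma)
    tau = 1.0 / np.sqrt(2.0 * np.sqrt(n))   # tau  ~ 1/sqrt(2*sqrt(n))
    tau_p = 1.0 / np.sqrt(2.0 * n)          # tau' ~ 1/sqrt(2n)
    beta = 5.0 * np.pi / 180.0              # ~0.0873 rad (5 degrees)
    common = rng.standard_normal()          # one N(0,1) shared by all sigma_i
    sigma_new = sigma * np.exp(tau_p * common + tau * rng.standard_normal(n))
    alpha_new = alpha + beta * rng.standard_normal(len(alpha))
    return sigma_new, alpha_new

rng = np.random.default_rng(1)
n = 5
sigma, alpha = np.ones(n), np.zeros(n * (n - 1) // 2)
sigma, alpha = self_adapt(sigma, alpha, rng)
```

Note the multiplicative log-normal update keeps every $\sigma_i$ strictly positive, while the additive Gaussian step on the angles leaves them unconstrained.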
Correlated Mutations: Strategy Considerations  Given a decision parameter space of dimension $n$, a general mutation-control mechanism considers the covariance matrix $C$, but may apply various different strategies, for computational considerations. There are three common approaches:

1. A covariance matrix proportionate to the identity matrix, i.e., having a single free strategy parameter $\sigma$, often referred to as the global step-size:
$$C_1 = \sigma^2 \cdot I \qquad (1.14)$$

2. A diagonalized covariance matrix, i.e., having a vector of $n$ free strategy parameters, $\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\right)^T$, typically referred to as the individual step-sizes:
$$C_2 = \mathrm{diag}\left(\sigma_1^2, \sigma_2^2, \ldots, \sigma_n^2\right) \qquad (1.15)$$
Figure 1.2: Equidensity probability contours for the three different approaches with respect to a 2D landscape. Left: a single global step-size (circles). Middle: $n$ independent parameters (axis-parallel ellipsoids). Right: $(n \cdot (n+1))/2$ independent parameters (arbitrarily oriented ellipsoids). Figures courtesy of Thomas Bäck [20].

3. A general non-singular covariance matrix, with arbitrary $(n \cdot (n+1))/2$ free strategy parameters:
$$C_3 = (c_{ij}) \qquad (1.16)$$
Thus, the three approaches propose orders of $O(1)$, $O(n)$, or $O(n^2)$ strategy parameters to be learned, respectively, at the cost of different invariance properties. Obviously, a single global step-size approach is very limited in its ability to generate successful moves on a generic landscape. The generalization into individual step-sizes assigns different variances to each coordinate axis, achieving invariance with respect to translation, but still having dependency on the coordinate system (no invariance with respect to rotation). Finally, the most general approach with an arbitrary normal mutation distribution introduces complete invariance with respect to translation and rotation. Figure 1.2 offers an illustration of the three different approaches on a given 2D landscape.

Recombination
Inspired by the organic mechanism of meiotic cell division, where the genetic material is reordered by means of crossover between the chromosomes, the ES recombination operator considers sharing the information from up to $\nu$ parent individuals [21]. When $\nu > 2$, it is usually referred to as multi-recombination. Unlike other Evolutionary Algorithms (e.g., GAs), the ES recombination operator obtains only a single offspring.

Due to the continuous nature of the parameters at hand, two recombination variants are typically considered, given $\nu$ individuals chosen as parents:
• Discrete recombination: each allele is randomly chosen among the $\nu$ parents. Given a parental matrix of the old generation, $A^O = \left(\vec{a}_1^O, \vec{a}_2^O, \ldots, \vec{a}_\nu^O\right)$, the new recombinant $\vec{a}^N$ is constructed by:
$$\vec{a}_i^N := A^O_{m_i, i}, \qquad m_i := \mathrm{rand}\{1, \ldots, \nu\}$$

• Intermediate recombination: the values of the $\nu$ parents are averaged, typically with uniform weights. Essentially, this is equivalent to calculating the centroid of the $\nu$ parent vectors:
$$\vec{a}_i^N := \frac{1}{\nu} \sum_{j=1}^{\nu} \left(\vec{a}_j^O\right)_i \qquad (1.17)$$
The recombination operator in the standard ES could be applied as follows:
1. For each object variable choose $\nu$ parents, and apply discrete recombination on the corresponding variables.

2. For each strategy parameter choose $\nu$ parents, and apply intermediate recombination on the corresponding variables.

It should be noted that there are no generally known best settings of the recombination operator, and the above are typical implementations of it.
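The two recombination variants can be sketched as follows (a minimal NumPy illustration; the function names are ours):

```python
import numpy as np

def discrete_recombination(parents, rng):
    """Each offspring component is copied from a randomly chosen parent.

    parents: (nu, n) matrix, one parent vector per row.
    """
    nu, n = parents.shape
    choice = rng.integers(0, nu, size=n)      # m_i = rand{1, ..., nu}
    return parents[choice, np.arange(n)]

def intermediate_recombination(parents):
    """Offspring = centroid of the nu parent vectors (Eq. 1.17)."""
    return parents.mean(axis=0)

rng = np.random.default_rng(2)
parents = np.array([[0.0, 0.0, 0.0],
                    [1.0, 1.0, 1.0]])
child_d = discrete_recombination(parents, rng)   # entries in {0, 1}
child_i = intermediate_recombination(parents)    # (0.5, 0.5, 0.5)
```

Following the typical scheme above, `discrete_recombination` would be applied to the object variables and `intermediate_recombination` to the strategy parameters.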
Within GA research, the building block hypothesis (BBH) (see, e.g., [22]) offered an explanation for the working mechanism of the crossover: the combination of good, yet different, building blocks, i.e., specific portions of the genetic encoding from different parents, is supposed to be the key to propagating high fitness. The debate over this hypothesis has been kept alive. In ES populations, the diversity decreases rapidly; therefore, the BBH is unlikely to fit in a similar way as it does in GA populations.

On the other hand, ES research has given rise to the genetic repair hypothesis [23], stating that the common good properties of the different parents, rather than their differing features, play the key role in the working mechanism of recombination. Moreover, recombination typically decreases the harmful effect of mutation and allows for high step-sizes while achieving the same convergence rates.
Selection

Natural selection is the driving force of organic evolution: clearing out an old generation, and allowing its individuals with the fitness advantage to increase their representation in the genetic pool of future generations.
Algorithm 3  The $(\mu/\nu +, \lambda)$ Evolution Strategy
1: $t \leftarrow 0$
2: $P_t \leftarrow$ Init()   $\{P_t \in S^\mu$: set of solutions$\}$
3: Evaluate($P_t$)
4: while $t < t_{max}$ do
5:   Select $\nu$ mating parents from $P_t$   {Marriage}
6:   $\vec{a}'_k(t) := \mathrm{Recombine}\{P(t)\} \quad \forall k \in \{1, \ldots, \lambda\}$   {Recombination}
7:   $\vec{a}''_k(t) := \mathrm{Mutate}\{\vec{a}'_k(t)\} \quad \forall k \in \{1, \ldots, \lambda\}$   {Mutation}
8:   Evaluate($P'(t) := \{\vec{a}''_1(t), \ldots, \vec{a}''_\lambda(t)\}$)   $\left(\{f(\vec{x}''_1(t)), \ldots, f(\vec{x}''_\lambda(t))\}\right)$
9:   if $(\mu, \lambda)$-ES then
10:    $\mathrm{Select}\{P'(t)\}$
11:  else if $(\mu + \lambda)$-ES then
12:    $\mathrm{Select}\{P'(t) \cup P(t)\}$
13:  end if
14:  $t \leftarrow t + 1$
15: end while
Evolution Strategies adopt this principle, and employ deterministic operators in order to select the $\mu$ individuals with the highest fitness, e.g., minimal objective function values, to be transferred into the next generation. Two selection operators are introduced in the standard ES using an elegant notation due to Schwefel. The notation characterizes the selection mechanism, as well as the number of parents and offspring involved:

• $(\mu + \lambda)$-selection: the next generation of parents will be the best $\mu$ individuals selected out of the union of current parents and $\lambda$ offspring.

• $(\mu, \lambda)$-selection: the next generation of parents will be the best $\mu$ individuals selected out of the current $\lambda$ offspring.

In the case of comma selection, it is rather intuitive that setting $\mu < \lambda$ would be a necessary condition for efficient convergence. In plus selection, however, any $\mu > 0$ can be chosen in principle. In the latter, the so-called elitist selection occurs: the survival of the best individual found so far is guaranteed, leading to a possible scenario of a parent surviving for the entire process.

We are now in a position to introduce a pseudocode of the Standard Evolution Strategy (Algorithm 3).
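Algorithm 3, reduced for brevity to a single global step-size, $\nu = 1$, and comma selection on the sphere function, can be sketched as a toy Python implementation (ours, not the thesis' code):

```python
import numpy as np

def comma_es(f, n, mu=15, lam=100, sigma0=1.0, t_max=200, seed=0):
    """A toy (mu, lam)-ES with log-normal self-adaptation of one step-size."""
    rng = np.random.default_rng(seed)
    tau = 1.0 / np.sqrt(n)
    # each individual is a pair (object variables, step-size)
    pop = [(rng.standard_normal(n), sigma0) for _ in range(mu)]
    for _ in range(t_max):
        offspring = []
        for _ in range(lam):
            x, s = pop[rng.integers(mu)]                       # marriage, nu = 1
            s_new = s * np.exp(tau * rng.standard_normal())    # mutate strategy
            x_new = x + s_new * rng.standard_normal(n)         # mutate object vars
            offspring.append((x_new, s_new))
        # (mu, lam)-selection: best mu offspring only, parents are discarded
        offspring.sort(key=lambda ind: f(ind[0]))
        pop = offspring[:mu]
    return pop[0]

sphere = lambda x: float(np.dot(x, x))
best_x, best_sigma = comma_es(sphere, n=5)
```

The comma scheme discards the parents each generation, which forces $\sigma$ to keep proving itself; switching line "pop = offspring[:mu]" to select from `offspring + pop` would give the elitist plus scheme.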
A Note on Population Sizes  One of the important topics in ES research is the study of optimal population sizes. By definition, the magnitude of $\lambda$ determines the number of function evaluations per generation. Typical population sizes in ES keep a ratio of $1/7$ between the parent and the offspring populations; a popular choice is $\mu = 15$ and $\lambda = 100$ (see, e.g., [1] and [20]).
Based on experimental observations, when individual step-sizes are chosen as strategy parameters (Eq. 1.15), $\lambda$ has to scale linearly with $n$. In the case of arbitrary normal mutations (Eq. 1.16), Rudolph [24] showed that successful adaptation to the landscape (i.e., successfully learning the Hessian matrix) can be achieved with an upper bound of $\mu + \lambda = (n^2 + 3n + 4)/2$, but it is certainly not likely to be achieved with the typical population sizes of $\{\mu = 15, \lambda = 100\}$.

1.3 Derandomized Evolution Strategies (DES)
Mutative step-size control (MSC) tends to work well in the Standard-ES for the adaptation of a single global step-size (Eq. 1.14), but tends to fail when it comes to the individual step-sizes or arbitrary normal mutations (Eq. 1.15 or Eq. 1.16). Schwefel claimed that the adaptation of the strategy parameters in those cases is impossible within small populations [19], and suggested larger populations as a solution to the problem.

Due to the crucial role that the mutation operator plays within Evolution Strategies, its mutative step-size control was investigated intensively. In particular, the disruptive effects to which the MSC is subject were studied at several levels [25, 16], and are reviewed here:
• Indirect selection. By definition, the goal of the mutation operator is to apply a stochastic variation to an object variable vector which will increase its selection probability. The selection of the strategy parameter setting is indirect, i.e., it is not the vector of a successful mutation that is used to adapt the step-size parameters, but rather the parameters of the distribution that led to this mutation vector.

• Realization of parameter variation. Due to the sampling from a random distribution, the realization of the parameter variation does not necessarily reflect the nature of the strategy parameters. Thus, the de facto difference between good and bad settings of strategy parameters is only reflected in the difference between their probabilities to be selected, which can be rather small. Essentially, this means that the selection process of the strategy parameters is strongly disturbed.
• Strategy parameter change rate. The change rate is defined as the difference between strategy parameters of two successive generations. Hansen and Ostermeier [16] argue that the change rate is an important factor, as it gives an indication concerning the adaptation speed, and thus it has a direct influence on the performance of the algorithm. The principal claim is that the change rate depends on the mutation strength to which the strategy parameters are subject. While aiming at attaining the maximal change rate, the latter is subject to an upper bound, due to the finite selection information that can be transferred between generations. Change rates that exceed the upper bound would lead to stochastic behavior. Moreover, the mutation strength that obtains the optimal change rate is typically smaller than the one that obtains good diversity among the mutants, a desired outcome of the mutation operator, often referred to as the selection difference. Thus, the conflict between the objective of an optimal change rate and the objective of an optimal selection difference cannot be resolved at the mutation strength level [25]. A possible solution to this conflict would be to unlink the change rate from the mutation strength.

The so-called derandomized mutative step-size control aims to treat those disruptive effects, regardless of the problem dimensionality, population size, etc.
1.3.1 $(1, \lambda)$ Derandomized ES Variants

The concept of derandomized Evolution Strategies was originally introduced by scholars at the Technical University of Berlin in the beginning of the 1990's. It was followed by the release of a new generation of successful ES variants by Hansen, Ostermeier, and Gawelczyk [26, 27, 28, 29].

The first versions of derandomized ES algorithms introduced a controlled global step-size in order to monitor the individual step-sizes by decreasing the stochastic effects of the probabilistic sampling. The selection disturbance was completely removed in later versions by omitting the adaptation of strategy parameters by means of probabilistic sampling. This was combined with individual information from the last generation (the successful mutations, i.e., of selected offspring), and then adjusted to correlated mutations. Later on, the concept of adaptation by accumulated information was introduced, aiming to use the past information wisely for the purpose of step-size adaptation: instead of using the information from the last generation only, it was successfully generalized to a weighted average of the previous generations.

Note that the different derandomized-ES variants strictly follow a $(1, \lambda)$ strategy, postponing the treatment of recombination or plus-strategies to later stages¹. In this way, the question of how to update the strategy parameters when an offspring does not improve its ancestor is not relevant here.
Moreover, the different variants hold different numbers of strategy parameters to be adapted, and this is a factor in the learning speed of the optimization routine. The different algorithms hold a number of strategy parameters scaling either linearly ($O(n)$ parameters, responsible for individual step-sizes) or quadratically ($O(n^2)$ parameters, responsible for arbitrary normal mutations) with the dimensionality $n$ of the search space.

¹When asked about comma versus plus strategies, Hansen states that with a good enough algorithm at hand, employing the plus strategy is unnecessary, as your algorithm

1.3.2 First Level of Derandomization
The so-called first level of derandomization achieved the following desired effects:

• A degree of freedom with respect to the mutation strength of the strategy parameters.

• Scalability of the ratio between the change rate and the mutation strength.

• Independence of the population size with respect to the adaptation mechanism.

We choose to review the implementation of the first level of derandomization through three particular derandomized ES variants:
DR1

The first derandomized attempt [26] coupled the successful mutations to the selection of decision parameters, and learned the mutation step-size as well as the scaling vector based upon the successful variation. The mutation step is formulated for the $k^{th}$ individual, $k = 1, \ldots, \lambda$:
$$\vec{x}^{(g+1)} = \vec{x}^{(g)} + \xi_k \,\delta^{(g)}\, \vec{\xi}_{scal}^{\,k}\, \vec{\delta}_{scal}^{(g)}\, \vec{z}_k, \qquad \vec{z}_k \in \{-1, +1\}^n \qquad (1.18)$$
Note that $\vec{z}_k$ is a random vector of $\pm 1$, rather than a normally distributed random vector, while $\vec{\xi}_{scal}^{\,k} \sim \vec{\mathcal{N}}(0,1)^+$, i.e., distributed over the positive part of the normal distribution. The evaluation and selection are followed by the adaptation of the strategy parameters (subscripts $sel$ refer to the selected individual):
$$\delta^{(g+1)} = \delta^{(g)} \cdot (\xi_{sel})^{\beta} \qquad (1.19)$$
$$\vec{\delta}_{scal}^{(g+1)} = \vec{\delta}_{scal}^{(g)} \cdot \left(\vec{\xi}_{scal}^{\,sel} + b\right)^{\beta_{scal}} \qquad (1.20)$$
where $P\left(\xi_k = \frac{7}{5}\right) = P\left(\xi_k = \frac{5}{7}\right) = \frac{1}{2}$; $\beta = \sqrt{1/n}$, $\beta_{scal} = 1/n$, $b = 0.35$, and $\xi_k \in \left\{\frac{7}{5}, \frac{5}{7}\right\}$ are constants. Note that the multiplication in Eq. 1.20 is between two vectors and carried out as element-by-element multiplication, yielding a vector of the same dimension $n$.
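The DR1 mutation and adaptation rules (Eqs. 1.18-1.20) can be sketched as follows (an illustrative translation with our own function and variable names):

```python
import numpy as np

def dr1_generate(x, delta, delta_scal, rng):
    """One DR1 mutation (Eq. 1.18): a +/-1 vector, scaled per coordinate."""
    n = len(x)
    xi = 7/5 if rng.random() < 0.5 else 5/7       # two-point step-size factor
    xi_scal = np.abs(rng.standard_normal(n))      # half-normal, N(0,1)^+
    z = rng.choice([-1.0, 1.0], size=n)
    x_new = x + xi * delta * xi_scal * delta_scal * z
    return x_new, xi, xi_scal

def dr1_adapt(delta, delta_scal, xi_sel, xi_scal_sel, n):
    """Derandomized update from the selected offspring (Eqs. 1.19-1.20)."""
    beta, beta_scal, b = np.sqrt(1/n), 1/n, 0.35
    delta_new = delta * xi_sel**beta
    delta_scal_new = delta_scal * (xi_scal_sel + b)**beta_scal
    return delta_new, delta_scal_new

rng = np.random.default_rng(3)
n = 4
x, delta, delta_scal = np.zeros(n), 1.0, np.ones(n)
x1, xi, xi_scal = dr1_generate(x, delta, delta_scal, rng)
delta, delta_scal = dr1_adapt(delta, delta_scal, xi, xi_scal, n)
```

The key point the sketch makes explicit: the step-sizes are updated from the realized factors of the *selected* offspring, not by an independent random variation.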
DR2
The second derandomized ES variant [27] aimed to accumulate information about the correlation or anti-correlation of past mutation vectors in order to adapt the global step-size as well as the individual step-sizes, by introducing a quasi-memory vector. This accumulated information allowed omitting the stochastic element in the adaptation of the strategy parameters, updating them only by means of successful variations, rather than with random steps. The mutation step for the $k^{th}$ individual, $k = 1, \ldots, \lambda$, reads:
$$\vec{x}^{(g+1)} = \vec{x}^{(g)} + \delta^{(g)}\, \vec{\delta}_{scal}^{(g)}\, \vec{z}_k, \qquad \vec{z}_k \sim \vec{\mathcal{N}}(0, 1) \qquad (1.21)$$
Introducing a quasi-memory vector $\vec{Z}$:
$$\vec{Z}^{(g)} = c\,\vec{z}_{sel} + (1 - c)\,\vec{Z}^{(g-1)} \qquad (1.22)$$
The adaptation of the strategy parameters according to the selected offspring:
$$\delta^{(g+1)} = \delta^{(g)} \cdot \left(\exp\left(\frac{\left\|\vec{Z}^{(g)}\right\|}{\sqrt{n}\,\sqrt{\frac{c}{2-c}}} - 1 + \frac{1}{5n}\right)\right)^{\beta} \qquad (1.23)$$
$$\vec{\delta}_{scal}^{(g+1)} = \vec{\delta}_{scal}^{(g)} \cdot \left(\frac{\left|\vec{Z}^{(g)}\right|}{\sqrt{\frac{c}{2-c}}} + b\right)^{\beta_{scal}}, \qquad \left|\vec{Z}^{(g)}\right| = \left(|Z_1^{(g)}|, |Z_2^{(g)}|, \ldots, |Z_n^{(g)}|\right) \qquad (1.24)$$
with $\beta = \sqrt{1/n}$, $\beta_{scal} = 1/n$, $b = 0.35$, and the quasi-memory rate $c = \sqrt{1/n}$ as constants. Note that the multiplication in Eq. 1.24 is between two vectors and carried out as element-by-element multiplication, yielding a vector of the same dimension $n$.
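The accumulation of Eq. 1.22 and the derandomized updates of Eqs. 1.23-1.24 can be sketched as follows (again an illustrative translation with our own variable names):

```python
import numpy as np

def dr2_adapt(delta, delta_scal, Z, z_sel, n):
    """DR2 strategy-parameter update from the selected mutation vector z_sel.

    Z is the quasi-memory (accumulation) vector of Eq. 1.22.
    """
    beta, beta_scal, b, c = np.sqrt(1/n), 1/n, 0.35, np.sqrt(1/n)
    Z = c * z_sel + (1 - c) * Z                      # Eq. 1.22: accumulate
    norm_factor = np.sqrt(c / (2 - c))               # normalization constant
    delta_new = delta * np.exp(
        np.linalg.norm(Z) / (np.sqrt(n) * norm_factor) - 1 + 1 / (5 * n)
    )**beta                                          # Eq. 1.23: global step-size
    delta_scal_new = delta_scal * (
        np.abs(Z) / norm_factor + b
    )**beta_scal                                     # Eq. 1.24: individual sizes
    return delta_new, delta_scal_new, Z

rng = np.random.default_rng(4)
n = 4
delta, delta_scal, Z = 1.0, np.ones(n), np.zeros(n)
z_sel = rng.standard_normal(n)   # the selected offspring's mutation vector
delta, delta_scal, Z = dr2_adapt(delta, delta_scal, Z, z_sel, n)
```

Correlated successive mutation vectors make $\|\vec{Z}\|$ grow, enlarging the step-sizes; anti-correlated ones cancel in the memory and shrink them.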
DR3
This third variant [28], usually referred to as the Generation Set Adaptation (GSA), considered the derandomization of arbitrary normal mutations for the first time, aiming to achieve invariance with respect to the scaling of variables and the rotation of the coordinate system. This naturally came at the cost of a quasi-memory matrix, $B \in \mathbb{R}^{m \times n}$, setting the dimension of the strategy parameter space to $n^2 \leq m \leq 2n^2$. The adaptation of the global step-size is mutative with stochastic variations, just like in the DR1. The mutation step is formulated for the $k^{th}$ individual, $k = 1, \ldots, \lambda$:
$$\vec{x}^{(g+1)} = \vec{x}^{(g)} + \delta^{(g)} \xi_k \vec{y}_k \qquad (1.25)$$
$$\vec{y}_k = c_m B^{(g)} \cdot \vec{z}_k, \qquad \vec{z}_k \sim \vec{\mathcal{N}}(0, 1) \qquad (1.26)$$
The update of the memory matrix is formulated as:
$$B^{(g)} = \left(\vec{b}_1^{(g)}, \ldots, \vec{b}_m^{(g)}\right), \qquad \vec{b}_1^{(g+1)} = (1 - c) \cdot \vec{b}_1^{(g)} + c \cdot \left(c_u \xi_{sel} \vec{y}_{sel}\right), \qquad \vec{b}_{i+1}^{(g+1)} = \vec{b}_i^{(g)} \qquad (1.27)$$
The step-size is updated as follows:
$$\delta^{(g+1)} = \delta^{(g)} (\xi_{sel})^{\beta} \qquad (1.28)$$
where $P\left(\xi_k = \frac{3}{2}\right) = P\left(\xi_k = \frac{2}{3}\right) = \frac{1}{2}$; $\beta = \sqrt{1/n}$, $c_m = (1/\sqrt{m})(1 + 1/m)$, $c = \sqrt{1/n}$, $\xi_k \in \left\{\frac{3}{2}, \frac{2}{3}\right\}$, and $c_u = \sqrt{(2-c)/c}$ are constants.

1.4 The Covariance Matrix Adaptation ES
Following a series of successful derandomized ES variants addressing the first level of derandomization, and a continuous effort at the Technical University of Berlin, the so-called Covariance Matrix Adaptation (CMA) Evolution Strategy was released in 1996 [29], as a completely derandomized Evolution Strategy, the fourth generation of derandomized ES variants.

Second Level of Derandomization  The so-called second level of derandomization targeted the following effects:

• The probability to regenerate the same mutation step is increased.

• The change rate of the strategy parameters is subject to explicit control.

• Strategy parameters are stationary when subject to random selection.

The second level of derandomization was implemented by means of the CMA.

The CMA combines the robust mechanism of ES with powerful statistical learning principles, and thus it is sometimes subject to informal criticism for not being a genuine Evolution Strategy. In short, it aims at satisfying the maximum likelihood principle by applying Principal Components Analysis (PCA) to the successful mutations, and it uses cumulative global step-size adaptation.
1.4.1 Preliminary

One of the goals of the CMA is to achieve a successful statistical learning process of the optimal mutation distribution, which is equivalent to learning a covariance matrix proportional to the inverse of the Hessian matrix (see, e.g., [30]), without calculating the actual derivatives: