OPPOSITE-CENTER LEARNING AND ITS APPLICATION TO

DIFFERENTIAL EVOLUTION

By Hongpei Xu

Thesis submitted to

Graduate School of Informatics University of Amsterdam

In partial fulfillment of the requirements for the degree of

Master of Computational Science

Supervisor: Dr. Valeria Krzhizhanovskaya
Scientific adviser: Dr. Christiaan Erdbrink

Amsterdam 2017


STATEMENT OF ORIGINALITY

This document is written by Hongpei Xu, who declares to take full responsibility for its contents.

I declare that the text and the work presented in this document are original and that no sources other than those mentioned in the text and its references have been used in creating it.


ACKNOWLEDGEMENTS

My sincere gratitude goes to my supervisor Dr. Valeria Vladimirovna Krzhizhanovskaya for guiding me in this research.

I would like to thank my scientific adviser Dr. Christiaan Erdbrink for inspiring me to carry out this work and for his help with the experiment design and thesis writing.

I would like to extend my appreciation to my teachers for sharing their knowledge and skills with me, without which I would not have been able to perform this research.


ABSTRACT

Keywords: optimization speed-up; meta-heuristics; Opposite-Center Learning; evolutionary algorithms; continuous optimization; differential evolution; Opposition-Based Learning

Optimization methods are widely used in computational models to improve outcomes or to tune model parameters, which often benefits real-life applications. This thesis introduces a new sampling technique called Opposite-Center Learning (OCL), intended to speed up the convergence of meta-heuristic optimization algorithms. The simple version of OCL, 1-1 OCL, is an extension of Opposition-Based Learning (OBL), a simple scheme that manages to boost numerous optimization methods by considering the opposite points of candidate solutions. In contrast to OBL, 1-1 OCL has a theoretical foundation: the opposite-center point is defined as the optimal choice in pair-wise sampling of the search space given a random starting point. A concise analytical background is provided. Building on 1-1 OCL, m-n OCL is developed so that OCL can generate n points from m known points and guarantee their optimality in the sense that the m known points together with the n points generated by the m-n OCL scheme have a shorter expected distance to an arbitrarily distributed global optimum. Computationally, both the opposite-center point in 1-1 OCL and the opposite-center points in m-n OCL are approximated by a lightweight Monte Carlo scheme for arbitrary dimension. Empirical results up to dimension 20 confirm that 1-1 OCL outperforms OBL and random sampling: the points generated by OCL have shorter expected distances to a uniformly distributed global optimum, while m-n OCL does even better. To further test their practical performance, both 1-1 OCL and m-n OCL are applied to differential evolution (DE). The resulting schemes for continuous optimization, named Opposite-Center DE (OCDE) and m-n Opposite-Center DE (MNOCDE), employ OCL for population initialization and generation jumping. Numerical experiments on a set of benchmark functions for dimensions 10, 30, 50 and 100 reveal that OCDE and MNOCDE on average improve the convergence rates compared to the original DE and the Opposition-based DE (ODE), respectively, while remaining fully robust. Most promising is the observation that the accelerations shown by OCDE and OCL increase with problem dimensionality.


TABLE OF CONTENTS

ACKNOWLEDGEMENTS

ABSTRACT

INTRODUCTION

1 Background on Opposition-based Learning and Differential Evolution
1.1 Background on Opposition-based Learning
1.2 Background on Differential Evolution

2 Opposite-Center Learning Methodology
2.1 General definition and one-dimensional case
2.2 Computational scheme for higher dimensions
2.3 Proof of validity of OCL scheme
2.4 Definition of m-n OCL
2.5 Computational scheme of m-n OCL scheme
2.6 Proof of validity of m-n OCL scheme

3 Numerical experiments of OCL scheme: sampling
3.1 1-1 OCL Verification in one dimension
3.2 1-1 OCL Verification in higher dimensions
3.3 m-n OCL scheme Verification

4 Application to population-based heuristics
4.1 Population initialization
4.2 Generation jumping
4.3 Opposite-Center Differential Evolution (OCDE)
4.4 Investigation of Jumping Rate and Exploration Rate of OCDE

5 OCDE experiments
5.1 Set-up
5.2 1-1 OCDE Performance results of the experiments
5.3 m-n OCDE Performance results of the experiments

CONCLUSIONS

APPENDIX

LIST OF ABBREVIATIONS

BIBLIOGRAPHY


INTRODUCTION

Optimization methods are widely used in computational models to improve outcomes or to tune model parameters, which often benefits real-life applications in areas such as forecasting, simulation and design.

This thesis presents an improvement of Opposition-Based Learning (OBL) aimed at boosting the efficiency of meta-heuristic optimization algorithms. The central idea of OBL is to consider not only the candidate solutions generated by a stochastic iteration scheme (Tizhoosh, 2005), but also their "opposite solutions" found in the opposite regions of the search space. The OBL method has been used in many computational intelligence areas such as Differential Evolution (Rahnamayan, 2006), Harmony Search (Qin, 2011), Artificial Neural Networks (Shokri, 2006) and Particle Swarm Optimization (Dhahri, 2010). In optimization problems, the strategy of simultaneously examining a candidate and its opposite solution has the purpose of accelerating the convergence rate towards a globally optimal solution.

To date little effort has been put into developing the theoretical background of OBL (Xu, 2014). It was proven in (Rahnamayan, 2012) that including opposite solutions on average gives shorter expected distances to the global optimum compared to randomly sampled solution pairs. However, the obvious questions whether the opposite point as defined by OBL can be improved or whether it is theoretically the best choice were never posed.

This thesis addresses these questions by redefining the opposite point such that the expected distance of the pair consisting of the original candidate and the opposite point to the global optimum is in fact minimized. This results in a new method named Opposite-Center Learning (OCL). This thesis provides theoretical and empirical evidence of the positive effects of OCL for continuous optimization problems, since this has been the focus of most OBL applications.


This thesis is organized as follows. Chapter 1 introduces the background knowledge of OBL and differential evolution (DE). Chapter 2 defines the 1-1 OCL and m-n OCL schemes, analyzes them theoretically and gives the computational recipe. In Chapter 3 a performance comparison between OCL and OBL is presented up to dimension 20. Next, Chapter 4 shows how OCL can be implemented to solve continuous optimization problems by applying it to DE, thus establishing the new algorithms Opposite-Center Differential Evolution (OCDE) and m-n Opposite-Center Differential Evolution (MNOCDE). Because new parameters are introduced, their discussion is also included in Chapter 4. The experimental results of testing OCDE and MNOCDE against ODE and the original DE on a set of benchmark functions are given in Chapter 5. Finally, the Conclusions contain concluding remarks and an outlook on possible future work.


1 Background on Opposition-based Learning and Differential Evolution

1.1 Background on Opposition-based Learning

In the area of machine intelligence, many algorithms are inspired by beings in nature, such as the Genetic Algorithm (GA), Particle Swarm Optimization (PSO) (James, 1995) and Artificial Neural Networks (ANN). These algorithms share one common aim: optimization. GA and PSO are optimization methods themselves, and in ANN optimization runs through the whole algorithm.

Opposition-based learning was inspired by opposites in real life; the main idea is to accelerate the convergence of an optimization algorithm by simultaneously considering both the original point and the opposite point (Tizhoosh, 2005).

Definition of opposite point according to OBL: Let 𝒑0 = (𝑥01, 𝑥02, ⋯ , 𝑥0𝐷) ∈ ℝ𝐷 be the starting point, with 𝑥0𝑖 ∈ [𝑎𝑖, 𝑏𝑖] ⊂ ℝ, ∀𝑖 ∈ {1, 2, ⋯ , 𝐷}, where D is the problem dimensionality. Then all coordinates 𝑝𝑂𝐵𝐿,𝑖 of the opposite point 𝒑𝑂𝐵𝐿 are defined by

𝑝𝑂𝐵𝐿,𝑖 = 𝑎𝑖 + 𝑏𝑖 − 𝑥0𝑖. (1-1)
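As a concrete illustration of Eq. (1-1), a minimal Python sketch of the component-wise opposite point might look as follows; the helper name obl_opposite and the example box bounds are illustrative assumptions, not part of the original work.

```python
import numpy as np

def obl_opposite(p0, a, b):
    """Opposite point of Eq. (1-1): component-wise a_i + b_i - x_0i."""
    p0, a, b = map(np.asarray, (p0, a, b))
    return a + b - p0

# Example: a 3-D starting point in the box [0,1] x [-5,5] x [2,4]
p0 = np.array([0.25, 3.0, 2.5])
a = np.array([0.0, -5.0, 2.0])
b = np.array([1.0, 5.0, 4.0])
print(obl_opposite(p0, a, b))   # -> [ 0.75 -3.    3.5 ]
```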

The main idea of this method is that, to find a solution closer to the global optimum given one known solution, the opposite point is a better candidate than a random one. Thus, simultaneously considering the current estimate and its opposite accelerates the convergence in that case. One important theoretical result in the OBL area is the central opposition theorem, which was proved in (Rahnamayan, 2008) and revised in (Ventresca, 2010).

Central Opposition Theorem (Ventresca, 2010): Let 𝒑0 ∈ ℝ𝐷 be a randomly initialized point and 𝒑𝑠 ∈ ℝ𝐷 be the solution point. 𝒑𝑂𝐵𝐿 is the opposite point of 𝒑0 and 𝒑𝑟𝑎𝑛𝑑 is a uniformly random point within the range. Then

$$P(\|\mathbf{p}_{OBL} - \mathbf{p}_s\| < \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{rand} - \mathbf{p}_s\|\})$$


Here 𝑃(∙) means the probability. In other words, the opposite point is more likely to be closer to the solution than the random point.

The OBL scheme computes both points and subsequently selects the point with the best fitness value and disposes of the other point. For a minimization problem with a fitness function F, therefore, we compute min{𝐹(𝒑0), 𝐹(𝒑𝑂𝐵𝐿)} and the corresponding solution (i.e. the starting point 𝒑0 or the opposite point 𝒑𝑂𝐵𝐿) is returned as output.

The advantage of OBL is derived from the fact that the points generated by OBL have a shorter expected distance towards the global minimum than randomly generated ones. This leads to the question whether the expected distance can be further reduced. This study gives an affirmative answer.

After the original OBL was invented, several new variations of OBL were proposed. Quasi-Opposition-Based Learning (QOBL), which uses a uniformly random point between the center point and the opposite point rather than the opposite point itself, was applied in differential evolution (Rahnamayan, 2007). Quasi-Reflection Opposition-Based Learning, which uses a uniformly random point between the center point and the starting point rather than the opposite point itself, was applied in Biogeography-Based Optimization (Ergezer, 2009). Center-based Sampling was proposed in (Rahnamayan, 2009), and Generalized Opposition-Based Learning was used in Space Transformation Search (Wang, 2009).

While OBL has been applied in so many areas, its theoretical analysis seems to have received too little attention: it has been proved better than random sampling in terms of the expected minimal distance to the global optimum (Rahnamayan, 2012), but it has not been shown to be the best possible scheme. This thesis therefore starts from a theoretical analysis of OBL. First, the evaluation function of an opposite-point scheme is formalized; through further analysis, a new scheme is then derived which gives the "best" point under this evaluation function.


1.2 Background on Differential Evolution

Differential evolution (DE) (Storn, 1997) is a powerful evolutionary algorithm (EA) for solving complex global optimization problems. It has abundantly been demonstrated to be efficient and robust, and has been applied to diverse fields such as system identification (Erdbrink, 2014) and antenna design (Goudos, 2011).

Similar to other evolutionary algorithms, DE contains four steps: initialization, mutation, recombination and selection. In this thesis, we use the DE/rand/1/bin strategy, which is one of the most commonly used strategies in DE and can be described as follows.

Figure 1.1 Four steps in DE optimization

Initialization: This strategy begins by uniformly randomly initializing 𝑁𝑝 individuals 𝒙𝑖,𝐺 ∈ ℝ𝐷, 𝑖 = 1, 2, ⋯ , 𝑁𝑝, in 𝐷-dimensional space, where G denotes the generation number and initially 𝐺 = 1. The population size 𝑁𝑝 does not change during the optimization process.

Mutation: For each individual 𝒙𝑖,𝐺, 𝑖 = 1, 2, ⋯ , 𝑁𝑝, a mutant vector is generated:

$$\mathbf{v}_{i,G+1} = \mathbf{x}_{r_1,G} + F \cdot (\mathbf{x}_{r_2,G} - \mathbf{x}_{r_3,G}), \tag{1-3}$$

where 𝑟1, 𝑟2, 𝑟3 ∈ {1, 2, ⋯ , 𝑁𝑝} are mutually different indexes randomly chosen from the population and different from 𝑖. The mutation factor 𝐹 is a positive real constant, often less than 1.

Recombination (crossover): The following scheme is used to generate the trial vector

$$\mathbf{u}_{i,G+1} = (u_{1i,G+1}, u_{2i,G+1}, \dots, u_{Di,G+1}), \tag{1-4}$$

where

$$u_{ji,G+1} = \begin{cases} v_{ji,G+1} & \text{if } rand(j) \le CR \ \text{or}\ j = randn(i) \\ x_{ji,G} & \text{otherwise} \end{cases} \qquad j = 1, 2, \dots, D. \tag{1-5}$$

Here 𝑟𝑎𝑛𝑑(𝑗) ∈ [0,1] is the 𝑗th uniformly random number. The crossover rate 𝐶𝑅 is a constant that has to be chosen by the user. The random index 𝑟𝑎𝑛𝑑𝑛(𝑖) ∈ {1, 2, ⋯ , 𝐷} is a random integer which guarantees that at least one element is altered.

Selection: Greedy selection is used:

$$\mathbf{x}_{i,G+1} = \begin{cases} \mathbf{u}_{i,G+1} & \text{if } f(\mathbf{u}_{i,G+1}) < f(\mathbf{x}_{i,G}) \\ \mathbf{x}_{i,G} & \text{otherwise} \end{cases} \qquad i = 1, 2, \dots, N_p. \tag{1-6}$$

In the selection step, for each individual, the trial vector 𝒖𝑖,𝐺+1 is chosen when it yields a better fitness value (the example shows a minimization problem). Otherwise the individual retains its current value.

The control parameters 𝑁𝑝, 𝐹 and 𝐶𝑅 are determined by the user and usually do not change over the optimization process. A reasonable choice of the population size 𝑁𝑝 is between 5 ∙ 𝐷 and 10 ∙ 𝐷 according to Storn and Price (Storn, 1997). For the mutation factor 𝐹, 0.5 is a good initial choice and values between 0.4 and 1 are reasonable. The crossover rate 𝐶𝑅 controls the number of elements changed in the crossover step. A low value of 𝐶𝑅 makes the search directions tend to be orthogonal to the axes, while a high value of 𝐶𝑅 makes the optimization rotationally invariant (Das, 2011).
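For readers who prefer code, a minimal Python sketch of the DE/rand/1/bin strategy described above is given below; it mirrors Eqs. (1-3)–(1-6) and uses the common parameter values quoted in this section. The function name de_rand_1_bin is hypothetical, and bound handling and more elaborate stopping criteria are deliberately omitted.

```python
import numpy as np

def de_rand_1_bin(f, a, b, Np=50, F=0.5, CR=0.9, max_gen=200, rng=None):
    """Minimal DE/rand/1/bin sketch for minimizing f over the box [a, b]."""
    rng = np.random.default_rng(rng)
    a, b = np.asarray(a, float), np.asarray(b, float)
    D = len(a)
    pop = rng.uniform(a, b, size=(Np, D))              # initialization
    fit = np.array([f(x) for x in pop])
    for _ in range(max_gen):
        for i in range(Np):
            r1, r2, r3 = rng.choice([j for j in range(Np) if j != i], 3, replace=False)
            v = pop[r1] + F * (pop[r2] - pop[r3])      # mutation, Eq. (1-3)
            jrand = rng.integers(D)
            cross = (rng.random(D) <= CR) | (np.arange(D) == jrand)
            u = np.where(cross, v, pop[i])             # binomial crossover, Eq. (1-5)
            fu = f(u)
            if fu < fit[i]:                            # greedy selection, Eq. (1-6)
                pop[i], fit[i] = u, fu
    k = np.argmin(fit)
    return pop[k], fit[k]

# Usage example: 5-D sphere function on [-5, 5]^5
best_x, best_f = de_rand_1_bin(lambda x: np.sum(x**2), a=np.full(5, -5.0), b=np.full(5, 5.0))
```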

Over the years, DE, as a method of computational intelligence, has been applied to computational models in diverse fields. In industrial design DE was used for IIR filter design (Das, 2006), robust design of a gas circuit breaker (Kim, 2007), control system design (Nobakhti, 2008) and antenna design (Goudos, 2011). DE has also been applied to model training, for example of general regression neural networks (Masters, 1997) and wavelet neural networks (Chauhan, 2009), to computational model parameter estimation such as estimating kinetic model parameters (Wang, 2001), to pattern synthesis problems (Chen, 2008), and to system identification (Yousefi, 2008) (Erdbrink, 2014).

Opposition-based differential evolution (ODE) was first introduced by Rahnamayan in 2006 (Rahnamayan, 2006). It applies OBL in the population initialization and between the generations, see Figure 1.2. The result is promising: ODE achieves a 13% convergence rate improvement without losing success rate on the chosen benchmark functions.

Figure 1.2 The process of Opposition-based Differential Evolution

DE is chosen as the "base" optimization method in this thesis for the following reasons: 1. DE is a simple yet powerful optimization algorithm and is easy to implement; 2. this thesis aims to provide an improved version of OBL, which was originally demonstrated on DE to prove its ability to accelerate the optimization process; 3. DE has a wide range of applications.



2 Opposite-Center Learning Methodology

2.1 General definition and one-dimensional case

This study deals with solving continuous global optimization problems. It is assumed here that a unique globally optimal solution exists. Let us start by giving the definition of the opposite point as defined by OBL (Rahnamayan, 2008).

Definition of opposite point according to OBL: Let 𝒑0 = (𝑥01, 𝑥02, ⋯ , 𝑥0𝐷) ∈ ℝ𝐷 be the starting point, with 𝑥0𝑖 ∈ [𝑎𝑖, 𝑏𝑖] ⊂ ℝ, ∀𝑖 ∈ {1, 2, ⋯ , 𝐷}, where D is the problem dimensionality. Then all coordinates 𝑝𝑂𝐵𝐿,𝑖 of the opposite point 𝒑𝑂𝐵𝐿 are defined by

𝑝𝑂𝐵𝐿,𝑖 = 𝑎𝑖 + 𝑏𝑖 − 𝑥0𝑖. (2-1)

The OBL scheme computes both points and subsequently selects the point with the best fitness value and disposes of the other point. For a minimization problem with a fitness function F, therefore, we compute min{𝐹(𝒑0), 𝐹(𝒑𝑂𝐵𝐿)} and the corresponding solution (i.e. the starting point 𝒑0 or the opposite point 𝒑𝑂𝐵𝐿) is returned as output.

The advantage of OBL is derived from the fact that the points generated by OBL have a shorter expected distance towards the global minimum than randomly generated ones. This leads to the question whether the expected distance can be further reduced. This study gives an affirmative answer.

We start by defining an evaluation function that allows analytical assessment of the candidate points.

Definition of evaluation function 𝑔(𝒑): Let 𝒑0 ∈ ℝ𝐷 be a randomly initialized point and 𝒑𝑠 ∈ ℝ𝐷 be the global optimum. The evaluation function 𝑔(𝒑) of candidate point 𝒑 ∈ ℝ𝐷 is defined as

𝑔(𝒑) = 𝔼(min{‖𝒑0− 𝒑𝑠‖, ‖𝒑 − 𝒑𝑠‖}) (2-2)

or equivalently as

$$g(\mathbf{p}) = \int \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-3}$$

Here 𝑓(𝒑𝑠) is the supposed probability distribution function of 𝒑𝑠 and $\|\cdot\|$ is a suitable distance metric (in this thesis we consider Euclidean and squared distance, but other choices are allowed). This evaluation function measures the expected norm between the optimal point and the candidate point nearest to it. By the logic of this function 𝑔(𝒑), it has been proved that OBL performs better than random sampling under the Euclidean norm if the global optimum is distributed uniformly (Rahnamayan, 2008). Here, the focal point of our efforts is on finding $\check{\mathbf{p}}$ such that

$$\check{\mathbf{p}} = \arg\min_{\mathbf{p}} g(\mathbf{p}). \tag{2-4}$$

Definition of opposite-center point $\mathbf{p}_{OC}$: Let $\mathbf{p}_0 = (x_{01}, x_{02}, \dots, x_{0D}) \in \mathbb{R}^D$ be the starting point, with $x_{01}, x_{02}, \dots, x_{0D} \in \mathbb{R}$, and let $\mathbf{p}_s = (x_{s1}, x_{s2}, \dots, x_{sD})$ be the globally optimal point, with $x_{s1}, x_{s2}, \dots, x_{sD} \in \mathbb{R}$. Then the opposite-center point is defined by

$$\mathbf{p}_{OC} = \arg\min_{\mathbf{p}} \int_{\mathbf{p}_s \in \mathbb{T}} \|\mathbf{p} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s, \tag{2-5}$$

where

$$\mathbb{T} = \{\mathbf{p}_s : \|\mathbf{p}_0 - \mathbf{p}_s\| > \|\mathbf{p} - \mathbf{p}_s\|\}. \tag{2-6}$$

Applying formula 2-5 to a one-dimensional problem gives an analytical solution:

$$\mathbf{p}_{OC} = \begin{cases} \dfrac{\mathbf{p}_0}{3} + \dfrac{2b}{3}, & \mathbf{p}_0 \in \left[a, \dfrac{a+b}{2}\right] \\[2mm] \dfrac{\mathbf{p}_0}{3} + \dfrac{2a}{3}, & \mathbf{p}_0 \in \left[\dfrac{a+b}{2}, b\right] \end{cases} \tag{2-7}$$

To illustrate definitions (2-2) and (2-5) and solution (2-7), Figure 2.1 sketches the geometric construction of the opposite point in OBL and the opposite-center point in OCL for dimension one and 𝑓(𝒑𝑠) distributed uniformly.


Figure 2.1 Sketched definition of the opposite-center point 𝒑𝑶𝑪 in one dimension. The opposition-based point 𝒑𝑶𝑩𝑳 and the starting point 𝒑𝟎 are also indicated.

If we consider the supposed global optimum distribution 𝑓(𝒑𝑠) as the density of the search space, the opposite-center point can be described as the weighted center of the region where all points have shorter distances to it than to the starting point. Different metrics correspond to different kinds of center. For example, if we take the Euclidean norm, then the geometric median is computed. The squared distance measure corresponds to the center of mass (centroid). The selection of norm or metric will be discussed in the next subsection.
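To make the evaluation function tangible, the short Python sketch below estimates g(p) of Eqs. (2-2)/(2-3) by Monte Carlo for a uniform f(p_s) under the Euclidean norm; the helper name g_hat and the sample sizes are illustrative assumptions, not part of the thesis code.

```python
import numpy as np

def g_hat(p, p0, a, b, n_mc=100_000, rng=None):
    """Monte Carlo estimate of g(p) from Eqs. (2-2)/(2-3) for a uniform f(p_s)
    on the box [a, b], using the Euclidean norm."""
    rng = np.random.default_rng(rng)
    p, p0, a, b = map(np.asarray, (p, p0, a, b))
    ps = rng.uniform(a, b, size=(n_mc, len(a)))        # samples of the optimum p_s
    d0 = np.linalg.norm(ps - p0, axis=1)
    dp = np.linalg.norm(ps - p, axis=1)
    return np.minimum(d0, dp).mean()

# Example in 1-D on [0, 1]: for p0 = 0.2, Eq. (2-7) gives the OC point 0.2/3 + 2/3 ~ 0.733,
# whose estimated g value is lower than that of the OBL opposite point 0.8.
print(g_hat([0.2/3 + 2/3], [0.2], [0.0], [1.0], rng=0))
print(g_hat([0.8], [0.2], [0.0], [1.0], rng=0))
```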

2.2 Computational scheme for higher dimensions

There is no straightforward way of finding the opposite-center point deterministically for arbitrary dimension. Therefore, the iterative scheme shown in Figure 2.2 is proposed to approximate the opposite-center point. This algorithm is inspired by the famous k-means algorithm (MacQueen, 1967).


Figure 2.2: Flow chart of the Opposite-Center Learning (OCL) scheme.

In Step 4, function 𝑓(𝒑𝑠) is the supposed distribution of the optimum (the objective function). In case there is no information about the distribution, a uniform distribution can be assumed over the search space.

In simple cases (1D or 2D) the Euclidean norm is a suitable choice for the OCL scheme. Unfortunately, however, the region center for the Euclidean norm (the geometric median) is hard to compute in higher dimensions, even in the simplest case where 𝑓(𝒑𝑠) is a uniform distribution. In contrast, the center of mass (centroid), which corresponds to the squared distance, is computationally cheap and also suitable for Riemannian manifolds, where 𝑓(𝒑𝑠) does not have to be uniform. Another advantage of the squared distance measure is that it preserves the ordering of the Euclidean norm, so that the point closer in Euclidean norm is also closer in squared distance. Therefore, in this thesis the squared Euclidean distance is chosen for the evaluation function and the scheme. Despite the simplicity of this measure, though, the computation of the centroid may still be expensive in high-dimensional cases. Therefore, Step 3 of the scheme (Figure 2.2) is performed with the Monte Carlo (MC) method to find an approximate centroid.


Experiments shown in Figure 2.3 indicate that the standard deviation of MC-simulated centroids is 𝜎 ≈ 0.3/√𝑁 in dimensions 5, 10, 20 and 30, with N the number of MC trials. It is shown that one thousand MC trials generally suffice to attain an error less than 1% (marked by a green dashed line in Figure 2.3) in any dimension up to 30.

Figure 2.3: Monte Carlo experiments used in Step 3 in dimensions 5, 10, 20, 30.

For the termination criterion (Step 5 in Figure 2.2) another experiment was done to investigate how the number of iterations affects the locations of the OCL points. The iterative scheme in Figure 2.2 produces a sequence of points that converges to the opposite-center point in the limit 𝑖 → ∞. Preliminary tests suggest that after 3 iterations the distance between successive iterates is less than the standard deviation, so the scheme can be considered converged. Therefore the termination condition is set to 𝑖 = 4 in the scheme shown in Figure 2.2.
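A compact Python sketch of this iterative scheme (Figure 2.2), under the choices just described (uniform f(p_s), squared distance, the OBL opposite point as initial guess, about one thousand MC trials and termination after four iterations), could look as follows; the function name ocl_point is a hypothetical label, not code from the thesis.

```python
import numpy as np

def ocl_point(p0, a, b, n_mc=1000, n_iter=4, rng=None):
    """Monte Carlo approximation of the 1-1 opposite-center point, assuming a
    uniform f(p_s) on the box [a, b] and the squared-distance measure."""
    rng = np.random.default_rng(rng)
    p0, a, b = map(np.asarray, (p0, a, b))
    p = a + b - p0                                   # initial guess: the OBL opposite point
    for _ in range(n_iter):
        samples = rng.uniform(a, b, size=(n_mc, len(a)))
        # region T_i of Eq. (2-6): samples closer to the current point than to p0
        closer = np.sum((samples - p) ** 2, axis=1) < np.sum((samples - p0) ** 2, axis=1)
        if closer.any():
            p = samples[closer].mean(axis=0)         # approximate centroid of T_i
    return p

# Example in 2-D on the unit square
p_oc = ocl_point([0.2, 0.7], [0.0, 0.0], [1.0, 1.0], rng=0)
```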

To further explain the OCL scheme, Figure 2.4 illustrates the main working of the scheme in two dimensions (same steps as in Figure 2.2).


Figure 2.4: Illustration of OCL scheme in two dimensions.

2.3 Proof of validity of OCL scheme

OCL Convergence Theorem (Convergence of the scheme): By the iterative scheme shown in Figure 2.2, the sequence of points 𝒑𝑖 converges to an opposite-center point.

Proof:

First, we prove the inequality $g(\mathbf{p}_{i+1}) \le g(\mathbf{p}_i)$:

$$g(\mathbf{p}_{i+1}) = \int \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{i+1} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-8}$$

$$= \int_{\mathbf{p}_s \notin \mathbb{T}_i} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{i+1} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-9}$$

$$\quad + \int_{\mathbf{p}_s \in \mathbb{T}_i} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{i+1} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-10}$$

$$\le \int_{\mathbf{p}_s \notin \mathbb{T}_i} \|\mathbf{p}_0 - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s + \int_{\mathbf{p}_s \in \mathbb{T}_i} \|\mathbf{p}_{i+1} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-11}$$

Recall that

$$\mathbb{T}_i = \{\mathbf{p}_s : \|\mathbf{p}_0 - \mathbf{p}_s\| > \|\mathbf{p}_i - \mathbf{p}_s\|\}. \tag{2-12}$$

Since $\mathbf{p}_{i+1} = \arg\min_{\mathbf{p}} \int_{\mathbf{p}_s \in \mathbb{T}_i} \|\mathbf{p} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s$, we have

$$g(\mathbf{p}_{i+1}) \le \int_{\mathbf{p}_s \notin \mathbb{T}_i} \|\mathbf{p}_0 - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s + \int_{\mathbf{p}_s \in \mathbb{T}_i} \|\mathbf{p}_i - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s = g(\mathbf{p}_i). \tag{2-13}$$

The equality holds if and only if 𝒑𝑖 is an opposite-center point. The evaluation value 𝑔(𝒑𝑖) converges since 𝑔(𝒑) is obviously bounded from below (both the norm function and the probability density function are non-negative). Since 𝑔(𝒑𝑖+1) = 𝑔(𝒑𝑖) if and only if 𝒑𝑖 is an opposite-center point, the sequence of points 𝒑𝑖 converges to an opposite-center point, which proves the theorem.

Since our goal is to find $\check{\mathbf{p}}$ such that $\check{\mathbf{p}} = \arg\min_{\mathbf{p}} g(\mathbf{p})$, the theorem below is given. It illustrates the relationship between the best choice of point and the opposite-center point in the sense of the evaluation function.

Theorem: If $\check{\mathbf{p}} = \arg\min_{\mathbf{p}} g(\mathbf{p})$, then $\check{\mathbf{p}}$ is an opposite-center point of $\mathbf{p}_0$.

Proof:

$$g(\check{\mathbf{p}}) = \int \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\check{\mathbf{p}} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-14}$$

$$= \int_{\mathbf{p}_s \notin \mathbb{T}} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\check{\mathbf{p}} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-15}$$

$$\quad + \int_{\mathbf{p}_s \in \mathbb{T}} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\check{\mathbf{p}} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-16}$$

Suppose

$$\mathbb{T} = \{\mathbf{p}_s : \|\mathbf{p}_s - \check{\mathbf{p}}\| < \|\mathbf{p}_s - \mathbf{p}_0\|\}. \tag{2-17}$$

Then

$$g(\check{\mathbf{p}}) = \int_{\mathbf{p}_s \notin \mathbb{T}} \|\mathbf{p}_0 - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s + \int_{\mathbf{p}_s \in \mathbb{T}} \|\check{\mathbf{p}} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-18}$$

In a proof by contradiction, suppose

$$\exists\, \mathbf{p}_{OC} \ne \check{\mathbf{p}} \ \text{ s.t. } \ \mathbf{p}_{OC} = \arg\min_{\mathbf{p}} \int_{\mathbf{p}_s \in \mathbb{T}} \|\mathbf{p} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-19}$$

Then

$$g(\mathbf{p}_{OC}) = \int_{\mathbf{p}_s \notin \mathbb{T}} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{OC} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-20}$$

$$\quad + \int_{\mathbf{p}_s \in \mathbb{T}} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{OC} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-21}$$

Since

$$\int_{\mathbf{p}_s \notin \mathbb{T}} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{OC} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \le \int_{\mathbf{p}_s \notin \mathbb{T}} \|\mathbf{p}_0 - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s, \tag{2-22}$$

$$\int_{\mathbf{p}_s \in \mathbb{T}} \min\{\|\mathbf{p}_0 - \mathbf{p}_s\|, \|\mathbf{p}_{OC} - \mathbf{p}_s\|\}\, f(\mathbf{p}_s)\, d\mathbf{p}_s \le \int_{\mathbf{p}_s \in \mathbb{T}} \|\mathbf{p}_{OC} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s, \tag{2-23}$$

and

$$\int_{\mathbf{p}_s \in \mathbb{T}} \|\mathbf{p}_{OC} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s < \int_{\mathbf{p}_s \in \mathbb{T}} \|\check{\mathbf{p}} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s, \tag{2-24}$$

we have $g(\mathbf{p}_{OC}) < g(\check{\mathbf{p}})$, which contradicts $\check{\mathbf{p}} = \arg\min_{\mathbf{p}} g(\mathbf{p})$.


2.4 Definition of m-n OCL

It was found that the scheme above performs well for small populations, but its expected minimal distance compared to the OBL scheme and the pure random scheme becomes worse when the number of sample pairs increases. The reason is rather intuitive: pair-wise sampling does not retain its advantages in large populations. To achieve a better result in large populations, a more general scheme is developed.

Definition (m-n evaluation function $g_M(N)$): Let $m$ points $M = \{\mathbf{p}_{0,1}, \mathbf{p}_{0,2}, \dots, \mathbf{p}_{0,m}\}$ be the set of originally randomly initialized points in $D$-dimensional space. Let $\mathbf{p}_s = (x_{s1}, x_{s2}, \dots, x_{sD})$ be the globally optimal point, where $x_{s1}, x_{s2}, \dots, x_{sD} \in \mathbb{R}$. The evaluation function $g_M(N)$ of $n$ points $N = \{\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n\}$ is defined as

$$g_M(N) = \mathbb{E}\left[\min_{\mathbf{p} \in M \cup N} \|\mathbf{p}_s - \mathbf{p}\|\right] \tag{2-25}$$

or equivalently as

$$g_M(N) = \int \min_{\mathbf{p} \in M \cup N} \|\mathbf{p}_s - \mathbf{p}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-26}$$

Here $f(\mathbf{p}_s)$ is the supposed probability distribution function of $\mathbf{p}_s$ and $\|\cdot\|$ is a suitable distance measure; in this thesis we consider the Euclidean and the squared distance (the latter is not even a metric, but in situations where distances only have to be compared it fits well and is frequently used), although other choices are also allowed. This evaluation function measures the expected "distance" between the optimal point and the point nearest to it. Here, the focal point of our efforts is on finding $\check{N}$ such that

$$\check{N} = \arg\min_{N} g_M(N). \tag{2-27}$$

Based on this definition, we can define the m-n opposite-center points set; the m-n OCL scheme then follows from it.


Definition (Opposite-Center Points Set $N_{OC}$): Let $m$ points $M = \{\mathbf{p}_{0,1}, \mathbf{p}_{0,2}, \dots, \mathbf{p}_{0,m}\}$ be the originally randomly initialized points in $D$-dimensional space. Let $\mathbf{p}_s = (x_{s1}, x_{s2}, \dots, x_{sD})$ be the globally optimal point, where $x_{s1}, x_{s2}, \dots, x_{sD} \in \mathbb{R}$. Then the opposite-center points set is defined as

$$N_{OC} = \{\mathbf{p}_1, \mathbf{p}_2, \dots, \mathbf{p}_n\}, \tag{2-28}$$

where

$$\mathbf{p}_i = \arg\min_{\mathbf{p}} \int_{\mathbf{p}_s \in \mathbb{T}_i} \|\mathbf{p} - \mathbf{p}_s\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-29}$$

$$\mathbb{T}_i = \left\{\mathbf{p}_s \,\middle|\, \mathbf{p}_i = \arg\min_{\mathbf{p} \in M \cup N_{OC}} \|\mathbf{p}_s - \mathbf{p}\|\right\}. \tag{2-30}$$

If we consider the supposed global optimum distribution 𝑓(𝒑𝑠) as the density of the search space, the opposite-center point can be described as the weighted center of the region where all points have shorter distances to it than to the other points. Different "distances" correspond to different kinds of center. For example, if we take the Euclidean norm, then the geometric median is computed. The squared distance measure corresponds to the center of mass (centroid). The selection of the "distance" will be discussed in the next subsection.

2.5 Computational scheme of m-n OCL scheme

There is no straightforward way of finding the opposite-center point deterministically for arbitrary dimension. Therefore, the iterative scheme shown in Figure 2.5 is proposed to approximate the opposite-center points.


Figure 2.5: Flow chart of the m-n Opposite-Center Learning (OCL) scheme.

In Step 4, function 𝑓(𝒑𝑠) is the supposed distribution of the optimum (the objective function). In case there is no information about the distribution, a uniform distribution can be assumed over the search space.

In simple cases (1D or 2D) the Euclidean norm is a suitable choice for the OCL scheme. Unfortunately, however, the region center for the Euclidean norm (the geometric median) is hard to compute in higher dimensions, even when 𝑓(𝒑𝑠) is a uniform distribution. In contrast, the center of mass (centroid), which corresponds to the squared distance, is computationally cheap and also suitable for Riemannian manifolds, where 𝑓(𝒑𝑠) does not have to be uniform. Despite the simplicity of this measure, though, the computation of the centroid may still be expensive in high-dimensional cases. Therefore, Step 3 of the scheme (Figure 2.5) is performed with the Monte Carlo (MC) method to find an approximate centroid.

To further explain the OCL scheme, Figure 2.6 illustrates the main working of the 2-2 OCL scheme in two dimensions (same steps as in Figure 2.5).

Figure 2.6 Illustration of 2-2 OCL scheme

It is also very important to have a good initial guess, because the fitness surface of the evaluation function itself is generally multimodal and our scheme is essentially a local optimization procedure. The number of local optima is low when n is small (𝑛 ≤ 2); normally the opposite points serve well as the initial guess in these cases. But when n is larger than 3, a more explicit scheme for the initial guess should be investigated in future work. This thesis uses the opposite points set as the initial guess.

The iterative scheme in Figure 2.5 produces a sequence of point sets that converges to the opposite-center points set in the limit 𝑖 → ∞. Preliminary tests suggest that after 3 iterations the distance between successive iterates is less than the standard deviation, so the scheme can be considered converged. Therefore, the termination condition is set to 𝑖 = 4 in the scheme shown in Figure 2.5.
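Analogously to the 1-1 case, a Monte Carlo sketch of the m-n OCL scheme of Figure 2.5 is given below, restricted to the case n = m with uniform f(p_s), squared distance and the opposite points of M as initial guess; the helper name mn_ocl_points is hypothetical and the sample sizes are illustrative.

```python
import numpy as np

def mn_ocl_points(M, a, b, n_mc=2000, n_iter=4, rng=None):
    """Monte Carlo sketch of the m-n OCL scheme (here n = m), assuming a uniform
    f(p_s) on [a, b] and the squared-distance measure."""
    rng = np.random.default_rng(rng)
    M, a, b = map(np.asarray, (M, a, b))
    N = a + b - M                                       # opposite points of M as initial guess
    for _ in range(n_iter):
        S = rng.uniform(a, b, size=(n_mc, M.shape[1]))
        pts = np.vstack([M, N])
        # index of the nearest point in M ∪ N for every MC sample (defines the cells T_j)
        nearest = np.argmin(((S[:, None, :] - pts[None, :, :]) ** 2).sum(-1), axis=1)
        for j in range(len(N)):                         # move each p_j to the centroid of its cell
            cell = S[nearest == len(M) + j]
            if len(cell):
                N[j] = cell.mean(axis=0)
    return N

# Example: 2-2 OCL in two dimensions on the unit square
M = np.array([[0.2, 0.7], [0.8, 0.3]])
N_oc = mn_ocl_points(M, [0.0, 0.0], [1.0, 1.0], rng=0)
```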


2.6 Proof of validity of m-n OCL scheme

m-n OCL Convergence Theorem (Convergence of the m-n OCL scheme): By the iterative scheme shown in Figure 2.5, the sequence of point sets $N_i$ converges to an opposite-center points set $N_{OC}$.

Proof: Define

$$\mathbb{T}_{i,j} = \left\{\mathbf{p}_s \,\middle|\, \mathbf{p}_{i,j} = \arg\min_{\mathbf{p} \in M \cup N_i} \|\mathbf{p}_s - \mathbf{p}\|\right\}. \tag{2-31}$$

Then

$$g(N_{i+1}) = \int \min_{\mathbf{p} \in M \cup N_{i+1}} \|\mathbf{p}_s - \mathbf{p}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-32}$$

$$= \sum_{j=1}^{m} \int_{\mathbf{p}_s \in \mathbb{T}_{0,j}} \min_{\mathbf{p} \in M \cup N_{i+1}} \|\mathbf{p}_s - \mathbf{p}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s + \sum_{j=1}^{n} \int_{\mathbf{p}_s \in \mathbb{T}_{i,j}} \min_{\mathbf{p} \in M \cup N_{i+1}} \|\mathbf{p}_s - \mathbf{p}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-33}$$

$$\le \sum_{j=1}^{m} \int_{\mathbf{p}_s \in \mathbb{T}_{0,j}} \|\mathbf{p}_s - \mathbf{p}_{0,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s + \sum_{j=1}^{n} \int_{\mathbf{p}_s \in \mathbb{T}_{i,j}} \|\mathbf{p}_s - \mathbf{p}_{i+1,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s. \tag{2-34}$$

Since

$$\mathbf{p}_{i+1,j} = \arg\min_{\mathbf{p}} \int_{\mathbf{p}_s \in \mathbb{T}_{i,j}} \|\mathbf{p}_s - \mathbf{p}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s, \tag{2-35}$$

we have, for all $j \in \{1, 2, \dots, n\}$,

$$\int_{\mathbf{p}_s \in \mathbb{T}_{i,j}} \|\mathbf{p}_s - \mathbf{p}_{i+1,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s \le \int_{\mathbf{p}_s \in \mathbb{T}_{i,j}} \|\mathbf{p}_s - \mathbf{p}_{i,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s, \tag{2-36}$$

so that

$$g(N_{i+1}) \le \sum_{j=1}^{m} \int_{\mathbf{p}_s \in \mathbb{T}_{0,j}} \|\mathbf{p}_s - \mathbf{p}_{0,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s + \sum_{j=1}^{n} \int_{\mathbf{p}_s \in \mathbb{T}_{i,j}} \|\mathbf{p}_s - \mathbf{p}_{i+1,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s \tag{2-37}$$

$$\le \sum_{j=1}^{m} \int_{\mathbf{p}_s \in \mathbb{T}_{0,j}} \|\mathbf{p}_s - \mathbf{p}_{0,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s + \sum_{j=1}^{n} \int_{\mathbf{p}_s \in \mathbb{T}_{i,j}} \|\mathbf{p}_s - \mathbf{p}_{i,j}\|\, f(\mathbf{p}_s)\, d\mathbf{p}_s = g(N_i). \tag{2-38}$$

The equality holds if and only if 𝑁𝑖 is an opposite-center points set. The evaluation value 𝑔(𝑁𝑖) converges since 𝑔(𝑁) is obviously bounded from below (both the norm function and the probability density function are non-negative). Since 𝑔(𝑁𝑖+1) = 𝑔(𝑁𝑖) if and only if 𝑁𝑖 is an opposite-center points set, the sequence of point sets 𝑁𝑖 converges to an opposite-center points set, which proves the theorem.

Since our goal is to find the points set $\check{N}$ such that $\check{N} = \arg\min_{N} g(N)$, the theorem below is given. It illustrates the relationship between the best choice of points set and the opposite-center points set in the sense of the evaluation function.

Theorem: If $\check{N} = \arg\min_{N} g(N)$, then $\check{N}$ is an opposite-center points set of the original points set $M$.

Proof: The scheme shown in Figure 2.5 is monotonically decreasing and converges to an opposite-center points set. So, if $\check{N} = \arg\min_{N} g(N)$ is not an opposite-center points set, the next iteration with input $\check{N}$, denoted $\check{N}^{+1}$, leads to $g(\check{N}^{+1}) \le g(\check{N})$. Since $\check{N}$ is not an opposite-center points set, the equality does not hold, so $g(\check{N}^{+1}) < g(\check{N})$, which contradicts $\check{N} = \arg\min_{N} g(N)$. Hence $\check{N} = \arg\min_{N} g(N)$ must be an opposite-center points set.


3 Numerical experiments of OCL scheme: sampling

3.1 1-1 OCL Verification in one dimension

The OCL scheme of Section 2 is tested by applying it to a simple sampling problem. We compare OCL with OBL and with random sampling, using the Euclidean distance and the squared distance as measures and assuming a uniform distribution 𝑓(𝒑𝑠) between 0 and 1. The set-up of this experiment is identical to that appearing in (Rahnamayan, 2012) for OBL: the locations of the optima are successively fixed at [0, 0.01, 0.02, … , 0.99, 1] and for each location 10^4 point pairs are sampled with each scheme.
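A small Python sketch of this type of comparison (not the exact experimental code of the thesis) is shown below; it fixes a few optimum locations, draws 10^4 point pairs per scheme, uses the closed-form one-dimensional opposite-center point of Eq. (2-7), and reports the mean minimal Euclidean distance.

```python
import numpy as np

rng = np.random.default_rng(0)
a, b, n_pairs = 0.0, 1.0, 10_000

for ps in (0.1, 0.5, 0.9):                          # a few fixed optimum locations
    p0 = rng.uniform(a, b, n_pairs)                 # random starting points
    rand2 = rng.uniform(a, b, n_pairs)              # second point: random sampling
    obl = a + b - p0                                # second point: OBL opposite, Eq. (1-1)
    ocl = np.where(p0 < (a + b) / 2,                # second point: 1-D OC point, Eq. (2-7)
                   p0 / 3 + 2 * b / 3,
                   p0 / 3 + 2 * a / 3)
    for name, second in (("random", rand2), ("OBL", obl), ("OCL", ocl)):
        d = np.minimum(np.abs(p0 - ps), np.abs(second - ps))
        print(f"optimum at {ps:.1f}  {name:>6}: mean min distance = {d.mean():.4f}")
```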

Figure 3.1 compares the different sampling strategies for the two measures. The value on the vertical axis is the expected distance between the optimum and the closer solution of the pair (see Section 2.1); in the left figure the norm is Euclidean and in the right one the measure is the squared distance. The plots of the OBL scheme and the random scheme in the left figure are the same as in (Rahnamayan, 2012).

Figure 3.1: Comparison between sampling strategies in one dimension for the Euclidean distance metric (left) and the squared distance metric (right).

From Figure 3.1 it is clear that the Opposite-Center scheme outperforms the random scheme and the OBL scheme for both measures. Most significantly, the new scheme yields better results no matter where the optimum is located, except for three points where it is as good as the random scheme (optimum located at 0.5) or as the OBL scheme (optimum located near 0 and 1). This means that OCL will beat OBL and random sampling for most non-uniform distributions as well.

3.2 1-1 OCL Verification in higher dimensions

Next, the scheme is tested in higher dimensions; again the uniform distribution is assumed and the squared distance is used as the measure. For each dimension D, 100·D random samples of optima were taken and for each optimum 1000·D pairs of points were generated by each scheme.

Figure 3.2 compares the three different schemes in higher dimensions. The left panel shows the average value of evaluation function (2-2) for the generated points. It grows almost linearly with dimension for all three schemes, but for the Opposite-Center scheme it grows only about half as fast as for the others. The right panel of Figure 3.2 shows the minimum difference in evaluation values between OCL and the other two schemes, illustrating the super-linear growth of the advantage of OCL with increasing dimensionality. An important result is that OCL always attains smaller expected nearest squared distances than the other two schemes. Thus, it is computationally verified that the Opposite-Center scheme performs better than random and opposition-based sampling in dimensions up to 20.

Figure 3.2: Performance comparison of Opposite-Center sampling versus Opposition-based sampling and uniformly random sampling up to dimension 20.


3.3 m-n OCL scheme Verification

In this section the m-n OCL scheme is tested in dimensions 1 to 20 with different population sizes. The uniform distribution is assumed and the squared distance is used as the measure. To evaluate the performance of each scheme, the m-n evaluation function 𝑔𝑀(𝑁) of Eq. (2-26) introduced in Section 2.4 is used. Note that to evaluate the m-n OCL scheme, the sizes of the sets M and N are important. To simplify the problem without loss of generality, we only consider the situation where the sizes of M and N are equal. Let the size of M be m and the size of N be n; the experiments then compare the performance 𝑔𝑀(𝑁) of the different schemes for different 𝑚 (with 𝑚 = 𝑛). For each dimension D, 100·𝑚·𝐷 random samples of optima were taken and for each optimum 1000·D pairs of points were generated by each scheme.

Figure 3.3 Performance comparison of Opposite-Center sampling versus Opposition-based sampling and uniformly random sampling up to dimension 20, in the situation of 𝒎 = 𝟐.


Figure 3.4 Performance comparison of Opposite-Center sampling versus Opposition-based sampling and uniformly random sampling up to dimension 20, in the situation of 𝒎 = 𝟒.

Figure 3.5 Performance comparison of Opposite-Center sampling versus Opposition-based sampling and uniformly random sampling up to dimension 10 (left) and 20 (right), in the situation of 𝒎 = 𝟏𝟎.


Figure 3.6 Performance comparison of Opposite-Center sampling versus Opposition-based sampling and uniformly random sampling up to dimension 20, in the situation of 𝒎 = 𝟐𝟎.

Figure 3.3 to Figure 3.6 show the performance comparison of m-n OCL sampling versus 1-1 OCL sampling, Opposition-based sampling and uniformly random sampling up to dimension 20. In all dimensions, m-n OCL sampling performs better in terms of the evaluation function 𝑔𝑀(𝑁), which means the points generated by the m-n OCL scheme complement the original points better in terms of expected closeness to the optimum. The evaluation value grows almost linearly with dimension for all four schemes, but for the m-n OCL scheme it grows only about 70% as fast as for the random and Opposition-based schemes.

It can also be seen that with increasing batch size 𝑚, the advantage of the Opposition-based scheme over the uniformly random scheme disappears; the results become almost the same when 𝑚 > 10. The m-n OCL scheme, however, keeps its advantage in all dimensions and for all batch sizes. The result of the 1-1 OCL scheme is interesting: it does not perform as well as the Opposition-based and uniformly random schemes in lower dimensions when the batch size 𝑚 is large, but as the dimension increases, at a certain point the 1-1 OCL scheme outperforms those two schemes and the difference increases with further increase of dimensionality. Two figures are used to illustrate this finding. The m-n OCL scheme always performs better than the other three schemes and the difference increases in higher dimensions.


4 Application to population-based heuristics

From the definition of OCL it is clear that differentiability and continuity of the objective function are not required. This makes this method very suitable for application to a great variety of heuristic approaches to intractable optimization problems (Rothlauf, 2011). Opposite-Center Learning can be embedded in population-based stochastic search methods by applying the steps described in Sections 4.1 and 4.2 based on (Rahnamayan, 2006).

4.1 Population initialization

By utilizing OCL a better initial population can be created: it is spread more evenly over the search space, so that more information is collected. The following steps describe the procedure:

• Step 1: Generate 𝑛0 = 𝑁𝑃/2 random points 𝒑𝑖 (𝑖 = 1, 2, … , 𝑁𝑃/2), where 𝑁𝑃 is the population size.
• Step 2: For each random point 𝒑𝑖, use the OCL scheme to generate the opposite-center point 𝒑𝑖𝑂𝐶 (𝑖 = 1, 2, … , 𝑁𝑃/2).
• Step 3: Assemble all points 𝒑𝑖 and 𝒑𝑖𝑂𝐶 to create the initial population.

This scheme requires no other property of the fitness function except the prior knowledge of the (approximate) optimum distribution, which may be assumed uniform if it is unknown.
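Assuming a 1-1 OCL routine such as the ocl_point sketch of Section 2.2, the initialization steps above could be coded as follows; the function name ocl_init_population is hypothetical.

```python
import numpy as np

def ocl_init_population(Np, a, b, ocl_point, rng=None):
    """OCL population initialization (Steps 1-3 above): Np/2 uniformly random points
    plus their opposite-center points. `ocl_point` is a 1-1 OCL routine such as the
    Monte Carlo sketch given in Section 2.2."""
    rng = np.random.default_rng(rng)
    a, b = np.asarray(a, float), np.asarray(b, float)
    half = Np // 2
    randoms = rng.uniform(a, b, size=(half, len(a)))                     # Step 1
    centers = np.array([ocl_point(p, a, b, rng=rng) for p in randoms])   # Step 2
    return np.vstack([randoms, centers])                                 # Step 3

# e.g. pop0 = ocl_init_population(20, [0, 0], [1, 1], ocl_point)   # using the Section 2.2 sketch
```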

4.2 Generation jumping

The generation jumping scheme takes place after the variation operations in each generation with a probability 𝐽𝑅 (i.e. jumping rate) in the following way:

• Step 1: For 𝜏 ⋅ 𝑁𝑃 points randomly chosen from the population, apply the OCL scheme to generate 𝜏 ⋅ 𝑁𝑃 opposite-center points. Here 𝜏 ∈ (0,1) is the exploration rate.
• Step 2: Evaluate the fitness of these extra 𝜏 ⋅ 𝑁𝑃 opposite-center points.
• Step 3: Select 𝑁𝑃 points out of the total (1 + 𝜏) ⋅ 𝑁𝑃 points. This method does not constrain the choice of selection scheme.
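A sketch of these three steps is given below; it uses greedy elitist selection in Step 3 (one permissible choice, since the method does not constrain the selection scheme), computes the adapted per-dimension bounds from the current population as described in Section 4.3, and relies on the hypothetical ocl_point routine sketched earlier.

```python
import numpy as np

def ocl_generation_jump(pop, fitness, f, tau, ocl_point, rng=None):
    """OCL generation jumping sketch: generate tau*Np opposite-center points inside
    the current population bounds, evaluate them, and keep the best Np of the
    (1 + tau)*Np candidates (minimization assumed)."""
    rng = np.random.default_rng(rng)
    Np = len(pop)
    k = max(1, int(round(tau * Np)))
    a_g, b_g = pop.min(axis=0), pop.max(axis=0)        # adapted region per Section 4.3
    idx = rng.choice(Np, size=k, replace=False)
    extra = np.array([ocl_point(pop[i], a_g, b_g, rng=rng) for i in idx])   # Step 1
    extra_fit = np.array([f(x) for x in extra])                              # Step 2 (extra NFC)
    all_pop = np.vstack([pop, extra])
    all_fit = np.concatenate([fitness, extra_fit])
    keep = np.argsort(all_fit)[:Np]                                          # Step 3
    return all_pop[keep], all_fit[keep]

# Applied with probability JR each generation, e.g.:
# if rng.random() < JR:
#     pop, fit = ocl_generation_jump(pop, fit, f, tau=0.2, ocl_point=ocl_point)
```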


The values of 𝐽𝑅 and 𝜏 influence the trade-off between exploration and exploitation. The values to choose will be discussed in Section 4.4.

4.3 Opposite-Center Differential Evolution (OCDE)

Similar to ODE (Rahnamayan, 2008), the OCL and m-n OCL schemes can be applied to differential evolution in two parts: population initialization and generation jumping. It is important to stress that the feasible region in which OCL works changes each generation. Instead of using the original region [𝑎𝑖, 𝑏𝑖] ⊂ ℝ, ∀𝑖 ∈ {1, 2, ⋯ , 𝐷}, which is used to form the initial population, the OCL scheme operates in adapted regions [𝑚𝑖𝑛𝑔,𝑖, 𝑚𝑎𝑥𝑔,𝑖] ⊂ ℝ, ∀𝑖 ∈ {1, 2, ⋯ , 𝐷}, where 𝑚𝑖𝑛𝑔,𝑖 and 𝑚𝑎𝑥𝑔,𝑖 are the minimal and maximal solution values appearing in the population at generation g > 1 in dimension 𝑖. When OCL is applied to DE, the result is Opposite-Center Differential Evolution (OCDE); when m-n OCL is applied to DE, the result is m-n Opposite-Center Differential Evolution (MNOCDE).

4.4 Investigation of Jumping Rate and Exploration Rate of OCDE

4.4.1 Experiments set-up

A set of benchmark functions listed in Table 4.1 is used to test the OCDE algorithm outlined in Section 4.3. The number of function calls (NFC) and the success rate (SR) are compared in order to investigate convergence speed and effectiveness. The success rate is defined as the percentage of runs in which the value-to-reach (VTR) is obtained within the set maximum number of function calls 𝑁𝐹𝐶𝑙𝑖𝑚𝑖𝑡. Model parameters other than the jumping rate (JR) and exploration rate 𝜏 are the same for all experiments; these are common values used in the references, as listed below.

• Population size, 𝑁𝑝 = 10 ⋅ 𝐷 (Brest, 2006)
• Crossover rate, 𝐶𝑅 = 0.9 (Storn, 1997)
• Mutation strategy, DE/rand/1/bin (Storn, 1997)
• Limit of NFC, 𝑁𝐹𝐶𝑙𝑖𝑚𝑖𝑡 = 10^6
• Value to reach, 𝑉𝑇𝑅 = 10^−8 (Suganthan, 2005)

In order to maintain a reliable and fair comparison, for all conducted experiments we use an average result of 100 independent runs. Extra fitness evaluations of OCDE and MNOCDE are counted.

Table 4.1 Definitions of benchmark functions used for parameter tuning

Shifted Sphere: $f(\mathbf{x}) = \sum_{i=1}^{D} z_i^2$, with $\mathbf{x} = \mathbf{z} + \mathbf{o}$, where $\mathbf{o}$ is the shift vector (the same below)

Shifted Ackley: $f(\mathbf{x}) = -20\exp\!\left(-0.2\sqrt{\tfrac{1}{D}\sum_{i=1}^{D} z_i^2}\right) - \exp\!\left(\tfrac{1}{D}\sum_{i=1}^{D}\cos(2\pi z_i)\right) + 20 + e$, with $\mathbf{x} = \mathbf{z} + \mathbf{o}$

Shifted Rosenbrock: $f(\mathbf{x}) = \sum_{i=1}^{D-1}\left[100\,(z_{i+1} - z_i^2)^2 + (1 - z_i)^2\right]$, with $\mathbf{x} = \mathbf{z} + \mathbf{o}$

Shifted Schwefel: $f(\mathbf{x}) = 418.9829\,D + \sum_{i=1}^{D} z_i \sin\!\left(\sqrt{|z_i|}\right)$, with $\mathbf{x} = \mathbf{z} + \mathbf{o}$

4.4.2 Result of the 1-1 OCDE experiments

First, the effect the parameters have on the number of function calls (NFC) is investigated. One parameter is fixed to see the effect of the other. Figures 4.1 to 4.3 show the average NFC of OCDE on 10D fitness functions with 𝜏 fixed to 0/0.2/0.4/0.6/0.8/1.0. 𝜏 = 0 simply means that the OCL scheme is not applied, so it is pure DE as a baseline. We can see from the figures that as the jumping rate (JR) decreases (the reciprocal of JR increases), the average NFC increases. We also learn that a larger 𝜏 yields better results. So, judging by NFC alone, we can conclude that a larger 𝜏 and more frequent jumping lead to better results (lower NFC).


Figure 4.1 Average Number of Function Calls (NFC) of OCDE on the 10D Sphere Function with fixed exploration rate τ and different jumping rates (JR). The success rate (SR) is indicated by the symbol colour fill. 10 trials per combination of τ and JR in this experiment.

Figure 4.2 Average Number of Function Calls (NFC) of OCDE on the 10D Ackley Function with fixed exploration rate τ and different jumping rates (JR). The success rate (SR) is indicated by the symbol colour fill. 10 trials per combination of τ and JR in this experiment.


Figure 4.3 Average Number of Function Calls (NFC) of OCDE on the 10D Schwefel Function with fixed exploration rate τ and different jumping rates (JR). The success rate (SR) is indicated by the symbol colour fill. 10 trials per combination of τ and JR in this experiment.

The aim of these experiments is to find the best 𝜏 and JR for the OCL algorithm; that is, to achieve an SR as good as possible and meanwhile choose the 𝜏 and JR which yield the lowest NFC on all functions. So NFC is not the sole target we are aiming at: in most optimization problems the success rate (SR) is more important than NFC. In the figures above SR is also indicated: a green filled square means the SR is 100% and a red circle means the SR is less than 100% but not zero. A combination which yields no successful runs leads to a missing point in the figures. With the figures above we can simply look for the lowest green square for all functions. The figures give a general impression of the results, but to determine the best combination more accurately, the data tables have to be investigated.

The results are presented in Table 4.2 to Table 4.5, showing the average number of function calls (NFC) of OCDE on different 10D fitness functions with different exploration rates (𝜏) and jumping rates (JR). The detailed success rate (SR) data of OCDE on the 10D fitness functions with different exploration rates (𝜏) and jumping rates (JR) are shown in the APPENDIX. "NaN" in the tables indicates that for the corresponding 𝜏 and JR there was no successful run during the trials; these cells are coloured red in both the NFC and SR tables. The yellow coloured cells (with single underline) indicate that the SR for the corresponding 𝜏 and JR is not 100%. For the other cells (corresponding to 100% SR), the greener they are, the lower the NFC (the better the behaviour).

Table 4.2 Average number of function calls (NFC) of OCDE optimizing 10D Sphere Function with different τ and jumping rate (JR)

NFC, 1/JR =   5      10     15     20     25     30     35     40     45     50
τ = 0       18270  17950  17940  17990  18230  17910  18010  17760  18360  18110
0.1         11479  12517  13512  13638  15007  15134  15802  16396  15596  15592
0.2         10310  10822  11788  12666  13520  14244  14704  14774  15098  15118
0.3         11817  10665  11786  12612  13010  14064  13438  13611  14320  14115
0.4         12575  10614  11558  12672  12208  13702  14494  13992  15132  14530
0.5         17800  12130  11540  11680  12130  13465  13385  13485  12570  15055
0.6         NaN    12708  11720  11868  11982  12778  14060  14142  13548  13682
0.7         20520  14898  11757  15042  13547  13531  13458  12560  13044  13643
0.8         NaN    16051  13453  13062  12232  13338  13214  13244  15022  13630
0.9         NaN    19967  14318  14381  13244  14542  13615  13574  13417  14213
1           NaN    19467  17933  15550  16590  14590  13040  19070  13990  15170
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.3 Average number of function calls (NFC) of OCDE optimizing 10D Ackley Function with different τ and jumping rate (JR)

NFC, 1/JR =   5      10     15     20     25     30     35     40     45     50
τ = 0       44590  43900  44330  43630  43720  43890  43900  43930  44140  43680
0.1         25373  29580  32488  33907  36066  37013  37168  37589  38599  39240
0.2         23010  26922  28464  30220  32664  33234  34118  36000  35830  35382
0.3         26343  25138  27045  29285  30825  30992  32771  33006  33696  35806
0.4         41040  24998  26742  27748  29636  31054  30484  31596  34180  33416
0.5         NaN    27110  27420  27090  27515  29470  31330  31625  31625  33370
0.6         NaN    28697  26684  28634  29682  30648  30340  31868  34516  33060
0.7         NaN    34445  28046  28822  27330  29318  30317  33223  33054  32868
0.8         NaN    NaN    28650  29482  30016  29476  31660  33092  34012  33780
0.9         NaN    35653  33960  32916  30829  31971  33001  30935  31790  33067
1           NaN    39200  42500  38844  32980  37750  32510  32600  34422  34070
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.4 Average number of function calls (NFC) of OCDE optimizing 10D Rosenbrock Function with different τ and jumping rate (JR)

NFC, 1/JR =   5      10     15     20     25     30     35     40     45     50
τ = 0       34680  35830  35910  36400  35830  35660  35070  36510  35440  35220
0.1         25755  29055  30258  31664  31606  32191  32613  33814  34283  34666
0.2         27030  25974  28484  29598  28930  29680  29732  30986  31560  32374
0.3         22620  23920  27633  27147  29131  29891  29013  29532  30341  31749
0.4         26360  29007  28714  28484  27760  27212  28112  30182  30830  30796
0.5         NaN    26940  29717  25185  26072  26975  28990  30415  30090  28755
0.6         NaN    NaN    25927  32874  29298  30293  30128  32062  30170  27742
0.7         NaN    NaN    32908  30933  29406  29917  29494  30243  29454  32387
0.8         NaN    NaN    23920  30645  38323  28260  31514  28006  30324  29948
0.9         NaN    NaN    NaN    25310  26878  27640  29774  29802  29561  31213
1           NaN    NaN    24567  29600  35700  31113  28017  31360  27100  30570
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.5 Average number of function calls (NFC) of OCDE optimizing 10D Schwefel Function with different τ and jumping rate (JR)

NFC, 1/JR =   5      10     15     20     25     30     35     40     45     50
τ = 0       25560  25390  25660  25630  25550  25350  25650  25680  25610  25100
0.1         15081  17909  18632  20194  21084  21276  21836  21248  21683  23052
0.2         13544  15368  16430  18454  18254  18942  19898  21164  20850  21292
0.3         16341  14961  16257  16305  18348  19503  18979  19126  20190  21177
0.4         16212  16022  15694  16572  17446  17986  19150  19442  18780  20098
0.5         22350  16920  15500  16870  17700  16865  17950  18020  18960  20640
0.6         22300  17806  15614  15954  16360  18252  17966  18360  18078  19628
0.7         NaN    20118  16455  16216  17523  17673  18288  19366  19122  18917
0.8         NaN    25420  18869  19938  17540  17638  18472  18990  19894  19502
0.9         NaN    34720  25579  18700  17801  18409  17639  20242  19657  17805
1           NaN    21833  19186  22000  19200  21240  21880  19290  19090  21490
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

From the tables above we can see that in the 10D situation, larger 𝜏 and JR mean lower NFC but also lower SR at the same time. We would like to choose a combination with low NFC and SR = 100%, so 𝜏 = 0.2 and 1/𝐽𝑅 = 10 is a choice that is both computationally efficient and reliable.

The results presented in Table 4.6 to Table 4.9 show the average number of function calls (NFC) of OCDE on different 30D fitness functions with different exploration rates (𝜏) and jumping rates (JR). The detailed success rate (SR) data of OCDE on the 30D functions with different exploration rates (𝜏) and jumping rates (JR) are shown in the APPENDIX. "NaN" in the tables indicates that for the corresponding 𝜏 and JR there was no successful run during the trials; these cells are coloured red in both the NFC and SR tables. The yellow coloured cells (with single underline) indicate that the SR for the corresponding 𝜏 and JR is not 100%. For the other cells (corresponding to 100% SR), the greener they are, the lower the NFC (the better the behaviour).


Table 4.6 Average number of function calls (NFC) of OCDE optimizing 30D Sphere Function with different τ and jumping rate (JR)

NFC, 1/JR =   5       10      15      20      25      30      35      40      45      50
τ = 0       293430  298470  294420  295740  297420  293430  297450  297690  293820  293400
0.1         69324   81306   89502   97980   101361  109848  114435  133326  139131  135414
0.2         87882   74880   81360   86484   92376   101400  97620   121320  113040  122232
0.3         133697  83322   80496   84531   89172   96606   102312  101967  119139  108663
0.4         244410  96750   84612   87594   90648   98226   102354  106770  120840  125154
0.5         NaN     116700  99285   93270   96165   94350   96555   106665  114900  124440
0.6         NaN     143958  102708  93906   94938   98316   103026  110658  119058  117282
0.7         NaN     175436  115221  100611  104922  112842  97425   101313  111474  109119
0.8         NaN     203783  140982  108384  103536  100914  117648  107412  110532  115422
0.9         NaN     226496  170061  134202  126510  123723  115209  121251  120195  123633
1           NaN     NaN     230700  247267  185880  146670  136470  149280  149130  139290
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.7 Average number of function calls (NFC) of OCDE optimizing 30D Ackley Function with different τ and jumping rate (JR)

NFC, 1/JR =   5       10      15      20      25      30      35      40      45      50
τ = 0       697620  689640  692820  694200  688320  697440  690930  690210  695610  695190
0.1         149424  169701  184194  219342  221928  237765  239352  266214  284241  298347
0.2         193820  164214  169368  187644  204912  221490  228738  236424  234366  254676
0.3         264193  174702  172953  191205  188277  211626  211554  227823  228606  237405
0.4         374880  206262  189810  184644  193253  201516  201912  220662  241710  238398
0.5         NaN     258793  194400  194505  201585  207735  214545  227385  218970  229920
0.6         NaN     298586  225228  205068  210168  209454  210354  236544  245940  257484
0.7         NaN     304344  240960  222738  215130  226473  220470  239663  242886  244392
0.8         NaN     456930  278307  249414  230773  238812  235242  231876  251958  247836
0.9         NaN     NaN     441150  274038  258429  248547  249798  265607  256393  253443
1           NaN     NaN     568200  411200  366986  308100  283367  300567  289575  282733
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.8 Average number of function calls (NFC) of OCDE optimizing 30D Rosenbrock Function with different τ and jumping rate (JR)

NFC, 1/JR =   5       10      15      20      25      30      35      40      45      50
τ = 0       691110  714630  694680  709530  709650  706410  704250  710940  699960  705660
0.1         565549  263259  253428  252036  260442  278745  280464  280908  313887  296379
0.2         NaN     535128  328686  261588  276696  261306  271224  272430  283164  272616
0.3         NaN     NaN     633108  360987  314553  283779  295098  277059  283725  297336
0.4         NaN     NaN     842550  437313  367464  377478  284346  304488  301326  295356
0.5         NaN     NaN     NaN     717630  514725  430470  374790  334725  328920  312705
0.6         NaN     NaN     NaN     714120  557720  452573  507534  355896  350712  327888
0.7         NaN     NaN     NaN     NaN     715920  507555  487755  386988  367470  357729
0.8         NaN     NaN     NaN     NaN     NaN     659604  470646  449754  456906  369756
0.9         NaN     NaN     NaN     NaN     705120  710730  671655  595117  410022  450921
1           NaN     NaN     NaN     NaN     NaN     NaN     746700  867900  529500  628350
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.9 Average number of function calls (NFC) of OCDE optimizing 30D Schwefel Function with different τ and jumping rate (JR)

NFC, 1/JR =   5       10      15      20      25      30      35      40      45      50
τ = 0       412380  410700  408510  409800  411930  409800  410430  412890  405030  412500
0.1         91050   101601  116409  127665  135318  146175  159642  168978  161061  181797
0.2         121518  101838  109656  118044  117960  128322  146868  140244  155070  160626
0.3         181635  113061  109893  109296  119931  122892  126999  133569  151194  158781
0.4         259830  126162  114066  114732  119838  126300  128928  145134  144090  149976
0.5         374400  163680  121890  116970  127140  128280  140625  145590  147855  156840
0.6         NaN     185347  136422  132270  127926  133080  140790  144906  140760  169284
0.7         NaN     197535  169362  136374  132579  135567  132891  140796  150759  149307
0.8         NaN     229894  175128  156378  141150  150570  146232  142896  157878  154428
0.9         NaN     325725  265842  165555  160254  152784  153918  152286  159837  156165
1           NaN     NaN     377475  278790  234030  186300  214020  177330  195450  181200
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

From the tables above we can see that in the 30D situation, larger 𝜏 and JR mean lower NFC but also lower SR at the same time. We would like to choose a combination with low NFC and 100% SR; 𝜏 = 0.2 and 1/𝐽𝑅 = 30 is finally chosen, a choice that is both computationally efficient and reliable. Combined with the parameters chosen in the 10D situation, 𝜏 = 0.2 and 𝐽𝑅 = 1/𝐷 is the recommended combination and will be used in the following comparison with other algorithms.

4.4.3 Result of the MNOCDE experiments

Similar to the OCDE experiments, the effect the parameters have on the number of function calls (NFC) is investigated. One parameter is fixed to see the effect of the other. Figures 4.4 to 4.6 show the average NFC on 10D fitness functions with 𝜏 fixed to 0/0.2/0.4/0.6/0.8/1.0. 𝜏 = 0 simply means that the m-n OCL scheme is not applied, so it is pure DE as a baseline. First, we are pleased to see that in dimension 10 with m-n OCL applied all trials are successful, so NFC can be the only factor considered here. We can see from the figures that as the jumping rate (JR) decreases (the reciprocal of JR increases), the average NFC increases. We also learn that, opposite to the 1-1 OCL scheme, for MNOCDE in dimension 10 a smaller 𝜏 yields better results. So we can conclude that a smaller 𝜏 and more frequent jumping lead to better results (lower NFC).

Figure 4.4 Average Number of Function Calls (NFC) of MNOCDE on the 10D Ackley Function with fixed exploration rate τ and different jumping rates (JR). The success rate (SR) is indicated by the symbol colour fill. 10 trials per combination of τ and JR in this experiment.


Figure 4.5 Average Number of Function Calls (NFC) of MNOCDE on the 10D Rosenbrock Function with fixed exploration rate τ and different jumping rates (JR). The success rate (SR) is indicated by the symbol colour fill. 10 trials per combination of τ and JR in this experiment.

Figure 4.6 Average Number of Function Calls (NFC) of MNOCDE on the 10D Schwefel Function with fixed exploration rate τ and different jumping rates (JR). The success rate (SR) is indicated by the symbol colour fill. 10 trials per combination of τ and JR in this experiment.


The details of the experiments are given in Table 4.10 to Table 4.13; the greener a cell, the better the combination performs (lower NFC). The yellow coloured cells (with single underline) would indicate that the SR for the corresponding 𝜏 and JR is not 100%, which does not occur in the 10D MNOCDE situation. We find that for all fitness functions other than Rosenbrock, 1/JR = 1 with 𝜏 = 0.1–0.3 gives the best results.

Table 4.10 Average number of function calls (NFC) of MNOCDE on 10D Sphere Function with different τ and jumping rate (JR)

NFC, 1/JR =   1      2      3      4      5      6      7      8      9      10
τ = 0       17910  17940  17610  18070  18070  17740  18170  18120  17980  18190
0.1         15446  16409  16951  17591  17805  17412  17762  17718  17821  17452
0.2         15388  16914  17100  17498  17492  18058  17674  17668  17728  17716
0.3         15733  17047  17423  17675  17781  17798  17498  17676  17649  17873
0.4         15692  17440  17768  17740  17974  17870  17832  17902  17616  17838
0.5         16240  17555  17920  18385  18010  18035  18040  17935  18185  18120
0.6         16436  18046  18468  18380  18402  18404  18272  18212  18370  18564
0.7         17072  18187  18782  18410  18308  18212  18298  18613  18459  18126
0.8         17554  18690  18410  18982  18496  18736  18818  18630  18396  18350
0.9         18058  19203  18782  19052  19174  18818  18607  18883  18258  18872
1           18410  18920  19460  18820  19390  18500  19020  19100  18480  18230
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.11 Average number of function calls (NFC) of MNOCDE on 10D Ackley Function with different τ and jumping rate (JR)

NFC, 1/JR =   1      2      3      4      5      6      7      8      9      10
τ = 0       44280  44480  43910  43760  43960  44210  44160  44160  44140  43760
0.1         31429  36981  39253  40293  40913  41400  41984  42045  42110  42769
0.2         30716  37562  39474  41072  41792  41952  42046  42542  42818  43434
0.3         30542  37575  40650  41630  41947  42987  42830  43328  43211  43226
0.4         31280  37794  41186  42188  42940  43686  43132  43214  43718  43830
0.5         31400  39430  41770  43425  43255  44055  44045  44480  43855  43885
0.6         32568  40142  42466  43734  44738  44912  44954  44712  44814  44702
0.7         32908  40878  43799  44843  45053  44618  45331  44645  45095  44751
0.8         34346  42296  44246  45780  45760  45690  45606  45474  45398  45404
0.9         35369  43414  45292  45752  46639  46084  46075  46207  45622  45416
1           36120  44030  46160  46400  46950  47080  47020  46670  46180  46050
(Colour key in the original table: greener = lower NFC; yellow/underlined = 0% < SR < 100%; red/NaN = SR = 0%.)

Table 4.12 Average number of function calls (NFC) of MNOCDE on 10D Rosenbrock Function with different τ and jumping rate (JR)

NFC, 1/JR =   1      2      3      4      5      6      7      8      9      10
τ = 0       35410  35840  35630  36210  35500  35630  36210  35390  34560  36180
0.1         36852  36066  36829  36232  35905  35834  35269  35885  34610  35195
0.2         41684  37822  38154  36822  36440  36946  36228  36588  36342  36120
