
Global Optimization with Coupled Local Minimizers Excited by Gaussian White Noise

Serkan Günel, Johan A. K. Suykens, and Joos P. L. Vandewalle

Katholieke Universiteit Leuven, ESAT-SCD/SISTA

Kasteelpark Arenberg 10, B-3001 Leuven (Heverlee), Belgium
Email: {serkan.gunel,johan.suykens,joos.vandewalle}@esat.kuleuven.be

Abstract—In this paper, Stochastic Coupled Local Minimizers (SCLMs) are presented for global optimization of smooth cost functions. This extends the deterministic coupled local minimizers (CLMs), whose best performing elements impose their dynamics on the others by means of master–slave synchronization. The elements of the CLM network are excited by Gaussian white noise. An adaptive cooling schedule is proposed that effectively decouples the noise sources from the network once a solution candidate is agreed upon by all CLMs. Examples are given to illustrate the proposed scheme.

1. Introduction

In many optimization problems, global exploration of the search space is required. However, an intelligent way of exploring is needed, since exhaustive searches are usually too expensive for applications of interest. Many popular methods for global optimization incorporate randomness to achieve global exploration of the search space, either by assigning new candidate solutions randomly (e.g. simulated annealing algorithms [1], genetic algorithms [2]) or by starting from multiple random solutions with a prior distribution of initial candidates. Exhaustive search is avoided by decreasing the randomness gradually (i.e. cooling in simulated annealers, decreasing the mutation rate in genetic algorithms) and by combining good candidate solutions, e.g. via crossover operations in genetic algorithms. Satisfactory exploration of the search space can be achieved with processes running in parallel; the best solutions of the parallel runs are the most likely candidates for the global optima.

Recently, Suykens et al. proposed a new scheme that relies on the coupling of local optimization processes in order to achieve global optimization of smooth functions, and showed that this approach performs better than conventional multi-start local optimization consisting of independent runs [3, 4]. The optimization is done by considering the augmented Lagrangian in terms of the average ensemble energy of the individual local minimizers and the hard and soft constraints of the synchronization. This augmented Lagrangian leads to a set of coupled Lagrange programming networks [5] which exchange information via the coupling. The coupling strengths and learning rates of the Coupled Local Minimizers (CLMs) are adapted to achieve maximal decrease in the cost, which leads to cooperative behavior. The approach has been tested e.g. on the optimization of Lennard-Jones clusters and the supervised training of neural networks [3], and the CLM scheme is well suited to a large family of continuous optimization problems.

If all CLMs happen to start in the basin of a single local minimum, the optimization may end up in that local minimum, resulting in poor exploration of the search space. Previously, chaotic signals have been proposed as inputs to CLMs in order to improve performance in such situations [6]. In this paper, the injection of Gaussian white noise into several elements of the CLM network is proposed to improve the overall optimization performance.

We also propose a cooling scheme for changing the noise amplitude, by letting it be constant between the epochs of coupling adaptation and proportional to the energy of the costates of the Lagrangian network (i.e. the Lagrange multipliers of the augmented Lagrangian). The costates contain information about the pairwise synchronization errors. When the network is away from synchronization, the energy of the costate vector is large and the elements excited by Gaussian white noise are free to explore the search space. When the network tends to synchronize, the noise amplitude decreases together with the energy of the costates.

This paper is organized as follows. In Section 2 the deterministic CLM approach to the global optimization problem is revisited. The stochastic CLMs excited by Gaussian white noise are presented in Section 3. In Section 4 the numerical scheme is given, followed by two examples to illustrate the usefulness of the proposed scheme.

2. Deterministic Coupled Local Minimizers

2.1. Basic Formulation

The minimization problem of a twice differentiable function U : R^n → R is recast into the following form when q local minimizers with states x^(i) ∈ R^n, i = 1, ..., q are considered [3]:

$$\min_{x^{(i)} \in \mathbb{R}^n} \; \sum_{i=1}^{q} U(x^{(i)}) \quad \text{subject to} \quad x^{(i)} - x^{(i+1)} = 0, \; i = 1, \ldots, q \qquad (1)$$

with boundary conditions x^(0) = x^(q), x^(q+1) = x^(1).

The associated augmented Lagrangian of problem (1), with soft and hard constraints for the pairwise synchronization of each local minimizer, is then

$$\mathcal{L}(x^{(i)}, \lambda^{(i)}) = \frac{\eta}{q} \sum_{i=1}^{q} U(x^{(i)}) + \frac{1}{2} \sum_{i=1}^{q} \gamma_i \left\| x^{(i)} - x^{(i+1)} \right\|_2^2 + \sum_{i=1}^{q} \left\langle \lambda^{(i)},\, x^{(i)} - x^{(i+1)} \right\rangle \qquad (2)$$

where λ^(i) ∈ R^n, i = 1, 2, ..., q are the Lagrange multipliers and γ_i are the weights for the soft constraints [3]. The Lagrange programming network [5] for optimization w.r.t. this Lagrangian forms the following CLM network

$$\begin{aligned} \dot{x}^{(i)} &= -\frac{\eta}{q} \nabla_{x^{(i)}} U(x^{(i)}) + \gamma_{i-1}\left(x^{(i-1)} - x^{(i)}\right) - \gamma_i\left(x^{(i)} - x^{(i+1)}\right) + \lambda^{(i-1)} - \lambda^{(i)} \\ \dot{\lambda}^{(i)} &= x^{(i)} - x^{(i+1)}, \quad i = 1, 2, \ldots, q \end{aligned} \qquad (3)$$

where η > 0 is the learning rate.
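As a concrete illustration, the following NumPy sketch evaluates the right-hand side of Eq. (3); the array layout, function names, and the ring indexing via np.roll are our own choices, not part of the original formulation.

```python
import numpy as np

def clm_rhs(x, lam, grad_U, gamma, eta):
    """Right-hand side of the CLM network of Eq. (3) (illustrative sketch).

    x, lam : (q, n) arrays holding the states x^(i) and multipliers lambda^(i).
    grad_U : callable mapping one state (n,) to the gradient of U there.
    gamma  : (q,) coupling weights; eta : learning rate (eta > 0).
    Ring indexing via np.roll realizes x^(0) = x^(q), x^(q+1) = x^(1).
    """
    q = x.shape[0]
    xp = np.roll(x, -1, axis=0)          # x^(i+1)
    xm = np.roll(x, 1, axis=0)           # x^(i-1)
    lam_m = np.roll(lam, 1, axis=0)      # lambda^(i-1)
    gm = np.roll(gamma, 1)[:, None]      # gamma_{i-1}
    g = gamma[:, None]                   # gamma_i
    grads = np.array([grad_U(xi) for xi in x])
    dx = -(eta / q) * grads + gm * (xm - x) - g * (x - xp) + (lam_m - lam)
    dlam = x - xp                        # pairwise synchronization errors
    return dx, dlam
```

A forward-Euler step of the deterministic network is then simply x += h * dx and lam += h * dlam for a small step h.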

2.2. Optimal Cooperative Behavior

To ensure optimal cooperative behavior of the CLMs, the coupling weights are adapted on each time interval k∆T ≤ t < (k+1)∆T, k = 0, 1, ..., by solving the linear program

$$\min_{\gamma \in \mathbb{R}^q} \; \sum_{i=1}^{q} \left. \frac{dU(x^{(i)})}{dt} \right|_{x,\lambda} \quad \text{subject to} \quad \underline{\gamma} \le \gamma_i \le \overline{\gamma} \qquad (4)$$

where γ = [γ_1, γ_2, ..., γ_q]^T and $\underline{\gamma}$, $\overline{\gamma}$ are user-specified lower and upper bounds, respectively [3]. The coupling weights are kept constant in between intervals. On each interval, the step size η is determined by

$$\eta = \frac{q\left(\sum_{i=1}^{q} \left\langle \partial U(x^{(i)})/\partial x^{(i)},\, h^{(i)} \right\rangle + \alpha \sum_{i=1}^{q} U(x^{(i)})\right)}{\sum_{i=1}^{q} \left\langle \partial U(x^{(i)})/\partial x^{(i)},\, \partial U(x^{(i)})/\partial x^{(i)} \right\rangle} \qquad (5)$$

where $h^{(i)} \triangleq \gamma_{i-1}(x^{(i-1)} - x^{(i)}) - \gamma_i(x^{(i)} - x^{(i+1)}) + \lambda^{(i-1)} - \lambda^{(i)}$, in order to impose an exponential decrease of the cost function. Here α is a positive constant, and η is also subject to additional user-defined constraints $\underline{\eta} \le \eta \le \overline{\eta}$. U(·) is assumed to be positive everywhere (if not, one adds a constant value to the cost to guarantee this). The differences between the coupling constants γ_i result in master–slave type synchronization dynamics; hence the successful local optimizers impose their dynamics on the others.
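A small sketch of this step-size rule, assuming the reconstruction of Eq. (5) above; the names eta_lo and eta_hi stand for the user bounds and are illustrative.

```python
import numpy as np

def adapt_eta(x, lam, U, grad_U, gamma, alpha, eta_lo, eta_hi):
    """Step size of Eq. (5), as reconstructed above (illustrative sketch).

    The numerator pairs each gradient with the coupling term h^(i) and adds
    alpha times the total cost; the denominator is the squared gradient norm.
    """
    q = x.shape[0]
    xp, xm = np.roll(x, -1, axis=0), np.roll(x, 1, axis=0)
    lam_m = np.roll(lam, 1, axis=0)
    gm, g = np.roll(gamma, 1)[:, None], gamma[:, None]
    h = gm * (xm - x) - g * (x - xp) + (lam_m - lam)   # h^(i) of Eq. (5)
    grads = np.array([grad_U(xi) for xi in x])
    num = q * (np.sum(grads * h) + alpha * sum(U(xi) for xi in x))
    den = np.sum(grads * grads)
    return float(np.clip(num / den, eta_lo, eta_hi))   # user bounds on eta
```

This choice makes the total cost decrease as d/dt Σ_i U(x^(i)) = −α Σ_i U(x^(i)) along the flow of Eq. (3), which is the exponential decrease mentioned above.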

Note that only pairwise synchronization constraints are considered in (3). The possibility of more complex interactions and the use of inequality constraints instead of equality constraints are also discussed in [4]. It has been shown that, given the initial distribution of the states, the CLMs provide good optimization performance at roughly q times the computational cost of standard gradient descent based methods. However, if the initial distribution of the states is such that all states are contained in the basin of a single local minimum, the network may converge to that local minimum without exploring the search space for other possibilities. This problem can be solved by applying white noise as input to some of the local optimizers. In this case, some of the local optimizers are continuous-time stochastic annealers instead of Lagrange programming networks.

3. Stochastic Coupled Local Minimizers

3.1. Excitation of CLMs with Gaussian white noise

The evolution equations for the CLMs with Gaussian white noise proposed in this paper are given by the following set of stochastic differential equations:

$$\begin{aligned} dx^{(i)} &= -\frac{\eta}{q} \nabla_{x^{(i)}} U(x^{(i)})\, dt + \gamma_{i-1}\left(x^{(i-1)} - x^{(i)}\right) dt - \gamma_i\left(x^{(i)} - x^{(i+1)}\right) dt + \left(\lambda^{(i-1)} - \lambda^{(i)}\right) dt \\ &\quad + G^{(i)}(t, x^{(i)}, \lambda^{(i)})\, dw^{(i)} \\ d\lambda^{(i)} &= \left(x^{(i)} - x^{(i+1)}\right) dt, \quad i = 1, 2, \ldots, q \end{aligned} \qquad (6)$$

where $w^{(i)} = (\omega_1^{(i)}, \omega_2^{(i)}, \ldots, \omega_n^{(i)})^T$ are n-dimensional Wiener processes, with $\omega_k^{(i)}$, k = 1, ..., n, independent one-dimensional Wiener processes. Although the Wiener processes are nowhere differentiable, dw/dt can be considered as white noise for notational purposes. $G^{(i)} : \mathbb{R}_+ \times \mathbb{R}^{2n} \to \mathbb{R}^{n \times n}$ are the noise coupling matrices.

Note that if $\partial G^{(j)}/\partial x^{(i)} = \partial G^{(j)}/\partial \lambda^{(i)} = 0$ for $j \in \mathcal{J} \subset \{1, 2, \ldots, q\}$, $i = 1, 2, \ldots, q$, and $G^{(i)}(\cdot) = 0$ for $i \notin \mathcal{J}$, where $\mathcal{J}$ is the index set of stochastic cells, then the individual local optimizers without coupling take the form

$$dx^{(j)}(t) = -\frac{\eta}{q} \nabla_{x^{(j)}} U(x^{(j)}(t))\, dt + G^{(j)}(t)\, dw^{(j)}, \quad j \in \mathcal{J} \qquad (7)$$

which are considered to be continuous-time annealing processes under a suitable choice of cooling such that $G^{(j)}(t) \to 0$ as $t \to \infty$. If the $G^{(j)}(t)$ are diagonal and $G_{kk}^{(j)}(t) = C/\log(t)$, k = 1, ..., n, with a constant C depending only on U(·), the probability distribution of the states is known to converge with probability one and to concentrate on the global optima of U(·) as $t \to \infty$ [7, 8]. However, since the optima are attained only as $t \to \infty$, the usefulness of such processes in real-world problems might be limited.
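The slowness of the logarithmic schedule is easy to observe numerically. The following one-dimensional sketch of a single decoupled annealer, Eq. (7), uses the double-well cost of Section 5.1; the constant C = 5.0 and the time horizon are arbitrary illustrative choices.

```python
import numpy as np

# One decoupled stochastic cell, Eq. (7), in one dimension with the
# logarithmic cooling G(t) = C / log(t). dU is the derivative of the
# double well U(x) = x^4 - 16 x^2 + 5 x + 100 of Section 5.1.
rng = np.random.default_rng(2)
dU = lambda x: 4 * x**3 - 32 * x + 5
eta, q, dt = 1.0, 1.0, 1e-3
x, t = 4.0, 2.0                      # start at t > 1 so that log(t) > 0
for _ in range(200_000):
    G = 5.0 / np.log(t)              # cooling schedule G_kk(t) = C / log(t)
    x += -(eta / q) * dU(x) * dt + G * np.sqrt(dt) * rng.standard_normal()
    t += dt
print(x)  # still fluctuating: log-cooling converges only as t -> infinity
```

Over any practical horizon the state keeps fluctuating, which is precisely what motivates the adaptive schedule of Section 3.2.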

3.2. Determination of the Noise Coupling Matrix

The combination of continuous-time annealers with the deterministic local minimizers results in better exploration of the search space, together with the above-mentioned advantages of CLMs, provided the cooling schedule is selected properly. Initially, the noise amplitude must be large and the dynamics have to be dominated by the noise, so that the elements of the network excited by the Gaussian white noise explore the search space almost independently of the other elements. An obvious choice is to use the same cooling schedules as in continuous-time simulated annealing algorithms [9]. However, when the group dynamics tend on average towards a candidate minimum, the exploration by the noisy cells has to be canceled. The whole network tends to minimize the average cost while keeping the individual CLMs together because of the synchronization constraints. A minimum is achieved when all local minimizers agree (i.e. the synchronization constraints are satisfied) and the noise amplitude is zero.

The required information for this is already contained in the costate λ. Defining $u^{(i)}(k) \triangleq \lambda^{(i)}(k\Delta T) - \lambda^{(i)}((k-1)\Delta T)$ and $e_j^{(i)} \triangleq x_j^{(i)} - x_j^{(i+1)}$, for i = 1, 2, ..., q, j = 1, 2, ..., n and k = 0, 1, ..., one has

$$\sum_{i=1}^{q} \left\langle u^{(i)}(k), u^{(i)}(k) \right\rangle = \sum_{i=1}^{q} \int_{(k-1)\Delta T}^{k\Delta T} e^{(i)T}(t)\, dt \cdot \int_{(k-1)\Delta T}^{k\Delta T} e^{(i)}(t)\, dt. \qquad (8)$$

By the mean value theorem, $\int_{(k-1)\Delta T}^{k\Delta T} e_j^{(i)}(t)\, dt = \Delta T\, e_j^{(i)}(t_j)$ for some $t_j \in [(k-1)\Delta T, k\Delta T)$. Hence,

$$\frac{1}{qn} \sum_{i=1}^{q} \left\langle u^{(i)}(k), u^{(i)}(k) \right\rangle = \frac{\Delta T^2}{qn} \sum_{i=1}^{q} \sum_{j=1}^{n} \left(e_j^{(i)}(t_j)\right)^2. \qquad (9)$$

If ∆T is sufficiently small, the right-hand side of Eq. (9) can be viewed as the average energy of the synchronization error, which tends to zero if all CLMs agree on a solution candidate. Making use of this, the noise is effectively decoupled from the system when all the local optimizers tend to agree on a solution, by selecting the noise coupling matrices constant on each interval $k\Delta T \le t < (k+1)\Delta T$, k = 0, 1, ...:

$$G_{rp}^{(j)}(k) = \delta_{r,p}\, \frac{\mathcal{T}}{nq} \sum_{i=1}^{q} \left\langle u^{(i)}(k), u^{(i)}(k) \right\rangle, \quad p \in \mathcal{N} \subset \{1, 2, \ldots, n\}, \qquad (10)$$

where $\delta_{r,p}$ is the Kronecker delta and $\mathcal{T} > 0$. $G_{rp}^{(j)}$ represents the noise gain of the p-th white noise source, coupled to the r-th state of the j-th stochastic cell, and $\mathcal{N}$ is an index set of states, with r, p = 1, ..., n and i, j = 1, ..., q.

For complex problems, $\mathcal{T}$ can be changed with time to improve global exploration of the search space, e.g. letting $\mathcal{T}(k) = \mu^k \mathcal{T}_0$ with $\mathcal{T}_0 > 0$ and $0 < \mu \le 1$. If the states of the optimizers are not close to each other initially, the noise amplitude is large; hence, even if all optimizers are in the basin of the same local minimum, the system has a chance to explore other possible locations. The adjustments to the coupling weights and the learning rate are made as in the deterministic CLM case at each interval. If the stochastic cells happen to be near an optimum that is better than that of the deterministic cells, the network behavior is attracted towards the behavior of the stochastic cells. The overall effect of the cooperation between the stochastic cells, which try to explore the space, and the deterministic cells, which try to stay synchronized, is exploration of the search space while still allowing localized searches.
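The epoch-wise noise gain of Eq. (10) can be computed directly from the costates. The following sketch assumes, per the reconstruction of Eq. (8) above, that u^(i)(k) is the costate increment over the epoch; the 0-based index convention is a Python artifact.

```python
import numpy as np

def noise_gain(lam_now, lam_prev, T, N_idx, n, q):
    """Noise coupling matrix of Eq. (10) for one epoch k (illustrative).

    lam_now, lam_prev : (q, n) costates at k*DT and (k-1)*DT, so that
    u^(i)(k) = lam_now[i] - lam_prev[i], consistent with Eq. (8).
    N_idx lists the (0-based) states receiving noise; e.g. N = {50}
    in Example 2 corresponds to N_idx = [49].
    """
    u = lam_now - lam_prev
    energy = float(np.sum(u * u))        # sum_i <u^(i)(k), u^(i)(k)>
    G = np.zeros((n, n))
    for p in N_idx:
        G[p, p] = T * energy / (n * q)   # diagonal gain, zero elsewhere
    return G
```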

4. Numerical Considerations and Proposed Algorithm

4.1. Numerical Aspects

It is important to note that classical numerical ODE solving schemes fail for stochastic differential equations (SDEs) such as Eq. (6), because of the additional diffusion term due to the noise [10]. Many efficient numerical schemes have been introduced to solve SDEs. In the sequel, the basic Euler–Maruyama scheme, i.e.

$$x_{k+1} = x_k + f(t_k, x_k)\, R\, dt + \sqrt{dt}\; g(t_k, x_k) \sum_{i=0}^{R-1} \omega_i, \quad x_0 = x(0) \qquad (11)$$

has been used for the simulation of SDEs of the form $dx = f(t, x)\, dt + g(t, x)\, dw$, where dt > 0, $t_k \triangleq kR\, dt$, $x_k \triangleq x(kR\, dt)$, $R \in \mathbb{N}_+$, and the $\omega_i$ are Gaussian increments with zero mean and unit variance (i.e. $\omega_i \sim N(0, 1)$, where N(m, s) is the Gaussian density with mean m and variance s). This scheme is chosen since it is easy to implement for high-dimensional systems, despite being slow compared to more advanced schemes [11].
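A minimal sketch of this scheme; the Ornstein-Uhlenbeck test process in the usage lines is our own illustration, not from the paper.

```python
import numpy as np

def euler_maruyama_step(t, x, f, g, dt, R, rng):
    """One macro-step of the scheme in Eq. (11) for dx = f dt + g dw.

    The drift is frozen over the macro-step R*dt, while the diffusion
    accumulates R independent N(0, 1) increments, each scaled by sqrt(dt).
    """
    dW = np.sqrt(dt) * rng.standard_normal((R, x.size)).sum(axis=0)
    return x + f(t, x) * (R * dt) + g(t, x) @ dW

# Usage example: Ornstein-Uhlenbeck process dx = -x dt + 0.5 dw.
rng = np.random.default_rng(0)
x, dt, R = np.array([1.0]), 1e-3, 8
for k in range(1000):
    x = euler_maruyama_step(k * R * dt, x, lambda t, x: -x,
                            lambda t, x: 0.5 * np.eye(x.size), dt, R, rng)
```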

4.2. Stochastic CLM Algorithm

The overall algorithm proposed for global optimization is the following.

Stochastic CLM Algorithm:

1. Initialization: Determine q, R, dt, ∆T, t_max, x_0^(i), λ_0^(i), $\mathcal{T}_0$, $\mathcal{J}$, $\mathcal{N}$, γ_i(0), η(0), α, the bounds $\underline{\gamma}$, $\overline{\gamma}$, $\underline{\eta}$, $\overline{\eta}$, the tolerance ε, and the box limits $\underline{x}_j$, $\overline{x}_j$, for i = 1, ..., q and j = 1, ..., n, and calculate $G_{rp}^{(i)}(0)$ using Eq. (10).

2. Optimization: For 0 ≤ k ≤ k_max:
(a) Simulate Eq. (6) for k∆T ≤ t < (k+1)∆T using the Euler–Maruyama scheme.
(b) If $\| x_k^{(i)} - x_{k-1}^{(i)} \| \le \varepsilon$ for all i, then END.
(c) Clip the solutions, $\underline{x}_j \le x_j^{(i)} \le \overline{x}_j$, to limit the search domain.
(d) Adaptation: Calculate γ_i(k+1) by solving (4), η(k) using (5), and $G_{rp}^{(i)}(k+1)$ using (10).
(e) Repeat (a)–(d) until convergence, or until (k+1)∆T = t_max.

Note that, for high accuracy, the output can be obtained from the cells that are not excited with noise.
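The loop below is a compact, self-contained sketch of this algorithm for the double-well example of Section 5.1. To keep it short, step 2(d) is deliberately simplified: the coupling weights and step size are held fixed rather than adapted via Eqs. (4) and (5), and only the noise gain of Eq. (10) is updated; the convergence test 2(b) is also omitted.

```python
import numpy as np

# Minimal SCLM loop on the double well of Section 5.1 (n = 1, q = 3,
# stochastic cell set J = {1}). gamma and eta are held fixed for brevity.
rng = np.random.default_rng(1)
U = lambda x: x**4 - 16 * x**2 + 5 * x + 100
dU = lambda x: 4 * x**3 - 32 * x + 5

q, dt, R, DT, eta, gamma = 3, 2e-3, 8, 0.2, 1.0, 1.0
T, mu = 500.0, 0.99                       # cooling T(k) = mu^k * T0
x = rng.uniform(-5.0, 5.0, q)
lam = np.zeros(q)
G = 1.0                                   # noise gain of the stochastic cell
for k in range(600):                      # epochs of length DT
    lam_prev = lam.copy()
    for _ in range(round(DT / (R * dt))):
        xp, xm = np.roll(x, -1), np.roll(x, 1)
        e = x - xp                        # synchronization errors
        dx = -(eta / q) * dU(x) + gamma * (xm - x) - gamma * e \
             + np.roll(lam, 1) - lam
        x = x + dx * (R * dt)
        x[0] += G * np.sqrt(dt) * rng.standard_normal(R).sum()  # cell 1 only
        lam = lam + e * (R * dt)
        x = np.clip(x, -5.0, 5.0)         # step 2(c): limit the search domain
    u = lam - lam_prev                    # u^(i)(k), cf. Eq. (8)
    G = T * np.sum(u * u) / q             # Eq. (10) with n = 1
    T *= mu
print(x, U(x))                            # cells typically agree on a minimizer
```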

5. Illustrative Examples

The following examples are chosen from [3] in order to illustrate the usefulness of the proposed scheme and its improvement over deterministic CLMs.

5.1. Example 1 : Double potential well

Consider the function U(x) = x^4 − 16x^2 + 5x + 100. The actual global minimum is located at x = −2.9035, and the function has another local minimum at x = 2.7468. Simulations for the deterministic CLM are done by choosing q = 3, ∆T = 0.2, $\underline{\gamma}$ = 0.2, γ_i(0) = 1, i = 1, 2, 3, $\overline{\gamma}$ = 2, α = 1, η = 1, R = 8, dt = 2 × 10^−3, ε = 10^−3. The initial conditions are chosen randomly. When all the initial conditions are larger than x = 0.1567, the deterministic CLM network converges to the local minimum. The

stochastic CLM is simulated by setting $\mathcal{T}_0$ = 500, μ = 0.99 and $\mathcal{J}$ = {1}. For a typical run, the states, the cost values during the evolution, and the noise gain G^(1) are shown in Fig. 1(a), (b) and (c), respectively. It can be seen that while the deterministic cells (i.e. {2, 3}) synchronize at the local minimum, the stochastic cell keeps exploring, since the noise amplitude is not zero. Even when the global minimum is reached by the deterministic cells, the stochastic cell continues to explore the space due to the high noise gain, until all cells agree on the solution. Then the noise amplitude goes to zero, effectively decoupling the noise from the system.


Figure 1: (a) The evolution of the states x^(i)(t) for Example 1; (b) the cost values U(x^(i)(t)); (c) the noise gain G^(1)(t) for the stochastic cell.

For this illustrative example, one thousand runs with the initial conditions distributed randomly in the interval [−5, 5] have been performed for both the deterministic CLM ($\mathcal{T}_0$ = 0) and the SCLM ($\mathcal{T}_0$ = 500, μ = 0.99). The deterministic CLM reached the global minimum in 526 of the runs. The SCLMs converged to the global minimum in 640 of the runs, indicating a clear improvement. Note that only the state of the network at the final time is considered. The performance could easily be improved further by adding more deterministic and stochastic cells to the network.
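The two stationary points quoted above are easy to check numerically, e.g. with SciPy, by searching each basin of the double well separately:

```python
import numpy as np
from scipy.optimize import minimize_scalar

U = lambda x: x**4 - 16 * x**2 + 5 * x + 100

# Search the left and right basins of the double well separately.
left = minimize_scalar(U, bounds=(-5.0, 0.0), method='bounded')
right = minimize_scalar(U, bounds=(0.0, 5.0), method='bounded')
print(left.x, U(left.x))     # x approx. -2.9035, the global minimum
print(right.x, U(right.x))   # x approx.  2.7468, the local minimum
```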

5.2. Example 2

The cost function

$$U(x) = \frac{a}{2n} \sum_{i=1}^{n} x_i^2 + 8n - 4n\left( \prod_{i=1}^{n} \cos(\beta_1 x_i) + \prod_{i=1}^{n} \cos(\beta_2 x_i) \right) \qquad (12)$$

where x = (x_1, x_2, ..., x_n)^T, a = 0.01, β_1 = 0.2 and β_2 = 1, has been optimized in the domain [−20, 20]^n for n = 50. The minimum is known to be located at x = 0, with U(0) = 0. Note that Eq. (12) has many local minima that are close to each other and have almost the same cost; hence conventional gradient descent or line search based optimization methods fail, even with multiple starts. The simulation parameters are selected as q = 25, $\mathcal{J}$ = {1, 6, 11, 16, 21}, $\mathcal{N}$ = {50}, ∆T = 0.4, $\underline{\gamma}$ = 0.1, γ_i(0) = 1, $\overline{\gamma}$ = 10, α = 1, $\underline{\eta}$ = 10^−3, η(0) = 1, $\overline{\eta}$ = 10^3, R = 8, dt = 10^−3, ε = 10^−4, $\mathcal{T}_0$ = 10^5, μ = 0.99. The SCLM algorithm converged to the global minimum within the given accuracy in 433 epochs, while the deterministic CLM failed to reach the global minimum with the same settings.
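For reference, a direct implementation of the cost, assuming the a/(2n) scaling in the reconstruction of Eq. (12) above (consistent with U(0) = 0):

```python
import numpy as np

def U_ex2(x, a=0.01, beta1=0.2, beta2=1.0):
    """Cost of Eq. (12), as reconstructed above; U_ex2(0) = 0."""
    n = x.size
    return (a / (2 * n)) * np.sum(x**2) + 8 * n \
           - 4 * n * (np.prod(np.cos(beta1 * x)) + np.prod(np.cos(beta2 * x)))

x0 = np.random.default_rng(0).uniform(-20.0, 20.0, 50)
print(U_ex2(np.zeros(50)), U_ex2(x0))  # 0.0 at the optimum vs. a random point
```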

6. Conclusions

In this paper, we have presented stochastic CLMs for the global optimization of smooth cost functions. A cooling schedule that depends on the average costate energy has been proposed. It allows global exploration of the search space and automatic termination of the cooling when a solution candidate is agreed upon within the stochastic CLM network. It effectively combines the advantages of the deterministic CLM scheme with those of continuous-time diffusion processes for the global optimization of smooth functions.

Acknowledgments: Research supported by the Research Council KUL: GOA-AMBioRICS, CoE EF/05/006 Optimization in Engineering; the Flemish Government: FWO G.0211.05 (Nonlinear Systems), G.0226.06 (Cooperative Systems); the Belgian Federal Science Policy Office: IUAP P5/22 ('Dyn. Sys. and Ctrl.: Computation, Identification and Modeling'); and the Turkish Scientific Research and Development Council (TUBITAK).

References

[1] S. Kirkpatrick, C. D. Gelatt, and M. P. Vecchi, "Optimization by simulated annealing," Science, vol. 220, pp. 671–680, 1983.

[2] D. E. Goldberg, Genetic Algorithms in Search, Optimization, and Machine Learning. Reading, MA: Addison-Wesley, 1989.

[3] J. A. K. Suykens, J. Vandewalle, and B. De Moor, "Intelligence and cooperative search by coupled local minimizers," Int. J. of Bifurcation and Chaos, vol. 11, pp. 2133–2144, August 2001.

[4] J. A. K. Suykens and J. Vandewalle, "Coupled local minimizers: alternative formulations and extensions," in World Congress on Computational Intelligence, Int. Joint Conf. on Neural Networks (WCCI-IJCNN 2002), (Honolulu), pp. 2039–2043, May 2002.

[5] S. Zhang and A. G. Constantinides, "Lagrange programming neural networks," IEEE Trans. on Circuits and Systems II: Analog and Digital Signal Processing, vol. 39, p. 441, July 1992.

[6] M. E. Yalçın, J. A. K. Suykens, and J. P. L. Vandewalle, Cellular Neural Networks, Multi-scroll Chaos and Synchronization. Singapore: World Scientific Publishing, 2005.

[7] S. Geman and C. Hwang, "Diffusions for global optimization," SIAM J. of Control and Optimization, vol. 24, pp. 1031–1043, September 1986.

[8] S. B. Gelfand and S. K. Mitter, "Metropolis-type annealing algorithms for global optimization in R^d," SIAM J. of Control and Optimization, vol. 31, pp. 111–131, January 1993.

[9] H. Cohn and M. Fielding, "Simulated annealing: searching for an optimal temperature schedule," SIAM J. of Optimization, vol. 9, no. 3, pp. 779–802, 1999.

[10] T. T. Soong, Random Differential Equations in Science and Engineering. New York: Academic Press, 1973.

[11] P. E. Kloeden and E. Platen, Numerical Solution of Stochastic Differential Equations. Berlin: Springer, 1992.
