
Multi-agent structured optimization over message-passing architectures with bounded communication delays

Puya Latafat, Panagiotis Patrinos

Abstract— We consider the problem of solving structured convex optimization problems over a network of agents with communication delays. It is assumed that each agent performs its local updates using possibly outdated information from its neighbors, under the assumption that the delay with respect to each neighbor is bounded but otherwise arbitrary. The private objective of each agent is represented by the sum of two possibly nonsmooth functions, one of which is composed with a linear mapping. The global optimization problem consists of the aggregate of the local cost functions and a common Lipschitz-differentiable function. In the case when the coupling between agents is represented only through the common function, we employ the primal-dual algorithm proposed by Vũ and Condat.

In the case when the linear maps introduce additional coupling between agents, a new algorithm is developed. In both cases convergence is obtained under a strong convexity assumption.

To the best of our knowledge, this is the first time that this form of delay is analyzed for a primal-dual algorithm in a message-passing local-memory model.

I. INTRODUCTION

In this paper we consider a class of structured optimization problems that can be represented as follows:

$$\underset{x\in\mathbb{R}^n}{\text{minimize}}\quad f(x) + \sum_{i=1}^{m} g_i(x_i) + h_i(N_i x), \tag{1}$$

where $x = (x_1, \ldots, x_m)$, $N_i$ is a linear mapping, $h_i, g_i$ are proper closed convex (possibly) nonsmooth functions, and $f$ is convex, continuously differentiable with Lipschitz continuous gradient. The goal is to solve (1) over a network of agents through local communications. Each agent is assumed to maintain its own private cost functions $g_i$ and $h_i \circ N_i$, while $f$ and (possibly) the linear mappings $N_i$ represent the coupling between the agents. An important challenge in such a network is that the agents may not have access to the latest information required for their computations.

Most iterative algorithms for convex optimization can be written as

$$x^{k+1} = x^k - T x^k, \tag{2}$$

where the mapping $\mathrm{Id} - T$ ($\mathrm{Id}$ is the identity operator) has some contractive property resulting in the convergence of the sequence to a zero of $T$. In distributed optimization the goal is to devise algorithms where a group of agents/processors distributively update certain coordinates of $x$ while guaranteeing convergence to a zero of $T$.

Puya Latafat¹,²; Email: puya.latafat@{kuleuven.be,imtlucca.it}

Panagiotis Patrinos¹; Email: panos.patrinos@esat.kuleuven.be

This work was supported by: FWO PhD fellowship 1196818N; FWO projects G086318N and G086518N; Fonds de la Recherche Scientifique – FNRS and the Fonds Wetenschappelijk Onderzoek – Vlaanderen under EOS Project no. 30468160 (SeLMA).

¹KU Leuven, Department of Electrical Engineering (ESAT-STADIUS), Kasteelpark Arenberg 10, 3001 Leuven-Heverlee, Belgium.

²IMT School for Advanced Studies Lucca, Piazza San Francesco 19, 55100 Lucca, Italy.

There are two main computational models in distributed optimization (depicted in Fig. 1), with a range of hybrid models in between [1, Chap. 1]. These models are conceptually different and require different analysis. The model considered in this work is the local/private-memory model. Let us first describe the two models.

Shared-memory model: This model is characterized by the access of all agents/processors to a shared memory. A large body of literature exists on parallel coordinate descent algorithms for this problem. Typically, coordinate descent algorithms would require a memory lock to ensure consistent reading. Interesting recent works allow inconsistent reads [2], [3]. In this model, for the fixed point iteration (2), each processor reads the global memory and proceeds to choose a random coordinate $i \in \{1,\ldots,m\}$ and to perform

$$x_i^{k+1} = x_i^k - T_i \hat{x}^k,$$

where $\hat{x}^k$ denotes the data loaded from the global memory to the local storage at the clock tick $k$, and $T_i$ represents the operator that updates the $i$th coordinate. This form of update is asynchronous in the sense that the processors update the global memory simultaneously, resulting in a possibly inconsistent local copy $\hat{x}^k$ due to other processors modifying the global memory during a read. The analysis of such algorithms would in general rely on either using the properties of the operator that updates the $i$th coordinate when possible (coordinate-wise Lipschitz continuity in the case of the gradient [2]), or the properties of the global operator (see [3] for nonexpansive operators). A crucial point in the convergence analysis of such methods is the fact that for a given processor, the index of the coordinate to be updated is selected at random, but no matter which coordinate is selected the same local data $\hat{x}^k$ is used for the update.

Let $\hat{T}_i x := (0, \ldots, 0, T_i x, 0, \ldots, 0)$. Then in a randomized scheme the operators $\hat{T}_i$ can be summed over $i$:

$$\sum_{i=1}^{m} \hat{T}_i \hat{x}^k = T \hat{x}^k,$$

allowing one to use the properties known for the global operator (see the proof of [3, Lem. 2]). As we discuss below, the difficulty in the local-memory model is precisely due to the fact that this summation no longer holds.
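To fix ideas, the following toy sketch (our illustration, not code from the paper) simulates the shared-memory model: each step reads a copy $\hat{x}^k$ of the global vector, draws a random coordinate, and applies $T_i$; the operator $T$ is taken to be a gradient step on a least-squares cost, a hypothetical choice. A serial simulation cannot reproduce truly inconsistent reads, but the mechanics of reading once and then updating a single coordinate are the same.

```python
import numpy as np

# Sketch of the shared-memory model (our illustration). T is a gradient
# step on f(x) = 0.5*||Ax - b||^2, so Id - T is contractive for a small
# enough stepsize and the iterates approach the zero of T.
rng = np.random.default_rng(0)
n = 5
A = rng.standard_normal((n, n))
b = rng.standard_normal(n)
gamma = 0.9 / np.linalg.norm(A, 2) ** 2   # stepsize below 1/beta

def T(x):
    return gamma * A.T @ (A @ x - b)      # full update: x^{k+1} = x^k - Tx^k

x = np.zeros(n)                            # the global memory
for k in range(20000):
    x_hat = x.copy()                       # read (possibly stale in parallel)
    i = rng.integers(n)                    # random coordinate index
    x[i] -= T(x_hat)[i]                    # update only coordinate i

print(np.linalg.norm(A @ x - b))           # residual shrinks toward zero
```

Because the same read $\hat{x}^k$ is used regardless of which coordinate is drawn, summing the coordinate operators over $i$ recovers $T\hat{x}^k$, which is exactly the property exploited in [3].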

Local/private-memory model: In this model each agent/processor has its own private local memory. The agents can send information to and receive information from other agents as needed, and agent $i$ can only update $x_i$. This model is also referred to as the message-passing model [1].

Fig. 1. The two main memory models; (left) agents cooperating to perform a task, (right) processors updating a global memory.

In the absence of delay between agents, randomized block-coordinate updates may be used to develop distributed asynchronous algorithms. Such schemes would typically involve random independent activation of agents to perform their local updates, and are in this sense also referred to as asynchronous [4]–[7]. In this work we are only concerned with the use of outdated information by the agents and do not pursue this form of asynchrony.

In accordance with the notation of the seminal work [1, Chap. 7] we define the following local (outdated) version of the generic vector $x^k = (x_1^k, \ldots, x_m^k)$ used by agent $i$:

$$x^k_{[i]} := \big(x_1^{\tau^i_1(k)}, \ldots, x_m^{\tau^i_m(k)}\big), \tag{3}$$

where $\tau^i_j(k)$ is the latest time at which the value of $x_j$ was transmitted to agent $i$ by agent $j$. In our setting the delay is assumed to be bounded:

Assumption 1. There exists an integer $B$ such that for all $k \geq 0$ the following holds:

$$(\forall i,j)\quad 0 \leq k - \tau^i_j(k) \leq B, \quad \text{and} \quad \tau^i_i(k) = k.$$

The fact that each agent knows its own local variable without delay is reflected in the assumption $\tau^i_i(k) = k$. This is a natural assumption and is satisfied in practice. Notice that for ease of notation we defined the complete outdated vector, while in practice each agent would only keep a local copy of the coordinates that are required for its computation; see Fig. 1. The direction of the arrows in Fig. 1 signifies the nature of the coupling between two agents. For example, the arrow from $A_4$ to $A_3$ indicates that agent $A_3$ requires $x_4$ for its computation. Such a relation between agents depends on the formulation and the nature of coupling between agents. For instance, in the minimization (1), the coupling is represented through $f$ and possibly $N_i$. As we shall see in §II, the coupling through $f$ may be one-sided, since agent $i$ may require $x_j$ for computing $\nabla_i f$ (the partial derivative of $f$ with respect to $x_i$) without agent $j$ requiring $x_i$.
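As a concrete illustration of (3) and Assumption 1 (our sketch, not from the paper), each agent can be given read access to a history of past iterates, with block $j$ read at an arbitrary staleness of at most $B$ and the agent's own block always fresh, matching $\tau^i_i(k) = k$:

```python
import numpy as np

# Sketch of the outdated vector x^k_{[i]} of (3) under Assumption 1 (our
# illustration). history[t] stores the full iterate at time t; a real
# message-passing implementation would only store, per agent, the blocks
# of its in-neighbors.
rng = np.random.default_rng(1)
m, B = 4, 3                              # number of agents, delay bound

def outdated_copy(i, k, history):
    """Return x^k_{[i]}: block j is x_j^{tau^i_j(k)} with 0 <= k - tau <= B."""
    x_local = np.empty(m)
    for j in range(m):
        tau = k if j == i else max(0, k - rng.integers(0, B + 1))
        x_local[j] = history[tau][j]     # tau^i_i(k) = k: own block is fresh
    return x_local
```

Each agent $i$ would then update its own block using `outdated_copy(i, k, history)` in place of the fresh $x^k$.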

In summary, each agent controls only one block of coordinates and updates according to

$$x_i^{k+1} = x_i^k - T_i x^k_{[i]},$$

the result of which will be sent (possibly with delay) to the agents that require it in their computations. The difficulty in this model comes from the impossibility of summing $T_i x^k_{[i]}$ over all $i$, given that $x^k_{[i]}$ is different for each $i$.

In addition to the above described delay, the partially asynchronous model considered in [1, Chap. 7] involves a second assumption: each agent must perform an update at least once during any time interval of a given length. Instead, we are not concerned with asynchrony but rather with the use of outdated information by the agents. We emphasize that developing partially asynchronous schemes for primal-dual algorithms or randomized schemes that comply with the delay model described in (3) remains a challenge.

In [1, Chap. 7.5] a partially asynchronous variant of the gradient method is studied. This analysis is further extended to the projected-gradient method in the convex case. In [8] a periodic linear convergence rate is established for the projected-gradient method. The recent work [9] extends this analysis to the proximal-gradient method. Notice that the aforementioned primal methods are not well equipped for problems with more complex structures as in (1).

In this work we study two primal-dual algorithms for solving (1) in the presence of bounded communication delays.

Primal-dual proximal algorithms are a class of first-order methods that are easy to implement, are parallelizable, and yield the primal and dual solutions simultaneously. They are able to exploit the structure in (1) efficiently, resulting in fully split algorithms applicable to a wide range of applications. It is worth noting that while this paper focuses on two particular primal-dual algorithms, a similar analysis should be applicable to other primal-dual methods such as the ones developed in [6], [10]–[13].

A. Main Contributions

To the best of our knowledge this is the first work that considers the general delay described in (3) for a primal-dual algorithm. Unlike primal methods (gradient or proximal-gradient), this scheme can be applied to solve problems with complex structures as in (1) without the need to invert matrices or to solve inner loops.

The analyses of [1], [8], [9] rely on the use of the cost as the Lyapunov function. In contrast, we show that under the bounded delay assumption and some strong convexity assumption, the generated sequence is quasi-Fejér monotone provided that the stepsizes are sufficiently small. Moreover, linear convergence is established with an explicit convergence factor.

Two primal-dual algorithms are presented: (i) when the coupling between agents is enforced only through $f$, the algorithm of [14], [15] is considered; (ii) when the coupling is enforced through $f$ and the linear mappings, a modified algorithm is developed which appears to be new. In the second case, due to the presence of additional coupling, smaller stepsizes must be used to ensure convergence.

B. Motivating Example

Consider the problem of formation control [16], where each agent (vehicle) has its own private dynamics and cost function, and the goal is to achieve a specific formation while communicating only with a selected number of agents. Let $w_i = (\xi_i, v_i)$, where $\xi_i$ and $v_i$ denote the local state and input sequences. The location of agent $i$ is given by $y_i = C\xi_i$ and the set of its neighbors is denoted by $\mathcal{A}_i$. The linear dynamics of each agent over a control horizon is represented by the constraints $E_i w_i = b_i$. In order to enforce a formation between agents $i$ and $j$ the quadratic cost function $\|C(\xi_i - \xi_j) - d_{ij}\|^2$ is used, where $d_{ij}$ is the target relative distance between them (refer to [16] for details). Hence, the formation control problem is formulated as the following constrained minimization:

$$\begin{aligned} \text{minimize}\quad & \tfrac{1}{2}\sum_{i=1}^{m} \sum_{j\in\mathcal{A}_i} \|C(\xi_i - \xi_j) - d_{ij}\|^2 + \tfrac{1}{2}\sum_{i=1}^{m} w_i^\top Q_i w_i \\ \text{subject to}\quad & E_i w_i = b_i, \quad i = 1,\ldots,m. \end{aligned}$$

This problem can easily be cast in the form of (1) by setting $f$ equal to the first term, $g_i$ equal to the quadratic local cost, $h_i$ the indicator of the point $b_i$, and the linear mapping $N_i = E_i$; a sketch of this casting follows below. Therefore, the objective is to enforce a formation between agents by solving this optimization problem in the presence of communication delays, allowing the agents to use outdated information. Notice that in this case the coupling between agents is enforced only through $f$. This special case of (1) is studied in §III.
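For concreteness, here is how the pieces of the formation problem map onto (1) in code (our construction; the matrices $C$, $E_i$, the vectors $b_i$, $d_{ij}$, and the weights $Q_i$ below are hypothetical placeholders): $f$ collects the pairwise formation costs, $g_i$ is the local quadratic cost, and $h_i$ is the indicator of $\{b_i\}$ composed with $N_i = E_i$.

```python
import numpy as np

# Casting the formation-control example into the form (1). All problem
# data below are hypothetical placeholders chosen for illustration.
m, nw, nx, p = 3, 6, 4, 2                 # agents, dims of w_i, xi_i, y_i
rng = np.random.default_rng(2)
C = rng.standard_normal((p, nx))          # output map: y_i = C xi_i
E = [rng.standard_normal((p, nw)) for _ in range(m)]  # dynamics E_i w_i = b_i
b = [rng.standard_normal(p) for _ in range(m)]
Q = [np.eye(nw) for _ in range(m)]        # local quadratic weights
nbrs = {0: [1], 1: [0, 2], 2: [1]}        # neighbor sets A_i
d = {(i, j): rng.standard_normal(p) for i in nbrs for j in nbrs[i]}

def xi(w_i):                              # state block of w_i = (xi_i, v_i)
    return w_i[:nx]

def f(w):                                 # smooth coupling term of (1)
    return 0.5 * sum(np.linalg.norm(C @ (xi(w[i]) - xi(w[j])) - d[i, j]) ** 2
                     for i in nbrs for j in nbrs[i])

def g(i, w_i):                            # private quadratic cost g_i
    return 0.5 * w_i @ Q[i] @ w_i

def h(i, z):                              # h_i = indicator of {b_i}; N_i = E_i
    return 0.0 if np.allclose(z, b[i]) else np.inf
```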

C. Notation and Preliminaries

Throughout, $\mathbb{R}^n$ is the $n$-dimensional Euclidean space with inner product $\langle\cdot,\cdot\rangle$ and induced norm $\|\cdot\|$. For a positive definite matrix $P$ we define the scalar product $\langle x,y\rangle_P = \langle x, Py\rangle$ and the induced norm $\|x\|_P = \sqrt{\langle x,x\rangle_P}$.

For a set $C$, we denote its relative interior by $\mathrm{ri}\,C$. Let $q : \mathbb{R}^n \to \overline{\mathbb{R}} := \mathbb{R} \cup \{+\infty\}$ be a proper closed convex function. Its domain is denoted by $\mathrm{dom}\,q$. Its subdifferential is the set-valued operator $\partial q : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$,

$$\partial q(x) = \{ y \in \mathbb{R}^n \mid \forall z \in \mathbb{R}^n,\ \langle z - x, y\rangle + q(x) \leq q(z) \}.$$

For a positive scalar $\rho$, the proximal map associated with $q$ is the single-valued mapping defined by

$$\mathrm{prox}_{\rho q}(x) := \underset{z\in\mathbb{R}^n}{\mathrm{argmin}} \left\{ q(z) + \tfrac{1}{2\rho}\|x - z\|^2 \right\}.$$

The Fenchel conjugate of $q$, denoted by $q^*$, is defined as $q^*(v) := \sup_{x\in\mathbb{R}^n}\{\langle v,x\rangle - q(x)\}$. The function $q$ is said to be $\mu$-convex with $\mu \geq 0$ if $q(x) - \tfrac{\mu}{2}\|x\|^2$ is convex.
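Two standard instances may help fix ideas (our examples, not from the paper): for $q = \|\cdot\|_1$ the proximal map is soft-thresholding, and for $q$ the indicator of a single point $\{b\}$ (as appears in the formation example) the proximal map returns $b$ regardless of its input.

```python
import numpy as np

# prox_{rho q}(x) = argmin_z { q(z) + (1/(2 rho)) * ||x - z||^2 }

def prox_l1(x, rho):
    """Soft-thresholding: the prox of q = ||.||_1."""
    return np.sign(x) * np.maximum(np.abs(x) - rho, 0.0)

def prox_indicator_point(x, rho, b):
    """Prox of the indicator of {b}: the projection onto the point b."""
    return b

print(prox_l1(np.array([1.5, -0.2, 0.7]), 0.5))   # [1.0, -0.0, 0.2]
```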

A sequence $(w^k)_{k\in\mathbb{N}}$ is said to be quasi-Fejér monotone relative to the set $U$ if for all $v \in U$ and all $k \in \mathbb{N}$

$$\|w^{k+1} - v\|^2 \leq \|w^k - v\|^2 + \varepsilon_k,$$

where $(\varepsilon_k)_{k\in\mathbb{N}}$ is a summable nonnegative sequence [17].

The positive part of $x \in \mathbb{R}$ is denoted by $[x]_+ := \max\{x, 0\}$.

II. PROBLEM SETUP

Throughout this paper the primal and dual vectors, denoted $x$ and $u$, are assumed to be composed of $m$ blocks as follows:

$$x = (x_1,\ldots,x_m) \in \mathbb{R}^n, \qquad u = (u_1,\ldots,u_m) \in \mathbb{R}^r,$$

where $x_i \in \mathbb{R}^{n_i}$ and $u_i \in \mathbb{R}^{r_i}$. Consider a linear mapping $L : \mathbb{R}^n \to \mathbb{R}^r$ that is partitioned as follows:

$$L = \begin{pmatrix} L_{11} & \cdots & L_{1m} \\ \vdots & \ddots & \vdots \\ L_{m1} & \cdots & L_{mm} \end{pmatrix}, \tag{4}$$

where $L_{ij} : \mathbb{R}^{n_j} \to \mathbb{R}^{r_i}$. Furthermore, the $i$th (block) row of $L$ is denoted by $L_i : \mathbb{R}^n \to \mathbb{R}^{r_i}$ and the $i$th (block) column by $L^i : \mathbb{R}^{n_i} \to \mathbb{R}^r$, i.e.,

$$L = \begin{pmatrix} L_1 \\ \vdots \\ L_m \end{pmatrix} = \begin{pmatrix} L^1 & \cdots & L^m \end{pmatrix}.$$

The following holds:

$$\langle Lx, u\rangle = \sum_{i=1}^{m} \langle L_i x, u_i\rangle = \sum_{i=1}^{m} \langle x_i, (L^i)^\top u\rangle. \tag{5}$$

Consider the structured optimization problem (1) where the linear mapping $N_i$ has been replaced by $L_i$ defined above in order to clarify the structure of the mapping:

$$\underset{x\in\mathbb{R}^n}{\text{minimize}}\quad f(x) + \sum_{i=1}^{m} g_i(x_i) + h_i(L_i x). \tag{6}$$

The cost functions $g_i$ and $h_i \circ L_i$ are private functions belonging to agent $i$. The coupling between agents is through the smooth term $f$ and the linear term $L_i x$. An agent $i$ is assumed to have access to the information required for its computation, be it outdated; cf. Algorithms 1 and 2.

Let the following assumptions hold:

Assumption 2.

(i) For $i = 1,\ldots,m$, $g_i : \mathbb{R}^{n_i} \to \overline{\mathbb{R}}$, $h_i : \mathbb{R}^{r_i} \to \overline{\mathbb{R}}$ are proper closed convex functions, and $L_i : \mathbb{R}^n \to \mathbb{R}^{r_i}$ is a linear mapping.

(ii) $f : \mathbb{R}^n \to \mathbb{R}$ is convex, continuously differentiable, and $\nabla f$ is $\beta$-Lipschitz continuous for some nonnegative $\beta$:

$$\|\nabla f(x) - \nabla f(x')\| \leq \beta \|x - x'\|, \quad \forall x, x' \in \mathbb{R}^n.$$

(iii) For every $i = 1,\ldots,m$ there exists a nonnegative constant $\bar{\beta}_i$ such that for all $x, x' \in \mathbb{R}^n$ satisfying $x_i = x'_i$:

$$\|\nabla_i f(x) - \nabla_i f(x')\| \leq \bar{\beta}_i \|x - x'\|. \tag{7}$$

(iv) The set of solutions to (6) is nonempty.

(v) (Constraint qualification) There exists $x_i \in \mathrm{ri}\,\mathrm{dom}\,g_i$, for $i = 1,\ldots,m$, such that $L_j x \in \mathrm{ri}\,\mathrm{dom}\,h_j$, for $j = 1,\ldots,m$.

Assumption 2(iii) quantifies the strength of the coupling (through $f$) between agents [1, Sec. 7.5]. In particular, if $f$ is separable, i.e., $f(x) = \sum_{i=1}^{m} f_i(x_i)$, then there is no coupling and $\bar{\beta}_i = 0$.

Problem (6) can be compactly represented as

$$\underset{x\in\mathbb{R}^n}{\text{minimize}}\quad f(x) + g(x) + h(Lx),$$

where $g(x) = \sum_{i=1}^{m} g_i(x_i)$, $h(u) = \sum_{i=1}^{m} h_i(u_i)$, and $L$ is as in (4). The dual problem is given by

$$\underset{u\in\mathbb{R}^r}{\text{minimize}}\quad (g + f)^*(-L^\top u) + h^*(u).$$

Under the constraint qualification of Assumption 2(v), the set of solutions to the dual problem is nonempty and the duality gap is zero [18, Cor. 31.2.1]. Furthermore, $x^\star$ is a primal solution and $u^\star$ is a dual solution if and only if the pair $(x^\star, u^\star)$ satisfies

$$\begin{aligned} 0 &\in \partial g(x^\star) + \nabla f(x^\star) + L^\top u^\star, \\ 0 &\in \partial h^*(u^\star) - L x^\star. \end{aligned} \tag{8}$$

Such a point is called a primal-dual solution, and the set of all primal-dual solutions is denoted by $\mathcal{S}$.

Let us define a few parameters used throughout the paper. For each agent $i \in \{1,\ldots,m\}$ define the positive stepsizes $\gamma_i, \sigma_i$ associated with the primal and the dual variables, respectively. Moreover, set

$$\bar{\beta} := (\bar{\beta}_1, \ldots, \bar{\beta}_m),$$
$$\Gamma := \mathrm{blkdiag}(\gamma_1 I_{n_1}, \ldots, \gamma_m I_{n_m}), \qquad \Sigma := \mathrm{blkdiag}(\sigma_1 I_{r_1}, \ldots, \sigma_m I_{r_m}).$$

Applying the algorithm of Vũ and Condat [14], [15] to (6), with stepsize matrices $\Sigma$ and $\Gamma$ as defined above, results in the following updates for agent $i$ at iteration $k$:

$$x_i^{k+1} = \mathrm{prox}_{\gamma_i g_i}\big( x_i^k - \gamma_i (L^i)^\top u^k - \gamma_i \nabla_i f(x^k) \big), \tag{9a}$$
$$u_i^{k+1} = \mathrm{prox}_{\sigma_i h_i^*}\big( u_i^k + \sigma_i L_i (2x^{k+1} - x^k) \big). \tag{9b}$$

Notice that each agent requires the latest variables $x^k$, $x^{k+1}$ and $u^k$ in the above updates, which may not be available due to communication delays. In the next section we consider the case when $L$ is block-diagonal. The case of general $L$ is studied in Section IV, where a modified primal-dual algorithm is proposed in place of (9) to allow for a larger stepsize in this case.
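A minimal synchronous sketch of the updates (9), with all blocks stacked into full vectors, may clarify the data flow (our illustration; the toy instance below, with $g = 0$, $h = \|\cdot\|_1$ and $L = I$, is a hypothetical choice, and the delayed variants of Sections III and IV replace $x^k$, $u^k$ by outdated copies):

```python
import numpy as np

def vu_condat_step(x, u, L, grad_f, prox_g, prox_hconj, Gamma, Sigma):
    # (9a): primal step on the stacked vector
    x_new = prox_g(x - Gamma @ (L.T @ u) - Gamma @ grad_f(x))
    # (9b): dual step with the reflected primal 2 x^{k+1} - x^k
    u_new = prox_hconj(u + Sigma @ (L @ (2 * x_new - x)))
    return x_new, u_new

# Toy instance: f(x) = 0.5*||x - a||^2, g = 0, h = ||.||_1, L = I, so the
# prox of sigma*h* is the projection onto the unit box (a clip).
n = 4
a = np.array([2.0, -3.0, 0.1, 0.5])
L = np.eye(n)
Gamma, Sigma = 0.4 * np.eye(n), 0.4 * np.eye(n)
grad_f = lambda x: x - a
prox_g = lambda v: v                          # g = 0
prox_hconj = lambda v: np.clip(v, -1.0, 1.0)  # h = ||.||_1

x, u = np.zeros(n), np.zeros(n)
for _ in range(500):
    x, u = vu_condat_step(x, u, L, grad_f, prox_g, prox_hconj, Gamma, Sigma)
print(x)   # approaches soft-thresholding of a at level 1: [1, -2, 0, 0]
```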

III. THE CASE OF BLOCK-DIAGONAL LINEAR MAPPING

Throughout this section we assume that the linear mapping $L$ has a block-diagonal structure. Therefore, the coupling between agents is enacted only through the smooth function $f$. The example of formation control in Section I-B is of this structure.

Under this assumption problem (6) becomes

$$\underset{x\in\mathbb{R}^n}{\text{minimize}}\quad f(x) + \sum_{i=1}^{m} g_i(x_i) + h_i(L_{ii} x_i),$$

where $L_{ii}$ is the $i$th diagonal block of $L$; see (4). Given this diagonal structure, in the updates (9), agent $i$ must receive those $x_j$'s that are required for the computation of $\nabla_i f$, and all other operations are local. Let us define the set of agents that are required to send their variable to $i$ as follows:

$$\mathcal{N}_i^{\mathrm{in}} := \{ j \mid \nabla_i f \text{ depends on } x_j \},$$

and the set of $j$'s that agent $i$ must send $x_i$ to as $\mathcal{N}_i^{\mathrm{out}} := \{ j \mid i \in \mathcal{N}_j^{\mathrm{in}} \}$.

Algorithm 1 summarizes the proposed scheme for this problem. At every iteration each agent $i$ performs the updates described in (9) using the last information it has received from agents $j \in \mathcal{N}_i^{\mathrm{in}}$. It then transmits the updated $x_i^{k+1}$ to the agents that require it (possibly with delay). Note that $x^k_{[i]}$ was defined as the outdated version of the full vector $x^k$ for simplicity of notation, and in practical implementation it would only involve the coordinates that are required for the computation of $\nabla_i f$.

Algorithm 1 Vũ-Condat algorithm with bounded delays

Initialize: $x_i^0 \in \mathbb{R}^{n_i}$, $u_i^0 \in \mathbb{R}^{r_i}$ for each $i \in \{1,\ldots,m\}$.

for $k = 0, 1, \ldots$ do

  for each agent $i = 1,\ldots,m$ do

    – perform the local updates using the last received information, i.e., using the locally stored vector $x^k_{[i]}$ as defined in (3):

    $$x_i^{k+1} = \mathrm{prox}_{\gamma_i g_i}\big( x_i^k - \gamma_i L_{ii}^\top u_i^k - \gamma_i \nabla_i f(x^k_{[i]}) \big)$$
    $$u_i^{k+1} = \mathrm{prox}_{\sigma_i h_i^*}\big( u_i^k + \sigma_i L_{ii}(2x_i^{k+1} - x_i^k) \big)$$

    – send $x_i^{k+1}$ to all $j \in \mathcal{N}_i^{\mathrm{out}}$ (possibly with different delays)

As shown in Theorem 1, for small enough stepsizes the generated sequence converges to a primal-dual solution under the bounded delay assumption, provided that the functions $g_i$ are strongly convex. These requirements are summarized below:

Assumption 3. For $i = 1,\ldots,m$:

(i) (Strong convexity) $g_i$ is $\mu_g^i$-convex for some $\mu_g^i > 0$.

(ii) (Convergence condition) The stepsizes $\sigma_i, \gamma_i > 0$ satisfy the following assumption:

$$\gamma_i < \frac{1}{\sigma_i \|L_{ii}\|^2 + \beta + \frac{B^2}{2}\|\bar{\beta}\|^2_{M_g^{-1}}}, \tag{10}$$

where

$$M_g = \mathrm{blkdiag}\big(\mu_g^1 I_{n_1}, \ldots, \mu_g^m I_{n_m}\big). \tag{11}$$

Notice that according to Assumption 3(ii) we require a one-time global communication of $\|\bar{\beta}\|_{M_g^{-1}}$ and $\beta$ when initiating the algorithm.
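As a worked instance of the bound (10) (our arithmetic; all constants below are hypothetical), each agent can compute an admissible $\gamma_i$ once $\beta$ and $\|\bar{\beta}\|_{M_g^{-1}}$ have been shared:

```python
import numpy as np

# Hypothetical constants illustrating the stepsize bound (10).
B = 2                                     # delay bound of Assumption 1
beta = 1.0                                # Lipschitz constant of grad f
mu_g = np.array([0.5, 0.8, 1.0])          # strong convexity moduli mu_g^i
bar_beta = np.array([0.3, 0.2, 0.4])      # coupling constants bar{beta}_i
norm_Lii = np.array([1.0, 2.0, 0.5])      # ||L_ii|| for each agent
sigma = np.array([0.1, 0.1, 0.1])         # chosen dual stepsizes sigma_i

w2 = np.sum(bar_beta ** 2 / mu_g)         # ||bar{beta}||^2_{M_g^{-1}}
gamma_bound = 1.0 / (sigma * norm_Lii ** 2 + beta + 0.5 * B ** 2 * w2)
gamma = 0.99 * gamma_bound                # any gamma_i strictly below (10)
print(gamma)
```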

Before proceeding with the convergence results, let us define the following:

$$P := \begin{pmatrix} \Gamma^{-1} & -L^\top \\ -L & \Sigma^{-1} \end{pmatrix}. \tag{12}$$

Noting that $\Sigma, \Gamma$ are positive definite, and using the Schur complement, we have that $P$ is positive definite if and only if $\Gamma^{-1} - L^\top \Sigma L$ is positive definite, a condition that holds if (10) is satisfied (since $L$ has a block-diagonal structure).

Our analysis in Theorem 1 relies on showing that the generated sequence is quasi-Fejér monotone relative to the set of primal-dual solutions in the space equipped with the inner product $\langle\cdot,\cdot\rangle_P$. Notice that without communication delays ($B \equiv 0$), this analysis leads to the usual Fejér monotonicity of the sequence. The use of outdated information introduces additional error terms that are shown to be tolerated by the algorithm if the stepsizes are small enough and the functions $g_i$ are strongly convex.

The proof of Theorem 1 can be found in [19].

Theorem 1. Consider Algorithm 1 and let Assumptions 1 to 3 hold. Then the sequence $(z^k)_{k\in\mathbb{N}}$, where $z^k := (x^k, u^k)$, is quasi-Fejér monotone relative to $\mathcal{S}$ in the space equipped with the inner product $\langle\cdot,\cdot\rangle_P$. Furthermore, $(z^k)_{k\in\mathbb{N}}$ converges to some $z^\star \in \mathcal{S}$.

IV. THE CASE OF GENERAL LINEAR MAPPING

In this section we consider the general optimization problem (6) where additional coupling is present through the linear maps, i.e., $L$ is not block-diagonal. We consider a modified primal-dual algorithm that resembles (9), with the difference that in the dual update the linear map $L_i$ operates on $x^k_{[i]}$ in place of $2x^{k+1}_{[i]} - x^k_{[i]}$. This modification results in the possibility of using larger stepsizes, since the terms $2x^{k+1}_{[i]} - x^k_{[i]}$ would introduce additional sources of error.

Let us define the following two sets:

$$\mathcal{M}_i^{p} := \{ j \mid L_{ji} \neq 0 \}, \qquad \mathcal{M}_i^{d} := \{ j \mid L_{ij} \neq 0 \},$$

where $0$ denotes a zero matrix of appropriate dimensions. In Algorithm 2, due to the additional coupling through the linear maps, the primal vector of agent $i$ must be transmitted to all $j \in \mathcal{M}_i^{p} \cup \mathcal{N}_i^{\mathrm{out}}$, while the dual vector is to be transmitted to all $j \in \mathcal{M}_i^{d}$. Notice that the outdated primal and dual vectors $x^k_{[i]}$ and $u^k_{[i]}$ need not have the same delay pattern and are arbitrary as long as Assumption 1 is satisfied, i.e., agent $i$ may use the primal vector $x_j^{k_1}$ and the dual vector $u_j^{k_2}$ transmitted by $j$ at times $k_1$ and $k_2$.

Algorithm 2 A primal-dual algorithm with bounded delays

Initialize: $x_i^0 \in \mathbb{R}^{n_i}$, $u_i^0 \in \mathbb{R}^{r_i}$ for each $i \in \{1,\ldots,m\}$.

for $k = 0, 1, \ldots$ do

  for each agent $i = 1,\ldots,m$ do

    – perform the local updates using the last received information, i.e., using the locally stored vectors $x^k_{[i]}$ and $u^k_{[i]}$ as defined in (3):

    $$x_i^{k+1} = \mathrm{prox}_{\gamma_i g_i}\big( x_i^k - \gamma_i (L^i)^\top u^k_{[i]} - \gamma_i \nabla_i f(x^k_{[i]}) \big)$$
    $$u_i^{k+1} = \mathrm{prox}_{\sigma_i h_i^*}\big( u_i^k + \sigma_i L_i x^k_{[i]} \big)$$

    – send $x_i^{k+1}$ to all $j \in \mathcal{N}_i^{\mathrm{out}} \cup \mathcal{M}_i^{p}$, and $u_i^{k+1}$ to all $j \in \mathcal{M}_i^{d}$ (possibly with different delays)
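Compared with (9), the only change in the local step is the dual half, which applies $L_i$ to the outdated primal vector instead of a reflected fresh one; a per-agent sketch (our illustration; the prox and gradient callables and the blocks of $L$ are placeholders) is:

```python
# Local update of agent i in Algorithm 2 (sketch). x_loc and u_loc are the
# outdated copies x^k_{[i]}, u^k_{[i]} read from the agent's receive
# buffers; Lcol_i is the i-th block column L^i and Lrow_i the i-th block
# row L_i (numpy arrays are assumed).
def agent2_step(x_i, u_i, x_loc, u_loc, Lcol_i, Lrow_i,
                grad_f_i, prox_g_i, prox_hconj_i, gamma_i, sigma_i):
    x_i_new = prox_g_i(x_i - gamma_i * (Lcol_i.T @ u_loc)
                           - gamma_i * grad_f_i(x_loc))
    # dual step uses L_i x^k_{[i]}, not L_i (2 x^{k+1} - x^k) as in (9b)
    u_i_new = prox_hconj_i(u_i + sigma_i * (Lrow_i @ x_loc))
    return x_i_new, u_i_new
```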

In Theorem 2, convergence is established for Algorithm 2 when the stepsizes are small enough, under the assumption that the functions $g_i$ are strongly convex and $h_i^*$ are continuously differentiable with Lipschitz continuous gradient. We summarize these requirements below:

Assumption 4. For all $i = 1,\ldots,m$:

(i) (Strong convexity) $g_i$ is $\mu_g^i$-convex for some $\mu_g^i > 0$.

(ii) (Lipschitz continuity) $h_i^*$ is continuously differentiable, and $\nabla h_i^*$ is $\tfrac{1}{\mu_h^i}$-Lipschitz continuous for some $\mu_h^i > 0$. Equivalently, $h_i$ is $\mu_h^i$-convex.

(iii) (Convergence condition) The stepsizes $\sigma_i, \gamma_i > 0$ satisfy the following inequalities:

$$\sigma_i < \frac{1}{C_s (B+1)^2}, \qquad \gamma_i < \frac{1}{\beta + \tfrac{1}{2} R_s (B+1)^2 + B^2 \|\bar{\beta}\|^2_{M_g^{-1}}},$$

where

$$R_s := \sum_{i=1}^{m} \frac{1}{\mu_h^i} \|L_i\|^2, \qquad C_s := \sum_{i=1}^{m} \frac{1}{\mu_g^i} \|(L^i)^\top\|^2. \tag{13}$$

Notice that by Assumption 4(iii) we require a one-time global communication of $R_s$, $C_s$, $\beta$ and $\|\bar{\beta}\|_{M_g^{-1}}$.

Let us define the following positive definite matrix that is used in the convergence analysis:

$$D := \mathrm{blkdiag}(\Gamma^{-1}, \Sigma^{-1}). \tag{14}$$

We proceed with the convergence results for Algorithm 2. The proofs of Theorems 2 and 3 can be found in [19].

Theorem 2. Consider Algorithm 2 and let Assumptions 1, 2 and 4 hold. Then the sequence $(z^k)_{k\in\mathbb{N}}$ is quasi-Fejér monotone relative to $\mathcal{S}$ in the space equipped with $\langle\cdot,\cdot\rangle_D$. Furthermore, $(z^k)_{k\in\mathbb{N}}$ converges to some $z^\star \in \mathcal{S}$.

The next theorem provides a sufficient condition on the stepsizes under which linear convergence is attained.

Theorem 3 (Linear convergence). Consider Algorithm 2 and let Assumptions 1, 2, 4(i) and 4(ii) hold. Let $c$ be a positive scalar and set $\gamma_i = \tfrac{c}{\mu_g^i}$, $\sigma_i = \tfrac{c}{\mu_h^i}$ for $i = 1,\ldots,m$. Let $\mu_g^{\min} = \min\{\mu_g^1,\ldots,\mu_g^m\}$, $\mu_h^{\min} = \min\{\mu_h^1,\ldots,\mu_h^m\}$. Suppose that the following holds:

$$c \leq (1 + c_2)^{\frac{1}{B+1}} - 1,$$

where

$$c_2 = \min\left\{ \frac{\mu_g^{\min}}{2B\|\bar{\beta}\|^2_{M_g^{-1}} + R_s(B+1) + \beta},\ \frac{\mu_h^{\min}}{2 C_s (B+1)} \right\}.$$

Then the following linear convergence rate holds:

$$\|z^k - z^\star\|^2 \leq \left(\tfrac{1}{1+c}\right)^{k} \|z^0 - z^\star\|^2.$$

V. CONCLUSION & FUTURE WORKS

In this paper we considered the application of primal-dual algorithms for solving structured optimization problems in a message-passing network model. It is shown that the communication delay is tolerated by the considered algorithms provided that the stepsizes are small enough, and that some strong convexity assumption holds. Future work consists of extending the convergence analysis to the partially asynchronous framework. Another research direction is to devise randomized schemes where, in addition to the use of outdated information, the agents would wake up at random, independently from one another.

REFERENCES

[1] D. P. Bertsekas and J. N. Tsitsiklis, Parallel and distributed computation: numerical methods. Prentice-Hall, 1989, vol. 23.

[2] J. Liu and S. J. Wright, "Asynchronous stochastic coordinate descent: Parallelism and convergence properties," SIAM Journal on Optimization, vol. 25, no. 1, pp. 351–376, 2015.

[3] Z. Peng, Y. Xu, M. Yan, and W. Yin, "ARock: An algorithmic framework for asynchronous parallel coordinate updates," SIAM Journal on Scientific Computing, vol. 38, no. 5, pp. A2851–A2879, 2016.

[4] F. Iutzeler, P. Bianchi, P. Ciblat, and W. Hachem, "Asynchronous distributed optimization using a randomized alternating direction method of multipliers," in 52nd IEEE Conference on Decision and Control, 2013, pp. 3671–3676.

[5] P. Bianchi, W. Hachem, and F. Iutzeler, "A coordinate descent primal-dual algorithm and application to distributed asynchronous optimization," IEEE Transactions on Automatic Control, vol. 61, no. 10, pp. 2947–2957, Oct 2016.

[6] P. Latafat, N. M. Freris, and P. Patrinos, "A new randomized block-coordinate primal-dual proximal algorithm for distributed optimization," arXiv preprint arXiv:1706.02882, 2017.

[7] J.-C. Pesquet and A. Repetti, "A class of randomized primal-dual algorithms for distributed optimization," Journal of Nonlinear and Convex Analysis, vol. 16, no. 12, pp. 2453–2490, 2015.

[8] P. Tseng, "On the rate of convergence of a partially asynchronous gradient projection algorithm," SIAM Journal on Optimization, vol. 1, no. 4, pp. 603–619, 1991.

[9] Y. Zhou, Y. Liang, Y. Yu, W. Dai, and E. P. Xing, "Distributed proximal gradient algorithm for partially asynchronous computer clusters," Journal of Machine Learning Research, vol. 19, no. 19, pp. 1–32, 2018.

[10] P. L. Combettes and J.-C. Pesquet, "Primal-dual splitting algorithm for solving inclusions with mixtures of composite, Lipschitzian, and parallel-sum type monotone operators," Set-Valued and Variational Analysis, vol. 20, no. 2, pp. 307–330, 2012.

[11] L. M. Briceño-Arias and P. L. Combettes, "A monotone + skew splitting model for composite monotone inclusions in duality," SIAM Journal on Optimization, vol. 21, no. 4, pp. 1230–1250, 2011.

[12] Y. Drori, S. Sabach, and M. Teboulle, "A simple algorithm for a class of nonsmooth convex-concave saddle-point problems," Operations Research Letters, vol. 43, no. 2, pp. 209–214, 2015.

[13] P. Latafat and P. Patrinos, "Asymmetric forward–backward–adjoint splitting for solving monotone inclusions involving three operators," Computational Optimization and Applications, pp. 1–37, 2017.

[14] L. Condat, "A primal-dual splitting method for convex optimization involving Lipschitzian, proximable and linear composite terms," Journal of Optimization Theory and Applications, vol. 158, no. 2, pp. 460–479, 2013.

[15] B. C. Vũ, "A splitting algorithm for dual monotone inclusions involving cocoercive operators," Advances in Computational Mathematics, vol. 38, no. 3, pp. 667–681, 2013.

[16] R. L. Raffard, C. J. Tomlin, and S. P. Boyd, "Distributed optimization for cooperative agents: application to formation flight," in 43rd IEEE Conference on Decision and Control, vol. 3, 2004, pp. 2453–2459.

[17] P. L. Combettes, "Quasi-Fejérian analysis of some optimization algorithms," Studies in Computational Mathematics, vol. 8, pp. 115–152, 2001.

[18] R. Rockafellar, Convex analysis. Princeton University Press, 1997.

[19] P. Latafat and P. Patrinos, "Multi-agent structured optimization over message-passing architectures with bounded communication delays," arXiv preprint arXiv:1809.07199, 2018.
