
QPALM: A Newton-type Proximal Augmented Lagrangian Method for Quadratic Programs

Ben Hermans,¹ Andreas Themelis² and Panagiotis Patrinos²

Abstract— We present a proximal augmented Lagrangian based solver for general quadratic programs (QPs), relying on semismooth Newton iterations with exact line search to solve the inner subproblems. The exact line search reduces in this case to finding the zero of a one-dimensional monotone, piecewise affine function and can be carried out very efficiently. Our algorithm requires the solution of a linear system at every iteration, but as the matrix to be factorized depends on the active constraints, efficient sparse factorization updates can be employed, as in active-set methods. Both primal and dual residuals can be enforced down to strict tolerances; otherwise, infeasibility can be detected from intermediate iterates. A C implementation of the proposed algorithm is tested and benchmarked against other state-of-the-art QP solvers for a large variety of problem data and is shown to compare favorably against these solvers.

I. Introduction

This paper deals with convex quadratic programs (QPs)

$$\underset{x\in\mathbb{R}^n}{\text{minimize}}\ \ \tfrac12\langle x, Qx\rangle + \langle q, x\rangle \quad\text{subject to}\ \ \ell \le Ax \le u, \tag{QP}$$

where $Q \in \mathbb{R}^{n\times n}$ is symmetric and positive semidefinite, $q \in \mathbb{R}^n$, $A \in \mathbb{R}^{m\times n}$, and $\ell, u \in \mathbb{R}^m$.
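For concreteness, here is a minimal instance of (QP) in this data format (a sketch with illustrative numbers of our choosing, not data from the paper): an equality constraint is encoded by setting $\ell_i = u_i$, and a one-sided constraint by setting the opposite bound to $\pm\infty$.

```python
import numpy as np

# Illustrative 2-variable, 3-constraint instance of (QP):
#   minimize 0.5*<x, Qx> + <q, x>   subject to   l <= A x <= u.
Q = np.array([[4.0, 1.0],
              [1.0, 2.0]])          # symmetric positive semidefinite
q = np.array([1.0, 1.0])
A = np.array([[1.0, 1.0],
              [1.0, 0.0],
              [0.0, 1.0]])
l = np.array([1.0, 0.0, -np.inf])   # l[0] = u[0]: an equality constraint
u = np.array([1.0, 0.7, np.inf])    # +/- inf encode one-sided bounds
```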

Efficiently and reliably solving QPs is a key challenge in optimization [15, §16]. QPs cover a wide variety of applications and problem classes, such as portfolio optimization, support vector machines, sparse regressor selection, real-time linear model predictive control (MPC), etc. QPs also often arise as subproblems in general nonlinear optimization techniques such as sequential quadratic programming. Therefore, substantial research has been performed to develop robust and efficient QP solvers. State-of-the-art solvers are typically based on interior point methods, such as MOSEK [1] and Gurobi [14], or active-set methods, such as qpOASES [11]. Both methods have their advantages and disadvantages. Interior point methods typically require few but expensive iterations, involving the solution of a linear system at every iteration. In contrast, active-set methods require more but

¹Ben Hermans is with the Department of Mechanical Engineering, KU Leuven, and DMMS lab, Flanders Make, Leuven, Belgium. His research benefits from KU Leuven-BOF PFV/10/002 Centre of Excellence: Optimization in Engineering (OPTEC), from project G0C4515N of the Research Foundation - Flanders (FWO - Flanders), from Flanders Make ICON: Avoidance of collisions and obstacles in narrow lanes, and from the KU Leuven Research project C14/15/067: B-spline based certificates of positivity with applications in engineering.

²Andreas Themelis and Panagiotis Patrinos are with the Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium. This work was supported by the Research Foundation Flanders (FWO) research projects G086518N and G086318N; Research Council KU Leuven C1 project No. C14/18/068; Fonds de la Recherche Scientifique (FNRS) and the Fonds Wetenschappelijk Onderzoek (Vlaanderen) under EOS project no. 30468160 (SeLMA).
{ben.hermans2,andreas.themelis,panos.patrinos}@kuleuven.be

cheaper iterations, as the linear system changes only slightly and low-rank factorization updates can be used instead of factorizing from scratch. As a result, active-set methods can be efficiently warm started when solving a series of similar QPs, which is common in applications such as MPC, whereas interior point methods in general do not have this capability. Recently, proximal algorithms, also known as operator splitting methods [16], have experienced a resurgence in popularity. Relying only on first-order information of the problem, such methods have the advantages of operational simplicity and cheap iterations, but they may exhibit slow asymptotic convergence for poorly scaled problems. The recently proposed OSQP solver [21], based on the alternating direction method of multipliers (ADMM), addresses this crucial issue by means of a tailored offline scaling that performs very well on some QP problems, although a more thorough benchmarking confirms the known limitations of proximal algorithms. Indeed, parameter tuning is typically fixed before the execution, with possibly minor online adjustments, and as such operator splitting methods do not take into account curvature information about the problem that can greatly speed up convergence.

In this work we propose a novel and reliable QP solver based on the augmented Lagrangian method (ALM) [5], and in particular proximal ALM [18], which is efficient and robust against ill conditioning. ALM involves penalizing the constraints and solving a series of unconstrained minimizations, where fast smooth optimization techniques can be employed. QPs turn out to be particularly amenable to this approach, as optimal stepsizes of exact line searches are available in closed form, similar to what was shown in previous work [17], resulting in an extremely fast minimization strategy. In each unconstrained optimization, the iterates rely on the solution of a linear system, dependent on the set of active constraints. As such, similarly to active-set methods, our iterates can benefit from low-rank factorization update techniques and are therefore cheaper than in interior point methods. However, in contrast to active-set methods, and more in the flavor of other algorithms such as the dual gradient projection method of [2], our method allows for substantial changes in the active set at each iteration and therefore converges much faster on average. To some extent, our algorithm strikes a balance between interior point and active-set methods. In regard to state-of-the-art ALM solvers, not much research is available. The authors of [13] presented an ALM solver for QPs, OQLA/QPALM, which solves the inner minimization problems using a combination of active-set and gradient projection methods. Although the authors of [9] discuss second-order updates based on a generalized Hessian, similar to our method, their approach deals with convex (composite) optimization problems in general, and as such it cannot make use of the efficient factorization updates and optimal stepsizes that are key to the efficiency of our algorithm. Finally, a link with proximal algorithms can also be established, in the sense that the form of an iterate of unconstrained minimization is very similar to what is obtained by applying ADMM to (QP). Differently from an ADMM approach, where parameters are set before the execution, in our method penalties are adjusted online. Moreover, penalty matrices are used instead of scalars to allow for more flexibility in penalizing the constraints. As a result of the online adjustments, our algorithm requires in general fewer, although slightly more expensive, iterations.

The contribution of this paper is QPALM, a full-fledged QP solver based on proximal ALM. We describe the relevant theory and outline in detail the algorithmic steps for both the outer and inner minimization procedures. We also provide an open-source C implementation of the algorithm,¹ which we benchmark against state-of-the-art QP solvers, showing that QPALM can compete with and very often outperform them both in runtime and in robustness with regard to problem scaling.

The remainder of this paper is structured as follows. Section II introduces notation used in the paper and underlying theoretical concepts. Section III outlines the application of the proximal ALM to (QP). Section IV discusses in detail the algorithmic steps, regarding both inner and outer minimization, of QPALM. Section V addresses some of the crucial implementation details that contribute to making QPALM an efficient and robust solver. Section VI presents simulation results on QPs of varying sizes and problem conditioning, benchmarking QPALM's performance against state-of-the-art solvers. Finally, Section VII draws concluding remarks and mentions future directions of research.

II. Preliminaries

We now introduce some notational conventions and briefly list some known facts needed in the paper. The interested reader is referred to [4] for an extensive discussion.

A. Notation and known facts

We denote the extended real line by $\overline{\mathbb{R}} \coloneqq \mathbb{R} \cup \{\infty\}$. The scalar product on $\mathbb{R}^n$ is denoted by $\langle\,\cdot\,,\,\cdot\,\rangle$. With $[x]_+ \coloneqq \max\{x, 0\}$ we indicate the positive part of a vector $x \in \mathbb{R}^n$, meant in a componentwise sense. A sequence of vectors $(x^k)_{k\in\mathbb{N}}$ is said to be summable if $\sum_{k\in\mathbb{N}} \|x^k\| < \infty$.

With $\operatorname{Sym}(\mathbb{R}^n)$ we indicate the set of symmetric $\mathbb{R}^{n\times n}$ matrices, and with $\operatorname{Sym}_+(\mathbb{R}^n)$ and $\operatorname{Sym}_{++}(\mathbb{R}^n)$ the subsets of those which are positive semidefinite and positive definite, respectively. Given $\Sigma \in \operatorname{Sym}_{++}(\mathbb{R}^n)$ we indicate with $\|\cdot\|_\Sigma$ the norm on $\mathbb{R}^n$ induced by $\Sigma$, namely $\|x\|_\Sigma \coloneqq \sqrt{\langle x, \Sigma x\rangle}$.

Given a nonempty closed convex set $C \subseteq \mathbb{R}^n$, with $\Pi_C(x)$ we indicate the projection of a point $x \in \mathbb{R}^n$ onto $C$, namely $\Pi_C(x) = \arg\min_{y\in C} \|y - x\|$ or, equivalently, the unique point $z \in C$ satisfying the inclusion

$$x - z \in N_C(z), \tag{1}$$

where $N_C(z) \coloneqq \{v \in \mathbb{R}^n \mid \langle v, z' - z\rangle \le 0\ \forall z' \in C\}$ is the normal cone of $C$ at $z$. $\operatorname{dist}(x, C)$ and $\operatorname{dist}_\Sigma(x, C)$ denote the distance from $x$ to the set $C$ in the Euclidean norm and in the norm induced by $\Sigma$, respectively, while $\delta_C$ is the indicator function of the set $C$, namely $\delta_C(x) = 0$ if $x \in C$ and $\infty$ otherwise.

¹https://github.com/Benny44/QPALM

B. Convex functions and monotone operators

The Fenchel conjugate of a proper closed convex function $\varphi : \mathbb{R}^n \to \overline{\mathbb{R}}$ is the convex function $\varphi^* : \mathbb{R}^n \to \overline{\mathbb{R}}$ defined as $\varphi^*(y) = \sup_x \langle x, y\rangle - \varphi(x)$. The subdifferential of $\varphi$ at $x \in \mathbb{R}^n$ is $\partial\varphi(x) \coloneqq \{v \in \mathbb{R}^n \mid \varphi(x') \ge \varphi(x) + \langle v, x' - x\rangle\ \forall x' \in \mathbb{R}^n\}$. Having $y \in \partial\varphi(x)$ is equivalent to $x \in \partial\varphi^*(y)$.

A point-to-set mapping $M : \mathbb{R}^n \rightrightarrows \mathbb{R}^n$ is monotone if $\langle x - x', \xi - \xi'\rangle \ge 0$ for all $x, x' \in \mathbb{R}^n$, $\xi \in M(x)$ and $\xi' \in M(x')$. It is maximally monotone if, additionally, there exists no monotone operator $M' \ne M$ such that $M(x) \subseteq M'(x)$ for all $x \in \mathbb{R}^n$. The resolvent of a maximally monotone operator $M$ is the single-valued (in fact, Lipschitz-continuous) mapping $(\operatorname{id} + M)^{-1}$, where $(\operatorname{id} + M)^{-1}(x)$ is the unique point $\bar x \in \mathbb{R}^n$ such that $x - \bar x \in M(\bar x)$. $\operatorname{zer} M \coloneqq \{x \mid 0 \in M(x)\}$ denotes the zero-set of $M$, and for a linear mapping $\Sigma$, $\Sigma M$ is the operator defined as $\Sigma M(x) \coloneqq \{\Sigma y \mid y \in M(x)\}$.

The subdifferential of a proper convex lower semicontinuous function $\varphi$ is maximally monotone, and its resolvent is the proximal mapping $\operatorname{prox}_\varphi \coloneqq (\operatorname{id} + \partial\varphi)^{-1}$.

III. Proximal ALM for QPs

This section is devoted to establishing the theoretical ground in support of the proposed Algorithm 1. We will show that our scheme amounts to proximal ALM and derive its convergence guarantees following the original analysis in [18], here generalized to account for scaling matrices (as opposed to scalars) and by including the possibility of having different Lagrangian and proximal weights. We start by observing that problem (QP) can equivalently be expressed as

$$\underset{x\in\mathbb{R}^n}{\text{minimize}}\ \ f(x) + g(Ax), \tag{2}$$

where $f(x) \coloneqq \tfrac12\langle x, Qx\rangle + \langle q, x\rangle$, $g(z) \coloneqq \delta_C(z)$ and $C \coloneqq \{z \in \mathbb{R}^m \mid \ell \le z \le u\}$. The KKT conditions of (2) are

$$0 \in M(x, y) \coloneqq \begin{pmatrix} \nabla f(x) + A^\top y \\ -Ax + \partial g^*(y) \end{pmatrix}.$$

Let $V^\star \coloneqq \operatorname{zer} M$ be the set of primal-dual solutions. Since $M$ is maximally monotone, as first observed in [18] one can find KKT-optimal primal-dual pairs by recursively applying the resolvent of $c_k M$, where $(c_k)_{k\in\mathbb{N}}$ is an increasing sequence of strictly positive scalars. This scheme is known as the proximal point algorithm (PPA) [18]. We now show that these scalars can in fact be replaced by positive definite matrices.

Theorem 1. Suppose that (2) has a solution. Starting from $(x^0, y^0) \in \mathbb{R}^n \times \mathbb{R}^m$, let $(x^k, y^k)_{k\in\mathbb{N}}$ be recursively defined as

$$(x^{k+1}, y^{k+1}) = (\operatorname{id} + \Sigma_k M)^{-1}(x^k, y^k) + \varepsilon^k \tag{3}$$

for a summable sequence $(\varepsilon^k)_{k\in\mathbb{N}}$, and where $\Sigma_k \coloneqq \left(\begin{smallmatrix}\Sigma_{x,k} & \\ & \Sigma_{y,k}\end{smallmatrix}\right)$ for some $\Sigma_{x,k} \in \operatorname{Sym}_{++}(\mathbb{R}^n)$ and $\Sigma_{y,k} \in \operatorname{Sym}_{++}(\mathbb{R}^m)$. If $\Sigma_k \preceq \Sigma_{k+1} \preceq \Sigma_\infty \in \operatorname{Sym}_{++}(\mathbb{R}^n \times \mathbb{R}^m)$ holds for all $k$, then $(x^k, y^k)_{k\in\mathbb{N}}$ converges to a KKT-optimal pair for (2).

Proof. We start by observing that for all $k$ it holds that $\operatorname{zer}(\Sigma_k M) = \operatorname{zer}(M) = V^\star$ and that $\Sigma_k M$ is maximally monotone with respect to the scalar product induced by $\Sigma_k^{-1}$. The resolvent $(\operatorname{id} + \Sigma_k M)^{-1}$ is thus firmly nonexpansive in that metric (see [4, Prop. 23.8 and Def. 4.1]): that is, denoting $v^k \coloneqq (x^k, y^k)$ and $\tilde v^{k+1} \coloneqq v^{k+1} - \varepsilon^k = (\operatorname{id} + \Sigma_k M)^{-1}(v^k)$,

$$\|\tilde v^{k+1} - v^\star\|^2_{\Sigma_k^{-1}} \le \|v^k - v^\star\|^2_{\Sigma_k^{-1}} - \|\tilde v^{k+1} - v^k\|^2_{\Sigma_k^{-1}} \tag{4}$$

holds for every $v^\star \in V^\star$. Therefore, since

$$\|v^{k+1} - v^\star\|_{\Sigma_{k+1}^{-1}} \le \|v^{k+1} - v^\star\|_{\Sigma_k^{-1}} \le \|\tilde v^{k+1} - v^\star\|_{\Sigma_k^{-1}} + \|\varepsilon^k\|_{\Sigma_k^{-1}} \le \|v^k - v^\star\|_{\Sigma_k^{-1}} + \|\varepsilon^k\|_{\Sigma_k^{-1}}$$

(where the first inequality owes to the fact that $\Sigma_{k+1}^{-1} \preceq \Sigma_k^{-1}$), it follows from [8, Thm. 3.3] that the proof reduces to showing that any limit point of $(v^k)_{k\in\mathbb{N}}$ belongs to $V^\star$. From [8, Prop. 3.2(i)] it follows that the sequence is bounded and that $v^{k+1} - v^k \to 0$ as $k \to \infty$. Suppose that a subsequence $(v^{k_j})_{j\in\mathbb{N}}$ converges to $v$; then, so do $v^{k_j+1}$ and $\tilde v^{k_j+1} = v^{k_j+1} - \varepsilon^{k_j} = (\operatorname{id} + \Sigma_{k_j} M)^{-1} v^{k_j}$. We have $\Sigma_{k_j}^{-1}(v^{k_j} - \tilde v^{k_j+1}) \in M(\tilde v^{k_j+1})$; since $(\Sigma_k^{-1})_{k\in\mathbb{N}}$ is upper bounded, the left-hand side converges to 0, and from outer semicontinuity of $M$ [19, Ex. 12.8(b)] it then follows that $0 \in M(v)$, proving the claim. ∎

Let us consider the iterates (3) under the assumptions of Theorem 1, and let us further assume that $\Sigma_{y,k}$ is diagonal for all $k$. Let $(\tilde x^{k+1}, \tilde y^{k+1}) \coloneqq (x^{k+1}, y^{k+1}) - \varepsilon^k = (\operatorname{id} + \Sigma_k M)^{-1}(x^k, y^k)$.

Equation (3) then reads

$$0 = \Sigma_{x,k}^{-1}(\tilde x^{k+1} - x^k) + \nabla f(\tilde x^{k+1}) + A^\top \tilde y^{k+1} \tag{5a}$$
$$0 \in \tilde y^{k+1} - y^k + \Sigma_{y,k}\big(\partial g^*(\tilde y^{k+1}) - A\tilde x^{k+1}\big). \tag{5b}$$

The second condition is in fact equivalent to

$$\begin{aligned}\tilde y^{k+1} &= \operatorname{prox}_{\Sigma_{y,k} g^*}\big(y^k + \Sigma_{y,k} A\tilde x^{k+1}\big)\\ &= y^k + \Sigma_{y,k} A\tilde x^{k+1} - \Sigma_{y,k}\operatorname{prox}_{\Sigma_{y,k}^{-1} g}\big(A\tilde x^{k+1} + \Sigma_{y,k}^{-1} y^k\big)\\ &= \Sigma_{y,k}\Big(A\tilde x^{k+1} + \Sigma_{y,k}^{-1} y^k - \Pi_C\big(A\tilde x^{k+1} + \Sigma_{y,k}^{-1} y^k\big)\Big),\end{aligned}$$

where the first equality follows from the Moreau decomposition [4, Thm. 14.3(ii)], and the second one from the fact that $\Sigma_{y,k}$ is diagonal and the set $C$ is separable, hence the projections onto $C$ with respect to the Euclidean norm and the norm induced by $\Sigma_{y,k}$ coincide. Notice that $A^\top \tilde y^{k+1}$ is the gradient of $\tfrac12\operatorname{dist}^2_{\Sigma_{y,k}}(A\,\cdot\, + \Sigma_{y,k}^{-1} y^k, C)$ at $\tilde x^{k+1}$. Using this in (5a), by introducing an auxiliary variable $z^k$ we obtain that an (exact) resolvent step $(\tilde x^{k+1}, \tilde y^{k+1}) = (\operatorname{id} + \Sigma_k M)^{-1}(x^k, y^k)$ amounts to

$$\begin{cases}\tilde x^{k+1} = \arg\min_x \varphi_k(x)\\ \tilde z^{k+1} = Z_k(\tilde x^{k+1})\\ \tilde y^{k+1} = y^k + \Sigma_{y,k}\big(A\tilde x^{k+1} - \tilde z^{k+1}\big),\end{cases} \tag{6}$$

where

$$\begin{aligned} Z_k(x) &\coloneqq \arg\min_{z\in C} \tfrac12\big\|z - (Ax + \Sigma_{y,k}^{-1} y^k)\big\|^2_{\Sigma_{y,k}} = \Pi_C\big(Ax + \Sigma_{y,k}^{-1} y^k\big)\\ &= Ax + \Sigma_{y,k}^{-1} y^k + \big[\ell - Ax - \Sigma_{y,k}^{-1} y^k\big]_+ - \big[Ax + \Sigma_{y,k}^{-1} y^k - u\big]_+ \end{aligned} \tag{7}$$

is a Lipschitz-continuous mapping, and

$$\varphi_k(x) \coloneqq f(x) + \tfrac12\operatorname{dist}^2_{\Sigma_{y,k}}\big(Ax + \Sigma_{y,k}^{-1} y^k, C\big) + \tfrac12\|x - x^k\|^2_{\Sigma_{x,k}^{-1}} \tag{8}$$

is a (Lipschitz) differentiable and strongly convex function with gradient

$$\nabla\varphi_k(x) = \nabla f(x) + A^\top\big(y^k + \Sigma_{y,k}(Ax - Z_k(x))\big) + \Sigma_{x,k}^{-1}(x - x^k). \tag{9}$$
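To make (6)-(9) concrete, the following NumPy sketch evaluates $Z_k$ and $\nabla\varphi_k$ for diagonal scaling matrices stored as vectors (`sig_x`, `sig_y`); the function names are ours and do not come from the QPALM codebase.

```python
import numpy as np

def Z_k(x, y, A, l, u, sig_y):
    """Mapping (7): the projection Pi_C(Ax + Sigma_y^{-1} y) onto the box C."""
    return np.clip(A @ x + y / sig_y, l, u)

def grad_phi_k(x, x_k, y, Q, q, A, l, u, sig_x, sig_y):
    """Gradient (9) of the inner objective phi_k."""
    z = Z_k(x, y, A, l, u, sig_y)
    return (Q @ x + q                      # gradient of f
            + A.T @ (y + sig_y * (A @ x - z))
            + (x - x_k) / sig_x)           # proximal term Sigma_x^{-1} (x - x^k)
```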

Algorithm 1 Quadratic Program ALM solver (QPALM)

Require $x^0 \in \mathbb{R}^n$, $y^0 \in \mathbb{R}^m$, $\varepsilon, \varepsilon_a, \varepsilon_r > 0$, $\vartheta, \rho \in (0, 1)$, $\Delta_x, \Delta_y > 1$, $\sigma_x^{\max}, \sigma_y^{\max} > 0$, $\sigma_y > 0$
Initialize $\sigma_x = \Pi_{[10^{-4}, 10^4]}\Big(20\,\tfrac{\max(1,\, |f(x^0)|)}{\max(1,\, \|Ax^0 - \Pi_C(Ax^0)\|^2)}\Big)$ [6, §12.4], $\Sigma_{x,0} = \sigma_x I_n$, $\Sigma_{y,0} = \sigma_y I_m$
Repeat for $k = 0, 1, \ldots$
1.1: $x \leftarrow x^k$. Let $\varphi_k$, $\nabla\varphi_k$ and $J_k$ be as in (8), (9) and (12)
1.2: while $\|\nabla\varphi_k(x)\|_{\Sigma_{x,k}} > \rho^k \varepsilon$ do
1.3: $\quad x \leftarrow x + \tau^\star d$ with $d$ as in (14) and $\tau^\star$ as in Algorithm 2
1.4: $x^{k+1} = x$, $z^{k+1} = \Pi_C(Ax^{k+1} + \Sigma_{y,k}^{-1} y^k)$
1.5: $y^{k+1} = y^k + \Sigma_{y,k}(Ax^{k+1} - z^{k+1})$
1.6: if $(x^{k+1}, z^{k+1}, y^{k+1})$ satisfies (11) then return $x^{k+1}$; end if
1.7: $(\Sigma_{y,k+1})_{i,i} = \begin{cases}(\Sigma_{y,k})_{i,i} & \text{if } |(Ax^{k+1} - z^{k+1})_i| \le \vartheta\,|(Ax^k - z^k)_i|\\ \min\big(\Delta_y(\Sigma_{y,k})_{i,i},\ \sigma_y^{\max}\big) & \text{otherwise}\end{cases}$
1.8: $(\Sigma_{x,k+1})_{i,i} = \min\big(\Delta_x(\Sigma_{x,k})_{i,i},\ \sigma_x^{\max}\big)$
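A compact sketch of Algorithm 1's outer loop is given below, with a plain residual check standing in for the full criterion (11) and a generic `inner_solve` callback standing in for the semismooth Newton iterations of Section IV-B; the structure follows the paper, the names and simplifications are ours.

```python
import numpy as np

def qpalm_outer(inner_solve, Q, q, A, l, u, x, y, eps=1.0, rho=0.1,
                theta=0.25, delta=10.0, sig_max=1e8, sig_y0=1e4,
                tol=1e-6, max_outer=100):
    """Skeleton of Algorithm 1; diagonal penalties kept as vectors."""
    m, n = A.shape
    f0 = 0.5 * x @ (Q @ x) + q @ x
    r0 = A @ x - np.clip(A @ x, l, u)
    sig_x = np.full(n, np.clip(20 * max(1, abs(f0)) / max(1, r0 @ r0),
                               1e-4, 1e4))
    sig_y = np.full(m, sig_y0)
    res_prev = np.full(m, np.inf)      # |A x^k - z^k| of the previous iterate
    for k in range(max_outer):
        # Steps 1.1-1.3: inner loop up to tolerance rho^k * eps.
        x = inner_solve(x, y, sig_x, sig_y, rho**k * eps)
        z = np.clip(A @ x + y / sig_y, l, u)              # step 1.4
        res = A @ x - z
        y = y + sig_y * res                               # step 1.5
        dual_res = Q @ x + q + A.T @ y
        if max(np.abs(res).max(), np.abs(dual_res).max()) <= tol:
            return x, y                                   # crude stand-in for (11)
        grow = np.abs(res) > theta * np.abs(res_prev)     # step 1.7
        sig_y[grow] = np.minimum(delta * sig_y[grow], sig_max)
        sig_x = np.minimum(delta * sig_x, sig_max)        # step 1.8
        res_prev = res
    return x, y
```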

Remark 2 (Connection with the proximal ALM [18]). The $(x, z)$-update in (6) can equivalently be expressed as

$$(x^{k+1}, z^{k+1}) = \arg\min_{(x,z)\in\mathbb{R}^n\times\mathbb{R}^m} \mathcal{L}_{\Sigma_{y,k}}(x, z, y^k) + \tfrac12\|x - x^k\|^2_{\Sigma_{x,k}^{-1}},$$

where for $\Sigma_y \in \operatorname{Sym}_{++}(\mathbb{R}^m)$

$$\mathcal{L}_{\Sigma_y}(x, z, y) \coloneqq f(x) + \delta_C(z) + \langle y, Ax - z\rangle + \tfrac12\|Ax - z\|^2_{\Sigma_y}$$

is the $\Sigma_y$-augmented Lagrangian associated to

$$\underset{x\in\mathbb{R}^n,\, z\in\mathbb{R}^m}{\text{minimize}}\ f(x) + \delta_C(z) \quad\text{subject to}\ Ax = z, \tag{10}$$

a formulation equivalent to (2). In fact, the iterative scheme (6) simply amounts to the proximal ALM applied to (10).

IV. The QPALM algorithm

We now describe the proposed proximal ALM based Algorithm 1 for solving (QP). Steps 1.1-1.5 amount to an iteration of proximal ALM. As detailed in Section IV-A, the minimization of $\varphi_k$ needed for the computation of $x^{k+1}$ can be carried out inexactly. In fact, this is done by means of a tailored, extremely fast semismooth Newton method with exact line search, discussed in Sections IV-B and IV-C. Finally, step 1.7 increases the penalty parameters where the constraint norm has not sufficiently decreased [5, §2].

A. Outer and inner loops: early termination criteria

Since the $x$-update is not available in closed form, each proximal ALM iteration $(x^k, z^k, y^k) \mapsto (x^{k+1}, z^{k+1}, y^{k+1})$ in (6), which we refer to as an outer step, requires an inner procedure to find a minimizer $x^{k+1}$ of $\varphi_k$. In this subsection we investigate termination criteria both for the outer loop, indicating when to stop the algorithm with a good candidate solution, and for the inner loops, so as to ensure that $x^{k+1}$ is computed with enough accuracy to preserve the convergence guarantees of Theorem 1.

1) Outer loop termination: Following the criterion in [21], for fixed absolute and relative tolerances $\varepsilon_a, \varepsilon_r > 0$ we say that $(x, z, y)$ is an $(\varepsilon_a, \varepsilon_r)$-optimal triplet if $y \in N_C(z)$ and the following hold:

$$\|Qx + q + A^\top y\|_\infty \le \varepsilon_a + \varepsilon_r \max\big\{\|Qx\|_\infty, \|A^\top y\|_\infty, \|q\|_\infty\big\} \tag{11a}$$
$$\|Ax - z\|_\infty \le \varepsilon_a + \varepsilon_r \max\big\{\|Ax\|_\infty, \|z\|_\infty\big\}. \tag{11b}$$
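In code, (11) is a pair of norm tests; a direct transcription (our naming, $\infty$-norms as in [21]) might read:

```python
import numpy as np

def is_optimal(x, z, y, Q, q, A, eps_a, eps_r):
    """Check the (eps_a, eps_r)-optimality conditions (11a)-(11b)."""
    inf_norm = lambda v: np.abs(v).max()
    dual_ok = (inf_norm(Q @ x + q + A.T @ y)
               <= eps_a + eps_r * max(inf_norm(Q @ x), inf_norm(A.T @ y),
                                      inf_norm(q)))
    primal_ok = (inf_norm(A @ x - z)
                 <= eps_a + eps_r * max(inf_norm(A @ x), inf_norm(z)))
    return dual_ok and primal_ok
```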


Algorithm 2 Exact line search

Require $x, d \in \mathbb{R}^n$, diagonal $\Sigma \in \operatorname{Sym}_{++}(\mathbb{R}^n)$
Provide optimal stepsize $\tau^\star \in \mathbb{R}$
2.1: Let $\psi' : \mathbb{R} \to \mathbb{R}$, $\eta, \beta \in \mathbb{R}$ and $\delta, \alpha \in \mathbb{R}^{2m}$ be as in (15)
2.2: Define the set of breakpoints of $\psi'$: $T = \big\{\tfrac{\alpha_i}{\delta_i} \mid i = 1, \ldots, 2m,\ \delta_i \ne 0\big\}$
2.3: Sort $T = \{t_1, t_2, \ldots\}$ such that $t_i < t_{i+1}$ for all $i$
2.4: Let $t_i \in T$ be the smallest such that $\psi'(t_i) \ge 0$
2.5: return $\tau^\star = t_{i-1} - \psi'(t_{i-1})\,\tfrac{t_i - t_{i-1}}{\psi'(t_i) - \psi'(t_{i-1})}$ (or $\tau^\star = t_1$ if $i = 1$)

From the expression of $z^{k+1}$ at step 1.4 it follows that $Ax^{k+1} + \Sigma_{y,k}^{-1} y^k - z^{k+1} \in N_C(z^{k+1})$, cf. (1), and hence $y^{k+1}$ as in step 1.5 satisfies $y^{k+1} \in N_C(z^{k+1})$. A triplet $(x^k, z^k, y^k)$ generated by Algorithm 1 is thus $(\varepsilon_a, \varepsilon_r)$-optimal if it satisfies (11).

2) Inner loop termination: As shown in Theorem 1, convergence to a solution can still be ensured when the iterates are computed inexactly, provided that the errors have finite sum. Since $\varphi_k$ as in (8) is $\Sigma_{x,k}^{-1}$-strongly convex and $\Sigma_{x,k}$ is diagonal, from (6) we have that

$$\|\nabla\varphi_k(x)\|_{\Sigma_{x,k}} = \|\nabla\varphi_k(x) - \nabla\varphi_k(\tilde x^{k+1})\|_{\Sigma_{x,k}} \ge \|x - \tilde x^{k+1}\|.$$

Consequently, the condition at step 1.2 ensures that $\|x^{k+1} - \tilde x^{k+1}\| \le \rho^k\varepsilon$ holds for all $k$. In turn, $\|z^{k+1} - \tilde z^{k+1}\| \le \|A\|\rho^k\varepsilon$ follows from nonexpansiveness of $\Pi_C$, and finally $\|y^{k+1} - \tilde y^{k+1}\| \le 2\|\Sigma_{y,k}\|\|A\|\rho^k\varepsilon \le 2\|\Sigma_{y,\infty}\|\|A\|\rho^k\varepsilon$. As a result, the inner termination criterion at step 1.2 guarantees that the error $\|(x^{k+1}, y^{k+1}) - (\operatorname{id} + \Sigma_k M)^{-1}(x^k, y^k)\|$ is summable, as required in the convergence analysis of Theorem 1. In the rest of the section we describe how the $x$-update can be carried out with an efficient minimization strategy.

B. Semismooth Newton method

The diagonal matrix $P_k(x)$ with entries

$$(P_k(x))_{ii} = \begin{cases}1 & \text{if } \ell_i \le (Ax + \Sigma_{y,k}^{-1} y^k)_i \le u_i\\ 0 & \text{otherwise}\end{cases}$$

is an element of the generalized Jacobian [10, §7.1] of $\Pi_C$ at $Ax + \Sigma_{y,k}^{-1} y^k$, see e.g. [23, §6.2.d]. Consequently, one element $H_k(x) \in \operatorname{Sym}_{++}(\mathbb{R}^n)$ of the generalized Hessian of $\varphi_k$ is

$$H_k(x) = Q + A^\top \Sigma_{y,k}\big(I - P_k(x)\big)A + \Sigma_{x,k}^{-1}.$$

Denoting

$$J_k(x) \coloneqq \big\{i \mid (Ax + \Sigma_{y,k}^{-1} y^k)_i \notin [\ell_i, u_i]\big\}, \tag{12}$$

one has that $(I - P_k(x))_{ii}$ is 1 if $i \in J_k(x)$ and 0 otherwise. Consequently, we may rewrite the generalized Hessian matrix $H_k(x)$ in a more economic form as

$$H_k(x) = Q + A_{J_k(x)}^\top (\Sigma_{y,k})_{J_k(x)} A_{J_k(x)} + \Sigma_{x,k}^{-1}, \tag{13}$$

where $A_{J_k(x)}$ is the stacking of the $j$-th rows of $A$ with $j \in J_k(x)$, and similarly $(\Sigma_{y,k})_{J_k(x)}$ is obtained by removing from $\Sigma_{y,k}$ all the $i$-th rows and columns with $i \notin J_k(x)$.

A semismooth Newton direction $d$ at $x$ solves $H_k(x)d = -\nabla\varphi_k(x)$. Denoting $\lambda \coloneqq (\Sigma_{y,k})_{J_k(x)} A_{J_k(x)} d$, the computation of $d$ is equivalent to solving the linear system

$$\begin{bmatrix} Q + \Sigma_{x,k}^{-1} & A_{J_k(x)}^\top \\ A_{J_k(x)} & -(\Sigma_{y,k})_{J_k(x)}^{-1} \end{bmatrix} \begin{bmatrix} d \\ \lambda \end{bmatrix} = \begin{bmatrix} -\nabla\varphi_k(x) \\ 0 \end{bmatrix}. \tag{14}$$
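For illustration, the sketch below assembles and solves (14) with SciPy, using a sparse LU factorization in place of the $LDL^\top$ factorization with low-rank updates described in Section V-A; all names are ours.

```python
import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def newton_direction(x, x_k, y, Q, q, A, l, u, sig_x, sig_y):
    """Semismooth Newton direction d from (12)-(14)."""
    w = A @ x + y / sig_y
    J = np.flatnonzero((w < l) | (w > u))          # active set (12)
    grad = (Q @ x + q + A.T @ (y + sig_y * (A @ x - np.clip(w, l, u)))
            + (x - x_k) / sig_x)                   # gradient (9)
    H11 = (sp.csc_matrix(Q) + sp.diags(1.0 / sig_x)).tocsc()
    if J.size == 0:                                # no active constraints
        return spla.spsolve(H11, -grad)
    A_J = sp.csc_matrix(A[J, :])
    K = sp.bmat([[H11, A_J.T],
                 [A_J, -sp.diags(1.0 / sig_y[J])]], format="csc")
    rhs = np.concatenate([-grad, np.zeros(J.size)])
    sol = spla.splu(K).solve(rhs)
    return sol[:x.size]                            # d; sol[x.size:] is lambda
```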

C. An exact line search

$\nabla\varphi_k(x)$ is piecewise linear, hence so are its sections

$$\mathbb{R} \ni \tau \mapsto \psi_{k,(x,d)}(\tau) \coloneqq \varphi_k(x + \tau d)$$

for any $d \in \mathbb{R}^n$. This implies that, given a candidate update direction $d$, the minimization of $\varphi_k$ can be carried out without the need to perform backtracking, as an optimal stepsize

$$\tau^\star \in \arg\min_{\tau\in\mathbb{R}} \psi_{k,(x,d)}(\tau)$$

can be explicitly computed, owing to the fact that $\psi'_{k,(x,d)}$ is a piecewise linear increasing function. Indeed, it follows from (7) and (9) that

$$\begin{aligned}\psi'(\tau) &= \langle\nabla\varphi_k(x + \tau d), d\rangle\\ &= \big\langle d,\ \nabla f(x + \tau d) + \Sigma_{x,k}^{-1}(x + \tau d - x^k)\big\rangle + \big\langle Ad,\ y^k + \Sigma_{y,k}\big(A(x + \tau d) - Z_k(x + \tau d)\big)\big\rangle\\ &= \big\langle d,\ (Q + \Sigma_{x,k}^{-1})x + q - \Sigma_{x,k}^{-1}x^k\big\rangle + \tau\big\langle d,\ (Q + \Sigma_{x,k}^{-1})d\big\rangle\\ &\qquad + \big\langle\Sigma_{y,k} Ad,\ \big[Ax + \Sigma_{y,k}^{-1} y^k - u + \tau Ad\big]_+\big\rangle - \big\langle\Sigma_{y,k} Ad,\ \big[\ell - Ax - \Sigma_{y,k}^{-1} y^k - \tau Ad\big]_+\big\rangle\\ &= \eta\tau + \beta + \big\langle\delta,\ [\delta\tau - \alpha]_+\big\rangle, \end{aligned} \tag{15a}$$

where

$$\begin{cases}\mathbb{R} \ni \eta \coloneqq \langle d, (Q + \Sigma_{x,k}^{-1})d\rangle,\\ \mathbb{R} \ni \beta \coloneqq \langle d, (Q + \Sigma_{x,k}^{-1})x + q - \Sigma_{x,k}^{-1}x^k\rangle,\\ \mathbb{R}^{2m} \ni \delta \coloneqq \begin{pmatrix}-\Sigma_{y,k}^{1/2} Ad \\ \Sigma_{y,k}^{1/2} Ad\end{pmatrix},\\ \mathbb{R}^{2m} \ni \alpha \coloneqq \begin{pmatrix}\Sigma_{y,k}^{1/2}(Ax - \ell) + \Sigma_{y,k}^{-1/2} y^k \\ \Sigma_{y,k}^{1/2}(u - Ax) - \Sigma_{y,k}^{-1/2} y^k\end{pmatrix}. \end{cases} \tag{15b}$$

Due to convexity, it now suffices to find $\tau$ such that the expression in (15a) is zero, as done in Algorithm 2. We remark that, since $\varphi_k \in C^1$ is strongly convex and piecewise quadratic, the proposed nonsmooth Newton method with exact line search would converge in finitely many iterations even with zero tolerance [22].

V. Implementation aspects

This section discusses some of the implementation details that are necessary to make QPALM an efficient and competitive algorithm, such as the solution of the linear system at every iteration, preconditioning and infeasibility detection.

A. Linear system

We solve the linear system (14) by means of sparse

Cholesky factorization routines. In the first iteration and af-

ter every outer iteration, a sparse LDL factorization of the

generalized Hessian matrix H

k

(x

k

) as in (13) is computed. In

between inner iterations, the set of active constraints J

k

typ-

ically does not change much. Consequently, instead of doing

an LDL factorization from scratch, two low rank updates are

su fficient, one for the constraints that enter the active set and

one for those that leave. As such, the algorithm allows for

active set changes where more than one constraint is added

and/or dropped, in contrast to active-set methods. Therefore,

our algorithm typically requires substantially fewer iterations

than active-set methods to find the set of constraints that is

active at the solution, while still having the advantage of

relatively cheap factorization updates. The aforementioned

routines are carried out with software package cholmod [7].
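The saving from factorization updates can be illustrated with the classical dense rank-one Cholesky update; the C implementation uses CHOLMOD's sparse update/downdate routines [7] instead, but the principle is the same: when constraint $i$ enters $J_k$, $H_k$ changes by $(\Sigma_{y,k})_{ii}\, a_i a_i^\top$ (cf. (13)), so the factor can be refreshed in $O(n^2)$ operations instead of refactorized in $O(n^3)$. This is a sketch, not the paper's code.

```python
import numpy as np

def chol_rank1(L, w, sign=1.0):
    """Return the Cholesky factor of L @ L.T + sign * outer(w, w).

    O(n^2) update (sign=+1) or downdate (sign=-1) of a dense
    lower-triangular factor L."""
    L, w = L.copy(), w.astype(float).copy()
    for k in range(w.size):
        r2 = L[k, k]**2 + sign * w[k]**2
        if r2 <= 0.0:
            raise np.linalg.LinAlgError("downdate loses positive definiteness")
        r = np.sqrt(r2)
        c, s = r / L[k, k], w[k] / L[k, k]
        L[k, k] = r
        L[k+1:, k] = (L[k+1:, k] + sign * s * w[k+1:]) / c
        w[k+1:] = c * w[k+1:] - s * L[k+1:, k]
    return L

# Constraint i entering the active set (weight sig_y[i], row A[i, :]):
#   L = chol_rank1(L, np.sqrt(sig_y[i]) * A[i, :], sign=+1.0)
# and sign=-1.0 when it leaves the active set.
```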


B. Preconditioning

Preconditioning of the problem data aims at mitigating possible adverse effects of ill conditioning. This amounts to scaling problem (QP) to

$$\underset{\bar x\in\mathbb{R}^n}{\text{minimize}}\ \ \tfrac12\langle\bar x, \bar Q\bar x\rangle + \langle\bar q, \bar x\rangle \quad\text{subject to}\ \ \bar\ell \le \bar A\bar x \le \bar u, \tag{16}$$

with $\bar x = D^{-1}x$, $\bar Q = c_f DQD$, $\bar q = c_f Dq$, $\bar A = EAD$, $\bar\ell = E\ell$ and $\bar u = Eu$. The dual variables in this problem are $\bar y = c_f E^{-1}y$. The matrices $D \in \mathbb{R}^{n\times n}$ and $E \in \mathbb{R}^{m\times m}$ are diagonal and computed by performing a modified Ruiz equilibration [20] on the constraint matrix $A$, scaling its rows and columns to have an infinity norm close to 1, as we observed this tends to reduce the number and scope of changes in the active set. The scaling factor $c_f$ for the objective was obtained from [6, §12.5], namely $c_f = \max(1, \|\nabla f(x^0)\|_\infty)^{-1}$.
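For reference, a plain (unmodified) Ruiz equilibration pass on $A$ looks as follows: each row and column is repeatedly scaled by the inverse square root of its infinity norm until the norms approach 1. QPALM's modified variant and the objective scaling $c_f$ follow [20] and [6, §12.5]; this sketch only conveys the basic iteration.

```python
import numpy as np

def ruiz_equilibrate(A, iters=15):
    """Diagonal scalings d, e such that E A D (with D = diag(d),
    E = diag(e)) has rows and columns of infinity norm close to 1."""
    m, n = A.shape
    d, e = np.ones(n), np.ones(m)
    for _ in range(iters):
        As = e[:, None] * A * d[None, :]           # current E A D
        row = np.sqrt(np.abs(As).max(axis=1))
        col = np.sqrt(np.abs(As).max(axis=0))
        e /= np.maximum(row, 1e-12)                # guard against zero rows
        d /= np.maximum(col, 1e-12)
    return d, e
```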

We say the problem is solved when the unscaled termination criterion (11) holds for a triplet $(\bar x^k, \bar z^k, \bar y^k)$, resulting in the following unscaled criteria:

$$c_f^{-1}\big\|D^{-1}\big(\bar Q\bar x^k + \bar q + \bar A^\top\bar y^k\big)\big\|_\infty \le \varepsilon_a + \varepsilon_r\, c_f^{-1}\max\big\{\|D^{-1}\bar Q\bar x^k\|_\infty,\ \|D^{-1}\bar A^\top\bar y^k\|_\infty,\ \|D^{-1}\bar q\|_\infty\big\},$$
$$\big\|E^{-1}\big(\bar A\bar x^k - \bar z^k\big)\big\|_\infty \le \varepsilon_a + \varepsilon_r\max\big\{\|E^{-1}\bar A\bar x^k\|_\infty,\ \|E^{-1}\bar z^k\|_\infty\big\}.$$

C. Infeasibility detection

The proposed method can also detect whether the problem is primal or dual infeasible from the iterates, making use of the criteria given in [3]. Let $\delta\bar y$ denote the (potential) change in the dual variable, $\delta\bar y = \Sigma_y\big(\bar A\bar x - \Pi_C(\bar A\bar x + \Sigma_y^{-1}\bar y)\big)$; then the problem is primal infeasible if, for $\delta\bar y \ne 0$,

$$\|D^{-1}\bar A^\top\delta\bar y\|_\infty \le \varepsilon_p\|E\,\delta\bar y\|_\infty, \qquad \bar u^\top[\delta\bar y]_+ + \bar\ell^\top[\delta\bar y]_- \le -\varepsilon_p\|E\,\delta\bar y\|_\infty$$

hold, with $c_f^{-1}E\,\delta\bar y$ the certificate of primal infeasibility. Let $\delta\bar x$ denote the update in the primal variable; then the problem is dual infeasible if, for $\delta\bar x \ne 0$, the following conditions hold:

$$\|D^{-1}\bar Q\,\delta\bar x\|_\infty \le c_f\,\varepsilon_d\|D\,\delta\bar x\|_\infty, \qquad \bar q^\top\delta\bar x \le -c_f\,\varepsilon_d\|D\,\delta\bar x\|_\infty,$$
$$(E^{-1}\bar A\,\delta\bar x)_i \begin{cases}\ge \varepsilon_d\|D\,\delta\bar x\|_\infty & \text{if } \bar u_i = +\infty,\\ \le -\varepsilon_d\|D\,\delta\bar x\|_\infty & \text{if } \bar\ell_i = -\infty,\\ \in [-\varepsilon_d, \varepsilon_d]\,\|D\,\delta\bar x\|_\infty & \text{otherwise.}\end{cases}$$

In that case, $D\,\delta\bar x$ is the certificate of dual infeasibility.
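As an illustration, the primal infeasibility test, written on unscaled data (i.e., dropping $D$, $E$ and $c_f$), could be coded as below; `eps_p` and the names are ours, the criterion itself is from [3].

```python
import numpy as np

def primal_infeasible(dy, A, l, u, eps_p=1e-6):
    """dy certifies primal infeasibility if A^T dy is (almost) zero while
    the support term u^T [dy]_+ + l^T [dy]_- is strictly negative."""
    if not np.any(dy):
        return False
    ndy = np.abs(dy).max()
    pos, neg = np.maximum(dy, 0.0), np.minimum(dy, 0.0)
    # Select bounds only where dy has matching sign, so that infinite
    # bounds contribute +inf exactly when they should block the certificate.
    support = np.where(pos > 0, u, 0.0) @ pos + np.where(neg < 0, l, 0.0) @ neg
    return np.abs(A.T @ dy).max() <= eps_p * ndy and support <= -eps_p * ndy
```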

VI. Numerical simulations

The C implementation of the proposed algorithm was tested on various sets of QPs and benchmarked against state-of-the-art QP solvers: the interior point solver Gurobi [14], the operator splitting based solver OSQP [21] and the active-set solver qpOASES [12]. The first two are also programmed in C, and the third is programmed in C++. All simulations were performed on a notebook with Intel(R) Core(TM) i7-7600U CPU @ 2.80GHz x 2 processor and 16 GB of memory. The problems are solved to medium accuracy, with the tolerances $\varepsilon_a$ and $\varepsilon_r$ set to $10^{-6}$ for QPALM and OSQP, as are terminationTolerance for qpOASES and OptimalityTol and FeasibilityTol for Gurobi.

Fig. 1: Runtime comparison for random LPs of varying sizes.

Apart from the tolerances, all solvers were run with their default options. For QPALM, the default parameters in Algorithm 1 are: $\varepsilon = 1$, $\rho = 10^{-1}$, $\vartheta = 0.25$, $\Delta_x = \Delta_y = 10$, $\sigma_x^{\max} = \sigma_y^{\max} = 10^8$, and $\sigma_y = 10^4$. For the test sets below, the runtime reported in the figures is the average over ten problems of the mentioned size or conditioning.

A. Linear programs

The first set of tests are linear programs (LPs) with randomly generated data. Of course, an LP is a special case of a QP with a zero $Q$ matrix. The LPs are constructed for 30 values of $n$ equally spaced on a linear scale between 20 and 600. We take $m = 10n$, as optimization problems typically have more constraints than variables. The constraint matrix $A \in \mathbb{R}^{m\times n}$ is set to have 50% nonzero elements drawn from the standard normal distribution, $A_{ij} \sim \mathcal{N}(0, 1)$. The linear part of the cost, $q \in \mathbb{R}^n$, is a dense vector with $q_i \sim \mathcal{N}(0, 1)$. Finally, the elements of the upper and lower bound vectors, $u, \ell \in \mathbb{R}^m$, are uniformly distributed on the intervals $[0, 1]$ and $[-1, 0]$ respectively. Figure 1 compares the runtimes of QPALM, OSQP and Gurobi on random LPs of varying sizes. qpOASES is not included in this example since, according to its user manual [12, §4.5], it is not suited for LPs with more than a few hundred primal variables, which was indeed observed to be the case. OSQP is also not well suited for LPs: it hit the maximum number of iterations ($10^5$) in 10 cases, solved inaccurately in 13 cases, and solved to the requested accuracy only in the remaining 3 cases. QPALM is shown to be an efficient solver for LPs, outperforming the simplex and interior point methods that are concurrently tried by Gurobi.
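The random LP data of this section can be reproduced along the following lines (a NumPy/SciPy sketch; the paper's own problem generation was done in MATLAB, cf. the use of sprandn in Section VI-C):

```python
import numpy as np
import scipy.sparse as sp

def random_lp(n, density=0.5, seed=0):
    """Random LP in the form (QP) with Q = 0 and m = 10 n."""
    rng = np.random.default_rng(seed)
    m = 10 * n
    A = sp.random(m, n, density=density, random_state=rng,
                  data_rvs=rng.standard_normal, format="csc")
    q = rng.standard_normal(n)
    u = rng.uniform(0.0, 1.0, m)     # upper bounds in [0, 1]
    l = rng.uniform(-1.0, 0.0, m)    # lower bounds in [-1, 0]
    return q, A, l, u
```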

B. Quadratic programs

The second set of tests are QPs with data randomly generated in the same way as in Section VI-A, with an additional positive definite matrix $Q = MM^\top$, where $M \in \mathbb{R}^{n\times n}$ has 50% nonzero elements $M_{ij} \sim \mathcal{N}(0, 1)$. Figure 2 illustrates the runtimes of the four solvers on such random QPs. It is clear that QPALM outperforms the other state-of-the-art solvers regardless of the problem size.

C. Ill-conditioned problems

Fig. 2: Runtime comparison for random QPs of varying sizes.

Fig. 3: Runtime comparison for random QPs with varying conditioning.

The third test set concerns the conditioning of quadratic programs. In this example, the data from Section VI-B are reproduced for a QP with $n = 100$, but now the impact of the problem conditioning is investigated. For this, we set the condition number $\kappa$ of the matrices $Q$ and $A$ (using sprandn in MATLAB), and also scale $q$ with $\kappa$. Figure 3 shows the runtime results for 20 values of $\kappa$ equally spaced on a logarithmic scale between $10^0$ and $10^5$. This figure clearly demonstrates that the first-order method OSQP suffers from ill conditioning in the problem despite the offline Ruiz equilibration it performs. From condition number 38 onwards, OSQP hit the maximum number of iterations ($10^5$). qpOASES also experienced difficulties with ill-conditioned problems: from condition number 4833 onwards, it started reporting that the problem was infeasible, while QPALM and Gurobi solved it to the same optimal solution. From these results it follows that QPALM, supported by the preconditioning discussed in Section V-B, is competitive with other solvers in terms of robustness to the scaling of the problem data.

VII. Conclusion

This paper presented QPALM, a proximal augmented Lagrangian based solver for QPs that proved to be efficient and robust against scaling in the problem data. The inner minimization procedures rely on semismooth Newton directions and an exact line search that is available in closed form. The iterates, aided by sparse factorization update routines, allow for large updates in the active set, and are more efficient than those of interior point methods and more effective than those of active-set methods. QPALM was shown to compare favorably against state-of-the-art QP solvers, both in runtime and in robustness against problem ill conditioning.

Future work will focus on warm-starting aspects, on extensions to nonconvex QPs and SOCPs, and on a more thorough set of benchmarking examples, focused on problems arising from real applications instead of randomly generated ones.

References

[1] MOSEK ApS. Introducing the MOSEK Optimization Suite 8.1.0.80, 2019.
[2] D. Axehill and A. Hansson. A dual gradient projection quadratic programming algorithm tailored for model predictive control. In 2008 47th IEEE Conference on Decision and Control, pages 3057–3064. IEEE, 2008.
[3] G. Banjac, P. Goulart, B. Stellato, and S. Boyd. Infeasibility detection in the alternating direction method of multipliers for convex optimization. optimization-online.org, 2017.
[4] H. H. Bauschke and P. L. Combettes. Convex analysis and monotone operator theory in Hilbert spaces. CMS Books in Mathematics. Springer, 2017.
[5] D. P. Bertsekas. Constrained optimization and Lagrange multiplier methods. Athena Scientific, 1999.
[6] E. G. Birgin and J. M. Martínez. Practical augmented Lagrangian methods for constrained optimization, volume 10. SIAM, 2014.
[7] Y. Chen, T. A. Davis, W. W. Hager, and S. Rajamanickam. Algorithm 887: CHOLMOD, supernodal sparse Cholesky factorization and update/downdate. ACM Transactions on Mathematical Software (TOMS), 35(3):22, 2008.
[8] P. L. Combettes and B. C. Vũ. Variable metric quasi-Fejér monotonicity. Nonlinear Analysis: Theory, Methods and Applications, 78(1):17–31, 2013.
[9] N. K. Dhingra, S. Z. Khong, and M. R. Jovanović. A second order primal-dual algorithm for nonsmooth convex composite optimization. In 2017 IEEE 56th Annual Conference on Decision and Control (CDC), pages 2868–2873, Dec 2017.
[10] F. Facchinei and J.-S. Pang. Finite-dimensional variational inequalities and complementarity problems. Springer Science & Business Media, 2007.
[11] H. J. Ferreau, H. G. Bock, and M. Diehl. An online active set strategy to overcome the limitations of explicit MPC. International Journal of Robust and Nonlinear Control, 18(8):816–830, 2008.
[12] H. J. Ferreau, C. Kirches, A. Potschka, H. G. Bock, and M. Diehl. qpOASES: A parametric active-set algorithm for quadratic programming. Mathematical Programming Computation, 6(4):327–363, 2014.
[13] J. C. Gilbert and É. Joannopoulos. OQLA/QPALM: Convex quadratic optimization solvers using the augmented Lagrangian approach, with an appropriate behavior on infeasible or unbounded problems. 2014.
[14] LLC Gurobi Optimization. Gurobi optimizer reference manual, 2018.
[15] J. Nocedal and S. Wright. Numerical optimization. Springer Science & Business Media, 2006.
[16] N. Parikh and S. Boyd. Proximal algorithms. Foundations and Trends in Optimization, 1(3):127–239, 2014.
[17] P. Patrinos, P. Sopasakis, and H. Sarimveis. A global piecewise smooth Newton method for fast large-scale model predictive control. Automatica, 47(9):2016–2022, 2011.
[18] R. T. Rockafellar. Augmented Lagrangians and applications of the proximal point algorithm in convex programming. Mathematics of Operations Research, 1(2):97–116, 1976.
[19] R. T. Rockafellar and R. J.-B. Wets. Variational analysis, volume 317. Springer Science & Business Media, 2011.
[20] D. Ruiz. A scaling algorithm to equilibrate both rows and columns norms in matrices. Technical report, Rutherford Appleton Laboratory, 2001.
[21] B. Stellato, G. Banjac, P. Goulart, A. Bemporad, and S. Boyd. OSQP: An operator splitting solver for quadratic programs. In 2018 UKACC 12th International Conference on Control (CONTROL), pages 339–339, Sep. 2018.
[22] J. Sun. On piecewise quadratic Newton and trust region problems. Mathematical Programming, 76(3):451–467, Mar 1997.
[23] A. Themelis, M. Ahookhosh, and P. Patrinos. On the acceleration of forward-backward splitting via an inexact Newton method. In R. Luke, H. Bauschke, and R. Burachik, editors, Splitting Algorithms, Modern Operator Theory, and Applications. Springer. To appear. https://arxiv.org/abs/1811.02935
