
Data-driven distributionally robust MPC for constrained stochastic systems

Peter Coppens and Panagiotis Patrinos

Abstract

In this paper we introduce a novel approach to distributionally robust optimal control that supports online learning of the ambiguity set while guaranteeing recursive feasibility. We introduce conic representable risk, which is useful for deriving tractable reformulations of distributionally robust optimization problems. Specifically, to illustrate the techniques introduced, we utilize risk measures constructed from data-driven ambiguity sets that constrain the second moment of the random disturbance. In the optimal control setting, such moment-based risk measures lead to tractable optimal controllers when combined with affine disturbance feedback. Assumptions on the constraints are given that guarantee recursive feasibility. The resulting control scheme acts as a robust controller when little data is available and converges to the certainty-equivalent controller when a large sample count implies high confidence in the estimated second moment. This is illustrated in a numerical experiment.

I. INTRODUCTION

Distributionally robust optimization (DRO) has gained traction recently as a technique that balances robustness with performance in an intuitive fashion. From a theoretical point of view such techniques act as regularizers [1] and, in a data-driven setting, DRO acts at the interface between stochastic and robust optimization [2].

In the control community the potential of such techniques has not gone unnoticed [3], [4]. Here one would ideally solve stochastic optimal control problems like

$$
\begin{aligned}
\operatorname*{minimize}_{\pi \in \Pi}\quad & \mathbb{E}\Big[\textstyle\sum_{t=0}^{N-1} \ell_t(x_t, \pi_t(w_0, \dots, w_{t-1})) + \ell_N(x_N)\Big]\\
\text{subj. to}\quad & x_{t+1} = f(x_t, \pi_t(w_0, \dots, w_{t-1}), w_t), \quad t \in \mathbb{N}_{0:N-1},\\
& \mathbb{P}[\phi(x_t) \le 0] \ge 1 - \varepsilon, \quad t \in \mathbb{N}_{1:N-1},\\
& \psi(x_N) \le 0 \ \text{a.s.}, \quad x_0 \ \text{given},
\end{aligned}
$$

where $x_t \in \mathbb{R}^{n_x}$ denotes the state. Parametrized, causal policies $\pi \in \Pi$ map disturbances to inputs. That is, an element $\pi_t$ of the sequence $\pi = \{\pi_t\}_{t=0}^{N-1}$ maps $\{w_i\}_{i=0}^{t-1}$ to inputs in $\mathbb{R}^{n_u}$ for $t \ge 1$, and $\pi_0 \in \mathbb{R}^{n_u}$. Here, the disturbances $w_t \in \mathbb{R}^{n_w}$ are i.i.d. random vectors, the distribution of which is unknown, usually introducing the need for robust

P. Coppens and P. Patrinos are with the Department of Electrical Engineering (ESAT-STADIUS), KU Leuven, Kasteelpark Arenberg 10, 3001 Leuven, Belgium. Email: peter.coppens@kuleuven.be, panos.patrinos@kuleuven.be

This work was supported by: the Research Foundation Flanders (FWO) PhD grant 11E5520N and research projects G0A0920N, G086518N and G086318N; Research Council KU Leuven C1 project No. C14/18/068; Fonds de la Recherche Scientifique – FNRS and the FWO – Vlaanderen under EOS project no 30468160 (SeLMA); EU’s Horizon 2020 research and innovation programme: Marie Skłodowska-Curie grant No. 953348.


approaches. DRO then improves upon classical robust control by using available data to infer properties of the distribution, while retaining guarantees.

The core construct in DRO is the ambiguity set, a set of distributions against which one should robustify. Several ambiguity sets have been examined with varying success. The most common are moment-based, φ-divergence and Wasserstein ambiguity sets [5]. Such ambiguity sets are connected to so-called risk measures by duality. Hence this approach is directly related to risk-averse optimization [6].

Throughout the paper we rely on conic representable ambiguity and risk to derive tractable problems, similar to the methodology presented in [7], [8]. The main contributions are then as follows: (i) we derive tight, data-driven, moment-based ambiguity sets that are conic representable and shrink when more data becomes available; (ii) we extend conic risks to the multi-stage setting and use them to model average value-at-risk constraints; (iii) we synthesize the controller such that it is recursively feasible when applied in a receding horizon fashion; (iv) we illustrate how our framework leads to tractable controllers based on affine disturbance feedback policies [9], which are evaluated in numerical experiments.

Similar results were achieved in [10] for a tube-based approach with Wasserstein ambiguity, the radius of which is not data-driven; in [11] for relaxed, robust constraints; in [4] for moment-based ambiguity which is not data-driven and does not guarantee recursive feasibility; and in [12], [13] for discrete distributions. DR control of Markov decision processes with finite state spaces was also considered in [14]. Our framework supports online learning of truly data-driven ambiguity sets and risk constraints within a continuous state space, while guaranteeing recursive feasibility.

This section continues with notation and preliminaries. Next, §II introduces conic and data-driven ambiguity in a single-stage setting and §III introduces multi-stage extensions as well as the optimal control problem that we want to solve. Then §IV shows how to construct a controller such that recursive feasibility is guaranteed. Finally, §V illustrates how our techniques lead to tractable controllers and contains numerical experiments.

A. Notation and preliminaries

Let $\mathbb{N}$ denote the integers and $\mathbb{R}$ ($\mathbb{R}_+$) the (nonnegative) reals. We denote by $\mathbb{S}^d$ the symmetric $d \times d$ matrices and by $\mathbb{S}^d_{++}$ ($\mathbb{S}^d_+$) the positive (semi)definite matrices. For two matrices of compatible dimensions $X, Y$ we use $[X; Y]$ ($[X, Y]$) for vertical (horizontal) concatenation. We use $\|\cdot\|_2$ to denote the spectral norm (Euclidean norm) for matrices (vectors) and $[\cdot]_+ := \max(0, \cdot)$. For matrices (or vectors) $X, Y$ and a cone $\mathcal{K}$, let $X \preccurlyeq_{\mathcal{K}} Y$ ($X \succcurlyeq_{\mathcal{K}} Y$) mean $Y - X \in \mathcal{K}$ ($X - Y \in \mathcal{K}$). When $\mathcal{K} = \mathbb{S}^d_+$ we use $\preceq$ ($\succeq$). Meanwhile, $(X, Y) := [\operatorname{vec}(X); \operatorname{vec}(Y)]$ interprets $X, Y$ as column vectors in vertical concatenation. Let $\operatorname{diag}(X, Y)$ be a (block) diagonal matrix and let $I_d \in \mathbb{S}^d$ denote the identity. For a vector $x \in \mathbb{R}^d$, $[x]_i$ denotes the $i$'th element.

Slice notation: We introduce $\mathbb{N}_{a:b} = \{a, \dots, b\}$. Similarly, we use $w_{a:b}$ to denote the sequence $\{w_i\}_{i \in \mathbb{N}_{a:b}}$. For a sequence of length $N$, index $a$ ($b$) is omitted when $0$ ($N-1$) is implied (when both are omitted we write $w$).

Interpreting $w_{0:N-1}$ with $w_i \in \mathbb{R}^{n_w}$ as an element of $\mathbb{R}^{N n_w}$, consider affine maps $x_{0:M-1} = A w_{0:N-1} + a_{0:M-1}$ ($A \in \mathbb{R}^{M n_x \times N n_w}$). Introducing homogeneous representations, such affine maps can be written as linear maps acting on the extended sequence $(w_{0:N-1}, 1)$.


For a matrix $A$ acting on sequences, the slice $A_{i:j,k:\ell}$ describes the part mapping $w_{k:\ell}$ to $x_{i:j}$. So we take block rows and block columns, with blocks of size $\mathbb{R}^{n_x \times n_w}$.

Risk measures and ambiguity: Given some measurable space $(\mathcal{W}, \mathcal{B})$ with $\mathcal{W}$ a compact subset of $\mathbb{R}^{n_w}$ and $\mathcal{B}$ the associated Borel sigma-algebra, we use $\mathcal{M}_+(\mathcal{W})$ ($\mathcal{M}(\mathcal{W})$) to denote the space of finite (signed) measures on $(\mathcal{W}, \mathcal{B})$, making the dependency on $\mathcal{W}$ explicit. Similarly, let $\mathcal{P}(\mathcal{W})$ denote the space of probability measures.

We also consider the space $\mathcal{Z} := C(\mathcal{W})$ of continuous (bounded) $\mathcal{B}$-measurable functions $z \colon \mathcal{W} \to \mathbb{R}$. Elements of $\mathcal{Z}$ act as random loss functions. The notation $z \sim \mu$ means $z$ has distribution $\mu \in \mathcal{P}(\mathcal{W})$. The spaces $\mathcal{M}(\mathcal{W})$ and $\mathcal{Z}$ are paired by the bilinear form [6, §2.2], for $z \in \mathcal{Z}$, $\mu \in \mathcal{M}(\mathcal{W})$,

$$\langle z, \mu \rangle := \int_{\mathcal{W}} z(w)\, d\mu(w).$$

We endow $\mathcal{M}(\mathcal{W})$ with the weak$^*$ topology.

We write $z \succcurlyeq 0$ ($\mu \succcurlyeq 0$) to mean $z(w) \ge 0$ for all $w \in \mathcal{W}$ ($\mu(O) \ge 0$ for all $O \in \mathcal{B}$). Note that, since $\mathcal{Z}$ and $\mathcal{M}(\mathcal{W})$ are linear spaces, we can use the usual notation of linear operators (e.g., let $E \colon \mathcal{M}(\mathcal{W}) \to \mathbb{R}^n$; then $E\mu = (\langle z_0, \mu \rangle, \dots, \langle z_{n-1}, \mu \rangle)$ for some random variables $z_i \in \mathcal{Z}$, $i \in \mathbb{N}_{0:n-1}$). For each linear operator $E$ we have the adjoint $E^* \colon \mathbb{R}^n \to \mathcal{Z}$, with $E^*\lambda := (z_0, \dots, z_{n-1}) \cdot \lambda$, where $\cdot$ is the usual inner product between vectors. After all,

$$
\begin{aligned}
E\mu \cdot \lambda &= (\langle z_0, \mu \rangle, \dots, \langle z_{n-1}, \mu \rangle) \cdot \lambda
= \Big( \int_{\mathcal{W}} (z_0(w), \dots, z_{n-1}(w))\, d\mu(w) \Big) \cdot \lambda\\
&= \int_{\mathcal{W}} \big( (z_0(w), \dots, z_{n-1}(w)) \cdot \lambda \big)\, d\mu(w)
= \langle (z_0, \dots, z_{n-1}) \cdot \lambda,\ \mu \rangle = \langle E^*\lambda, \mu \rangle. \quad (1)
\end{aligned}
$$

We define risk based on its ambiguity, as in most DRO literature [6]. Specifically, we say that $\mathcal{A} \subseteq \mathcal{M}_+(\mathcal{W})$ is an ambiguity set if it is a non-empty, closed and convex subset of $\mathcal{P}(\mathcal{W})$. The associated risk measure $\rho_{\mathcal{A}} \colon \mathcal{Z} \to \mathbb{R}$ is then [6, §2]

$$\rho_{\mathcal{A}}[z] = \max_{\mu \in \mathcal{A}} \langle z, \mu \rangle = \max_{\mu \in \mathcal{A}} \mathbb{E}_\mu[z], \quad (2)$$

where $\mathbb{E}_\mu[\cdot]$ denotes the expected value w.r.t. $\mu \in \mathcal{P}(\mathcal{W})$. The risk measure constitutes a mapping from random loss functions to the real line, which (similarly to the expectation) can be used to deterministically compare random variables.

Our definition of an ambiguity set is directly related to that of coherent risk [15].

Lemma I.1. Suppose that $\mathcal{A} \subseteq \mathcal{P}(\mathcal{W})$ is non-empty, closed and convex. Then $\rho_{\mathcal{A}}$ in (2) is coherent. Specifically, for all $z, z' \in \mathcal{Z}$ and $\alpha \in \mathbb{R}$, $\rho_{\mathcal{A}}$ is

(i) convex, proper, and lower semicontinuous;
(ii) monotone: $\rho_{\mathcal{A}}(z) \ge \rho_{\mathcal{A}}(z')$ if $z \succcurlyeq z'$;
(iii) translation equivariant: $\rho_{\mathcal{A}}(z + \alpha) = \rho_{\mathcal{A}}(z) + \alpha$;
(iv) positively homogeneous: $\rho_{\mathcal{A}}(\alpha z) = \alpha \rho_{\mathcal{A}}(z)$ if $\alpha > 0$.

Moreover, $\mathcal{A}$ is compact and equal to the domain of $\rho_{\mathcal{A}}^*$, and $\rho_{\mathcal{A}}[z]$ is finite, where $\rho_{\mathcal{A}}^*$ denotes the convex conjugate of $\rho_{\mathcal{A}}$.


Proof. Let $\chi_{\mathcal{A}}$ denote the indicator function of $\mathcal{A}$ (i.e., $\chi_{\mathcal{A}}[\mu] = +\infty$ if $\mu \notin \mathcal{A}$ and $0$ otherwise). Then, by (2),

$$\rho[z] = \chi^*[z] = \sup_{\mu \in \mathcal{M}(\mathcal{W})} \{ \langle z, \mu \rangle - \chi[\mu] \}, \quad (3)$$

where we omit the subscripts of $\rho_{\mathcal{A}}$ and $\chi_{\mathcal{A}}$ for convenience. Since $\chi$ is an indicator function, it is convex ($\mathcal{A}$ is convex), lower semicontinuous ($\mathcal{A}$ is closed) and proper ($\mathcal{A}$ is nonempty). Therefore, by [16, Prop. 2.112] and (3), $\rho$ is proper, convex and lower semicontinuous (its epigraph is an intersection of closed halfspaces), so (i) holds. Since $\chi$ is convex and lower semicontinuous, we apply [16, Thm. 2.133] to show $\chi = \chi^{**} = \rho^*$, where the second equality follows by (3). Hence the domain of $\rho^*$ is $\mathcal{A}$. Compactness of $\mathcal{P}(\mathcal{W})$ follows by Prohorov's theorem [17, p. 13]. Since $\mathcal{A}$ is a closed subset of $\mathcal{P}(\mathcal{W})$, it is also compact.

The results (ii)–(iv) follow directly from [15, Thm. 2.2]: specifically, (ii) from $\mathcal{A} \subset \mathcal{M}_+(\mathcal{W})$, (iii) from $\mu(\mathcal{W}) = 1$ for all $\mu \in \mathcal{A}$, and (iv) from (2).

Next, we show that for any $z \in \mathcal{Z}$,

$$\langle z, \mu \rangle \le \alpha,\ \forall \mu \in \mathcal{P}(\mathcal{W}) \quad \Leftrightarrow \quad z \preccurlyeq \alpha, \quad (4)$$

where the inequality on the right holds almost surely over $\mathcal{W}$ (i.e., $z(w) \le \alpha$ for all $w \in \mathcal{W}$). The argument for (4) is as follows [17, Eq. 3.7]. Since $\mu(\mathcal{W}) = 1$, $\langle z, \mu \rangle \le \alpha$ iff $\langle \alpha - z, \mu \rangle \ge 0$, which holds if $\alpha - z \succcurlyeq 0$ (since $\mu \succcurlyeq 0$). For the converse, note that $\delta_w \in \mathcal{P}(\mathcal{W})$ for any $w \in \mathcal{W}$, with $\delta_w$ a Dirac measure, so $\langle \alpha - z, \delta_w \rangle = \alpha - z(w) \ge 0$ for any $w \in \mathcal{W}$ is a necessary condition. This proves (4).

From (4) we conclude that $\langle z, \mu \rangle \le \sup_{w \in \mathcal{W}} z(w)$ for all $z \in \mathcal{Z}$, $\mu \in \mathcal{A}$. Hence, by (2), $\rho[z] \le \sup_{w \in \mathcal{W}} z(w)$. Since $z(w)$ is finite for any $w \in \mathcal{W}$, $\rho[z]$ is finite (cf. [6, §2.2]).

II. SINGLE-STAGE PROBLEMS

Given the dual formulation of a risk measure in (2), it is clear that the choice of $\mathcal{A}$ is a critical design decision. In this section we describe how ambiguity sets based on moment information are derived from data. We also introduce conic representable risk, which is used to derive tractable problems.

A. Data-driven risk

In DRO the reasoning is usually as follows. Consider a probability space $(\Omega, \mathcal{F}, \mathbb{P})$ and the optimization problem

$$\operatorname*{minimize}_{u \in U}\ \mathbb{E}_{\mu_\star}[f(u, w)], \quad (5)$$

with $u \in \mathbb{R}^{n_u}$ some decision variable, $f$ some loss function and $w \colon \Omega \to \mathcal{W}$ a random variable with $\mathcal{W} \subset \mathbb{R}^{n_w}$ the (compact) support of $w$. The main difficulty in solving the stochastic optimization problem (5) is that the distribution (or push-forward measure) $\mu_\star \in \mathcal{P}(\mathcal{W})$, defined on the sample space $(\mathcal{W}, \mathcal{B})$ as $\mu_\star(O) = \mathbb{P}[w^{-1}(O)]$ for all $O \in \mathcal{B}$, with $w^{-1}(O)$ the preimage of $O$, is unknown.

Hence one instead introduces an ambiguity set $\mathcal{A} \subseteq \mathcal{P}(\mathcal{W})$, which contains $\mu_\star$ with some confidence. To do so one can estimate some statistic $\theta$ based on data. In the case of $\phi$-divergence [12] and Wasserstein ambiguity [18], this $\theta$ is the empirical distribution,


while for moment-based ambiguity, $\theta$ encapsulates moment information. We consider this final case in §II-B. To summarize:

Definition II.1. Consider a random variable $w \colon \Omega \to \mathcal{W}$ with distribution $\mu_\star$ and i.i.d. samples $w_{0:M-1} \colon \Omega \to \mathcal{W}^M$. Let $\theta \colon \mathcal{W}^M \to \Theta$ denote a statistic for a set $\Theta$ and let $\beta \in \mathbb{R}$ be some radius¹. Then a data-driven ambiguity $\mathcal{A} \colon \Theta \times \mathbb{R} \rightrightarrows \mathcal{P}(\mathcal{W})$ with confidence $\delta \in (0, 1)$ maps $(\theta(w), \beta)$ to an ambiguity set $\mathcal{A}_\beta(\theta(w)) \subseteq \mathcal{P}(\mathcal{W})$ such that

$$\mathbb{P}[\mu_\star \in \mathcal{A}_\beta(\theta(w_{0:M-1}))] \ge 1 - \delta. \quad (6)$$

In [12] this is referred to as a learning system. Based on (6) we minimize $\rho_{\mathcal{A}_\beta(\hat\theta)}[f(u, w)]$ instead of (5). The result upper bounds (5) with probability at least $1 - \delta$.

B. Moment-based ambiguity

As mentioned before, we focus on the case where $\theta$ encapsulates moment information. Such ambiguity sets have the advantage that [18] (i) they can contain measures with support not limited to the observed samples (unlike $\phi$-divergence based sets); (ii) the radius can be estimated with reasonable accuracy based on known information about the distribution (unlike for Wasserstein-based sets); and (iii) the problem complexity does not grow with the sample count.

To ensure that an ambiguity set satisfying (6) can be derived, we assume that $\mathcal{W}$ is bounded, which is often the case in control applications and is therefore the usual assumption in robust control. Other common choices are that $w$ is multivariate Gaussian or that it satisfies some concentration properties (e.g., sub-Gaussian) [19]. We have:

Lemma II.2. Let $\mathcal{W} = \{w \in \mathbb{R}^{n_w} : \|w\|_2 \le r\}$. Assume we have a set of i.i.d. samples $w_{0:M-1}$ of $w \sim \mu_\star$ and let $\hat{C} := \sum_{i=0}^{M-1} w_i w_i^\top / M$. Then,

$$\mathcal{A}_\beta(\hat{C}) = \Big\{ \mu \in \mathcal{P}(\mathcal{W}) : \big\| \hat{C} - \mathbb{E}_\mu[w w^\top] \big\|_2 \le \beta \Big\},$$

satisfies (6) when $\beta = r^2 \sqrt{2 \log(2(n_w + 1)/\delta)/M}$.

Proof. We use a matrix Hoeffding bound [20, Thm. 1.3] with improved constants. See App. A for the full proof.
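As a quick numerical illustration of Lem. II.2 (our own sketch, not part of the paper), the estimate $\hat{C}$ and radius $\beta$ are cheap to compute, and the coverage guarantee (6) can be checked by Monte Carlo. For the uniform distribution on the unit circle the true second moment is $I/2$, so the event $\|\hat{C} - \mathbb{E}_{\mu_\star}[ww^\top]\|_2 \le \beta$ should occur with probability at least $1 - \delta$:

```python
import numpy as np

def moment_estimate(w, r, delta):
    """Empirical second moment and Hoeffding-type radius of Lem. II.2."""
    M, nw = w.shape
    C_hat = w.T @ w / M                      # sum_i w_i w_i^T / M
    beta = r**2 * np.sqrt(2 * np.log(2 * (nw + 1) / delta) / M)
    return C_hat, beta

rng = np.random.default_rng(0)
r, delta, M, trials = 1.0, 0.05, 500, 200
C_true = np.eye(2) / 2                       # E[w w^T] for w uniform on the unit circle
hits = 0
for _ in range(trials):
    th = rng.uniform(0.0, 2 * np.pi, M)
    w = np.stack([np.cos(th), np.sin(th)], axis=1)   # samples with ||w||_2 = r = 1
    C_hat, beta = moment_estimate(w, r, delta)
    hits += np.linalg.norm(C_hat - C_true, 2) <= beta    # spectral-norm test
coverage = hits / trials                     # empirical analogue of (6)
```

Since $\beta$ shrinks like $1/\sqrt{M}$, the ambiguity set tightens as more data becomes available, which is the mechanism behind the robust-to-certainty-equivalent interpolation discussed in the abstract.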

C. Conic-representable ambiguity

We introduce conic representable ambiguity (similar to the framework in [7], [8]) below and show how such risk is related to robust optimization through conic duality.

Definition II.3. Consider a compact sample space $\mathcal{W} \subset \mathbb{R}^{n_w}$ and $\mathcal{Z} = C(\mathcal{W})$. An ambiguity set $\mathcal{A}$ is conic representable if, for some $E, F \colon \mathcal{M}(\mathcal{W}) \to \mathbb{R}^{n_b}$ and $b \in \mathbb{R}^{n_b}$,

$$\mathcal{A} = \{ \mu \in \mathcal{P}(\mathcal{W}) : \exists \nu \in \mathcal{M}_+(\mathcal{W}),\ E\mu + F\nu \preccurlyeq_{\mathcal{K}} b \},$$

with $\nu$ some auxiliary measure and $\mathcal{K}$ a closed, convex cone. Usually we assume $F = 0$; when $F \ne 0$ we refer to the ambiguity as $\nu$-conic representable. Similarly, we refer to $\rho_{\mathcal{A}}$, as in (2), as ($\nu$-)conic representable risk (conic for short).


The parameters of $\mathcal{A}$ should be selected such that it is an ambiguity set (i.e., a nonempty, closed and convex subset of $\mathcal{P}(\mathcal{W})$). Since we usually want an $\mathcal{A}$ satisfying (6), it will be non-empty, as it should at least contain the true distribution. The random variables used to construct $E$ and $F$ are all continuous; therefore [21, Thm. 15.5] $E$ and $F$ are continuous mappings. Thus $\mathcal{A}$ is the intersection of the closed set $\mathcal{P}(\mathcal{W})$ with the preimage of a closed set under a continuous mapping, which is also closed. Hence $\mathcal{A}$ is a closed subset of $\mathcal{P}(\mathcal{W})$. Convexity of $\mathcal{A}$ then follows, since $E$ and $F$ are linear and $\mathcal{K}$ is convex.

In [8] it was shown that both the average and the entropic value-at-risk are conic whenever $\mathcal{W}$ is finite. Many more risks fall under this framework [7], [22].

Direct application of conic linear duality [17] gives:

Lemma II.4. A risk $\rho_{\mathcal{A}}[z]$ as in Def. II.3 is equal to the optimal value of

$$
\begin{aligned}
\operatorname*{minimize}_{\lambda \succcurlyeq_{\mathcal{K}^*} 0,\ \tau}\quad & \tau + b \cdot \lambda\\
\text{subj. to}\quad & \tau + E^*\lambda \succcurlyeq z, \quad F^*\lambda \succcurlyeq 0,
\end{aligned} \quad (D)
$$

where the inequalities should hold almost surely, $E^*, F^*$ denote the adjoint operators (cf. §I-A) and $\mathcal{K}^*$ the dual cone.

Proof. By (2) the primal problem is

$$
\begin{aligned}
\operatorname*{maximize}_{\mu, \nu \in \mathcal{M}_+(\mathcal{W})}\quad & \langle z, \mu \rangle\\
\text{subj. to}\quad & E\mu + F\nu \preccurlyeq_{\mathcal{K}} b, \quad \langle 1, \mu \rangle = 1,
\end{aligned} \quad (P)
$$

with $\operatorname{val}(P) = \rho[z]$ (where we omit the subscript for convenience). We refer to the minimization problem (D) as the dual problem. Let $\tau \in \mathbb{R}$ and $\lambda \in \mathbb{R}^{n_b}$. Then the Lagrangian is

$$
\begin{aligned}
\varphi[\mu, \nu, \lambda] &:= \langle z, \mu \rangle + (1 - \langle 1, \mu \rangle) \cdot \tau + (b - E\mu - F\nu) \cdot \lambda\\
&= \tau + b \cdot \lambda - \langle \tau + E^*\lambda - z, \mu \rangle - \langle F^*\lambda, \nu \rangle,
\end{aligned}
$$

where we can use (1) to construct the adjoints. We have

$$\mathcal{K}^* := \{ \lambda \in \mathbb{R}^{n_b} : \lambda \cdot \lambda' \ge 0,\ \forall \lambda' \in \mathcal{K} \}.$$

Hence $\max_{\nu \in \mathcal{M}_+(\mathcal{W})} \min_{\lambda \in \mathcal{K}^*, \tau} \{ (b - E\mu - F\nu) \cdot \lambda \} = -\chi[\mu]$, where $\chi$ is the indicator of $\mathcal{A}$. Therefore

$$\max_{\mu, \nu \in \mathcal{M}_+(\mathcal{W})}\ \min_{\lambda \in \mathcal{K}^*} \varphi[\mu, \nu, \lambda] = \max_{\mu \in \mathcal{M}_+(\mathcal{W})} \{ \langle z, \mu \rangle - \chi[\mu] \} = \rho[z].$$

Similarly, note that [17, Eq. 3.7]

$$
\begin{aligned}
\mathcal{M}_+^*(\mathcal{W}) &:= \{ z \in \mathcal{Z} : \langle z, \mu \rangle \ge 0,\ \forall \mu \in \mathcal{M}_+(\mathcal{W}) \}\\
&= \{ z \in \mathcal{Z} : z(w) \ge 0,\ \forall w \in \mathcal{W} \} = \{ z \in \mathcal{Z} : z \succcurlyeq 0 \},
\end{aligned}
$$

which follows from a similar argument as (4). As such

$$\min_{\lambda \in \mathcal{K}^*,\ \tau}\ \max_{\mu, \nu \in \mathcal{M}_+(\mathcal{W})} \varphi[\mu, \nu, \lambda] = \operatorname{val}(D),$$

since $\lambda$ gives a finite cost iff $\tau + E^*\lambda - z \in \mathcal{M}_+^*(\mathcal{W})$ and $F^*\lambda \in \mathcal{M}_+^*(\mathcal{W})$ (i.e., $\tau + E^*\lambda - z \succcurlyeq 0$ and $F^*\lambda \succcurlyeq 0$).

All that is left is to show strong duality (i.e., $\operatorname{val}(D) = \operatorname{val}(P) = \rho[z]$). This follows directly from coherence of $\rho$ (specifically $\rho$ being proper, implying consistency of (P)), compactness of $\mathcal{W}$ and [17, Cor. 3.1].
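When $\mathcal{W}$ is finite, (P) and (D) are ordinary linear programs, so the strong-duality claim of Lem. II.4 can be verified numerically. The sketch below is our own construction (it assumes `scipy` is available, takes $F = 0$ and $\mathcal{K} = \mathbb{R}^{n_b}_+$, and uses an arbitrary loss and moment map):

```python
import numpy as np
from scipy.optimize import linprog

rng = np.random.default_rng(1)
m, k = 40, 3                                  # |W| grid points, n_b moment constraints
W = np.linspace(-1.0, 1.0, m)
z = np.sin(3 * W) + W**2                      # a loss z in Z, sampled on the grid
E = np.stack([W, W**2, np.abs(W)])            # moment maps e_i(w), stacked row-wise
b = E @ np.full(m, 1.0 / m) + 0.05            # feasible by construction (uniform + slack)

# Primal (P): max <z, mu>  s.t.  E mu <= b, <1, mu> = 1, mu >= 0
P = linprog(-z, A_ub=E, b_ub=b, A_eq=np.ones((1, m)), b_eq=[1.0],
            bounds=[(0, None)] * m)

# Dual (D): min tau + b . lam  s.t.  tau + E* lam >= z, lam >= 0
A_ub = np.hstack([-E.T, -np.ones((m, 1))])    # variables x = [lam; tau]
D = linprog(np.append(b, 1.0), A_ub=A_ub, b_ub=-z,
            bounds=[(0, None)] * k + [(None, None)])
# -P.fun and D.fun both equal rho_A[z] (strong duality, Lem. II.4)
```

Both solves are over the same finite-dimensional data, so any gap between the optimal values would indicate a modeling error rather than a duality gap.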

Note that the constraints in the dual are robust constraints, since they hold a.s. over $\mathcal{W}$. Hence, techniques from robust optimization enable finding tractable reformulations.

Example II.5. The ambiguity set $\mathcal{A}_\beta(\hat{C})$ of Lem. II.2 is conic with $n_b = 2(n_w+1)^2$, $E\mu = (\langle w w^\top, \mu \rangle, -\langle w w^\top, \mu \rangle)$, $b = (\hat{C} + \beta I,\ \beta I - \hat{C})$ and $\mathcal{K} = \mathbb{S}^{n_w+1}_+ \times \mathbb{S}^{n_w+1}_+$.

Moreover, letting $\lambda = (\Lambda, V)$ with $\Lambda, V \in \mathbb{S}^{n_w+1}$ and $\tau \in \mathbb{R}$, Lem. II.4 gives

$$
\begin{aligned}
\rho_{\mathcal{A}_\beta(\hat{C})}[z] = \min_{\Lambda, V \succeq 0,\ \tau}\quad & \tau + \operatorname{Tr}[\Lambda(\hat{C} + \beta I)] - \operatorname{Tr}[V(\hat{C} - \beta I)]\\
\text{s. t.}\quad & \tau + E^*\lambda \succcurlyeq z,
\end{aligned}
$$

where the adjoint $E^* \colon \mathbb{R}^{n_b} \to \mathcal{Z}$ is (cf. (1))

$$(\tau + E^*\lambda)(w) = \tau + \operatorname{Tr}[w w^\top (\Lambda - V)] = w^\top \big(\Lambda - V + \operatorname{diag}(0, \tau)\big) w.$$

If the constraint $\tau + E^*\lambda \succcurlyeq z$ is LMI representable, then $\rho_{\mathcal{A}_\beta(\hat{C})}[z]$ can be evaluated by solving an SDP. For example, let $z = w^\top P w$. Then, since $w^\top w \le r^2$, we can apply the S-Lemma [23, Thm. B.2.1] to show that $\tau + E^*\lambda \succcurlyeq z$ iff there exists $s \ge 0$ such that

$$\Lambda - V + \operatorname{diag}(sI, \tau - s r^2) - P \succeq 0. \quad (7)$$

We also consider ambiguity with only support constraints.

Example II.6. The ambiguity set $\mathcal{P}(\mathcal{W})$ is conic representable with $n_b = 0$. Hence $\rho_{\mathcal{P}(\mathcal{W})}[z] = \min_\tau \{\tau : \tau \succcurlyeq z\}$, which corresponds to $\rho_{\mathcal{P}(\mathcal{W})}[z] = \max_{w \in \mathcal{W}} z(w)$ and only considers the support, as is common in robust optimization.
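Ex. II.6 says that with only support information the risk collapses to the worst case over $\mathcal{W}$. A quick numerical illustration (our own sketch, on a discretized support) is that this worst case upper bounds $\mathbb{E}_\mu[z]$ for every distribution on $\mathcal{W}$, as (2) requires:

```python
import numpy as np

rng = np.random.default_rng(2)
W = np.linspace(-1.0, 1.0, 201)          # discretized compact support
z = np.exp(W) * np.cos(4 * W)            # some continuous loss z(w)
rho = z.max()                            # rho_{P(W)}[z] = max_{w in W} z(w)

for _ in range(100):                     # random probability measures mu on W
    mu = rng.dirichlet(np.ones(W.size))
    assert mu @ z <= rho + 1e-12         # <z, mu> <= rho_{P(W)}[z]
```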

III. MULTI-STAGE PROBLEMS

In this section we show how conic single-stage risk can be extended to a multi-stage setting, which is required to develop distributionally robust MPC controllers. Specifically, we will consider risk measures operating on the dynamics

$$x_{t+1} = f(x_t, u_t, w_t),$$

with $x_t \in \mathbb{R}^{n_x}$ ($u_t \in \mathbb{R}^{n_u}$) the state (input) and $w_t \in \mathbb{R}^{n_w}$ the disturbance, which follows a random process. For $t \in \mathbb{N}_{0:N-1}$ we consider a stage cost function $\ell_t \colon \mathbb{R}^{n_x} \times \mathbb{R}^{n_u} \to \mathbb{R}_+$, and $\ell_N \colon \mathbb{R}^{n_x} \to \mathbb{R}_+$ the terminal cost.

For each stage $t$, the trajectory up to that time, $w_{0:t-1}$, is an element of $\mathcal{W}^t$. For each $\mathcal{W}^t$, $\mathcal{B}^t$ is the accompanying Borel sigma-algebra, $\mathcal{M}^t_+$ ($\mathcal{M}^t$) the set of finite (signed) measures and $\mathcal{P}^t$ the set of probability measures on $(\mathcal{W}^t, \mathcal{B}^t)$. For brevity we henceforth omit the explicit dependency on $\mathcal{W}^t$. Also consider the paired spaces of continuous functions $\mathcal{Z}^t := C(\mathcal{W}^t)$.


We can then consider multistage ambiguity sets $\mathcal{A}^t$, which are nonempty, closed and convex subsets of $\mathcal{P}^t$. These in turn define a multistage analog of risk measures², multistage risk measures [6, §4.2], $\rho_{\mathcal{A}^t} \colon \mathcal{Z}^t \to \mathbb{R}$. Since this is simply a usual risk measure, but defined on $\mathcal{Z}^t$, the properties of Lem. I.1 generalize. We specifically consider coherent multistage risk

$$\rho_{\mathcal{A}^t}[z^t] = \max_{\mu^t \in \mathcal{A}^t} \langle z^t, \mu^t \rangle = \max_{\mu^t \in \mathcal{A}^t} \mathbb{E}_{\mu^t}[z^t]. \quad (8)$$

Given such risks, the goal is to solve, for a given $x_0$,

$$
\begin{aligned}
\operatorname*{minimize}_{\pi \in \Pi}\quad & \rho_{\mathcal{A}^N}\Big[ \textstyle\sum_{t=0}^{N-1} \ell_t(x_t, \pi_t(w_{:t-1})) + \ell_N(x_N) \Big] && (9a)\\
\text{subj. to}\quad & x_{t+1} = f(x_t, \pi_t(w_{:t-1}), w_t), \quad t \in \mathbb{N}_{0:N-1}, && (9b)\\
& r^t_{\mathcal{A}}[\phi(x_t)] \le 0, \quad t \in \mathbb{N}_{1:N-1}, && (9c)\\
& \psi(x_N) \le 0 \ \text{a.s.}, && (9d)
\end{aligned}
$$

where $\Pi$ denotes a set of parametrized, continuous, causal policies. The risk constraints (9c) involve the multistage risk measures $r^t_{\mathcal{A}}$ and are discussed in detail in §IV. We illustrate in §V how (9) interpolates between the robust setting and the stochastic optimal control problem of §I.

Remark III.1. Problem (9) is not exact, as we optimize over parametrized policies (cf. §V) for tractability. As such, time-consistency [24] cannot be guaranteed (i.e., a policy computed at $t = 0$ may not be optimal at $t = 1$ after realization of $w_0$). Hence a receding horizon scheme is used.

A. Product ambiguity

To enforce independence of the disturbances $w_t$ we introduce product ambiguity [6, §4.2]. For a sequence of single-stage ambiguity factors $\mathcal{A}_i$, $i \in \mathbb{N}_{0:t-1}$, consider

$$\textstyle\times_{i=0}^{t-1} \mathcal{A}_i = \mathcal{A}_0 \times \dots \times \mathcal{A}_{t-1}, \quad (10)$$

where $\mu^t \in \mathcal{A}_0 \times \dots \times \mathcal{A}_{t-1}$ if it is constructed as a product measure of some $\mu_i \in \mathcal{A}_i$, $i \in \mathbb{N}_{0:t-1}$ (denoted by $\mu^t = \mu_0 \times \dots \times \mu_{t-1}$). We show that in certain cases such ambiguities are conic representable.

Before doing so we need to extend linear operators $E_i \colon \mathcal{M} \to \mathbb{R}^{n_b}$ to take arguments in $\mathcal{M}^t$ in a natural way. To do so, note that for any $E_i \colon \mathcal{M} \to \mathbb{R}^{n_b}$ and $\mu \in \mathcal{M}$ we have $E_i \mu = \int_{\mathcal{W}} e_i(w)\, d\mu(w)$ for some $e_i \colon \mathcal{W} \to \mathbb{R}^{n_b}$, by definition. Measures $\mu^t \in \mathcal{M}^t$ take arguments $w_{:t-1} = (w_0, \dots, w_{t-1})$, so we introduce $E|_i \colon \mathcal{M}^t \to \mathbb{R}^{n_b}$ such that

$$E|_i \mu^t = \int_{\mathcal{W}^t} e_i(w_i)\, d\mu^t(w_{:t-1}), \quad \forall \mu^t \in \mathcal{M}^t. \quad (11)$$

With these new operators we have:

²Multistage risk is often constructed using nested conditional risk measures. We avoid such a construction here.


Lemma III.2. Let $\mathcal{A}_i$ be conic representable with parameters $E_i, b_i, \mathcal{K}_i$ for $i \in \mathbb{N}_{0:t-1}$. Then $\times_{i=0}^{t-1} \mathcal{A}_i$ is also conic representable, with parameters

$$E\mu^t = (E|_0 \mu^t, \dots, E|_{t-1} \mu^t), \quad b = (b_0, \dots, b_{t-1}), \quad \mathcal{K} = \mathcal{K}_0 \times \dots \times \mathcal{K}_{t-1}.$$

Moreover,

$$\rho_{\times_{i=0}^{t-1} \mathcal{A}_i}[z^t] = \min_{\lambda_i \succcurlyeq_{\mathcal{K}^*_i} 0,\ \tau} \Big\{ \tau + \textstyle\sum_{i=0}^{t-1} b_i \cdot \lambda_i \ :\ \tau + \textstyle\sum_{i=0}^{t-1} E|^*_i \lambda_i \succcurlyeq z^t \Big\}.$$

Proof. Let $\mu^t = \mu_0 \times \dots \times \mu_{t-1}$, with $\mu_i \in \mathcal{P}(\mathcal{W})$. Then, following the notation in (11),

$$
\begin{aligned}
E|_i \mu^t &:= \int_{\mathcal{W}^t} e_i(w_i)\, d\mu^t(w_{:t-1})\\
&\overset{(i)}{=} \int_{\mathcal{W}} \cdots \int_{\mathcal{W}} e_i(w_i)\, d\mu_0(w_0) \cdots d\mu_{t-1}(w_{t-1}) = \int_{\mathcal{W}} e_i(w_i)\, d\mu_i(w_i),
\end{aligned}
$$

where (i) follows from $\mu^t = \mu_0 \times \dots \times \mu_{t-1}$ and $\mu^t \in \mathcal{P}^t$. Hence $E|_i \mu^t \preccurlyeq_{\mathcal{K}_i} b_i$ iff $E_i \mu_i \preccurlyeq_{\mathcal{K}_i} b_i$. Repeating the same argument for each $i$ proves that $\mathcal{A}^t := \times_{i=0}^{t-1} \mathcal{A}_i$ is conic representable. Since the $\mathcal{A}_i$ are all non-empty, $\mathcal{A}^t$ is also nonempty. Convexity and closedness follow from the arguments below Def. II.3. The dual then follows from applying Lem. II.4.
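The key step in the proof — that $E|_i$ applied to a product measure only sees the $i$'th marginal — is easy to check numerically for discrete measures (our own sketch):

```python
import numpy as np

rng = np.random.default_rng(3)
m = 5                                    # |W| for a discretized stage space
W = np.linspace(-1.0, 1.0, m)
mu0 = rng.dirichlet(np.ones(m))          # stage measures mu_0, mu_1
mu1 = rng.dirichlet(np.ones(m))
joint = np.outer(mu0, mu1)               # product measure mu^2 = mu_0 x mu_1
e = np.stack([W, W**2])                  # a moment map e(w) with two components

# E|_1 mu^2 integrates e(w_1) against the joint measure (cf. (11)) ...
E_joint = np.einsum('kj,ij->k', e, joint)
# ... and coincides with E_1 mu_1, since the mu_0 factor integrates to 1
E_marg = e @ mu1
```

This is exactly why the conic constraints of the factors carry over to the product in Lem. III.2.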

B. Risk constraints

Ideally, constraints like (9c) would require the state to lie within some set almost surely. Since such a constraint can be very conservative in a stochastic setting, we instead implement average value-at-risk constraints: for $\alpha \in (0, 1)$,

$$\operatorname{AV@R}^\mu_\alpha[z] := \inf_{\tau \in \mathbb{R}} \big\{ \tau + \alpha^{-1}\, \mathbb{E}_\mu[z - \tau]_+ \big\} \le 0. \quad (12)$$

Such constraints (i) act as a convex relaxation of chance constraints [25]; and (ii) penalize the expected violation in the $\alpha$-quantile where violations do occur. In control applications (12) is natural, since it penalizes large violations more.
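For a fixed $\mu$, the infimum in (12) is easy to evaluate from samples: the objective is convex and piecewise linear in $\tau$, so scanning the sample points suffices, and the result is the mean of the worst $\alpha$-fraction of losses. A small sketch (our own construction):

```python
import numpy as np

def avar(z, alpha):
    """Empirical AV@R of (12): inf_tau { tau + E[z - tau]_+ / alpha }.
    The objective is piecewise linear in tau with knots at the samples."""
    taus = np.sort(z)
    vals = taus + np.maximum(z[None, :] - taus[:, None], 0.0).mean(axis=1) / alpha
    return vals.min()

rng = np.random.default_rng(4)
z = rng.standard_normal(2000)
alpha = 0.1
cvar = avar(z, alpha)
tail = np.sort(z)[-int(alpha * z.size):].mean()   # mean of the worst 10% of losses
# cvar equals the tail mean here (alpha * M is an integer) and exceeds E[z]
```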

To evaluate the expectation in (12), true knowledge of the distribution is needed. Hence, we operate on the distributionally robust AV@R constraint instead:

$$\operatorname{r\text{-}AV@R}^{\mathcal{A}}_\alpha[z] := \max_{\mu \in \mathcal{A}} \operatorname{AV@R}^\mu_\alpha[z] \le 0, \quad (13)$$

with $\mathcal{A}$ the core ambiguity. If $\mathcal{A}$ satisfies (6), then (13) implies that the chance constraint $\mathbb{P}[z \le 0] \ge 1 - \varepsilon$ holds with $1 - \varepsilon \le (1 - \delta)(1 - \alpha)$. Moreover, whenever $\mathcal{A}$ is conic, robust AV@R is $\nu$-conic.

Lemma III.3. Let $\mathcal{A}$ be conic with parameters $E_c, b_c, \mathcal{K}_c$. Then $\operatorname{r\text{-}AV@R}^{\mathcal{A}}_\alpha$ in (13) is $\nu$-conic with

$$E\mu = (E_c \mu, \langle 1, \mu \rangle), \quad F\nu = (E_c \nu, \langle 1, \nu \rangle), \quad b = (b_c, 1)/\alpha,$$

and $\mathcal{K} = \mathcal{K}_c \times \{0\}$. Moreover, $\operatorname{r\text{-}AV@R}^{\mathcal{A}}_\alpha[z]$ equals

$$
\begin{aligned}
\min_{\lambda \succcurlyeq_{\mathcal{K}^*_c} 0,\ \tau,\ \tau_c}\quad & \tau + \alpha^{-1}(\tau_c + b_c \cdot \lambda)\\
\text{s. t.}\quad & \tau_c + E^*_c \lambda \succcurlyeq 0, \quad \tau + \tau_c + E^*_c \lambda \succcurlyeq z.
\end{aligned} \quad (14)
$$


Proof. This proof generalizes the methodology of [26] to arbitrary conic representable risk. First note that

$$\max_{\mu \in \mathcal{A}}\ \inf_{\tau \in \mathbb{R}} \big\{ \tau + \alpha^{-1}\, \mathbb{E}_\mu[z - \tau]_+ \big\} = \inf_{\tau \in \mathbb{R}} \big\{ \tau + \alpha^{-1} \rho_{\mathcal{A}}[z - \tau]_+ \big\}, \quad (15)$$

by [27, Thm. 2.1]. Specifically, let $\phi(\tau, w) = \tau + \alpha^{-1}[z(w) - \tau]_+$. Then we have: (i) $\phi(\tau, \cdot) \in \mathcal{Z}$, implying that it is $\mu$-integrable and measurable; (ii) $\phi(\cdot, w)$ is convex for any $w \in \mathcal{W}$; (iii) $\rho_{\mathcal{A}}[z - \tau]_+$ is finite (Lem. I.1); (iv) the set $\mathcal{A} \subseteq \mathcal{P}(\mathcal{W})$ is compact (Lem. I.1); (v) $\phi(\tau, \cdot)$ is continuous and hence bounded on $\mathcal{W}$ for any $\tau \in \mathbb{R}$. Under these properties, and since $\mathcal{A}$ is convex, [27, Thm. 2.1] states that strong duality holds, allowing us to exchange the inf and the max.

Applying Lem. II.4 to $\rho_{\mathcal{A}}$ on the r.h.s. results in (14) (where $[\cdot]_+$ produces two separate constraints and $\tau_c, \lambda$ act as the Lagrange multipliers for the constraint $\mu \in \mathcal{A}$). Again applying Lem. II.4 gives the original $\nu$-conic representation.

The second application of Lem. II.4 requires the resulting set of measures (denoted $\mathcal{A}_\alpha$ below) to be a nonempty, closed and convex subset of $\mathcal{P}(\mathcal{W})$. By construction we already have $\mathcal{A}_\alpha \subseteq \mathcal{P}(\mathcal{W})$. Next, we show that $\mathcal{A}_\alpha$ is larger than $\mathcal{A}$. After all, for any $\mu_c \in \mathcal{A}$, take $\mu = \mu_c$ and $\nu = (1 - \alpha)\mu_c/\alpha \succcurlyeq 0$, which is valid since $\alpha \in (0, 1)$. Moreover, since $\mathcal{K}_c$ is a cone,

$$E_c \mu + E_c \nu \preccurlyeq_{\mathcal{K}_c} b_c/\alpha \quad \Leftrightarrow \quad E_c(\alpha\mu + \alpha\nu) \preccurlyeq_{\mathcal{K}_c} b_c,$$

with $\alpha\mu + \alpha\nu = \alpha\mu_c + (1 - \alpha)\mu_c = \mu_c$. For the same reason we have $\langle 1, \mu \rangle + \langle 1, \nu \rangle = \alpha^{-1}\langle 1, \mu_c \rangle = \alpha^{-1}$. Therefore $\mu_c \in \mathcal{A}_\alpha$ for each $\mu_c \in \mathcal{A}$. Hence, since $\mathcal{A}$ is nonempty, so is $\mathcal{A}_\alpha$. Closedness and convexity then follow by the arguments below Def. II.3, so using Lem. II.4 is justified.

IV. RECURSIVE FEASIBILITY

We show how one can configure the constraints of (9) such that recursive feasibility is ensured. To do so we assume:

(A1) $r^t_{\mathcal{A}}[z^t] := \operatorname{r\text{-}AV@R}^{\mathcal{P}^{t-1} \times \mathcal{A}}_\alpha[z^t]$, $\forall t \in \mathbb{N}_{0:N-1}$, $z^t \in \mathcal{Z}^t$;
(A2) $\mathcal{A}$ is updated based on measurements as $g(\mathcal{A}, w)$ (e.g., following Lem. II.2) and $\mathcal{A}^+ := g(\mathcal{A}, w) \subseteq \mathcal{A}$ a.s.

We introduce the terminal set $X_N := \{x : \psi(x) \le 0\}$. Let $V^{\mathcal{A}}_N(x_0)$ denote the minimum of (9) for some $x_0$ and let $D_N(\mathcal{A})$ denote its domain. Then consider the set of feasible policies $\Pi_N(x_0, \mathcal{A}) := \{\pi \in \Pi : \text{(9b), (9c), (9d)}\}$.

We begin with the following definition.

Definition IV.1 (Recursive feasibility). Let $x_0 \in D_N(\mathcal{A})$ and $\pi \in \Pi_N(x_0, \mathcal{A})$. If $f(x_0, \pi_0, w_0) \in D_N(g(\mathcal{A}, w_0))$ a.s., then (9) is recursively feasible (RF).

We can then prove the following theorem.

Theorem IV.2. Assume (A1), (A2) and that we are given some terminal policy $\pi_f(x_N)$ such that:

(A3) $X_N \subseteq \{x \in \mathbb{R}^{n_x} : \phi(f(x, \pi_f(x), w)) \le 0,\ \forall w \in \mathcal{W}\}$;
(A4) $X_N$ is robust positive invariant (RPI) for $\pi_f$ (i.e., $f(x, \pi_f(x), w) \in X_N$ for each $x \in X_N$ and $w \in \mathcal{W}$);
(A5) for all $\pi_{0:N-1} \in \Pi$, let $\pi_N = \pi_f(x_N)$, depending on $w_{:N}$ through $x_N$; then the shifted policy $\pi^+_{0:N-1} = \pi_{1:N}(w_0, \cdot)$, for any fixed $w_0 \in \mathcal{W}$, lies in $\Pi$.

Then (9) is recursively feasible.

Proof. We consider any fixed $w_0 \in \mathcal{W}$ and show that, given that (9) is feasible for some $x_0$, it is also feasible at the next time step, starting from $x^+_0 = f(x_0, \pi_0, w_0)$ (cf. Def. IV.1). Here, we consider the feasible policy $\pi_{0:N-1} \in \Pi_N(x_0, \mathcal{A})$, to which we append $\pi_N = \pi_f(x_N)$. Propagating the dynamics with this policy gives the sequence of states $x_{0:N+1}$, depending on $w_{:N}$ through (9b).

We then define the shifted sequence of states as

$$x^+_{0:N}(w_{1:N}) := \big(x_1(w_0, w_{1:N}), \dots, x_{N+1}(w_0, w_{1:N})\big),$$

where $w_0$ is considered fixed and $w_{1:N}$ is left variable. We analogously define the shifted policy $\pi^+_{0:N-1}$ as

$$\pi^+_{0:N-1}(w_{1:N-1}) := \big(\pi_1(w_0, w_{1:N-1}), \dots, \pi_f(x_N(w_0, w_{1:N-1}))\big).$$

By construction these shifted sequences satisfy (9b), and we can consider risk measures over (continuous) functions of them, where integration is performed over $w_{1:N}$.

Using this coupling between the feasible problem and the shifted problem we show $\pi^+ \in \Pi_N(x^+_0, \mathcal{A}^+)$; that is, the candidate policy $\pi^+$ is feasible for the shifted problem.

I: By (A5), $\pi^+ \in \Pi$.

II: We show that (9c) at $t$ implies (9c) in the shifted problem at $t - 1$; that is, $r^t_{\mathcal{A}}[\phi(x_t)] \ge r^{t-1}_{\mathcal{A}^+}[\phi(x^+_{t-1})]$ for any $w_0 \in \mathcal{W}$. So, letting $z = \phi(x_t)$, by (15),

$$r^t_{\mathcal{A}}[z] = \max_{\mu^t \in \mathcal{P}^{t-1} \times \mathcal{A}}\ \inf_{\tau \in \mathbb{R}} \big\{ \tau + \alpha^{-1} \mathbb{E}_{\mu^t}[z_\tau] \big\} = \inf_{\tau \in \mathbb{R}} \big\{ \tau + \alpha^{-1} \rho_{\mathcal{P}^{t-1} \times \mathcal{A}}[z_\tau] \big\},$$

with $z_\tau = [z - \tau]_+$. For $z^+ := \phi(x^+_{t-1}) = z(w_0, w_{1:t-1})$ (and $z^+_\tau = [z^+ - \tau]_+$), we replace $\rho_{\mathcal{P}^{t-1} \times \mathcal{A}}[z_\tau]$ with $\rho_{\mathcal{P}^{t-2} \times \mathcal{A}}[z^+_\tau]$ for $r^{t-1}_{\mathcal{A}}[z^+]$. Writing out $\rho_{\mathcal{P}^{t-1} \times \mathcal{A}}$ gives

$$
\begin{aligned}
\max_{\mu^t \in \mathcal{P}^{t-1} \times \mathcal{A}}\ & \int_{\mathcal{W}^t} z_\tau(w_{:t-1})\, d\mu^t(w_{:t-1})\\
&\overset{(i)}{=} \max_{\mu_{t-1} \in \mathcal{A}}\ \max_{\mu^{t-1} \in \mathcal{P}^{t-1}} \int_{\mathcal{W}^{t-1}} h(w_{:t-2})\, d\mu^{t-1}(w_{:t-2}), \quad h(w_{:t-2}) := \int_{\mathcal{W}} z_\tau(w_{:t-1})\, d\mu_{t-1}(w_{t-1}),\\
&\overset{(ii)}{=} \max_{\mu_{t-1} \in \mathcal{A}}\ \max_{w_{:t-2} \in \mathcal{W}^{t-1}} h(w_{:t-2})
\overset{(iii)}{\ge} \max_{\mu_{t-1} \in \mathcal{A}}\ \max_{w_{1:t-2} \in \mathcal{W}^{t-2}} h(w_0, w_{1:t-2}).
\end{aligned}
$$

Noting that $\mu^t = \mu^{t-1} \times \mu_{t-1}$, with $\mu^{t-1} \in \mathcal{P}^{t-1}$ and $\mu_{t-1} \in \mathcal{A}$, before splitting up the max and the integrals, gives (i). The inner integral (i.e., $h(w_{:t-2})$) then acts as a continuous random variable $\mathcal{W}^{t-1} \to \mathbb{R}$ for any fixed $\mu_{t-1}$ (cf. App. B). Hence we can apply the reasoning within Ex. II.6 to maximize over $w_{:t-2} \in \mathcal{W}^{t-1}$ instead of over measures, resulting in (ii). It is clear that fixing the value of $w_0$ results in the inequality (iii). Reverting the steps (i) and (ii) to get a maximization over $\mu^{t-1} \in \mathcal{P}^{t-2} \times \mathcal{A}$ shows that the final expression after (iii) equals $\rho_{\mathcal{P}^{t-2} \times \mathcal{A}}[z^+ - \tau]_+$. Hence $\rho_{\mathcal{P}^{t-1} \times \mathcal{A}}[z - \tau]_+ \ge \rho_{\mathcal{P}^{t-2} \times \mathcal{A}}[z^+ - \tau]_+$ for all $\tau \in \mathbb{R}$, $z \in \mathcal{Z}^t$ and $w_0 \in \mathcal{W}$. Therefore, using $\mathcal{A}^+ \subseteq \mathcal{A}$ from (A2), $r^t_{\mathcal{A}}[\phi(x_t)] \ge r^{t-1}_{\mathcal{A}^+}[\phi(x^+_{t-1})]$. Hence (9c) holds for all $t \in \mathbb{N}_{1:N-2}$ in the shifted problem. For $t = N - 1$ we rely on (A3) and (A4).

III: The terminal constraint (9d) follows directly from (A4).

We have thus shown that $\pi^+$ is a feasible policy.

Remark IV.3. Note that (A1) is essential, since RF is a robust property, holding a.s. It acts as a convex relaxation of chance constraints conditioned on previous time steps (i.e., $\mathbb{P}[\phi(x_t) > 0 \mid x_{t-1}] \le \varepsilon$, which should hold a.s., hence for all $\mu^{t-1} \in \mathcal{P}^{t-1}$ by Ex. II.6). Due to the reduction of the policy space $\Pi$ (cf. Rem. III.1) it is harder to satisfy such constraints for larger $t$. Other (less conservative) reformulations exist in the stochastic MPC literature, which impose all constraints at the first time step using a (maximal) RPI set (cf. [28]).

V. AFFINE DISTURBANCE FEEDBACK

To make the reformulations above more concrete, we show how (9) is solved. In general this is intractable, since we need to optimize over infinite-dimensional policies $\pi$ under robust constraints associated with the risks (cf. Rem. III.1). Hence, we use affine disturbance feedback. The resulting optimization problem is an SDP. Different ambiguity sets and policies would give other reformulations (e.g., [11], [8]).

Consider linear dynamics, quadratic losses and quadratic constraints:

$$
\begin{aligned}
f(x, u, w) &= Ax + Bu + Ew, & \pi_f(x) &= K_f x,\\
\ell_t(x, u) &= x^\top Q x + u^\top R u, & \ell_N(x) &= x^\top Q_f x,\\
\phi(x) &= x^\top G x + 2 g^\top x + \gamma, & \psi(x) &= x^\top G_f x + 2 g_f^\top x + \gamma_f,
\end{aligned}
$$

with $Q \succeq 0$, $R \succ 0$, $Q_f \succeq 0$, $G \succeq 0$, $G_f \succeq 0$. We could also include (hard) input constraints or multiple state constraints (cf. the discussion in [4, §1] on modeling joint chance constraints), but abstain from doing so for conciseness.

In this setting, affine disturbance feedback [9] has been applied to solve many robust optimal control problems (and even some DRO problems [4]). The idea is to let

$$\pi(w) = F w + f,$$

where $F \in \mathbb{R}^{N n_u \times N n_w}$ (defined in App. C) has a structure that enforces causality of $\pi$. Note that $x_{:N} \in \mathbb{R}^{(N+1)n_x}$, $w_{:N-1} \in \mathbb{R}^{N n_w}$ and $u_{:N-1} \in \mathbb{R}^{N n_u}$.

The state trajectory then depends on the disturbance as

$$x = (\mathbf{B}F + \mathbf{E})w + (\mathbf{A}x_0 + \mathbf{B}f) = \bar{H}\bar{w},$$

with $\mathbf{A}, \mathbf{B}, \mathbf{E}$ defined in App. C, $\bar{F} = [F, f]$, $\bar{H} = [H, h] = [\mathbf{B}F + \mathbf{E},\ \mathbf{A}x_0 + \mathbf{B}f]$ and $\bar{w} = (w, 1)$ the homogeneous disturbance sequence (cf. §I-A). Here the linear part $H$ can be interpreted as the sensitivity of the state to the disturbance, while $h = (h_0, \dots, h_N)$ is the deterministic part of the state (i.e., the state trajectory when $w = 0$).

We first show how the cost (9a) is implemented, before doing the same for the risk constraints (9c) and the terminal constraint (9d).

We first show how the cost (9a) is implemented, before doing the same for the risk constraints (9c) and the terminal constraint (9d).


A. Reformulation of the cost

We assume the risk in (9a) has a product ambiguity, $\mathcal{A}^N = \mathcal{A}_\beta(\hat{C}) \times \dots \times \mathcal{A}_\beta(\hat{C})$, with $\mathcal{A}_\beta(\hat{C})$ as in Lem. II.2. Letting $z := x^\top \mathbf{Q} x + u^\top \mathbf{R} u$, (9a) is

$$\rho_{\mathcal{A}^N}[z] = \rho_{\mathcal{A}^N}\big[ \bar{w}^\top \big(\bar{H}^\top \mathbf{Q} \bar{H} + \bar{F}^\top \mathbf{R} \bar{F}\big) \bar{w} \big], \quad (17)$$

where $\mathbf{Q}$ and $\mathbf{R}$ are defined in App. C.

Note, by Ex. II.5, that the factors $\mathcal{A}_i = \mathcal{A}_\beta(\hat{C})$ for $i \in \mathbb{N}_{0:N-1}$ are conic with

$$E^\rho_i \mu = (\langle w w^\top, \mu \rangle, -\langle w w^\top, \mu \rangle), \quad b^\rho_i = (\hat{C} + \beta I,\ \beta I - \hat{C}), \quad \mathcal{K}^\rho_i = \mathbb{S}^{n_w+1}_+ \times \mathbb{S}^{n_w+1}_+.$$

Therefore, by Lem. III.2, $\mathcal{A}^N = \times_{i=0}^{N-1} \mathcal{A}_i$ is conic with

$$E^\rho \mu^N = (E^\rho|_0 \mu^N, \dots, E^\rho|_{N-1} \mu^N), \quad b^\rho = (b^\rho_0, \dots, b^\rho_{N-1}), \quad \mathcal{K}^\rho = \mathcal{K}^\rho_0 \times \dots \times \mathcal{K}^\rho_{N-1}.$$

Let $\Lambda^\rho_i, V^\rho_i \succeq 0$, $\tau \in \mathbb{R}$ and $\lambda^\rho_i = (\Lambda^\rho_i, V^\rho_i)$ for all $i \in \mathbb{N}_{0:N-1}$. Then, by Lem. III.2,

$$
\begin{aligned}
\rho_{\mathcal{A}^N}[z] = \min_{\Lambda^\rho_i \succeq 0,\ V^\rho_i \succeq 0,\ \tau}\quad & \tau + \textstyle\sum_{i=0}^{N-1} b^\rho_i \cdot \lambda^\rho_i && (18a)\\
\text{s. t.}\quad & \tau + \textstyle\sum_{i=0}^{N-1} E^\rho|^*_i \lambda^\rho_i \succcurlyeq z, && (18b)
\end{aligned}
$$

with $E^\rho|^*_i \lambda^\rho_i = w_i^\top (\Lambda^\rho_i - V^\rho_i) w_i$ for $i \in \mathbb{N}_{0:N-1}$.

Our goal is now to find a tight, conservative reformulation of (18b) as an LMI. We use, for any matrix $\Lambda \in \mathbb{S}^{n_w+1}$, the partitioning

$$\Lambda = \begin{bmatrix} [\Lambda]_m & [\Lambda]_v\\ [\Lambda]_v^\top & [\Lambda]_c \end{bmatrix},$$

with $[\Lambda]_m \in \mathbb{S}^{n_w}$, $[\Lambda]_v \in \mathbb{R}^{n_w}$ and $[\Lambda]_c \in \mathbb{R}$. We also introduce $\Delta^\rho_i := \Lambda^\rho_i - V^\rho_i$ for all $i \in \mathbb{N}_{0:N-1}$. Then,

$$
\tau + \sum_{i=0}^{N-1} E^\rho|^*_i \lambda^\rho_i = \bar{w}^\top
\begin{bmatrix}
[\Delta^\rho_0]_m & & & [\Delta^\rho_0]_v\\
& \ddots & & \vdots\\
& & [\Delta^\rho_{N-1}]_m & [\Delta^\rho_{N-1}]_v\\
[\Delta^\rho_0]_v^\top & \cdots & [\Delta^\rho_{N-1}]_v^\top & \tau + \sum_{i=0}^{N-1} [\Delta^\rho_i]_c
\end{bmatrix}
\bar{w},
$$

which is quadratic in $\bar{w}$. Note that the argument of $\rho_{\mathcal{A}^N}$ in (17) is also quadratic in $\bar{w}$. Therefore (18b) is a quadratic inequality that should hold a.s. over the set described by the quadratic inequalities $\mathcal{W}^N = \{w : w_i^\top w_i \le r^2,\ \forall i \in \mathbb{N}_{0:N-1}\}$. This is conservatively approximated by an LMI, with the approximate S-Lemma [23, Thm. B.3.1]. Its effect here can be conservatively quantified as increasing the radius $r$ by a factor $9.19\sqrt{\ln(N-1)}$, before solving the problem exactly. Specifically, (18b) holds if there exist $P^\rho \in \mathbb{S}^{N n_w + 1}_+$ and $s^\rho \in \mathbb{R}^{N}_+$ such that (recalling $\Delta^\rho_i := \Lambda^\rho_i - V^\rho_i$)

$$
\begin{bmatrix}
[\Delta^\rho_0]_m + [s^\rho]_0 I & & & [\Delta^\rho_0]_v\\
& \ddots & & \vdots\\
& & [\Delta^\rho_{N-1}]_m + [s^\rho]_{N-1} I & [\Delta^\rho_{N-1}]_v\\
[\Delta^\rho_0]_v^\top & \cdots & [\Delta^\rho_{N-1}]_v^\top & \tau + \sum_{i=0}^{N-1} \big([\Delta^\rho_i]_c - [s^\rho]_i r^2\big)
\end{bmatrix}
\succeq P^\rho \quad (19)
$$

(14)

and

$$\begin{bmatrix} P^\rho & H^\top Q^{1/2} & F^\top R^{1/2} \\ Q^{1/2} H & I & 0 \\ R^{1/2} F & 0 & I \end{bmatrix} \succeq 0. \tag{20}$$

Here (20) implies $w^\top P^\rho w \geq z(w)$ for all $w \in \mathbb{R}^{N_w}$ by a Schur complement argument. The cost (18a) meanwhile is given as

$$\tau + \sum_{i=0}^{N-1} \operatorname{Tr}[\Lambda_i^\rho(\hat{C} + \beta I)] + \operatorname{Tr}[V_i^\rho(\hat{C} - \beta I)]. \tag{21}$$
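The S-Lemma machinery above can be sanity-checked in its simplest setting: for a single ball $\|w\| \leq r$, the (exact) S-Lemma states that $w^\top A w + 2 b^\top w + c \geq 0$ on the ball iff there exists $s \geq 0$ with $\begin{bsmallmatrix} A + sI & b \\ b^\top & c - s r^2 \end{bsmallmatrix} \succeq 0$. The sketch below verifies this on a quadratic that is nonnegative on the ball but not globally (the grid search over $s$ is for illustration only; an SDP solver would find the multiplier directly):

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 2, 1.0

# q(w) = w^T A w + 2 b^T w + c = 0.5*(r^2 - ||w||^2): nonnegative on the
# ball ||w|| <= r, negative outside of it.
A = -0.5 * np.eye(n)
b = np.zeros(n)
c = 0.5 * r**2

def certificate_exists(A, b, c, r, grid):
    """Search a grid of multipliers s >= 0 for an S-Lemma certificate."""
    for s in grid:
        M = np.block([[A + s * np.eye(len(b)), b[:, None]],
                      [b[None, :], np.array([[c - s * r**2]])]])
        if np.linalg.eigvalsh(M).min() >= -1e-9:
            return True
    return False

assert certificate_exists(A, b, c, r, np.linspace(0, 2, 201))  # s = 0.5 works

# Cross-check nonnegativity on the ball by sampling.
for _ in range(1000):
    w = rng.standard_normal(n)
    w *= r * rng.random() / np.linalg.norm(w)
    assert w @ A @ w + 2 * b @ w + c >= -1e-9
```

For the intersection of several balls, as in $\mathcal{W}^N$, exactness is lost, which is precisely why the approximate S-Lemma with its inflated radius is needed in (19).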

B. Reformulation of risk constraints

We consider constraints $r_t^{\mathcal{A}}$ with ambiguity $\mathcal{A} = \mathcal{A}_\beta(\hat{C})$, satisfying (A1). The random variable $\phi(x_t)$ in (9c) equals

$$x_t^\top \begin{bmatrix} G & g \\ g^\top & \gamma \end{bmatrix} x_t, \quad \text{with } x_t = \begin{bmatrix} H_{t,:t-1} & h_t \\ 0 & 1 \end{bmatrix} w_{:t-1}, \tag{22}$$

and is thus quadratic in $w_{:t-1}$. By construction, (9c) is of type (13) with core ambiguity $\mathcal{P}^{t-1} \times \mathcal{A}$. Both ambiguity factors are conic by Ex. II.6 and Ex. II.5. We now use $\mu^t = \mu^{t-1} \times \mu_{t-1}$. Then, similarly to before, by Lem. III.2, $\mathcal{P}^{t-1} \times \mathcal{A}$ is conic for

$$E^r \mu^t = E^{r|t-1}\mu^t = (\pm\langle w_{t-1} w_{t-1}^\top, \mu^t \rangle), \quad b^r = (\hat{C} \pm \beta I), \quad K^r = \mathbb{S}^{n_w+1}_+ \times \mathbb{S}^{n_w+1}_+.$$

Using the duality result of Lem. III.3 shows that (9c) holds if there exist $\tau_t^r, \tau_{c,t}^r \in \mathbb{R}$, $\Lambda_t^r, V_t^r \succeq 0$ and $\lambda_t^r = (\Lambda_t^r, V_t^r)$ such that

$$\tau_t^r + \alpha^{-1}\big(\tau_{c,t}^r + b \cdot \lambda_t^r\big) \leq 0, \tag{23a}$$
$$\tau_{c,t}^r + E^{r|*}_{t-1}\lambda_t^r \geq 0, \quad\text{and} \tag{23b}$$
$$\tau_t^r + \tau_{c,t}^r + E^{r|*}_{t-1}\lambda_t^r \geq \phi(x_t), \tag{23c}$$

with $b = (\hat{C} \pm \beta I)$ (as in Ex. II.5). We applied Ex. II.6 for $\mathcal{P}^{t-1}$ and Ex. II.5 for $\mathcal{A}$. Note that (23a) is a linear constraint. Since $E^{r|*}_{t-1}\lambda_t^r = w_{t-1}^\top(\Lambda_t^r - V_t^r)\, w_{t-1}$, (23b) and (23c) have a structure similar to (18b): a quadratic inequality that should hold for all $w_{:t-1} \in \mathcal{W}^{t-1}$. Hence we can once again apply the approximate S-Lemma followed by a Schur complement, resulting in an LMI. Specifically, let $\Delta_t^r := \Lambda_t^r - V_t^r$. Then,

$$E^{r|*}_{t-1}\lambda_t^r = w_{t-1}^\top \begin{bmatrix} [\Delta_t^r]_m & [\Delta_t^r]_v \\ [\Delta_t^r]_v^\top & [\Delta_t^r]_c \end{bmatrix} w_{t-1}.$$

Following a similar procedure to before, we apply the approximate S-Lemma to both (23b) and (23c). Specifically, (23b) holds if there exists some $s_{1,t}^r \in \mathbb{R}^{t}_+$ such that

$$\begin{bmatrix} [s_{1,t}^r]_0 I & & & \\ & \ddots & & \\ & & [\Delta_t^r]_m + [s_{1,t}^r]_{t-1} I & [\Delta_t^r]_v \\ & & [\Delta_t^r]_v^\top & [\Delta_t^r]_c + \tau_{c,t}^r - \sum_{i=0}^{t-1}[s_{1,t}^r]_i r^2 \end{bmatrix} \succeq 0. \tag{24}$$

Meanwhile, (23c) holds if there exist some $P_t^r \in \mathbb{S}^{t n_w + 1}_+$ and $s_{2,t}^r \in \mathbb{R}^{t}_+$ such that

$$\begin{bmatrix} [s_{2,t}^r]_0 I & & & \\ & \ddots & & \\ & & [\Delta_t^r]_m + [s_{2,t}^r]_{t-1} I & [\Delta_t^r]_v \\ & & [\Delta_t^r]_v^\top & [\Delta_t^r]_c + \tau_{c,t}^r + \tau_t^r - \sum_{i=0}^{t-1}[s_{2,t}^r]_i r^2 \end{bmatrix} \succeq P_t^r, \tag{25}$$

where, analogously to $P^\rho$, the matrix $P_t^r$ should satisfy

$$P_t^r \succeq \begin{bmatrix} H_{t,:t-1} & h_t \\ 0 & 1 \end{bmatrix}^\top \begin{bmatrix} G & g \\ g^\top & \gamma \end{bmatrix} \begin{bmatrix} H_{t,:t-1} & h_t \\ 0 & 1 \end{bmatrix} = \begin{bmatrix} H_{t,:t-1} & h_t \end{bmatrix}^\top G \begin{bmatrix} H_{t,:t-1} & h_t \end{bmatrix} + \begin{bmatrix} 0 & H_{t,:t-1}^\top g \\ g^\top H_{t,:t-1} & 2 h_t^\top g + \gamma \end{bmatrix}. \tag{26}$$

We then apply a Schur complement to find an equivalent LMI for (26):

$$\begin{bmatrix} P_t^r - \begin{bmatrix} 0 & H_{t,:t-1}^\top g \\ g^\top H_{t,:t-1} & 2 h_t^\top g + \gamma \end{bmatrix} & \begin{bmatrix} H_{t,:t-1} & h_t \end{bmatrix}^\top G^{1/2} \\ G^{1/2}\begin{bmatrix} H_{t,:t-1} & h_t \end{bmatrix} & I \end{bmatrix} \succeq 0. \tag{27}$$
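The Schur-complement step behind (20) and (27) replaces a quadratic matrix inequality $P \succeq M + L^\top L$ by the linear condition $\begin{bsmallmatrix} P - M & L^\top \\ L & I \end{bsmallmatrix} \succeq 0$, which an SDP solver can handle directly. A quick numeric illustration of the equivalence (matrix names are ours, not the paper's):

```python
import numpy as np

rng = np.random.default_rng(2)
n, m = 3, 2

L = rng.standard_normal((m, n))
Msym = rng.standard_normal((n, n))
Msym = 0.5 * (Msym + Msym.T)     # symmetric "affine" part

def schur_block(P, M, L):
    """Linearized form of P >= M + L^T L via a Schur complement."""
    return np.block([[P - M, L.T], [L, np.eye(L.shape[0])]])

# P chosen to satisfy the quadratic inequality with margin ...
P_ok = Msym + L.T @ L + 0.1 * np.eye(n)
assert np.linalg.eigvalsh(schur_block(P_ok, Msym, L)).min() >= -1e-9

# ... and one that violates it: the block matrix picks this up too.
P_bad = Msym + L.T @ L - 0.1 * np.eye(n)
assert np.linalg.eigvalsh(schur_block(P_bad, Msym, L)).min() < 0
```

In (27), $P$ plays the role of $P_t^r$, $M$ the affine part involving $g$ and $\gamma$, and $L = G^{1/2}\begin{bsmallmatrix} H_{t,:t-1} & h_t \end{bsmallmatrix}$.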

Constraint (23a) is equivalent to

$$\tau_t^r + \alpha^{-1}\Big(\tau_{c,t}^r + \operatorname{Tr}[\Lambda_t^r(\hat{C} + \beta I)] + \operatorname{Tr}[V_t^r(\hat{C} - \beta I)]\Big) \leq 0. \tag{28}$$

We conclude with the reformulation of the terminal constraint (9d), which is equivalent to

$$w^\top \begin{bmatrix} H_{N,:} & h_N \\ 0 & 1 \end{bmatrix}^\top \begin{bmatrix} G_f & g_f \\ g_f^\top & \gamma_f \end{bmatrix} \begin{bmatrix} H_{N,:} & h_N \\ 0 & 1 \end{bmatrix} w \leq 0, \tag{29}$$

for all $w \in \mathcal{W}^N$. We then apply the approximate S-Lemma, followed by a Schur complement. Specifically, (29) holds if there exists some $s_f \in \mathbb{R}^{N}_+$ such that

$$\begin{bmatrix} \begin{bmatrix} \operatorname{diag}(s_f) \otimes I_{n_w} & -H_{N,:}^\top g_f \\ -g_f^\top H_{N,:} & -\sum_{i=0}^{N-1}[s_f]_i r^2 - 2 h_N^\top g_f - \gamma_f \end{bmatrix} & \begin{bmatrix} H_{N,:} & h_N \end{bmatrix}^\top G_f^{1/2} \\ G_f^{1/2} \begin{bmatrix} H_{N,:} & h_N \end{bmatrix} & I \end{bmatrix} \succeq 0. \tag{30}$$

To summarize: we minimize (21) subject to (19) and (20) (replacing (9a)); we add constraints (24), (25), (27) and (28) for each $t \in \mathbb{N}_{1:N-1}$ (replacing (9c)); and the terminal constraint (9d) is replaced by (30). The resulting minimization problem is an SDP.

C. Numerical Results

We consider two experiments: one where data is gathered offline and one where it is gathered online. In both settings the controller performs better as more data becomes available, without affecting recursive feasibility.


Fig. 1. (left) range of closed-loop costs for T = 15; (right) state trajectories for M = 10, 10 000 offline samples.

Offline learning: Consider the linear system

$$x_{t+1} = \begin{bmatrix} 0.9 & 0.2 \\ 0 & 0.8 \end{bmatrix} x_t + \begin{bmatrix} 0.1 \\ 0.05 \end{bmatrix} u_t + \begin{bmatrix} 0.5 & 0 \\ 0 & 0.1 \end{bmatrix} w_t,$$

with $w_t$ distributed as a truncated Gaussian with covariance $\Sigma = 0.01^2 I$ and $\|w_t\| \leq 0.15$, and $N = 8$. For the cost, we take $Q = \operatorname{diag}(1, 10)$, $R = 10$, and $Q_f$ and $K_f$ the solution to the discrete algebraic Riccati equation and the corresponding optimal controller, respectively. For the state constraints we have $\alpha = 0.2$ and $\phi(x) = [x]_1 - 1.5 \leq 0$, where $[x]_1$ denotes the first element of $x$. For the terminal constraint we have $\psi(x) = x^\top Q_f x - 9.524 \leq 0$, which is RPI for $\pi_f(x) = K_f x$, and $x_0 = (1.75, 2)$.

We investigate sample counts $M$ between 10 and 10 000. For each sample count we gather an offline data set and compute $\mathcal{A}$ from Lem. II.2 for $\delta = 0.05$. Then we propagate the dynamics with the receding-horizon controller (9) for $T = 15$ time steps, accumulating the stage cost at each step. This experiment is repeated 25 times. The range of resulting costs is shown in Fig. 1. For $M = 10$ and $M = 10\,000$, closed-loop state trajectories are also depicted. The cost initially remains constant, since the moment constraint is inactive due to insufficient samples. Afterwards, the controller becomes more confident, moving closer to the state constraint as more data becomes available, resulting in a lower cost.
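The simulation loop of this experiment can be mimicked in a few lines. The sketch below uses the certainty-equivalent LQR gain in place of the full distributionally robust SDP controller (which needs an SDP solver), so it only reproduces the rollout and the truncated-Gaussian sampling, not the controller itself:

```python
import numpy as np

rng = np.random.default_rng(4)

A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.1], [0.05]])
E = np.array([[0.5, 0.0], [0.0, 0.1]])
Q = np.diag([1.0, 10.0])
R = np.array([[10.0]])

# Solve the discrete algebraic Riccati equation by fixed-point iteration.
P = Q.copy()
for _ in range(500):
    K = np.linalg.solve(R + B.T @ P @ B, B.T @ P @ A)
    P = Q + A.T @ P @ A - A.T @ P @ B @ K

def truncated_gaussian(rng, sigma, radius):
    """Rejection-sample N(0, sigma^2 I) in R^2 conditioned on the ball."""
    while True:
        w = sigma * rng.standard_normal(2)
        if np.linalg.norm(w) <= radius:
            return w

x = np.array([1.75, 2.0])
cost = 0.0
for t in range(15):                  # T = 15 closed-loop steps
    u = -K @ x                       # certainty-equivalent LQR input
    cost += x @ Q @ x + float(u @ R @ u)
    x = A @ x + B @ u + E @ truncated_gaussian(rng, 0.01, 0.15)

assert np.isfinite(cost) and cost > 0
```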

Online learning: We illustrate how our framework can be used to construct controllers that learn online while guaranteeing recursive feasibility. Consider the scalar linear system

$$x_{t+1} = x_t + u_t + w_t,$$

with $w_t$ distributed as a truncated Gaussian with variance $\sigma^2 = 0.05^2$ and $|w_t| \leq 0.1$. Let $Q = 1$, $R = 0.1$, $N = 5$, $\alpha = 0.2$ and $\phi(x) = 1 - x \leq 0$. We use $\pi_f(x) = 0.9x + 1.5$, $\psi(x) = (x - 1.25)^2 - 0.25^2 \leq 0$ and $Q_f$ the solution to the discrete algebraic Riccati equation. To implement learning, the ambiguity update scheme described in App. D is used (which constructs ambiguity from data based on Lem. II.2 for $\delta = 0.05$ and rejects an ambiguity set if it is not a subset of the previously accepted one).

We consider three cases: (i) robust, where $\mathcal{A} = \mathcal{P}$ and no learning occurs; (ii) online, where we start with $M = 10$ samples and improve the ambiguity online while running the controller; and (iii) offline, where $M = 10\,000$ initial samples are used and no updates occur. Fig. 2 depicts the state trajectories, which, similarly to before, approach the constraint for larger $M$.

REFERENCES

[1] S. Shafieezadeh-Abadeh, P. M. Esfahani, and D. Kuhn, “Distributionally robust logistic regression,” in NIPS’15, vol. 1, (Cambridge, MA, USA), pp. 1576–1584, MIT Press, 2015.

Fig. 2. State trajectories for robust, online, and offline learning.

[2] E. Delage and Y. Ye, “Distributionally Robust Optimization Under Moment Uncertainty with Application to Data-Driven Problems,” Oper. Res., vol. 58, no. 3, pp. 595–612, 2010.

[3] K. Kim and I. Yang, “Minimax Control of Ambiguous Linear Stochastic Systems Using the Wasserstein Metric,” in CDC, pp. 1777–1784, IEEE, 2020.

[4] B. P. G. Van Parys, D. Kuhn, P. Goulart, and M. Morari, “Distributionally Robust Control of Constrained Stochastic Systems,” IEEE TAC, vol. 61, no. 2, 2015.

[5] H. Rahimian and S. Mehrotra, “Distributionally Robust Optimization: A Review,” 2019, arXiv:1908.05659.

[6] A. Pichler and A. Shapiro, “Mathematical Foundations of Distributionally Robust Multistage Optimization,” 2021, arXiv:2101.02498.

[7] Z. Chen, M. Sim, and H. Xu, “Distributionally Robust Optimization with Infinitely Constrained Ambiguity Sets,” Oper. Res., vol. 67, no. 5, pp. 1328–1344, 2019.

[8] P. Sopasakis, M. Schuurmans, and P. Patrinos, “Risk-averse risk-constrained optimal control,” in ECC, pp. 375–380, IEEE, 2019.

[9] P. Goulart, E. Kerrigan, and J. Maciejowski, “Optimization over state feedback policies for robust control with constraints,” Automatica, vol. 42, no. 4, pp. 523–533, 2006.

[10] C. Mark and S. Liu, “Stochastic MPC with Distributionally Robust Chance Constraints,” 2020, arXiv:2005.00313.

[11] S. Lu, J. H. Lee, and F. You, “Soft-constrained model predictive control based on data-driven distributionally robust optimization,” AIChE J., vol. 66, no. 10, pp. 1–18, 2020.

[12] M. Schuurmans and P. Patrinos, “Learning-based distributionally robust model predictive control of Markovian switching systems with guaranteed stability and recursive feasibility,” in CDC, pp. 4287–4292, IEEE, 2020.

[13] P. Sopasakis, D. Herceg, A. Bemporad, and P. Patrinos, “Risk-averse model predictive control,” Automatica, vol. 100, pp. 281–289, 2019.

[14] W. Wiesemann, D. Kuhn, and B. Rustem, “Robust Markov Decision Processes,” Math. Oper. Res., vol. 38, no. 1, pp. 153–183, 2013.

[15] A. Ruszczyński and A. Shapiro, “Optimization of convex risk functions,” Mathematics of Operations Research, vol. 31, no. 3, pp. 433–452, 2006.

[16] J. F. Bonnans and A. Shapiro, Perturbation Analysis of Optimization Problems. New York, NY: Springer New York, 2000.

[17] A. Shapiro, “On Duality Theory of Conic Linear Problems,” in Semi-Infinite Programming. Nonconvex Optimization and Its Applications, vol 57, pp. 135–165, Boston, MA: Springer, 2001.

[18] D. Kuhn, P. M. Esfahani, V. A. Nguyen, and S. Shafieezadeh-Abadeh, “Wasserstein Distributionally Robust Optimization,” in Oper. Res. & Mgmt. Sci. in the Age of Analytics, no. 2, pp. 130–166, 2019.

[19] P. Coppens, M. Schuurmans, and P. Patrinos, “Data-driven distributionally robust LQR with multiplicative noise,” in L4DC, vol. 120, (The Cloud), pp. 521–530, PMLR, 2020.

[20] J. A. Tropp, “User-Friendly Tail Bounds for Sums of Random Matrices,” FoCM, vol. 12, no. 4, pp. 389–434, 2012.

[21] C. D. Aliprantis and K. C. Border, Infinite Dimensional Analysis. Berlin/Heidelberg: Springer-Verlag, 2006.

[22] W. Wiesemann, D. Kuhn, and M. Sim, “Distributionally robust convex optimization,” Oper. Res., vol. 62, no. 6, pp. 1358–1376, 2014.

[23] A. Ben-Tal, L. El Ghaoui, and A. Nemirovski, Robust Optimization. Princeton University Press, 2009.

[24] A. Shapiro, “Time consistency of dynamic risk measures,” Oper. Res. Lett., vol. 40, no. 6, pp. 436–439, 2012.

[25] A. Shapiro, D. Dentcheva, and A. Ruszczyński, Lectures on Stochastic Programming. Soc. Ind. Appl. Math., 2014.


[26] S. Zymler, D. Kuhn, and B. Rustem, “Distributionally robust joint chance constraints with second-order moment information,” Math. Prog., vol. 137, no. 1-2, pp. 167–198, 2013.

[27] A. Shapiro and A. Kleywegt, “Minimax analysis of stochastic problems,” Optim Methods Softw., vol. 17, no. 3, pp. 523–542, 2002.

[28] M. Lorenzen, F. Dabbene, R. Tempo, and F. Allgower, “Constraint-Tightening and Stability in Stochastic Model Predictive Control,” IEEE TAC, vol. 62, no. 7, pp. 3165–3177, 2017.

[29] J. A. Tropp, “An Introduction to Matrix Concentration Inequalities,” Found. Trends Mach. Learn., vol. 8, no. 1-2, pp. 1–230, 2015, arXiv:1501.01571.

[30] K. B. Athreya and S. N. Lahiri, Measure Theory and Probability Theory. Springer Texts in Statistics, Springer New York, 2006.

[31] A. M. Bruckner, J. B. Bruckner, and B. S. Thomson, Real Analysis. Prentice-Hall, Inc., 2007.

[32] H. van Waarde, K. Camlibel, and M. Mesbahi, “From noisy data to feedback controllers: non-conservative design via a matrix S-lemma,” 2020, arXiv:2006.00870.

APPENDIX

A. Proof of Lem. II.2

Let $X_i := w_i w_i^\top - \operatorname{IE}_{\mu^\star}[w w^\top]$, with eigenvalue decomposition $X_i = V_i D_i V_i^\top$. We know that, a.s., $-r^2 I \preceq -C \preceq X_i \preceq w_i w_i^\top \preceq r^2 I$, implying $\|X_i\|_2 \leq r^2$. Therefore the eigenvalues are bounded as $|d_i| \leq r^2$.

As such, we can produce a matrix sub-Gaussian bound:

$$\operatorname{IE}[\exp(\theta X_i)] = \operatorname{IE}[V_i \exp(\theta D_i) V_i^\top] \overset{(i)}{\preceq} \operatorname{IE}\Big[\tfrac{r^2 I + X_i}{2r^2}\exp(\theta r^2) + \tfrac{r^2 I - X_i}{2r^2}\exp(-\theta r^2)\Big] \overset{(ii)}{=} \tfrac{\exp(\theta r^2) + \exp(-\theta r^2)}{2}\, I = \cosh(\theta r^2)\, I \preceq \exp(\theta^2 r^4/2)\, I,$$

where (i) follows from convexity and $V_i V_i^\top = I$, and (ii) from $\operatorname{IE}[X_i] = 0$.

From independence and [29, Lem. 3.5.1] we have $\operatorname{IE}[\operatorname{Tr}[\exp(\sum_{i=0}^{M-1} X_i)]] \leq d \exp(M\theta^2 r^4/2)$, with $d := n_w + 1$. This allows us to apply [29, Prop. 3.2.1]:

$$\operatorname{IP}\big[\lambda_{\max}\big(\textstyle\sum_{i=0}^{M-1} X_i\big) \geq \beta\big] \leq \inf_{\theta > 0}\, d \exp(-\theta\beta + M\theta^2 r^4/2) = d \exp(-\beta^2/(2Mr^4)).$$

Similarly bounding $\lambda_{\min}$ and applying a union bound gives

$$\operatorname{IP}\big[\big\|\textstyle\sum_{i=0}^{M-1} X_i\big\| \geq \beta\big] \leq 2d \exp(-\beta^2/(2Mr^4)).$$

Setting the right-hand side equal to $\delta$ and solving for $\beta$ gives the value in Lem. II.2, showing $\operatorname{IP}[\|\hat{C} - C\|_2 \leq \beta] \geq 1 - \delta$, implying (6).

B. Continuity of iterated integration

We argue why $h(w_{:t-2})$ in the proof of Thm. IV.2 is a measurable, continuous random variable. It is given by

$$h(w_{:t-2}) = \int_{\mathcal{W}} z_\tau(w_{:t-2}, w_{t-1})\, d\mu(w_{t-1}),$$

for continuous $z_\tau \in \mathcal{Z}_t$. Measurability follows by Tonelli's theorem [30, Thm. 5.2.2]. Continuity, meanwhile, follows by an epsilon-delta argument [31, §9.3]. Specifically, Jensen's inequality gives

$$|h(w_{:t-2}) - h(w'_{:t-2})| \leq \int_{\mathcal{W}} |z_\tau(w_{:t-2}, w_{t-1}) - z_\tau(w'_{:t-2}, w_{t-1})|\, d\mu_{t-1}(w_{t-1}) \leq \max_{w_{t-1} \in \mathcal{W}} |z_\tau(w_{:t-2}, w_{t-1}) - z_\tau(w'_{:t-2}, w_{t-1})| \tag{31}$$

for any $w_{:t-2}, w'_{:t-2} \in \mathcal{W}^{t-1}$. The second inequality follows since $|z_\tau(w_{:t-2}, w_{t-1}) - z_\tau(w'_{:t-2}, w_{t-1})|$ is continuous in $w_{t-1}$ by continuity of $z_\tau$. Therefore we can use the reasoning of Ex. II.6 and the fact that $\mu_{t-1} \in \mathcal{P}(\mathcal{W})$.

From (uniform) continuity of $z_\tau$ on the compact set $\mathcal{W}^t$ we have

$$\forall \varepsilon > 0,\ \exists \delta > 0 \text{ s.t. } \|w - w'\| \leq \delta \Rightarrow |z_\tau(w) - z_\tau(w')| \leq \varepsilon.$$

So picking $w = (w_{:t-2}, w_{t-1})$ and $w' = (w'_{:t-2}, w_{t-1})$ and plugging into (31) gives

$$\|w_{:t-2} - w'_{:t-2}\| \leq \delta \Rightarrow |h(w_{:t-2}) - h(w'_{:t-2})| \leq \varepsilon. \tag{32}$$

Hence, for any $\varepsilon > 0$ we have a $\delta > 0$ such that (32) holds, implying continuity of $h$.

C. Affine disturbance feedback notation

Let $N_x := (N+1)n_x$, $N_w := N n_w$ and $N_u := N n_u$. Following [9], we consider the matrices $A \in \mathbb{R}^{N_x \times n_x}$, $B \in \mathbb{R}^{N_x \times N_u}$, $E \in \mathbb{R}^{N_x \times N_w}$ and $F \in \mathbb{R}^{N_u \times N_w}$ such that

$$A := \begin{bmatrix} I_{n_x} \\ A \\ A^2 \\ \vdots \\ A^N \end{bmatrix}, \quad B := \begin{bmatrix} 0 & 0 & \cdots & 0 \\ B & 0 & \cdots & 0 \\ AB & B & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A^{N-1}B & A^{N-2}B & \cdots & B \end{bmatrix}, \tag{33}$$

$$E := \begin{bmatrix} 0 & 0 & \cdots & 0 \\ E & 0 & \cdots & 0 \\ AE & E & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ A^{N-1}E & A^{N-2}E & \cdots & E \end{bmatrix}, \quad F := \begin{bmatrix} 0 & 0 & \cdots & 0 \\ F_{1,0} & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ F_{N-1,0} & \cdots & F_{N-1,N-2} & 0 \end{bmatrix}, \tag{34}$$

$Q = \operatorname{diag}(Q, \ldots, Q, Q_f)$ and $R = \operatorname{diag}(R, \ldots, R)$.
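The stacked matrices of (33) can be built programmatically. The sketch below constructs the state-transition and input-convolution blocks and verifies that the stacked prediction $\mathcal{A} x_0 + \mathcal{B} u$ matches a step-by-step rollout (noise-free, for brevity; function and variable names are ours):

```python
import numpy as np

def stack_prediction(A, B, N):
    """Build the stacked matrices of (33): x_pred = calA @ x0 + calB @ u."""
    nx, nu = A.shape[0], B.shape[1]
    calA = np.vstack([np.linalg.matrix_power(A, k) for k in range(N + 1)])
    calB = np.zeros(((N + 1) * nx, N * nu))
    for i in range(1, N + 1):            # block row i corresponds to x_i
        for j in range(i):               # contribution of input u_j
            calB[i * nx:(i + 1) * nx, j * nu:(j + 1) * nu] = \
                np.linalg.matrix_power(A, i - 1 - j) @ B
    return calA, calB

A = np.array([[0.9, 0.2], [0.0, 0.8]])
B = np.array([[0.1], [0.05]])
N = 4
calA, calB = stack_prediction(A, B, N)

rng = np.random.default_rng(6)
x0 = rng.standard_normal(2)
u = rng.standard_normal(N)

# A step-by-step rollout must match the stacked prediction.
x, traj = x0, [x0]
for t in range(N):
    x = A @ x + B @ np.atleast_1d(u[t])
    traj.append(x)
assert np.allclose(calA @ x0 + calB @ u, np.concatenate(traj))
```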

D. Updating the ambiguity

In this section we examine how (A2) can be guaranteed in practice. More specifically, assume we have $M$ offline samples used to construct $\mathcal{A}_0$ according to Lem. II.2. At time step $t$ we will then have gathered $M + t$ samples, which can be used to construct $\hat{\mathcal{A}}_t$. We then use the update rule

$$\mathcal{A}_{t+1} = \begin{cases} \hat{\mathcal{A}}_{t+1}, & \text{if } \hat{\mathcal{A}}_{t+1} \subseteq \mathcal{A}_t, \\ \mathcal{A}_t, & \text{otherwise.} \end{cases} \tag{35}$$

Hence we only need to check $\hat{\mathcal{A}}_{t+1} \subseteq \mathcal{A}_t$. Let $\hat{C}^+$ ($\beta^+$) and $\hat{C}$ ($\beta$) denote the center (radius) of $\hat{\mathcal{A}}_{t+1}$ and $\mathcal{A}_t$ respectively. Then a sufficient condition for $\hat{\mathcal{A}}_{t+1} \subseteq \mathcal{A}_t$ is

$$\|\hat{C}^+ - C\|_2 \leq \beta^+, \quad \forall C \succeq 0 \text{ s.t. } \|\hat{C} - C\|_2 \leq \beta.$$

This condition is only sufficient, since the bounded support would additionally imply $\operatorname{Tr}([C]_{:n_w,:n_w}) \leq r^2$, a constraint on $\operatorname{IE}[ww^\top]$ that we do not impose. The condition that $\mu$ is a probability measure meanwhile implies $[C]_{n_w+1,n_w+1} = 1$, where $[C]_{n_w+1,n_w+1}$ denotes the bottom-right element of $C$. Based on numerical experiments, however, dropping these constraints does not result in significant conservativeness.

The inclusion in (35) is not trivial to check, however: it requires maximizing the convex function $\|\hat{C}^+ - C\|_2$ over a convex set, which is NP-hard in general. We do, however, have some structure we can exploit. Note how

$$\|\hat{C}^+ - C\|_2 \leq \beta^+ \;\Leftrightarrow\; \begin{bmatrix} C \\ I \end{bmatrix}^\top \begin{bmatrix} -I & \hat{C}^+ \\ \hat{C}^+ & (\beta^+)^2 I - (\hat{C}^+)^2 \end{bmatrix} \begin{bmatrix} C \\ I \end{bmatrix} \succeq 0.$$

This should hold for all $C \succeq 0$ such that

$$\begin{bmatrix} C \\ I \end{bmatrix}^\top \begin{bmatrix} -I & \hat{C} \\ \hat{C} & \beta^2 I - \hat{C}^2 \end{bmatrix} \begin{bmatrix} C \\ I \end{bmatrix} \succeq 0.$$

A sufficient condition for this is that

$$v^\top \begin{bmatrix} -I & \hat{C}^+ \\ \hat{C}^+ & (\beta^+)^2 I - (\hat{C}^+)^2 \end{bmatrix} v \geq 0$$

for all $v \in \mathbb{R}^{2(n_w+1)}$ such that

$$v^\top \begin{bmatrix} -I & \hat{C} \\ \hat{C} & \beta^2 I - \hat{C}^2 \end{bmatrix} v \geq 0 \quad \text{and} \quad v^\top \begin{bmatrix} 0 & I \\ I & 0 \end{bmatrix} v \geq 0.$$

In fact this condition is both sufficient and necessary for the inclusion (cf. [32]). Applying the approximate S-Lemma [23, Thm. B.3.1.] (which is exact in this case) allows us to formulate this final condition as an LMI.
