Using stochastic decomposition to solve generalized
alpha-approximations for two-stage mixed-integer recourse
models
Boris Verwoerd
January 2020
Abstract
Real-life decision problems often involve uncertainty. These problems can be modelled as stochastic linear programs (SLPs). When we restrict some decision variables to be integer-valued, we can formulate these problems as mixed-integer recourse models. Solution methods for two-stage mixed-integer recourse models exist, but they generally have difficulties solving large problem instances in reasonable time. Finding efficient methods to solve these models in reasonable time is therefore worthwhile. Instead of solving the original mixed-integer recourse model, we solve a convex approximation of this model, which replaces the expected second-stage cost function by the so-called generalized alpha-approximation. This paper contributes by applying the iterative sampling concept from stochastic decomposition to generalized alpha-approximations. Additionally, we propose the stochastic loose Benders decomposition algorithm, SLBDA(α), which solves generalized alpha-approximations using iterative sampling. Solutions from SLBDA(α) are compared to solutions from the loose Benders decomposition algorithm, LBDA(α), and to solutions obtained by solving the mixed-integer recourse model in its deterministic equivalent formulation, DEF. In most instances, the solutions from SLBDA(α) compare well to those from LBDA(α) and DEF. Especially in large problem instances with large variability in the random parameters, solutions from SLBDA(α) perform well in terms of solution quality, while outperforming existing methods in terms of computation time.
1 Introduction
Many real-life decision problems involve uncertainty. Parisio and Jones (2015) investigate optimal staffing of employees in retail outlets where consumer demand is uncertain. They outline three possible sources of uncertainty: total demand fluctuations, peak demand shifting and employee absence. Moreover, Löhndorf and Shapiro (2019) investigate how to minimize operational costs while satisfying generation capacity limits. The amount of power generated by hydropower is uncertain and can cause periods where the demand for electricity cannot be satisfied by hydropower alone.
Besides uncertainty, we often want to model decision problems where some decision variables are restricted to be integer-valued. These types of decisions arise naturally in real-life decision problems. For example, the number of televisions a manufacturer decides to produce will be an integer amount. Decision problems under uncertainty with integrality constraints can be modelled as two-stage mixed-integer recourse models. Such models are defined as
η := min_x { cx + E_ω[v(ω, x)] : Ax = b, x ∈ X ⊂ Z^{p1}_+ × R^{n1−p1}_+ }, (1)

where the second-stage value function v is defined as

v(ω, x) := min_y { qy : Wy = ω − Tx, y ∈ Y ⊂ Z^{p2}_+ × R^{n2−p2}_+ }. (2)
Here, n1 denotes the total number of first-stage variables, of which p1 are integer-valued. Similarly, n2 denotes the total number of second-stage variables, of which p2 are integer-valued. The recourse model is a two-stage model, where the first-stage objective value is given by cx and the expected second-stage costs are given by Q(x) := E_ω[v(ω, x)]. In the first stage a decision on the values of x has to be made, satisfying the deterministic constraints Ax = b and the random goal constraints Tx = ω. After the first stage, the realization of ω is known. If the constraints Tx = ω are not satisfied, a recourse action can be performed. A recourse action corresponds to making a decision on y such that the constraints Wy + Tx = ω are satisfied. The recourse model reduces to the continuous stochastic linear program (SLP) model for p1 = p2 = 0.
While several methods to solve mixed-integer recourse models exist, they generally have difficulties solving large problem instances in reasonable time. If ω has finite support, the SLP model can be written in its deterministic equivalent form and be solved as a mixed-integer linear program (MILP). While this is theoretically the case, in practice this deterministic equivalent form is typically too large to be solved in reasonable time. In cases where the support of ω is large or infinite, one can use the sample average approximation (SAA) method, which constructs an empirical distribution which replaces the original distribution of ω to create a MILP. The number of samples for the empirical distribution is a trade-off between solvability and solution quality. When fewer samples are used, the MILP can be solved in less time. Using more samples results in an empirical distribution which is generally a better approximation of the distribution of ω.
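The SAA idea above can be sketched in a few lines. This is a hedged illustration only: a toy continuous second stage v(ω, x) = 2·max(ω − x, 0) stands in for a real subproblem solver, and all function names are ours, not from the paper.

```python
import random

def saa_estimate(second_stage_cost, omega_sampler, x, n_samples, seed=0):
    """Estimate E[v(omega, x)] by averaging second-stage costs over a sample (SAA)."""
    rng = random.Random(seed)
    total = sum(second_stage_cost(omega_sampler(rng), x) for _ in range(n_samples))
    return total / n_samples

# Toy second stage: v(omega, x) = 2 * max(omega - x, 0), i.e. unit shortage cost q = 2.
v = lambda w, x: 2.0 * max(w - x, 0.0)
sampler = lambda rng: rng.gauss(10.0, 3.0)  # omega ~ N(10, 9)

est_small = saa_estimate(v, sampler, x=10.0, n_samples=10)      # noisy estimate
est_large = saa_estimate(v, sampler, x=10.0, n_samples=20000)   # near 2*3/sqrt(2*pi) ≈ 2.39
```

The example illustrates the trade-off mentioned above: the small-sample estimate is cheap but noisy, while the large-sample estimate approaches the true expectation.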
Due to the integrality restriction on the decisions in the mixed-integer recourse model, the expected second-stage cost function Q is usually non-convex (Rinnooy Kan and Stougie (1988)). An implication of the non-convexity of Q is that standard solution methods from convex optimization cannot be used. A notable convex optimization method is the L-shaped algorithm by Van Slyke and Wets (1969), which is designed to solve SLP models. Solving MIR models is usually done by combining methods from continuous stochastic programming and deterministic mixed-integer programming. Notable examples are Laporte and Louveaux (1993), Sen and Higle (2005), Carøe and Schultz (1999), Carøe and Tind (1998), Schultz et al. (1998) and Sen and Sherali (2006). These methods can only solve mixed-integer recourse models with relatively small instances or a small number of scenarios. Hence, solving large problem instances in reasonable time remains difficult.
A new method for solving mixed-integer recourse problems is to solve generalized alpha-approximations, as introduced by van der Laan and Romeijnders (2018). The generalized alpha-approximation is a convex approximation of the expected second-stage cost function Q. This has the advantage that a model which uses the generalized alpha-approximation can be solved by convex optimization methods, which are typically fast. To guarantee the quality of the solution of the generalized alpha-approximation, van der Laan and Romeijnders (2018) prove an error bound in terms of the total variations of the probability density functions of the components of the random vector ω. They propose an algorithm called the Loose Benders Decomposition Algorithm (LBDA). The algorithm and the error bound will be discussed in Section 2.1.
In LBDA, models including generalized alpha-approximations are solved by using a Benders decomposition (Benders (1962)) on an SAA of the generalized alpha-approximation. The sample average approximation requires a decision on the number of samples that are drawn to construct an empirical distribution. The number of samples is a trade-off between solvability and solution quality. More samples will generally result in a better approximation of the distribution of ω, but also increase the computation time of the algorithm. Therefore, a reasonable improvement to make on LBDA(α) is the addition of iterative sampling. A method that uses iterative sampling to solve two-stage recourse models is stochastic decomposition, as introduced by Higle and Sen (1991). In Higle and Sen (1991), a new sample is drawn in each iteration instead of using S samples drawn at the beginning of the algorithm.
The termination criteria proposed by Higle and Sen (1991) involve estimating the objective function at a so-called incumbent solution, for which the estimate of the objective function is considered sufficiently low. We propose a new termination criterion which is based on the estimate of the objective function at the incumbent solution. The main advantage of using SLBDA(α) compared to the algorithm LBDA(α) from van der Laan and Romeijnders (2018), or solving DEF, is that SLBDA(α) is significantly faster in most instances. Additionally, we find that in large problem instances, when the random parameter is distributed with a medium to large variance, solutions from SLBDA(α) perform well in terms of solution quality while outperforming other methods in terms of computation time.
The remainder of this paper is organized as follows. In Section 2 the generalized alpha-approximations and the LBDA(α) algorithm are explained in mathematical detail. The proposed SLBDA(α) algorithm is defined and explained in Section 3 and numerical results from tuning and assessing the performance of the algorithm are given in Section 4. The paper will be concluded in Section 5.
Several assumptions are necessary for a well-behaved mixed-integer recourse model. Define the first-stage feasible region by X̄ = {x ∈ X : Ax = b}. Assumptions (A1)-(A3) are required to guarantee that the expected second-stage cost function Q is finite for all x ∈ X̄.

A1 Relatively complete recourse. That is, for all ω ∈ R^m and x ∈ X̄, there exists a y ∈ Y such that Wy = ω − Tx, so that v(ω, x) < ∞. This ensures that there is always a feasible recourse action y given any choice of x.

A2 The recourse is sufficiently expensive: v(ω, x) > −∞ for all ω ∈ R^m and x ∈ X̄.
A3 E[|ωi|] < ∞ for all i = 1, . . . , m.
A4 The first-stage feasible region X is bounded.
A5 The recourse matrix W is integer, that is, every element of W is integer valued.
2 The generalized alpha-approximation
The mathematical definition of the generalized alpha-approximation is necessary to formulate SLBDA(α) and its derivation is therefore given in this section. Additionally, an error bound on the generalized alpha-approximation, which justifies the practical usefulness of this approximation, is given at the end of the section.
The generalized alpha-approximation is a convex approximation Q̃_α(x) := E_ω[ṽ_α(ω, x)] of Q, where we relax the integrality constraints on the first-stage decision variable x. This implies that p1 = 0. Hence, instead of solving the model in (1) we solve

η̂ := min_x { cx + Q̃_α(x) : Ax = b, x ∈ R^{n1}_+ }. (3)
The generalized alpha-approximation is the expectation of a convex approximation ṽ_α of v, which we derive now. Consider the second-stage value function in (2),

v(ω, x) = min_y { qy : Wy = ω − Tx, y ∈ Z^{p2}_+ × R^{n2−p2}_+ }. (4)

By assumption A2 the recourse is sufficiently expensive, which implies that LP-duality holds. Then, by LP-duality, the dual representation of the LP-relaxation is a polyhedral function with finitely many extreme points, given by

v_LP(ω, x) = max_{k=1,...,K} λ^k(ω − Tx), (5)

where λ^k, k = 1, . . . , K, are the extreme points of the dual feasible region.
Let B denote a dual feasible basis matrix of v_LP. The columns of B are a subset of the columns of W, and we let N denote the matrix consisting of the columns of W that are not in B. Let y_B be the vector of basic variables corresponding to the columns of B, and y_N the vector of non-basic variables corresponding to the columns of N, such that Wy = B·y_B + N·y_N. Denote by q_B and q_N the cost vectors corresponding to y_B and y_N, respectively. Then, by invertibility of the basis matrix B, we get that
Wy = ω − Tx, (6)
B y_B + N y_N = ω − Tx, (7)
y_B = B^{−1}(ω − Tx − N y_N). (8)

Replacing y_B by (8) in the second-stage cost function results in

v(ω, x) = min { q_B y_B + q_N y_N : Wy = ω − Tx, y_B ∈ Z^{p_B}_+ × R^{n_B}_+, y_N ∈ Z^{p_N}_+ × R^{n_N}_+ } (9)
= q_B B^{−1}(ω − Tx) (10)
+ min { q̄_N y_N : B^{−1}(ω − Tx − N y_N) ∈ Z^{p_B}_+ × R^{n_B}_+, y_N ∈ Z^{p_N}_+ × R^{n_N}_+ }, (11)
where q̄_N = q_N − q_B B^{−1} N ≥ 0 denotes the reduced costs. The Gomory relaxation v_B is obtained by relaxing the non-negativity constraints on the basic variables y_B. The Gomory relaxation is then given by

v_B(ω, x) = q_B B^{−1}(ω − Tx) + ψ_B(ω − Tx), (12)

where

ψ_B(s) = min { q̄_N y_N : B^{−1}(s − N y_N) ∈ Z^{p_B} × R^{n_B}, y_N ∈ Z^{p_N}_+ × R^{n_N}_+ }, s ∈ R^m. (13)
Romeijnders et al. (2016a) show that under two conditions on ω and x, it holds that v(ω, x) = v_B(ω, x). These conditions state that ω − Tx ∈ {t : B^{−1}t ≥ 0} and that the distance between ω − Tx and the set {t : B^{−1}t = 0} is sufficiently large. Additionally, Romeijnders et al. (2016a) show that ψ_B is B-periodic, and that under B-periodicity of ψ_B and the aforementioned conditions on ω and x, the second-stage cost function v is equal to the sum of the LP-relaxation and ψ_B(ω − Tx). Since ψ_B is the difference between the second-stage cost function v and the LP-relaxation, it can be interpreted as the cost of integrality.
The convex approximation ṽ_α of v is obtained by replacing ψ_B(ω − Tx) by ψ_B(ω − α) for some α ∈ R^m, which results in

ṽ_α(ω, x) = q_B B^{−1}(ω − Tx) + ψ_B(ω − α). (14)

The generalized alpha-approximation Q̃_α of Q is defined by taking the expectation of the pointwise maximum over all dual feasible basis matrices B^k, k = 1, . . . , K. For ω ∈ R^m and x ∈ R^{n1}_+, the generalized alpha-approximation is given by

Q̃_α(x) := E_ω[ ṽ_α(ω, x) ] (15)
= E_ω[ max_{k=1,...,K} ṽ^k_α(ω, x) ] (16)
= E_ω[ max_{k=1,...,K} { q_{B^k}(B^k)^{−1}(ω − Tx) + ψ^k(ω − α) } ]. (17)
An error bound on the difference between the expected second-stage cost function of the MIR model in (1), defined as

Q(x) := E_ω[ min_y { qy : Wy = ω − Tx, y ∈ Y ⊂ Z^{p2}_+ × R^{n2−p2}_+ } ], (18)

and the generalized alpha-approximation Q̃_α in (17) is given by van der Laan and Romeijnders (2018).
Definition 1
For an interval I ⊂ R, let P(I) denote the set of all finite ordered sets W = {x_1, . . . , x_N} with x_1 < · · · < x_N in I. Let f : R → R be a real-valued function. The total variation of f on I is defined as

|∆|f(I) = sup_{W ∈ P(I)} Σ_{i=1}^{N−1} |f(x_{i+1}) − f(x_i)|, (19)

where |∆|f is defined as |∆|f(R).

The error bound is given by

sup_x |Q(x) − Q̃_α(x)| ≤ C Σ_{i=1}^{m} E_{ω_{−i}}[ |∆|f_i(·|ω_{−i}) ], (20)

where C > 0 and |∆|f_i(·|ω_{−i}) is the total variation of the conditional probability density function of ω_i given the remaining components ω_{−i}.
The error bound in (20) is smaller if the sum of the total variations of the densities of the components of ω is smaller. For unimodal distributions this implies that the error bound is decreasing in the standard deviation. Additionally, the error bound is independent of α. Example 1 illustrates the error bound in a one-dimensional example.
Example 1
Consider the one-dimensional simple integer recourse function Q(x) = E_ω[⌈ω − Tx⌉^+], x ∈ R. From Romeijnders et al. (2016b) we know that the error bound in (20) is given by

sup_x |Q(x) − Q̃_α(x)| ≤ h(|∆|f), (21)

where h : R_+ → R is defined as

h(x) := x/8 if x ≤ 4,   1 − 2/x if x ≥ 4. (22)

Now consider ω ∼ N(µ, σ²). Since the normal distribution is unimodal with mode µ, we have |∆|f = 2f(µ). Then, with f(µ) = 1/√(2πσ²), the error bound for simple integer recourse is given by

sup_x |Q(x) − Q̃_α(x)| ≤ (1/8)√(2/(πσ²)) if σ ≥ √(1/(8π)),   1 − √(2πσ²) if σ ≤ √(1/(8π)). (23)
From (23) it follows that the error bound is indeed decreasing in the standard deviation σ. Addi-tionally, the error bound is independent of α.
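The bound in (21)-(23) is easy to evaluate numerically. A minimal sketch, assuming only the expressions above (h from (22) and |∆|f = 2f(µ) for a normal density); the function names are ours:

```python
import math

def total_variation_normal(sigma):
    """|Delta|f for a N(mu, sigma^2) density: unimodal, so |Delta|f = 2*f(mu)."""
    return 2.0 / math.sqrt(2.0 * math.pi * sigma ** 2)

def error_bound(tv):
    """h(|Delta|f) from (22): x/8 for x <= 4 and 1 - 2/x for x >= 4."""
    return tv / 8.0 if tv <= 4.0 else 1.0 - 2.0 / tv

# The two branches meet at |Delta|f = 4, i.e. at sigma = sqrt(1/(8*pi)),
# and the bound decreases as sigma grows.
```

Evaluating `error_bound(total_variation_normal(sigma))` for increasing sigma reproduces the monotonicity claimed after (23).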
2.1 Loose Benders Decomposition Algorithm
Consider the mixed-integer recourse model where we replace the expected second-stage cost function Q by the generalized alpha-approximation Q̃_α:

min_x { cx + Q̃_α(x) : Ax = b, x ∈ R^{n1}_+ }. (24)

In LBDA(α), (24) is solved using a Benders decomposition (Benders (1962)) on an SAA of the generalized alpha-approximation Q̃_α. We use an SAA of the generalized alpha-approximation since Benders decomposition requires that ω follows a discrete distribution. We draw a sample ω^1, . . . , ω^S of size S from the distribution of the random vector ω and define a new random vector ω̃ by

P(ω̃ = ω^s) = 1/S, s = 1, . . . , S. (25)

The sample average approximation Q̃^S_α of the generalized alpha-approximation in (17) is defined as

Q̃^S_α(x) := (1/S) Σ_{s=1}^{S} max_{k=1,...,K} { λ^k(ω^s − Tx) + ψ^k(ω^s − α) }. (26)
The first step to solve the model is to decompose it into a master problem and S subproblems. The master problem is given by
min_x { cx + Q̃^S_α(x) : Ax = b, x ∈ R^{n1}_+ }, (27)

and the subproblems are defined as

min_y { qy : Wy = ω^i − Tx, y ∈ Z^{p2}_+ × R^{n2−p2}_+ }, i = 1, . . . , S. (28)
Since the sample average approximation Q̃^S_α is a convex polyhedral function, it can be represented as the maximum of finitely many affine functions. Hence, instead of solving (27), we can use the intuition of the L-shaped algorithm and solve

min_x { cx + θ : θ ≥ Q̃^S_α(x), Ax = b, x ∈ R^{n1}_+, θ ∈ R }. (29)

Since Q̃^S_α is a piecewise linear function, we can write the constraint θ ≥ Q̃^S_α(x) as

θ ≥ β_r x + δ_r, β_r ∈ R^{n1}, δ_r ∈ R, r = 1, . . . , R, (30)

where R is the total number of linear functions that describe Q̃^S_α. In Benders decomposition, each iteration adds a so-called optimality cut, which sequentially improves the approximation of Q̃^S_α. In iteration i, the master problem is given by

min_x { cx + θ : θ ≥ β_r x + δ_r, r = 1, . . . , i, Ax = b, x ∈ R^{n1}_+, θ ∈ R }. (31)

The new optimality cut in iteration i is based on a subgradient of Q̃^S_α at x_i and is given by

θ ≥ Q̃^S_α(x) ≥ Q̃^S_α(x_i) + a_i(x − x_i), (32)

where a_i ∈ ∂Q̃^S_α(x_i) is a subgradient of Q̃^S_α at x_i, given by
a_i = −(1/S) Σ_{s=1}^{S} λ^{k_s} T, (33)

where

k_s ∈ argmax_{k=1,...,K} { λ^k(ω^s − T x_i) + ψ^k(ω^s − α) }, s = 1, . . . , S. (34)

Finding the optimal basis matrix index in (34) requires solving ψ^k for all k = 1, . . . , K, which is cumbersome since the number of dual vertices K grows exponentially in the input size of the second stage. Therefore, van der Laan and Romeijnders (2018) construct so-called loose optimality cuts, which we derive below. These loose optimality cuts are obtained by choosing a dual vertex that is optimal with high probability and can be identified faster than the optimal basis matrix index in (34). For any k̃_s ∈ {1, . . . , K}, s = 1, . . . , S, it follows that

Q̃^S_α(x) ≥ (1/S) Σ_{s=1}^{S} { λ^{k̃_s}(ω^s − Tx) + ψ^{k̃_s}(ω^s − α) }, (35)

with equality if k̃_s = k_s for all s = 1, . . . , S. Since choosing indices k̃_s ≠ k_s leaves an inequality in (35), cuts generated by such sub-optimal indices k̃_s define loose optimality cuts. Loose optimality cuts generated in this way are nearly tight to the optimality cut constructed from the optimal basis matrix indices, as proven by van der Laan and Romeijnders (2018).
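To make the cut construction concrete, here is a hedged, one-dimensional sketch of building a loose optimality cut in the spirit of (37) and (39). The dual-vertex values, the ψ stand-in and all function names are illustrative assumptions, not the paper's implementation:

```python
def loose_cut(omegas, lambdas, psi, T, alpha, x_hat):
    """Build a loose cut theta >= beta*x + delta from samples (scalar sketch of (39)).

    For each sample, the dual vertex is chosen by maximizing lambda*(omega - T*x_hat)
    only, i.e. with the psi term set to zero, as in (37)."""
    S = len(omegas)
    beta = delta = 0.0
    for w in omegas:
        k = max(range(len(lambdas)), key=lambda j: lambdas[j] * (w - T * x_hat))
        lam = lambdas[k]
        beta += -lam * T / S                      # slope of lam*(w - T*x) in x
        delta += (lam * w + psi(k, w - alpha)) / S
    return beta, delta

# Hypothetical data: two dual vertices and a fractional-part stand-in for psi.
lams = [0.5, 1.0]
psi = lambda k, s: s - int(s) if s >= 0.0 else 0.0
beta, delta = loose_cut([1.2, 3.4], lams, psi, T=1.0, alpha=0.0, x_hat=0.0)
```

The returned pair (beta, delta) defines the affine lower bound θ ≥ βx + δ that would be appended to the master problem.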
Denote by (x_i, θ_i) the optimal solution of the master problem in iteration i. To find the sub-optimal basis matrix index, consider the inner maximization problem

max_{k=1,...,K} { λ^k(ω^s − T x_i) + ψ^k(ω^s − α) }. (36)

By setting ψ^k(ω^s − α) = 0, the sub-optimal basis matrix index k̃_s is defined by

k̃_s = argmax_{k=1,...,K} { λ^k(ω^s − T x_i) }. (37)
The maximization problem in (37) is similar to the LP-relaxation in (5). Hence, the choice of k̃_s coincides with the optimal basis matrix index in the LP-relaxation

v_LP(ω^s, x_i) = min_y { qy : Wy = ω^s − T x_i, y ∈ R^{n2}_+ } (38)

of v. The loose optimality cut generated in iteration i is given by

θ ≥ β_{i+1} x + δ_{i+1} = (1/S) Σ_{s=1}^{S} { λ^{k̃_s}(ω^s − Tx) + ψ^{k̃_s}(ω^s − α) }, x ∈ R^{n1}. (39)
The algorithm terminates if the termination criterion

θ_l ≥ β_{i+1} x_l + δ_{i+1} − ε (40)

is satisfied. Pseudo-code for LBDA(α) is given in Algorithm 1.

Algorithm 1: Loose Benders Decomposition Algorithm (LBDA(α))
Input: A, b, c, T, q, W, distribution of ω, shift parameter α, lower bound L on Q̃^S_α, tolerance ε, sample size S.
Output: Near-optimal solution x̂.
1 Initialization
2 Initialize τ = 0.
3 Obtain a sample ω^1, . . . , ω^S of size S from the distribution of ω.
4 Iteration
5 Solve min_x { cx + θ : Ax = b, θ ≥ β_r x + δ_r, r = 1, . . . , τ, x ∈ R^{n1}_+, θ ≥ L },
6 and denote the optimal solution by (x_τ, θ_τ).
7 For s = 1, . . . , S, solve the LP-relaxation (38) to obtain k̃_s and evaluate ψ^{k̃_s}(ω^s − α).
8 Construct the loose optimality cut (39); if the termination criterion (40) holds, stop and return x̂ = x_τ; otherwise set τ = τ + 1 and repeat from step 5.
3 Stochastic Loose Benders Decomposition Algorithm
For an SAA, an a priori value for S has to be chosen. The concept of stochastic decomposition by Higle and Sen (1991) is to draw a new sample at every iteration of the algorithm. In Higle and Sen (1991), stochastic decomposition is applied to SLP models. The contribution of this paper is the application of iterative sampling to generalized alpha-approximations, which is given in this section.
3.1 Iterative sampling
At the beginning of each iteration of the algorithm, a new sample from the distribution of ω is drawn. In iteration l, we construct a random vector ω̃ with probability distribution

P(ω̃ = ω^i) = 1/l, for all i = 1, . . . , l. (41)

Let the iterative sampling approximation of Q̃_α in (17) in iteration l be given by

Q̄^l_α(x) := (1/l) Σ_{s=1}^{l} max_{k=1,...,K} { λ^k(ω^s − Tx) + ψ^k(ω^s − α) }. (42)
Now we apply Benders decomposition to the iterative sampling approximation Q̄^l_α, which decomposes into a different master problem and set of subproblems in each iteration. The master problem in iteration l is defined as

min_x { cx + Q̄^l_α(x) : Ax = b, x ∈ R^{n1}_+ }, (43)

and the l subproblems are defined as

min_y { qy : Wy = ω^i − Tx, y ∈ Z^{p2}_+ × R^{n2−p2}_+ }, i = 1, . . . , l. (44)

By LP-duality, which is guaranteed by assumption A2, the dual formulation corresponding to (44) is given by

max_{k=1,...,K} { λ^k(ω^i − Tx) + ψ^k(ω^i − α) }, i = 1, . . . , l, (45)
where λ^k, k = 1, . . . , K, are the extreme points of the dual feasible region. Instead of solving (45) for all ω^i, i = 1, . . . , l, we take inspiration from LBDA(α) and stochastic decomposition and solve (45) only for the sample that was drawn last. Hence, we solve

max_{k=1,...,K} { λ^k(ω^l − Tx) }, (46)

to obtain the basis matrix index k̃_l as in (37). Denote by V_l the set of dual vertices obtained from solving (46) in iterations 1, . . . , l. In iteration l, we update the set of obtained dual vertices by V_l = V_{l−1} ∪ {λ^{k̃_l}}. Instead of solving the subproblem for the remaining l − 1 samples as in (46), we solve

max_λ { λ(ω^i − Tx) : λ ∈ V_l }, i = 1, . . . , l − 1. (47)

Note that the optimal dual vertex from (46) is an element of V_l. Therefore, the optimality cuts will be constructed according to the basis matrix indices in

k̄_i = argmax_k { λ^k(ω^i − Tx) : λ^k ∈ V_l }, i = 1, . . . , l. (48)
Restricting the maximization to the stored dual vertices implies that the chosen basis matrix will not always be optimal in (45). This need not be a problem, since in early iterations the optimality cuts we derive below depend on a small set of samples and are generally not good approximations of the generalized alpha-approximation anyway. As the algorithm progresses and more samples are drawn, the number of dual vertices in V_l increases, and the cuts based on the indices in (48) will approximate the iterative sampling approximation more closely. Since Q̄^l_α is a polyhedral function, it can be approximated by finitely many affine functions. Therefore, in the spirit of the L-shaped algorithm, we rewrite (43) as

min_x { cx + θ : θ ≥ Q̄^l_α(x), Ax = b, x ∈ R^{n1}_+ }, (49)

where Q̄^l_α is approximated by finitely many linear constraints, such that the constraint θ ≥ Q̄^l_α(x) in iteration l can be represented by

θ ≥ β_i x + δ_i, β_i ∈ R^{n1}, δ_i ∈ R, i = 1, . . . , l. (50)

By the same logic as for LBDA(α) in Section 2.1, the optimality cut constructed in iteration l is defined as

θ ≥ β_l x + δ_l = (1/l) Σ_{i=1}^{l} { λ^{k̄_i}(ω^i − Tx) + ψ^{k̄_i}(ω^i − α) }, x ∈ R^{n1}. (51)
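The vertex-reuse step in (46)-(48) amounts to one exact subproblem solve per iteration plus cheap maximizations over the stored vertices. A scalar sketch with hypothetical dual-vertex values (names ours):

```python
def best_stored_vertex(V, omega, T, x):
    """Pick the stored dual vertex maximizing lambda*(omega - T*x), as in (47)-(48).

    In SLBDA(alpha) only the newest sample triggers an exact LP solve; all older
    samples reuse the vertex set V built up so far."""
    return max(V, key=lambda lam: lam * (omega - T * x))

V = [0.2, 0.7, 1.1]  # hypothetical dual vertices collected in earlier iterations
big = best_stored_vertex(V, omega=4.0, T=1.0, x=1.0)    # residual +3: largest vertex wins
small = best_stored_vertex(V, omega=0.0, T=1.0, x=2.0)  # residual -2: smallest vertex wins
```

Because `max` over a small list is much cheaper than solving an LP, this is where the iterative-sampling scheme saves work relative to solving (45) for every sample.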
As the algorithm progresses, we obtain more samples, which changes the iterative sampling approximation in each iteration. Hence, we need to guarantee that the cuts constructed in previous iterations are still valid lower bounds in subsequent iterations. Denote by k_i the optimal basis matrix index in (45) for ω^i. An optimality cut created in iteration l by (51) is a valid lower bound in iteration l since

(1/l) Σ_{i=1}^{l} { λ^{k̄_i}(ω^i − Tx) + ψ^{k̄_i}(ω^i − α) } ≤ (1/l) Σ_{i=1}^{l} { λ^{k_i}(ω^i − Tx) + ψ^{k_i}(ω^i − α) }. (52)

The optimality cut created in iteration l should also be a valid lower bound in all iterations l + m, m ∈ Z_+. In iteration l + m it holds that

(1/(l+m)) Σ_{s=1}^{l} { λ^{k̄_s}(ω^s − Tx) + ψ^{k̄_s}(ω^s − α) } ≤ (1/(l+m)) Σ_{s=1}^{l} { λ^{k_s}(ω^s − Tx) + ψ^{k_s}(ω^s − α) } (53)
≤ (1/(l+m)) Σ_{s=1}^{l+m} { λ^{k_s}(ω^s − Tx) + ψ^{k_s}(ω^s − α) }. (54)

Hence, we require that the averaging fraction 1/l in iteration l becomes 1/(l+m) in iteration l + m. We can achieve this by multiplying the coefficients of the optimality cuts by (i−1)/i in each following iteration i ∈ {l + 1, . . . , l + m}, m ∈ Z_+, as illustrated in (55):

(1/l) · (l/(l+1)) · · · ((l+m−2)/(l+m−1)) · ((l+m−1)/(l+m)) = 1/(l+m), m ∈ Z_+. (55)

In iteration l, let the optimality cut constructed in iteration i, after this rescaling, be denoted by

θ ≥ β^l_i x + δ^l_i, i = 1, . . . , l. (56)
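The rescaling in (55) can be checked numerically. A small sketch (names ours) confirming that repeated multiplication by (i−1)/i turns an average over l samples into one over l + m samples:

```python
def decay_cut(beta, delta, i):
    """Rescale a stored cut when moving into iteration i, per (55)."""
    factor = (i - 1) / i
    return factor * beta, factor * delta

# A cut built in iteration 3 carries weight 1/3; after being decayed through
# iterations 4..7 its weight must be 1/7.
beta, delta = 1.0 / 3.0, 2.0 / 3.0
for i in range(4, 8):
    beta, delta = decay_cut(beta, delta, i)
```

The telescoping product (1/3)·(3/4)·(4/5)·(5/6)·(6/7) = 1/7 is exactly the chain displayed in (55).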
3.2 Incumbent solution
Higle and Sen (1991) propose the idea of a so-called incumbent solution, for which the estimate of the objective value is considered sufficiently low. They show that the sequence of incumbent solutions has properties that allow for the construction of termination criteria. Denote the estimate of the objective value in iteration l by

f_l(x) := max_{i=1,...,l} { (c + β^l_i) x + δ^l_i }, x ∈ R^{n1}_+. (58)

The master problem in (49) in iteration l is then equal to the problem

η̃_l := min_x { f_l(x) : Ax = b, x ∈ R^{n1}_+ }. (59)
At the end of each iteration l we solve η̃_l to obtain the solution x_l. Note that x_l is optimal for f_l. Let the incumbent solution in iteration l be denoted by x̄_l. In the first iteration, we set x̄_0 = x_0. In iteration l, x_{l−1} is accepted as the new incumbent solution if

f_l(x_{l−1}) − f_l(x̄_{l−1}) < r { f_{l−1}(x_{l−1}) − f_{l−1}(x̄_{l−1}) } (60)

holds for r ∈ (0, 1), which we call the incumbent update value. We need to evaluate the approximation of the objective value in iteration l, f_l, at the solutions x_{l−1} and x̄_{l−1}; this is done after constructing the optimality cut. Since f_{l−1}(x_{l−1}) − f_{l−1}(x̄_{l−1}) ≤ 0 and r ∈ (0, 1), the right-hand side is less than or equal to 0. Rearranging the terms in (60) results in

f_l(x_{l−1}) < f_l(x̄_{l−1}) + r { f_{l−1}(x_{l−1}) − f_{l−1}(x̄_{l−1}) }. (61)

Since r { f_{l−1}(x_{l−1}) − f_{l−1}(x̄_{l−1}) } ≤ 0, the inequality in (61) holds only if the estimate of the objective value at the solution x_{l−1} is sufficiently lower than the estimate of the objective value at the incumbent solution x̄_{l−1}. Hence, if (60) holds, the incumbent solution in iteration l is updated according to x̄_l = x_{l−1}; otherwise the incumbent solution remains the same, x̄_l = x̄_{l−1}. Let i_l denote the iteration in which the incumbent solution x̄_l was accepted as the new incumbent solution. If a new incumbent solution is accepted in iteration l, then i_l = l; otherwise i_l = i_{l−1}.
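The incumbent test (60) compares improvements under the current and the previous cut approximations. A hedged sketch, with hypothetical approximation functions and solutions:

```python
def accept_incumbent(f_l, f_prev, x_new, x_inc, r):
    """Incumbent update test (60): accept x_new as incumbent if its estimated
    improvement over the incumbent under the current approximation f_l is at
    least a fraction r of the improvement under the previous approximation."""
    return f_l(x_new) - f_l(x_inc) < r * (f_prev(x_new) - f_prev(x_inc))

# Under f_{l-1}(x) = x^2, the candidate x_new = 1 looked 3 better than x_inc = 2.
f_prev = lambda x: x ** 2
accepted = accept_incumbent(lambda x: x ** 2, f_prev, 1.0, 2.0, r=0.5)   # improvement persists
rejected = accept_incumbent(lambda x: 4.0 - x, f_prev, 1.0, 2.0, r=0.5)  # improvement vanished
```

With a larger r the right-hand side becomes more negative, so acceptance is rarer, matching the discussion of r in Section 4.1.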
Following Higle and Sen (1991), we are required to create an additional optimality cut, generated by a subgradient of the iterative sampling approximation at the incumbent solution. This new optimality cut in iteration l replaces the optimality cut that was added to the master problem in the iteration in which the current incumbent solution was accepted, i_{l−1}. Analogously to the solution x_{l−1}, we solve the LP-relaxation for the last drawn sample ω^l at the incumbent solution x̄_{l−1}, given by

max_{k=1,...,K} { λ^k(ω^l − T x̄_{l−1}) }. (62)

Since the maximization problem in (62) is similar to (37), we denote the optimal basis matrix index by k̃*_l and add the corresponding dual vertex to the set of obtained dual vertices by V_l = V_{l−1} ∪ {λ^{k̃*_l}}. Now, for ω^1, . . . , ω^l we find the sub-optimal basis matrix indices by

k̄*_i = argmax_k { λ^k(ω^i − T x̄_{l−1}) : λ^k ∈ V_l }, i = 1, . . . , l. (63)

The optimality cut generated by the incumbent solution in iteration l, which replaces the cut added in iteration i_{l−1}, is given by

β^l_{i_{l−1}} x + δ^l_{i_{l−1}} = (1/l) Σ_{i=1}^{l} { λ^{k̄*_i}(ω^i − Tx) + ψ^{k̄*_i}(ω^i − α) }, x ∈ R^{n1}. (64)

Since this cut is a valid lower bound in iteration l, we do not need to update the cut indexed by i_{l−1} afterwards.
3.3 Termination criteria
The main issue in stochastic decomposition is the lack of an algorithmic termination criterion (Higle and Sen (1991)). Higle and Sen (1991) propose three different termination criteria involving the estimate of the objective value at the incumbent solution. Although these termination criteria have some theoretical convergence properties, they are difficult to interpret. Therefore, we propose a termination criterion which keeps track of whether the estimate of the objective value evaluated at the incumbent solutions has changed significantly over the last several iterations. Additionally, for an incumbent solution to be considered an optimal solution, the incumbent solution should not have been updated for a sufficient number of iterations. Furthermore, the set of dual vertices obtained from solving the LP-relaxations should not have changed for a number of iterations.
Define

f^{l,n}_max := max_{i=l−n,...,l} { f_i(x̄_l) }, (66)
f^{l,n}_min := min_{i=l−n,...,l} { f_i(x̄_l) }, (67)

where f^{l,n}_max is the maximum value of the estimate of the objective value at the incumbent solution over the last n iterations. Similarly, f^{l,n}_min is the minimum value of the estimate of the objective value at the incumbent solution over the last n iterations. We now consider the estimate of the objective value at the incumbent solution not to have changed significantly if

(f^{l,n}_max − f^{l,n}_min) / f^{l,n}_max < ε (68)

holds for small ε > 0 and sufficiently large n > 0. Furthermore, we require that the incumbent solution has not changed in the previous n iterations, denoted by

i_l ≤ l − n. (69)
Additionally, we require that the set of obtained dual vertices has not changed in the previous n iterations, denoted by

V_l = V_{l−n}, (70)

where V_l = V_{l−n} means that for all dual vertices λ ∈ V_l ∪ V_{l−n}, λ ∈ V_l ⟺ λ ∈ V_{l−n}. The values of n and ε are tuned in Section 4.1.
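The three conditions (68)-(70) combine into a single check each iteration. A sketch under the assumption that the stored vertex sets can be compared directly; the names and data are ours:

```python
def should_terminate(f_hist, i_l, l, V_hist, n, eps):
    """Termination test (68)-(70): relative spread of the incumbent objective
    estimates over the last n iterations below eps, incumbent unchanged for at
    least n iterations, and no new dual vertices in the last n iterations.

    f_hist[i] holds f_i evaluated at the current incumbent; V_hist[i] holds the
    dual-vertex set V_i."""
    window = f_hist[l - n: l + 1]
    f_max, f_min = max(window), min(window)
    return ((f_max - f_min) / f_max < eps
            and i_l <= l - n
            and V_hist[l] == V_hist[l - n])

f_hist = [10.0] * 11                 # flat incumbent estimates
V_hist = [{0.2, 0.7}] * 11           # vertex set stopped growing
stable = should_terminate(f_hist, i_l=4, l=10, V_hist=V_hist, n=5, eps=0.01)
moving = should_terminate([10.0] * 9 + [12.0, 10.0], 4, 10, V_hist, 5, 0.01)
```

The second call fails only condition (68), because one estimate in the window jumps, illustrating how each condition can independently block termination.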
3.4 SLBDA(α)
The master problem is solved at the end of each iteration, and we therefore need initial values for x_0 and the incumbent solution x̄_0. Therefore, in the initialization of the algorithm, we solve the mean-value model

min_x { cx + Q̂(x) : Ax = b, x ∈ R^{n1}_+ }, (71)

where

Q̂(x) := min_y { qy : Wy = E_ω[ω] − Tx, y ∈ R^{n2}_+ }. (72)

In each iteration, the cuts constructed in earlier iterations are rescaled according to (55) before the master problem is solved with those cuts. Pseudo-code for SLBDA(α) is given in Algorithm 2.
Algorithm 2: Stochastic Loose Benders Decomposition Algorithm (SLBDA(α))
Input: A, b, c, T, q, W, distribution of ω, α, n, ε, incumbent update value r ∈ (0, 1).
Output: Near-optimal solution x̄.
1 Initialization
2 Set V_0 = ∅, l = 0, i_0 = 0, ω^0 = E[ω], and x_0 ∈ argmin { cx + Q̂(x) : Ax = b, x ∈ R^{n1}_+ },
3 where Q̂(x) := min_y { qy : Wy = ω^0 − Tx, y ∈ R^{n2}_+ }, and set x̄_0 = x_0.
4 Iteration
5 Set l = l + 1. Generate ω^l according to the distribution of ω and create P_l = ∅.
6 Solve min_y { qy : Wy = ω^l − T x_{l−1}, y ∈ R^{n2}_+ } and denote the optimal basis matrix index by k̄_l.
7 Solve min_y { qy : Wy = ω^l − T x̄_{l−1}, y ∈ R^{n2}_+ } and denote the optimal basis matrix index by k̄*_l.
8 Update the set of obtained dual vertices according to V_l = V_{l−1} ∪ { λ^{k̄_l}, λ^{k̄*_l} }.
9 for t = 1, . . . , l do
10   Find k̄_t ∈ argmax_k { λ^k(ω^t − T x_{l−1}) : λ^k ∈ V_l } and k̄*_t ∈ argmax_k { λ^k(ω^t − T x̄_{l−1}) : λ^k ∈ V_l }.
11   If λ^{k̄_t} ∈ P_t,
12   then the corresponding ψ^{k̄_t}(ω^t − α) is known.
13   Otherwise, evaluate ψ^{k̄_t}(ω^t − α) and update P_t = P_t ∪ { λ^{k̄_t} }.
14   Construct a new cut according to
15   β^l_l x + δ^l_l = (1/l) Σ_{t=1}^{l} { λ^{k̄_t}(ω^t − Tx) + ψ^{k̄_t}(ω^t − α) }.
16   If λ^{k̄*_t} ∈ P_t,
17   then the corresponding ψ^{k̄*_t}(ω^t − α) is known.
18   Otherwise, evaluate ψ^{k̄*_t}(ω^t − α) and update P_t = P_t ∪ { λ^{k̄*_t} }.
19   Change the cut indexed by i_{l−1} to
20   β^l_{i_{l−1}} x + δ^l_{i_{l−1}} = (1/l) Σ_{t=1}^{l} { λ^{k̄*_t}(ω^t − Tx) + ψ^{k̄*_t}(ω^t − α) }.
21   If t ∉ { l, i_{l−1} },
22   then decay the cut indexed by t according to
23   δ^l_t = ((l−1)/l) δ^{l−1}_t, β^l_t = ((l−1)/l) β^{l−1}_t.
24 end for
25 If f_l(x_{l−1}) − f_l(x̄_{l−1}) < r { f_{l−1}(x_{l−1}) − f_{l−1}(x̄_{l−1}) },
26 then set x̄_l = x_{l−1} and i_l = l;
27 otherwise set x̄_l = x̄_{l−1} and i_l = i_{l−1}.
28 Solve the master problem η̃_l to obtain x_l.
29 Termination
30 If (f^{l,n}_max − f^{l,n}_min) / f^{l,n}_max < ε and i_l ≤ l − n and V_l = V_{l−n},
31 then terminate;
32 otherwise repeat from step 5.
4 Experiments
To assess the quality of candidate solutions, we use the multiple replications procedure (MRP) by Mak et al. (1999). For a candidate solution x*, we obtain an upper bound on the optimality gap

cx* + Q(x*) − η*, (73)

where η* denotes the optimal value of the true MIR model in (1). The upper bound on the optimality gap is a one-sided confidence interval that holds with (1 − γ) × 100% certainty. We choose γ = 0.05, such that the upper bound on the optimality gap holds with 95% certainty. The idea is to take a sample ω^1, . . . , ω^S of size S from the distribution of ω. We then solve the model
min_x { cx + Q̂^S(x) : Ax = b }, (74)

where

Q̂^S(x) := (1/S) Σ_{s=1}^{S} v(ω^s, x) = (1/S) Σ_{s=1}^{S} min_y { qy : Wy = ω^s − Tx, y ∈ Z^{p2}_+ × R^{n2−p2}_+ }, (75)

as a MILP, and denote the resulting solution by x_0. Then, we calculate the gap G_S(x*) by

G_S(x*) = (1/S) Σ_{s=1}^{S} ( (cx* + v(ω^s, x*)) − (cx_0 + v(ω^s, x_0)) ), (76)

which satisfies G_S(x*) ≥ 0 since x_0 minimizes the sample average objective in (74). This process is replicated N times with independent draws from the distribution of ω to obtain the gaps G^1_S(x*), . . . , G^N_S(x*). The gap mean and gap sample variance are calculated as

Ḡ_S(x*) = (1/N) Σ_{k=1}^{N} G^k_S(x*), (77)

s²_G(x*) = (1/(N−1)) Σ_{k=1}^{N} ( G^k_S(x*) − Ḡ_S(x*) )². (78)

Let ε_G := t_{N−1,1−γ} √( s²_G(x*) / N ), and define the one-sided 95%-confidence interval as

[ 0, Ḡ_S(x*) + ε_G ]. (79)
Reported upper bounds to the optimality gaps are calculated with N = 30 replications and sample size S = 100. Whenever we mention optimality gaps in Sections 4.1 and 4.2, we refer to the upper bound to the optimality gap in (79). In Section 4.2, we report costs from several candidate solutions for large instances of a mixed-integer recourse model. For a candidate solution x*, the expected costs cx* + Q(x*) are estimated out of sample with a sample size of 10^5, where we use the same sample to estimate the costs for each candidate solution.
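The MRP just described can be sketched in Python for the one-dimensional simple integer recourse model used later in Section 4.1. The grid search below stands in for solving (74) as a MILP, and all function names are our own assumptions rather than the authors' code:

```python
import math
import random
import statistics

def v(omega, x, q=2.0):
    # Second-stage value of the simple integer recourse model:
    # min { q*y : y >= omega - x, y integer >= 0 }.
    return q * max(0, math.ceil(omega - x))

def saa_solve(sample, q=2.0):
    # Solve min_x x + (1/S) sum_s v(omega_s, x) over a coarse grid
    # (a toy stand-in for solving (74) as a MILP).
    grid = [0.05 * i for i in range(400)]
    return min(grid, key=lambda x: x + sum(v(w, x, q) for w in sample) / len(sample))

def mrp_upper_bound(x_star, sampler, N=30, S=100, t=1.699):
    # Multiple replications procedure: N independent gap estimates (76),
    # gap mean (77) and sample variance (78); t is t_{N-1, 0.95} for N = 30.
    gaps = []
    for _ in range(N):
        sample = [sampler() for _ in range(S)]
        x0 = saa_solve(sample)
        gaps.append(sum((x_star + v(w, x_star)) - (x0 + v(w, x0))
                        for w in sample) / S)
    g_bar = statistics.mean(gaps)
    eps_g = t * math.sqrt(statistics.variance(gaps) / N)
    return g_bar, g_bar + eps_g   # point estimate and one-sided 95% upper bound

random.seed(1)
g_bar, upper = mrp_upper_bound(10.0, lambda: random.gauss(10, 1))
```

Since the candidate x* = 10.0 lies on the grid, every replication gap is nonnegative, and the returned upper bound is the endpoint of the interval (79).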
Table 1: Notation of the distributions of ω.

Notation  Distribution    Notation  Distribution    Notation  Distribution
f_ω^1     N(10, 0.01)     f_ω^6     exp(10)         f_ω^11    U(9.8, 10.2)
f_ω^2     N(10, 0.25)     f_ω^7     exp(3)          f_ω^12    U(9.4, 10.6)
f_ω^3     N(10, 1)        f_ω^8     exp(1)          f_ω^13    U(8.2, 11.8)
f_ω^4     N(10, 9)        f_ω^9     exp(0.3)        f_ω^14    U(4.5, 15.5)
f_ω^5     N(10, 100)      f_ω^10    exp(0.1)        f_ω^15    U(−7, 27)
4.1 Parameter tuning
Consider the simple integer recourse model

min_x { x + Q(x) : x ∈ R_+ },  (80)

where

Q(x) = E_ω [ min_y { 2y : y ≥ ω − x, y ∈ Z_+ } ].  (81)
Here, ω ∈ R is a random variable and we tune SLBDA(α) for distributions f_ω^i, i = 1, ..., 10, as given in Table 1. We report average optimality gaps, computation times and iterations over 20 randomly generated test instances for each distribution of ω. Recall the incumbent updating inequality from (60)

f_l(x^{l−1}) − f_l(x̄^{l−1}) < r { f_{l−1}(x^{l−1}) − f_{l−1}(x̄^{l−1}) },  r ∈ (0, 1).  (82)
Since f_{l−1}(x^{l−1}) − f_{l−1}(x̄^{l−1}) ≤ 0, a higher value of r results in a lower or equal right-hand side in (82) than a lower value of r. Therefore, the incumbent updating inequality is more restrictive if r is higher, and a new solution is accepted as the incumbent solution less often. The number of new incumbent solutions found and the number of iterations since the last time a new incumbent solution was found, n, therefore depend on r. We choose to test for r ∈ {0.1, 0.5, 0.9}. Next, recall the inequality in iteration l that must hold for termination

(f_max^{l,n} − f_min^{l,n}) / f_max^{l,n} < ε.  (83)

Both n and ε need sensible values for termination of the algorithm. We choose to test for
n ∈ {10, 20, 30, 40, 50} and ε ∈ {0.1, 0.05, 0.01, 0.005, 0.001}, implying 25 different termination criteria. We test each termination criterion for α = 0, and we terminate SLBDA(0) after 10^5 iterations if the termination criterion was not satisfied. For all r ∈ {0.1, 0.5, 0.9}, SLBDA(0) did not terminate for any termination criterion with n ∈ {20, 30, 40, 50} for ω ∼ exp(1). Hence, we only consider ε ∈ {0.1, 0.05, 0.01, 0.005, 0.001} and n = 10. We report the average optimality gaps in Table 2, the average computation times in Table 3 and the average number of samples used in the algorithms in Table 4.
Table 2: Average optimality gaps for SLBDA(0) and LBDA(0) for n = 10.

         ε = 0.1  ε = 0.05  ε = 0.01  ε = 0.005  ε = 0.001  LBDA(0)
r = 0.1  0.2960   0.2782    0.2413    0.2263     0.2233
r = 0.5  0.3101   0.2811    0.2452    0.2281     0.2241     0.2193
r = 0.9  0.3492   0.2932    0.2530    0.2355     0.2265
Table 3: Average computation time for SLBDA(0) and LBDA(0) for n = 10.

         ε = 0.1  ε = 0.05  ε = 0.01  ε = 0.005  ε = 0.001  LBDA(0)
r = 0.1  0.0375   0.0452    0.1830    0.3635     8.1593
r = 0.5  0.0184   0.0290    0.1149    0.3423     7.6799     3.1422
r = 0.9  0.0107   0.0146    0.0539    0.1864     5.7299
Table 4: Average number of samples drawn for SLBDA(0) and LBDA(0) for n = 10.

         ε = 0.1  ε = 0.05  ε = 0.01  ε = 0.005  ε = 0.001  LBDA(0)
r = 0.1  67.26    86.92     238.84    385.61     1646.56
r = 0.5  44.08    68.11     194.95    342.24     1572.63    10000
r = 0.9  25.90    37.01     122.65    259.47     1348.05
From Table 2 we observe that for lower values of ε the average upper bound on the optimality gap is lower, which we would expect since a lower value of ε implies a stricter inequality in (83). If ε = 0.001, the average optimality gaps for all values of r are close to the average optimality gap for LBDA(0) with S = 10^5 samples. However, the average computation times for ε = 0.001 are significantly higher than the average computation time of LBDA(0) for all values of r. In terms of computation time, SLBDA(0) terminates too late for ε = 0.001.
Table 5: Average number of times a new solution was accepted as the incumbent solution.

r    f_ω^1  f_ω^2  f_ω^3  f_ω^4  f_ω^5  f_ω^6  f_ω^7  f_ω^8  f_ω^9  f_ω^10  Avg
0.1  73.35  15.25  13.55   7.35   5.25   1.00   1.00   1.05   1.90    2.60  12.23
0.5  73.35  33.50  17.85  11.65   5.85   1.00   1.00   1.15   2.40    3.15  15.09
0.9  73.35  33.55  26.30  22.20   6.30   2.00   2.00   2.35   3.70    5.10  17.69
The average computation time for r = 0.9 and ε = 0.005 is slightly higher than the computation time for r = 0.1 and ε = 0.01, while over 8% more iterations are completed. To explain this, recall that for a higher value of r, the incumbent solution is updated less often than for a lower value of r, as can be seen in Table 5. An optimality cut is constructed in each iteration from the subgradient of the iterative sampling approximation at the incumbent solution. For sample ω^i, we find the optimal basis matrix index in iteration l by

k̄_i* = arg max_k { λ_k(ω^i − T x̄^{l−1}) : λ_k ∈ V^l },  i = 1, ..., l.  (86)
If the incumbent solution did not change in iteration l, x̄^l = x̄^{l−1}, the optimal basis matrix index in the following iteration is given by

k̄_i* = arg max_k { λ_k(ω^i − T x̄^l) : λ_k ∈ V^{l+1} },  i = 1, ..., l.  (87)
This implies that if V^l = V^{l−1}, the same basis matrix indices will be chosen for samples ω^i, i = 1, ..., l − 1. Since ψ is calculated from the basis matrix index found in (63), if we obtain the same basis matrix index, ψ has been calculated before and does not have to be calculated again. Even if we obtain new dual vertices, it is likely that for some samples the same basis matrix indices are optimal as in the previous iteration. Hence, the choice for a higher value of r can result in lower computation times for an equal number of iterations.
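The caching argument above can be sketched in Python; `vertices` plays the role of V^l, and `compute_psi` is a hypothetical callback performing the expensive ψ evaluation for a given basis index:

```python
def best_basis_index(omega, Tx_bar, vertices):
    # Index k maximizing lambda_k · (omega - T x̄), cf. (86).
    diff = [o - t for o, t in zip(omega, Tx_bar)]
    return max(range(len(vertices)),
               key=lambda k: sum(l_j * d_j for l_j, d_j in zip(vertices[k], diff)))

def psi_at(k, psi_cache, compute_psi):
    # Compute psi for basis index k only once; later iterations that select
    # the same basis index reuse the cached value.
    if k not in psi_cache:
        psi_cache[k] = compute_psi(k)
    return psi_cache[k]
```

As long as the incumbent and the vertex set do not change, `best_basis_index` returns the same index per sample, so `psi_at` never recomputes ψ.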
The average optimality gap from solving SLBDA(0) for r = 0.9, n = 10 and ε = 0.005 is reasonably low, while being significantly faster than LBDA(0). Additionally, r = 0.9 requires lower computation time for an equal number of iterations than r = 0.1 and r = 0.5. Therefore, we conclude that a good termination criterion is obtained when we choose r = 0.9, n = 10 and ε = 0.005.
4.2 Performance
In this section, the performance of SLBDA(α) is evaluated with n = 10, ε = 0.005 and r = 0.9. We consider ten candidate solutions which are described below. We report average optimality gaps, costs and computation times over 20 randomly generated test instances for each separate distribution of ω as given in Table 1.
4.2.1 Setup
The candidate solutions consist of solutions generated by SLBDA(α) and LBDA(α), for different values of α, and the solution from DEF with a sample size of 100. For each value of α, we consider the solution generated by SLBDA(α) and two solutions generated by LBDA(α), one with sample size 10^3 and one with sample size 10^4.
Consider α = 0 and denote the optimal solution from SLBDA(0) by x̄_0 and the optimal solutions from LBDA(0) by x̂_0^1 with sample size 10^3 and x̂_0^2 with sample size 10^4. Next, to get the convex approximation ṽ_α in (14), we have substituted ψ(ω − Tx) by ψ(ω − α). Hence, α is a substitute for Tx. While the value of Tx changes in each iteration, since the value of x changes in each iteration, we may get a better performing algorithm if we use α close to Tx*, where x* is the optimal solution. Consider the Jensen approximation, in which we substitute the random vector ω by μ = E_ω[ω], resulting in a deterministic model whose optimal solution for μ is denoted by x_μ. Construct α_μ = T x_μ and denote by x̄_JA the solution from solving SLBDA(α_μ), by x̂_JA^1 the solution from solving LBDA(α_μ) with sample size 10^3, and by x̂_JA^2 the solution with sample size 10^4.
We require no second-stage recourse action if ω = Tx. If we do require a recourse action y, we need to satisfy the random constraints Wy + Tx = ω. Since x and y are both used to satisfy ω, Tx is usually lower than ω. This implies that P(ω ≥ Tx) is high. We want to find a value g such that P(ω ≤ g) = p for some p. We choose to set g as the value at the 95th percentile, that is, P(ω ≤ g) = 0.95. Let α̃ be a random vector having a multivariate uniform distribution on [0, g]^m. We consider 100 different values for α, where each value is an independent draw from the distribution of α̃. Note that it is possible to parallelize SLBDA(α) and LBDA(α) over the 100 different values of α. Denote by x̄_α the best solution from solving SLBDA(α) for the 100 different values of α. Similarly, denote the best solutions from solving LBDA(α) for the 100 different values of α by x̂_α^1 with sample size 10^3 and x̂_α^2 with sample size 10^4. We report the maximum computation time over all the different values of α for each candidate solution from SLBDA(α) and LBDA(α).
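A sketch of this sampling scheme in Python; the empirical-percentile estimate of g and all names are our own assumptions:

```python
import random

def draw_alphas(omega_sampler, m, n_alpha=100, p=0.95, n_est=10_000):
    # Estimate g componentwise such that P(omega_j <= g_j) is approximately p,
    # then draw n_alpha independent alphas uniformly from [0, g]^m.
    draws = [omega_sampler() for _ in range(n_est)]   # each draw is a length-m vector
    g = [sorted(d[j] for d in draws)[int(p * (n_est - 1))] for j in range(m)]
    alphas = [[random.uniform(0.0, g_j) for g_j in g] for _ in range(n_alpha)]
    return g, alphas
```

Each of the 100 resulting α vectors can then be handed to an independent SLBDA(α) or LBDA(α) run, which is why the runs parallelize trivially.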
In Table 6 the different candidate solutions are summarized.

Table 6: Definitions of the candidate solutions.

Notation  Definition
x̄_0       Optimal solution from SLBDA(0) for r = 0.9, n = 10 and ε = 0.005
x̄_JA      Optimal solution from SLBDA(α_μ) for r = 0.9, n = 10 and ε = 0.005
x̄_α       Best of SLBDA(α) for 100 different α's for r = 0.9, n = 10 and ε = 0.005
x̂_0^1     Optimal solution from LBDA(0) with sample size S = 1000
x̂_0^2     Optimal solution from LBDA(0) with sample size S = 10000
x̂_JA^1    Optimal solution from LBDA(α_μ) with sample size S = 1000
x̂_JA^2    Optimal solution from LBDA(α_μ) with sample size S = 10000
x̂_α^1     Best of LBDA(α) for 100 different α's with sample size S = 1000
x̂_α^2     Best of LBDA(α) for 100 different α's with sample size S = 10000
x̃_DEF     Optimal solution from DEF with sample size S = 100
4.2.2 Simple integer recourse
Consider the simple integer recourse model

min_x { x + Q(x) : x ∈ R_+ },  (88)

where

Q(x) = E_ω [ min_y { qy : y ≥ ω − x, y ∈ Z_+ } ].  (89)
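The inner minimization in (89) has the closed-form solution y = ⌈ω − x⌉⁺, so Q(x) can be estimated by plain Monte Carlo. The following Python sketch is our own illustration (function names hypothetical), not code from the experiments:

```python
import math
import random

def Q_hat(x, sampler, q=2.0, n=20000):
    # Monte Carlo estimate of Q(x) = E_omega[ min { q*y : y >= omega - x, y in Z_+ } ];
    # the inner minimizer is y = max(0, ceil(omega - x)), so no solver is needed.
    return q * sum(max(0, math.ceil(sampler() - x)) for _ in range(n)) / n
```

As a deterministic check: with ω degenerate at 10 and x = 9.5, the optimal recourse is y = 1, so Q_hat(9.5, lambda: 10.0) returns q · 1 = 2.0.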
Table 7: Average optimality gaps for q = 2.

        x̄_0     x̂_0^1   x̂_0^2   x̄_JA    x̂_JA^1  x̂_JA^2  x̄_α     x̂_α^1   x̂_α^2   x_DEF
f_ω^1   0.3411  0.2935  0.2939  0.3411  0.2935  0.2939  0.0295  0.0270  0.0289  0.0499
f_ω^2   0.2177  0.1593  0.1525  0.2177  0.1593  0.1525  0.0552  0.0591  0.0771  0.0895
f_ω^3   0.1492  0.1804  0.1907  0.1492  0.1804  0.1907  0.0769  0.0781  0.0771  0.1070
f_ω^4   0.1377  0.1321  0.1400  0.1377  0.1321  0.1400  0.0952  0.0952  0.0949  0.1207
f_ω^5   0.2007  0.1807  0.1728  0.2007  0.1807  0.1728  0.1639  0.1607  0.1599  0.2242
f_ω^6   0.6422  0.6421  0.6422  0.3226  0.1170  0.1188  0.0338  0.0294  0.0303  0.0445
f_ω^7   0.2323  0.2135  0.2134  0.2851  0.0588  0.0592  0.0488  0.0487  0.0487  0.0607
f_ω^8   0.1325  0.0845  0.0852  0.1325  0.0845  0.0852  0.0638  0.0867  0.0856  0.0939
f_ω^9   0.1202  0.1259  0.1149  0.1376  0.1116  0.1108  0.1008  0.0986  0.1000  0.1269
f_ω^10  0.1811  0.1784  0.1877  0.1811  0.1784  0.1877  0.1418  0.1423  0.1606  0.1964
f_ω^11  0.4795  0.3032  0.3064  0.4795  0.3032  0.3064  0.0315  0.0158  0.0183  0.0242
f_ω^12  0.3353  0.1759  0.1716  0.3353  0.1759  0.1716  0.0247  0.0783  0.1454  0.0471
f_ω^13  0.1686  0.0834  0.0764  0.1686  0.0834  0.0764  0.0717  0.0717  0.0717  0.0982
f_ω^14  0.1491  0.1268  0.1299  0.1491  0.1268  0.1299  0.1122  0.1114  0.1114  0.1456
f_ω^15  0.2515  0.2193  0.2219  0.2515  0.2193  0.2219  0.2014  0.2009  0.2008  0.2801
Table 9: Average optimality gaps for q = 5.

        x̄_0     x̂_0^1   x̂_0^2   x̄_JA    x̂_JA^1  x̂_JA^2  x̄_α     x̂_α^1   x̂_α^2   x_DEF
f_ω^1   0.3491  0.2789  0.2793  0.3491  0.2789  0.2793  0.046   0.046   0.046   0.082
f_ω^2   0.1676  0.1694  0.1694  0.1676  0.1694  0.1694  0.102   0.1059  0.1059  0.1354
f_ω^3   0.2494  0.1354  0.1132  0.2494  0.1354  0.1132  0.1045  0.107   0.1053  0.161
f_ω^4   0.2479  0.2114  0.207   0.2479  0.2114  0.207   0.1767  0.184   0.1965  0.262
f_ω^5   0.4916  0.3872  0.3751  0.4916  0.3872  0.3751  0.3752  0.3734  0.3752  0.5745
f_ω^6   0.5911  0.5911  0.5911  0.2122  0.0875  0.0883  0.062   0.0639  0.0632  0.1181
f_ω^7   0.0886  0.0833  0.0831  0.2498  0.3914  0.3908  0.0793  0.0803  0.0783  0.1286
f_ω^8   0.139   0.1591  0.1763  0.139   0.1591  0.1763  0.1216  0.1187  0.1221  0.1654
f_ω^9   0.398   0.2088  0.2026  0.3546  0.2283  0.234   0.2017  0.1989  0.199   0.276
f_ω^10  0.7926  0.404   0.3992  0.7926  0.404   0.3992  0.3833  0.3759  0.385   0.5651
f_ω^11  0.3497  0.3032  0.3064  0.3497  0.3032  0.3064  0.016   0.0158  0.0183  0.0611
f_ω^12  0.3574  0.6697  0.6975  0.3574  0.6697  0.6975  0.0199  0.0771  0.5613  0.0802
f_ω^13  0.288   0.2422  0.2386  0.288   0.2422  0.2386  0.0722  0.0829  0.1636  0.1615
f_ω^14  0.2807  0.2084  0.1974  0.2807  0.2084  0.1974  0.1856  0.1839  0.1840  0.2166
f_ω^15  0.6151  0.3128  0.2996  0.6151  0.3128  0.2996  0.2904  0.2898  0.2883  0.4012
Table 10: Average computation times in seconds for q = 5.

        x̄_0     x̂_0^1   x̂_0^2   x̄_JA    x̂_JA^1  x̂_JA^2  x̄_α     x̂_α^1   x̂_α^2   x_DEF
f_ω^1   0.009   0.2255  1.6815  0.009   0.2325  1.6715  0.03    0.36    5.45    0.008
f_ω^2   0.0065  0.241   1.8695  0.01    0.2485  1.876   0.04    0.45    2.11    0.0345
f_ω^3   0.0195  0.3735  3.7285  0.0245  0.3815  3.7365  0.05    0.52    5.42    0.047
f_ω^4   0.042   0.3615  3.6755  0.0465  0.365   3.5815  0.08    0.58    6.86    0.061
f_ω^5   0.075   0.387   2.8165  0.0755  0.392   2.7725  0.16    0.51    3.95    0.0385
f_ω^6   0.002   0.204   1.579   0.071   0.247   1.8925  1.41    0.50    5.58    0.001
f_ω^7   0.072   0.261   2.115   0.043   0.3805  3.1565  0.85    0.48    4.30    0.027
f_ω^8   0.2965  0.318   3.0225  0.2985  0.315   3.009   0.46    0.49    5.26    0.0515
f_ω^9   0.149   0.3465  2.296   0.176   0.348   2.2835  0.39    0.54    3.98    0.071
f_ω^10  0.1295  0.422   3.82    0.1345  0.4345  3.7885  0.25    0.47    6.02    0.043
f_ω^11  0.0065  0.2015  1.5095  0.009   0.203   1.5085  0.02    0.26    2.02    0.003
f_ω^12  0.006   0.377   3.499   0.0095  0.3735  3.553   0.04    0.59    12.94   0.014
f_ω^13  0.014   0.3405  3.143   0.015   0.344   3.10    0.04    0.72    7.66    0.045
f_ω^14  0.0255  0.3545  2.428   0.036   0.3495  2.4415  0.06    0.52    5.54    0.086
f_ω^15  0.04    0.3525  2.376   0.0395  0.369   2.378   0.09    0.40    3.98    0.036
The use of α = α_μ compared to α = 0 has no effect on the average optimality gaps for SLBDA(α) and LBDA(α) for both q = 2 and q = 5 if ω is normally or uniformly distributed. However, there are some differences when ω is exponentially distributed. For q = 2 and q = 5, we observe a mixed effect on the optimality gaps when comparing solutions x̄_JA, x̂_JA^1 and x̂_JA^2 to solutions x̄_0, x̂_0^1 and x̂_0^2, respectively. Furthermore, the average computation times for these solutions differ when ω is exponentially distributed, but show no pattern in relation to the change in average optimality gaps. Hence, choosing α = α_μ shows no improvement to SLBDA(α) and LBDA(α) compared to using α = 0.
From Table 7, we find that the best solution from solving SLBDA(α) for 100 different values of α, x̄_α, has the lowest optimality gap of all candidate solutions for six distributions of ω. In the case when ω ∼ f_ω^11, the optimality gap of x̄_α is twice as high as the optimality gaps of x̂_α^1 and x̂_α^2, which are the lowest of all candidate solutions. For q = 5 the results are similar, but now the average optimality gap of x̄_α for ω ∼ f_ω^11 is close to the lowest optimality gap of all candidate solutions. These results are remarkable since SLBDA(α) is tuned for having a low computation time. Additionally, optimality cuts constructed in SLBDA(α) are less tight than the optimality cuts constructed in LBDA(α). Therefore, we would expect solutions from solving SLBDA(α) to perform generally worse in terms of solution quality than solutions from solving LBDA(α). The error bound given in Section 2 is defined by the absolute value of the difference between the expected second-stage cost function Q and the generalized alpha-approximation Q̃_α. If instances exist for which the generalized alpha-approximation is an upper bound to the expected second-stage cost function, a weaker lower bound to the generalized alpha-approximation could be a better estimate of Q.
In terms of computation time, we see that if ω is normally or uniformly distributed, the solutions generated by SLBDA(α) have lower computation time than the solutions generated by LBDA(α) for any of the considered values of α and sample size for LBDA(α). However, when ω is exponentially distributed, SLBDA(α) has a higher computation time than LBDA(α) for different parameters of the distribution of ω. Hence, when ω is normally or uniformly distributed, solutions from solving SLBDA(α) perform well in terms of solution quality, and are solved significantly faster than solutions from solving LBDA(α).
4.2.3 Mixed integer recourse
Consider the mixed-integer recourse model

min_x { cx + Q(x) : x ∈ R_+^{n_1} },  (90)

where

Q(x) = E_ω [ min_y { qy : Wy ≥ ω − Tx, y ∈ Z_+^p } ].  (91)
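For very small instances, the second-stage problem in (91) can be solved by brute-force enumeration over integer y, which is useful for sanity checks; a real experiment would use a MILP solver. The following sketch and its names are our own assumptions:

```python
import itertools

def second_stage_value(q, W, T, x, omega, y_max=10):
    # Brute-force min { q·y : W y >= omega - T x, y in Z_+^p } with each
    # y_j capped at y_max; returns None if no feasible y exists in the box.
    p = len(q)
    rhs = [om - sum(t * xi for t, xi in zip(row, x))
           for om, row in zip(omega, T)]
    best = None
    for y in itertools.product(range(y_max + 1), repeat=p):
        if all(sum(w * yj for w, yj in zip(row, y)) >= r
               for row, r in zip(W, rhs)):
            val = sum(qj * yj for qj, yj in zip(q, y))
            if best is None or val < best:
                best = val
    return best
```

For instance, with one constraint y ≥ 3 and unit cost 2, the optimal value is 6; if Tx already covers ω, the value is 0.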
For ω ∈ R^m, we consider two cases of the mixed-integer recourse model. A small case is given by n_1 = 10, p = 5, m = 5 and a large case is given by n_1 = 100, p = 40, m = 20. The elements of the parameters are drawn from distributions given by

c_i ∼ U{1, 5},  (92)
q_i ∼ U{6, 10},  (93)
T_{i,j} ∼ U{1, 6},  (94)
W_{i,j} ∼ U{1, 6},  (95)

where U is the discrete uniform distribution. We report average optimality gaps, costs and computation time over 20 randomly generated test instances for each distribution of ω. Average optimality gaps and computation time for the small case (n_1 = 10) are reported in Table 11 and Table 12, respectively. Average costs and computation time for the large case (n_1 = 100) are reported in Table 13 and Table 14, respectively.
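The instance generation in (92)-(95) can be sketched as follows; the helper name is hypothetical, and U{a, b} denotes the discrete uniform distribution on the integers a through b:

```python
import random

def make_instance(n1, p, m, seed=0):
    # Draw c, q, T, W elementwise from the discrete uniform distributions
    # in (92)-(95): c_i ~ U{1,5}, q_i ~ U{6,10}, T_ij ~ U{1,6}, W_ij ~ U{1,6}.
    rng = random.Random(seed)
    c = [rng.randint(1, 5) for _ in range(n1)]
    q = [rng.randint(6, 10) for _ in range(p)]
    T = [[rng.randint(1, 6) for _ in range(n1)] for _ in range(m)]
    W = [[rng.randint(1, 6) for _ in range(p)] for _ in range(m)]
    return c, q, T, W
```

Seeding a local random.Random makes each of the 20 test instances reproducible.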
Table 11: Average optimality gaps (n_1 = 10).

        x̄_0     x̂_0^1   x̂_0^2   x̄_JA    x̂_JA^1  x̂_JA^2  x̄_α     x̂_α^1   x̂_α^2   x_DEF
f_ω^1   0.5359  0.4595  0.4593  0.6912  0.638   0.6388  0.1254  0.1254  0.1254  0.1159
f_ω^2   0.2844  0.1984  0.198   0.4768  0.3715  0.3726  0.0816  0.0829  0.1172  0.1264
f_ω^3   0.1971  0.2017  0.1922  0.3678  0.2354  0.2307  0.0875  0.0875  0.0952  0.1542
f_ω^4   0.2684  0.2716  0.2862  0.2134  0.4982  0.5327  0.1516  0.1582  0.1566  0.2174
f_ω^5   0.2561  0.2476  0.2575  0.3259  0.3197  0.3067  0.1979  0.1963  0.2051  0.3171
f_ω^6   0.4277  0.4314  0.4301  0.3477  0.3659  0.3629  0.0528  0.111   0.1201  0.09
f_ω^7   0.2431  0.1564  0.1552  0.2477  0.136   0.1334  0.1004  0.0993  0.099   0.1325
f_ω^8   0.1615  0.2152  0.1701  0.2196  0.2118  0.2055  0.1378  0.1382  0.1773  0.1609
f_ω^9   0.2213  0.2408  0.2421  0.2729  0.261   0.272   0.1908  0.1969  0.1962  0.2605
f_ω^10  0.3288  0.3077  0.3198  0.3494  0.3267  0.3654  0.2733  0.2743  0.2752  0.3734
f_ω^11  0.5455  0.4735  0.4724  0.6034  0.6543  0.6548  0.1065  0.1406  0.1406  0.0856
f_ω^12  0.4678  0.3667  0.3656  0.5796  0.5475  0.548   0.0184  0.0664  0.0664  0.086
f_ω^13  0.3011  0.1305  0.1238  0.556   0.243   0.2451  0.0472  0.0418  0.0322  0.0867
f_ω^14  0.3978  0.317   0.2581  0.3258  0.6227  0.8345  0.0356  0.0976  0.0678  0.1081
f_ω^15  0.2936  0.2441  0.2399  0.3077  0.2527  0.2271  0.1571  0.1582  0.1673  0.2603
Table 12: Average computation time in seconds (n_1 = 10).
Table 13: Average costs, where the lowest cost is normalized to 100 for each distribution (n_1 = 100).

        x̄_0     x̂_0^1   x̂_0^2   x̄_JA    x̂_JA^1  x̂_JA^2  x̄_α     x̂_α^1   x̂_α^2   x_DEF
f_ω^1   107.06  107.82  107.81  112.26  110.35  110.23  100.00  101.70  101.31  103.94
f_ω^2   105.48  101.97  101.35  109.02  110.55  109.45  100.00  102.58  100.27  103.17
f_ω^3   110.83  110.91  112.40  107.67  105.18  103.92  100.12  100.00  100.70  103.32
f_ω^4   103.26  102.19  101.64  106.38  104.33  104.75  100.29  100.00  100.09  103.20
f_ω^5   102.12  100.64  100.36  103.31  100.54  100.41  100.20  100.00  100.02  EXC
f_ω^6   181.96  144.60  144.18  168.72  145.56  145.54  100.00  143.83  143.82  114.69
f_ω^7   128.48  130.42  130.03  133.89  135.42  134.95  100.00  104.58  105.01  102.91
f_ω^8   108.68  109.79  109.18  107.37  104.82  106.45  102.03  100.00  102.19  106.66
f_ω^9   102.28  101.46  101.10  103.29  101.60  101.17  100.31  100.12  100.00  EXC
f_ω^10  102.76  100.48  100.18  103.12  100.46  100.19  100.44  100.05  100.00  EXC
f_ω^11  105.15  106.84  106.84  111.50  109.06  109.09  101.64  100.01  100.00  102.53
f_ω^12  106.71  106.63  107.06  112.30  113.53  112.15  101.42  100.10  100.00  105.49
f_ω^13  110.57  104.35  103.43  110.02  106.26  106.34  100.00  100.51  100.92  103.82
f_ω^14  106.92  106.61  106.178 107.38  104.90  105.30  100.00  101.38  101.38  103.73
f_ω^15  105.05  104.29  104.15  104.30  103.60  102.87  101.16  100.92  100.00  101.31
Table 14: Average computation time in seconds (n_1 = 100).

        x̄_0     x̂_0^1   x̂_0^2   x̄_JA    x̂_JA^1  x̂_JA^2  x̄_α     x̂_α^1   x̂_α^2    x_DEF
f_ω^1   0.26    56.83   536.92  0.25    63.85   685.57  7.28    192.67  2544.46  2.27
f_ω^2   0.62    49.34   562.16  1.21    59.61   536.00  7.90    112.91  1353.36  5.08
f_ω^3   1.40    45.79   483.46  1.10    49.62   528.56  6.25    74.33   797.38   6.80
f_ω^4   2.71    36.53   372.64  2.08    37.74   364.26  7.56    53.43   406.59   104.24
f_ω^5   1.04    34.56   271.49  0.91    34.91   268.55  3.39    50.92   356.09   EXC
f_ω^6   0.51    5.49    55.78   0.44    5.32    54.79   12.36   5.88    61.69    0.67
f_ω^7   2.27    6.46    64.36   1.74    6.20    63.02   13.05   7.47    70.05    1.43
f_ω^8   3.22    9.17    90.38   2.38    8.83    89.78   8.86    12.07   113.66   30.06
f_ω^9   1.70    21.50   149.05  1.82    20.63   144.49  6.56    26.38   172.40   EXC
f_ω^10  1.22    36.46   197.09  1.10    36.01   199.65  3.19    41.73   245.49   EXC
f_ω^11  0.27    47.47   482.91  0.22    56.60   566.22  6.06    209.66  2077.81  1.72
f_ω^12  0.55    47.17   482.69  0.69    51.14   534.45  4.20    138.50  1672.70  1.74
f_ω^13  0.50    46.47   465.25  0.78    51.61   543.66  4.81    67.63   725.77   2.04
f_ω^14  1.05    35.05   403.37  0.75    30.29   313.35  3.90    44.96   617.59   5.45
f_ω^15  1.47    28.73   328.19  1.11    27.66   308.76  5.15    38.66   356.53   26.23
The general trends in the results from the simple integer recourse model are similar to the results for the small and large cases of the mixed-integer recourse model. Solving SLBDA(α_μ) and LBDA(α_μ) has mixed effects on the optimality gaps, costs and computation times compared to solving SLBDA(0) and LBDA(0), respectively. This shows that solving SLBDA(α) or LBDA(α) using α = α_μ instead of α = 0 has no added benefit for solution quality and computation time for both simple integer and mixed-integer recourse models.
We observe that the optimality gaps of solutions from solving SLBDA(α) for 100 different values of α, x̄_α, are lower than those of any of the other candidate solutions for nine of the distributions for the small case.
From Tables 12 and 14 we see that the computation time of x̄_α is generally lower than that of x̂_α^1 and x̂_α^2. When the variability of ω is low, solutions from solving SLBDA(α) for 100 different values of α generally have higher computation times than solutions from solving DEF. However, for distributions of ω with high variance, the computation times of solutions x̄_α are lower than those of solutions from solving LBDA(α) for 100 different values of α and of x_DEF. This holds especially for the large case, where SLBDA(α) is faster than LBDA(α) and DEF when ω is normally distributed with σ² > 1, exponentially distributed with σ² ≥ 1, or uniformly distributed with σ² ≥ 10. Hence, for mixed-integer recourse models we observe that when ω is normally or uniformly distributed with high variability, solutions from solving SLBDA(α) perform well in terms of solution quality, while being significantly faster than LBDA(α) and DEF.
5 Conclusion
We consider two-stage mixed-integer recourse models with right-hand side uncertainty. These models are solved by substituting a convex approximation called the generalized alpha-approximation for the expected second-stage cost function. Since there is little literature on how to solve these approximations, we contribute by applying the concepts from stochastic decomposition (Higle and Sen (1991)) to generalized alpha-approximations and formulate the stochastic loose Benders decomposition algorithm, called SLBDA(α). We sample iteratively and construct in each iteration a new sample average approximation (SAA) to the generalized alpha-approximation. We call these iterative sample approximations. The two-stage mixed-integer recourse model with the generalized alpha-approximation is solved by applying Benders decomposition to the iterative sample approximations. Additionally, we propose a termination criterion that tracks the estimate of the objective function at the incumbent solution, a solution for which the estimate of the objective value is considered sufficiently low.
The advantage of using SLBDA(α) over solution methods such as LBDA(α) by van der Laan and Romeijnders (2018), or solving the model in its deterministic equivalent formulation (DEF), is the significantly lower computation time that is required. For most distributions we test, SLBDA(α) is significantly faster than LBDA(α) and DEF for simple integer recourse and mixed-integer recourse models. Additionally, the best solution from solving SLBDA(α) for different values of α scales well to larger problem instances. Especially when the variability in the random parameters is large enough, the solutions from solving SLBDA(α) perform well in terms of solution quality and outperform the other solution methods in terms of computation time.
References
Benders, J. F. (1962). Partitioning procedures for solving mixed-variables programming problems. Numerische Mathematik 4, 238–252.
Birge, J. R. and F. V. Louveaux (1988). A multicut algorithm for two-stage stochastic linear programs. European Journal of Operational Research 34, 384–392.
Carøe, C. C. and R. Schultz (1999). Dual decomposition in stochastic integer programming. Operations Research Letters 24 (1-2), 37–45.
Carøe, C. C. and J. Tind (1998). L-shaped decomposition of two-stage stochastic programs with integer recourse. Mathematical Programming 83(3), 451–464.
Higle, J. L. and S. Sen (1991). Stochastic decomposition: an algorithm for two-stage linear programs with recourse. Mathematics of Operations Research 16(3), 650–669.
Laporte, G. and F. V. Louveaux (1993). The integer L-shaped method for stochastic integer programs with complete recourse. Operations Research Letters 13, 133–142.
Löhndorf, N. and A. Shapiro (2019). Modeling time-dependent randomness in stochastic dual dynamic programming. European Journal of Operational Research 273(2), 650–661.
Mak, W. K., D. P. Morton, and R. K. Wood (1999). Monte Carlo bounding techniques for determining solution quality in stochastic programs. Operations Research Letters 24(1–2), 47–56.
Parisio, A. and C. N. Jones (2015). A two-stage stochastic programming approach to employee schedul-ing in retail outlets with uncertain demand. Omega 53, 97–103.
Rinnooy Kan, A. H. G. and L. Stougie (1988). Stochastic integer programming. In Yu. Ermoliev and R. J.-B. Wets (Eds.), Numerical Techniques for Stochastic Optimization, Volume 10 of Springer Series in Computational Mathematics, pp. 201–213. Springer.
Romeijnders, W., R. Schultz, M. H. van der Vlerk, and W. K. Klein Haneveld (2016a). A convex approximation for two-stage mixed-integer recourse models with a uniform error bound. SIAM Journal on Optimization 26(1), 426–447.
Romeijnders, W., M. H. van der Vlerk, and W. K. Klein Haneveld (2016b). Total variation bounds on the expectation of periodic functions with applications to recourse approximations. Mathematical Programming 157(1), 3–46.
Ruszczyński, A. (1986). A regularized decomposition method for minimizing a sum of polyhedral functions. Mathematical Programming 35(3), 309–333.
Schultz, R., L. Stougie, and M. H. van der Vlerk (1998). Solving stochastic programs with integer recourse by enumeration: a framework using Gröbner basis reductions. Mathematical Programming 83(2), 229–252.
Sen, S. and J. L. Higle (2005). The C3 theorem and a D2 algorithm for large scale stochastic mixed-integer programming: set convexification. Mathematical Programming 104(1), 1–20.
Sen, S. and H. D. Sherali (2006). Decomposition with branch-and-cut approaches for two-stage stochastic mixed-integer programming. Mathematical Programming 106(2), 203–223.
van der Laan, N. and W. Romeijnders (2018). Generalized alpha-approximations for two-stage mixed-integer recourse models. SOM Research Report, volume 2018015-OPERA, University of Groningen, SOM research school.