
On the interface of Stochastic Programming and inventory control

Niels van der Laan

July 27, 2018

Abstract

In inventory management, costs can be saved through causal demand forecasting. Causal demand forecasting means exploiting the dependency of demand on external variables, such as Google search data or weather data, to improve inventory performance. In this paper, we consider a single-item newsvendor model with service level restrictions. We study data-driven approaches to the problem of determining an optimal inventory level using causal demand forecasting. In a data-driven approach, the inventory decision is based directly on the historical observations on demand and related variables instead of specifying an intermediate demand forecast. We show that existing data-driven approaches for inventory management using causal demand forecasting achieve below-target service levels. We improve these methods to achieve on-target performance. Moreover, we propose three new approaches for inventory management using causal demand forecasting. These methods make different distributional assumptions, which provides flexibility to practitioners. Finally, we extend our results to the multi-item case.

1 Introduction

One of the key issues in inventory control is demand forecasting. Indeed, good demand forecasts are needed to achieve a prescribed customer service level while minimizing costs. Demand forecasts are most often based on historical demand data and are potentially inaccurate. Therefore, safety stocks are imposed on top of the demand forecast to safeguard against shortages. However, large safety stocks are costly, for example due to holding costs. This implies a trade-off, which is mitigated if the forecast accuracy is improved. One way of doing so is to exploit the dependency of demand on related variables, such as price (changes), weather, or season. For example, during a sale, demand is expected to increase. This is called causal demand forecasting [Beutel and Minner, 2012].

Traditionally, safety stock planning is a two-step procedure. First, a demand forecast is generated based on historical demand observations. Second, an optimal inventory level is determined using standard techniques, taking as given the demand forecast. More recent approaches combine these steps into an integrated approach. In other words, the optimal inventory level is based directly on the historical observations. This is known as data-driven optimization [Bertsimas and Thiele, 2006]. A data-driven approach is useful in data-rich environments, where there are observations on many different variables. In particular, data-driven optimization can be applied to causal demand forecasting, see e.g. Beutel and Minner [2012], Bertsimas and Kallus [2014] and Ban and Rudin [2017]. Because this paper is closely related to their work, we discuss their results further in Section 2.1.

In this paper, we propose new methods and improve existing methods for safety stock planning using causal demand forecasting, in both single- and multi-item settings.

The contributions of this paper are threefold. First, we improve a data-driven approach for safety stock planning using causal demand forecasting by Beutel and Minner [2012]. Using a mixed-integer programming framework, they base the safety stock level directly on historical observations, thereby circumventing the need to estimate a demand distribution. We discuss their approach in detail in Section 2.1. While Beutel and Minner [2012] do not make distributional assumptions, they observe that the resulting service levels are typically below target and that there is a sample size effect: as the number of historical observations increases, their method performs better. We offer an explanation for this sample size effect, based on the insight that their method can be interpreted as a Sample Average Approximation (SAA) of chance constraints. These concepts are discussed in Section 2.2. Based on this insight, we improve their method and derive bounds on the sample size required to achieve on-target performance.

Second, we propose three new data-driven approaches for single-item safety stock planning using causal demand forecasting. These methods are based on results from chance-constrained programming. The three methods make different types of distributional assumptions. The first method assumes normality, the second assumes that only the first- and second-order moments are known, and the third considers all probability measures within a given distance of a central probability measure, which we estimate based on data. We show that by making additional distributional assumptions, better results can be obtained, provided the assumptions are correct.

Third, we consider safety stock planning using causal demand forecasting in a multi-item setting. In particular, we show that in the multi-item case, service level restrictions can be formulated as joint chance constraints. Using a Bonferroni approximation for joint chance constraints, we extend the methods we propose to the multi-item case. Joint chance constraints and Bonferroni approximations are discussed in Section 2.2.1.

The remainder of this paper is organized as follows. In Section 2, we review some of the literature on data-driven safety stock planning using causal demand forecasting, as well as (integrated) chance constraints and SAA. In Section 3, we suggest improvements to the integrated approach by Beutel and Minner [2012] and we present bounds on the required sample size. Moreover, we evaluate these bounds using numerical experiments. In Section 4, we present new methods for safety stock planning using causal demand forecasting. Section 5 presents the extension to multiple items. Section 6 concludes.

2 Literature review

We first discuss data-driven approaches to safety stock planning using causal demand forecasting. In particular, we describe the mixed-integer programming approach by Beutel and Minner [2012]. Next, we go over (integrated) chance constraints and SAA.

2.1 Data-driven safety stock planning


Beutel and Minner [2012] study a single-item one-period newsvendor model with zero lead time. In the review period (period N + 1), the firm has to make a decision on the inventory level I = q(XN +1) before demand D is known. The objective is to minimize expected holding costs while meeting a service level restriction. Holding costs are due over the surplus inventory (I −D)+, where (s)+:= max{0, s}. Demand D in the review period is a random variable with an unknown distribution. However, there are demand observations Di, i = 1, . . . , N , in the previous N periods. Additionally, in all N + 1 periods, there are observations on m variables influencing demand Xi = (Xi1, . . . , Xim), i = 1, . . . , N + 1. The firm can exploit possible dependencies between Xi and Di to obtain a more accurate demand forecast in period N + 1.

Beutel and Minner [2012] consider two types of service level restrictions. First, ready-rate restrictions impose that all demand is met with a prescribed probability. Second, fill-rate restrictions impose that a prescribed fraction of demand is met. More formally, let D and I denote the demand and inventory level in period N + 1. Note that D is random and that the firm can decide on I. A ready-rate restriction is represented by

P(I ≥ D) ≥ 1 − α,

and a fill-rate restriction is given by E[(D − I)+] ≤ αE[D], where 1 − α is the required service level.

Beutel and Minner [2012] impose a linear decision rule: I = q(XN +1) = XN +1β, where β is a vector of decision variables. In case of a ready-rate restriction, their integrated approach is to solve the mixed-integer program (1)-(7):

min Σ_{i=1}^N yi  (1)
s.t. yi ≥ Xiβ − Di, i = 1, . . . , N,  (2)
Xiβ + γi M ≥ Di, i = 1, . . . , N,  (3)
Σ_{i=1}^N γi ≤ αN,  (4)
γi ∈ {0, 1}, i = 1, . . . , N,  (5)
yi ≥ 0, i = 1, . . . , N,  (6)
β unrestricted.  (7)

The idea is to choose β so that in hindsight, if the inventory level in period i was Xiβ, then the holding costs over the previous N periods were minimized, while demand was met in at least (1 − α)N periods. Here, yi represents the surplus inventory over which holding costs are due. This is enforced by constraints (2) and (6), which ensure that yi = (Xiβ − Di)+. Without loss of generality, we assume unit holding costs, leading to the objective function in (1). Constraints (3) enforce that Xiβ ≥ Di, that is, demand should have been met in period i, unless γi= 1. Together with constraint (4), this ensures that demand is met in at least (1 − α)N of the previous periods.
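To make the formulation concrete, the following is a minimal sketch of (1)-(7) in Python, assuming the open-source PuLP modelling library; the function name, the default big-M value, and the use of the bundled CBC solver are our own choices and not part of Beutel and Minner's original implementation.

```python
import pulp

def ready_rate_mip(X, D, alpha, M=1e6):
    """Solve (1)-(7): X is a list of N feature vectors, D a list of N demands."""
    N, m = len(D), len(X[0])
    prob = pulp.LpProblem("ready_rate", pulp.LpMinimize)
    beta = [pulp.LpVariable(f"beta_{j}") for j in range(m)]                   # unrestricted, (7)
    y = [pulp.LpVariable(f"y_{i}", lowBound=0) for i in range(N)]             # surplus inventory, (6)
    gamma = [pulp.LpVariable(f"gamma_{i}", cat="Binary") for i in range(N)]   # stock-out indicators, (5)
    prob += pulp.lpSum(y)                                                     # holding costs, (1)
    for i in range(N):
        Xb = pulp.lpSum(X[i][j] * beta[j] for j in range(m))
        prob += y[i] >= Xb - D[i]                                             # (2)
        prob += Xb + gamma[i] * M >= D[i]                                     # (3)
    prob += pulp.lpSum(gamma) <= alpha * N                                    # (4)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [b.value() for b in beta]
```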

In case of a fill-rate restriction, constraints (2)-(4) are replaced by constraints (8)-(10):

si ≤ Xiβ, i = 1, . . . , N,  (8)
si ≤ Di, i = 1, . . . , N,  (9)
Σ_{i=1}^N si ≥ (1 − α) Σ_{i=1}^N Di.  (10)

Here, si is the demand that was fulfilled in period i. Indeed, constraints (8)-(9) enforce that si = min{Xiβ, Di}. Hence, constraint (10) is the fill-rate constraint over the previous N periods.

Data-driven safety stock planning is subject to the risk of overfitting [Ban and Rudin, 2017]. This means that the decision rule q performs well in-sample, but poorly out-of-sample. Beutel and Minner [2012] observe a similar phenomenon: while their method enforces that service level constraints are met in-sample, out-of-sample performance is below target.
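Analogously, a minimal sketch of the fill-rate variant with constraints (8)-(10) as described above (again assuming PuLP; the helper name fill_rate_mip is ours) would be:

```python
import pulp

def fill_rate_mip(X, D, alpha):
    """Minimize holding costs subject to the in-sample fill-rate constraint."""
    N, m = len(D), len(X[0])
    prob = pulp.LpProblem("fill_rate", pulp.LpMinimize)
    beta = [pulp.LpVariable(f"beta_{j}") for j in range(m)]
    y = [pulp.LpVariable(f"y_{i}", lowBound=0) for i in range(N)]   # surplus inventory
    s = [pulp.LpVariable(f"s_{i}", lowBound=0) for i in range(N)]   # fulfilled demand
    prob += pulp.lpSum(y)
    for i in range(N):
        Xb = pulp.lpSum(X[i][j] * beta[j] for j in range(m))
        prob += y[i] >= Xb - D[i]
        prob += s[i] <= Xb                                          # (8)
        prob += s[i] <= D[i]                                        # (9)
    prob += pulp.lpSum(s) >= (1 - alpha) * sum(D)                   # (10)
    prob.solve(pulp.PULP_CBC_CMD(msg=False))
    return [b.value() for b in beta]
```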

2.2 Chance constraints and integrated chance constraints

(Integrated) chance constraints are special types of constraints known from Stochastic Programming. They enforce that a random goal constraint holds with some prescribed probability, see for example Prékopa [2003]. A typical chance-constrained programming problem has the following form [Luedtke and Ahmed, 2008]:

min{f (x) : x ∈ X, P[G(x, ξ) ≤ 0] ≥ 1 − α}, (11)

where ξ is a random variable with support Ξ ⊆ R^m and known distribution, x ∈ X is a vector of decision variables, and G : X × Ξ → R is a known function.

While chance constraints have many applications, they are generally intractable [Nemirovski and Shapiro, 2006]. The reason is that even for fixed x, the probability P[G(x, ξ) ≤ 0] requires computing a multi-dimensional integral. Furthermore, the feasible region

{x ∈ X : P[G(x, ξ) ≤ 0] ≥ 1 − α}

is in general non-convex, even if X is convex. This implies that solving (11) is very challenging.

Integrated chance constraints [Klein-Haneveld and Van der Vlerk, 2006] are expected value constraints of the form

E[G(x, ξ)] ≤ r, (12)

where G and the distribution of ξ are known. Evaluation of E[G(x, ξ)] is in general again very challenging and optimization under integrated chance constraints may be difficult, if at all possible. 2.2.1 Joint chance constraints

Suppose G is a multidimensional function, that is, G(x, ξ) = (G1(x, ξ), . . . , Gk(x, ξ)). Consider the joint chance constraint

P[G1(x, ξ) ≤ 0 ∧ · · · ∧ Gk(x, ξ) ≤ 0] ≥ 1 − α. (13)

It follows from Bonferroni’s inequality that (13) can be conservatively approximated by k individual chance constraints:

P[Gi(x, ξ) ≤ 0] ≥ 1 − αi, i = 1, . . . , k,

where Σ_{i=1}^k αi ≤ α [Zymler et al., 2013]. For example, we can take αi = α/k, i = 1, . . . , k.

2.2.2 Sample average approximations

One approach to deal with the difficulties of (integrated) chance constraints is sampling, see for example Luedtke and Ahmed [2008] or Pagnoncelli et al. [2009]. The idea is to obtain a random sample ξ(1), . . . , ξ(N ) of size N from the distribution of ξ, and to replace the chance constraint

P[G(x, ξ) ≤ 0] ≥ 1 − α (14)

by the sample average approximation

(1/N) Σ_{i=1}^N 1(G(x, ξ(i)) ≤ 0) ≥ 1 − α,  (15)

where 1 denotes the indicator function. Alternatively, the scenario approximation imposes that

G(x, ξ(i)) ≤ 0, i = 1, . . . , N.

The latter constraint has the additional advantage that it is convex. Note that it is more conservative, because more values of x ∈ X are excluded. The sample average approximation of the integrated chance constraint in (12) reads

(1/N) Σ_{i=1}^N G(x, ξ(i)) ≤ r.

Sampling is a technique to obtain a tractable approximation of (integrated) chance-constrained programming problems. However, we can use the same techniques if we only have a sample at our disposal and we do not want to specify the distribution of ξ. This is valid if the observations in the sample are drawn independently from the distribution of ξ. As we show in Section 3, this is equivalent to the data-driven approach by Beutel and Minner [2012]. The advantage of this approach is that it does not make distributional assumptions such as normality. The disadvantage is that the sample size is not under our control. This is unfortunate, because the quality of SAA improves as the sample size increases.
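As a small illustration (our own sketch; G stands for whatever goal-constraint function and sample the application supplies), the sample-based approximations above amount to the following feasibility checks for a candidate decision x:

```python
import numpy as np

def saa_feasible(G, x, xi_samples, alpha):
    """SAA (15): the fraction of sampled scenarios in which the goal constraint
    holds must be at least 1 - alpha."""
    vals = np.array([G(x, xi) for xi in xi_samples])
    return np.mean(vals <= 0) >= 1 - alpha

def scenario_feasible(G, x, xi_samples):
    """Scenario approximation: the goal constraint must hold in every sampled scenario."""
    return all(G(x, xi) <= 0 for xi in xi_samples)

def integrated_saa_feasible(G, x, xi_samples, r):
    """SAA of the integrated chance constraint (12): the sample mean of G is at most r."""
    return np.mean([G(x, xi) for xi in xi_samples]) <= r
```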

Pagnoncelli et al. [2009] show that in case of chance constraints, under mild conditions, the optimal value of the SAA problem converges to the optimal value of the true problem (11) as N → ∞. For finite N , this is not the case. Moreover, the optimal solution of the SAA problem may be infeasible in (11). Pagnoncelli et al. [2009] suggest replacing the 1 − α in (15) by 1 − α0, where α0 < α, to arrive at a more conservative approximation. However, they do not specify hard rules for the choice of α0. Note that if α0 = 0, then we obtain the scenario approximation.

The probability that the optimal solution of the SAA problem or the scenario approximation problem is contained in the feasible region of the true problem (11) is referred to as solution reliability. Intuitively, we may have a ’bad sample’, so that the approximating problem is a poor approximation of the true problem. As the sample size increases, the probability of drawing a bad sample decreases, that is, the solution becomes more reliable. Calafiore and Campi [2006] and Luedtke and Ahmed [2008] derive bounds on the solution reliability as a function of sample size. We state the results that are relevant to our analysis below. For integrated chance constraints, we refer to Wang and Ahmed [2008] for results on convergence of SAA.

Theorem 1. Suppose that X ⊆ Rn is convex, f is convex, and G is convex in x for every ξ. Then the optimal solution of the scenario approximation problem

min{f (x) : x ∈ X, G(x, ξ(i)) ≤ 0, i = 1, . . . , N },

is feasible in the true problem (11) with probability at least 1 − δ for

N ≥ (2/α) log(1/δ) + 2n + (2n/α) log(2/α).

Proof. See Calafiore and Campi [2006].
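For reference, the sample size bound of Theorem 1 is easy to evaluate; a short sketch (the helper name is ours):

```python
import math

def scenario_sample_size(alpha, delta, n):
    """Smallest integer N satisfying the bound of Theorem 1:
    N >= (2/alpha) log(1/delta) + 2n + (2n/alpha) log(2/alpha)."""
    return math.ceil(2 / alpha * math.log(1 / delta)
                     + 2 * n
                     + 2 * n / alpha * math.log(2 / alpha))

# Example: scenario_sample_size(0.1, 0.5, 2) gives the N needed for 50% reliability
# when alpha = 0.1 and there are n = 2 decision variables.
```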

Luedtke and Ahmed [2008] consider the approximating constraint

(1/N) Σ_{i=1}^N 1(G(x, ξ(i)) + γ ≤ 0) ≥ 1 − α0,  (16)


Theorem 2. Write Xα := {x ∈ X : P[G(x, ξ) ≤ 0] ≥ 1 − α} for the feasible region of (11). For α0 ∈ [0, α) and γ > 0, write

X^N_{α0} := {x ∈ X : (1/N) Σ_{i=1}^N 1(G(x, ξ(i)) + γ ≤ 0) ≥ 1 − α0}

for the feasible region of the approximating problem. Suppose X is bounded and let R := sup{||x − x'||∞ : x, x' ∈ X}, where || · ||∞ denotes the maximum norm. Suppose

|G(x, ξ) − G(x', ξ)| ≤ L||x − x'||∞ for all x, x' ∈ X and all ξ ∈ Ξ.

Then for any β ∈ (0, α − α0),

P[X^N_{α0} ⊆ Xα] ≥ 1 − ⌈1/β⌉ ⌈2LR/γ⌉^n exp{−2N(α − α0 − β)²}.  (17)

Proof. See Luedtke and Ahmed [2008].

Note that Theorem 1 deals with the reliability of the optimal solution of the scenario approximation problem. In contrast, Theorem 2 deals with the reliability of a feasible solution of the approximating problem based on (16). As a result, the lower bound presented in Theorem 2 can be sharpened if just the optimal solution is considered. To our knowledge, results of this type are not available.

3 Sample average approximation: solution quality

In this section, we consider the integrated approach by Beutel and Minner [2012] for safety stock planning using causal demand forecasting. First, we interpret their approach as an SAA of a chance constraint. Next, we use Theorems 1 and 2 to bound the sample size required to achieve a prescribed solution reliability. In addition, we use Theorem 1 to derive a lower bound on the expected service level. Moreover, we estimate solution reliability and mean service level for a specific demand specification using numerical experiments. This allows us to compare the estimated reliability and mean service level to the theoretical lower bounds.

The reason we also consider mean service level is that a typical firm stocks many items and is mostly interested in achieving on-target service levels averaged over all items. Thus, the goal is to achieve on-target service levels in expectation. In contrast, a high solution reliability of, say, 99%, means that the service level is above target in 99% of the cases, so that the mean service level far exceeds the target.

We show that for a ready rate restriction, the integrated approach by Beutel and Minner [2012] can be interpreted as an SAA of the chance constraint

P[XN+1β ≥ DN+1] ≥ 1 − α.  (18)

Indeed, Beutel and Minner [2012] assume that a sample (X1, D1), . . . , (XN, DN) of size N from the joint distribution of (XN+1, DN+1) is available. The sample average approximation of (18) based on this sample is

(1/N) Σ_{i=1}^N 1(Xiβ ≥ Di) ≥ 1 − α.  (19)

In a mixed-integer program, (19) can be enforced by the constraints (3) and (4). In fact, (3) ensures that 1(Xiβ ≥ Di) = 1 − γi, so that (4) is equivalent to (19).

The scenario approximation of (18) is given by

Xiβ ≥ Di, i = 1, . . . , N.  (20)

Alternatively, we can approximate (18) by

(1/N) Σ_{i=1}^N 1(Xiβ ≥ Di) ≥ 1 − α0,  (21)

where α0 ≤ α.

In anticipation of the results in the remainder of this section, we recommend using the scenario approximation (20) for moderate sample sizes (N ≤ 150). Only for large sample sizes (N > 150) is the use of the approximating constraint (21) justified. However, hard rules for the choice of α0 are difficult to specify for realistic sample sizes (N < 1500), even when armed with Theorem 2.

Using Theorem 1, we find a lower bound on the sample size required to achieve a prescribed reliability of the optimal solution β̂ of the scenario approximation. Here, reliability refers to the probability that β̂ is feasible with respect to (18). Figure 1 shows the required sample size as a function of the prescribed solution reliability if n = 2, that is, β ∈ R², and α = 0.1, meaning that in the review period, all demand must be met with 90% probability. Roughly speaking, 250 observations are sufficient to be reasonably confident about the scenario approximation solution. Even if just 50% reliability is required, we still need at least 130 observations.

Figure 1: Sample size N required to achieve a prescribed reliability of the scenario approximation solution (n = 2 and α = 0.1).

We can also use Theorem 1 to derive a lower bound on the expected service level. Indeed, with a probability of at least

1 − exp{−(α/2)(N − 2n − (2n/α) log(2/α))},

the optimal solution of the scenario approximation problem is feasible in (11), so that the resulting service level is at least 1 − α. This is true for any α ∈ (0, 1). Thus, for any α ∈ (0, 1), the quantity

M(α) := (1 − α)(1 − exp{−(α/2)(N − 2n − (2n/α) log(2/α))})

is a lower bound on the expected service level. We compute sup_{α∈(0,1)} M(α) to find the best bound on the expected service level implied by Theorem 1. Figure 2 shows this lower bound as a function of N (n = 2). Note that in order to achieve an expected service level of 85%, we need at least N = 200 observations.
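The lower bound is straightforward to evaluate numerically; a sketch (assuming NumPy; the grid over α is our own choice):

```python
import numpy as np

def expected_service_level_lb(N, n):
    """Best lower bound sup over alpha in (0, 1) of M(alpha) implied by Theorem 1."""
    a = np.linspace(1e-4, 0.999, 10_000)
    M = (1 - a) * (1 - np.exp(-(a / 2) * (N - 2 * n - (2 * n / a) * np.log(2 / a))))
    return max(0.0, float(M.max()))  # the expected service level is trivially >= 0
```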

In a controlled numerical experiment, we can estimate the reliability and mean service level of the solutions obtained using SAA and the scenario approximation for a given demand specification. We compare this to the theoretical lower bounds implied by Theorem 1. We adopt the setup of Beutel and Minner [2012]. They consider the demand specification

D = a − bp + u, (22)

where a and b are fixed constants, drawn from a uniform distribution on [1000, 2000] and [500, 1000], respectively. The price level p is drawn from a uniform distribution on [0, 1]. The error term u is normally distributed with mean 0 and variance σ2, where σ2 is such that at mean price, the coefficient of variation of demand is equal to cv = 0.3.

First, Beutel and Minner [2012] generate a price-demand sample of size N based on this specification. In the notation of Section 2.1, Xi = (1, pi), where pi denotes the price observation in period i. Next, they compute the optimal β = (β1, β2) in (1)-(7). To estimate the resulting service level, they generate m = 100000 out-of-sample price-demand observations (p_i^o, D_i^o), i = 1, . . . , m, using the earlier demand specification. An estimate of the service level is then given by

ŝl = (1/m) Σ_{i=1}^m 1(β1 + β2 p_i^o ≥ D_i^o),

that is, the fraction of out-of-sample observations for which demand is satisfied.

In order to obtain a good estimate of the service level, this experiment is repeated k = 500 times to obtain k service level estimates ŝl_1, . . . , ŝl_k. Beutel and Minner [2012] then compute the average of these estimates to obtain a good estimate of the mean service level. Here, we are also interested in the solution reliability, which we estimate by

(1/k) Σ_{i=1}^k 1(ŝl_i ≥ 1 − α),

i.e. the fraction of experiments for which the service level estimate is on- or above target.

Figure 2 shows the estimated mean service levels of the SAA solution and the scenario approximation solution as a function of sample size. Using the aforementioned demand specification, the scenario approximation method yields on-target service levels even for small sample sizes (N = 20). Note that the estimated mean service levels far exceed the lower bound. However, mean service levels of the SAA solution are below target, even for large sample sizes (N = 200). This supports our recommendation to use the scenario approximation method.
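For concreteness, one replication of the experiment described above can be sketched as follows (our own code; fit_beta is a placeholder for any estimation method, e.g. the mixed-integer program of Section 2.1 or the scenario approximation):

```python
import numpy as np

rng = np.random.default_rng(0)

def run_experiment(N, alpha, fit_beta, m=100_000, cv=0.3):
    """One replication: draw the demand model, fit beta on N in-sample observations,
    and estimate the out-of-sample service level on m fresh observations."""
    a = rng.uniform(1000, 2000)
    b = rng.uniform(500, 1000)
    sigma = cv * (a - b * 0.5)                 # cv = 0.3 at the mean price p = 0.5

    def draw(size):
        p = rng.uniform(0, 1, size)
        D = a - b * p + rng.normal(0, sigma, size)
        return np.column_stack([np.ones(size), p]), D

    X, D = draw(N)                             # historical sample
    beta = np.asarray(fit_beta(X, D, alpha))   # e.g. ready_rate_mip or scenario approximation
    X_out, D_out = draw(m)                     # out-of-sample evaluation
    return np.mean(X_out @ beta >= D_out)      # service level estimate

# Averaging the estimates over k = 500 replications gives the mean service level;
# the fraction of replications with estimate >= 1 - alpha estimates the solution reliability.
```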


Figure 2: Mean service level lower bounds and estimates as a function of sample size: Sample Average Approximation and scenario approximation. (n = 2 and α = 0.1).


Figure 3: Theoretical reliability of scenario approximation solution versus sample size (solid), estimated reliability of scenario approximation solution under normality (connected dots), and estimated reliability of SAA solution under normality (dots). (n = 2, α = 0.1).


Based on Theorem 2, we can assess the reliability resulting from the approximating constraint (21) for different values of α0. In order to apply Theorem 2, we need an upper bound on the diameter R of the feasible region X. Note that in our case X = R², but we can realistically restrict our attention to a bounded subset of R². For the demand specification under consideration, R = 1000 is a reasonable choice. Additionally, we need a Lipschitz constant L for the function G(β; p, D) = −β1 − β2p + D. Note that

|G(β; p, D) − G(β'; p, D)| = |β'1 − β1 + (β'2 − β2)p| ≤ (1 + p)||β − β'||∞.

Because the support of p is [0, 1], we can take L = 2. Note that in practice L and D are unknown; here, we made use of the demand specification, which is unknown in practice. Finally, we set γ = 1 and we choose β ∈ (0, α − α0) to maximize the expression in (17).

Figure 4 shows lower bounds on the reliability of any feasible solution with respect to (21) for α0 ∈ {0.01, 0.025, 0.05, 0.075}. Note that the lower bounds in Figure 3 are much higher because that figure shows the reliability of the optimal solution of the scenario approximation. Unfortunately, the required sample sizes implied by Figure 4 are unrealistically large. Even if α0 = 0.01, we need at least 1500 observations to obtain a reasonable solution reliability.

For example, if α0 = 0.075, the approximating constraint (21) requires that demand is met in ⌈0.925 · 10⌉ = 10 out of N = 10 cases, but only in ⌈0.925 · 20⌉ = 19 out of N = 20 cases. Hence, if α0 = 0.075, the SAA solution is more conservative if N = 10 compared to N = 20.

Figure 4: Lower bound on solution reliability of any feasible solution with respect to (21) as a function of sample size for α0 ∈ {0.01, 0.025, 0.05, 0.075} (α = 0.1, n = 2).


Figure 5: Estimated reliability of SAA solution for α0 ∈ {0.01, 0.025, 0.05, 0.075} under normality (n = 2, α = 0.1).

4 Safety stock planning using causal demand forecasting

In the previous section, we have seen that large sample sizes are required to obtain reliable solutions using the data-driven approach for safety stock planning by Beutel and Minner [2012]. This is unfortunate, because the sample size is not under control of the practitioner. The scenario approximation method mitigates this issue by using a more conservative approximating constraint. However, even under normality, the scenario approximation method is still unreliable for small, realistic sample sizes. Furthermore, there is no useful guarantee on the achieved expected service level using either method.

This motivates us to propose three new methods for safety stock planning using causal demand forecasting. They are based on results from chance-constrained programming. The advantage of the methods we propose is that the resulting solution reliability is under control of the practitioner, even for small sample sizes. Moreover, the alternative methods make varying distributional assumptions, providing flexibility of use. If we make additional assumptions on the distribution of (XN+1, DN+1), then we are able to achieve lower expected holding costs. However, if the distributional assumptions are invalid, then the resulting solution is likely to be infeasible with respect to the service level restriction.

4.1 Normality assumption

Suppose that XN+1 contains a constant, XN+1 = (1, X̃N+1), and that (X̃N+1, DN+1) follows a multivariate normal distribution with mean µ = (µX, µD) and covariance matrix Σ. Write β = (β1, β̃). Under this assumption, the chance constraint (18) is equivalent to a deterministic constraint and takes the following form:

−β1 − µX β̃ + µD + Φ⁻¹(1 − α) √((β̃, −1) Σ (β̃, −1)ᵀ) ≤ 0.  (23)

If the normality assumption is correct and µ and Σ are known, then the resulting service levels are on-target. In practice, the normality assumption cannot be verified and µ and Σ are unknown. As a result, the resulting service levels are potentially below target. To hedge against this, Nemirovski [2012] suggests using worst-case values of µ and Σ in (23). We estimate µ and Σ by their sample counterparts. In order to hedge against inaccurate parameter estimates, we set

µ̂D = D̄ + √(ŝ²D / n),

where D̄ and ŝ²D are the sample mean and variance of D. The correction term √(ŝ²D / n) is motivated by the distribution of D̄ under normality:

√n (D̄ − µD) → N(0, σ²D).

4.2 Ambiguous chance-constrained programming

In chance-constrained programming, one assumes that the probability measure of ξ is known. In practice, this is unrealistic. Typically, distributional assumptions are made and parameters are estimated based on historical data. In Section 4.1, we hedged against this by using worst-case parameter estimates, but we still made the assumption of normality. An alternative solution to this problem is to consider worst-case distributions [Nemirovski, 2012]. The idea is to consider a set Q of probability measures and to replace (11) by the ambiguous chance constraint

min{f (x) : x ∈ X, PQ[G(x, ξ) ≤ 0] ≥ 1 − α ∀Q ∈ Q}, (24)

where PQ denotes the probability with respect to Q. In words, the chance constraint has to be satisfied for every probability measure in Q.

The two alternative methods we propose are derived from ambiguous chance constraints. They differ in the choice of Q.

4.2.1 Second-order moment information

Zymler et al. [2013] consider ambiguous chance constraints and assume that only second-order moment information is available. They define Q as the set of all probability measures that agree with the first- and second-order moments. To illustrate this, suppose we only have demand observations D1, . . . , Dn. In other words, we temporarily abstract from causal demand forecasting. Based on the sample, we can estimate the mean and variance of D, denoted µ and σ². Denote by Q the set of all probability measures with mean µ and variance σ². Then the ambiguous chance constraint

P_Q[β1 ≥ D] ≥ 1 − α for all Q ∈ Q

is equivalent to

β1 ≥ µ + σ √((1 − α)/α)  (25)

by a result known as Cantelli’s inequality, see e.g. Uspensky [1937].
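In code, this moment-based inventory level is a one-liner (our own sketch, with µ and σ estimated from the demand sample):

```python
import numpy as np

def cantelli_inventory(D, alpha):
    """Inventory level implied by (25): mu + sigma * sqrt((1 - alpha) / alpha)."""
    mu, sigma = np.mean(D), np.std(D, ddof=1)
    return mu + sigma * np.sqrt((1 - alpha) / alpha)
```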

More generally, Zymler et al. [2013] prove that if G is concave in ξ, then determining feasibility with respect to the ambiguous chance constraint based on second-order moment information is equivalent to solving a tractable semidefinite program. More formally, let Ω denote the second-order moment matrix of (ξ, 1). Assume that XN+1 contains a constant, XN+1 = (1, X̃N+1), and write β = (β1, β̃). Then β is feasible in (24) if the optimal value of

min_{λ, M} λ + (1/α) tr(ΩM)
s.t. M ∈ S^{m+1}, λ ∈ R,
M ⪰ O,
M − [  O        0      −β̃/2
       0ᵀ       0       1/2
      −β̃ᵀ/2    1/2    −β1 − λ ] ⪰ O

is non-positive. Here, S^{m+1} denotes the set of symmetric matrices of size m + 1.

If we use the true value of Ω, then the resulting service levels are on-target for any distribution that agrees with the first- and second-order moments of ξ. In practice, Ω is unknown, but we estimate it by its sample counterpart. This approach is subject to estimation error. However, the specification of the set of probability measures Q hedges against this, so that we still expect on-target service levels.

4.2.2 Prohorov metric

Erdoğan and Iyengar [2006] consider ambiguous chance constraints where the set Q of probability measures is of the form

Q = {Q : ρp(Q, Q0) ≤ η}. (26)

Here, Q0 denotes a central probability measure, which is typically estimated based on data, and ρp denotes the Prohorov metric between probability measures, defined as

ρp(Q1, Q2) := inf{ε > 0 : Q1(B) ≤ Q2(B^ε) + ε for all B ∈ F},

where F denotes the σ-algebra of the probability space and

B^ε := {ξ ∈ Ξ : inf_{ξ'∈B} ||ξ − ξ'|| ≤ ε}.

Erdoğan and Iyengar [2006] propose the following procedure to approximate the ambiguous chance constraint based on the Prohorov metric: (i) obtain N0 independent samples ξ(1), . . . , ξ(N0) drawn from Q0 and (ii) replace the ambiguous chance constraint (24) by

G(x, z) ≤ 0 ∀z s.t. ||z − ξ(i)|| ≤ η, i = 1, . . . , N0. (27)

The constraint in (27) is in general still intractable, but Erdoğan and Iyengar [2006] present a linear programming formulation if G is affine in x.


where m is the number of decision variables. Hence, we can find finite N0 such that the solution achieves a prescribed reliability, provided η < . This allows us to produce reliable solutions, even for small sample sizes.

To implement this method, we impose normality for the central probability measure Q0, and we estimate its parameters using their sample counterparts. Choosing a positive value of η in the definition of Q, see (26), hedges against violations of the normality assumption as well as inaccurate parameter estimates. We set η = 0.001. Using (28), we choose N0 such that the probability that the resulting solution is feasible in (24) is at least 0.5.
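For the affine constraint used in this paper, the robust constraint (27) can be checked in closed form, because the worst case of an affine function over a norm ball is attained on its boundary. The following sketch is our own illustration and assumes a Euclidean ball around each sample; the choice of norm is not fixed by the exposition above.

```python
import numpy as np

def prohorov_robust_feasible(beta, X_tilde, D, eta):
    """Check (27) for G(beta; x, D) = -beta1 - x @ beta_tilde + D: the worst case over
    the ball ||z - (x_i, D_i)||_2 <= eta adds eta * ||(-beta_tilde, 1)||_2 to G."""
    beta1, beta_tilde = beta[0], np.asarray(beta[1:], dtype=float)
    slack = eta * np.linalg.norm(np.append(-beta_tilde, 1.0))
    G = -beta1 - X_tilde @ beta_tilde + D
    return bool(np.all(G + slack <= 0))
```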

4.3 Numerical experiments

We apply all three methods to data generated under three different demand specifications. This allows us to assess the importance of the distributional assumptions underlying the methods. We use the demand specification by Beutel and Minner [2012]:

D = a − bp + u,

see also the text surrounding (22). In the original specification, u follows a normal distribution. In addition, following Beutel and Minner [2012], we consider the case where the error term u is gamma distributed. Again, the parameters of the distribution of u are chosen such that at mean price, the coefficient of variation is equal to cv = 0.3. Finally, we also change (22) to

D = a − b exp(p) + u,

that is, demand is non-linear in price.

The setup of our numerical experiments is the same as in Beutel and Minner [2012], see also Section 3. We estimate the mean service level and solution reliability. The mean service level should be on- or above target and the solution reliability should be at least 50%. The reason is that a typical firm stocks many items and is mainly concerned with achieving overall on-target service levels. Therefore, it is reasonable to aim for a reliability of 50%. In Tables 1-3, we report the estimated mean service level and reliability of the solutions based on the normality assumption, second-order moment information, and the Prohorov metric, respectively. We consider sample sizes N ∈ {10, 20, 50, 100} and service levels 1 − α, where α ∈ {0.05, 0.1}, for each demand specification.

The results show that the method based on the normality assumption achieves lower solution reliabilities than the other methods, especially under the gamma demand specification. Moreover, for α = 0.05, the mean service levels under both the gamma and non-linear demand specification are below target. The other methods achieve high reliabilities and on-target service levels for all three demand specifications, even for small sample sizes. In general, it appears that the methods based on second-order moment information and the Prohorov metric are too conservative. Note that the method based on second-order moment information is more conservative than the method based on the Prohorov metric.


Table 1: Results of the numerical experiments: chance constraints under the normality assumption.

Demand specification:   Normal                          Gamma                           Non-linear
N                       Reliability     Mean service    Reliability     Mean service    Reliability     Mean service
                                        level                           level                           level
α = 0.1
10                      0.614 (0.022)   0.902 (0.004)   0.512 (0.022)   0.886 (0.004)   0.602 (0.022)   0.900 (0.004)
20                      0.646 (0.021)   0.911 (0.002)   0.574 (0.022)   0.901 (0.002)   0.670 (0.021)   0.911 (0.002)
50                      0.704 (0.020)   0.913 (0.001)   0.602 (0.022)   0.905 (0.001)   0.676 (0.021)   0.913 (0.001)
100                     0.670 (0.021)   0.910 (0.001)   0.618 (0.022)   0.903 (0.001)   0.710 (0.020)   0.912 (0.001)
α = 0.05
10                      0.544 (0.022)   0.936 (0.003)   0.448 (0.022)   0.926 (0.003)   0.520 (0.022)   0.927 (0.003)
20                      0.642 (0.021)   0.953 (0.002)   0.464 (0.022)   0.935 (0.002)   0.494 (0.022)   0.937 (0.002)
50                      0.618 (0.022)   0.953 (0.001)   0.374 (0.022)   0.939 (0.001)   0.500 (0.022)   0.945 (0.001)
100                     0.666 (0.021)   0.955 (0.001)   0.334 (0.021)   0.941 (0.001)   0.428 (0.022)   0.944 (0.001)

Estimated values based on numerical experiments, standard errors in parentheses.

Table 2: Results of the numerical experiments: ambiguous chance constraints using second-order moment information.

Demand specification:   Normal                          Gamma                           Non-linear
N                       Reliability     Mean service    Reliability     Mean service    Reliability     Mean service
                                        level                           level                           level
α = 0.1
10                      0.968 (0.008)   0.983 (0.001)   0.948 (0.010)   0.972 (0.002)   0.882 (0.014)   0.965 (0.002)
20                      0.996 (0.003)   0.993 (0.001)   0.996 (0.003)   0.984 (0.001)   0.976 (0.007)   0.979 (0.001)
50                      1.000 (0.000)   0.997 (0.000)   1.000 (0.000)   0.990 (0.000)   1.000 (0.000)   0.986 (0.001)
100                     1.000 (0.000)   0.998 (0.000)   1.000 (0.000)   0.992 (0.000)   1.000 (0.000)   0.990 (0.000)
α = 0.05
10                      0.990 (0.004)   0.997 (0.001)   0.954 (0.009)   0.991 (0.001)   0.898 (0.014)   0.981 (0.002)
20                      1.000 (0.000)   1.000 (0.000)   0.998 (0.002)   0.997 (0.000)   0.946 (0.010)   0.991 (0.001)
50                      1.000 (0.000)   1.000 (0.000)   1.000 (0.000)   0.999 (0.000)   0.996 (0.003)   0.997 (0.000)
100                     1.000 (0.000)   1.000 (0.000)   1.000 (0.000)   0.999 (0.000)   1.000 (0.000)   0.998 (0.000)

Estimated values based on numerical experiments, standard errors in parentheses.

Table 3: Results of the numerical experiments: ambiguous chance constraints using the Prohorov metric.

Demand specification:   Normal                          Gamma                           Non-linear
N                       Reliability     Mean service    Reliability     Mean service    Reliability     Mean service
                                        level                           level                           level
α = 0.1
10                      0.836 (0.017)   0.952 (0.003)   0.784 (0.018)   0.940 (0.003)   0.764 (0.019)   0.935 (0.003)
20                      0.944 (0.010)   0.971 (0.001)   0.940 (0.011)   0.960 (0.001)   0.890 (0.014)   0.952 (0.002)
50                      0.988 (0.005)   0.977 (0.001)   0.982 (0.006)   0.967 (0.001)   0.976 (0.007)   0.965 (0.001)
100                     0.998 (0.002)   0.982 (0.001)   0.996 (0.003)   0.970 (0.001)   0.994 (0.003)   0.970 (0.001)
α = 0.05
10                      0.824 (0.017)   0.970 (0.002)   0.694 (0.021)   0.958 (0.002)   0.654 (0.021)   0.948 (0.003)
20                      0.914 (0.013)   0.982 (0.001)   0.838 (0.016)   0.973 (0.001)   0.790 (0.018)   0.969 (0.002)
50                      0.988 (0.005)   0.990 (0.000)   0.954 (0.009)   0.980 (0.001)   0.888 (0.014)   0.978 (0.001)
100                     1.000 (0.000)   0.991 (0.000)   0.982 (0.006)   0.982 (0.001)   0.942 (0.010)   0.980 (0.001)

Estimated values based on numerical experiments, standard errors in parentheses.

Figure 6: Sample size effects: reliability of alternative methods under normal demand specification, α = 0.1.


Figure 7: Sample size effects: mean service level achieved by alternative methods under normal demand specification, α = 0.1.

5 Multiple items

In a multi-item setting with k items, the firm has to decide on inventory levels I1, . . . , Ik in the review period (period N + 1) for each item. The firm has observations on demand of all k items in the first N periods: Di,j, i = 1, . . . , N , j = 1, . . . , k. As before, there are observations on related variables up until the review period Xi, i = 1, . . . , N + 1. The firm can base its inventory decision on XN +1. We impose the restriction Ii = XN +1βi, where βi, i = 1, . . . , k, are decision vectors. The firm faces a ready rate restriction, which specifies that demand for all items is met in at least (1 − α)100% of the cases:

P[XN +1βi≥ DN +1,i, i = 1, . . . , k] ≥ 1 − α. (29)

We can conservatively approximate (29) by

P[XN+1βi ≥ DN+1,i] ≥ 1 − αi, i = 1, . . . , k,  (30)

where Σ_{i=1}^k αi ≤ α. We can apply any of the methods discussed in Section 4 to approximate the individual chance constraints in (30). Zymler et al. [2013] give an explicit formulation of this approximation for ambiguous chance constraints based on second-order moment information.
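As a simple illustration of the Bonferroni split (our own sketch, reusing the hypothetical cantelli_inventory helper from Section 4.2.1 and ignoring the causal variables for brevity), each item is treated with a single-item method at level αi = α/k:

```python
def multi_item_levels(D_hist, alpha):
    """D_hist: N x k NumPy array of historical demands; returns one inventory level
    per item, each meeting an individual chance constraint at level alpha / k."""
    k = D_hist.shape[1]
    alpha_i = alpha / k                      # Bonferroni split of the joint target
    return [cantelli_inventory(D_hist[:, j], alpha_i) for j in range(k)]
```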

In our numerical experiments, we consider a two-item case (k = 2) in which demand is generated according to

(Di,1, Di,2) = a − B(pi,1, pi,2)ᵀ + ui,  (31)

where the elements of a = (a1, a2) are drawn from a uniform distribution on [1000, 2000], and the diagonal (off-diagonal) elements of B are drawn from a uniform distribution on [500, 1000] ([−1000, −500]).

The error term u follows a multivariate normal distribution with zero mean, and covariance matrix Σ, which is such that the coefficient of variation of both D1 and D2 is cv = 0.3 and the correlation between D1 and D2 at mean price is equal to ρ, which is drawn from a uniform distribution on [−1, 1].

We consider two alternative demand specifications. First, we change (31) into a non-linear specification:

(Di,1, Di,2) = a − B(exp(pi,1), exp(pi,2))T + ui.

Second, we let the individual elements of ui follow independent gamma distributions. Again, the parameters are chosen such that at mean price, the coefficient of variation is equal to cv = 0.3.

The fact that the Bonferroni approximation is a conservative approximation hedges against possible violations of distributional assumptions and inaccurate parameter estimates. Therefore, we expect the method based on the normality assumption to perform well, even if the normality assumption is violated. In our experiments we take α = 0.1, and α1= α2= 0.05.

In Table 4, we report the estimated mean service level and solution reliability of both methods for all three demand specifications. In terms of mean service level, the method based on the normality assumption only performs on-target under the normal and the non-linear demand specification if N > 20. The method based on second-order moment information performs on- or above target under all demand specifications, except the non-linear specification if N = 10.

Table 4: Results of the numerical experiments: two-item case (α = 0.1).

Demand specification:   Normal                          Gamma                           Non-linear
N                       Reliability     Mean service    Reliability     Mean service    Reliability     Mean service
                                        level                           level                           level
Normality assumption
10                      0.402 (0.022)   0.854 (0.004)   0.240 (0.019)   0.826 (0.004)   0.414 (0.022)   0.859 (0.005)
20                      0.564 (0.022)   0.899 (0.003)   0.314 (0.021)   0.868 (0.003)   0.652 (0.021)   0.910 (0.002)
50                      0.718 (0.020)   0.916 (0.001)   0.380 (0.022)   0.884 (0.002)   0.738 (0.020)   0.917 (0.001)
100                     0.800 (0.018)   0.918 (0.001)   0.302 (0.021)   0.886 (0.001)   0.800 (0.018)   0.919 (0.001)
Second-order moment information
10                      0.738 (0.020)   0.920 (0.005)   0.700 (0.020)   0.904 (0.006)   0.520 (0.022)   0.840 (0.008)
20                      0.822 (0.017)   0.949 (0.004)   0.772 (0.019)   0.940 (0.004)   0.634 (0.022)   0.896 (0.006)
50                      0.852 (0.016)   0.957 (0.004)   0.838 (0.016)   0.954 (0.004)   0.648 (0.021)   0.900 (0.006)
100                     0.860 (0.016)   0.960 (0.003)   0.878 (0.015)   0.960 (0.003)   0.656 (0.021)   0.903 (0.005)

Estimated values based on numerical experiments, standard errors in parentheses.

6 Conclusion

In this paper, we showed that the data-driven approach to safety stock planning using causal demand forecasting by Beutel and Minner [2012] can be interpreted as a sample average approximation of a chance constraint. Based on this insight, we improved an existing method for safety stock planning using causal demand forecasting that achieved below-target mean service levels. In addition, we derived bounds on the sample size required to achieve on-target performance. Unfortunately, numerical experiments show that these bounds are too conservative. Moreover, the sample size is not under the control of the practitioner. We proposed three alternative methods for safety stock planning using causal demand forecasting under varying distributional assumptions. The proposed alternatives achieve on- or above-target performance, even for small sample sizes. However, the methods based on ambiguous chance-constrained programming hedge against worst-case scenarios by setting very high inventory levels. As a result, these methods are overly conservative. We recommend using the method based on the normality assumption, with a variance correction term for the estimate of mean demand.

Finally, we studied the multi-item case. We showed that this problem can be cast into the framework of joint chance constraints. Using a Bonferroni approximation, we extended our methods to the multi-item case. Numerical experiments indicated on-target performance.

In general, we observe that existing methods achieve below-target performance. While improved methods achieve on- or above-target performance, they are typically overly conservative. Closing this gap provides one avenue for future research. Furthermore, the method based on the assumption of normality performs near- or on-target, even if the normality assumption is violated. While this finding is relevant for practice, we remark that there is no theoretical performance guarantee if the normality assumption does not hold. Deriving such a performance guarantee is another avenue for future research. Finally, while the method based on second-order moment information is overly conservative, there exist worst-case distributions for which performance is just on-target. Identifying these worst-case distributions is interesting, from both a theoretical and a practical perspective. For practical purposes, it is relevant to know whether these worst-case distributions are realistic, or can be safely ignored.

References

G. Ban and C. Rudin. The big data newsvendor: Practical insights from machine learning. Submitted to Operations Research, 2017.

D. Bertsimas and N. Kallus. From predictive to prescriptive analytics. arXiv preprint arXiv:1402.5481, 2014.

D. Bertsimas and A. Thiele. Robust and data-driven optimization: Modern decision-making under uncertainty. INFORMS tutorials in operations research: models, methods, and applications for innovative decision making, 3, 2006.

A.-L. Beutel and S. Minner. Safety stock planning under causal demand forecasting. International Journal of Production Economics, 140(2):637–645, 2012.

G. C. Calafiore and M. C. Campi. The scenario approach to robust control design. IEEE Transactions on Automatic Control, 51(5):742–753, 2006.

E. Erdoğan and G. Iyengar. Ambiguous chance constrained problems and robust optimization. Mathematical Programming, 107(1-2):37–61, 2006.

W. K. Klein-Haneveld and M. H. van der Vlerk. Integrated chance constraints: reduced forms and an algorithm. Computational Management Science, 3(4):245–269, 2006.

J. Luedtke and S. Ahmed. A sample approximation approach for optimization with probabilistic constraints. SIAM Journal on Optimization, 19(2):674–699, 2008.


A. Nemirovski and A. Shapiro. Convex approximations of chance constrained programs. SIAM Journal on Optimization, 17(4):969–996, 2006.

B. Pagnoncelli, S. Ahmed, and A. Shapiro. Sample average approximation method for chance constrained programming: theory and applications. Journal of Optimization Theory and Applications, 142(2):399–416, 2009.

A. Prékopa. Probabilistic programming. Handbooks in Operations Research and Management Science, 10:267–351, 2003.

J. V. Uspensky. Introduction to mathematical probability. McGraw-Hill Book Company, New York, 1937.

W. Wang and S. Ahmed. Sample average approximation of expected value constrained stochastic programs. Operations Research Letters, 36(5):515–519, 2008.
