On the handling of forecast uncertainty in inventory models

(1)

On the handling of forecast uncertainty in inventory models

Dennis Prak s1891545

(2)

Master’s thesis Research Master: Economics, Econometrics, and Finance Master’s thesis EORAS: Operations Research

(3)

On the handling of forecast uncertainty in inventory models

Dennis Prak

July 2, 2014

Abstract

The current inventory control literature exhibits a separation of forecasting and decision making. Pa-rameter estimates of the future demand distribution are directly substituted into the full information ob-jective function, and hence the resulting decision is not optimal. We derive a general approach to properly addressing the estimation uncertainty. First, the parameters are efficiently estimated and their errors are modeled. Then, the point estimates plus their errors are substituted into the objective function, after which the expectation of this objective function is taken with respect to the estimation errors. We present this approach in a model with shortage and holding costs and Normally distributed demand that can be mean-stationary or trended, but the general approach can be applied to any inventory model using any demand distribution. As the magnitude of forecast errors in a trend model is quadratic over time, the largest cost advantages can be achieved here (30-50% in a continuous review setting, based on simulation studies). If the lead time grows and the effect of misestimating increases, then the cost benefit of our method also increases. We mainly use the Normal approximation to the error distribution, but study the exact approach, bootstrapping, and the Bayesian approach as well. We also provide a heuristic improvement of the Normal approximation that fixes the mismatch between error distribution support and parameter space.

(4)

1 Introduction

Whereas the inventory control literature deals with a wide range of different models in terms of review struc-tures, cost frameworks, and demand characteristics, it is hardly ever acknowledged that in practice the future demand distribution is not given, but its parameters are estimated. Fundamental inventory control textbooks, such as Waters (2012), Axsäter (2006), and Silver, Pyke, and Peterson (1998), do deal with models of “un-certain” demand, but only refer to uncertainty in the sense that demand follows some given distribution and therefore takes on sizes with probabilities according to a probability mass function (in case of discrete demand sizes) or probability density function (in case of continuous demand sizes). Although the textbooks thus do make an attempt to handle uncertainty by introducing stochastic demand, they all make a very restrictive as-sumption: the parameters of the underlying distribution are assumed to be known. All three mentioned books (and in fact most theoretical inventory control textbooks) do contain a chapter on forecasting future demand, but no textbook actually integrates these forecasts into the decision models: those models simply start from a demand distribution with known parameters and thereby seem to completely neglect the uncertainty due to forecast errors that arises in their practical application. A particular problem when making this simplifying assumption, is that the correlations of forecast errors of future demands are neglected, such that the variance of total future demand in a certain period is underestimated and subsequently, safety stock levels are set too low. In this thesis we derive, for different demand models, the optimal inventory decisions taking into account this forecast uncertainty, and show the cost consequences of the ignorant way of modeling.

1.1 Literature survey

The above specified mismatch between theoretical models and their practical applications has been touched upon before. Contrary to the general textbooks, it was already stipulated in the late 1950’s that forecasting and optimisation should go hand-in-hand. After Arrow, Karlin, and Scarf (1958) had treated a dynamic inventory model with fixed holding and shortage costs and stochastic demand (a similar framework as we will study throughout this thesis), one year later Scarf (1959) published an article in which he applies Bayesian statistics to update the parameter estimates in this model. By gathering historical demands in a sufficient statistic and using a prior parameter distribution, he provided a framework to find a posterior distribution and integrate this into the optimisation problem, so that both estimate updating and uncertainty handling were achieved in a Bayesian framework. Despite the theoretical basis that this article provided for incorporating estimation uncertainty, no full-fledged applications were given, nor guidelines on e.g. what prior distributions should be used. As was noted by Azoury and Miller (1984), both subjectivity in distribution choice and intractability of computations led to the dilusion of the Bayesian approach in the inventory control literature. The general approach in the 1960’s and 1970’s neglected estimation uncertainty, as can be found in e.g. the textbook by Hillier and Lieberman (1980).

(6)

construct a spare parts inventory model in which they combine forecasting and decision making and take into account that the lead time demand forecasts are correlated. Recently, the necessity of correcting for the cor-relations between future demand forecasts has been stipulated more explicitly by Syntetos and Teunter (2014). It is clear that the current inventory control practice fails to properly take into account forecast uncertainty. Throughout the years, the Bayesian method by Scarf (1959) is sometimes applied and extended in order to come up with a complete dynamic updating of parameter estimates and corresponding uncertainty. However, it heavily relies on the choice of prior distribution and is therefore a subjective approach. Other authors have provided ad hoc adjustments to the general theory by for example using a safety stock mark-up or explicitly approximating the forecast errors and their covariances. Contrarily, a proper fundamental approach to incorpo-rating estimation uncertainty in inventory models is modeling the forecast errors’ distributions corresponding to the estimators and integrating these into the objective function. None of the authors mentioned above takes this approach, whereas it has recently been stated by Ali, Boylan, and Syntetos (2012) that forecasting and inventory decision making should never be separated, and that the interactions between forecasting and stock control are not fully understood. Furthermore, they remark that improvements in forecasting accuracy do not automatically lead to inventory cost reductions. In an empirical study, Syntetos, Nikolopoulos, and Boylan (2010) analyse the relationship between forecasting accuracy and inventory costs.

In this thesis, we will take the above mentioned fundamental approach. We will, for different demand models, apply appropriate estimation techniques and use known information on the estimators’ properties to model estimation uncertainty at its core and directly take it into account in the optimisation problem. This method is free of subjectivity, contrary to for example the Bayesian approach, since without any prior assumptions the estimation error is approximated. While we will mainly focus on the asymptotic approximations of the estimators’ distributions, we will also comment on its drawbacks and present different approaches to finding the distribution of the estimation error: we will discuss modeling the exact error distribution, bootstrapping, and we will show the link between the frequentist and the Bayesian approach. Once the forecast error dis-tribution is found, it is directly incorporated in the inventory problem in order to find the optimal inventory decision when estimation uncertainty is properly taken into consideration. This optimal method will be com-pared with the naïve approach that directly plugs parameters estimates into the full information problem, and subsequently performs the optimisation.

1.2 Notation

Throughout this thesis, the following notation will be used consistently. General notation

Parameters and decision variables

h holding costs per unit and per time unit p shortage costs per unit and per time unit L length of the lead time

T length of the review period (periodic review) n number of historical demand observations S order-up-to level

˜

(7)

Functions and other entities

IL(t) inventory level at time t directly after the arrival of demand

IP(t) inventory position at time t directly after the arrival of demand and the subsequent order TCc(S, n) total cost function for continuous review, at time n, given order-up-to level S

TCp(S,t) total cost function for periodic review, at time t, given order-up-to level S

Dt (unobserved) demand at time t

D_[k,m] (unobserved) total demand at times k, k + 1, ..., m

dt observed demand at time t (continuous review) or review period t (periodic review) (t ≤ n)

FX cumulative distribution function of X . Special case: Φ is the standard Normal cdf

fX probability density function of X

(A)Var(X ) (asymptotic) variance of X

(MV )N(µ, σ2) (multivariate) Normal distribution with mean µ and variance σ2 ˆ

X point estimate of X , using the derived method that will be specified κX estimation error of the point estimate of X

In any function, specific parameter choices, if applicable, are given after the arguments, separated by a semi-colon.

Specific notation for Section 2

µ , σ mean and standard deviation of per-time unit demand

L(µ, σ ), `(µ, σ ) likelihood and log-likelihood function of the parameters µ and σ I (µ,σ) Fisher information matrix

S∗_µ optimal order-up-to level under uncertainty of µ S∗_{µ ,σ} optimal order-up-to level under uncertainty of µ and σ

θ ,τ mean and standard deviation of the Bayesian prior distribution of per-time unit demand Specific notation for Section 3

α , β regression parameters of the trend model for demand

εt, νt error terms in resp. the continuous and periodic review model

σν standard deviation of νt

µlead,n, σlead,n2 mean and standard deviation of lead time demand when currently at time n

S∗ optimal order-up-to level under uncertainty of α and β The notation for Section 4 will be introduced when necessary.

1.3 Modeling approach

Throughout this thesis we will focus on a discrete time, single item inventory model with fixed, linear holding and shortage costs. That is, the time horizon is t = 1, 2, ..., holding costs per unit per time unit are h > 0, and shortage costs per unit per time unit are p ≥ h. Orders can be placed free of charge and arrive after a fixed lead time L ≥ 0. At every review an optimal order-up-to level is derived, and the order of events at any review time instant is as follows: first demand occurs, then outstanding orders arrive, and finally new orders can be placed. It is in principle possible that the order-up-to level is smaller than the current inventory position. This is however unlikely, since it can only occur if the negative difference between two subsequent optimal order-up-to levels is larger than the demand realisation. In these rare cases we allow for negative orders. This set-up makes the model decomposable in the sense that at each review the optimal order-up-to level can be derived independently from former and future decisions, as we will show below

(8)

allows for negative demands with non-negligible probability, which may be unrealistic. However, standard textbooks such as Axsäter (2006) widely use it in their models and therefore we will apply it as well. The calculations can easily be extended to other demand distributions, as long as a consistent estimator for the parameters exists and the distribution of the estimator can be derived or approximated. In Section 2 we apply a mean-stationary demand model, and in Section 3 we generalise our analysis to demand that is variance-stationary along a linear trend.

The costs arising from the inventory model at any time instant are easily characterised. Denote the inven-tory level (the inveninven-tory on hand) at time t, after the arrival of demand at time t, by IL(t). Then costs at time tare given by

C(t) = h(IL(t))++ p(IL(t))−, where (x)+= max{x, 0} and (x)−= max{−x, 0}.

In all applications we will first focus on a continuous review model, by which we mean that the inventory level can be observed and orders can be placed at any time instant. In this model, we refer to the current time instant by n, so that we have observed n historical demands. We derive an optimal order-up-to policy that is characterised by an order-up-to level Snfor the inventory position at each time instant, which includes

outstanding orders. Furthermore, we assume that we are currently at the end of time n, and that we have observed demands at time 1, ..., n. Orders that are placed at time n will arrive at the end of time n + L. The next possible order can be placed at time n + 1 and will arrive at the end of time n + L + 1. Therefore, the decision made at time n affects costs at time n + L + 1, which are driven by the inventory level directly after the arrival of demand at time n + L + 1, but before the next ordering, which we, as above, denote by IL(n + L + 1). The inventory level at time n + L + 1 is therefore given by the inventory position (the inventory level plus outstanding orders) after the ordering at the end of time n, minus the demands at times n + 1, ..., n + L + 1. Denote the inventory position at the end of time n by IP(n) and denote the sum of demands at times k, ..., m by D[k,m]. Then we can write

IL(n + L + 1) = IP(n) − D[n+1,n+L+1].

Since, according to the policy to be derived, the inventory position at time n is given by Sn, we find that at

time n we face the problem of minimising

TCc(Sn, n) = hE(Sn− D[n+1,n+L+1])++ pE(Sn− D[n+1,n+L+1])−,

where lead time demand D[n+1,n+L+1] follows a distribution which will be specified differently depending

on the used model and of which the parameters have to be estimated. Observe that this model is decom-posable in the sense that the optimal order-up-to level at time n can be found independently of former and future order-up-to levels. Therefore, in the sequel we drop the subscript n and refer to the optimal order-up-to level at time n as S. Finally, the estimation uncertainty has to be integrated into the model and at any time instant n the corresponding objective function has to be minimised to solve for the optimal order-up-to level S. After having discussed the continuous review version of the discrete time model (i.e. with a review at ev-ery discrete time instant), we will show how the calculations can be extended to periodic review models, where only total demand at the beginning and end of periods of length T = 1, 2, ... time instants can be ob-served. At the base of this analysis lies that now historical observations consist of total demands in n such review periods. We preserve n for denoting the number of historical demand observations. Therefore, in the periodic review setting we denote the time instant of the optimisation by t, which is at the beginning of the n+ 1st review period. An order placed at time t arrives at time t + L, and the next possible order can be placed at time t + T and arrives at time t + L + T . Therefore, the order-up-to level chosen at time t affects costs at times t + L + 1, ...,t + L + T . By evaluating the inventory level at each time instant in this interval, we find that at time t the objective function that has to be minimised is

(9)

Again, the demand function has to be specified according to the model at use, the parameters have to be es-timated, and the uncertainty in their estimation has to be taken into account. Please note that by substituting T= 1, the continuous review model is obtained.

Throughout this thesis we will first consistently use the following method for incorporating estimation uncer-tainty in the decision models. Let θ be the parameter vector with length k of the demand distribution that has to be estimated, and write total costs for the optimisation at time n, given order-up-to level S and the parameter vector θ , as TC(S, n; θ ). Some consistent estimation method is used to produce an estimate ˆθ of θ and the corresponding estimate for its variance matrix, cVar( ˆθ ). It then follows that the distribution of the estimation error κθ = θ − ˆθ can be approximated (based on asymptotic results) by a Normal distribution with mean 0

and variance cVar( ˆθ ), of which we denote the pdf by fκθ. Denote the different parameters in θ by θ1, θ2, etc.

and denote the corresponding elements of κθ by κθ1, κθ2, etc. For example, one can have θ1= µ, θ2= σ ,

κθ1 = µ − ˆµ , and κθ2 = σ − ˆσ . Now writing θ = ˆθ + κθ and substituting this expression into the total cost

function which is the objective function for minimisation, we can make the inventory decision while properly taking into account estimation uncertainty by using the objective function

c TC(S, n; ˆθ ) = ∞ Z −∞ . . . ∞ Z −∞ TC(S, n; ˆθ + ˜κθ) fκθ( ˜κθ)dκθ1. . . dκθk.

Although we will mainly use the estimator’s asymptotic variance and asymptotic normality to construct the distribution of the estimation error, different approaches are possible, such as deriving the finite sample dis-tribution of the error, bootstrapping, and Bayesian techniques. These can be applied and integrated into the decision model in a similar fashion, as we will explore in Section 4.

1.4 Outline

(10)

2 A mean-stationary demand model

We will start our analysis with a model in which demand is mean-stationary. As explained, we will adhere to the Normal distribution for modeling demand and commence with a continuous review model, as stipulated in Section 1.3. Demand Dt at any time t follows a Normal distribution with mean µ and standard deviation

σ and is distributed independently of demand at other time instants, so Dt i.i.d.

∼ N(µ, σ2_{). Fixed holding costs}

are h > 0 and fixed shortage costs are p ≥ h. Orders can be placed free of charge and arrive after a fixed lead time L ≥ 0. Our goal is to find an optimal order-up-to level at every time instant.

2.1 Optimal policy with known distribution parameters

Let us first assume, like the general textbooks, that µ and σ are known. As derived in Section 1.3, the decision at time n boils down to minimising

TCc(S, n; µ, σ ) = hE(S − D[n+1,n+L+1])++ pE(S − D[n+1,n+L+1])−, where D_[n+1,n+L+1]= n+L+1

∑

t=n+1 Dt.

Denote the distribution function of lead time demand D[n+1,n+L+1]by FD_[n+1,n+L+1]. Since

E(S − D[n+1,n+L+1])+ = S Z −∞ (S − s)dFD[n+1,n+L+1](s) = S Z −∞ S Z s dxdFD[n+1,n+L+1](s) = S Z −∞ x Z −∞ dFD_[n+1,n+L+1](s)dx = S Z −∞ FD_[n+1,n+L+1](x)dx, and similarly E(S − D[n+1,n+L+1])−= ∞ Z S h 1 − FD_[n+1,n+L+1](x) i dx, we can write TCc(S, n; µ, σ ) = h S Z −∞ FD[n+1,n+L+1](x)dx + p ∞ Z S h 1 − FD[n+1,n+L+1](x) i dx. Using d dS S Z −∞ FD[n+1,n+L+1](x)dx = FD[n+1,n+L+1](S), and d dS ∞ Z S h 1 − FD[n+1,n+L+1](x) i dx = −h1 − FD[n+1,n+L+1](S) i , we find the first order condition

FD_[n+1,n+L+1](S) =

p p+ h, which is a newsvendor equation and can be solved for S. Since

d2TCc

(11)

it follows that TC is convex and the found optimum is a global minimum.

Given the per time unit demand distribution and the independence assumption, we find that D[n+1,n+L+1]

has a Normal distribution with mean (L + 1)µ and standard deviation√L+ 1σ . The equation defining the optimal order-up-to level in this case has a closed-form solution which is given by

S= µ(L + 1) + Φ−1 p p+ h σ √ L+ 1, (1)

where Φ denotes the distribution function of the standard Normal distribution. Naïve inventory control meth-ods directly substitute point estimates for µ and σ in (1) and subsequently calculate the “optimal” order-up-to level, which we denote by ˜S. This order-up-to level is not optimal under uncertainty. Furthermore, as is pointed out by Strijbosch et al. (2000) and Syntetos and Teunter (2014), forecast errors of future demands are correlated. Therefore, the cost effect of neglecting forecast uncertainty increases as lead time increases. To overcome this problem, we will hereafter elaborate on proper estimation and uncertainty handling in this model.

2.2 Parameter estimation

Let us use Maximum Likelihood to estimate the parameters of the demand distribution. This method is consistent and, given that the distribution is properly specified, asymptotically efficient. The principles of Maximum Likelihood estimation can for example be found in Cameron and Trivedi (2005). Assume that we are currently at the end of time n. Our observation set consists of observations dt of demand at times

t= 1, ..., n. Note that per time unit demand is independently Normally distributed with mean µ and standard deviation σ . Since the density function of the observations is given by

f(dt; µ, σ ) = 1 σ √ 2πe −(dt −µ)2 2σ 2 ,

and since the demand realisations are independent, the log-likelihood of the sample is given by `(µ, σ ) = n

∑

t=1 log( f (dt; µ, σ )) = −n log(√2π) − n log(σ ) − 1 2σ2 n

∑

t=1 (dt− µ)2. We have d` dµ = 1 σ2 " n

∑

t=1 dt ! − nµ # . The resulting first order condition leads to the estimator

ˆ µ = 1 n n

∑

i=1 dt, (2)

which is the average of all previous per time unit demand observations. Note that in this case the OLS esti-mator and the Maximum Likelihood estiesti-mator for µ coincide. In the generalised model with trend-stationary demand, we will use the OLS estimation approach. Furthermore, we have

d` dσ = − n σ+ 1 σ3 n

∑

i=1 (dt− µ)2,

(12)

We proceed by using Maximum Likelihood theory in order to derive the information matrix and, ultimately, the asymptotic variance and covariance of the estimators. These are the main ingredients for analysing the cost effect of relaxing the assumption that the demand distribution has known parameters.

We have: d2` dµdµ = − n σ2, d2` dµdσ = d2` dσ dµ = − 2 σ3 " n

∑

t=1 dt ! − nµ # , d2` dσ dσ = n σ2− 3 σ4 n

∑

t=1 (dt− µ)2.

Hence, the information matrix is given by I (µ,σ) = −E −n σ2 − 2 σ3[(∑ n t=1dt) − nµ] − 2 σ3[(∑ n t=1dt) − nµ] _σn2−_σ34∑ n t=1(dt− µ)2 . Since E[(∑tn=1dt) − nµ] = 0 and E

∑nt=1(dt− µ)2 = nσ2, we find I (µ,σ) = n σ2 0 0 _σ2n2 .

The asymptotic variance matrix of the estimators is given byI−1. Hence, AVar( ˆµ ) =σ 2 n , and AVar( ˆσ ) = σ 2 2n,

and the asymptotic covariance of both estimators is 0. Therefore, we can estimate the variance of both estimators by c Var( ˆµ ) = σˆ 2 n and c Var( ˆσ ) = σˆ 2 2n.

2.3 Optimising under uncertainty

Previously we derived the optimal order-up-to level starting from a known demand distribution with known parameters. Here we acknowledge immediately that the underlying demand distribution is unknown, and that in the optimisation we use an estimate.

2.3.1 Optimal order-up-to level under uncertainty of µ

First assume that µ is unknown, whereas σ is known, and µ is estimated through (2) using historical data. Given the point estimate, we acknowledge that µ is still unknown. However, as we know that

(13)

Denote the corresponding density function by fκµ. In this specific case where σ is known, the asymptotic

approximation of fκµ actually coincides with the true finite sample distribution, and thus this approach is

actually an exact approach to modeling parameter uncertainty. Given any value ˜κµ for κµ, and given

order-up-to level S, total costs are given by TCc(S, n; ˆµ + ˜κµ, σ ). Hence, given a point estimate ˆµ , expected total

costs are given by

c TCc(S, n; ˆµ , σ ) = ∞ Z −∞ TCc(S, n; ˆµ + ˜κµ, σ ) fκµ( ˜κµ)d ˜κµ.

With a derivation analogous to the one from Section 2.1, we find that the first order condition reduces to

∞ Z −∞ FD[n+1,n+L+1](S; ˆµ + ˜κµ, σ ) fκµ( ˜κµ)d ˜κµ = p p+ h,

where the lead time demand distribution uses the value ˆµ + ˜κµ for µ and is thus specified as a Normal

distribution with mean (L + 1)( ˆµ + ˜κµ) and standard deviation

√

L+ 1σ . Given the point estimate ˆµ , the first order condition can be solved for S. Denote the optimal order-up-to level by S∗_µ. Note that

d2 c TCc dS2 = (h + p) ∞ Z −∞ fD[n+1,n+L+1](S; ˆµ + ˜κµ, σ ) fκµ( ˜κµ)d ˜κµ> 0,

so that the corrected objective function is still convex. 2.3.2 Relationship with the Bayesian approach

Whereas we did not use any Bayesian analysis in finding our parameter estimates, our method of handling uncertainty does have similarities with the Bayesian framework, of which the principles can be found in Geweke, Koop, and Van Dijk (2011) . Recall that in our model dt ∼ N(µ, σ2), where we assumed that σ is

known, but µ is not. Bayes’ rule states that

f_{µ |d}=Lµfµ m(d),

where fµ reflects a prior belief about the distribution of µ, Lµ is the likelihood of the observed data, given

µ , m(d) is a normalizing constant following from the marginal distribution of the data, and f_{µ |d} is the re-sulting posterior distribution of µ. Hence, given a prior belief about the parameter distribution, the data is used to find a posterior distribution which is then used as the distribution of the parameter to form the point estimate and handle uncertainty. In our analysis we use a known estimator for µ and its asymptotic proper-ties to estimate a “posterior” distribution directly, which we subsequently use to handle estimation uncertainty. It is possible to arrive by Bayesian principles at the same result as we did. To achieve that, one should use an uninformative prior which, as the name suggests, does not contain any prior beliefs. Let us resort to an improper, uninformative uniform prior fµ= 1. This prior is improper in the sense that it does not integrate up

to 1 and is therefore not a valid density function. Then, the posterior is proportional to the likelihood and the prior cancels out. Hence, the posterior density is proportional to

n

∏

t=1 1 exp −(dt− µ) 2 2σ2 = exp −∑ n t=1dt2− 2µ ∑nt=1dt+ nµ2 2σ2 , which is in turn proportional to

exp n µ − 1 n∑ n t=1dt 2 2σ2 ! .

So, apart from proportionality terms that can be captured in m(d) we find µ|d ∼ N( ˆµ , σ2/n), which is nu-merically equivalent to stating that µ = ˆµ + κµ, where κµ∼ N(0, σ2/n). The fundamental difference is that

(14)

2.3.3 Simulation study under uncertainty of µ

Table 1: Simulation study: µ unknown, σ known

µ σ n p h L S˜ S∗_µ TC( ˜S) TC(S∗_µ) 100S ∗ µ− ˜S ˜ S 100 TC(S∗_µ)−TC( ˜S) TC( ˜S) 10 2 3 10 1 5 66 71 18 15 9% -18% 10 2 5 10 1 5 66 70 15 13 5% -11% 10 2 10 10 1 5 66 68 12 11 3% -4% 10 2 3 10 1 1 24 25 7 7 5% -5% 10 2 5 10 1 1 24 25 6 6 3% -2% 10 2 10 10 1 1 24 24 6 6 2% -1% 10 2 3 10 1 2 35 37 10 9 6% -8% 10 2 5 10 1 2 35 36 8 8 4% -4% 10 2 10 10 1 2 35 35 7 74 2% -2% 10 2 3 10 1 10 119 129 34 26 9% -25% 10 2 5 10 1 10 119 126 26 21 6% -18% 10 2 10 10 1 10 119 123 19 174 3% -9% 10 2 3 5 1 5 65 68 14 13 5% -9% 10 2 5 5 1 5 65 67 12 11 4% -5% 10 2 10 5 1 5 65 66 9 9 2% -2% 10 2 3 20 1 5 68 74 24 18 9% -26% 10 2 5 20 1 5 68 72 18 15 6% -17% 10 2 10 20 1 5 68 70 14 13 3% -6% 10 2 3 50 1 5 70 77 35 21 11% -41% 10 2 5 50 1 5 70 75 24 18 7% -27% 10 2 10 50 1 5 70 73 17 15 4% -10% 10 3 3 10 1 5 70 77 28 23 10% -18% 10 3 5 10 1 5 70 74 22 20 7% -11% 10 3 10 10 1 5 70 73 17 17 4% -4%

We can calculate the expected cost reduction that follows from using the method derived above, as compared to using the naïve method which plugs in ˆµ in the optimal equation under full information. As before, we denote the resulting order-up-to level from this naïve method by ˜S. Given the true parameters µ and σ , we simulate historical demands, estimate µ by ˆµ and calculate both S∗_µ using the newly derived, corrected pro-cedure, and ˜Susing the naïve method. By plugging the order-up-to levels into the true expected cost function which depends on µ and σ , we can compare the cost consequences. Table 1 shows the results based on 5000 simulations per configuration.

(15)

too low, and this results in an even larger cost consequence already in the case of continuous review, where no uncertainty of demand over a longer period, apart from the lead time, has to be taken into account. Based on three observations in the base case, the cost benefit of correctly handling the estimation uncertainty in µ is 18%, whereas the difference in safety stock is 9%. As n increases, both the safety stock mark-up and the cost benefit decrease, as extra information results in more accurate estimates and hence less uncertainty in the parameters. However, even with 10 observations, the cost difference is still 4%. Because of the assumption that σ is known, the safety stock mark-up per parameter configuration is constant throughout the simulation repetitions and can be read off from Table 1 by subtracting the reported mean value of ˜Sfrom that of S∗_µ. To show this, in Figure 1 we plot the distributions of both the naïve order-up-to levels in the different simula-tions, and the improved versions. However, because different draws lead to different order-up-to levels, the percentage difference does change.

Figure 1: Order-up-to level distribution plot of ˜S (classical approach) and S∗_µ(improved approach), for µ = 10, σ = 2, p= 10, h = 1, L = 5, n = 5.

As the simulations show, the lead time has a large effect on the cost benefit that S∗_µ provides over ˜S. When the lead time is equal to 1, the cost benefits are below 5%. However, if L = 10, then cost benefits for n = 3 of up to 25% can be achieved. This can be explained by noting that if the lead time increases, then the effect of estimation error in µ magnifies. For example, if µ is underestimated, then demand in all time instants during the lead time is underestimated, and the positive correlation of forecast errors is more severe.

Also the shortage costs have a substantial impact on the actual benefit. Since S∗_µ is typically larger than ˜

S, it reduces the risk of backorders. It is therefore expected, and underpinned by the simulation results, that the costlier backorders are, the larger the cost benefit of applying S∗_µ is. If p = 50, then even with n = 10 historical observations, S∗_µ leads to a cost advantage of almost 10% compared to ˜S, which increases to over 40% for n = 3. Observe that as an optimal result, in a model with holding and shortage costs, the fill rate FR follows from

FR= p

p+ h.

(16)

Figure 2: Violin plots of costs for order-up-to levels ˜S (classical approach) and S∗_µ (improved approach), for µ = 10, σ = 2, p = 10, h = 1, L = 5, n = 5.

that yield the largest benefits but may seem extreme, are actually most realistic in practice.

Lastly, when σ increases, this increases the fluctuation in lead time demand and hence the mark-up of S∗_µ over ˜Sincreases. However, the cost benefit remains approximately the same. So, σ can be seen as a scaling parameter of the uncertainty in demand. If it is increased, then a larger correction for uncertainty is needed, but a similar cost benefit will be achieved.

In conclusion, especially depending on the lead time and the number of historical observations, the cost benefit of properly addressing the uncertainty in µ may be substantial. The cost benefit is largest with a large lead time, small number of observations, and high shortage costs. The first condition leads to a large effect of positively correlated forecast errors, the second condition leads to a direct increase in estimation uncertainty, and the latter condition increases the cost error for underestimating demand.

(17)

makers can greatly benefit from using the improved method which correctly handles forecast uncertainty. 2.3.4 Optimal order-up-to level under uncertainty of µ and σ

In practice, we generally cannot assume that σ is known. In that case, σ can be efficiently estimated through (??). The uncertainty of σ affects both the distribution of the estimation error of µ and σ . Especially, whereas the distribution of κµ was known exactly in the previous sections (because of the fact that σ was known), it

has to be approximated now. The unknown parameter vector is θ =

µ σ

, which can be decomposed into a point estimate and an estimation error: θ = ˆθ + κθ. Using the results of Section 2.2 we find that we can

approximate the distribution of κθ by a multivariate Normal distribution:

κθ approx ∼ MV N 0, " ˆ σ2 n 0 0 σˆ2 2n #! .

Denote the pdf of this distribution by fκθ. We remark that although a standard deviation has to be non-negative,

the approximate Normal distribution that is used does allow for negative values. Therefore, in extreme cases where the probability of negative values becomes non-negligible, the model may become unsolvable. In Section 4 we will elaborate on this issue. Given point estimates ˆµ and ˆσ , collected in ˆθ , expected total costs are given by c TCc(S, n; ˆθ ) = ∞ Z −∞ ∞ Z −∞ TCc(S, n; ˆθ + ˜κθ) fκθ( ˜κθ)d ˜κµd ˜κσ.

The first order condition reduces to

∞ Z −∞ ∞ Z −∞ FD[n+1,n+L+1](S, n; ˆθ + ˜κθ) fκθ( ˜κθ)d ˜κµd ˜κσ = p p+ h,

where the lead time demand distribution uses the value ˆµ + ˜µ for µ and ˆσ + ˜σ for σ . Given the point estimates ˆ

µ and ˆσ , the first order condition can be solved for S. Denote the optimal order-up-to level by S∗_{µ ,σ}. Convexity of cTCcfollows, since d2TCcc dS2 = (h + p) ∞ Z −∞ ∞ Z −∞ fD_[n+1,n+L+1](S, n; ˆθ + ˜κθ) fκθ( ˜κθ)d ˜κµd ˜κσ > 0.

2.3.5 Simulation study under uncertainty of µ and σ

For the general case with both µ and σ unknown, we also perform a simulation study. Although the computa-tions are somewhat more involved and require the calculation of double integrals, they are still well-solvable with standard software packages. Again simulating demands based on similar parameter configurations as in the case with only µ unknown, we compute S∗_{µ ,σ} as derived above. The order-up-to level of comparison is ˜S, which arises from plugging the point estimates for mean and variance in the full-information cost function and optimising from that point of departure. These simulations are again based on 5000 repetitions per parameter configuration. We restrict the exposition to the base case, and extensions to L = 10, p = 20, and σ = 3. The results are shown in Table 2.

The results of Table 1 and Table 2 are not readily comparable, because the underlying models differ (the simulation of Table 1 assumes that σ is known, which means that the distribution of κµ is known exactly).

(18)

Table 2: Simulation study: µ and σ unknown µ σ n p h L S˜ S∗_{µ ,σ} TC( ˜S) TC(S∗_{µ ,σ}) 100S ∗ µ ,σ− ˜S ˜ S 100 TC(S∗_{µ ,σ})−TC( ˜S) TC( ˜S) 10 2 3 10 1 5 66 71 21 19 7% -12% 10 2 5 10 1 5 66 69 16 15 5% -9% 10 2 10 10 1 5 66 68 12 12 3% -4% 10 2 3 10 1 10 118 128 40 32 8% -19% 10 2 5 10 1 10 118 125 29 25 6% -16% 10 2 10 10 1 10 119 123 20 18 3% -8% 10 2 3 20 1 5 67 74 31 24 9% -22% 10 2 5 20 1 5 68 72 22 18 6% -16% 10 2 10 20 1 5 68 70 15 14 3% -8% 10 3 3 10 1 5 69 76 32 28 10% -12% 10 3 5 10 1 5 69 74 24 22 7% -9% 10 3 10 10 1 5 70 72 18 18 4% -4%

Figure 3: Order-up-to level distribution plot of ˜S (classical approach) and S∗_{µ ,σ}(improved approach), for µ = 10, σ = 2, p= 10, h = 1, L = 5, n = 5.

Comparing both set-ups, the conclusion is that when studying a model where also σ is unknown, cost benefits do not increase, presumably because ˆµ and ˆσ are asymptotically uncorrelated. Actually, the cost benefits are overall slightly smaller than under certainty of σ , since the parameter σ of the estimation error variance is not known and has to be estimated. Another issue is the fact that fκσ supports the entire real line, whereas σ

cannot be negative. We will return to this issue in Section 3.

(19)

Figure 4: Violin plot of costs for order-up-to levels ˜S (classical approach) and S∗_{µ ,σ} (improved approach), for µ = 10, σ = 2, p = 10, h = 1, L = 5, n = 5.

Not only is S∗_{µ ,σ} higher on average, its spread is also larger. The explanation is that the estimated variances of both ˆµ and ˆσ depend on the point estimate ˆσ . Therefore, if σ is underestimated, then the mark-up of S∗_{µ ,σ} over ˜S will be smaller than when σ is overestimated. Figure 4 shows for the same configuration the corre-sponding violin plots of the cost distributions. Similar to the case with only µ unknown, the cost distribution corresponding to ˜Shas non-negative mass for higher costs than does the distribution corresponding to S∗_{µ ,σ}. Again, by setting an overall higher safety stock level, median costs are increased slightly, but thereby mean and standard deviation of costs are reduced. The improvement is less significant than in the case with only µ unknown. For example, the probability of observing costs larger than 30 is now under the classical approach twice as large as under the improved approach, whereas in the case with only µ unknown this probability was three times as large.

2.4 Extension to periodic review

A more general inventory framework deals with periodic review, by which we mean, as explained in Section 1.3, that demand is only observed at the beginning and end of review periods of length T > 0. The optimisation is done at time t, at the beginning of the n + 1st review period. As derived in Section 1.3, the objective function that has to be minimised is

TCp(S,t) = h T

∑

i=1 E(S − D_[t+1,t+L+i])+ + p T

∑

i=1 E(S − D_[t+1,t+L+i])− . Using the results of Section 2.1, this objective function can be rewritten as

TCp(S,t) = h T

∑

i=1   S Z −∞ FD_[t+1,t+L+i](x)dx  + p T

∑

i=1   ∞ Z S h 1 − FD_[t+1,t+L+i](x) i dx  .

(20)

(t = 1, ..., n). As Dt i.i.d.

∼ N(µ, σ2_{), we find that total demand per review period has a Normal distribution with}

mean µT and standard deviation σ√T. Since T is fixed, the historical observations dt (t = 1, ..., n) can be

used to estimate µ and σ . Therefore, we will apply Maximum Likelihood estimation again. Since the pdf of the observations is given by

f(dt; µ, σ , T ) = 1 σ √ 2πTe −(dt −µT )2 2σ 2T ,

the log-likelihood of the sample is given by `(µ, σ ) = n

∑

t=1 log( f (dt; µ, σ , T )) = −n log(√2πT ) − n log(σ ) − 1 2T σ2 n

∑

t=1 (dt− µT )2.

Analogously to Section 2.2 we now find

ˆ µ = 1 nT n

∑

t=1 dt,

which is the average of all previous per time unit demand observations, divided by the length of a period. The corresponding estimator for σ is

ˆ σ = s 1 (n − 1)T n

∑

t=1 (dt− ˆµ T )2.

The approximate variances of the estimators are given by

c Var( ˆµ ) = σˆ 2 nT and c Var( ˆσ ) = σˆ 2 2n,

and both estimators are asymptotically uncorrelated. Therefore, we can write θ = µ σ = ˆθ + κθ, where κθ approx ∼ MV N 0, " ˆ σ2 nT 0 0 σˆ2 2n #! . Denote the corresponding pdf by fκθ.

Then, similar to before, conditional total costs, given point estimates ˆµ , and ˆσ , are

c TCp(S,t; ˆθ ) = ∞ Z −∞ ∞ Z −∞ TCp(S,t; ˆθ + ˜κθ) fκθ( ˜κθ)d ˜κµd ˜κσ. Since dTCp dS = (p + h) T

∑

i=1 h FD[t+1,t+L+i](S; ˜µ , ˜σ ) i − pT, we find that in optimum S solves

(21)

(22)

3 A trend-stationary demand model

The models treated so far assumed that demand is mean-stationary and demand at any time instant is uncorre-lated with other demands. This resulted in estimation errors that are linear with respect to time (mean demand over a period of length r is µr). However, a more general assumption is that demand has a trend, which can be either upward, downward, or zero. In this case, as we will show, the estimation error will increase quadrat-ically in time. The trend-stationary model is a direct generalisation of the mean-stationary model presented in Section 2. By assuming a trend of zero, the mean-stationary model is obtained. In this section we will discuss the general discrete time, linear trend model. In line with the previous section, we will consider a continuous review model where orders arrive after a lead time of L ≥ 0 time units. Holding costs h, shortage costs p ≥ h are similar to above, per unit and per time unit. Order costs are 0, and remaining inventory can be freely disposed of such that the desired order-up-to level can always be achieved. We are currently at time n, and hence the historical observations consist of n demands dt, t = 1, ...n. Demand at time t (t = 1, 2, ...) is

distributed according to a linear trend-stationary model Dt= α + β t + εt, with εt

i.i.d.

∼ N(0, σ2). (3)

3.1 Optimal policy with known distribution parameters

As in Section 2, we will first derive the optimal order-up-to policy under the assumption that α, β , and σ2are known. Apart from the distributional assumption on demand, the inventory model is identical to the model described in Section 1.3 and applied in Section 2. Let the process D[k,m] denote total demand during time

interval k, ..., m. The decision at time n affects costs at time n + L + 1. Given order-up-to level S, under full information, expected costs are thus

TCc(S, n; α, β , σ2) = hE(S − D[n+1,n+L+1])++ pE(S − D[n+1,n+L+1])−, where D_[n+1,n+L]= n+L+1

∑

t=n+1 Dt. Generally, if Dt i.i.d.

∼ N(µ(t), σ2_{(t)), where µ and σ}2_{are functions of t, then}

D_[n+1,n+L+1]∼ N n+L+1

∑

t=n+1 µ (t), n+L+1

∑

t=n+1 σ2(t) ! ≡ N(µlead,n, σlead,n2 ).

Denote the corresponding cdf by FD[n+1,n+L+1]. Then, in analogy with Section 2, we can write

TCc(S, n; α, β , σ2) = h S Z −∞ FD[n+1,n+L+1](x)dx + p ∞ Z S (1 − FD[n+1,n+L+1](x))dx.

Minimizing with respect to S, we find that in optimum S must satisfy FD_[n+1,n+L+1](S) =

p p+ h.

(23)

3.2 Parameter estimation

However, in general applications, α, β , and σ2are unknown and have to be estimated. It is easily checked that model (3) satisfies the assumptions of linearity, homoskedasticity, strict exogeneity, and no multicollinearity, such that the OLS estimator for (α, β ) is the best linear unbiased estimator. Hence, at time n, the parameters α and β are best estimated by

ˆα ˆ β = n ∑nt=1t ∑nt=1t ∑t=1n t2 −1 ∑nt=1dt ∑nt=1tdt . (4)

Then the OLS predictor of demand at time ˜t > n is ˆ

D˜t= ˆα + ˆβ ˜t. The asymptotic covariance matrix of the OLS estimator is given by

AVar ˆα_ˆ β = σ2     n n

∑

t=1 t n

∑

t=1 t n

∑

t=1 t2     −1 = σ 2 n

_∑

t t2−

∑

t t 2  

∑

t t2 −

_∑

t t −

_∑

t t n  = σ2 4n+2 n(n−1) − 6 n(n−1) − 6 n(n−1) 12 n(n2₋₁₎ ! ,

where we use that

n

∑

t=1 t= n(n + 1) 2 and n

∑

t=1 t2=n(n + 1)(2n + 1) 6 . We can replace σ2by ˆ σ2= 1 n− 2 n

∑

t=1 ˆεt2,

letting ˆεt = Dt− ˆDt, and find an efficient estimator of the asymptotic covariance matrix of ˆα and ˆβ :

d AVar ˆα_ˆ β = ˆσ2 4n+2 n(n−1) − 6 n(n−1) − 6 n(n−1) 12 n(n2₋₁₎ ! .

Furthermore, as ˆσ2is the Maximum Likelihood estimator, analogously to (??), we can derive its asymptotic variance similar to the derivation in Section 2.2 (but now minimising the likelihood with respect to σ2rather than σ ), and find

d

AVar( ˆσ2) =2σ

4

n .

3.3 The error in point-forecasted mean demand

Using the derived approximate distributions of the estimators, we find that the estimated asymptotic variance of ˆD˜t(the forecast of demand at time ˜t > n) is

c Var( ˆD_˜t) = ˆσ2 1 ˜t 4n+2 n(n−1) − 6 n(n−1) − 6 n(n−1) 12 n(n2₋₁₎ ! 1 ˜t =σˆ 2 n 12˜t2 n2_{− 1}− 12˜t n− 1+ 4n + 2 n− 1 , (5)

which is quadratic in ˜t, the time instant for which the forecast is made.

(24)

Figure 5: Point forecast variance of per time unit demand with a deterministic trend, based on n observations. ˆσ = 1.

Figure 6: Point forecast variance of lead time demand with a deterministic trend, based on n observations. ˆσ = 1.

Silver et al., 1998), this uncertainty is usually neglected. Instead, the OLS prediction is directly used as the mean value for demand, and the variance of demand is estimated by ˆσ2= _n−21 ∑nt=1ˆεt2. So, whereas in the

(25)

especially when n is small, the resulting inaccuracy is enormous.

For estimating lead time demand, demand estimates for all time units in the lead time are aggregated. The forecast variance now consists of all individual forecast variances plus their covariances. Note that

ˆ D_[n+1,n+L+1]= (L + 1) ˆα + ˆβ n+L+1

∑

t=n+1 t= (L + 1) ˆα + (L + 1) n+ 1 +L 2 ˆ β .

Therefore, the sum of demands during the lead time, starting at time n + 1, has estimated forecast variance

c Var( ˆD_[n+1,n+L+1]) = ˆσ2(L + 1)2 1 n+ 1 +L 2 4n+2 n(n−1) − 6 n(n−1) − 6 n(n−1) 12 n(n2₋₁₎ ! 1 n+ 1 +L₂ .

In Figure 6 we depict this estimated total lead time forecast variance, which is neglected in the general literature. As the figure shows, the lead time forecast errors are substantially larger than the sum of the individual demand forecast errors. It becomes apparent that the positive correlations between the per time unit demand forecasts increases the total lead time forecast errors considerably. For example, for n = 10 and L= 10, the lead time forecast variance is 131, whereas the sum of the per time unit demand forecasts is 14. For the lead time, the demand variance that is taken into account is (L + 1) ˆσ2, whereas the (usually neglected) error variance of the point estimated mean is again a multiple of this number. This inaccuracy is not accounted for in general inventory models. Therefore, in the sequel we will apply an inventory model which correctly handles the uncertainty, and compare its incurred costs with those following from the generally used method.

3.4 Optimising under uncertainty

3.4.1 Optimal order-up-to level under uncertainty of α, β , and σ2

We will now derive, under uncertainty of the true parameters, but with OLS point estimates, an optimal order-up-to level S. Actually, we can write the entire cost function directly in terms of α, β , and σ2, and take the expectation. However, since Figure 6 showed that lead time demand forecast errors are substantial compared to the per time unit errors, it is instructive to derive the total demand forecast errors explicitly. Later on, we will give an example under periodic review in which we take the direct approach. Our point of departure is again the full information total cost function,

TCc(S, n; α, β , σ2) = h S Z −∞ FD[n+1,n+L+1](x)dx + p ∞ Z S (1 − FD[n+1,n+L+1](x))dx.

Total demand in time instants n + 1, ..., n + L + 1 can be written as µlead,n= ˆD[n+1,n+L+1]+ κµ, where κµ approx. ∼ N0, cVar( ˆD_[n+1,n+L+1]) .

The point-estimated variance of total demand is (L + 1) ˆσ2. Since the estimated variance of ˆσ2is 2 ˆσ4/n, the estimated variance of the total demand variance estimator is 2(L + 1)2σˆ4/n. Therefore, we can write

σ_lead,n2 = (L + 1) ˆσ2+ κ_σ2, where , κ_σ2 approx. ∼ N 0, 2(L + 1)2_ˆ σ4/n . Similar to before, we will write

(26)

where κθ approx. ∼ MV N 0, c Var( ˆD_[n+1,n+L+1]) 0 0 2(L + 1)2σˆ4/n .

Denote the corresponding pdf by fκθ. Now, write the conditional total costs, given the error realisations ˜κµ

and ˜κ_σ2, as TCc(S, n; ˆθ + ˜κθ) = h S Z −∞ FD[n+1,n+L+1](x; ˆθ + ˜κθ)dx + p ∞ Z S h 1 − FD[n+1,n+L+1](x; ˆθ + ˜κθ) i dx.

Then, total expected costs, given order-up-to level S, are

c TCc(S, n; ˆα , ˆβ , ˆσ2) = ∞ Z −∞ ∞ Z −∞ TCc(S, n; ˆθ + ˜κθ) fκθ( ˜κθ)d ˜κµd ˜κσ2.

Maximizing total expected costs w.r.t. S leads to the defining equation

∞ Z −∞ ∞ Z −∞ FD[n+1,n+L+1](x; ˆθ + ˜κθ) fκθ( ˜κθ)d ˜κµd ˜κσ2 = p p+ h,

which can be solved for S. Again, it easily follows from the second order condition that this function is convex. Denote the resulting optimal order-up-to level by S∗. Observe furthermore that by modeling the forecast uncertainty directly in terms of lead time demand mean and variance, we reduce the dimension of the integration in the final expression from three to two, which even with the current computational standards leads to a dramatic reduction in solving time.

3.4.2 Simulation study under uncertainty of α, β , and σ2

Table 3: Simulation study: linear trend model

(27)

As we did with the previous models, we will compare the derived, optimal policy with the general, naïve policy ˜S from Section 3.1. In our simulation study, which is again based on 5000 simulations per param-eter configuration, we draw demand observations according to fixed paramparam-eter choices for α, β , and σ2, apply both the newly derived optimal method and the generally applied naïve method and compare the actual expected costs of both methods under the true parameters, using total cost function

TCc(S, n; α, β , σ2) = h S Z −∞ FD_[n+1,n+L+1](x; α, β , σ2)dx + p ∞ Z S (1 − FD_[n+1,n+L+1](x; α, β , σ2))dx.

The results are shown in Table 3.

What immediately strikes the eye is that the cost differences between the improved and the naïve approach are positive for n = 3 (except if p = 20). This is undesirable, as it reflects that the improved approach actu-ally performs worse than the naïve approach. On the other hand, we observe that for n = 5 and n = 10 the cost differences are negative and much larger in magnitude than in the case of mean-stationary demand. The explanation for the bad results in the case n = 3 can be found in the Normal approximation that we use for modeling the error lead time demand variance. Our demand model specifies lead time demand via a Normal distribution with a mean and a variance, which we both model by their point estimates plus some Normally distributed error term. For the mean this does not lead to technical problems, as the entire real line is sup-ported, but for the variance this means that we attach positive probability to negative values which are not allowed. The probability attached to such values increases in L and σ and decreases in n, as follows from the estimated variance of κ_σ2. Since in the defining expression for S one cannot integrate over values for κ_σ2

that result in a negative variance, the actual lead time demand variance will be overestimated, and therefore safety stock levels will be set too high. This is reflected in the results. The safety stock adjustment is about 250%, 340%, and 480% in respectively the n = 3 base case configuration, the case with L = 10, and the case with σ = 3. For p = 20 and n = 3 the resulting cost difference is still negative (as high safety stocks decrease the probability of backorders, which are very costly in that setting), but very small, as one would expect the benefit to be larger than in the case n = 10. Note, however, that although it is striking that an improved method results in larger costs, this is only the case in the set-up with n = 3, in which the model is just identified. It is questionable how realistic such a setting is in practice and one maybe should not expect an asymptotic approximation to perform well in such a setting. We will return to this issue later. In Section 3.5 we use a heuristic approach to overcome the problem of negative variances, and in Section 4 we expand on different methods of modeling the forecast error.

Apart from the undesirable effect when n is small, it is clear that compared to the mean-stationary demand case the cost benefits are overall much larger in this setting, which is in line with our analysis of the lead time demand forecast errors. Like in the mean-stationary demand case, the estimates become more accurate as n increases, and therefore the safety stock mark-up decreases in n. This also leads to a reduction in cost benefit, as follows from a comparison of the n = 5 and n = 10 cost differences.

(28)

Figure 7: Violin plot of costs for order-up-to levels ˜S (classical approach) and S∗(improved approach), for α = 10, β = 1, σ = 2, p = 10, h = 1, L = 5, n = 5.

is likely to play a role here. However, overall in the improved approach there is much more mass attached to small costs than in the classical approach, which shows that even when using this asymptotic approximation, the proper handling of uncertainty is indeed beneficial to the decision maker.

3.5 Heuristic approach

As observed in the previous section, the asymptotic Normal approximation that we use to model the forecast error has its drawbacks, the major one being that the support of the Normal distribution covers the entire real line. The benefits of this asymptotic approximation are clear: it provides a simple approach to modeling the forecast error, based on a distribution that is easy to work with, and as the sample size increases, its lack of accuracy compared to an exact approach diminishes. In this section we discuss an approach based on the truncated Normal distribution. The truncated Normal distribution is related to the Normal distribution, but it has a bounded support, which is exactly what we desire for modeling lead time demand variance. In order to prevent the lead time demand variance from being evaluated at a negative value, we follow the derivation of Section 3.4, but substitute the distribution of κ_σ2, the error in the lead time demand variance forecast, for

a truncated Normal distribution with minus one times the point estimate for the lead time demand variance as lower truncation bound, and with no upper truncation bound. Observe that if the support of the error is bounded at minus one times the point estimate, then the actual range of values that will be substituted in the cost function for the lead time demand variance is bounded at 0, as we desire. Furthermore, whereas we derive this heuristic in the setting of the linear trend model, it can obviously also be applied to the mean-stationary demand model.

(29)

Table 4: Simulation study: heuristic approach to the linear trend model α β σ n p h L S˜ S∗ TC( ˜S) TC(S∗) 100S ∗_{− ˜}_S ˜ S 100 TC(S∗)−TC( ˜S) TC( ˜S) 10 1 2 3 10 1 5 105 142 146 95 45% -35% 10 1 2 5 10 1 5 117 138 74 45 19% -40% 10 1 2 10 10 1 5 147 157 33 24 7% -28% 10 1 2 3 10 1 10 223 334 402 246 151% -39% 10 1 2 5 10 1 10 240 302 210 115 28% -45% 10 1 2 10 10 1 10 295 321 87 52 9% -41% 10 1 2 3 20 1 5 106 151 266 137 54% -49% 10 1 2 5 20 1 5 118 145 124 57 23% -54% 10 1 2 10 20 1 5 149 161 51 28 8% -44% 10 1 3 3 10 1 5 111 166 191 126 145% -34% 10 1 3 5 10 1 5 119 151 113 70 29% -39% 10 1 3 10 10 1 5 151 165 48 35 10% -28% 10 2 1 3 10 1 5 143 179 153 102 28% -34% 10 2 1 5 10 1 5 167 189 77 46 13% -40% 10 2 1 10 10 1 5 228 238 32 23 4% -28%

in terms of cost savings than the asymptotic approach, and for n = 10 the cost savings are almost identical. Intuïtively these results are very clear: the negative variances that were supported by the asymptotic approach are now prevented. The probability of such negative variances decreases if n increases, and therefore, as n increases, this heuristic approach will converge to the asymptotic approach. Therefore, it is in any situation advisable to use the heuristic approach instead of the asymptotic approach.

The safety stock mark-ups that this method provides show the same pattern as we saw before, with large mark-ups for small n and smaller mark-ups for large n. Although we observed that the asymptotic approach causes the mark-ups for n = 3 to be much too large, we observe here that these mark-ups are still large (al-though less than under the asymptotic approach), and the mark-ups given here do lead to significant cost reductions. Hence, it is clear that the classical approach of substituting the parameter point estimate directly into the decision problem leads to dramatic cost disadvantages, especially in this situation of trended demand. For obvious reasons, this disadvantage is largest when the lead time is largest and when backorders are most expensive. Furthermore, we again observe that the cost advantages are invariant with respect to the parame-terisation of α, β , and σ . The latter does influence the size of the safety stock mark-up, however.

Whereas for mean-stationary demand we noted that the cost benefit decreases in n, the exception for trend-stationary demand is still the case n = 3. Although the heuristic approach leads to a cost benefit, it is still percentually consistently smaller than for n = 5. It appears that in this (rare) case where the model is just-identified, the variance of the estimators is so large that properly taking it into account causes the safety stock mark-ups to be larger than is desirable. Observe that although given an estimate ˆσ , κσ may indicate that the

(30)

3.6 Extensions: periodic review and exogenous variables

The optimal policy for discrete time, linear trend models can be extended to periodic review inventory models as well. Consider the same set-up as above, but now in the periodic review set-up as explained in Section 1.3, with review periods of T time units. The data set consists of n historical review period demands. In order to use these to forecast trended demand, we rewrite (3) as

t+T

∑

i=t+1 Di= T α + β t+T

∑

i=t+1 i+ νi, where t+T

∑

i=t+1 i=T 2 2 + T t + T 2, and νt i.i.d. ∼ N(0, σ2

ν) is distributionally equivalent to the sum of T independently distributed εi. With this

transformation, we still allow for demand that is not mean-stationary during the review period, whereas we correct for the fact that we only observe aggregated demands per review period. Hence, we obtain a new OLS equation. The demand observation of the period that started after time t has to be regressed on the constant review period length T and the variableT₂2+ T t +T₂. Let us collect all regressor observations in the matrix Xp.

By applying the standard OLS techniques on this regressor matrix, ˆα and ˆβ can be estimated. The forecast of demand for time ˜t is still ˆD_˜t= ˆα + ˆβ ˜t, with estimated variance ˆσ_ν2(1, ˜t)0(Xp0Xp)−1(1, ˜t), where ˆσν2is estimated

in the usual way, via the sum of squared residuals of the OLS regression. Assume that we are at time t, at the beginning of the n + 1st review period. As we touched upon before, total costs are given by

TCp(S,t; α, β , σ2) = h T

∑

i=1 E(Sp− D[t+1,t+L+i])+ + p T

∑

i=1 E(Sp− D[t+1,t+L+i])− = h T

∑

i=1   Sp Z −∞ FD[t+1,t+L+i](x)dx  + p T

∑

i=1   ∞ Z Sp h 1 − FD[t+1,t+L+i](x) i dx  .

This function depends implicitly on α, β , and σ2_{, since all demand distributions have as mean linear}

transfor-mations of α and β , and as variance a linear transformation of σ2. Because the different demand distributions that are captured in the total cost function, are, when estimated, intercorrelated, it is preferred to capture the uncertainty in total costs directly by considering total costs as a function of α, β , and σ2_{, contrary to what we}

did under continuous review. The estimators ˆα and ˆβ have an approximate multivariate Normal distribution: ˆα ˆ β approx. ∼ MV N α β , ˆσ_ν2(X_p0Xp)−1 , and, independently, ˆ σ_ν2approx.∼ N σ_ν2,2σ 4 ν n . Note that σ2= σ_ν2/T . So, we can write

(31)

Denote the corresponding multivariate pdf by fκθ. Then, similar to before, the conditional total costs, given

point estimates ˆα , ˆβ , and ˆσ2, are

c TCp(S,t; ˆα , ˆβ , ˆσ2) = ∞ Z −∞ ∞ Z −∞ ∞ Z −∞ TCp(S,t; θ + ˜θ ) fκθ( ˜κθ)d ˜καd ˜κβd ˜κσ2. Since dTCp dS = (p + h) T

∑

i=1 h FD_[t+1,t+L+i](S) i − pT, we find that in optimum S solves

∞ Z −∞ ∞ Z −∞ ∞ Z −∞ T

∑

i=1 h FD[t+1,t+L+i](S; ˆθ + ˜κθ) i f_κ_θ( ˜κθ)d ˜καd ˜κβd ˜κσ2= pT p+ h.

This defines in analogy with the continuous review model, for periodic review, discrete deterministic trend models, the optimal order-up-to level, which we denote by S∗. Please note that if T = 1, we find back the defin-ing equation for the continuous review model, with S∗ directly defined in terms of the parameters, whereas previously we derived the distribution of lead time demand first and from there optimised with respect to S∗. Instead of considering a model with only an intercept and a linear trend, one can in a similar fashion add exogenous variables to the linear model. Consider the general linear model

Dt= x0tβ + εt,

where xt and β are k-vectors. The intercept and time variable can be included in xt, but note that if the time

variable is not included, then this method is instead an extension to the mean-stationary demand inventory model of Section 2. Collect the vectors xt in the matrix X . This model can be estimated by means of OLS,

resulting in parameter estimates ˆβ and error variance estimate ˆσ2and we can use OLS theory to derive that

θ ≡ β σ2 = ˆθ + κθ, where κθ approx. ∼ MV N 0, ˆσ 2_(X0_X)−1 ₀ 00 2σ_n4 . Then it follows that the optimal order-up-to level S satisfies

(32)

4 Alternative techniques

In the previous two sections we have shown for different demand models how the uncertainty in the point estimate can be incorporated in the decision making. Given an approximation of the distribution of the estimator, the distribution of its error is derived, and finally the expectation of the objective function is taken with respect to the distribution of this error. For deriving an estimate of the distribution of the estimator, we used asymptotic properties, which are generally available and easy to implement, since they result in Normal distributions. However, using asymptotic approximations has drawbacks. The most striking example was the allowance of negative values for σ , which is impossible in a Normal distribution. For modeling estimation uncertainty, one can take different approaches. In this section we will discuss possible alternatives.

4.1 The exact distribution

It may be possible to derive the exact distribution of the estimator. Such an exact distribution is best applicable if the demand distribution indeed fits the data well. In our models, this means that the data has to resemble a Normal distribution. As stated in Section 2.3.1, in the case of mean-stationary demand under uncertainty of only µ, the asymptotic distribution is actually the exact distribution of the sample mean, ˆµ . Under uncertainty of also σ , as discussed in Section 2.3.4, this is no longer true. Since

µ − ˆµ σ / √ n ∼ N(0, 1), and since (n − 1) ˆσ2 σ2 ∼ χ 2 n−1,

as is proven e.g. in Miller and Miller (2004), it follows that µ − ˆµ ˆ σ /√n = µ − ˆµ σ / √ n r (n−1) ˆσ 2 σ 2 n−1 ∼ t_n−1,

where tn−1 denotes a Student’s T distribution with n − 1 degrees of freedom. To apply the monotonic

trans-formation rule, which can also be found in Miller and Miller (2004), we can write κµ≡ µ − ˆµ =

ˆ σ √

ntn−1≡ u(tn−1), where u has the unique inverse

w(x) =x √

n ˆ σ . It now follows that

fκµ( ˜κµ) = ftn−1(w( ˜κµ)) · w 0_{( ˜} κµ) = ftn−1 ˜κµ √ n ˆ σ · √ n ˆ σ .

The exact distribution of ˆσ is also different from its asymptotic approximation. Since(n−1) ˆσ

2

σ2 has a Chi-square

distribution with n − 1 degrees of freedom and is distributed independently from ˆµ . Rewriting, we find that σ2

(n − 1) ˆσ2 ∼ 1 χ_n−12 .

The distribution of the right hand side is Inverse χ2 with n − 1 degrees of freedom (denoted by Iχ_n−12 ). Therefore, we can write σ2= ˆσ2(n − 1)κ_σ2, where κ_σ2 ∼ Iχ_n−12 , with a pdf denoted by f_κ

(33)

use κ_σ2 in a slightly different form than usual here, namely as a multiplicative term. Denote its pdf by f_κ σ 2.

Writing fκθ( ˜κθ) = fκµ( ˜κµ) fκσ 2( ˜κσ2), we are back in the familiar framework, with objective function

c TCc(S, n; ˆθ ) = ∞ Z −∞ ∞ Z 0 TCc(S, n; ˆµ + ˜κµ, ˆσ 2_{(n − 1) ˜} κ_σ2) f_κ_θ( ˜κ_θ)d ˜κ_µd ˜κ_σ2.

The approach taken here to model the uncertainty in the estimate of σ does not suffer from the problem of attaching negative values to σ , which was one of the main drawbacks of using the Normal approximation. Contrarily, note that the derived exact distributions of the estimators depend heavily on the Normal distribu-tion underlying the demand process. If the demand distribudistribu-tion is not Normal, then one can try to find the exact distribution of the estimators via similar steps, or one can again resort to the asymptotic approximation. In a general OLS regression model, of which the deterministic trend model is a special case, with εt

i.i.d.

∼ N(0, σ2), k regressors, and parameter vector β , we have

( ˆβ − β ) ∼ MV N 0, σ2(X0X)−1 , or similarly, 1 σ( ˆβ − β ) ∼ MV N(0, (X 0 X)−1).

However, σ2is not known. Instead, we observe ˆσ2. It follows in a similar fashion as the mean-stationary case that

(n − k) ˆσ2

σ2 ∼ χ

2 n−k.

Then, similar to the univariate case, 1

ˆ

σ( ˆβ − β ) ∼ MV T (0, (X

0

X)−1, n − k),

where MV T (0, (X0X)−1, n − k) denotes a multivariate Student’s T distribution with mean 0, covariance matrix (X0X)−1 and degrees of freedom n − k. Equivalently,

κ_β≡ ( ˆβ − β ) ∼ MV T 0, ˆσ2(X0X)−1, n − k . Denote the corresponding pdf by fκβ. Similar to above, it follows that σ

2_{= ˆ}

σ2(n − k)κ_σ2, where κ_σ2∼ Iχ_n−k2 ,

with a pdf denoted by fκ_{σ 2}. By integrating the total cost function over these distributions, we are back in

the familiar framework. The deterministic trend model is a special case of this general model with k = 2 parameters α and β , and as regressors a constant and a time variable. Observe that the resulting inverse Chi-square distribution will for small n give large probabilities for high values of σ2. Although this is the exact distribution of the forecast error, it is clear that for small n using this distribution will finally lead to relatively large safety stock mark-ups. It is therefore questionable whether this approach performs much better in practice than the asymptotic approximation.

4.2 Bootstrapping

If one cannot or does not want to make a distributional assumption about the data, then the best approach so far is the asymptotic Normal approximation. There is an alternative, which is the bootstrap. A detailed description of this method can be found in Chapter 52 by J.L. Horowitz of Heckman and Leamer (2001). We will discuss here the non-parametric bootstrap, which does not rely on distributional assumptions. We will do this for the general regression model, and subsequently show how the other models are special cases of this model. The bootstrap for the k-vector ˆβ can be performed as follows.

(34)

2. From this sample, calculate ˆβ and ˆσ2.

3. Repeat steps 1 and 2 B times, where B is a large number.

4. Use the empirical distribution of observations ˆβ and ˆσ2 as the approximated distribution of ˆβ and ˆσ2 and denote the pdfs by f_βˆ and fσˆ2.

5. Use the transformation rule to find that the distribution of κβ is approximated by

f_κ

β( ˜κβ) = fβˆ( ˆβ − ˜κβ)

and that the distribution of κ_σ2 is approximated by

fκ_{σ 2}( ˜κσ2) = fσˆ2( ˆσ 2_{− ˜}

κ_σ2).

Here, β and σ2are replaced by their unbiased estimators to find the distribution of the estimation errors, which should have mean 0.

6. By the independence assumption, we can write fκθ( ˜κθ) = fκβ( ˜κβ)· fκσ 2( ˜κσ2). By taking the expectation

of the total cost function with respect to this distribution, we are back in the familiar framework. This procedure is, without the necessity of any distributional assumption about estimators, a consistent method of approximating the distribution of their errors. Note, however, that in the final optimisation problem, the assumption that demand follows a Normal distribution is still made. This is a distributional assumption that we made throughout this thesis. If empirical results provide evidence that this distribution is misspecified, then this can easily be solved by inserting the cdf of another distribution in the total cost function. Then, the estimated parameters can still be used to estimate the mean and standard deviation of the distribution, or specific estimators for the new distribution can be used and their uncertainty can be dealt with via the same approach as discussed in this thesis.

The deterministic trend model is a specific case of this general regression model, with α and β as param-eters and a constant and a time variable as corresponding regressors. The mean-stationary demand model is also a specific case of this model, with as single regressor a constant and corresponding parameter µ. There-fore, the bootstrap approach described above can be readily applied to all models treated in this thesis. A related approach is Jackknife resampling, which calculates the required statistics n times, each time leaving out one observation, and subsequently finds a parameter estimate and a variance estimate. This method is computationally less involved than the bootstrap approach, but it is less general in the sense that it in principle only estimates a variance and not a complete distribution. It is applied much less often than the bootstrap and we will not treat it here.

4.3 Bayesian approach

As opposed to the frequentist approach which we followed throughout this thesis, there is also the Bayesian approach, which we already briefly touched upon in Section 2.3.2. The fundamental difference between the frequentist and the Bayesian approach is that the latter treats the unknown parameters as random variables, whereas the former does not. Given a prior belief about the parameters, and given the likelihood of the data, a posterior distribution of the parameters is obtained via Bayes’ rule:

f_{θ |d,X}= Lθfθ m(d, X ),

where fθ reflects a prior belief about the distribution of θ , Lθ is the likelihood of the observed data, given

On the handling of forecast uncertainty in inventory models

On the handling of forecast uncertainty in inventory models

On the handling of forecast uncertainty in inventory models

Contents

1

Introduction

2

A mean-stationary demand model

∑

∑

∑

∑

∑

∑

∑

∑

∏

∑

∑

∑

∑

∑

∑

∑

∑

∑

3

A trend-stationary demand model

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

∑

4

Alternative techniques

_∑

_∑

_∑