
University of Groningen

Master Thesis

Workforce Planning under Demand Uncertainty with Incomplete Data

Author:

Ivo Wester (s2744945)


Abstract

A common problem in data-driven decision making is the presence of incomplete data. Although workforce management relies heavily on the available data, dealing with missing values and outliers in this field has not been studied in depth. In this thesis, missing data are randomly generated and imputed with Bayesian multiple imputation. Furthermore, outliers and anomalies are detected and replaced by taking the skewness and tail-heaviness of the data into account. With the introduction of a ‘safety workforce’ that takes demand uncertainty into account, the effect of different methods of handling missing values and outliers is assessed. With complete data, the target service level of 98% is met for schedules generated with a safety workforce of only a 62% quantile. This is caused by the definition of the service level, the discretization of the safety workforce and different labour regulations. With incomplete data, the schedules created with single imputation prove to be equally efficient while requiring less computational time. Moreover, the outlier replaced data set generates worse schedules than the outlier kept data sets.


Contents

1 Introduction
2 Literature review
2.1 Missing data
2.1.1 Mechanisms of missing data
2.1.2 Listwise deletion
2.2 Imputation
2.2.1 Single imputation
2.3 Multiple imputation
2.3.1 Goal of multiple imputation
2.4 Outliers
2.4.1 Outlier detection
2.4.2 Outlier handling
2.4.3 Outlier replacement
2.5 Workforce optimization
2.5.1 Service quality
2.5.2 Labor regulations
2.5.3 Forecasting demand
2.5.4 Uncertainty in demand
2.5.5 Mixed-integer linear programs
2.6 Contribution
3 Method
3.1 Data distribution
3.2 Basic problem
3.3 Bayesian multiple imputation
3.4 Outliers
3.4.1 Outlier detection
3.4.2 Outlier replacement
3.5 Safety workforce
4 Results
4.1 Service level
4.2 Comparing schedules
4.2.1 Imputation methods
4.2.2 Outlier replacement
5 Conclusion
6 Discussion and future research
References
A Appendix
A.1 Pooling imputations
A.2 Normalizing data
A.3 February daily patterns
A.4 Tukey g-and-h family


1 Introduction

Labor costs typically account for 20 to 35% of gross sales. This varies greatly between industries, where service businesses typically have higher relative labor cost than manufacturing businesses do (Adkins, 2019). This significant burden shows that proper workforce management is crucial for the long term success of a company (Mundschenk and Drexl, 2007). The problem that many companies face is that workforce scheduling has many dimensions. These include, for instance, individual specific contracts, individual preferences and employee specific skill sets (Disselkamp, 2013; Villarreal, Goldsman, and Keskinocak, 2015).

Workforce management becomes more and more complicated. Every day we are dealing with more information, rules and people to schedule properly (Disselkamp, 2013). This diverse and complex environment therefore becomes increasingly difficult to manage and optimize. Having too many employees during a particular shift causes idle employees and thus unnecessary costs. On the other hand, scheduling too few employees for a given shift causes service levels below target, overtime or even lost sales. The manager thus faces a complex optimization problem: optimizing a prespecified variable (costs, customer satisfaction, etc.), while meeting all requirements (labor contracts, employee preferences, etc.). Melachrinoudis and Olafsson (1992) argue that scheduling is heavily dependent on the demand that a company has. Since schedules are made before the demand is known, demand forecasting is a powerful tool to help in scheduling. The schedules are made by first forecasting demand and then converting the forecasted demand into staffing levels needed to satisfy service standards (Ernst et al., 2004). Forecasting, however, is not an exact science, as forecasts are rarely exactly right (Phillips and Gully, 2009). To overcome the costs of over or under coverage caused by an incorrect forecast, forecast uncertainty needs to be taken into account.

Using a data-based decision making procedure, like demand forecasting, emphasizes the importance of having access to the required data. More specifically, to optimize the workforce of a company by using demand forecasting, a proper data set is needed. In practice, however, it is unlikely that the acquired data set is complete and does not contain missing or incorrect data points (Borges et al., 2020; Lin and C.-F. Tsai, 2020; Rässler, Rubin, and Zell, 2013; Schafer, 1997). There are many complications that can arise in the data collection process, like time-outs in the system or user mistakes. Moreover, Plazas-Nossa, Ávila Angulo, and Torres (2017) state that “Relying on a time series tainted by outliers proves to be a source of frustration, as it often leads to misinterpretation and model misspecification”. Therefore, not only missing data, but also outliers could impact the allocated workforce. An adequate approach to mitigate the problems of outliers and missing data points is necessary.

In this study, we will consider different methods for handling outliers and missing data points. These lead to different data sets and so to different workforce schedules, which are optimized whilst taking demand uncertainty into account.

The data set used in this paper will be data of a quick service restaurant (QSR) in the period from 01-01-2017 to 08-02-2020, which is provided by Widget Brain (2020). The demand is aggregated per 30 minutes and these 30 minute intervals are referred to as ‘time buckets’. Widget Brain (2020) makes AI-driven workforce schedules that satisfy different requirements and optimizes these schedules according to the preferences of the manager. Historical data is used to forecast the upcoming demand, which in turn, is used to determine the number of employees per time bucket for every role. This scheduling consists of two parts: shift create and shift fill. In shift create, the forecasted demand is used to create shifts while satisfying prespecified service levels. In shift fill, the specific employees are assigned to the created shifts depending on their role, skill set and contract types (Widget Brain, 2020). The filling of the created shifts will not be included, as the efficiency and cost differences (mostly) depend on the shifts rather than who is assigned to the shifts.


The remainder of this thesis is structured as follows. In Section 2 the relevant literature will be reviewed and in Section 3 the method will be described. In Section 4 the performance of schedules generated with improper data will be evaluated. In Section 5 the main findings will be summarized and in Section 6 the shortcomings and future research ideas will be discussed.

2 Literature review

First, different methods of handling missing data will be assessed. Then, different outlier detection and replacement techniques will be evaluated. Finally, literature on workforce optimization under demand uncertainty will be evaluated.

2.1 Missing data

2.1.1 Mechanisms of missing data

To evaluate the different techniques to overcome problems of missing data, the mechanisms of missing data need to be defined. Rubin (1976) distinguished three different mechanisms of missingness that still form the base of research in missing data: (1) missing completely at random (MCAR), (2) missing at random (MAR), and (3) missing not at random (MNAR).

The first missing data mechanism, MCAR, applies when the distribution of the missing-data mechanism is completely independent of the data. The cause of missingness thus does not depend on the observed or missing data.

A somewhat weaker assumption of missingness is the MAR case. In this case the missing-data mechanism depends only on the observed data. For example, if in a survey regarding salaries, men are more reluctant to disclose their wage than women are, then more data points are missing for the salaries of men.

For the third case, MNAR, the distribution of missingness depends on both the observed and the unobserved data. Suppose now that in the same salary survey example, people with a lower wage are more reluctant to share their wage than people with a higher wage. In this case the true value of the missing data point causes it to be missing.

2.1.2 Listwise deletion

If the amount of data that is missing is sufficiently small, say lower than 10 to 15% of the whole data set, this missing data can simply be ignored as it may not add significant value to the analysis results (Raymond and Roberts, 1987; Strike, El Emam, and Madhavji, 2001). This potential remedy entails deleting the entire observation when it has a missing data point and continuing with all observations that are complete. The major drawback of this approach is that the number of observations goes down, which may cause problems in statistical inference and interpretation (Van Buuren, 2018). Van Buuren (2018) argues that the most evident benefit of this method is that every data point left in the data set is an actual observation. This implies that if the data is MCAR, we get an unbiased estimate of the mean, but since the number of observations is lower than for the complete data set, the computed variance is too large.

2.2 Imputation

Lin and C.-F. Tsai (2020) argue that listwise deletion is not a good approach, as small amounts of missing data often contain important information, which cannot be ignored. Rustum and Adeloye (2007) argue that if there are sufficiently many data points missing or defined as an outlier, these data points cannot be discarded but need to be replaced by more reasonable estimates. Thus, a possibly better solution for the missing data problem is to impute, i.e. fill in, the missing data points. Walter et al. (2013) argue that missing observations in time series data are a common problem that can occur for many reasons, like machine failures or lost records, and that its effects can be mitigated with imputation. When imputing the missing data points, the number of observations will be the same as in the original, not fully observed, data set. Then the new, partially imputed, data set can be used for analyses.


When imputing, the choice is whether to impute a single value for each missing data point (single imputation), or to impute more than one value (multiple imputation), to allow appropriate assessment of imputation uncertainty. Thus, the main difference between the two is that single imputation ignores the variation in estimation due to imputation, whereas multiple imputation does take this into account (Fridley et al., 2009).

2.2.1 Single imputation

Little and Rubin (2019) evaluate multiple single imputation methods, of which a few are discussed here. A popular single imputation technique is to take the mean of similar observations that are in the data set. For example, it is not unlikely that the demand for hamburgers on a regular Saturday at 9 p.m. is similar to the demand of other observed Saturdays at 9 p.m.

Statistically, however, this method is not very convenient. Suppose that a data set contains many missing data points and all missing data points are imputed with their mean. Then, by definition, the variance is lower than the variance without imputation. The decreased variance has two causes. The first is that the number of observations of that variable increases, since the missing value is now replaced by its mean. The second is that the variance is constructed by deriving the difference between the observation and its mean, which is in this case zero for all imputed values. Together they cause the variance to be too low and therefore the confidence interval to be too small (Rässler, Rubin, and Zell, 2013; Van Buuren, 2018). The mean imputation method produces unbiased estimates for the mean only under MCAR.
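The variance shrinkage can be seen in a few lines of code. The sketch below uses synthetic data and illustrative variable names; it is not the data set of this thesis.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical demand series with 15% of the values missing completely at random.
demand = rng.gamma(shape=2.0, scale=20.0, size=500)
observed = demand.copy()
observed[rng.random(500) < 0.15] = np.nan

mask = np.isnan(observed)
mean_imputed = observed.copy()
mean_imputed[mask] = np.nanmean(observed)

# The imputed values sit exactly on the mean, so the sample variance shrinks
# compared to the variance of the observed values alone.
print("variance (observed only):", np.nanvar(observed, ddof=1))
print("variance (mean imputed): ", np.var(mean_imputed, ddof=1))
```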

Regression imputation is a less restrictive method than mean imputation, as it allows for some slope in the imputed data points. Non-stochastic regression imputation can be expanded by allowing for some random noise in the regression (Little and Rubin, 2019). The residual term is then assumed to be a random normally distributed value with zero mean and a variance equal to the residual variance of the non-stochastic regression. The stochastic regression model can, if correctly specified, yield unbiased estimates under MAR (Van Buuren, 2018). However, Enders (2010) argues that the variance under this specification of the residual term is still too small.
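The difference between the deterministic and the stochastic variant can be sketched as follows, assuming a single fully observed predictor; the data and variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 300
x = rng.uniform(0, 10, n)                    # fully observed predictor
y = 3.0 + 1.5 * x + rng.normal(0, 2.0, n)
miss = rng.random(n) < 0.2                   # ~20% of y treated as missing

# Fit ordinary least squares on the complete cases only.
X = np.column_stack([np.ones((~miss).sum()), x[~miss]])
beta, *_ = np.linalg.lstsq(X, y[~miss], rcond=None)
resid = y[~miss] - X @ beta
sigma = resid.std(ddof=2)                    # residual standard deviation

# Deterministic regression imputation: predicted values only.
y_det = beta[0] + beta[1] * x[miss]
# Stochastic regression imputation: add noise drawn from N(0, sigma^2).
y_stoch = y_det + rng.normal(0, sigma, miss.sum())
```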

2.3 Multiple imputation

Single imputation methods ignore the variation in estimation due to imputation (Fridley et al., 2009). To reflect the uncertainty in the missing data, Donald B. Rubin came up with the idea to use multiple versions of the complete data set in 1977. The report with the introduction of the idea was published in 2004 (Rubin, 2004).

He and Raghunathan (2012) argue that, assuming MAR, it is valid to fit the complete-data model and multiply impute missing values under the model while ignoring the mechanism for the missingness. A possible method of multiple imputation is Bayesian multiple imputation. White, Royston, and Wood (2011) describe that there are three steps in Bayesian multiple imputation. First, multiple draws are taken from the posterior predictive distribution of the unobserved data $y_{mis}$ conditional on the observed data $y_{obs}$, and each draw is used to complete the data set. Then, each completed data set is analyzed separately, and finally the resulting estimates are combined into a single pooled inference.


Figure 1: Multiple imputation scheme (Van Buuren, 2018).

2.3.1 Goal of multiple imputation

Let $Q$ be a known function of the population data, like the population mean or variance. The goal of multiple imputation is then to find an estimate $\hat{Q}$ of $Q$ that is unbiased and confidence valid (Rubin, 1996). Note that $Q$ can only be calculated if the entire population is observed. The unbiased and confidence valid concepts and their derivation are described following Murray et al. (2018):

• Unbiased estimates yield that the average $\hat{Q}$ is equal to $Q$:

$$E[\hat{Q}] = Q. \qquad (1)$$

• Let $U$ be the estimated variance-covariance matrix of $\hat{Q}$. This estimate is confidence valid if the average $U$ is at least the variance of $\hat{Q}$:

$$E[U] \ge V[\hat{Q}], \qquad (2)$$

where the expectation and variance are computed over repeated sampling. The sampling distribution of $\hat{Q}$ is assumed to be normal, so valid confidence intervals can be obtained from $\hat{Q}$ and $U$. Thus, the estimate $\hat{Q}$ should on average be equal to the value of the population parameter $Q$, and the estimated confidence interval coverage should achieve at least the stated nominal interval coverage (Rubin, 1996). In practice, Equations (1) and (2) may only hold asymptotically, or when particular modeling assumptions are correct (Murray et al., 2018).
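In practice, these properties are targeted with Rubin's pooling rules. The following minimal sketch pools a scalar estimate over m imputations; the per-imputation estimates and variances are assumed to come from separate analyses and the numbers are made up.

```python
import numpy as np

def pool_rubin(q_hats, us):
    """Pool m point estimates and their variances with Rubin's rules."""
    q_hats, us = np.asarray(q_hats), np.asarray(us)
    m = len(q_hats)
    q_bar = q_hats.mean()               # pooled point estimate
    u_bar = us.mean()                   # average within-imputation variance
    b = q_hats.var(ddof=1)              # between-imputation variance
    t = u_bar + (1 + 1 / m) * b         # total variance of the pooled estimate
    return q_bar, t

# Example: pooled mean demand fraction over m = 10 imputations (made-up numbers).
q_bar, t = pool_rubin(
    [0.051, 0.049, 0.052, 0.050, 0.048, 0.053, 0.050, 0.049, 0.051, 0.052],
    [1e-5] * 10,
)
```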

2.4 Outliers

In outlier analysis, Aggarwal (2017) makes a distinction between outliers and anomalies. Outliers are defined as noise from the observed pattern and anomalies are defined as unusual (and interesting) deviations from the observed pattern. Noise is often defined as a weak form of outlier that does not always meet the strong criteria necessary for a data point to be considered interesting or anomalous enough (Aggarwal, 2017). For example, if at a particular time on Friday it is busier than it normally is, this is (if the difference is large enough) considered to be noise, i.e. an outlier. If on a regular day the system that keeps track of the traffic has a timeout, and 0 traffic is observed during a normally very busy hour, this should be considered an anomaly. Essentially, they are detected with the same techniques, but the separation between the two is rather ambiguous. The techniques to separate and handle the two will be elaborated on more extensively later on.

2.4.1 Outlier detection

Tukey (1977) introduced a standard boxplot to detect potential outliers for univariate data with five parameters: the minimum, the first quartile $Q_1$, the median, the third quartile $Q_3$ and the maximum. The interquartile range IQR is defined as the difference between $Q_3$ and $Q_1$, and potential outliers are points falling outside the interval $(Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR})$. Figure 2 gives an example of this standard boxplot.


Figure 2: Outlier detection with a standard boxplot.
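A minimal sketch of these fences (the demand fractions below are synthetic and only serve as an example):

```python
import numpy as np

def tukey_fences(x, k=1.5):
    """Return the (lower, upper) Tukey fences for a one-dimensional sample."""
    q1, q3 = np.percentile(x, [25, 75])
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

fractions = np.random.default_rng(3).beta(2, 30, size=160)   # example time-bucket data
lower, upper = tukey_fences(fractions)
outliers = fractions[(fractions < lower) | (fractions > upper)]
```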

For skewed distributions, however, this boxplot cannot be used for the detection of outliers. Hubert and Vandervieren (2008)’s adjusted boxplot is widely used to mitigate this problem by adjusting the whiskers according to the degree of asymmetry in the data. Several skewness measures are introduced to correct the original boxplot for skewness and to detect outliers more accurately.

This method, however, has several shortcomings as well, such as its computational complexity and the fact that it does not account for the heaviness of the tails. Bruffaerts, Verardi, and Vermandele (2014) introduced a generalized boxplot that corrects for tail heaviness and which also has a lower computational complexity than the adjusted boxplot. This method relies on a rank-preserving transformation of the data such that it follows the normal distribution, which in turn can be fit to the Tukey g-and-h (TGH) family. This family of distributions is widely used due to its flexibility and its ability to generate several well known distributions (He and Raghunathan, 2006; He and Raghunathan, 2012; Jiménez and Arunachalam, 2011). Apart from the transformation Bruffaerts, Verardi, and Vermandele (2014) propose, many distributions can directly be fitted to the TGH distribution. Bruffaerts, Verardi, and Vermandele (2014) argue that their transformation guarantees a good fit (for most distributions) and that if one nevertheless fits the data directly to the TGH, the results are generally worse.

2.4.2 Outlier handling

Lewis-Beck (1995) argues that there are four possible ways to deal with outliers: (1) remove them, (2) transform them, (3) leave them as they are, or (4) report results with and without outliers. The first is argued to be the worst, as it just hides the problem. Transforming the data might be a more appropriate solution if the data is used for regression purposes, but the newly generated data does not always have a meaningful interpretation. This will be elaborated on more extensively in Section 2.4.3. The third could be preferred if the outliers do not affect the statistics too much. However, it could be the case that leaving outliers as they are makes the distribution more skewed. Plazas-Nossa, Ávila Angulo, and Torres (2017) argue that outliers in time-series data often lead to misinterpretation and misspecification, which makes dealing with outliers in time-series data especially important. With the fourth approach, Lewis-Beck (1995) argues that one can show the results both with and without outliers, so that the reader can see what the impact of the outliers is.

Outliers and anomalies require different imputation methods than the proposed missing data methods in Section 2.1. Removing outliers results in missing data that is not missing at random, as the value of the data point itself causes its removal. Therefore, imputing outliers with the same approach as missing values is statistically invalid and a different approach should be used for the replacement of outliers.

2.4.3 Outlier replacement


A trivial outlier replacement method is using a (trimmed) mean (Rustum and Adeloye, 2007). A fixed percentage of the ranked data is dropped from the top and bottom and from the remaining data the mean is computed. This mitigates the effect of outliers on the mean. This method, however, replaces a very low or high value with the (trimmed) mean, which makes the value itself uninformative. For an outlier in workforce optimization this would imply that a peak demand is replaced with the (trimmed) mean, which results in no peak at all and possibly a worse forecast.

If, on the other hand, the detected outlier is an anomaly caused by a machine error that resulted in no demand at a normally very busy hour, the value itself is incorrect and analysis with this value can negatively influence the performance. In this case, replacing the value with the trimmed mean might be a good guess.

To replace an outlier with a more informative value, winsorization can be used. In the most common winsorization approach the extreme observations are shrunk to a boundary value, which often is an empirical percentile. Boudt, Todorov, and Wang (2020) state that winsorizing causes every outlier to get the same value, which implies that the ranking of the winsorized data is no longer informative. In workforce optimization this would imply that the different demand peaks that are defined as outliers are indistinguishable. From an upper bound perspective this implies that the smallest outlier will be replaced with the exact same value as the largest outlier. Therefore, Boudt, Todorov, and Wang (2020) propose a distribution-driven winsorization method that preserves the ranking of the original data by replacing the extreme values with values on an equidistant grid of quantiles of the robustly fitted distribution.

2.5 Workforce optimization

2.5.1 Service quality

Workforce optimization considers rostering human subjects, from which some extra problems arise. Okulicz-Kozaryn and Golden (2018) argue that having time autonomy (total flextime), and to an even greater extent, having unpredictable and irregular working hours causes unhappy employees. The schedules should therefore be available at least a few weeks beforehand and be more or less fixed afterwards to overcome the unpredictability of the working hours. Furthermore, the labor contracts form several constraints that govern which work patterns are allowed for the employees (Ernst et al., 2004). These can vary from hard to soft constraints. Hard constraints cannot be violated, whereas soft constraints can be violated with a certain penalty, e.g. higher costs for overtime. If the demand in a certain interval exceeds what the rostered employees can deliver, overtime could be necessary to satisfy the demand. Apart from the fact that overtime could be more expensive than regular time, it also influences the performance of the employee. Excessively long working hours and poorly planned shift work can result in employee fatigue (Kany, 2001; Othman, Gouw, and Bhuiyan, 2012). In turn, worker fatigue can greatly impact the performance of the employee and therefore the quality of the work they deliver (Eklund, 1997; Kany, 2001).

Service quality is argued to be one of the drivers of long-term success of a company (H. Lee, Y. Lee, and Yoo, 2000; Mason et al., 2016; Ryu and Han, 2010). Mason et al. (2016) state that quick-service restaurants are no different in this respect. This quality is, among others, measured by the speed at which orders are processed, and speed is, in turn, positively correlated with the number of scheduled employees. To achieve a certain quality, the manager specifies a conversion rate between units sold and scheduled employees based on his preference and expertise. This conversion number will be based on several service attributes that he wants his restaurant to provide (Ernst et al., 2004). Thus, if the observed demand in a certain time bucket is larger than what the scheduled employees can handle, the service quality is below the prespecified service level or a lost sale could occur. The latter, however, is unobserved in the data; if there is no sale due to a lost customer, it is also not seen in the number of sold items (Nahmias, 1994).

2.5.2 Labor regulations


Letting an employee work overtime for one time bucket, for example, might be cheaper than rostering an employee for a time span equal to the minimum duration of a shift. Villarreal, Goldsman, and Keskinocak (2015) pose a penalty for forecasted under coverage to mitigate this problem. The size of the penalty should be determined by the manager's tendency to avoid under coverage. The level of compliance can be assessed by checking the number of (soft) constraints that are violated by the computed schedule.

2.5.3 Forecasting demand

Kalekar et al. (2004) argue that the best place to start with any time series forecasting analysis is to graph the time series to be forecasted. The purpose of the plot is to give the analyst a visual impression of the nature of the time series. Some plots of the data are given in Appendix A.3. Winters (1960) uses a model where the only input to the forecasting system is the past history of sales, excluding inputs like what is happening in the market, the sales of the competition, and so on. Historical sales show that many products have a marked trend and a seasonal pattern (Winters, 1960). Kalekar et al. (2004) state that the Holt-Winters (multiplicative) trend seasonal model, introduced by Winters (1960), takes these properties into account.

When the daily demand is forecasted, Villarreal, Goldsman, and Keskinocak (2015) argue that the daily forecast can be broken into time buckets based on historical data. These time bucket demands can in turn be used for the scheduling of employees during the day.
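A sketch of this two-step approach, assuming the daily sales are available as a pandas Series and using the statsmodels implementation of the Holt-Winters model; the synthetic data and the uniform bucket fractions are placeholders, not the thesis data.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.holtwinters import ExponentialSmoothing

# Hypothetical daily sales with a weekly seasonal pattern.
idx = pd.date_range("2017-01-01", periods=1100, freq="D")
rng = np.random.default_rng(4)
daily = pd.Series(200 + 50 * np.sin(2 * np.pi * idx.dayofweek / 7)
                  + rng.normal(0, 10, len(idx)), index=idx)

# Holt-Winters model with a weekly multiplicative seasonal component.
fit = ExponentialSmoothing(daily, trend="add", seasonal="mul",
                           seasonal_periods=7).fit()
daily_forecast = fit.forecast(31)            # forecast the scheduling horizon

# Break one forecasted day into 48 half-hour buckets with historical fractions.
bucket_fractions = np.full(48, 1 / 48)       # placeholder: uniform fractions
bucket_forecast = daily_forecast.iloc[0] * bucket_fractions
```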

2.5.4 Uncertainty in demand

Completely relying on the forecast and basing all schedules on the point estimate is too optimistic and the schedules need to be corrected for this uncertainty. Villarreal, Goldsman, and Keskinocak (2015) take uncertainty into account by adding/subtracting a percentage of the demand forecast and checking how robust the implemented schedules are. They conclude that even with a 20% demand deviation from the forecast, the idle time and unfulfilled demand increased only slightly compared to the perfect information scenario.

Although demand uncertainty is an important driver in decision making in other fields, other research in this field does not include uncertainty in demand. In inventory control, for example, safety stock is determined based on uncertainty in the forecast (Axsäter, 2015). Safety stock is the additional stock to protect against demand uncertainty and it can, among others, be calculated with the probability of no stockout (Axsäter, 2015). An application of safety stock in the field of workforce management is to schedule additional workforce to mitigate the consequences of demand uncertainty. In this context, the ‘safety workforce’ can be calculated with the probability of complete demand coverage.

Another well-known service measure in inventory control is the fill rate. Axsäter (2015) defines the fill rate as the fraction of demand that can be satisfied immediately from stock on hand. An application of the fill rate in the field of workforce management is defined as the fraction of demand that can be covered by the scheduled workforce.

The service level of choice should then depend on the short and long term costs for giving good service and the underlying shortage costs. De Treville et al. (2014) argue that the profit-maximizing critical fractile of the newsvendor model is given by the costs of being understocked over the total costs of being over- or understocked. This critical fractile is often used for optimal inventory calculations.

2.5.5 Mixed-integer linear programs


2.6 Contribution

The first contribution is the introduction of different methods of handling outliers and missings in the field of workforce optimization. Although workforce management is (often) data-driven, dealing with improper data sets in this field is, to the best of my knowledge, not researched yet. The second contribution of this thesis is the introduction of a new method of dealing with demand uncertainty in workforce planning. Villarreal, Goldsman, and Keskinocak (2015) consider demand uncertainty by generating scenarios that add/subtract a certain percentage from the forecasted demand, whereas this thesis proposes a slightly more elegant approach of dealing with demand uncertainty by introducing a safety workforce. With this introduction, the company can meet the target service level by protecting against demand fluctuations and forecast uncertainties.

3 Method

3.1 Data distribution

The data set used in this paper will be the data of a quick service restaurant (QSR) in the period from 01-01-2017 to 08-02-2020. The period 08-01-2020 until 08-02-2020 will be forecasted and scheduled and the data acquired in this period will be used to test the performance of the generated schedules.

Since there are no missings in the data, 15% of the data is removed at random. The benefit is twofold. First, the observed data can be compared to the imputed data. Second, the estimated parameters with observed data can be compared to the ones with imputed data to assess the differences. In Section 2.1 it is argued that multiple imputation is only statistically valid if the missing data mechanism is MAR or MCAR. By definition, the removed data is MCAR in this case.
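A sketch of this MCAR removal step, assuming the demand is stored in a pandas DataFrame column named 'demand' (the name is illustrative):

```python
import numpy as np
import pandas as pd

def remove_mcar(df, column, frac=0.15, seed=0):
    """Set a random fraction of a column to NaN (missing completely at random)."""
    out = df.copy()
    rng = np.random.default_rng(seed)
    drop_idx = rng.choice(out.index, size=int(frac * len(out)), replace=False)
    out.loc[drop_idx, column] = np.nan
    return out

# df_incomplete = remove_mcar(df, "demand")   # 15% of the demand values set to missing
```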

3.2 Basic problem

First, the data will be prepared by applying different methods of handling missing data and outliers. Then the complete and cleaned data set will be used to forecast and to compute a safety workforce to deal with demand uncertainty. The safety workforce is then used to schedule workforce accordingly. The ways of handling outliers and missings introduced in Section 2 (Lewis-Beck, 1995) each generate a unique data set. The different data sets that are considered are: 1) a data set with all observed data, 2) only observed data and outliers replaced, 3) imputation of missings with multiple imputation, and 4) imputation of missings with single mean imputation.

The optimization process in this thesis follows a similar approach as Ernst et al. (2004) and takes the following steps:

1. Demand forecasting: forecast future sales and determine what demand must be satisfied to meet the target service level.

2. Labour forecasting: discretizing the demand forecast into the required number of employees.

3. Shift create (SC): creating the shifts for each role.

Winters (1960)’s seasonal multiplicative model is used to forecast daily demand since the data shows clear signs of seasonality during the week, which is shown in Appendix A.3. The daily demand is assigned to weekday and time specific time buckets by using a fraction of daily demand that satisfies a given probability of full demand coverage. For example, the fraction of daily demand that covers all demand at Saturday at 21:00 with a 95% probability is, say, 15%.

To make such statements, the distribution of the different time buckets needs to be assessed. Bruffaerts, Verardi, and Vermandele (2014)'s data transformation is used to generate normalized fractions of daily demand for each time bucket. This transformation is given in Appendix A.2 and it is built on the assumption that the fraction of daily demand X is continuous, unimodal and has independent realizations. The result is that the transformed variables $Y_{i,j}$, for $i = 1, 2, \ldots, 7$ and $j = 1, 2, \ldots, 48$, roughly follow the standard normal distribution for all weekday and time bucket combinations.


The required number of employees per role is obtained by converting the safety workforce, expressed in units of demand, with role-specific conversion rates. These conversion rates of safety workforce to employees are determined beforehand by the manager and are thus fixed. With the inputs provided by the manager and the forecast made, the shifts can be created to cover the demand (Ernst et al., 2004). The two roles that will be scheduled are cooks and cashiers. Both have their role-specific conversion rate, which is a simplified and slightly adjusted version of the manager's input. In Section 3.5 this will be elaborated on more extensively.

To create actual shifts out of the forecasted number of employees a mixed-integer linear program (MIP) will be used to find the optimal solution subject to the demand and labor constraints.

3.3 Bayesian multiple imputation

The number of data sets that will be imputed is 10, since the computational time is high and in Section 2 it is argued that this should be sufficient. Since the missing-data mechanism is MCAR, Bayesian multiple imputation is valid and is executed with the approach of Van Buuren (2018). Suppose that $Y$ is the fraction of daily demand in a particular time bucket, where $n_0$ observations in $Y$ are missing. Denote $\dot{Y}$ as the vector of length $n_0$ that contains the imputed values of $Y$. Then $\dot{Y} = \dot{\beta} + \dot{\varepsilon}$, where $\dot{\varepsilon} \sim N(0, \dot{\sigma}^2)$ and where $\dot{\beta}$ and $\dot{\sigma}$ are random draws from the posterior distribution.

An example of a time bucket for a single repetition of the Bayesian model is given in Figure 3. Although the imputation method does not perfectly predict the observed values, the imputed values do seem to have a similar amount of volatility as the observed values, which relates closely to the confidence-valid goal of multiple imputation mentioned in Section 2.3. This data will later be used to compute the parameters, i.e. population mean and variance, of the model. The parameter estimation will be executed for all 10 repetitions and the computed parameters will in turn be pooled, for which the method is shown in Appendix A.1.


Figure 3: Bayesian single imputation for an example time bucket.
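A minimal sketch of one imputation draw under this intercept-only normal model, using the standard flat-prior algebra (a scaled inverse chi-square draw for the variance, a normal draw for the mean); this is an illustration, not the exact implementation used for the thesis. Repeating it 10 times, with an analysis of each completed data set, gives the multiple imputation described above.

```python
import numpy as np

def bayesian_impute_once(y_obs, n_missing, rng):
    """Draw imputed values for the intercept-only model Y_dot = beta_dot + eps_dot."""
    n = len(y_obs)
    y_bar = y_obs.mean()
    s2 = y_obs.var(ddof=1)
    # sigma_dot^2 ~ (n - 1) s^2 / chi^2_{n-1}   (posterior under a flat prior)
    sigma2_dot = (n - 1) * s2 / rng.chisquare(n - 1)
    # beta_dot ~ N(y_bar, sigma_dot^2 / n)
    beta_dot = rng.normal(y_bar, np.sqrt(sigma2_dot / n))
    # Imputations: beta_dot + eps_dot with eps_dot ~ N(0, sigma_dot^2).
    return rng.normal(beta_dot, np.sqrt(sigma2_dot), size=n_missing)

rng = np.random.default_rng(5)
y_obs = rng.beta(2, 30, size=140)            # observed fractions in one bucket (synthetic)
imputed = bayesian_impute_once(y_obs, n_missing=20, rng=rng)
```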

3.4 Outliers

First, the outlier and anomaly detection methods are explained. Then, the outlier and anomaly replace-ment methods are given.

3.4.1 Outlier detection


For a standard normal variable $W$, a Tukey g-and-h random variable $Z$ is given by

$$Z = T_{g,h}(W) = \frac{e^{gW} - 1}{g}\, e^{hW^2/2}, \qquad (3)$$

where $g$ and $h$ represent the skewness and the heaviness of the tail respectively. Bruffaerts, Verardi, and Vermandele (2014)'s approach to approximate them is given in Appendix A.4.

When the data are transformed to the TGH distribution, the rate at which outliers are detected is set. In this thesis, outliers are detected at the 0.3% and 99.7% percentiles of the TGH distribution. Since there is no data on certain events, anomalies are assumed to be machine or human errors and are defined as values outside the 0% and 100% percentiles of the TGH distribution. The values corresponding to these percentiles are referred to as the ‘critical values’. When a value is not in the interval of the critical values, the probability that the value is drawn from the normalized distribution is roughly 0% and therefore the observation is detected as an anomaly, i.e. a human/machine error.

The critical values that are determined for the outlier and anomaly detection then follow the inverse of the transformation steps to get to the lower and upper bounds of the fraction of demand of the different time buckets.
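A sketch of how such critical values can be obtained from Equation (3), assuming the skewness and tail parameters g and h have already been estimated for a bucket (the values below are made up):

```python
import numpy as np
from scipy.stats import norm

def tgh(w, g, h):
    """Tukey g-and-h transform of a standard normal value w (Equation 3)."""
    return (np.exp(g * w) - 1) / g * np.exp(h * w**2 / 2)

g_hat, h_hat = 0.4, 0.1                  # hypothetical estimates for one time bucket

# Outlier fences at the 0.3% and 99.7% percentiles of the fitted TGH distribution.
lower = tgh(norm.ppf(0.003), g_hat, h_hat)
upper = tgh(norm.ppf(0.997), g_hat, h_hat)
# These fences still live on the transformed scale; applying the inverse of the
# normalization steps maps them back to bounds on the fraction of daily demand.
```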

The normal boxplot, where Su and C.-L. Tsai (2011) define potential outliers as observations that are not in the interval $(Q_1 - 1.5\,\mathrm{IQR},\ Q_3 + 1.5\,\mathrm{IQR})$, will be used to show how the generalized boxplot works.

An example histogram is plotted in Figure 4(a) and the normal boxplot and generalized boxplot are respectively shown in Figure 4(b) and Figure 4(c).

The normal boxplot shows some outliers at the upper bound. This boxplot assumes a roughly symmetric distribution, which is clearly violated. Due to this incorrect assumption the upper whisker is too low. Thus, high observations are defined as outliers too early. The generalized boxplot in Figure 4(c) does take the skewness of the data into account and therefore does not detect any outliers.

(a) Example histogram of a time bucket.

(b) Boxplot before generalizing. (c) Generalized boxplot

Figure 4: Outlier detection with generalized boxplot.

3.4.2 Outlier replacement

The detected anomalies are assumed to be errors in the data set and therefore their value itself is unin-formative. The anomalies are replaced with the trimmed mean, where 2.5% of the tails of the data are dropped to compute the mean. The detected outliers are winsorized with the rank preserving method introduced by Boudt, Todorov, and Wang (2020).
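A simplified sketch of both replacement steps, assuming the anomaly and outlier indices have already been detected; the robust normal fit via median and MAD stands in for the robustly fitted distribution of Boudt, Todorov, and Wang (2020) and only upper-tail outliers are handled, so this is an illustration rather than their exact procedure.

```python
import numpy as np
from scipy.stats import norm, trim_mean

def replace_outliers(x, anomaly_idx, outlier_idx, upper_q=0.997):
    """Trimmed-mean replacement for anomalies, rank-preserving winsorization for outliers."""
    x = x.copy()
    # Anomalies (assumed machine/human errors): replace with the 2.5% trimmed mean.
    x[anomaly_idx] = trim_mean(np.delete(x, anomaly_idx), 0.025)

    # Outliers: map them, in rank order, onto an equidistant grid of upper quantiles
    # of a robustly fitted normal distribution, so their ranking stays informative.
    mu = np.median(x)
    sigma = 1.4826 * np.median(np.abs(x - mu))
    k = len(outlier_idx)
    if k:
        grid = np.linspace(upper_q, 0.9999, k)
        order = np.argsort(x[outlier_idx])
        x[np.asarray(outlier_idx)[order]] = norm.ppf(grid, loc=mu, scale=sigma)
    return x
```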


(a) Outlier detection. (b) Rank preserving winsorization.

Figure 5: Winsorizing outliers for an example time bucket.

Since the distance between the most evident outlier and all other observations is so significant, it is likely that this value is caused by a machine or human error. Having almost 25% of daily sales in half an hour does not seem to be plausible. In Figure 6 the outlier and anomaly detection and replacement are shown for an example time bucket. The outliers are replaced with Boudt, Todorov, and Wang (2020)’s method and the anomaly is replaced with the trimmed mean.

(a) Anomaly detection. (b) Outlier detection. (c) Outlier and anomaly replacement

Figure 6: Outlier and anomaly replacement for an example time bucket.

3.5 Safety workforce

First, the equations to derive the safety workforce and fill rate will be given. Recall that the demand fraction Y follows a normal distribution with mean µ and standard deviation σ, where µ ≈ 0 and σ ≈ 1, as discussed in Section 3.2. Then, using a similar approach as Axsäter (2015), the safety factor $S/\sigma$ is calculated from the probability of complete demand coverage Ψ:

$$\Psi = P(Y \le S + \mu) \implies \frac{S}{\sigma} = \Phi^{-1}(\Psi), \qquad (4)$$

for standard normal distribution function Φ. The fraction of daily demand that should be met for a given quantile Ψ is thus given by S, and multiplying S with the forecasted daily demand gives the safety workforce.

The fill rate is derived following a similar approach as De Treville et al. (2014). Let total demand be given by D and let the total demand that can be satisfied by the scheduled labour be given by L. Then, the observed fill rate Γ, which will be referred to as the service level for the remainder, is given by

$$\Gamma = \frac{\min\{L, D\}}{D}. \qquad (5)$$
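A sketch of Equations (4) and (5) with illustrative numbers:

```python
from scipy.stats import norm

def coverage_quantile(mu, sigma, psi):
    """Demand-fraction level S + mu covered with probability psi (Equation 4)."""
    return mu + sigma * norm.ppf(psi)

def service_level(supply, demand):
    """Observed fill rate, i.e. the service level Gamma (Equation 5)."""
    return min(supply, demand) / demand

# Example: a bucket whose (back-transformed) fraction has mean 0.05 and sd 0.01.
fraction_95 = coverage_quantile(0.05, 0.01, 0.95)
safety_workforce = fraction_95 * 800        # times the forecasted daily demand
gamma = service_level(supply=45, demand=48)
```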


A conversion rate of 1 implies that every scheduled employee can handle one unit of demand per time bucket. Scheduling based on the safety workforce of a 95% quantile yields that for each time bucket, the probability of full demand coverage is roughly 95%. Then, if in 95% of the cases all demand is met and in the other 5% of the cases at least some demand is covered, the fraction of demand that is met, i.e. the service level, is at least 95%. When discretizing and rounding up the forecast to a minimum required number of employees per role, the service level is expected to be even higher. For a QSR a conversion rate larger than 1 is very common, as an employee can handle multiple units of demand per time bucket. For example, if a cook can handle 10 units of demand per time bucket and the 95% quantile of forecasted demand in a certain time bucket is 15, the number of cooks scheduled will be equal to 2. This implies that all observed demand ≤ 20 can still be satisfied.
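A sketch of the discretization from safety workforce to a required head count, reproducing the cook example above:

```python
import math

def required_employees(safety_workforce, conversion_rate):
    """Minimum number of employees needed to cover the safety workforce in a bucket."""
    return math.ceil(safety_workforce / conversion_rate)

required_employees(15, 10)    # -> 2 cooks, who can then cover up to 20 units of demand
```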

Moreover, if the safety workforce is used to schedule employees, the manager is obliged to meet certain labor constraints, e.g. minimum shift duration, which decreases the flexibility of scheduling. In case of perfect flexibility, the manager can create shifts that have a duration equal to the duration of a time bucket and all forecasted demand peaks can be satisfied without many consequences. In practice, however, this is often not the case due to labor constraints. Scheduling employees to satisfy certain demand peaks results in over coverage in the time buckets around the forecasted peak.

Under coverage is taken as a proxy of service quality, and service quality determines long-term success for restaurants (H. Lee, Y. Lee, and Yoo, 2000; Mason et al., 2016; Ryu and Han, 2010), which causes the target service level of a QSR to be high. Therefore, the target service level is argued to be 98%. Due to the newsvendor critical fractile, this implies that the manager should be indifferent between 1 unit of under coverage and 49 units of over coverage. The observed service level is argued to be higher than the quantile used to compute the safety workforce, because the definition of the service level, the labor regulations and the discretization of the safety workforce all positively affect the observed service level.

Figure 7(a) shows an example where the forecasted daily demand is split into time buckets by using the mean fraction of daily demand for each time bucket (‘Mean’) and the safety workforce that is computed with a quantile of 98%. The uncertainty in the fraction of daily demand differs widely among the different time buckets. If the uncertainty were (proportionally) constant over the different time buckets, a safety workforce calculated with a quantile of 98% would be (proportionally) constant above the forecast for each time bucket. This, however, is not the case. Figure 7(a) shows that the level of uncertainty differs per time bucket. For example, time bucket 18:00 has a higher level of uncertainty than 18:30, since their means do not differ much, but the computed safety workforces do. Furthermore, from Figure 7(b) it is evident that the service level for this day is already very high for a safety workforce computed with only a 64% quantile.

Figure 7: (a) Example of safety workforce computed with a 98% quantile. (b) Example of safety workforce computed with a 64% quantile.


Due to these interactions, it is expected that the quantile to calculate the safety workforce can be lowered to satisfy the target of 98%.

By taking the uncertainty in the forecasted demand into account with the computation of a safety workforce, the computationally complex MIP becomes a deterministic problem instead of a stochastic one. The computed safety workforce serves as the forecasted demand from which schedules are generated. The objective is to minimize costs, while the forecasted safety workforce is covered and the minimum and maximum shift durations are respected.
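A minimal sketch of such a shift-create MIP with PuLP, under simplifying assumptions: one role, identical employees, cost measured in paid time buckets, and shift-duration limits enforced by only enumerating candidate shifts between the minimum and maximum length. All numbers are illustrative.

```python
import pulp

n_buckets = 48                                         # half-hour buckets in a day
required = [0] * 16 + [2, 3, 4, 4, 3, 2] + [1] * 26    # employees needed per bucket (example)
min_len, max_len = 8, 18                               # shift length limits in buckets (example)

# Candidate shifts: (start bucket, length in buckets).
shifts = [(s, l) for s in range(n_buckets)
          for l in range(min_len, max_len + 1) if s + l <= n_buckets]

prob = pulp.LpProblem("shift_create", pulp.LpMinimize)
x = {sh: pulp.LpVariable(f"x_{sh[0]}_{sh[1]}", lowBound=0, cat="Integer") for sh in shifts}

# Objective: minimize total paid time buckets (a proxy for labour cost).
prob += pulp.lpSum(l * x[(s, l)] for (s, l) in shifts)

# Coverage: the scheduled employees in each bucket must meet the required head count.
for t in range(n_buckets):
    prob += pulp.lpSum(x[(s, l)] for (s, l) in shifts if s <= t < s + l) >= required[t]

prob.solve(pulp.PULP_CBC_CMD(msg=False))
```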

4 Results

In this section, the main results of this research will be discussed. Firstly, the observed data set will be used to compute the safety workforce with different quantiles. The safety workforce will then be used as demand forecast, thereby taking uncertainty into account, to generate schedules. These schedules will, in turn, be used to assess the observed service level, i.e. the fraction of demand that is satisfied. This will show the effect of increasing the quantile to calculate the safety workforce on the observed service level.

Secondly, the effects of different methods of imputation and the effect of outlier removal on the performance of the schedules will be assessed. Recall that different methods of handling missings and outliers generate different data sets. These data sets are used to calculate the safety workforce and in turn generate schedules. The performance of these schedules will be assessed with the amount of over and under coverage as key performance indicators (KPIs).

For the remainder, the conversion rates to discretize the safety workforce into demand for labour are 12.5 for the cashier role and 11.11 for the cook role.

4.1 Service level

As previously mentioned, the computed safety workforce is subject to discretization and labour regulations. An example of this interaction is given in Figure 8. In this figure, the safety workforce computed with a quantile of 64% is used to generate a schedule for the cashier role. The maximum units of demand all cashiers can handle are shown in Figure 8 as the ‘Maximum supply cashiers’.

The labour regulations cause the peak at 12:30 to be met, but give over coverage in the time buckets around the peak. Also, at 19:30 the safety workforce did not satisfy the observed demand, but the discretization causes the demand to still be completely satisfied in this time bucket.


Figure 8: Cashier shifts made with the safety workforce computed with a 64% quantile for an example day.


The discretization of the safety workforce into a required number of employees causes that the maximum demand that can be satisfied in a certain time bucket is at least as large as the computed safety workforce.

Figure 9 shows the interaction between the quantile used to calculate the safety workforce and the observed service levels for the two roles. The relation between the two is increasing at a decreasing rate, and not every increase in quantile results in an increase in service level. In Figure 9(b), for example, a 70% quantile in each time bucket gives the same observed service level as a 74% quantile. This is due to the discretization of the safety workforce into demand for labour. If the quantile increases, the safety workforce increases as well. An increase in safety workforce, however, does not necessarily imply an increase in demand for labour. For example, if the safety workforce increases from 14 to 15, the cook role with a conversion rate of 11.11 units of demand per time bucket would imply a demand for 2 cooks in both cases. This causes the observed service level to show a non-linear relation with the quantile. Furthermore, for a safety workforce calculated with only a 50% quantile, the service level of the resulting schedule is already higher than 96.5%. Moreover, the argued target service level of 98% is already satisfied at a quantile of roughly 62%. This is caused by three aspects. The first is the definition of the service level, which is at least equal to the quantile. The second is the discretization of the safety workforce into demand for labour, which causes the maximum demand that can be covered to be at least as large as the computed safety workforce. The third is the labour regulations that decrease the flexibility in workforce planning.

(a) The interaction of the quantile and the observed service level for the cashier role.

(b) The interaction of the quantile and the observed service level for the cook role.

Figure 9: The interaction of the quantile and the observed service level for the cashier and cook roles.

4.2 Comparing schedules

The key performance indicators (KPIs) of a schedule are the over and under coverage measured in units of demand. Thus, if the maximum demand that can be satisfied by the scheduled cashiers is 37.5 units of demand and the observed demand is 38 in that time bucket, the under coverage is 0.5 units of demand and the over coverage is 0.
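A sketch of these KPIs per schedule (the capacity and demand values are illustrative):

```python
import numpy as np

def coverage_kpis(capacity, demand):
    """Total over and under coverage in units of demand, summed over time buckets."""
    capacity, demand = np.asarray(capacity, float), np.asarray(demand, float)
    over = np.clip(capacity - demand, 0, None).sum()
    under = np.clip(demand - capacity, 0, None).sum()
    return over, under

coverage_kpis([37.5, 50.0], [38.0, 44.0])   # -> (6.0, 0.5): the first bucket is the example above
```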

First, the schedules of the differently imputed data sets will be compared. Second, the schedules of the observed data set will be compared to the schedules generated on the outlier removed data set. Both the outlier replacement strategy and method of imputation affect the calculated safety workforce indirectly. The different methods of handling missings and outliers affect the forecasted daily demand and the level of uncertainty in parameter estimation, which affects the safety workforce. The level of uncertainty and the forecasted daily demand both positively influence the safety workforce.

4.2.1 Imputation methods


for 98% of the time buckets.

The safety workforces computed with the single and multiple imputed data sets are calculated for quantiles of 50%, 52%, ..., 98%. The KPIs of the resulting generated schedules are shown in Figure 10. The closer the curve to the origin, the better the schedule performs. In other words, the lower the under coverage for the same over coverage.

Figure 10 shows that there is hardly any difference in the performance of the schedules. Also for higher quantiles, where the effect of an increased level of uncertainty due to multiple imputation should be more evident, there is little difference between the imputation methods. This could be due to too few missing observations, which causes the different imputation methods to show very little difference in the performance of the schedules.

Furthermore, the returns of an increase in over coverage are diminishing. Thus, for lower quantiles to calculate the safety workforce, an increase in over coverage causes a higher decrease in under coverage than for higher quantiles for both imputation methods.

(a) KPIs with a single imputed data set for the cashier role.

(b) KPIs with a multiple imputed data set for the cook role.

Figure 10: KPIs with single and multiple imputed data sets for both roles.

Another observation is that the starting and ending points of the curves are, especially for the cashier role, quite different. In Figure 10(a) for example, the 50% quantile to calculate the safety workforce gives an under coverage of 724.5 units of demand for the single imputed data and 638.5 units of demand for the multiple imputed data. Since the observed demand is equal for the two methods, this implies that the service level is higher for the multiple imputed data set.

In Figure 11 the interaction between the quantile to calculate the safety workforce and the observed service level for both imputation methods is shown. It is evident that the observed service level for the multiple imputed data is almost always larger than for the single imputed data for both roles. This difference is caused by the difference in the number of time buckets scheduled. In other words, for the same quantile to calculate the safety workforce, the multiple imputed data schedules more time buckets than the single imputed data. This is caused by a higher safety workforce for the multiple imputed data set. The higher safety workforce can have two origins: a higher forecasted daily demand and/or a higher level of parameter uncertainty in the time buckets. The sum of daily forecasts for the single imputed data set is slightly larger than for the multiple imputed data set. This would, ceteris paribus, imply that the safety workforce for the single imputed data set is larger as well. The parameter uncertainty of the multiple imputed data set, however, is substantially larger than for the single imputed data set. The effect of the higher level of uncertainty of the multiple imputed data set is larger than the effect of the lower forecasted daily demand. This causes the number of scheduled time buckets to be larger for the multiple imputed data set than for the single imputed data set.


(a) Observed service level with a single imputed data set for the cashier role.

(b) Observed service level with a multiple imputed data set for the cook role.

Figure 11: Observed service level with single and multiple imputed data sets for both roles.

Combining Figure 10 and Figure 11 results in the following arguments. Using the same quantile, the multiple imputed data set gives a higher observed service level than the single imputed data set. The performance of the different imputation methods, however, is not substantially different. Therefore, an equally efficient schedule can be generated with single imputation by increasing the quantile used to calculate the safety workforce. The single imputation method requires less computational time and is therefore preferred in this case.

4.2.2 Outlier replacement

In this section, the effect of the introduced method of outlier and anomaly replacement is assessed. For the remainder, outliers and anomalies together will be referred to as ‘outliers’ since their effects on the computed safety workforce have the same sign.

In Figure 12 the KPIs are shown for schedules generated with a safety workforce computed with quantiles of 50%, 52%, ..., 98% for the data sets with and without outliers removed. For both roles, the curve of the outlier removed data set rises above that of the outlier kept data set at a certain point. From that point, roughly 15000 units of over coverage for the cashier and 14000 for the cook, the schedules generated with the outlier removed data set are worse than those of the outlier kept data set.

(a) KPIs with and without outlier removal for the cashier role.

(b) KPIs with and without outlier removal for the cook role.

Figure 12: KPIs with and without outlier removal for the cashier and cook roles.


In Figure 13 the interaction between the quantile used to calculate the safety workforce and the observed service level is given for the data with and without outliers removed. The higher observed service level for the outlier removed data is caused by its higher safety workforce. This is, in turn, caused by a 3% higher forecasted daily demand for the outlier removed data set. In contrast, the outlier removed data set has a lower level of uncertainty in the parameter estimation, which implies a lower safety workforce. The effect of the lower level of uncertainty in the parameter estimation is offset by the increase in forecasted daily demand, which causes the higher number of scheduled time buckets.

(a) Observed service level with and without outlier removal for the cashier role.

(b) Observed service level with and without outlier removal for the cook role.

Figure 13: Observed service level with and without outlier removal for both roles.

Combining Figure 12 and Figure 13 results in the following arguments. For the same quantile to calculate the safety workforce, the outlier removed data set shows a higher observed service level. The performance of the schedule, however, is worse than for the outlier kept data. Furthermore, the higher observed service level can be replicated by the outlier kept data set by increasing the quantile to calculate the safety workforce. Then, the observed service level is replicated with a more efficient schedule than for the outlier removed data.

5 Conclusion

A common issue in data-driven decision making is the presence of incomplete data. Since workforce management heavily relies on the available data, an adequate approach to dealing with missing data and outliers is crucial. Furthermore, while the effectiveness of the schedules is highly dependent on the forecasted values, forecast uncertainty has not been researched much in this field. The effects of missing data and outlier imputation on the schedules made under demand uncertainty are summarized in the upcoming paragraphs.

For quick-service restaurants, the service level is an important driver of long-term success. Therefore, we have argued for a target service level of 98% in this sector. The introduction of the safety workforce is used to make schedules while correcting for uncertainty in demand, which turns the computationally complex MIP into a deterministic problem instead of a stochastic one.

When having access to complete data, service levels of roughly 96.5% are already achieved at a safety workforce calculated with a quantile of 50%. This is caused by the definition of the service level, which is at least equal to the quantile, the discretization of the safety workforce into demand for employees and certain labour regulations that reduce flexibility in scheduling.


This study makes a distinction between outliers and anomalies, where outliers are defined as noise in the data and anomalies as human or machine errors. Both are detected with a univariate approach that takes the skewness and tail-heaviness of the data into account. After detection, anomalies are replaced with the trimmed mean and outliers are winsorized with a rank-preserving method. The schedules generated with the outlier replaced data show a higher service level at the same quantile. The performance of these schedules, however, is worse than that of the schedules generated with the outlier kept data. This implies that keeping the outliers as they are and increasing the quantile used to compute the safety workforce generates better schedules for an equally high service level.

6 Discussion and future research

This section presents avenues for future research and discusses limitations of the research performed. Firstly, although (almost) every forecasting method is influenced to some extent by the number of outliers and missing values in the data, Holt-Winters' multiplicative model is not a very sensitive one. This could be one of the reasons that there is little difference between the schedules generated from the multiply imputed data sets. It could therefore be interesting to study the effects of outliers and missing values in combination with a forecasting method that is more sensitive to them.

Secondly, the interaction between the computed safety workforce and the observed service level is also influenced by the daily forecast. If the daily forecast is too high, a safety workforce computed at the 50% quantile will also yield a high observed service level. Future research should explore a generalized, stand-alone method to measure this interaction.

Thirdly, in this thesis the decision was made to account for uncertainty in demand in the forecast rather than in the MIP. An alternative approach that incorporates the uncertainty in the MIP itself through stochastic programming, e.g. chance constraints, deserves attention as well.

Finally, in this research all time buckets are treated as equally important, whereas in practice they could carry different weights. For example, a manager may prefer to cover all demand during peak times, while accepting that some demand is not met during less crowded times.


References

Adkins, W. (2019). How to Calculate the Employee Labor Percentage. url: https://smallbusiness.chron.com/calculate-employee-labor-percentage-15980.html.

Aggarwal, C. C. (2017). Outlier analysis. Springer, pp. 1–34.

Alfares, H. K. and J. E. Bailey (1997). Integrated project task and manpower scheduling. IIE transactions 29(9), pp. 711–717.

Axsäter, S. (2015). Inventory control. Vol. 225. Springer.

Borges, C. E., O. Kamara-Esteban, T. Castillo-Calzadilla, C. M. Andonegui, and A. Alonso-Vicario (2020). Enhancing the missing data imputation of primary substation load demand records. Sustainable Energy, Grids and Networks 23, p. 100369.

Boudt, K., V. Todorov, and W. Wang (2020). Robust distribution-based winsorization in composite indicators construction. Social Indicators Research, pp. 1–23.

Bruffaerts, C., V. Verardi, and C. Vermandele (2014). A generalized boxplot for skewed and heavy-tailed distributions. Statistics & probability letters 95, pp. 110–117.

De Treville, S., I. Bicer, V. Chavez-Demoulin, V. Hagspiel, N. Schürhoff, C. Tasserit, and S. Wager (2014). Valuing lead time. Journal of Operations Management 32(6), pp. 337–346.

Disselkamp, L. (2013). Workforce asset management book of knowledge. John Wiley & Sons.

Eklund, J. (1997). Ergonomics, quality and continuous improvement - conceptual and empirical relationships in an industrial context. Ergonomics 40(10), pp. 982–1001.

Enders, C. K. (2010). Applied missing data analysis. Guilford press.

Ernst, A. T., H. Jiang, M. Krishnamoorthy, and D. Sier (2004). Staff scheduling and rostering: A review of applications, methods and models. European journal of operational research 153(1), pp. 3–27.

Fridley, B. L., S. K. McDonnell, K. G. Rabe, R. Tang, J. M. Biernacka, J. P. Sinnwell, D. N. Rider, and E. L. Goode (2009). Single versus multiple imputation for genotypic data. BMC proceedings. Vol. 3. S7. Springer, S7.

Haneveld, W. K. K., M. H. Van der Vlerk, and W. Romeijnders (2019). Stochastic Programming: Modeling Decision Problems Under Uncertainty. Springer Nature.

He, Y. and T. E. Raghunathan (2006). Tukey’s gh distribution for multiple imputation. The American Statistician 60(3), pp. 251–256.

He, Y. and T. E. Raghunathan (2012). Multiple imputation using multivariate gh transformations. Journal of Applied Statistics 39(10), pp. 2177–2198.

Hubert, M. and E. Vandervieren (2008). An adjusted boxplot for skewed distributions. Computational statistics & data analysis 52(12), pp. 5186–5201.

Jiménez, J. A. and V. Arunachalam (2011). Using Tukey's g and h family of distributions to calculate value-at-risk and conditional value-at-risk. The Journal of Risk 13(4), p. 95.

Kalekar, P. S. et al. (2004). Time series forecasting using Holt-Winters exponential smoothing. Kanwal Rekhi School of Information Technology 4329008(13), pp. 1–13.

Kany, K. (2001). Mandatory Overtime: New developments in the campaign. AJN The American Journal of Nursing 101(5), pp. 67–70.

Lee, H., Y. Lee, and D. Yoo (2000). The determinants of perceived service quality and its relationship with satisfaction. Journal of services marketing.

Lewis-Beck, M. (1995). Data analysis: An introduction. 103. Sage.

Lin, W.-C. and C.-F. Tsai (2020). Missing value imputation: a review and analysis of the literature (2006–2017). Artificial Intelligence Review 53(2), pp. 1487–1509.

Little, R. J. and D. B. Rubin (2019). Statistical analysis with missing data. Vol. 793. John Wiley & Sons.

Mason, K., S. Jones, M. Benefield, and J. Walton (2016). Building consumer relationships in the quick service restaurant industry. Journal of Foodservice Business Research 19(4), pp. 368–381.

Melachrinoudis, E. and M. Olafsson (1992). A scheduling system for supermarket cashiers. Computers & industrial engineering 23(1-4), pp. 121–124.

Mundschenk, M. and A. Drexl (2007). Workforce planning in the printing industry. International Journal of Production Research 45(20), pp. 4849–4872.

Murray, J. S. et al. (2018). Multiple imputation: A review of practical and theoretical findings. Statistical Science 33(2), pp. 142–159.


Okulicz-Kozaryn, A. and L. Golden (2018). Happiness is flextime. Applied Research in Quality of Life 13(2), pp. 355–369.

Othman, M., G. J. Gouw, and N. Bhuiyan (2012). Workforce scheduling: A new model incorporating human factors. Journal of Industrial Engineering and Management (JIEM) 5(2), pp. 259–284.

Phillips, J. and S. M. Gully (2009). Staffing forecasting and planning. Society for Human Resource Management.

Plazas-Nossa, L., M. A. Ávila Angulo, and A. Torres (2017). Detection of outliers and imputing of missing values for water quality UV-Vis absorbance time series. Ingeniería 22(1), pp. 111–124.

Ramond, F. (2003). “Optimized rostering of workforce subject to cyclic requirements”. PhD thesis. Virginia Tech.

Rässler, S., D. B. Rubin, and E. R. Zell (2013). Imputation. Wiley Interdisciplinary Reviews: Computational Statistics 5(1), pp. 20–29.

Raymond, M. R. and D. M. Roberts (1987). A comparison of methods for treating incomplete data in selection research. Educational and Psychological Measurement 47(1), pp. 13–26.

Rubin, D. B. (1976). Inference and missing data. Biometrika 63(3), pp. 581–592.

Rubin, D. B. (1996). Multiple imputation after 18+ years. Journal of the American statistical Association 91(434), pp. 473–489.

Rubin, D. B. (2004). The design of a general and flexible system for handling nonresponse in sample surveys. The American Statistician 58(4), pp. 298–302.

Rustum, R. and A. J. Adeloye (2007). Replacing outliers and missing values from activated sludge data using Kohonen self-organizing map. Journal of Environmental Engineering 133(9), pp. 909–916.

Ryu, K. and H. Han (2010). Influence of the quality of food, service, and physical environment on customer satisfaction and behavioral intention in quick-casual restaurants: Moderating role of perceived price. Journal of Hospitality & Tourism Research 34(3), pp. 310–329.

Schafer, J. L. (1997). Analysis of incomplete multivariate data. CRC press.

Strike, K., K. El Emam, and N. Madhavji (2001). Software cost estimation with incomplete data. IEEE Transactions on Software Engineering 27(10), pp. 890–908.

Su, X. and C.-L. Tsai (2011). Outlier detection. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery 1(3), pp. 261–268.

Tukey, J. W. (1977). Exploratory data analysis. Vol. 2. Reading, MA.

Van Buuren, S. (2018). Flexible imputation of missing data. CRC press.

Villarreal, M. C., D. Goldsman, and P. Keskinocak (2015). Workforce management and scheduling under flexible demand. Service Science 7(4), pp. 331–351.

Walter, Y., J. Kihoro, K. Athiany, and H. Kibunja (2013). Imputation of incomplete non-stationary seasonal time series data. Math. Theory Model 3, pp. 142–154.

White, I. R., P. Royston, and A. M. Wood (2011). Multiple imputation using chained equations: issues and guidance for practice. Statistics in medicine 30(4), pp. 377–399.

Widget Brain, Q. (2020). Automated Employee Scheduling. url: https://widgetbrain.com/workforce-scheduling/ai-services/automated-employee-scheduling/.


A Appendix

A.1 Pooling imputations

To pool the imputed data sets a few parameters are defined. For M sets of imputations from a multiple imputation scheme there are M completed data sets and hence M estimates and their associated variances, say $\hat{Q}_m$ and $U_m$, $m = 1, 2, \ldots, M$ respectively (He and Raghunathan, 2012). The point estimate of Q based on the multiply imputed data sets is then the average of the M completed-data estimates:
$$\bar{Q} = \frac{1}{M}\sum_{m=1}^{M} \hat{Q}_m.$$
Rubin (1996) argues that the associated variance-covariance of $\bar{Q}$ consists of the within-imputation variability $\bar{U} = \frac{1}{M}\sum_{m=1}^{M} U_m$ and the between-imputation variability $B = \frac{1}{M-1}\sum_{m=1}^{M} (\hat{Q}_m - \bar{Q})(\hat{Q}_m - \bar{Q})'$. The total variability associated with $\bar{Q}$ is then given by
$$T = \bar{U} + \frac{M+1}{M}B.$$
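As an illustration, these pooling rules can be written out directly in code. The sketch below (Python with NumPy, which is not prescribed in this thesis) assumes scalar estimates, e.g. one forecast parameter per imputed data set; the function name and example values are illustrative only.

import numpy as np

def pool_rubin(estimates, variances):
    """Pool M completed-data estimates with Rubin's rules (scalar case).

    estimates: the M point estimates Q_hat_m, one per imputed data set
    variances: the M within-imputation variances U_m
    Returns the pooled estimate Q_bar and the total variability T.
    """
    estimates = np.asarray(estimates, dtype=float)
    variances = np.asarray(variances, dtype=float)
    M = len(estimates)

    q_bar = estimates.mean()                         # pooled point estimate
    u_bar = variances.mean()                         # within-imputation variability
    b = ((estimates - q_bar) ** 2).sum() / (M - 1)   # between-imputation variability
    t = u_bar + (M + 1) / M * b                      # total variability

    return q_bar, t

# Example: five imputed data sets yielding slightly different estimates.
print(pool_rubin([10.2, 9.8, 10.5, 10.1, 9.9], [0.30, 0.28, 0.33, 0.31, 0.29]))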

A.2 Normalizing data

Bruffaerts, Verardi, and Vermandele (2014)'s rank preserving transformation for normalizing data takes the following steps. Suppose we have a random variable X that represents the demand fraction of a particular time bucket at a given day, and suppose we have n continuous realizations $\{x_1, \ldots, x_n\}$ that are assumed to be independent and unimodal.

1. First standardize the data
$$x_i^* = \frac{x_i - l_0}{s_0}, \quad \text{for } i = 1, \ldots, n,$$
where $l_0$ and $s_0$ are the median and the interquartile range of $\{x_1, \ldots, x_n\}$ respectively.

2. Shift the data set to obtain strictly positive values
$$r_i = x_i^* - \min(\{x_1^*, \ldots, x_n^*\}) + 0.1, \quad \text{for } i = 1, \ldots, n.$$

3. Standardize again to obtain new values in the open interval (0, 1)
$$\hat{r}_i = \frac{r_i}{\max(\{x_1^*, \ldots, x_n^*\}) - \min(\{x_1^*, \ldots, x_n^*\}) + 0.2}, \quad \text{for } i = 1, \ldots, n.$$

4. These values can then be transformed to the normal distribution
$$w_i = \Phi^{-1}(\hat{r}_i), \quad \text{for } i = 1, \ldots, n,$$
where $\Phi$ denotes the cumulative distribution function of the standard normal.

5. Standardize the values of $w_i$
$$w_i^* = \frac{w_i - l_0}{s_0 / 1.3426}, \quad \text{for } i = 1, \ldots, n,$$
where $l_0$ and $s_0$ are the median and interquartile range of $\{w_1, \ldots, w_n\}$ respectively. The constant 1.3426 ensures, in the Gaussian case, that the modified scale estimator $s_0 / 1.3426$ is consistent for the scale parameter $\sigma$ (the standard deviation).
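A minimal sketch of these five steps (Python with NumPy and SciPy, not code from the thesis; the denominator in step 3 follows the reconstruction above, and scipy.stats.iqr is used for the interquartile range):

import numpy as np
from scipy.stats import iqr, norm

def normalize_rank_preserving(x):
    """Rank-preserving transformation towards normality (steps 1-5 above)."""
    x = np.asarray(x, dtype=float)

    # Step 1: robust standardization with the median and interquartile range
    x_star = (x - np.median(x)) / iqr(x)

    # Step 2: shift so that all values are strictly positive
    r = x_star - x_star.min() + 0.1

    # Step 3: rescale into the open interval (0, 1)
    r_hat = r / (x_star.max() - x_star.min() + 0.2)

    # Step 4: map to the standard normal through the inverse CDF
    w = norm.ppf(r_hat)

    # Step 5: robust standardization of w; dividing the IQR by 1.3426
    # makes the scale estimator consistent for sigma in the Gaussian case
    return (w - np.median(w)) / (iqr(w) / 1.3426)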

A.3 February daily patterns


(a) Plotted daily demands for February 2017. (b) Plotted daily demands for February 2018. (c) Plotted daily demands for February 2019.

Figure 14: Plotted daily demands for the observed Februaries (2017-2019).

A.4 Tukey g-and-h family

The skewness g and the heaviness of the tail h are calculated with Bruffaerts, Verardi, and Vermandele (2014)'s approach. Let $z_p$ denote the quantile of order p and $Q_p(x)$ denote the empirical quantile of x of order p, where $p \in (0, 1)$ determines the robustness of the method with respect to outliers. Then g and h are estimated by
$$\hat{g} = \frac{1}{z_p} \ln\!\left(-\frac{Q_p(x)}{Q_{1-p}(x)}\right) \quad \text{and} \quad \hat{h} = \frac{2 \ln\!\left(-\hat{g}\, \dfrac{Q_p(x)\, Q_{1-p}(x)}{Q_p(x) + Q_{1-p}(x)}\right)}{z_p^2}. \qquad (6)$$
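The estimators in equation (6) translate into a few lines of code. The sketch below (Python; illustrative names, with the default p = 0.9 chosen here and not taken from the thesis) assumes the data have already been centred, e.g. by the transformation of Appendix A.2, so that the (1-p)-quantile is negative and the p-quantile is positive; the symmetric case with an estimated g close to zero would need separate handling.

import numpy as np
from scipy.stats import norm

def estimate_g_h(x, p=0.9):
    """Quantile-based estimates of Tukey's g (skewness) and h (tail heaviness).

    Assumes x is centred so that its (1-p)-quantile is negative and its
    p-quantile is positive, as in equation (6).
    """
    x = np.asarray(x, dtype=float)
    z_p = norm.ppf(p)                  # standard normal quantile of order p
    q_upper = np.quantile(x, p)        # empirical upper quantile Q_p(x)
    q_lower = np.quantile(x, 1 - p)    # empirical lower quantile Q_{1-p}(x), negative

    g_hat = np.log(-q_upper / q_lower) / z_p
    h_hat = 2 * np.log(-g_hat * q_upper * q_lower / (q_upper + q_lower)) / z_p**2
    return g_hat, h_hat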

A.5 Rank preserving outlier winsorization

For the lower bound, the rank preserving method works as follows. Let G be the quantile function of the fitted TGH distribution and let the critical lower bound be $G(\theta_1)$ for quantile $\theta_1$, e.g. $\theta_1 = 1\%$. Also, let the lower bound for the calibration process be given by $q_1$, e.g. $q_1 = 10\%$. Let the $m_L$ observations below the lower bound be sorted in ascending order and let $k_L \in \{1, \ldots, m_L\}$ denote their ranks. Then the quantiles of the winsorized observations are
$$s_L = \theta_1 + (k_L - 1)\, \frac{q_1 - \theta_1}{m_L - 1},$$
with corresponding values given by $G(s_L)$, which replaces the outliers whilst preserving their original rank.
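A minimal sketch of this lower-tail winsorization, assuming the reconstruction of s_L given above and a quantile function G supplied by the caller (for instance the quantile function of a fitted TGH distribution); the function name and defaults are illustrative only.

import numpy as np

def winsorize_lower(x, G, theta1=0.01, q1=0.10):
    """Replace observations below G(theta1) by G(s_L), preserving their ranks.

    x: one-dimensional array of observations
    G: quantile function of the fitted distribution, mapping (0, 1) to values
    """
    x = np.asarray(x, dtype=float).copy()
    outliers = np.where(x < G(theta1))[0]       # indices of lower-tail outliers
    m_L = len(outliers)
    if m_L == 0:
        return x

    order = outliers[np.argsort(x[outliers])]   # outlier indices, ascending by value
    k_L = np.arange(1, m_L + 1)
    if m_L == 1:
        s_L = np.array([theta1])
    else:
        s_L = theta1 + (k_L - 1) * (q1 - theta1) / (m_L - 1)

    x[order] = np.array([G(s) for s in s_L])    # rank-preserving replacement
    return x

With, for example, scipy.stats.norm.ppf as a stand-in for G, the most extreme outlier is mapped to the theta1-quantile and the least extreme one to the q1-quantile, so the ordering among the replaced observations is retained.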
