Multivariate Time Series Forecasting of Daily Reservations in the Hotel Industry Using Recurrent Neural Networks


Submitted in partial fulfillment for the degree of

Master of Science

Sai Wing Mak

12109479

Master Information Studies

Data Science

Faculty of Science

University of Amsterdam

2019-07-01

Internal Supervisor: Ms. Inske Groenen (UvA, IvI)

External Supervisor: Mr. Rik van Leeuwen (Ireckonu)


Sai Wing Mak

University of Amsterdam makwing0221@gmail.com

ABSTRACT

Reservation forecasting is a multivariate time series problem involving multiple seasonal patterns and a combination of exogenous factors. While a variety of models have previously been implemented for the task, the adoption of Recurrent Neural Networks, in particular Long- and Short-Term Memory (LSTM) models, which have proven capable of capturing temporal dynamics, remains scarce in hospitality. This study proposes a seasonal LSTM + residual LSTM (sLSTM+rLSTM) architecture to evaluate the effectiveness of LSTM in reservation forecasting. The results show that sLSTM with only seasonal features already outperforms the "Same date-Last year" (SdLy) model that is widely adopted in the industry. However, the sLSTM+rLSTM model, although it also outperforms the SdLy model, fails to learn the residual function, probably because doing so requires an enormous amount of training data. Room rate is incorporated into the prediction using the concept of choice sets to produce forecasts for several rate categories, yet the predictions of daily reservations with room rates above a specific amount prove inaccurate. The research concludes with the argument that a simpler model might be preferred over a complicated LSTM structure for hotels with relatively small training samples, given the computational complexity the latter introduces.

1 INTRODUCTION

Accurate forecasting of daily reservation numbers is an important part of revenue management in the hotel industry. Hotel revenue management is perceived as the sales decision in which the hotel sells the right number of rooms to the right target at the right time and price, with the ultimate objective of maximizing its revenue [25]. To this end, forecasting techniques have been adopted over the years to estimate future sales from historical reservation data and current reservation activities [30].

As new business models keep emerging, scholars have highlighted the need to develop new and better forecasting models over time, given the increasing complexity of modeling the real-time dynamics of revenue management [6][31]. In recent years, Recurrent Neural Networks (RNNs), in particular Gated Recurrent Units (GRUs) and Long- and Short-Term Memory (LSTM), have proven effective in capturing temporal dependency in sequential data [4][16][24]. These methods, however, are not widely applied in hotels, as the industry has largely adopted traditional forecasting techniques, in particular the 'Same date-Last year' (SdLy) approach, due to their simplicity [9][20].

The number of daily reservations is a time series, which is defined as a series of observations taken at successive points in time. Its natural temporal ordering distinguishes it from cross-sectional data, where observations of a subset of variables are recorded at a specific point in time [3]. In the context of reservation forecasting, the number of reservations is influenced by numerous factors, such as seasonality, room rate and competitor price [13], which turns the prediction into a multivariate time series problem. Traditional models for multivariate time series forecasting have long been established, such as vector autoregressive and transfer function models that capture the interdependencies among time series [3].

Time series data usually exhibits seasonal patterns, which refer to regular periodic fluctuations over a fixed and known interval, such as monthly and weekly [3]. For the hotel industry, seasonality has a significant influence on the number of reservations [25]. It has been shown that deseasonalizing a time series before producing forecasts effectively reduces forecast error [12][32]. In view of this, this research first extracts the seasonal components to investigate whether doing so improves forecast performance.

Groenen [12] adopted the residual learning framework in time series forecasting, where the seasonal components are allowed to skip the network by using a separate GRU model, which in turn forces the network to learn the residual function. The results suggested a significant improvement in model performance, which motivates the application of residual learning in this research given the similar nature of the multivariate time series prediction task.

All in all, the goal of the research is to propose a forecasting model for an accurate prediction of the total number of reservations in day-to-day operations in the hotel industry using the multivariate time series data provided by the company. The research is split into the following sub-questions:

(1) What is the performance of the current baseline model of reservation prediction in the hotel industry?

(2) How can reservations with different room rates be incorporated into the prediction?

(3) Does the selected forecasting model improve on the performance of the current practice of sales forecasting in the hotel industry?

The contribution of the research lies in the application of RNNs to hotel reservation forecasting problems, which has not been intensively studied before. Besides, as this research adopts the models proposed by Groenen [12], it also serves as a reproducibility study on the effectiveness of combining RNNs for seasonality extraction and residual learning to generate forecasts.

This research is structured as follows: first, previous research work is summarized (Section 2), followed by the research methodology (Section 3). Then, the implementation and evaluation are conducted (Section 4). Lastly, conclusions are drawn based on the findings (Section 5).


2 LITERATURE REVIEW

This section describes previous research on forecasting in the hotel industry (2.1), recurrent neural networks (2.2), seasonality extraction (2.3) and residual learning (2.4).

2.1 Forecasts in the hotel industry

2.1.1 Approaches of hotel forecasts.

Extensive research has previously been done on reservation forecasting using a variety of models, such as the Holt-Winters method [20][22], exponential smoothing [5][28], moving average [28], regression [5][28] and the Monte Carlo simulation approach [30]. Empirical evidence has shown, however, that no single model universally outperforms the others on all occasions, as different models suit different tourism forecasting contexts [21]. On the other hand, the concept of combination forecasts was suggested [8], in which forecasts produced by a range of models are combined, and this has been shown to yield better performance [5][23]. Despite the abundance of research, however, the hotel industry tends to use a simpler method called the 'Same date-Last year' (SdLy) model, because it can be easily explained and requires only one data point [9], which considerably eases the implementation of reservation forecasts.

In recent years, with the rise of neural networks (NNs), scholars have started to exploit these modern architectures, for instance by adopting NNs to model the trend component in the reservation numbers [29], yet such research remains relatively scarce. As such, this research further explores the ability of NNs, in particular recurrent neural networks (2.2), in reservation forecasting. To allow a comparison with the current practice of the industry, SdLy serves as a baseline model in this research.

2.1.2 Incorporation of room rates.

Room rate is an influential factor in reservation predictions [13][25]. Customers' utility is maximized if they can pay a lower price during the decision process. When the price increases, customers' utility diminishes, assuming all other conditions remain the same [2]. To take rate into account, the concept of choice sets is introduced, which refers to the coherent sequence of rates that a customer is willing to pay when making a reservation [14]. It provides insight into how customer choice behavior can be accounted for in the forecast. Based on the assumption of diminishing utility, it is proposed that the model output forecasts for different choice sets as a way to incorporate rate plans into the research framework.

2.2 Recurrent Neural Networks

Recurrent neural networks, or RNNs, are a type of artificial neural network specialized for handling sequential data. They are designed so that the state at time $t$ reads information from the previous state at time $t-1$ and the external input $x_t$ to make predictions [10].

They have gained substantial attention and have proven effective in handling problems such as time series classification, machine translation and speech recognition [10][11][16][24]. One pitfall of the traditional RNN, however, is its vulnerability to the vanishing gradient issue. During model training, a cost function is computed to measure, most commonly, the sum of squared differences between the actual and the predicted values. The back propagation algorithm allows the cost to flow backward through time to compute the gradients, and eventually to update the network weights to improve the model. As the number of time steps increases, the gradients can become vanishingly small, which prevents the model from further updating the weights [10].

To tackle this, scholars have developed new models such as Gated Recurrent Units (GRUs) and Long- and Short-Term Memory (LSTM). These models apply a gating mechanism (in the case of LSTM, a forget gate, an input gate and an output gate) to learn both short- and long-term dependencies. This special architecture makes GRUs and LSTMs powerful sequential models that have proven effective in practical applications [10]. In this study, LSTM networks are used, as they have been shown to learn long-term dependencies more easily than simple recurrent architectures [11][24].

2.3 Seasonality Extraction

Seasonality extraction has long been an active field of research, as it has been shown that forecasting models perform better when the input data is deseasonalized [32]. The common approaches decompose the time series into trend, seasonal, cyclical and residual components by means of moving averages, smoothing or time series decomposition [25]. In addition, the seasonal autoregressive integrated moving average (SARIMA) model is one of the widely used models, as it specifies not only autoregression, differencing and moving averages for the time series, but also those of the seasonal component [1].

As the hotel industry can be expected to experience several seasonality patterns, mostly monthly, weekly and daily [25], the complexity of seasonality extraction increases drastically and some traditional seasonal models might not be adequate. This further supports the rationale behind adopting RNNs for seasonality extraction.

2.4 Residual Learning

The increase in the depth of neural networks makes them more difficult to train and might even result in the degradation of training accuracy [15][18]. Residual learning was therefore introduced to tackle the issue, and is achieved by connecting the output of previous layers to the output of new layers [15]. The underlying mapping $H(x)$ after several stacked layers is fit to $F(x) + x$, where $F(x)$ is the residual function and $x$ the input to these stacked layers. The identity mapping $x \to H(x)$ guarantees that the model has a training error no greater than that of its shallower counterpart, and at the same time forces the intermediate layers to learn the residual function $F(x)$ [15]. It has proven effective in various image recognition and sequential classification tasks [15][26][27], and in particular in time series problems [12]. Inspired by this previous research, this paper adopts residual learning to allow seasonal influences, namely monthly, weekly and daily, to bypass the network for an improvement in model performance. He [15] also suggested that, while $F(x)$ can take a flexible form, a performance improvement is only observed when it has more than a single layer. Therefore, the residual models in this research consist of stacked LSTM cells.


3 MODELS

This section describes the baseline models (3.1) and the residual models (3.2).

3.1 Baseline Models

3.1.1 Same date-Last year (SdLy).

The SdLy model uses the number of reservations with the same calendar period and day of week in the previous year as this year's prediction. For instance, the forecast of reservation numbers for Thursday, 21st February 2019 would be that of Thursday, 22nd February 2018. Formally,

$$F_{y,w,d} = R_{y-1,w,d} \tag{1}$$

where $F_{y,w,d}$ denotes the reservation forecast for day-of-week $d$ ($1 \le d \le 7$) in week $w$ ($1 \le w \le 52$) of year $y$, and $R_{y-1,w,d}$ denotes the observed reservation number for the same day-of-week in the same week of last year. Only the check-in date ($t = 0$), reflecting the total reservation numbers, is calculated, as partial bookings ($t = 1, 2, ..., M$) are not of major interest to hotels.
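The SdLy rule in Equation 1 can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the `history` dictionary are hypothetical, weeks are taken to be ISO weeks (the text does not say which week numbering is used), and week 53 is clamped to 52 to respect the stated range of $w$. `date.fromisocalendar` requires Python 3.8 or later.

```python
from datetime import date


def sdly_reference_date(target: date) -> date:
    """Return the same ISO week and day-of-week in the previous year."""
    iso_year, iso_week, iso_weekday = target.isocalendar()
    # Assumption of this sketch: clamp week 53 to 52, since Equation 1 uses 1 <= w <= 52.
    return date.fromisocalendar(iso_year - 1, min(iso_week, 52), iso_weekday)


def sdly_forecast(history: dict, target: date) -> float:
    """Forecast for `target` = observed reservations on the reference date (Equation 1).

    `history` is a hypothetical mapping from stay date to observed reservation count.
    """
    return history[sdly_reference_date(target)]
```

With this convention, Thursday 21 February 2019 maps to Thursday 22 February 2018, matching the example in the text.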

3.1.2 Weighted Seasonal Components (WSC).

The monthly, weekly and daily seasonality patterns are extracted from the number of daily reservations and are calculated as the weighted sum of the seasonal influences. This can be expressed as:

$$WSC_i = \sum_{c} \left( W_{mc}\, m_{ic} + W_{wc}\, w_{ic} + W_{dc}\, d_{ic} \right) \tag{2}$$

where $WSC_i$ is a vector of the weighted seasonal prediction at time step $i$ of $c$ rate categories. $W_{mc}$ denotes the weight of $m_{ic}$, which is the difference between the daily reservation number for the month of time step $i$ and the mean daily number over all months of rate category $c$; $W_{wc}$ denotes the weight of $w_{ic}$, which represents the difference between the daily reservation number for the week of time step $i$ and the mean daily number over all weeks of rate category $c$; and $W_{dc}$ denotes the weight of $d_{ic}$, which is the difference between the daily reservation number for the day of week of time step $i$ and the mean daily number over all weekdays of rate category $c$. The weights are found using Adam optimization. To predict the number of reservations $n$ steps ahead,

$$\hat{Y} = WSC(x_{sc}) \tag{3}$$

where $\hat{Y}$ is a matrix of $n$ predictions for $c$ rate categories, and $x_{sc}$ is a vector containing the seasonal features of the $n$ prediction steps of $c$ rate categories.
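A minimal sketch of how the seasonal differences and the weighted sum of Equation 2 might be computed for a single rate category is given below. It is an interpretation, not the original implementation: the function names are hypothetical, the "mean daily number over all months" is read as the mean of the per-group means, weeks are ISO weeks, and the weights are passed in rather than learned with Adam.

```python
from collections import defaultdict
from datetime import date
from statistics import mean


def seasonal_offsets(series, key):
    """Map each group (month, week or weekday) to its mean daily count
    minus the mean over all groups.

    `series` maps a stay date to a reservation count; `key` extracts the group.
    """
    groups = defaultdict(list)
    for day, count in series.items():
        groups[key(day)].append(count)
    group_means = {g: mean(values) for g, values in groups.items()}
    overall = mean(list(group_means.values()))
    return {g: m - overall for g, m in group_means.items()}


def wsc_predict(day, offsets, weights):
    """Weighted sum of the monthly, weekly and day-of-week offsets for one day."""
    month_off, week_off, weekday_off = offsets
    w_m, w_w, w_d = weights
    return (w_m * month_off[day.month]
            + w_w * week_off[day.isocalendar()[1]]
            + w_d * weekday_off[day.isoweekday()])
```

The three offset tables would be built with `key=lambda d: d.month`, `key=lambda d: d.isocalendar()[1]` and `key=lambda d: d.isoweekday()` respectively.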

3.1.3 Long and Short-Term Memory (LSTM).

An LSTM unit comprises a cell, an input gate, an output gate and a forget gate. The model contains two stacked LSTM cells with 75 neurons each, where the parameters are found by grid search. Stochastic optimization, i.e. using a batch size of one, is implemented to speed up the learning process while achieving a low value of the cost function [10]. At each time step $t$, the first LSTM cell takes as input both seasonal and residual explanatory features $x_t$ and the hidden state from the previous time step $h^1_{t-1}$ to output the hidden state $h^1_t$.

This is achieved using the gating mechanism of LSTM as depicted in Equations 4-9:

$$f_i^{(t)} = \sigma\left( \sum_j U_{i,j}^{f} x_j^{(t)} + \sum_j W_{i,j}^{f} h_j^{(t-1)} + b_i^{f} \right) \tag{4}$$

$$g_i^{(t)} = \sigma\left( \sum_j U_{i,j}^{g} x_j^{(t)} + \sum_j W_{i,j}^{g} h_j^{(t-1)} + b_i^{g} \right) \tag{5}$$

$$\tilde{s}_i^{(t)} = \tanh\left( \sum_j U_{i,j} x_j^{(t)} + \sum_j W_{i,j} h_j^{(t-1)} + b_i \right) \tag{6}$$

$$s_i^{(t)} = f_i^{(t)} \odot s_i^{(t-1)} + g_i^{(t)} \odot \tilde{s}_i^{(t)} \tag{7}$$

$$q_i^{(t)} = \sigma\left( \sum_j U_{i,j}^{o} x_j^{(t)} + \sum_j W_{i,j}^{o} h_j^{(t-1)} + b_i^{o} \right) \tag{8}$$

$$h_i^{(t)} = q_i^{(t)} \odot \tanh\left( s_i^{(t)} \right) \tag{9}$$

where $f_i^{(t)}$ denotes the forget gate, $g_i^{(t)}$ the input gate, $\tilde{s}_i^{(t)}$ the state unit, $s_i^{(t)}$ the updated state unit, $q_i^{(t)}$ the output gate, and $h_i^{(t)}$ the output for time step $t$ and cell $i$. Matrices $b$, $U$ and $W$ are respectively the biases, input weights and recurrent weights into the LSTM cell. $\sigma$ and $\tanh$ denote respectively the sigmoid and hyperbolic tangent activation functions, and $\odot$ denotes element-wise multiplication.
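To make the gating mechanism concrete, the following is a small pure-Python sketch of a single LSTM time step that follows Equations 4-9 literally. It is illustrative only: the function names, the `params` layout and the key names `"f"`, `"g"`, `"c"` and `"o"` (forget gate, input gate, state candidate and output gate) are assumptions of this sketch, and a real model would use a library implementation such as the Keras LSTM layer.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def affine(U, W, b, x, h_prev):
    """For every unit i: sum_j U[i][j]*x[j] + sum_j W[i][j]*h_prev[j] + b[i]."""
    return [
        sum(u * xj for u, xj in zip(U_i, x))
        + sum(w * hj for w, hj in zip(W_i, h_prev))
        + b_i
        for U_i, W_i, b_i in zip(U, W, b)
    ]


def lstm_step(x, h_prev, s_prev, params):
    """One LSTM time step; `params` maps each key to a (U, W, b) triple."""
    f = [sigmoid(z) for z in affine(*params["f"], x, h_prev)]            # Eq. 4
    g = [sigmoid(z) for z in affine(*params["g"], x, h_prev)]            # Eq. 5
    s_tilde = [math.tanh(z) for z in affine(*params["c"], x, h_prev)]    # Eq. 6
    s = [fi * sp + gi * st
         for fi, sp, gi, st in zip(f, s_prev, g, s_tilde)]               # Eq. 7
    q = [sigmoid(z) for z in affine(*params["o"], x, h_prev)]            # Eq. 8
    h = [qi * math.tanh(si) for qi, si in zip(q, s)]                     # Eq. 9
    return h, s
```

Stacking two cells, as the model does, amounts to calling `lstm_step` a second time with the first cell's output `h` as the input `x` of the second cell.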

The second stacked LSTM cell at time step $t$ takes as input the hidden state of the first cell $h^1_t$ and its own hidden state from the previous time step $h^2_{t-1}$, and outputs the updated hidden state $h^2_t$ using the same formulas as depicted in Equations 4-9.

After processing 56 time steps, the last hidden state of the second LSTM cell is passed into a dense layer of $c \times n$ units with no activation function for the $n$ prediction time steps of the $c$ rate categories. This final prediction is expressed as:

$$\hat{Y} = H(h^2_t) \tag{10}$$

where $h^2_t$ denotes the last hidden state of the second LSTM cell and $\hat{Y}$ denotes a vector of $c \times n$ predictions.

3.1.4 Seasonal LSTM (sLSTM).

The architecture of sLSTM is identical to that of LSTM. The only difference is that, instead of taking both seasonal and residual features $x_t$ as input, sLSTM only takes the monthly, weekly and daily seasonal features used by WSC, as depicted in Equation 2. The model contains two stacked cells with 50 neurons each, where the parameters are found by grid search. The implementation remains the same: at each time step $t$, the first LSTM cell takes as input the seasonal explanatory features $x_s$ and the hidden state from the previous time step $h^1_{t-1}$ to output the hidden state $h^1_t$, and the second cell takes as input the hidden state of the first cell $h^1_t$ and its own hidden state from the previous time step $h^2_{t-1}$ to output the updated hidden state $h^2_t$ using Equations 4-9. sLSTM also processes 56 time steps. After that, the last hidden state of the second LSTM cell is passed into a dense layer of $n$ units with no activation function to generate the seasonal features of the $n$ prediction time steps. This final prediction is expressed as:

$$\hat{Y} = S(x_{sc}) \tag{11}$$

where $S(x_{sc})$ denotes a vector of seasonal features of the $n$ steps of prediction and $x_{sc}$ denotes a vector of monthly, weekly and daily seasonal differences of the $n$ steps of prediction for the $c$ rate categories.

3.2 Residual Models

3.2.1 Weighted Seasonal Components + residual Long- and Short-Term Memory (WSC+rLSTM).

This model combines both the WSC and LSTM, whereby the architecture of the WSC is identical to the one in the baseline model. The rLSTM, instead of mapping the input $x_t$ to the output $\hat{Y}$, now maps the input $x_r$ to $F(x_r)$, where $x_r$ denotes the input vector representing the residual features at time $t$ and $F(x_r)$ the difference between the WSC prediction and the actual reservation number for all prediction steps. It is formalized as:

$$\hat{Y} = WSC(x_{sc}) + F(x_r) \tag{12}$$

Instead of taking 56 steps before making a prediction, the rLSTM takes in 42 steps, as the exploratory data analysis showed that more than 80% of the reservations were made within six weeks before the check-in date. After processing 42 steps of information, the last hidden state of the second LSTM cell is passed to a dense layer of $c \times n$ units with no activation and is added to $WSC(x_{sc})$, as shown in Equation 12.

3.2.2 Seasonal Long- and Short-Term Memory + residual Long- and Short-Term Memory (sLSTM+rLSTM).

This model is a combination of two LSTMs, whereby the seasonal patterns are first extracted using the first LSTM, hereinafter sLSTM, which has the same structure as the one in the baseline model. This is followed by the learning of the difference by rLSTM with the information from the booking horizon; the rLSTM has an identical structure to the one in WSC+rLSTM. It is formalized as:

ˆY = S(xsc) + F(xr) (13)
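The decomposition in Equations 12 and 13 can be illustrated with a short sketch: the residual model is trained on the gap between the actual numbers and the seasonal prediction, and the final forecast adds the two parts back together. The function names are hypothetical and the predictions are plain lists standing in for model outputs.

```python
def residual_targets(actual, seasonal_pred):
    """Training targets for the residual model: actual minus seasonal prediction."""
    return [y - s for y, s in zip(actual, seasonal_pred)]


def combine(seasonal_pred, residual_pred):
    """Final forecast: seasonal prediction plus predicted residual (Eq. 12 / 13)."""
    return [s + r for s, r in zip(seasonal_pred, residual_pred)]
```

If the residual model learns nothing and predicts zeros, `combine` simply returns the seasonal prediction, which is the behavior to keep in mind when comparing the residual models with their seasonal counterparts.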

4 EVALUATION

This section describes the data used in this research (4.1), the adoption of the booking matrix for residual learning (4.2), the incorporation of choice sets to account for room rates (4.3), the model implementation details (4.4), the accuracy measures (4.5), the exhibited seasonality patterns in the data (4.6) and the experimentation results of the proposed models (4.7).

4.1 Data

The data set used in this research contains:

(1) Reservation records from January 2013 to March 2019 without booking date;

(2) Reservation records from July 2016 to March 2019 with:

• the booking date;

• the source of booking; and

• the room rate at the time of making the reservation.

The hotel changed from one Property Management System (PMS) to another during April-June 2016, so 1) the booking date for each reservation; and 2) the rate plan for each reservation are only available starting July 2016, and the construction of a booking matrix (4.2) is therefore only possible on the time series data between July 2016 and March 2019. Despite that, the influence of multiple seasonality patterns is modeled using all available data since January 2013 to provide a better understanding of how reservations are affected by these seasonal factors.

As guests might spend more than one night in the hotel, the $n$-night reservation for arrival day $d$ is decomposed into $n$ single-night reservations with different dates of stay (from $d$ to $d + n - 1$) and corresponding lead times (from $t$ to $t + n - 1$) [9]. After applying the technique, the number of reservations represents the number of rooms that are occupied for a particular night.
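The decomposition of a multi-night stay can be sketched as follows. The function name and the tuple layout are hypothetical; the sketch only shows the index arithmetic described above.

```python
from datetime import date, timedelta


def split_stay(arrival, nights, lead_time):
    """Split an n-night reservation into n single-night records.

    Returns (stay_date, lead_time) pairs for stay dates d .. d+n-1,
    with lead times t .. t+n-1.
    """
    return [(arrival + timedelta(days=k), lead_time + k) for k in range(nights)]
```

For example, a three-night stay arriving on 1 March 2019 and booked 10 days ahead yields stay dates 1, 2 and 3 March with lead times 10, 11 and 12.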

From the data set, the series of daily reservation numbers is extracted as the target variable. For seasonal models, three features regarding seasonality are constructed for each target variable, namely the difference between the daily reservation number for the month and the mean daily number over all months ($m_i$), the difference between the daily reservation number for the week and the mean daily number over all weeks ($w_i$), and the difference between the daily reservation number for the day of week and the mean daily number over all weekdays ($d_i$). Therefore, there are 3 seasonal explanatory variables for both WSC and sLSTM for each rate category.

The residual features for each target variable are the count of daily reservations, the average rate, and the number of bookings through the four available sources: online travel agency (OTA), direct reservations (DIR), web reservation (WEB) and Global Distribution Systems (GDS) over the booking horizon. A dummy variable representing whether or not the check-in date is a holiday is also included. These variables are extracted as suggested by [13], and also because of the high correlation detected during exploratory data analysis. This results in 10 explanatory variables (3 seasonal, 7 residual). As the model gives predictions for 3 rate categories (4.3), all the aforementioned features, except the average rate and the holiday dummy, are constructed three times to account for the corresponding seasonality and residual patterns. In the end, there are 26 explanatory variables (9 seasonal, 17 residual) for reservation forecasting of the three rate categories for WSC+rLSTM, sLSTM+rLSTM and LSTM.
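The feature counts above follow from simple arithmetic, spelled out in the sketch below. The feature names are hypothetical labels chosen for illustration; only the counts come from the text.

```python
RATE_CATEGORIES = 3

# Features built once per rate category.
seasonal_per_category = ["month_diff", "week_diff", "weekday_diff"]
residual_per_category = ["reservation_count", "OTA", "DIR", "WEB", "GDS"]

# Features shared across rate categories (constructed only once).
residual_shared = ["average_rate", "holiday_dummy"]

n_seasonal = RATE_CATEGORIES * len(seasonal_per_category)                          # 3 * 3
n_residual = RATE_CATEGORIES * len(residual_per_category) + len(residual_shared)  # 3 * 5 + 2
n_total = n_seasonal + n_residual
```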

Five $n$-step ahead predictions are evaluated, with $n \in \{3, 7, 10, 14, 21\}$, to investigate the effectiveness of both the long- and short-term predictions of the models. A sliding window approach is adopted: when new information becomes available at $t + 1$, predictions are made using this new data point together with the input time steps.
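One common way to realize such a sliding window is sketched below, assuming a plain list as the series. The function name is hypothetical, and the exact windowing used in the experiments is not specified beyond the description above.

```python
def sliding_windows(series, n_in, n_out):
    """Build (inputs, targets) pairs by moving a window one step at a time.

    Each pair holds `n_in` consecutive observations and the `n_out`
    observations that immediately follow them.
    """
    windows = []
    for start in range(len(series) - n_in - n_out + 1):
        inputs = series[start:start + n_in]
        targets = series[start + n_in:start + n_in + n_out]
        windows.append((inputs, targets))
    return windows
```

For the LSTM and sLSTM models described in Section 3, `n_in` would be 56 and `n_out` one of 3, 7, 10, 14 or 21.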

4.2 Booking matrix

The booking matrix, as shown in Table 1, is a means to illustrate how the final reservation number is accumulated by partial bookings over the booking horizon [17]. For a particular check-in date $d$, the number of guests who make a reservation $t$ days before the check-in date is denoted as $R^d_t$, where $t = 0, 1, 2, ..., M$ with $M$ the length of the booking horizon. Each column includes all partial advanced bookings for check-in date $d$, and the column sum returns the final reservation number.

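As described in Section 3.2, after the seasonal models capture the monthly, weekly and daily influences, rLSTM is set up using Equations 4-9, with each $R^d_t$ as the input to the network for $t = 0, 1, 2, ..., M$, to further learn the residual function $F(x)$.

Table 1: Booking matrix. $R^d_t$ denotes the total number of reservations at time $t$ days before the check-in date $d$. The question marks represent the reservation numbers that are not yet known on date $d$.

| ... | d-2 | d-1 | d | d+1 | d+2 | ... | d+M | t |
|---|---|---|---|---|---|---|---|---|
| ... | $R^{d-2}_{0}$ | $R^{d-1}_{0}$ | $R^{d}_{0}$ | ? | ? | ? | ? | 0 |
| ... | $R^{d-2}_{1}$ | $R^{d-1}_{1}$ | $R^{d}_{1}$ | $R^{d+1}_{1}$ | ? | ? | ? | 1 |
| ... | $R^{d-2}_{2}$ | $R^{d-1}_{2}$ | $R^{d}_{2}$ | $R^{d+1}_{2}$ | $R^{d+2}_{2}$ | ? | ? | 2 |
| ... | $R^{d-2}_{3}$ | $R^{d-1}_{3}$ | $R^{d}_{3}$ | $R^{d+1}_{3}$ | $R^{d+2}_{3}$ | ... | ? | 3 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... |
| ... | $R^{d-2}_{M-1}$ | $R^{d-1}_{M-1}$ | $R^{d}_{M-1}$ | $R^{d+1}_{M-1}$ | $R^{d+2}_{M-1}$ | ... | ? | M-1 |
| ... | $R^{d-2}_{M}$ | $R^{d-1}_{M}$ | $R^{d}_{M}$ | $R^{d+1}_{M}$ | $R^{d+2}_{M}$ | ... | $R^{d+M}_{M}$ | M |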
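A booking matrix of this kind can be assembled from single-night reservation records, as in the sketch below. The record layout, a pair of check-in date and lead time in days, and the function names are assumptions made for illustration.

```python
from collections import defaultdict
from datetime import date


def build_booking_matrix(reservations):
    """Count reservations per check-in date and lead time.

    `reservations` is an iterable of (check_in_date, lead_time_days) pairs.
    Returns a nested mapping: matrix[check_in_date][lead_time] -> count.
    """
    matrix = defaultdict(lambda: defaultdict(int))
    for check_in, lead_time in reservations:
        matrix[check_in][lead_time] += 1
    return matrix


def final_reservations(matrix, check_in):
    """Column sum: total reservations for one check-in date over all lead times."""
    return sum(matrix[check_in].values())
```

Each inner mapping corresponds to one column of Table 1, so its values form the sequence fed to the rLSTM for that check-in date.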

Figure 1 shows the accumulated partial bookings over a 90-day booking horizon. It is generally observed that these bookings increase exponentially as the check-in date approaches.

Figure 1: Booking horizon

4.3 Choice Sets

Room rate is incorporated using the concept of choice sets (2.1.2). Reservations are divided into three bins, representing the number of guests who paid more than €0 (i.e. all guests), €100 and €150. By doing so, the company can gain insight into the pricing strategy by approximating the number of guests to expect in each rate bin. The model is configured accordingly to output five $n$-step ahead forecasts for each of the three bin categories.
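The binning can be sketched as a simple count per threshold. Note that the bins are nested rather than disjoint: a guest who paid more than €150 is also counted in the €100 and €0 bins. The function name and the list-of-rates input are hypothetical.

```python
RATE_THRESHOLDS = (0, 100, 150)  # euros, as stated in the text


def choice_set_counts(rates, thresholds=RATE_THRESHOLDS):
    """Count, for each threshold, the reservations whose rate exceeds it."""
    return {thr: sum(1 for rate in rates if rate > thr) for thr in thresholds}
```

Applied per night, this produces the three target series (all reservations, above €100, above €150) that the models forecast.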

4.4 Implementation Details

The train set covers daily reservation numbers from 1 July 2016 to 11 September 2018, which consists of 741 observations. During model training, the train set is randomly shuffled and further divided into train and validation sets, where 666 observations are used to train the model and 75 observations for validation purposes. All models are evaluated on the test set that covers the period from 12 September 2018 to 31 March 2019, which results in 201 observations in total.
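A shuffled train/validation split of this size might look like the sketch below. The function name and the fixed seed are assumptions; the text does not state how the shuffle was seeded or implemented.

```python
import random


def shuffled_split(samples, n_val, seed=0):
    """Randomly shuffle `samples` and hold out `n_val` of them for validation."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)  # seeded for reproducibility (assumption)
    validation = [samples[i] for i in indices[:n_val]]
    train = [samples[i] for i in indices[n_val:]]
    return train, validation
```

Calling `shuffled_split(samples, 75)` on 741 samples gives the 666/75 split described above.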

Grid search was implemented to search for the optimal hyperparameters of the models. The values searched were, for the number of layers of sLSTM and LSTM, {1, 2}; for the number of neurons of sLSTM, LSTM, WSC+rLSTM and sLSTM+rLSTM, {50, 75, 100, 150}; and for the dropout rate in each LSTM cell of LSTM, sLSTM, WSC+rLSTM and sLSTM+rLSTM, {0, 0.25, 0.5, 0.75}. As for the rLSTM component in WSC+rLSTM and sLSTM+rLSTM, 2 layers are constructed with reference to [15].

To train WSC+rLSTM, the previously trained WSC was first loaded and all its layers were frozen for optimizing the rLSTM cells. After that, the WSC layers were unfrozen to optimize the entire model. The same applied to sLSTM+rLSTM, whereby instead of WSC, the sLSTM was first frozen.

All LSTM models were constructed using Keras [7]. During model training, Adam optimization with a step decay was applied, in which the learning rate was initially set at 0.01 and was halved every 10 epochs. Early stopping with a patience of 50 epochs was applied as an implicit regularization. After the training was stopped, the best parameters were returned and the models were considered trained.
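The learning rate schedule and the stopping rule can be written as two small functions. This is a library-free sketch: the function names are hypothetical, epochs are assumed to be counted from zero, and "patience" is read as the number of epochs without a new best validation loss. In Keras, a function like `step_decay` could be passed to the `LearningRateScheduler` callback, and the stopping rule corresponds to the `EarlyStopping` callback.

```python
def step_decay(epoch, initial_lr=0.01, drop=0.5, epochs_per_drop=10):
    """Learning rate for a zero-based epoch: halved every 10 epochs from 0.01."""
    return initial_lr * drop ** (epoch // epochs_per_drop)


def should_stop(val_losses, patience=50):
    """True once the best validation loss is at least `patience` epochs old.

    `val_losses` is the non-empty list of validation losses seen so far.
    """
    best_epoch = min(range(len(val_losses)), key=val_losses.__getitem__)
    return len(val_losses) - 1 - best_epoch >= patience
```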

4.5 Evaluation

The following metrics are used to evaluate the model performance: Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Weighted Mean Absolute Percentage Error (WMAPE). In essence, these measures quantify the difference between the actual and the predicted reservation numbers. The smaller the value, the higher the accuracy.

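$$RMSE = \sqrt{\frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{n}} \tag{14}$$

$$MAE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{n} \tag{15}$$

$$WMAPE = \frac{\sum_{i=1}^{n} |y_i - \hat{y}_i|}{\sum_{i=1}^{n} y_i} \times 100\% \tag{16}$$

where

$y_i$ = actual daily reservations

$\hat{y}_i$ = predicted daily reservations

$n$ = total number of observations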
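The three measures in Equations 14-16 translate directly into code. The sketch below uses plain Python lists and hypothetical function names.

```python
import math


def rmse(actual, predicted):
    """Root Mean Squared Error (Equation 14)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean Absolute Error (Equation 15)."""
    return sum(abs(y - p) for y, p in zip(actual, predicted)) / len(actual)


def wmape(actual, predicted):
    """Weighted Mean Absolute Percentage Error in percent (Equation 16)."""
    return 100.0 * sum(abs(y - p) for y, p in zip(actual, predicted)) / sum(actual)
```

Because WMAPE divides by the sum of the actual values rather than by each value individually, it stays defined on days with zero reservations, as long as the total over the evaluation period is non-zero.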

4.6 Seasonality of the data

Figures 2, 3 and 4 visualize the monthly, daily and weekly seasonal patterns respectively. It can be observed in Figure 2 that winter, i.e. from December to February, is the off season of the hotel, with daily reservation numbers reaching only two-thirds of the maximum capacity of 225 rooms. The number of daily reservations remains relatively steady in other months, except for a slight decrease in July.

The day-of-week seasonality in Figure 3 reveals that the hotel took in the fewest guests on Sunday, while Saturday appears to be the most popular day for staying. As for the weekly seasonality, it can be seen in Figure 4 that the number of reservations plummeted in the 51st week, very likely due to the Christmas holiday. Also, the bookings are comparatively lower in the first few weeks than in the other weeks, which explains the low reservation numbers in December, January and February as shown in Figure 2. The relatively lower number of reservations in July can also be attributed to the slump in the 31st week.

Figure 2: Monthly average of reservation numbers in the anonymized hotel in Amsterdam from January 2013 to March 2019

Figure 3: Day of week average of reservation numbers in the anonymized hotel in Amsterdam from January 2013 to March 2019

4.7 Results

This section evaluates the baseline and residual models as described in Section 3 and presents the results of these experiments.

4.7.1 Experiment 1: SdLy model.

The first experiment constructs the SdLy model that is currently being used by the hotel for the prediction of all reservations. Table 2 shows that the model achieves very similar performance for all the time steps, given its naive nature. The performance is fairly acceptable given that a single data point is used to make each prediction, which helps explain why the hotel chooses to adopt this naive model.

4.7.2 Experiment 2: Evaluating models with only seasonal features.

Table 2 shows that both prediction models with only the seasonality features already outperform the SdLy model. This suggests that the reservation numbers of the hotel are significantly influenced by seasonal factors, so capturing only the monthly, weekly and daily differences already provides a more accurate forecast. The sLSTM achieves a lower accuracy than WSC at almost all time steps, which might be because only a relatively small amount of data was used to train the model. If more training data could be obtained, the performance might improve further.

For reservation prediction with room rates above €100 and €150, both WSC and sLSTM perform poorly. Both models fail to capture the number of guests who will pay more than a certain price. This might be explained by coconut uncertainty, a term coined by Makridakis et al. [19], which refers to events that can hardly be envisioned. While there is a clear seasonal pattern for the daily reservations, resulting in fairly accurate predictions, the two time series representing reservation numbers with room rates above €100 and €150 suggest otherwise. For reservations priced above €150, for instance, the large proportion of days on which the reservation numbers are zero inevitably creates noise and hinders the ability of the model to learn the non-zero irregular pattern. Combined with the fact that the model is trained with a relatively small amount of data, the forecasts fail to generalize to the two rate categories.

4.7.3 Experiment 3: Evaluating LSTM model with both seasonal and residual features.

As can be observed from Tables 2, 3 and 4, using both seasonal and residual features in the LSTM model does not provide any significant improvement. The calculated metrics for all three rate categories are similar to those of sLSTM, and the differences are very likely due to random fluctuations.

This is surprising, considering that the room rate and the source of booking are correlated with the reservation numbers. A possible explanation, apart from the relatively small training sample, is that the variables are in the form of sparse vectors, so the LSTM effectively ignores these features and eventually returns a mapping similar to that of sLSTM. It is not uncommon for some days over the booking horizon to have zero reservations. The room rate and the source of booking thus also end up as zeros. As the seasonal features already provide some predictability for the reservation numbers, these sparse variables might be dropped for the sake of simpler and more effective learning.

4.7.4 Experiment 4: Evaluating the performance using WSC for seasonality extraction and rLSTM for residual learning.

The results in Tables 2, 3 and 4 suggest that combining WSC with rLSTM does not improve the model performance, which is not in line with the findings in [12]. The computed metrics are very similar to those of WSC, revealing that the model merely returns the identity mapping x → H(x) and fails to learn the residual function F(x). An explanation similar to that of the previous experiment (4.7.3) might be drawn: the sparsity of the features, in addition to the relatively small sample size, hinders the learning of F(x).
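The relation between the identity mapping and the residual function can be illustrated with a minimal sketch. It assumes the residual formulation of He et al. [15], H(x) = F(x) + x, where x is taken to be the seasonal forecast and F(x) the correction produced by the residual model; the function name `combine` and all numbers are hypothetical and are not taken from the experiments.

```python
def combine(seasonal_forecast, residual_forecast):
    """Return H(x) = x + F(x) element-wise (assumed residual formulation)."""
    return [x + f for x, f in zip(seasonal_forecast, residual_forecast)]


# Hypothetical seasonal forecasts x for three consecutive days.
seasonal = [150.0, 162.0, 171.0]

# If the residual model learns nothing, F(x) is zero everywhere and the
# combined forecast collapses to the seasonal forecast (identity mapping).
learned_nothing = [0.0, 0.0, 0.0]

# If the residual model learns a correction, the combined forecast moves
# away from the seasonal forecast.
learned_something = [-4.0, 2.5, 6.0]

identity_case = combine(seasonal, learned_nothing)
corrected_case = combine(seasonal, learned_something)
```

Under this reading, metrics for WSC+rLSTM that barely differ from those of WSC are what one would expect when F(x) stays close to zero.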

4.7.5 Experiment 5: Evaluating the performance of using sLSTM for seasonality extraction and rLSTM for residual learning.

A result similar to that of the previous experiment (4.7.4) is observed: sLSTM+rLSTM does not improve the model performance, as the evaluation metrics only fluctuate slightly around those of sLSTM. Only the identity mapping is returned, with little to no learning of F(x). Given that the model behaves similarly to LSTM and WSC+rLSTM, the same rationale is drawn for this observation, namely the sparsity of the features on top of the relatively small sample size.

Figure 4: Weekly average of reservation numbers in the anonymized hotel in Amsterdam from January 2013 to March 2019

Table 2: t-step ahead prediction of reservation numbers for all reservations. WSC and sLSTM are the seasonality models; WSC+rLSTM and sLSTM+rLSTM are the residual models.

t     Measure  SdLy    WSC     sLSTM   LSTM    WSC+rLSTM  sLSTM+rLSTM
t+3   RMSE     28.01   22.66   22.82   25.26   22.98      23.45
t+3   MAE      19.87   16.44   17.50   19.01   16.70      17.84
t+3   WMAPE    11.07%  9.33%   10.06%  10.92%  9.52%      10.31%
t+7   RMSE     28.02   22.63   23.63   21.48   22.71      23.96
t+7   MAE      19.91   16.70   17.33   16.21   16.58      18.76
t+7   WMAPE    11.09%  9.48%   9.95%   9.31%   9.44%      10.77%
t+10  RMSE     28.01   22.73   23.02   23.81   22.85      23.83
t+10  MAE      19.88   16.73   17.43   18.01   16.61      18.35
t+10  WMAPE    11.04%  9.53%   9.99%   10.32%  9.48%      10.52%
t+14  RMSE     28.06   22.81   23.94   21.40   22.87      23.16
t+14  MAE      19.93   16.78   18.08   16.47   16.62      17.80
t+14  WMAPE    11.07%  9.55%   10.36%  9.44%   9.48%      10.20%
t+21  RMSE     27.88   22.53   22.44   22.60   22.90      23.33
t+21  MAE      19.75   16.45   17.05   17.21   16.71      18.50
t+21  WMAPE    10.92%  9.33%   9.77%   9.86%   9.53%      10.60%
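The three measures reported in Tables 2, 3 and 4 can be computed as in the following minimal sketch. RMSE and MAE follow their standard definitions; for WMAPE the common definition (total absolute error divided by total actual volume) is assumed here, and the sample values are hypothetical rather than taken from the hotel data.

```python
import math


def rmse(actual, forecast):
    """Root mean squared error."""
    squared = [(a - f) ** 2 for a, f in zip(actual, forecast)]
    return math.sqrt(sum(squared) / len(squared))


def mae(actual, forecast):
    """Mean absolute error."""
    absolute = [abs(a - f) for a, f in zip(actual, forecast)]
    return sum(absolute) / len(absolute)


def wmape(actual, forecast):
    """Weighted MAPE (assumed definition): sum of absolute errors
    divided by the sum of absolute actual values."""
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    total_actual = sum(abs(a) for a in actual)
    return total_error / total_actual


# Hypothetical daily reservation counts, including a zero-reservation day.
actual = [120, 0, 80, 100]
forecast = [110, 10, 90, 100]

scores = {
    "RMSE": rmse(actual, forecast),
    "MAE": mae(actual, forecast),
    "WMAPE": wmape(actual, forecast),
}
```

Unlike the plain MAPE, this form of WMAPE remains defined when individual days have zero reservations, provided the series is not zero throughout.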

5 CONCLUSION

In this study, the effectiveness of LSTM for reservation forecasting in the hotel industry is evaluated. It is shown, based on the three evaluation metrics, that sLSTM outperforms the SdLy baseline model that is currently adopted in the hotel industry, though it achieves a slightly lower accuracy than the simpler WSC model. One conclusion that can reasonably be drawn is that seasonality plays an important part in hotel reservations, in line with previous research [25], and thus the adoption of a seasonal model, whether simple or complicated, already improves the forecast with respect to current practice.

Combining sLSTM or WSC with rLSTM does not improve the model performance as had been expected. As mentioned above, two possible rationales behind this are 1) the comparatively small size of the training sample, which prevents the full functionality of LSTM from being exploited, and 2) the sparsity of the room rate and source of booking features over the booking horizon, which may lead the LSTM to disregard these features. This might be handled by dimensionality reduction techniques in future research.

The concept of choice sets is adopted to provide forecasts of reservation numbers in different rate categories in conjunction with the proposed models. The results show that, while the forecasts of daily reservation numbers for all reservations are fairly accurate, the predictions for the two specific rate bins, i.e. room rates above €100 and €150, are not promising given the high forecast errors. This observation can potentially be attributed to coconut uncertainty, where the irregular non-zero patterns of these time series are difficult, if not impossible, for the LSTM model to capture.

Table 3: t-step ahead prediction of reservation numbers for reservations with a room rate above €100. WSC and sLSTM are the seasonality models; WSC+rLSTM and sLSTM+rLSTM are the residual models.

t     Measure  SdLy    WSC     sLSTM   LSTM    WSC+rLSTM  sLSTM+rLSTM
t+3   RMSE     -       39.45   42.47   42.22   39.35      41.12
t+3   MAE      -       29.27   33.40   32.05   28.80      31.61
t+3   WMAPE    -       21.13%  24.63%  23.64%  20.90%     23.32%
t+7   RMSE     -       39.80   42.87   40.45   38.40      43.53
t+7   MAE      -       29.59   32.94   30.31   28.54      32.99
t+7   WMAPE    -       21.37%  24.32%  22.39%  20.86%     24.37%
t+10  RMSE     -       38.84   45.97   45.38   38.81      46.00
t+10  MAE      -       29.03   37.11   34.40   28.81      36.61
t+10  WMAPE    -       21.16%  27.43%  25.42%  21.07%     27.06%
t+14  RMSE     -       39.37   42.41   41.21   39.53      42.03
t+14  MAE      -       29.26   31.81   30.90   29.14      31.48
t+14  WMAPE    -       21.41%  23.55%  22.87%  21.38%     23.31%
t+21  RMSE     -       39.34   43.44   41.02   39.37      43.26
t+21  MAE      -       29.25   33.81   30.04   28.88      33.65
t+21  WMAPE    -       21.23%  25.12%  22.32%  21.13%     25.00%

Table 4: t-step ahead prediction of reservation numbers for reservations with a room rate above €150. WSC and sLSTM are the seasonality models; WSC+rLSTM and sLSTM+rLSTM are the residual models.

t     Measure  SdLy    WSC     sLSTM   LSTM    WSC+rLSTM  sLSTM+rLSTM
t+3   RMSE     -       46.29   56.48   52.80   46.86      54.29
t+3   MAE      -       31.39   37.64   35.98   32.54      37.10
t+3   WMAPE    -       67.59%  78.93%  75.44%  67.83%     77.79%
t+7   RMSE     -       51.53   57.97   53.98   45.35      59.7
t+7   MAE      -       33.32   38.87   37.58   30.26      42.23
t+7   WMAPE    -       68.81%  81.36%  78.68%  67.02%     82.87%
t+10  RMSE     -       47.72   59.45   53.60   45.39      58.59
t+10  MAE      -       31.34   39.61   37.49   30.41      40.57
t+10  WMAPE    -       68.95%  80.81%  76.49%  67.72%     82.77%
t+14  RMSE     -       45.39   55.95   54.03   45.73      54.39
t+14  MAE      -       30.32   38.34   38.06   30.84      38.21
t+14  WMAPE    -       68.61%  79.76%  79.18%  67.68%     79.49%
t+21  RMSE     -       45.65   58.54   54.05   45.98      57.08
t+21  MAE      -       30.63   41.22   38.97   31.34      42.28
t+21  WMAPE    -       68.14%  85.51%  80.84%  68.14%     87.69%
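The three rate categories discussed here (all reservations, room rate above €100, room rate above €150) can be derived from raw bookings roughly as sketched below. This is one plausible reading of the rate bins, not the preprocessing actually used in this study; the record layout, the names `THRESHOLDS` and `daily_counts`, and the sample bookings are hypothetical.

```python
from collections import defaultdict
from datetime import date

# None stands for "all reservations"; the other entries are euro thresholds.
THRESHOLDS = (None, 100, 150)


def daily_counts(reservations, thresholds=THRESHOLDS):
    """Count reservations per stay date for each rate category.

    `reservations` is an iterable of (stay_date, room_rate) pairs.
    A reservation counts toward a category when its rate is strictly
    above the threshold, so the categories are nested.
    """
    counts = defaultdict(lambda: [0] * len(thresholds))
    for stay_date, rate in reservations:
        for i, threshold in enumerate(thresholds):
            if threshold is None or rate > threshold:
                counts[stay_date][i] += 1
    return dict(counts)


# Hypothetical bookings: (stay date, room rate in euro).
bookings = [
    (date(2019, 3, 1), 89.0),
    (date(2019, 3, 1), 120.0),
    (date(2019, 3, 1), 175.0),
    (date(2019, 3, 2), 95.0),
]

per_day = daily_counts(bookings)
```

Because the categories are nested, a day with only low-priced bookings yields zeros in the higher bins, which is how the many zero days in the above-€150 series arise.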

One limitation of this research, as already outlined, is the comparatively small number of training samples, which prevents the functionality of LSTM from being fully exploited. If this study is replicated, more samples should be used so that the LSTM can learn from data covering a longer period, which may improve performance. Another limitation lies in the current setting of the LSTM, where the model connects to a dense layer to provide 5 prediction steps for 3 rate categories. As the model fails to generalize to the second and third rate categories, the inclusion of these two series is believed to deteriorate the overall model performance. In this study, the construction of separate models was not possible due to a limited computational budget, but one might expect a better result if each time step is optimized separately.
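To make the shared output layer concrete, the sketch below unpacks a flat vector of 5 × 3 = 15 dense-layer outputs into one forecast per prediction step and rate category. The horizons are those reported in Tables 2 to 4; the horizon-major ordering of the outputs, the category labels, and the function name `unpack_outputs` are assumptions made for illustration only.

```python
HORIZONS = (3, 7, 10, 14, 21)  # prediction steps t+3 ... t+21
CATEGORIES = ("all", "above_100", "above_150")  # hypothetical labels


def unpack_outputs(outputs):
    """Map a flat output vector to {(horizon, category): forecast}.

    Assumes the outputs are ordered horizon by horizon, with the three
    rate categories adjacent within each horizon.
    """
    expected = len(HORIZONS) * len(CATEGORIES)
    if len(outputs) != expected:
        raise ValueError(f"expected {expected} outputs, got {len(outputs)}")
    grid = {}
    for i, horizon in enumerate(HORIZONS):
        for j, category in enumerate(CATEGORIES):
            grid[(horizon, category)] = outputs[i * len(CATEGORIES) + j]
    return grid


# Placeholder outputs 0..14 stand in for the dense layer's predictions.
grid = unpack_outputs(list(range(15)))
```

Since all 15 outputs are produced by one network trained with one loss, errors on the two hard rate categories influence the same weights that serve the forecast for all reservations; separate models per series or per step would remove that coupling.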

To conclude, the unpromising result of the research might explain why RNNs have not been widely adopted in the hotel industry: they are data-hungry, in contrast to the SdLy model, which needs only one data point and is easy to interpret [9]. Although the LSTM outperforms the SdLy model, in cases where hotels do not have a multitude of training samples and the reservations exhibit a variety of prominent seasonal patterns, a simpler method such as WSC might even be preferred, given the computational complexity of the more complicated LSTM architecture and its fairly similar results.
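The remark that SdLy needs only one data point can be illustrated with a minimal sketch. It assumes a weekday-aligned variant that looks back 364 days (52 weeks); the exact alignment rule used by the baseline in this study may differ, and the function name `sdly_forecast` and the sample history are hypothetical.

```python
from datetime import date, timedelta

# 52 weeks, so the reference day falls on the same weekday (assumption).
LOOKBACK = timedelta(days=364)


def sdly_forecast(history, target_date):
    """Return last year's reservation count for the matching day.

    `history` maps dates to observed reservation counts. Returns None
    when the reference day is not available.
    """
    return history.get(target_date - LOOKBACK)


# Hypothetical history with a single observation.
history = {date(2018, 7, 2): 140}

forecast = sdly_forecast(history, date(2019, 7, 1))
missing = sdly_forecast(history, date(2019, 7, 2))
```

The forecast is a single lookup with no parameters to fit, which is what makes the method easy to interpret but also unable to adapt to anything other than last year's value.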

ACKNOWLEDGEMENTS

I would like to thank Ireckonu for providing all the data I needed and for creating a fun environment to work in. I am grateful to Rik van Leeuwen for sharing his professional knowledge of the hotel industry and for doing his best to clarify any doubts I had. Also, many thanks to my friends, in particular my housemate Marius Zeevaert, for all the positive encouragement. Last but definitely not least, I would like to express my sincere gratitude to Inske Groenen for her unparalleled support during my thesis. She was most helpful and patient the entire time and provided constructive feedback all along. This thesis would not have been completed without her.

REFERENCES

[1] George Athanasopoulos and Rob J. Hyndman. Forecasting: Principles and Practice. OTexts: Melbourne, Australia, 2013.

[2] Moshe Ben-Akiva and Steven R. Lerman. Discrete Choice Analysis: Theory and Application to Travel Demand. Cambridge, Mass.: MIT Press, 1985.

[3] B.L. Bowerman, R.T. O’Connell, and A.B. Koehler. Forecasting, Time Series, and Regression: An Applied Approach. Duxbury advanced series in statistics and decision sciences. Thomson Brooks/Cole, 2005.

[4] Rohitash Chandra and Mengjie Zhang. Cooperative coevolution of Elman recurrent neural networks for chaotic time series prediction. Neurocomputing, 86:116–123, 2012.

[5] Christopher Chen and Soulaymane Kachani. Forecasting and optimisation for hotel revenue management. Journal of Revenue and Pricing Management, 6(3):163–174, Sep 2007.

[6] Wen-Chyuan Chiang, Jason C.H. Chen, and Xiaojing Xu. An overview of research on revenue management: current issues and future research. Int. J. Revenue Management, 1(1):97–128, 2007.

[7] François Chollet et al. Keras. https://keras.io, 2015.

[8] Robert T. Clemen. Combining forecasts: a review and annotated bibliography. International Journal of Forecasting, (5):559–584, 1989.

[9] Anna Maria Fiori and Ilaria Foroni. Reservation forecasting models for hospitality SMEs with a view to enhance their economic sustainability. Sustainability, 11(5):1274, 2019.

[10] Ian Goodfellow, Yoshua Bengio, and Aaron Courville. Deep learning. Massachusetts Institute of Technology, Cambridge, MA., 2016.

[11] Alex Graves, Abdel-rahman Mohamed, and Geoffrey Hinton. Speech recognition with deep recurrent neural networks. arXiv e-prints, 2013.

[12] Inske Groenen. Representing seasonal patterns in gated recurrent neural networks for multivariate time series forecasting. Master Thesis, University of Amsterdam, 2018.

[13] Peng Guo, Baichun Xiao, and Jun Li. Unconstraining methods in revenue management systems: Research overview and prospects. Advances in Operations Research, 2012.

[14] Alwin Haensel and Ger Koole. Estimating unconstrained demand rate functions using customer choice sets. Journal of Revenue and Pricing Management, 10(5):438–454, 2011.

[15] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), June 2016.

[16] Michael Hüsken and Peter Stagge. Recurrent neural networks for time series classification. Neurocomputing, 50(C):223–235, 2003.

[17] Anthony Owen Lee. Airline reservations forecasting: probabilistic and statistical models of the booking process. Cambridge, Mass.: Flight Transportation Laboratory, Dept. of Aeronautics and Astronautics, Massachusetts Institute of Technology, 1990.

[18] Roi Livni, Shai Shalev-Shwartz, and Ohad Shamir. On the computational efficiency of training neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N. D. Lawrence, and K. Q. Weinberger, editors, Advances in Neural Information Processing Systems 27, pages 855–863. Curran Associates, Inc., 2014.

[19] Spyros Makridakis, Robin M. Hogarth, and Anil Gaba. Forecasting and uncertainty in the economic and business world. International Journal of Forecasting, 25(4):794–812, 2009. Special section: Decision making and planning under low levels of predictability.

[20] Luis Nobre Pereira. An introduction to helpful forecasting methods for hotel revenue management. International Journal of Hospitality Management, 58:13–23, 2016.

[21] Bo Peng, Haiyan Song, and Geoffrey I. Crouch. A meta-analysis of international tourism demand forecasting and implications for practice. Tourism Management, 45:181–193, 2014.

[22] Mihir Rajopadhye, Mounir Ben Ghalia, Paul P. Wang, Timothy Baker, and Craig V. Eister. Forecasting uncertain hotel room demand. Information Sciences, 132(1):1–11, 2001.

[23] Shujie Shen, Gang Li, and Haiyan Song. Combination forecasts of international tourism demand. 2011.

[24] Ilya Sutskever, Oriol Vinyals, and Quoc V Le. Sequence to sequence learning with neural networks. In Z. Ghahramani, M. Welling, C. Cortes, N.D. Lawrence, and K.Q. Weinberger, editors, Advances in Neural Information Processing Systems 27 (NIPS 2014), pages 3104–3112. Neural Information Processing Systems Foundation, Inc., 2014.

[25] Kalyan Talluri and Garrett van Ryzin. The Theory and Practice of Revenue Management. Springer Science+Business Media, Inc., 233 Spring Street, New York, NY 10013, USA., 2004.

[26] Fei Wang, Mengqing Jiang, Chen Qian, Shuo Yang, Cheng Li, Honggang Zhang, Xiaogang Wang, and Xiaoou Tang. Residual attention network for image classification. In The IEEE Conference on Computer Vision and Pattern Recognition (CVPR), July 2017.

[27] Yiren Wang and Fei Tian. Recurrent residual learning for sequence classification. In Proceedings of the 2016 conference on empirical methods in natural language processing, pages 938–943, 2016.

[28] Larry R. Weatherford and Sheryl E. Kimes. A comparison of forecasting methods for hotel revenue management. Cornell University, School of Hotel Administration, 19(3):401–415, 2003.

[29] Athanasius Zakhary, Neamat El Gayar, and Sanaa El-Ola. H. Ahmed. Exploiting neural networks to enhance trend forecasting for hotels reservations. In Friedhelm Schwenker and Neamat El Gayar, editors, Artificial Neural Networks in Pattern Recognition, pages 241–251, Berlin, Heidelberg, 2010. Springer Berlin Heidelberg.

[30] Athanasius Zakhary, Amir F. Atiya, Hisham El-Shishiny, and Neamat E. Gayar. Forecasting hotel arrivals and occupancy using monte carlo simulation. Journal of Revenue and Pricing Management, 10(4):344–366, 2011.

[31] Hossam Zaki. Forecasting for airline revenue management. The Journal of Business Forecasting Methods Systems, 19(1):2–6, 2000.

[32] G.Peter Zhang and Min Qi. Neural network forecasting for seasonal and trend time series. European Journal of Operational Research, 160(2):501–514, 2005.