
Promotion techniques at Bol.com: Dealing with heterogeneity in a store that sells nearly everything


Academic year: 2021


Dealing with heterogeneity in a store that sells nearly everything

by

Tom Jongsma

Thesis supervisor: Prof. Dr. Jaap Wieringa
Co-reader: Dr. Kees Praagman
Company supervisor: Melissa Perotti M.Sc.

THIS FILE IS STRICTLY PRIVATE, CONFIDENTIAL AND PERSONAL TO ITS RECIPIENTS AND SHOULD NOT BE COPIED, DISTRIBUTED OR REPRODUCED IN WHOLE OR IN PART, NOR PASSED TO ANY THIRD PARTY.

A thesis submitted in partial fulfillment for the degree of

M.Sc. in Econometrics, Operations Research and Actuarial Studies

at the

Faculty of Economics and Business, Rijksuniversiteit Groningen


Contents

List of Tables

1 Introduction
  1.1 Bol.com organizational structure
  1.2 Problem definition

2 Literature Review
  2.1 Theory
    2.1.1 Effect of sales promotions
    2.1.2 Sales promotion techniques
    2.1.3 Similar studies
  2.2 Modeling options
    2.2.1 Panel data models
    2.2.2 Clusterwise regression
    2.2.3 Seemingly Unrelated Regressions
    2.2.4 Hierarchical models
  2.3 Conclusion

3 Empirical methods
  3.1 Data architecture
  3.2 Raw data description
  3.3 Data collection
  3.4 Data editing
  3.5 Data description
  3.6 Model specifications
    3.6.1 Variable selection
    3.6.2 Fixed effects
    3.6.3 SUR
    3.6.4 Hierarchical models

4 Results
  4.1 Fixed effects
  4.2 Seemingly Unrelated Regressions
  4.3 Hierarchical models

5 Discussion
  5.1 Goodness of fit
  5.2 Hierarchical model results
  5.3 Fixed effects models
  5.4 Hypotheses
  5.5 Data quality

6 Conclusion
  6.1 Further research

7 Appendix
  7.1 Price stars
  7.2 Additional goodness of fit statistics
  7.3 Tool for simple insights into the relative effectiveness of promotion techniques
  7.4 Clustered by model fit results
  7.5 Product clusters by model fit
  7.6 Pig-Latin scripts
    7.6.1 Merging different data sources
  7.7 Aggregating data
  7.8 R scripts
    7.8.1 Loading, joining and editing data
    7.8.2 Fixed effects
    7.8.3 Hierarchical models
  7.9 Stata do-file for clustering by model fit

List of Tables

2.1 Sales promotion classification according to Blattberg and Neslin (1990)
3.1 Summary statistics
4.1 Goodness of fit statistics for fixed effects models
4.2 Fixed effects regression results
4.3 Goodness of fit statistics for hierarchical models
4.4 Hierarchical models
5.1 Goodness of fit statistics for all models
7.1 Goodness of fit statistics, clustered by model fit


1 Introduction

Bol.com is the largest e-commerce corporation in the Netherlands. Its target audience is consumers in the Benelux area, and at the beginning of 2017 Bol.com offered over 15 million products to a customer base of 14 million consumers in the Netherlands and Belgium. In 2016 Bol.com’s revenue topped 1.2 billion euros, 25% of which was realized during promotions. Optimizing the strategies Bol.com employs when deploying promotions is therefore of great importance. However, within Bol.com no thorough research has been done into the effectiveness of the many different promotions in the store. While Bol.com has been collecting data on the promotions running in the store since September 2016, insights up until this point consist of dashboards visualizing historical performance through cross-sections and aggregations. No models have been created that attempt statistical inference or generate other additional insights.

This presents an interesting opportunity for econometric research, since insights derived from the collected data on promotional activities could help Bol.com further improve its promotions. Under the very conservative assumption that promotions in 2018 will generate the same revenue as in 2016, even a 1% increase in effectiveness would generate an additional 3 million euros. Thus, research on how to use promotions effectively has high potential value for Bol.com. While research has been done on many aspects of promotions, it is often unclear whether conclusions from such research can be applied to Bol.com, since the rise of major e-commerce corporations offering a large variety of products is a recent development. Furthermore, even in cases where previous research can reasonably be assumed to apply to Bol.com, it is interesting to check the validity of these assumptions using Bol.com’s own data, while simultaneously providing insights into the effectiveness of promotions in a webshop of its size.
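As a quick sanity check on these figures (the effectiveness gain is simplified to a flat percentage uplift on promotional revenue):

```python
# Back-of-the-envelope check of the revenue figures quoted above.
revenue_2016 = 1.2e9        # total 2016 revenue in euros (from the text)
promo_share = 0.25          # share realized during promotions (from the text)
promo_revenue = revenue_2016 * promo_share  # 300 million euros

uplift = 0.01               # hypothetical 1% effectiveness gain
additional_revenue = promo_revenue * uplift
print(additional_revenue)   # roughly 3 million euros, as stated above
```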

In this thesis we will use the data collected by Bol.com to research the effectiveness of Bol.com’s promotional activities for all products the company offers. Since product groups can be expected to react differently to promotions, we will test several model specifications that can account for this. The results of these modeling efforts give Bol.com insights into which kinds of promotions perform optimally for each group of products. These insights allow Bol.com to further optimize its promotions, which has the potential to generate a significant amount of additional revenue.


1.1 Bol.com organizational structure

Since Bol.com sells products in many different markets, the company is structured accordingly. Within the company, the department that manages Bol.com’s products and promotions is divided into 19 different categories, each representing a specific market. These categories operate largely independently of each other, and thus any advice on the strategies behind promotions has to be given to every category separately. Because the categories represent different markets this would be wise regardless, since the markets are sufficiently different that optimal strategies will likely differ per category. In order for this research to be valuable to Bol.com, some additional time will have to be invested in making the practical implications clear to supporting teams and in defining our research such that the results have actionable implications for the categories.

Promotions created by Bol.com can be divided into two categories: promotions inside campaigns and promotions outside of campaigns. Campaigns are (often store-wide) cohesive promotion campaigns centered around a common theme, for instance Christmas or Black Friday. During campaigns, promotions are supported by increased online and offline advertising. Promotions outside campaigns are created by categories independently and do not necessarily have any support from increased advertising. The content of promotions, whether inside or outside campaigns, can vary depending on the goals of the categories creating them. Promotion content can be anything from an awareness promotion (no discount of any kind, just putting products in a spotlight) to multi-unit discounts, to free products. The categories currently base their decisions on what kind of promotions to run on market knowledge and historical data in the form of dashboards.

The promotions that are created within Bol.com are classified by a number of internally defined variables. There are a number of promotion types and subtypes, which can then be further differentiated by individual promotion characteristics. It is possible to look at data concerning sales and promotions for all 19 categories over time, but Bol.com’s product classifications offer the option of choosing another level of aggregation. Bol.com has defined four product units, which can be further split into 19 product categories, which can then be further divided into 63 product groups, which can be separated into 665 product subgroups or even individual products. Since Bol.com has a significant number of promotions and orders even at the product subgroup level, each aggregation level would allow for the use of rich data on the performance of promotions over time. Because we are following units over time, the available data can be characterized as panel data, which brings with it a host of modeling options, depending on the available data and underlying assumptions.


1.2 Problem definition

The main question posed by Bol.com is how their data collected on products, promotion techniques and sales can be utilized to generate insights which would allow the company to create more effective promotions. There are of course many ways in which the effectiveness of promotions can be measured. For the purpose of this thesis we define the effectiveness of a promotion as the amount of additional sales it generates. By looking at data collected by Bol.com, this thesis also ensures that any insights are tailor-made to the company. This means that no assumptions have to be made about the practical relevance of our research setting, and it has the added advantage that results can be easily tested and verified.

In an attempt to answer our main question, the effects of various aspects of promotions will be investigated using regression methods. This raises some immediate modeling questions. Since Bol.com offers a very wide assortment of products, one of the first questions to be tackled is what level of aggregation to use. Aggregating at too high a level runs the risk of losing sight of important dynamics within aggregated groups, while aggregating at too low a level might make computations intractable and could make any derived insights too impractical to use, as there are too many products to micro-manage. For the purpose of this thesis we determine both a time and a unit aggregation. For unit aggregation we settle on product subgroups, which is one of the lowest levels of product classification Bol.com uses. For time aggregation we settle on days, as Bol.com has many promotions which run for less than a week, making it likely that coarser aggregations lose sight of influential observations.
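The chosen subgroup-by-day aggregation can be sketched in a few lines; the order records, field names and subgroup labels below are invented for illustration and do not reflect Bol.com’s actual schema.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical order records: (timestamp, product subgroup, units sold).
orders = [
    ("2017-03-01 09:12", "fiction_paperback", 3),
    ("2017-03-01 17:40", "fiction_paperback", 1),
    ("2017-03-01 11:05", "usb_cables", 7),
    ("2017-03-02 08:30", "fiction_paperback", 2),
]

# Aggregate unit sales to the (subgroup, day) level chosen above.
daily_sales = defaultdict(int)
for ts, subgroup, units in orders:
    day = datetime.strptime(ts, "%Y-%m-%d %H:%M").date()
    daily_sales[(subgroup, day)] += units

for key in sorted(daily_sales, key=str):
    print(key, daily_sales[key])
```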

Part of our main question is whether there is a significant difference in the effectiveness of promotions between the many markets Bol.com operates in. It is probable that the categories Bol.com has defined serve different segments of consumers who react differently to promotion characteristics. If this is indeed the case, it would be beneficial to create a model that allows for heterogeneity across different groups of products, as it might influence optimal promotion practices. However, here again a trade-off has to be made: allowing for heterogeneity decreases the amount of pooling that can be done, which in turn decreases the amount of data available for estimating effects.

Keeping this in mind we come to the following research question:

How can we most effectively determine the response to promotion techniques for the products Bol.com offers in a way that appropriately accommodates the heterogeneity caused by grouping products?

The academic relevance of this thesis consists of two parts. First, we investigate the effectiveness of a diverse set of promotion techniques for a very large number of product groups using real data in a setting that has not yet been studied. Results of this thesis can be used for comparison with other studies that investigate the effect of promotion techniques in other settings with similar product groups. Second, we compare a number of different modeling approaches in an attempt to find a balance between accommodating heterogeneity and utilizing all data. The (relative) modeling performances we find can be of interest to researchers studying datasets with heterogeneity across units in the future.


2 Literature Review

In this section relevant research related to the problem posed in this thesis will be discussed, starting with a short overview on previous work done in the field of estimating the effect of sales promotions. Of most interest here is distinguishing which factors are likely to be relevant in our models.

We will then proceed to highlight some research that is closely related to our topic. We take a closer look at two studies regarding the effectiveness of promotions that utilize scanner data at product category level and investigate if some of their methods and findings can be applied to our research.

Next, we use insights from the previously discussed research to examine the modeling approaches that most closely align with our interests. Here special attention is paid to utilizing the large amount of data available for this thesis with optimal efficiency, while at the same time being mindful of heterogeneity issues.

Finally, takeaways from the literature review will be summarized and a further approach to our research will be formulated.

2.1 Theory

2.1.1 Effect of sales promotions

Numerous studies have examined the effect of sales promotions. In Sales Promotion: The Long and the Short of It, Blattberg and Neslin (1989) give an overview of the marketing research to that date. They find, unsurprisingly, that marketing promotions in general have a significant immediate impact on sales. However, it is important to note that for a retailer not all sales generated during a sales promotion are strictly ‘additional’ sales. To start, sales during a promotion may be due to stockpiling, which decreases sales later on; this is often referred to as a ‘post-promotion dip’. Additionally, a portion of the increased sales may occur because consumers wait for the promotion to start before buying, decreasing sales in the period leading up to the promotion, commonly called a ‘pre-promotion dip’. Finally, another


portion of the additional sales could be due to cannibalization, meaning that the extra sales of one brand cause decreased sales for another brand in the same product category. From a managerial standpoint, ‘net’ additional sales are usually of main interest. While cross-brand and cross-period sales may be appealing due to differences in margins, or because it may be beneficial to shift revenue to another period, revenue that would otherwise not have been obtained is normally the more important performance indicator for sales promotions.

Van Heerde et al. (2004) decompose sales promotion bumps into three parts, accounting for demand borrowed from other brands, demand borrowed from other periods (pre-promotion and post-promotion dips) and actual additional demand. The dataset used in their study is scanner-level data across multiple stores of a grocery chain, and four product categories are investigated: tuna, tissue, shampoo and peanut butter. Van Heerde et al. find that in this setting each part accounts on average for about 33% of the total bump in sales due to sales promotions. The decompositions differ significantly per product group, highlighting the need to take heterogeneity across product groups into account. All products in the study by Van Heerde et al. (2004) are consumables that take a long time to expire and are often bought at regular intervals, which makes them ideal for stockpiling. It is reasonable to assume that in a setting where this is not strictly the case, such as an e-commerce company offering a wide assortment of products, dynamic effects are smaller. Hence we can formulate our first hypothesis:

Hypothesis 1: Demand borrowed from other periods due to promotions on average will be significantly smaller than 33%.
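The bookkeeping behind this decomposition can be illustrated with a toy calculation; the total bump below is invented, and only the equal three-way split comes from Van Heerde et al. (2004).

```python
# Toy decomposition of a promotion bump into the three parts defined above.
total_bump = 900  # hypothetical extra units sold during a promotion

# Average split reported by Van Heerde et al. (2004): one third each.
shares = {"cross_brand": 1 / 3, "cross_period": 1 / 3, "incremental": 1 / 3}
decomposition = {part: total_bump * share for part, share in shares.items()}

# 'Net' additional sales in the managerial sense are only the incremental part;
# Hypothesis 1 amounts to expecting a smaller cross_period share in our setting.
net_additional = decomposition["incremental"]
print(decomposition, net_additional)
```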

Foekens et al. (1998) also show that the dynamic effects of promotions should not be underestimated when it comes to their impact on sales. In the same vein, Fok et al. (2006) look at promotional price changes and propose a vector error correction model to disentangle long-term from short-term effects on brand sales in 25 different categories, finding that effects differ significantly across categories. Category characteristics like price dispersion, price differentiation and promotion frequency are found to heavily influence the results. The researchers note that frequent discounting can cause the reference price of products to decrease, thus reducing the effectiveness of future discounts. This is another argument against a high frequency of promotions, which Papatla and Krishnamurthi (1996) note can also decrease brand loyalty and increase price sensitivity.

Aside from sales promotions, marketing efforts for companies as a whole could also influence individual product sales. This is also an interesting assumption to check, hence we formulate our second hypothesis:

Hypothesis 2: Promotions within campaigns will be significantly more effective than promotions outside campaigns.


could thus function as control variables in a model trying to identify the effect of individual promotions.

2.1.2 Sales promotion techniques

A first step in classifying different promotion techniques is to differentiate between price promotions and non-price promotions (Cooke, 1985). Cooke defines price promotions as ”promotions such as coupons, cents off, refunds and rebates that temporarily reduce the cost of the goods or service” and non-price promotions as ”promotions such as giveaways or contests in which value is temporarily added to the product at full price”. Chandon et al. (2000) investigate why both price and non-price sales promotions work and find that sales promotions in general offer consumers three levels of utilitarian and three levels of hedonic benefits. They conclude that sales promotions can be improved by matching the promotion type to the consumer segment that buys the promoted products. As an example, products that are hedonic in nature will attract consumers who are likely to also value the hedonic benefits offered by promotions more highly. Since in this thesis we are researching an e-commerce company offering a large variety of products, we can formulate two additional hypotheses:

Hypothesis 3: There will be a cluster of product groups consisting of hedonic products where price promotions are relatively less effective than in a cluster of product groups consisting of utilitarian products.

Hypothesis 4: There will be a cluster of product groups consisting of utilitarian products where non-price promotions are relatively less effective than in a cluster of product groups consisting of hedonic products.

Blattberg and Neslin (1990) further offer definitions in which promotions need not add value to a product. These classifications can be used to differentiate between promotion techniques at a more specific level than price versus non-price; see Table 2.1.

Table 2.1: Sales promotion classification according to Blattberg and Neslin (1990)

Retailer promotions | Trade promotions       | Consumer promotions
--------------------|------------------------|------------------------
Price cuts          | Case allowances        | Couponing
Displays            | Advertising allowances | Sampling
Feature advertising | Display allowances     | Price packs/value packs
Free goods          | Trade coupons          | Refunds
Retailer coupons    | Financing incentives   | Special events
Contests/premiums   | Contests               | Sweepstakes/contests


example framing a price cut as ”10 euros off” or ”10% discount”), which Yildirim and Aydin (2012) show can significantly influence sales.

Research has shown that the optimal promotion technique depends on the market segments a retailer wishes to target. This is because consumers differ in price sensitivity and in how much they value the unique elements of different promotion types. While, for example, some consumers might highly value the sense of achievement gained by using coupons cleverly, others might be more attracted to the excitement of participating in a lottery. Blattberg and Neslin (1990) argue that in markets containing a majority of smart, price-aware shoppers coupons might be more effective, while in markets with many impulse buyers simple display promotions are often the key to increased sales. Campbell and Diamond (1990) have a more general finding, concluding that the type of promotion should depend on ”whether a price-conscious or a premium product market segment is being sought”.

The effects of the different types of sales promotions (partially) classified by Blattberg and Neslin (1990) and Cooke (1985) have been studied on numerous occasions, mainly through the use of surveys.

Kalwani and Yim (1992) look at the effects of discount promotions in an experimental setting with 200 undergraduate students as respondents. They find that, as expected, both discount depth and promotion frequency have a significant effect on consumer brand choice. Furthermore, they find a region of relative price insensitivity around the expected price: price changes only have a significant effect on consumer brand choice when they fall outside a range of values close to the expected price. On the timing of promotions, the researchers find that unexpected promotions can have a stronger positive effect on brand choice than expected ones. In the same vein, when a promotion is expected but not given, it can have a severe negative effect. This suggests promotions should be carefully scheduled to optimize benefits.

Laroche et al. (2003) construct a multidimensional cognitive-affective-conative model in which they compare coupons and two-for-one promotions in order to investigate how and why consumers use promotions. Their setting is a survey with 250 participating respondents from a North American city. They find that consumer traits have a significant impact on the effectiveness of promotions and advise companies to take this into account by segmenting consumer markets when initiating promotions. Furthermore, they find evidence that consumers enjoy using coupons and two-for-one promotions, suggesting that promotions may indeed add value to a product beyond the discount they offer.


2.1.3 Similar studies

There are two studies that, similarly to this thesis, research the effect of promotions on different product categories using sales data from real companies.

Ailawadi et al. (2006) investigate the effect of promotions on a large number of product categories simultaneously for CVS, at the time a leading drugstore chain in the United States. Their research spans items across approximately 3800 stores and 189 product categories. Analogously to Abraham and Lodish (1993), gross uplift per promotion per product for separate stores is simply defined as sales minus a baseline, where the baseline is a moving average of neighboring non-promotional weeks. Furthermore, a model capable of explaining correlations at the product category per store per week level is formulated. To keep the data tractable, 50% of the stores in the dataset are excluded at random, leaving 12.9 million observations. These observations are then used to decompose uplift into switching, stockpiling and incremental lift, as well as to estimate cross-category effects of promotions specific to CVS. In this research much attention is paid to heterogeneity across stores, while the effects of promotion techniques are only minimally inspected. Since an e-commerce company has no stores selling the same products, and promotion techniques are of main interest in this thesis, the model proposed by Ailawadi et al. (2006) may not be a good fit. Furthermore, where sales uplift in their study is easily defined as sales minus a moving average, sales uplift and its decomposition into different promotional actions is the main area of interest within this thesis. Very little modeling attention is paid to this by Ailawadi et al., a weakness also brought forward in their own discussion.

Drechsler et al. (2017) research and compare the effect of two promotion techniques on sales in four product categories. The researchers select category-level sales as the dependent variable to account for cannibalization effects. They control for, among other things, discount level, duration of the promotion and timing of the promotion, as these have proven to be significant predictors of promotion effectiveness in previous research. The specified model accounts for heterogeneity across stores through random effects and has a structure similar to the well-known SCAN*PRO model developed by Wittink et al. (1988), which is a standard within this field of research and has also had success in numerous commercial applications. The SCAN*PRO model allows coefficients to be interpreted as elasticities and multipliers, which is consistent with previous research. The model is then applied to each product category separately in order to estimate the effects of the two promotion techniques, and is found to have good fit with a majority of statistically significant coefficients. The coefficients of the independent variables vary heavily across the different product categories, which further confirms the importance of factoring in heterogeneity across product groups. Furthermore, one of the two promotion techniques is found to perform better across all investigated product categories, indicating that research in this area can certainly be valuable when applied to a company.

2.2 Modeling options


different reactions to promotions. Because of this we are interested in finding a balance between pooling data for better inference and being mindful of pooling bias. We therefore look to methods that allow for heterogeneity in their model specifications. Three such methods are investigated in this section. First, we could allow for a limited form of heterogeneity in the form of clusterwise regression. Second, Seemingly Unrelated Regressions make it possible to estimate more efficiently than separate equation-by-equation estimation while allowing all coefficients to differ completely across product subgroups. Third and finally, it may be optimal to take advantage of hierarchical modeling to allow for heterogeneity across multiple levels.

2.2.1 Panel data models

Even though it is possible to estimate a separate equation for each product subgroup or to pool all observations, ideally one would exploit the panel-data-like structure of sales per product subgroup to obtain more accurate estimates. Natural candidates are fixed- and random effects models, which are standard in this field; a good overview is given in Hayashi (2000).

A short derivation of the fixed- and random effects estimators is as follows. Define $y_{it}$ as the value of the dependent variable of unit $i$ at time $t$, $X_{it}$ as the vector of values of the independent variables for unit $i$ at time $t$, $c_i$ as the unobserved individual effect with $E(c_i \mid X_i) \neq 0$, and $\epsilon_{it} \sim N(0, \sigma^2)$. The fixed effects estimator is obtained by demeaning the dataset and computing ordinary least squares for the model (Mundlak, 1978)

$$y_{it} - \bar{y}_i = (X_{it} - \bar{X}_i)\beta + \epsilon_{it} - \bar{\epsilon}_i,$$

where $\bar{X}_i = \frac{1}{T}\sum_{t=1}^{T} X_{it}$, $\bar{y}_i = \frac{1}{T}\sum_{t=1}^{T} y_{it}$ and $\bar{\epsilon}_i = \frac{1}{T}\sum_{t=1}^{T} \epsilon_{it}$. The random effects estimator is obtained by additionally assuming that $E(c_i \mid X_i) = 0$ and $c_i \sim N(0, \sigma_c^2)$ and estimating the model

$$y_{it} = X_{it}\beta + c_i + \epsilon_{it}.$$
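A minimal numerical sketch of the within (demeaning) fixed effects estimator, using simulated data in which the unit effects are deliberately correlated with the regressor so that pooled OLS would be biased; none of this is Bol.com data.

```python
import numpy as np

rng = np.random.default_rng(0)
N, T, beta = 50, 20, 2.0
c = rng.normal(size=N)                    # unit effects c_i
X = c[:, None] + rng.normal(size=(N, T))  # regressor correlated with c_i
eps = rng.normal(scale=0.5, size=(N, T))
y = beta * X + c[:, None] + eps

# Within transformation: subtract unit means from y and X, then apply OLS.
Xd = X - X.mean(axis=1, keepdims=True)
yd = y - y.mean(axis=1, keepdims=True)
beta_fe = (Xd.ravel() @ yd.ravel()) / (Xd.ravel() @ Xd.ravel())

# Pooled OLS (no demeaning) for comparison; it absorbs c_i into the error.
beta_pooled = (X.ravel() @ y.ravel()) / (X.ravel() @ X.ravel())
print(beta_fe, beta_pooled)  # beta_fe is close to 2.0; beta_pooled is biased upward
```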

A limitation of the fixed effects and random effects models is that all regressors must be strictly (strongly) exogenous for unbiased results. Especially in the case of total sales for product subgroups of Bol.com it can reasonably be expected that lagged values of the dependent variable are relevant predictors of sales, which violates the strict exogeneity assumption. In other words, with lagged dependent variables $E(u_{it} \mid x_{i1}, \ldots, x_{iT}) \neq 0$ for $i = 1, \ldots, N$, where $T$ is the maximal number of time periods, $N$ is the number of cross-sectional units, $u_{it}$ is the error term of cross-sectional unit $i$ at time $t$, and $x_{ij}$ is the set of regressors of unit $i$ at time $j$, $j = 1, \ldots, T$. In that case the model can be defined


2.2.2 Clusterwise regression

Clusterwise regression (or latent class regression) is usually used in marketing to segment consumers into groups with different reactions to marketing instruments. For the purpose of this thesis it could also be valuable as a tool to segment the product assortment into groups that react similarly to promotions. An obvious idea is to segment product subgroups where effects are similar and use panel data models within these segments, thereby exploiting the added efficiency of panel data while still allowing for some heterogeneity across product subgroups. This approach is supported by research; for example, Andrews et al. (2002) show that in practice clustering reduces the overfitting that occurs at the individual equation level. The question that remains is how to assign products to clusters. One could of course use pre-defined clusters, for instance higher-level product categories, price classes or clusters based on the advice of experts in the field. There are also methods to cluster based on regressor values, of which the most popular are hierarchical clustering and k-means clustering (Gan et al., 2007). Ideally, however, one would like the clustering to be optimal from a modeling standpoint.

In practice an often-used approach that builds on this idea is to mix clustering and modeling via a latent class model like the one suggested by Wedel et al. (1993). Latent class regression builds on the idea that a dataset is distributed as a mixture of $Q$ segments. In latent class regression these segments can each be described by a regression model $f_q$, occurring with probability $\pi_q$, so the distribution of $y$ is given by

$$f(y \mid x, \psi) = \sum_{q=1}^{Q} \pi_q f_q(y \mid x, \vartheta_q),$$

where $\psi = (\pi, \vartheta)$ is the vector of all parameters and the $f_q$ are mixture components parametrized by $\vartheta_q$ (Wedel and DeSarbo, 1995; Kamakura et al., 1994).
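A minimal sketch of how such a mixture can be estimated with the EM algorithm, for Q = 2 segments and simulated data; the generating model, starting values and sample size are all illustrative and much smaller than the thesis problem.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 400
x = rng.uniform(-2, 2, size=n)
z = rng.random(n) < 0.5                      # true (latent) segment labels
y = np.where(z, 3.0 * x + 1.0, -2.0 * x)     # two regression regimes
y = y + rng.normal(scale=0.3, size=n)

X = np.column_stack([np.ones(n), x])         # intercept and slope
betas = np.array([[0.5, 1.0], [0.0, -1.0]])  # rough starting values
pi, sigma = np.array([0.5, 0.5]), 1.0

for _ in range(50):
    # E-step: responsibilities r[i, q] proportional to pi_q * N(y_i | X_i beta_q, sigma^2)
    resid = y[:, None] - X @ betas.T
    dens = pi * np.exp(-0.5 * (resid / sigma) ** 2) / sigma
    r = dens / dens.sum(axis=1, keepdims=True)
    # M-step: weighted least squares per segment, then update pi and sigma.
    for q in range(2):
        w = r[:, q]
        betas[q] = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * y))
    resid = y[:, None] - X @ betas.T
    pi = r.mean(axis=0)
    sigma = np.sqrt((r * resid**2).sum() / n)

print(np.round(betas, 2), np.round(pi, 2))  # recovers slopes near 3 and -2
```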


Weber (2015). This paper introduces a method in which heterogeneity across clusters is allowed, but each cross-sectional unit is restricted to remain in the same cluster in every time period. An approach is formulated in which fixed effects regression is combined with clustering, so that the optimal number of clusters is determined simultaneously with the maximal fit of the fixed effects models within clusters. A second option is developed by Su et al. (2016), who introduce a penalized least squares approach for models without endogenous regressors.

One potential issue with clusterwise regression (in any form) for determining clusters for a large number of products is that a large number of clusters is likely required. In studies examining the same phenomenon across product groups, the effect of promotions often differs significantly (Wierenga, 2008). In contrast, clusterwise modeling very often results in a small number of segments, almost always fewer than five, e.g. Nylund et al. (2007), Deb and Trivedi (2002), Pickles et al. (1995) and Tein et al. (2013). It therefore remains to be seen whether clustering can avoid pooling bias entirely by identifying the (likely) many clusters in large groups of products, given the limits of computation.

2.2.3 Seemingly Unrelated Regressions

In an attempt to use a large amount of panel data efficiently, a third approach can be formulated that allows for complete heterogeneity across all product subgroups but estimates more efficiently than separate equations. Seemingly Unrelated Regression (SUR) was introduced by Zellner (1962) and is a framework that utilizes the correlated error terms of separate equations to estimate consistently more efficiently than an equation-by-equation approach under nearly all circumstances (Yahya et al., 2008).

The asymptotically efficient SUR estimator as first described by Zellner (1962) can be described as follows. Consider N equations

y_i = X_i \beta_i + u_i,

where y_i and u_i are n \times 1 vectors and X_i is an n \times k_i matrix, i = 1, \ldots, N. Let U = (u_1, \ldots, u_N) and \Sigma = E(U_t' U_t), where U_t is the t-th row of U.

Let y_i' = (y_{1,i}, y_{2,i}, \ldots, y_{n,i}) and u_i' = (u_{1,i}, u_{2,i}, \ldots, u_{n,i}) for i = 1, 2, \ldots, N. Put equations 1, \ldots, N in a single-equation setup y_\bullet = X_\bullet \beta_\bullet + u_\bullet, where y_\bullet = (y_1', y_2', \ldots, y_N')', u_\bullet = (u_1', u_2', \ldots, u_N')', \beta_\bullet = (\beta_1', \beta_2', \ldots, \beta_N')' and X_\bullet = \mathrm{diag}(X_1, \ldots, X_N) is block-diagonal.

The asymptotically efficient SUR estimator for the \beta s is given by (Davidson and MacKinnon, 2004)

\beta_\bullet^{SUR} = \{X_\bullet'(\Sigma^{-1} \otimes I_n)X_\bullet\}^{-1} X_\bullet'(\Sigma^{-1} \otimes I_n) y_\bullet.
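A minimal numerical sketch of the feasible (two-step) version of this estimator, in which \Sigma is replaced by an estimate from equation-by-equation OLS residuals. The function names and setup are illustrative, not the thesis code.

```python
import numpy as np

def block_diag(*mats):
    """Stack matrices into the block-diagonal X_bullet from the text."""
    rows = sum(m.shape[0] for m in mats)
    cols = sum(m.shape[1] for m in mats)
    out = np.zeros((rows, cols))
    r = c = 0
    for m in mats:
        out[r:r + m.shape[0], c:c + m.shape[1]] = m
        r += m.shape[0]
        c += m.shape[1]
    return out

def sur_fgls(y_list, X_list):
    """Two-step (feasible) SUR: per-equation OLS residuals give an
    estimate of Sigma, then GLS is run on the stacked system."""
    n = len(y_list[0])
    resid = np.column_stack([
        y - X @ np.linalg.lstsq(X, y, rcond=None)[0]
        for y, X in zip(y_list, X_list)
    ])
    Sigma = resid.T @ resid / n                  # N x N error covariance
    X_big = block_diag(*X_list)
    y_big = np.concatenate(y_list)
    Omega_inv = np.kron(np.linalg.inv(Sigma), np.eye(n))
    XtO = X_big.T @ Omega_inv
    return np.linalg.solve(XtO @ X_big, XtO @ y_big)
```

A known special case serves as a sanity check: when all equations share identical regressors, the SUR estimator collapses exactly to equation-by-equation OLS.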


Certain events, such as important soccer matches and unexpected heatwaves, cause sales of an e-commerce company to drop drastically across all product groups, while the opposite holds for rainy Sundays and days when a company receives media attention. Although any model of course tries to control for these sorts of occurrences as much as possible, it is inevitable that some of the changes in sales will be unexplained by the included regressors and thus end up in the error terms. As a result, the separate error terms are very likely to exhibit similar patterns of dips and rises, and the efficiency gains from this method have the potential to be substantial.

A problem that has to be overcome in order to utilize SUR is that a large number of cross-sectional units and model parameters can cause the error covariance matrix of all separate equations to become singular, in which case the traditional SUR estimator cannot be computed (Davidson et al., 1993). This can be corrected for in two ways. Firstly, Takada et al. (1995) developed a method to reliably estimate a subset of SUR models with a singular covariance matrix. The second option is to break the dataset up into chunks small enough that nonsingular covariance matrices are possible. A downside is that this decreases estimation efficiency, since significantly fewer error terms can be used to compute correlations in each cluster. However, this loss in efficiency could be partially offset by clustering units in such a way that their error terms are closely related. After clustering units, the SUR framework could be employed to utilize the correlation of error terms within those clusters for more efficient estimation.

Using SUR to estimate demand with correlated error terms is not uncommon. Carlson (1978) employs SUR to utilize the correlated error terms of demand equations for five segments of cars, finding that in this particular instance it was 20% more efficient than regressing the equations separately. Khalik Salman et al. (2010) use SUR to estimate demand for tourism from different countries using a set of control variables and lagged values of demand, finding that the error terms are significantly correlated while coefficients vary strongly per equation.

2.2.4 Hierarchical models

A fourth approach is to disregard clustering in favor of a hierarchical approach that allows for random coefficients, such as Hierarchical Bayesian (HB) modeling (Allenby et al., 2005) or generalized linear mixed models (GLMM) (Pinheiro and Bates, 2000). Using these methods it is possible to build multilevel models in which different coefficients are allowed to vary over multiple levels. In the context of an e-commerce corporation it is, for example, possible that all product subgroups react differently to promotional mechanisms, that reactions to aggregate marketing efforts vary only at the product group level, while reactions to the weather can be assumed to be relatively similar for all products and thus can be pooled over the entire dataset for more efficient estimation. Hierarchical models allow for this sort of multilevel heterogeneity, making modeling much more flexible in what it can achieve.

According to Lang et al. (2015), Hierarchical Bayesian modeling specifically has been used many times to model heterogeneity of marketing effects across stores, for example by Andrews et al. (2008), Boatwright et al. (1999) and Montgomery and Rossi (1999). A standard HB formulation can be defined in analogue to Andrews et al. (2008), Andrews et al. (2002) and Lenk et al. (1996). Let our model be defined by the equation

q_{kt} = X_{kt}\beta_k + \varepsilon_{kt},

where t = 1, \ldots, T, k = 1, \ldots, N and q_{kt} is the dependent variable of unit k at time t; all independent variables are included in X_{kt}. The hierarchical Bayesian specification for a random coefficients model can then be formulated as follows:

\varepsilon_{kt} \sim N(0, \sigma_k^2), \quad \beta_k \sim N(\beta, \Lambda), \quad \beta \sim N(b_0, D_0), \quad \Lambda^{-1} \sim W(v_0, S_0), \quad \sigma_k^{-2} \sim G(a, b),

where r is the number of independent variables (so that \beta_k and \beta are r \times 1 vectors and \Lambda is r \times r) and all other parameters are chosen so as to obtain proper uninformative priors. Here W(v_0, S_0) denotes a Wishart prior, G(a, b) a gamma distribution and N(b_0, D_0) a multivariate normal distribution.
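The partial-pooling behavior implied by these priors can be illustrated with a simple empirical-Bayes analogue (an assumed toy example, not the thesis model): per-unit OLS slopes are shrunk toward the population mean, with precisely estimated units shrunk less.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate K units whose true slopes come from a population distribution,
# the "random coefficients" assumption beta_k ~ N(beta, Lambda).
K, T = 30, 8
true_betas = rng.normal(2.0, 0.5, size=K)
x = rng.normal(size=(K, T))
y = true_betas[:, None] * x + rng.normal(0.0, 1.0, size=(K, T))

ols = (x * y).sum(axis=1) / (x * x).sum(axis=1)  # no-pooling estimates
grand = ols.mean()                               # estimate of the population mean
v_k = 1.0 / (x * x).sum(axis=1)                  # sampling variance (sigma^2 = 1 assumed known)
tau2 = max(ols.var() - v_k.mean(), 1e-6)         # crude between-unit variance
w = tau2 / (tau2 + v_k)                          # shrinkage weight, strictly in (0, 1)
shrunk = w * ols + (1 - w) * grand               # posterior-mean analogue
```

Each shrunken estimate is a convex combination of the unit's own OLS slope and the grand mean, which is the closed-form counterpart of what the HB posterior mean does.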

In the study 'Estimating the SCAN*PRO model of store sales: HB, FM or just OLS?', Andrews et al. (2008) investigate accounting for heterogeneity across stores using Hierarchical Bayesian modeling and clustering for the SCAN*PRO model. Weekly scanner data for Dutch shampoo from 28 stores across 109 weeks is supplemented by simulation to compare modeling performances. The researchers compare two methods allowing for heterogeneity across stores, namely clusterwise regression and random coefficients in the form of a Hierarchical Bayesian model, to a baseline model using aggregated OLS, which assumes homogeneity. Since heterogeneity across stores is often named as a significant factor (for example by Van Heerde et al. (2000)), the heterogeneous models are expected to severely outperform the aggregated OLS model. This hypothesis is tested by comparing several goodness-of-fit statistics, most importantly root mean squared error, R² and log-likelihood statistics. The researchers find, surprisingly, that accounting for heterogeneity across stores offers no benefit and conclude that there is little incentive for developing heterogeneous versions of SCAN*PRO for commercial applications across stores. They do note that they welcome further research to disprove this finding, as they find it counterintuitive. Even though heterogeneous approaches might be of less interest for accounting for heterogeneity across stores, heterogeneity across product categories is likely to be much more pronounced, and a variant of the heterogeneous SCAN*PRO model suggested in this research has the potential to perform well.

2.3 Conclusion

To summarize, we investigated the effect of promotion efforts on sales to orient ourselves on which marketing factors might influence sales, inspected two studies closely related to our thesis subject, and discussed modeling options for handling the wide assortment of products in our dataset.


3 Empirical methods

In this section all empirical methods are described. First the available data sources are described; then the approach for collecting data is outlined and the resulting dataset is explained and explored. Next, the relevant models are specified and their estimation methods outlined. The results of these estimations are inspected and discussed in the results section.

3.1 Data architecture

Bol.com stores all of its data using the Hadoop Distributed File System (Borthakur, 2007), often simply called Hadoop. Hadoop is popular due to its high fault tolerance and its ability to run on commodity hardware, and it runs on a cluster of computers. Inside the cluster there are multiple datanodes and one namenode. The datanodes are responsible for distributed data storage and processing, while the namenode manages the system's namespace as well as user requests. To safeguard against datanode failure the data is replicated over a selection of datanodes, three on average. Because of its flexibility and high fault tolerance Hadoop is used by many large e-commerce companies, including Bol.com.

In addition to Hadoop, Bol.com also uses Hadoop MapReduce (Dean and Ghemawat, 2008), a programming model often used in combination with Hadoop to process data. As its name implies, programs running on MapReduce consist of two phases. The Map phase is responsible for mapping all data on the cluster and making sure all data needed for the Reduce phase is collected on a single datanode. After this, MapReduce initializes the Reduce phase to pick up the requested data and produce the final output.

MapReduce gives programmers the option to perform a wide range of data operations across multiple datanodes using only the Map and Reduce functions. Standard data processing functionalities like sorting, merging, transforming and clustering data are already available in MapReduce.
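The two phases can be illustrated with the canonical word-count example, here as a minimal in-memory Python sketch rather than actual Hadoop code:

```python
from itertools import groupby
from operator import itemgetter

def map_phase(records):
    # Map: emit a (key, value) pair for every word in every record.
    for line in records:
        for word in line.split():
            yield word, 1

def reduce_phase(pairs):
    # Shuffle/sort: group pairs by key, then Reduce: aggregate the values.
    for key, group in groupby(sorted(pairs, key=itemgetter(0)), key=itemgetter(0)):
        yield key, sum(v for _, v in group)

counts = dict(reduce_phase(map_phase(["to be or not to be"])))
# counts == {"be": 2, "not": 1, "or": 1, "to": 2}
```

On a real cluster the map output is partitioned across datanodes and each reducer receives all pairs for its keys; the in-memory `sorted`/`groupby` step stands in for that shuffle.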

Although in theory MapReduce alone could be used to program data operations in the Java programming language, extracting data in this way is too time-consuming due to the low-level nature of Java. For this reason Bol.com uses two other programming languages designed for easier data operations on a Hadoop network: Apache HIVE (Thusoo et al., 2009) and PIG-latin (Olston et al., 2008). For the purpose of this thesis PIG-latin is used for the majority of operations since, being optimized for sequential computations, it is best suited for the large-scale data operations needed to aggregate data for all product subgroups.

3.2 Raw data description

Within Bol.com the IT department extracts and stores data on Bol.com's Hadoop cluster. The data that is available and pertinent to the subject of this thesis can be split into seven segments. The data sources, each with a short description, are:

• Offers. This source contains data about offers per day in the Netherlands. For (nearly) each product within Bol.com multiple sellers or suppliers exist. For a given product these sellers can state the conditions for which they are willing to sell a product. In theory a buyer on Bol.com can choose to accept any of the available offers, but in practice more than 99% of buyers choose to accept the Best Offer, which is the offer displayed first. The best offer is determined by an algorithm developed by Bol.com’s data science department and takes into account a host of factors like delivery time, price and return rates. This dataset contains all offers available on each day for the past few years of Bol.com and accompanying relevant variables. A few examples are product and seller ID corresponding to an offer, price and price stars (a measure for relative price position in the market). Current offer data is available for all of Bol.com’s products, historical data is only available for offers that were active in the Netherlands.

• Order data. This is a dataset with information regarding all of Bol.com's orders. It is very basic, containing only the quantity ordered per product ID per day and the country from which it was ordered.

• Promotions. This source contains data about promotions per product. Information about Bol.com's promotions exists in a dataset separate from offers. Since September 2016 a snapshot has been taken each day of the promotions active on that day; this dataset is the collection of all those snapshots. The available promotion data includes, for instance, promotion technique, promotion ID, promotion start and end date, and whether or not the promotion was active in the Netherlands and/or Belgium.

• Campaigns. This is a small dataset containing information about campaigns per product category. No dataset exists that keeps track of the campaigns run by Bol.com, so for the purpose of this thesis one was generated manually. Using campaign planning documents created by Bol.com's campaign team, a dummy variable was created for each product category indicating whether or not it participated in a large-scale campaign on a given day.

• Finance. This source contains data about everything relevant to the finance department of Bol.com. For the purpose of this thesis it is only used to obtain product units, product categories, product subgroups and product subsubgroups.

• SEA. This source contains data about Bol.com's Search Engine Advertising (SEA) campaigns

and the number of impressions and clicks these campaigns generated each day. We use this to create an adstock variable for the number of impressions generated by SEA.

• KNMI. This dataset contains historical weather data for the past 100 years from the Koninklijk Nederlands Metereologisch Instituut. Data from a central weather station ("De Bilt") in the Netherlands is used as a proxy for the weather conditions experienced by Bol.com's customers. Rain duration, sun hours, wind speed and mean temperature are extracted for the relevant period. Control variables for weather conditions are common in models predicting sales; they are for instance used by Drechsler et al. (2017).

3.3 Data collection

The range of dates available in our final dataset is determined by the data source spanning the least amount of time. This is the promotion dataset which has data available from September 2016 up to October 2017. Furthermore, historical offers are only available for the Netherlands. Therefore we only load in observations from this period and country from all data sources. To start with, we load in all offer data and filter on best offer. In this way we account for more than 99% of Bol.com’s sales while removing potential outliers where consumers have specific sellers they would like to buy from for a premium. Next, we load in the order data and join these two datasets on product ID and date, thus obtaining a dataset containing offer data and quantity ordered for all of Bol.com’s products for a period of 13 months. To continue, the finance data is loaded in and merged on product ID to obtain a dataset containing best offers, category information and daily quantity ordered for approximately 15 million products for a period of 13 months.
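The joining steps described above can be sketched with a miniature pandas example; the column names and values are hypothetical, not Bol.com's actual schema.

```python
import pandas as pd

# Hypothetical miniature versions of the three sources.
offers = pd.DataFrame({
    "product_id": [1, 1, 2],
    "date": ["2017-01-01", "2017-01-01", "2017-01-01"],
    "price": [10.0, 12.0, 5.0],
    "is_best_offer": [True, False, True],
})
orders = pd.DataFrame({
    "product_id": [1, 2],
    "date": ["2017-01-01", "2017-01-01"],
    "quantity": [3, 7],
})
finance = pd.DataFrame({
    "product_id": [1, 2],
    "subgroup": ["books_thriller", "toys_lego"],
})

# Step 1: keep only best offers (these cover >99% of sales).
best = offers[offers["is_best_offer"]]
# Step 2: join offers and orders on product ID and date.
merged = best.merge(orders, on=["product_id", "date"], how="left")
# Step 3: add category information from the finance source.
merged = merged.merge(finance, on="product_id", how="left")
```

The left joins keep every best offer even on days a product was not ordered, mirroring the construction of an offer-per-day panel.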

While the offer and order data are well maintained, possibly because they are used in daily operations, the promotion data is not. The promotion data is generated by a system named RPG, which is used by Bol.com's product and promotion specialists to create promotions. All promotions created in RPG have to be created and labeled according to a set of guidelines, but these guidelines are sometimes not adhered to. RPG is on occasion used to make changes to product pages, for instance by adding visual effects usually meant for promotions or by changing prices quickly. These changes are also labeled as promotions in our data even though they are not, and thus have to be filtered out. This is done in multiple ways. We start by removing promotions with the action type "NULL" or "other". We then proceed by removing promotions with unrealistically long promotion periods, defined by Bol.com's promotion specialists as promotions with a runtime of more than 45 days. Finally, we remove promotions which have no banners or labels and thus cannot be identified by consumers as promotions.


The script used approximately 400 virtual cores of Bol.com's Hadoop cluster for approximately 50 minutes to do the necessary computations.

After obtaining our data on a product-per-day level we aggregate it to a subcategory-per-day level. To start, all variables are formatted and transformed so that they are suitable for aggregation. All numerical variables are destringed and indicator variables (taking on a value of 0 or 1) are generated for categorical variables of interest like promotion types. To continue, the dataset is grouped by Bol.com's product subcategory level and date. After grouping, variables are summed, averaged and counted to obtain the final dataset. To keep information about higher grouping levels available, the first observation of product group, category and unit is extracted from each grouping of product subgroup and date. All computations are again done using PIG-latin on Bol.com's Hadoop cluster; the code can be found in section 7.7. This script ran for about two hours using an average of approximately 350 virtual cores.

The remaining data sources are formatted so that they are compatible with the merged dataset. For Search Engine Advertising, advertising adstock variables are generated. Advertising adstock (Broadbent, 1979) is a simple concept stating that media exposure has not only an immediate but also a 'memory' effect: part of the effect of advertising through media channels carries over to a later time. The formula for advertising adstock A at time t is

A_t = X_t + \gamma A_{t-1},

where X_t is the value of the variable that measures advertising and \gamma is the adstock rate. We determine \gamma by varying it to minimize \sum_{t=1}^{T} u_t^2 in

sales_t = c + \beta A_t + u_t,

where sales_t is total sales of Bol.com at time t, A_t = X_t + \gamma A_{t-1}, u_t is the error term and c is a constant, t = 1, \ldots, T.
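This procedure can be sketched as a simple grid search over \gamma; an illustrative implementation, not the thesis code.

```python
import numpy as np

def adstock(x, gamma):
    """A_t = X_t + gamma * A_{t-1}, with A_0 = X_0."""
    a = np.empty_like(x, dtype=float)
    a[0] = x[0]
    for t in range(1, len(x)):
        a[t] = x[t] + gamma * a[t - 1]
    return a

def best_gamma(x, sales, grid=np.linspace(0, 0.99, 100)):
    """Grid-search gamma minimizing the SSE of sales_t = c + beta*A_t + u_t."""
    best, best_sse = 0.0, np.inf
    for g in grid:
        A = adstock(x, g)
        X = np.column_stack([np.ones_like(A), A])
        coef, *_ = np.linalg.lstsq(X, sales, rcond=None)
        sse = np.sum((sales - X @ coef) ** 2)
        if sse < best_sse:
            best, best_sse = g, sse
    return best
```

When sales are generated from a known adstock rate, the grid search recovers it.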

Adstock, weather and campaign data are then joined on date to the aggregated dataset constructed on Bol.com's Hadoop cluster. This is achieved with R (Rfoundation, 2000), which is the software used for further statistical analysis in this thesis. The R code can be found in section 7.8.1.

3.4 Data editing

After collecting, aggregating and merging we have a large dataset that needs to be slightly adjusted before it is suitable for modeling.

To start with, we remove all product subgroups that represent products for which the classification is unknown. This is done by removing all product subgroups that are labeled as some form of 'unknown', 'onbekend' or 'other' and removing subgroups belonging to unknown units, categories and groups. To continue, product subgroups which do not have a year's worth of observations are also removed, to make sure sufficient observations are available for all estimated product subgroups. After this a small number of outlying product subgroups is removed. This includes product subgroups which have had no recorded promotions for the past year, have had fewer than one sale per day on average, or contain fewer than 100 products on average. This results in a final dataset of 557 product subgroups.


Some observations lack a value for pricestars; in our dataset we have approximately 400 such cases. Since this is a relatively small share of missing values (about 0.15%), it is resolved with a quick fix: we replace all missing values for average pricestar with the mean of the product subgroup excluding missing values. For price-offs we have 638 cases where on average a negative discount of more than 50% is given, while the maximum recorded discount is 80%. While it is of course possible that on occasion a negative discount is given, these high values are likely the result of errors in either the discount calculation method or products erroneously labeled as discounted. For this reason these values are set to zero and their promotion indicator variables are also changed to zero.

Finally, we create some additional dummy variables: dummies for quarters and for months; a dummy denoting whether a given day took place in the week before a campaign; three dummies denoting whether a given day took place one, two or three weeks after a campaign; a dummy denoting whether a given day took place in the week before a promotion; and three dummies denoting whether a given day took place one, two or three weeks after a promotion.
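The construction of such lag-window dummies can be sketched in pandas; the column names and the toy data are illustrative.

```python
import pandas as pd

# For each day, flag whether it falls 1-7, 8-14 or 15-21 days after the
# most recent campaign day (hypothetical miniature data).
df = pd.DataFrame({
    "date": pd.date_range("2017-01-01", periods=10, freq="D"),
    "campaign": [0, 1, 0, 0, 0, 0, 0, 0, 0, 0],
})
campaign_days = df.loc[df["campaign"] == 1, "date"]

def days_since_campaign(day):
    past = campaign_days[campaign_days < day]
    return (day - past.max()).days if len(past) else float("nan")

gap = df["date"].map(days_since_campaign)
for lag in (1, 2, 3):
    lo, hi = 7 * (lag - 1) + 1, 7 * lag
    df[f"campaignlag{lag}"] = gap.between(lo, hi).astype(int)
```

Days with no preceding campaign get a NaN gap, which `between` maps to 0, so the dummies are well defined over the whole sample.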

3.5 Data description

After aggregating and merging all data the dataset contains the following variables:

Sales_it is the number of products sold in product subcategory i on day t.
Totalproducts_it is the number of products in subcategory i on day t.
Avgprice_it is the average price level of product subcategory i on day t.
Productcategory_i denotes the product category for product subcategory i.
Productsubcat_i denotes the product subcategory identifier for product subcategory i.
Promoind_it is an indicator variable for whether or not a promotion of any kind took place in subcategory i on day t.
PromolagL_it is an indicator for whether or not day t falls in a period 7 × L days following a promotion that product subcategory i participated in, L = 1, 2, 3.
Ppriceoff_it is an indicator variable for whether or not a price-off promotion took place in subcategory i on day t.
Pawareness_it is an indicator variable for whether or not an awareness promotion took place in subcategory i on day t.
Pxforx_it is an indicator variable for whether or not an 'X for fixed' promotion took place in subcategory i on day t.
Pdayweek_it is an indicator variable for whether or not a day/weekdeal promotion took place in subcategory i on day t.
Pcheapfree_it is an indicator variable for whether or not a 'cheapest product free' promotion took place in subcategory i on day t.
Pfreeproduct_it is an indicator variable for whether or not a 'free product' promotion took place in subcategory i on day t.
Pcashback_it is an indicator variable for whether or not a cashback promotion took place in subcategory i on day t.
Pitemdiscount_it is an indicator variable for whether or not an item discount promotion took place in subcategory i on day t.
Pother_it is an indicator variable for whether or not a promotion that does not fit any of the other promotion types took place in subcategory i on day t.
aPpriceoff_it denotes how many products participated in a price-off promotion in subcategory i on day t.
aPawareness_it denotes how many products participated in an awareness promotion in subcategory i on day t.
aPxforx_it denotes how many products participated in an 'X for fixed' promotion in subcategory i on day t.
aPdayweek_it denotes how many products participated in a day/weekdeal promotion in subcategory i on day t.
aPcheapfree_it denotes how many products participated in a 'cheapest product free' promotion in subcategory i on day t.
aPfreeproduct_it denotes how many products participated in a 'free product' promotion in subcategory i on day t.
aPcashback_it denotes how many products participated in a cashback promotion in subcategory i on day t.
aPitemdiscount_it denotes how many products participated in an item discount promotion in subcategory i on day t.
aPother_it denotes how many products participated in a promotion that does not fit any of the other promotion types in subcategory i on day t.
Pavgpromoday_it is the average number of days a promotion has been running in subcategory i on day t, given that a promotion is running.
Pproducts_it is the number of products in any promotion in subcategory i on day t.
Pproductssq_it is the squared number of products in any promotion in subcategory i on day t.
Pavgpricestars_it is the average pricestar of price-offs in product subcategory i on day t.
Pavgdiscount_it is the average discount (compared to the price three days before the promotion started) for price-discounted products in product subcategory i on day t.
Campaignind_it is an indicator variable for whether or not product subcategory i is participating in a major promotion campaign organised by Bol.com on day t.
Campaignlead_it is an indicator for whether or not day t falls in a period of seven days preceding a major campaign that product subcategory i participated in.
CampaignlagL_it is an indicator for whether or not day t falls in a period 7 × L days following a major campaign that product subcategory i participated in, L = 1, 2, 3.
Holiday_t is an indicator for whether or not day t was a relevant holiday or fell in a period of seven days before a relevant holiday.
AdstockSEA_t denotes SEA adstock for Bol.com as a whole at time t.
Windspeed_t denotes the average wind speed in 0.1 m/s on day t.
Sunhours_t denotes the duration of sunlight in 0.1 hours on day t.
Temperature_t denotes the average temperature in 0.1 degrees Celsius on day t.
Day_{d,it} is an indicator variable for day of the week d of day t, d = 1, …, 7.
Month_{m,it} is an indicator variable for the month m of day t, m = 1, …, 12.
Quarter_{q,it} is an indicator variable for the quarter q of the year of day t, q = 1, …, 4.

Every product category within Bol.com has its own team that manages products and promotions for that category. Product categories can be further divided into product groups, which can be further split into product subgroups.

Bol.com has its own definitions for promotion techniques that often do not align with standard definitions in the literature. We therefore give a short explanation of every promotion type.

A 'price-off' promotion is defined as any promotion where the price of a product is reduced on its product page by a certain percentage. For example, '10% off' is a price-off.

‘Awareness’ promotions are defined as any promotion where a product is given extra attention on Bol.com’s website but no discount or benefit of any sort is given. Banners with ‘read the new book by J.K. Rowling’, ‘pre-order the new FIFA-game now’ and ‘these coats are the new trend this winter’ are all examples of awareness promotions.

‘X for fixed’ promotions are any promotions of the form ‘x for y’. For example ‘2 for 10 euros’ and ‘5 for 50 euros’.

'Day/weekdeal' promotions are promotions where it is clearly communicated that they are available for a limited amount of time (a day or a week). These usually involve discounts on selected products and have a special slot on Bol.com's homepage highlighting them as a day/weekdeal.

'Cheapest product free' promotions are what the name implies: when ordering two products from a selection of promoted products, the customer does not have to pay for the cheapest product.

'Cashback' promotions give the customer money back after a purchase, for example money back on any TV produced by Samsung. These kinds of promotions have the added advantage that Bol.com's competitors cannot automatically detect that a promotion is happening and thus will be slower to match prices. A downside is that price-comparison websites will also not automatically display the discounted prices, so traffic from this source will be smaller than it could have been.

An ‘item discount’ promotion is a promotion where the final discount depends on the number of products ordered. Examples of this are ‘50% discount on the second book you order’ and ‘10 euros off on your second DVD’.

‘Other’ promotions are promotions that are used very infrequently and do not belong to any other promotion method. Examples of this are ‘expedited shipping’ promotions (for example ‘when ordering this game we will make sure you can play it tonight’) and ‘lottery’ promotions (for example ‘be eligible to win a trip to Las Vegas when buying a LG phone’).

Pricestars are variables created and maintained by Bol.com's Offer and Sourcing Intelligence department to give an intuitive impression of how good a price is relative to the market. They are used extensively within Bol.com as an aid in decision-making. For example, all price discount promotions have to result in a price of at least four pricestars, and products with one pricestar (insult prices) are not shown on the website. Because pricestars are used to determine which promotions to run, it is interesting to test whether they have a measurable effect on promotion effectiveness. A detailed explanation of Bol.com's definition of pricestars can be found in appendix section 7.1.

Relevant ‘holidays’ for Bol.com are Valentine’s day, Father’s day, Mother’s day, Black Friday, Cyber Monday, Sinterklaas and Christmas.


Table 3.1: Summary statistics

Statistic            Mean        St. Dev.    Min      Pctl(25)  Median   Pctl(75)  Max

Sales                220.610     605.515     0        13        53       179       20,603
Totalproducts        22,510.270  80,806.100  44       991       4,066    14,331    2,757,792
Promoind             0.466       0.499       0        0         0        1         1
Promolag1            0.089       0.284       0        0         0        0         1
Promolag2            0.063       0.242      0         0         0        0         1
Promolag3            0.051       0.220       0        0         0        0         1
Pavgpromoday         6.660       9.437       0.000    0.000     0.000    12.000    44.000
aPromoawareness      10.536      118.077     0        0         0        0         5,076
aPromopriceoff       36.106      285.613     0        0         0        4         16,154
Pavgpricestars       1.276       1.907       0.000    0.000     0.000    3.772     5.000
Pavgdiscount         3.461       10.222      -49.800  0.000     0.000    0.000     80.762
aPromodayweek        0.506       18.211      0        0         0        0         3,215
aPromoxforfixed      16.824      256.596     0        0         0        0         18,676
aPromocheapfree      1.812       24.211      0        0         0        0         851
aPromocashback       0.063       1.224       0        0         0        0         83
aPromofreeproduct    1.472       74.984      0        0         0        0         6,627
aPromoitemdiscount   2.062       95.185      0        0         0        0         16,481
Avgprice             70.531      135.099     4.935    19.819    31.806   60.020    1,650.593
Campaignindicator    0.411       0.492       0        0         0        1         1
Campaignlead1        0.133       0.340       0        0         0        0         1
Campaignlag1         0.133       0.340       0        0         0        0         1
Campaignlag2         0.101       0.301       0        0         0        0         1
Campaignlag3         0.093       0.290       0        0         0        0         1
Holiday              0.099       0.299       0        0         0        0         1
SEA adstock          112.737     32.172      18.566   88.682    112.040  133.091   203.911
Sunhours             49.504      39.004      0        14        44       78        152
Windspeed            30.695      12.688      7        21        29       38        83
Temperature          110.267     62.480      -38      66        108      165       236

3.6 Model specifications

As a general model we have, in analogue to the often-used SCAN*PRO model (Wittink et al., 1988), a multiplicative model. The coefficients in this model can therefore be interpreted as percentage changes; variables that are included as exponents, such as indicator variables, can be interpreted as multipliers. When a variable with observed values of zero is included without being exponentiated, linearizing the model would transform those observations to minus infinity. To avoid this, the relevant variables are transformed by adding 1 to all observed values before linearizing.
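A tiny illustration of the 'add 1 before taking logarithms' device:

```python
import math

# Values of zero map to ln(1) = 0 instead of minus infinity, so the
# linearized model stays defined on days without promotions.
values = [0, 1, 9]
logged = [math.log(v + 1) for v in values]
# logged[0] == 0.0 rather than -inf
```

The transformation keeps the zero observations usable while leaving the relative ordering of positive values intact.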

3.6.1 Variable selection


When selecting variables, the Akaike Information Criterion (AIC), the Root Mean Squared Error (RMSE) and the out-of-sample Root Mean Squared Error (RMSE(Y)) are taken into account. These statistics are computed manually since they are not computed by default in R for some of the models we estimate. Let n equal the number of observations in a given model and let \hat{\varepsilon}_i = y_i - \hat{y}_i be the estimated residual for observation i, i = 1, \ldots, n. The formula for AIC is given by

AIC = 2k - 2\ln(\hat{L}), \quad (3.1)

where k is the number of parameters used in the model and \hat{L} is the maximum value of the likelihood of the estimated model. In the case of normally distributed residuals, as we assume is the case for our models, it can be shown that this is equivalent to

AIC = 2k - 2 \sum_{i=1}^{n} \ln\left( \frac{1}{\bar{\sigma}\sqrt{2\pi}} e^{-(\hat{\varepsilon}_i - \bar{\varepsilon})^2 / 2\bar{\sigma}^2} \right),

where \bar{\varepsilon} = \frac{1}{n}\sum_{i=1}^{n} \hat{\varepsilon}_i and \bar{\sigma}^2 = \frac{\sum_{i=1}^{n} (\hat{\varepsilon}_i - \bar{\varepsilon})^2}{n-1}.

The formula for RMSE is given by

RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n} \hat{\varepsilon}_i^2}. \quad (3.2)

For RMSE(Y) we apply the estimated model to an out-of-sample dataset to compute new residuals, and plug these residuals and the corresponding number of out-of-sample observations into the RMSE formula to obtain RMSE(Y).

We have 13 months of data and estimate the linearized version of our models on the first 12 months. Following this we compute the AIC, RMSE and RMSE(Y) statistics as previously defined. Based on the resulting statistics we decide which variables to include and which variables to exponentiate in the non-linearized version of our general model. Since the logarithm of zero is minus infinity, we linearize variables with zero-values by adding one before taking their logarithm.

We test a number of different model specifications. To start with, we test different ways to specify which promotion types are being used on a given day. Our first specification uses only dummy variables to indicate whether or not a promotion type took place. This model is significantly improved by adding a separate variable denoting the number of products in a promotion, and even more so when using the logarithm of this variable. Replacing the indicator variables for promotion types by the number of products using each promotion type improves fit further, and using the logarithm of these values more still. To indicate the quality of price-offs we test including either the percentage discount or the average pricestar of the products in a price-off; AIC indicates that average pricestars should be used instead of percentage discount.


3.6.2 Fixed effects

Let N equal the number of product subgroups and k the product subgroup number in our dataset, and let c_{k,it} equal one if i = k and zero otherwise, i = 1, \ldots, N and k = 1, \ldots, N. For the store-wide fixed effects model we have as a general model

\[
\begin{aligned}
Sales_{it} = {}& \prod_{k=2}^{N} \gamma_k^{c_{k,it}} \cdot Totalproducts_{it}^{\lambda_1} \cdot \lambda_2^{Promo_{it}} \cdot \lambda_4^{Promolag1_{it}} \cdot \lambda_5^{Promolag2_{it}} \cdot \lambda_6^{Promolag3_{it}} \\
& \cdot Avgpromoday_{it}^{\lambda_7} \cdot aPromoawareness_{it}^{\lambda_8} \cdot aPromopriceoff_{it}^{\lambda_9} \cdot Pavgpricestars_{it}^{\lambda_{10}} \\
& \cdot aPromodayweekdeal_{it}^{\lambda_{11}} \cdot aPromoxforfixed_{it}^{\lambda_{12}} \cdot aPromocashback_{it}^{\lambda_{13}} \\
& \cdot aPromofreeproduct_{it}^{\lambda_{14}} \cdot aPromo\&discount_{it}^{\lambda_{15}} \cdot Avgprice_{it}^{\lambda_{16}} \cdot \lambda_{17}^{Campaignind_{it}} \\
& \cdot \lambda_{18}^{Campaignlead_{it}} \cdot \lambda_{19}^{Campaignlag1_{it}} \cdot \lambda_{20}^{Campaignlag2_{it}} \cdot \lambda_{21}^{Campaignlag3_{it}} \cdot \lambda_{22}^{AdstockSEA_{it}} \cdot \lambda_{23}^{holiday_{it}} \\
& \cdot \prod_{d=2}^{7} \omega_d^{Day_{d,it}} \cdot \prod_{m=2}^{12} \theta_m^{Month_{m,it}} \cdot Sunhours_t^{\lambda_{24}} \cdot \varepsilon_{it}.
\end{aligned} \tag{3.3}
\]

Linearizing this we obtain (introducing new coefficient symbols for the sake of clarity):

\[
\begin{aligned}
\ln(Sales_{it}) = {}& \sum_{k=2}^{N} c_{k,it}\alpha_k + \ln(Totalproducts_{it})\beta_1 + Promo_{it}\beta_2 + Promolag1_{it}\beta_4 \\
& + Promolag2_{it}\beta_5 + Promolag3_{it}\beta_6 + \ln(Avgpromoday_{it})\beta_7 + \ln(aPromoawareness_{it})\beta_8 \\
& + \ln(aPromopriceoff_{it})\beta_9 + \ln(Pavgpricestars_{it})\beta_{10} + \ln(aPromodayweekdeal_{it})\beta_{11} \\
& + \ln(aPromoxforfixed_{it})\beta_{12} + \ln(aPromocashback_{it})\beta_{13} + \ln(aPromofreeproduct_{it})\beta_{14} \\
& + \ln(aPromo\&discount_{it})\beta_{15} + \ln(Avgprice_{it})\beta_{16} + Campaignind_{it}\beta_{17} \\
& + Campaignlead_{it}\beta_{18} + Campaignlag1_{it}\beta_{19} + Campaignlag2_{it}\beta_{20} + Campaignlag3_{it}\beta_{21} \\
& + AdstockSEA_{it}\beta_{22} + holiday_{it}\beta_{23} + \sum_{d=2}^{7} Day_{d,it}\Omega_d + \sum_{m=2}^{12} Month_{m,it}\Theta_m \\
& + Sunhours_t\beta_{24} + \varepsilon_{it}.
\end{aligned} \tag{3.4}
\]

As discussed in section 2.2.1, this model can be estimated by demeaning the dataset, which is the method we use.
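For concreteness, the demeaning step can be sketched as follows (illustrative Python, not the implementation used in the thesis; names are hypothetical): each variable is replaced by its deviation from the subgroup-specific time mean, which removes the subgroup fixed effects before pooled OLS.

```python
def demean_by_group(values, groups):
    """Subtract each group's mean from its observations (within transformation)."""
    sums, counts = {}, {}
    for v, g in zip(values, groups):
        sums[g] = sums.get(g, 0.0) + v
        counts[g] = counts.get(g, 0) + 1
    means = {g: sums[g] / counts[g] for g in sums}
    return [v - means[g] for v, g in zip(values, groups)]

# Applied to the dependent variable and every regressor in turn, after which
# the slope coefficients can be estimated by OLS on the transformed data.
```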

We also apply fixed effects models to two clustered versions of our dataset. First, we group subgroups into 19 categories according to Bol.com's internal categorization system and then estimate fixed effects models equivalent to 3.4 on each cluster separately. Second, we use the method developed by Sarafidis and Weber (2015) to group the dataset into 19 clusters using an information-based criterion. Since this method is relatively new, the dataset is exported to STATA, for which Sarafidis and Weber (2015) created a user-written package that implements their method. The algorithm used in this package to determine the optimal cluster partition given a pre-defined number of clusters is outlined in Sarafidis and Weber (2015) as:


2. Assign the ith cross-section to all remaining clusters and obtain the resulting RSS value that arises in each case. Finally, assign the ith individual into the cluster that achieves the smallest RSS value;

3. Repeat the same procedure for i = 1, . . . , N ;

4. Repeat steps 2 to 3 until RSS cannot be minimized any further.”

Here RSS denotes the residual sum of squares. The model described in equation 3.4 is used as the model by which to cluster the data, and the initial cluster partition is chosen randomly. After obtaining the computed clusters we again estimate fixed effects models equivalent to equation 3.4 on each obtained cluster separately.
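A toy version of this reassignment loop can be sketched as follows (illustrative Python, not the authors' STATA package; it uses a one-regressor OLS per cluster on small synthetic panels, whereas the thesis applies the full model 3.4, and a random initial partition stands in for the step that was not reproduced above):

```python
import random

def ols_rss(pairs):
    """Fit y = a + b*x by OLS on (x, y) pairs; return the residual sum of squares."""
    n = len(pairs)
    xbar = sum(x for x, _ in pairs) / n
    ybar = sum(y for _, y in pairs) / n
    sxx = sum((x - xbar) ** 2 for x, _ in pairs)
    sxy = sum((x - xbar) * (y - ybar) for x, y in pairs)
    b = sxy / sxx if sxx > 0 else 0.0
    a = ybar - b * xbar
    return sum((y - a - b * x) ** 2 for x, y in pairs)

def total_rss(units, assign, n_clusters):
    """Sum of per-cluster RSS values for a given cluster assignment."""
    rss = 0.0
    for c in range(n_clusters):
        pooled = [p for i, u in enumerate(units) if assign[i] == c for p in u]
        if len(pooled) >= 2:
            rss += ols_rss(pooled)
    return rss

def cluster_units(units, n_clusters, seed=0):
    """Greedy RSS-minimizing reassignment in the spirit of steps 2-4 above."""
    rng = random.Random(seed)
    assign = [rng.randrange(n_clusters) for _ in units]  # random initial partition
    improved = True
    while improved:  # step 4: stop once RSS cannot be reduced further
        improved = False
        for i in range(len(units)):  # step 3: visit every cross-section
            best_c = assign[i]
            best_rss = total_rss(units, assign, n_clusters)
            for c in range(n_clusters):  # step 2: try each cluster for unit i
                if c == assign[i]:
                    continue
                trial = assign[:]
                trial[i] = c
                r = total_rss(units, trial, n_clusters)
                if r < best_rss - 1e-9:
                    best_c, best_rss = c, r
            if best_c != assign[i]:
                assign[i] = best_c
                improved = True
    return assign
```

On clean synthetic data with two distinct slope regimes, the loop recovers the two groups exactly; on real data it only guarantees a local RSS minimum, which is why the choice of initial partition matters.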

3.6.3 SUR

Let N again be equal to the number of product subgroups and s = 1, . . . , N. Our SUR model can be seen as estimating equation 3.5 separately for each s using OLS while allowing $\varepsilon_{s,t}$ to be correlated across all s. This can be achieved using the estimator described in section 2.2.3, as implemented in the R package 'systemfit' by Henningsen et al. (2007).

\[
\begin{aligned}
\ln(Sales_t) = {}& c_s + \ln(Totalproducts_t)\beta_{s,1} + Promo_t\beta_{s,2} + Promolag1_t\beta_{s,4} \\
& + Promolag2_t\beta_{s,5} + Promolag3_t\beta_{s,6} + \ln(Avgpromoday_t)\beta_{s,7} + \ln(aPromoawareness_t)\beta_{s,8} \\
& + \ln(aPromopriceoff_t)\beta_{s,9} + \ln(Pavgpricestars_t)\beta_{s,10} + \ln(aPromodayweekdeal_t)\beta_{s,11} \\
& + \ln(aPromoxforfixed_t)\beta_{s,12} + \ln(aPromocashback_t)\beta_{s,13} + \ln(aPromofreeproduct_t)\beta_{s,14} \\
& + \ln(aPromo\&discount_t)\beta_{s,15} + \ln(Avgprice_t)\beta_{s,16} + Campaignind_t\beta_{s,17} \\
& + Campaignlead_t\beta_{s,18} + Campaignlag1_t\beta_{s,19} + Campaignlag2_t\beta_{s,20} + Campaignlag3_t\beta_{s,21} \\
& + AdstockSEA_t\beta_{s,22} + holiday_t\beta_{s,23} + \sum_{d=2}^{7} Day_{d,t}\Omega_d + \sum_{m=2}^{12} Month_{m,t}\Theta_m \\
& + Sunhours_t\beta_{s,24} + \varepsilon_{s,t}.
\end{aligned} \tag{3.5}
\]
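The gain from SUR comes from exploiting the cross-equation residual covariance. The first stage of the feasible estimator can be sketched as follows (illustrative Python with one-regressor equations and hypothetical names; in the thesis, 'systemfit' performs the subsequent GLS step using this covariance estimate):

```python
def ols_residuals(xs, ys):
    """Per-equation OLS of y on a constant and one regressor; returns residuals."""
    n = len(xs)
    xbar, ybar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - xbar) ** 2 for x in xs)
    b = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / sxx
    a = ybar - b * xbar
    return [y - a - b * x for x, y in zip(xs, ys)]

def residual_cov(res_by_eq):
    """Estimate sigma_hat[i][j] = e_i'e_j / T across the system's equations."""
    t = len(res_by_eq[0])
    m = len(res_by_eq)
    return [[sum(res_by_eq[i][k] * res_by_eq[j][k] for k in range(t)) / t
             for j in range(m)] for i in range(m)]
```

Large off-diagonal entries of this matrix are precisely the situation in which feasible GLS on the stacked system outperforms equation-by-equation OLS.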

3.6.4 Hierarchical models


In our first hierarchical model we allow the effectiveness of promotions to vary over product subgroups. This is equivalent to estimating

\[
\begin{aligned}
\ln(Sales_{it}) = {}& \beta_0 + \ln(Totalproducts_{it})\beta_1 + Promo_{it}\beta_2 + Promolag1_{it}\beta_4 \\
& + Promolag2_{it}\beta_5 + Promolag3_{it}\beta_6 + \ln(Avgpromoday_{it})\beta_7 + \ln(aPromoawareness_{it})\beta_8 \\
& + \ln(aPromopriceoff_{it})\beta_9 + \ln(Pavgpricestars_{it})\beta_{10} + \ln(aPromodayweekdeal_{it})\beta_{11} \\
& + \ln(aPromoxforfixed_{it})\beta_{12} + \ln(aPromocashback_{it})\beta_{13} + \ln(aPromofreeproduct_{it})\beta_{14} \\
& + \ln(aPromo\&discount_{it})\beta_{15} + \ln(Avgprice_{it})\beta_{16} + Campaignind_{it}\beta_{17} \\
& + Campaignlead_{it}\beta_{18} + Campaignlag1_{it}\beta_{19} + Campaignlag2_{it}\beta_{20} + Campaignlag3_{it}\beta_{21} \\
& + AdstockSEA_{it}\beta_{22} + holiday_{it}\beta_{23} + \sum_{d=2}^{7} Day_{d,it}\Omega_d + \sum_{m=2}^{12} Month_{m,it}\Theta_m \\
& + Sunhours_t\beta_{24} + \varepsilon_{it},
\end{aligned} \tag{3.6}
\]

where

\[
\begin{aligned}
\beta_0 &= \bar{\beta}_0 + \rho_i^0, & \beta_2 &= \bar{\beta}_2 + \rho_i^2, & \beta_4 &= \bar{\beta}_4 + \rho_i^3, & \beta_5 &= \bar{\beta}_5 + \rho_i^4, & \beta_6 &= \bar{\beta}_6 + \rho_i^5, \\
\beta_7 &= \bar{\beta}_7 + \rho_i^6, & \beta_8 &= \bar{\beta}_8 + \rho_i^7, & \beta_9 &= \bar{\beta}_9 + \rho_i^8, & \beta_{10} &= \bar{\beta}_{10} + \rho_i^9, & \beta_{11} &= \bar{\beta}_{11} + \rho_i^{10}, \\
\beta_{12} &= \bar{\beta}_{12} + \rho_i^{11}, & \beta_{13} &= \bar{\beta}_{13} + \rho_i^{12}, & \beta_{14} &= \bar{\beta}_{14} + \rho_i^{13}, & \beta_{15} &= \bar{\beta}_{15} + \rho_i^{14},
\end{aligned} \tag{3.7}
\]

and $(\rho_i^0, \ldots, \rho_i^{14}) \sim N(0, \Sigma)$, $i = 1, \ldots, N$, where N = 577 is the number of product subgroups and the $(\rho_i^0, \ldots, \rho_i^{14})$ are allowed to be correlated.
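To see what the random coefficients buy us, consider the simplest hierarchical case, a random intercept with known variances: each subgroup's estimate is shrunk toward the overall mean, more strongly for subgroups with little data. The sketch below (illustrative Python; not the estimator used in this thesis, which also handles correlated random slopes) shows this partial-pooling weight.

```python
def partial_pool(group_means, group_sizes, grand_mean, sigma2, tau2):
    """Shrink each group mean toward the grand mean. sigma2 is the
    within-group error variance, tau2 the random-intercept variance."""
    pooled = []
    for m, n in zip(group_means, group_sizes):
        w = tau2 / (tau2 + sigma2 / n)  # weight on the group's own mean
        pooled.append(w * m + (1 - w) * grand_mean)
    return pooled
```

With sigma2 = tau2 = 1, a subgroup observed once keeps only half of its own mean, while a subgroup observed four times keeps 80% of it; this borrowing of strength across subgroups is the motivation for the hierarchical specification.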


$(\rho_h^{15}, \ldots, \rho_h^{18}) \sim N(0, \Sigma)$, $h = 1, \ldots, H$, where H = 19 is the number of product categories and the $(\rho_h^{15}, \ldots, \rho_h^{18})$ are allowed to be correlated.


Results

4.1 Fixed effects

We estimate fixed effects models in R using the 'lfe' package developed by Gaure (2013). As previously discussed, our first model is estimated for all Bol.com's product subgroups using a standard fixed effects model, where coefficients are assumed equal across all product subgroups. In our second model the complete dataset is divided into 19 clusters according to the product categories defined by Bol.com, and a fixed effects model is then estimated for each cluster separately. In our third model the dataset is divided into 19 clusters according to optimal model fit as defined by Sarafidis and Weber (2015), after which fixed effects models are again estimated for each cluster separately. Both our second and third model thus estimate 19 different sets of coefficients. For our last two models, goodness of fit statistics are computed by predicting fitted values per cluster, then combining all clusters to obtain a complete dataset where residuals can be computed according to equations 3.1 and 3.2. For the first fixed effects model the number of parameters for the purpose of computing AIC is the number of included coefficients found in table 4.2 plus the number of fixed effects, which equals the number of product subgroups minus 1. For our second and third fixed effects models the number of parameters equals the number of included coefficients found in table 4.2 times 19, plus the number of fixed effects. The corresponding goodness of fit statistics can be found in table 4.1.

We observe that clustering by model fit drastically improves both AIC and RMSE, while RMSE(Y) is slightly improved by clustering by category. For regression results we refer to table 4.2. Since the second and third fixed effects models estimate coefficients for 19 clusters each, we select one cluster for each model to serve as representative model output. The computed clusters can be found in the appendix under section 7.9.

Both the first and second fixed effects model can be computed in under a minute. Computation time for the method developed by Sarafidis and Weber (2015) grows exponentially with both the amount of data and the desired number of clusters. After exporting the data to STATA, implementing the method by Sarafidis and Weber (2015) to compute optimal cluster compositions for a fixed number of 19 clusters took approximately 52 hours. Computations were done on a personal desktop running a 3.4GHz quad-core processor and 8 gigabytes of RAM. The long computation times make it impractical to compute cluster compositions for an increased number of clusters for the purpose of this thesis. However, since the results of this method are promising, future research could benefit from increased computing power and time in order to obtain an optimal number of clusters.

Table 4.1: Goodness of fit statistics for fixed effects models

Statistic AIC RMSE RMSE(Y)

(41)

Table 4.2 (column headers only; table body not recovered): Dependent variable: log(Totalsales). Columns: Bol.com as a whole, Category Sound & Vision, Cluster 19.
