
Exploring the Use of a Dynamic Hierarchical Factor Model in a Marketing Context: Toward a New Way of Modeling in Marketing Science


Academic year: 2021


Luuk Appels S2515539


Master Thesis Marketing Intelligence Faculty of Economics and Business

University of Groningen June 16th, 2018


Luuk Appels

Abstract: This research examines the use of a Dynamic Hierarchical Factor Model (DHFM) in a marketing setting. By applying a DHFM to supermarket sales data, we attempt to unravel the latent factors behind price and advertising. In particular, we construct a variance decomposition, an impulse response analysis, and a price forecast to analyze the dynamics of price and advertising across different levels of the hierarchical structure of the model. In addition, we forecast sales using different regression-type methods, among which a Factor-Augmented Vector Autoregression (FAVAR) that includes two factors: one from the sales data and one from weather data. We provide both a graphical and a numerical comparison between these methods, and ultimately find that standard autoregression outperforms the other methods employed, owing to the possibility of including the factors exogenously in the model. In addition, the method comparison substantiates the value of adding these factors to the regression forecasts. The main contribution of this research lies in its explorative function with respect to marketing modeling: it pioneers the DHFM in a marketing setting and demonstrates its usefulness.

Keywords: Dynamic Hierarchical Factor Model; forecasting; variance decomposition;

Table of Contents

1 Introduction
2 Literature Review
3 Methodology
3.1 Data
3.2 Dynamic Hierarchical Factor Model
3.3 Factor-Augmented Vector Autoregression
3.4 Estimation and Procedure
4 Results
4.1 Variance Decomposition
4.2 Impulse Response Analysis
4.3 Price Forecast
4.4 FAVAR & AR Forecasting


1. Introduction

Nestlé, Unilever, and Procter & Gamble are company names deeply embedded in the minds of most people. Their brands, such as KitKat and Dove, are fast-moving consumer goods (FMCGs), which are defined as goods that sell quickly and typically at low prices, making supermarkets their dominant channel of sales. FMCGs play an important role in the lives of virtually everyone, employ millions of people, and are among the most well-known brands on supermarket shelves. The sheer size of the FMCG market and its steadily increasing trend over the years are nothing short of exhilarating. A 420-page report on the global FMCG market issued by Reportbuyer (2017) estimates the current global market size of FMCG at around 4,000 billion USD. This paper will focus on one specific example of such an FMCG: lemonade. According to data from Statista (2015), the total Dutch market for non-alcoholic beverages, including lemonade, amounted to roughly 2 billion euros in that year. The data that will be used were obtained from Nielsen, a leading global information and measurement company, and contain weekly sales, price, and promotional data on six lemonade brands across six supermarket chains in the Netherlands, over the course of 169 weeks.

The term marketing science, focusing on quantitative and analytical methods for doing marketing research, was first assigned this definition in the 1980s (Leeflang, Wieringa, Bijmolt, & Pauwels, 2013). Since then, especially in light of recent technological developments, numerous works have been published addressing state-of-the-art methods for marketing purposes. This research contributes to the field of marketing science by proposing a model that has not, to our knowledge, been applied to marketing-related data before. In particular, this research will pioneer the use of a Dynamic Hierarchical Factor Model (DHFM) on marketing data, with the aim of validating its use in that context. In doing so, we hope to add significantly to both the academic and the business world, by paving the way for more exhaustive and specific research to ultimately foster a new way of modeling in the field of marketing

potentially very broad range of variables in one factor, which can subsequently be used in different analyses. Further details and technicalities of DHFMs will be discussed later. Subsequently, to achieve our first objective, we will shed light on the dynamics of price and advertising as captured by the abovementioned modeled factors by means of a decomposition of variance.

After this has been done, we will proceed with our second goal: illustrating the way lemonade prices and advertising respond to an industry-wide shock to the common factor 𝐹 by means of an impulse response analysis, and analyzing differences between chains. Thirdly, we will attempt to provide an accurate forecast for prices of lemonade across chains and brands using the DHFM. An additional aim of this research, related to our main goal, is validating the use of the factors extracted by the DHFM in subsequent analyses. We will attempt to do so by creating and putting to use a sales forecasting model for a specific lemonade brand. The main method we will employ to attain this goal will be a Factor-Augmented Vector Autoregressive (FAVAR) model, which was pioneered by Bernanke, Boivin, & Eliasz (2005). As the name suggests, FAVAR is a basic Vector Autoregression (VAR) augmented with a certain factor, which will be discussed more extensively in due time. Conveniently, we may use the common factor 𝐹, extracted by the DHFM, as augmentation. In addition to this, we will add a common factor 𝑊, to be extracted from an extensive dataset on weather in the Netherlands, obtained from the Royal Dutch Meteorological Institute (KNMI). To assess the usefulness of the extracted factors in the FAVAR, we will benchmark the forecasts of our FAVAR model against the forecasts of a regular autoregression (AR) without the factors 𝐹 and 𝑊, to explore the added value a factor augmentation might have. Aside from the benchmark AR, we will provide multiple models for forecasting and eventually determine which one performs best.


2. Literature Review

In order to warrant the use of the DHFM in the context of our data, we should check for co-movements at the different levels of the hierarchy we aim to impose. To understand what this might entail, we first turn to the existing literature. Firstly, Van Heerde, Gijsenberg, Dekimpe, & Steenkamp (2013) find evidence of differences in the elasticities of both price and advertising across different levels of economic prosperity, indicating that there might indeed be a common factor at play and also that levels of price and promotional activity might be expected to move. Hence, prices of different brands at different chains might co-move, just as promotional levels of different brands at different chains might do so. In addition, Lal & Rao (1997) analyze different pricing strategies employed by supermarkets, suggesting that supermarkets set prices according to a certain strategy, which points toward the existence of co-movement of prices at the chain level. Similarly, it stands to reason that supermarket chains decide to what extent, if at all, certain brands are advertised, suggesting co-movement of promotional activity at the chain level. Concerning the brand level, Keeney, Lawless, & Murphy (2010) propose that firms set their own prices and that, therefore, lemonade producers to a large extent set the prices of their products. Therefore, we argue that co-movement of prices at the brand level might exist.

For a more empirical approach, we computed correlations between both prices and promotions of different brands for all chains separately. We find almost no correlation between the promotional activity of different brands within any supermarket and moderate correlation between the prices of different brands within supermarkets. Looking separately at the price and promotional activity of all brands in all chains taken together, we see a similar picture: very marginal correlation among promotional levels of different brands at different chains and moderate correlations for prices of different brands at different chains. At the market level this picture does not change: correlation between prices appears to exist, whereas correlation between promotions is much weaker. In addition, co-movements between price and promotion do not seem to exist.


Chain | Price | Promo
Albert Heijn | 70.90% | 95.04%
Jumbo | 71.54% | 69.71%
Plus | 81.53% | 80.56%
Coop | 74.54% | 73.66%
EMTE | 80.51% | 64.47%
All chains together | 42.63%*** | 35.87%***
Market level | 53.60%***** |

From the table it becomes evident that there does exist co-movement of prices and promotional activity across chains. Furthermore, for all chains together, we also observe some co-movement of both price and promotional activity, although they appear much lower than the chain-specific figures. Lastly, we also have some co-movement of both price and promo at the most general, economy-wide, level. To finalize the first part of this section we form three hypotheses:

H1: There exists strong co-movement between prices of different brands and strong co-movement of promotional activity of different brands at the chain-specific level.

H2: Some, but weaker, co-movement exists between prices and some co-movement exists between levels of promotional activity between chains.

H3: At the market-level, no co-movement exists between price and promotional activity.

These hypotheses will be applied to each chain and tested by checking them against a variance decomposition, a type of analysis that uses the hierarchical structure of the DHFM to identify important drivers of the variance in price and promotional activity. The structure of price and promotional activity suggested by the literature above, together with the fact that these dynamics may change over time, is in this way well captured by the DHFM, which allows for both hierarchy and dynamics.

With respect to the inclusion of the factors 𝐹 and 𝑊 in the FAVAR, we argue as follows. Firstly, factor 𝑊 reflects weather conditions. Although the effect of weather on sales might seem logical to at least some, the existing marketing literature has largely ignored the relation. However, King & Narayandas (2000) provide anecdotal evidence of Coca-Cola developing vending machines that are programmed to raise prices as temperatures increase, signaling that marketers already account for weather in their strategy. Contrary to the marketing literature, other fields have conducted exploratory research related to this topic. Firstly, weather has been widely shown to be able to affect human behavior. For instance, literature in the field of finance has suggested that the weather might influence stock returns (Saunders, 1993). Extensions of this field of research have shown that this effect can be attributed to the effect weather exerts on mood (Kamstra, Kramer, & Levi, 2003).

Additionally, psychology literature too, has substantiated the effect weather has on mood with examples such as Barker, Hawton, Fagg, & Jennison (1994) and Stoupel, Abramson, & Sulkes (1999), linking suicides to certain weather variables. Connecting this relation to sales, Donovan (1994) found that consumers with positive emotional states engaged in greater overall spending. This relation was later confirmed by Spies, Hesse, & Loesch (1997) who used two IKEA stores with different atmospheres to successfully influence consumer moods and subsequently find a positive relationship between moods and spending.

Given the existing literature, we suspect the weather to influence the sales of lemonade and therefore include in our FAVAR a common factor 𝑊, extracted from the KNMI dataset and representing the common dynamics of weather. Also, we will augment the model with the common factor 𝐹, derived from the price and advertising data, representing the common dynamics of the marketing mix inputs of lemonade. This 𝐹 captures the dynamics of all variables included in the data and thus represents the price and promotional variables of both any focal brand we may later choose to analyze and its competition.


3. Methodology

3.1. Data

Two datasets will be used in this paper. Firstly, we will employ a dataset containing time series data of six lemonade brands across six supermarket chains in the Netherlands, namely the lemonade brands Private Label, EuroShopper, Karvan Cevitam, Raak, Slimpie, and Teisseire, sold at supermarkets Albert Heijn, Jumbo, Plus, Coop, C1000, and EMTE. The dataset contains information on price, sales and three promotional variables; feature advertising, display advertising, and feature & display advertising. The dataset contains time series data, aggregated on the weekly level, starting in week 41 of 2013 up to and including week 52 of 2016. Hence, we have 169 weeks of data, containing information on six chains, and six brands. We assume the time series to be stationary, mean-zero, standardized, and to have unit variance (Moench et al., 2009).
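The standardization assumed above (mean zero, unit variance) can be sketched as follows. This is a minimal illustration only: the function name `standardize` and the example prices are hypothetical and are not taken from the Nielsen data.

```python
from statistics import fmean, pstdev


def standardize(series):
    """Rescale a series to mean zero and unit (population) variance."""
    mean = fmean(series)
    std = pstdev(series)
    if std == 0:
        raise ValueError("a constant series cannot be standardized")
    return [(x - mean) / std for x in series]


# Hypothetical weekly prices for one brand at one chain (illustrative only).
weekly_price = [1.99, 1.99, 1.49, 1.99, 2.09, 1.99]
z = standardize(weekly_price)
```

After this transformation every series is expressed in standard deviations from its own mean, so series with different units (prices, promotion intensities) become comparable inputs for factor extraction.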

Prior to performing any type of analysis, the dataset was checked for inconsistencies, missing values, outliers, etc., of which only a few were found. Firstly, EuroShopper shows a positive sales and price value for Jumbo in week 24 of 2014, while only being sold at Albert Heijn. We considered this an entry error and set the value to zero manually. As data on EuroShopper were missing for all chains but Albert Heijn (five-sixths of the data), we decided to drop all data on this brand; imputation was considered of no use, since only Albert Heijn data on EuroShopper were available as input. Another finding was the closure of C1000 in week 29 of 2015, after which values for all brands stop being reported, causing a large block of data to be missing. We argue that these data are Missing Not At Random (MNAR), meaning that the missing values of a given variable are related to that variable itself (Pituch & Stevens, 2016). MNAR is generally considered the most problematic missing data mechanism, requiring the specification of a distribution for the missing values in the data (Schafer & Graham, 2002). To avoid this complex process, and given the large part of the C1000 data that is missing (about 45%) and the reasonably large size of our dataset, we decided to drop C1000 from the analysis completely and continue with five chains instead of six. Summary statistics of the lemonade data are included in Tables 1 and 2 of the Appendix, where the three promotional variables have been aggregated, as this will be done for the estimation of the DHFM too.

statistical power, and not biased outcomes given MAR data. Furthermore, we drop several stations that have not reported any information on the variables in the dataset, as they are of no statistical use. Given the sheer size of our dataset, we argue that these deletions are warranted. For the variables that contain fewer missing values (around 3%), we use RStudio to impute them using multiple imputation. Ultimately, the dataset spans the same period as our lemonade data and contains 22 variables measured across 30 stations, which we divide roughly equally over three regions: North, Middle, and South. For a graphical overview of the allocation of weather stations to regions, please see Figure 1 of the appendix.

3.2 Dynamic Hierarchical Factor Model

The Dynamic Hierarchical Factor Model was first introduced by Moench et al. (2009) in a staff report issued by the Federal Reserve Bank of New York and first published in 2013. Since 2009, several studies have used DHFMs to decompose the causes of the dynamics driving different macroeconomic processes into a multitude of factors. For instance, Moench & Ng (2011) employ the model to quantitatively assess the dynamic effects of housing shocks on retail sales and substantiate the presence of a national and a housing component in each of their regions (blocks), warranting the use of a DHFM in this context. In related research, Förster, Jorra, & Tillmann (2014) decompose capital inflow data from 47 countries into a four-level DHFM and gauge the respective explanatory power of their factors. Moreover, they find that the global factor accurately reflects U.S. financial conditions.

In a recent study by Dehmamy & Halberstadt (2015), the authors use time-series data on a wide range of macroeconomic indicators to estimate the term structure of U.S. interest rates. In doing so, they provide a comparison between the DHFM and a benchmark model - a parsimonious principal component approach - and find that the DHFM outperforms the benchmark model by allowing for a better identification of which dynamics from the dataset are reflected in the derived factors. In addition, by means of a variance decomposition, they confirm the strong block-like structure of some parts of the dataset, further underlining the advantages the DHFM has.

block level, but also a common factor consisting of both advertising and price information. This common factor 𝐹 we may then interpret as representing the so-called marketing mix, reflecting dynamics of both price and advertising. We graphically depicted this structure in Figure 1.

Mathematically, this four-level hierarchical structure may be represented as follows:

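$$Z_{bsit} = \Lambda_{Hbsi}(L)\, H_{bst} + e_{Zbsit} \tag{1}$$

$$H_{bst} = \Lambda_{Gbs}(L)\, G_{bt} + e_{Hbst} \tag{2}$$

$$G_{bt} = \Lambda_{Fb}(L)\, F_{t} + e_{Gbt} \tag{3}$$

$$\psi_{Fk}(L)\, F_{kt} = \epsilon_{Fkt} \tag{4}$$

[Figure 1. Levels of the hierarchy: 4th level, Market Dynamics; 3rd level, Price/Advertising Dynamics; 2nd level, Chain Dynamics; 1st level, Brand Dynamics]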

In this model, $i = 1, \dots, N_b$ indexes the time series, $b$ denotes the block ($b = 1, 2$), and $s$ stands for the sub-blocks ($s = 1, \dots, 5$). In particular, the terms may be interpreted as discussed below. In the level-one equation (1), $Z_{bsit}$ is the total movement. $\Lambda_{Hbsi}(L) H_{bst}$ is the common component, where $\Lambda_{Hbsi}(L)$ is the matrix polynomial of lag order $L$, and $H_{bst}$ describes the variation that time series $i$ at block $b$ shares with the other time series in block $b$. $e_{Zbsit}$ represents the idiosyncratic component, describing the variation unique to time series $i$ in block $b$. In the level-two equation (2), $H_{bst}$ is the variation of a sub-block $s$ in a given block $b$. We denote the sub-block-specific variation by $e_{Hbst}$ and the common component, the variation that a sub-block has in common with other sub-blocks, by $\Lambda_{Gbs}(L) G_{bt}$, with $\Lambda_{Gbs}(L)$ again being the polynomial of lag order $L$.

In the level-three equation (3), $G_{bt}$ captures the total block-specific variation, and $\Lambda_{Fb}(L) F_t$ represents the common component, capturing the variation that both blocks share with one another, with $\Lambda_{Fb}(L)$ again being the polynomial of lag order $L$. $e_{Gbt}$ is the block-specific component, describing the variation that is unique to any given block. Finally, the stochastic process of $F_t$, representing the variation of economy-wide factors, constitutes the level-four equation (4).

We assume $e_{Zbsit}$, $e_{Hbst}$, $e_{Gbt}$, and $e_{Ft}$ to be stationary, normally distributed autoregressive processes so that:
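By analogy with equation (4), these assumptions could be written as below. This is only a sketch: the lag polynomials $\psi_{Zbsi}(L)$, $\psi_{Hbs}(L)$, $\psi_{Gb}(L)$ and the innovation variances $\sigma^2$ are notation assumed here, patterned on equation (4) and on the general setup in Moench et al. (2009), and the display is left unnumbered for that reason.

```latex
\begin{align*}
\psi_{Zbsi}(L)\, e_{Zbsit} &= \epsilon_{Zbsit}, & \epsilon_{Zbsit} &\sim N(0, \sigma^2_{Zbsi}) \\
\psi_{Hbs}(L)\, e_{Hbst}   &= \epsilon_{Hbst},  & \epsilon_{Hbst}  &\sim N(0, \sigma^2_{Hbs}) \\
\psi_{Gb}(L)\, e_{Gbt}     &= \epsilon_{Gbt},   & \epsilon_{Gbt}   &\sim N(0, \sigma^2_{Gb}) \\
\psi_{Fk}(L)\, F_{kt}      &= \epsilon_{Fkt},   & \epsilon_{Fkt}   &\sim N(0, \sigma^2_{Fk})
\end{align*}
```

Under this reading, each level of the hierarchy has its own autoregressive dynamics, which is what allows the variance of a series to be attributed to the separate levels later on.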

For a more detailed and technical description of the model, this paper refers to Moench et al. (2009).

Concerning the KNMI dataset, we use a four-level modeling approach, but evidently with different blocks. In particular, we decompose the data into a common factor 𝑊 and 𝐾 blocks pertaining to regions across the country, each containing 𝐿 weather stations (sub-blocks), which are in turn split into several weather-related variables, 𝑀. As mentioned earlier, the 30 weather stations have been manually divided into three regions, yielding three blocks. The regions contain roughly the same number of stations: North 11, Middle 10, South 9. For an overview of the division of weather stations among the three regions, see Figure 1 of the appendix. A graphical display of the model can be found in Figure 2.

Mathematically, the model may be represented as follows:

$$M_{bsit} = \Lambda_{Lbsi}(L)\, L_{bst} + e_{Mbsit} \tag{9}$$

$$L_{bst} = \Lambda_{Kbs}(L)\, K_{bt} + e_{Lbst} \tag{10}$$

$$K_{bt} = \Lambda_{Wb}(L)\, W_{t} + e_{Kbt} \tag{11}$$

$$\psi_{Wk}(L)\, W_{kt} = \epsilon_{Wkt} \tag{12}$$

Similarly to the first model, $i = 1, \dots, N_b$ indexes the time series, $b$ denotes the block ($b = 1, 2, 3$), and $s$ stands for the sub-blocks ($s = 1, \dots, 11$).

Figure 2: Schematic representation of the four-level weather DHFM

Lastly, to close the model, we again assume $e_{Mbsit}$, $e_{Lbst}$, $e_{Kbt}$, and $e_{Wt}$ to be stationary, normally distributed autoregressive processes. Mathematically we have:

3.3 Factor-Augmented Vector Autoregression

As mentioned before, Factor-Augmented Vector Autoregression (FAVAR) was pioneered by Bernanke et al. (2005), who used the model to assess the effectiveness of monetary policy. FAVAR builds on the standard VAR introduced by Sims (1980), a frequently used technique for identifying structural shocks and investigating the way in which they propagate (Marcellino & Sivec, 2016). Ever since its introduction, FAVAR has been widely used in the fields of macroeconomics and finance (Bai, Li, & Lu, 2016). Examples are Vargas-Silva (2008), Gupta & Kabundi (2010), and Gupta, Jurgilas & Kabundi (2010), who all explore the effect of monetary policy on the housing market using a FAVAR. FAVAR was introduced because a VAR cannot accommodate more than a few variables without running into degrees-of-freedom problems. This is due to the property of a VAR that the number of parameters in a single equation is equal to the product of the number of variables and the number of lags, which increases quickly as either the order or the size of the VAR gets larger. This degrees-of-freedom problem is an issue because, in reality, the problems VARs are used for require many variables to be included in order to reflect a real-world situation. Central banks dealing with monetary policy, for instance, typically follow hundreds of data series (Bernanke et al., 2005). FAVAR alleviates this problem by adding a single factor to the VAR, extracted from a large number of variables on the basis of their common variance, so that only one variable is added rather than many.
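To make the degrees-of-freedom argument concrete, the short sketch below counts the slope parameters of a VAR. The function name `var_parameter_count` and the example sizes are hypothetical and chosen only for illustration; intercepts are excluded by default, in line with the description above.

```python
def var_parameter_count(n_variables: int, n_lags: int, intercept: bool = False):
    """Return (parameters per equation, parameters in the whole VAR)."""
    per_equation = n_variables * n_lags + (1 if intercept else 0)
    return per_equation, per_equation * n_variables


# Hypothetical system sizes, each with four lags.
for n in (3, 10, 100):
    per_eq, total = var_parameter_count(n, 4)
    print(f"{n:>3} variables: {per_eq} per equation, {total} in total")
```

With 169 weekly observations, as in the lemonade data, a system of 100 variables and four lags would already require 400 coefficients per equation, far more than the available observations, whereas a small system augmented with one or two factors stays estimable.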

In this paper, we will estimate a FAVAR augmented with two factors. In particular, let $Y_t$ be an $M \times 1$ vector of observable economic variables assumed to drive the dynamics of some process. Conventionally, we might proceed with the estimation of a VAR or any other multivariate time series model using only data for $Y_t$. However, in many instances, additional information that is not fully captured by $Y_t$ is required to accurately model the dynamics of these series. Therefore, let us assume that $F_t$ and $W_t$ are $K \times 1$ vectors of unobserved factors capturing this additional information, with $K$ being small. In our case, $F_t$ represents the common factor of prices and $W_t$ the common factor of the weather. The joint dynamics of $(F_t, W_t, Y_t)$ can then be represented by:

$$\begin{bmatrix} F_t \\ W_t \\ Y_t \end{bmatrix} = \mu + \Phi(L) \begin{bmatrix} F_{t-1} \\ W_{t-1} \\ Y_{t-1} \end{bmatrix} + v_t$$

where $\mu$ is a $(k + 1)$ vector of constants, $\Phi(L)$ is a conformable lag polynomial of finite order $p$, $[F_{t-1}, W_{t-1}, Y_{t-1}]'$ represents the vector of lags of $F_t$, $W_t$, and $Y_t$, and $v_t$ is the error term with mean zero and covariance matrix $Q$. In addition, we will benchmark the results of the FAVAR against a regular AR process. Generally, we denote an autoregressive process of order $k$, or AR($k$) process, as follows:

$$y_t = \beta_0 + \beta_1 y_{t-1} + \dots + \beta_k y_{t-k} + \epsilon_t$$

where $y_{t-1}, \dots, y_{t-k}$ represent the lags of the dependent variable $y$, $\beta_0$ is the intercept, $\beta_1, \dots, \beta_k$ represent the respective coefficients of the lags, and $\epsilon_t$ is the error term.
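The AR($k$) benchmark can be illustrated with a small ordinary least squares sketch in plain Python. It is not the estimation code used in this research: the helper names `solve_linear_system`, `fit_ar`, and `forecast_next` are hypothetical, the example series is artificial, and the factors 𝐹 and 𝑊 are not included (they could be appended as extra regressor columns).

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("design matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x


def fit_ar(y, k):
    """OLS estimates [beta_0, beta_1, ..., beta_k] of an AR(k) model."""
    rows = [[1.0] + [y[t - j] for j in range(1, k + 1)] for t in range(k, len(y))]
    targets = [y[t] for t in range(k, len(y))]
    p = k + 1
    # Normal equations: (X'X) beta = X'y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * v for r, v in zip(rows, targets)) for i in range(p)]
    return solve_linear_system(xtx, xty)


def forecast_next(y, beta):
    """One-step-ahead forecast from the last k observations."""
    k = len(beta) - 1
    return beta[0] + sum(beta[j] * y[-j] for j in range(1, k + 1))


# Artificial noiseless AR(1) series: y_t = 1 + 0.5 * y_{t-1}, starting at 0.
y = [0.0]
for _ in range(12):
    y.append(1.0 + 0.5 * y[-1])
beta = fit_ar(y, 1)
next_value = forecast_next(y, beta)
```

Because the artificial series contains no noise, the fit recovers an intercept of 1 and a lag coefficient of 0.5 up to floating-point error; with real sales data the estimates would of course carry sampling uncertainty.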

3.4 Estimation and Procedure

We will estimate the model presented in section 3.2 using a Gibbs sampler, which is a Markov Chain Monte Carlo (MCMC) iterative algorithm. A Markov chain is a stochastic process characterized by the property that the future state depends only on the current state, and not on previous states of the process. Monte Carlo methods, in turn, are computational algorithms that take random samples from a certain process. Together, the basic idea of MCMC is to sample each unknown parameter in turn, iterating over all unknowns many times, with each draw conditional on the most recent draws of every other parameter (Leeflang et al., 2013). In practice, a starting point is chosen, after which the algorithm is run until, typically after several thousand iterations, it has converged to the target posterior distribution. The first iterations are commonly discarded as so-called burn-in, as they are not actually drawn from the target posterior distribution since the sampler has not reached convergence yet. For a more mathematical and more detailed explanation of the sampling procedure, this paper refers to Moench et al. (2009). After the MCMC has been run, we will conduct a variance decomposition to shed light on the variation at the common, block, sub-block, and idiosyncratic levels, which will provide insights into the importance of the sources of variation in prices and advertising of both brands and chains. Subsequently, the impulse response analysis will serve to analyze how the price of lemonade responds, at the chain level, to an economy-wide shock, such as the sudden onset of a financial crisis or a sudden upswing in the economic climate. Finally, the two obtained factors will be used in a FAVAR to forecast future levels of sales. The results will be compared to a regular AR to provide a benchmark.
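The mechanics of Gibbs sampling and burn-in can be illustrated on a toy problem. The sketch below samples from a standard bivariate normal distribution with correlation `rho`, a textbook case in which both conditional distributions are known; it is not the DHFM sampler of Moench et al. (2009), and the function name, starting values, and chain lengths are assumptions made for illustration.

```python
import random
from statistics import fmean


def gibbs_bivariate_normal(rho, n_draws, burn_in, seed=0):
    """Gibbs sampler for a standard bivariate normal with correlation rho.

    Each coordinate is drawn in turn from its conditional distribution given
    the most recent draw of the other: x | y ~ N(rho * y, 1 - rho**2).
    """
    rng = random.Random(seed)
    cond_sd = (1.0 - rho ** 2) ** 0.5
    x, y = 5.0, -5.0  # deliberately poor starting point
    kept = []
    for i in range(burn_in + n_draws):
        x = rng.gauss(rho * y, cond_sd)
        y = rng.gauss(rho * x, cond_sd)
        if i >= burn_in:  # discard the burn-in draws
            kept.append((x, y))
    return kept


draws = gibbs_bivariate_normal(rho=0.8, n_draws=20000, burn_in=2000)
xs = [d[0] for d in draws]
ys = [d[1] for d in draws]
mean_x, mean_y = fmean(xs), fmean(ys)
# Sample correlation of the retained draws; it should be close to rho.
cov = fmean((a - mean_x) * (b - mean_y) for a, b in zip(xs, ys))
var_x = fmean((a - mean_x) ** 2 for a in xs)
var_y = fmean((b - mean_y) ** 2 for b in ys)
sample_rho = cov / (var_x * var_y) ** 0.5
```

The retained draws have means near zero and a correlation near 0.8 even though the chain starts far from the bulk of the distribution, which is the reason the early iterations are treated as burn-in.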

4. Results

al., 2013). However, given the time-consuming estimation and resulting time constraint, we decide to use only one lag at each level.

Figure 3 shows a plot of the posterior mean of common factor 𝐹 with 5% confidence bands. This factor reflects the dynamics of price and advertising, which we may consequently interpret as the common movement of the marketing mix, as extracted by the MCMC algorithm. The negative values present on the y-axis do not reflect negative values for marketing mix inputs, but are relative to the mean. Therefore, this graph is very well suited for analyzing the movement of marketing mix dynamics over time. Evidently, the line follows a spiky pattern, constantly returning to its base level around zero. Its end point lies slightly higher than zero, as our data end right in the steep fall-off from the peak of a few weeks earlier. If a few more weeks of data had been available, we assume the line would have returned to around zero. Generally, we conclude that marketing mix dynamics follow a constant trend with large positive spikes occurring regularly.

Figure 3: Common Movement of Marketing Mix


4.1 Variance Decomposition

In this section, we will attempt to lay bare the main drivers of variance in price and advertising across different supermarket chains and lemonade brands. Variance decomposition analyses are a widely used tool in, among other fields, economics and finance. Table 1 summarizes the results for the price block, with standard errors in parentheses. According to Dehmamy & Halberstadt (2015), the shares represent the parts of the variance (in this case, of price) explained by the idiosyncratic components of the different hierarchical levels in the model.

Chain | Share_F | Share_G | Share_H | Share_Z
Albert Heijn | 0.129 (0.008) | 0.123 (0.005) | 0.122 (0.005) | 0.626 (0.014)
Jumbo | 0.000 (0.000) | 0.000 (0.000) | 0.476 (0.061) | 0.524 (0.060)
Plus | 0.003 (0.003) | 0.003 (0.003) | 0.414 (0.016) | 0.580 (0.015)
Coop | 0.002 (0.003) | 0.002 (0.003) | 0.417 (0.013) | 0.579 (0.013)
EMTE | 0.003 (0.004) | 0.003 (0.004) | 0.414 (0.015) | 0.579 (0.013)

Looking at Table 1, an obvious first observation is the atypical values for Albert Heijn. As can easily be seen from Figure 6, which plots the shares in stacked-bar form, Jumbo, Plus, Coop, and EMTE show very similar compositions. Albert Heijn, however, seems to derive far less of its variance in price from the chain-specific component 𝐻 (12.2% vs more than 40%), and instead much more from the common component 𝐹 and the price dynamics-specific component 𝐺 (12.9% and 12.3%, respectively, vs virtually negligible values). Generally, we may conclude that, at an average of about 58%, the brand-specific dynamics are the main underlying factor in the variance of price across supermarkets, followed by the chain-specific component (roughly 37% on average).

Table 1: Chain-specific variance decomposition of the price block
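The cross-chain averages of the shares can be checked directly from the posterior means reported in Table 1. The snippet below is a small worked calculation, not part of the estimation; the variable names are hypothetical and the averages are unweighted across the five chains.

```python
from statistics import fmean

# Posterior mean shares from Table 1 (price block): F, G, H, Z per chain.
price_shares = {
    "Albert Heijn": (0.129, 0.123, 0.122, 0.626),
    "Jumbo": (0.000, 0.000, 0.476, 0.524),
    "Plus": (0.003, 0.003, 0.414, 0.580),
    "Coop": (0.002, 0.002, 0.417, 0.579),
    "EMTE": (0.003, 0.003, 0.414, 0.579),
}

# Each row should sum to one, up to rounding in the reported figures.
row_sums = {chain: sum(s) for chain, s in price_shares.items()}

# Unweighted averages across chains of the chain-specific (H) and
# brand-specific (Z) shares.
avg_share_h = fmean(s[2] for s in price_shares.values())
avg_share_z = fmean(s[3] for s in price_shares.values())
print(f"average Share_H = {avg_share_h:.3f}, average Share_Z = {avg_share_z:.3f}")
```

The brand-specific share averages about 0.58 and the chain-specific share about 0.37, while the Albert Heijn row is the only one in which the common and price-specific shares are not close to zero.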

Moreover, for all chains but Albert Heijn, the common factor and the price-specific factor seem to be of hardly any importance at all. What we may take away from this is that the chains differ in the extent to which the different levels of the hierarchy contribute to the total variance in price and advertising. This means that changes in, for instance, the common factor will affect Albert Heijn to a much more serious degree than other chains. Conversely, the other chains will show a more severe response to changes in the chain-specific factor. More concretely, Albert Heijn's pricing appears to be more subject to changes or shocks in economy-wide factors, whereas other supermarkets' pricing will be affected more by, for instance, bad quarterly results of the chain itself.

Chain | Share_F | Share_G | Share_H | Share_Z
Albert Heijn | 0.020 (0.013) | 0.254 (0.025) | 0.113 (0.011) | 0.612 (0.023)
Jumbo | 0.001 (0.001) | 0.010 (0.010) | 0.281 (0.019) | 0.708 (0.017)
Plus | 0.001 (0.001) | 0.008 (0.009) | 0.285 (0.018) | 0.707 (0.017)
Coop | 0.000 (0.001) | 0.005 (0.006) | 0.254 (0.013) | 0.741 (0.012)
EMTE | 0.001 (0.001) | 0.008 (0.008) | 0.262 (0.016) | 0.730 (0.014)

Considering the advertising block (Table 2 and the right side of Figure 6), results are very similar to those of the price block. Again, the brand-specific component seems most important at around 70% on average, with the chain-specific component taking second place, as we saw for the price block. Here too, Albert Heijn differs quite substantially from the other four chains. Again, the common component and the advertising/price-specific component (2% and 25% vs roughly 0.1% and 0.8%, respectively) seem to be of far more explanatory value than those components at other chains. Similarly to the price block, this means that Albert Heijn's advertising dynamics will be influenced to a far greater extent by a shock to the common factor or the price/advertising-specific factor than those of other chains.

Turning to the brand-specific variance decomposition, we summarized the results of the price block in Table 3 and plotted them in Figure 7. Shares across brands seem to differ slightly more than across chains. For instance, we can see that for Teisseire the chain-specific component is more decisive than the brand-specific component, which stands in sharp contrast with the other brands except for Karvan Cevitam, which shows an even more pronounced difference. In particular, Karvan Cevitam seems to have its price variance almost fully explained by chain-specific factors, whereas the other brands, aside from Teisseire, derive large parts of their variance from the brand-specific component (around 80% on average). This may indicate that Karvan Cevitam allows its retailers to set prices freely whereas other brands impose restrictions or, alternatively, it may be that chains put Karvan Cevitam on price discount most often, resulting in a large chain-specific influence. Investigating the data reveals that the latter, but not the former, is the case. In particular, if we define price promotions as outliers below the lower bound of a boxplot, counting these outliers in RStudio in the respective boxplots of prices of all brands yields a total of 60 outliers across chains for Karvan Cevitam, compared to 50 for Slimpie, 35 for Teisseire, 10 for Raak and 0 for Private Label.
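The outlier-based definition of a price promotion can be sketched as follows. This is a minimal illustration, assuming the common boxplot convention of a lower fence at Q1 − 1.5 × IQR (the thesis does not state which fence was used); the function name and the price series are hypothetical and are not thesis data.

```python
from statistics import quantiles


def count_low_outliers(prices):
    """Count observations below the lower boxplot fence, Q1 - 1.5 * IQR."""
    q1, _, q3 = quantiles(prices, n=4, method="inclusive")
    lower_fence = q1 - 1.5 * (q3 - q1)
    return sum(1 for p in prices if p < lower_fence)


# Hypothetical weekly shelf prices for one brand at one chain:
# mostly regular prices with two deep temporary price cuts.
example_prices = [2.49, 2.49, 2.45, 2.49, 1.49, 2.49,
                  2.50, 2.49, 1.25, 2.49, 2.45, 2.49]

print(count_low_outliers(example_prices))
```

Summing such counts over the five chains for each brand would give totals comparable to those quoted above, under the stated assumption about the fence.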

Brand            Share_F         Share_G         Share_H         Share_Z
Private Label    0.020 (0.002)   0.019 (0.002)   0.273 (0.004)   0.688 (0.007)
Karvan Cevitam   0.072 (0.004)   0.069 (0.003)   0.761 (0.024)   0.097 (0.023)
Raak             0.000 (0.000)   0.000 (0.000)   0.101 (0.037)   0.898 (0.037)
Slimpie          0.006 (0.002)   0.005 (0.002)   0.193 (0.030)   0.796 (0.031)
Teisseire        0.038 (0.005)   0.036 (0.004)   0.516 (0.015)   0.410 (0.017)

Private Label also shows a fairly high chain-specific share of variance, which may be explained by the fact that each chain has its own private label. These labels can effectively be seen as independent brands that set prices autonomously. From Table 3 we may additionally conclude that only small responses are to be expected across brands after a shock to the common factor, such as a sudden change in the economic climate. Conversely, brand-specific shocks would have great impact, except for Karvan Cevitam, for which a shock to the chain-specific factor would have grave consequences.

Table 3: Brand-specific variance decomposition of the price block

Looking at advertising, we can easily see from Figure 7 that Karvan Cevitam is again the odd one out, but this time alone. Similar to what we saw with price, its variance in advertising seems to be largely driven by the chain-specific component rather than by the brand-specific component, which dominates for all other brands. This may again be explained by supermarkets featuring Karvan Cevitam in one of their advertisement channels relatively often compared to other brands, which makes the chain-specific component so large.

Brand            Share_F         Share_G         Share_H         Share_Z
Private Label    0.003 (0.002)   0.044 (0.006)   0.112 (0.005)   0.841 (0.008)
Karvan Cevitam   0.012 (0.008)   0.152 (0.016)   0.832 (0.014)   0.004 (0.002)
Raak             0.004 (0.003)   0.046 (0.009)   0.114 (0.018)   0.837 (0.021)
Slimpie          0.002 (0.002)   0.028 (0.008)   0.088 (0.017)   0.882 (0.021)
Teisseire        0.001 (0.001)   0.015 (0.007)   0.050 (0.013)   0.935 (0.016)

From Table 4 we observe that shocks to the common factor will see only small responses across brands. Higher responses are to be expected following a shock to the chain-specific component for Karvan Cevitam, and to the brand-specific component for all other brands.
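The reading of Table 4 given above can be reproduced mechanically. The sketch below copies the point estimates from Table 4 and reports, per brand, which level carries the largest share; it assumes the column order common (F), block-specific (G), chain-specific (H), brand-specific (Z), and the helper name is illustrative only.

```python
# Point estimates copied from Table 4 (advertising block).
# Assumed column order: common (F), block-specific (G),
# chain-specific (H), brand-specific (Z).
LEVELS = ("common", "block-specific", "chain-specific", "brand-specific")

table4 = {
    "Private Label":  (0.003, 0.044, 0.112, 0.841),
    "Karvan Cevitam": (0.012, 0.152, 0.832, 0.004),
    "Raak":           (0.004, 0.046, 0.114, 0.837),
    "Slimpie":        (0.002, 0.028, 0.088, 0.882),
    "Teisseire":      (0.001, 0.015, 0.050, 0.935),
}


def dominant_level(shares):
    """Return the name of the level with the largest variance share."""
    best = max(range(len(shares)), key=lambda i: shares[i])
    return LEVELS[best]


dominant = {brand: dominant_level(s) for brand, s in table4.items()}

for brand, shares in table4.items():
    # Shares should sum to roughly one; small gaps come from rounding.
    print(f"{brand:15s} sum={sum(shares):.3f} dominant={dominant[brand]}")
```

The row sums also serve as a quick sanity check that the four shares form a complete decomposition of each brand's variance.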

4.2 Impulse Response Analysis

Now that we have analyzed how the total variance in price and advertising is spread across the different levels of the hierarchy, we turn to how price and advertising respond to a shock to the common factor, both generally and at the chain-specific level. Following convention, we chose a positive shock to the common factor with a size of one standard deviation. Such a shock may represent many different things; an example would be a sudden upswing in the economic environment. The impulse responses of the common factor and the block-specific factors of both price and advertising, including 5% confidence bands, have been plotted in Figure 8.

Figure 4: Shock on the common factor and its impulse responses across blocks

Table 4: Brand-specific variance decomposition of the advertising block

We see that the reaction of the common factor to a shock to the common factor itself levels off rather quickly: after approximately six weeks the common factor has returned to its normal level. Price appears to respond positively to a sudden upswing in the economic environment. As can be seen from the graph, there is a peak after about two weeks, followed by a quick decline and a return to the pre-shock level after about seven weeks. This may signal managers increasing prices in anticipation of increased buying power amongst consumers following the economic upswing. Advertising shows a slightly more gradual effect. Advertising dynamics seem to react negatively to the shock, reaching a trough after about two weeks and then flattening out over the following weeks to return to the normal level after about 17 weeks.
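To illustrate why the common factor's own response dies out within a few weeks, the sketch below traces the impulse response of a single first-order autoregressive factor. It is a simplified stand-in, not the DHFM itself: the persistence value, the 5% cut-off and both function names are assumptions made for illustration. A single AR(1) factor cannot produce the delayed peak seen for price, which in the full model arises from the interaction between levels.

```python
def ar1_impulse_response(phi, shock=1.0, horizon=20):
    """Response of F_t = phi * F_{t-1} + e_t to a one-off shock at h = 0."""
    return [shock * phi ** h for h in range(horizon + 1)]


def weeks_until_negligible(responses, tolerance=0.05):
    """First horizon at which the response drops below tolerance * initial shock."""
    for h, r in enumerate(responses):
        if abs(r) < tolerance * abs(responses[0]):
            return h
    return None


phi = 0.6  # hypothetical persistence, not an estimate from the thesis
irf = ar1_impulse_response(phi)
print(weeks_until_negligible(irf))
```

The closer the assumed persistence is to one, the longer the response takes to fade, which is one way to read the slower return of advertising compared to price.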

Again, effects seem marginal. Albert Heijn's response looks very similar to the responses of the blocks, but with effects flattening out more quickly. This, too, was to be expected, as we saw in the variance decomposition that no chain but Albert Heijn derived a substantial amount of its variance from the price/advertising block.

Turning to the graph, price sees a spike that returns to pre-shock levels after around four weeks, while advertising experiences a drop that diminishes after about 13 weeks. For the sake of completeness, we also investigated the effect of a shock at the block level. Hence, we applied a positive shock of one standard deviation to both the price block and the advertising block; the combined results, with 5% confidence bands, have been plotted in Figure 10.

The left column represents the price responses of chains following a shock to price, such as a sudden tax increase. We observe that no significant differences seem to exist compared to the shock to the common factor. The rightmost column portrays the advertising responses of chains following a shock to advertising. These responses seem most significant for Albert Heijn and much less so for the other chains. An interesting difference here is the positive trend we observe, which looks much like the inverse of the advertising response following a shock to the common factor. Evidently, a positive shock to the advertising block leads to an upswing in advertising, but this apparently does not hold for a positive shock to the common factor.

4.3 Price forecasting

The next part of this paper is devoted to providing a forecast of the common factor, the block-specific factors, and especially the price for all brands in all chains. Figure 11 plots the forecasts from week 140 onward, including 5% confidence bands, as predicted by the common factors of the first 140 weeks. The upper graph starts at 𝑡 = 100 and displays the forecast for the next 30 weeks. As can be seen, the model forecasts the marketing mix factor to return to its mean after about five weeks. The bottom left and right graphs plot price and advertising, including the forecast starting from week 140. Here as well, the model forecasts price and advertising dynamics to approach zero rather quickly, after about three and 15 weeks, respectively. It is interesting to note that the forecast of the common factor and that of price seem to follow a very similar pattern. This was to be expected, as a plot of these factors, provided at the beginning of this chapter, revealed that the price factor follows a pattern very similar to that of the common factor.

Moving deeper into the hierarchy, Figure 12 plots the week 140-169 price forecasts of each brand at each chain, including 5% confidence bands, with the actual price movement during that period in bold. We observe that, similar to the forecasts assessed above, for most of the chains and brands the forecasts tend to zero rather quickly and do not appear to be accurate. The only exception is Jumbo, for which the forecasts of Karvan Cevitam and Teisseire seem quite reasonable. Jumbo is also the only chain for which the forecasts have not approached zero after 29 weeks.
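One way to read the rapid return to zero is the following sketch, which assumes that a factor follows a stationary, mean-zero first-order autoregression. The symbols φ (persistence) and T (the last observed week) are introduced here for illustration only and do not appear in the model equations of the thesis.

```latex
\[
\hat{F}_{T+h \mid T} = \phi^{h} F_{T}, \qquad |\phi| < 1
\quad\Longrightarrow\quad
\lim_{h \to \infty} \hat{F}_{T+h \mid T} = 0 .
\]
```

Under this assumption the forecast shrinks geometrically towards the mean, so any forecast built on such a factor carries little information beyond the first few weeks unless the persistence is close to one.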

Given the poor results, and in order to better forecast price, the DHFM was rerun using four lags at each level, rather than only one. Plotting the price forecasts again yields Figure 13. For the sake of completeness, a forecast using four lags of the common and block-specific factors has been included in Figure 2 of the appendix. As we can see from Figure 13, the forecasts have improved. Actual data seems to fall between the confidence bands slightly more often and forecasts return to zero far more slowly than in Figure 12, which is due to the more extensive modeling of the common factor now that four lags have been used. Although the forecasts seem to have improved, they are still not able to capture sudden changes or trends in the actual data, which signals a limited forecasting capability of the common factor with respect to sales.

4.4 FAVAR & AR Forecasting

In order to create sales forecasts for the lemonade industry, a FAVAR is estimated in RStudio. In particular, we estimate a VAR with two added factors, extracted from the lemonade and weather data, respectively. The sales forecast will thus be based on the values of sales, 𝐹, and 𝑊 in previous periods, as depicted in (17). We provide a forecast for each brand at one specific chain, Albert Heijn. The reason Albert Heijn was chosen as the focal chain is twofold. Firstly, we examined which chain displayed the greatest variation in the levels of the dependent variable, namely the marketing mix (or common) factor 𝐹, to optimize statistical inference (Leeflang et al., 2013). This pointed towards Albert Heijn, as it is the chain that most frequently puts its lemonade brands on price promotion or advertises them. Secondly, the variance decomposition revealed that Albert Heijn was most strongly influenced by the common factor, leading us to believe that the most accurate forecasts will be obtained for brands at Albert Heijn.

As we have 1,000 draws of each time series for each factor, the correct way of proceeding would be to estimate a total of 1,000 × 1,000 autoregressions. However, for the sake of simplicity and to save computation time, we use an average of the weather factor 𝑊, leaving a total of 1,000 autoregressions to be performed, one for each draw of the common factor 𝐹. In order to estimate a valid FAVAR, we require our time series to be stationary (Lütkepohl, 2005). Testing for stationarity using an augmented Dickey-Fuller test shows that the time series pertaining to the sales of specific brands at Albert Heijn and 𝐹 are stationary, but the average of 𝑊, our third time series in the FAVAR, is not. Plotting the mean of 𝑊, with the 5% confidence bands shaded grey, yields Figure 14 below.
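The computational shortcut described above can be sketched as follows. The example is a toy illustration with three hypothetical draws of a weather factor over four weeks; the function and variable names are assumptions, and the real draws are not reproduced here.

```python
from statistics import mean


def average_over_draws(draws):
    """Collapse a list of draws (each a time series) into one mean series."""
    return [mean(values_at_t) for values_at_t in zip(*draws)]


# Hypothetical example: 3 draws of a weather factor over 4 weeks.
w_draws = [
    [0.9, 0.2, -0.4, -1.1],
    [1.1, 0.0, -0.6, -0.9],
    [1.0, 0.1, -0.5, -1.0],
]
w_bar = average_over_draws(w_draws)

n_draws_f, n_draws_w = 1000, 1000
full_grid = n_draws_f * n_draws_w  # every F draw paired with every W draw
reduced = n_draws_f * 1            # every F draw paired with the mean of W

print(w_bar, full_grid, reduced)
```

The price of the shortcut is that the uncertainty in 𝑊 no longer feeds into the resulting forecast bands, since only its mean path is used.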

It quickly becomes clear that 𝑊 follows a somewhat cyclical trend, which can be explained by the way the DHFM extracts factors. Namely, the first factor the DHFM extracts, which is our 𝑊, reflects the movement of the first variable it addresses in the estimation, which in our case was the hours of sunshine per day. Movements of the other variable will be relative to the direction of the movement of sunshine. As was to be expected, weeks with high values fell in spring and summer, whereas the weeks lower in the figure fell in autumn and winter. The non-stationary nature of the levels of 𝑊 implies that estimated regressions from a VAR involving these levels cannot be trusted (Lütkepohl, 2005). To test whether the levels regressions themselves are trustworthy, we perform a Johansen test for cointegration and find second-order cointegration of our time series. To overcome this problem, rather than directly estimating a VAR, we first estimate a Vector Error-Correction Model (VECM) for every draw of 𝐹, as suggested by Lütkepohl (2005) and Pfaff (2006). Subsequently, we transform the VECMs back into VARs in RStudio, on the estimates of which our forecasts will be based. We use information criteria to decide on the number of lags to be used and conclude that a single lag fits the data best. Similarly, we also use only one lag for the AR process, so that equation 18, as denoted in the previous chapter, reduces to an AR(1) process as follows:

\[
y_t = \beta_0 + \beta_1 y_{t-1} + \epsilon_t
\]
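As a minimal sketch of how such an AR(1) benchmark can be estimated and iterated forward, the example below fits the two coefficients by ordinary least squares and produces multi-step forecasts. The estimation in the thesis was carried out in RStudio; the function names and the noise-free example series here are hypothetical.

```python
def fit_ar1(y):
    """OLS estimates (beta0, beta1) for y_t = beta0 + beta1 * y_{t-1} + eps_t."""
    x, z = y[:-1], y[1:]  # regressor y_{t-1} and response y_t
    n = len(x)
    mx, mz = sum(x) / n, sum(z) / n
    sxx = sum((a - mx) ** 2 for a in x)
    sxz = sum((a - mx) * (b - mz) for a, b in zip(x, z))
    beta1 = sxz / sxx
    beta0 = mz - beta1 * mx
    return beta0, beta1


def forecast_ar1(last_value, beta0, beta1, horizon):
    """Iterate the fitted equation forward; each forecast feeds the next one."""
    path, current = [], last_value
    for _ in range(horizon):
        current = beta0 + beta1 * current
        path.append(current)
    return path


# Hypothetical noise-free series generated with beta0 = 2 and beta1 = 0.5.
series = [10.0]
for _ in range(9):
    series.append(2.0 + 0.5 * series[-1])

b0, b1 = fit_ar1(series)
print(round(b0, 6), round(b1, 6))
print(forecast_ar1(series[-1], b0, b1, horizon=3))
```

Iterated forecasts of a stationary AR(1) converge to the long-run mean β0 / (1 − β1), which for data scaled around zero lies close to zero.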

Figure 15 shows the forecast of sales for weeks 140-169 for Karvan Cevitam at Albert Heijn, including 5% confidence bands in black, the actual sales during that period in bold black, and the benchmark AR-predicted sales as a dashed line. Forecasts for other brands have been included in Figure 3 of the appendix. From Figure 15 it is evident that the FAVAR provides a much better prediction than the AR process. Even though the vast majority of the observed values still fall outside the confidence bands, the FAVAR forecast approximates the actual values much more closely than the AR forecast, which quickly seems to tend to zero.

A note on the above figure is that, in the estimation of the VECM, the factors 𝐹 and 𝑊 have been treated as endogenous variables. It would be more sensible to treat the factors as exogenous, for the following reason: in a VAR, everything depends on everything, in the sense that all variables in the model have a feedback relation with one another. We do assume that 𝐹 and 𝑊 influence sales, but not vice versa. Hence, having 𝐹 and 𝑊 as endogenous variables makes little sense. Therefore, we also estimate the FAVAR treating 𝐹 as an exogenous variable. Regrettably, the VAR process does not allow both factors to be treated as exogenous, because sales would then be the only endogenous variable and the model would reduce to a standard autoregression. For this reason, we also estimate a standard autoregression of the following form:

\[
S_t = \beta_0 + \beta_1 S_{t-1} + \beta_2 F_t + \beta_3 W_t + \epsilon_t \tag{20}
\]
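A regression of this form can be estimated by ordinary least squares. The sketch below solves the normal equations with a small hand-written solver so that it runs without external libraries; it is an illustration under stated assumptions rather than the thesis code, and all function names and data are hypothetical. The example data follow the equation exactly, so the coefficients used to generate them should be recovered.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_sales_regression(sales, f, w):
    """OLS for S_t = b0 + b1*S_{t-1} + b2*F_t + b3*W_t + eps_t."""
    rows = [[1.0, sales[t - 1], f[t], w[t]] for t in range(1, len(sales))]
    y = sales[1:]
    k = 4
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, y)) for i in range(k)]
    return solve_linear_system(xtx, xty)


# Hypothetical factor values; sales follow the equation exactly with
# b0 = 5, b1 = 0.4, b2 = 3, b3 = -2.
f = [0.3, -1.2, 0.8, 1.5, -0.7, 0.1, -0.4, 2.0, -1.6, 0.9, 0.2, -0.5]
w = [1.0, 0.6, -0.2, -0.9, -1.1, -0.3, 0.5, 1.2, 0.8, -0.1, -1.0, -0.6]
sales = [10.0]
for t in range(1, len(f)):
    sales.append(5.0 + 0.4 * sales[t - 1] + 3.0 * f[t] - 2.0 * w[t])

coefficients = fit_sales_regression(sales, f, w)
print([round(c, 6) for c in coefficients])
```

Because 𝐹 and 𝑊 enter contemporaneously, forecasting with this equation requires values of both factors for the forecast weeks themselves.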

Plotting the same forecast again gives us Figures 16 and 17, now without the AR forecast and with the confidence bands shaded grey. Forecasts for the other brands have been included in Figure 4 and Figure 5 of the appendix.

Figure 15: Sales forecasts of AR and FAVAR for Karvan Cevitam

We observe a clear distinction between Figures 15 and 16. Whereas the forecast in Figure 15 quickly approached a threshold value at which it stayed, this forecast jumps up and down. Generally, the actual data appears to fall between the confidence bands more often, and the forecast sometimes seems to capture spikes in the data. However, it also forecasts spikes that were not matched by spikes in the actual data, which might signal that some of the captured spikes could be attributed to sheer luck. Hence, this graph does not provide enough evidence to draw conclusions about the model's forecasting capability.

Figure 16: Sales forecasts of AR and FAVAR for Karvan Cevitam – F as exogenous variable

Figure 17 shows that forecasting using the standard regression as denoted in (20) produces the best results. Actual data falls between the confidence bands in virtually every week and the forecast seems to capture the spikes in sales accurately. An interesting observation is that the forecast takes negative values, as can be seen from the horizontal line drawn at zero. We suspect that this is due to the factors 𝐹 and 𝑊 having negative values, as they are scaled relative to their mean. In order to quantify the forecasting power of the models numerically, and in line with research by Hyndman & Koehler (2006) on forecasting accuracy measures, we computed the Mean Absolute Percentage Error (MAPE) for each of the models across brands. The results can be found in Table 5 below.

Brand            AR       FAVAR    FAVAR (F = exogenous)   Autoregression
Karvan Cevitam   99.81    63.73    55.42                   51.45
Private Label    96.48    30.37    34.20                   26.56
Raak             96.53    23.12    23.69                   28.35
Slimpie          92.00    29.39    18.13                   15.19
Teisseire        85.26    104.44   90.84                   38.10
Sum              470.08   251.05   222.28                  159.65

Inspecting the table, a clear picture emerges. It appears that, in terms of MAPE summed across brands, autoregression including the factors outperforms the FAVAR with 𝐹 exogenous, which outperforms the regular FAVAR with endogenous factors, which in turn outperforms AR. Therefore, we conclude that, of the models used in this research, standard autoregression appears superior with respect to forecasting sales in this context. We do note, however, that if more variables were included in the FAVAR, allowing both 𝐹 and 𝑊 to be treated as exogenous, a FAVAR could well outperform our best model due to its more comprehensive modeling capability.
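For reference, the accuracy measure and the ranking can be sketched as follows. The MAPE function uses the standard definition (mean of absolute percentage errors, in percent); the small actual/forecast series is hypothetical, whereas the per-model values are copied from Table 5. The dictionary keys and variable names are illustrative only.

```python
def mape(actual, forecast):
    """Mean Absolute Percentage Error in percent; actual values must be non-zero."""
    if len(actual) != len(forecast) or not actual:
        raise ValueError("actual and forecast must be non-empty and of equal length")
    total = sum(abs((a - f) / a) for a, f in zip(actual, forecast))
    return 100.0 * total / len(actual)


# Hypothetical weekly unit sales and forecasts (not thesis data).
actual = [100.0, 200.0, 400.0]
forecast = [110.0, 180.0, 400.0]
example_mape = mape(actual, forecast)  # (10% + 10% + 0%) / 3

# MAPE values as reported in Table 5, brands in table order.
table5 = {
    "AR":                  [99.81, 96.48, 96.53, 92.00, 85.26],
    "FAVAR":               [63.73, 30.37, 23.12, 29.39, 104.44],
    "FAVAR (F exogenous)": [55.42, 34.20, 23.69, 18.13, 90.84],
    "Autoregression":      [51.45, 26.56, 28.35, 15.19, 38.10],
}
totals = {model: round(sum(values), 2) for model, values in table5.items()}
ranking = sorted(totals, key=totals.get)  # lowest summed MAPE first

print(round(example_mape, 2), totals, ranking)
```

Note that the ranking is based on the summed MAPE; for an individual brand such as Raak the order of the models can differ.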

5. Conclusion and recommendations

The main aim of this research was to explore the DHFM in a marketing setting. This section is dedicated to outlining the conclusions that we draw from the obtained results. We estimated a Dynamic Hierarchical Factor Model (DHFM) to decompose lemonade price and advertising data into four different hierarchical levels. In particular, we used a four-level modeling approach with a common level, a block-specific price/advertising level, a sub-block-specific chain level, and an idiosyncratic level pertaining to lemonade brands. Firstly, we used a variance decomposition to lay bare the main drivers of variance in price and advertising across chains and brands. We observed significant differences between brands and chains. More specifically, we noted that supermarket chains derive most of their variance in price and advertising from the brand-specific component, followed by the chain-specific component. An exception to this rule was Albert Heijn, which has substantial shares of its variance in price and advertising explained by the price/advertising-specific component. Brands mainly had price and advertising driven by the brand-specific component, except for Karvan Cevitam and Teisseire, which were also strongly driven by the chain-specific component.

Secondly, we conducted an impulse response analysis to analyze how different chains' price and advertising levels would change following a positive shock to the common factor. A significant response was found only for Albert Heijn, namely a short spike in prices and a slightly longer dip in advertising. Shocking the block-specific price and advertising levels individually showed that a positive shock to the common factor has the same effect on price as a shock to the price level. Advertising, by contrast, naturally responded positively to a shock to the advertising level, whereas it responded negatively to a shock to the common factor. Thirdly, the DHFM was used to create a price forecast for each brand at each chain. We conclude that the DHFM, in the given context, is not very well able to predict future prices. We do note, however, that including more lags in the model provided better forecasts. Lastly, we provided a forecast of sales using a Factor-Augmented Vector Autoregression (FAVAR) with two factors, the marketing-mix factor and a weather factor, which we validated against the actual data to find that the FAVAR was not capable of capturing movements in sales very well. However, compared to the forecasts and MAPE based on regular autoregression (AR), the FAVAR performed vastly better, underlining the added value of including the two factors in the model. Furthermore, we explored two other types of modeling: a FAVAR with one exogenous factor, and a regular autoregression including both factors. We found that the regular autoregression forecasted sales best, followed by the FAVAR with one exogenous factor, in turn followed by the regular FAVAR and AR.


6. Limitations and directions for further research

This study has several limitations. Firstly, we ran a simple version of the DHFM, extracting only a single factor for each level of the hierarchy and using only one lag at each level, except for the second price forecast, where we used four lags. Future research could use a more elaborate version of the DHFM by selecting the optimal number of lags using AIC and extracting more than a single factor for each level to better reflect reality. Doing so would likely improve the price forecasts, as our results with four lags suggest, as well as the reliability of the impulse response analysis, amongst others. In addition, given that the variance decomposition points toward sometimes large differences in drivers of variance across chains and brands, we emphasize that a different model structure could add to the usefulness of the DHFM. For instance, the sub-block and idiosyncratic levels could be switched in order to get a better brand-specific perspective. Alternatively, a valuable extension of the model would be to include data from multiple countries and subsequently structure the model in such a way that researchers can observe region-specific or nation-specific differences.


7. References

Bai, J., Li, K., & Lu, L. (2016). Estimation and Inference of FAVAR Models. Journal of Business & Economic Statistics, 34(4), 620–641.

Barker, A., Hawton, K., Fagg, J., & Jennison, C. (1994). Seasonal and weather factors in parasuicide. The British Journal of Psychiatry: The Journal of Mental Science, 165(3), 375–380.

Bernanke, B. S., Boivin, J., & Eliasz, P. (2005). Measuring the Effects of Monetary Policy: A Factor-Augmented Vector Autoregressive (FAVAR) Approach. Quarterly Journal of Economics, 387–422.

Danaher, P. J., Bonfrer, A., & Dhar, S. (2008). The Effect of Competitive Advertising Interference on Sales for Packaged Goods. Journal of Marketing Research, 45(2), 211–225.

Dehmamy, K., & Halberstadt, A. (2015). Modeling the term structure of interest rates in a dynamic hierarchical factor model. Essays on Bayesian Modeling in Marketing and Economics.

Donovan, R. (1994). Store atmosphere and purchasing behavior. Journal of Retailing, 70(3), 283–294.

Förster, M., Jorra, M., & Tillmann, P. (2014). The dynamics of international capital flows: Results from a dynamic hierarchical factor model. Journal of International Money and Finance, 48, 101–124.

Fast Moving Consumer Goods (FMCG) Market Report 2017-2027. (2017, October). Retrieved from https://www.reportbuyer.com/product/5245479/fast-moving-consumer-goods-fmcg-market-report-2017-2027.html#free-sample

Gupta, R., Jurgilas, M., & Kabundi, A. (2010). The effect of monetary policy on real house price growth in South Africa: A factor-augmented vector autoregression (FAVAR) approach. Economic Modelling, 27(1), 315–323.

Gupta, R., & Kabundi, A. (2010). The effect of monetary policy on house price inflation. Journal of Economic Studies, 37(6), 616–626.

Howell, D. C. (2008). The analysis of missing data. In Outhwaite, W. & Turner, S. Handbook of Social Science Methodology. London: Sage.

Hyndman, R. J., & Koehler, A. B. (2006). Another look at measures of forecast accuracy. International Journal of Forecasting, 22, 679-688.

Kamstra, M. J., Kramer, L. A., & Levi, M. D. (2003). Winter Blues: A SAD Stock Market Cycle. American Economic Review, 93(1), 324–343.

Leeflang, P. S. H., Wieringa, J. E., Bijmolt, T. H. A., & Pauwels, K. H. (2013). Modeling Markets: analyzing marketing phenomena and improving marketing decision making. New York, NY: Springer-Verlag.

Lütkepohl, H. (2005). New Introduction to Multiple Time Series Analysis. Berlin, Heidelberg: Springer Berlin Heidelberg.

Malhotra, N. K. (2010). Marketing Research: An Applied Orientation. Pearson.

Marcellino, M., & Sivec, V. (2016). Monetary, fiscal and oil shocks: Evidence based on mixed frequency structural FAVARs. Journal of Econometrics, 193(2), 335–348.

Moench, E., Ng, S., & Potter, S. (2009). Dynamic hierarchical factor models. Staff Reports 412, Federal Reserve Bank of New York.

Moench, E., Ng, S., & Potter, S. (2013). Dynamic Hierarchical Factor Models. Review of Economics and Statistics, 95(5), 1811–1817.

Moench, E., & Ng, S. (2011). A hierarchical factor analysis of U.S. housing market dynamics. The Econometrics Journal, 14(1), C1–C24.

Nijs, V. R., Dekimpe, M. G., Steenkamp, J.-B. E. M., & Hanssens, D. M. (2001). The Category-Demand Effects of Price Promotions. Marketing Science, 20(1), 1–22.

Pfaff, B. (2006). Analysis of integrated and cointegrated time series with R. Springer.

Pituch, K. A., & Stevens, J. P. (2016). Applied Multivariate Statistics for the Social Sciences: Analyses with SAS and IBM's SPSS. New York, NY: Routledge.

Sales value of non-alcoholic drinks in the Netherlands from 1st quarter 2013 to 2nd quarter 2015 (in million euros). (2018). Retrieved from https://www.statista.com/statistics/330180/sales-value-non-alcoholic-beverages-netherlands-nl-quarterly/

Saunders, E. M., Jr. (1993). Stock Prices and Wall Street Weather. The American Economic Review, 83(5), 1337–1345.

Schafer, J.L., & Graham, J. W. (2002). Missing Data: Our View of the State of the Art. Psychological Methods, 7(2), 147-177.

Sims, C. A. (1980). Macroeconomics and Reality. Econometrica, 48(1), 1–48.

Spies, K., Hesse, F., & Loesch, K. (1997). Store atmosphere, mood and purchasing behavior. International Journal of Research in Marketing, 14(1), 1–17.

Stoupel, E., Abramson, E., & Sulkes, J. (1999). The effect of environmental physical influences on suicide: How long is the delay? Archives of Suicide Research, 5(3), 241–244.

Vargas-Silva, C. (2008). The effect of monetary policy on housing: a factor-augmented vector

8. Appendix

Variable      N     Mean       St. Dev.   Min    Max
UnitsalesPL   845   34882.67   28049.27   5158   180432
PromoPL       845   11.47      35.85      0      242
UnitsalesKC   845   66148.01   98404.75   5555   911867
PromoKC       845   24.99      57.74      0      268
UnitsalesRA   845   32976.81   29850.19   5589   287328
PromoRA       845   8.09       30.43      0      212
UnitsalesSL   845   13516.93   12388.32   1805   80002
PromoSL       845   7.49       31.21      0      200
UnitsalesTS   845   3947.65    4863.93    4      43021
PromoTS       845   2.99       18.30      0      191

Variable   N     Mean       St. Dev.   Min       Max
AHSales    169   323388.8   160853.1   178409    1045886
AHPromo    169   32.56      65.58      0         268
JUSales    169   253692.5   61930.39   141026    502524
JUPromo    169   52.94      81.79      0         364
PLUSales   169   92093.03   43758.89   49338.3   336703
PLUPromo   169   54.98      77.28      0         268
COSales    169   44907.36   13360.05   26542.1   108082
COPromo    169   73.43      77.77      0         254
EMSales    169   43278.65   18804.86   22262     154732
EMPromo    169   61.24      73.89      0         246

Table 1: Descriptive statistics of the lemonade data aggregated over chains.


9. Appendix II

[Figure panels: JUMBO PRICE, PLUS PRICE, COOP PRICE, EMTE PRICE]

[Figure panels: AH PRICE, AH PROMO, JUMBO PRICE, PLUS PRICE, PLUS PROMO, COOP PRICE, EMTE PRICE]
