DYNAMIC HIERARCHICAL FACTOR MODEL (DHFM) APPLIED IN THE DUTCH LEMONADE MARKET

An application of the DHFM to the prices of lemonade in The Netherlands.

Master thesis, MSc Marketing, specialisation Marketing Intelligence

University of Groningen, Faculty of Economics and Business

Author:
Colin Kruishaar
S3274136
Zandsteenlaan 55
9743 TJ Groningen
Tel: +31 614814622

Supervisor:
Dr. K. Dehmamy
University of Groningen

Second supervisor:
Prof. Dr. J.E. Wieringa

PREFACE

When I was 17 years old, I finished secondary school and decided not to go to university. That was one of the biggest mistakes of my life. Despite a big detour, I still managed to reach university and passed all the exams.

It feels like a long time ago, but in February 2018 I started this bumpy ride called writing a master thesis. I really wanted to work with programs like R and MATLAB, so it was easy to decide to enroll for this topic, despite the fact that I had never heard of a Dynamic Hierarchical Factor Model before. As always it was hard for me to get off to a flying start, but after a while I became more involved in the topic and even had a lot of fun during all the hours of hard work.

I want to thank a lot of people, but foremost my supervisor dr. Keyvan Dehmamy. Without his help, I would not have been able to write this thesis. Keyvan was always available for questions. If something was not easy to answer digitally, there was the possibility to make an appointment at his office.

The second person I would like to thank is my mother Karin. Since I was a little boy she has always supported me. When I had doubts about my future, she talked to me and told me I have to chase my dream, whatever it costs.

I also appreciate all my friends who sympathized with me. Sometimes just a little remark such as “you can do it” gave me that extra boost to continue working on my thesis.

Finally, I also want to thank prof. dr. Jaap Wieringa for reading my thesis and for being my second supervisor.

Groningen, July 24, 2018

TABLE OF CONTENTS

1. Introduction

2. Literature overview

2.1 Dynamic Hierarchical Factor Model

2.2 Importance of analysing common dynamics

2.3 Structure of the model

2.4 Dynamics

2.5 Predictions

2.6 Shock in the market

3. Research design

3.1 Data

3.2 Dynamic Hierarchical Factor Model

3.3 Estimation

3.4 Factor Augmented Vector Auto Regression

4. Results

4.1 Common factors

4.2 Variance decomposition

4.3 Impulse response

4.4 Forecasting

4.5 Factor Augmented Vector Auto Regression

5. Conclusion

6. Limitations and further research

References

Appendices

A. Forecasts for each brand at each chain

1. INTRODUCTION

In early prehistoric times, people needed to hunt for food and search for drinkable liquids in order to survive. Nowadays people still cannot live without liquids, but they no longer need to hunt. In the twenty-first century consumers have a lot of choices and do not always like to drink only water. One of the most often chosen drinks is a soft drink. A soft drink can be defined as a non-alcoholic drink, especially one that is carbonated (Oxford, 2018). The soft drink category is one of the biggest consumer markets and consists of a lot of different beverages. One might first think of coke or fruit carbonates, but lemonade is also part of this product category. The highest share in this category is held by coke with 45%, followed by fruit carbonates (40%) and lemonade (15%) (Trinh, 2009).

In this paper the focus is on lemonade in The Netherlands. To be more precise, five different lemonade brands are investigated within five different well-known Dutch supermarkets. Soft drinks are very popular in The Netherlands, where they are the second most consumed non-alcoholic drink after coffee. Consumers in Europe drank 15.8 liters of lemonade per capita on average (FWS, 2017). The Dutch drank almost double that amount: 29.5 liters on average, which results in a total of 501.9 million liters of lemonade sold in The Netherlands in 2016 (FWS, 2017). Although lemonade is consumed slowly, it is bought regularly. This means consumers plan to buy lemonade, and retailers therefore try to attract these consumers with promotions (Fader & Lodish, 1990). Besides outside-store promotion for the brand, e.g. advertisements in a newspaper, retailers can promote their products with in-store displays as well.

Because of the competition and popularity of lemonade in The Netherlands, it is interesting to explore the dynamics of prices within this product category. In The Netherlands, producers and suppliers do not have much power in negotiations with supermarkets (SOMO, 2017). Supermarkets expand their store networks considerably and join each other in purchasing groups in order to enforce large discounts. Therefore it is

over time? Do they differ among the supermarkets? And are variations caused by common dynamics? Prices do change over time, but what is interesting for e.g. managers is to discover what causes these movements. It can be because of a shock in the market, for instance supermarket chains that become bigger. Differences among supermarkets are useful for negotiations for producers. Moreover, the discovery of common movements can give powerful information for producers in order to determine their (price) strategy.

The basis of this analysis is a time-series data set containing 169 weekly data points of the prices of five lemonade brands in five supermarket chains. When latent factors caused by general dynamics of price promotion development are found with the help of a Dynamic Hierarchical Factor Model, it is possible to discover whether there are any differences among price promotions in the entire market, at chain level and at brand level. Furthermore, sales predictions are made with the help of a Factor Augmented Vector Autoregressive analysis, which is based on the common factor F and lagged sales.

This paper reports several findings. A three-level Dynamic Hierarchical Factor Model revealed that common movements occur in the Dutch lemonade market. However, there are also differences in the behaviour of prices at chain and brand level. For instance, one of the chains is more dependent on movements at market level than the others. Therefore supermarket chains and lemonade brands react differently to a shock in the entire market. Moreover, the Factor Augmented Vector Auto Regression predicted sales better than a regular autoregression.

In chapter 2 a literature overview about this topic is presented. Chapter 3 includes the data description and the methods used. Chapter 4 contains all the results. Finally, the fifth and sixth chapters contain the conclusions and limitations.

2. LITERATURE OVERVIEW

2.1 Dynamic Hierarchical Factor Model

The model used in this paper to answer these questions is a Dynamic Hierarchical Factor Model (DHFM). Translating the number of dimensions within a data set into factors is useful for forecasting purposes. A DHFM allows the researcher to analyze a data set while making use of its structure. The DHFM is intended to discover latent factors in a data set. The advantage of this method in comparison with a regular factor analysis is that a DHFM takes the hierarchy in the structure into account (Moench et al., 2009). With the aid of the DHFM the researcher can investigate whether prices react differently at industry level, chain level or brand level. The DHFM is a factor model that captures between-block and within-block variations in data sets using common and block-specific factors. In other words, the data are split into a small number of blocks, which are analyzed in a three-level model (Moench et al., 2009). The first level will be block-specific, the second level subblock-specific and the third level idiosyncratic. According to Moench et al. (2009), within-block variations are caused by block-level factors and between-block variations by common factors. This structure allows for covariations that cannot be considered fully as common factors in a parsimonious way.

Moench et al. (2013) developed a four-level DHFM and applied it to a data set containing 447 economic time series in the United States in order to distinguish common movements from movements at block level. Their model reduced the data and allowed for heterogeneity between blocks. With this model, Moench et al. were able to show that during the recession at the end of the previous decade several economic activities reacted differently. Moreover, they found that the recovery process after this recession also differed between the economic activities.

2.2 Importance of analysing common dynamics

that allowed for global and country-specific factors. They used the earlier yield curve models of Nelson and Siegel (1987) and Diebold and Li (2006) and transformed them into a hierarchical model. This model indicated that global yield level factors accounted for variance in the dynamics of country bond yields. In more detail, Diebold et al. found that the dynamics of the German yield level closely matched the common global dynamics, while Japanese dynamics showed the opposite behaviour. In other words, the global and German yield curves have the same dynamics.

Diebold et al. also compared the common and idiosyncratic dynamics over time. All country-specific dynamics moved more towards the global dynamics over time, which could be explained by the globalization of the world economy. Common dynamics are therefore useful for discovering whether they influence the dynamics at the levels beneath. In our case the question is whether common dynamics exist in the Dutch lemonade market and, if so, whether they influence the dynamics at chain-specific and brand-specific level.

2.3 Structure of the model

One way to generate values for the factors and parameters of the model is a simple principal component analysis (PCA). It is a relatively old method, first published by Pearson more than 100 years ago. PCA aims to reduce the dimensionality of data while retaining as much variance as possible (Jolliffe, 2011). This PCA will generate the initial values for the factors and

2.4 Dynamics

The reason why it is necessary to include dynamics in the model, and not to use solely a hierarchical factor model (HFM), is that models without dynamics can lose information (Byrne et al., 2017). When prices are influenced by fundamental variables or contain important heterogeneity, the extracted factor of a regular HFM will probably not fully reflect the dynamics of the prices (Moench et al., 2013). Byrne et al. used a dynamic hierarchical factor model in order to decompose the price of commodities. They created a four-level model with common, sectoral, sub-sector and idiosyncratic components. Byrne et al. found evidence of co-movements in commodity prices and identified a common factor. They therefore conclude that it is important to model common variations at different levels, because different levels can share comparable trends but can also contain heterogeneity.

2.5 Predictions

In this paper, sales of the different brands will be predicted. One way to do that is to use a multiplicative model such as the SCAN*PRO model, which is useful for unit-by-unit estimations. It was created in order to discover short-term effects of price promotions and advertising expenditures on sales (Wittink et al., 1988). The SCAN*PRO model accounts for interaction effects and its parameters are elasticities. However, when using the SCAN*PRO model, the dynamic effects of a price cut are not shown. This is a real disadvantage, since a post-promotion dip is expected after a price promotion in which consumers bought a lot of the promoted product (Van Heerde et al., 2002). Moreover, the researcher is not only interested in the short-term effects of a price cut. Since the Dynamic Hierarchical Factor Model results in extracted factors, there is a need to include dynamic effects in the predictions. The researcher wants to investigate what the influence on future sales is when there are common movements in the entire market or at chain level.

extracted factors that contain common dynamics in order to make predictions. FAVAR models are used for forecasting purposes and model variables as the sum of the different components: common and idiosyncratic. The factors proceed from a Vector Autoregressive (VAR) process.

2.6 Shock in the market

After 92 weeks of the data set period, one major supermarket chain in The Netherlands closed its doors and most of its stores fell into the hands of one competitor. An extra insight can be generated if the variations in price movements before the closure of this supermarket and the growth of one of the other chains are compared with the variations afterwards. Previous research showed that after Wal-Mart entered the grocery market, it quickly became the largest grocer in the United States in 2002. Wal-Mart's prices are about 10% lower than those of its competitors. As a

3. RESEARCH DESIGN

3.1 Data

The data used in this paper are provided by a global information and measurement company. The data set contains 169 weekly data points in a time-series format. It starts at week 41 of 2013 and ends at week 52 of 2016. All the brands and chains are known to the researcher, but due to the confidential character of the information all brands and chains are anonymized. Originally six different lemonade brands and six different chains were available in the data set, with information about the total sales, base price, price per unit and advertising activities. According to figure 1 there are two low-end brands which are under the overall mean price, two high-end brands which are more expensive and two brands that are close to the overall mean price.

Figure 1: The prices of the six different lemonade brands across all supermarket chains.

Figure 2: Market shares of all the six

The descriptive statistics displayed in table 1 show the main characteristics of the five brands used in this data set across all five chains.

Brand  N    Mean price  SD     Median price  Maximum price
1      845  €2.684      0.239  €2.726        €2.976
2      845  €1.859      0.296  €1.761        €2.54
3      845  €1.222      0.097  €1.228        €1.406
4      845  €1.801      0.107  €1.81         €2.048
5      845  €2.638      0.165  €2.661        €3.037

Table 1: Descriptives of the prices per unit of each lemonade brand

There are a total of 4,225 observations in this dataset containing 845 for each brand. These 845 observations are further split into 169 observations per brand at one specific chain. The total sales are available for every brand per chain on a weekly basis. For prediction purposes (FAVAR analysis) the weekly total sales per brand are aggregated and normalized for anonymization.

As figure 3 shows, brand 1 can be described as the market leader, with the most sales in almost all 169 weeks. Brand 2 and brand 3 are mid-level players in the Dutch lemonade market, and brand 4 and brand 5 represent the smallest part. Apart from the differences in sales among the brands, another remarkable point is the pattern of the sales. Enormous peaks occur due to price promotions and other commercial activities.

Finally, it was investigated whether there are any common dynamics within the data set. A principal component analysis (PCA) is suited to this question. PCA analyses observations described by inter-correlated quantitative dependent variables and displays patterns of similarity within the data (Abdi & Williams, 2010). Table 2 shows the results of the PCA of all the brands of all the chains. The first two components explain 69% of the variance. Moreover, when the third component is added, 87% of the variance is explained, which indicates that there are common dynamics in the prices.

                        PC1     PC2     PC3     PC4     PC5
Standard deviation      0.1278  0.0085  0.0079  0.0489  0.0459
Proportion of Variance  0.4777  0.2102  0.1809  0.0698  0.0615
Cumulative Proportion   0.4777  0.6879  0.8688  0.9385  1

Table 2: Principal component analysis of all the brands of all chains

Chain  PC1     PC2     PC3      PC4      PC5
1      0.4646  0.7109  0.8733   0.94282  1
2      0.4981  0.7248  0.8667   0.94513  1
3      0.589   0.7522  0.8855   0.96773  1
4      0.5231  0.8136  0.89419  0.96734  1
5      0.5295  0.8052  0.9041   0.96341  1

Table 3: Cumulative proportions of variance of all the brands in each chain separately.
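The cumulative proportions reported in tables 2 and 3 can be illustrated with a minimal Python sketch. Because the price data are confidential, the example runs on synthetic data; the function name `cumulative_variance` and all simulated values are assumptions made for illustration and do not reproduce the figures in the tables.

```python
import numpy as np

def cumulative_variance(prices: np.ndarray) -> np.ndarray:
    """Cumulative proportion of variance explained by each principal component.

    prices: array of shape (T, N), one column per price series.
    """
    centered = prices - prices.mean(axis=0)
    # Singular values of the centered data give the component variances.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    variances = singular_values ** 2 / (prices.shape[0] - 1)
    return np.cumsum(variances) / variances.sum()

# Synthetic stand-in for the confidential data: 169 weeks, 5 series that
# share one common component plus series-specific noise (hypothetical values).
rng = np.random.default_rng(0)
common = rng.normal(size=(169, 1))
prices = 2.0 + 0.1 * common + 0.05 * rng.normal(size=(169, 5))
cumulative = cumulative_variance(prices)
print(np.round(cumulative, 4))
```

When the series share a strong common component, the first entry of the output is already large, which is the pattern that the text interprets as evidence of common dynamics.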

3.2 Dynamic Hierarchical Factor Model

Level 1: The Dutch lemonade market (F)

Level 2        Level 3
Chain 1 (G_1)  Brand 1 (x_11), Brand 2 (x_12), Brand 3 (x_13), Brand 4 (x_14), Brand 5 (x_15)
Chain 2 (G_2)  Brand 1 (x_21), Brand 2 (x_22), Brand 3 (x_23), Brand 4 (x_24), Brand 5 (x_25)
Chain 3 (G_3)  Brand 1 (x_31), Brand 2 (x_32), Brand 3 (x_33), Brand 4 (x_34), Brand 5 (x_35)
Chain 4 (G_4)  Brand 1 (x_41), Brand 2 (x_42), Brand 3 (x_43), Brand 4 (x_44), Brand 5 (x_45)
Chain 5 (G_5)  Brand 1 (x_51), Brand 2 (x_52), Brand 3 (x_53), Brand 4 (x_54), Brand 5 (x_55)

Table 4: Structure of the three-level Dynamic Hierarchical Factor Model with five blocks

The structure of the DHFM is shown in table 4. The model consists of five blocks (B = 5). Each block represents one of the supermarket chains. Every block contains five time series, representing the lemonade brands (N_b = 5). Consequently the model consists of 25 time series (N = N_b × B) in total, each with 169 observations (T = 169). In terms of equations, this model can be displayed as follows:

(1) X_{bit} = \Lambda_{G.bi}(L) \, G_{bt} + e_{Xbit}

(2) G_{bt} = \Lambda_{F.b}(L) \, F_t + e_{Gbt}

(3) \psi_{F.k}(L) \, F_{kt} = \epsilon_{Fkt}

at time t with the polynomial of lag order L. Lastly, e_{Xbit} are the idiosyncratic residuals. The block factors of block b at time t are again denoted G_{bt} in the second equation. The common factor F and the variation all blocks share with each other is denoted \Lambda_{F.b}(L) F_t, with the polynomial of lag order L. The components that consist of chain-specific variations are denoted e_{Gbt}. The last equation represents the top level of the hierarchy; the variations of all factors in the Dutch lemonade market are denoted F_t.

With this model, the researcher is able to determine whether the prices of all brands in all chains follow common, chain-specific or brand-specific dynamics. The variations can then be decomposed into shares of F, G and X with a variance decomposition. Moreover, with the help of an impulse response analysis the reaction of prices to an exogenous shock in the Dutch lemonade market can be measured at market, chain and brand level.
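To make the three-level hierarchy concrete, the following Python sketch simulates data with the same dimensions as this study (B = 5, N_b = 5, T = 169). It is only an illustration under simplifying assumptions: a single common factor, static loadings instead of lag polynomials, AR(1) dynamics, and hypothetical parameter values. The function name `simulate_dhfm` is not taken from any library.

```python
import numpy as np

def simulate_dhfm(T=169, B=5, N_b=5, seed=0):
    """Simulate a hierarchy: market factor F -> chain factors G_b -> series X_bi.

    Static loadings and AR(1) dynamics are simplifying assumptions made
    for illustration; they are not the specification estimated in the text.
    """
    rng = np.random.default_rng(seed)
    # Level 1: common (market) factor following an AR(1) process.
    F = np.zeros(T)
    for t in range(1, T):
        F[t] = 0.7 * F[t - 1] + rng.normal()
    # Level 2: each chain factor loads on F plus a chain-specific AR(1) part.
    lam_F = rng.uniform(0.3, 1.0, size=B)
    e_G = np.zeros((T, B))
    for t in range(1, T):
        e_G[t] = 0.5 * e_G[t - 1] + rng.normal(size=B)
    G = F[:, None] * lam_F + e_G
    # Level 3: each brand series loads on its chain factor plus noise.
    lam_G = rng.uniform(0.3, 1.0, size=(B, N_b))
    e_X = rng.normal(scale=0.5, size=(T, B, N_b))
    X = G[:, :, None] * lam_G + e_X
    return F, G, X.reshape(T, B * N_b)

F, G, X = simulate_dhfm()
print(F.shape, G.shape, X.shape)  # (169,) (169, 5) (169, 25)
```

In this sketch a brand series is only related to the market factor through its chain factor, which is the property that distinguishes the hierarchical structure from a regular factor model.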

3.3 Estimation

According to Moench et al. (2011), Markov Chain Monte Carlo (MCMC) with the Gibbs sampling algorithm is a suitable estimation method. MCMC is an iterative process that draws coefficients and factors. The Gibbs sampling algorithm estimates the latent dynamic factors. In this paper three common factors are chosen: K_F = 3. Each block has one block-specific factor: K_{G,b} = 1. The lag order in this research is two: q_{F,G,X} = 2. In order to get a useful sample, 100,000 draws are made. The first 50,000 draws serve as burn-in and are not used in this analysis. From the second 50,000 draws every 50th draw is saved. This results in 1,000 draws for each parameter at each point in time.
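The burn-in and thinning scheme described above can be checked with a short Python sketch. The helper `kept_draw_indices` is a hypothetical name, and the zero-based indexing is an assumption about how the draws are stored.

```python
def kept_draw_indices(total_draws=100_000, burn_in=50_000, thin=50):
    """Zero-based indices of the draws kept after burn-in and thinning."""
    # Keep every `thin`-th draw, counting from the first draw after burn-in.
    return list(range(burn_in + thin - 1, total_draws, thin))

kept = kept_draw_indices()
print(len(kept))  # 1000
```

Thinning reduces the autocorrelation between successive stored draws, at the cost of discarding most of the sampled values.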

3.4 Factor Augmented Vector Autoregressive Model

solely data for Y_t will not completely capture the additional information that is necessary to model the dynamics correctly. However, the information that is essential in modeling the dynamics can be used when F_t is a K x 1 vector of unobserved factors with K being small. The next expression explains the joint variations of (F_t, Y_t):

\begin{bmatrix} F_t \\ Y_t \end{bmatrix} = \Phi(L) \begin{bmatrix} F_{t-1} \\ Y_{t-1} \end{bmatrix} + \nu_t

4. RESULTS

In this chapter the outcomes of the Dynamic Hierarchical Factor Model are presented. This DHFM consists of three levels: the upper level represents the common dynamics in the whole Dutch lemonade market, the middle level consists of five blocks, each corresponding to a supermarket chain, and the last level represents the prices per unit of each lemonade brand (in €) at all the supermarket chains, with N = 25 and T = 169. The Markov Chain Monte Carlo procedure with the Gibbs sampler made 100,000 draws. The first 50,000 are discarded as burn-in. Of the second 50,000 draws, every 50th draw is stored, which resulted in 1,000 draws that can be used for posterior analysis.

4.1 Common factors

Since K_F = 3 was chosen, figure 4 shows the three common factors F. These are obtained by extracting values for F from the distribution sampled with the Gibbs sampling method. The lag order used for the equations is two (q_F = 2 and l_F = 2).

The graphs represent the common dynamics of the price per unit of the lemonade brands across all supermarket chains with the 5% confidence bands. The first two common factors show a decline around week 92, when one of the supermarket chains disappeared from the market and was taken over by the second biggest supermarket chain. This finding is in line with the theory of Basker and Noel (2009) mentioned earlier. According to them, when one big player in the market lowers its prices, others will follow. Another interesting result can be seen in the bottom two graphs. Around week 140, a steep decline occurred. This can be explained by the time of the year at that data point: June 2016, in other words the start of the summer.

4.2 Variance decomposition

The variance decomposition is an analysis that is mainly used in economics; for instance, Campbell (1990) used it in order to find out what moves the stock market. In our case the variance decomposition gives insight into the impact of each level of the hierarchical model on the prices per unit of lemonade. It determines how much importance is assigned to each level. The following equation displays the weighted sum of the variances of the common, block-specific and variable-specific components (Moench et al., 2013):

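\mathrm{Var}(X_{bi}) = \gamma_F \mathrm{Var}(F) + \gamma_G \mathrm{Var}(e_{Gb}) + \gamma_X \mathrm{Var}(e_{Xbi})

Moreover, all shares can be defined as:

Share_F = \gamma_F \mathrm{Var}(F) / \mathrm{Var}(X_{bi})

Share_G = \gamma_G \mathrm{Var}(e_{Gb}) / \mathrm{Var}(X_{bi})

Share_X = \gamma_X \mathrm{Var}(e_{Xbi}) / \mathrm{Var}(X_{bi})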

Chain  Share_F          Share_G          Share_X
1      0.1622 (0.0317)  0.1441 (0.0231)  0.6937 (0.0472)
2      0.0180 (0.0094)  0.4905 (0.0431)  0.4915 (0.0420)
3      0.0419 (0.0211)  0.3287 (0.0615)  0.6294 (0.0626)
4      0.0328 (0.194)   0.3462 (0.0643)  0.6210 (0.0679)
5      0.0220 (0.01)    0.3240 (0.0343)  0.6540 (0.035)

Table 5: The variance decomposition

In table 5 and figure 5, the results of the variance decomposition are given. The brand-specific component is most responsible for changes in the price per unit in every chain. However, its share differs considerably among the chains, from 49.15% for chain 2 to 69.37% for chain 1. Moreover, it is remarkable that mainly chain 1 is influenced by the common factor, with 16.22%, while the others range from 1.8% to 4.19%. Chain 2 appears to be the chain with the highest independence in price determination. While chain 1 and chain 2 show different results, chains 3, 4 and 5 are much more similar to each other.
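How such shares arise can be sketched in Python for a simplified static version of the model. The sketch assumes X = lam_G * (lam_F * F + e_G) + e_X with mutually uncorrelated components, so that the weights become gamma_F = (lam_G * lam_F)^2, gamma_G = lam_G^2 and gamma_X = 1. The function name and all parameter values are hypothetical and are not the estimates in table 5.

```python
def variance_shares(lam_F, lam_G, var_F, var_eG, var_eX):
    """Shares of Var(X) due to the common, chain and brand components.

    Assumes a static model X = lam_G * (lam_F * F + e_G) + e_X with
    mutually uncorrelated F, e_G and e_X (an illustrative simplification).
    """
    part_F = (lam_G * lam_F) ** 2 * var_F   # gamma_F * Var(F)
    part_G = lam_G ** 2 * var_eG            # gamma_G * Var(e_G)
    part_X = var_eX                         # gamma_X * Var(e_X)
    total = part_F + part_G + part_X
    return part_F / total, part_G / total, part_X / total

# Hypothetical parameter values, chosen only to illustrate the calculation.
share_F, share_G, share_X = variance_shares(
    lam_F=0.5, lam_G=0.8, var_F=1.0, var_eG=1.0, var_eX=1.0
)
print(round(share_F, 4), round(share_G, 4), round(share_X, 4))
```

By construction the three shares sum to one, which is also the case for every row of table 5.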

4.3 Impulse response

After the variance of each block has been obtained, an impulse response analysis can be conducted. This shows how a shock in the entire lemonade market influences the chain-specific and brand-specific factors, in this case the price per unit of lemonade. In this impulse response analysis, the first factor is positively shocked by one standard deviation.

Figure 6: Impulse response of the common factor with shock on the first common factor

In figure 6 it is clear that a positive shock results in a positive response in the common factor. This positive effect disappears after approximately five weeks.
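The decaying pattern of such a response can be illustrated with a small Python sketch for a single factor that follows an autoregressive process of order two. The coefficients 0.4 and 0.1 are hypothetical and are not the posterior estimates of this study; the function `impulse_response` is likewise only an illustration.

```python
def impulse_response(phi, shock=1.0, horizon=10):
    """Response of an AR(p) process to a one-off shock at period 0."""
    p = len(phi)
    response = [shock]
    for h in range(1, horizon + 1):
        # Apply the AR recursion to the responses of the previous periods.
        value = sum(phi[j] * response[h - 1 - j] for j in range(min(p, h)))
        response.append(value)
    return response

# Hypothetical AR(2) coefficients for a stationary factor.
irf = impulse_response(phi=[0.4, 0.1], shock=1.0, horizon=10)
print([round(v, 4) for v in irf])
```

For a stationary process the response shrinks towards zero, so the effect of the shock fades after a limited number of periods.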

Figure 7: Impulse response of the five different blocks with shock on the first common factor (panels: Chain 1, Chain 2, Chain 3, Chain 4, Chain 5)

Figure 8: Impulse response of each brand from each chain with shock on the first common factor (panel columns: Brand 1, Brand 2, Brand 3, Brand 4, Brand 5; one row of panels per chain)

At brand level, more positive responses are found. Chain 1, which was most dependent on the common factor as shown in the variance decomposition, shows a lot of positive responses. Especially brands 1, 3 and 5 respond positively after a shock on the first common factor, which results in a higher price per unit of these brands within this chain. This effect disappears as fast as the positive response of the common factor did, after five weeks. Most of the other brands show no or very little response to a positive shock, and therefore their prices per unit will not be influenced strongly.

4.4 Forecasting

Finally, a forecast for the prices per unit per brand can be made with the extracted common factors F. To be precise, the first 100 weeks are used to make a forecast for the remaining 69 weeks.

Figure 9: Forecast of the common movement with 95% confidence bands

Figure 10: Forecast of five different blocks with 95% confidence bands (one panel per chain)

The forecast of the common movement is shown in figure 9, and figure 10 shows the block-specific forecasts. The predictions do not include a shock as in the impulse response analysis. After six weeks, the common movement stabilizes at zero. Looking at the chains, it is remarkable that the third, fourth and fifth chains show the most sawtooth-like behavior, which implies a lot of price changes. The price per unit of chain 1 tends to move around 0. Also interesting is the forecast of chain 2, which stays negative for a long time and therefore implies lower prices.

Figure 11: Forecast of Brand 1, Brand 2, Brand 3, Brand 4 and Brand 5 at Chain 2

Figure 11 shows the five different brands at chain 2 in more detail. The purple line represents the actual data points, the red line is the forecast and the yellow and blue lines are the 90% confidence bands. The purple line, except for brand 1, stays within the confidence bands almost the whole time, which indicates that the predictions are accurate. The forecasts for all the other brands at every chain can be found in appendix A. Although a lot of the predictions made fit within the confidence

4.5 Factor Augmented Vector Auto Regression

The extracted common factor of the DHFM can be used for further analyses. This paper uses it to perform a Factor Augmented Vector Autoregression. FAVAR is the combination of a Vector Autoregression and a factor. The FAVAR analysis in this case gives a prediction of sales for each brand, based on the common factor extracted by the DHFM analysis. The addition of the factor is expected to give more information and produce better predictions than a regular autoregression (AR), which is examined by predicting sales for all brands with both an AR and a FAVAR.

(1) Y_t = a_0 + a \cdot Y_{t-1} + b \cdot F_{t-1} + \varepsilon_t

(2) F_t = c_0 + c \cdot Y_{t-1} + d \cdot F_{t-1} + \varepsilon_t

The two equations above are the basis of the FAVAR analysis and result in four estimated coefficients (Bernanke et al., 2005). Subsequently, histograms of the posterior distribution of the coefficients are created for every brand.
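The two one-lag equations can be illustrated with a minimal Python sketch that estimates them by ordinary least squares on simulated data. This is an assumption made for illustration: the thesis reports posterior distributions, whereas the sketch uses plain OLS, and the function `fit_favar_one_lag` and all coefficient values are hypothetical.

```python
import numpy as np

def fit_favar_one_lag(Y, F):
    """OLS estimates for Y_t and F_t on a constant, Y_{t-1} and F_{t-1}."""
    regressors = np.column_stack([np.ones(len(Y) - 1), Y[:-1], F[:-1]])
    targets = np.column_stack([Y[1:], F[1:]])
    coef, *_ = np.linalg.lstsq(regressors, targets, rcond=None)
    # Rows: constant, Y_{t-1}, F_{t-1}; columns: Y_t equation, F_t equation.
    return coef

# Simulated data with hypothetical coefficients (not the thesis estimates).
rng = np.random.default_rng(1)
T = 5000
Y, F = np.zeros(T), np.zeros(T)
for t in range(1, T):
    Y[t] = 0.1 + 0.3 * Y[t - 1] + 0.2 * F[t - 1] + 0.1 * rng.normal()
    F[t] = 0.0 - 0.1 * Y[t - 1] + 0.5 * F[t - 1] + 0.1 * rng.normal()
coef = fit_favar_one_lag(Y, F)
print(np.round(coef, 2))
```

With a long simulated sample the estimates lie close to the coefficients used in the simulation. Setting the coefficient on F_{t-1} to zero in the first equation reduces it to a regular AR(1), which is the benchmark the FAVAR is compared with.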

In order to determine the best model fit and obtain the best predictions, a VARX is estimated with the factor F as an exogenous variable. With this method, the AIC scores for different numbers of lags are calculated and the optimal number of lags can be determined.

         1 lag   2 lags   3 lags  4 lags  5 lags  6 lags  7 lags
Brand 1  -6.189  -6.1355  -6.127  -6.129  -6.105  -6.066  -6.063
Brand 2  -6.673  -6.631   -6.683  -6.679  -6.721  -6.664  -6.734
Brand 3  -6.981  -7.061   -7.097  -7.070  -7.114  -7.095  -7.116
Brand 4  -6.467  -6.422   -6.477  -6.431  -6.431  -6.391  -6.515
Brand 5  -6.619  -6.696   -6.704  -6.676  -6.639  -6.599  -6.589

Table 6: The AIC scores of the different amount of lags
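Selecting the number of lags from table 6 amounts to taking, for each brand, the lag with the lowest AIC score. A short Python sketch of this step, using the values of table 6 (the helper `best_lag` is a hypothetical name):

```python
# AIC scores from table 6, ordered from 1 lag to 7 lags.
aic_scores = {
    "Brand 1": [-6.189, -6.1355, -6.127, -6.129, -6.105, -6.066, -6.063],
    "Brand 2": [-6.673, -6.631, -6.683, -6.679, -6.721, -6.664, -6.734],
    "Brand 3": [-6.981, -7.061, -7.097, -7.070, -7.114, -7.095, -7.116],
    "Brand 4": [-6.467, -6.422, -6.477, -6.431, -6.431, -6.391, -6.515],
    "Brand 5": [-6.619, -6.696, -6.704, -6.676, -6.639, -6.599, -6.589],
}

def best_lag(scores):
    """Return the lag length (counting from 1) with the lowest AIC."""
    return min(range(len(scores)), key=scores.__getitem__) + 1

best_lags = {brand: best_lag(scores) for brand, scores in aic_scores.items()}
print(best_lags)
# {'Brand 1': 1, 'Brand 2': 7, 'Brand 3': 7, 'Brand 4': 7, 'Brand 5': 3}
```

The lowest AIC is reached with one lag for brand 1, seven lags for brands 2, 3 and 4, and three lags for brand 5.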

Brand 1   Y_t       F_t
Y_{t-1}   0.03609   -0.06798
F_{t-1}   -0.15623  0.31214
Constant  0.15423   0.05587

Table 7: The FAVAR coefficients for brand 1 across all chains

Figure 12: The histograms of the four coefficients of brand 1

In this FAVAR analysis, the first 140 weeks are used to retrieve the coefficients and distributions shown in table 7 and figure 12, in this case for brand 1. Although there are 29 data points available for predictions, there will be 28

Figure 13: Predictions and observed values for sales of brand 1

First of all, the bold line represents the sales of brand 1 predicted by the FAVAR, and the bold dashed lines, which are very close to that bold line, are the 95% confidence interval. The bottom dashed line, which quickly reaches zero, represents the AR predictions. Lastly, the normal line represents the observed values. Although the observed values do not fit within the confidence interval perfectly, it is clear that the predictions produced by the FAVAR analysis perform much better than those of the regular AR. Moreover, the AR predictions reach zero very fast, which is not realistic in the real world.

In appendix B, all the other coefficients, histograms and predictions can be found for brand 2, 3, 4 and 5. All these graphs show the same tendency where the FAVAR predictions are far more accurate than those that are produced with the regular AR.

5. CONCLUSION

The purpose of this thesis was to examine the movement of prices per unit in the Dutch lemonade market with the help of a Dynamic Hierarchical Factor Model. The choice for a DHFM is mainly based on the fact that this model makes it possible to analyze the variances of the prices per unit on three different levels: common (the whole lemonade market), block-specific (chain) and idiosyncratic (brand). The first interesting finding is that the common movement showed a decline after week 92, when one of the chains disappeared and one of the other chains became larger. The variance decomposition of the chains also gave an interesting insight. For chain 1, a notable 16% of the variance was explained by the common factor. Chain 2 had the most freedom in determining its prices, with 49% of the variance explained by the chain-specific level.

Next, the impulse response analysis added some extra insights to the findings of the variance decomposition. As chain 1 was clearly affected strongly by the common factor, a positive shock of this factor resulted in a large positive response of the prices of brand 1, brand 3 and brand 5 in chain 1. All the other brands at the other chains did not respond very actively to this shock, which is no surprise since idiosyncratic movements mainly explain their variance. The forecasts of price movements were relatively accurate and, apart from peaks, fitted within the confidence bands. Moreover, it was interesting to discover that for chain 2, price movements were predicted to be negative for the majority of the time period.

The Factor Augmented Vector Auto Regression was used to predict sales for all five brands in order to show that using the common factor F extracted by the DHFM outperforms a regular AR. Although this model is still quite simple with only one extracted common factor, it performed much better than the regular autoregression. Besides this, brands 2, 3 and 4 are predicted better when a seven-period lag is used. Brand 5 performs better with a three-period lag.

6. LIMITATIONS AND FURTHER RESEARCH

This thesis made use of one of the available methods, while many more are applicable to this topic. It might be interesting to make a broader comparison, for instance with other prediction methods. Besides the method used, it is also reasonable to take a closer look at the data. Only five supermarket chains and five lemonade brands are taken into account in this research. Although the biggest competitors were present, there are still some left in the market that were not part of this research.

Moreover, there are many substitute goods available that might have an impact on price changes of lemonade. In this thesis, only the prices per unit are used, which only cover discounts in terms of euros. Other variables like advertising or weather conditions can have an influence as well. One could even think about using economic growth figures.

Furthermore, a three-level model is used in this thesis. A Dynamic Hierarchical Factor Model gives researchers the opportunity to use even a four-level model, which could give extra insight into how movements arise. This DHFM has been estimated with two lags. In order to get the optimal model, the number of factors at each level and the number of lags should be examined and the information criteria should be compared.
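
As an illustration of comparing information criteria across lag orders, the sketch below fits univariate AR(p) models to a synthetic series and selects the lag with the lowest BIC. This is only an analogy to the model selection suggested above: it does not estimate a DHFM, the series and the helper `bic_for_lag` are hypothetical, and all candidate models are fitted on the same effective sample so that their criteria are comparable.

```python
import math
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(rows, targets):
    """Least-squares coefficients from the normal equations (X'X) beta = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve(xtx, xty)


def bic_for_lag(series, p, max_lag):
    """BIC of an AR(p) with intercept, fitted on observations max_lag onwards."""
    rows = [[1.0] + [series[t - k] for k in range(1, p + 1)]
            for t in range(max_lag, len(series))]
    targets = series[max_lag:]
    beta = ols(rows, targets)
    rss = sum((y - sum(b * v for b, v in zip(beta, r))) ** 2
              for r, y in zip(rows, targets))
    n_obs = len(targets)
    # Gaussian BIC up to a constant: fit term plus penalty per parameter.
    return n_obs * math.log(rss / n_obs) + (p + 1) * math.log(n_obs)


# Synthetic AR(2) series, so a one-lag model is misspecified by construction.
random.seed(1)
x = [0.0, 0.0]
for t in range(2, 500):
    x.append(0.6 * x[t - 1] - 0.5 * x[t - 2] + random.gauss(0, 1))

max_lag = 4
bic = {p: bic_for_lag(x, p, max_lag) for p in range(1, max_lag + 1)}
best_lag = min(bic, key=bic.get)
print({p: round(v, 1) for p, v in bic.items()}, "selected lag:", best_lag)
```

The same logic carries over to a grid of candidate specifications: fit each combination of lag order and number of factors, compute the criterion on a common sample, and keep the specification with the lowest value.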

REFERENCES

Abdi, H., & Williams, L. J. (2010). Principal component analysis. Wiley interdisciplinary reviews: computational statistics, 2(4), 433-459.

Bai, J., Li, K., & Lu, L. (2016). Estimation and inference of FAVAR models. Journal of Business & Economic Statistics, 34(4), 620-641.

Basker, E., & Noel, M. (2009). The evolving food chain: competitive effects of Wal-Mart's entry into the supermarket industry. Journal of Economics & Management Strategy, 18(4), 977-1009.

Belviso, F. & Milani, F. (2006). Structural Factor-Augmented VARs (SFAVARs) and the Effects of Monetary Policy. Topics in Macroeconomics, 6(3), Article 2

Bernanke, B. S., Boivin, J., & Eliasz, P. (2005). Measuring the effects of monetary policy: a factor-augmented vector autoregressive (FAVAR) approach. The Quarterly journal of economics, 120(1), 387-422.

Byrne, J. P., Sakemoto, R., & Xu, B. (2017). Commodity Price Co-movement: Heterogeneity and the Time Varying Impact of Fundamentals.

Campbell, J. Y. (1990). A variance decomposition for stock returns (No. w3246). National Bureau of Economic Research.

Diebold, F. X., Li, C., & Yue, V. Z. (2008). Global yield curve dynamics and interactions: a dynamic Nelson–Siegel approach. Journal of Econometrics, 146(2), 351-363.

Definition of soft drink. (Oxford). Retrieved from https://en.oxforddictionaries.com/definition/us/soft_drink

Eickmeier, S., Lemke, W., & Marcellino, M. G. (2011). Classical time-varying FAVAR models-Estimation, forecasting and structural analysis. Discussion Paper Series 1: Economic Studies Deutsche Bundesbank, No 04/2011

Fader, P.S. & Lodish, L.M. (1990). A Cross-Category Analysis of Category Structure and Promotional Activity for Grocery Products. Journal of Marketing, 54(4), pp. 52-65

Gupta, R., Jurgilas, M. & Kabundi, A. (2010). The effect of monetary policy on real house price growth in South Africa: A factor-augmented vector autoregression (FAVAR) approach. Economic Modelling, 27(1), pp. 315-323

Van Heerde, H. J., Leeflang, P. S., & Wittink, D. R. (2002). How promotions work: SCAN* PRO-based evolutionary model building. Schmalenbach Business Review, 54(3), 198-220.

Jolliffe, I. (2011). Principal component analysis. In International encyclopedia of statistical science (pp. 1094-1096). Springer, Berlin, Heidelberg.

Moench E., Ng, S. & Potter, S. (2009). Dynamic Hierarchical Factor Models. Federal Reserve Bank of New York Staff Reports, 412, pp. 1-44

Moench, E., Ng, S., & Potter, S. (2013). Dynamic hierarchical factor models. Review of Economics and Statistics, 95(5), 1811-1817.

SOMO (2017). Eyes on the price: International supermarket buying groups in Europe. SOMO Paper, March 2017. Retrieved from: https://www.somo.nl/wp-content/uploads/2017/03/Eyes-on-the-price.pdf

Trinh, G., Dawes, J. & Lockshin, L. (2009). Do Product Variants Appeal to Different Segments of Buyers within a Category? Journal of Product and Brand Management, 18(2), pp. 95-105

Widaman, K. F. (1993). Common factor analysis versus principal component analysis: Differential bias in representing model parameters? Multivariate behavioral research, 28(3), 263-311.

APPENDIX A: Forecasts for each brand at each chain

APPENDIX B: Results of the FAVAR analysis

Brand 2

            Y_t        F_t
Y_{t-1}     0.59476    0.02362
F_{t-1}    -0.10181    0.31826
Y_{t-7}     0.59143   -0.13577
F_{t-7}     0.02347    0.07703
Constant    0.11191    0.07703

Table 8: The FAVAR coefficients for brand 2 across all chains

Brand 3

            Y_t         F_t
Y_{t-1}     0.617964   -0.04328
F_{t-1}    -0.23286     0.29296
Y_{t-7}     0.61423    -0.21467
F_{t-7}    -0.04190     0.28623
Constant    0.10393     0.09713

Table 9: The FAVAR coefficients for brand 3 across all chains

Figure 18: The histograms of the four coefficients of brand 3 with a seven period lag

Brand 4

            Y_t        F_t
Y_{t-1}     0.49515   -0.04979
F_{t-1}    -0.3235     0.28383
Y_{t-7}     0.49393   -0.3456
F_{t-7}    -0.05043    0.2721
Constant    0.16644    0.1511

Table 10: The FAVAR coefficients for brand 4 across all chains

Figure 21: The histograms of the four coefficients of brand 4 with a seven period lag

Figure 22: Predictions and observed values for sales of brand 4

Brand 5

            Y_t         F_t
Y_{t-1}     0.646786    0.001144
F_{t-1}    -0.3944      0.2460
Y_{t-3}     0.642979   -0.3760
F_{t-3}     0.002132    0.2412
Constant    0.083399    0.1290

Table 11: The FAVAR coefficients for brand 5 across all chains

Figure 23: The histograms of the four coefficients of brand 5 with a one period lag

Figure 24: The histograms of the four coefficients of brand 5 with a seven period lag

Figure 25: Predictions and observed values for sales of brand 5
