Does product involvement matter?




The moderating role of product involvement on owned and earned media’s effectiveness on sales.

by

Louise Nyman

June 26, 2017

MASTER THESIS


PUBLIC APPROVED VERSION


MSc Marketing Management | MSc Marketing Intelligence

University of Groningen

Faculty of Business and Economics

Department of Marketing

PO Box 800

9700 AV Groningen

Louise Nyman | S3069591

Parkweg 143A, 9727HB Groningen, NL

a.l.nyman@student.rug.nl

+31 (0)6-87762109


Management Summary

The number of marketing channels is increasing at a rapid pace. At the same time, however, marketing departments are losing credibility, and thereby power, within organizations. This loss of power can be traced back to the lack of accountability for the different marketing efforts included in the marketing strategy. For marketing to regain its lost power, the attribution for each medium or touchpoint needs to be estimated, but this is becoming increasingly difficult as new channels appear and create synergy effects with other channels. Because marketers have little control over them, the effects of earned and owned media have received little attention. This gap in the literature is worrying, as the two channels are important sources of information for customers. Moreover, in order to establish the attribution from each channel, different characteristics of the product need to be considered, such as the level of product involvement.

This thesis investigates the effect of earned and owned media on sales for the product categories energy and dairy, and compares the outcomes. Brand awareness is included in the research as a mediator between the two media and sales. Six hypotheses were derived from previous research and literature. The results show a strong negative effect of earned media on sales in the high involvement energy category, and a significant lagged effect of 21 days for earned media on sales in the low involvement product category. Owned media shows no significant effect on sales in either product category, and brand awareness shows no significant mediating effect. These outcomes are important for marketing practitioners, as the effect of earned media on sales differs between the product categories, and this should be considered in marketing strategies. Furthermore, the lack of a significant effect of brand awareness on sales contradicts previous theories and is therefore an important finding for the academic community and further research.

Key words: Product involvement, earned media, owned media, brand awareness, word-of-mouth,


Preface

Marketing has always been a passion of mine: the numbers and theories behind how and why people respond to different marketing activities are to me the holy grail of creating and sustaining a successful business. It is amazing (and perhaps a bit frightening) that companies, and especially marketers, can create a desire for a product or service that people didn’t even know they wanted (for example the iPad), or make people buy items that definitely were not on their shopping list. That is also why I became so involved in this thesis; I want to contribute to the marketing literature that helps explain how different media can be used to optimize marketing and thus the desired customer response.

My academic journey within marketing started in the spring of 2012 at Santa Barbara City College in California, US. Five years and countries later, this Master Thesis marks the end of my double-track Master’s degree at the University of Groningen. During my studies there have been ups and downs, and I first and foremost want to thank my parents for always being there for me, for supporting and helping me with everything, and for always reading even the most boring of my assignments. I couldn’t have done it without you and I am forever grateful to you both.

Second, I would like to thank my thesis supervisor, dr. Peter van Eck, for his great help and feedback throughout the process of writing this thesis. I would also like to thank Keyvan Dehmamy for taking the time on multiple occasions to discuss the methodology of this thesis with me. Lastly, I want to thank my friends, fellow students and especially my boyfriend for their support throughout this last period of my studies.

Louise Nyman


Table of Contents

1. INTRODUCTION
1.1 RESEARCH QUESTION
1.2 CONTRIBUTION
1.3 OUTLINE
2. THEORETICAL FRAMEWORK
2.1 PRODUCT INVOLVEMENT
2.2 EARNED MEDIA
2.2.1 WORD-OF-MOUTH
2.3 OWNED MEDIA
2.3.1 WEBSITE
2.4 BRAND AWARENESS
2.5 LAGGED EFFECTS
2.6 CONTROL VARIABLES
2.7 CONCEPTUAL MODEL
3. METHODOLOGY
3.1 DATA COLLECTION
3.2 DATA DESCRIPTION
3.2.1 SAMPLE SIZE
3.2.2 DATA QUALITY
3.2.3 LAGGED EFFECTS
3.3 STATISTICAL POWER
3.4 MODEL SPECIFICATION
3.5 DATA ANALYSIS PROCEDURE
4. RESULTS
4.1 EXPLORATORY ANALYSIS
4.2 MODEL VALIDATION
4.3 MODEL ESTIMATION
4.4 MODERATING EFFECT
4.5 LAGGED EFFECTS
5. CONCLUSION
5.1 DISCUSSION
5.2 MANAGERIAL IMPLICATIONS
5.3 LIMITATIONS AND SUGGESTIONS FOR FUTURE RESEARCH


1. Introduction

“Half of the money I spend on advertising is wasted; the trouble is, I don’t know which half.” – John Wanamaker

Although dating back to the early 1900s, the quote from John Wanamaker is still well known in the marketing and business world, as it describes the problem of measuring the effect of marketing efforts. The failure to measure marketing’s effectiveness leads to a lack of accountability, and thereby to marketing departments losing their influence and credibility within organizations worldwide. However, research confirms that an increase in the accountability of the marketing department would enable marketing to regain its lost influence (Verhoef and Leeflang, 2009). The urgency and importance of increasing the accountability of marketing efforts is also emphasized by the Marketing Science Institute (2014), which proposed “measuring and communicating the value of marketing activities and investments” as one of the main research priorities for 2014-2016.

The enormous increase in available data over the last decade creates opportunities for marketers to develop analytics and marketing research techniques that enable more accurate measurement of marketing activities (Sudhir, 2016). At the same time, an increasing number of customers are shifting from offline to online channels, with the result that the number of touchpoints in the customer journey is increasing by over 20% per year (Bughin, 2015). The rapid development of new touchpoints has made it increasingly difficult to develop new ways of measuring the effects of marketing efforts both within and between the different channels. The synergy between various marketing channels and touchpoints further increases complexity, as marketers need to consider not only the individual effects of the marketing efforts, but also the joint effect of different marketing efforts and tactics (Naik and Raman, 2003). For example, it has been found that while online media may be more influential for customers, mass media is often more effective in stimulating the online media (Bruce, Foutz, and Kolsarici 2012; Gopinath, Thomas, and Krishnamurthy 2014). Examples such as this imply that a careful investigation of the synergy effects between the different media, and especially of the circumstances under which each medium is optimal to use, is key to optimizing marketing efforts.


limited research has been conducted for the effects of owned media, and very little research can be found on the effects of earned media (Stephen and Galak, 2012).

In order to allocate time and marketing budget, it is essential for marketers to establish the contribution from each medium. To do this, the purchase journey needs to be clearly mapped out and the customer touchpoints require thorough analysis, both in isolation and in synergy. The purchase journey a customer takes depends on several factors, such as whether the product is of utilitarian or hedonic nature, the pricing of the product, the main competitors, and the group of potential customers at which the product is targeted. A much less researched area is the role of product category involvement (Kannan and Li, 2016), which is important for marketers because it determines the level of involvement that the customer has with the purchase. In prior literature, definitions of product involvement vary, as different authors provide their own definition of involvement. However, the most general explanation is that a low involvement purchase is normally considered relatively insignificant, with low risk, low importance and low personal relevance, and is often habitual, such as buying milk (McWilliam, 1997). Higher involvement categories, such as buying a car, are characterized by high importance, scrutiny of advertising messages, and high risk (Petty and Cacioppo, 1986). Different product categories therefore result in different purchase journeys, and thus there is a strong argument for investigating the impact of product category involvement on the effects of different marketing tactics on sales.


The research conducted in this thesis compares brand data from two well-known brands: one dairy company and their yoghurt brand, and one energy company offering power and gas to their customers. Both brand datasets contain aggregated panel and census data provided by the marketing research firm GfK. The datasets contain weekly data, with the dairy dataset covering data for 104 weeks, and the energy dataset for 114 weeks.

1.1 Research question

Of the three types of media (paid, earned and owned), the latter two have received the least attention in research, mainly due to their complexity and the lack of control marketers have over them. The majority of research on the effect of earned media is conducted on high involvement product categories, for example earned media’s effect on video games (Zhu and Zhang, 2010), social networking sites (Trusov, Bucklin and Pauwels, 2009) and on the service industry (Godes and Mayzlin, 2009). In a paper by Stephen and Galak (2012), the need for more research within social earned media, and in particular on the effect of earned media in lower involvement contexts, was recognized. Moreover, Kannan and Li (2016) found a gap in the literature in the comparison of product categories and the effect of marketing tactics, and recommend that the area be explored in further research. In response to this call for more targeted research, this thesis compares the complex and high involvement category of an energy brand with the habitual and low involvement, fast-moving customer goods category of a dairy brand. The research facilitates a unique comparison, as the same variables are included and measured on the same scale in both brand datasets.

Based on the purpose of the research, the following research question was proposed:

“What is the differential effect of earned and owned media on sales when comparing high and low involvement product categories, and to what extent is this relationship mediated by brand awareness?”

In order to answer this question, the following research questions were defined:

1. What is the effect of owned media on sales in low and high involvement product categories?

2. What is the effect of earned media on sales in low and high involvement product categories?


3. Is owned or earned media more effective in generating sales, and to what extent is this effect moderated by product involvement?

In providing an answer to the impact of brand awareness, the following questions were created:

4. What is the mediating role of brand awareness on the effects of earned media on sales in low and high involvement product categories?

5. What is the mediating role of brand awareness on the effects of owned media on sales in low and high involvement product categories?

Answering these questions will facilitate a discussion of the following question:

6. To what extent is the relationship between earned and owned media and sales mediated by brand awareness? What is the moderating role of product involvement on the mediating effect of brand awareness?

1.2 Contribution

This thesis aims to extend the academic research with respect to the main concepts earned and owned media, and the synergy effect that might exist with the degree of product involvement. By addressing the issue of attribution of earned and owned media, marketing practitioners will be able to better estimate attribution of the different media that they are using, and thus better allocate their marketing budget. A more accurate allocation of the marketing budget will increase accountability and thus raise the status of the marketing department. The goal is to provide marketers with the research to make more informed decisions and thus increase the efficiency of their marketing efforts.

For academics, this thesis aims to fill a gap in literature, which can spark further extended research within the underserved area of earned and owned media and product involvement.

1.3 Outline


2. Theoretical framework

This chapter contains a literature review that elaborates on the theoretical background and the concepts from which the hypotheses are derived. The chapter starts off broadly by specifying and elaborating on product involvement, followed by sections on earned media, owned media, and brand awareness. The chapter then continues with a description of, and argumentation for, the inclusion of lagged effects and control variables, and finishes with a conceptual model.

2.1 Product involvement

Product involvement is not a new concept; however, it still lacks a clear universal definition, since authors tend to develop their own definition of what constitutes low and high product involvement. McWilliam (1992) suggests that fast-moving customer goods are trivial, low involvement products and thus uninvolving in both decision-making and personal relevance. Other authors also state that low involvement products result in less decision-making, and that this is mainly due to the low costs and benefits of these products compared to high involvement products such as a car or house (Arnould, Price and Zinkhan 2004; Wells and Prensky 1996). Semenik and Bamossy (1995) approach product involvement somewhat differently by distinguishing two forms of low involvement: low involvement with little to moderate information search, and low involvement with very little information search. In the latter case, customers do not see the added value of choosing one brand over another, and their choices are guided by already set habits (Semenik and Bamossy, 1995).

For the purpose of the research in this thesis, the included product categories dairy and energy are considered to be at the two opposite ends of the involvement spectrum. The energy product category carries higher risk since it involves a contract; it is also considered politically and environmentally important, and it is a high-cost purchase. It is therefore normally characterized by high involvement. Dairy is considered a habitual fast-moving customer goods category, and since it does not involve high financial risk, it is considered a low involvement product category. Moreover, both products are of utilitarian nature, since they emphasize function, performance and solving a problem rather than hedonic pleasure (Babin, Darden and Griffin, 1994).

2.2 Earned media


of earned media into traditional earned media, i.e. journalists and other media publicity, and social earned media, which is generated online and called electronic word-of-mouth, hereafter referred to as e-WOM (Stephen and Galak, 2012).

In this thesis, the effect of earned media will be measured through the total web conversations (e-WOM) about the brand’s product.

2.2.1 Word-of-mouth

Traditional word-of-mouth (WOM) is advice or opinions exchanged between customers on an informal level. It has been found to highly influence customer behavior, mainly due to the interactive communication and lack of commercial bias (East, Hammond and Lomax, 2008). While e-WOM shares fundamental similarities with traditional WOM, there is a difference in familiarity between the sender and receiver of the WOM. In an offline setting, there is usually some sort of familiarity between the person giving the advice or opinion and the person receiving it. This familiarity is often based on perceived similarities such as product or category knowledge or personal connections. In an online setting, that familiarity is not necessarily present. Another difference is that the receivers of e-WOM are presented with more e-WOM as well as product information (Gupta and Harris, 2010). This thesis investigates e-WOM; however, due to the similarities, previous research on traditional WOM is also considered relevant and is therefore included in the theoretical framework and hypothesis development.


earned media online tends to target more niched and thus also more involved audiences. In the same study, it was found that social earned media demands more involvement from the customer and is therefore more beneficial for high involvement products, while traditional earned media reaches the masses and is therefore better for low involvement products (Stephen and Galak, 2012). Based on that reasoning, it is expected that the impact of e-WOM will be higher for the energy brand, as the involvement is higher. It is important to note that e-WOM is not necessarily positive; negative e-WOM about brands is common. In fact, research has shown that the effect of e-WOM is greater when it is negative than when it is positive (Park and Lee, 2009). In addition, it has been found that e-WOM, through the additional information and experience it provides, can be used to reduce uncertainty in high-risk purchase situations (Hsieh, Chiu and Chiang, 2005). This also indicates that e-WOM has a stronger effect for high involvement products, as customers may have difficulties judging the product or service.

As theory states a strong impact of earned media on sales, and that this impact increases when perceived risk or uncertainty increases, the following was hypothesized:

H1: Product involvement as a moderator intensifies the strength of the relationship between earned media and sales.

2.3 Owned media

In contrast to earned media, owned media is media activity generated by the company or its controlled agents in a channel that is under the company’s control (Stephen and Galak, 2012). In this thesis, owned media is measured by the number of page views per visitor on the brand website.

2.3.1 Website


browsing strategies, such as their search/deliberation or knowledge building strategy. In addition, information search increases with higher product involvement (Beatty and Smith, 1987). It is therefore more likely that, for the high involvement product category, customers engage in a more comprehensive information search, generating more page views. It has been found that page views are positively correlated with purchase decisions (Mallapragada, Chandukala and Liu, 2016).

As highly involved customers are more likely to engage in a more active information search (Beatty and Smith, 1987), this is expected to increase their interaction with the website and their use of the functions available on it. As exposure affects brand awareness, the increase in interactivity creates familiarity and thus liking, which results in purchase intentions (Keller, 1993). In addition, for high involvement categories such as energy, trust is an important factor, and it has been found that trust in a website is developed based on the interactions that a customer has with the brand (Bart, Shankar, Sultan and Urban, 2005). This implies that a visit to the website is more impactful for a highly involved customer than for a less involved customer. Based on this, the following hypothesis was derived:

H2: Product involvement positively moderates the positive relationship between owned media and sales.
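The moderation posited in H1 and H2 is commonly expressed as an interaction term in a linear model. The sketch below is an illustration only, under the assumption of a dummy-coded involvement variable (0 = low, 1 = high). The function name and the coefficient values are hypothetical and are not estimates from this thesis.

```python
def conditional_media_slope(b_media, b_interaction, high_involvement):
    """Slope of sales on media in the assumed model

        sales = b0 + b_media*media + b_inv*inv + b_interaction*media*inv

    where inv is a 0/1 dummy for high product involvement.
    """
    inv = 1 if high_involvement else 0
    return b_media + b_interaction * inv


# Hypothetical coefficients, chosen only to illustrate the mechanics.
low = conditional_media_slope(b_media=0.5, b_interaction=0.3, high_involvement=False)
high = conditional_media_slope(b_media=0.5, b_interaction=0.3, high_involvement=True)
print(low, high)
```

Under this coding, a positive interaction coefficient means that the media effect is larger in the high involvement category, which is the pattern H1 and H2 predict.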

2.4 Brand awareness

Brand awareness can be defined as a brand node in memory serving as a connection point for brand associations, and consists of brand recall and brand recognition (Keller, 1993). Brand recognition is the customer’s ability to confirm prior exposure to the brand when given the brand itself as a cue, and brand recall is the customer’s ability to retrieve the brand when given different cues such as a need or product category (Keller, 1993). Combined with a positive brand image, brand awareness is crucial for customer purchase intention, since it increases the likelihood that the brand is included in the consideration set, that is, the set of brands considered by the customer for purchase. High brand awareness also increases the probability of selection among the brands in the consideration set (Keller, 1993). In addition, brand awareness plays a role in generating brand associations and can affect purchase intentions through familiarity, which is positively associated with brand awareness and customer choice (Baker, Hutchinson, Moore and Nedungadi, 1986).


(1993), these exposures facilitate brand awareness. Further, brand awareness increases the probability of a purchase decision due to familiarity. Thus, it is hypothesized that brand awareness mediates the relationship between earned and owned media and sales. Research has demonstrated that, for a high involvement product, customers’ engagement with the brand’s social media page is positively correlated with the number of word-of-mouth activities, brand awareness and purchase intention (Hutter, Hautz, Dennhardt and Füller, 2013). This further strengthens the notion of a relationship between earned media, brand awareness and sales. Thereby the following hypotheses were created:

H3a: Brand awareness has a mediating effect between earned media and sales.

H3b: Brand awareness has a mediating effect between owned media and sales.
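One common way to reason about a mediating effect such as the one in H3a and H3b is the product-of-coefficients decomposition, in which the total effect of a medium on sales equals its direct effect plus the indirect effect running through brand awareness. The sketch below illustrates this identity with ordinary least squares on made-up numbers. It is not necessarily the estimation approach used in this thesis, and the helper `ols`, the variable names and the toy data are all hypothetical.

```python
def ols(columns, y):
    """Least-squares coefficients of y on the predictor columns (intercept first)."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Normal equations: (X'X) beta = X'y
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)] for i in range(k)]
    b = [sum(X[r][i] * y[r] for r in range(n)) for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(k):
            if r != col:
                f = A[r][col] / A[col][col]
                A[r] = [ar - f * ac for ar, ac in zip(A[r], A[col])]
                b[r] -= f * b[col]
    return [b[i] / A[i][i] for i in range(k)]


# Toy data, invented purely for illustration (not thesis data).
media = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
awareness = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
sales = [3.0, 2.0, 6.0, 4.0, 9.0, 7.0, 11.0, 10.0]

a = ols([media], awareness)[1]                # path media -> awareness
total = ols([media], sales)[1]                # total effect of media on sales
_, direct, b_path = ols([media, awareness], sales)  # direct effect and awareness -> sales
indirect = a * b_path                         # effect running through awareness

print(total, direct, indirect)
```

In linear models estimated on the same sample, the total effect equals the direct effect plus the indirect effect, so a mediating role of brand awareness would show up as a non-zero indirect term.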

When customers make a purchase decision, they are prone to use heuristics, or decision rules, to make the decision easier. Due to its importance for both retailers and manufacturers, the area has been widely researched (e.g. Dobson and Kalish 1993; Haas and Kenning 2014; Carlin, Jiang and Spiller 2017). The Elaboration Likelihood Model (hereafter referred to as ELM) by Petty and Cacioppo (1986) is one of the most prominent approaches to modeling the use of either heuristics or information processing. According to ELM, the more highly involved a customer is, the less likely the customer is to rely on simple heuristics such as price, advertisements and familiarity with the brand, and the more likely the customer is to scrutinize the brand information and even conduct an external information search.

The ELM is supported by research conducted by Hoyer and Brown (1990), where it was found that for a common, repeat purchase such as peanut butter, the customers did not engage in thorough information processing, in this case product characteristics such as the taste, if brand awareness was present. For products with no brand awareness, customers were observed sampling more products and processing product information more carefully (Hoyer and Brown 1990; Macdonald and Sharp 2000). This indicates that customers of low involvement products use the familiarity with the brand, their awareness and knowledge of the brand, as a heuristic when making a purchase decision. This even implies that just brand awareness can be enough to purchase one brand over another in the low involvement product category. The following two hypotheses were derived:


H3d: The mediating effect of brand awareness between owned media and sales is stronger for low involvement products.

2.5 Lagged effects

When marketing actions affect the next period as well as the current period, this is referred to as a lagged or carry-over effect. Prior research has found that around 40% of current awareness is carried over to the next period (Rutz and Bucklin, 2008), and that advertising effects are not limited to the period in which the advertisement is viewed (Leeflang, Wieringa, Bijmolt, & Pauwels, 2015). This means that advertisements in week t can also influence purchase decisions in week t+1 (Farley, Lehmann, Winer & Katz, 1982). Trusov, et al. (2008) found in their study that earned media in the form of WOM has significant lagged effects as well, in some cases even longer than for paid advertising. Although their research investigated referrals (in this context signing up to a website), which is a different customer response than a purchase decision, lagged effects of earned media from past periods are expected to affect sales in this research as well. In addition, research has found that for non-durable products, there is both a direct and a delayed effect of word-of-mouth on sales (Bone, 1995). The following hypotheses were derived:

H4a: Earned media has a lagged effect on sales for the low involvement product category.

H4b: Owned media has a lagged effect on sales for the low involvement product category.

H4c: Earned media has a lagged effect on sales for the high involvement product category.

H4d: Owned media has a lagged effect on sales for the high involvement product category.
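To make the idea of carry-over concrete, the sketch below accumulates a series under the simplifying assumption that a constant share of the previous period’s level is retained each period. The 0.4 retention rate mirrors the roughly 40% carry-over of awareness cited above; the geometric form, the function name and the input series are assumptions made for illustration and are not the model specified in this thesis.

```python
def carry_over_stock(inputs, retention):
    """Accumulate a series under the assumed rule
    stock[t] = inputs[t] + retention * stock[t-1]."""
    stock = []
    previous = 0.0
    for value in inputs:
        previous = value + retention * previous
        stock.append(previous)
    return stock


# A single hypothetical burst of activity in the first week, nothing afterwards.
pulse = [100.0, 0.0, 0.0, 0.0]
print(carry_over_stock(pulse, retention=0.4))
```

With a retention rate of 0.4, a single burst of 100 decays to roughly 40, 16 and 6.4 in the following three periods, which shows why activity in week t can still matter for sales in later weeks.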

2.6 Control variables

Paid advertising is the third and most researched medium. In this thesis, online expenditures and offline expenditures are incorporated, as they refer to the paid offline and online marketing efforts during the period and thus account for the paid media. Offline paid advertising concerns media such as TV, radio and print advertising (Batra and Lane Keller, 2016). Online paid advertising, on the other hand, refers to paid online marketing tactics such as paid search advertising and display advertising (Naik and Peters, 2009).


Hence, it can be concluded from prior research that the different media can be used for different purposes, and that they can complement each other and be combined to generate desired outcomes.

Due to the expected spillover effects of both offline and online paid media on earned and owned media, these variables are included in this thesis as control variables. As the scope of this thesis is to examine the interaction effects of product involvement and earned/owned media rather than the interaction effects between different media, online and offline paid media are only included as control variables.

2.7 Conceptual model

The conceptual model in Figure 1 outlines the hypothesized relationships in this thesis. Following the concepts introduced in the theoretical framework, it is hypothesized that the effect of earned media on sales will be more intense when product involvement is high rather than low (H1). Owned media is expected to have a significant and positive relationship with sales, and product involvement is expected to moderate this effect (H2). Brand awareness is expected to mediate the relationship between the two media and sales (H3). In addition, the lagged effects of both earned and owned media are hypothesized to have a significant impact on sales (H4).

Figure 1: Conceptual model

[Figure elements: Earned media (total web conversations); Owned media (page views per visitor); Brand awareness; Carry-over effects; Sales; Control variables: online expenditures, offline expenditures]


3. Methodology

This chapter of the thesis starts with a description of the data collection, sample size, and the variables included in the models. The statistical power of the data is also tested. Thereafter, the model is specified and the data analysis procedure is elaborated upon.

3.1 Data collection

The two datasets used in this thesis were retrieved via the marketing research firm GfK, an external source of data. The first dataset contains panel and census data from a well-known dairy brand, belonging to the low involvement product category as a fast-moving customer good. The second dataset contains panel and census data from a well-known energy brand that is a member of a high risk and high involvement category. Both datasets contain time series data from Dutch households that are aggregated on a weekly level. For the dairy brand, the data was collected during 104 weeks, and for the energy brand the data was collected throughout 114 weeks. Both datasets are identical regarding the variables included, and contain the following seven variables: ‘Week’, ‘Web Conversations’, ‘Page Views Per Visitor’, ‘Brand Awareness’, ‘Sales’, ‘Offline Expenditures’ and ‘Online Expenditures’. The datasets and variables are described in further detail below.

3.2 Data description

3.2.1 Sample size

Energy brand

Data for the energy brand was collected during 114 weeks, from week 29 in year 2009 to week 38 in year 2011. No missing values were found in the dataset; however, the data showed odd values for the variable ‘Web Conversations’ in the second part of the dataset (see Graph 1). It was discovered that the first 53 values showed a realistic distribution (mean: 144.2, with values within the range 3 to 655), while the remaining 61 weeks (week 29 in year 2010 to week 38 in year 2011) present one value of ‘1’ and thereafter 60 observations with a value of zero. The sudden decline in web conversations about the brand is considered unrealistic and indicates that there was an error in measuring or collecting the data. To provide an analysis with data that is as accurate as possible, it was decided to use only the first 53 observations for estimation (week 29 in year 2009 to week 28 in year 2010). One additional observation was lost due to the inclusion of a variable lagged by one period (see section 3.2.3), which led to a final sample size of 52 observations.


By removing the observations with low variation, the estimates and thus outcomes will be of higher quality since the effect is more visible. Moreover, even though the size of the estimated observations is reduced to 45.61% of the original dataset, the number of observations is considered sufficient to estimate the model since there are more than 5 observations per parameter (Leeflang et al., 2015).

Graph 1: Frequency of web conversations of the energy brand during the observed period.

Dairy brand

The dairy dataset contained data collected during 105 weeks, from week 1 in year 2009 to week 52 in year 2010; however, only 101 observations were used for estimation. This was due to two reasons. First, one missing value was found in week 53 in the variable ‘Offline Expenditures’, which forced the removal of that observation. Second, a variable with three time lags was created for ‘Earned Media’, and therefore the first three observations were lost. Hence, in total 4 observations were lost or removed and the estimation was conducted with 101 observations, from week 6 in year 2009 to week 52 in year 2010.
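The loss of observations through lagging can be illustrated with a short sketch. It assumes a plain list of weekly values and a hypothetical helper `add_lags`; the placeholder series only stands in for the 104 weeks that remain after the incomplete week is removed and carries no real data.

```python
def add_lags(series, max_lag):
    """Return rows (value[t], value[t-1], ..., value[t-max_lag]).

    The first max_lag periods are dropped because they lack a full history.
    """
    rows = []
    for t in range(max_lag, len(series)):
        rows.append(tuple(series[t - k] for k in range(max_lag + 1)))
    return rows


# Placeholder series: 105 weeks minus the one week with a missing value.
weekly_values = list(range(104))
rows = add_lags(weekly_values, max_lag=3)
print(len(rows))
```

Each additional lag costs one observation at the start of the series, which is why three lags on top of one removed week reduce 105 weeks to 101 usable observations.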


Graph 2: Frequency of web conversations of the dairy brand during the observed period.

3.2.2 Data quality

Prior to estimation, the datasets were assessed according to the four criteria of good data: availability, quality, quantity and variability (Leeflang et al., 2015). Availability did not need to be tested, since the data was provided via an external source.

The next criterion, data quantity, refers to the number of observations available in the dataset. This research will estimate a maximum of 7 parameters per dataset, which requires a minimum of 35 observations. Both datasets contained more than 35 observations available for estimation, so the criterion of at least five observations per parameter (Leeflang et al., 2015) was satisfied.
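The rule of thumb can be checked with a few lines of arithmetic. The sketch below uses the sample sizes reported in section 3.2.1 (52 observations for the energy brand and 101 for the dairy brand); the function names are hypothetical.

```python
def min_observations(n_parameters, per_parameter=5):
    """Minimum sample size under an 'observations per parameter' rule of thumb."""
    return n_parameters * per_parameter


def satisfies_rule(n_observations, n_parameters, per_parameter=5):
    """True if the sample is at least as large as the rule of thumb requires."""
    return n_observations >= min_observations(n_parameters, per_parameter)


# Final estimation samples as reported in section 3.2.1.
samples = {"energy": 52, "dairy": 101}
for brand, n_obs in samples.items():
    print(brand, n_obs, satisfies_rule(n_obs, n_parameters=7))
```

With 7 parameters the threshold is 35 observations, so both samples pass, although the energy sample leaves considerably less margin than the dairy sample.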

During the observation period, both census data and panel data from Dutch households were collected. More specifically, all variables apart from ‘Brand Awareness’ were collected as census data, while ‘Brand Awareness’ was collected from the household panel. This was done by measuring the number of times that the brand was mentioned by the respondents in a questionnaire that was sent out to the panel. As the respondent fills out the questionnaire, this is considered an active measurement, which increases the risk of self-report bias and could negatively impact data quality (Blumberg, Cooper, & Schindler, 2008). However, the data quality increases as the active panel data is combined with census data. A census is the total enumeration of all elements in the population rather than only a sample of the population. This reduces the probability of sampling errors (Malhotra, 2010).

Lastly, it is necessary that the data contain variation. If the data do not present enough variation, the predictor variables’ impact on the criterion variable cannot be measured (Leeflang et al., 2015). After removing parts of the energy dataset, it was concluded that all variables included in this research presented normal variability.


However, the variability was observed to be relatively low for the variable ‘Brand Awareness’ in both datasets (dairy brand: range 13-30.94, energy brand: range 40-66). This should be taken into consideration when analyzing the estimates in later stages of the data analysis.

As the four criteria for good data were satisfied, the next section will describe the variables in more detail.

Sales

The criterion variable ‘Sales’ is defined as the volume of the brand sold in the corresponding product category during a week. For the dairy brand, the volume refers to units of yoghurt sold (see Graph 3). For the energy brand, the volume is the number of new or renewed contracts (see Graph 4). For the dairy brand, a slightly downward trend can be seen.

Graph 3: Quarterly volume sales of the dairy brand during the observed period.

Graph 4: Quarterly volume sales of the energy brand during the observed period.

Earned media

‘Web Conversations’ was included in the model as a predictor variable. It is defined as the total number of times the brand was mentioned in web conversations during a week. This is also referred to as e-WOM. In the dairy dataset, this variable shows more variation in the later stages of the observation period (see Graph 2). The number of web conversations for the dairy brand ranged from 0 to 89 during the observed period, with a mean of 14.22.

For the energy brand, the number of web conversations increased toward the end of the observation period (see Graph 5). The mean number of web conversations was 144.2, with values ranging from 3 to 655.

While no universal measurement or scale for e-WOM currently exists, Godes and Mayzlin (2009) investigated the suitability of online conversations as a measure of e-WOM. The authors state several reasons for using online conversations as a measure, the main ones being cost effectiveness and ease of use (Godes and Mayzlin, 2009). This supports the selection of the variable ‘Web Conversations’ as a predictor variable in this thesis: it represents e-WOM and is therefore included as ‘Earned Media’. The variable ‘Web Conversations’ will hereafter be referred to as ‘Earned Media’ to avoid confusion around the main concepts and research question of the thesis.

Graph 5: Frequency of web conversations of the energy brand.

Owned media

‘Owned Media’ was included as a predictor variable in the research through the variable ‘Page Views Per Visitor’. The variable is defined as the average number of pages viewed on the brand website by one customer during that week, and was calculated by dividing the number of page views by the number of visits to the website. The mean number of page views per visitor for the dairy brand was 5.388, with values ranging from 2.751 to 8.704.

For the energy brand, the mean for the number of page views was 5.378 with values ranging from 2.292 to 7.694.

‘Owned Media’ is different in the measurement compared to the other variables, and this must be taken into careful consideration when interpreting the estimates.


The variable presents the average number of page views per visitor during a week rather than a weekly total, whereas all other variables present cumulative values. For interpretation, this means that the direction of the effect (positive or negative) can be determined, but its exact size cannot be interpreted without data on how many people visited the website during the week.

Previous research used the variable page views as a predictor of both purchase decision and basket value (Mallapragada et al., 2016). Accordingly, the variable ‘Page Views Per Visitor’ is considered an appropriate measurement of website effectiveness and thus of the owned media of a brand. The variable ‘Page Views Per Visitor’ will hereafter be referred to as ‘Owned Media’.

Brand awareness

‘Brand Awareness’ was included in the research as a mediator between ‘Earned Media’ or ‘Owned Media’ and ‘Sales’. In the dataset, ‘Brand Awareness’ is defined as the spontaneous knowledge that the respondent has about the brand during that week. This variable was computed as the percentage of the respondents that spontaneously mentioned the brand during that week. During the observation period, the dairy brand was mentioned on average by 23.24% of the respondents, and the energy brand on average by 53.44% of the respondents.

Control variables

‘Offline Expenditures’ is defined as the total cost for paid offline advertising for the brand during a week. The mean of the variable ‘Offline Expenditures’ for the dairy brand was €18,381. During the period, there were 30 weeks with offline advertising spending, and 73 weeks without. The energy brand had offline expenditures throughout the entire observation period of 52 weeks, with €316,989 as the mean expenditure.

‘Online Expenditures’ is the total cost for paid online advertising for the brand during a week. The dairy brand had online expenditures in 61 of the 103 observation weeks, with a mean of €51.11 and values ranging from €0 to €151.67. The energy brand had online expenditures in 48 weeks, with a mean of €180.51 and values ranging from €0 to €421.79.

3.2.3 Lagged effects


Hence, the correlation coefficient is considered a good measure to indicate the strength of association between the dependent variable ‘Sales’ and different lagged variables.

The correlation coefficients and their significance were calculated for both datasets and for both variables ‘Earned Media’ and ‘Owned Media’. For both datasets, ‘Owned Media’ did not present any significant results (see Appendix A). However, although the correlation was insignificant and thus indicated no relationship between ‘Sales’ and ‘Owned Media’, it was decided to continue estimating the models with ‘Lagged Owned Media’ ($O_{t-1}$). This was done because theory suggests that lagged effects can affect current sales (Leeflang et al., 2015; Rutz and Bucklin, 2011); including them is therefore considered useful, as they can increase the explanatory power of the model.

The lagged effects of ‘Earned Media’ presented relevant and statistically significant results when examining the correlation coefficients and the corresponding significance. Therefore, a more elaborate analysis of the variable ‘Lagged Earned Media’ for both datasets is provided below.

Earned media lagged effects: Dairy brand

Theory states that lagged effects of earned media over more than one period are likely (Trusov et al., 2008). To identify the appropriate lag to include, a correlation matrix was created with lagged variables up to t-10 for ‘Earned Media’. The p-values and correlation coefficients between ‘Sales’ and the lagged variables were plotted against time in Graph 6 and Graph 7, respectively. The p-values were significant at the 5% significance level for ‘t-3’ (p-value: 0.0265) and ‘t-5’ (p-value: 0.0182). Due to the insignificance of ‘t-1’ and ‘t-2’, and since it was desired to keep the lagged effect below four periods in support of the findings by Trusov et al. (2008), a delay of three periods (21 days) was chosen and included in the model as ‘Lagged Earned Media’ ($E_{t-3}$).
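The lag screening described above can be illustrated with a short Python sketch that correlates sales in week t with earned media in week t minus lag. The thesis analysis itself was run in R; the function names and the toy series below are hypothetical and are constructed so that sales respond to media exactly three weeks later. The significance test of each correlation is left out because the Python standard library offers no t-distribution.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equally long series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def lagged_correlations(sales, media, max_lag):
    """Correlate sales[t] with media[t - lag] for lag = 0 .. max_lag."""
    result = {}
    for lag in range(max_lag + 1):
        aligned_sales = sales[lag:]                  # drop the first `lag` weeks
        aligned_media = media[:len(media) - lag]     # drop the last `lag` weeks
        result[lag] = pearson(aligned_sales, aligned_media)
    return result


# Hypothetical toy data: weekly web conversations and a sales series that
# falls by 2 units for every conversation observed three weeks earlier.
media = [3, 10, 4, 25, 8, 1, 17, 6, 30, 2, 12, 9, 21, 5, 14]
sales = [90.0, 95.0, 92.0] + [100.0 - 2.0 * m for m in media[:-3]]

correlations = lagged_correlations(sales, media, max_lag=5)
best_lag = min(correlations, key=correlations.get)   # most negative correlation
print(best_lag, round(correlations[best_lag], 4))
```

Each additional lag shortens the aligned series by one observation, which is why the choice of lag also determines how many observations remain for estimation.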

Graph 6: P-values of the ’Lagged Earned Media’ variables, ’t’ to ’t-10’.


Graph 7: Correlation coefficients for the ’Lagged Earned Media’ variables ’t’ to ’t-10’.

Earned media lagged effects: Energy brand

In accordance with the selection of the ‘Earned Media’ lagged variable for the dairy brand dataset, the correlation coefficients and p-values of the correlation between ‘Lagged Earned Media’ ‘t’ to ‘t-10’ and ‘Sales’ were used. As can be seen in Graph 8, the p-values of the variables ‘t’ to ‘t-9’ are significant at the 5% significance level. Graph 9 presents declining correlation strength between ‘Earned Media’ and ‘Sales’ with time, from a correlation coefficient of -0.54 in time ‘t’ to a correlation coefficient of -0.17 in ‘t-10’. As with the dairy brand, the lagged variable was considered for the first three periods in accordance with Trusov et al. (2008). Since ‘t’ is already included in the model as the variable ‘Earned Media’, ‘t-1’ was chosen to be included in the model as ‘Lagged Earned Media’ since it has the highest correlation with sales. The variable has a p-value of 0.0004 and a correlation coefficient of -0.54.

Graph 8: P-values of the ’Lagged Earned Media’ variables, ’t’ to ’t-10’.


Graph 9: Correlation coefficients for the ’Lagged Earned Media’ variables ’t’ to ’t-10’.

3.3 Statistical power

The statistical power is tested to ensure that the results of the data analyses conducted throughout the thesis are representative. Sufficient power reduces the risk of failing to reject a false H0, that is, of a type II error (Cohen, 1992). While statistical power is often neglected, it is relevant in this research since the energy brand dataset was reduced by 54.39% of its original size. Cohen (1992) proposes that a statistical power of 0.8 is acceptable, which means that an effect that truly exists is detected with a probability of 80%. The statistical power analysis involves four quantities: the sample size, the significance criterion, the statistical power and the population effect size. The effect size refers to the size of the difference that can be detected. A small effect size is small but not trivial, while a medium effect size is larger than trivial, although usually not visible to the naked eye of an unknown observer (Cohen, 1992).

To reveal whether a statistical power of 0.8 is present in the data, the population effect size is calculated with the formula $f^2 = \frac{R^2}{1 - R^2}$. Cohen (1992) proposes that the value 0.02 represents small effect sizes, 0.15 medium effect sizes, and 0.35 large effect sizes. At a .05 significance level, the required sample size for a multiple regression with 6 predictors is 686 for a small effect size, 97 for a medium effect size and 45 for a large effect size. The sample size for the dairy brand is 100, which is just above the 97 observations required for a medium effect size. This means that the sample size is adequate for detecting medium differences and is thus considered appropriate for continuing the analysis. The energy brand has a sample size of 52, which falls between the requirements for a large and a medium effect size. Since this sample size can detect large to medium differences, it is considered adequate for this thesis; however, this should be taken into consideration when interpreting the results.
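As a hedged illustration of the reasoning above, the following Python sketch computes Cohen's $f^2$ from an R-squared value and looks up which effect sizes a given sample can detect. The thresholds and required sample sizes are the ones quoted in the text (Cohen, 1992; 6 predictors, .05 significance level, power 0.8); the function names and the R-squared of 0.20 are hypothetical.

```python
def cohen_f2(r_squared):
    """Population effect size f^2 = R^2 / (1 - R^2)."""
    return r_squared / (1.0 - r_squared)


def effect_size_label(f2):
    """Classify f^2 with the thresholds 0.02 / 0.15 / 0.35 quoted in the text."""
    if f2 >= 0.35:
        return "large"
    if f2 >= 0.15:
        return "medium"
    if f2 >= 0.02:
        return "small"
    return "below small"


# Required sample sizes quoted in the text for 6 predictors at alpha = .05.
REQUIRED_N = {"small": 686, "medium": 97, "large": 45}


def detectable_effects(n_observations):
    """Effect sizes for which the sample reaches the required size."""
    return [label for label, needed in REQUIRED_N.items() if n_observations >= needed]


example_f2 = cohen_f2(0.20)              # hypothetical R-squared, for illustration only
print(round(example_f2, 4), effect_size_label(example_f2))
print(detectable_effects(100))           # dairy sample size stated in this section
print(detectable_effects(52))            # energy sample size stated in this section
```

Read this way, the dairy sample reaches the requirement for medium effects, whereas the energy sample formally reaches only the requirement for large effects, which is the caveat the section asks the reader to keep in mind.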


3.4 Model specification

The aim of this thesis is to describe the moderating effect that product involvement has on the impact of earned and owned media on sales. As the research aims to describe the factors behind an outcome, and not to predict them, the model is specified with a descriptive purpose in mind. In model building, four criteria are strived for: a model should be simple, complete, adaptive and robust (Leeflang et al., 2015). Achieving a complete yet simple model is a challenge for many researchers since the two criteria often contradict each other. For this thesis, the research question was thoroughly considered before deciding which variables to include. To increase the completeness of the model, lagged effects were included, as they provide a more complete picture of reality in the marketplace. Competition was considered but not included, as it would add another dimension to the model. This would make the model increasingly complicated and thus violate the criterion of simplicity. Moreover, the model is considered adaptive since it allows for the inclusion of more product categories or marketing tactics. This is because the model is built in an evolutionary manner, meaning that it is built stepwise as more variables and interactions become relevant (Leeflang et al., 2015).

Further, the functional form was carefully decided upon. The functional form of a model determines the mathematical relationship between the variables in the model (Leeflang et al., 2015). The two most common functional forms are the linear additive model and the multiplicative model. The linear additive model assumes that the total effect of all predictor variables is the sum of their individual effects. The model assumes constant returns to scale and no interaction between the variables. A multiplicative model is not linear, but can be made linear through a log-transformation. In the multiplicative model, all variables are assumed to interact (Leeflang et al., 2015).


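$$S_t = \alpha + \beta_1 O_t + \beta_2 E_t + \beta_3 \mathit{BA}_t + \beta_4 \mathit{OffExp}_t + \beta_5 \mathit{OnExp}_t + \beta_6 O_{t-1} + \beta_7 E_{t-x} + \varepsilon_t \tag{1}$$

Equation 1: Linear additive model.

$$S_t = \alpha \, O_t^{\beta_1} E_t^{\beta_2} \mathit{BA}_t^{\beta_3} \mathit{OffExp}_t^{\beta_4} \mathit{OnExp}_t^{\beta_5} O_{t-1}^{\beta_6} E_{t-x}^{\beta_7} \varepsilon_t \tag{2}$$

Equation 2: Multiplicative model.

$$\ln S_t = \ln \alpha + \beta_1 \ln O_t + \beta_2 \ln E_t + \beta_3 \ln \mathit{BA}_t + \beta_4 \ln \mathit{OffExp}_t + \beta_5 \ln \mathit{OnExp}_t + \beta_6 \ln O_{t-1} + \beta_7 \ln E_{t-x} + \ln \varepsilon_t \tag{3}$$

Equation 3: Multiplicative model, log transformed.

Where,

$S_t$ = Sales in time t

$O_t$ = Owned media

$E_t$ = Earned media

$\mathit{BA}_t$ = Brand awareness

$\mathit{OffExp}_t$ = Offline expenditure

$\mathit{OnExp}_t$ = Online expenditure

$O_{t-1}$ = Lagged variable for owned media

$E_{t-x}$ = Lagged variable for earned media, where x = 1 for the energy dataset and x = 3 for the dairy dataset

$\alpha, \beta_1, \beta_2, \beta_3, \beta_4, \beta_5, \beta_6, \beta_7$ = Regression parameters

$\varepsilon_t$ = Error term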
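The step from the multiplicative model to its log-transformed form can be checked numerically. The Python sketch below is only an illustration with hypothetical parameter and variable values (three predictors instead of seven, to keep it short); it confirms that taking the logarithm of the multiplicative model yields a model that is linear in the logarithms of the predictors.

```python
import math


def multiplicative_sales(alpha, betas, xs, error=1.0):
    """S = alpha * x_1^beta_1 * ... * x_k^beta_k * error (multiplicative form)."""
    sales = alpha * error
    for beta, x in zip(betas, xs):
        sales *= x ** beta
    return sales


def log_linear_sales(alpha, betas, xs, error=1.0):
    """ln S = ln alpha + sum(beta_i * ln x_i) + ln error (log-transformed form)."""
    return math.log(alpha) + sum(b * math.log(x) for b, x in zip(betas, xs)) + math.log(error)


# Hypothetical values, for illustration only. All inputs must be strictly
# positive, because the logarithm of zero is undefined.
alpha = 50.0
betas = [0.3, -0.2, 0.5]
xs = [5.4, 14.0, 23.0]

lhs = math.log(multiplicative_sales(alpha, betas, xs))
rhs = log_linear_sales(alpha, betas, xs)
print(round(lhs, 6), round(rhs, 6))
```

The positivity requirement matters in practice for variables that contain zeros, such as weeks without web conversations or without advertising expenditures.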


| Coefficients | Additive linear model: p-values | Multiplicative model: p-values |
| --- | --- | --- |
| Intercept | 1.64e-06 *** | <2e-16 *** |
| Brand Awareness | 0.4859 | 0.446 |
| Offline Expenditures | 0.0204 * | 0.018 * |
| Lagged Earned Media | 0.0596 . | 0.041 * |
| Owned Media | 0.4628 | 0.352 |
| Online Expenditures | 0.3762 | 0.615 |
| Earned Media | 0.5158 | 0.446 |
| Lagged Owned Media | 0.8080 | 0.515 |
| R-squared | 0.1167 | 0.1254 |
| Adjusted R-squared | 0.0503 | 0.0596 |
| p-value | 0.1056 | 0.0774 |

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 1: Regression output comparison, dairy brand.

For the energy brand (see Table 2), the only significant variable in the linear additive model, ‘Earned Media’, becomes insignificant in the multiplicative model. Furthermore, the R-squared indicates that the linear additive model explains 4.32 percentage points more of the variance than the multiplicative model (0.3603 versus 0.3171). The model p-value is also lower for the linear additive model (0.0042 versus 0.0135), although both models are significant at the 5% significance level.


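| Coefficients | Additive linear model: p-values | Multiplicative model: p-values |
| --- | --- | --- |
| Lagged Owned Media | 0.9829 | 0.7147 |
| R-squared | 0.3603 | 0.3171 |
| Adjusted R-squared | 0.2585 | 0.2085 |
| p-value | 0.0042 | 0.0135 |

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 2: Regression output comparison, energy brand.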

The hypotheses demand a comparison of the datasets, which means that, for a precise comparison, the same model ought to be used for both datasets. Due to the descriptive rather than predictive purpose of the models, significant effects of the predictor variables are of greater importance than the significance of the models as a whole. Although the multiplicative model presented higher significance for the model as a whole for the dairy brand (Table 1), the multiplicative model for the energy brand presented no significant predictor variables (Table 2), and therefore a linear additive model (Equation 1) was chosen for both brands.

Misspecification was tested with the Regression Specification Error Test (RESET) by Ramsey (1969), which builds on the idea that several misspecifications can lead to violations of the assumption that the expected value of the residuals is zero. Examples of such misspecifications are omitted variables and a wrong functional form. The test is appropriate for determining misspecification since it detects neglected nonlinearities in the model. The test adds squares and cubes of the fitted values to the regression, and the test statistic follows an F(2, n − k − 3) distribution. To ensure a good fit of the model, a RESET was performed on the linear additive model for both brands. For the dairy brand, this gave a RESET value of 0.4429 and a p-value of 0.8724. For the energy brand, the RESET gave a value of 0.6019 and a p-value of 0.727. This means that no misspecification was found in either of the two models, and the results thereby support the chosen functional form.
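To make the mechanics of the RESET concrete, the sketch below implements the F-statistic in plain Python: the model is estimated once as specified and once with the squared and cubed fitted values added, and the drop in the sum of squared residuals is tested. It is an independent illustration, not the R routine used in the thesis; the helper names and the toy data are hypothetical, and the p-value is omitted because the standard library has no F-distribution.

```python
import math


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def ols(columns, y):
    """OLS with intercept via the normal equations; returns (fitted, ssr)."""
    design = [[1.0] + [col[t] for col in columns] for t in range(len(y))]
    p = len(design[0])
    xtx = [[sum(row[i] * row[j] for row in design) for j in range(p)] for i in range(p)]
    xty = [sum(row[i] * y[t] for t, row in enumerate(design)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    ssr = sum((obs - fit) ** 2 for obs, fit in zip(y, fitted))
    return fitted, ssr


def reset_f(columns, y):
    """RESET F-statistic with squared and cubed fitted values, F(2, n - k - 3)."""
    n, k = len(y), len(columns)
    fitted, ssr_restricted = ols(columns, y)
    extra = [[f ** 2 for f in fitted], [f ** 3 for f in fitted]]
    _, ssr_unrestricted = ols(columns + extra, y)
    return ((ssr_restricted - ssr_unrestricted) / 2) / (ssr_unrestricted / (n - k - 3))


# Hypothetical toy data: one truly linear relation and one with curvature.
x = [i / 10 for i in range(1, 31)]
noise = [0.1 * math.sin(7 * i) for i in range(1, 31)]
y_linear = [1 + 2 * a + e for a, e in zip(x, noise)]
y_curved = [1 + 2 * a + 0.5 * a * a + e for a, e in zip(x, noise)]

f_linear = reset_f([x], y_linear)
f_curved = reset_f([x], y_curved)
print(round(f_linear, 3), round(f_curved, 3))
```

A linear model fitted to the curved series produces a much larger statistic than the same model fitted to the linear series, which is the kind of neglected nonlinearity the test is designed to flag.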

3.5 Data analysis procedure


The statistical program R (see Appendix B for the R script used) was used for processing the data, which started with checking for missing values. This was followed by model validation and appropriate correction, before the models were estimated using OLS regression. Mediation was tested for using a Baron and Kenny (1986) test, as well as a bootstrapping method. Lastly, to determine the moderating effect of product involvement, a t-test was conducted between the two datasets to determine if the betas were comparable. These procedures are further explained in the following sections.

3.5.1 Ordinary Least Squares (OLS)

The model was estimated for both brands (dairy and energy) using OLS regression, the objective of which is that the estimated values are as close as possible to the observed values. For each t = 1, …, T, there is a distance (residual), defined as $e_t = y_t - \hat{y}_t$, between the estimated criterion variable $\hat{y}_t$ and the observed value $y_t$. If the predictor variables are known, the estimated values $\hat{y}_t$ can be obtained by replacing the unknown parameters with estimated values and assuming that the optimal prediction for the residuals is zero. A good model estimates values that are as close to the observed values as possible, which minimizes the sum of squared residuals (Leeflang et al., 2015).
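The least-squares principle can be illustrated with a one-predictor Python sketch. The data and function names are hypothetical; the point is only that the OLS coefficients give a smaller sum of squared residuals than any perturbed coefficients, and that the residuals average out to zero when an intercept is included.

```python
def fit_simple_ols(x, y):
    """Closed-form OLS for y = intercept + slope * x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    intercept = mean_y - slope * mean_x
    return intercept, slope


def residuals(x, y, intercept, slope):
    """e_t = y_t - y_hat_t for every observation."""
    return [b - (intercept + slope * a) for a, b in zip(x, y)]


def sum_squared_residuals(x, y, intercept, slope):
    return sum(e ** 2 for e in residuals(x, y, intercept, slope))


# Hypothetical toy data, for illustration only.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

intercept, slope = fit_simple_ols(x, y)
ssr_best = sum_squared_residuals(x, y, intercept, slope)
ssr_shifted = sum_squared_residuals(x, y, intercept, slope + 0.1)
print(round(intercept, 4), round(slope, 4), ssr_best < ssr_shifted)
```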

3.5.2 Mediation

In order to test hypotheses H3a-d and estimate a mediating effect of brand awareness, two different procedures were applied. While it is traditional to also perform a Sobel test, it was excluded since it is more accurate for samples where n>200 and less powerful than other mediation tests available (Hayes, 2009). Therefore, the traditional Baron and Kenny (1986) test, as well as direct testing of the mediation with Bootstrapping, was performed.

The first test performed was the Baron and Kenny (1986) test. According to this test, for a mediation effect to be present, the following three conditions must hold: First, the independent variable and the mediator must have a significant effect on the dependent variable. Second, the independent variable’s effect on the mediator must be significant. Third, the effect of the independent variable on the dependent variable must be significant (Baron and Kenny, 1986). By performing the test, it can be concluded whether there is full, partial or no mediation.


mediation (MacKinnon, Lockwood and Williams, 2004), (2) the method does not assume normality of the sampling distribution of the indirect effect, which is often an issue since that distribution tends to be asymmetric with non-zero skewness and kurtosis (Hayes, 2009), and (3) it estimates the strength of the mediation directly (Hayes, 2009), which enables testing of hypotheses H3a-d. Bootstrapping is a resampling method in which observations are drawn with replacement, followed by an estimation of the mediating effect on each resample. The outcome is recorded and the procedure is repeated a number of times (in this case 1000 times); the resulting estimates form a distribution from which the 2.5th and 97.5th percentiles are determined (Hayes, 2009).
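A minimal Python sketch of a percentile bootstrap for an indirect effect is given below. It assumes the common product-of-coefficients definition (the effect of the predictor on the mediator multiplied by the effect of the mediator on the outcome, controlling for the predictor), which may differ in detail from the R procedure used in the thesis. The data are simulated and all names are hypothetical.

```python
import random


def _centered_sum(u, v):
    """Sum of cross-products of deviations from the mean."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))


def indirect_effect(x, m, y):
    """a * b, with a from m ~ x and b the coefficient of m in y ~ x + m."""
    sxx, smm = _centered_sum(x, x), _centered_sum(m, m)
    sxm, sxy, smy = _centered_sum(x, m), _centered_sum(x, y), _centered_sum(m, y)
    a = sxm / sxx
    b = (sxx * smy - sxm * sxy) / (sxx * smm - sxm ** 2)
    return a * b


def bootstrap_ci(x, m, y, n_resamples=1000, seed=1):
    """Percentile interval (2.5th, 97.5th) of the bootstrapped indirect effect."""
    rng = random.Random(seed)
    n = len(x)
    estimates = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]      # draw rows with replacement
        estimates.append(indirect_effect([x[i] for i in idx],
                                         [m[i] for i in idx],
                                         [y[i] for i in idx]))
    estimates.sort()
    lower = estimates[(25 * n_resamples) // 1000]
    upper = estimates[(975 * n_resamples) // 1000 - 1]
    return lower, upper


# Simulated data with a built-in mediation path x -> m -> y (illustration only).
data_rng = random.Random(0)
x = [data_rng.gauss(0, 1) for _ in range(100)]
m = [0.8 * xi + 0.5 * data_rng.gauss(0, 1) for xi in x]
y = [0.9 * mi + 0.5 * data_rng.gauss(0, 1) for mi in m]

lower, upper = bootstrap_ci(x, m, y)
print(round(lower, 3), round(upper, 3))
```

If the interval between the two percentiles excludes zero, the indirect effect is regarded as significant at the 5% level; an interval that contains zero gives no support for mediation.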

3.5.3 Moderation


4. Results

The results chapter includes the results from the statistical tests performed. The chapter begins with a brief exploratory analysis of both datasets. It is then continued with a thorough testing of model assumptions, as this indicates whether there are any violations in the model. These are then corrected for prior to estimation. The chapter ends with the results from the final model estimations.

4.1 Exploratory analysis

A correlation matrix for each dataset was produced in order to gain insight into the strength of the relationships between the different variables.

4.1.1 Dairy dataset

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Table 3: Correlation matrix, dairy brand.

The correlation matrix with the variables from the dairy brand can be seen in Table 3. It can be observed that ‘Sales’ has a significant (p < 0.05) and quite strong and positive correlation with ‘Offline Expenditures’ as well as a negative correlation with ‘Lagged Earned Media’. ‘Owned Media’ correlates with ‘Online Expenditures’, ‘Brand Awareness’ and its own lagged variable ‘Lagged Owned Media’. ‘Brand Awareness’ presents strong correlations with ‘Owned Media’ and ‘Earned Media’, as well as ‘Lagged Owned Media’ and ‘Lagged Earned Media’.


4.1.2 Energy dataset

The correlation matrix with the variables from the energy brand can be found in Table 4. The dependent variable ‘Sales’ presents a significant (p < 0.001), negative and strong correlation with ‘Earned Media’ (correlation coefficient -0.56) and ‘Lagged Earned Media’ (correlation coefficient -0.54). ‘Brand Awareness’ was found to have significant correlations with ‘Owned Media’, ‘Earned Media’, ‘Offline Expenditures’ and ‘Online Expenditures’, as well as the lagged variables ‘Lagged Owned Media’ and ‘Lagged Earned Media’.

Significance codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1 Table 4: Correlation matrix, energy brand.

4.2 Model validation

The model validation assesses the quality of the model. There are several assumptions about the model residuals that need to be tested and satisfied in order to estimate a model with an OLS regression. Before testing whether the model satisfies the assumptions of an OLS regression, the variables included in the model were tested for multicollinearity. If high degrees of multicollinearity are present in the model, this leads to specification problems since the exact effects of the predictor variables cannot be correctly distinguished (Leeflang et al., 2015). Multicollinearity arises when there is a high degree of correlation between the variables. Multicollinearity is tested by measuring the Variance Inflation Factor (VIF). A VIF larger than 5 indicates a problem with multicollinearity in the model. The VIF is computed as $VIF_j = (1 - R_j^2)^{-1}$, where $R_j^2$ is the R-squared from regressing predictor j on the other predictor variables (Leeflang et al., 2015).
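The relation between the VIF and the underlying R-squared can be made explicit with a small Python sketch. The function names are hypothetical; the VIF of 6.5699 is the value reported for ‘Owned Media’ in Table 5, and the calculation simply inverts the standard formula to show how much of that predictor's variance is explained by the other predictors.

```python
def vif(r_squared_j):
    """Variance Inflation Factor for predictor j: 1 / (1 - R_j^2)."""
    return 1.0 / (1.0 - r_squared_j)


def implied_r_squared(vif_value):
    """R_j^2 implied by a given VIF: 1 - 1 / VIF."""
    return 1.0 - 1.0 / vif_value


threshold_r2 = implied_r_squared(5.0)        # R_j^2 at the cut-off of VIF = 5
owned_media_r2 = implied_r_squared(6.5699)   # VIF reported for 'Owned Media' in Table 5
print(round(threshold_r2, 4), round(owned_media_r2, 4))
```

A cut-off of 5 thus corresponds to 80% of a predictor's variance being explained by the remaining predictors, and the reported value for ‘Owned Media’ lies above that level.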

The first assumption about the residuals concerns autocorrelation ($E(\varepsilon_t \varepsilon_s) \neq 0$ for $t \neq s$), which occurs when the residuals depend on each other because the model fails to account for dependencies in time. Failure to detect and correct for autocorrelation results in the variance of the effects being estimated incorrectly (Leeflang et al., 2015).

Second, heteroscedasticity in the residuals leads to wrong estimation of the variance of effects, and is caused by the violation of equal variances in the economic agents’ disturbance terms. The Breusch–Pagan test was conducted to detect heteroscedasticity. It calculates an F-statistic from the R-squared of the regression ($R^2_{\hat{e}^2}$), the number of observations (T) and the number of predictor variables (K). The formula for computing the F-statistic for the Breusch–Pagan test is: $F = \frac{R^2_{\hat{e}^2}/K}{(1 - R^2_{\hat{e}^2})/(T - K - 1)}$, which follows an $F_{K,\,T-K-1}$ distribution (Leeflang et al., 2015).
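The F-statistic described above depends only on three numbers, so it can be sketched directly in Python. The function name is hypothetical and the R-squared of 0.08 is an invented illustration value, not a result from the thesis; the sample size of 101 and the 6 predictors correspond to the dairy model. The p-value is omitted because the standard library has no F-distribution.

```python
def breusch_pagan_f(r_squared, n_observations, n_predictors):
    """F = (R^2 / K) / ((1 - R^2) / (T - K - 1)), distributed F(K, T - K - 1)."""
    numerator = r_squared / n_predictors
    denominator = (1.0 - r_squared) / (n_observations - n_predictors - 1)
    return numerator / denominator


# Hypothetical R-squared of 0.08; T = 101 and K = 6 as in the dairy model.
f_statistic = breusch_pagan_f(0.08, 101, 6)
degrees_of_freedom = (6, 101 - 6 - 1)
print(round(f_statistic, 4), degrees_of_freedom)
```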

Third, the assumptions of OLS require the residuals in the model to be normally distributed. If this assumption cannot be satisfied, it can lead to model misspecification, and the p-values cannot be trusted since they rely on the assumption of a normal distribution. Non-normality can be detected visually with normality plots or by normality tests, which would suggest a deviation from a normal distribution in the form of either skewness or kurtosis (Leeflang et al., 2015). There are a number of normality tests available, and in this research two of the most commonly used tests are performed to ensure normality. The first is the Jarque-Bera test, which has a chi-squared distributed test statistic with two degrees of freedom. The test statistic is calculated using the following formula: $\chi^2_{(2)} = \frac{T - L}{\sigma^2}\left(sk^2 + \frac{1}{4} ek^2\right)$, where T is the number of observations, L is the number of parameters estimated, $\sigma$ is the standard deviation of the residuals, $sk$ is the skewness (3rd moment) of the residuals’ distribution, and $ek$ is the excess kurtosis (4th moment) of the residuals’ distribution (Leeflang et al., 2015). The


2011). To correct for non-normality, bootstrapping can be used to ensure the reliability of the p-values.
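For illustration, the Jarque-Bera statistic mentioned in this section can be computed in plain Python. The sketch uses the widely used textbook form with standardized skewness and excess kurtosis and a factor of n/6, which may differ in normalization from the expression cited from Leeflang et al. (2015). Because the statistic has two degrees of freedom, its p-value has the closed form exp(-JB/2). The function name and both residual series are hypothetical.

```python
import math


def jarque_bera(residuals):
    """Textbook Jarque-Bera statistic and its chi-squared(2) p-value."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((e - mean) ** 2 for e in residuals) / n
    m3 = sum((e - mean) ** 3 for e in residuals) / n
    m4 = sum((e - mean) ** 4 for e in residuals) / n
    skewness = m3 / m2 ** 1.5
    excess_kurtosis = m4 / m2 ** 2 - 3.0
    statistic = n / 6.0 * (skewness ** 2 + excess_kurtosis ** 2 / 4.0)
    p_value = math.exp(-statistic / 2.0)   # survival function of chi-squared with 2 df
    return statistic, p_value


# Hypothetical residual series: an evenly spread one and a heavy-tailed one.
stat_even, p_even = jarque_bera([-2, -1, 0, 1, 2])
stat_heavy, p_heavy = jarque_bera([-10, -1, -0.5, 0, 0, 0, 0.5, 1, 10])
print(round(stat_even, 4), round(p_even, 4))
print(round(stat_heavy, 4), round(p_heavy, 4))
```

With series this short, neither p-value comes close to rejecting normality; the comparison only shows that heavier tails raise the statistic.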

The omitted variable bias was considered through the RESET conducted in section 3.4 and by paying attention to face validity and analyzing the included variables in detail.

4.2.1 Dairy brand

The full model (see Model (1), Table 15 in Appendix C) of the dairy brand presented elevated multicollinearity for the variables ‘Owned Media’ (VIF: 6.5699) and ‘Lagged Owned Media’ (VIF: 4.9758). As ‘Lagged Owned Media’ was insignificant (p-value: 0.8080) and potentially distorted the values of other estimates, the variable was removed from the model. By doing this, the VIF score for ‘Owned Media’ decreased to 2.5704; see Table 5 for the VIF scores before and after removing ‘Lagged Owned Media’. While high VIF scores for lagged variables are neither surprising nor considered very critical, the multicollinearity risks distorting the estimates and thus ought to be resolved. The decision to remove the variable was also supported by the adjusted R-squared, which increased from 0.0502 to 0.0598 when the variable was excluded. This indicated that ‘Lagged Owned Media’ did not add any explanatory power to the model and could thus be removed without reducing the explained variance in the criterion variable. Additionally, the p-value decreased from 0.1056 to 0.06534, which makes Model (2) (see Table 15 in Appendix C) significant at the 10% significance level.

| Variable | VIF Model 1 | VIF Model 2 |
| --- | --- | --- |
| Brand Awareness | 1.4228 | 1.4217 |
| Offline Expenditures | 1.0818 | 1.0789 |
| Owned Media | 6.5699 | 2.5704 |
| Online Expenditures | 2.5555 | 2.5545 |
| Earned Media | 2.2819 | 2.2818 |
| Lagged Earned Media | 2.2612 | 2.2604 |
| Lagged Owned Media | 4.9758 | - |

Table 5: VIF scores, dairy brand.


To determine whether a pattern was present in the residuals, the sequence of residuals was plotted and inspected visually (see Appendix D). No autocorrelation in the residuals was found, and thus the OLS assumption of no autocorrelation was satisfied.

Testing for heteroscedasticity was performed with the Breusch-Pagan test. To calculate the Breusch-Pagan test, the number of predictors, the number of observations, and the R-squared from the regression of the model were used. This resulted in a p-value of 0.2181, meaning that the null hypothesis could not be rejected and thus the results showed no indication of heteroscedastic residuals in the model.

Next, normality in the residuals was tested for. In order to investigate the normality visually, a histogram as well as a QQ plot (see Graph 11 and 12 in Appendix E) were created. Both showed that the residuals do not follow a normal distribution, which is a problem in the model. The QQ plot indicated a heavy-tailed distribution. The non-normality was confirmed by the Lilliefors (Kolmogorov-Smirnov) test (p-value: 0.001482) and the Jarque-Bera test (p-value: < 2.2e-16). Both tests rejected the null hypothesis, and it was concluded that a quite strong problem with normality existed. As non-normality does not affect the estimates, only the p-values, bootstrapping was performed to check the reliability of the p-values. The bootstrapping was conducted with 1999 resamples with replacement, and confirmed that ‘Lagged Earned Media’ and ‘Offline Expenditures’ are significant at the 10% significance level, with p-values of 0.0630 and 0.0874, respectively.

Next, the model was re-specified before estimation. The final model used for estimation of the dairy brand was:

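$$S_t = \alpha + \beta_1 O_t + \beta_2 E_t + \beta_3 \mathit{BA}_t + \beta_4 \mathit{OffExp}_t + \beta_5 \mathit{OnExp}_t + \beta_6 E_{t-x} + \varepsilon_t \tag{4}$$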


4.2.2 Energy brand

The assumptions described in section 4.2 were tested for the energy brand as well. First, the variables were tested for multicollinearity; see column ‘VIF Model 1’ in Table 6. It was decided to exclude ‘Lagged Owned Media’, since removing the variable allows a more valid comparison between the energy and the dairy datasets, which are then estimated with the same model (see Equation 4). In addition, the adjusted R-squared increased from 0.2585 to 0.275 when removing the variable. This supports the decision, as the higher adjusted R-squared indicates that the model explains more variance when ‘Lagged Owned Media’ is excluded. The p-values also increased; however, both models were significant at the 5% significance level (see Table 16 in Appendix C).

| Variable | VIF Model 1 | VIF Model 2 |
| --- | --- | --- |
| Brand Awareness | 1.4950 | 1.4221 |
| Offline Expenditures | 1.3011 | 1.2257 |
| Owned Media | 1.6545 | 1.3001 |
| Online Expenditures | 1.4094 | 1.4030 |
| Earned Media | 2.6377 | 2.5827 |
| Lagged Earned Media | 2.7582 | 2.7509 |
| Lagged Owned Media | 1.6566 | - |

Table 6: VIF scores, energy brand.

The Durbin-Watson test for autocorrelation was performed and gave a result of 1.9302, which is larger than the upper bound for 52 observations and 6 predictor variables (DW > $d_U$), and thus the assumption of no autocorrelation is satisfied.
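For reference, the Durbin-Watson statistic itself is straightforward to compute from a residual series, as the Python sketch below shows. The function name and the two toy residual series are hypothetical. Values close to 2 point to no first-order autocorrelation, values toward 0 to positive and values toward 4 to negative autocorrelation; the bounds $d_L$ and $d_U$ still have to be looked up in a table.

```python
def durbin_watson(residuals):
    """DW = sum((e_t - e_{t-1})^2) / sum(e_t^2)."""
    numerator = sum((residuals[t] - residuals[t - 1]) ** 2
                    for t in range(1, len(residuals)))
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator


# Hypothetical residual series, for illustration only.
dw_alternating = durbin_watson([1, -1, 1, -1])          # sign flips every period
dw_persistent = durbin_watson([1, 1, 1, -1, -1, -1])    # long runs of equal sign
print(dw_alternating, round(dw_persistent, 4))
```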

To ensure that there is no heteroscedasticity in the residuals, a Breusch-Pagan test was performed. The test used the number of observations and the number of predictor variables, and presented a p-value of 0.9189. This means that the null hypothesis could not be rejected, and it was thereby concluded that no heteroscedasticity was found and that it is not a problem in the model.
