Who are your potentially high-value customers?

Using the Hurdle model to test the moderating effect of customer characteristics on the relationship between the number of advertising exposure and customer’s purchased volume.

Master Thesis

Tzu-Chi, Wang (s3077985)

University of Groningen, Faculty of Economics and Business

MSc Marketing Intelligence

Avondsterlaan 13, Groningen (9742KA) +31-6-17330180

t.c.wang@student.rug.nl

Acknowledgement

Table of Contents

1. Introduction

2. Theoretical framework

2.1 Advertisements and advertising exposures

2.2 Advertising repetition and stock

2.3 The moderating effect of customer’s characteristics on the relationship between the number of advertising exposure and customer’s purchased volume.

2.3.1 Age

2.3.2 Level of education

2.3.3 Household’s characteristics: the number of children under 18 years old.

2.3.4 Product involvement level

2.4 Competitor

2.5 Conceptual framework

3. Methodology

3.1 Model specification and estimation

3.2 Model validation

4. Empirical study and results

4.1 Data description

4.2 Descriptive statistics

4.3 Empirical results

4.3.1 Key findings

4.3. Improved predictive model

5. Discussion and Conclusion

5.1 Discussion

5.2 Conclusion and Managerial implications

5.3 Research limitations and suggestions for future research

6. Appendix

6.1 The Dutch education system

Tzu-Chi, Wang

University of Groningen, The Netherlands

Abstract:

This study aims to understand the influence of customer characteristics on the relationship between the number of advertising exposures and the customer’s purchased volume. More specifically, it investigates how the interaction between customer characteristics and advertisements influences the customer’s purchase decision. Using the hurdle model and a log-linear multiplicative regression, it identifies which customer characteristics are associated with a positive advertising response. The findings indicate that customers who are highly educated and have a high product involvement level show a positive advertising response. The moderating effects of the customer’s age and the number of children under 18 years old are both insignificant. Since this study uses an aggregated level of advertising exposure, the adstock variable, which may influence the present purchasing decision, is included in the empirical study as a control variable. Other control variables, such as the product’s price and the competitor effect, are also included. However, because advertising exposure is measured at an aggregated level, the customer’s self-selection bias in choosing advertising channels cannot be controlled for in this study.

Keywords: Hurdle model; customer characteristics (age, education level, household characteristic, the number of children under 18 years old, product involvement); advertising exposure; adstock (lagged advertising variable); moderator

1. Introduction

and resources to get information about each individual customer, it will provide the company an ability to select customers and apply treatments. Although these observations and databases require customer relationship management (CRM) systems and capabilities that produce customer insights (Reinartz et al., 2004), data opens the door for companies to acquire new customers and retain existing clients. Therefore, companies would like to know how responses differ across customers. With this knowledge, a company is able to target individual customers with different marketing campaigns (e.g., advertisements).

Advertising is a way to convey a message and to increase the customer’s product awareness and brand loyalty (Aaker & Biel, 2013). Repeated advertising facilitates the customer’s memory for naming and categorizing primary advertising information (Allport & Chmiel, 1984). It strengthens the customer’s perception of the advertising and stimulates people to put the focal product in their memory-based consideration set (Shapiro et al., 1997). However, advertisements are not always able to reach all customers. To attract customers’ attention efficiently, the organization should understand customer characteristics and subsequently segment customers into different groups. Knowing how customers respond to advertising exposure, the company can provide personalized advertising to increase the customer’s purchased volume. The content of personalized advertising is of greater relevance to the customer (Chellappa & Sin, 2005); these authors also suggested that personalized advertising reduces the customer’s search time for products and indirectly increases the potential turnover rate per customer. When customers become loyal, this may create a higher customer lifetime value (CLV) for the firm. When customers like the advertising message and do not perceive it as strongly intrusive, the advertising increases their purchase intention (Lambrecht & Tucker, 2013; Van Doorn & Hoekstra, 2013). By investigating customers’ responses to advertising exposure, the organization is able to provide personalized advertising according to customer characteristics.

advertising response. These authors mentioned that using randomly selected samples is not ideal when customer heterogeneity is unobserved. They also noted that significantly different responses from customers could partly stem from divergent reactions to the advertising and the medium. Similarly, this situation is likely to occur when customers see a high number of advertising exposures. Although Bucklin and Sismeiro (2003) conducted a similar study on customers’ online advertising response, no research so far has investigated heterogeneous customer responses to the number of advertising exposures alone. Therefore, this study differs from previous studies and regards an aggregated level of advertising as the focal independent variable.

Most companies now regard customers as their assets, which is reflected in the growing importance of customer management in many industries (Boulding et al., 2005). From a customer management perspective, understanding individual customers helps the company to set a good strategy and thereby create value from the customer base. Furthermore, the organization is able to reach its targeted group efficiently via advertising exposure. It can clarify people’s purchase intentions and predict the result of advertising. Based on the customer’s advertising response, companies are able to allocate the marketing budget more efficiently. For example, people with a high product involvement would be inclined to purchase more when they receive advertising exposure. With this knowledge, firms can design a marketing campaign that combines advertisements with the things the customer likes. This attracts people’s attention and increases the customer’s financial value to the firm. Therefore, understanding customer characteristics enables the firm to convey advertising messages efficiently and motivate customers to purchase more volume.

2. Theoretical framework

In this section, previous and current studies are discussed to address the factors that might influence the customer’s purchasing decision. Depending on their characteristics, customers respond differently to the firm’s marketing campaigns. To identify these relationships, the relevant literature is compared and examined. After the concepts that might influence the dependent variable are introduced, hypotheses are formulated. Finally, the conceptual framework, which consists of a graphic representation of the hypothesized relationships, is provided.

2.1 Advertisements and advertising exposures

Advertising is a means of communication with the customers of an organization and the users of a product or service (Grönroos, 2004). Advertisements are messages that are paid for by those who send them and are intended to influence people’s perceptions and/or opinions. An advertisement may take the form of a picture, short film or song that tries to persuade people to buy a product or service, or of a piece of text that conveys an idea to people. In today’s world, advertising is found in every possible medium. To reach existing and potential customers, it uses direct mail, television, radio and newspapers, which are regarded as offline advertising. With the development of technology and the internet, businesses can also catch the attention of potential customers through banners, videos and flash animations, which are called online advertising. According to Kotler (1994), advertisements are important tools that companies use to directly persuade buyers and the public.

experience on decreasing importance of online information shows that consumer’s trust of online search engine information decreases with increasing Internet experience”.

Offline advertisements are a traditional marketing strategy that is able to reach all customers. Given the internet’s tremendous rise, marketers refer to media channels that are not connected to a website as offline. These channels include radio, print advertising (e.g., direct mail) and television advertising. Even though the arrival of the internet appeared to signal the end of print-based advertisements, using offline channels in a marketing campaign is still fundamental for many organizations. It creates brand awareness and builds corporate reputation. Since most businesses now have their own website, offline marketing strategies are frequently tied to the company’s online advertising. Nevertheless, this does not mean that online businesses should rely only on online advertising to create product awareness and site visitation (Bellizzi, 2000). Customers with a high level of internet experience tend to reject online advertisements, so offline advertising is able to provide relatively more information to this kind of customer (Cheema & Papatla, 2010).

and the study from Manchanda et al. (2006), it is inferred that the number of ad exposures has a positive effect on the probability of repeated purchase. In other words, the number of ad exposures is likely to increase the customer’s purchased volume. To focus on the moderating effect of customer characteristics, the focal advertising variable in this study is an aggregated level of online and offline advertising exposure. Although customers may respond differently to online and offline advertising, this study assumes that the customer’s purchased volume is mainly affected by the number of advertising exposures rather than by the specific advertising channel. Moreover, the advertising content is expected to be the same in all channels, so it would not drive different customer responses.

2.2 Advertising repetition and stock

As mentioned in the previous section, advertising exposure increases the probability of purchasing a product. The effects of message repetition on attitudes and purchase intentions have often been studied (Sawyer, 1974). Advertising repetition leads to higher persuasion, which is measured by variables such as the customer’s attention, recognition and attitude (Tellis, 2003; Dreze and Hussherr, 2003). According to the book “Advertising: Principles and Practice” by Moriarty et al. (2014), people must see an advertisement at least nine times before they consider buying the advertised product and/or service. This means that an advertisement must be repeated several times to get the attention of potential customers. The term effective frequency refers to the number of exposures after which an advertisement no longer has any effect on potential customers. Most studies have shown that messages gain impact over the first few exposures but that further exposures begin to have a negative effect. Such inverted-U curves for advertising repetition emerge in the attitudinal impact of mere exposure (Harrison, 1977) and in advertisement wearout (Calder & Sternthal, 1980). Moreover, Campbell & Keller (2003) described that ads for unfamiliar brands were processed more extensively with repetition than ads for familiar brands. They also suggested that at high levels of ad repetition, customers may use more diverse processing and consider the inappropriateness of the advertising tactics of an unfamiliar brand. Batra & Ray (1986) also suggested that ad repetition continues to increase brand attitudes and purchase intentions in conditions where the production of support and counter arguments is expected to be low. From these theories, it is inferred that ad repetition influences the customer’s perception of, and decision on, the product.

period of time. Advertising impressions may accumulate over time to build up brand awareness and finally persuade the customer to buy the focal brand (Tull, 1965). In other words, the customer’s present-period purchased volume is partially a function of past-period advertising, which may increase the customer’s likelihood of purchasing a certain product. Past-period advertising is seen as the accumulation of repeated advertisements. To consider the advertising effect comprehensively, the lagged effects of advertising are included in the model as control variables. Although advertising repetition and adstock are not the main focus of this study, they potentially influence the customer’s purchasing decision when the customer sees the advertising.

2.3 The moderating effect of customer’s characteristics on the relationship between the number of advertising exposure and customer’s purchased volume.

The idea that customer characteristics influence the acquisition of information and its effects on behavior is not new to economics. Becker (1965) suggested that people’s opportunity value of time and the economies of the household are significant determinants of information acquisition. Homburg & Giering (2001) indicated that demographic characteristics such as age and education play a crucial role as moderators in the relationship between psychological and behavioral constructs. While direct effects of customer characteristics are fundamental to the understanding of customer behavior, these effects may mask the underlying core differences in customers’ transactions. Some studies have argued that hypothesizing direct effects on the customer’s response may be obvious and that it is more meaningful to investigate the moderating effects of external factors such as customer characteristics (Baron et al., 1986). Moreover, Bucklin and Sismeiro (2003) found that, due to the cross-sectional heterogeneity among online browsers and channels, relying on observations of overall online customer behavior may lead to erroneous conclusions, since individual responses to an online channel can differ significantly. Their study indicated that if there are substantial differences in behavior among customers (or groups of customers), such differences could arise at least partly from people’s different reactions to the medium (e.g., the number of advertising exposures the customer gets). Similarly, this is likely to happen in the customer’s perception of advertisements. To gain a more comprehensive understanding of customer behavior, it is essential to incorporate a set of moderating effects based on customer characteristics.

customer’s satisfaction level of the product (Jolodar & Ansari, 2011). According to these two studies, moderating effects such as demographic customer variables and product involvement level are likely to hold in the relationship between advertising exposures and the customer’s purchased volume because they are related to the customer’s decision making. Here, four variables, namely the customer’s age, education level, number of children under 18 years old and product involvement level, have been chosen based on the existing literature. In this section, the rationale for selecting these four moderators is discussed and formulated in hypotheses.

2.3.1 Age

Age is a demographic characteristic that has drawn a lot of research attention. Studies have indicated that young and elderly customers concentrate on different elements when they evaluate a product and/or receive product information. Compared to young people, older people have restricted information-processing capabilities (Homburg & Giering, 2001). Older adults have a reduced ability to abstract constructs (Levy & Sharma, 1994). Johnson (1990) suggested that people of retirement age (over 65 years old) used less information, spent more time reviewing it and re-viewed fewer bits of information than young people. Because of their reduced ability to evaluate complex options, older customers reduce their consideration set (e.g., the specific products they want to buy) (Cooil & Hsu, 2007). Therefore, when an advertisement conveys an innovative and novel message and successfully attracts customers’ attention, older people are more likely to believe the information that the advertisement displays. Compared with younger people, they have less ability to search for other information. This makes older people easier to persuade through advertisements, so that they purchase more volume.

• H1: The customer’s age positively moderates the relationship between the number of advertising exposures and the customer’s purchased volume.

2.3.2 Level of education

Education increases the customer’s ability to wisely use information related to the purchasing decision (Slama & Tashchian, 1985). Education is an important determinant of an individual’s ability to process new information into behavioral change (Schultz, 1975). Customers with higher education might receive more advertisements and information from different channels. In addition, people with a higher education level engage in greater information gathering and usage before making a decision (Capon & Burke, 1980). Therefore, a highly educated customer has a greater awareness of alternatives. From this viewpoint, when an advertisement is repeatedly displayed to an individual customer, the probability of recalling the advertising message is higher for highly educated people than for those in the low education group. Accumulated information from previous advertisements can strengthen the memory-based consideration of the focal product among highly educated people (Shapiro et al., 1997), which indirectly strengthens the customer’s willingness to purchase more volume. In addition, higher education is usually associated with higher income levels (Homburg & Giering, 2001). Some studies have argued that people with high education and high income are less sensitive to the product price (Narasimhan, 1984). Low price sensitivity makes people purchase more products without many concerns. When customers in the high education group see the focal product in a physical store, they may have a high probability of recalling the advertising information and then purchasing the focal product. Since they are not price-sensitive and have a greater ability to gather information, advertising exposure easily persuades highly educated customers to purchase more. Although customers’ behavior may be inconsistent, the reasons discussed above potentially motivate highly educated people to respond more strongly to advertising. Therefore, it is hypothesized that the customer’s education level positively moderates the relationship between the number of advertising exposures and the customer’s purchased volume.

• H2: The customer’s education positively moderates the relationship between the number of advertising exposures and the customer’s purchased volume.

2.3.3 Household’s characteristics: the number of children under 18 years old.

a person’s household size. The common household composition is determined by parental relationship, marital status and so on. A family is defined as a couple with or without children, or one parent and their children, all of whom usually reside together in the same household. According to Bawa & Ghosh (1999), family size has a positive relationship with the frequency of grocery trips because families have high product needs. When customers have children in the family, they change their perceptions and brand preferences based on their present situation (Ekstrom et al., 1987; Mangleburg, 1990). It is inferred that people who have children under 18 years old show a higher advertising response in purchased volume when they see advertising exposure. This is because the main shopper in the household needs to fulfill the needs of the whole family. When they see an advertisement, they are likely to evaluate the advertising message based on their own and their children’s needs. As a result, advertisements have more influence on people who have more children in the household. As mentioned above, customers who have more children tend to visit the physical store more frequently. When those customers see the focal product in the store, they are likely to recall the advertising message that they received within the past few days, since they may take their children’s needs into consideration. Recalling the advertisement increases the customer’s willingness to purchase the product (Givon & Horsky, 1990). Consequently, customers who have children under 18 years old may have a high response rate to advertising exposure.

In addition, children under 18 years old are generally accompanied by their parents. There is a high possibility that customers and their children see the advertising exposure together. In that case, children’s opinions may influence their parents’ purchase decisions (Mangleburg, 1990). To meet their children’s needs, customers may change their brand preference and purchasing volume (Hastings, 2003). For example, food advertising on television has a great influence on children’s consumption of unhealthy food (Harris et al., 2009). This study showed that children’s perceptions of advertisements indirectly influence their parents’ response to advertising. Moreover, because children lack the cognitive skills and life experience to resist persuasive claims, it is expected that they will respond positively to most advertisements (Moore, 2004). Hence, from the perspectives discussed above, it is hypothesized that the number of children under 18 years old in the household has a positive moderating effect on the relationship between advertising exposure and the customer’s purchased volume.

2.3.4 Product involvement level

Product involvement is regarded as the personal relevance of an object based on inherent needs, values and interests (Zaichkowsky, 1985; Beatty, 1988). Customers with previous usage experience of a product prefer to purchase the same product when they are satisfied with their choice. When customers are highly familiar with specific products, this positively affects their attitudes toward product perception and brand reputation (Macias, 2003; Sundar and Kim, 2005). After multiple transactions, the customer’s product involvement becomes the customer’s attitude toward a brand. According to Quester & Smart (1998), product involvement significantly influences the customer’s product selection. Wright (1973, 1975) conducted an experiment on the effect of product involvement on advertising. He noted that “participants with high involvement would subsequently evaluated the product in the advertisement which they were about to see, and were given some additional background information”. Therefore, when the customer’s product involvement level is high, the attitude toward the ad and the corporate image may become a stable construct that is more accessible from memory (Suh & Youjae, 2006). When an advertisement is displayed to the high product involvement group, the effects of advertising exposure and repetitive advertising are strengthened. This is subsequently reflected in an increased purchased volume because the product is included in the customer’s memory-based consideration set (Shapiro et al., 1997). In contrast, people with low involvement do not evaluate the advertisement and product information in detail. Another study indicated that highly involved people show more negative evaluations of a communication because high involvement is associated with an extended latitude of rejection (Sherif et al., 1965). The high involvement group is therefore more likely to reject an advertisement, because its members are more critical of products and product information. However, the advertising content and detailed product information are not the main focus of this study, so the influences of advertising content and product quality are excluded and are assumed not to drive divergent advertising responses. For the reasons given above, it is assumed that the customer’s product involvement level positively moderates the link between the number of advertising exposures and purchased volume.

• H4: The customer’s product involvement level positively moderates the relationship between the number of advertising exposures and the customer’s purchased volume.

2.4 Competitor

competitive effect is regarded as the control variable that represents the situation in which the customer also purchased other brands at the same time. Based on previous studies, the competitor variable directly and indirectly influences the customer’s purchased volume of the focal brand (Keel & Padgett, 2015).

2.5 Conceptual framework

Figure 1 shows the conceptual model of this study. The core variables are the customer characteristics, which consist of age, education, the number of children under 18 years old and the product involvement level. Previous studies have shown that advertising positively influences the customer’s purchased volume (Grönroos, 2004). In addition, the advertising stock strengthens the customer’s advertising response in present purchasing volume (Shapiro et al., 1997). The competitor directly and indirectly affects the customer’s willingness to purchase the focal brand (Keel & Padgett, 2015). Because previous studies have already shown (1) the positive relationship between advertising and the customer’s purchased volume and (2) the negative effect of competitors on the focal product, these relationships are not specifically researched here. Both are regarded as control variables. This study focuses on the moderating effect of customer characteristics.

3. Methodology

3.1 Model specification and estimation

In this section, a model is presented to test the previously stated hypotheses. The model assesses the effects of customer characteristics on the relationship between exposure to advertisements and purchasing volume. A multiplicative (log-linear) model is used in order to assess the interaction effects between the different independent variables. The multiplicative model yields relative coefficients, which benefits the interpretation of the variables. For example, the age of customers and the number of children under 18 years old are expected to interact in this study: young parents might have relatively more children under the age of 18 than the elderly, whose children have already become adults. Besides the interaction terms for customer characteristics, the price variable is expected to interact with the exposure to advertisements, since information is a fundamental part of price-setting (Grewal & Krishnan, 1998; Kaul & Wittink, 1995). In addition, a simple linear regression model might face multicollinearity issues because the additional interaction terms are created by multiplying independent variables. Another reason to choose the multiplicative function is that not all of the variables are linear; for instance, the dummy variable for young age is 1 if the customer belongs to this group and 0 if not. Moreover, Tellis (2003) found that the relationship between the repetition of an advertising stimulus and the customer’s response to the stimulus is log-linear. For these reasons, a multiplicative (log-linear) model is preferred over a linear regression.
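To illustrate why the multiplicative form is convenient for testing moderation, consider a stripped-down sketch with one exposure variable and one dummy moderator $D_i$. The symbols $\alpha$, $\theta$, $\delta$ and $\gamma$, as well as the reduced specification itself, are assumptions made for illustration and are not the equation estimated in this study.

```latex
\begin{align*}
PV_{it} &= \alpha \, \theta^{Adv_{it}} \, \delta^{D_i} \, \gamma^{Adv_{it} D_i} \, e^{\varepsilon_{it}} \\
\ln PV_{it} &= \ln\alpha + \left(\ln\theta + D_i \ln\gamma\right) Adv_{it} + D_i \ln\delta + \varepsilon_{it}
\end{align*}
```

In this sketch, each additional exposure multiplies the purchased volume by $\theta$ for customers with $D_i = 0$ and by $\theta\gamma$ for customers with $D_i = 1$, so a value of $\gamma$ above 1 would correspond to a positive moderating effect. After taking logarithms the model is linear in the transformed coefficients and can be estimated by ordinary regression.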

Secondly, previous advertising repetitions are included as control variables in this study for better interpretation and analysis. The theory of adstock states that the customer’s current purchased volume is influenced by previous exposure to advertisements (Tull, 1965). Including a limited number of additional lagged advertising exposure effects keeps the model simple while remaining well specified. Three lagged advertising effects are considered because this is taken as the median of a seven-day week. This choice is supported by model fit tests and is further explained in the model validation and in section 4.3 (Empirical results). An autoregression (AR) model is used to capture the linear interdependencies among multiple time series (Pandit & Wu, 1983). When the model is applied in practice, these control variables help to predict the customer’s purchased volume and to detect where changes might stem from.
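The following is a minimal sketch of how three lagged exposure variables could be built from daily exposure counts. The function name `add_adstock_lags`, the input layout (a mapping from customer to a day-ordered list of counts) and the choice to drop days without a full three-day history are assumptions for illustration, not the data preparation used in this study.

```python
from typing import Dict, List


def add_adstock_lags(daily_exposures: Dict[str, List[int]],
                     n_lags: int = 3) -> List[dict]:
    """Return one row per customer-day with current and lagged exposures.

    Assumption: each list is ordered by day, starting at day 1. Days that
    do not yet have n_lags earlier observations are dropped, not zero-filled.
    """
    rows = []
    for customer, series in daily_exposures.items():
        for t in range(n_lags, len(series)):
            row = {"customer": customer, "day": t + 1, "adv": series[t]}
            for k in range(1, n_lags + 1):
                # adv_lag1 is yesterday's count, adv_lag2 the day before, etc.
                row[f"adv_lag{k}"] = series[t - k]
            rows.append(row)
    return rows


# Hypothetical customer with five days of exposure counts.
rows = add_adstock_lags({"c1": [2, 0, 1, 3, 0]})
for row in rows:
    print(row)
```

Dropping the first days is only one option; treating missing history as zero exposure would keep more observations but assumes the customer saw no advertising before the observation window.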

preference and requirements compared to those of people who have a low involvement level in the product (Li & Hitt, 2008). Adding the customer product involvement variable may decrease the self-selection bias in the customer’s product choices, but the channel through which the customer sees the advertisement still causes a potential self-selection bias. Advertising reviews may be subject to a self-selection bias that impacts customer purchase behavior (Li & Hitt, 2008). For example, when people avoid certain media and/or choose to use particular channels only, they are more likely to respond to those channels. In this situation, if the company targets these customers through the wrong media channels, the advertising response in these channels will appear artificially low because of the low probability of responding to the ad. The low response is then not caused by low advertising effectiveness. This issue limits and biases the observation of the exact advertising response from customers.

Fourthly, the dependent variable in this study has a discrete and a continuous aspect. In the discrete part, customers first decide whether or not they purchase a certain product. Consumers who are interested in the product then decide how much to spend on it, which makes the dependent variable continuous. The Hurdle model, also called the two-part model, addresses the limitation that a single model cannot answer both the discrete and the continuous aspect (Cameron and Trivedi, 2005). Because it concerns bounded outcomes, the Hurdle model is similar to the Tobit model. The difference between the two is that the Hurdle model provides two separate equations for bounded and unbounded outcomes, whereas the Tobit model (type 1) uses the same equation for both. Moreover, the Hurdle model assumes that the unbounded outcomes are the result of clearing a hurdle. The two-equation form of the Hurdle model handles either lower or upper bounding. The first part deals with the probability of observing a positive outcome, while the second part determines the value of the outcome conditional on having cleared the hurdle. The model typically specifies a Logit model for the first part (see Formula 1.1). For the second part, only positive outcomes are considered as the dependent variable. The hypotheses in this study are all built on the second part of the Hurdle model.
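The two-part logic described above can be sketched in a few lines of code. This is a simplified illustration under stated assumptions: a single regressor (the exposure count), a Logit first part fitted by Newton-Raphson, a log-linear second part fitted by ordinary least squares on positive volumes only, and independent error terms. The function names and the toy data are hypothetical and do not reproduce the estimation carried out in this study.

```python
import math


def fit_logit(x, y, n_iter=25):
    """Part 1: Newton-Raphson for P(y=1) = 1 / (1 + exp(-(a + b*x)))."""
    a, b = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(a + b * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient w.r.t. intercept
            g1 += (yi - p) * xi     # gradient w.r.t. slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b


def fit_ols(x, y):
    """Part 2 helper: simple least squares, returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def fit_two_part(adv, volume):
    """Fit both parts; the second uses log(volume) for positive volumes only."""
    bought = [1 if v > 0 else 0 for v in volume]
    part1 = fit_logit(adv, bought)
    positive = [(a, math.log(v)) for a, v in zip(adv, volume) if v > 0]
    part2 = fit_ols([a for a, _ in positive], [lv for _, lv in positive])
    return part1, part2


# Toy data (hypothetical): 16 customer-days, purchase share rises with exposure,
# and positive volumes follow log(volume) = 0.5 + 0.2 * adv exactly.
adv = [0] * 4 + [1] * 4 + [2] * 4 + [3] * 4
bought = [1, 0, 0, 0, 1, 1, 0, 0, 1, 1, 1, 0, 1, 1, 1, 0]
volume = [math.exp(0.5 + 0.2 * a) if b else 0.0 for a, b in zip(adv, bought)]

(logit_a, logit_b), (ols_a, ols_b) = fit_two_part(adv, volume)
print("part 1 (purchase or not):", logit_a, logit_b)
print("part 2 (log volume | purchase):", ols_a, ols_b)
```

In the second part the slope is on the log scale, so `math.exp(ols_b)` is the factor by which volume changes per additional exposure, which matches the multiplicative reading of the model. A specification that allows the two error terms to be correlated would require joint estimation, which this sketch deliberately leaves out.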

Lastly, the Hurdle model includes a disturbance term to capture all variance that is not covered by the systematic part of the model. Because both formulas are applied to the same dataset (same observations and time period), the error term of Formula 1.1 is $\varepsilon_{1t} \sim N(0, 1)$ and the error term of Formula 1.2 is $\varepsilon_{2t} \sim N(0, \sigma^2)$. Both error terms may be correlated: $E(\varepsilon_{1t}\,\varepsilon_{2t}) = \sigma_{12}$. In order to find the trade-off between the simple and complete


customers purchase products of other brands on the same day or not. If so, the influence on the focal product is considered and estimated in the model. In order to evaluate the moderating effect of customer characteristics on the relationship between the number of advertising exposures and the customer's purchased volume, the main effect (the influence of advertising exposures on the customer's purchased volume) must exist. Therefore, Formula 1.2 is used to test the main effect of the number of advertising exposures on customer purchased volume. To sum up, combining the elements mentioned in the last paragraph yields a log-linear model (see Formula 1.2). In the empirical study section, Table 1 gives the exact definition of each variable.

Formula 1.1: Probability of positive outcome

$$PV_{it} = \begin{cases} 0, & \text{if } PV_{it}^{*} \le 0 \\ PV_{it}^{*}, & \text{if } PV_{it}^{*} > 0 \end{cases} \qquad (1.1)$$

Formula 1.2: Model for $PV_{it}^{*}$

$$PV_{it}^{*} = \beta_0 \, P_{it}^{\beta_1} \, \beta_2^{Adv_{it}} \, \beta_3^{AM_i} \, \beta_4^{AE_i} \, \beta_5^{ME_i} \, \beta_6^{HE_i} \, \beta_7^{PhD_i} \, \beta_8^{NBC_i} \, \beta_9^{PO_i} \, \beta_{10}^{PL_i} \, \beta_{11}^{PM_i} \, \beta_{12}^{PH_i} \, \beta_{13}^{Adv_{it-1}} \, \beta_{14}^{Adv_{it-2}} \, \beta_{15}^{Adv_{it-3}} \, \beta_{16}^{COMP_{it}} \, \exp(\varepsilon_{it}) \qquad (1.2)$$

Where:

$i$: individual customer

$t$: time (day), from 1 to 90.

$PV_{it}^{*}$: Customer's purchased volume per customer per day

$Adv_{it}$: The number of advertising exposures per customer per day

$AM_i$: Customer who belongs to the middle age group (40 to 65 years old).

$AE_i$: Customer who belongs to the old age group (over 65 years old).

$ME_i$: Customer who belongs to the middle education group.

$HE_i$: Customer who belongs to the high education group.

$PhD_i$: Customer who belongs to the PhD education group.

$NBC_i$: The number of children under 18 years old the customer has.

$PO_i$: Customer who belongs to the zero product involvement group.

$PL_i$: Customer who belongs to the low product involvement group.

$PM_i$: Customer who belongs to the medium product involvement group.

$PH_i$: Customer who belongs to the high product involvement group.


The logit model is interpreted through the sign and significance of its coefficients. A positive and significant coefficient means that an increase in the variable raises the probability that a customer purchases the product. Conversely, a negative and significant coefficient means that the probability of purchase falls when the explanatory variable increases. The second part of the model evaluates the effects of customer traits on purchased volume. Therefore, two subsamples are drawn from the original dataset. The first contains the observations with a customer transaction, while the second contains the positive transactions for which the customer received the advertisement on the same day or within 3 days before purchasing. The key reason for separating the data is to make sure that the main relationship between advertising exposure and purchased volume exists. Both samples are analyzed using Formula 1.2.
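The sign interpretation above can be made concrete with a short sketch that converts log-odds into a probability through the logistic function. The function name is hypothetical; the two coefficients are the intercept and the high education estimate reported in Table 5, and the "reference customer" is assumed to have all explanatory variables at zero.

```python
import math

def logit_to_probability(log_odds):
    """Logistic function: maps log-odds to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-log_odds))

intercept = -4.59027       # intercept reported in Table 5
high_education = 0.12544   # positive coefficient reported in Table 5

baseline = logit_to_probability(intercept)
with_high_education = logit_to_probability(intercept + high_education)

# A positive coefficient shifts the log-odds up and therefore raises the
# purchase probability relative to the reference customer.
print(f"{baseline:.4f} {with_high_education:.4f}")
```

The baseline probability of roughly one percent per customer-day is of the same order as the share of observations with a purchase.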

The parameters of Formula 1.2 are estimated on the natural logarithm of the observed purchased volume. This transformation leads to Formula 2. To express an estimate on the original scale of the variable, the anti-log transformation is applied to the coefficient. The transformed coefficient of an explanatory variable is interpreted as an elasticity: the dependent variable (customer's purchased volume) changes by n percent if the independent variable increases by one unit. The analyses in this study are carried out with the relevant packages in RStudio version 1.0.143.
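As a minimal sketch of the anti-log step, the snippet below converts a coefficient of the log-linear model into a multiplier and a percentage change for a one-unit increase in the regressor. The coefficient value is hypothetical and chosen only for illustration; it is not an estimate from this study.

```python
import math

def percent_change(log_coefficient):
    """Percentage change in the outcome for a one-unit increase in a
    regressor of a log-linear model: 100 * (exp(b) - 1)."""
    return 100.0 * (math.exp(log_coefficient) - 1.0)

b = -0.05  # hypothetical coefficient on the log scale
multiplier = math.exp(b)  # the "real value" after the anti-log transformation
print(f"multiplier = {multiplier:.4f}, change = {percent_change(b):.2f}%")
```

A multiplier below one corresponds to a decrease and a multiplier above one to an increase. For a variable that enters the model in logs, such as price in Formula 1.2, the coefficient itself can be read as an elasticity without this conversion.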

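Formula 2: after logarithmic transformation, for $PV_{it}^{*} > 0$

$$\ln(PV_{it}) = \ln(\beta_0) + \beta_1 \ln(P_{it}) + Adv_{it}\ln(\beta_2) + AM_i\ln(\beta_3) + AE_i\ln(\beta_4) + ME_i\ln(\beta_5) + HE_i\ln(\beta_6) + PhD_i\ln(\beta_7) + NBC_i\ln(\beta_8) + PO_i\ln(\beta_9) + PL_i\ln(\beta_{10}) + PM_i\ln(\beta_{11}) + PH_i\ln(\beta_{12}) + Adv_{it-1}\ln(\beta_{13}) + Adv_{it-2}\ln(\beta_{14}) + Adv_{it-3}\ln(\beta_{15}) + COMP_{it}\ln(\beta_{16}) + \varepsilon_{it}, \quad \text{given } PV_{it}^{*} > 0 \qquad (2)$$

3.2 Model validation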


different numbers of predictors. The difference between those two measures is that the adjusted R-square penalizes the addition of variables that do not improve the existing model. Hence, the adjusted R-square is preferable to the R-square when fitting a multivariate linear regression and selecting the best model. The higher the adjusted R-square, the better a model performs (Ebbler, 1975). The second way to assess the model fit is through information criteria. Information criteria such as the Akaike information criterion (AIC) and the Bayesian information criterion (BIC) are often used to compare the quality of statistical models for a given set of data. In general, BIC penalizes models with extra parameters more than AIC does (Burnham & Anderson, 2004). For both criteria, the lower the score, the better the model performs.
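The fit measures discussed above follow standard textbook definitions, sketched below. The function names and the example inputs (log-likelihood, parameter counts, sample size) are hypothetical and only illustrate how the penalties behave; they are not values from this study.

```python
import math

def adjusted_r_squared(r_squared, n_obs, n_predictors):
    """Adjusted R-square: penalizes predictors that add no explanatory power."""
    return 1.0 - (1.0 - r_squared) * (n_obs - 1) / (n_obs - n_predictors - 1)

def aic(log_likelihood, n_params):
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_params - 2.0 * log_likelihood

def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: k ln(n) - 2 ln(L)."""
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Hypothetical case: one extra parameter that leaves the log-likelihood unchanged.
ll, n = -1250.0, 3000
aic_penalty = aic(ll, 16) - aic(ll, 15)        # always 2 per extra parameter
bic_penalty = bic(ll, 16, n) - bic(ll, 15, n)  # ln(n) per extra parameter
print(aic_penalty, round(bic_penalty, 2))
```

Because ln(n) exceeds 2 for any sample of eight or more observations, BIC charges more per additional parameter than AIC in realistic sample sizes.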

In order to test the hypotheses reliably, statistical validity tests are carried out. A wrong estimate of the variance results in incorrect conclusions about the significance of effects. Therefore, the residuals are first checked for non-normality. Non-normality occurs when the residuals are not normally distributed. If the residuals are indeed not normally distributed, the p-values cannot be trusted, since they rely on the assumption that the distribution is normal. A histogram and a QQ plot give a first impression of this issue. For a formal check, several normality tests can be performed, for example the Shapiro-Wilk test, the Kolmogorov-Smirnov test and the Jarque-Bera test.
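Of the tests named above, the Jarque-Bera statistic is the simplest to write out by hand. The sketch below is a plain standard-library illustration of its textbook definition, not the routine used in this study; the residual values are hypothetical.

```python
def jarque_bera(residuals):
    """Jarque-Bera statistic: n/6 * (S^2 + (K - 3)^2 / 4), where S is the
    sample skewness and K the sample kurtosis of the residuals."""
    n = len(residuals)
    mean = sum(residuals) / n
    m2 = sum((r - mean) ** 2 for r in residuals) / n  # second central moment
    m3 = sum((r - mean) ** 3 for r in residuals) / n  # third central moment
    m4 = sum((r - mean) ** 4 for r in residuals) / n  # fourth central moment
    skewness = m3 / m2 ** 1.5
    kurtosis = m4 / m2 ** 2
    return n / 6.0 * (skewness ** 2 + (kurtosis - 3.0) ** 2 / 4.0)

example_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]  # hypothetical, symmetric
jb = jarque_bera(example_residuals)
print(round(jb, 4))
```

Under normality the statistic asymptotically follows a chi-square distribution with two degrees of freedom, so values above roughly 5.99 point to non-normality at the 5% level.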

Another issue that might arise is autocorrelation, which means that the residuals are not free of information. In that case the residuals are not independent, although they should be in order to avoid bias. This model includes a partial adjustment by considering the lagged effects of the number of advertising exposures. The lagged variables summarize the effect of earlier marketing activities and therefore cover part of the time dependencies. Nevertheless, it should be tested whether autocorrelation remains in the model. This can be examined with the Durbin-Watson test. The ideal Durbin-Watson value is two, which indicates that there is no autocorrelation. If the test is significant and indicates positive autocorrelation, a positive (negative) residual is highly likely to be followed by another positive (negative) residual; with negative autocorrelation, residuals tend to alternate in sign. In either case, OLS alone is not appropriate because it assumes that the residuals have no such pattern. Therefore, a GLS transformation should be performed.
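The Durbin-Watson statistic itself is straightforward to compute from a residual series. The sketch below follows the standard definition; the function name and the example residuals are hypothetical.

```python
def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences of the
    residuals divided by the sum of squared residuals."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e ** 2 for e in residuals)
    return numerator / denominator

persistent = [1.0, 1.0, 1.0, 1.0]     # residuals keep their sign
alternating = [1.0, -1.0, 1.0, -1.0]  # residuals flip sign every period
print(durbin_watson(persistent), durbin_watson(alternating))
```

The statistic lies between 0 and 4 and is approximately 2(1 − ρ), where ρ is the first-order autocorrelation of the residuals. Values well below two therefore signal positive autocorrelation and values well above two signal negative autocorrelation.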


4. Empirical study and results

4.1 Data description

Table 1: Overview of variables’ constructs and measurements.

Construct: Advertisement

Variable and description: The number of advertisement exposures. The total amount of advertising exposure in both online (YouTube and website) and offline (television) channels.

• On the same day: Customer purchases the focal product and receives the advertisement on the same day.

• One day before: Customer purchases the focal product and received the advertisement one day earlier.

• Two days before: Customer purchases the focal product and received the advertisement two days earlier.

• Three days before: Customer purchases the focal product and received the advertisement three days earlier.

Measurement: Numeric variable

Construct: Customer's characteristic

Variable and description: Age

• Young: people are under 40 years old (not including 40).

• Middle-aged: people are from 40 to 65 years old.

• Elderly: people are over 65 years old (not including 65).

Measurement: Ordinal variable. Dummy variable. (0: no; 1: yes)

Variable and description: Education level (the highest diploma)

• Low: people's highest diploma is below the preparatory school for research university and applied science university (VWO/HAVO, MAVO and Basisonderwijs).

• Medium: people's highest diploma is the preparatory school for research university and applied science university (MBO, LBO).

• High: people studied at either a research university or an applied science university (HBO/WO kandidaats).

• PhD: people obtained the doctoral degree at a research university (WO doktoraal).

Variable and description: The number of children under 18 years old

Measurement: Numeric variable.

Variable and description: Product involvement level

• No involvement: people have not purchased the specific item before.

• Low: light usage (the total volume of the focal product bought by the customer before is lower than 10.5 liters).

• Medium: medium usage (the total volume of the focal product bought by the customer before is between 10.5 liters and 35 liters).

• High: heavy usage (the total volume of the focal product bought by the customer before is over 35 liters).

• Unclear: people did not buy the specific item in the same panel or they do not clearly know their previous purchased volume.

Construct: Purchased volume

Variable and description: The total volume of units purchased. Because package sizes differ, the number of units purchased cannot represent the real purchasing situation. Therefore, the total volume of product is adopted in the present study.

Measurement: Numeric variable. Volume in ml.

Construct: Other variables

Variable and description: Price: price per liter in euros. It may be influenced by package size and promotion.

Measurement: Numeric variable.

Variable and description: Competitive effect: Customer also purchases other brands in the same transaction.

Measurement: Dummy variable. (0: no; 1: yes)


over the period from 30th December, 2013 to 29th March, 2014. During these 90 days, 10,703 respondents participated in the survey every day. Among the resulting 963,270 observations, there are 12,118 transactions, which account for 1.25% of the dataset. Hence, 98.75% of the customer-day observations contain no purchase during the 90-day period. The first part of the Hurdle model estimates which explanatory variables influence the customer's decision to buy the product or not. In addition, the dataset records each customer's demographic background, for example age and highest education level. The data include individual customer transactions in different grocery stores during these 90 days. Furthermore, external stimuli that may influence purchased volume, such as television exposure to an advertisement, are recorded. In this dataset, the online channel only covers advertising exposure on YouTube and news websites, while the offline channel only covers television. Advertising exposure has been tracked per individual on a daily basis. As discussed in the methodology section, customer self-selection may bias the aggregate-level effect of advertising exposure and cannot be controlled for. In addition, a key assumption for the advertisement variable is that the advertising content is the same across channels and does not drive significantly different responses from customers. Although this is an individual-level dataset, the participants respond on behalf of the main shopper in the household. Participants are housewives, who have more spare time and more opportunities to receive advertisements, because the research agency wanted to understand the exact reaction to the advertisement.
This survey includes measures of customer characteristics, recorded advertising exposures and purchasing data. Table 1 provides an overview of the key constructs and their measurements.

4.2 Descriptive statistics


and improved understanding of these variables, the descriptive statistics are examined first to get a feeling for the data. Table 2, which is based on all respondents, shows the descriptive statistics of the key variables. The table indicates that the customer characteristics are approximately normally distributed across their categories. The largest age group is the middle-aged group, from 40 to 65 years old. Similarly, most customers fall in the middle and high education groups. In this study, the average number of children under 18 years old is 0.6. Although a large proportion of participants have unclear or no involvement, approximately one third of the participants belong to the low involvement group.

Table 2: Descriptive statistics (all observed participants)

| Variables | N | % | Mean | Std. | Min. | Max. |
|---|---|---|---|---|---|---|
| The number of advertisement exposures | | | | | | |
| • On the same day ($Adv_{it}$) | 963,270 | - | 0.0576 | 0.01081 | 0 | 5 |
| • One day before ($Adv_{it-1}$) | 952,567 | - | 0.0583 | 0.01085 | 0 | 5 |
| • Two days before ($Adv_{it-2}$) | 941,864 | - | 0.0589 | 0.01089 | 0 | 5 |
| • Three days before ($Adv_{it-3}$) | 931,161 | - | 0.0596 | 0.01092 | 0 | 5 |
| Age | | | | | | |
| • Young ($AY_i$) | 238,770 | 24.79% | - | - | - | - |
| • Middle-aged (baseline for age variable) | 525,600 | 54.56% | - | - | - | - |
| • Elderly ($AE_i$) | 198,900 | 20.65% | - | - | - | - |
| Education level | | | | | | |
| • Low (baseline for education variable) | 166,320 | 17.76% | - | - | - | - |
| • Middle ($ME_i$) | 452,700 | 48.35% | - | - | - | - |
| • High ($HE_i$) | 284,400 | 30.38% | - | - | - | - |
| • PhD ($PhD_i$) | 59,850 | 6.39% | - | - | - | - |
| The number of children under 18 years old | 963,270 | - | 0.6 | 1.0 | 0 | 5 |
| Product involvement level | | | | | | |
| • Unclear (baseline for involvement variable) | 208,980 | 22.32% | - | - | - | - |
| • No involvement ($PO_i$) | 227,700 | 24.32% | - | - | - | - |
| • Light ($PL_i$) | 265,050 | 28.31% | - | - | - | - |
| • Medium ($PM_i$) | 156,150 | 16.68% | - | - | - | - |
| • High ($PH_i$) | 105,390 | 11.26% | - | - | - | - |
| Purchased volume (in ml) ($PV_{it}$) | 963,270 | - | 49.5 | 654.6 | 0 | 54,000 |
| Price per liter (in euros) ($P_t$) | 963,270 | - | 0.02 | 0.2 | 0.0 | 4.75 |
| Competitor effect (dummy variable) ($COMP_t$) | 19,421 | 2.02% | - | - | - | - |


Thirdly, to calculate the effect of customer responses on purchased volume, a new dataset that excludes the observations without a purchase is extracted from the original dataset. It contains 12,118 observations from 3,903 respondents who made a purchase during these 90 days; that is, 1.25% of all observations show a purchased volume above zero. Although the percentages of the customer characteristics shift slightly, most customers still belong to the middle education and middle-aged groups. The main difference is that the shares of the medium involvement group (from 16.68% to 28.83%) and the high involvement group (from 11.26% to 28.14%) increase. In order to evaluate the moderating effect of customer characteristics on the relationship between the number of advertising exposures and the customer's purchased volume, the analysis considers not only purchases with an advertising exposure on the same day, but also purchases for which the customer received advertising within the three days before purchasing. This subsample contains 3,398 observations from 1,641 different respondents. That means that in 28.04% of the transactions the customer received advertising and then purchased the product either on the same day or within 3 days. Compared to the original data shown in Table 2, the share of the medium product involvement group increases from 16.68% to 30.55%, while the high product involvement group grows from 11.26% to 33.46%. As mentioned in the methodology section, this study cannot control for customer self-selection. Table 3 shows the descriptive statistics for both situations. The mean number of advertising exposures is higher in the subsample with advertising on the same day or within 3 days than in the full sample of purchasers.

Table 3: Descriptive statistics (Participants who have a purchasing behavior)

Group A: People who have purchasing behavior.

Group B: People who receive the advertising and also have the purchasing behavior on the same day or within three days.

| Variables | A: N | A: % | A: Mean | B: N | B: % | B: Mean |
|---|---|---|---|---|---|---|
| The number of advertisement exposures | | | | | | |
| • On the same day ($Adv_{it}$) | 12,218 | - | 0.0618 | 3,398 | - | 0.2204 |
| • One day before ($Adv_{it-1}$) | 11,856 | - | 0.0577 | 3,398 | - | 0.2015 |
| • Two days before ($Adv_{it-2}$) | 11,672 | - | 0.0619 | 3,398 | - | 0.2129 |
| • Three days before ($Adv_{it-3}$) | 11,661 | - | 0.0579 | 3,398 | - | 0.1994 |
| Age | | | | | | |
| • Young (baseline for age variable) | 3,358 | 27.71% | - | 732 | 21.54% | - |
| • Middle-aged ($AM_i$) | 6,784 | 55.98% | - | 2,115 | 62.24% | - |
| • Elderly ($AE_i$) | 1,976 | 16.32% | - | 551 | 16.22% | - |
| Education level | | | | | | |
| • Low (baseline for education variable) | 1,970 | 16.26% | - | 562 | 16.57% | - |
| • Middle ($ME_i$) | 5,640 | 46.54% | - | 1,555 | 45.76% | - |
| • High ($HE_i$) | 3,770 | 31.11% | - | 1,062 | 31.25% | - |
| • PhD ($PhD_i$) | 738 | 6.09% | - | 218 | 6.416% | - |
| The number of children under 18 years old | 12,218 | - | 0.6 | 3,398 | - | 0.6059 |
| Product involvement level | | | | | | |
| • Unclear (baseline for involvement variable) | 1,822 | 15.04% | - | 285 | 8.387% | - |
| • No involvement ($PO_i$) | 696 | 5.74% | - | 201 | 5.915% | - |
| • Light ($PL_i$) | 2,696 | 22.25% | - | 737 | 21.69% | - |
| • Medium ($PM_i$) | 3,494 | 28.83% | - | 1,038 | 30.55% | - |
| • High ($PH_i$) | 3,410 | 28.14% | - | 1,137 | 33.46% | - |
| Purchased volume (in ml) ($PV_{it}$) | 12,118 | - | 3,935 | 3,398 | - | 4,158 |
| Price per liter (in euros) ($P_t$) | 12,118 | - | 1.285 | 3,398 | - | 1.271 |
| Competitor effect (dummy variable) ($COMP_t$) | 831 | 6.86% | - | 270 | 7.946% | - |

4.3 Empirical results


4.3.1 Key findings

Before the hypotheses are evaluated, the model fit is tested. Table 4 compares the fit of the candidate models. As stated in the methodology section, the model with a low AIC/BIC score and a high adjusted R-square performs best. Since the lagged advertising variables are treated as control variables in this study, it has to be shown that the model with three lagged advertising variables performs best among these 8 models. Table 4 shows that each additional lagged variable decreases the AIC/BIC score and slightly increases the adjusted R-square. However, in order to find the trade-off between a simple and a complete model, the adjusted R-square is the main criterion for model selection. The adjusted R-square of Model 3 and Model 4 is the same, with a value of 0.4111. That means that, compared to Model 3, adding another lagged advertising variable in Model 4 does not increase the explained variation in the dependent variable. Similarly, the AIC and BIC scores of Model 2 and Model 3 do not differ much, but the adjusted R-square of Model 3 is higher than that of Model 2, which indicates that Model 3 performs relatively better than Model 2. For these two reasons, including three lagged advertisement variables in the model is the better choice in this study.

Table 4: The comparison of the model fit

| Model | The number of lagged variables included¹ | df | AIC | BIC | R-square | Adjusted R-square |
|---|---|---|---|---|---|---|
| Model.0 | 0 (did not consider any lagged advertising) | 15 | 2607 | 2618 | 0.4084 | 0.4077 |
| Model.1 | 1 (includes one day before) | 16 | 2553 | 2560 | 0.4105 | 0.4098 |
| Model.2 | 2 (includes one and two days before) | 17 | 2519 | 2531 | 0.4116 | 0.4108 |
| Model.3 | 3 (includes from one to three days before) | 18 | 2516 | 2530 | 0.4119 | 0.4111 |
| Model.4 | 4 (includes from one to four days before) | 19 | 2492 | 2506 | 0.4120 | 0.4111 |
| Model.5 | 5 (includes from one to five days before) | 20 | 2458 | 2473 | 0.4128 | 0.4119 |
| Model.6 | 6 (includes from one to six days before) | 21 | 2417 | 2433 | 0.4146 | 0.4136 |
| Model.7 | 7 (includes from one to seven days before) | 22 | 2409 | 2425 | 0.4151 | 0.4140 |
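The adjusted R-square comparison in Table 4 can be retraced with a few lines of code. The sketch below simply lists the reported adjusted R-square values and prints the gain from each additional lagged advertising variable; the variable names are hypothetical.

```python
# Adjusted R-square values of Model.0 to Model.7 as reported in Table 4.
adjusted_r2 = [0.4077, 0.4098, 0.4108, 0.4111, 0.4111, 0.4119, 0.4136, 0.4140]

# Gain in adjusted R-square from adding one more lagged advertising variable.
gains = [round(after - before, 4)
         for before, after in zip(adjusted_r2, adjusted_r2[1:])]

for k, gain in enumerate(gains, start=1):
    print(f"Model.{k - 1} -> Model.{k}: {gain:+.4f}")
```

The step from Model.3 to Model.4 shows no gain at all, which is the observation the model selection argument relies on.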

After the test of model fit, a logistic regression is run in order to investigate which elements influence the customer's decision to purchase the product (yes/no). However, in this empirical study the product's price and the competitor variable are only recorded when the customer has a transaction. In other words, if customers did not purchase the focal product, these two variables take the value zero. Including these two variables in the logit regression would therefore lead to a large statistical bias. For example, price would appear to positively influence the customer's decision: when the product's price increases by one euro, the probability of purchasing the focal product would grow significantly. This is not a logical outcome or interpretation. It arises because price is a positive, continuous variable that is only recorded when there is a customer transaction.


Similarly, the competitor variable faces the same problem. Because of this data limitation, the product's price and the competitor variable are not included in the logit model. In addition, the decision to buy or not to buy is not the main focus of this study, so the logit regression is used to explore the influence of the customer characteristics. Before the coefficients are interpreted, a multicollinearity test is run. All VIF scores of the independent variables are under 5; the highest VIF score is 2.13, for the heavy product involvement variable. It is concluded that multicollinearity is not a problem in this logit model. The results in Table 5 indicate that advertising (either on the same day or within three days before purchasing) does not have a significant positive effect on the purchase decision. This may weaken the main relationship between the number of advertising exposures and purchased volume. The insignificant or negative coefficients of the advertising variables arise because customers who saw the advertisement often did not make a purchase. The link between advertising and the customer's purchased volume is investigated in the following section. Furthermore, the results show that young people (under 40 years old) are more likely to purchase the focal product than older people. Higher-educated customers are more likely to buy the focal product than lower-educated customers. The number of minors in the household has a negative influence on the purchase decision. As the customer's product involvement level increases, the willingness to buy the focal product grows.

Table 5: Hurdle model, the first part (logistic model, without the price and competitor variables)

| Variables | Estimate | Std.Error | Z-value | p-value | Sig. |
|---|---|---|---|---|---|
| (Intercept) | -4.59027 | 0.03640 | -126.09 | 2e-16 | *** |
| The total advertisement exposures on the same day ($Adv_{it}$) | 0.06713 | 0.04212 | 1.594 | 0.11101 | |
| Middle age group ($AM_i$) | -0.19753 | 0.03224 | -2.677 | 0.00742 | ** |
| Elderly age group ($AE_i$) | -0.08632 | 0.02268 | -8.710 | 2e-16 | *** |
| Middle education group ($ME_i$) | -0.06526 | 0.02741 | -2.381 | 0.01724 | * |
| High education group ($HE_i$) | 0.12544 | 0.02922 | 4.293 | 1.76e-5 | *** |
| PhD education group ($PhD_i$) | 0.12709 | 0.04521 | 2.811 | 0.00494 | ** |
| The number of children under 18 years old | -0.07553 | 0.01007 | -7.502 | 6.28e-14 | *** |
| No involvement group ($PO_i$) | | | | | |
| Light involvement group ($PL_i$) | | | | | |
| Medium involvement group ($PM_i$) | | | | | |
| High involvement group ($PH_i$) | | | | | |

***p < 0.001, **p < 0.01, *p < 0.05.

The results from the first part of the model show that customers with different characteristics respond differently in their purchase decision. Consequently, the second part of the Hurdle model is applied in order to estimate which variables influence the customer's purchased volume. Because all observations in this section are based on the positive transaction outcome,


the product's price and the competitor variable are included in the following analysis. All the relevant hypotheses are mainly tested here. To get a first feeling for the dataset, Table 6 shows the results for the 12,218 observations in which customers have a transaction during these 90 days. As mentioned in the methodology section, the coefficients are interpreted as elasticities with respect to the dependent variable. The table gives a general idea of which variables influence the customer's purchased volume in the empirical study. Overall, the total number of advertisement exposures has a negative influence on the customer's purchased volume: the purchased volume (ml) decreases by 5.75% when the number of advertisement exposures increases by one unit, with a p-value lower than 0.05. This outcome is unexpected. A likely reason is that in many observations (around 72% of the 12,218 observations) customers purchase the product without receiving any advertisement. This may signal that customers buy the product regardless of how many advertisement exposures they receive, which reflects customer self-selection on the product. In order to reduce the potential bias in the advertisement effect on purchased volume, the next part of the analysis (Table 7) only considers purchases for which customers received the advertisement on the same day or within three days. The moderating effect of customer characteristics on the relationship between the number of advertising exposures and purchased volume is discussed with the results of Table 7.

Table 6: Multiplicative model (People who have the purchasing behavior)

| Variables | Estimate | Std.Error | t-value | p-value | Sig. | Real value² |
|---|---|---|---|---|---|---|
| (Intercept) | 8.229205 | 0.028385 | 289.912 | 2e-16 | *** | 3747.342 |
| Total advertisement exposure on the same day ($Adv_{it}$) | -0.058736 | 0.029889 | -1.965 | 0.04942 | * | 0.942535 |
| Middle age group ($AM_i$) | 0.027165 | 0.016062 | 1.691 | 0.09081 | | - |
| Elderly age group ($AE_i$) | 0.058968 | 0.022592 | 2.610 | 0.00906 | ** | 1.061532 |
| Middle education group ($ME_i$) | 0.026790 | 0.019361 | 1.384 | 0.16647 | | - |
| High education group ($HE_i$) | 0.037570 | 0.020619 | 1.822 | 0.06846 | | - |
| PhD education group ($PhD_i$) | 0.012886 | 0.031890 | 0.404 | 0.68618 | | - |
| The number of children under 18 years old | -0.008494 | 0.007149 | -1.189 | 0.23431 | | - |
| No involvement group ($PO_i$) | -0.273890 | 0.032619 | -8.397 | 2e-16 | *** | 0.760011 |
| Light involvement group ($PL_i$) | -0.251252 | 0.022885 | -10.979 | 2e-16 | *** | 0.777623 |
| Medium involvement group ($PM_i$) | -0.068898 | 0.021474 | -3.208 | 0.00134 | ** | 0.933207 |
| High involvement group ($PH_i$) | 0.297521 | 0.021754 | 13.677 | 2e-16 | *** | 1.346198 |
| Ads.one.day.before ($Adv_{t-1}$) | -0.005224 | 0.031003 | -0.169 | 0.86618 | | - |
| Ads.two.days.before ($Adv_{t-2}$) | 0.017754 | 0.029951 | 0.593 | 0.55335 | | - |
| Ads.three.days.before ($Adv_{t-3}$) | 0.015406 | 0.031459 | 0.490 | 0.62435 | | - |
| Price per liter (in euros) ($\log(P_t)$) | -1.684422 | 0.021819 | -77.198 | 2e-16 | *** | 0.185507 |
| Competitor effect (dummy variable) ($COMP_t$) | -0.258464 | 0.026754 | -9.661 | 2e-16 | *** | 0.771961 |

***p < 0.001, **p < 0.01, *p < 0.05


As mentioned in the descriptive statistics section, there are 3,398 observations in which customers either receive the advertisement and purchase the product on the same day or receive the advertisement within three days before purchasing. In order to test the hypotheses, the log-linear regression results are shown in Table 7. Before these findings are discussed, validity tests are applied to check whether the conclusions are well-founded and correspond to the real world. As set out in the methodology section, an incorrect conclusion can follow from a wrong estimate of the variance. Hence, the disturbance term is tested for non-normality and autocorrelation. Normality tests take as their null hypothesis that the residuals are normally distributed. To get a first impression, a histogram of the residuals was created; Figure 2 shows a bell-shaped histogram. Figure 3 presents a QQ plot, which compares the actual values of the residuals with the values expected under a normal distribution. As a formal check, the Shapiro-Wilk normality test is performed. A significant p-value indicates a significant deviation from normality. The p-value of the Shapiro-Wilk test for this model is 0.2929, which means that non-normality is not a problem.

Table 7: Hurdle model, the second part of the model (OLS; people who received the advertising on the same day as the purchase or within 3 days before purchasing)

| Variables | Estimate | Std.Error | t-value | p-value | Sig. | Real value² |
|---|---|---|---|---|---|---|
| (Intercept) | 8.233296 | 0.057912 | 142.168 | <2e-16 | *** | 3757.913 |
| Total advertisement exposure on the same day ($Adv_{it}$) | -0.044877 | 0.034758 | -1.291 | 0.1967 | | - |
| Middle age group ($AM_i$) | 0.013583 | 0.032062 | 0.424 | 0.6719 | | - |
| Elderly age group ($AE_i$) | 0.013803 | 0.044175 | 0.031 | 0.7547 | | - |
| Middle education group ($ME_i$) | -0.001346 | 0.036371 | -0.037 | 0.9705 | | - |
| High education group ($HE_i$) | 0.072836 | 0.038879 | 1.873 | 0.0411 | * | 1.074742 |
| PhD education group ($PhD_i$) | 0.016514 | 0.059206 | 0.279 | 0.7803 | | - |
| The number of children under 18 years old | -0.008110 | 0.013676 | -0.593 | 0.5532 | | - |
| No involvement group ($PO_i$) | | | | | | |

Figure 2: Histogram of residuals

Figure 3: Normal QQ plot

The other disturbance test concerns autocorrelation, which is examined with the Durbin-Watson test. The ideal Durbin-Watson value is two, which indicates that there is no autocorrelation at all. In this model, the DW estimate is 1.4966, which is lower than two, and the test is significant with a p-value below 0.001. This indicates positive autocorrelation in the model. Because of this autocorrelation problem, OLS alone is not appropriate, since it assumes that the disturbance term has no autocorrelated pattern. The main consequence of autocorrelation is a wrong estimate of the variance of the effects, so the significance of variables may be misstated. A new GLS model is therefore built with the significant variables from Table 7 and the advertisement variables, which are the focal independent variables. After the GLS transformation, the Durbin-Watson estimate of this model is 2.0397, with a p-value of 0.8753, which is not significant. This means that the model no longer has an autocorrelation issue. A new table (Table 8) is created to interpret the estimate of each significant variable.
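The text does not state which GLS procedure was applied. As one common possibility, the sketch below shows a Cochrane-Orcutt style quasi-differencing step, assuming first-order autocorrelation; this choice, the function names and the example series are assumptions for illustration only. The only figure taken from the study is the reported Durbin-Watson value of 1.4966.

```python
def rho_from_durbin_watson(dw):
    """Approximate first-order autocorrelation implied by a Durbin-Watson
    statistic, using d ~ 2 * (1 - rho)."""
    return 1.0 - dw / 2.0

def quasi_difference(series, rho):
    """Transform x_t into x_t - rho * x_{t-1}; the first observation is dropped.
    Applying this to the dependent variable and every regressor gives the
    transformed data on which OLS is re-estimated."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]

rho_hat = rho_from_durbin_watson(1.4966)  # DW value reported for the OLS model
example = [1.0, 2.0, 4.0, 8.0]            # hypothetical series for illustration
print(round(rho_hat, 4), quasi_difference(example, 2.0))
```

In panel data such as this one, the differencing would have to be done within each customer's own time series rather than across customers.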


variables other than the heavy product involvement group are under 3.

Table 8: Hurdle model, the second part of the model (GLS; people who received the advertising on the same day as the purchase or within 3 days before purchasing)

| Variables | Estimate | Std.Error | t-value | p-value | Sig. | Real value² |
|---|---|---|---|---|---|---|
| (Intercept) | 6.15555 | 0.03916 | 157.02 | <2e-16 | *** | 470.964 |
| Total advertisement exposure on the same day ($Adv_{it}$) | -0.05087 | 0.03302 | -1.541 | 0.1235 | | - |
| Ads.one.day.before ($Adv_{t-1}$) | -0.01811 | 0.03313 | -0.547 | 0.5847 | | - |
| Ads.two.days.before ($Adv_{t-2}$) | 0.03860 | 0.03244 | 1.190 | 0.2342 | | - |
| Ads.three.days.before ($Adv_{t-3}$) | 0.01511 | 0.03476 | 0.435 | 0.6630 | | - |
| High education group ($HE_i$) | 0.07207 | 0.02935 | 2.456 | 0.0141 | * | 1.07426 |
| No involvement ($PO_i$) | -0.32567 | 0.06844 | -4.758 | 2.04e-6 | *** | 0.72035 |
| Light involvement group ($PL_i$) | -0.30556 | 0.05413 | -5.644 | 1.79e-8 | *** | 0.73563 |
| Medium involvement group ($PM_i$) | -0.10302 | 0.05285 | -1.949 | 0.0413 | * | 0.90084 |
| High involvement group ($PH_i$) | 0.21794 | 0.05389 | 4.120 | 3.88e-5 | *** | 1.24170 |
| Price per liter (in euros) ($\log(P_t)$) | -1.63257 | 0.03919 | -41.654 | 2e-16 | *** | 0.19527 |
| Competitor effect (dummy variable) ($COMP_t$) | -0.19882 | 0.04608 | -4.315 | 1.64e-5 | *** | 0.81882 |

Adjusted R-square: 0.400

***p < 0.001, **p < 0.01, *p < 0.05

As mentioned in the previous paragraph, the influence on the dependent variable is analyzed with the GLS model (Table 8). The log-linear regression yields 5 findings. First of all, the literature review discussed the idea that advertisement positively motivates the purchased volume. However, the results of Tables 7 and 8 indicate that the number of advertisement exposures does not have a significant impact on the customer's purchased volume. In addition, the advertisement stock within three days does not have any significant influence on the dependent variable. This outcome is not in line with the original expectation. Possible explanations are the advertisement content, the channel on which the advertisement is displayed and the customer's personal perception.