
Does online advertising influence in-store purchase?


Academic year: 2021



Attributing credit to the effectiveness of online and offline advertising channels on offline conversion.

Tim Menkveld

University of Groningen

Faculty of Economics and Business

MSc. Marketing Intelligence

Master Thesis

February 13, 2017

Sluysoort 58,

3602AS Maarssen

Tel: +31 6 13797729

Email: t.m.menkveld@student.rug.nl

Student number: 1994980

1st Supervisor:

Prof. Dr. J.E. Wieringa

j.e.wieringa@rug.nl


Abstract

New technologies and the Internet have led consumers to spend short amounts of time on many different media channels. This poses a challenge for businesses in allocating their marketing budgets. Understanding the effectiveness of each marketing touch point and its role in the customer’s journey leading to conversion is becoming more important for deciding how to allocate the advertising budget. How much credit should be allocated to each touch point prior to conversion is a difficult question that is addressed in attribution modeling. Both the Interactive Advertising Bureau and the Marketing Science Institute have emphasized that cross-channel measurement and attribution are important topics for the coming years. Moreover, researchers have called for more powerful methods to approach this problem. This paper aims to contribute to both requests by developing an Artificial Neural Network to research the following question: “Is online advertising effective on offline conversion?”

To this end, secondary weekly panel data on 11,672 households, collected over 31 weeks by a market research institute for a large consumer durables store, are used. The effectiveness of different advertising channels is studied by developing attribution models in order to understand their contribution towards in-store purchase. Four commonly used intuition-based models are developed, as well as a state-of-the-art Bagged Logistic Regression and an Artificial Neural Network.


Preface

During my MSc. Marketing Intelligence and Marketing Management I discovered a passion for data analysis. I am very grateful to the professional, knowledgeable and passionate professors and staff who developed and taught this great master's program. The program required a lot of work but was very rewarding in what we learned. These skills and this knowledge will certainly prove valuable for my future career. I have been studying at the University of Groningen since 2010 and have enjoyed the university and all that it offers. I think and hope I have taken advantage of all the opportunities and learning experiences it has to offer.

I would like to thank my supervisor, Jaap Wieringa, very much for helping me write this thesis. I am sure that without his help and contribution this paper would not have reached the same end result. I would also like to thank my thesis group for helping and motivating me along the way. Lastly, I would like to thank Hans Risselada for taking the time and effort to be the second supervisor for my thesis.

Tim Menkveld

Groningen,


Table of Contents

ABSTRACT
PREFACE
1. INTRODUCTION
2. LITERATURE REVIEW
2.1 EFFECTIVENESS OF ADVERTISING
2.1.1 The Customer Journey
2.1.2 Customer and Firm Initiated Contact
2.2 LASTING EFFECT OF ADVERTISING
2.3 CROSS-CHANNEL EFFECTS
2.3.1 Online marketing
2.3.2 Online targeting
2.3.3 Retargeting of consumers
2.4 ATTRIBUTION MODELING
2.5 CONTROL VARIABLES
2.6 CONCEPTUAL MODEL AND HYPOTHESIS OVERVIEW
3. DATA
3.1 DATA COLLECTION AND DESCRIPTION
3.1.1 Marketing channels
3.1.2 In-store purchase
3.1.3 Household characteristics
3.2 OUTLIERS AND MISSING VALUES
3.3 VARIABLE OVERVIEW
4. MODELING
4.1 MODEL FRAMEWORK
4.1.1 INTUITION-BASED RULE MODELS
4.2 DATA DRIVEN MODELS
4.2.1.1 Control variables
4.2.1.2 Lagged effects
4.2.2 BAGGED LOGISTIC REGRESSION
4.2.3 ARTIFICIAL NEURAL NETWORK

6.3 LASTING EFFECT OF ADVERTISING
6.4 CROSS-CHANNEL EFFECTS
6.5 ATTRIBUTION MODELING
7. CONCLUSION
7.1 SCIENTIFIC CONTRIBUTIONS
7.2 MANAGERIAL IMPLICATIONS
7.3 LIMITATIONS & FURTHER RESEARCH


1. Introduction

The uncertainty companies face when allocating their marketing budget increases as the complexity of the media landscape increases. The Internet and new technologies have led consumers to spend short amounts of time on many different media channels (Steinberg, 2012). However, they also enable firms to record every touch point a consumer makes in the journey before conversion (Li and Kannan, 2014). As consumers make multiple touches with advertisers on different media channels, understanding the effectiveness of each touch point and its role in the customer’s journey leading to conversion is becoming more important for deciding how to allocate the advertising budget. An attribution model seeks to use advanced analytics to allocate appropriate credit for a customer action to each touch point across all online and offline channels (Moffett et al., 2014). Such models are used to provide insight into the actual contribution of an individual channel to a user’s probability of conversion.

Traditionally, firms use aggregated data and the last touch directly preceding conversion as an indication of the effectiveness of a marketing channel (Kannan, Reinartz and Verhoef, 2016). Such conventional metrics based on last-touch-attribution (LTA) - e.g. Click Through Rate (CTR) or Cost Per Acquisition (CPA) - might show passable results in low-involvement product categories where there are one or two touch-points before conversion. However, these metrics are misleading for high-involvement product categories (e.g. travel service and consumer durables) where there is a longer customer journey and multiple channels (Li & Kannan, 2014). Nonetheless, in practice, these multiple touches that a consumer makes with the firm before a conversion takes place are rarely taken into account when measuring the effectiveness of advertising channels (Li and Kannan, 2014).


consumer is followed by a Display ad that is followed by a conversion. In practice, firms use several intuition-based rules. Under last-touch attribution, all credit goes to the Display advertisement. Under first-touch attribution, all credit goes to the TV commercial, neglecting the search query and the Display ad. Several other artificial rules try to create insight into the effectiveness of these channels. The insights obtained this way could result in a suboptimal media mix budget allocation, because the contribution of each channel that played a role in the conversion is not considered. How much credit should be allocated to each touch point prior to the conversion is a difficult question that is addressed in attribution modeling. Attribution modeling improves a company’s understanding of the effectiveness of its marketing activities so that it can allocate its budget accordingly. Moreover, it provides companies with insights on successful and unsuccessful customer journeys. Not understanding the effectiveness of advertising channels leads firms to over- or underspend on certain channels, leaving money on the table (Kireyev, Pauwels & Gupta, 2015).


attribution models should incorporate the two worlds and investigate cross-channel effects. The general challenge with offline advertising is that it is often measured with aggregate-level data, making it difficult to study individual-level effects. This study, however, uses individual-level data on both online and offline channels. This paper aims to expand on the current knowledge of the impact of online advertising on offline conversion. Therefore, the following research question is stated:

Is online advertising effective on offline conversion?

To this end, secondary weekly panel data collected by a market research institute on a large consumer durables store is analyzed. Every marketing touch point with each household, as well as its in-store purchases, was recorded over almost a year. The effectiveness of different advertising channels is studied by developing attribution models in order to understand their contribution towards in-store purchase. These models show that simple intuition-based rules provide biased results and that this company’s use of channels is not aligned with the effectiveness of those channels. Some marketing channels are used frequently but their contribution is marginal. On the other hand, several channels are used infrequently and their contribution is relatively large. A better understanding of the actual contribution of channels could help this company spend its advertising budget more effectively, rather than leaving money on the table.


machines while still being able to derive the user-level attribution assignment. Wedel and Kannan (2016) note that machine learning is popular in practice but has seen little research in marketing academia. Furthermore, this study is relevant for this company in a practical way. The company in this study relies particularly heavily on door-to-door folder advertising. Regulators in the Netherlands are creating an opt-in system for these folders, which will decrease their reach drastically and make this advertising tool less effective (RTL Z, 2016). It is therefore interesting to investigate strategies for allocating this company’s budgets after the new regulation takes effect, in order to optimize the company’s media mix.


2. Literature Review

Understanding the individual and complementary roles of different channels, media and devices is important for firms, as customers go through a series of touches in their customer journey (Kannan, Reinartz and Verhoef, 2016). This holds especially for the retail environment, which is increasingly transforming into an omni-channel environment (Verhoef et al. 2015). To understand the individual and complementary roles of channels, marketers use attribution modeling. The goal of attribution is to provide insight into each channel’s contribution to the probability of conversion. These insights help marketers shape their media mix.

Providing appropriate credit to the multiple touches of the customer in the journey to purchase has been a subject of research over the past years. Studies have provided fruitful insights on a broad spectrum of topics that highlight the importance of measuring the impact of all channels over a longer period of time. The focus has been especially on four topics: the consumer journey, firm versus consumer initiated contact, carryover effects, and cross-channel effects within and between online and offline channels. Based on these four topics, the following section argues from existing literature that a data-driven multi-touch attribution model is necessary and that it should include both online and offline media channels in assessing the effectiveness of channels. Emphasis is on the effects of online marketing channels on offline conversion.

2.1 Effectiveness of advertising


2.1.1 The Customer Journey

The consumer journey is the process consumers go through before deciding to buy a product, which is psychological at heart. Wijaya (2012) provides an overview of customer journey models. All theories share the same basic idea of Awareness, Interest, Desire and Action as originally developed by Lewis (circa 1900). Wijaya (2012) describes how more recent models have added stages to incorporate changes in advertising. New aspects are added to the journey, such as search, or like and share on social media. Marketing models often do not include metrics based on the consumer journey, as the minds of consumers are considered a ‘black box’. Information on these metrics is collected through regular surveys, which is a very resource-intensive practice (Srinivasan, Vanhuele and Pauwels, 2010). However, Srinivasan, Vanhuele and Pauwels (2010) showed that including these metrics in marketing modeling offers extra value. Moreover, advertisers often rely on these related metrics to indicate the effectiveness of an advertisement. Therefore, the four stages (Awareness, Interest, Desire and Action) of the customer journey are studied in further detail to understand the impact of advertising on different stages in the journey.

The awareness stage is influenced by advertisements where consumers take note of a communication by a firm. The consumer becomes aware of the existence of a brand or product. Several studies indicate that brand awareness, the ability of a consumer to recognize and recall a brand in different situations (Aaker, 1996), has a positive influence on the purchase decision. For example, Li and Kannan (2014) show that branding advertisements (e.g. display banner ads or TV ads) seem to benefit the ultimate likelihood of conversion. Therefore, raising awareness of a brand can have a positive effect on the probability of conversion.


information from a set of products or brands, called the consideration set. Memory factors help a brand become part of a consideration set (Nedungadi, 1990). Repeated exposure increases the probability of remembering and liking a message. This links to the finding of Manchanda et al. (2006) that the number of display impressions has a positive effect on purchase probability. Moreover, for a consumer to even consider a brand in a buying situation, the salience of the brand must first be raised (Percy and Rossiter, 1992). Brand Salience is defined as the propensity of a brand to be noticed or come to mind in buying situations (Romaniuk and Sharp, 2004).

For the desire and action stages, brand awareness still plays an important role. According to Keller (1993) and Macdonald and Sharp (2000), consumers tend to buy a product or brand they know well and are familiar with. Higher brand awareness is positively related to the purchase decision. The effectiveness of these later stages is measured differently from that of the attention and interest stages: the later stages are measured with metrics such as conversion rate and cost per acquisition, whereas the earlier stages are measured with metrics such as ad recall and brand recognition.

The customer journey highlights that the different mental stages before purchase and branding efforts are important for marketing practitioners to understand. Only attributing credit to the advertisement that was prior to a conversion does not do justice to the advertisements throughout the customer journey - the influence of the advertisements to raise attention or liking towards a brand. Neglecting this could paint a biased picture. Therefore, this paper argues that different stages in the consumer journey should be incorporated when trying to understand the effectiveness of an advertisement channel. This prevents a marketer from under- or overvaluing advertising channels. Therefore, the following hypothesis is formed:

H1: Including characteristics of the customer journey significantly improves a model’s ability to classify consumers as (non-) converting.


2.1.2 Customer and Firm Initiated Contact

In addition to the focus on the customer journey, research on channel effectiveness has provided insights on the value of firm initiated contacts (FICs) and customer initiated contacts (CICs). Firm initiated contacts, e.g. television, radio or e-mail, are the result of the firm pushing a message (Shankar and Malthouse, 2007). Customer initiated contacts are triggered by (prospective) customers (e.g. search or price comparison websites).

Several studies find that customer-initiated touch points are considerably more effective than firm-initiated touch points (e.g. Li & Kannan (2014) and Bowman and Narayandas (2001)). De Haan, Wiesel and Pauwels (2015) show that firm initiated contacts have significant elasticities in 53.3% of the cases. In other words, in roughly half of the cases a positive return on investment is found when advertising spending increases by one percent. In the other half of the cases a negative return on investment is found, leaving one uncertain whether advertising activities will be effective. For CIC, the study found significant positive elasticities in 70.0% of the cases (21 of 30), indicating that these channels often show a positive return on investment. The researchers attribute this to CIC channels often being further down the funnel. For example, visiting a price comparison site is a CIC that already indicates an interest in the product, and thus a higher propensity to convert.

However, distinguishing between FIC and CIC is troublesome. First, firms can influence customer-initiated contact to the extent that it is unclear whether the contact is purely an effort by the customer. For example, a search query for a product will list at the top the websites with the highest probable rate of conversion, based on a number of factors. These factors are actively influenced and manipulated by firms to reach higher positions and thus a higher chance of clicks (Berman and Katona, 2013; Shih, Chen and Chen, 2012), a practice called Search Engine Optimization. Moreover, ads shown among these search results are also considered Customer Initiated Contact, even though the firm placed them there for relevant keywords.


on a platform – e.g. sponsored news articles. For the New York Times, this is a large share of digital income, while these are basically sponsored news articles. For Facebook, this is disguised content that is actually advertising.

A consequence of the difficulty of distinguishing between FIC and CIC is that some channels are allocated an unfair amount of credit. For example, a direct visit to a website is considered Customer Initiated Contact in the paper by De Haan, Wiesel and Pauwels (2015). However, this does not take into account that the brand must have had high brand awareness among these customers due to past branding efforts or positive experiences (Percy and Rossiter, 1992).

Therefore, this paper relies on the insights of De Haan et al. (2015): the key to this distinction is the difference between upper and lower funnel marketing channels. Typically, FIC channels are upper funnel channels: firms try to convey a brand or product message to increase awareness of or interest in a product. CIC channels, which are sometimes difficult to distinguish from FIC, are typically lower funnel, because a consumer shows a clear interest when searching for prices, product information or comparison websites. This paper therefore does not distinguish between FIC and CIC. However, careful inspection of which channels are typically lower or upper funnel in the consumer journey is still relevant. The findings of Wiesel et al. (2011) indicate that simple attribution models attribute too much credit to lower funnel channels because they do not take the upper funnel into account. It is therefore interesting to study the different conclusions one would draw comparing the effectiveness of channels based.


H2: Lower funnel marketing channels receive a larger share of credits in last-touch attribution models compared to multi-touch attribution models.

H3: Upper funnel marketing channels receive a smaller share of credit in last-touch attribution models compared to multi-touch attribution models.

In summary, the customer journey indicates that a consumer is receptive to different advertising in different stages. Not all channels play an immediate role before conversion. Some channels play their role earlier in the customer journey, while others are more relevant later. The models will indicate whether it is relevant to consider the consumer journey, that is, the past advertising exposures, and whether a biased result appears for lower funnel and upper funnel marketing activities.

2.2 Lasting effect of advertising

The above section shows that marketers study the effectiveness of advertising channels through the customer journey. Researchers should also study how long an advertisement is effective, that is, the time frame in which it has an impact on other actions. In practice, intuition-based rule attribution models only consider when an advertisement was effective, such as the first or last touch in a successful customer journey. Data-driven attribution models also consider when an advertisement was not effective in a customer journey, including journeys in which no conversion took place. Therefore, this paper considers carryover effects.


words, they show that the effectiveness of advertising drops when it is used frequently. Therefore, carryover effects can be positive but also negative and should be part of an attribution model.

In the case of offline conversion such clear demarcations do not exist, unlike for online conversion (e.g. Li and Kannan (2014)). Therefore, we study the longitudinal effect of advertising on the probability of conversion. How long an advertisement affects the probability of conversion is unknown. Studies on the long-term effectiveness of ads are inconclusive on this time frame (e.g. Wiesel, 2011; Breuer, Brettel and Engelen, 2011). The only justified conclusion from these studies is that it depends on the industry, the marketing channel and the data. Therefore, this paper relies on an intuition-based rule and an assumption. Marketing practitioners who apply attribution models often use a look-back period of 30 days, or 4 weeks, in which the touches with consumers are accounted for (Google, 2016). Another reason to use a four-week look-back period is that Kireyev, Pauwels and Gupta (2015) find (in accordance with Lecinski (2011)) that consumers perform online searches 1 to 4 weeks prior to purchasing consumer durables (which are the products in this research). Since consumers are more receptive to advertisements when they are interested in a product, the advertisements shown in this period could be more effective. Therefore, for this study, the assumption is made that consumers enter the consideration stage four weeks prior to purchase and are receptive to advertisements in that period. As carryover effects can be positive or negative, depending on the channel, both types are hypothesized. To test this assumption, different models are fitted that include different time spans over which ads are included in the model. This brings us to the following hypotheses:

H4: Marketing channels have positive carryover effects up to four weeks prior to purchase for consumer durables.

H5: Marketing channels have negative carryover effects up to four weeks prior to purchase for consumer durables.
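The look-back window discussed in this section can be illustrated with a small sketch. The data layout (a per-household dictionary mapping week number to channel exposure counts), the function name `lagged_exposures` and the feature naming `<channel>_lag<k>` are assumptions made for illustration; they are not taken from the thesis data or models.

```python
def lagged_exposures(weekly, week, max_lag=4):
    """Collect one household's ad exposures within a look-back window.

    weekly  : dict mapping week number -> {channel: exposure count}
              (hypothetical layout of the weekly panel data)
    week    : the week for which conversion is being modeled
    max_lag : length of the look-back window in weeks (4 = roughly 30 days)
    """
    features = {}
    for lag in range(max_lag + 1):
        # Exposures `lag` weeks before the modeled week; absent weeks add nothing.
        for channel, count in weekly.get(week - lag, {}).items():
            features[f"{channel}_lag{lag}"] = count
    return features


# Hypothetical household: TV in week 1, display in week 3, search in week 5.
household = {1: {"tv": 2}, 3: {"display": 1}, 5: {"search": 1}}
four_week = lagged_exposures(household, week=5, max_lag=4)
two_week = lagged_exposures(household, week=5, max_lag=2)
```

Varying `max_lag` gives feature sets for different look-back periods, which is one way models with different time spans could be compared.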


2.3 Cross-channel effects

In addition to research on the (lasting) effectiveness of advertising, the attribution literature studies cross-channel effects. In a complex media landscape where users use different devices at different times and in different situations, one can expect customers to switch between channels often. Therefore, from a marketing practitioner’s point of view it is interesting to further understand these effects of advertising in order to create a fuller picture. For example, after seeing a TV advertisement a customer might perform a search query, or might be more receptive to that TV advertisement when already considering buying a product from that category (Anderl et al., 2016). Because the influence of online advertising on offline conversion is the main research question of this paper, this particular crossover effect is analyzed in more depth.

2.3.1 Online marketing

A cross-channel, or crossover, effect is defined as the probability of returning in a different channel (Li and Kannan, 2014) and is often studied simultaneously with carryover effects. Several studies show that customers do not stay within a single channel but move across channels easily. For example, Ansari, Mela and Neslin (2008) show a customer segment that is migrating towards the internet. For this reason, several authors state that different marketing channels should be studied simultaneously and not each within its own channel (e.g. Wiesel, Pauwels and Arts (2011); Gensler, Verhoef and Bohm (2012) and De Haan, Wiesel and Pauwels (2016)). According to Moffett et al. (2014), this provides firms with a complete view of customer purchase paths.


simultaneously. As firms must manage multiple forms of online and offline advertising simultaneously, strategic budget allocation decisions are made across advertising forms (e.g. Dekimpe and Hanssens, 2007; Lehmann, 2004). For marketers it is therefore important to understand the effects of online and offline channels on conversion in either channel. However, attribution research often omits offline channels (e.g. Anderl et al. (2016); Blake et al. (2015); Li & Kannan (2014)). The few studies that linked online touch points with offline conversion show a positive relationship (Pauwels et al. (2011); Van Nierop et al. (2011)).

However, these studies use aggregated store sales and aggregated cost of advertising channels to come to these conclusions. They developed market analytic models that use the amount invested in advertising channels - holding all else constant - to estimate that channel’s impact on sales. This method of using aggregated data means loss of information (Clark and Avery, 1976). Although this method is more similar to the available data that firms use in practice, new technological developments will create more individual-level touch points. Therefore, it is relevant to extend the current methodological advances on this topic as well as create generalizable insights that help managers to create more effective marketing strategies and tools when using online and offline channels simultaneously.


H6: Online marketing has positive cross-channel effect on offline conversion.

2.3.2 Online targeting

Going one step further than stating that online advertising has an impact on offline conversion, internet ads could perhaps be more effective than offline ads. Internet technologies have the unique characteristic that ads can be targeted to an individual based on external browsing data (Lambrecht and Tucker, 2013). An individual’s device keeps track of search behavior, interests and preferences. Based on this information, firms estimate a profile for this user in order to show relevant advertisements. Traditional advertisers showed, for example, a commercial on TV to a large group of consumers without knowing who had seen it. The internet allows an advertiser to know who will see an advertisement. This opens an opportunity for marketing practitioners. For example, marketers use cluster analysis to build market segments of consumers with a higher propensity to purchase and focus their advertising budget on this group (Punj and Stewart, 1983). This tactic can result in a more effective advertising strategy. This brings us to the following hypothesis:

H7: Online marketing is more effective than offline marketing towards offline conversion

2.3.3 Retargeting of consumers


that will not take no for an answer (Vega, 2010). This is called retargeting. Retargeted online advertising responds to a person’s shown behavior and interest in products, and is thus reactive by definition. Because of this, targeted ads come with a higher probability of purchase, as they are shown when a known interest exists (Bleier and Eisenbeiss, 2015; Lambrecht and Tucker, 2013).

Many things can trigger an interest in a product and the resulting online behavior. One relevant factor is advertisements shown through other channels. For example, Joo, Wilbur and Zhu (2016) show an increase in searches for brand related keywords after a TV commercial. Moreover, De Haan et al. (2016) show that a direct visit to a website through type-in of the URL could be attributed to goodwill or brand awareness, as well as to traffic from offline advertising forms for which no click occurs. Thus, when an offline touch point is followed by an online touch point, this will show a positive relation to the probability of purchase, because the online touch can be a retargeted advertisement based on a change in browsing history showing that the consumer became interested in the brand or product after the advertisement was shown. This is in line with the finding of Anderl et al. (2016) that users are more receptive to firm initiated touches after a customer-initiated contact has occurred. Therefore, customer journeys that include online advertisements show that 1) the potential customer has shown an interest in this product and 2) this customer fits the target market of this company. Thus, consumers who are subject to online advertising have a higher probability of converting. Hence, the following hypothesis is formalized:

H8: Customer journeys that contain a combination of online and offline advertisements have a higher probability of conversion than customer journeys that consist only of offline touches.


offline and online advertising to gain insight into their contribution to the probability of purchase, where only offline conversion can take place, sheds light on this topic from a new perspective: that of the customer journey in a multi-touch attribution setting. Given the findings of the studies above, this study also investigates whether the impact of online advertising on offline conversion is positive.

2.4 Attribution modeling

To determine the effectiveness of advertising channels, one can investigate each channel’s contribution to the probability of conversion. This is done in attribution modeling. To determine which channel or ad influenced conversion, practitioners used to rely on intuition-based attribution rules such as last-touch, ‘time-decay’ or ‘equal distribution’ (Chandler-Pepelnjak, 2010). These models are based on simple rules that hardly reflect today’s media complexity. Moreover, these simple models do not take into account the paths of consumers who did not convert (Petersen et al., 2009). This simplification paints a biased picture of channel effectiveness, as it ignores purchase paths and cross-channel effects.
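The intuition-based rules named above can be sketched in a few lines. This is only an illustration under stated assumptions: the function name `attribute`, the channel labels and the time-decay parameterization (credit halves for every `half_life` touches further from the conversion) are hypothetical and are not the thesis's own specification of these rules.

```python
from collections import defaultdict


def attribute(journey, rule="last_touch", half_life=1.0):
    """Split the credit for one conversion over the channels in a journey.

    journey   : channel names ordered from first to last touch
    rule      : "last_touch", "first_touch", "equal" or "time_decay"
    half_life : assumed decay speed for "time_decay", measured in touches
    """
    n = len(journey)
    if n == 0:
        return {}
    if rule == "last_touch":
        weights = [0.0] * (n - 1) + [1.0]
    elif rule == "first_touch":
        weights = [1.0] + [0.0] * (n - 1)
    elif rule == "equal":
        weights = [1.0 / n] * n
    elif rule == "time_decay":
        # Touches closer to the conversion get exponentially more weight.
        raw = [0.5 ** ((n - 1 - i) / half_life) for i in range(n)]
        total = sum(raw)
        weights = [w / total for w in raw]
    else:
        raise ValueError(f"unknown rule: {rule}")
    credit = defaultdict(float)
    for channel, weight in zip(journey, weights):
        credit[channel] += weight  # a channel touched twice accumulates credit
    return dict(credit)


journey = ["tv", "search", "display"]  # hypothetical converting journey
last = attribute(journey, "last_touch")
first = attribute(journey, "first_touch")
equal = attribute(journey, "equal")
decay = attribute(journey, "time_decay")
```

Each rule hands out exactly one unit of credit per conversion and never looks at journeys without a conversion, which is the limitation the paragraph above points out.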

The first advanced model for understanding the effectiveness of different channels was made when a data-driven model was introduced by Shao and Li (2011). This model did not rely on these intuition-based rules but let the data speak for itself. They saw attribution as a classification problem: given a past chain of touches how likely is it that an event occurs?


consistent with an attribution model that measures the effect of (non-)exposure to an advertisement on the probability of conversion.

Many methods of measuring variable importance interpret the regression coefficients (Johnson and Lebreton, 2004). However, if the predictors are correlated this is considered inadequate (Budescu, 1993; Green & Tull, 1975; Hoffman, 1960). According to Johnson and Lebreton (2004), who review the relative importance indices, the preferred method is the Dominance Analysis of Budescu (1993). Budescu’s ‘partial effects’ measure is defined as the average increase in R² from including a predictor, taken over all subsets of the other predictors. This is the same as Shapley Value regression (Dalessandro et al., 2012), which decomposes the R² so that each channel’s Shapley Value (1953) represents its average increase or decrease in R². The value is therefore useful for analyzing channel importance. Because variable importance shares the goal of estimating each variable’s contribution to the outcome, it is also useful for developing an attribution model. In this case, the cooperative team has marketing channel touch points as members and the team output is conversion (Google, 2016).
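The averaging over subsets described above can be made concrete with a minimal sketch of the exact Shapley value. The function name `shapley_values` and the R² figures per channel subset are hypothetical numbers chosen for illustration; they are not results from this study.

```python
from itertools import combinations
from math import factorial


def shapley_values(players, value):
    """Exact Shapley value for every player.

    players : list of channel names
    value   : function mapping a frozenset of channels to a number,
              e.g. the R² of a model fitted on those channels only
    """
    n = len(players)
    result = {}
    for player in players:
        others = [p for p in players if p != player]
        total = 0.0
        for k in range(n):
            # Weight of a coalition of size k that the player joins.
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                coalition = frozenset(subset)
                # Marginal gain from adding the player to this coalition.
                total += weight * (value(coalition | {player}) - value(coalition))
        result[player] = total
    return result


# Hypothetical R² for each subset of two channels (illustrative numbers only).
r_squared = {
    frozenset(): 0.00,
    frozenset({"tv"}): 0.10,
    frozenset({"search"}): 0.20,
    frozenset({"tv", "search"}): 0.25,
}
credit = shapley_values(["tv", "search"], lambda s: r_squared[s])
```

In this toy case the credits sum to the full-model R² of 0.25, the efficiency property that makes the decomposition usable for attribution. The number of subsets doubles with every added channel, so exact enumeration is only practical for a modest number of channels.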


In conclusion, Shao and Li’s classification problem is transformed using Shapley values (Dalessandro et al., 2012; Li and Kannan, 2014) so that the model estimates the contribution of each channel towards conversion. The impact of a touch point is estimated by comparing the outcomes of customer journeys with and without that touch point, holding all else constant. The importance of a variable determines how much credit is attributed to it, which in turn guides media mix optimization.
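The with-and-without comparison described above can be illustrated with a minimal sketch of an exact Shapley value calculation. It assumes a small set of three channels and a lookup table of conversion probabilities per exposure set; the function name `shapley_values` and all probabilities are hypothetical and chosen for illustration only, they are not estimates from this dataset.

```python
from itertools import combinations
from math import factorial


def shapley_values(channels, value):
    """Exact Shapley value per channel for a coalition value function."""
    n = len(channels)
    result = {}
    for ch in channels:
        others = [c for c in channels if c != ch]
        total = 0.0
        for size in range(n):
            # Weight of a coalition of this size in the Shapley average.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = frozenset(subset)
                # Marginal contribution: outcome with the channel minus without it.
                total += weight * (value(s | {ch}) - value(s))
        result[ch] = total
    return result


# Hypothetical conversion probabilities per set of channels seen (illustrative only).
conv_prob = {
    frozenset(): 0.000,
    frozenset({"TV"}): 0.004,
    frozenset({"Print"}): 0.002,
    frozenset({"Folder"}): 0.006,
    frozenset({"TV", "Print"}): 0.007,
    frozenset({"TV", "Folder"}): 0.011,
    frozenset({"Print", "Folder"}): 0.009,
    frozenset({"TV", "Print", "Folder"}): 0.015,
}

credit = shapley_values(["TV", "Print", "Folder"], lambda s: conv_prob[frozenset(s)])
for channel, share in credit.items():
    print(f"{channel}: {share:.4f}")
```

The attributed credits sum to the conversion probability of the full channel set, which is the property that makes the Shapley value suitable for dividing credit among channels.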

2.5 Control variables

To measure the effects of online and offline marketing, exogenous effects should be controlled for. Socio-demographic characteristics could influence the probability of (non-)conversion. For example, a household with a higher income or a larger household could be more likely to purchase consumer durables. In addition, individual heterogeneity could exist in the preference for channels (Li and Kannan, 2014). To control for such effects, several models that include these variables will be developed.

2.6 Conceptual model and hypothesis overview

An overview of the hypotheses formalized above is provided below:

H1: Including characteristics of the customer journey significantly improves a model’s ability to classify consumers as (non-)converting.

H2: Lower funnel marketing channels receive a larger share of credit in last-touch attribution models compared to multi-touch attribution models.

H3: Upper funnel marketing channels receive a smaller share of credit in last-touch attribution models compared to multi-touch attribution models.

H4: Marketing channels have positive carryover effects up to four weeks prior to purchase for consumer durables.

H5: Marketing channels have negative carryover effects up to four weeks prior to purchase for consumer durables.

H6: Online marketing has a positive cross-channel effect on offline conversion.

H7: Online marketing is more effective than offline marketing towards offline conversion.

H8: Consumer Journeys that contain a combination of online and offline advertisements


Figure 1 below represents the conceptual model. The model distinguishes between online and offline channels and between upper and lower funnel marketing. Moreover, it shows how the hypotheses are situated in relation to offline conversion.


3. Data

This section introduces the data collection method and describes the variables used in the further analysis. First, it covers the data collection method, followed by the data description. Lastly, the inclusion of new variables and the data transformations used to deal with outliers are described.

3.1 Data collection and description

The data utilized for this research is secondary weekly panel data collected by a market research institute on a large consumer durables store that wishes to remain anonymous. This set contains weekly data of 11672 households and is collected over 31 weeks. The unit of analysis is each household over this period. Many characteristics and actions by the firm and household are measured and explained below.

3.1.1 Marketing channels


Radio, Print advertising and Door-to-Door Folders) are measured by questionnaire and are the summed probabilities that the advertising is seen. Conventionally, these summed probabilities are interpreted as touches with that advertising channel. For example, if a household has a 75% chance of seeing a TV advertisement on four occasions, this is counted as 3 touches (4 × 0.75).

The most used channels are TV and Print (61.7 and 46.6 touches per household over 31 weeks). The least used are Banner Alternative and Google Masthead (0.1 and 0.2 touches on average per household over 31 weeks). On average, households made 136 touches with the different marketing channels. An overview of this data is provided in Table 1.

| Channel | Total number of touches per channel | Average number of touches per household | % of total advertising |
|---|---|---|---|
| Folder | 102754 | 8.8 | 6.44% |
| Special Alternative | 13200 | 1.1 | 0.83% |
| Banner Alternative | 723 | 0.1 | 0.05% |
| Google Masthead | 2117 | 0.2 | 0.13% |
| Google Display Network | 79882 | 6.8 | 5.01% |
| Print | 543624 | 46.6 | 34.09% |
| Radio | 131746 | 11.3 | 8.26% |
| TV | 720503 | 61.7 | 45.19% |

Table 1: overview of advertising channels

3.1.2 In-store purchase


| Number of conversions per household | Frequency | % Conversions | % of households |
|---|---|---|---|
| 1 | 884 | 82.0% | 7.57% |
| 2 | 154 | 14.3% | 1.32% |
| 3 | 33 | 3.1% | 0.28% |
| 4 | 5 | 0.5% | 0.04% |
| 5 | 2 | 0.2% | 0.02% |

Table 2: Details on in-store purchases (conversions).

3.1.3 Household characteristics

Lastly, household characteristics are recorded: whether the household has kids, the region of the country in which it lives, household size, net income and the highest level of education of the main breadwinner. An analysis of this data shows that there is a relatively large group aged 50-64 and 64+ (>50%). Nevertheless, this is similar to national household statistics (see Table 4). Next, household income shows a normal-like distribution between 700 and 4100 euro per month, whereas income distributions often follow a Pareto or Zipf’s law distribution (Neal and Rosen, 2000). These issues indicate that the dataset is not fully representative of the population of the Netherlands. Therefore, this paper uses a weighting factor variable provided by the panel. Underrepresented groups (e.g. young men) are weighted more heavily than overrepresented groups (e.g. older women). This variable is used to create a representative sample of the population.

3.2 Outliers and missing values


for TV and 96.7% for print. For the other channels, only outliers beyond 99% are discarded. Furthermore, this implies that certain households see advertisements more often than others, but not excessively so.

Furthermore, to analyze the path to purchase over the four weeks prior to purchase, a new dummy variable is introduced that indicates whether these weeks lead up to a conversion (1) or not (0). Moreover, lagged variables are created for all marketing channels, covering up to four weeks: all contacts are lagged by one, two and three weeks.

In addition to outliers, there are missing values to take care of. About 15% of the respondents did not answer the question about household income. However, as income is approximately normally distributed, it is assumed that these households follow the general distribution, and thus no action is taken.

Moreover, the variable Internet is based on a questionnaire asked at the point of purchase at this store. This does not indicate whether the Internet is utilized when purchasing at other stores, at what point of the purchase funnel or before or after which touches it was utilized. The consequence of this is that for all analyses this variable indicates that a conversion has taken place even if Internet was not used, as there only exists values at the point of purchase. Therefore, the models and coefficients are biased when this variable is included and consequently, it is excluded in further analysis.


To indicate whether the customer journey plays an important role, a new dummy variable is created. This variable is 0 before purchase and 1 after a purchase is made at this consumer durables store or at a competing store. The purchase functions as a trigger, after which the dummy variable indicates that a consumer has completed the customer journey. Instead of indicating when a customer is in the customer journey, it shows when the customer is out of the journey. If this dummy variable is included and shows a significant negative coefficient, the customer journey is important to include.
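As a minimal sketch of how such a trigger dummy could be constructed from a household's weekly purchase indicators: the function name `out_of_journey_dummy` is hypothetical, and it is an assumption here that the week of purchase itself is still coded 0 and only the following weeks are coded 1.

```python
def out_of_journey_dummy(purchases):
    """Return 0 for weeks up to and including the first purchase, 1 afterwards.

    purchases: weekly indicators (1 = purchase at this or a competing store).
    Assumption: the purchase week itself still counts as inside the journey.
    """
    dummy = []
    purchased = False
    for bought in purchases:
        dummy.append(1 if purchased else 0)
        purchased = purchased or bool(bought)
    return dummy


# One household over five weeks, with a purchase in week 3.
weekly_purchases = [0, 0, 1, 0, 0]
print(out_of_journey_dummy(weekly_purchases))
```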

3.3 Variable overview


4. Modeling

In this section the model framework is explained. It introduces the different models used in this study and details how they are built, specified and validated. Lastly, the models are compared on their performance on a training and a holdout set.

4.1 Model framework

According to Little (1970), models should be simple but complete and adaptive yet robust. A balance ought to be struck between taking into account only the variables with largest input and taking into account as many variables as possible. As an attribution model takes into account all the touches a consumer makes prior to (non-) conversion each channel must be taken into account and cannot be excluded. Moreover, household characteristics are taken into account to deal with household heterogeneity. The models are also adaptive as one can incorporate new media channels easily. As only positive values for touches exist it is also robust (Theil, 1971). These modeling rules are taken into account when the following models are developed.

4.1.1 Intuition-based rule models


Last-Touch Attribution: this model assumes that the last touch prior to purchase ought to be credited for the conversion. As this dataset is only available on a weekly basis, two or more advertising channels may be touched during the week in which the conversion takes place. In that case, the credit is distributed equally among these channels.

Time-Decay: this model assumes that all advertising channels are equally effective and that the effectiveness of an advertisement decays linearly over time. Within the customer journey (of four weeks) the advertisements shown are equally effective across channels; however, earlier advertisements are relatively less effective than those closer to conversion.

First-Touch attribution: this model assumes that all credits should be allocated to the advertisement that first touched the consumer. Namely, this is the advertisement that brought the consumer inside the funnel.

Equal distribution: This model assumes that all advertisements play an equal role and all channels are equally effective. Attributed credit is equally shared over time.

As this paper deals with weekly data, true last-touch or first-touch attribution models cannot be developed: the data for the week of purchase is used for Last Touch Attribution model; the data for the week that is four weeks prior to purchase is used for the First Touch Attribution model. For the Time Decay and Equal Distribution model also the data for the second and third week are utilized.
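The four rules can be sketched as different week weights applied to a four-week journey. This is an illustration under stated assumptions, not the implementation used in this study: the names `attribute` and `RULES` are hypothetical, credit within a week is split equally among the channels touched in that week, and the linear time-decay weights (1, 2, 3, 4) are an assumed choice.

```python
def attribute(journey, week_weights):
    """Share of credit per channel for one converting journey.

    journey: list of sets of channels touched per week, oldest week first.
    week_weights: weight of each week under the chosen attribution rule.
    """
    credit = {}
    for channels, weight in zip(journey, week_weights):
        if not channels or weight == 0:
            continue
        share = weight / len(channels)  # equal split within a week
        for ch in channels:
            credit[ch] = credit.get(ch, 0.0) + share
    total = sum(credit.values())
    return {ch: c / total for ch, c in credit.items()} if total else {}


# Week weights for weeks t-3, t-2, t-1 and t (the week of purchase).
RULES = {
    "first_touch": (1, 0, 0, 0),
    "last_touch": (0, 0, 0, 1),
    "equal": (1, 1, 1, 1),
    "time_decay": (1, 2, 3, 4),  # assumed linear decay, most recent week heaviest
}

# Hypothetical journey: TV, then TV and Print, a week without touches, then Folder and Print.
journey = [{"TV"}, {"TV", "Print"}, set(), {"Folder", "Print"}]
credits = {name: attribute(journey, weights) for name, weights in RULES.items()}
for name, shares in credits.items():
    print(name, shares)
```

Expressing every rule as a weight vector makes the difference between the models explicit: only the emphasis placed on each week changes.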

4.2 Data driven models


what is the probability of conversion? The dependent variable (conversion) is therefore binary, as only yes or no options exist. The independent variables (the series of past touches and household characteristics) influence this probability. To build the models, the control variables are selected first.

4.2.1.1 Control variables

Next to the advertising exposures that were measured, record is made of several household characteristics. These are investigated, to understand which characteristics are useful to include in the data driven models to improve them, reduce the unexplained variance and estimate coefficients that measure the impact of independent variables closer to reality.

To determine which socio-demographic variables to include in the analysis, different Decision Trees are created. Variables that split higher in the tree are more important for determining the dependent variable (in this case: conversion (yes/no)). Four splitting methods are applied. The first three are based on the Chi-Square test: CHAID, Exhaustive CHAID (Hartigan, 1978) and QUEST. The last method, CRT, is based on the Gini index of diversity (Breiman, 1984). Table 4 presents the ranking of the splits: the most important split variables are Age of Housewife (1, 1 & 2, 2, 3), Household composition (-, -, 1, 1) and District (2, 2, 2, -). These variables are taken into account in the further development of the models.

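| Variable | CHAID | QUEST | Exhaustive CHAID | CRT |
|---|---|---|---|---|
| District | 2 | 2 | 2 | - |
| Kids (yes/no) | 3 | - | - | - |
| Household Composition | - | - | 1 | 1 |
| GfK Lifecycle | - | - | - | - |
| Age Housewife | 1 | 1 & 2 | 2 | 3 |
| Net Income | - | - | - | 2 |
| Level education | - | - | - | - |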

Table 4: Ranking of variables by different Decision Tree splitting rules


of the models the Log Likelihood, McFadden R2 and Adjusted McFadden R2 are calculated. The results are presented in Table 5.

First, Model 3, which includes all selected socio-demographic variables, performs best according to the McFadden R2 (0.0099) and Log Likelihood (-8646.5). It performs much better than the null model (LL = -8733) and the model that includes only the Age Housewife and District variables (LL = -8702.5; R2 = 0.0035). Moreover, adding these variables to the model that includes the marketing channels (Model 5) clearly yields the superior model: its Log Likelihood is highest, as is its (adjusted) McFadden R2 (LL = -8583.5; R2 = 0.017; adjusted R2 = 0.016). Therefore, for further analysis, the socio-demographic variables Age of Housewife, Household composition and District are taken into account.

| Model | Variables | -2LL | LL | McFadden R2 | Adjusted McFadden R2 |
|---|---|---|---|---|---|
| 1 | (null model) | 17466 | -8733 | - | - |
| 2 | Housewife & District | 17405 | -8702.5 | 0.0035 | 0.0033 |
| 3 | Housewife, District & Household composition | 17293 | -8646.5 | 0.0099 | 0.0096 |
| 4 | Marketing Channels | 17310 | -8655 | 0.0089 | 0.0080 |
| 5 | Marketing Channels and Socio-Demographic variables | 17167 | -8583.5 | 0.017 | 0.016 |

Table 5: Performance of Models with Socio-Demographic variables.
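The fit measures in Table 5 can be reproduced from the log likelihoods with the standard McFadden formulas. The sketch below assumes the usual definitions, R2 = 1 - LL_model / LL_null and adjusted R2 = 1 - (LL_model - K) / LL_null; the parameter count K = 3 used for Model 3 is an assumption, chosen here because it is consistent with the reported value.

```python
def mcfadden_r2(ll_model, ll_null):
    """McFadden pseudo R2: 1 - LL_model / LL_null."""
    return 1.0 - ll_model / ll_null


def adjusted_mcfadden_r2(ll_model, ll_null, n_params):
    """Adjusted McFadden R2, penalizing the number of estimated parameters."""
    return 1.0 - (ll_model - n_params) / ll_null


LL_NULL = -8733.0  # null model, Table 5

# Model 2 (Housewife & District) and Model 3 (plus Household composition).
r2_model2 = mcfadden_r2(-8702.5, LL_NULL)
r2_model3 = mcfadden_r2(-8646.5, LL_NULL)
# n_params=3 is an assumed parameter count for Model 3.
adj_r2_model3 = adjusted_mcfadden_r2(-8646.5, LL_NULL, n_params=3)

print(round(r2_model2, 4), round(r2_model3, 4), round(adj_r2_model3, 4))
```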

4.2.1.2 Lagged effects

As explained in the literature, the customer journey is important to take into consideration when building attribution models. Therefore, lagged effects of the marketing channels are taken into consideration. For each activity, the lagged effect of up to four weeks before conversion is modeled, as explained in the literature overview. For example, at any point in time t (e.g. the point of conversion), the marketing activities of t, t-1, t-2 and t-3 are taken into account:

$$\text{Conversion}_{t} = f\left(\text{Marketing}_{t},\ \text{Marketing}_{t-1},\ \text{Marketing}_{t-2},\ \text{Marketing}_{t-3}\right)$$
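A minimal sketch of how the lagged variables for one channel and one household could be built from a weekly series of touches. The function name `add_lags` is hypothetical, and marking lags that fall before the first observed week as missing (`None`) is an assumption.

```python
def add_lags(series, max_lag=3):
    """Build rows with the current value and its lags t-1 ... t-max_lag.

    series: weekly touches of one channel for one household, oldest week first.
    Lags that reach before the first observed week are set to None.
    """
    rows = []
    for t in range(len(series)):
        row = {"t": series[t]}
        for lag in range(1, max_lag + 1):
            row[f"t-{lag}"] = series[t - lag] if t - lag >= 0 else None
        rows.append(row)
    return rows


# Hypothetical weekly TV touches of one household over five weeks.
tv_touches = [2, 0, 1, 3, 0]
lagged = add_lags(tv_touches)
print(lagged[3])
```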


0.0199). However, as a model also ought to be simple, the Adjusted R2 is considered: including all lagged variables adds 24 variables to the equation, and the Adjusted R2 penalizes for the number of variables. This clearly shows that the model including only the significant lagged marketing channels (Model 3) is the best model, as it has the highest Adjusted R2 (Model 2 = 0.0168; Model 3 = 0.0180), whereas the model with all lagged marketing effects is heavily penalized for the number of variables included (24 lagged variables, adj. R2 = 0.0168). Although Model 3 loses some predictive power, as indicated by a lower Log Likelihood (a decrease of 7.72), it is still the preferred model as it has the smaller number of variables.

| Model | Variables | -2LL | LL | McFadden R2 | McFadden Adj. R2 |
|---|---|---|---|---|---|
| 1 | (null model) | 17466 | -8733 | - | - |
| 2 | Socio Demographic Variables + Marketing Channels + Lagged Marketing Channels | 17102.56 | -8551.28 | 0.0208 | 0.0168 |
| 3 | Socio Demographic Variables + Marketing Channels + Significant Lagged Marketing Channels | 17118 | -8559 | 0.0199 | 0.0180 |

Table 6. Lagged effect variable model performance.

The following analyses are performed on the training set of 75% - the holdout set (25%) is used for simulation.

4.2.2 Bagged Logistic Regression:


A problem with this approach is that the coefficients in logistic regression are difficult to interpret. Moreover, negative coefficients can arise due to collinearity (Dalessandro et al., 2012). Lastly, logistic regression could estimate values beyond the binary classification (P < 0 or P > 1) that are illogical for a classification problem. In Little’s (1970) terms, the model would not be robust. However, Shao and Li (2011) also test a Simple Probabilistic Model (based on first and second order conditional probabilities), which does meet the latter robustness standard, and this model attributed similar credit to the channels.

Another issue with applying logistic regression is that it models a rare event: conversion takes place in just 0.4% of the instances. According to King and Zeng (2001), this could lead to over- or underestimated coefficients. However, bagging might prevent this issue, as it averages over the randomness of variable selection (Lee and Yang, 2006). The following model is specified:

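$$
\begin{aligned}
P(Y = 1) = f\big(&\alpha + \beta_{1}\,\text{Age Housewife} \cdot \text{weightfactor}_{i} + \beta_{2}\,\text{Household composition} \cdot \text{weightfactor}_{i} \\
&+ \beta_{3}\,\text{District} \cdot \text{weightfactor}_{i} + \beta_{4}\,\text{Folder} + \beta_{5}\,\text{Special Alternative} \\
&+ \beta_{6}\,\text{Banner Alternative} + \beta_{7}\,\text{Google Masthead} + \beta_{8}\,\text{Google Display Network} \\
&+ \beta_{9}\,\text{Print} + \beta_{10}\,\text{Radio} + \beta_{11}\,\text{TV} \\
&+ \beta_{12}\,\text{Banner Alternative}_{t-k} + \beta_{13}\,\text{Radio}_{t-k} + \beta_{14}\,\text{TV}_{t-k} \\
&+ \beta_{15}\,\text{TV}_{t-k'} + \beta_{16}\,\text{Print}_{t-k} + \beta_{17}\,\text{Banner Alternative}_{t-k'} + \varepsilon\big)
\end{aligned}
$$

Here $k, k' \in \{1, 2, 3\}$ denote the significant lags (in weeks) of the respective channel.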

As mentioned in Chapter 3, a weighting factor is applied to the socio-demographic variables in order to create a representative population of this sample. This weighting factor is a unique score dependent on the household characteristics.
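The bagging idea referred to above can be sketched as fitting a logistic regression on several bootstrap samples and averaging the coefficients. This is a simplified illustration under stated assumptions, not the estimation procedure used in this study: it resamples observations only (Shao and Li (2011) also sample covariates), it uses plain gradient ascent instead of a statistical package, the weighting factor is omitted, and the data are synthetic. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=200):
    """Batch gradient ascent on the log likelihood; returns [intercept, b1, ...]."""
    n, n_features = len(X), len(X[0])
    beta = [0.0] * (n_features + 1)
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            p = sigmoid(beta[0] + sum(b * x for b, x in zip(beta[1:], xi)))
            err = yi - p
            grad[0] += err
            for j, x in enumerate(xi):
                grad[j + 1] += err * x
        beta = [b + lr * g / n for b, g in zip(beta, grad)]
    return beta


def bagged_logistic(X, y, n_bags=10, seed=0):
    """Average the coefficients of logistic fits on bootstrap samples."""
    rng = random.Random(seed)
    n = len(X)
    fits = []
    for _ in range(n_bags):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        fits.append(fit_logistic([X[i] for i in idx], [y[i] for i in idx]))
    return [sum(col) / n_bags for col in zip(*fits)]


# Synthetic data: two hypothetical channels, only the first one drives conversion.
data_rng = random.Random(1)
X = [[data_rng.randint(0, 3), data_rng.randint(0, 3)] for _ in range(200)]
y = [1 if data_rng.random() < sigmoid(-2.0 + 1.0 * x[0]) else 0 for x in X]

coefs = bagged_logistic(X, y)
print([round(c, 3) for c in coefs])
```

Averaging over bootstrap fits reduces the variance of the estimated coefficients, which is the motivation for bagging given in the text.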

Three models are estimated. The first model consists of just the marketing channels. The second model also includes the significant lagged marketing channels. The third model also includes the socio-demographic characteristics (see Appendix C). Following common practice (e.g. Haaijer, Kamakura and Wedel, 2001), the bagged logistic regressions are assessed on Pseudo-R2 measures: McFadden’s R2, Cox & Snell’s R2 and Nagelkerke’s R2. In addition, AIC, BIC, AIC3 and CAIC scores are compared. The results are presented in Table 7.


are better (lower) and for Model 3 the AIC and AIC3 scores are better (lower). These criteria in themselves are inconclusive.

Investigating the Pseudo-R2 measures, Model 3 has a higher McFadden R2 (0.02 > 0.016) and Nagelkerke R2 (0.02 > 0.017). This indicates that Model 3 is the superior model. Further investigation confirms that the third model should be used: the Chi-Square test yields a score of 242.341 (df=36, p<0.001), indicating that the model is significantly better than the null model. Moreover, it is significantly better than Model 2 (p<0.01) and Model 1 (p<0.01) (Model 2: Chi2 = 200.165, df=30; Model 1: Chi2 = 106.918, df=8).

| Model | Variables | Chi-Square | AIC | AIC3 | BIC | CAIC | McFadden R2 | Cox & Snell | Nagelkerke R2 |
|---|---|---|---|---|---|---|---|---|---|
| 1 | Marketing Channels | 106.918 (df=8) | 12091.85 | 12099.85 | 12175.82 | 12183.83 | 0.009 | 0 | 0.009 |
| 2 | Model 1 + Lagged effects | 200.165 (df=30) | 12004.45 | 12015.45 | 12119.68 | 12130.69 | 0.016 | 0.001 | 0.017 |
| 3 | Model 2 + Socio Demographic | 242.314 (df=36) | 11974.04 | 11991.04 | 12152.47 | 12169.47 | 0.02 | 0.001 | 0.02 |

Table 7: Model fitness of three models (1: marketing channels 2: marketing channels & lagged effects 3: marketing channels & lagged effects & socio demographic variables).

4.2.3 Artificial Neural Network:

One goal of this paper is to extend this methodological framework. To this end, this paper develops an Artificial Neural Network (ANN) model. These models, inspired by biology, learn and behave in a way similar to a human brain. They are complex networks that comprise a large set of simple nodes, or neural cells, each of which affects the status of the next cell (Sharma and Kumar Panigrahi, 2011). The weights of each layer of neural cells are modified to obtain the lowest error. ANNs are used to solve pattern classification problems (Bartlett, 1998).


The ANN algorithm learns different weights for the different input variables in order to classify each household as well as possible. The weights of the input layers and the hidden layers are not observable, but the more hidden layers, the deeper the network and the better its capability to capture complex patterns. As the actual conversions are known, this is a supervised neural network. This method is often used to minimize the total error (Schmidhuber, 2015). The structure of the Neural Network is illustrated in Figure 1, where the input layers are the marketing channels, lagged marketing effects and the socio-demographic variables. The ANN model calculates the probability of classifying a household as 0 or 1, in other words (non-)conversion. The probability of conversion is presented in the output layer. Each channel’s contribution to the probability of conversion can then be estimated using Shapley values (Shapley, 1953), in similar fashion to Dalessandro et al. (2012) and Li and Kannan (2014).

The software program determines the optimal number of hidden layers by adding more only if this improves the model. Overfitting is a common issue in Neural Networks (Blattberg, 2008). Therefore, the model is fitted on a training and a holdout set. According to Kübler, Wieringa and Pauwels (2016) the preferred split ratio is 75% and 25%. Blattberg (2008) notes that on large datasets reasonable ratios perform equally well, indicating that in our case this might not be an issue. To determine the model fitness, the neural network is judged on its ability to classify correctly in both the training and the holdout set.

Three models are estimated, similar to the Bagged Logistic Regression process: the first model includes only the marketing activities, the second also includes the significant lagged marketing activities, and the last also includes the socio-demographic variables.

Neural Networks should be judged on their ability to correctly classify the dependent variable. In this case, households are classified as converting or non-converting. Therefore, the Hit Rate is calculated for these models and detailed in Table 8. The Hit Rate for Model 3 is 47.62%. This clearly shows that the model that includes all variables is the preferred model, as with the Bagged Logistic Regression. The other models score 42.84% and 41.86%. A model could simply predict ‘yes’ all the time and fake a Hit Rate of 100%. Therefore, the Relative Hit Rate is included, which indicates the percentage of cases in which a model also correctly predicts non-conversion. For Model 3, ‘yes’ is predicted 108088 times, of which 629 were correct. This is also relatively more than for the other models.

For the preferred model, Model 3, the software program automatically determined the optimal number of hidden layers and their structure. In this case, 7 hidden layers with a hyperbolic tangent activation function were optimal.

As explained in Section 2.4, the Shapley value is used to determine the contribution of each channel towards the probability of conversion. To calculate this, the probability is estimated for each possible combination with and without that variable. To this end, the R script STATS RELIMP is utilized (IBM, 2016). The probability of conversion is used as the dependent variable and the marketing channels as covariates. The relative importance is presented in Chapter 5.

4.3 Predictive performance


The ACC score (accuracy) divides the number of correct classifications (true positives and true negatives) by the total number of classifications (Wang, 2010). The higher the ACC score, the better the model predicts (non-)conversion. When classifying a binary dependent variable it is common to report the Lift Curve (Wielenga, 2007). This curve graphically compares using the model with guessing, which is 0.4% in this case. It shows how much better a model is at determining the probability of conversion for each decile, as an indicator of performance (Neslin et al., 2006). The Top-Decile Lift is a ratio indicating how much more likely the top decile of the population is to convert in comparison to guessing. When a marketer has only limited resources, this is the most attractive target group to approach. The results are presented in Table 9 below.

For the Gini Coefficient, zero indicates perfect equality and one indicates perfect inequality. According to Blattberg (2008), the higher the Gini Coefficient, the better the model. Lastly, the Hit Rate is compared between the models. This indicates how often the models have estimated correctly whether a household converts or not.
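Three of these measures can be sketched in a few lines. The function names and the example scores are hypothetical. The Hit Rate is implemented here as the share of actual conversions that the model predicts correctly, which is an inference from the figures reported for the Neural Network (629 correct ‘yes’ predictions), not a definition stated in the text; the Gini Coefficient is left out because its exact computation is not specified.

```python
def accuracy(y_true, y_pred):
    """Share of all classifications (converting and non-converting) that are correct."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def hit_rate(y_true, y_pred):
    """Share of actual conversions that are predicted as conversions (assumed definition)."""
    hits = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    return hits / sum(y_true)


def top_decile_lift(y_true, scores):
    """Conversion rate in the 10% highest-scored cases divided by the overall rate."""
    ranked = sorted(zip(scores, y_true), key=lambda pair: pair[0], reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(t for _, t in ranked[:top_n]) / top_n
    overall_rate = sum(y_true) / len(y_true)
    return top_rate / overall_rate


# Hypothetical example: 20 households, 4 conversions, scores from 1.00 down to 0.05.
scores = [i / 20 for i in range(20, 0, -1)]
y_true = [1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
cutoff = 0.78  # illustrative cut-off, not the one used in this study
y_pred = [1 if s >= cutoff else 0 for s in scores]

print(accuracy(y_true, y_pred), hit_rate(y_true, y_pred), top_decile_lift(y_true, scores))
```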

The results are presented in Figure 2 and Table 9. It seems that the Neural Network is a better predictor overall - concerning the overall percentage predicted correctly. The Top Decile Lift, Hit Rate and Gini Coefficient are much higher. The cut-off rate is 0.4% and the models are therefore susceptible to small increases in probability to conversion. The GINI-coefficient and lift curve also indicate that the Neural Network is a better predictor.

| | Bagged Logistic Regression | Artificial Neural Network |
|---|---|---|
| ACC | 0.717 | 0.648 |
| Top Decile Lift | 2.22 | 2.63 |
| Gini Coefficient | 0.367 | 0.480 |
| Hit Rate | 45.82% | 51.35% |

[Lift Curve: cumulative lift (y-axis) for the Neural Network, the Bagged Logistic Regression and the Null model]

Figure 2: Lift Curve (Lift = Percentage with event in Group / Percentage with event overall) of Neural Network and Bagged Logistic Regression. Cutoff Value = 0.4%.


4.4 Cross-validation

The models are trained on 75% of the data. The remaining 25% of the data is used for validation, which is the subject of this section. The same tests as in the previous section are applied to the validation (holdout) set, namely the ACC, Lift Curve, Top Decile Lift, Gini Coefficient and Hit Rate. The results are presented in Table 10 and Figure 3.

The assessments show larger changes for the Bagged Logistic Regression (BLR) and few for the Artificial Neural Network (ANN). The Gini coefficients are a little lower, but in similar proportions for the BLR and ANN. The Lift Curve is also similar for both models, although slightly steeper for the holdout set. Both models perform better on the Top Decile Lift and Hit Rate: the Hit Rate increased from 45.82% to 73.6% for the BLR and from 51.35% to 70.2% for the ANN, which indicates potential issues. The ACC is much lower for the BLR in the holdout set (0.483 versus 0.717 in the training set) but similar for the Neural Network (0.606 versus 0.648). Overall, both models perform a little better in predicting the conversions, but the BLR also misclassifies many more cases.

Why could this be the case? On further inspection, the random selection of households resulted in relatively more conversions in the holdout sample than in the training set. Whereas the data is divided into 75% for the training set and 25% for the holdout set, the conversions are divided into 69% (912) for the training set and 31% (409) for the holdout set. As there are relatively more conversions in the holdout set, the hit rates are much higher. The ANN is less susceptible than the BLR to this increase, as its misclassification rate is not much higher. In summary, the performance of both models is similar. However, the ANN is less susceptible to an increase in conversions in the holdout sample, as its ACC remains similar, whereas the BLR is more susceptible to such an increase and is a less stable predictive model.


| | Bagged Logistic Regression | Artificial Neural Network |
|---|---|---|
| ACC | 0.483 | 0.606 |
| Top Decile Lift | 2.518 | 3.374 |
| Gini Coefficient | 0.318 | 0.425 |
| Hit Rate | 73.6% | 70.2% |

4.5 Customer Journey

To measure whether the customer journey has influence on the probability to conversion (H1), use is made of a dummy variable. This dummy variable is 0 before purchase and 1 after purchase - it is triggered after a purchase is made (see Chapter 3). An additional benefit of this method is that, as it is uncertain when someone enters the customer journey, this method does not need to take a fixed period into account; it measures when someone is out of a customer journey as a purchase is made in this consumer durable store or a competing one.

It is expected that the coefficient of this variable will show a negative value in a Bagged Logistic Regression. This means that the probability of conversion is lower after a purchase is made and therefore taking into account the customer journey is necessary.

[Lift Curve: cumulative lift (y-axis, 0% to 100%) for the Artificial Neural Network, the Bagged Logistic Regression and the Null model]

Figure 3: Lift curve of ANN and BLR and Null model on the holdout set.


Moreover, the performance of two models (one that includes this dummy variable and one that does not) is compared based on Chi-Square values and a Likelihood Ratio test.

Second, the credit attributed towards the probability of conversion is compared between lower and upper funnel marketing channels for the First Touch and Last Touch attribution models. It is expected that, compared to the Last Touch attribution model, the First Touch attribution model attributes more credit to upper funnel marketing channels and less to lower funnel marketing channels.

Lastly, two models are compared based on several assessment criteria (AIC, AIC3, CAIC, BIC, Chi-Square and several R2 measures) to investigate whether taking into account past marketing variables improves the model’s ability to classify consumers as (non-)


5. Results

This chapter lays out the results of the analyses performed to test the hypotheses. It follows the same structure as the literature review: it begins with the effectiveness of advertising regarding the customer journey and upper and lower funnel marketing, continues with the lasting effect of advertising (carryover effects) and ends with the cross-channel effects of online ads on offline conversion. In Chapter 6 (Discussion) the results are further analyzed and discussed.

5.1 Effectiveness of advertising

To investigate whether taking the customer journey into account improves a model’s ability to classify consumers as (non-) converting several assessments are performed.

First, the results of the Bagged Logistic Regression are presented in Table 11. The model that includes the negative customer journey dummy (see Section 4.5) performs significantly better (p<0.001) than the model that does not include it (3728.43 > 6.63 for 1 degree of freedom). Moreover, the Likelihood Ratio Test shows that the model with the dummy variable for the customer journey is significantly better (p<0.001; 2 × (-26782.26 + 34237.14) = 14909.76 > 6.63 for 1 degree of freedom).
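The Likelihood Ratio Test above can be reproduced from the two log likelihoods. The sketch below covers only the case of one degree of freedom, for which the chi-square tail probability equals erfc(sqrt(x / 2)); the function name is hypothetical.

```python
import math


def likelihood_ratio_test_df1(ll_restricted, ll_full):
    """Likelihood ratio statistic and p-value for one extra parameter (df = 1)."""
    stat = 2.0 * (ll_full - ll_restricted)
    # For df = 1 the chi-square survival function is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2.0))
    return stat, p_value


# Log likelihoods without and with the customer journey dummy (Table 11).
stat, p_value = likelihood_ratio_test_df1(ll_restricted=-34237.14, ll_full=-26782.26)
print(round(stat, 2), p_value)

# The critical value 6.63 corresponds to a significance level of about 0.01.
p_at_critical = math.erfc(math.sqrt(6.63 / 2.0))
print(round(p_at_critical, 4))
```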

The dummy variable is 0 before a purchase is made and 1 after a purchase is made at this consumer durables store or a competing store. The latter indicates that the household is out of the customer journey. The dummy has a significant (p=0.02) negative Beta (-17.035) (see Appendix F). This indicates that after a purchase is made the probability of purchase is significantly lower; in other words, once customers have left the customer journey, their probability to convert is significantly lower. This confirms the expectation that the customer journey is important, as indicated by this dummy variable.

| | With dummy variable | Without dummy variable |
|---|---|---|
| Chi-Square value | 4075.01 (37 df) | 347.58 (36 df) |
| LL | -26782.26 | -34237.14 |


Moreover, the influence of the customer journey is also indicated by the differences between the credits attributed in the Last Touch and First Touch attribution models (Table 13). It is expected that lower funnel marketing channels receive a relatively larger share of credit for the probability of conversion than upper funnel marketing channels in Last Touch attribution models, and that upper funnel marketing channels receive a larger share of credit in attribution models that emphasize earlier stages in the customer journey, e.g. the First Touch attribution model. Table 13 indicates that the lower funnel marketing channel Folder receives a larger share of credit in the Last Touch attribution model (34.99 > 34.18). Upper funnel marketing tools (Print and TV) receive a larger share of credit in the First Touch attribution model than in the Last Touch attribution model (TV: 27.71 > 24.45; Print: 25.23 > 23.73). This means that these marketing tools have more influence earlier in the customer journey and that Folder has more influence at the end of the customer journey.

Finally, two Bagged Logistic Regression models are compared: one that includes the significant lagged marketing variables and one that does not (Section 4.2.1.2). This indicates whether past marketing efforts, which potentially influence earlier stages of the customer journey, significantly contribute to the model. Table 12 shows that the model with previous marketing touches performs better than the model without them. The AIC and AIC3, as well as the R2 measures and the Chi-Square, show that the preferred model is the model with significant lagged variables. The BIC and CAIC criteria tend to prefer a model with too few variables, as seems to be the case in this instance.

Table 12: Model assessment scores for Bagged Logistic Regression with(-out) significant lagged variables.

-2LL AIC AIC3 BIC CAIC Chi
