Attribution Modeling

Switching from a heuristic attribution model to a data-driven attribution model in a monopolistic environment

Master’s Thesis Marketing Intelligence

Student: Marciano Bootsman BSc.

Student Number: S2502321

Supervisor RUG: dr. F. Eggers

Supervisor NS: J.H.H. Kral MSc.

Second reader: dr. J.T. Bouma

Date: 17-06-2019

Abstract

Preface

Table of contents

Chapter 1: Introduction

Chapter 2: Theoretical Framework

2.1 Attribution problem

2.2 Heuristic based attribution models

2.3 Markov Chain Model

2.4 Research on Markov Chain Models in Attribution Modeling

2.5 Monopolistic Markets

Chapter 3: Methodology

3.1 NS

3.2 Descriptive Statistics

3.3 Data cleaning

3.4 Analysis

Chapter 4: Results

4.1 Estimation of the base model

4.2 Multi-touch customer journeys

4.3 Difference between Markov Chain orders

4.4 Evaluation Criteria

4.5 Managerial Implications

Chapter 5: Conclusion, Limitations & Future Research

5.1 Conclusion

5.2 Limitations & Future Research

References

Chapter 1: Introduction

Digital marketing is gaining ground fast: in 2017 the market for digital marketing in the Netherlands was bigger than that of all offline platforms combined (Deloitte, 2018). With the ever-growing interest in the internet, digital marketing has become essential to the marketing mix of a wide variety of industries (Raman, Mantrala, Sridhar & Tang, 2012). Its appeal lies not only in the ability to precisely target different segments with personalized ads, but more importantly in the ability to track responses and performance almost instantaneously (Shao & Li, 2011). While the rise of digital marketing has given marketers a wide variety of effective tools to increase the reach of their marketing campaigns, new challenges are likely to emerge alongside this rise. As marketers run multiple (online) marketing campaigns, they struggle to identify the most effective (combination of) digital marketing tools. Potential customers are ideally targeted and reached by a variety of advertisements across a variety of channels at different points in time, ranging from Google Ads to sponsored display ads on websites such as Facebook. This process of being exposed to multiple marketing tools of a company is often referred to as “multi-touch” (Kaushik, 2012). When a potential customer converts, it is reasonable to assume that the marketing campaign was successful for that customer, and that the decision to convert was influenced by exposure to certain advertisements. For a marketer it is crucial to know to what extent different touch points affect consumers’ decision making, in order to optimize the spending of limited marketing resources. Quantifying the influence of each touch point on the customer journey, however, is a very complicated process, known as the attribution problem (Abhishek et al., 2012).

effort to measure the ‘value’ of a specific channel (Tucker, 2013). This technique, however, is flawed, since customers are exposed to multiple items of a campaign and it would be irrational to give credit only to the last link in the chain. To tackle this flawed way of dividing credit, recent research focuses more on non-heuristic solutions to the attribution problem (Shao & Li, 2011; Abhishek et al., 2012; Dalessandro et al., 2012; Li & Kannan, 2012; Anderl, 2016; Berman, 2018). These solutions are data-driven instead of heuristic based, which makes the models more accurate in predicting the buying behavior of customers (Dalessandro et al., 2012). Research on attribution models has mainly shifted towards Markov Chain Models (Abhishek et al., 2015; Anderl et al., 2014, 2016; Archak et al., 2010; Dalessandro et al., 2012; Li & Kannan, 2014), which are considered to be the current state of the art (Kannan et al., 2016). Previous research on Markov Chain Models is somewhat flawed, however, since it does not consider the marketing efforts of competitors. This creates competition-related confounds which are very hard to control for (Dalessandro et al., 2012), such that previous findings might be biased. To date, no research on Markov Chain Models has controlled for the effect of competition (Li et al., 2017).

How does a transition from using an LTA-model to a Markov Chain attribution model influence the relative importance of the different marketing channels of NS?

The model will be based on 2018 data from the Spoordeelwinkel (where people can buy cheaper tickets for a day out). This research will include only the data of the Spoordeelwinkel campaign, since data-driven attribution models should be campaign specific according to the properties of good attribution models of Dalessandro et al. (2012). The preliminary analyses will focus on all the customer journeys, after which the initial model will be adjusted in order to find relevant insights for NS.

Chapter 2: Theoretical Framework

In this chapter I provide context for the research through a theoretical framework. It covers previous research in the field of attribution modeling and elaborates on some important concepts. Different attribution models are discussed on the basis of the literature, and hypotheses are formed from this discussion. At the end of the chapter a conceptual model is drawn from all the relevant concepts.

2.1 Attribution problem

As mentioned in the introduction, quantifying the influence of each touch point on a customer’s journey to conversion is a very complicated process, known as the attribution problem (Abhishek et al., 2012). Although more and more companies are beginning to understand that more than one channel is responsible for a conversion, they still find themselves unable to identify a suitable way to measure this relationship (Lee, 2010). Research by Lovett (2009) found that of 275 website decision makers surveyed, 52% acknowledged that attribution improves the effectiveness of marketing spend, but only 31% of the same sample claimed to perform any kind of attribution on their marketing activities. Lee (2010) defined four obstacles to understanding and analyzing the importance of the different touch points, namely:

● Multiple data sources: Traditionally, each channel kept track of its own data. This created aggregation problems when the data from different channels was combined.

● Data accuracy: When a company was able to solve the aggregation problems, it often still had doubts about the accuracy of the data. This was mainly due to cookie deletion, which caused the data to be incomplete.

● Time and money: Companies often find themselves in a vicious circle. There was no empirical evidence proving the benefits of advanced attribution methods, causing managers to be hesitant to invest in such methods. The circle starts with managers who do not want to invest any time without empirical evidence, followed by the problem that no evidence can be produced without investment.

● Organizational structure: Traditionally, companies have many different channel owners, each responsible for one of the marketing channels. This creates competition for resources instead of synergies within the marketing mix.

extension of this model was created by Anderl (2014) who combines previous research on the acceptance of marketing decision models (Leeflang & Wittink, 2000; Lilien, 2011; Little, 2004; Lodish, 2001) with the criteria coming from the field of attribution modeling (Dalessandro et al., 2012; Shao & Li, 2011). The combination resulted in six criteria for the evaluation of attribution models:

● Objectivity: attribution models must be able to assign credit to the individual channels or campaigns, according to the relative impacts the different channels make on conversions or revenues (Dalessandro et al., 2012). This enables management to make objective budget decisions (Lilien, 2011).

● Predictive accuracy: attribution models must be able to correctly predict conversions (Shao & Li, 2011). This may seem a little bit odd, since attribution models primarily take a retrospective view, but this mainly serves as a method to persuade managers of the credibility of the model.

● Robustness: attribution models should produce reliable results when they are run numerous times (Little, 2004).

● Interpretability: attribution models and their results must be easy to interpret for all stakeholders. If a model is easy to interpret, the managerial acceptance will also be higher, since it is easier for them to communicate (Little, 2004).

● Versatility: combines adaptability and ease of control. Adaptability is the capability to incorporate new information over time. Ease of control is the ease with which users can adjust inputs to fit company-specific requirements. This is very important in the constantly changing digital environment. The model should be able to include changes in channels and it should be easily extendable.

● Algorithmic efficiency: the computing speed of the attribution model. In a world where technology advances at a fast pace, and data can consist of millions of records, it is important that an attribution methodology is able to handle these volumes efficiently (Lodish, 2001).

2.2 Heuristic based attribution models

Fig. 1: A possible customer journey of a customer of NS

In figure 1 a possible customer journey of a customer of NS is visualized. First the customer clicks on a banner advertisement. After this the customer clicks on an e-mail ad, followed by a visit to the webshop of NS and a click on a Facebook ad, and finally converts through a paid search advertisement after searching for ‘a day trip to Amsterdam’. The way that credit is divided between the different channels depends on the attribution method that is used. The five methods which are most commonly used, because they are implemented in Google Analytics, are shown in Table 1, together with the assignment of credit under each of these methods. These are all heuristic based attribution methods since they rely on predefined rules. Table 1 makes clear that these methods differ considerably in how they assign credit to the different channels.

Marketing Channel

Attribution Model | Display | E-mail | Direct | Facebook | Paid
Last Touch Attribution | 0% | 0% | 0% | 0% | 100%
First Touch Attribution | 100% | 0% | 0% | 0% | 0%
Linear Attribution | 20% | 20% | 20% | 20% | 20%
Position Based Attribution | 40% | 6.66% | 6.66% | 6.66% | 40%
Time Decay Attribution | 15% | 17% | 19% | 23% | 26%
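
The rule-based allocations in Table 1 can be sketched in a few lines of Python. This is a minimal illustration, not how Google Analytics computes them: the function names and the `end_weight` parameter are hypothetical, the position-based rule is assumed to give 40% each to the first and last touchpoint and to split the remainder evenly, and Time Decay is left out because its decay parameter is not stated here.

```python
def last_touch(journey):
    """All credit goes to the final touchpoint."""
    return {journey[-1]: 1.0}


def first_touch(journey):
    """All credit goes to the first touchpoint."""
    return {journey[0]: 1.0}


def linear(journey):
    """Every touchpoint receives an equal share."""
    credit = {}
    share = 1.0 / len(journey)
    for channel in journey:
        credit[channel] = credit.get(channel, 0.0) + share
    return credit


def position_based(journey, end_weight=0.4):
    """Assumed rule: end_weight for first and last, the rest split evenly."""
    n = len(journey)
    if n == 1:
        return {journey[0]: 1.0}
    if n == 2:
        weights = [0.5, 0.5]
    else:
        middle = (1.0 - 2 * end_weight) / (n - 2)
        weights = [end_weight] + [middle] * (n - 2) + [end_weight]
    credit = {}
    for channel, weight in zip(journey, weights):
        credit[channel] = credit.get(channel, 0.0) + weight
    return credit


# The example journey from figure 1, using the channel labels of Table 1.
journey = ["Display", "E-mail", "Direct", "Facebook", "Paid"]
for rule in (last_touch, first_touch, linear, position_based):
    shares = rule(journey)
    print(rule.__name__, {c: round(100 * shares.get(c, 0.0), 2) for c in journey})
```

Because each rule returns a dictionary of shares that sums to one, the rules can be swapped freely when comparing allocations for the same journey.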

Last Touch Attribution was, until recently, the most used attribution model due to its ease of use and interpretability (Tucker, 2013). Recently, other forms of attribution models have become more common and researchers have started questioning the approach of LTA (Shao & Li, 2011; Chandler-Pepelnjak, 2009; Jordan et al., 2011). They found that LTA causes some channels to be overvalued and others to be undervalued. Kireyev, Pauwels & Gupta (2016) found that display advertising is highly undervalued. Display advertisements are not very often found at the last stage of the customer journey, making display a low-scoring channel under LTA. However, display advertising appeared to be more impactful than LTA gives it credit for (Kireyev, Pauwels & Gupta, 2016). Ghose & Todri (2015) demonstrate that the mere exposure to display advertisements increases the users’ likelihood of searching for and buying the brand or product presented in the ad.

2.3 Markov Chain Model

With LTA found to be flawed, other models gained interest. One of the popular models is known as the Markov Chain Model (MCM), which is a probabilistic model that can represent dependencies between sequences of observations of a random variable. It is a mathematical system that computes the probability of transitioning from one ‘state’ to the next (Keilson, 2012). A state, when considering attribution modeling, can be seen as a touchpoint or an end state on the path of the customer journey. This customer journey either results in a conversion or no behavior for a specific period of time. A simplified example of a Markov Chain is visualized in figure 2.
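
To make the notion of states and transitions concrete, the following sketch estimates first-order transition probabilities from a handful of converting journeys. The journeys, the counts and the state labels `(start)` and `(conversion)` are hypothetical illustrations, not NS data.

```python
from collections import defaultdict

# Hypothetical labels for the artificial start and end states.
START, CONVERSION = "(start)", "(conversion)"


def transition_probabilities(journeys):
    """Estimate first-order transition probabilities.

    `journeys` is a list of (path, count) pairs, where `path` is the list of
    channels in chronological order and `count` the number of conversions.
    """
    counts = defaultdict(lambda: defaultdict(float))
    for path, n in journeys:
        states = [START] + list(path) + [CONVERSION]
        # Count every observed step from one state to the next.
        for src, dst in zip(states, states[1:]):
            counts[src][dst] += n
    probs = {}
    for src, row in counts.items():
        total = sum(row.values())
        probs[src] = {dst: c / total for dst, c in row.items()}
    return probs


# Made-up example: four converting journeys in total.
journeys = [(["Display", "Direct"], 1), (["Direct"], 2), (["Paid Search"], 1)]
probs = transition_probabilities(journeys)
print(probs[START])
```

Each row of the result sums to one, so it can be read directly as "given the current state, how likely is each next state".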

There are different kinds of Markov Chain Models. The most basic one is the one that assumes that what is captured in time t, can be fully explained by the event in t-1, and based on this a probability of transitioning from the state in time t-1 to the state in time t is calculated. This type is known as the first-order Markov Chain Model. The probabilities of transitioning can be calculated with the following formula:
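
A common way to write the first-order transition probability is given below as a sketch in assumed notation: the symbols $w_{ij}$, $X_t$, $s_i$ and $N$ (the number of states) are illustrative and are not taken from this thesis.

```latex
w_{ij} = P\left(X_t = s_j \mid X_{t-1} = s_i\right),
\qquad 0 \le w_{ij} \le 1,
\qquad \sum_{j=1}^{N} w_{ij} = 1 \quad \text{for all } i
```

In practice $w_{ij}$ is typically estimated as the number of observed transitions from state $s_i$ to state $s_j$ divided by the total number of observed transitions out of $s_i$.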

Next to the first-order model there are higher-order Markov Chain Models, also known as second-, third-, and fourth-order models. These models carry more information since the probability of transitioning to another state depends on the last two, three or four touchpoints. In this way higher-order Markov Chains incorporate longer temporal dynamics, ultimately leading to better performance (Keilson, 2012). Therefore, I hypothesize that the effect of the change in attribution model is positively moderated by the order of the Markov Chain (H3). Next to the MCM, there are several other state-of-the-art attribution models, which are also considered to be much closer to the ‘truth’ than heuristic methods. One of these is the Shapley value model, which uses game theory to estimate the values of different channels. Dalessandro et al. (2012) showed through simulation that the Shapley value model can estimate the causal effect of different channels. Although this is interesting, the study of Anderl et al. (2014) classified the MCM as ‘the best’ model, as it was the only model which met all six of their criteria for a good attribution model. Therefore, this model is chosen for this research.

2.4 Research on Markov Chain Models in Attribution Modeling

Advertising. When evaluating their model on the evaluation criteria, they found that an MCM performs significantly better than the LTAM. In 2016 they extended their model and found evidence of carryover and spillover effects across channels. Xu et al. (2014) developed a customized MCM of consumer channel choice and conversion, called the mutually exciting point process model, with memory effects built into it. Another customization of the MCM is the Hidden Markov Model of Abhishek et al. (2015), who developed a model of individual consumers’ behavior based on the concept of a conversion funnel. They found that different channels have different effects on consumers’ buying behavior, depending on their state in the decision process. More information about these studies can be found in table 2.

Author(s) | Attribution Model | Scope | Competition
Abhishek et al. (2015) | Hidden Markov Model (HMM) | Online campaign for the launch of a car. | Not mentioned
Anderl et al. (2014) | Markov Chain Model | Online advertisers in general. | Not mentioned
Anderl et al. (2016) | Markov Chain Model | Online advertisers in general. | Not mentioned
Archak et al. (2010) | Markov Chain Model | Sponsored search campaigns of anonymous companies. | Not mentioned
Li & Kannan (2014) | Markov Chain Model | Focus on a franchise firm in the hospitality industry. | Not mentioned
Xu et al. (2014) | Mutually exciting point process model (Markov Chain Monte Carlo) | A major vendor of consumer electronics that sells its products online through its own website. | Not mentioned

2.5 Monopolistic Markets

NS has a monopoly on the Dutch railway market. It used to be a (semi-)governmental organization, in order to keep prices low. This has changed, but NS is still the only carrier on most routes. There are other railway companies, but they are not competitors since they operate other, non-overlapping, regional routes. The biggest competitor of NS is the car, which is a valid substitute for the train since the two are comparable when it comes to speed. According to Reekie (1981), advertising in a monopolistic setting does not immediately lead to higher sales, but he states that it is still a very important tool to create goodwill for the company. This goodwill is needed since customers have no choice but to use the monopolist’s product or service, and the public will get frustrated with the monopolist since there is no alternative. The same holds for NS: when there are train failures, public opinion is very negative, since many people are affected and for many of them there is no alternative. Villeneuve & Pasquier (2017) state that public companies or companies in a monopolistic position have different needs when it comes to marketing than private companies in a competitive environment. These organizations tend to have relatively high direct traffic to their website, since they are well-known and people do not have to use search engines to reach the website, meaning that their customer journeys are shorter than in industries with much competition. When the customer journey becomes longer, it will likely contain more channels, meaning that those sales are assigned to a broader range of channels. Therefore, I hypothesize that the effects of the change in attribution model (H1 and H2) are strengthened by the length of the customer journey (H4). The hypotheses are as follows and are visualized in figure 3:

H1: The relative importance of Display increases when switching from Last Touch Attribution to a Markov Chain Model.

H2: The relative importance of Direct traffic decreases when switching from Last Touch Attribution to a Markov Chain Model.

Chapter 3: Methodology

The goal of the research is to find a fair distribution of credit among the marketing channels of NS. This is done by analyzing the customer journeys using Markov Chains, and by comparing the results to those of heuristic based models. On the one hand, the research aims to make causal inferences, since I am interested in how much of the conversions can be explained by a certain channel. On the other hand, the research is explorative, since I seek to explore unknown factors which may influence channel attribution by adjusting the MCM. This means changing the order of the Markov Chain, but also analyzing the length of the customer journeys. The first part of this chapter describes the dataset, including descriptive statistics. After that, I describe how the dataset was cleaned. The last section presents the experimental procedure, by explaining how the models are created, analyzed and evaluated.

3.1 NS

NS is the biggest railway company of the Netherlands. It transports around 10,700,000 unique persons on a yearly basis (NS, 2019), which means that about 62% of the total Dutch population travels by train every year. Before people get on the train they have to check in with a (personalized) chipcard, and after leaving the train they have to check out again. This is the most used method. Another method is buying tickets at the online shop, the Spoordeelwinkel. A lot of marketing expenses are directed at the people who use this shop, since they are considered to be infrequent train users who are likely to have a realistic alternative, mostly the car.

3.2 Descriptive Statistics

Variable | Definition
Customer Journey | The channels a converting customer visited in chronological order
Customer Journey Length | The length of the customer journeys measured in the number of touchpoints
Total Conversions | The accumulated amount of sales which can be assigned to a certain customer journey
Total Conversion Value | The monetary value of a certain customer journey

Table 3: Definitions of the variables in the dataset

The descriptive statistics are presented in table 4. What is notable is that the average number of touchpoints per customer journey is low, 2.47 (SD = 4.05), and that more than half (52%) of the customer journeys consist of a single touchpoint. This is further visualized in figure 4, which shows the total conversions and the cumulative total conversions against the length of the customer journey. The large share of single-touchpoint journeys has a big impact on the attribution, since such a sale is fully attributed to its one channel, both when applying an LTAM and when applying an MCM. To what extent the models are influenced by this can be found in the Results chapter of this thesis.

Description | Statistic
Number of unique touchpoints | 7
Number of customer journeys | 75,054
Average number of touchpoints per journey | 2.47 (SD = 4.05)
Total touchpoints | 184,992
Number of journeys with 1 touchpoint | 39,162

Table 4: Descriptive Statistics

3.3 Data cleaning

Before any analyses could be carried out, the dataset had to be cleaned first. Originally Google Analytics defines six basic channels which are: Direct, Email, Organic Search, Paid Search, Referral and Social Network. This is complemented by Other Advertising and Unavailable, which include all the uncategorizable and anonymous clicks respectively. Looking at the source of the advertisement it became clear that all the clicks which were categorized under the Other Advertising label were from websites which were new and not yet included into the Google Analytics domain of NS, meaning that these labels should be changed to Referral, which was done accordingly. Furthermore, the clicks classified as Unavailable were originating from referrals from the NS App. Since it is very relevant for NS to know how much of the conversions can be attributed to the app, Unavailable was transformed to App. This leads to a total of seven channels for this research, which are defined in table 5. The data did not contain any missing values.

Channel | Description
App | Clicks on the banners in the NS app
Direct | Direct traffic to the NS/Spoordeelwinkel website
Email | Measured when people click on a link in an email from NS
Display | When people are exposed to a paid banner on a website other than the NS website
Organic Search | When people searched for relevant terms related to NS and clicked on the link to the website (SEO)
Paid Search | When people clicked on a link in a search engine for which NS paid (SEA)
Referral | When people click on a link available on other websites
Social Network | When people click on a link on the actual page of NS on their social media. (Not Facebook ads)

Table 5: Channel Definitions

3.4 Analysis

This comparison covers the differences (or similarities) in assigning the relative importance of the different marketing channels. With the package “ChannelAttribution” it is possible to compare these two models. To obtain the MCM estimate, the package calculates the relative importance of a specific channel by running 1 million simulations. In this way the removal effect, which is the decrease in conversions when a specific channel is removed, can be calculated. For example, if 20% of the sales would not have happened had the channel Paid Search been removed from the customer journeys, the removal effect of Paid Search is .20. In addition, a transition matrix was generated which shows the probabilities of switching from one channel to another. Estimating the LTAM is also included in the package and is much less complex. It simply takes the last channel on the customer journey and attributes 100% of the value to that channel, neglecting the other channels on that customer journey. The output of this analysis is the number of sales that can be attributed to a certain channel. However, these absolute numbers are transformed to relative numbers, because this information is company sensitive. This does not change the interpretation.
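
The removal effect described above can be illustrated with a small sketch. It is not the code of the ChannelAttribution package: instead of simulating journeys it computes the conversion probability directly by repeated sweeps over the chain, and the transition probabilities, state labels and function names are all hypothetical.

```python
START, CONVERSION, NULL = "(start)", "(conversion)", "(null)"

# Hypothetical first-order transition probabilities (illustrative numbers only).
TRANSITIONS = {
    START: {"Display": 0.25, "Direct": 0.50, "Paid Search": 0.25},
    "Display": {"Direct": 1.0},
    "Direct": {CONVERSION: 1.0},
    "Paid Search": {CONVERSION: 1.0},
}


def conversion_probability(transitions, removed=None, sweeps=500):
    """Probability of reaching CONVERSION from START.

    Every transition into `removed` is treated as a lost journey. The value
    is found by repeatedly applying p(s) = sum_j T[s][j] * p(j).
    """
    p = {state: 0.0 for state in transitions}
    for _ in range(sweeps):
        for state, row in transitions.items():
            if state == removed:
                continue
            total = 0.0
            for nxt, prob in row.items():
                if nxt == CONVERSION:
                    total += prob
                elif nxt != removed and nxt != NULL:
                    total += prob * p.get(nxt, 0.0)
            p[state] = total
    return p[START]


def removal_effects(transitions):
    """Relative drop in conversion probability when each channel is removed."""
    base = conversion_probability(transitions)
    channels = [s for s in transitions if s != START]
    return {
        ch: 1.0 - conversion_probability(transitions, removed=ch) / base
        for ch in channels
    }


effects = removal_effects(TRANSITIONS)
# Normalise the removal effects so that the attributed shares sum to one.
total_effect = sum(effects.values())
shares = {ch: effect / total_effect for ch, effect in effects.items()}
print(effects)
print(shares)
```

Normalising the removal effects is one common way to turn them into relative importances; multiplying the shares by the total number of conversions would give attributed conversions per channel.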

Chapter 4: Results

This chapter discusses the results of the analyses. The different attribution models were estimated and can be interpreted. First, I will discuss the differences between the models in the amount of sales which can be attributed to the different channels. This is done for different orders of the Markov chain. A higher order means that more memory is built into the model, and this is done to check whether the order has an effect on the differences between the two attribution models (H3). After these analyses the proposed MCM will be evaluated against the criteria of Anderl et al. (2014).

4.1 Estimation of the base model

The data was analyzed by the statistical software, and the results are displayed in figure 5. The absolute amount of conversions is translated to percentages since it contains sensitive information about the company. Looking at the results I can conclude a few interesting things. The relative importance of the channel Direct decreases by 23% when switching from an LTAM to an MCM, while the relative importance of Display increases by 657%. Nevertheless, Direct remains the most important ‘channel’ by far, even though the other channels became relatively more important compared to the LTAM. Based on these statistics it can be concluded that H1 and H2 can be accepted.

Fig 5: The difference in channel attribution between the LTAM and the MCM (total conversions in March 2019 per channel, in %). Change in relative importance per channel when switching from the LTAM to the MCM: App +36%, Direct -23%, Display +657%, Email +110%, Organic Search +15%, Paid Search +55%, Referral +10%, Social Network +23%.

4.2 Multi-touch customer journeys

In order to test H4 another estimation was done, this time excluding the customer journeys consisting of one touchpoint. This increased the average length of the customer journey from 2.47 touchpoints to 4.06 touchpoints. The estimation is displayed in figure 6. What becomes clear from the figure is that the differences between the estimations of the LTAM and the MCM become a lot bigger, compared with the estimations in figure 5. The channel Direct becomes 40% less important when switching to an MCM, and Display traffic becomes 864% more important. In the previous estimation this was -23% and +657% respectively, confirming H4. It can be concluded that all the effects are more extreme when the average length of the customer journey increases.

Fig 6: The difference in channel attribution between the LTAM and the MCM without one-channel-customer journeys

4.3 Difference between Markov Chain orders

The Markov Chain order determines the amount of memory that is built into the model. Order 1 depends only on the previous state (t-1), while order 4 assumes that the event at time t can be explained by the events at t-1, t-2, t-3 and t-4. Orders 1, 2, 3 and 4 were estimated. The results are displayed in figure 7. The graph shows that the estimates for the different orders are practically the same (the difference between orders is never bigger than 0.1%), and thus the order does not influence the relative importance of the channels Direct or Display. In other words, H3 is rejected.
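
One way to picture what a higher order means is to re-encode each journey so that a state holds the last k touchpoints; a first-order chain over these combined states then behaves like an order-k chain over channels. The sketch below is an illustration under that assumption, with a hypothetical function name and padding label, and is not taken from the software used in this research.

```python
# Hypothetical padding label for positions before the first touchpoint.
START = "(start)"


def order_k_states(journey, k):
    """Re-encode a journey so that each state holds the last k touchpoints."""
    padded = [START] * (k - 1) + list(journey)
    return [tuple(padded[i:i + k]) for i in range(len(journey))]


journey = ["Display", "Email", "Direct"]
print(order_k_states(journey, 1))
print(order_k_states(journey, 2))
```

A journey with a single touchpoint produces exactly one state for any value of k.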

Figure 6 chart labels (total conversions in March 2019 per channel in %, without one-channel paths), change when switching from the LTAM to the MCM: App +84%, Direct -40%, Display +864%, Email +184%, Organic Search +91%, Paid Search +110%, Referral +43%, Social Network +185%.

Fig 7: The difference in channel attribution between the LTAM and the MCM at different orders

4.4 Evaluation Criteria

Objectivity

The first evaluation criterion is objectivity. The main point of interest here is whether the model is able to assign credit to individual channels in a fair manner. Models that reduce user journeys to one click, like the LTAM, ignore any additional marketing contacts and thus fail this criterion, since they do not attribute credit fairly across channels. The proposed model, however, passes this criterion, as it makes no prior assumptions about the importance of individual channels or channel order. It also uses all the data to estimate the model, so that the attributed values of each channel are distributed fairly and are completely data-driven. In conclusion, the MCM fulfills this criterion.

Predictive accuracy

Predictive performance is not the goal of an attribution model since it is used for evaluating campaigns in the past, and different campaigns are oftentimes hard to compare. However, it takes more of a persuasive role in convincing managers of the superiority of the model compared to other models. Unfortunately, the predictive accuracy cannot be measured with the given dataset, since it solely contains data from customer journeys which lead to conversion.

This makes it impossible to measure which combinations of channels are likely to lead to conversion and which are not, as every customer journey in the data leads to conversion. However, previous research confirmed that the predictive accuracy of the MCM is superior to that of the LTAM (Abhishek et al., 2015; Anderl et al., 2014; 2016).

Robustness

This criterion is satisfied when the model produces reliable results when it is run numerous times (Anderl et al., 2014). The results of the estimated model are interesting in themselves, but in order to generalize them to the whole campaign, the estimates need to be robust. Therefore, these estimates are compared to those of other months. The differences are then compared to the differences between months in the LTAM. The results are displayed in table 6. The conclusion from this table is that the MCM is slightly more robust than the LTAM. What stands out the most is that the relative importance of Direct is the least robust, which makes sense since it is part of most customer journeys.

Average Difference in Relative Importance

Channel | LTAM | MCM
App | 0.2pp* | 0.3pp
Direct | 15.9pp | 12.1pp
Display | 9.8pp | 8.8pp
Email | 0.1pp | 0.5pp
Organic Search | 0.8pp | 2.5pp
Paid Search | 3.3pp | 3.7pp
Referral | 4.8pp | 3.7pp
Social Network | 0.2pp | 0.4pp
Average Difference | 4.4pp | 4.0pp

*pp = percentage point

Table 6: Robustness Check

Interpretability

relationships among channels the model also produces a transition matrix, which can be seen in table 7. This shows the probabilities of transitioning from a channel on the vertical axis to a channel on the horizontal axis. To simplify this table and highlight only the most important relationships, I also included a visual representation of the model in figure 8.

From (rows) / To (columns)

From | Conversion | App | Direct | Display | Email | Organic Search | Paid Search | Referral | Social Network
Start | NA | 0.01 | 0.44 | 0.06 | 0.02 | 0.27 | 0.04 | 0.15 | 0.00
App | 0.50 | NA | 0.41 | 0.01 | 0.00 | 0.06 | 0.00 | 0.02 | 0.00
Direct | 0.84 | 0.01 | NA | 0.03 | 0.01 | 0.05 | 0.01 | 0.05 | 0.00
Display | 0.09 | 0.01 | 0.40 | NA | 0.01 | 0.24 | 0.04 | 0.22 | 0.00
Email | 0.33 | 0.01 | 0.41 | 0.03 | NA | 0.14 | 0.01 | 0.07 | 0.00
Organic Search | 0.57 | 0.00 | 0.34 | 0.02 | 0.00 | NA | 0.01 | 0.05 | 0.00
Paid Search | 0.44 | 0.00 | 0.24 | 0.04 | 0.01 | 0.18 | NA | 0.09 | 0.00
Referral | 0.61 | 0.00 | 0.26 | 0.02 | 0.00 | 0.09 | 0.01 | NA | 0.00
Social Network | 0.55 | 0.00 | 0.38 | 0.01 | 0.00 | 0.04 | 0.00 | 0.02 | NA

Table 7: Transition Matrix

Versatility

Versatility is the combination of adaptability and ease of control, in which adaptability is the capability to incorporate new information over time, and ease of control refers to the extent to which users can adjust inputs according to the company’s requirements. The proposed model can be considered moderately versatile. The reason for this is that it is linked to the Google Analytics data. The upside of this is that it can easily incorporate data from every previous period, and that there are numerous ways to modify the data to the user’s wishes. The downside is that it is very difficult to add external data (e.g. Facebook campaign data) to the model.

Algorithmic efficiency

with 185,000 touchpoints. Even though the number of channel combinations increases exponentially when estimating the higher-order models, the results are available after 2.46 seconds. This may seem quick, but the estimation of the LTAM is even faster, at 0.04 seconds. I also measured the time it took to collect the data, import it into R, and run the analysis. This took just under four minutes (230 seconds), which is extremely quick compared to the average analysis carried out at NS.

Figure 8: Visual representation of the transition matrix, only probabilities which are > .1

4.5 Managerial Implications

Since the LTAM is the currently used model, the assumption is that the digital marketing budget is divided accordingly.


adaptations. Ideally this analysis would run behind a dashboard, so that people get a real-time insight into the performance of the campaign.

Secondly, it will be a relief for NS to know that their marketing efforts matter much more than was initially thought when the LTAM was still used. Direct traffic is, however, still the biggest source of sales. This is because NS is a monopolist and people know how to find NS; they do not need to compare different providers of the same services, since there are none.

Display traffic appears to be of greater importance according to the MCM; in fact, it shows the highest increase in relative importance of all the channels when switching from a LTAM to a MCM. This finding is interesting since this channel can be directly influenced by the marketing budget. Since Display traffic shows the biggest increase in relative importance, managers should consider increasing the budget for this channel. Right now, this channel is thought to be of little importance; hence, the budget is divided accordingly. With this small budget it manages to acquire a significant amount of sales, so it would be interesting to see what would happen with an increase in budget.
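The increase in relative importance discussed above can be read as the change in a channel's conversion share, divided by its share under the LTAM. A minimal sketch of that calculation follows; the shares are hypothetical numbers chosen for illustration and are not the estimates of this study.

```r
# Hypothetical conversion shares per channel (fractions of all conversions).
ltam_share <- c(Direct = 0.60, Display = 0.01, `Organic Search` = 0.39)
mcm_share  <- c(Direct = 0.50, Display = 0.05, `Organic Search` = 0.45)

# Relative change in importance when switching from the LTAM to the MCM.
relative_change <- (mcm_share - ltam_share) / ltam_share

# Expressed as percentages per channel.
print(round(100 * relative_change, 1))
```

Because the measure divides by the LTAM share, a channel that starts from a small base can show a large relative increase even when its absolute share remains modest.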


Chapter 5: Conclusion, Limitations & Future Research

In this chapter I will conclude this research by answering the research question. This will be done by evaluating the hypotheses. In addition to this, the limitations of this study will also be discussed, as well as my ideas for future research.

5.1 Conclusion

This research aims to identify the differences between heuristic attribution methods and data-driven methods in a monopolistic setting, which is new in this field of research. To do so, I compared a Last Touch Attribution Model to a Markov Chain Model in terms of the relative importance they assign to different channels at NS. I hypothesized that the relative importance of the channel Display would increase when NS switches from a LTAM to a MCM, and that it would decrease for the channel Direct. After estimating both the LTAM and the MCM, these hypotheses are supported. Furthermore, I hypothesized that these effects are strengthened by both the length of the customer journey and the Markov order. The outcome of the analysis indeed showed that an increasing length of the average customer journey caused the effects to be bigger. However, I did not find any empirical evidence that a higher Markov order also strengthened the aforementioned relations.

After the estimation the model was evaluated on the criteria of Anderl et al. (2014). The model scored well on the following criteria: Objectivity, Interpretability, Robustness and Algorithmic Efficiency. It scored average on Versatility, mainly because of the difficulties in integrating external data. The model scored low on Predictive Accuracy, since it has no predictive power due to the fact that the data only includes customer journeys which lead to conversion.

For NS, the model has significant value, since it gives a much more accurate insight into the true values of the different channels. NS can now start applying the model to its current campaigns and make real-time adjustments to its budgets according to the output of the model.

5.2 Limitations & Future Research

non-converting customer journeys. In the future this can be solved by using clickstream data instead of Google Analytics data, which allows non-converting journeys to be analyzed as well. The potential downside of this type of data is the threat of ‘fake clicks’. This phenomenon occurs when bots generate traffic to websites to drain advertising budgets, since many companies pay per click. This does not influence the current dataset, since bots do not convert.

The second limitation of this study is also data-related. NS is a company in a monopolistic environment. While this is very interesting, as it potentially allows for different dynamics than in competitive environments, it also has one main limitation. People do not have to compare multiple providers, and people know what NS has to offer. That is why customer journeys are very short on average, including a high percentage of one-click journeys. For these journeys, no sophisticated attribution models are needed, as they give the same outcome as heuristic-based methods. This means that a sophisticated model is less important for NS than for a company in a highly competitive environment, where knowing the customer journey dynamics can create a serious competitive advantage.
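The statement that one-click journeys lead to the same outcome under both models can be checked with a small base R sketch. It computes normalised first-order removal effects analytically by solving the absorbing Markov chain, and compares them with last-touch shares. The function names markov_shares and last_touch_shares, the toy paths and the " > " separator are assumptions made for this example; the sketch is a simplification and not the procedure of the ChannelAttribution package used in this thesis.

```r
# Normalised first-order Markov removal effects, using base R only.
markov_shares <- function(paths, conversions) {
  steps <- lapply(strsplit(paths, " > ", fixed = TRUE),
                  function(p) c("(start)", p, "(conversion)"))
  from <- unlist(lapply(steps, function(s) head(s, -1)))
  to   <- unlist(lapply(steps, function(s) tail(s, -1)))
  w    <- rep(conversions, times = lengths(steps) - 1)

  # Weighted transition counts, then row-normalised probabilities.
  states <- unique(c(from, to))
  P <- matrix(0, length(states), length(states), dimnames = list(states, states))
  for (i in seq_along(w)) P[from[i], to[i]] <- P[from[i], to[i]] + w[i]
  transient <- setdiff(states, "(conversion)")
  P[transient, ] <- P[transient, , drop = FALSE] /
    rowSums(P[transient, , drop = FALSE])

  # Probability of reaching "(conversion)" from "(start)";
  # every visit to a removed channel ends the journey without conversion.
  p_conversion <- function(removed = character(0)) {
    Q <- P[transient, transient, drop = FALSE]
    Q[, removed] <- 0
    reach <- solve(diag(length(transient)) - Q, P[transient, "(conversion)"])
    reach[[match("(start)", transient)]]
  }

  channels <- setdiff(transient, "(start)")
  removal <- sapply(channels, function(ch) 1 - p_conversion(ch) / p_conversion())
  removal / sum(removal)
}

# Share of conversions credited to the last channel of each journey.
last_touch_shares <- function(paths, conversions) {
  last <- sapply(strsplit(paths, " > ", fixed = TRUE), function(p) p[length(p)])
  sapply(split(conversions, last), sum) / sum(conversions)
}

# Hypothetical one-click journeys.
one_click_paths <- c("Direct", "Display", "Organic Search")
one_click_conv  <- c(7, 1, 2)
one_click <- markov_shares(one_click_paths, one_click_conv)
print(one_click)
print(last_touch_shares(one_click_paths, one_click_conv))

# Hypothetical data with one multi-touch journey.
multi_paths <- c("Direct", "Display > Direct")
multi_conv  <- c(8, 2)
multi <- markov_shares(multi_paths, multi_conv)
print(multi)
print(last_touch_shares(multi_paths, multi_conv))
```

In the one-click toy data the two methods assign identical shares. In the second data set Display receives credit only under the Markov sketch, because the last-touch rule gives all credit to Direct.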

The last limitation is the incompleteness of the model. As of now, it is not possible to integrate Facebook campaign data into Google Analytics. Therefore, a channel is left out which, budget-wise, plays a significant role in the marketing mix of NS. Offline channels are not included either. If a big television campaign is launched during the period of the attribution analysis, it may affect the online channels. These effects are not measured with an attribution model. The joint inclusion of online and offline channels is known as marketing-mix modeling and is out of scope for this research. Other external effects, like holidays during which people buy more tickets for a day out, are also excluded from this research.


References

Abhishek, V., Fader, P., & Hosanagar, K. (2012). Media exposure through the funnel: A model of multi-stage attribution.

Anderl, E., Becker, I., Von Wangenheim, F., & Schumann, J. H. (2014). Mapping the customer journey: A graph-based framework for online attribution modeling. Available at SSRN 2343077.

Anderl, E., Becker, I., Von Wangenheim, F., & Schumann, J. H. (2016). Mapping the customer journey: Lessons learned from graph-based online attribution modeling. International Journal of Research in Marketing, 33(3), 457-474.

Archak, N., Mirrokni, V. S., & Muthukrishnan, S. (2010). Mining advertiser-specific user behavior using adfactors. In Proceedings of the 19th international conference on World wide web (pp. 31-40). ACM.

Berman, R. (2018). Beyond the last touch: Attribution in online advertising. Marketing Science, 37(5), 771-792.

Chandler-Pepelnjak, J. (2009). Measuring ROI beyond the last ad. Atlas Institute Digital Marketing Insight, 1-6.

Dalessandro, B., Perlich, C., Stitelman, O., & Provost, F. (2012). Causally motivated attribution for online advertising. In Proceedings of the Sixth International Workshop on Data Mining for Online Advertising and Internet Economy (p. 7). ACM.

Deloitte (2018). IAB Report on 2017 Digital Advertising Spend.

Ghose, A., & Todri, V. (2015). Towards a digital attribution model: Measuring the impact of display advertising on online consumer behavior. Available at SSRN 2672090.

Jordan, P., Mahdian, M., Vassilvitskii, S., & Vee, E. (2011). The multiple attribution problem in pay-per-conversion advertising. In International symposium on algorithmic game theory (pp. 31-43). Springer.

Lovett, J. (2009). A framework for multicampaign attribution measurement. Forrester Research.

Kaushik, A. (2012). Multi-channel attribution: Definitions, models and a reality check.

Kannan, P. K., Reinartz, W., & Verhoef, P. C. (2016). The path to purchase and attribution modeling: Introduction to special section. International Journal of Research in Marketing, 33(3), 449-456.


Kireyev, P., Pauwels, K., & Gupta, S. (2016). Do display ads influence search? Attribution and dynamics in online advertising. International Journal of Research in Marketing, 33(3), 475-490.

Lee, G. (2010). Death of 'Last Click Wins': Media Attribution and the Expanding Use of Media Data. Journal of Direct, Data and Digital Marketing Practice, 12(1), 16-26.

Leeflang, P. S., & Wittink, D. R. (2000). Building models for marketing decisions: Past, present and future. International Journal of Research in Marketing, 17(2-3), 105-126.

Li, H., & Kannan, P. K. (2014). Attributing conversions in a multichannel online marketing environment: An empirical model and a field experiment. Journal of Marketing Research, 51(1), 40-56.

Li, Y., Xie, Y., & Zheng, E. (2017). Modeling Multi-Channel Advertising Attribution Across Competitors. Available at SSRN 3047981.

Lilien, G. L. (2011). Bridging the academic–practitioner divide in marketing decision models. Journal of Marketing, 75(4), 196-210.

Little, J. D. (1970). Models and managers: The concept of a decision calculus. Management Science, 16(8), B-466.

Lodish, L. M. (2001). Building marketing models that make money. Interfaces, 31(3_supplement), S45-S55.

Raman, K., Mantrala, M. K., Sridhar, S., & Tang, Y. E. (2012). Optimal resource allocation with time-varying marketing effectiveness, margins and costs. Journal of Interactive Marketing, 26(1), 43-52.

Reekie, W. D. (1981). Advertising and Monopoly. In The Economics of Advertising (pp. 97-115). Palgrave Macmillan, London.

Shao, X., & Li, L. (2011). Data-driven multi-touch attribution models. In Proceedings of the 17th ACM SIGKDD international conference on Knowledge discovery and data mining (pp. 258-264). ACM.

Tucker, C. (2013). The implications of improved attribution and measurability for antitrust and privacy in online advertising markets. Geo. Mason L. Rev., 20, 1025.

Villeneuve, J. P., & Pasquier, M. (2017). Marketing management and communications in the public sector. Routledge.

Xu, L., Duan, J. A., & Whinston, A. (2014). Path to purchase: A mutually exciting point process model for online advertising and conversion. Management Science, 60(6), 1392-1412.


Appendix A: R-code

# Install the libraries
install.packages("ChannelAttribution")
install.packages("ggplot2")
install.packages("reshape")
install.packages("plyr")
install.packages("dplyr")
install.packages("reshape2")
install.packages("plotly")
install.packages("curl")
install.packages("purrrlyr")
install.packages("visNetwork")
install.packages("tidyr")

# Load the libraries (plyr is loaded before dplyr so that it does not mask dplyr's functions)
library("ChannelAttribution")
library("ggplot2")
library("reshape")
library("plyr")
library("dplyr")
library("reshape2")
library("plotly")
library("purrrlyr")
library("visNetwork")
library("tidyr")

# Clear the workspace and set the working directory
rm(list = ls())
setwd("H:/My Documents/R")

Dataset_Thesis_mrt2019 <- read.csv("MCF Marciano - mrt2019.csv")

Heuristicmrt2019 <- heuristic_models(Dataset_Thesis_mrt2019, var_path = 'Channel.group.path',
                                     var_conv = 'Total.conversions',
                                     var_value = 'Total.conversion.value')

markov_modelmrt2019 <- markov_model(Dataset_Thesis_mrt2019, var_path = 'Channel.group.path',
                                    var_conv = 'Total.conversions',
                                    var_value = 'Total.conversion.value',
                                    order = 1, out_more = TRUE)
model_resultsmrt2019 <- markov_modelmrt2019$result

# Merge the two data frames on the "channel_name" column
Rmrt2019 <- merge(Heuristicmrt2019, model_resultsmrt2019, by = 'channel_name')

# Select only the relevant columns
R1mrt2019 <- Rmrt2019[, (colnames(Rmrt2019) %in% c('channel_name', 'last_touch_conversions', 'total_conversions'))]

# Transform the dataset into a data frame that ggplot2 can use to plot the outcomes
R1mrt2019 <- melt(R1mrt2019, id = 'channel_name')

# Plot the LTA vs Markov Chain Model
ggplot(R1mrt2019, aes(channel_name, value, fill = variable)) +
  geom_bar(stat = 'identity', position = 'dodge') +
  ggtitle('Last Touch vs Markov Chain 2018') +
  theme(axis.title.x = element_text(vjust = -2)) +
  theme(axis.title.y = element_text(vjust = +2)) +
  theme(title = element_text(size = 16)) +
  theme(plot.title = element_text(size = 20)) +
  ylab("") +
  labs(fill = "Attribution Model") +
  scale_fill_discrete(labels = c("Last Touch Model", "Markov Chain Model"))

# Extract the transition matrix

transition_matrixmrt2019 <- markov_modelmrt2019$transition_matrix

transition_matrixmrt2019$transition_probability <-


# Make a real matrix

realtransition_matrixmrt2019 <- dcast(markov_modelmrt2019$transition_matrix, channel_from ~ channel_to, value.var = 'transition_probability')

# Keep only the transitions with a probability of at least 0.1
transition_matrixmrt2019 <- transition_matrixmrt2019[which(transition_matrixmrt2019$transition_probability >= 0.1), ]
transition_matrixmrt2019 <- transition_matrixmrt2019 %>% dmap_at(c(1, 2), as.character)

##### viz #####
edgesmrt2019 <- data.frame(
  from = transition_matrixmrt2019$channel_from,
  to = transition_matrixmrt2019$channel_to,
  label = round(transition_matrixmrt2019$transition_probability, 2),
  font.size = transition_matrixmrt2019$transition_probability * 100,
  width = transition_matrixmrt2019$transition_probability * 20,
  shadow = TRUE,
  arrows = "to",
  color = list(color = "#000066", highlight = "red")
)

nodesmrt2019 <- data_frame(id = c(transition_matrixmrt2019$channel_from,
                                  transition_matrixmrt2019$channel_to)) %>%
  distinct(id) %>%
  arrange(id) %>%
  mutate(
    label = id,
    color = ifelse(
      label %in% c('(start)', '(conversion)'), '#FFFFFF',


shadow = TRUE, shape = "box" )

# Make the diagram
visNetwork(nodesmrt2019, edgesmrt2019,
           main = "Markov Chain Model where prob >.1") %>%
  visIgraphLayout(randomSeed = 123) %>%
  visNodes(size = 5) %>%
  visOptions(highlightNearest = TRUE)

#Extracting the amount of conversions per conversion path size

Conversionvalue_by_mrt2019<-aggregate(Total.conversion.value ~ Conversion.path.length, data = Dataset_Thesis_mrt2019, sum)

Conversionvalue_by_mrt2019[,"cum_conversion_value"] <-

cumsum(Conversionvalue_by_mrt2019$Total.conversion.value)

Conversionvalue_by_mrt2019 <- Conversionvalue_by_mrt2019 %>% filter(Conversion.path.length < 40)

Conversions_by_mrt2019<-aggregate(Total.conversions ~ Conversion.path.length, data = Dataset_Thesis_mrt2019, sum)

Conversions_by_mrt2019[,"cum_conversion_value"] <-

cumsum(Conversions_by_mrt2019$Total.conversions)

Conversions_by_mrt2019 <- Conversions_by_mrt2019 %>% filter(Conversion.path.length < 40)

Dataset_Thesis_mrt2019$Total.conversions <- as.numeric(levels(Dataset_Thesis_mrt2019$Total.conversions))[Dataset_Thesis_mrt2019$Total.conversions]

Conversions_by_mrt2019<-aggregate(Total.conversions ~ Conversion.path.length, data = Dataset_Thesis_mrt2019, sum)

sum(Conversions_by_mrt2019$multiplied) / sum(Conversions_by_mrt2019$Total.conversions)
sum(Conversions_by_mrt2019$multiplied)

New_dataset <- Dataset_Thesis_mrt2019
New_dataset$Total.conversion.value <- New_dataset$Total.conversion.value / New_dataset$Total.conversions
New_dataset <- New_dataset %>% uncount(Total.conversions)
New_dataset$Total.conversions <- 1

Dataset_Thesis_mrt2019$Total.conversions <- as.numeric(levels(Dataset_Thesis_mrt2019$Total.conversions))[Dataset_Thesis_mrt2019$Total.conversions]
Dataset_Thesis_mrt2019[, "multiplied_paths"] <- Dataset_Thesis_mrt2019$Conversion.path.length * Dataset_Thesis_mrt2019$Total.conversions

summary(New_dataset)
sd(New_dataset$Conversion.path.length)

ggplot(data = Conversions_by_mrt2019, aes(x = Conversion.path.length, y = Total.conversion.value, group = 1)) +
  geom_point(color = 'blue')

## 50% of the sample size
smp_size <- floor(0.5 * nrow(New_dataset))

## set the seed to make your partition reproducible
set.seed(123)
train_ind <- sample(seq_len(nrow(New_dataset)), size = smp_size)

train <- New_dataset[train_ind, ]
test <- New_dataset[-train_ind, ]

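Heuristictrain <- heuristic_models(train, var_path = 'Channel.group.path',
                                   var_conv = 'Total.conversions',
                                   var_value = 'Total.conversion.value')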

markov_modeltrain <- markov_model(train, var_path = 'Channel.group.path',
                                  var_conv = 'Total.conversions',
                                  var_value = 'Total.conversion.value',
                                  order = 1, out_more = TRUE)
model_resultstrain <- markov_modeltrain$result

# Merge the two data frames on the "channel_name" column
Rtrain <- merge(Heuristictrain, model_resultstrain, by = 'channel_name')

# Select only the relevant columns
R1train <- Rtrain[, (colnames(Rtrain) %in% c('channel_name', 'last_touch_conversions', 'total_conversions'))]

# Transform the dataset into a data frame that ggplot2 can use to plot the outcomes
R1train <- melt(R1train, id = 'channel_name')

ggplot(R1train, aes(channel_name, value, fill = variable)) + geom_bar(stat='identity', position='dodge') + ggtitle('Train ') + theme(axis.title.x = element_text(vjust = -2)) + theme(axis.title.y = element_text(vjust = +2)) + theme(title = element_text(size = 16)) + theme(plot.title=element_text(size = 20)) +


realtransition_matrixtrain <- dcast(markov_modeltrain$transition_matrix, channel_from ~ channel_to, value.var = 'transition_probability')

###TestData

Heuristictest <- heuristic_models(test, var_path = 'Channel.group.path', var_conv = 'Total.conversions',

var_value='Total.conversion.value')

markov_modeltest <- markov_model(test, var_path = 'Channel.group.path',
                                 var_conv = 'Total.conversions',
                                 var_value = 'Total.conversion.value',
                                 order = 1, out_more = TRUE)
model_resultstest <- markov_modeltest$result

# Merge the two data frames on the "channel_name" column
Rtest <- merge(Heuristictest, model_resultstest, by = 'channel_name')

# Select only the relevant columns
R1test <- Rtest[, (colnames(Rtest) %in% c('channel_name', 'last_touch_conversions', 'total_conversions'))]

# Transform the dataset into a data frame that ggplot2 can use to plot the outcomes
R1test <- melt(R1test, id = 'channel_name')


transition_matrixtest <- markov_modeltest$transition_matrix

realtransition_matrixtest <- dcast(markov_modeltest$transition_matrix, channel_from ~ channel_to, value.var = 'transition_probability')

removal_effects_test <- markov_modeltest$removal_effects
removal_effects_train <- markov_modeltrain$removal_effects
arrange(removal_effects_test, channel_name)
arrange(removal_effects_train, channel_name)

# Paired t-test on the removal effects of the test and training halves
t.test(removal_effects_test$removal_effects_conversion,
       removal_effects_train$removal_effects_conversion,
       paired = TRUE, alternative = "two.sided")


Master’s Thesis: Attribution Modeling

Marciano Bootsman

June 27th 2019

Problem

■ In 2017, for the first time, the market for online marketing was bigger than the market for offline marketing in the Netherlands.

■ Big advantage of online marketing = marketing can be personalized, and performance can be tracked and analyzed.

■ Quantifying the influence of each channel on the customer journey is a complicated process, known as the attribution problem (Abhishek et al., 2012). This analysis is a problem for many companies due to a lack of skills and knowledge.


Heuristic based attribution models

■ Each customer has their own customer journey leading to conversion.

■ The attribution model decides in what way credits are divided.

■ At the moment, NS uses a Last Touch Attribution (LTA) model, where all credits are given to the last channel on the journey.
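How a heuristic rule divides the credits can be sketched in a few lines of base R. The example below applies three common heuristics (first touch, last touch and linear) to hypothetical journeys; the paths, the conversion counts, the " > " separator and the helper name credit_by_channel are assumptions made for illustration only.

```r
# Hypothetical journeys and the number of conversions per journey.
paths <- c("Display > Organic Search > Direct", "Direct", "Email > Direct")
conversions <- c(2, 5, 3)
touches <- strsplit(paths, " > ", fixed = TRUE)

# Helper: total credit per channel.
credit_by_channel <- function(channel, credit) sapply(split(credit, channel), sum)

# First touch: all credit goes to the first channel of the journey.
first_touch <- credit_by_channel(sapply(touches, function(p) p[1]), conversions)

# Last touch: all credit goes to the last channel of the journey.
last_touch <- credit_by_channel(sapply(touches, function(p) p[length(p)]), conversions)

# Linear: the credit is split evenly over all touches of the journey.
linear <- credit_by_channel(unlist(touches),
                            rep(conversions / lengths(touches), times = lengths(touches)))

print(first_touch)
print(last_touch)
print(linear)
```

In this toy data the last-touch rule gives all ten conversions to Direct, while the other two rules also credit the channels that appear earlier in the journey.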


Markov Chain Model

■ ‘State-of-the-art’ model

■ Data-driven; the output consists of the probabilities of switching from one channel to another

■ Fair: every channel on the customer journey is considered

■ Removal effects are calculated

■ Memory effects can be included
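Memory effects correspond to a higher Markov order, in which the next step depends on the last k channels instead of only the current one. One common way to represent this, assumed here for illustration, is to turn every journey into overlapping states of k consecutive channels and then treat those states like ordinary channels. The function name to_higher_order and the comma used to join channels are hypothetical choices for this sketch.

```r
# Turn one journey into overlapping states of `order` consecutive channels.
to_higher_order <- function(path, order = 2) {
  touches <- strsplit(path, " > ", fixed = TRUE)[[1]]
  # Journeys that are not longer than the order form a single state.
  if (length(touches) <= order) return(paste(touches, collapse = ","))
  starts <- seq_len(length(touches) - order + 1)
  sapply(starts, function(i) paste(touches[i:(i + order - 1)], collapse = ","))
}

# A three-touch journey as second-order states.
second_order <- to_higher_order("Display > Organic Search > Direct", order = 2)
print(second_order)
```

The number of possible states grows quickly with the order, which is why higher-order models take longer to estimate.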


Contribution?

■ Relatively new field

■ The monopolistic position of NS gives a unique possibility to do research in a monopolistic environment and to control for competition, which has not been done before

Hypotheses

■ H1: The value of display increases when switching from Last Touch Attribution to a Markov Chain Model.

■ H2: The value of direct traffic decreases when switching from Last Touch Attribution to a Markov Chain Model.

■ H3: The effects of H1 and H2 are strengthened by the length of the customer journey.

■ H4: The effects of H1 and H2 are strengthened by the order of the Markov Chain.


Evaluation criteria of a good attribution model (Anderl et al., 2014)

■ Objectivity

■ Predictive accuracy

■ Robustness

■ Versatility

■ Algorithmic efficiency

Results

[Figure: bar chart per channel in %, series: Last Touch Attribution Model.]

[Figure: Total Conversions in March 2019 per channel in %, Last Touch Attribution Model vs. Markov Chain Model Order 1. Channels: App, Direct, Display, Email, Organic Search, Paid Search, Referral, Social Network. Change labels as printed: +36%, -23%, +657%, +110%, +15%, +55%, +10%, +23%.]

[Figure: Total Conversions in March 2019 per channel in %, without one-channel paths, Last Touch Attribution Model vs. Markov Chain Model. Same channels. Change labels as printed: +84%, -40%, +864%, +184%, +91%, +110%, +43%, +185%.]

[Figure: Total Conversions in March 2019 per channel in %, with different Markov chain orders, Last Touch Attribution Model vs. Markov Chain Model Order 1. Same channels.]


Implications & Conclusion

■ Marketing efforts of NS are effective & more money should be spent on display advertising

■ To get a more complete picture, clickstream data should be analyzed

■ Markov Chain model can be used to evaluate NS’s current campaigns and will help them with optimizing their digital marketing mix