
‘What is the effect of (combined) FITs on website visits, and how is this effect mediated by search engine visits?’


Academic year: 2021



Master Thesis

MSc Marketing Management & Intelligence

Joris Gigengack

First supervisor: P. Van Eck

Second supervisor: A. Bhattacharya

University of Groningen Faculty of Economics & Business

Department of Marketing


ABSTRACT

The online advertising industry is growing fast and plays a crucial role in the success of present-day companies. Measuring customer journeys is becoming increasingly important, since these data can help managers optimize their online strategies. This study investigates the influence of several Firm Initiated Touchpoints (FITs) on website visits and whether this effect is mediated by search engine visits. Furthermore, it investigates whether synergy effects between certain FITs lead to a higher probability of reaching a website or search engine. To this end, a logistic regression analysis is performed on data provided by the Gfk institute.


Table of Contents

1. Introduction

2. Theoretical framework

2.1 Definitions

2.1.1 FITs and CITs

2.1.2 Display advertising

2.1.3 Search engine visits

2.2 The value of FITs and CITs

2.3 Synergistic effects

2.4 Branded search engine visits

2.5 The secondary effect of display ads

2.6 Conceptual framework

3. Research design

3.1 Data collection

3.2 Choice of variables

3.2.1 Overview of variables

3.2.2 Computed variables

3.2.3 Control variables

3.3 Synergistic effect

3.4 Mediation effect

3.5 Choice of technique

3.6 Plan of analysis

4. Results

4.1 Preliminary checks

4.2 Generated datasets and descriptives

4.3 Model selection

4.4 Robustness check

4.5 Further model improvements

4.5.1 Assessing interval models

4.5.2 Assessing binary models

4.5.3 Assessing the control variable

5. Discussion and Outlook

5.1 Discussion

5.1.1 FITs on website visits

5.1.2 Display FITs on search engine visits

5.1.3 Synergy effects

5.1.4 Mediation effect

5.1.5 Other discussion topics

5.2 Implications

5.3 Limitations and further research

Acknowledgements

References


1. Introduction

The online advertising market is growing, and it is growing fast. According to the American Marketing Association (2018), global internet advertising spending will grow by 40% in the next two years, while it was already a $229 billion industry in 2017 (Statista, 2017). Strong growth is also predicted in the literature (Ho and Dempsey, 2010). Online advertising has become more complex over the years with the increasing number of touchpoints and channels that a customer experiences during the customer journey (Lemon & Verhoef, 2016). Harrison (2013, p. 182) defines touchpoints as “any form of contact a customer/consumer has with the firm, brand or product”. Understanding the customer journey of (potential) customers during online browsing is a key factor in allocating the online advertising budget across channels and touchpoints (Court et al., 2009). It is widely agreed that the customer decision journey has to be understood and evaluated at the level of every touchpoint in order to become and/or remain successful (Wiesel, Pauwels & Arts, 2010). This holds especially for digital touchpoints: companies that understand digital touchpoints are 2.5 times more likely to convert sales than firms that lack this knowledge (Bughin, 2015).

Throughout the online customer journey, two major types of touchpoints can be distinguished (Bowman & Narayandas, 2001). Touchpoints initiated by the advertiser are called Firm Initiated Touchpoints (FITs), such as email, bannering, prerolls, affiliate and retargeting. Touchpoints initiated by the customer are called Customer Initiated Touchpoints (CITs), such as search queries or website visits. It has been widely shown that CITs are more sales effective than FITs (Li & Kannan, 2014; Shankar & Malthouse, 2007; Wiesel et al., 2010), because these touchpoints occur at a later stage in the conversion funnel and require the customer’s own interest and action (Alba et al., 1997). Nevertheless, FITs have also been shown to benefit firm performance (Kumar & Pansari, 2016; Joshi & Hanssens, 2010). There are different types of FITs, and several academic papers study their differences in direct conversion contribution (Anderl et al., 2016; de Haan et al., 2016).


Given the focus and goal of this research paper, the following research questions are formulated:

‘What is the effect of (combined) FITs on website visits, and how is this effect mediated by search engine visits?’

‘Is there a synergistic effect between display FITs on website visits and search engine visits and how strong is this effect?’

This research paper contributes to the existing literature since it not only studies the direct effect of FITs on the CIT ‘website visits’, but also takes into account the possible partial mediating role of search engine visits in the effect of several FITs. This research is thus in line with several studies that integrate multiple marketing communications (e.g. display ads, e-mail, affiliate) in explaining website visits (Ansari, Mela and Neslin, 2008; Lewis and Nguyen, 2012; Naik and Raman, 2003), but additionally adds the mediation effect of search engine visits as a research objective.

According to Hu, Du, and Damangir (2014), more extensive research into branded search visits is still lacking: little literature has been written about this topic, while it clearly requires new insights. Dotson et al. (2017) confirm that research insights into the role of search engine visits in firm performance are particularly scarce.

Moreover, this paper does not study purchase conversion but focuses solely on customer traffic, since paths that do not lead to a purchase also have significant value (Petersen et al., 2009). Furthermore, purchase conversion on the website is influenced by several other factors, such as price, which affect customer traffic less; customer traffic is seen as a reasonable proxy for sales and website success (Pan et al., 2002). This research aims to generate useful insights for managers in several ways. First, the traffic stream of customers (through search engine visits or directly to the website) can inform how firms design their display ads. If the effect of display ads is mediated by search engine visits, these ads should be aimed more at awareness and brand recall; if it is not, they should be aimed fully at click-through. Second, knowing which FIT has the strongest effect on generating traffic to the focal brand’s website, and whether synergy effects occur between FITs, can indicate how firms should allocate their online marketing budget.


To answer the research questions stated above, data from a Dutch travel agency, acquired by the Gfk (Growth from Knowledge) institute, will be analyzed. The next chapter describes the theoretical framework of this research paper and the existing literature. After this, the methodology and research design will be proposed and the data collection will be explained. Then the results will be shown, followed by a discussion of these findings and a conclusion. Finally, the implications and limitations of this study will be elaborated and recommendations for further research will be given.

2. Theoretical framework

In this chapter, the definitions and terms used in this paper will be explained, and an overview of the existing literature within the scope of this research will be given. Based on this literature, hypotheses will be formulated, followed by a representation of these hypotheses in a conceptual framework.

2.1 Definitions

2.1.1 FITs and CITs

Companies have several different channels to communicate their online message to the consumer. To explain which touchpoints lead to the most website visits, it is important to distinguish between Firm Initiated Touchpoints (FITs) and Customer Initiated Touchpoints (CITs). FITs are defined as all touchpoints initiated by the firm itself (Shankar & Malthouse, 2007). CITs are defined as touchpoints triggered by the customer’s actions (Li & Kannan, 2014; Wiesel et al., 2010). The FITs studied in this research paper are preroll, bannering, email and affiliate. Preroll is defined as an online display advertisement placed before a video starts (Krishnan & Sitaraman, 2013). Affiliate advertising is defined as joint online advertising by websites, where the affiliate website posts a coded link that directs a visitor to the parent website (Haq, 2012).

2.1.2 Display advertising


is processed by the customer’s mind, with the goal of creating brand awareness/brand recall or click-through behaviour, and has been shown to do so (Fulgoni & Morn, 2009).

Bannering and online video advertisements such as prerolls have also been grouped together as display advertising before, as in the research of Goldfarb & Tucker (2011). Furthermore, the definition of display advertising given by Chapelle et al. (2014, p. 1), “a form of online advertising in which advertisers pay publishers for placing graphical ads on their web pages”, comprises both the preroll and the bannering channel.

2.1.3 Search engine visits

In this paper, search engine visits are all search queries that include the focal brand name, i.e. branded search. Branded search is defined as “any query that contains at least one brand-related keyword” (Joo, Wilbur & Zhu, 2016, p. 521), used to search for brand-related products. Generic search, “any query that does not contain any brand-related keywords” (Joo et al., 2016, p. 521), is deliberately not included, because generic search queries fall outside the scope of this research, as will be explained in chapter 3.

2.2 The value of FITs and CITs

The relationship between FITs and CITs has been widely researched, with the main focus on sales conversion (De Haan et al., 2016; Li & Kannan, 2014; Wiesel et al., 2010). CITs have a significantly stronger effect on direct sales conversion, and this effect can be attributed to the stage the customer is in. As Lemon & Verhoef (2016) explain, the customer starts in the cognitive stage, where they are confronted with FITs, and develops into the affective stage, where they start to evaluate the offer. In this stage the customer is already closer to a purchase and actively searches for more information, which often results in a CIT. CITs are therefore closer to the purchase conversion than FITs, which explains the finding that CITs contribute more to sales conversion (Sarner & Herschel, 2008; Wiesel et al., 2010). Also, a touchpoint initiated by the customer is experienced as less intrusive, which creates higher interest (Shankar & Malthouse, 2007), whereas FITs are becoming increasingly unwanted precisely because of this feeling of intrusiveness (Blattberg, Kim & Neslin, 2008). Even though much of the existing literature focuses on purchase conversion, this is not the only metric of the contribution of FITs to firm performance.


Previous research emphasizes that only focusing on conversion paths leaves out valuable information (Petersen et al., 2009).

The conversion of different FITs into CITs has been researched in previous literature. Li & Kannan (2014) show that spillover effects exist between FITs, such as referral (affiliate) or display ads, and the search channel. Duffy (2005) explains that affiliate marketing can lead to more website visits, and Sherman & Deighton (2001) address the effect of email marketing on website visits. According to the existing literature, the FITs preroll and bannering (display) have very low click-through rates (Fulgoni & Morn, 2009; Li & Kannan, 2014; Kireyev et al., 2015), which can partly be attributed to the fact that display ads also create value through brand recall and search visits (Lewis and Nguyen, 2012). Xu, Duan, and Winston (2014) found that display advertisements can lead to search advertisement clicks. Existing literature suggests that all FITs included in this paper have a positive effect on website visits. However, preroll and bannering seem to have a side effect through search engine visits, which could reduce the strength of their direct effect. This effect has not been shown for the FITs email and affiliate, which are regarded as channels that solely generate direct visits (Reimers et al., 2016). The main reason both channels are expected to generate only direct click-throughs is their appearance: whereas preroll and bannering show images, colors and graphics that increase awareness and recall of the brand (Luk et al., 2002), affiliate advertising is very often only a coded link (Goodman, 2005). Therefore, the direct effect of the FITs email and affiliate is expected to be stronger than the direct effect of the FITs preroll and bannering. Summarizing the statements above results in the following hypotheses:

H1: Preroll has a positive effect on website visits.

H2: Bannering has a positive effect on website visits.

H3: Affiliate has a positive effect on website visits.

H3.B: Affiliate has a stronger positive effect on website visits than preroll and bannering.

H4: Email has a positive effect on website visits.

H4.B: Email has a stronger positive effect on website visits than preroll and bannering.

2.3 Synergistic effects


When commonalities between different marketing channels are developed and shown, this combination creates unity in the consumer’s mind, which benefits the company by contributing positively to brand image and recall.

Also, Lim et al. (2015) found the multiple source effect: consumers put more effort into exploring a message when it is shown in a new source or channel. Since prerolls and banners are different channels that share commonalities, both theories support a synergy effect here. A synergy effect is defined as the phenomenon where the combined effect of multiple activities exceeds the sum of their individual effects (Naik and Raman, 2003). There is extensive research on the need to use multiple marketing channels to get customers to purchase (Frambach, Roest & Krishnan, 2007; Gensler, Verhoef & Böhm, 2012), with many studies considering only online channels (Blake et al., 2015; Li & Kannan, 2014). Multiple studies have found evidence of synergy effects between communication channels, such as the research by Naik and Peters (2015), which shows that 39% of overall media effectiveness in advertising is attributed to synergies. Naik and Peters (2015) also give additional grounds to research this effect for the display channels preroll and bannering in particular: they show that synergistic effects are strongest in the awareness/affective stage. Since preroll and bannering are known to affect the customer’s awareness rather than direct click-through, as stated in the existing literature above, the synergistic effect between both channels is interesting to research. Based on the literature stated above, the following hypotheses regarding the synergy effect are formulated:

H5: Preroll and bannering relate synergistically to positively affect search engine visits.

H6: Preroll and bannering relate synergistically to positively affect website visits.

2.4 Branded search engine visits


Several papers show significant positive relationships between search visits and sales (Lewis & Reiley, 2010), between search visits and conversion rates, and between search visits and offline sales (Dinner et al., 2011; Ghose and Yang, 2009). However, Rutz and Bucklin (2011) argue that search visits, even if branded, have very low impact on sales.

This research focuses on consumer traffic flow instead of sales conversion. Multiple researchers state that website visits are a good metric for the success of a site (Pan et al., 2002) and that branded search engine visits increase the chance of customers going to the website, which generates visits (Li et al., 2016). Branded search visits occur at a stage where consumers are further down the purchase funnel and like the brand. Since search visits have been shown to increase website visits, the following hypothesis is formulated:

H7: Search engine visits have a positive effect on website visits

2.5 The secondary effect of display ads

As explained in chapter 2.2, the FITs bannering and preroll can lead to conversion in multiple ways (Fulgoni & Morn, 2009): through direct click-through to the website, or through search engine visits driven by brand recall/brand awareness. According to eMarketer (2011), judging the effectiveness of prerolls on click-through alone is not sufficient; the indirect effect of these types of advertisements is also very important. Several papers address this effect: Lewis and Nguyen (2012) found that online display advertising increased searches for the advertised brand by 30% to 45%, whereas Joo et al. (2011) found a significant correlation between television advertising and consumers’ tendencies to search for these brands online.


on the online platform (Chatterjee, 2008). This effect has also been tested by Drèze and Hussherr (2003), who showed that customers tend to avoid looking at display advertisements, but that multiple display ads still generate unaided brand recall.

Goldfarb & Tucker (2011) found that online display ads generate little brand recall or impact on customers. However, Ghose and Todri-Adamopoulos (2016) show that consumers are influenced by display ads: after seeing display advertisements, consumers more actively seek information through search engines and use branded search queries. Furthermore, Song (2001) states that there is no strong relation between click-through rates and sales, but that customers who are confronted with display ads do generate 10% more sales and traffic. This effect is due to so-called ‘awareness conversion’, the indirect effect of these advertisements. The research also states that “Eighty percent of the overall sales increase resulted from customers who didn’t click on any ads, but eventually converted on the advertiser’s site” (p. 1). Since Song’s research is based on data from an online travel company, it can be assumed that similar results will appear in this study. This effect is also described in the work of Papadimitriou et al. (2011) and Chan et al. (2010) as the ‘search lift’ caused by display advertisements.

Two effects of the display advertising channels preroll and bannering have been discussed. Since the existing literature supports the direct effect, the indirect effect, and both effects simultaneously, a mediation effect is expected, but not full mediation. In the literature, this effect is mainly found for display advertisements and not for the email and affiliate channels. As explained before, this relates to the structure of the channel: whereas affiliate and email are structured as direct click-through channels, this is not necessarily the case for the display channels. Based on the literature and research results above, the following hypotheses are formulated:

H8: Search engine visits partially mediate the positive effect of preroll on website visits.

H9: Search engine visits partially mediate the positive effect of bannering on website visits.

2.6 Conceptual framework


Lastly, demographic control variables will be added to the research. This will be elaborated further in chapter 3.

Figure 1: Conceptual framework

In the next chapter, the methodology of the research design will be discussed. The researched variables will be thoroughly explained and the dataset will be described.

3. Research design

In the following chapter, the data and technique used to answer the research questions will be discussed. First, the research type of this paper and the data collection will be explained. Second, the data will be described, and it will be explained why these data are relevant and fit this research well. Third, the choice of variables included in the research models and the choice of technique will be explained. Lastly, the plan of analysis is discussed.

3.1 Data collection


and realistic outcome of the stated hypotheses. A small part of the research is descriptive, owing to the descriptive statistics of the dataset that will be presented.

The sample used in this research is retrieved from the Gfk institute. In the period between 1-6-2015 and 31-9-2016, online, event-based data were collected among Dutch participants of the Gfk panel. The data are retrieved from a panel that measures online browsing behaviour and media consumption, tracked by browser plug-ins and/or audio devices. Since the data record every touchpoint of each panelist as an event with a time stamp, the data are event-based and recorded over time; in other words, they are panel (longitudinal) data (Wooldridge, 2012, p. 10). The detailed sampling of the panelists was executed by the Gfk.

The panel data consist of 9,678 unique users who engaged in 29,012 customer journeys. Across these customer journeys, 22 different touchpoints are identified with which the customer makes contact. The dataset used in this research consists of two types of data, retrieved from either a fixed panel or a mobile panel; the source of both datasets is the same. The first dataset contains 2,456,414 events (or ‘touchpoint touches’), which led to 3,674 successful purchase conversions, 192 of which are purchases of the focal brand. The second dataset contains 11 demographic variables that describe the panelists. Some of these demographics will be used as control variables to minimize the effects of factors other than the independent variables.
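The reported counts imply a few simple descriptive ratios. The sketch below computes them, assuming that all 2,456,414 events belong to the 29,012 journeys; the function name `describe_panel` is hypothetical, and the ratios are derived here rather than reported in the thesis.

```python
# Counts as reported for the Gfk panel data.
N_USERS = 9_678
N_JOURNEYS = 29_012
N_EVENTS = 2_456_414
N_PURCHASES = 3_674
N_FOCAL_PURCHASES = 192


def describe_panel() -> dict:
    """Return simple descriptive ratios derived from the reported counts."""
    return {
        # Average number of customer journeys per panelist.
        "journeys_per_user": N_JOURNEYS / N_USERS,
        # Average number of events (touchpoint touches) per journey,
        # assuming every event belongs to one of the reported journeys.
        "events_per_journey": N_EVENTS / N_JOURNEYS,
        # Share of all purchase conversions made at the focal brand.
        "focal_share_of_purchases": N_FOCAL_PURCHASES / N_PURCHASES,
    }


for name, value in describe_panel().items():
    print(f"{name}: {value:.4f}")
```

This gives roughly 3.0 journeys per panelist, about 84.7 events per journey, and a focal-brand share of about 5.2% of all purchases.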

3.2 Choice of variables

In the following section, the variables ultimately used in this research will be explained. The section starts with an overview of all variables used, followed by an explanation of the computed variables; after this, the control variables are elaborated.

3.2.1 Overview of variables


The data will be analyzed at the customer journey level (variable PurchaseID); every customer journey contains several touchpoints.

The touchpoints in this dataset are measured by a single variable, Type_touch, which describes per event which touchpoint is ‘touched’ at that event.

Table 1: Overview of used variables

Variable | Meaning of value
UserID | Unique ID number of the corresponding user in the data
PurchaseID | Unique ID number of the customer journey that a specific user makes
GenderID | 1 = Male; 2 = Female
Age | Age in years of the corresponding user
Type_touch | The type of touchpoint a user touches during their customer journey:
    1 = Accommodations website
    2 = Accommodations app
    3 = Accommodations search
    4 = Information/comparison website
    5 = Information/comparison app
    6 = Information/comparison search
    7 = Tour operator/travel agent website competitor
    8 = Tour operator/travel agent app competitor
    9 = Tour operator/travel agent search competitor
    10 = Tour operator/travel agent website focus brand
    12 = Tour operator/travel agent search focus brand
    13 = Flight tickets website
    14 = Flight tickets app
    15 = Flight tickets search
    16 = Generic search
    18 = Affiliate touchpoint (FIT)
    19 = Banner touchpoint (FIT)
    20 = Email touchpoint (FIT)
    21 = Preroll touchpoint (FIT)
    22 = Retargeting touchpoint (FIT)


the variable retargeting. For the scope of this research, the FIT retargeting will not be analyzed, since the aim is to research the stream of customers before they reach the website (DV) and retargeting often occurs after this stage (Anderl et al. 2016).

The mediation variable search engine visits only comprises branded search engine queries (Type_touch = 12, see Table 1) and does not incorporate generic search queries (Type_touch = 16). The reason is that branded search implies interest, brand recall and awareness on the part of the consumer (Dotson et al., 2017), whereas generic search does not. There is therefore no basis to assume that generic search arises from FITs, which is a crucial requirement in this research. Furthermore, the touchpoints that are not connected to the focal brand (Type_touch = 1 to 9 and 13 to 16) will not be used in this research, since the focus of this paper is the relationship between FITs and CITs. This relationship can only be suggested when the CITs comprise firm-linked browsing behaviour.
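The selection of touchpoints described above can be sketched as a filter on Type_touch. The event layout `(purchase_id, timestamp, type_touch)` and the function name `keep_relevant_events` are hypothetical; the codes are those of Table 1, keeping the focal-brand website (10), branded search (12) and the FITs 18 to 21, and dropping retargeting (22) and all touchpoints not linked to the focal brand.

```python
# Type_touch codes from Table 1 that are linked to the focal brand.
FOCAL_WEBSITE = 10  # tour operator/travel agent website, focus brand (DV)
FOCAL_SEARCH = 12   # tour operator/travel agent search, focus brand (mediator)
FIT_CODES = {18: "affiliate", 19: "banner", 20: "email", 21: "preroll"}
# Code 22 (retargeting) is deliberately absent: it usually follows the website visit.

RETAINED_CODES = {FOCAL_WEBSITE, FOCAL_SEARCH, *FIT_CODES}  # unpacking a dict yields its keys


def keep_relevant_events(events):
    """Keep only events whose Type_touch code is retained for the analysis.

    `events` is an iterable of hypothetical (purchase_id, timestamp, type_touch) tuples.
    """
    return [event for event in events if event[2] in RETAINED_CODES]


example = [("J1", 1, 16), ("J1", 2, 21), ("J1", 3, 12), ("J1", 4, 22), ("J1", 5, 10)]
print(keep_relevant_events(example))  # [('J1', 2, 21), ('J1', 3, 12), ('J1', 5, 10)]
```

Keeping the codes in named constants makes the exclusion of generic search (16) and retargeting (22) explicit in one place.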

3.2.2 Computed variables

Since the most important information about the customer journey is contained in one variable (Type_touch), new variables have to be computed to split this information. There are several ways to recode new variables from an existing variable, and it is desirable to retain as much information as possible when doing so. Therefore, the independent variables will be assessed on two levels: interval scale and binary scale. The reason is that it is interesting to test both relations and discuss the differences in findings: the binary independent variables might show a linear relation, whereas interval data could show more detail about the frequency of touchpoints and their impact.

The DV is on a binary scale, since the aim is to measure the occurrence of an event, not its frequency. Therefore, it is not necessary to assess the DV on an interval level.

To compute variables that reflect the traffic flow of customers as realistically as possible, several conditions have been defined that must be met. When these conditions are met, the data are assessed; when they are not, the customer journey generates zeros. Since the independent variables will be assessed as both interval and binary data, their values will be either the sum of events or 0/1.

The conditions for the DVs Website visits (WV) and Search engine visits (SV) are as follows:


The conditions for the IVs Affiliate (FA), Bannering (FB), Email (FE), Preroll (FP), Search engine visits (SV) and the interaction variable Preroll*Bannering (FP*FB) are as follows:

- The FITs FA, FB, FE and FP, the CIT SV, and FP*FB are only counted when the corresponding customer journey has a WV (or SV, depending on the selected model) after one of the IVs.

- The IVs are computed as both interval and binary data. On the interval level, an IV equals the number of contacts with that touchpoint in the selected customer journey, provided the condition above is met. On the binary level, it equals 1 when the condition above is met and 0 when it is not.
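One possible reading of these conditions can be sketched in Python, assuming a hypothetical `(purchase_id, timestamp, type_touch)` event layout: an IV event is counted only if a target event (code 10 for the website-visit model, 12 for the search model) occurs later in the same journey. The function `journey_variables` and the `_int`/`_bin` column names are illustrative assumptions, not the thesis’s actual data loop.

```python
from collections import defaultdict

WV, SV = 10, 12  # Type_touch codes: focal-brand website visit, branded search visit
IV_CODES = {"FA": 18, "FB": 19, "FE": 20, "FP": 21, "SV": SV}


def journey_variables(events, target=WV):
    """Compute interval and binary IVs per customer journey (PurchaseID).

    events: iterable of hypothetical (purchase_id, timestamp, type_touch) tuples.
    target: outcome code, WV for the website-visit model or SV for the search model.
    An IV event is counted only if a target event occurs later in the same journey.
    """
    journeys = defaultdict(list)
    for purchase_id, timestamp, code in events:
        journeys[purchase_id].append((timestamp, code))

    rows = {}
    for purchase_id, touches in journeys.items():
        target_times = [ts for ts, code in touches if code == target]
        last_target = max(target_times) if target_times else None
        row = {"DV": int(last_target is not None)}  # dummy outcome for this journey
        for name, code in IV_CODES.items():
            if code == target:
                continue  # the outcome itself is not used as a predictor
            # Interval level: number of IV events that precede some target event.
            count = 0
            if last_target is not None:
                count = sum(1 for ts, c in touches if c == code and ts < last_target)
            row[name + "_int"] = count
            row[name + "_bin"] = int(count > 0)  # binary level: 0/1
        # Interaction FP*FB as the product of the two variables on each level.
        row["FPxFB_int"] = row["FP_int"] * row["FB_int"]
        row["FPxFB_bin"] = row["FP_bin"] * row["FB_bin"]
        rows[purchase_id] = row
    return rows


events = [
    ("J1", 1, 21), ("J1", 2, 19), ("J1", 3, 21), ("J1", 4, 12), ("J1", 5, 10), ("J1", 6, 18),
    ("J2", 1, 20), ("J2", 2, 19),
]
print(journey_variables(events)["J1"])
```

In this toy example, journey J1 yields FP_int = 2, FB_int = 1 and FPxFB_int = 2, while the affiliate touch at timestamp 6 is not counted because no website visit follows it; J2 has no website visit and therefore generates zeros.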

Table 2: Computed variables derived from Type_touch

Variable | Type | Description | Index interval | Index binary
WV | DV | Dummy variable of website visits | 0: no website visit made in customer journey; 1: website visit made in customer journey | 0: no website visit made in customer journey; 1: website visit made in customer journey
SV | DV | Dummy variable of search engine visits | 0: no search engine visit made in customer journey; 1: search engine visit made in customer journey | 0: no search engine visit made in customer journey; 1: search engine visit made in customer journey
FA | IV | Computed variable of FIT affiliate | Number of affiliate encounters in a certain customer journey | 0: Affiliate not encountered in customer journey; 1: Affiliate encountered in customer journey
SV | IV | Computed variable of CIT search engine visits | Number of search engine visits in a certain customer journey | 0: Search engine visit not encountered in customer journey; 1: Search engine visit encountered in customer journey
FB | IV | Computed variable of FIT bannering | Number of bannering encounters in a certain customer journey | 0: Bannering not encountered in customer journey; 1: Bannering encountered in customer journey
FE | IV | Computed variable of FIT email | Number of email encounters in a certain customer journey | 0: Email not encountered in customer journey; 1: Email encountered in customer journey
FP | IV | Computed variable of FIT preroll | Number of preroll encounters in a certain customer journey | 0: Preroll not encountered in customer journey; 1: Preroll encountered in customer journey
FP*FB | IV | Computed variable of the interaction effect of FITs preroll and bannering | Product of the numbers of preroll and bannering encounters in a certain customer journey | 0: Preroll and bannering do not both occur in a certain customer journey


The detailed conditions of the data loop, including when a row is written, will be explained in chapter 3.6. With the computation steps explained above, the variables derived from Type_touch that will be used in the analysis are shown in Table 2.

3.2.3 Control variables

The dataset used for this research includes 11 demographic variables. The demographic control variables used in this research are gender and age. Both variables can influence browsing behaviour according to Danaher et al. (2006). Drèze & Hussherr (2003) also find a correlation between age and website visits. To the author’s knowledge, only age and gender have been shown to affect the relation between FITs and website visits/search engine visits, and therefore the 9 other demographic variables are excluded from this research; without academic grounds, testing extra demographic variables is not of interest here. The variable gender will be recoded as a binary variable, with 0 = male and 1 = female, for interpretation purposes. Besides the provided demographic variables, a third, computed control variable is added: the length of a customer journey.

This variable is computed by counting all events in a single customer journey, so that the length of a customer journey is measured as the number of events in it. Manchanda et al. (2006) confirm that the number of touches to which a customer is exposed can influence purchase probabilities and browsing behaviour. Therefore, three control variables will be used in the model.
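The two recodings described above can be sketched as follows, assuming a hypothetical `(purchase_id, timestamp, type_touch)` event layout. Journey length counts all events in a journey, before any filtering of touchpoint types, which is how the text above describes it; the function names `recode_gender` and `journey_lengths` are illustrative.

```python
from collections import Counter


def recode_gender(gender_id):
    """Recode GenderID (1 = male, 2 = female in Table 1) to 0 = male, 1 = female."""
    mapping = {1: 0, 2: 1}
    if gender_id not in mapping:
        raise ValueError(f"unexpected GenderID: {gender_id!r}")
    return mapping[gender_id]


def journey_lengths(events):
    """Number of events per PurchaseID, counted over all touchpoint types.

    events: iterable of hypothetical (purchase_id, timestamp, type_touch) tuples.
    """
    return dict(Counter(purchase_id for purchase_id, _ts, _code in events))


print(journey_lengths([("J1", 1, 16), ("J1", 2, 21), ("J2", 1, 10)]))  # {'J1': 2, 'J2': 1}
```

Raising an error on unexpected GenderID values makes miscoded records visible instead of silently mapping them to one of the two categories.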

3.3 Synergistic effect

The synergy effect between the display advertisements preroll and bannering is hypothesized for both the CIT website visits and search engine visits. Synergistic effects occur when the combined effect of multiple activities exceeds the sum of their individual effects (Naik and Raman, 2003). Since both individual effects are already incorporated in the model as separately hypothesized direct effects, only the interaction (or moderation) effect has to be added. This means that the variable FP*FB will be tested on the DV website visits as well as on the mediating variable search engine visits (the moderated mediation effect will be discussed in chapter 3.5).
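As a sketch of how the interaction term enters the binomial logit model for H6, assume the remaining FITs, search engine visits and the control variables are collected in a vector $\mathbf{z}_i$ with coefficients $\boldsymbol{\gamma}$ (notation introduced here for illustration, not taken from the thesis):

$$\ln\!\left(\frac{P(WV_i = 1)}{1 - P(WV_i = 1)}\right) = \beta_0 + \beta_1 FP_i + \beta_2 FB_i + \beta_3 \,(FP_i \times FB_i) + \mathbf{z}_i^\top\boldsymbol{\gamma}$$

With binary FP and FB, a journey exposed to both preroll and bannering has log-odds that exceed the baseline by $\beta_1 + \beta_2 + \beta_3$, against $\beta_1 + \beta_2$ if the two effects were merely additive. A positive, significant $\beta_3$ is thus the log-odds counterpart of the synergy definition above. The model for H5 would replace $WV_i$ with $SV_i$ as the outcome and drop $SV_i$ from $\mathbf{z}_i$.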


The chosen model (binomial logit model) allows moderation effects by incorporating the multiplication term in the model.
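To illustrate that the multiplication term is simply an extra column in the design matrix, the sketch below fits a binomial logit with FP, FB and FP*FB by Newton-Raphson on simulated data. In practice a packaged routine such as statsmodels’ `Logit` would typically be used; the hand-written `fit_logit` function, the coefficient values and the simulated journeys are assumptions for illustration only, not the thesis’s estimation code or the Gfk data.

```python
import numpy as np


def fit_logit(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood binary logit via Newton-Raphson (IRLS).

    X: (n, k) design matrix including an intercept column; y: (n,) array of 0/1.
    Returns coefficient estimates and their asymptotic standard errors.
    """
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))   # fitted probabilities
        w = p * (1.0 - p)                      # logistic variance weights
        grad = X.T @ (y - p)                   # score vector
        hess = X.T @ (X * w[:, None])          # Fisher information
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    p = 1.0 / (1.0 + np.exp(-X @ beta))
    cov = np.linalg.inv(X.T @ (X * (p * (1.0 - p))[:, None]))
    return beta, np.sqrt(np.diag(cov))


# Simulated (not Gfk) journeys with binary preroll and banner exposure.
rng = np.random.default_rng(0)
n = 20_000
fp = rng.integers(0, 2, n)
fb = rng.integers(0, 2, n)
true_beta = np.array([-2.0, 0.5, 0.4, 0.6])        # intercept, FP, FB, FP*FB
X = np.column_stack([np.ones(n), fp, fb, fp * fb])  # interaction = product column
y = (rng.random(n) < 1.0 / (1.0 + np.exp(-X @ true_beta))).astype(float)

beta_hat, se = fit_logit(X, y)
for name, b, s in zip(["const", "FP", "FB", "FP*FB"], beta_hat, se):
    print(f"{name:6s} beta={b:+.3f} se={s:.3f} z={b / s:+.2f} odds ratio={np.exp(b):.3f}")
```

Exponentiating each coefficient gives the odds ratios used to interpret the model; a positive and significant FP*FB coefficient would be read as support for synergy on the log-odds scale.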

3.4 Mediation effect

To estimate the mediation effect, the widely used approach of Baron & Kenny (1986) is applied. According to their article, a mediation effect can be tested in three steps: first, regressing the mediator on the independent variable; second, regressing the dependent variable on the independent variable; and third, regressing the dependent variable on both the independent variable and the mediator. This results in figure 2 (derived from Baron & Kenny, 1986):

Figure 2: Mediation effect of Search engine visits

The steps described above are a widely used approach to determine a mediation effect. Even though some authors argue that other statistical tests can be more accurate (MacKinnon et al., 2004), this stepwise approach is easy to use and has a high reliability (MacKinnon, Fairchild & Fritz, 2007). As explained before, it is expected and hypothesized that there is a partial mediation effect and no full mediation.
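The three steps and the partial/full distinction can be summarized as a small decision rule. This is a simplified sketch of the causal-steps logic as it is commonly summarized; the Path type, the baron_kenny function, the 0.05 threshold and the example estimates are hypothetical, and formal tests of the indirect effect are not included.

```python
from dataclasses import dataclass


@dataclass
class Path:
    """One regression coefficient with its p-value."""
    coef: float
    p: float


def baron_kenny(a: Path, c: Path, c_prime: Path, b: Path, alpha: float = 0.05) -> str:
    """Classify mediation following the three steps of Baron & Kenny (1986).

    a       : independent variable -> mediator (step 1)
    c       : independent variable -> dependent variable, total effect (step 2)
    c_prime : independent variable -> dependent variable, controlling for the mediator (step 3)
    b       : mediator -> dependent variable, controlling for the independent variable (step 3)
    """
    if a.p >= alpha or c.p >= alpha or b.p >= alpha:
        return "no mediation"
    if c_prime.p >= alpha:
        return "full mediation"      # direct effect vanishes once the mediator is added
    if abs(c_prime.coef) < abs(c.coef):
        return "partial mediation"   # direct effect shrinks but remains significant
    return "no mediation"


# Hypothetical estimates for FP -> SV -> WV, for illustration only.
print(baron_kenny(a=Path(0.5, 0.01), c=Path(0.4, 0.01),
                  c_prime=Path(0.2, 0.03), b=Path(0.6, 0.001)))
```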


3.5 Choice of technique

The dependent variable website visits is binary: either the purchase journey leads to a website visit (1) or it does not (0), with probabilities $P(Y_i = 1)$ and $P(Y_i = 0)$, which means that a binomial model is required. The most important requirement of this model (Leeflang et al., 2015) is that the probabilities of the two outcomes of the DV sum to 1, as shown in formula 3.1.

$$P(Y_i = 1) + P(Y_i = 0) = 1 \qquad (3.1)$$

As Franses and Paap (2001, p. 52) argue, this type of model focuses on modelling the probabilities ($\pi_i$) of an outcome rather than the observed values, which is done through a latent variable. To make sure that the modelled probabilities lie between 0 and 1 (and thus meet the requirement of formula 3.1), the linear predictor is linked to the probability through a non-linear transformation (for the logit model, the log of the odds). The outcome of the model can be assessed by computing the odds ratios of the independent variables. This is in line with the assumption that the dependent variable should be binary and the independent variables should be either continuous or binary. Depending on the transformation chosen, the probability model becomes either a logit model or a probit model. According to Leeflang et al. (2015), parameter estimates are easier to interpret for the logit model than for the probit model, and the logit model is often preferred because of its mathematical convenience. Kliestik, Kocisova & Misankova (2015) also argue that probit models are harder to interpret than logit models. Therefore the model used in this research is a binomial logit model. However, a logit model also has a disadvantage: the restrictive assumption that choices are independent across alternatives. Probit models do not share this restriction and can therefore also be a preferred method (Dow and Endersby, 2004).

The logit function is defined as follows:

$$\eta_i = \operatorname{logit}(\pi_i) = \log\left(\frac{\pi_i}{1 - \pi_i}\right) \qquad (3.2)$$

To obtain the probability itself, the logit in formula 3.2 is inverted: the linear combination of the independent variables is exponentiated, which keeps the modelled probability between 0 and 1. The logit model can then be written as follows (Nordberg, 1981):

$$\pi_i = \frac{\exp(X_i'\beta)}{1 + \exp(X_i'\beta)} \qquad (3.3)$$

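where $X_i'$ is the row vector of observations of the independent variables for purchase journey i (the i-th row of the data matrix), so that $\eta_i = X_i'\beta$ in formula 3.2, and $\beta$ is the vector of corresponding parameters. Dividing the numerator and denominator by $\exp(X_i'\beta)$, this formula can be rewritten as follows: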

$$\pi_i = \frac{1}{1 + \exp(-X_i'\beta)}$$
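As a quick numerical check that formula 3.2, formula 3.3 and the rewritten form describe the same model, the sketch below implements the logit and both forms of its inverse; eta plays the role of the linear predictor X_i'β. Function names are illustrative.

```python
import math


def logit(p):
    """Formula 3.2: log-odds of a probability p in (0, 1)."""
    return math.log(p / (1.0 - p))


def inv_logit_exp_form(eta):
    """Formula 3.3: exp(eta) / (1 + exp(eta))."""
    return math.exp(eta) / (1.0 + math.exp(eta))


def inv_logit_rewritten(eta):
    """Rewritten form: 1 / (1 + exp(-eta))."""
    return 1.0 / (1.0 + math.exp(-eta))


for eta in (-3.0, 0.0, 2.5):
    prob = inv_logit_rewritten(eta)
    print(eta, round(prob, 4), round(inv_logit_exp_form(eta), 4), round(logit(prob), 4))
```

The rewritten form is the one usually preferred in code for large positive eta, since exp(eta) in formula 3.3 overflows there (math.exp raises OverflowError above roughly 709).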


This gives the following model, where H1, H2, H3, H3B, H4, H4B, H6 and H7 are incorporated:

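$$\pi(WV)_i = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 FA_i + \beta_2 FB_i + \beta_3 FE_i + \beta_4 FP_i + \beta_5 (FB \times FP)_i + \beta_6 AGE_i + \beta_7 G_i + \beta_8 SV_i + \beta_9 JL_i\right)\right)}$$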

For the mediation effect shown in figure 2, the following model also has to be tested, with search engine visits as the DV. This relates to H5, H8 and H9:

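$$\pi(SV)_i = \frac{1}{1 + \exp\left(-\left(\beta_0 + \beta_1 FA_i + \beta_2 FB_i + \beta_3 FE_i + \beta_4 FP_i + \beta_5 (FB \times FP)_i + \beta_6 AGE_i + \beta_7 G_i + \beta_8 JL_i\right)\right)}$$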

Where:

$\pi(WV)_i$ = probability that customer journey i results in the focus website visit;

$\pi(SV)_i$ = probability that customer journey i results in a branded search query;

$\beta_0$ = intercept;

$\beta_k$ = parameter of the corresponding variable;

$FA_i$ = computed variable of touchpoint affiliate for customer journey i;

$FB_i$ = computed variable of touchpoint bannering for customer journey i;

$FE_i$ = computed variable of touchpoint email for customer journey i;

$FP_i$ = computed variable of touchpoint preroll for customer journey i;

$SV_i$ = computed variable of touchpoint search engine visit for customer journey i;

$(FB \times FP)_i$ = interaction term of touchpoints bannering and preroll for customer journey i;

$AGE_i$ = age of user for customer journey i;

$G_i$ = gender of user for customer journey i;

$JL_i$ = journey length of user for customer journey i.
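A minimal sketch of how the two models turn the variables of one customer journey into π(WV)_i and π(SV)_i. All coefficient values are hypothetical placeholders, not the estimates reported in chapter 4, and the key "FBxFP" for the interaction term is an illustrative naming choice.

```python
import math


def inv_logit(eta):
    """Map log-odds to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-eta))


def journey_probability(x, beta):
    """pi_i = 1 / (1 + exp(-(beta_0 + sum_k beta_k * x_ik))) for one customer journey.

    The product term is built from FB and FP so it always matches its components.
    Variables without a coefficient in `beta` do not enter that model.
    """
    row = dict(x, FBxFP=x["FB"] * x["FP"])
    eta = beta["intercept"] + sum(
        b * row[name] for name, b in beta.items() if name != "intercept"
    )
    return inv_logit(eta)


# Hypothetical coefficients for illustration only (not the estimates of this study).
beta_wv = {"intercept": 0.5, "FA": 0.1, "FB": 0.1, "FE": 0.1, "FP": 0.1,
           "FBxFP": 0.05, "AGE": 0.0, "G": 0.0, "SV": 0.3, "JL": 0.001}
beta_sv = {"intercept": -3.0, "FA": 0.0, "FB": 0.1, "FE": 0.0, "FP": 0.1,
           "FBxFP": 0.05, "AGE": -0.02, "G": -0.2, "JL": 0.002}

journey = {"FA": 0, "FB": 2, "FE": 1, "FP": 1, "AGE": 51, "G": 1, "SV": 1, "JL": 120}
print(round(journey_probability(journey, beta_wv), 4))  # pi(WV)_i, uses SV
print(round(journey_probability(journey, beta_sv), 4))  # pi(SV)_i, SV not included
```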

3.6 Plan of Analysis

Before the models described above can be tested, the data needs to be checked and cleaned. The first step is therefore data cleaning, in which the data is checked for outliers, oddities and missing values. Outliers and oddities can generate inaccurate outcomes that do not reflect the truth; if outliers have a high impact on the predictability of the dataset, those cases will be treated. Furthermore, the impact of missing values (N/As) will be assessed and, if necessary, the corresponding rows will either be deleted or the missing values will be imputed.


computed touchpoint variables, it will be possible to test the effect of individual touchpoints on one another. The variables are computed by a loop that creates new variables based on the conditions described in chapter 3.2.2.

This loop counts the number of IV touchpoints before the DV is touched in a customer journey. When both conditions are met, the loop writes a new row of data to the new dataset. After a DV is touched, the counters are reset and counting restarts. This means that several rows can be written for one purchase journey if more than one DV occurs in it; otherwise only the first string of touchpoints up to the first DV would be counted. The result is a new aggregated dataset in which the variables satisfy the conditions needed to measure the relation between touchpoints. For every customer journey that has no DV touchpoint but does have FITs, a row with DV = 0 is written to the dataset, which gives a more realistic representation of the original data.
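The counting loop can be sketched as follows. This is a simplified, hypothetical reconstruction: the touchpoint labels ("FA", "FB", "FE", "FP", "WV", "other"), the build_rows function and the data layout are assumptions, the conditions of chapter 3.2.2 are not reproduced, and FITs that occur after the last DV of a journey are simply dropped here.

```python
from collections import Counter

FITS = ("FA", "FB", "FE", "FP")  # affiliate, bannering, email, preroll


def build_rows(journeys, dv):
    """Hypothetical sketch of the counting loop.

    journeys maps a purchase ID to its time-ordered list of touchpoint labels.
    Each DV touchpoint writes a row with DV = 1 and the FIT counts since the
    previous DV, after which the counters are reset. A journey without any DV
    but with at least one FIT writes one row with DV = 0. FITs after the last
    DV of a journey are dropped in this sketch.
    """
    rows = []
    for pid, events in journeys.items():
        counts = Counter()
        saw_dv = False
        for event in events:
            if event == dv:
                rows.append({"PurchaseID": pid, **{f: counts[f] for f in FITS}, dv: 1})
                counts = Counter()  # reset after the DV is touched
                saw_dv = True
            elif event in FITS:
                counts[event] += 1
        if not saw_dv and sum(counts.values()) > 0:
            rows.append({"PurchaseID": pid, **{f: counts[f] for f in FITS}, dv: 0})
    return rows


example = {
    1: ["FB", "FP", "WV", "FE", "WV"],
    2: ["FA", "FA", "other"],
    3: ["other"],
}
for row in build_rows(example, dv="WV"):
    print(row)
```

Journey 1 yields two DV = 1 rows because the counters are reset after the first website visit; journey 2 has FITs but no DV and therefore yields one DV = 0 row; journey 3 yields nothing.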

Next, the dataset will be described with graphs and figures to give insight into the cleaned data, after which the preliminary checks for the logistic regression are carried out. Logistic regression has fewer strict assumptions than Ordinary Least Squares (OLS), but some requirements still need to be met, and these will be checked. The model then follows a stepwise estimation, which clearly indicates the influence of the separate explanatory variables on the dependent (outcome) variable. The models are estimated by Maximum Likelihood Estimation (Nordberg, 1981), the most common method for estimating logit models. This estimation yields the parameter values under which the observed data are most likely; in other words, the values at which the computed individual probabilities best match the observed choices. The estimation is carried out with the glm function in R (R-Studio).
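To make the idea of maximum likelihood estimation concrete, the sketch below fits a logit model with Newton-Raphson in plain Python. It is an illustration, not the code used in this thesis; R's glm fits binomial models by iteratively reweighted least squares, which for the canonical logit link coincides with Newton-Raphson. The function names and the toy data are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def fit_logit(X, y, max_iter=25, tol=1e-10):
    """Maximum likelihood estimates of a logit model via Newton-Raphson.

    X: rows of predictors, each starting with 1.0 for the intercept.
    y: 0/1 outcomes. Returns the list of coefficients beta.
    """
    k = len(X[0])
    beta = [0.0] * k
    for _ in range(max_iter):
        grad = [0.0] * k                       # X'(y - p)
        info = [[0.0] * k for _ in range(k)]   # X'WX with W = diag(p(1 - p))
        for row, yi in zip(X, y):
            p = 1.0 / (1.0 + math.exp(-sum(b * x for b, x in zip(beta, row))))
            w = p * (1.0 - p)
            for j in range(k):
                grad[j] += (yi - p) * row[j]
                for m in range(k):
                    info[j][m] += w * row[j] * row[m]
        step = solve(info, grad)
        beta = [b + s for b, s in zip(beta, step)]
        if max(abs(s) for s in step) < tol:
            break
    return beta


# One binary predictor: 3 of 10 successes when x = 0, 7 of 10 when x = 1.
X = [[1.0, 0.0]] * 10 + [[1.0, 1.0]] * 10
y = [1] * 3 + [0] * 7 + [1] * 7 + [0] * 3
beta = fit_logit(X, y)
print([round(b, 4) for b in beta])
```

With a single binary predictor the MLE has a closed form that serves as a check: β0 = log(3/7) and β1 = log(7/3) - log(3/7), the observed log-odds in each group.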


4. Results

The original dataset, with the selection of variables required for the model tests, is checked for several issues. After the checks and alterations, it is used to generate two new datasets. Both datasets are generated by a custom loop and differ mainly in the dependent variable (either website visits or search engine visits). The difference between the datasets is discussed in chapter 4.2.

4.1 Preliminary checks

The first check concerns missing values. The independent variables have no missing values, but the demographic variables do: Age and Gender both have 304292 N/As. These missing values stem from missing panel data for the corresponding users, so it is more informative to report the number of users who have not filled out all demographic details: 1603 unique users. Since Age is a numerical variable and Gender is a categorical variable, their missing values are treated differently. For Age, the N/As are replaced with the mean (= 51), because deleting the rows containing these missing values would also delete valuable touchpoint data. Replacing the values with the mean biases the Age variable slightly; however, since Age is a control variable, its coefficient does not have to be interpreted. The same reasoning applies to Gender, whose N/As are replaced with the mode (= 1) so that all imputed values remain either 0 or 1. Next, the data is checked for oddities and outliers. No oddities are found in the original data, but there are some outliers. Outliers were checked for the length of each purchase journey as well as for the volume of the individual touchpoints in a journey. This led to deleting purchase journey 11878, which has 64503 touchpoints and is therefore not representative of the average purchase journey (825 touchpoints). This outlier can be clearly identified in figure 3.
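A minimal sketch of the imputation rule described above, assuming a list-of-dicts layout with hypothetical field names (UserID, Age, Gender) and None standing in for N/A. The mean is used unrounded here; whether the thesis rounded it to 51 is not stated.

```python
from statistics import mean, mode


def impute(rows):
    """Fill missing Age with the mean and missing Gender with the mode (None marks N/A)."""
    age_fill = mean(r["Age"] for r in rows if r["Age"] is not None)
    gender_fill = mode(r["Gender"] for r in rows if r["Gender"] is not None)
    return [
        {
            **r,
            "Age": age_fill if r["Age"] is None else r["Age"],
            "Gender": gender_fill if r["Gender"] is None else r["Gender"],
        }
        for r in rows
    ]


# Hypothetical panel records; user 2 has no demographic data.
panel = [
    {"UserID": 1, "Age": 40, "Gender": 1},
    {"UserID": 2, "Age": None, "Gender": None},
    {"UserID": 3, "Age": 62, "Gender": 0},
    {"UserID": 4, "Age": 51, "Gender": 1},
]
filled = impute(panel)
print(filled[1])
```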


The individual touchpoint variables were also checked for outliers. All computed variables show several values outside the boxplot range. This is because the large majority of values of every computed touchpoint variable are identical (value = 0), as visualized in figure 4.

Figure 4: Relative distribution of zero/non-zero values in touchpoint variables

These outcomes are expected, since only 1.83% of the touchpoints are FITs and only 8.13% of the touchpoints are website visits. Many journeys therefore have a value of zero, meaning that the touchpoint did not occur in that journey, and as a result the boxplots show all values > 0 as outliers. With this in mind, only extreme values are treated as real outliers. For the FITs there were no extreme outliers, as all purchase journeys contained between 0 and 135 touchpoints of any one type of FIT. The boxplots (1-4) can be found in Appendix I. For the CIT website visits, ten cases (purchase journeys) with extreme outlier values are deleted. Figures 5 and 6 show how deleting these 10 most extreme cases altered the data.

Figure 5: Website visits without alteration

Figure 6: Website visits with alteration
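The reason every non-zero count is flagged can be reproduced with the usual boxplot rule (whiskers at Q1 - 1.5·IQR and Q3 + 1.5·IQR). The counts below are hypothetical: when more than three quarters of the values are zero, Q1 = Q3 = 0, the IQR is 0, and every positive value falls outside the whiskers. Quartiles here come from Python's statistics.quantiles, whose default method differs slightly from R's boxplot hinges; with this many zeros both give Q1 = Q3 = 0.

```python
from statistics import quantiles


def tukey_outliers(values, k=1.5):
    """Return the values outside the boxplot whiskers [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


# Hypothetical counts of one touchpoint type: 95 journeys without it, 5 with it.
counts = [0] * 95 + [1, 2, 3, 50, 135]
print(tukey_outliers(counts))  # every non-zero count is flagged: [1, 2, 3, 50, 135]
```

This is why a separate judgement of "extreme" values is needed for such zero-inflated variables, rather than relying on the whiskers alone.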



The check for multicollinearity is not performed on the original dataset but on the two datasets generated by the loop: because the loop creates new variables, checking the original data would not be meaningful. The datasets are explained in detail in the next chapter. VIF scores were calculated for the final model of dataset 1 (DV = website visits) and for the final model of dataset 2 (DV = search engine visits). A VIF score indicates whether the collinearity between predictor variables is too high, which inflates the variance of the parameter estimates and makes them unreliable. Scores above 5 should be investigated, and scores above 10 are regarded as 'high multicollinearity' and should be treated (Belsley et al., 1980). The VIF scores of the final models are shown in table 3.

Table 3: VIF scores of final models of both datasets

Variable | VIF model 1 (DV = web visits) | VIF model 2 (DV = search visits)
FB | 1.486 | 8.734
FP | 1.224 | 3.431
FA | 1.005 | -
FE | 1.000 | -
SV | 1.002 | -
FB*FP | 1.749 | 9.911
JL | 1.002 | 1.017
Age | - | 1.017
G | - | 1.013

All VIF scores of model 1 are below 5, so the parameters of this model are not regarded as affected by multicollinearity. In model 2 all scores are below 10, but the synergy variable FB*FP (9.911) and bannering FB (8.734), one of the variables from which it is derived, exceed 5. Because FB*FP is constructed from FB and FP, this correlation is expected, and no action on these scores is necessary (Brambor et al., 2005).
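VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing predictor j on the other predictors; equivalently, the VIFs are the diagonal elements of the inverse of the predictors' correlation matrix. The sketch below uses that identity on hypothetical counts and shows why a product term such as FB*FP tends to be correlated with its components. It is an illustration, not the routine used to produce table 3.

```python
import math


def corr(a, b):
    """Pearson correlation of two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / math.sqrt(saa * sbb)


def invert(A):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for r in range(n):
            if r != col:
                f = M[r][col]
                M[r] = [v - f * w for v, w in zip(M[r], M[col])]
    return [row[n:] for row in M]


def vif(columns):
    """VIF_j = 1 / (1 - R_j^2), taken from the diagonal of the inverse correlation matrix."""
    names = list(columns)
    R = [[corr(columns[a], columns[b]) for b in names] for a in names]
    R_inv = invert(R)
    return {name: R_inv[i][i] for i, name in enumerate(names)}


# Hypothetical touchpoint counts per journey; the product term is built from FB and FP.
fb = [0, 1, 2, 0, 1, 3, 0, 2, 1, 0]
fp = [0, 1, 1, 2, 0, 2, 1, 3, 0, 0]
scores = vif({"FB": fb, "FP": fp, "FBxFP": [x * z for x, z in zip(fb, fp)]})
for name, score in scores.items():
    print(name, round(score, 3))
```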

4.2 Generated datasets and descriptives

Two different datasets are generated by two loops, both originating from the same data. After removing the outliers, the dataset contains 2.355.782 events. The details of the datasets and variables are shown in table 4.


This is not representative compared to the original data, in which only 8.13% of the touchpoints are website visits. However, this outcome can be explained. The loop constructs a row for each DV = 1, or writes a row when any of the FITs is > 0. As explained in chapter 3.6, the loop can write several rows for a single purchase journey if several DVs are encountered, which is the case for the DV website visits. As a result, purchase journeys that contain many website visits have a large influence on the data: a purchase journey with 1000 website visits results in 1000 rows of data with DV = 1. To counter this effect, Dataset1 is aggregated on PurchaseID and Website visits, and all other variables in the dataset are enumerated. This results in Dataset1.2, containing 7.534 rows of data. For the second dataset, in which the DV is search engine visits (referred to as Dataset2), this was not necessary, since it matched the expected number of rows and distribution of 1/0 for the DV. The reason for this difference is that search engine visits are more evenly distributed across customer journeys: website visits can occur 5000 times in a single customer journey, whereas search engine visits occur at most 10 times in a single journey (see boxplot 5 in Appendix I).

The correlation matrix of the original dataset shows a very weak negative relation between the FITs and website visits (for all correlation matrices, see Appendix I). In Dataset1, however, this correlation has become much stronger due to the overrepresentation of the DV. In Dataset1.2 the correlation is back at the level of the original dataset, suggesting that this dataset is more closely aligned with the original data (see correlation matrix 1, Appendix I). The values and differences between the datasets are shown in table 4.


Table 4: Descriptives of datasets

Statistic | Original dataset | Dataset1 | Dataset1.2 | Dataset2
Rows | 2.355.782 | 180.450 | 7.534 | 29.278
Unique UserID's | 9.674 | 2.715 | 2.718 | 8.708
Unique PurchaseID's | 29.001 | 5.117 | 5.117 | 29.000
FBi (mean) | 0.0007675 | 0.09767 | 0.2981 | 0.06175
FPi (mean) | 0.000818 | 0.1085 | 0.3253 | 0.06582
FEi (mean) | 0.0012 | 0.05218 | 0.2109 | 0.09656
FAi (mean) | 0.0006681 | 0.07443 | 0.2257 | 0.05376
SVi (mean) | 0.0002165 | 0.004012 | 0.01699 | 0.01742
WVi (mean) | 0.06552 | 0.8554 | 0.6764 | NA
Agei (mean) | 50.96 | 51.19 | 50.69 | 51.78
Gi (mean) | 0.6563 | 0.6772 | 0.6667 | 0.6521
JLi (mean) | 773.1 | 810.8 | 171.2 | 86.89
FB*FP (mean) | NA | 0.2735 | 0.7668 | 0.15

4.3 Model selection

Since logistic regression has no strict assumptions or requirements, models can be estimated and assessed on several validation criteria. For both datasets the best model is selected, resulting in two models; the models for Dataset1.2 are discussed first. All models are built with stepwise regression because of its flexibility, the possibility to add interaction terms and its easy application (Ciampi et al., 1986). The selection is based on the following criteria: AIC, BIC, hit rate, a test against the null model, and TDL. It is preferable to look at multiple measures of fit because they capture different aspects of the model and because no single statistical significance test identifies a correct model (Schermelleh-Engel et al., 2003).

Dataset1.2


this did not improve the model, and therefore model 1.5 was used as the final model (outputs of all estimated models can be found in Appendix I). The outcomes of the validation criteria are shown in table 5.

Table 5: Validation criteria for Dataset 1.2

Model (Dataset1.2) | AIC | BIC | Hitrate | Null model | TDL
Model 1.1 FB+FP+FA+FE+SV+(FB*FP) | 9488.972 | 9537.462 | 0.6759 | 0.08433 | 1.012
Model 1.2 +Age | 9490.879 | 9546.296 | 0.6759 | 0.1290 | 1.023
Model 1.3 +G | 9492.814 | 9555.158 | 0.6759 | 0.1857 | 1.019
Model 1.4 +JL | 9171.065 | 9240.337 | 0.6760021 | < 2.2e-16 *** | 1.331
Model 1.5 -Age -G | 9167.571 | 9222.2988 | 0.6760021 | < 2.2e-16 *** | 1.327
Model 1.6 -(FB*FP) | 9167.7 | 9216.226 | 0.6760021 | < 2.2e-16 *** | 1.327

Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1

The AIC is lowest for model 1.5, which means this model has the best fit according to this criterion. BIC is similar to AIC but penalizes additional parameters (extra variables) more heavily.

Since several explanatory variables seem to have little impact on the predictive power of the model, excluding the interaction term in model 1.6 results in the lowest (best) BIC score (9216.226), although the difference is very small. The hit rate is calculated as the percentage of all cases that the model predicts correctly (Pituch et al., 2016), so the higher the hit rate, the better the predictive capability of the model. Remarkably, models 1.1-1.3 have the same hit rate, as do models 1.4-1.6. A closer look at these numbers leads to the following conclusion: all models almost perfectly predict DV = 1 (5089 of 5096 cases) but almost never predict the zeros correctly (2 or 3 of 2438 cases). This means that the models heavily over-predict in favor of the DV = 1 event. Possible explanations for this are discussed in the discussion section.
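The BIC penalty per parameter here is ln(7534) ≈ 8.93, against 2 for AIC. As a consistency check, the sketch below backs out ln L from the reported AIC and recomputes BIC, assuming k counts the intercept plus each listed variable once; under that assumption it reproduces the BIC values in table 5 for models 1.1 and 1.4.

```python
import math


def aic(loglik, k):
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * k - 2 * loglik


def bic(loglik, k, n):
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return k * math.log(n) - 2 * loglik


n = 7534  # rows in Dataset1.2

# Model 1.1: intercept + FB, FP, FA, FE, SV, FB*FP -> k = 7 (assumed parameter count)
k_11, aic_11 = 7, 9488.972
loglik_11 = (2 * k_11 - aic_11) / 2          # back out ln L from the reported AIC
print(round(bic(loglik_11, k_11, n), 3))     # table 5 reports 9537.462

# Model 1.4 adds Age, G and JL -> k = 10
k_14, aic_14 = 10, 9171.065
loglik_14 = (2 * k_14 - aic_14) / 2
print(round(bic(loglik_14, k_14, n), 3))     # table 5 reports 9240.337
```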


Lastly, the TDL (Top Decile Lift) was calculated, which is highest for model 1.4. Based on all criteria, model 1.5 is selected for estimation, since it scores best on most criteria, still incorporates the synergy effect and has the most significant variables.
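The hit rate and top decile lift can be computed from predicted probabilities as follows. The 0.5 cut-off and the toy data are assumptions for illustration; this section does not state the cut-off that was used.

```python
def hit_rate(y, p, cutoff=0.5):
    """Share of cases classified correctly when predicting 1 for p >= cutoff."""
    return sum((pi >= cutoff) == bool(yi) for yi, pi in zip(y, p)) / len(y)


def top_decile_lift(y, p):
    """Response rate among the 10% highest predicted probabilities divided by the overall rate."""
    ranked = sorted(zip(p, y), key=lambda pair: pair[0], reverse=True)
    top = ranked[: max(1, len(ranked) // 10)]
    top_rate = sum(yi for _, yi in top) / len(top)
    return top_rate / (sum(y) / len(y))


# Hypothetical outcomes and predicted probabilities for ten journeys.
y = [1, 0, 1, 1, 0, 0, 1, 0, 0, 0]
p = [0.9, 0.2, 0.8, 0.6, 0.4, 0.3, 0.55, 0.1, 0.6, 0.35]
print(hit_rate(y, p), top_decile_lift(y, p))
```

With a 0.5 cut-off and a model whose predicted probabilities almost all lie above 0.5, the hit rate approaches the share of ones in the data, which is consistent with the pattern described for table 5 (a hit rate of about 0.676 against a WV = 1 share of 0.6764 in Dataset1.2).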

Dataset2: Measurement choice

In contrast to the estimation for Dataset1.2, estimating the models of Dataset2 with the GLM method produced warnings about complete separation. The warning may appear here but not for the other dataset because of the difference in DV and in the effects of the explanatory variables, since the warning implies that the outcome is perfectly predicted by one or more variables. It is therefore important to assess this warning before continuing with the GLM method. Several steps can be followed to assess its impact and to determine whether another method should be used to estimate the models of Dataset2. First, the 'brglm2' package in R can detect separation in a GLM model. This method was applied to all models: it reports whether separation is detected (FALSE/TRUE) and gives 0, Inf or -Inf per explanatory variable, showing whether estimates have infinite values, which indicates separation. The results of the 'brglm2' detect-separation method show that none of the models suggest separation and that all estimates are finite (for full results see table 6 in Appendix I). Second, according to Konis (2007) there are several methods to estimate models with separated data. Firth's penalized likelihood method (Firth, 1993) is the basis for several estimation methods in R; the penalty keeps all parameter estimates finite, even under separation. Three different penalized methods are tested on final model 2.5 to check for differences, and, as can be seen in table 7, the estimates have similar values to those obtained without penalization, except for FB (however, this variable is insignificant, so its estimate is not interpreted).

Table 7: Comparison of alternative estimation methods for complete separation

Variable Normal GLM Firth’s Logistf BayesGLM BRGLM


Third, as Heinze & Schemper (2002) illustrate, if a model shows complete separation, its standard errors are likely to be 'blown up', i.e. to take extreme values. However, this is not the case when model 2.5 is estimated (see table 12 in chapter 4.5.1 for all estimates). Based on these three arguments, it was decided to continue with the standard GLM estimation.
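Complete separation means that some combination of predictors splits the outcomes perfectly, in which case the logit likelihood keeps increasing as a coefficient grows without bound and the maximum likelihood estimate does not exist. The sketch below checks this for a single predictor only; it is a simplified illustration and not the multivariable check performed by brglm2. The data and the function name are hypothetical.

```python
def completely_separated(x, y):
    """True if a threshold on the single predictor x splits the 0/1 outcomes y perfectly.

    Ties at the boundary (quasi-complete separation) are reported as False here.
    """
    x0 = [xi for xi, yi in zip(x, y) if yi == 0]
    x1 = [xi for xi, yi in zip(x, y) if yi == 1]
    if not x0 or not x1:
        return True  # only one outcome class: even the intercept-only MLE diverges
    return max(x0) < min(x1) or max(x1) < min(x0)


print(completely_separated([0, 1, 2, 3, 4, 5], [0, 0, 0, 1, 1, 1]))  # perfectly split
print(completely_separated([0, 1, 2, 3, 4, 5], [0, 1, 0, 1, 1, 0]))  # overlapping
```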

Dataset2: Model selection

The models of Dataset2 are assessed with the same criteria as those of Dataset1.2. Based on the criteria in table 8, model 2.5 is the best model, so this model is estimated and the hypotheses are tested with its variables.

Table 8: Validation criteria for Dataset2

Model (Dataset2) | AIC | BIC | Hitrate | Null model | TDL
Model 2.1 FB+FP+FA+FE+(FB*FP)+Age+G+JL | 4559.764 | 4634.325 | 0.7853 | < 2.2e-16 *** | 5.49
Model 2.2 -JL | 5110.959 | 5177.236 | 0.62244 | 4.901e-08 *** | 1.451
Model 2.3 -G | 5113.855 | 5171.847 | 0.6207 | 1.537e-07 *** | 1.137
Model 2.4 -Age | 5147.138 | 5196.845 | 0.9682 | 0.2125 | 1.137
Model 2.5 -FA -FE | 4561.828 | 4619.82 | 0.7843 | < 2.2e-16 *** | 5.529
Model 2.6 -(FB*FP) | 4565 | 4614.714 | 0.7846 | < 2.2e-16 *** | 1.137

Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1

4.4 Robustness check


validity (Steyerberg et al., 2001). For Dataset1.2 a bootstrap was performed with 8000 resamples, since a widely used rule of thumb is to use more resamples than there are rows in the original dataset; otherwise R tends to return an error, which is solved by using more resamples than rows (see text box 1, Appendix I). Table 9 shows the original values with their significance levels, followed by the bias obtained from the bootstrap; the last column shows the new estimate.

Table 9: Values of model 1.5 after bootstrap method on Dataset1.2

Variable | Original β | Sig level | Bias | New β
Intercept | 0.481 | < 2e-16*** | -9.425e-04 | 0.47982
FB | -0.015289 | 0.2783 | -8.8296e-04 | -0.01587
FP | -0.024443 | 0.1381 | -1.1240e-03 | -0.02524
FA | -0.006023 | 0.6069 | -2.2175e-04 | -0.00575
FE | -0.005019 | 0.6330 | -8.2164e-04 | -0.00501
SV | -0.266341 | 0.0797 . | -3.4026e-02 | -0.28064
(FB*FP) | 0.005091 | 0.1680 | 6.5488e-04 | 0.00537
JL | 0.002152 | < 2e-16*** | 2.5514e-05 | 0.00217

Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1

The results of the bootstrapping procedure show that the new beta coefficients differ little from the estimates obtained from the original data, which is desirable. This means that the estimates are robust and change only slightly after 8.000 resamples.
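The bias column in tables 9 and 10 follows the standard nonparametric bootstrap definition: the mean of the resampled estimates minus the original estimate. To stay short, the sketch below applies it to a simple statistic (the log-odds of a 0/1 share) instead of refitting the full logit model in every resample. The data are hypothetical, and since this section does not state how the "New β" column was derived, the bias-corrected value shown here (original minus bias) is one common choice, not necessarily the author's.

```python
import math
import random
from statistics import mean


def bootstrap_bias(data, statistic, resamples=2000, seed=1):
    """Nonparametric bootstrap estimate of the bias of `statistic`.

    Returns (original estimate, bias, bias-corrected estimate), where
    bias = mean of the resampled statistics - original estimate.
    """
    rng = random.Random(seed)
    original = statistic(data)
    reps = [statistic(rng.choices(data, k=len(data))) for _ in range(resamples)]
    bias = mean(reps) - original
    return original, bias, original - bias


def log_odds(ys):
    """Log-odds of the share of ones in a 0/1 sample."""
    share = sum(ys) / len(ys)
    return math.log(share / (1 - share))


# Hypothetical 0/1 outcomes: 60 successes out of 200.
outcomes = [1] * 60 + [0] * 140
original, bias, corrected = bootstrap_bias(outcomes, log_odds)
print(round(original, 4), round(bias, 4), round(corrected, 4))
```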

Also for Dataset2 the bootstrapping method is performed. Since this dataset has 29.278 rows, the bootstrap is performed with 30.000 resamples. The output is organized in table 10 and can be interpreted the same way as table 9.

Table 10: Values of model 2.5 after bootstrap method on Dataset2

Variable | Original β | Sig level | Bias | New β
Intercept | -3.154 | < 2e-16*** | 4.8749e+14 | -3.1620727
FB | -0.04825 | 0.59743 | 2.6927e+13 | -0.0774814
FP | -1.234 | 0.01333* | 8.6019e+13 | -1.4553357
Age | -0.02132 | 8.37e-11*** | 7.5658e+12 | -0.0215537
G | -0.2689 | 0.00547** | 2.7042e+14 | -0.2684525
(FB*FP) | 0.03188 | 0.03252* | 4.7774e+12 | 0.0414708
JL | 0.001916 | < 2e-16*** | 2.8058e+11 | 0.0019532


As can be seen in table 10, the newly estimated coefficients by the bootstrap method are again similar to the original estimates. Therefore the internal validation of both models is good and the estimates are reliable.

4.5 Further model improvements

To further improve the model, the selected models are assessed at both the interval data level and the binary data level. After both are discussed, the model for hypothesis testing is selected.

4.5.1 Assessing the interval models


Table 11: Estimation results of model 1.5 from Dataset 1.2

Variable | β | Std.Error | Sig.value | OR | ME
Intercept | 0.481 | 0.030305 | < 2e-16*** | - | -
FB | -0.015289 | 0.014102 | 0.2783 | 0.9848271 | -0.0032277
FP | -0.024443 | 0.016410 | 0.1381 | 0.9759597 | -0.0051372
FA | -0.006023 | 0.011706 | 0.6069 | 0.9939947 | -0.0012716
FE | -0.005019 | 0.010511 | 0.6330 | 0.9949936 | -0.0010596
SV | -0.266341 | 0.151997 | 0.0797 . | 0.7661780 | -0.056227 .
(FB*FP) | 0.005091 | 0.003693 | 0.1680 | 1.0051038 | 0.0010747
JL | 0.002152 | 0.000154 | < 2e-16*** | 1.0021548 | 0.00045442***

Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1

The explanatory variables seem to have very little impact on the predictive power of the model: in model 1.5, apart from the control variable JL, only SV is significant, and only at the 10% level (p = 0.0797). In model 1.1, however, FP is also significant; it becomes insignificant once the control variable JL is added. This is examined further in the model improvements later in this chapter.
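Odds ratios and marginal effects follow directly from the logit coefficients. The OR column of table 11 can be reproduced as exp(β) up to rounding; for the ME column we assume average marginal effects, the mean of β·π_i(1 - π_i) over observations, since this section does not state which variant is reported. The probabilities in the example call are hypothetical.

```python
import math


def odds_ratio(beta):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(beta)


def average_marginal_effect(beta, probs):
    """Mean of beta * p_i * (1 - p_i), the slope dP/dx averaged over observations."""
    return sum(beta * p * (1.0 - p) for p in probs) / len(probs)


print(round(odds_ratio(-0.015289), 7))  # compare with 0.9848271 for FB in table 11
print(round(odds_ratio(-0.266341), 7))  # compare with 0.7661780 for SV in table 11

# FB coefficient from table 11 with hypothetical fitted probabilities.
print(round(average_marginal_effect(-0.015289, [0.55, 0.68, 0.74, 0.81]), 6))
```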

The estimation results of model 2.5 are shown in table 12. FP has a significant negative effect on SV (β = -1.234, p = 0.01333). This effect means that for every preroll touchpoint a customer touches, the probability that this customer will make a search engine visit decreases by 0.0051372%. Furthermore, FB has no significant effect on SV (p = 0.59743), and all control variables (Age, Gender and JL) are significant.

Table 12: Estimation results of model 2.5 from Dataset 2

Variable β Std.Error Sig.value OR ME


The synergy effect FP*FB is significant and positive (β = 0.03188, p = 0.03252), which suggests a positive synergy effect. Although this effect was hypothesized, it is surprising given the estimates of the individual variables that form the synergy variable, which are both negative and of which only one (FP) is significant.

To look further into this, a graph was built showing the effect of different values of FP and FB on the DV search engine visits. The Y-axis shows the predicted value of the DV and the X-axis shows the steps of FB*FP. Each line keeps FB constant while FP increases by 1 per step: for line 1 (FB = 1), X = 1 corresponds to FP = 1 and FB = 1, X = 2 to FP = 2 and FB = 1, and so on up to X = 5, where FP = 5. The graph shows that FP drives the negative effect on the DV (calculated while holding the other explanatory and control variables constant). As can be seen in figure 7, when FB = 5 the predicted value for search engine visits is still positive, but when FP changes from 1 to 2 it almost immediately becomes negative. It can therefore be concluded that the synergy effect can be positive, but only when FP and FB are kept low, with the effect of FP being the most pronounced. This shows that a positive synergy effect is possible while the separate effects are negative.

Figure 7: Interaction results with different levels of FP and FB for Dataset2

4.5.2 Assessing the binary models

Having discussed the interval models, this section turns to the results of the binary models. As shown in table 14, the model has more significant estimates when the data is recoded to binary level. However, this results in a loss of information, and for managerial implications it would generate less interesting insights, since the variables then only show whether a touchpoint occurred, without information about its frequency.


Table 14: Re-estimation results of model 1.5 from Dataset 1.2 (binary values)

Variable | β | Std.Error | Sig.value | OR | ME
Intercept | 0.6069854 | 0.0327090 | < 2e-16*** | - | -
FB | -0.5789244 | 0.1271942 | 5.33e-06*** | 0.5605009 | -1.3177e-01***
FP | -0.6507636 | 0.0896901 | 4.00e-13*** | 0.5216473 | -1.4809e-01***
FA | -0.3785199 | 0.0904138 | 2.83e-05*** | 0.6848743 | -8.4162e-02***
FE | -0.2572119 | 0.1192488 | 0.03101* | 0.7732044 | -5.6450e-02*
SV | 0.5109285 | 0.1660510 | 0.00209** | 1.6668381 | 9.6571e-02 ***
(FB*FP) | 0.6248770 | 0.1903025 | 0.00102** | 1.8680162 | 1.1512e-01***
JL | 0.0020907 | 0.0001546 | < 2e-16*** | 1.0020929 | 4.3974e-04***

Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1

As the estimates show, recoding the explanatory variables to binary values gives more significant results to interpret. However, odds ratios and marginal effects of binary variables are interpreted differently from those of the numerical variables explained above. The OR of FB (0.5605009) means that when FB changes from 0 to 1, the odds of WV = 1 decrease by a factor of 1.78411 (1/0.5605009). For binary variables, the ME is interpreted as follows: the marginal effect of SV on WV is 0.096571, meaning that the probability of observing WV = 1 increases by 0.096571 when SV changes from 0 to 1.
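For a binary predictor the marginal effect is a discrete change: the average difference in predicted probability between setting the predictor to 1 and to 0. A minimal sketch, assuming the ME column in table 14 reports such average discrete changes (not stated in this section); the linear-predictor values in the example are hypothetical, while β = 0.5109285 for SV and the OR of 0.5605009 for FB are taken from table 14.

```python
import math


def inv_logit(eta):
    """Map log-odds to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def discrete_change(beta_x, etas_without_x):
    """Average change in P(Y = 1) when a binary predictor switches from 0 to 1.

    etas_without_x holds each observation's linear predictor with the predictor set to 0.
    """
    diffs = [inv_logit(eta + beta_x) - inv_logit(eta) for eta in etas_without_x]
    return sum(diffs) / len(diffs)


# FB in table 14: OR = 0.5605009, so the odds are divided by 1 / 0.5605009.
print(round(1 / 0.5605009, 4))

# SV coefficient from table 14 with hypothetical linear-predictor values.
print(round(discrete_change(0.5109285, [0.2, 0.6, 1.0, 1.4]), 4))
```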

All variables are now significant. This does not mean that the model is better for interpreting the hypotheses, since the direction of the significant effects can still be wrong.

The same re-estimation is done for model 2.5 of Dataset2; the results are shown in table 15 in Appendix I. This model has the same directions and significance levels as when it is assessed at the interval level.

4.5.3 Assessing the control variable


touching a touchpoint is 8.04 times lower when JL = middle compared to JL = long, whereas the odds are 528.54 times lower when JL = short compared to JL = long.

In conclusion, hypothesis testing continues with the binary model, with journey length categorized, for both Dataset 1.2 and Dataset 2. The reason is that more significant variables can now be interpreted, which makes it possible to draw conclusions and discuss the results, in contrast to the insignificant results of the interval model of Dataset 1.2. Furthermore, because a mediation effect is hypothesized through the combination of both datasets and models, the data has to be on the same level; otherwise it becomes hard to interpret whether the mediation effect exists and, if it does, what the effect is.
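The recoding to the binary model can be sketched as follows. Counts become 0/1 indicators, and journey length becomes two dummies with "long" as the reference category, matching the JLshort and JLmiddle rows of table 17. The cut-offs short_max and middle_max are hypothetical, since the actual category boundaries are not given in this section. For non-negative counts, the product of two indicators equals the indicator of the product, so FB*FP can be built either way.

```python
def to_binary(count):
    """1 if the touchpoint occurred at least once in the journey, else 0."""
    return 1 if count > 0 else 0


def jl_dummies(length, short_max, middle_max):
    """Dummy-code journey length; 'long' (length > middle_max) is the reference category.

    short_max and middle_max are hypothetical cut-offs.
    """
    if length <= short_max:
        return {"JLshort": 1, "JLmiddle": 0}
    if length <= middle_max:
        return {"JLshort": 0, "JLmiddle": 1}
    return {"JLshort": 0, "JLmiddle": 0}


journey = {"FB": 3, "FP": 0, "JL": 140}  # hypothetical journey
recoded = {"FB": to_binary(journey["FB"]), "FP": to_binary(journey["FP"])}
recoded["FBxFP"] = recoded["FB"] * recoded["FP"]
recoded.update(jl_dummies(journey["JL"], short_max=50, middle_max=200))
print(recoded)
```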

4.6 Hypotheses testing

The final model estimates of both datasets are shown in table 17. H1-4 suggest a positive relation between the FITs and CIT website visits, which is not the case. The effects are significant, but negative (H1; β = -0.6441, H2; β = -0.5385, H3; β = -0.3644, H4;


Table 17: Final model with improved model results of both Datasets

Dataset1.2 (DV = website visits)

Variable | β | Std.Error | Sig.value | OR | ME
Intercept | 1.71333 | 0.06517 | <2e-16*** | - | -
FB | -0.53851 | 0.12667 | 2.12e-05*** | 0.5836161 | -0.12356***
FP | -0.64418 | 0.09040 | 1.03e-12*** | 0.5250941 | -0.14825***
FA | -0.36443 | 0.09026 | 5.40e-05*** | 0.6945954 | -0.08201***
FE | -0.24484 | 0.11974 | 0.04089* | 0.7828311 | -0.05442*
SV | 0.55414 | 0.16494 | 0.00078*** | 1.7404495 | 0.10589***
(FB*FP) | 0.51678 | 0.19084 | 0.00677** | 1.6766258 | 0.09967***
JLshort | -1.41566 | 0.07859 | <2e-16*** | 0.2427664 | -0.3258***
JLmiddle | -0.88829 | 0.07156 | <2e-16*** | 0.4113601 | -0.1879***

Dataset2 (DV = search engine visits)

Variable | β | Std.Error | Sig.value | OR | ME
Intercept | -1.795801 | 0.183075 | < 2e-16*** | - | -
FB | -0.215769 | 0.420396 | 0.6078 | 0.80592170 | -0.0006522
FP | -2.541602 | 1.003375 | 0.0113* | 0.07874017 | -0.003252**
Age | -0.017409 | 0.003267 | 9.91e-08*** | 0.98274157 | -0.000058**
G | -0.201607 | 0.094784 | 0.0334* | 0.81741619 | -0.000697 .
(FB*FP) | 2.068485 | 1.234573 | 0.0938 . | 7.91282904 | 0.022393
JLshort | -6.270067 | 1.001100 | 3.77e-10*** | 0.00189210 | -0.02171***
JLmiddle | -2.084562 | 0.113581 | <2e-16*** | 0.12436161 | -0.00745***

Signif. codes: 0 '***' 0,001 '**' 0,01 '*' 0,05 '.' 0,1 ' ' 1


An overview of the hypotheses and their support status according to this research is presented in table 18.

Table 18: Overview of hypotheses

Hypothesis | Description | Direction | Support
1 | Preroll has a positive effect on website visits. | - (sig.) | Not Supported
2 | Bannering has a positive effect on website visits. | - (sig.) | Not Supported
3 | Affiliate has a positive effect on website visits. | - (sig.) | Not Supported
3.B | Affiliate has a stronger positive effect on website visits than preroll and bannering. | - | Not Supported
4 | Email has a positive effect on website visits. | - (sig.) | Not Supported
4.B | Email has a stronger positive effect on website visits than preroll and bannering. | - | Not Supported
5 | Preroll and bannering relate synergistically to positively affect search engine visits. | + (sig.) | Supported
6 | Preroll and bannering relate synergistically to positively affect website visits. | + (sig.) | Supported
7 | Search engine visits have a positive effect on website visits. | + (sig.) | Supported
8 | Search engine visits partially mediates the positive effect of preroll on website visits. | - (sig.) | Not Supported
9 | Search engine visits partially mediates the positive effect of bannering on website visits. | - | Not Supported
