Diving deeper into the Customer Journey: The Dynamic and Timing of Retargeting

Academic year: 2021


By

A.C. Westendorp

S2745739

Faculty of economics and business

University of Groningen

Date of submission: 12 January 2020

Supervisor:

P.S. Van Eck

Contact information: A.C.Westendorp@student.rug.nl


Abstract

Consumers are exposed to an ever-growing number of advertisements. Insight into online marketing activities and their effectiveness has therefore become more important and relevant. Previously, the effectiveness of the touchpoints in the customer journey was measured with the last-click model. More recently, more sophisticated and precise models have been developed, which include the dynamics between the touchpoints. Nevertheless, these models are unable to look into the specific dynamics, so it is hard to determine their true nature and to base strategic decisions on them. This study focuses on one specific dynamic regarding the touchpoint of retargeting. Retargeting should lead to brand awareness and advertisement recall, which could serve as input for subsequent search queries in the customer journey. Both retargeting and focal search queries should lead to more website traffic, with the effect of retargeting partly mediated by the focal searches. The timing of retargeting also seems important: it is expected to be more effective in the later stages, where preferences are more narrowly construed. The results of this study could not establish that such a dynamic is present. They do show that timing is important for the direct relation between retargeting and website traffic: using retargeting in the later stages increases its effectiveness. In contrast, this moderating effect is not found for the spillover to focal search queries. Finally, future research directions are given to look deeper into the true nature of the proposed dynamic and uncover its effect.


Introduction

With the growing number of online marketing messages, firms struggle to gain consumers’ attention (Cho & Cheon, 2004). Some even speak of an information overload for consumers (Cheung, Kwok, Law & Tsui, 2003). To overcome this overload, marketing messages need to be (more) personally relevant. This can be done by incorporating the browsing behaviour or personal information of the consumer into these messages (Bleier & Eisenbeiss, 2015). One way to incorporate such behaviour is to use retargeting advertisements (further referred to as retargeting). Retargeting is a special form of banner advertising in which previously viewed products (which did not lead to a conversion) are incorporated into the message. By showing these products again and making them salient, it helps the consumer to identify the products he/she is, or at least was, interested in.

Making messages more personally relevant increases their effectiveness. Previous marketing literature stated that retargeting is six times more effective than general display/banner advertising, resulting in much enthusiasm for the use of retargeting among managers (Hargrave, 2011). However, several authors (e.g. Lambrecht & Tucker, 2013; Bleier & Eisenbeiss, 2015) found boundary conditions and moderators for the effectiveness of retargeting, meaning that retargeting is not always the best choice of advertisement. In fact, retargeting could even backfire in several situations, so a manager cannot simply always use it and caution is needed when implementing it as an online marketing activity. It can be perceived as annoying or irrelevant when the wrong products are shown, or as intrusive since it uses personal or browsing information (Bleier & Eisenbeiss, 2015). All of this makes the optimal timing, dynamics and boundary conditions of retargeting more relevant.

Retargeting is seldom the main focus of researchers. Many recent studies focus on the overall online customer journey and new types of attribution models (e.g. Anderl, Becker, Wangenheim & Schumann, 2016; Xu, Duan & Whinston, 2014). Others focus on other marketing activities such as display advertisements or search queries (e.g. Kireyev, Pauwels & Gupta, 2016; Blake, Nosko & Tadelis, 2015).

Following up on the former, Anderl et al. (2016) focused on the overall journey, finding that retargeting deserves a higher attribution when dynamics in the customer journey are considered. Including dynamics doubles the effectiveness of retargeting compared with the direct effects measured by last-click models. However, they were unable to focus on specific dynamics. Basing the full attribution on general models and effects can be harmful, since the effects depend on context. Knowing where these interactions originate makes it easier to understand them and to adjust the advertising strategy accordingly. So, the questions of why retargeting is under-attributed by last-click models, and which touchpoint receives the ‘profits’ of this under-attribution, remain unanswered.

Authors focusing on specific touchpoints found interesting indirect effects of display advertisements on conversion, indicating that these touchpoints are more important than initially thought. For example, Kireyev et al. (2016) concluded that display advertisements positively affected search queries, which led to increased conversion.

[...] decreasing its direct effect. This dynamic could explain the under-attribution of retargeting found by Anderl et al. (2016).

Next to the dynamics regarding retargeting in the customer journey, the timing of retargeting also seems important. Lambrecht and Tucker (2013) found that the further the consumer was in the purchase funnel, the more effective retargeting was in leading to sales. However, it is unclear whether this effect still stands when the proposed dynamics are present, and whether the stage in the funnel also moderates the relation between retargeting and other (positive) outcomes. The question also arises to what extent the relation found by Lambrecht and Tucker (2013) holds when dynamics are accounted for.

Knowing how retargeting affects the customer journey, how it leads to sales and what its optimal timing is, is especially relevant for managers. Hargrave (2011) states that retargeting could be more effective than similar, but more general, types of online advertisements such as display advertisements. Gaining insight into retargeting and its dynamics makes it easier for managers to make strategic decisions regarding this touchpoint, for example decisions about attribution or about which touchpoint to optimize in order to gain more website traffic or sales.

Central to this research is the question: what is the dynamic effect of retargeting on generating website traffic, and which moderators play an important role? This study has a threefold goal: identifying the dynamic regarding retargeting, testing the effect and size of this dynamic, and identifying when retargeting is more effective by including meaningful moderators. In this study, the words dynamic, spillover and indirect effect have the same meaning. The next chapter reviews the previous literature on retargeting, including the precise indirect effect of retargeting. It is followed by the methodology, data analysis and results, which serve to confirm or reject the hypotheses. Lastly, the discussion answers the research question and leads into the conclusion, where everything is summarized.

Literature review

Recently, more insights into the effectiveness and the dynamics or synergies in the customer journey have been uncovered (Anderl et al., 2016; Klapdor, Anderl & Schumann, 2015), largely thanks to new attribution models. Previously, the last-click attribution model was the industry standard for measuring the effectiveness of different touchpoints. These days, however, the biases of the last-click attribution model are coming to light, resulting in a different attribution for the touchpoints than initially thought. These attributions are especially important for allocating budgets and identifying which touchpoints lead to sales in the customer journey. Most studies of the customer journey share the goal of describing the journey and understanding the customer’s choice of touchpoints, and the effectiveness of those touchpoints, across the purchase phases. These insights are derived by looking at how customers interact with multiple touchpoints and how they move through the purchase funnel (Verhoef, Kooge & Walk, 2016). Leading papers in this domain include Anderl et al. (2016), Li and Kannan (2014) and De Haan, Wiesel and Pauwels (2016), who all use different types of attribution models within the online customer journey to explain certain consumer behaviour or to map the online customer journey.
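As a minimal illustration of the last-click rule discussed above, the sketch below gives the full credit of each converting journey to its final touchpoint. The journeys and touchpoint names are invented for illustration and are not taken from the data set of this study; the function name `last_click_attribution` is likewise hypothetical.

```python
from collections import Counter

def last_click_attribution(journeys):
    """Credit each converting journey entirely to its final touchpoint."""
    credit = Counter()
    for touchpoints, converted in journeys:
        if converted and touchpoints:
            credit[touchpoints[-1]] += 1
    return credit

# Hypothetical journeys: (ordered touchpoints, did the journey convert?)
journeys = [
    (["display", "retargeting", "search"], True),
    (["retargeting", "search"], True),
    (["search"], False),
    (["retargeting", "direct"], True),
]

credit = last_click_attribution(journeys)
print(dict(credit))
```

In this toy example retargeting appears in every converting journey but receives no credit, because it is never the final touchpoint. This is the kind of bias the newer attribution models try to correct.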

[...] effectiveness to these channels when it is merely used as a navigational tool. This, in turn, leads to under-attribution of emails, display advertisements and referrals in their study. Xu, Duan and Whinston (2014) also address the underestimation of display advertisement and retargeting and the overestimation of search advertisements, using a new way of calculating the attribution compared with the commonly used last-click model. The direct effect of display advertisement is low. Nevertheless, it stimulates subsequent visits through other touchpoints, leading indirectly to a conversion. In addition, Anderl et al. (2016) found that retargeting, together with other touchpoints, was under-attributed when comparing several Markov models with last-click attribution, and they reproduced these results with two different data sets.

Abhishek, Fader and Hosanagar (2015) also address the importance of dynamics in the purchase funnel. By dividing the purchase funnel into different stages, they find that display advertisements are more effective than first thought. However, the effect is indirect: it moves the consumer forward in the purchase funnel instead of immediately leading to a purchase. Much of the literature on the customer journey and the new attribution models concludes that there are dynamics in the customer journey, meaning that some touchpoints seem unimportant but are in fact of great importance due to their indirect effects (e.g. Anderl et al., 2016; De Haan et al., 2016; Blake et al., 2015). These insights are the starting point for addressing specific touchpoints within the funnel, to obtain more detail about the customer journey and the dynamics and interactions therein.

Whereas Anderl et al. (2016) and Xu et al. (2014) keep their focus on attribution and general results, Kireyev, Pauwels and Gupta (2016) dive deeper into specific touchpoints and their dynamics. They focus on the effect of display advertisements on search engine queries, finding that display advertisement indeed influences conversion indirectly, by increasing conversion through search queries two weeks later. Drawing on Lecinski (2011), Kireyev et al. (2016) conclude that these dynamics are stronger for products that take longer to consider, such as financial investments, home purchases, cars and technology products. Fast-moving consumer goods, or consumer-packaged goods, are less likely to show this dynamic, since they generally have a shorter journey and are purchased on the same day, leaving little to no room for such dynamic effects.

The conclusion of Kireyev et al. (2016) provides a good argument against the common scepticism about the effectiveness of banner advertisements. Sceptics argue that advertisements should prove themselves in terms of click-through rate and/or conversions, whereas many banner advertisements lack a good click-through rate or conversion (Sherman & Deighton, 2001). In response, Ilfeld and Winer (2001) address the importance of on- and offline media in generating website traffic, which increases the probability of a conversion. The mechanism that generates this website traffic (Ilfeld & Winer, 2001) or increases the search queries (Kireyev et al., 2016) is supposedly increased brand awareness and advertisement recall (Drèze & Hussherr, 2003). Seeing and recalling the specific retargeted advertisement could increase brand and product awareness, making the product more salient in the consideration set. Consciously or unconsciously recalling the advertisement or brand can serve as input for the search query (Li & Kannan, 2014).

[...] like Kireyev et al. (2016) found in their study with display advertisements. Searches are typically over-attributed by last-click models (e.g. Li & Kannan, 2014; Xu et al., 2014; Anderl et al., 2016), whereas retargeting is under-attributed (Anderl et al., 2016). Possibly, search queries capture some of the direct effect of retargeting, making retargeting more effective than initially thought. The increase in search queries will in turn lead to increased website traffic, which leads to more focal sales (Ilfeld & Winer, 2001). While the direct effect of retargeting on website traffic will still hold, it will be diminished once the mediation by focal searches is included.

This all leads to the first set of hypotheses:

Hypothesis 1a: Retargeting has a positive relation with the focal site visits

Hypothesis 1b: Retargeting has a positive relation with the focal search queries

Hypothesis 1c: The focal search queries have a positive relation with the focal site visits

Hypothesis 1d: The direct relation of retargeting on focal site visits (H1a) is partially mediated by the focal search queries, diminishing the direct effect

Nevertheless, retargeting is tricky to implement. Bleier and Eisenbeiss (2015) studied retargeting and concluded that it could backfire due to the use of personal or browsing information. The use of this information could be felt as a violation of the consumer’s privacy. They find that trust in the company moderates the effect of retargeting on the click-through rate. This means that personalization is not always more effective and that the dynamic described earlier could have boundary conditions. Finding the conditions under which retargeting is more or less effective could help in implementing retargeting and in finding the right consumers to retarget, thereby optimizing its effectiveness.

Lambrecht and Tucker (2013) argued that retargeting is more effective if the consumer is past the browsing stage and has a limited consideration set. In that case the products shown in the retargeted advertisements are more salient and more closely reflect the consumer’s actual choices. In their study they also found that generic display advertisements are more effective in the earlier stages of the purchase funnel, since they match the consumer’s preferences better than retargeted advertisements do. Preferences in the earlier stages are more general, so general display advertisements work best. In the later stages alternatives are weighed (Abhishek et al., 2015), so advertisements for specific products (retargeting) will be more effective in these stages.

[...] matches the consumers’ preferences in this stage, making the advertisement more relevant for the consumer (Lambrecht & Tucker, 2013).

Lambrecht and Tucker (2013) already tested the moderation of the consideration stage on the direct effect of retargeting on sales. This study will try to replicate their results (with another, but similar, dependent variable) and investigate to what extent this effect holds. Does the place in the purchase funnel also influence the dynamic, or only the direct effect? In setting up the hypotheses, it is assumed that the effect also holds for the effect of retargeting on search queries. The stage a consumer is in determines the level of salience of specific brands and/or products. This is also meaningful for the proposed dynamic effect, since it relies on brand/product awareness and advertisement recall. When the advertisement matches the products the consumer is considering, it will be remembered better. This better recall will increase the effect of retargeting on search queries, resulting in the following hypotheses:

Hypothesis 2a: Being further down in the consideration stage increases the effect of retargeting on focal site visits

Hypothesis 2b: Being further down in the consideration stage increases the effect of retargeting on focal search queries

All hypotheses lead to the conceptual model shown in figure 1. The dotted lines represent a possible relation, but due to the aggregation of the data set, it is not possible to check for this relation in the full model.

Figure 1: Conceptual model for this study (elements: Retargeting, Focal Searches, Focal Site Visits, Consideration Stage). Note: all arrows in the model depict a positive relation.

Methodology

To test the hypotheses in the conceptual model, data is analysed with the statistical programming language R (the script can be found in the appendices). The data set is provided by a large marketing analytics firm and contains data of a travel agency. The analytics firm made it possible to monitor and report the whole customer journey of individuals. The IDs of the individuals could also be matched with available demographic data. Combining everything gives a very broad, almost complete view of the online customer journey. Even mobile devices are tracked and included in the data set; these are typically excluded because cookies cannot be matched across platforms. Consumers deleting cookies is not harmful for this data set, since the information is retrieved through a browser extension/plug-in and a mobile app. The data set includes all the browsing behaviour of the consumer regarding the booking of a trip, including price comparison websites, competitor websites, the focal brand website, accommodation sites, plane ticket sites and the focal company’s marketing activities. Besides the websites themselves, the search queries and app uses for all these websites are tracked and included in the data set.

In total, 9678 participants were tracked during a time period of 17 months (June 2015 to October 2016), resulting in 3674 purchases of which 192 were at the focal company. For 1603 of these 9678 participants it was not possible to find and match their demographics with other available data sources, resulting in missing demographic information. Some descriptives of the touchpoints used in the conceptual model are given in table 1 below.

Originally, the data set is at touchpoint level. To include demographics in the analysis as control variables, the data set needs to be aggregated to customer level. Aggregation at customer level is chosen over journey level because the proposed effects are believed to carry over to the next journey. For example, if a customer is retargeted in the first journey, he/she can start the second journey with a search for the focal site because he/she (un)consciously recalls the retargeting advertisement. The effect therefore does not end when the journey ends. Still, some problems arise with this level of aggregation, especially regarding the timing and sequencing of the different touchpoints, which is crucial for establishing causality.

The causality problem will be investigated at the end of the analysis. The focal searches will be split into searches that occurred before and after the first retargeting exposure. Searches that happened before the first time a consumer is retargeted cannot be caused by the retargeting itself and must have some other origin. The searches after the first retargeting exposure can (but do not have to) be caused by retargeting, which will be tested. It would be naïve to think that this division solves the problem. If there is reverse causality, the focal searches after the first retargeting exposure can still cause the second and subsequent retargeting exposures, so the division solves only part of the problem. Nevertheless, it is a first step towards establishing causality, because it excludes the instances that cannot have been caused by the independent variable. In addition, testing whether the searches before retargeting are correlated with retargeting gives a first indication of the causality between these concepts.
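The before/after split described above can be sketched as follows. This is only an illustration in Python (the thesis analysis itself was done in R), and the event format, the touchpoint labels `"retargeting"` and `"focal_search"`, and the function name are assumptions. Searches of customers who were never retargeted are counted as "before", which is also an assumption of this sketch.

```python
from collections import defaultdict

def split_focal_searches(events):
    """Count focal searches per customer before and after the first retargeting exposure.

    `events` is an iterable of (customer_id, timestamp, touchpoint) tuples.
    """
    by_customer = defaultdict(list)
    for customer_id, timestamp, touchpoint in events:
        by_customer[customer_id].append((timestamp, touchpoint))

    result = {}
    for customer_id, rows in by_customer.items():
        rows.sort()  # chronological order within the customer
        first_ret = next((t for t, tp in rows if tp == "retargeting"), None)
        before = after = 0
        for t, tp in rows:
            if tp != "focal_search":
                continue
            if first_ret is not None and t > first_ret:
                after += 1   # could (but need not) be caused by retargeting
            else:
                before += 1  # cannot be caused by retargeting
        result[customer_id] = {"before": before, "after": after}
    return result

# Hypothetical touchpoint-level log
events = [
    ("a", 1, "focal_search"), ("a", 2, "retargeting"), ("a", 3, "focal_search"),
    ("a", 4, "retargeting"), ("a", 5, "focal_search"),
    ("b", 1, "focal_search"), ("b", 2, "focal_search"),
]
split = split_focal_searches(events)
print(split)
```

Note that customer "a" is retargeted a second time after a search, which is exactly the reverse-causality case that the split cannot rule out.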

[...] implicit assumption is made that a consumer cannot move back once the consideration stage is entered, only forward, although in reality moving back is possible.

This split and aggregation make it hard to determine the direct effect of the consideration stage on the focal site visits. Since a customer moves through different stages, there cannot be a single stage per customer row. However, the aggregation is made at customer level because of the carry-over effect between journeys and the inclusion of demographics. Adding the information site as a variable in the models could help; however, only the first visit is used to determine the stage, and the number of times someone visited the information site does not affect the stage. This problem could be solved with another level of aggregation, but that would conflict with the carry-over effect and the inclusion of demographics explained earlier. Therefore, this direct effect is not addressed in the analysis.

To test the hypotheses, different types of tests are used. For most of the hypotheses a normal regression will suffice, since the mean of the outcome variable is quite high. Figure 2 shows the frequency distribution of the focal search variable; together with table 1 it shows that this variable has a maximum of 16 searches. Arguably, this variable is better modelled with a count model than with a normal regression. Count models are generally used when an outcome variable has a low mean (0.0552) and low variance (SD = 0.520 / var = 0.72) (Gourieroux, Monfort & Trognon, 1984). Different kinds of Poisson models could fit the data. In particular, the negative binomial (NB) regression seems appropriate, since the mean is much smaller than the variance. Which model is most effective and best suited to testing the hypotheses will be determined in the results section.

Figure 2: Histogram of the focal search variable
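The choice between a Poisson and a negative binomial model rests on comparing the mean and the variance of the count variable. A minimal sketch of that check is given below; the counts are made up to resemble a variable with many zeros and a maximum of 16, and are not the thesis data.

```python
from statistics import mean, pvariance

def dispersion_ratio(counts):
    """Return (mean, variance, variance/mean) for a list of counts.

    A ratio well above 1 indicates overdispersion, which a plain
    Poisson model (variance equal to the mean) cannot capture.
    """
    m = mean(counts)
    v = pvariance(counts)
    return m, v, v / m

# Hypothetical focal-search counts: mostly zeros, a few high values
counts = [0] * 95 + [1, 1, 2, 4, 16]
m, v, ratio = dispersion_ratio(counts)
print(f"mean={m:.3f} variance={v:.3f} ratio={ratio:.1f}")
```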

For the other variables of interest (retargeting and focal site visits) other problems arise. A few observations have very high counts on one or both of the variables. One instance, for example, has over 1600 retargeting exposures and over 10,000 focal site visits. These seem to be clear outliers; however, it is not known how they occurred. Simply deleting them could amount to tampering with the data. Therefore, models are estimated both with and without the outliers, to show the impact of the outliers on the performance and the estimates of the model.
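The idea of estimating the same model with and without the outliers can be illustrated with a simple one-predictor least-squares slope. The data and the cut-off of 1000 exposures below are hypothetical and only serve to show how a single extreme observation can dominate an estimate; the thesis models themselves include more variables.

```python
def ols_slope(xs, ys):
    """Slope of a simple least-squares regression of ys on xs."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    return sxy / sxx

# Hypothetical customers: the last one mimics an extreme outlier
retargeting = [0, 1, 2, 3, 4, 5, 1600]
site_visits = [1, 3, 5, 7, 9, 11, 10000]

slope_all = ols_slope(retargeting, site_visits)

# Re-estimate without observations above an (assumed) cut-off
keep = [i for i, x in enumerate(retargeting) if x < 1000]
slope_trimmed = ols_slope([retargeting[i] for i in keep],
                          [site_visits[i] for i in keep])

print(f"with outlier: {slope_all:.2f}, without outlier: {slope_trimmed:.2f}")
```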

[...] searches), path b is the effect of the mediator (focal searches) on the dependent variable (focal site visits), path c is the direct effect of the independent variable (retargeting) on the dependent variable (focal site visits) and, finally, path c’ is the direct effect of the independent variable (retargeting) on the dependent variable (focal site visits) with the mediator (focal searches) added to the model. All the paths are shown graphically in figure 3. Although the moderators are left out to keep this visual representation simple, they will be considered in the analysis. In addition to these paths, Baron and Kenny (1986) set out conditions that should be tested: the independent variable significantly explains variance in the mediator (path a); variation in the mediator significantly explains variance in the dependent variable (path b); and, when the previous two relations are controlled for, the previously significant relation between the independent variable and the dependent variable (path c) is reduced to zero (full mediation) or at least becomes smaller (partial mediation).

Figure 3: Graphical representation of the paths defined by Baron and Kenny (1986). Panels: Retargeting → Focal Site Visits (path c); Retargeting → Focal Searches → Focal Site Visits, with direct path c’.
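The logic of the Baron and Kenny (1986) paths can be sketched with ordinary least squares on a small synthetic data set. This is a simplified illustration under stated assumptions: it uses linear regressions for every path (the thesis uses count models for path a), omits significance tests, moderators and covariates, and the data and the helper `ols` are invented for this example.

```python
def ols(columns, y):
    """Least-squares coefficients [intercept, b1, b2, ...] via the normal equations."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    # Augmented normal equations (X'X | X'y)
    A = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
         + [sum(X[r][i] * y[r] for r in range(n))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[p] = A[p], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][k] / A[i][i] for i in range(k)]

# Hypothetical data (not from the thesis data set)
x = [0, 1, 2, 3, 4, 5, 6, 7]                        # retargeting exposures
e1 = [0.3, -0.2, 0.1, -0.4, 0.2, 0.0, -0.1, 0.1]    # noise in the mediator
e2 = [0.1, -0.3, 0.2, 0.0, -0.1, 0.3, -0.2, 0.0]    # noise in the outcome
m = [0.5 * xi + e for xi, e in zip(x, e1)]           # focal searches (mediator)
y = [1.0 * xi + 2.0 * mi + e for xi, mi, e in zip(x, m, e2)]  # focal site visits

a = ols([x], m)[1]               # path a: retargeting -> focal searches
c = ols([x], y)[1]               # path c: retargeting -> site visits, no mediator
_, c_prime, b = ols([x, m], y)   # path c' (retargeting) and path b (mediator)

print(f"a={a:.3f} b={b:.3f} c={c:.3f} c'={c_prime:.3f}")
```

With linear models estimated on the same sample, the total effect decomposes exactly as c = c’ + a·b, so partial mediation shows up as c’ being smaller than c. This identity does not carry over unchanged to the count models used for path a in the thesis.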

Results

The assumptions of Baron and Kenny (1986) are tested before the full model is discussed. After the full model is set up, different data sets are used to see the effect of outliers, and finally a model is estimated with improved causality. Only the last model will be interpreted; the other models are included in the tables to show the differences between the models.

The general, full model is specified below. The focal site count for consumer i is:

(1) $\mathit{FocalSiteCount}_i = \alpha + \beta_1 \mathit{RetAfterInfo}_i + \beta_2 \mathit{RetxConsStage}_i + \beta_3 \mathit{FocalSearches}_i + \beta_j X_{ij} + \varepsilon_i$

where $X_{ij}$ is a vector of covariates serving as control variables, defined in equation (1.1).

The vector consists of the covariates that were checked and retained; they are specified below. Other covariates were also checked but did not influence the results:

(1.1) $\beta_j X_{ij} = \beta_1 \mathit{GenericSearches}_i + \beta_2 \mathit{CompetitorSiteCount}_i + \beta_3 \mathit{Age}_i$

Different models are tested to see the general effect of retargeting and the interaction effect of retargeting and the consideration stage. Equation (2) is a more specific version of equation (1). Here the retargeting variable is split so that it contains the proxy for being in the consideration stage. This makes it possible to test hypotheses 2a and 2b. The focal site count for consumer i is defined as:

(2) $\mathit{FocalSiteCount}_i = \alpha + \beta_4 \mathit{RetAfterInfo}_i + \beta_5 \mathit{RetBeforeInfo}_i + \beta_6 \mathit{FocalSearches}_i + \beta_j X_{ij} + \varepsilon_i$

where $X_{ij}$ is a vector of covariates serving as control variables, defined in equation (1.1).

The following equations also play a role in the model. The following betas should all be significant in order to interpret the mediating effect: $\beta_7$ and $\beta_9$, and/or $\beta_8$ and $\beta_{10}$. These betas correspond to the assumptions for the earlier defined path a ($\beta_7$ and $\beta_8$) and path c ($\beta_9$ and $\beta_{10}$). If these assumptions hold, the mediating effect is determined by comparing $\beta_4$ with $\beta_9$ and $\beta_5$ with $\beta_{10}$.

(3) $\ln(\mathit{FocalSearchCount}_i) = \alpha + \beta_7 \mathit{RetAfterInfo}_i + \beta_8 \mathit{RetBeforeInfo}_i + \beta_j X_{ij} + \varepsilon_i$

(4) $\mathit{FocalSiteCount}_i = \alpha + \beta_9 \mathit{RetAfterInfo}_i + \beta_{10} \mathit{RetBeforeInfo}_i + \beta_j X_{ij} + \varepsilon_i$

where $X_{ij}$ is a vector of covariates serving as control variables, defined in equation (1.1).

Path a – The effect of retargeting on focal searches

First, path a is tested, which is the effect of retargeting (independent variable) on focal searches (mediator). The simplest model contains only these two variables; however, it is reasonable to assume that some meaningful covariates also influence the variable of interest. To make meaningful comparisons and include the right variables, several models are estimated to test which is best. The results of the Poisson models are reported in table 2. The estimates in the table are exponentiated, so that they can be interpreted as multipliers: estimates above 1 indicate a positive effect and estimates below 1 a negative effect. Since the mean (0.052) of the outcome variable is lower than its variance (0.720), which indicates overdispersion, a negative binomial (NB) regression seems a better fit. Therefore model 5 in table 2 is estimated as a negative binomial model. The models specified here are different versions of equation (3).

[...] set, this leads to a 2.6% increase. At first glance this seems quite low, but since these numbers are multipliers, they compound quickly. For example, being retargeted 50 times (in the consideration stage) leads to a multiplier of $1.010^{50} \approx 1.64$, i.e. a 64% increase in searches. For retargeting before the consideration stage this effect is even bigger.
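The compounding of a multiplicative estimate over repeated exposures can be checked numerically. The sketch below assumes a per-exposure multiplier of 1.010 and 50 exposures, which are the values used in the example above; the function name is hypothetical.

```python
def compounded_increase(multiplier, exposures):
    """Percentage increase implied by applying a per-exposure multiplier repeatedly."""
    return (multiplier ** exposures - 1.0) * 100.0

increase = compounded_increase(1.010, 50)
print(f"{increase:.1f}% increase after 50 exposures")
```

Because the estimates are multipliers, the effect of n exposures is the multiplier raised to the power n, not n times the per-exposure percentage (which would give only 50%).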

To test hypothesis 2b, the standardized betas should clarify both the effects and their size in the NB model (5) in table 2. Model 5 is expected to be the best model to test the hypothesis with. The standardized coefficients for model 5 are: $\beta_7$ (RetAfterInfo) = 0.356 (s.d. = 0.063), or exp(0.356) = 1.428, and $\beta_8$ (RetBeforeInfo) = 0.348 (s.d. = 0.059), or exp(0.348) = 1.416. Given these standard deviations, the two coefficients do not appear to differ from each other, so their effect on generating searches seems to be the same. This leads to the rejection of hypothesis 2b. Other models are tested further on; however, this does not lead to (big) changes in the ratio between these two variables.
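The comparison of the two coefficients can be made explicit with a rough z-statistic for their difference. This is a sketch, not the test used in the thesis: it treats the reported s.d. values as standard errors and assumes the two estimates are uncorrelated, which is generally not exactly true for coefficients from the same model.

```python
import math

def z_difference(b1, se1, b2, se2):
    """z-statistic for b1 - b2, assuming the two estimates are uncorrelated."""
    return (b1 - b2) / math.sqrt(se1 ** 2 + se2 ** 2)

b_after, se_after = 0.356, 0.063     # RetAfterInfo, model 5
b_before, se_before = 0.348, 0.059   # RetBeforeInfo, model 5

z = z_difference(b_after, se_after, b_before, se_before)
p_two_sided = math.erfc(abs(z) / math.sqrt(2))  # normal approximation

print(f"exp(b_after)={math.exp(b_after):.3f}, exp(b_before)={math.exp(b_before):.3f}")
print(f"z={z:.3f}, two-sided p={p_two_sided:.3f}")
```

Under these assumptions the z-statistic is far below conventional critical values, in line with the conclusion that the two effects cannot be distinguished.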

Path b – The effect of focal searches on focal site visits

The next path (b) assumes that the mediator has an influence on the dependent variable. Here, too, some control variables are used to improve the model and check for covariates. Next to the mediator, the independent variable is also included. Afterwards, the retargeting estimate from this model (path c’) can be compared with path c, so that the (reduced) effect of adding the mediator is captured. The different models for path b are shown in table 3 below. Models 1 and 2 are specified according to the earlier defined equation (1), and models 3 and 4 correspond to equation (2).

[...] the average consumer that has searched and is being retargeted after visiting an information website this means an increase of 309 visits on the focal website. It should be noted that excluding the age variable (and therefore including the 1603 extra observations) seems to have a big impact on the size of the focal search estimate.

Path c – The direct effect of retargeting on focal site visits

Finally, path c is considered, which is the direct effect of the independent variable on the dependent variable without the inclusion of the mediator. To establish the mediation effect, the beta of retargeting in path c should be compared with the beta of retargeting in path c’ (given in table 3, path b). If there is a mediation effect, the beta in path c’ should be (significantly) lower than the beta in path c. To obtain the right estimates, some covariates are added to the models. The models specified here are versions of equation (4).

Before the mediation effect can be interpreted, the last assumption should be considered, which is the direct effect of retargeting on focal site visits. As can be read from table 4, this assumption holds. All the assumptions given by Baron and Kenny (1986) hold, meaning that the mediation effect can be interpreted from these tables. For now, this is as far as the first analysis of the mediation goes. The previous models are shown so that the effect of the adaptations in the next paragraphs can be seen. Next, different models will be tested to see their effect on the analysis and its outcome.

Improving the model – Outliers and Causality

[...] Focal Searches) is estimated with an NB regression. Again, the exponentiated estimates are shown in the table for the NB regression, so they should be interpreted as multipliers. The Age variable is included since it seems to affect path a significantly and does not affect the portion of mediation for path c vs path c’, as can be seen in table 3 and table 4.

Outliers

To get an indication of the impact of the outliers, the following graphs are made (figure 4). On the left side, a graph with the outliers included and on the right side, the graph without the outliers is shown. User 1612 and 2043 both have more than 5000 focal site visits, whereas user 1612 also has over 1600 retargeting exposures. The curve in both the plot shows the fitted relation between two variables. Deleting these two instances seem to impact the curve and its mathematical form.

Figure 4: Two graphs of the effect of retargeting on focal site visits. Left: Retargeting vs Focal site count (with outliers). Right: Retargeting vs Focal site count (without outliers).

exponential line. However, this exponential curve indicates diminishing returns instead. One more retargeting exposure seems to have less effect for a user who has already been exposed to much retargeting. To check the impact of these two outliers and the exponential relation, the last, summarizing step of the full model is performed again with the same parameters, but on a different data set, namely one from which the outliers have been deleted. This results in the following equation, the estimates of which are shown in table 6 below:

Where Xj is a vector of different covariates serving as control variables defined in equation 1.1.

Comparing the models on their information criteria and log likelihood is not allowed, since different data sets are used. Nevertheless, the betas and the significance of the parameters can be compared. As expected, the quadratic term of retargeting is significant, indicating that the relation is quadratic with diminishing returns. Excluding the outliers also seems to have a big effect on the focal search estimate (changing from 71.513 (sd = 3.302) to 46.677 (sd = 2.322)).
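As a worked illustration of what a diminishing quadratic effect implies, consider a generic specification with a linear and a squared retargeting term. The symbols $\beta_1$ and $\beta_2$ are placeholders chosen for this illustration, not the numbering used in the thesis equations.

```latex
\begin{align*}
\text{FocalSiteCount}_i &= \alpha + \beta_1\,\text{Ret}_i + \beta_2\,\text{Ret}_i^2 + \dots + \varepsilon_i \\
\frac{\partial\,\text{FocalSiteCount}_i}{\partial\,\text{Ret}_i} &= \beta_1 + 2\beta_2\,\text{Ret}_i
\end{align*}
```

With $\beta_1 > 0$ and $\beta_2 < 0$, each additional exposure adds less than the previous one, and the fitted curve peaks at $\text{Ret} = -\beta_1 / (2\beta_2)$.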

One could argue that more can be done about other outliers. For example, more outliers could be deleted, or some variables could be capped at a chosen maximum or truncated at some level. However, since the origin of these outliers is not known, one should be careful with deleting or changing the data. For this reason, only these two outliers are deleted and no further analysis is done on other ways of dealing with outliers. Each approach has its own drawbacks and probably causes some change, either good or bad, to the betas of the models. Also, leaving the data points and their scaling as close to the original as possible represents reality best.

Causality

sequence is lost in the data and everything is piled on the customer level. So, for this type of aggregation, retargeting and focal search queries are intertwined, making it hard to determine which causes which. To check for this reversed causality and to separate some of the interwovenness, a new variable is made that captures all the searches done after someone is retargeted. This new variable makes it possible to distinguish the searches before and after any retargeting has happened. Counting of the searches after retargeting starts at the first exposure to retargeting. The causality problem is far more complex and is not solved by adding this variable: it is still possible that later retargeting exposures are caused by searches that happen after the first retargeting exposure. However, it gives a first indication of the problem, and establishing causality for every retargeting exposure or focal search is almost impossible, since many other factors can influence the timing of the touchpoint(s).
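Splitting the searches at the first retargeting exposure can be sketched in a few vectorized lines of base R. The toy data below are hypothetical; the touch codes 22 (retargeting) and 12 (focal search) follow the R script in the appendix, and rows are assumed to be in chronological order within each user.

```r
# Hypothetical toy log: one row per touchpoint, ordered in time within each user
touches <- data.frame(
  UserID     = c(1, 1, 1, 1, 2, 2, 2),
  type_touch = c(12, 22, 12, 12, 12, 12, 4)
)

# TRUE from the first retargeting exposure (code 22) onward, evaluated per user
exposed <- ave(as.integer(touches$type_touch == 22), touches$UserID, FUN = cumsum) >= 1

# Focal searches (code 12) split into before and after the first exposure
is_search <- touches$type_touch == 12
searches_after  <- tapply(is_search & exposed,  touches$UserID, sum)
searches_before <- tapply(is_search & !exposed, touches$UserID, sum)

data.frame(UserID = names(searches_after),
           before = as.vector(searches_before),
           after  = as.vector(searches_after))
```

User 1 has one search before and two after the first exposure; user 2 is never retargeted, so both of that user's searches count as "before".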

When regressed on each other, the searches after the first retargeting exposure are expected to be correlated with retargeting, since they are only marked as such once retargeting has happened. On the other hand, the searches before the first instance of retargeting should not influence retargeting significantly. Whether they are correlated or not gives an argument for or against reversed causality in this model. They are not: searches before retargeting do not lead to retargeting (p > .325). The results of model 3 in table 7 are set up according to the following equation:

$$
\begin{aligned}
FocalSiteCount_i = {} & \alpha + \beta_{11} RetAfterInfo_i + \beta_{12} RetAfterInfo_i^2 + \beta_{13} RetBeforeInfo_i \\
& + \beta_{14} RetBeforeInfo_i^2 + \beta_{16} FocalSearchesAfterRet_i + \beta_{17} FocalSearchBeforeRet_i + \beta_j X_{ij} + \varepsilon_i
\end{aligned}
\tag{6}
$$

Retargeting cannot influence the searches before retargeting. In addition, the searches after retargeting could not influence the (first instance of) retargeting. When these searches after retargeting are used as mediator, the model should score better since some possible reversed causality is taken out, or at least reduced. The results of this analysis are shown in table 7. The data set without the two outliers is used to generate these results.

Validation of the model:

Now that a model with better causality and fewer outliers is defined, the regression assumptions should be checked in order to interpret the results correctly. Violation of these assumptions can lead to, for example, biased or otherwise unreliable parameter estimates, or unreliable p-values. In the following paragraphs the assumptions are checked for the latest mediation model and, where possible, violations are solved in order to make the model and its interpretation more reliable. The assumptions that are checked are:

The relation must be linear

Leeflang, Wieringa, Bijmolt and Pauwels (2016) argue that this is the most serious assumption. It requires a linear relation between the dependent and independent variable(s) in the model. If this assumption is violated, the parameter estimates will be biased. This bias can stem from omitted variables or from a wrong functional form. To test the functional form, the Ramsey Regression Equation Specification Error Test (RESET) is used. The null hypothesis is that the current model has the right functional form. When using only quadratic powers (argument ‘power = 2’) the test score is F(1, 8063) = 2.2954 (p > .13), meaning that the current functional form of the parameters seems to fit. When checking for missing quadratic and cubic parameters, the test score changes to F(2, 8062) = 62.812 (p < .001), meaning that a cubic function might be needed instead of a quadratic one. Adding a cubic parameter for ret_after_info reduces the violation (F(2, 8062) = 18.691, p < .001), but the resulting parameter is negligible and therefore adds no relevance to the model. So, adding the cubic term tends to overfit the data and does not add to the interpretation of the data itself. Therefore, more weight is put on the ‘simple’ criterion (Leeflang et al., 2016) than on being fully complete. Being simple and being complete also conflict with each other, since simple models tend to be incomplete and vice versa. Adding other powers makes the model more complete, but also less simple. Since this does not add much to the model in terms of estimated betas and interpretation, it is chosen to continue with the current model.
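The logic of the RESET can be sketched in base R without extra packages: powers of the fitted values are added to the model, and an F-test checks whether they explain anything beyond the original regressors. The data below are simulated and the helper name `reset_test` is hypothetical; the ‘power’ argument mentioned above suggests that a packaged implementation was used for the thesis results, which this sketch does not reproduce.

```r
# Simulated data with a truly quadratic relation (illustrative only)
set.seed(1)
n <- 500
x <- runif(n, 0, 10)
y <- 2 + 3 * x - 0.2 * x^2 + rnorm(n)

# Hypothetical helper: F-test on added powers of the fitted values
reset_test <- function(model, powers = 2:3) {
  y_obs <- model.response(model.frame(model))
  X <- model.matrix(model)                           # original regressors, incl. intercept
  Z <- sapply(powers, function(p) fitted(model)^p)   # fitted values raised to each power
  restricted <- lm(y_obs ~ X - 1)
  augmented  <- lm(y_obs ~ X + Z - 1)
  anova(restricted, augmented)                       # F-test for the added columns
}

linear_fit    <- lm(y ~ x)             # misspecified: misses the squared term
quadratic_fit <- lm(y ~ x + I(x^2))    # matches the simulated relation

reset_linear    <- reset_test(linear_fit, powers = 2)
reset_quadratic <- reset_test(quadratic_fit, powers = 2:3)
reset_linear
reset_quadratic
```

A small p-value in the second row of the ANOVA table signals that the functional form is missing something, as is the case for the purely linear fit here.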

There could still be issues regarding omitted variables, which are a big danger to the estimates in the model. Unfortunately, it is hard to test which omitted variables should be included, since most of the time they are outside the scope of the data set. Including other variables and covariates does not make the model meaningfully better in terms of AIC, BIC or log likelihood. Therefore, the current model is retained for checking the other assumptions.

Heteroscedasticity

equal variance across the sample, which possibly causes the outliers and higher values of the independent variable to be overweighted. Using the right weights in a GLS estimation addresses this problem. In table 8 below, the outcomes of both the OLS and the GLS estimation are shown. The Age variable has been dropped since it caused issues in estimating the GLS model. Dropping Age was not detrimental to the model, since this parameter was not significant. In addition, the Age variable had 1603 missing rows, which can now be included in the analysis. Table 8 shows that weighting changes the parameters considerably, indicating a big bias in the unweighted estimates. Weighting seems to give different, but more robust, estimates for the model. Nevertheless, as can be seen in appendix 1, the problem is only partly solved. The typical cone shape is still present when the residuals are plotted against the newly obtained fitted values from the GLS. The standard errors in the GLS estimation are much bigger than those of the OLS. Generally speaking, the GLS standard errors are more robust than the OLS ones. This is in line with the heteroskedasticity-consistent (White) standard errors, which are calculated in appendix 2 as a control. The estimates of the GLS are used from here on to check the remaining assumptions.
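A minimal base R sketch of the two remedies mentioned here, weighting and heteroskedasticity-consistent (White) standard errors, is given below. The data are simulated, and the weights `1 / x^2` are an assumption that matches how the simulated error variance was built; the weights used for the GLS model in table 8 are not stated in this section.

```r
# Simulated data whose error spread grows with x (illustrative only)
set.seed(42)
n <- 400
x <- runif(n, 1, 10)
y <- 1 + 2 * x + rnorm(n, sd = 0.5 * x)

ols <- lm(y ~ x)
wls <- lm(y ~ x, weights = 1 / x^2)   # assumed weights: inverse of the error variance

# White (HC0) covariance for the OLS fit: (X'X)^-1 X' diag(e^2) X (X'X)^-1
X <- model.matrix(ols)
e <- residuals(ols)
bread <- solve(crossprod(X))
meat  <- crossprod(X * e)             # each row of X scaled by its residual
white_se <- sqrt(diag(bread %*% meat %*% bread))

se_table <- cbind(
  OLS   = summary(ols)$coefficients[, "Std. Error"],
  White = white_se,
  WLS   = summary(wls)$coefficients[, "Std. Error"]
)
round(se_table, 4)
```

Comparing the three columns shows how much the conventional OLS standard errors depend on the equal-variance assumption.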

Autocorrelation

If the residuals exhibit a systematic pattern over time, there is so-called autocorrelation in the model (Leeflang et al., 2016). This means that the residuals are related to each other, which leads to inefficient parameter estimates, just as with the previous violation. Autocorrelation is normally a big problem in time-series data. In non-time-series data such as this, however, autocorrelation is unlikely to be present and is therefore not an issue for this model.

Nonnormal Errors

data better because of the weights that are used. When the newly obtained confidence intervals are compared with those of the previous OLS model, the significance of two variables differs: the generic search variable is not significant with bootstrapping (bootstrapped p > .17) and the ret_before_info variable becomes marginally significant (bootstrapped p < .10). These results should therefore be handled with some more care, because their significance is debatable. These changes in significance are also captured by the GLS model, where generic searches are no longer significant and ret_before_info is significant.
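How bootstrapped confidence intervals can be obtained when the errors are not normal is sketched below in base R. This is a generic case-resampling bootstrap on simulated data with skewed errors; the number of replications, the percentile interval and all variable names are assumptions for illustration, since the exact bootstrap settings of the thesis are not given in this section.

```r
# Simulated data with skewed, mean-zero errors (illustrative only)
set.seed(7)
n <- 300
x <- rexp(n)
y <- 1 + 0.5 * x + (rchisq(n, df = 3) - 3)
dat <- data.frame(x = x, y = y)

# Case-resampling bootstrap: refit the model on rows drawn with replacement
B <- 1000
boot_slopes <- replicate(B, {
  resampled <- dat[sample.int(n, replace = TRUE), ]
  coef(lm(y ~ x, data = resampled))[["x"]]
})

# Percentile 95% confidence interval for the slope
ci <- quantile(boot_slopes, probs = c(0.025, 0.975))
ci
```

A coefficient is treated as significant at the 5% level under this approach when the interval excludes zero.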

Multicollinearity

Finally, multicollinearity might disturb the estimates of the final model. When the independent variables depend on each other, they are multicollinear, which results in unreliable parameter estimates. Multicollinearity is tested with the Variance Inflation Factor (VIF), which must not exceed the threshold of 4 (Leeflang et al., 2016). The highest VIF scores are 3.93 for the ret_after_info variable and 3.86 for its quadratic form. These are still below the threshold of 4, which means multicollinearity is not an issue in this model.
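The VIF of a predictor is 1 / (1 − R²), where R² comes from regressing that predictor on all other predictors. The base R sketch below computes it by hand on simulated data; `vif_manual` and the column names are hypothetical, and the uncentred square term is deliberately constructed to be collinear so that the threshold is visibly exceeded.

```r
# Simulated predictors: x1 and its square are strongly related, x2 is independent
set.seed(99)
n <- 500
predictors <- data.frame(x1 = rnorm(n, mean = 5), x2 = rnorm(n))
predictors$x1_sq <- predictors$x1^2

# Hypothetical helper: VIF_j = 1 / (1 - R_j^2), with R_j^2 from regressing
# predictor j on all remaining predictors
vif_manual <- function(df) {
  sapply(names(df), function(v) {
    others <- setdiff(names(df), v)
    r2 <- summary(lm(reformulate(others, response = v), data = df))$r.squared
    1 / (1 - r2)
  })
}

vifs <- vif_manual(predictors)
round(vifs, 2)
vifs > 4   # TRUE marks predictors above the threshold used in the text
```

Centring a variable before squaring it is a common way to bring such VIF scores down, which may be relevant when a linear and a quadratic term enter the same model.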

The mediation effect

Checking the assumptions changes the model quite a bit. Especially the use of GLS (which uses weights to determine the outcome) changes the parameters and creates more robust results. To see whether there is still a mediation effect, the same model without the mediator is specified. The results are shown in table 9 below. Equation 6 is used to determine the model:

dependent variable. Both conditions are satisfied for the comparison in this study. Performing a Z-test gives a Z-score of 0.594 for ret_after_info and 0.407 for ret_before_info (calculation in appendix 3), meaning that in both cases the difference between the two models is not significant (p > .05). The parameters do change slightly, but not significantly.
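A common form of such a Z-test divides the difference between two coefficients by the square root of the sum of their squared standard errors. Whether appendix 3 uses exactly this form is an assumption, and the function name `z_diff` and the input numbers below are purely hypothetical.

```r
# Hypothetical helper: Z-test for the difference between two coefficients,
# assuming the two estimates are independent
z_diff <- function(b1, se1, b2, se2) {
  z <- (b1 - b2) / sqrt(se1^2 + se2^2)
  p <- 2 * pnorm(-abs(z))   # two-sided p-value
  c(z = z, p = p)
}

# Illustrative numbers only, not taken from the thesis tables
res <- z_diff(b1 = 1.20, se1 = 0.40, b2 = 0.90, se2 = 0.30)
round(res, 3)
```

With these inputs the difference of 0.30 is small relative to the combined standard error of 0.50, so the test does not reject equality of the two coefficients.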

Also, the standardized betas for the focal searches differ from each other, in favour of the mediation. The searches after retargeting have a bigger beta in explaining website traffic than the searches before retargeting. The standardized betas are β16 (SearchAfterRet) = 39.884 (s.d. = 11.807) and β17 (SearchBeforeRet) = 28.296 (s.d. = 12.627). However, given the high standard deviations of these betas, it is hard to conclude whether they differ statistically from each other. Even if the difference were significant, these betas would only give secondary evidence of the mediating effect. The betas of the retargeting variables determine whether there is a mediating effect or not.

To interpret the results for hypothesis 2a, the betas of both retargeting variables should be standardized. The standardized betas are β11 (RetAfterInfo) = 106.013 (s.d. = 18.673) and β13 (RetBeforeInfo) = 39.884 (s.d. = 11.807), meaning that retargeting in the consideration stage is indeed more effective in generating website traffic.

Discussion

The central question in this research was: What is the dynamic effect of retargeting on generating website traffic and which moderators play an important role? This question can be divided into three goals: identifying the dynamic, testing the effect size and providing meaningful moderators to improve the effectiveness of the timing of retargeting.

To answer the first part of the question, literature on similar touchpoints is reviewed to find the logical dynamic. Based on Kireyev et al. (2015) and secondary data of others such as Li and Kannan (2014), Xu et al. (2014) and Anderl et al. (2016), it is hypothesized that retargeting leads to increased website traffic through a direct and an indirect effect. The direct effect is typically seen as the click-through rate of the advertisement, whereas the indirect effect is the influence the advertisement has on other touchpoints, which in turn increase website traffic. Based on the literature it is assumed that this other touchpoint is the focal searches (e.g. Kireyev et al., 2015; Lambrecht and Tucker, 2013). Retargeting will lead to increased brand awareness and advertisement recall, serving as input for the search queries.

To test the size of this dynamic and serve the second goal of this study (testing the effect size), an intensive data analysis is conducted, which resulted in several models. These models differ in their assumptions and in the data sets or variables they use. The latest model (table 9) gives the most reliable output in terms of outliers, causality and regression assumptions, meaning this model should be used in interpreting the hypotheses.

direct click effect of the retargeting advertisement that leads to visiting the focal site. Also, the same mechanism of brand awareness and advertising recall could result in a spillover to direct type-ins (instead of searching), but this effect could not be tested given the way the data in this data set were collected.

Secondly, hypothesis 1b assumed a positive effect of retargeting on the focal search queries. This effect is tested by means of a count model and enough evidence is gathered to accept the hypothesis. This effect is key to determining the mediating effect of focal searches on site visits. It is also one of the assumptions that should hold for a mediation effect to be possible. Factors such as brand awareness and advertisement recall after seeing the retargeting advertisement probably caused this effect, which is the same mechanism that Drèze and Hussherr (2003) found for banner advertisements. However, there is some uncertainty regarding this effect, since it did not hold under hypothesis 1d. Still, this seems the logical explanation. Brand awareness and advertisement recall are the logical results of banner advertisements, which is the broader category for display and retargeting advertisements, meaning the same should hold for retargeting advertisements.

Thirdly, hypothesis 1c assumed that focal search queries have a positive relation with focal site visits. The results show a very clear relation between these two, which is to be expected: if you search for a brand, you will likely go to the site of this brand, unless the brand manages its search engine optimization very badly. This positive effect is also found by many others, for example Li and Kannan (2014) and De Haan et al. (2016), who do state that searches are typically over-attributed since they are also used as a navigational tool. Distinguishing between browsing/searching and navigating is not within the scope of this study. Therefore, it is possible that this beta is also over-attributed, as Li and Kannan (2014) found in their study. Nevertheless, the positive correlation should still hold. Deleting the two outliers seems to have a great impact on this hypothesis: while still significant, the effect size drops considerably. The estimates of the GLS therefore seem more reliable, since the weighting it applies produces more robust results. The positive effect still holds under the GLS model and therefore this hypothesis is accepted.

The last part of the first hypothesis is the mediating effect of focal searches on the relation between retargeting and focal site visits. Comparing the models (with vs without the mediator) shows that the betas decrease slightly when the mediator is included. However, given the high standard deviations of both betas, the difference is not significant. Therefore, hypothesis 1d is rejected and the mediation effect is not clearly present in the analysis. Correcting for heteroskedasticity in particular caused the betas and standard errors to change, so that there is not enough evidence to establish the mediating effect. In the earlier OLS model(s), the mediating effect did seem present and, given the (low) standard deviations, this effect would have been significant.

There could be several reasons why the proposed effect is not found in this study. First, it is possible that the effect is present, but could not be found by this analysis. This would mean that issues regarding the analysis or the data caused the rejection of hypothesis 1d, and future researchers can investigate these issues to avoid arriving at the same result. Some possible reasons are explained in the following paragraphs.

and robust, leaving no room for the mediation to take place. The second effect is part of the direct, click effect, whereas (3) is another dynamic using the same proposed mechanism as the search queries in this study.

Another reason might be that the type of measurement in this data set is wrong. The focal site visits variable is treated as continuous and captures the count of how many pages (in total) a consumer viewed on the website. Large differences in browsing between customers might have made the variation too high, leading to high standard errors that deflated the significance. Other data construction techniques could have lightened this problem. Counting how many separate times the consumer visited the focal site, disregarding subsequent page views within the same visit, is another way of testing for the presence of the spillover. Browsing on the focal site without leaving it then counts as one visit (instead of counting every page that was viewed and coding it as a ‘focal site visit’). If the effect is found this way, the depth of the page visits could be studied next.
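The alternative construction described here, where consecutive page views on the focal site collapse into a single visit, can be sketched with run-length encoding in base R. The touch code used for the focal site (`FOCAL_SITE <- 1`) and the toy data are hypothetical; rows are assumed to be in chronological order within each user.

```r
# Hypothetical toy log; FOCAL_SITE is an assumed code, not taken from the data set
FOCAL_SITE <- 1
touches <- data.frame(
  UserID     = c(1, 1, 1, 1, 1, 2, 2, 2),
  type_touch = c(1, 1, 12, 1, 1, 1, 22, 1)
)

# Count runs of consecutive focal-site touches: each run is one visit
count_visits <- function(type_touch, focal_code) {
  runs <- rle(type_touch == focal_code)
  sum(runs$values)
}

visits     <- tapply(touches$type_touch, touches$UserID, count_visits,
                     focal_code = FOCAL_SITE)
page_views <- tapply(touches$type_touch == FOCAL_SITE, touches$UserID, sum)

data.frame(UserID = names(visits),
           page_views = as.vector(page_views),
           visits = as.vector(visits))
```

User 1 views four pages in two separate visits, so the visit count is 2 where the page-view count is 4; the visit count is far less sensitive to heavy browsers.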

If the proposed adaptations to the analysis do not lead to the acceptance of hypothesis 1d, it is likely that the effect was not there to begin with. The proposed dynamic (via search queries) was found for display advertisements by Kireyev et al. (2015) and in secondary data of Li and Kannan (2014). Given the similarity between display and retargeting advertisements, it was thought that this effect could be extended. However, the effect might depend on the stage a consumer is in. In the earlier stages there is more need for general display advertisements, whereas in the later stages more focused retargeting advertisements seem more effective (Lambrecht & Tucker, 2013). Maybe the dynamic between display advertisements and search queries is therefore only present in the earlier stages. Generic display advertising raises brand awareness, after which the brand is searched for. Retargeting is more precise, and the brand is probably already known to the consumer. The consumer will therefore go directly to the website, either by clicking or (later on) by a direct type-in. The direct effect of retargeting seems much stronger and there is no need for the consumer to use search queries, hence no spillover. This can be explored by looking at the extent to which the spillover proposed by Kireyev et al. (2015) depends on the stage the consumer is in.

For hypothesis 2a, evidence is found that there is indeed a difference in the effectiveness of retargeting in the consideration stage versus earlier stages. The estimates of the OLS model were not reliable due to heteroskedasticity. Using the GLS model led to a very different estimate of the ret_before_info variable (βOLS: 1.061 (p > .05) vs βGLS: 2.991 (p < .001)). Standardizing the variables in the GLS model makes them comparable in size. The results indicate that retargeting in the consideration stage (after visiting an information site) seems to be 2.5 times more effective than in the other stages (before visiting an information site). Therefore, hypothesis 2a can be accepted, meaning that the effect found by Lambrecht and Tucker (2013) also extends to generating website visits, and not only to purchases. This was to be expected since website traffic and purchases are highly related (Ilfeld and Winer, 2002). The effect arises because in the consideration stage consumers' preferences for products are more narrowly construed, meaning they respond better to more detailed forms of advertisement. Also, when the preferences are more narrowly construed, the products shown in the retargeting advertisements are more salient and relevant to the consumer than in earlier stages, when product preferences are still quite broad. Nevertheless, the retargeting advertisements in the earlier stages still have a positive relation with focal site visits.

touchpoints such as the search queries. An explanation for the lack of this effect could possibly be found by combining the preferences in the stages with the touchpoints. In the earlier stages, when the preferences are more broadly construed, there is more need to search for brands. It could be that the consumer does not know the website of the brand yet, or is not fully aware of the brand and therefore searches for it. In the later stages, the brand is already known, which leads to less searching and more direct browsing behaviour (e.g. direct type-ins, clicks on advertisements, reactions to other marketing activities). At the same time, retargeting is more effective in the later stages (hypothesis 2a; Lambrecht & Tucker, 2013). So even though there is still a positive (direct) effect of retargeting in the later stages, it is cancelled out by the lesser need for searches in these stages, resulting in no moderation by the consideration stage of the effect of retargeting on focal searches.

This (lack of) effect aligns with the earlier explanation of why there may be no mediation by the focal searches. However, more tests should be done to find the real nature of the results in this study. Regarding hypothesis 2a, looking at the direct effect of the stages on the searches might indicate whether the explanation is correct.

Some remarks can be made about the proxy that is used to measure the consideration stage. It might be quite simplistic compared to reality. More complex models and estimations of the stages could be used to check whether retargeting is indeed more effective during the later stages of the purchase funnel. Nevertheless, Lambrecht and Tucker (2013) had the same concern and checked the same proxy in an experimental setting. Their experiment confirmed the use of this proxy, meaning that even though it is simplistic, it reflects being later in the purchase funnel well. This study replicated and extended their findings, making them more reliable. Also, a proxy that is simple yet accurate is easier for managers to implement without thorough data analysis.

Another limitation of this study concerns the data set. Since only the (retargeting) information of one focal brand is used, there could be some covariates specific to this brand. These covariates are likely to be found outside the data set. For example, the panel probably already has general preconceptions about the focal brand, which influence the effectiveness of the focal marketing activities. Competitors' marketing actions could also be important in moderating firm-initiated touchpoints of the focal firm. Problems or improvements in SEO/SEA could also partly determine the effectiveness of the spillover: if you are hard to find when consumers are searching for you, it is likely that some other brand ‘stole’ this potential customer.

For the implications of this study the focus will mainly be on hypothesis 2a, since this is the only effect that could be established. The hypothesis indicates that timing is important when using retargeting as a marketing activity. Since retargeting can backfire (Bleier & Eisenbeiss, 2015), it should only be used if it is expected to be effective. And even though the effect of retargeting is positive in all the consumer stages, its effect in explaining website traffic differs considerably between them. Considering the study of Lambrecht and Tucker (2013), it is even more effective to use other types of advertising in the earlier stages, for example general display advertisements that focus on the brand (and not on specific products, as with retargeting), which is more congruent with the consumers’ preferences. Therefore, managers should implement retargeting when they are sure that the consumer is in the later stages of the customer journey, aiming for optimal results. This is preferably the consideration stage, where alternatives are weighed and the product preferences are more narrowly construed. Showing the right products makes the advertisement more salient and therefore more effective in generating website traffic.

More implications regarding the dynamics in the customer journey could have been given if hypotheses 1d and 2b had been supported. This study was not able to find enough evidence to accept these hypotheses. Therefore, there is more focus on future research than on implications. Future research regarding these hypotheses is discussed earlier in the discussion. To summarize, looking into the effect of the stages can shed light on the lack of the dynamic and could explain why retargeting does not lead to more searches in the consideration stage. In addition, more extensive analysis could be done to check the true nature of the dynamic, since this study had time restrictions and no further analysis could be done on the effect. In the following paragraphs more general future research directions are given as a result of this study.

First, an interesting covariate comes up regarding the search queries: older participants are less likely to use searches for the focal brand. Since this effect is quite big, one might investigate the how and, more importantly, the why of this effect. This could be part of a bigger question regarding the browsing strategies of different age groups or generations. Insights into the online browsing strategies of generations could show which marketing activity is most effective for each generation. Issues such as timing and dynamics could also be addressed within such a study.

Further research could also be directed at the generalizability of this study to other products and brands. As mentioned, since only the marketing activities of one focal brand could be analysed, it would be insightful to see whether these results can be replicated for other brands. Different product categories should also be taken into account. Are the effects stronger for other brands and products? Are the effects still there? To what degree do other factors such as search engine optimization help or obstruct finding this spillover effect? These and many more questions regarding specific brand choices could be answered and compared.

Conclusion


References

Abhishek, V., Fader, P., & Hosanagar, K. (2012). Media exposure through the funnel: A model of multi-stage attribution. Available at SSRN 2158421.

Anderl, E., Becker, I., Von Wangenheim, F., & Schumann, J. H. (2016). Mapping the customer journey: Lessons learned from graph-based online attribution modeling. International Journal of Research in Marketing, 33(3), 457-474.

Baron, R. M., & Kenny, D. A. (1986). The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. Journal of Personality and Social Psychology, 51(6), 1173.

Blake, T., Nosko, C., & Tadelis, S. (2015). Consumer heterogeneity and paid search effectiveness: A large‐scale field experiment. Econometrica, 83(1), 155-174.

Bleier, A., & Eisenbeiss, M. (2015). The importance of trust for personalized online advertising. Journal of Retailing, 91(3), 390-409.

Cho, C. H., & Cheon, H. J. (2004). Why do people avoid advertising on the Internet? Journal of Advertising, 33(4), 89-97.

Cheung, K. W., Kwok, J. T., Law, M. H., & Tsui, K. C. (2003). Mining customer product ratings for personalized marketing. Decision Support Systems, 35(2), 231-243.

Clogg, C. C., Petkova, E., & Haritou, A. (1995). Statistical methods for comparing regression coefficients between models. American Journal of Sociology, 100(5), 1261-1293.

De Haan, E., Wiesel, T., & Pauwels, K. (2016). The effectiveness of different forms of online advertising for purchase conversion in a multiple-channel attribution framework. International Journal of Research in Marketing, 33(3), 491-507.

Drèze, X., & Hussherr, F. X. (2003). Internet advertising: Is anybody watching? Journal of Interactive Marketing, 17(4), 8-23.

Goldfarb, A., & Tucker, C. (2011). Online display advertising: Targeting and obtrusiveness. Marketing Science, 30(3), 389-404.

Goldsmith, R. E., & Freiden, J. B. (2004). Have it your way: consumer attitudes toward personalized marketing. Marketing Intelligence & Planning, 22(2), 228-239.

Gourieroux, C., Monfort, A., & Trognon, A. (1984). Pseudo maximum likelihood methods: applications to Poisson models. Econometrica: Journal of the Econometric Society, 701-720.

Hargrave, S. (2011). Targeted ads aim for greater accuracy. Marketing Week.

Ilfeld, J. S., & Winer, R. S. (2002). Generating website traffic. Journal of Advertising Research, 42(5), 49-61.


Klapdor, S., Anderl, E., Schumann, J. H., & Von Wangenheim, F. (2015). How to Use Multichannel Behavior To Predict Online Conversions: Behavior Patterns across Online Channels Inform Strategies for Turning Users Into Paying Customers. Journal of Advertising Research, 55(4), 433-442.

Leeflang, P., Wieringa, J. E., Bijmolt, T. H., & Pauwels, K. H. (2016). Modeling markets. Springer-Verlag New York.

Lambert-Pandraud, R., Laurent, G., Mullet, E., & Yoon, C. (2017). Impact of age on brand awareness sets: a turning point in consumers’ early 60s. Marketing Letters, 28(2), 205-218.

Lambrecht, A., & Tucker, C. (2013). When does retargeting work? Information specificity in online advertising. Journal of Marketing Research, 50(5), 561-576.

Lecinski, J. (2011). Winning the Zero Moment of Truth: ZMOT. Zero Moment of Truth.

Lemon, K. N., & Verhoef, P. C. (2016). Understanding customer experience throughout the customer journey. Journal of marketing, 80(6), 69-96.

Li, H., & Kannan, P. K. (2014). Attributing conversions in a multichannel online marketing environment: An empirical model and a field experiment. Journal of Marketing Research, 51(1), 40-56.

Sherman, L., & Deighton, J. (2001). Banner advertising: Measuring effectiveness and optimizing placement. Journal of Interactive Marketing, 15(2), 60-64.

Verhoef, P. C., Kooge, E., & Walk, N. (2016). Creating value with big data analytics: Making smarter marketing decisions. Routledge.


Appendix 1 – plots for checking the heteroskedasticity assumptions

OLS regression: plotting the residuals

GLS regression: Plotting the residuals


Appendix 2:

The heteroskedastic-consistent or white standard errors for the OLS model.

Appendix 3:


RScript:

# Some of the code is deleted to save paper. Deleted code is indicated with ‘… …’. This only concerns code where the same steps were repeated (for example when aggregating the data). To get the full script, an email can be sent to the author (A.C.Westendorp@student.rug.nl).

TravelData <- read.csv("C:/Users/Alwin/Dropbox/Scriptie 2019/data/TravelData.csv")

TravelDataDemos <- read.csv("C:/Users/Alwin/Dropbox/Scriptie 2019/data/TravelDataDemos.csv")

#searches before/after retargeting:

Model1_ExpS <- data.frame(UserID=integer(), Focal_Search=integer(), Retargeting=integer())

Teller=1
Focal_Search=0
Retargeting=0
pID=TravelData$PurchaseID[1]
uID=TravelData$UserID[1]
AlreadyWroteALignForThisJourney=0

while (Teller <= nrow(TravelData)) {
  if (Teller > 1) {
    if (TravelData$UserID[Teller] != TravelData$UserID[Teller - 1]) {
      #a new user starts here: write the finished user's row to the new dataset
      Model1_ExpS <- rbind(Model1_ExpS, data.frame(UserID=uID, Focal_Search=Focal_Search, Retargeting=Retargeting))
      Retargeting = 0
      Focal_Search=0
      uID=TravelData$UserID[Teller]
    }
  }
  if (TravelData$type_touch[Teller]==22) {Retargeting=Retargeting + 1}
  if (Retargeting >= 1) {
    if (TravelData$type_touch[Teller] == 12) {Focal_Search=Focal_Search + 1}
  }
  Teller = Teller + 1
}

#the loop only writes a row when the next user starts, so the last user is written here
Model1_ExpS <- rbind(Model1_ExpS, data.frame(UserID=uID, Focal_Search=Focal_Search, Retargeting=Retargeting))

#retargeting after info

Model3_ExpS <- data.frame(UserID=integer(), Information_Site=integer(), Ret_After_Info=integer())

Teller=1
Information_Site=0
Ret_After_Info=0
uID=TravelData$UserID[1]

while (Teller <= nrow(TravelData)) {
  if (Teller > 1) {
    if (TravelData$UserID[Teller] != TravelData$UserID[Teller - 1]) {
      #a new user starts here: write the finished user's row to the new dataset
      Model3_ExpS <- rbind(Model3_ExpS, data.frame(UserID=uID, Information_Site=Information_Site, Ret_After_Info=Ret_After_Info))
      Ret_After_Info = 0
      Information_Site=0
      uID=TravelData$UserID[Teller]
    }
  }
  if (TravelData$type_touch[Teller]==4) {Information_Site=Information_Site + 1}
  if (Information_Site >= 1) {
    if (TravelData$type_touch[Teller] == 22) {Ret_After_Info=Ret_After_Info + 1}
  }
  Teller = Teller + 1
}

#the loop only writes a row when the next user starts, so the last user is written here
Model3_ExpS <- rbind(Model3_ExpS, data.frame(UserID=uID, Information_Site=Information_Site, Ret_After_Info=Ret_After_Info))

#aggregating

TDCount1 <- aggregate(TravelData$type_touch[TravelData$type_touch==22], by = list(TravelData$UserID[TravelData$type_touch==22]), FUN = length)

TDCount2 <- aggregate(TravelData$type_touch[TravelData$type_touch==12], by = list(TravelData$UserID[TravelData$type_touch==12]), FUN = length)

TDCount3 <- aggregate(TravelData$purchase_own, by = list(TravelData$UserID), FUN = max)
TDCount4 <- aggregate(TravelData$purchase_any, by = list(TravelData$UserID), FUN = max)
…

TDCount22 <- aggregate(TravelData$type_touch[TravelData$type_touch==14], by = list(TravelData$UserID[TravelData$type_touch==14]), FUN = length)

#outer join (all.x and all.y are both TRUE, so no user is dropped):

TDCount <- merge(TDCount1, TDCount2, by = 'Group.1', all.x=TRUE, all.y = TRUE)
colnames(TDCount)[colnames(TDCount)=="x.x"] <- "Retargeting_Count"
colnames(TDCount)[colnames(TDCount)=="x.y"] <- "Focal_Search_Count"
TDCount <- merge(TDCount, TDCount3, by = 'Group.1', all.x=TRUE, all.y = TRUE)
colnames(TDCount)[colnames(TDCount)=="x"] <- "PurchaseOwn_Count"
…
TDCount <- merge(TDCount, TDCount22, by = 'Group.1', all.x=TRUE, all.y=TRUE)
colnames(TDCount)[colnames(TDCount)=='x'] <- 'FlightTicket_App_Count'

#changing group name to user id and changing NA's to 0
colnames(TDCount)[colnames(TDCount)=="Group.1"] <- "UserID"
TDCount[is.na(TDCount)] <- 0

#adding focal_Search_After_Ret:

Model1 <- aggregate(Model1_ExpS$Focal_Search, by = list(Model1_ExpS$UserID), FUN=sum)
TravelDataAGG <- merge(TravelDataAGG, Model1, by = 'Group.1', all.x=TRUE, all.y=TRUE)
colnames(TravelDataAGG)[colnames(TravelDataAGG)=='x'] <- 'FocalSearch_After_Ret'

#adding Competitor_Search_After_Ret:

Model2 <- aggregate(Model2_ExpS$Competitor_Search, by=list(Model2_ExpS$UserID), FUN=sum)
TravelDataAGG <- merge(TravelDataAGG, Model2, by='Group.1', all.x=TRUE, all.y=TRUE)
colnames(TravelDataAGG)[colnames(TravelDataAGG)=='x'] <- 'Competitor_Search_After_Ret'

#adding Ret_After_Info:

Model3 <- aggregate(Model3_ExpS$Ret_After_Info, by=list(Model3_ExpS$UserID), FUN=sum)
TravelDataAGG <- merge(TravelDataAGG, Model3, by='Group.1', all.x=TRUE, all.y=TRUE)
colnames(TravelDataAGG)[colnames(TravelDataAGG)=='x'] <- 'Ret_After_Info'

#adding FSC_After_Ret

#group by the UserID column of Model4_ExpS itself, so values and grouping come from the same data frame
Model4 <- aggregate(Model4_ExpS$Focal_Site_Count, by=list(Model4_ExpS$UserID), FUN=sum)
TravelDataAGG <- merge(TravelDataAGG, Model4, by='Group.1', all.x=TRUE, all.y=TRUE)
colnames(TravelDataAGG)[colnames(TravelDataAGG)=='x'] <- 'FSC_After_Ret'

#changing name of group.1/userid + na's as 0

colnames(TravelDataAGG)[colnames(TravelDataAGG)=="Group.1"] <- "UserID"
TravelDataAGG[is.na(TravelDataAGG)] <- 0

#adding demographics

TravelDataAGG <- merge(TravelDataAGG, TravelDataDemos, by = 'UserID', all = TRUE)

#adding Count variables:

#merge() has no all.L argument; all.x=TRUE keeps every row of TravelDataAGG (left join)
TravelDataAGG <- merge(TravelDataAGG, TDCount, by='UserID', all.x=TRUE)

#Ret_Before_Info:

TravelDataAGG$Ret_Before_Info <- TravelDataAGG$Retargeting_Count - TravelDataAGG$Ret_After_Info

#FocalSearch_Before_Ret:

TravelDataAGG$FocalSearch_Before_Ret <- TravelDataAGG$Focal_Search_Count - TravelDataAGG$FocalSearch_After_Ret

TravelDataAGG$FSC_SO_Ret <- TravelDataAGG$Focal_Site_Count - TravelDataAGG$FSC_After_Ret

#descriptives
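The counting loops in the script above (for example, focal searches that occur after a first retargeting touch) can also be written without an explicit row counter. The sketch below is one possible vectorized formulation, not the code used for the thesis. The toy data frame `toy`, the helper `count_after` and its arguments are hypothetical names introduced for this example; the touch codes 22 (retargeting) and 12 (focal search) follow the script. Like the loops, it assumes the rows of each user are already in chronological order.

```r
# Hypothetical toy journeys; codes follow the script (22 = retargeting, 12 = focal search)
toy <- data.frame(
  UserID     = c(1, 1, 1, 1, 2, 2, 3, 3, 3),
  type_touch = c(12, 22, 12, 12, 12, 4, 22, 22, 12)
)

# Count the 'target' touches that occur once at least one 'trigger' touch has been seen
count_after <- function(touch, trigger = 22, target = 12) {
  seen_trigger <- cumsum(touch == trigger) >= 1  # TRUE from the first trigger onwards
  sum(seen_trigger & touch == target)
}

# Apply the helper to each user's sequence of touches
per_user <- tapply(toy$type_touch, toy$UserID, count_after)
result <- data.frame(UserID = as.integer(names(per_user)),
                     FocalSearch_After_Ret = as.integer(per_user))
print(result)
```

In the toy data, user 1 has two focal searches after the first retargeting touch, user 2 has none because no retargeting occurs, and user 3 has one. Because `tapply` processes each user separately, every user in the data receives a row, and swapping the `trigger` and `target` codes gives the other "after" variables of the script (for example `trigger = 4`, `target = 22` for retargeting after an information site).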
