
Academic year: 2021


Brace for impact?

The impact of external negative news on the customer purchase journey within the travel industry.

Marc van Eck

University of Groningen

Faculty of Economics and Business

MSc Marketing Intelligence

Master thesis

15-06-2019

Marc van Eck
Coendersweg 96A
9722 GK Groningen
(+31)615517278
marcvaneck@gmail.com
S2724626

Supervisor (first): Dr. P.S. (Peter) van Eck
p.s.van.eck@rug.nl

Supervisor (second): M.T. (Martine) van der Heide
m.t.van.der.heide@rug.nl


Abstract

Firms do their best to make the customer journey pleasant and easy, and optimize the journey so that customers convert. However, firms are not isolated. External events, such as crises and weather, affect the customer journey. Negative news as an externality has not yet been investigated in academic research. This research addresses that gap and also offers practical insights for marketers and managers. In the context of the travel industry, it is relevant to investigate the effect of plane crashes (as negative news) on the customer purchase journey and the different touchpoints within it. Travelling by plane is one of the safest modes of transportation, so plane crashes are rare and newsworthy. In this research, the impact of negative news (plane crashes) on the customer journey in a travel industry context is examined. The effect on touchpoint attribution and the duration of the effect are also examined.

Based on an extensive literature analysis, five hypotheses are formulated. The first hypothesis expects that negative news regarding a plane crash will result in customers applying risk aversion behavior, which will result in a decrease in (search) activity in the pre-purchase stage. The second hypothesis focuses on the purchase stage and expects that negative news regarding a plane crash will result in customers applying risk aversion behavior and therefore in a decrease in purchase activity, and so in conversions. The third hypothesis expects that the effect of negative news is moderated by time, so that the impact will be large when the event is happening and will reduce to zero over time. The fourth and fifth hypotheses focus on Customer Initiated Touchpoints (CITs) and Firm Initiated Touchpoints (FITs). The fourth hypothesis expects that both CITs and FITs will decrease in attribution effectiveness in the pre-purchase stage in case of negative news impact. The last hypothesis expects that CITs in the pre-purchase stage will decrease more severely in attribution effectiveness in case of negative news impact than FITs in the pre-purchase stage. Time-series panel data with customer journey data from the Dutch travel agency industry is obtained from GfK, a German market research institute. To add the effect of negative news shocks on Dutch society, proxy data is obtained from Google Trends. A Vector Autoregression (VAR) model is used to analyze the impact of shocks on the subjects of interest. Bayesian Causal Impact (BCI) analysis is applied as a controlling technique.


Preface

When I started my bachelor Communication & Information Sciences in 2014, I did not expect to make the drastic switch to a master that is mostly focused on data and statistics. Yet during my bachelor, it became clear that marketing and data were a field of interest for me, something that was lacking in the bachelor program. Looking back, I am very glad that I made the effort to complete the shortened pre-master program during my bachelor.

The master program has flown by. The program has been very interesting, and during the year I have learned new practical skills in addition to new academic knowledge. The year passed quickly, mostly because it was a lot of fun. For this, I want to thank the amazing people I got to spend the last year with. I also want to thank the team of professors and teachers. Besides the new knowledge they taught me, their passion and enthusiasm have also contributed to the fun.

Taking a larger step back, this is the final document that I present to the University of Groningen. I feel privileged that I got the opportunity to study here, and I am grateful for all the opportunities I was given to develop myself, also outside the standard curriculum. As a student mentor for my bachelor program, I had the opportunity to help first-year students become familiar with the university and its organization. As a board member of study association Commotie, I was able to contribute to the university by running the association and making sure that students were engaged with the study, the association and the university as a whole. And of course, it was a year in which I developed myself and learned more about organizations and management.

As I stated, this is the final document I present to the university. You are about to read my master thesis. I have had the pleasure of being assisted by dr. Peter van Eck, with whom I coincidentally share a last name, during the execution of my thesis. I thank him for taking the time to read all preliminary work and for providing constructive feedback during the process. I especially appreciate the guidance towards new possibilities and the encouragement to work autonomously and look for answers on my own. I also want to express my gratitude to the thesis group for all the (extra) sessions we had to help each other make the most of our theses. Further, I want to thank my friends, family and Anna for all their interest, support and feedback during the writing of this thesis. Lastly, I want to thank Martine van der Heide for taking the time and effort to act as my second supervisor.


Table of Contents

1. Introduction

2. Theory

2.1. External influences

2.1.1. News as external influence

2.1.2. Risk aversion

2.1.3. Plane crashes as negative news

2.2. Customer experience

2.3. Customer journey

2.3.1. Defining the customer journey

2.3.2. The stages of the customer journey

2.4. Touchpoints

2.5. Channels

2.6. Conceptual model

3. Research design

3.1. Data collection

3.1.1. Internal data

3.1.2. External data

3.2 Choice of technique: Vector Autoregression analysis

3.2.1. Non-stationarity

3.2.2. Cointegration

3.2.3. Lag length selection

3.2.4. Error term assumptions

3.2.5. Vector autoregressive (VAR) model

3.2.6. Impulse response function

3.3. Controlling technique: Bayesian causal impact analysis

3.4. Variables

3.4.1. Touchpoint attribution

3.4.2. Transforming variables

3.5. Plan of analysis

4. Analysis

4.1. Descriptive statistics & preliminary checks

4.1.1. Google Trend data

4.1.2. Customer journey data

4.1.3. Final dataset

4.2. Vector Autoregression Analysis

4.2.1. Stationarity & Cointegration

4.2.2. Optimal lag length selection & Autocorrelation

4.2.3. Results


4.4. Overview of hypotheses outcomes

5. Conclusion

5.1. Impact of shocks on attribution

5.2. Enduring effect of shocks

5.3. Impact of negative news shocks on the customer journey

5.4. Managerial implications

6. Discussion & Limitations

6.1. Further research

Bibliography

Appendix

1. Overview of touchpoints

2. Calculation of day-level proxy data

3. Relative Search Interest boxplot

4. Attribution outcomes per touchpoint

5. ACF Correlograms

6. IRF plots for VAR models with heuristic attribution

7. IRF plots on RSI

8. BCI plots


1. Introduction

Customer well-being is very important to firms. Firms try to create a strong customer experience, for example by delivering great service and products throughout the customer purchase journey (Lemon & Verhoef, 2016). To create a strong customer experience through the customer journey, the journey needs to be understood and analyzed from the customer’s perspective. Firms’ analysis of the customer journey focuses on how customers interact with multiple touchpoints, “moving from consideration, search, and purchase to post-purchase, consumption, and future engagement or repurchase” (Lemon & Verhoef, 2016). Understanding this process is of great importance to firms, although the customer journey is a relatively new topic in academic literature (Halvorsrud, Kvale, & Følstad, 2016). The Marketing Science Institute (MSI) (2018) listed the customer journey as one of its research priorities for 2018-2020, to draw attention to the subject and increase academic research on the customer journey.

So far, it is known that the customer journey is part of the customer experience and consists of touchpoints that can be owned by the company, for example search engine advertising, or earned by the company, for example electronic word of mouth (eWOM) (Lemon & Verhoef, 2016). We also know that major internal events affect the customer experience, as shown by recent research on service crises (Gijsenberg, Van Heerde, & Verhoef, 2015). The authors show that service crises have both short- and long-term effects, and that such effects can also stem from external crises, for example a sector-wide financial crisis. These internal and external effects can both significantly affect the customer experience, the customer purchase journey, and the influence of touchpoints within that journey. Lemon & Verhoef (2016) also conclude that external factors influence the customer journey, but do not elaborate on this statement. One of the subjects the MSI (2018) suggests for further research is macro trends. This is sensible, as the impact of only a few externalities on the customer journey has been investigated. One of the few external factors that has been shown to influence the customer experience, and therefore the customer journey and its touchpoints, is weather (Lemon & Verhoef, 2016, p. 78).


effects on the actual customer journey and the touchpoints within. This research addresses this knowledge gap and thereby contributes to the academic literature.

Negative news is an interesting phenomenon to investigate in the context of the customer journey, as this type of news behaves like a crisis. Characteristics of (industrial) crises (Shrivastava, Mitroff, Miller, & Miglani, 1988) can be used to describe this type of news event. Negative news events are, for example, triggered by specific events and can be traced back to them, including an identifiable place, time, and actors (Shrivastava et al., 1988). In this research, negative news concerns plane crashes. Plane crashes have an identifiable place, time, and actors, such as an airline brand and pilots. Furthermore, there is often a causing event that has resulted in a crash, which ends up as negative news. These events also involve large-scale damage to human life and the environment, large economic costs, and large social costs, which Shrivastava et al. (1988) describe as typical characteristics of crises. Negative news about a plane crash is triggered by the crash itself, which in turn is triggered by the multiple events that caused it. These events are not evident beforehand, which makes them unexpected. Firms do not know in advance that a negative (crisis) event will occur, nor whether the event will end up as negative news. Acting on negative news is therefore often a form of crisis management. Marketing managers would greatly benefit from knowing how negative news affects their customer journey, which makes this research relevant for practice as well.

In the context of the travel industry, it is relevant to investigate the effect of plane crashes (as negative news) on the customer purchase journey and the different touchpoints within it. Air travel is one of the safest modes of transportation (Elvik & Bjørnskau, 2005), which makes plane crashes rare and newsworthy events. As the causes of negative news in this industry are so significant and observable, such events are well suited for investigating their impact on the customer journey. In this research, the following research question will be examined and answered:

What is the impact of negative news (plane crashes) on the customer purchase journey in a travel industry context?

Two sub-questions are also formulated:

What is the impact of negative news on the attribution of customer journey touchpoints?

How enduring is the effect of negative news on the customer journey?


2. Theory

2.1. External influences

External influences on the customer journey are central in this research. Tax, McCutcheon & Wilkinson (2013, p. 461) recognize that “firms are not isolated: their behavior and performance depend not only on their own efforts, skills, and resources but also on those of other whom are directly and indirectly connected”. Recent research by Lemon & Verhoef (2016, pp. 78–79) shows that past experiences can influence current experiences. The authors also mention broader themes, for example that “external environments can act as influential drivers of the customer experience” (p. 78). A well-known and proven example is the impact of weather conditions on the customer experience. Other studies move to a macro level and show that service crises have both short- and long-term effects on the customer experience (Gijsenberg et al., 2015). Service crises are firm-internal, but according to Gijsenberg et al. (2015), sector-wide crises (e.g., a financial crisis) can also affect the customer experience.

2.1.1. News as external influence

Little research has been done on the impact of (negative) news in the field of economics. Existing research focuses, for example, on the effect of news on stock volatility (Conrad et al., 2002; Eagle & Victor, 2013). These models show a stronger effect size for negative news than for positive news (Eagle & Victor, 2013, p. 1773), which corresponds with the finding of Gijsenberg et al. (2015) that losses loom larger and longer than gains. Ultimately, all models tested by Eagle & Victor (2013) find that “negative shocks introduce more volatility than positive shocks” (p. 1776).

2.1.2. Risk aversion

The evidence from Gijsenberg et al. (2015) that losses loom larger and longer than gains, and the larger effect of negative news on stock volatility (Conrad et al., 2002; Eagle & Victor, 2013), can be explained through the concept of risk aversion combined with mortality risk aversion. Risk aversion can be defined as “a preference for a sure outcome over a gamble that has higher or equal expectation”, whereas risk seeking is defined as rejecting a sure thing in favor of a gamble (Kahneman & Tversky, 1984, p. 341). According to Kahneman & Tversky, most people are risk averse and choose a sure thing over a gamble, even if the gamble has a higher (mathematical) expectation. Eeckhoudt & Hammitt (2004) describe risk aversion from a less economic perspective, as a distaste for mortality risk (p. 13).
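To make the definition of risk aversion concrete, the sketch below compares a sure outcome with a 50/50 gamble of equal expected value. It assumes, purely for illustration, a concave utility function u(x) = √x; neither the utility function nor the amounts come from Kahneman & Tversky (1984) or the other cited studies.

```python
import math


def expected_value(outcomes):
    """Expected monetary value of a list of (probability, payoff) pairs."""
    return sum(p * x for p, x in outcomes)


def expected_utility(outcomes, utility):
    """Expected utility of a list of (probability, payoff) pairs."""
    return sum(p * utility(x) for p, x in outcomes)


def sqrt_utility(x):
    """Illustrative concave utility function; concavity models risk aversion."""
    return math.sqrt(x)


sure_thing = [(1.0, 50.0)]           # receive 50 with certainty
gamble = [(0.5, 0.0), (0.5, 100.0)]  # 50/50 chance of 0 or 100

ev_sure, ev_gamble = expected_value(sure_thing), expected_value(gamble)
eu_sure = expected_utility(sure_thing, sqrt_utility)
eu_gamble = expected_utility(gamble, sqrt_utility)

# Both options have the same expectation, but the concave utility favors the sure outcome.
prefers_sure_thing = eu_sure > eu_gamble
print(f"Expected value: sure={ev_sure}, gamble={ev_gamble}")
print(f"Expected utility: sure={eu_sure:.3f}, gamble={eu_gamble:.3f}")
print("Prefers sure thing:", prefers_sure_thing)
```

With a linear utility u(x) = x, both options would have the same expected utility, which corresponds to risk neutrality rather than risk aversion.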


decomposed into risk aversion and risk acceptance”. In the highly unlikely event of a plane crash, media coverage of the event can be expected. As reported by Janic (2000), increased media coverage increases the societal risks perceived by customers. As customers have a distaste for mortality risk (Eeckhoudt & Hammitt, 2004), they will become risk averse, preferring the certainty of staying alive over gambling with their life (Kahneman & Tversky, 1984).

A customer could completely refrain from the risk by not buying the product, but customers more often alter their behavior to reduce risks. Willingness to pay (WTP) is a customer behavior that changes when risk aversion is at play. Eeckhoudt & Hammitt (2004) state that the WTP to reduce mortality risk is often greater among individuals who are more risk-averse by nature. Pandey & Nathwani (2004) argue that it is not only a matter of risk aversion, but of striking a balance between risk and the cost of risk reduction. The authors define the societal willingness to pay (SWTP), which estimates the cost level that can be justified for safety programs. Graham & Bansal (2007) investigate drivers of WTP for airline companies, including crash history in their research. The authors find that crash status has the second largest impact on an airline’s reputation. Furthermore, according to Graham & Bansal (2007), perceived reputation is strongly related to customers’ WTP. Thus, several studies show that WTP increases as customers seek to reduce the risk of an accident. As risk can only be reduced to a certain extent and at a certain price, the expectation is that customers in the travel industry will show risk aversion behavior during and (shortly) after plane crashes.

2.1.3. Plane crashes as negative news

Some research has been conducted on plane crashes. Academic research has mostly focused on the direct effects of a plane crash, such as post-traumatic stress disorder (PTSD) among community residents (Chung, Chung, & Easthope, 2000). Chung et al. also investigated behavioral changes among Coventry residents. In 1994, a Boeing 737 crashed in an area of woodland close to Coventry housing estates. Everyone on board died, while all Coventry residents escaped death. Chung et al. find that the Coventry residents had experienced a high level of impact from the crash and showed significantly higher avoidance behavior (Chung et al., 2000): “over 30% often tried to consciously remove the disaster from memory and not think about it. Over 20% often found themselves consciously trying to stay away from reminders, avoided letting themselves get upset and tried not to talk about it. Twenty-seven percent experienced feelings of numbness. Less than 20% often felt that they had lots of feelings that they had not dealt with or felt that the disaster was not real” (2000, pp. 696–697). Their research shows that human behavior changes after exposure to a nearby plane crash. After the Germanwings crash on March 24th 2015, recent research investigated the impact of this


“some stigmatizing attitudes towards persons with a depression have increased” (p. 263). The Germanwings crash had such a significant impact on the German population that it changed citizens’ overall opinion.

Both studies show that plane crashes have an impact on humans. The magnitude of the impact depends on whether someone lives near the crash site or is exposed to the crash via media coverage. The latter does not result in PTSD but does affect the population’s behavior and thoughts (Von Dem Knesebeck et al., 2015).

2.2. Customer experience

As can be concluded from the discussed literature, a plane crash will affect customers. Such events have a broad impact, as they can change a customer’s view on certain topics (Von Dem Knesebeck et al., 2015). In other words, the customer experience changes due to such events. Lemon & Verhoef (2016, p. 74) define the customer experience as a multidimensional construct, focusing on the customer’s “cognitive, emotional, behavioral, sensorial and social responses to a firm’s offering during the customer’s entire purchase journey”. Over the entire customer journey, these elements of the construct are most likely altered when a plane crash occurs.

According to Schmitt (1999), there are five elements of experiences: sensory (sense), affective (feel), cognitive (think), physical (act), and social-identity (relate) experiences. As the customer journey is a component of the customer experience, the way customers weigh these elements shapes their behavior within the customer journey. Within the journey, customers’ actions are measured by recording clicks and touchpoint visits. The combination of experience elements leads to customer behavior, for example whether or not to purchase. In this thesis, it is reasoned that the customer experience and the customer journey are simultaneously affected by externalities, as the customer journey is part of the customer experience. From this perspective, it can be concluded that the impact of negative news events is measurable through the customer journey. The customer journey is therefore discussed in more depth, as it will be used to measure impact.

2.3. Customer journey

The customer journey is relatively new in academic literature compared to the customer experience (Halvorsrud et al., 2016). As the phenomenon is relatively new, Halvorsrud et al. (2016) also note that there is no agreement on terms and definitions. Multiple versions and perspectives exist in academic literature.


841). Halvorsrud et al. (2016, p. 842) define service blueprinting as a flowchart method that visually clarifies the steps taken in a service delivery process. Lemon & Verhoef (2016, p. 79) discuss the use of customer journey knowledge to develop an optimal service design. According to Lemon & Verhoef, “service blueprinting can provide a solid starting point for customer journey mapping” (p. 79).

2.3.1. Defining the customer journey

In contrast to service blueprinting, the customer journey “represents what actually happens from the customer’s point of view” (Zomerdijk & Voss, 2010, p. 74). Multiple definitions of the customer journey exist. Halvorsrud et al. (2016, p. 844) define customer journeys as “visual representations of events or touchpoints depicted chronologically, often accompanied by emotional indicators”. They stress that this tool has been used extensively in recent years for the design of services. Rudkowski et al. (2018, p. 5) also focus on the visual component, defining customer journeys as “a process whereby firms map the customers’ touchpoints along pre-purchase, purchase and post-purchase stages from a customer’s perspective”. In this research, the purchase stages, as also mentioned in Lemon & Verhoef (2016), are included in the definition. The purchase stages are discussed in section 2.3.2.

Research by Anderl, Becker, Von Wangenheim & Schumann (2016) is used to define the customer journey for this research. The authors speak more specifically of the online customer journey and move away from the visualization perspective. They define the online customer journey as “including all touch points over all online marketing channels preceding a potential purchase decision that lead to a visit of an advertiser's website” (p. 457). Based on Anderl et al., the customer journey in this research is defined as: all touchpoints, in chronological order, over all online marketing channels preceding a potential purchase decision that may eventually lead to a purchase at a (focal or competitor) travel agency or airline.


2.3.2. The stages of the customer journey

Lemon & Verhoef (2016, p. 76) distinguish three purchase phases in the customer journey: pre-purchase, purchase and post-purchase. The first stage consists of all actions and experiences before the actual purchase. The customer considers awareness (what is this?), need recognition (do I need this?), search (are there others like this?), and consideration (should I buy this?) (Rudkowski et al., 2018, p. 5). Lemon & Verhoef (2016) state that all customer interactions during the purchase event are included in the second stage. Here, the customer considers only two questions: choice (which product do I buy?) and payment (how do I pay?) (Rudkowski et al., 2018, p. 6). In the final stage, post-purchase, the product is central. The customer considers usage and consumption of the product, but also satisfaction, referrals (will I recommend this product?) and loyalty (Rudkowski et al., 2018).

In this research, the focus is on the pre-purchase and purchase stages of the customer purchase journey, as these two stages can be investigated with the available data. The subjects of interest in this thesis are the external influence of negative news and how it alters customer behavior. In the context of the travel industry, negative news is narrowed down to plane crashes only. Other negative news regarding (specific) aspects of airlines, plane safety and other industry-related topics is not considered. When plane crashes happen and get media attention, people are reminded that such an event can happen. This reminder increases perceived societal risks (Janic, 2000). As such events have a proven impact on populations (Von Dem Knesebeck et al., 2015) and remind customers that such an event could happen to them, the expectation is that they will avoid any behavior that could expose them to this risk. Based on the discussed literature on risk aversion, and in the context of travel agencies and plane crashes, it is expected that people will refrain from search and purchase behavior when a plane crash has (recently) occurred. The following hypotheses are formulated:

H1 The impact of negative news regarding a plane crash will result in customers applying risk aversion behavior and therefore result in a decrease in (search) activity in the pre-purchase stage.

H2 The impact of negative news regarding a plane crash will result in customers applying risk aversion behavior and therefore result in a decrease in purchase activity in the purchase stage.

2.4. Touchpoints


touchpoints as events dominate, with terms such as “contact point” (Stauss & Weinlich, 1997), “moment of truth” (Carlzon, 1989) and “service moment” (Koivisto, 2009). In their customer journey framework (CJF), they define a touchpoint as an “instance of communication between a customer and a service provider” (Halvorsrud et al., 2016, p. 845). Homburg (2017) define touchpoints as “any verbal (e.g., advertising) or non-verbal (e.g., product usage) incident a person perceives or consciously relates to a given firm or brand”. In both cases, touchpoints are broadly defined as any contact with any brand in a certain industry. A customer journey includes all touchpoints approached or seen by the customer.

For this research, touchpoints are categorized into two types: the firm-initiated touchpoint (FIT) and the customer-initiated touchpoint (CIT). As the terms suggest, FITs are defined by Wiesel, Pauwels & Arts (2011, p. 605) as “any contact with a customer initiated by the firm”. The authors give email campaigns as an example of a FIT. In contrast, CITs are defined as “any contact with a firm that is initiated by a customer or prospective customer” (Wiesel et al., 2011, p. 605). The authors present Google search as an example of a CIT. Touchpoints are defined following the FIT and CIT definitions of Wiesel et al. (2011) so that possible differences in sensitivity to external factors, in this research negative news, can be investigated.

Based on the discussed literature, it is expected that customers will stop searching when a plane crash occurs and gradually return to searching for trips and tickets as the time distance between the customer and the incident increases. This reasoning is based on Gijsenberg et al. (2015, pp. 650–652), who use asymmetric structural vector autoregression to test the effect of losses and gains on perceived service quality. In their research, negative (positive) service encounters have a large negative (positive) effect right after the event. This effect diminishes over time and eventually becomes almost zero (Gijsenberg et al., 2015). A hypothesis is therefore added:

H3 The effect of negative news is moderated by time so that the impact will be large


We define the hypotheses as follows:

H4 Both CITs and FITs in the pre-purchase stage will decrease in attribution effectiveness in case of negative news impact.

H5 CITs in the pre-purchase stage will decrease more severely in attribution effectiveness in case of negative news impact than FITs in the pre-purchase stage.

2.5. Channels

Following the terminology of Halvorsrud et al. (2016, pp. 845–846), the last aspect of the customer journey to discuss is channels. Halvorsrud et al. state that firms communicate with customers through channels, which convey communication between customer and firm. In other words, channels are “carriers of touchpoints, and they can be digital (e.g. e-mail), human-served (e.g. a desk in a shop), or a combination of the two” (Halvorsrud et al., 2016, p. 845). With touchpoints as events within the customer journey and channels as carriers of touchpoints, the communication model of Shannon & Weaver (1963) can be applied: the firm as sender transmits a message, which is a touchpoint, to the customer through a certain channel. Channels thus range from call centers to online chats.

This research will focus on different touchpoints and not per se on channels, as channels are seen as an aggregated version of multiple touchpoints.

2.6. Conceptual model

With the literature presented and hypotheses formulated, a conceptual model is drawn.

[Figure: conceptual model. Elements: FITs, CITs, Purchase; stages: Pre-purchase stage, Purchase stage]


3. Research design

3.1. Data collection

Two types of data sources are used: internal and external. The internal dataset is provided by GfK, a German market research institute, and consists of customer journey data and demographics. The external dataset is collected in addition to the internal dataset and consists of Google Trends data. Both are needed to answer the research questions and test the hypotheses.

3.1.1. Internal data

GfK collected and provided the customer journey dataset with demographics. This dataset is collected through GfK Crossmedia Link, an intelligent system that analyses the internet usage of panelists. Customers join the GfK panel by installing a browser extension or mobile app. The plug-in or app registers what customers see and do on the screen, for example which search queries are made, which ads are viewed and whether purchases are made. By collecting data via such a plug-in or app, GfK obtains online event-based data that reflect the actual online behavior of a customer.

In this specific dataset, GfK collected data in the Dutch travel agency industry. Therefore, the population consists of Dutch citizens. The timeframe of the data is a year and five months, from May 31st 2015 until October 31st 2016. For each customer, time-series data is collected, so the data qualifies as longitudinal panel data (Leeflang, Wieringa, Bijmolt, & Pauwels, 2015). The event-based nature of the data means that each observation corresponds to an event, which is a touchpoint. Touchpoints can be either firm-initiated or customer-initiated (Wiesel et al., 2011).
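Because the VAR analysis in section 3.2 works on time series, event-level observations like these have to be aggregated over time. The sketch below shows one possible way to count touchpoints per day and per type. The record layout (a customer id, an ISO timestamp and a "CIT"/"FIT" label) and the example records are hypothetical and do not reflect the actual GfK schema.

```python
from collections import Counter
from datetime import datetime

# Hypothetical event records: (customer_id, ISO timestamp, touchpoint type).
events = [
    ("c1", "2015-06-01T09:15:00", "CIT"),
    ("c1", "2015-06-01T09:20:00", "FIT"),
    ("c2", "2015-06-01T18:02:00", "CIT"),
    ("c2", "2015-06-02T08:45:00", "CIT"),
    ("c3", "2015-06-02T21:30:00", "FIT"),
]


def daily_counts(records):
    """Count events per (calendar day, touchpoint type)."""
    counts = Counter()
    for _customer, timestamp, tp_type in records:
        day = datetime.fromisoformat(timestamp).date()
        counts[(day, tp_type)] += 1
    return counts


counts = daily_counts(events)
for (day, tp_type), n in sorted(counts.items()):
    print(day, tp_type, n)
```

Days without any events do not appear in the counter, so in a full series such days would still need to be filled with zeros before estimating a time-series model.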

The dataset contains 2,456,414 events spread over 29,012 journeys and 9,678 customers. In this period, 3,674 purchases were made, 192 of which at the focal brand.
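The reported counts imply a few simple descriptive ratios, computed below from these counts only. Purchases per journey is only a rough conversion indicator, since it assumes at most one purchase per journey, which the counts alone do not confirm.

```python
# Counts as reported for the GfK dataset.
events, journeys, customers = 2_456_414, 29_012, 9_678
purchases, focal_purchases = 3_674, 192

events_per_journey = events / journeys
journeys_per_customer = journeys / customers
purchases_per_journey = purchases / journeys      # rough conversion indicator
focal_share = focal_purchases / purchases         # focal brand's share of purchases

print(f"Events per journey:     {events_per_journey:.1f}")
print(f"Journeys per customer:  {journeys_per_customer:.1f}")
print(f"Purchases per journey:  {purchases_per_journey:.1%}")
print(f"Focal share of sales:   {focal_share:.1%}")
```

This amounts to roughly 85 events per journey, 3 journeys per customer, about 12.7 purchases per 100 journeys, and a focal-brand share of about 5.2% of all purchases.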

3.1.2. External data


Fig. 1. Search queries regarding plane crashes in The Netherlands (Google, 2019)

As the search query “neergestort vliegtuig” (Dutch for “crashed plane”) shows the clearest spikes and the highest relative search interest, this query is used as proxy. As Google Trends indexes each query on its own trend, with 100 representing the highest relative search interest in the time period, it is not possible to create a variable by summing multiple queries. Combining multiple queries would lead to unrealistic observations.
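The problem with summing separately indexed queries can be shown with two made-up series. The sketch assumes that each query is rescaled so that its own peak equals 100, in line with how Google Trends reports relative search interest; the raw volumes are invented for illustration.

```python
def index_to_peak(series):
    """Rescale a series so that its maximum equals 100 (rounded to integers)."""
    peak = max(series)
    return [round(100 * v / peak) for v in series]


# Hypothetical raw search volumes for two queries over four weeks.
query_a = [200, 400, 2000, 300]  # popular query
query_b = [5, 10, 20, 50]        # rare query

idx_a, idx_b = index_to_peak(query_a), index_to_peak(query_b)
summed_index = [a + b for a, b in zip(idx_a, idx_b)]
summed_raw = [a + b for a, b in zip(query_a, query_b)]

print("Index A:     ", idx_a)
print("Index B:     ", idx_b)
print("Summed index:", summed_index)
print("Summed raw:  ", summed_raw)
```

In the summed index, the rare query's peak in week 4 weighs as much as the popular query's peak in week 3, so week 4 looks almost as busy as week 3 and the sum exceeds 100, while the raw volumes show that week 3 had about six times more searches. This is the kind of unrealistic observation that using a single query avoids.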

Google Trends data is often only available as a weekly index, showing the relative search interest for a topic in a given week. However, by using the R package ‘gtrendsR’ (Philippe & Eddelbuettel, 2018), it is possible to obtain a daily index for a specific week. Via a simple calculation, which can be found in appendix 2, the true relative search interest is obtained for each day within the timeframe.
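The exact calculation is given in appendix 2 and is not reproduced here. As a hedged illustration of the general idea only, the sketch below assumes that each week's daily values (indexed within that week) are rescaled so that their mean matches that week's weekly index, after which the whole series is re-indexed to a maximum of 100. The function name and the example numbers are hypothetical, and the author's calculation may differ.

```python
def rescale_daily(daily_by_week, weekly_index):
    """Put daily values from separate weekly requests on one common scale.

    daily_by_week: list of weeks, each a list of daily values indexed within that week.
    weekly_index: one weekly index value per week for the full period.
    Assumption (illustrative): scale each week so that its daily mean equals the
    weekly index, then re-index the full series so that its maximum equals 100.
    """
    scaled = []
    for days, week_value in zip(daily_by_week, weekly_index):
        mean_day = sum(days) / len(days)
        factor = week_value / mean_day if mean_day > 0 else 0.0
        scaled.extend(d * factor for d in days)
    peak = max(scaled)
    return [100 * v / peak for v in scaled] if peak > 0 else scaled


# Hypothetical example: a quiet week with a small spike, then a week with a large spike.
daily_by_week = [[50, 50, 50, 50, 50, 50, 100], [100, 20, 20, 20, 20, 20, 20]]
weekly_index = [10, 40]
series = rescale_daily(daily_by_week, weekly_index)
print([round(v, 2) for v in series])
```

After rescaling, the spike in the first week is far smaller than the spike in the second week, even though both days scored 100 in their own weekly request.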

3.2 Choice of technique: Vector Autoregression analysis

This research investigates the causal effect of negative news on CITs, FITs and purchases in time-series data. Negative news events were earlier defined as plane crashes and are included via proxy data obtained from Google Trends. In the literature, such events are often called shocks or impulses (Colonescu, 2016; Gijsenberg et al., 2015; Mohr, 2018). From here on, negative news events are referred to as shocks.

As the data consists of time series, a time-series analysis will be executed. The goal is to analyze two time series, the customer journey series and the shock series, and to investigate the influence of the shock series on the customer journey series. This goes beyond an autoregressive (AR) model, which considers only one time series and includes only its own lagged values (Colonescu, 2016). Regressing a variable solely on its own lags is too restrictive for this research, since both the lagged customer journey variables and the lagged shock variable need to be included. By including these lagged variables, the assumption that shocks negatively impact the customer journey series can be tested. Exogenous variables can be added in Autoregressive Distributed Lag (ADL) and Vector Autoregressive (VAR) models (Mohr, 2018). In this research, multiple endogenous variables will be tested: the number of purchases (focal and competitor firms), the number of CITs and FITs, and the attribution of CITs and FITs. Taking the number of purchases as an example endogenous variable $y_t$ and relative search interest (RSI) shocks as the exogenous variable $x_t$, the following ADL model with one lag can be specified:

$$y_t = \alpha_1 y_{t-1} + \beta_0 x_t + \beta_1 x_{t-1} + \varepsilon_t \qquad (1)$$

where $\alpha_1$, $\beta_0$ and $\beta_1$ denote the coefficients of the corresponding lags and $\varepsilon_t$ denotes the error term. While this equation would be sufficient to investigate the assumed effect of shocks on sales, it only considers this one-way relationship. A VAR model allows the relationship to be investigated in both directions. Logically, one could assume that the shocks in this research (RSI peaks caused by negative news) do not depend on lagged values of purchases and other customer journey variables. To avoid imposing this assumption beforehand, a VAR model is used so that both directions can be tested. The focus, however, remains on the impact of shocks on the customer journey variables. Before the VAR models used in this research are presented, the assumptions a VAR model must meet are discussed: non-stationarity, cointegration, lag length selection and autocorrelation.
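Equation (1) can be estimated with ordinary least squares. The Python sketch below is a minimal illustration, not the estimation code used in this research: it builds the lagged regressors of the ADL(1) model and solves the normal equations. The helper names `solve` and `fit_adl1` are hypothetical, and the check simply recovers the coefficients of a simulated noise-free series.

```python
import random


def solve(a, b):
    """Solve the square system a @ x = b with Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_adl1(y, x):
    """OLS estimates (a1, b0, b1) of y_t = a1*y_{t-1} + b0*x_t + b1*x_{t-1} + e_t (no intercept, as in eq. 1)."""
    rows = [[y[t - 1], x[t], x[t - 1]] for t in range(1, len(y))]
    target = y[1:]
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yt for r, yt in zip(rows, target)) for i in range(3)]
    return solve(xtx, xty)


# Hypothetical check: simulate a noise-free ADL(1) series and recover its coefficients.
rng = random.Random(42)
x = [rng.gauss(0, 1) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.5 * y[t - 1] - 2.0 * x[t] + 1.0 * x[t - 1])
a1, b0, b1 = fit_adl1(y, x)
print(round(a1, 6), round(b0, 6), round(b1, 6))
```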

3.2.1. Non-stationarity

Following the definition of Colonescu (2016), a time series is non-stationary if its mean, variance or autocovariance changes over time. Non-stationary data should not be used in regression models, as this can result in spurious regression. Colonescu (2016) describes spurious regression as an apparent relationship between variables that are not related in reality. Non-stationary time series can be part of a VAR model if they are cointegrated, because the variables are then in a stationary relationship. To investigate non-stationarity, the Augmented Dickey-Fuller test will be run. If the tests indicate that the data is non-stationary, the data can be stationarized by taking the first difference. If that does not solve the issue of non-stationarity, cointegration can be tested for.
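To make the logic of the test concrete, the Python sketch below computes a plain Dickey-Fuller t-statistic (constant only, no augmentation lags, no trend) and applies it to a simulated random walk before and after first differencing. This is a simplification: the augmented test used in this research adds lagged differences to the regression. The value of roughly -2.86 is the approximate 5% critical value for the constant-only case; the function names are hypothetical.

```python
import math
import random


def first_difference(series):
    """Return [y_1 - y_0, y_2 - y_1, ...]."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


def dickey_fuller_t(series):
    """t-statistic on gamma in: dy_t = alpha + gamma * y_{t-1} + e_t (no augmentation lags)."""
    dy = first_difference(series)
    lagged = series[:-1]
    n = len(dy)
    mean_x = sum(lagged) / n
    mean_y = sum(dy) / n
    sxx = sum((v - mean_x) ** 2 for v in lagged)
    gamma = sum((v - mean_x) * (d - mean_y) for v, d in zip(lagged, dy)) / sxx
    alpha = mean_y - gamma * mean_x
    rss = sum((d - alpha - gamma * v) ** 2 for v, d in zip(lagged, dy))
    std_err = math.sqrt(rss / (n - 2) / sxx)
    return gamma / std_err


# Simulated random walk of 520 days (non-stationary by construction).
rng = random.Random(1)
walk = [0.0]
for _ in range(519):
    walk.append(walk[-1] + rng.gauss(0, 1))

CRITICAL_5PCT = -2.86  # approximate 5% Dickey-Fuller critical value, constant-only case
print("levels:", round(dickey_fuller_t(walk), 2))
print("first difference:", round(dickey_fuller_t(first_difference(walk)), 2))
```

A statistic below the critical value rejects the unit-root null hypothesis; for the differenced walk the statistic is typically far below it.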

3.2.2. Cointegration


be obtained from the correlogram (Colonescu, 2016). Mohr (2018) states that the optimal lag length can also be found via model comparison, using the information criteria AIC, HQ, SC and FPE as measurement tools. Mohr prefers the AIC, which is already included in the ‘vars’ package in R (Pfaff & Stigler, 2018). Both approaches will be used in this research. Because these tests are needed to determine the optimal number of lags, the number of lags to include in the VAR model will be decided during the analysis. The VAR formula is therefore written with a generic lag $t - j$.
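Information-criterion lag selection can be sketched for a single series. The Python example below fits AR(p) models with an intercept by OLS on a common sample and picks the lag with the lowest AIC, using the form n ln(RSS/n) + 2k. This is a simplifying assumption: the multivariate criterion reported by `vars` is based on the determinant of the residual covariance matrix. The helper names and the simulated AR(2) series are hypothetical.

```python
import math
import random


def solve(a, b):
    """Solve the square system a @ x = b with Gaussian elimination (partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def ar_aic(series, p, max_p):
    """AIC of an AR(p) with intercept, fitted by OLS on the common sample t >= max_p."""
    rows = [[1.0] + [series[t - j] for j in range(1, p + 1)] for t in range(max_p, len(series))]
    y = series[max_p:]
    k = p + 1
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    beta = solve(xtx, xty)
    rss = sum((yi - sum(b * v for b, v in zip(beta, r))) ** 2 for r, yi in zip(rows, y))
    n = len(y)
    return n * math.log(rss / n) + 2 * k


def select_lag(series, max_p):
    """Return the lag with the lowest AIC and all AIC scores."""
    aics = {p: ar_aic(series, p, max_p) for p in range(1, max_p + 1)}
    return min(aics, key=aics.get), aics


# Simulated AR(2) series.
rng = random.Random(7)
series = [0.0, 0.0]
for _ in range(500):
    series.append(0.6 * series[-1] - 0.3 * series[-2] + rng.gauss(0, 1))
best_p, aics = select_lag(series, max_p=8)
print(best_p, {p: round(v, 1) for p, v in aics.items()})
```

Fitting every candidate on the same sample (dropping the first `max_p` observations) keeps the AIC values comparable across lag orders.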

3.2.4. Error term assumptions

In ordinary least squares models, several assumptions should be met for the residuals. For VAR models, the academic literature and discussions are ambiguous about which assumptions should be tested and how. Studies using VAR (Gijsenberg et al., 2015; Pauwels, Silva-Risso, Srinivasan, & Hanssens, 2004; Pauwels & Weiss, 2008) test for non-stationarity, cointegration and optimal lag length but appear to leave the residuals untested. The only available tools within the VAR package test for autocorrelation, for example the Portmanteau and Breusch-Godfrey LM tests as described by Luetkepohl (2011, p. 12). This research uses the Edgerton-Shukur F test, as this test has been shown to perform best (Teräsvirta & Yang, 2014).

Autocorrelation will be the only assumption tested and described for the models. Other error term assumptions will not be tested, following the procedures in available research and given the lack of suitable tools to test these assumptions.
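The Edgerton-Shukur test is a small-sample F version of the Breusch-Godfrey LM test for the whole VAR system (in the `vars` package it is requested with `serial.test(..., type = "ES")`). As a simpler stand-in that shows what a residual autocorrelation test measures, the Python sketch below computes the univariate Ljung-Box portmanteau statistic for one residual series. It is not the test used in this research, and the example residuals are artificial.

```python
def autocorrelation(resid, k):
    """Sample autocorrelation of a residual series at lag k."""
    n = len(resid)
    mean = sum(resid) / n
    denom = sum((e - mean) ** 2 for e in resid)
    num = sum((resid[t] - mean) * (resid[t + k] - mean) for t in range(n - k))
    return num / denom


def ljung_box(resid, max_lag):
    """Ljung-Box Q = n(n+2) * sum_k rho_k^2 / (n - k); approx. chi-square(max_lag) under H0."""
    n = len(resid)
    return n * (n + 2) * sum(autocorrelation(resid, k) ** 2 / (n - k) for k in range(1, max_lag + 1))


CHI2_10_95 = 18.307  # 95% quantile of the chi-square distribution with 10 degrees of freedom
alternating = [(-1.0) ** t for t in range(100)]  # strongly autocorrelated "residuals"
q = ljung_box(alternating, 10)
print(round(autocorrelation(alternating, 1), 3), round(q, 1), q > CHI2_10_95)
```

Under the null hypothesis of no autocorrelation, Q stays below the chi-square quantile; a larger Q rejects that null. For residuals of a fitted model, the degrees of freedom are usually reduced by the number of estimated parameters.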

3.2.5. Vector autoregressive (VAR) model

All variables to be tested are now included. The six variables whose response to shocks will be tested are the number of purchases (focal and competitor), the number of CITs and FITs, and the attribution of CITs and FITs. This results in the following formula:

$$Y_t = A + \sum_{j=1}^{p} \beta_{ij} Y_{t-j} + \sum_{j=1}^{p} \delta_j RSI_{t-j} + \varepsilon_t, \qquad t = 1, 2, \ldots, T \qquad (2)$$

where $Y_t$ is a $(6 \times 1)$ vector of endogenous variables consisting of the number of purchases for the focal brand, the number of purchases for competitor brands, the number of CITs, the number of FITs, the attribution of CITs and the attribution of FITs. $A$ represents a $(6 \times 1)$ vector of intercepts. $Y_{t-j}$ is the lagged version of $Y_t$. $RSI_{t-j}$ is the exogenous variable: the relative search interest shock at lag $t - j$. $\beta$ and $\delta$ are parameters to be estimated: a $\beta$ is estimated for each variable $i$ within $Y$ and for each lag of $Y$, and a $\delta$ for each lag of $RSI$. $p$ represents the optimal lag length, which will be selected as discussed in 3.2.3. $\varepsilon_t$ is the residual.


$$Y_t = \begin{bmatrix} PO_t \\ PC_t \\ SF_t \\ SC_t \\ AF_t \\ AC_t \end{bmatrix} \qquad (3)$$

where, for time $t$,

$PO_t$ = number of purchases (own, focal brand)

$PC_t$ = number of purchases (competitor brands)

$SF_t$ = number of FITs

$SC_t$ = number of CITs

$AF_t$ = attribution of FITs

$AC_t$ = attribution of CITs

Information on contemporaneous relationships (non-lagged values, such as $RSI_t$) enters the model via the variance-covariance matrix of $\varepsilon_t$ (Mohr, 2018).

3.2.6. Impulse response function

Colonescu (2016) states that the outcomes of VAR models become more insightful when the response to shocks is analyzed visually over time. This is accomplished by plotting impulse responses: the graphs show how a variable responds to a shock over time. Impulse response graphs are the final step of a VAR analysis and present the outcome of the VAR model in a straightforward manner.
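The recursion behind an impulse response can be sketched in a few lines. Assuming estimated coefficient matrices A_1 to A_p, the response h periods after an initial shock vector s is r_h = A_1 r_{h-1} + ... + A_p r_{h-p}, with r_0 = s and r_h = 0 for h < 0; a cumulative IRF is the running sum. The Python sketch below implements this for a hypothetical two-variable VAR(1). It takes the shock vector as given: R's `vars::irf()` orthogonalizes shocks by default through a Cholesky decomposition of the residual covariance matrix, which this sketch does not do.

```python
def mat_vec(matrix, vector):
    """Multiply a k x k matrix (list of rows) by a length-k vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]


def impulse_response(coef_mats, shock, horizon):
    """Responses r_0..r_H of a VAR(p) to an initial shock vector.

    coef_mats = [A_1, ..., A_p]; uses r_0 = shock and r_h = sum_j A_j r_{h-j}.
    """
    k = len(shock)
    responses = [list(shock)]
    for h in range(1, horizon + 1):
        r = [0.0] * k
        for j, a_j in enumerate(coef_mats, start=1):
            if h - j < 0:
                break
            r = [x + y for x, y in zip(r, mat_vec(a_j, responses[h - j]))]
        responses.append(r)
    return responses


def cumulative(responses):
    """Running sum of responses, as shown in a cumulative IRF plot."""
    total = [0.0] * len(responses[0])
    out = []
    for r in responses:
        total = [t + x for t, x in zip(total, r)]
        out.append(total)
    return out


# Hypothetical 2-variable VAR(1): variable 1 = RSI, variable 2 = a customer journey series.
A1 = [[0.5, 0.0], [0.2, 0.3]]
irf = impulse_response([A1], shock=[1.0, 0.0], horizon=50)
cum = cumulative(irf)
print([round(r[1], 4) for r in irf[:4]], round(cum[-1][1], 4))
```

With an orthogonalized one-standard-deviation shock, `shock` would be the corresponding column of the Cholesky factor of the residual covariance matrix.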

3.3. Controlling technique: Bayesian causal impact analysis


of the BCI model is effortless and uncomplicated, as Brodersen & Hauser (2015) provide extensive instructions. Based on the descriptives of the external data, the pre- and post-period can be specified in the model. With these periods specified, the BCI model predicts the values that would have been observed had the shock not happened: the synthetic counterfactual. This prediction is modeled on the data supplied to the model. The BCI model then compares the synthetic counterfactual to the actual observed values and gives a posterior inference, including the absolute and relative effect of the shock. Afterwards, the predictor variables can be examined for their predictive value, so that the model can be adjusted and variables that add no informational value can be removed if necessary. This keeps the model simple while still complete, following the model criteria of Little (Leeflang et al., 2015, pp. 26–32).
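The comparison step of the BCI analysis can be illustrated with a deliberately simplified stand-in. The Python sketch below fits an ordinary least-squares regression of the outcome on one predictor over the pre-period, predicts the post-period as a counterfactual, and reports the average absolute effect and the relative effect. This is not the Bayesian structural time-series model used by CausalImpact and gives no posterior intervals; the function names and data are hypothetical.

```python
def fit_line(x, y):
    """Intercept and slope of a simple OLS regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sum((a - mean_x) ** 2 for a in x)
    return mean_y - slope * mean_x, slope


def counterfactual_effect(y, x, pre_end):
    """Fit on the pre-period (indices < pre_end), predict the post-period, compare with observed."""
    intercept, slope = fit_line(x[:pre_end], y[:pre_end])
    predicted = [intercept + slope * v for v in x[pre_end:]]
    observed = y[pre_end:]
    point_effects = [o - p for o, p in zip(observed, predicted)]
    average_absolute = sum(point_effects) / len(point_effects)
    relative = sum(point_effects) / sum(predicted)  # effect as a share of the counterfactual
    return average_absolute, relative


# Hypothetical series: y follows 2x + 1 until a shock lowers it by 3 from index 15 onward.
x = [float(i) for i in range(1, 21)]
y = [2.0 * v + 1.0 - (3.0 if i >= 15 else 0.0) for i, v in enumerate(x)]
avg_abs, relative = counterfactual_effect(y, x, pre_end=15)
print(avg_abs, round(relative, 4))
```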

To be able to compare the outcomes of the BCI and VAR analyses, the same six variables will be used for both. The exogenous shock variable $RSI$ is not included, as the BCI model calculates the synthetic counterfactual of the observed variables. The formula of a BCI model is very similar to the ADL model, but without lags (formula 4).

$$y_t = \alpha + \beta_1 x_{1t} + \beta_2 x_{2t} + \beta_3 x_{3t} + \beta_4 x_{4t} + \beta_5 x_{5t} + \varepsilon_t \qquad (4)$$

Six formulas will be created, as each of the six variables is used once as the dependent variable. When, for example, the number of purchases of the competitor brands is used as $y_t$, this variable is not also used as an independent variable, as it would then trivially predict itself (a self-fulfilling prophecy). Therefore, each formula contains five independent variables.

3.4. Variables

The objective of this research is to investigate whether shocks affect the pre-purchase stage (i.e., touchpoints before purchase) and the purchase stage (i.e., conversion). This is analyzed with historical time-series data via a VAR model, rather than by creating a forecast model. The VAR model needs relatively few variables, as it focuses on lagged variables and the information they contain. The BCI analysis could use more variables, as it uses information from multiple variables to model the synthetic counterfactual. However, the BCI model will be given exactly the same variables as the VAR model so that the outcomes of both can be compared.

Originally, the data was sorted first on UserID, then PurchaseID, and then Timestamp. The dataset will be aggregated and transformed into a true time series sorted on Date, where one observation contains all aggregated data of one specific day.


Amount of FITs and Amount of CITs are the daily sums of the two types of touchpoints.

3.4.1. Touchpoint attribution

Touchpoint Attribution is also included as a variable, but is not yet present in the dataset. These variables need to be created via an attribution model, which is explained in this section. Touchpoint Attribution is operationalized following Kannan, Reinartz & Verhoef (2016). Based on their article, the output of attribution modeling is defined as assigning the appropriate credit to each touchpoint for its share in the conversion.

Because the attribution value is modeled, these variables differ from Amount of FITs and Amount of CITs. Attribution is only calculated for touchpoints that are part of a journey ending in conversion. How many FITs or CITs are present on a specific day tells little about that day's contribution to sales; the latter is captured by the attribution values. Collinearity between Amount of CITs or FITs and Attribution of CITs or FITs is not expected, as a day can have fewer observed CITs or FITs but still attribute more than average to conversions.

Multiple attribution models exist. Well-known models are first click, last click and linear click attribution, often collectively called heuristic attribution models. These models are heuristic because their allocation rules are arbitrary: the first click model assigns all attribution value to the first touchpoint, the last click model assigns all of it to the last touchpoint, and the linear click model divides it equally over all touchpoints in journeys that end in conversion. Heuristic models are the most widely used in practice, but they have been shown to be biased and can lead to incorrect predictions (Anderl et al., 2016). Academics have therefore sought alternatives to these heuristic models. One outcome of this search is the state-of-the-art Markov attribution model (Anderl et al., 2016), which uses probabilistic Markov chain models that can represent dependencies between touchpoints. A higher-order Markov model with an order of 2 is used for the attribution calculation, so that a memory of two steps is included in the model. With a higher-order Markov model, possible spillover effects as found by Anderl et al. (2016) can be captured.
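The removal-effect logic behind Markov attribution can be sketched as follows. The Python example estimates a first-order Markov chain from journeys (states: start, touchpoints, conversion and null), computes the probability of reaching conversion, and credits each touchpoint by how much that probability drops when the touchpoint is removed. This research uses an order-2 model, which would treat pairs of consecutive touchpoints as states; the first-order version, the function names and the example journeys are simplifying assumptions.

```python
from collections import defaultdict


def transition_probs(paths):
    """First-order transition probabilities between start, touchpoints, conv and null."""
    counts = defaultdict(lambda: defaultdict(int))
    for channels, converted in paths:
        states = ["start", *channels, "conv" if converted else "null"]
        for current, nxt in zip(states, states[1:]):
            counts[current][nxt] += 1
    return {
        state: {nxt: c / sum(row.values()) for nxt, c in row.items()}
        for state, row in counts.items()
    }


def conversion_prob(probs, removed=None, sweeps=500):
    """P(reaching 'conv' from 'start'); a removed touchpoint acts like 'null' (value 0)."""
    value = defaultdict(float, {"conv": 1.0})
    for _ in range(sweeps):  # repeated sweeps converge even when journeys contain loops
        for state, row in probs.items():
            if state != removed:
                value[state] = sum(p * value[nxt] for nxt, p in row.items() if nxt != removed)
    return value["start"]


def markov_attribution(paths):
    """Removal-effect credit per touchpoint, scaled so credits sum to the number of conversions."""
    probs = transition_probs(paths)
    base = conversion_prob(probs)  # assumes at least one converting journey, so base > 0
    channels = {c for chans, _ in paths for c in chans}
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in paths if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}


journeys = [
    (["search", "display", "search"], True),
    (["display"], False),
    (["search"], True),
    (["display", "search"], False),
]
print(markov_attribution(journeys))
```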


3.4.2. Transforming variables

Aggregating the dataset to day level gives it the form required for VAR and BCI analysis. After day aggregation, UserID and PurchaseID are no longer necessary, while Date is retained as the time index. POS Own and POS Competitor will be transformed into the sum of purchases on a specific day; both are necessary to calculate the impact of shocks on focal and competitor sales.

The 20 touchpoints each obtain their own dummy variable. An overview of the touchpoints and their categorization as FIT or CIT can be found in appendix 1. As described under 3.4.1., four attribution models are used to calculate different attribution values for touchpoints that are part of a journey with conversion. The attribution outcomes do not have a pre-defined scale: the higher the attribution value, the more important the touchpoint is for conversion. The variable will be transformed to day aggregation and the attribution values will be summed into Attribution of CITs and Attribution of FITs. These two variables are created for each attribution model, so that the impact on attribution can be compared between attribution methods, which eventually yields eight attribution variables. A short overview of the variables and their descriptions is presented in table 1.

Table 1. Overview of variables

| Variable | Description |
|---|---|
| Date | Transformed from date-time to day level |
| POS Own | Sum of conversions on day level |
| POS Competitor | Sum of conversions on day level |
| Touchpoints | |
| T1, T2, …, T22 views | Sum of views of each touchpoint on day level, used for calculation of Amount of CITs and FITs |
| Amount of CITs | Sum of clicks of T1-T16 on day level |
| Amount of FITs | Sum of clicks of T18-T22 on day level |


3.5. Plan of analysis

To analyze and estimate the VAR and BCI models, RStudio will be used as the statistical program. Before the analysis can begin, the data needs to be cleaned and prepared, including the previously mentioned aggregation to day level. Aggregation always leads to some loss of information but is necessary to apply time-series analyses. Data cleaning will focus on consistency checks (checking for inconsistent and/or extreme values) and missing values. If outliers are found, their handling will be discussed and the outliers will be treated. If the data contains missing values, their impact will be assessed and discussed as well. A possible solution for missing values is imputation, which will be discussed if applied.

When the data is cleaned and prepared, descriptive analyses will be executed to get a grip on the data. After the descriptive analysis, the assumptions needed for VAR analysis will be assessed: non-stationarity first, followed by cointegration and lag length selection, and then serial correlation. Appropriate steps will be taken if serial correlation is present. Once serial correlation has been assessed and, if necessary, treated, the VAR models will be estimated. The impulse response function will be used to plot the outcomes of the VAR models.


4. Analysis

4.1. Descriptive statistics & preliminary checks

4.1.1. Google Trend data

A time-series dataset is obtained with data for 520 days, from May 31st 2015 to October 31st 2016. There are four periods in which the indexed search interest exceeds 25 at least once, as can be seen in the plot of the external data (figure 2). These four periods are discussed below. The RSI threshold of 25 is selected based on the boxplot of RSI, which can be found in appendix 3.

Fig. 2. Plot of relative search interest over time
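The threshold rule used to identify the four periods can be sketched as follows. The Python example groups days with an RSI above 25 into periods and reports each period's peak. The gap parameter `max_gap`, which decides when two nearby spikes belong to the same period, is a hypothetical choice, and the example values are made up apart from the RSI of 73 and 100 reported below for October 31st and November 1st 2015.

```python
from datetime import date, timedelta


def peak_periods(rsi_by_day, threshold=25, max_gap=3):
    """Group days with RSI above `threshold` into periods.

    rsi_by_day: (date, rsi) pairs sorted by date. A day above the threshold that lies at most
    `max_gap` days after the previous such day joins the same period.
    Returns (start, end, peak_rsi) tuples.
    """
    periods = []
    for day, value in rsi_by_day:
        if value <= threshold:
            continue
        if periods and (day - periods[-1][1]).days <= max_gap:
            start, _, peak = periods[-1]
            periods[-1] = (start, day, max(peak, value))
        else:
            periods.append((day, day, value))
    return periods


first_day = date(2015, 10, 29)
values = [5, 10, 73, 100, 40, 3, 2, 2, 27, 4]  # illustrative values only
rsi = [(first_day + timedelta(days=i), v) for i, v in enumerate(values)]
for start, end, peak in peak_periods(rsi):
    print(start, end, peak)
```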

The first clear spike, with an RSI of 73, is on October 31st 2015. Multiple peaks follow, with the highest observed RSI of 100 on November 1st 2015. A literature search yields a Reuters report stating that a Russian airliner with 224 people aboard crashed into Egypt’s Sinai peninsula on October 31st (Mohammed & Hassan, 2015). The Dutch eight o’clock news opened its broadcast with this news (NOS, 2015). This negative news results in a sharp increase in Dutch searches regarding plane crashes on that date and the four days afterwards (fig. 2). A smaller RSI peak of 27 is found on March 19th 2016. A literature search finds multiple Dutch and international news articles reporting on a Flydubai flight crashing in Russia (NOS, 2016a). As the crash happened overnight, the NOS included this event in its wake-up service (NOS, 2016b). The impact of this crash on Dutch citizens seems to be smaller than that of the first observed crash in the dataset.

The third peak is on May 19th 2016, with an RSI of 85. The following day also shows a high RSI of 68.85. Again, multiple news articles report the crash of an EgyptAir flight from Paris to Cairo, for example by the NOS (2016c). This crash also happened overnight and was again included in wake-up services (NOS, 2016d).

The last clear spike is visible on June 9th 2016, with an RSI of 76. This is quite an odd peak, as it


airbase of Leeuwarden (NOS, 2016e). The pilot escaped via the ejection seat, resulting in no casualties. However, the crash is remarkable and happened on Dutch soil, resulting in a clear RSI peak. All notable peaks with an RSI above 25 have an apparent underlying plane crash that most likely caused the peak. Therefore, Google Trends data can serve as a useful proxy for the impact of plane crashes on Dutch citizens.

4.1.2. Customer journey data

The customer journey data is originally panel data. To investigate the impact of external shocks on certain aspects of the customer journey, the data need to be transformed to time-series data. However, the quality of the data needs to be examined before transformation. Outliers and missing values are discussed, as well as the calculation of new variables before transformation.

4.1.2.1. Outliers and missing values

A thorough examination of the customer journey data revealed a few oddities. First, many journeys have multiple UserIDs. When asked for clarification, GfK explained that multiple persons can take part in one journey, for example when they are part of the same household. As this behavior is normal, no further action is taken.

Another oddity is a journey with 64,503 touchpoints. Touchpoints can follow in quick succession: the median touchpoint duration is only 21 seconds. Nevertheless, more than 60,000 touchpoints in a single journey appears to be an outlier, which is clearly visible in a boxplot (fig. 3). After this outlier is removed, the boxplot shows a less extreme distribution, with an average of 413.1 touchpoints per journey and a median of 153. Some observations still lie outside the whiskers: journeys with more than 448 touchpoints, up to a maximum of 8,891 touchpoints. These numbers are still plausible, given that touchpoints can follow in quick succession with only a short visiting time for each. Therefore, no further journeys are deleted from the dataset.
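The boxplot rule behind the whiskers can be sketched in Python. The example below flags values above Q3 + 1.5 × IQR; the quartiles are computed with `statistics.quantiles(..., method="inclusive")`, which is an assumption, as R's `boxplot()` uses Tukey's hinges and can give slightly different fences. The counts are illustrative.

```python
import statistics


def upper_whisker_outliers(values):
    """Flag values above Q3 + 1.5 * IQR (Tukey's boxplot rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    fence = q3 + 1.5 * (q3 - q1)
    return fence, sorted(v for v in values if v > fence)


touchpoints_per_journey = [1, 2, 3, 4, 5, 6, 7, 8, 9, 100]  # illustrative counts
fence, outliers = upper_whisker_outliers(touchpoints_per_journey)
print(fence, outliers)
```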


There were no missing values present for the variables in the dataset.

4.1.2.2. Attribution calculation

The attribution of CITs and FITs is calculated for all journeys that end in conversion. The Markov, first touch, last touch and linear touch methods are all applied, so that the analysis can also compare attribution methods. The outcomes of the four methods for each touchpoint are presented in appendix 4, and a summary of the grouped output can be found in table 2. Not all journeys end in conversion, as only 3,674 purchases are made in the period. Many touchpoints therefore obtain a value of zero, as touchpoints that are part of a journey that does not end in conversion receive no attribution. Furthermore, attribution has no pre-defined scale, so values should be compared within the same attribution model: a higher value implies that a touchpoint contributes more to conversion than a touchpoint with a lower value.
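The three heuristic rules can be written down directly. The Python sketch below assigns one unit of credit per converting journey (all to the first touchpoint, all to the last, or spread equally) and zero to non-converting journeys, then sums the credit per touchpoint type. The touchpoint names and their CIT/FIT mapping are hypothetical, and the sketch does not reproduce the scale of the values in table 2.

```python
def heuristic_credit(journey, converted, method):
    """Credit per touchpoint in one journey; non-converting journeys get zero credit."""
    n = len(journey)
    if not converted or n == 0:
        return [0.0] * n
    if method == "first":
        return [1.0] + [0.0] * (n - 1)
    if method == "last":
        return [0.0] * (n - 1) + [1.0]
    if method == "linear":
        return [1.0 / n] * n
    raise ValueError(f"unknown method: {method}")


def credit_by_type(journeys, method, kind_of):
    """Sum credit per touchpoint type ('CIT' or 'FIT') over all journeys."""
    totals = {"CIT": 0.0, "FIT": 0.0}
    for journey, converted in journeys:
        for touchpoint, credit in zip(journey, heuristic_credit(journey, converted, method)):
            totals[kind_of[touchpoint]] += credit
    return totals


kind_of = {"search": "CIT", "display": "FIT", "email": "FIT"}  # hypothetical mapping
journeys = [(["display", "search", "search", "email"], True), (["search"], False)]
for method in ("first", "last", "linear"):
    print(method, credit_by_type(journeys, method, kind_of))
```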

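Table 2. Summary of attribution of CITs and FITs. Numbers rounded to two decimal places.

| | Markov CIT | Markov FIT | First touch CIT | First touch FIT | Last touch CIT | Last touch FIT | Linear touch CIT | Linear touch FIT |
|---|---|---|---|---|---|---|---|---|
| Min. | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| 1st Quartile | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Median | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
| Mean | 212.50 | 0.78 | 345.10 | 0.25 | 423 | 0.16 | 399.90 | 0.23 |
| 3rd Quartile | 487.90 | 0 | 298 | 0 | 208 | 0 | 246.50 | 0 |
| Max. | 746.80 | 105.15 | 1417 | 33 | 2137 | 22 | 1885.10 | 32.31 |

4.1.2.3. Data transformation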

With the data quality checked and improved where needed, and the calculation of all necessary variables, the transformation can be executed.

The data is grouped on the date variable, reducing more than 2.4 million observations to 520 observations, as the dataset spans 520 days. This transformation requires aggregation. Therefore, the variables POS Own, POS Competitor, T1-T22, Amount of CITs, Amount of FITs, Markov Attribution CITs, Markov Attribution FITs, First Touch CITs, First Touch FITs, Last Touch CITs, Last Touch FITs, Linear Touch CITs and Linear Touch FITs are summed on day level, giving the total of each variable on a specific day.
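The day-level aggregation can be sketched in Python. The example sums event-level counts, purchases and attribution per calendar day and keeps days without events as zero rows, so that the series covers all 520 days. The field names (`timestamp`, `kind`, `attribution`, `pos_own`, `pos_competitor`) are hypothetical stand-ins for the dataset's variables.

```python
from datetime import date, datetime, timedelta


def empty_day():
    """Zero-filled row for one calendar day."""
    return {"pos_own": 0, "pos_competitor": 0, "n_CIT": 0, "n_FIT": 0, "attr_CIT": 0.0, "attr_FIT": 0.0}


def aggregate_daily(events, start, end):
    """Sum event-level records per calendar day, keeping days without events as zeros."""
    days = {start + timedelta(days=d): empty_day() for d in range((end - start).days + 1)}
    for e in events:
        row = days.get(e["timestamp"].date())
        if row is None:
            continue  # event outside the study window
        row["n_" + e["kind"]] += 1
        row["attr_" + e["kind"]] += e["attribution"]
        row["pos_own"] += e["pos_own"]
        row["pos_competitor"] += e["pos_competitor"]
    return days


events = [
    {"timestamp": datetime(2015, 5, 31, 9, 15), "kind": "CIT", "attribution": 12.5, "pos_own": 0, "pos_competitor": 0},
    {"timestamp": datetime(2015, 5, 31, 9, 16), "kind": "FIT", "attribution": 0.0, "pos_own": 0, "pos_competitor": 1},
]
series = aggregate_daily(events, date(2015, 5, 31), date(2016, 10, 31))
print(len(series), series[date(2015, 5, 31)])
```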


4.1.3. Final dataset

With the data transformed, the final dataset at day level is created. An overview of the variables and their descriptive statistics can be found in table 3. The mean RSI shock is low, at 2.81, as RSI spikes are rare (figure 2). Sales are split between the competitor brands and the focal brand. On average, there are almost seven sales per day for the competitor brands, whereas the mean number of sales for the focal brand is much smaller, at 0.38 (table 3).

Furthermore, there are more CITs than FITs per day on average, showing that customers initiate touchpoints themselves far more often than they are triggered by firms via advertisements. In total, the dataset contains 2,348,842 CITs, compared to 43,069 FITs.

The attribution values are again split between CITs and FITs. As there are many more CITs than FITs, the averages are larger for the attribution of CITs. The attribution values have no scale; larger values mean more attribution to conversion on a specific day. For example, if the daily Markov attribution of CITs exceeds its average of 977,638.7, the attribution for that day is above average and contributes more to sales than days with lower values. The same reasoning holds for the other attribution methods and variables.

Table 3. Descriptive statistics of final dataset. Numbers rounded to two decimal places.

| Variable | Mean | S.D. |
|---|---|---|
| RSI shock | 2.81 | 9.28 |
| POS competitor | 6.94 | 3.40 |
| POS own | 0.38 | 0.68 |
| Amount of CITs | 4517 | 1337.35 |
| Amount of FITs | 82.83 | 80.22 |
| Markov Attribution of CITs | 977,638.7 | 422,834.1 |
| Markov Attribution of FITs | 3,575.30 | 4,583.31 |
| First Touch Attribution of CITs | 1,587,623 | 678,620.5 |
| First Touch Attribution of FITs | 1,128.07 | 1,439.10 |
| Last Touch Attribution of CITs | 1,945,856 | 819,082.0 |
| Last Touch Attribution of FITs | 726.01 | 956.18 |
| Linear Touch Attribution of CITs | 1,839,552 | 777,511.5 |
| Linear Touch Attribution of FITs | 1,074.48 | 1,404.83 |

4.2. Vector Autoregression Analysis


test is significant (p < .05), H0, stating that the data is non-stationary, is rejected and H1, stating that the time series is stationary, is accepted. When the DF test was run, multiple variables appeared to be non-stationary. To resolve this non-stationarity, the first differences of these variables were also tested with the DF test. Differencing made all previously non-stationary variables stationary, so that the whole dataset is stationary, as required for VAR. The results of this process can be found in table 4.

Table 4. Dickey-Fuller test for stationarity. Numbers rounded to three decimal places. Lag order is 8 for all variables.

| Variable | Original: Dickey-Fuller statistic | Original: p-value | First difference: Dickey-Fuller statistic | First difference: p-value |
|---|---|---|---|---|
| RSI shock | -7.120 | <.01*** | | |
| POS competitor | -5.135 | <.01*** | | |
| POS own | -6.485 | <.01*** | | |
| Amount of CITs | -3.101 | .112 | -9.322 | <.01*** |
| Amount of FITs | -4.459 | <.01** | | |
| Markov Attribution of CITs | -3.022 | .146 | -9.544 | <.01*** |
| Markov Attribution of FITs | -3.813 | .018** | | |
| First Touch Attribution of CITs | -3.154 | .096* | -9.652 | <.01*** |
| First Touch Attribution of FITs | -3.811 | .018** | | |
| Last Touch Attribution of CITs | -3.407 | .052* | -9.929 | <.01*** |
| Last Touch Attribution of FITs | -3.818 | .018** | | |
| Linear Touch Attribution of CITs | -3.301 | .071* | -9.812 | <.01*** |
| Linear Touch Attribution of FITs | -3.821 | .017** | | |

Note: * p<.1, ** p<.05, *** p<.01; p-values printed as <.01 are smaller than the printed value.

Cointegration can be tested for when non-stationarity is present and cannot be solved. However, as described above, the non-stationary variables become stationary after first differencing. Testing for cointegration is therefore not necessary, as non-stationarity is no longer present in the data.

4.2.2. Optimal lag length selection & Autocorrelation


With a lag of 6 selected, the VAR model ($p = 6$) is run and tested for autocorrelation. As autocorrelation is found for this initial model, with a p-value of .001 for the Edgerton-Shukur F test (F = 1.308), models with more or fewer lags are tested to find a model without autocorrelation. The outcomes of this process can be found in table 5. Eventually, the VAR model with $p = 8$ shows no evidence of autocorrelation, as p > .05. The null hypothesis of no autocorrelation therefore cannot be rejected, and it can be assumed that there is no autocorrelation when the model includes 8 lags. The AIC outcomes are noteworthy: earlier tests of the optimal lag length with AIC gave the best score for VAR ($p = 6$), but computing the AIC for each model individually now shows the best score for $p = 8$. With an improved AIC and no autocorrelation present, VAR ($p = 8$) is selected for further analysis.

Table 5. Edgerton-Shukur F test and AIC scores

| Model (lag) | Edgerton-Shukur F test | p-value | AIC |
|---|---|---|---|
| VAR ($p = 1$) | 3.5898 | .000*** | 45,241.11 |
| VAR ($p = 2$) | 2.5244 | .000*** | 44,984.61 |
| VAR ($p = 3$) | 2.3351 | .000*** | 44,836.57 |
| VAR ($p = 4$) | 2.1739 | .000*** | 44,751.5 |
| VAR ($p = 5$) | 1.8081 | .000*** | 44,631.21 |
| VAR ($p = 6$) | 1.308 | .001** | 44,441.21 |
| VAR ($p = 7$) | 1.1609 | .049** | 44,364.58 |
| VAR ($p = 8$) | 1.1541 | .057* | 44,296.71 |

Note: * p<.1, ** p<.05, *** p<.01
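The lag selection applied here can be expressed as a two-step filter on the values reported in table 5: keep the lag orders for which the Edgerton-Shukur test does not reject the null hypothesis of no autocorrelation at the 5% level, and among those choose the lowest AIC. A minimal Python sketch of that rule:

```python
# Edgerton-Shukur p-values and AIC scores for VAR(p), as reported in table 5.
table5 = {
    1: (0.000, 45241.11),
    2: (0.000, 44984.61),
    3: (0.000, 44836.57),
    4: (0.000, 44751.5),
    5: (0.000, 44631.21),
    6: (0.001, 44441.21),
    7: (0.049, 44364.58),
    8: (0.057, 44296.71),
}

ALPHA = 0.05
# Keep only lag orders where the no-autocorrelation null is not rejected.
admissible = {p: aic for p, (p_value, aic) in table5.items() if p_value > ALPHA}
selected = min(admissible, key=admissible.get)  # lowest AIC among admissible models
print(selected)
```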

VAR ($p = 8$) includes the Markov attribution variables, as this attribution method is state of the art compared to the heuristic first, last and linear touch methods. The Markov attribution variables are therefore used for finding the optimal lag and testing for autocorrelation, while the heuristic methods are used to compare outcomes.

4.2.3. Results

The results of a VAR model are hard to interpret, since each included variable has its own equation and therefore its own estimation results. The general estimation outcomes are discussed for each equation, followed by the outcomes of the Impulse Response Function (IRF). IRF outcomes are much easier to interpret, as this function visualizes the impact of a shock on a specific variable. The IRF plots show the effect on the response variable of a one-standard-deviation increase in the shock variable, over a period of 50 days after the shock.

4.2.3.1. Equation estimation outcomes


focal brand sales is even lower (26.4%), leaving almost three-quarters of the variance unexplained. This might be caused by missing explanatory variables, such as price information. The variables in the dataset explain 40.2% of the variance of Markov Attribution of CITs and 48.9% of the variance of Amount of CITs. Sales of competitor brands, Amount of FITs and Markov Attribution of FITs are explained considerably well, with respectively 83.4%, 84.6% and 76.3% of their variance explained by the other variables in the dataset. As all equations are statistically significant, IRF plots can be produced to investigate the impact of external shocks on individual variables.

Table 6. Estimation results per equation for VAR ($p = 8$)

| Equation y-variable | R² | Adj. R² | F-statistic | p-value |
|---|---|---|---|---|
| RSI | .374 | .297 | 4.856 | .000*** |
| POS competitor | .852 | .834 | 47 | .000*** |
| POS focal brand | .344 | .264 | 4.271 | .000*** |
| Amount of CITs¹ | .546 | .489 | 9.776 | .000*** |
| Amount of FITs | .863 | .846 | 51.08 | .000*** |
| Markov Attribution of CITs¹ | .467 | .402 | 7.147 | .000*** |
| Markov Attribution of FITs | .789 | .763 | 30.47 | .000*** |

Note: * p<.1, ** p<.05, *** p<.01 | ¹ First difference variable

4.2.3.2. Impulse response function outcomes

IRFs are run to investigate the impact of an RSI shock on each of the other six variables. The results are compared with the five hypotheses formulated for this research.

Hypothesis 2 states the expectation that shocks will have a negative impact on sales, which would be a very insightful outcome from a business point of view. The IRF plots of competitor sales (fig. 4) indeed show a brief negative impact of RSI on sales. However, around day four after the shock, sales recover and even improve, with positive peaks; the positive long-term effect is visible in the cumulative IRF plot. The sales of the focal brand (fig. 4) behave partly differently: the shock results in a restless effect on sales. It causes a short and small positive effect, followed by a short and small negative effect that is again quickly recovered, followed by a somewhat larger negative effect around a week after the shock. Around two weeks after the shock, the effect has recovered and is positive, creating a positive cumulative effect in the long term. This restlessness will be discussed in the conclusion, as the IRF plots show a volatile impact. For focal brand sales specifically, the restlessness may be amplified by the fairly low R² of the equation and little presence


We also see that the impact is more negative and more severe in the first ten days after the shock. After those ten days, the impact stabilizes but remains positive, which deviates from the expectation of hypothesis 3.

Fig. 4. IRF plots for sales variables


Fig. 5. IRF plots for Amount of CITs and FITs

The attribution of CITs and FITs is the subject of hypotheses 4 and 5. Hypothesis 4 presents the expectation that the attribution for both types of touchpoints will decrease as a result of a shock. Hypothesis 5 states that CITs in the pre-purchase stage will decrease more severely in attribution effectiveness as a result of a shock than FITs in the pre-purchase stage. Again, the expectation is that the effects are large shortly after the shock and decrease to almost zero in the long term, as described in hypothesis 3.

The impact of a shock on Attribution of CITs (fig. 6) is intense, with first a clear negative peak, followed by a clear positive peak that subsides after a few days. After approximately two weeks, the effect calms down and eventually becomes minimal, supporting hypothesis 3. The cumulative plot shows a minimal but stable negative long-term effect. In other words, customers show erratic behavior with regard to the attribution of CITs to conversions in the period directly after a shock, and there is a negative impact on long-term CIT attribution. Both hypothesis 3 and hypothesis 4 are partially supported for this variable.


this variable. Hypothesis 5 is supported by this finding, as the Attribution of CITs is more negative than the Attribution of FITs.

Fig. 6. IRF plots for Attribution of CITs and FITs

4.2.3.2. Comparison of attribution methods

In the VAR (𝑝 = 8) model, Markov attribution is used. Heuristic methods, such as first-, last- and linear-touch attribution, are often used instead because they are easier to implement. As different methods might lead to different outcomes, the VAR (𝑝 = 8) model with Markov attribution is compared to VAR (𝑝 = 8) models with first-, last- and linear-touch attribution, respectively. All models are free of serial correlation (Edgerton-Shukur test, p > .05), and all equations within the VAR models are again significant (p < .001). The AIC is lower, and therefore better, for the heuristic methods than for the state-of-the-art Markov method (table 7). The model with last-touch attribution scores best, with an AIC of 43,622.65.
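The three heuristic rules can be illustrated with a minimal sketch. The journeys below are hypothetical converting paths whose touchpoints are labeled only by type (CIT or FIT), and the function name `heuristic_attribution` is illustrative rather than taken from the software used in this research. Each converting journey carries one unit of credit, which the chosen rule assigns to the first touchpoint, the last touchpoint, or equally to all touchpoints.

```python
from collections import defaultdict


def heuristic_attribution(journeys, method):
    """Split one conversion per journey over its touchpoints by a heuristic rule."""
    credit = defaultdict(float)
    for path in journeys:
        if not path:
            continue  # a journey without touchpoints gets no credit
        if method == "first":
            credit[path[0]] += 1.0  # all credit to the first touchpoint
        elif method == "last":
            credit[path[-1]] += 1.0  # all credit to the last touchpoint
        elif method == "linear":
            share = 1.0 / len(path)  # equal credit to every touchpoint
            for touchpoint in path:
                credit[touchpoint] += share
        else:
            raise ValueError(f"unknown method: {method}")
    return dict(credit)


# Hypothetical converting journeys, each an ordered list of touchpoint types.
journeys = [
    ["FIT", "CIT", "CIT"],
    ["CIT"],
    ["FIT", "FIT", "CIT", "FIT"],
]
for method in ("first", "last", "linear"):
    print(method, heuristic_attribution(journeys, method))
```

All three rules distribute the same total credit (one unit per converting journey); they differ only in which touchpoints receive it, which is why switching rules can shift the attribution series that enter the VAR models.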

Table 7. Edgerton-Shukur F test and AIC scores

| Model (lag) | Attribution method | Edgerton-Shukur F test | p-value | AIC |
|---|---|---|---|---|
| VAR (𝑝 = 8) | Markov | 1.1541 | .057* | 44,296.71 |
| VAR (𝑝 = 8) | First touch | 1.1442 | .069* | 43,685.9 |
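The ranking in table 7 follows from the usual definition of the Akaike information criterion, where $k$ is the number of estimated parameters and $\hat{L}$ the maximized likelihood. Software packages differ in scaling and constants, so this form is an assumption about how the reported values were computed:

$$\mathrm{AIC} = 2k - 2\ln\hat{L}$$

Lower values are preferred. With the reported values, the AIC of the Markov model exceeds that of the last-touch model by $44{,}296.71 - 43{,}622.65 = 674.06$ and that of the first-touch model by $44{,}296.71 - 43{,}685.9 = 610.81$. Strictly speaking, AIC values are only comparable between models fitted to the same observations, so differences between models built on differently attributed series are best read as indicative.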


When plotting the IRFs for the first-, last- and linear-touch models, it stands out that the plots are almost identical across these models and to the Markov model plots. Only the plots visualizing Attribution of CITs and FITs are discussed here; all plots can be found in appendix 6.

When comparing the three heuristic methods in figures 7 and 8, there is almost no visible difference between them; the most apparent differences are found in the 95% confidence interval lines. When comparing the outcomes of the heuristic methods to those of the Markov method (fig. 6), no differences can be detected. It can therefore be concluded that, for this VAR model, the choice of attribution method does not affect the outcomes.
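The Markov method against which the heuristic rules are compared can be sketched as follows. This is a common first-order formulation based on removal effects, assumed here for illustration; the order of the chain and other implementation details used in this research may differ. Each journey starts in a `start` state and ends in a `conversion` or `null` state. A channel's removal effect is the relative drop in the probability of reaching `conversion` when that channel is removed, and the observed conversions are shared in proportion to these effects. The journeys and function names are hypothetical.

```python
from collections import defaultdict


def transition_probs(journeys):
    """Estimate first-order transition probabilities from (path, converted) pairs."""
    counts = defaultdict(lambda: defaultdict(int))
    for path, converted in journeys:
        states = ["start", *path, "conversion" if converted else "null"]
        for current, following in zip(states, states[1:]):
            counts[current][following] += 1
    return {
        state: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for state, row in counts.items()
    }


def conversion_prob(probs, removed=None, iterations=500):
    """Probability of reaching "conversion" from "start".

    Solved by fixed-point iteration; a removed channel keeps probability 0,
    so every transition into it behaves like a transition to "null".
    """
    p = defaultdict(float)
    p["conversion"] = 1.0
    for _ in range(iterations):
        for state, row in probs.items():
            if state != removed:
                p[state] = sum(q * p[nxt] for nxt, q in row.items())
    return p["start"]


def markov_attribution(journeys):
    """Share the observed conversions over channels by normalized removal effect."""
    probs = transition_probs(journeys)
    base = conversion_prob(probs)
    channels = sorted(s for s in probs if s != "start")
    effects = {c: 1.0 - conversion_prob(probs, removed=c) / base for c in channels}
    total_effect = sum(effects.values())
    conversions = sum(1 for _, converted in journeys if converted)
    return {c: conversions * e / total_effect for c, e in effects.items()}


# Hypothetical journeys: (ordered touchpoints, converted?)
journeys = [
    (["FIT", "CIT"], True),
    (["CIT"], True),
    (["FIT"], False),
    (["CIT", "CIT"], False),
    (["FIT", "CIT", "FIT"], True),
]
print(markov_attribution(journeys))
```

On these toy journeys, the baseline conversion probability equals the observed rate of 0.6, and the 3 conversions are split as roughly 1.41 to FIT and 1.59 to CIT. Unlike the heuristic rules, this split depends on the whole sequence of transitions rather than on positions within a single journey.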

Fig. 7. IRF plots comparing Attribution of CITs for first, last and linear touch respectively

Fig. 8. IRF plots comparing Attribution of FITs for first, last and linear touch respectively

4.2.3.3. Impact of variables on RSI

As stated in section 3.2, the assumption could be made that RSI shocks do not react to the (lagged) variables, such as purchases. With VAR analysis, both directions of shocks can be tested instead of assuming a one-way direction (Colonescu, 2016).


these variables have a minuscule and negligible impact on the relative search interest regarding plane crashes. This outcome seems sensible, as the included variables explain only 29.7% of the variance of the proxy data, which depends heavily on the occurrence of crashes and on the news coverage of those crashes.

Somewhat larger, but still small, are the impacts of shocks in sales of the focal brand and of competitor brands. Figure 9 shows that a shock in sales of competitor brands leads to a small increase in the relative search interest in the search term, and the effect remains slightly positive over time. It can be reasoned that more sales lead to more flights, which leads to a higher chance of a crash occurring and thus to increased relative search interest. This impact is at best an indirect effect. Such an indirect effect may exist in the data, but investigating it is out of scope and not relevant for this research.

For sales of the focal brand, the effect starts with an increase in RSI, quickly turns into a decrease, and then briefly increases one final time before reducing to almost zero over time. Compared to sales of competitor brands, the effect is more volatile for the focal brand. Again, this indirect effect might exist, but it is not investigated in more detail in this research as it is out of scope. This might be of interest for further research.

Fig. 9. IRF plots for RSI

4.3. Bayesian Causal Impact analysis

With the VAR results presented, it is interesting to see whether the same outcomes are found with the BCI analysis. To apply the BCI model, both a pre-period and a post-period need to be defined. The post-period will again be 50 days. The shock of October 31st 2015 is used, as there is enough time before and after this particular shock to analyze a 50-day post-period. The pre-period starts on May 31st 2015 and ends on the day of the shock, October 31st 2015,
