Interaction between online channels and devices in different stages of the customer purchase journey


Florine van Helsdingen University of Groningen Faculty of Economics and Business

MSc Marketing Intelligence Master Thesis

26-07-2019

Florine Frederique Maxime van Helsdingen Nieuwe kijk in ‘t Jatstraat 102

9712 SL, Groningen + (31) 6 10 62 77 66

f.f.m.van.helsdingen@student.rug.nl

Supervisor (first): dr. P.S. (Peter) van Eck

Supervisor (second): prof. dr. J.E. (Jaap) Wieringa

University of Groningen Faculty of Economics & Business

Department of Marketing


Preface

This is the thesis "Interaction between online channels and devices in different stages of the customer purchase journey". I wrote this thesis as part of my graduation from the master's programme Marketing Intelligence at the University of Groningen. I worked on this thesis from February 2019 up to and including June 2019.

Hereby I would like to thank my supervisor Dr. Peter van Eck for the guidance during my thesis. While writing my thesis, he has always taken the time to answer my questions in a comprehensive way.

I would also like to thank the rest of my thesis group for all the useful brainstorming sessions in which we have looked at each other's thesis with a critical eye. I would also like to thank my parents, Don and my aunty who have helped me a lot and have read through my thesis. Last, I would like to thank GFK for making it possible to share this dataset.

I wish you a lot of reading pleasure!

Florine van Helsdingen


Management Summary

The online shopping process, often referred to as the customer purchase journey, is characterized by different online channels, which can be classified as firm-initiated channels, customer-initiated channels or both. These channels have their own characteristics, which make them suitable for different stages in the shopping process. Traditionally, three stages are distinguished: the pre-purchase stage (search), the purchase stage and the post-purchase stage. Not only channel characteristics but also the specific preferences of customers are important in choosing channels in the stages of the customer purchase journey.

These channels can be used on different devices, such as laptops and desktops (fixed) and tablets and mobiles (mobile). Some channels, due to their characteristics, are more suitable for a specific device than other channels. This study offers a new perspective on the use of channels in different stages of the customer purchase journey: not only the different channels but also the device used is a main interest of this study. Being able to predict which channels people are likely to use in specific stages, and which channels are used on specific devices, could be relevant for managers and their bidding strategies. The research question that this study seeks to answer is as follows:

How does the device type influence the relationship between (the use of) channels and their performance in different stages of the customer journey?

Based on an extensive review of the literature, five hypotheses are formulated. In order to see whether someone is in the search or the purchase stage, different distributions based on timing are tested. The first hypothesis tests whether there is a difference in the use of online channels in the various stages of the customer purchase journey. The online channels measured are e-mail, website, search, affiliates, retargeting, pre-roll and banner. The second hypothesis concerns the difference between customer-initiated and firm-initiated channels. Based on the literature, it is predicted that the use of customer-initiated channels is higher in the purchase stage than in the pre-purchase stage.


The fourth hypothesis extends the first hypothesis by means of a moderator. More specifically, it tests whether device type moderates the use of online channels in the different stages of the customer purchase journey. The last hypothesis also extends the first hypothesis, by means of mediation: it is expected that device type mediates the use of online channels in the different stages of the customer purchase journey.

Data is obtained from research company GFK. The dataset concerns the travel industry; it can be classified as panel data and is online and event-based. All hypotheses are tested by means of logistic regression.

The outcomes of the research are presented below. This study fully supports that there is a difference in the use of online channels in the various stages of the customer purchase journey. More specifically, the outcomes show that website, banner and pre-roll are channels that people on average use more in the search stage of the customer purchase journey, while e-mail, retargeting, affiliate and search are channels that appear more in the purchase stage. Many of the channels used in the purchase stage can be classified as firm-initiated channels; only search is classified as a customer-initiated channel. The second hypothesis, about the use of customer-initiated channels in the purchase stage, is therefore not supported, as this research shows that customer-initiated channels are less apparent in the purchase stage. The third hypothesis, about the differences in the use of devices in the various stages of the customer purchase journey, is partially supported, as no significant effect is found for two of the distributions. This study shows that the mobile device is used less in the purchase stage.


The fifth hypothesis, which studies device as a mediator, is partially supported, as no significant effect is found for many of the distributions. For affiliates and pre-roll, device fully mediates the relationship between the stage someone is in and the channel someone uses. So, for these channels the influence of a stage on an online channel exists due to the indirect influence of device. For e-mail, retargeting, website, search and banner, device partially mediates the relationship between the stage someone is in and the channel someone uses. For these channels the influence of a stage on an online channel exists partially due to the indirect influence of device.


Table of Contents

1. Introduction

2. Theory

2.1 The online customer purchase journey

2.2 Steps in the customer purchase journey

2.3 Online Channels in the customer journey

2.4 Device types in the customer journey

2.5 Conceptual model

3. Methodology

3.1 Data collection

3.2 Research design

3.2.1 Research type

3.2.2 Population and sample

3.3 List of analyses

3.3.1 Determining the stages of the customer journey

3.3.2 Logit model

3.3.3 Moderation

3.3.4 Mediation

3.3.5 Model variables

3.4 Model specification

3.4.1 Model specification hypotheses 1, 2, 3

3.4.2 Model specification hypothesis 4 with moderator

3.4.3 Model specification hypothesis 5 with mediator

3.5 Plan of analyses

4. Data description

4.1 Data pre-processing

4.2 Preliminary checks

5. Results

5.1 Descriptives

5.1.1 Customer purchase journey

5.1.2 Demographics

5.2 Assumptions logistic regression

5.3 Model estimation

5.4 Model validation

5.5 Hypotheses testing

5.5.1 The use of online channels in different stages of the customer purchase journey

5.5.2 The use of CIC’s in different stages of the customer purchase journey

5.5.3 The use of devices in different stages of the customer purchase journey

5.5.4 The use of online channels in different stages of the customer purchase journey moderated by device

5.5.5 The use of online channels in different stages of the customer purchase journey mediated by device

6. Discussion

6.1 Online channels and different stages of the customer purchase journey

6.3 Search and purchase distribution

6.4 Implications

6.5 Limitations and future research

References

Appendix I – Overview of variables

Appendix II: Results Hypothesis 1

Appendix IIa: Binary Logistic Regression Hypothesis 1

Appendix IIb: Binary Logistic Regression Hypothesis 1

Appendix IIc: Binary Logistic Regression Hypothesis 1

Appendix IId: Binary Logistic Regression Hypothesis 1

Appendix IIe: Binary Logistic Regression Hypothesis 1

Appendix IIf: Binary Logistic Regression Hypothesis 1

Appendix IIg: Binary Logistic Regression Hypothesis 1

Appendix III: Results Hypothesis 2

Appendix IV: Results Hypothesis 3

Appendix V: Results Hypothesis 4

Appendix Va: Binary Logistic Regression Hypothesis 4

Appendix Vb: Binary Logistic Regression Hypothesis 4

Appendix Vc: Binary Logistic Regression Hypothesis 4

Appendix Vd: Binary Logistic Regression Hypothesis 4

Appendix Ve: Binary Logistic Regression Hypothesis 4

Appendix Vf: Binary Logistic Regression Hypothesis 4

Appendix Vg: Binary Logistic Regression Hypothesis 4

Appendix VI: Results Hypothesis 5

Appendix VIa: Binary Logistic Regression Hypothesis 5

Appendix VIb: Binary Logistic Regression Hypothesis 5

Appendix VIc: Binary Logistic Regression Hypothesis 5

Appendix VId: Binary Logistic Regression Hypothesis 5

Appendix VIe: Binary Logistic Regression Hypothesis 5

Appendix VIf: Binary Logistic Regression Hypothesis 5

Appendix VIg: Binary Logistic Regression Hypothesis 5


1. Introduction

Mobile devices are more prominent in life than ever before. Google even named mobile as one of the most important themes of 2019. “The mobile traffic has now grown larger than desktop traffic” (Hogenboom, 2019). Not only is the number of people owning a smartphone increasing, but so is the time people spend on these devices. Whereas in 2010 people spent on average 24 minutes on their mobile phone, by 2014 this had grown to 171 minutes (Chaffey, 2018). The growth of mobile devices is expected to continue in the future (De Haan, 2018). This growth is also apparent in online customer purchase journeys.

The customer purchase journey can be defined as a funnel of steps that customers go through in engaging with a company, whether it concerns a product, service or experience (Richardson, 2010). “Online the journey includes all visited websites, links and marketing content the customer encounters” (De Haan, 2018:5). The customer journey can be divided into three stages: the pre-purchase, the purchase and the post-purchase stage. Fixed and mobile devices appear in all of these stages. Importantly, devices have their own characteristics, which make them suitable for different stages in the purchase process. “Mobile channels for example have specific characteristics that make them more suitable for search and less suitable for purchase” (Lemon & Verhoef, 2016: 80). Often multiple devices are used together in the customer purchase journey.

Besides different devices, advertisers use a variety of online marketing channels to reach potential consumers, such as e-mail, banners, affiliates and paid advertising. Retailers that used to sell their products only in a store are now selling their products online via multiple channels such as websites and search channels (Shankar & Malthouse, 2007). Many customers use different online channels multiple times when making a purchase (Li & Kannan, 2014). A distinction can be made between types of online channels: they can be classified as customer-initiated or firm-initiated channels. “In firm-initiated channels, the advertiser determines timing and exposures while in customer-initiated channels, customers actively trigger the communication” (Anderl, Becker, Von Wangenheim & Schumann, 2016:460). The main difference is that the customer is more active and


(FIC) the company is “pushing information” (De Haan, Wiesel & Pauwels, 2016; Shankar & Malthouse, 2007).

Both devices and online channels have been researched in previous studies. De Haan, Kannan, Verhoef and Wiesel (2018) researched device switching in the customer purchase journey. Their research takes the stages of the purchase funnel into account but does not pay attention to online channels. Anderl et al. (2016) researched the effectiveness of different forms of online channels in a multi-channel attribution setting. However, they only considered the type of channel and not the type of device. To the best of my knowledge, neither these nor other researchers have studied the interaction between online channels and devices in different stages of the customer purchase journey. The goal of this study is to research this interaction by means of the following research question:

How does the device type influence the relationship between (the use of) channels and their performance in different stages of the customer journey?

The aim of this research is twofold. First, this research and the corresponding research question make a relevant addition to the existing literature. Second, this research aims to provide managers with insights into the specific purpose of different channels and devices. This is relevant because managers can design their campaigns effectively if they understand the main effects of channels and devices and the interactions between the two (Kannan, Reinartz & Verhoef, 2016).

The data used has been retrieved from research company GFK and is focused on the travel industry. This is a very interesting industry because it has changed dramatically due to digitization and the proliferation of online channels. Customers are no longer limited to the travel agent's website; websites such as Booking and TripAdvisor have become very important (Verhoef, Kannan & Inman, 2015).


2. Theory

In this literature review, more in-depth information will be given about different theories relevant for this research. First a description of the customer purchase journey and the accompanying stages will be given, after which different channels and device types will be discussed. Based on this, various hypotheses are formulated.

2.1 The online customer purchase journey

The online customer purchase journey has been the research object of many studies, and according to the Marketing Science Institute (2016) the customer journey is a very relevant research area for the future. However, the literature is not conclusive in providing a comprehensive definition of this construct. Some researchers are more specific, whereas others are more general in their definition. Richardson (2010) defines the customer journey as the steps customers go through in engaging with a company. This journey can end with a purchase but does not have to. Anderl et al. (2016:457) are more specific and define the customer journey of an individual customer as “including all touch points over all online marketing channels preceding a potential purchase decision that lead to visit of an advertiser’s website”. The customer purchase journey is often referred to as a funnel. Wiesel, Pauwels & Arts (2011:605) describe this purchase funnel “as customers that move toward a purchase in a series of stages”.

What becomes evident from the different definitions is that the customer purchase journey consists of multiple steps or stages, online channels and touchpoints that can lead to an actual purchase or a visit to an advertiser’s website. Nowadays the customer journey is more complex, as customers interact with multiple channels and devices (Lemon & Verhoef, 2016). These channels are not limited to offline channels; online channels are also common in modern customer journeys (Rangaswamy & Van Bruggen, 2005).

2.2 Steps in the customer purchase journey

As the literature is not conclusive in providing one comprehensive definition of the customer purchase journey, there are also several frameworks of the stages in the customer purchase journey. Typically, these frameworks include (some of) the following stages: need


post-purchase evaluation (Van der Heijden, Verhagen & Creemers, 2003). Another perspective on the stages is a more psychological approach: Wiesel, Pauwels and Arts (2011) describe a cognitive, an affective and a conative stage.

Lemon and Verhoef (2016) and Frambach, Roest & Krishnan (2007) make a more general distinction consisting of three stages: the pre-purchase stage, the purchase stage and the post-purchase stage. The pre-purchase stage is the first stage and “holds the customers interaction with the brand” (Lemon and Verhoef, 2016:76). In this stage consumers primarily seek information, which is why it is often referred to as the search stage (Frambach, Roest & Krishnan, 2007). Behaviors that characterize this stage are need recognition, search and consideration (Lemon & Verhoef, 2016).

The second stage is the purchase stage. This is the stage where the conversion is made. “It entails all the customer interaction with the brand itself during the purchase event itself” (Lemon and Verhoef, 2016:76). Behaviors that are common in this stage are choice, ordering and payment (Lemon & Verhoef, 2016).

The last stage is the post-purchase stage which entails “the customer interaction with the brand following the actual purchase” (Lemon and Verhoef, 2016:76). Behaviors that are central in this stage are post-purchase engagement, consumption, usage and service requests (Lemon & Verhoef, 2016).

Important in the definition of the various stages is that they do not consider the medium through which the consumer buys (Van der Heijden, Verhagen & Creemers, 2003). But within these stages consumers use different channels as a medium to buy a product. These channels can be classified as CIC’s or as FIC’s.

2.3 Online Channels in the customer journey


(2007) and Keeney (1999) in each of the three stages of the journey, consumers evaluate the various marketing channels on their ability to satisfy the benefits they seek. Consumers are not passive but active and scrutinize each channel on the benefits it provides versus the incurred search costs. Based on this, consumers will make their channel decision (Li & Kannan, 2014).

Channels not only differ in their characteristics; they also differ in how they affect consumers in different stages. According to Abhishek, Fader and Hosanagar (2015), display ads seem to have more effect on the consumer in the search stage than in the purchase stage. Customers, in turn, differ in their preference for (the usage of) channels.

The existing literature on drivers of channel choice has mainly focused on offline channels or the interplay of online channels. As there is a shift from a multi-channel environment to an omni-channel environment, the understanding of the main effects of omni-channels needs more research (Schoenbachler & Gordon, 2002; Verhoef, Kannan & Inman, 2015). Therefore, the first hypothesis is as follows:

H1: There is a difference between the use of online channels in the various stages of the customer purchase journey.

The introduction and rise of the internet have changed the customer journey and the role of consumers. Nowadays, it is not only the company that sends information; the consumer also has the power to engage with companies. In general, there are two ways in which a customer and a company can reach each other in the online customer journey. “In FIC’s, the advertiser determines and exposures the communication. In CIC’s, customers actively trigger the communication” (Anderl et al., 2016: 460). The channel e-mail is an example of a FIC, as an e-mail is “pushed” by the company to the customer. An example of a CIC is online search, as this is initiated by the customer (De Haan et al., 2016). The origin of contact can thus be seen as an important differentiator for online channels (Anderl, Schumann & Kunz, 2016).


Online channel | Description | Channel type
--- | --- | ---
Display | Display advertising, respectively banner advertising, entails embedding a graphical object with the advertising message into a website. Timing and exposures of display banners are determined by the firm. | Firm-initiated
Newsletter | Newsletter marketing, also known as e-mail marketing, encompasses sending marketing messages toward potential customers using e-mail. | Firm-initiated
Retargeting | Retargeting is a subclass of display advertising that is personalized toward the user based on his or her browsing history. It aims to re-engage users who have visited an advertiser's website but did not complete a purchase. | Firm-initiated
Social media | Social media advertising comprises a set of advertising platforms belonging to the field of social media, such as social networks (e.g., Facebook), micromedia (e.g., Twitter), or other (mobile) sharing platforms (e.g., Instagram). In one of our data sets, the advertiser uses targeted Facebook display ads, which we define as social. | Firm-initiated
Type-in | Visits are classified as (direct) Type-in if users access the advertiser's website directly by entering the URL in their browser's address bar, or by locating a bookmark, favorite, or shortcut. | Customer-initiated
Search (SEA/SEO) | A consumer searching for a keyword in a general search engine (e.g., Google) receives two types of results: organic search results ranked by the search algorithm, and sponsored search results, also known as paid search or search engine advertising (SEA). While organic search or search engine optimization (SEO) results are available for free, SEA clicks are sold via second-price auctions. | Customer-initiated
Price comparison | Price comparison websites are vertical search engines that allow users to compare products by price and features. They aggregate product listings from a multitude of businesses, and direct users toward their websites. | Customer-initiated
Affiliate | Affiliate marketing is a form of commission-based marketing in which a business (e.g., retailer) rewards the affiliate (e.g., a product review website) for referring a user toward the business's website. As affiliate in our data sets may include both coupon websites that are customer-initiated and ads provided by affiliate networks that may be more firm-initiated, a clear differentiation between customer- and firm-initiated contacts across data sets is not possible. | Customer-initiated/firm-initiated
Referrer | Referral or referrer traffic covers all traffic that is forwarded by external content websites (with or without remuneration), for example by including a text link. As traffic sources vary across data sets, a clear differentiation between customer- and firm-initiated contacts across data sets is not possible. | Customer-initiated/firm-initiated


searches for a video about a specific subject, the customer cannot decide which pre-roll he or she sees, therefore it can be classified as a FIC.

When comparing the effect of CIC’s and FIC’s, CIC’s seem to have more effect in the customer journey because they already require a level of interest from the customer. In general, CIC’s are very interesting for companies because CIC’s such as paid advertising only cost money when a customer clicks on the paid ad (De Haan et al., 2016). In addition, they are perceived as far less intrusive (De Haan et al., 2016; Shankar & Malthouse, 2007).

The stages in which these channels occur in the purchase journey differ: FIC’s appear earlier in the journey, when consumers have not yet identified themselves with a product. CIC’s, in contrast, are used more later in the journey and represent previous buying behavior (De Haan et al., 2016). Previous research has focused on the effect of CIC’s and FIC’s with data from the fashion industry. The travel industry differs from the fashion industry, as it offers products that are bought occasionally rather than products that are bought more often. Therefore, it is interesting to see whether the differences in the usage of FIC’s and CIC’s that hold in the fashion industry also exist in the travel industry. Based on previous research, the following hypothesis is formulated to test this:

H2: The use of customer-initiated channels is higher in the purchase stage than in the pre-purchase stage.

2.4 Device types in the customer journey

It is already evident that the consumer purchase journey is not limited to one channel, but it is also not limited to one device. Often, people use multiple online and offline devices in their journey and also switch between these devices (de Haan et al., 2018). Customers use these devices not only in their online journey but also in their offline journey. An example is showrooming, where people use their mobile device in a physical store to compare prices (Gensler, Neslin & Verhoef, 2017). As the online journey is the main interest of this study, showrooming will not be discussed further.


because a smartphone for example can come in many sizes. Also, there is a difference between laptops and desktop computers, as a desktop is fixed and a laptop is easy to carry. In this research two categories are distinguished: fixed, being laptop and desktop devices, and mobile, being mobile and tablet devices.

The rise of the mobile phone has changed online shopping behavior, resulting in a more complex journey (de Haan et al., 2018). According to Kannan, Reinartz and Verhoef (2016), a channel can have a different impact when accessed using a smartphone versus a desktop. Just like channels, devices have their own characteristics that make them suitable for different stages of the consumer journey. Mobile device channels, for example, are suitable for the search stage due to their characteristics, because the search time is smaller compared to a fixed device (De Haan et al., 2018; Cui & Roto, 2008). In the travel industry “search” is the most common starting point for mobile research (Nielsen, 2013).

Often people begin their search for a product on a mobile device, but the final purchase is often made on a fixed device. Therefore, mobile devices are more common in the search stage and fixed devices are more common in the purchase stage (Lemon & Verhoef, 2016). The difference in the likelihood of ordering is remarkable as the likelihood of ordering is 70% higher when a consumer begins their journey with a search on the mobile device (Verhoef, 2018).


H3: There is a difference in the use of devices in the various stages of the customer purchase journey.

The rise of multiple channels and devices has resulted in a more complex journey in which devices and channels are used simultaneously, constantly and interchangeably (Verhoef, Kannan & Inman, 2015). Research by Verhoef, Kooge & Walk (2016) showed that different devices are suited for certain touchpoints. Kannan et al. (2016) extended this and found that the effect of a touchpoint depends on the type of device used: a touchpoint on a mobile device has a different impact than a touchpoint on a computer. Existing literature has thus already found an interaction effect of device for touchpoints, but the interaction effect of device and online channels still needs to be better understood (Kannan et al., 2016). The following hypothesis is formulated to see whether this interaction effect of device also exists for online channels:

H4: There is a moderated effect of device type in the use of online channels in the various stages of the customer purchase journey.

In order to develop a good understanding of why people choose certain channels, marketers must develop a clearer understanding of the causal determinants influencing this choice (Mitchell & Olson, 1981). Although the mediating role of device remains unexplored in the existing literature, the influence of a stage on an online channel could be (partially) mediated by device. The pre-purchase stage includes behaviors such as need recognition, search and consideration (Lemon & Verhoef, 2016). Therefore, “A channel in this stage must have the ability to enable the consumers to identify product information and obtain access to this information” (Frambach, Roest & Krishnan, 2007:29). The internet is easy to access and very effective in finding and organizing information for people. It is often the preferred choice for information-seeking (Shankar, Smith, & Rangaswamy, 2003).


such as a mobile application but for channels that exist across multiple devices research showed that the experience of a channel can differ per device. In order to get the best experience, people prefer certain channels on specific devices.

This may sound as if the process is deliberate and people carefully think about these decisions, but these choices are part of an unconscious process (Martin & Morich, 2011). The causal relationship between the stage and online channels is tested in hypothesis 1. In order to test whether the influence of a stage on a channel is (partially) mediated by device, the following hypothesis is formulated:

H5: The relationship between the different stages and the various channels in the purchase journey is mediated by device.


2.5 Conceptual model

Figure 1 – Conceptual model

In every definition of the stages of the customer journey, the post-purchase stage includes behaviors after the purchase, such as usage and consumption. In addition, loyalty is an important aspect of the post-purchase stage. As the channels leading to a purchase are the main interest, the post-purchase stage is not taken into account in this study. For device type, three devices captured in two types will be researched in this study. Fixed devices are computers and laptops, whereas mobile devices are tablets and mobiles. As this division is often made in other studies, it is also used in this study. The channels that can be classified as CIC’s are the website of the focal firm and the search channel. FIC’s are e-mail, banner, pre-roll, affiliates and retargeting.
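The groupings of channels and devices described above can be written down as a simple lookup, sketched here in Python. The string labels and the function name are hypothetical and chosen for illustration only; the actual coding of channels and devices in the dataset may differ.

```python
# Channel grouping as described in the conceptual model
# (labels are illustrative assumptions, not the dataset's own codes).
CHANNEL_TYPE = {
    "website": "CIC",
    "search": "CIC",
    "e-mail": "FIC",
    "banner": "FIC",
    "pre-roll": "FIC",
    "affiliates": "FIC",
    "retargeting": "FIC",
}

# Device grouping: fixed versus mobile.
DEVICE_TYPE = {
    "desktop": "fixed",
    "laptop": "fixed",
    "tablet": "mobile",
    "mobile": "mobile",
}


def classify(channel, device):
    """Return the (channel type, device type) pair for one touchpoint."""
    return CHANNEL_TYPE[channel], DEVICE_TYPE[device]


print(classify("search", "tablet"))
print(classify("e-mail", "laptop"))
```

Keeping the two groupings in separate dictionaries mirrors the conceptual model, in which channel type and device type are distinct variables.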


Besides the variables in the conceptual model, control variables are included in the study, namely demographic variables such as age, gender and education level. They are not the main interest of this study, but they are expected to affect (the usage of) online channels. When it comes to age, Oksman and Turtianen (2004) found that especially children and younger people use their mobile phone a lot in everyday life. The difference in age is also apparent in online buying, as research has shown that older shoppers search for fewer products than younger shoppers, while the purchase amount is the same for both groups (Sorce, Perotti & Widrick, 2005). In contrast, Johnson, Bellman and Lohse (1999) found that the likelihood of ordering increases as age and education go up. Zhou, Dai & Zhang (2007) extended this research to education and found that online shopping is easy nowadays compared to the earlier online shopping process. They state that online shopping therefore does not require high education. When it comes to gender, there are differences between men and women in the likelihood of purchasing online (Brown, Pope & Voges, 2003). Men both shop and purchase more online than women (Rodgers & Harris, 2003).


3. Methodology

3.1 Data collection

The data in this research is retrieved from research company GFK and provided by Peter van Eck of the Rijksuniversiteit Groningen. GFK measures a small chunk of big data. The dataset concerns the travel industry, but the specific company is not provided. The dataset can be classified as panel data and is online and event-based. “A panel data set consists of a time series of each cross-sectional member in the data set” (Leeflang, Wieringa, Bijmolt & Pauwels, 2016:66).

3.2 Research design

3.2.1 Research type

The aim of this research is to find a causal relationship between the usage of channels and devices in the different stages of the customer purchase journey. This can be classified as a causal research design.

3.2.2 Population and sample

The sample of this research consists of Dutch citizens who have oriented themselves on travel information. The data sample includes a total of 9678 individuals. The time period of this research is between 1-5-2015 and 31-09-2016. The sampling itself was performed by GFK.

3.3 List of analyses

The purpose of this study is to answer the following question: How does device type influence the relationship between (the use of) channels and their performance in different stages of the customer journey? In order to give a reasonable answer to this question, the stage in which an online channel is used must first be determined. Therefore, the stages of the customer journey must be defined.

3.3.1 Determining the stages of the customer journey


from literature whether a channel is used in the search stage or the purchase stage. Different researchers, such as Kannan, Reinartz and Verhoef (2016) and Berman (2018), state that the last-click approach has many shortcomings, as it does not take all the possible channels and touchpoints in the journey into account. In order to capture the different stages, a method is designed consisting of several arbitrarily chosen pre-purchase and purchase distributions. The most important factor for these distributions is timing. The data will be divided according to several distributions of when channels are used, split between the search and the purchase stage (50-50/60-40/70-30/80-20). For example, under the 50-50 distribution the first 50% of the channels used in the journey are assigned to the search stage and the remaining 50% are interpreted as purchase channels.
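The split described above can be illustrated with a minimal Python sketch. The function name `assign_stages`, the journey length of 10 and the rule of rounding the cut-off up are assumptions made for illustration only; the text does not state how fractional cut-offs are handled.

```python
def assign_stages(n_touchpoints, search_pct):
    """Label the chronologically ordered touchpoints of one journey.

    The first `search_pct` percent of the touchpoints are labelled
    'search', the remainder 'purchase'. Rounding the cut-off up is an
    assumption, not something the study specifies.
    """
    # Integer ceiling of n_touchpoints * search_pct / 100 (avoids float error)
    cutoff = -(-n_touchpoints * search_pct // 100)
    return ["search" if position < cutoff else "purchase"
            for position in range(n_touchpoints)]


journey_length = 10  # hypothetical journey with 10 touchpoints
split_counts = {}
for search_pct in (50, 60, 70, 80):
    stages = assign_stages(journey_length, search_pct)
    split_counts[search_pct] = (stages.count("search"), stages.count("purchase"))
```

For a journey of 10 touchpoints, `split_counts` holds the number of search and purchase touchpoints under each of the four distributions.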

3.3.2 Logit model

In order to analyze if there is an associative relationship between a metric dependent variable and independent variables, regression analysis is used. “It is a procedure that can be classified as powerful and flexible for analyzing associative relations” (Malhotra, 2010: 568).

As all the hypotheses have a binary dependent variable, a special type of regression analysis needs to be performed. The two most common methods are logistic regression (logit model) and probit regression. In marketing the first method is often preferred, as probabilities are easier to calculate than with probit models, where the coefficients represent changes in z-scores. Not only the probabilities but also the parameters are ‘easier’ to interpret for a logit model (Kliestik, Kocisova, & Misankova, 2015).

A logit model shows the relationship between a binary outcome variable and one or more predictors that can be continuous or binary (Wilson & Lorenz, 2015). Logistic regression fits this study as all the dependent variables are dichotomous and the explanatory variables are also binary. A logit model can also account for an interaction effect by simply including a moderator in the model.
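As a minimal sketch of how a logit model turns predictors into a probability, the standard logistic form exp(x'β) / (1 + exp(x'β)) can be evaluated in Python. The coefficient values are those reported for model 1 in Table 4; the consumer profile and the coding of the variables (purchase stage = 1, gender = 1, age in years, education level 4) are assumptions made for illustration.

```python
import math


def logit_probability(coefficients, values):
    """Return exp(x'b) / (1 + exp(x'b)) for a single observation."""
    linear_predictor = sum(coefficients[name] * values[name] for name in coefficients)
    return math.exp(linear_predictor) / (1.0 + math.exp(linear_predictor))


# Coefficients as reported for model 1 (e-mail, 50/50 distribution) in Table 4
coefficients = {
    "intercept": -7.187819,
    "purchase": 0.126839,
    "gender": -0.223658,
    "age": 0.029615,
    "education": 0.205407,
}

# Hypothetical consumer; the variable coding is an assumption
consumer = {"intercept": 1, "purchase": 1, "gender": 1, "age": 50, "education": 4}

p_email = logit_probability(coefficients, consumer)
```

Because of the logistic transformation, `p_email` always lies between 0 and 1, whatever values the predictors take.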


interpret outcomes of a logit model. A globally accepted way is by means of the odds ratio, “Which is the probability of Yi = 1 divided by the probability of Yi = 0” (Leeflang et al., 2016: 268).

The logit model belongs to the family of generalized linear models; therefore, several assumptions must hold before the results can be interpreted. These assumptions will be discussed in the results chapter.

3.3.3 Moderation

A moderator is defined as a variable that affects the direction and/or strength of the relationship between an independent variable and a dependent variable (Baron & Kenny, 1986). In literature, a moderator is often referred to as an interaction. Interaction occurs when the effect of one x-variable on the y-variable depends on the level of the other x-variable. “In such cases, the latter x-variable is said to moderate the effect of the first x-variable on y” (Leeflang et al., 2016:38). In figure 2 below a moderation is visually presented.

Figure 2 – Schematic representation of moderation

For hypothesis 4, the type of device is the moderator and thus affects the direction and/or strength of the relationship between the stage and the type of online channel. In other words, the effect of the stage on the type of online channel depends on whether a mobile device is used. In order to capture the full effect, the model should also contain the main effect of type of device. If the main effect of type of device is excluded, the omitted main effect is absorbed by the product term, which then no longer reflects a pure interaction effect (Wieringa, 2018).


In order to capture the moderator in the model a new predictor variable is made and added to the model that captures the interaction effect.
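A minimal sketch of how such an interaction predictor could be constructed is given below. The variable names `purchase_stage`, `mobile_device` and `stage_x_device` and the 0/1 coding are assumptions made for illustration.

```python
def add_interaction(rows):
    """Return copies of the observations with the product term stage * device added."""
    extended = []
    for row in rows:
        new_row = dict(row)
        # The product is 1 only for purchase-stage touchpoints on a mobile device
        new_row["stage_x_device"] = row["purchase_stage"] * row["mobile_device"]
        extended.append(new_row)
    return extended


# Hypothetical observations covering all four stage/device combinations
observations = [
    {"purchase_stage": 0, "mobile_device": 0},
    {"purchase_stage": 0, "mobile_device": 1},
    {"purchase_stage": 1, "mobile_device": 0},
    {"purchase_stage": 1, "mobile_device": 1},
]

with_interaction = add_interaction(observations)
```

The model then contains three predictors for the moderated relationship: the main effect of stage, the main effect of device and the product term.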

3.3.4 Mediation

A moderator captures when a certain effect occurs, and a mediator captures how or why a certain effect occurs. With a mediated effect there is a causal relationship between X and Y, but the effect of X on Y goes through M (a mediator) (Zhao, Lynch & Chen, 2010). In figure 3 a mediation analysis is visually presented.

Figure 3 – Schematic representation of mediation

The mediator reflects the process between independent variable X and dependent variable Y, and a mediation is often referred to as an indirect effect. According to Baron and Kenny (1986), James and Brett (1984) and Judd and Kenny (1981), there are several steps in establishing mediation:

1. Perform a regression analysis where $y$ is the criterion variable and $x_1$ is a predictor, and estimate and test path c.

2. Perform a regression analysis where $x_1$ is the predictor and $x_2$ is the mediator, and estimate and test path a.


significantly predicts path b, while controlling for path c′” (Leeflang et al., 2016:143).

In order to have a significant mediation effect, all these steps must be significant and substantially different from zero (Baron & Kenny, 1986). Two types of mediation are distinguished: full mediation and partial mediation. In full mediation, $x_2$ completely mediates the relationship between $x_1$ and $y$. If the effect of $x_2$ substantially lowers the direct effect of $x_1$ on $y$, a partial mediation is reported.

In the literature different methods of mediation analysis are mentioned. The steps reported above are the same for all these methods and are independent of the estimation method of the model. As the variables in this research only include binary variables, the recommended estimation method is a logistic regression (Leeflang et al., 2016). A method that is recommended for R is the Monte Carlo Simulation by MacKinnon, Lockwood and Williams (2004), which is also captured in the mediation package for R by Tingley, Yamamoto, Hirose, Keele, & Imai (2014).
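The idea behind a Monte Carlo confidence interval for the indirect effect (the product of path a and path b) can be sketched in Python as follows. This is a simplified illustration and not the implementation of the R mediation package; the path estimates, their standard errors and the assumption of independent, normally distributed estimates are hypothetical.

```python
import random


def monte_carlo_indirect_ci(a, se_a, b, se_b, draws=20000, level=0.95, seed=1):
    """Approximate a confidence interval for the indirect effect a * b.

    Path estimates are drawn from independent normal distributions
    (an assumption of this sketch) and the products are sorted to
    read off the percentile bounds.
    """
    rng = random.Random(seed)
    products = sorted(rng.gauss(a, se_a) * rng.gauss(b, se_b) for _ in range(draws))
    lower_idx = int(((1 - level) / 2) * draws)
    upper_idx = int((1 - (1 - level) / 2) * draws) - 1
    return products[lower_idx], products[upper_idx]


# Hypothetical estimates for path a (X -> M) and path b (M -> Y)
lower, upper = monte_carlo_indirect_ci(a=0.30, se_a=0.05, b=0.40, se_b=0.08)

# The indirect effect is taken to differ from zero if the interval excludes 0
mediation_supported = lower > 0 or upper < 0
```

An interval that excludes zero suggests that the indirect effect differs from zero at the chosen confidence level.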

One last important notion to mention is that a mediation model can be classified as a causal model which has little value if the researcher is incorrect on the causation (Kenny, 2013).

3.3.5 Model variables

There has been a careful consideration of the variables that are included in the final model. The selection of the appropriate variables is based on literature and on the model requirements of Little (1970). According to Little (1970), models should be complete, simple, robust, evolutionary and adaptive. The purpose of this study is to find the best balance between a complete and a simple model. The mathematical specification for the different models will be discussed in the upcoming section.


3.4 Model specification

3.4.1 Model specification hypotheses 1,2,3

The specification of a logit model is based on the specification of a linear regression model. The main difference is that in the logit model the linear predictor is transformed with the logistic function in order to limit the predicted value of the dependent variable to the 0-1 range (Rodriguez, 2007). To express the dependent variable as a probability rather than as log odds, the linear combination of the independent variables is exponentiated. The logistic regression specification is presented below:

$$p_i = \frac{\exp(x_i'\beta)}{1 + \exp(x_i'\beta)}$$

“$x_i'$ represents the matrix of observations of the independent variables for consumer i, and β is a vector of parameters” (Leeflang et al., 2016: 266).

The model that is used for the first, second and third hypothesis includes different dependent variables with the same independent variables. The model can be denoted as:

$$P_{i,1,2,3} = \frac{\exp(\beta_0 + \beta_1 S_i + \beta_2 G_i + \beta_3 A_i + \beta_4 E_i)}{1 + \sum_{j=1}^{J-1} \exp(\beta_{0j} + \beta_1 S_{ij} + \beta_2 G_{ij} + \beta_3 A_{ij} + \beta_4 E_{ij})}$$

Where:

$P_{i,1}$ = Probability that a consumer uses an online channel (e-mail/retargeting/affiliate/pre-roll/display/website/search) in purchase journey i

$P_{i,2}$ = Probability that a consumer uses a CIC in purchase journey i

$P_{i,3}$ = Probability that a consumer uses a mobile device in purchase journey i

$S_{ij}$ = Stage in purchase journey i

$G_{ij}$ = Gender of consumer in purchase journey i

$A_{ij}$ = Age of consumer in purchase journey i

$E_{ij}$ = Completed education of consumer in purchase journey i

3.4.2 Model specification hypothesis 4 with moderator

The model specification that is used to test the moderated relationship (stage and device) can be specified as:

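$$P_i = \frac{\exp(\beta_{0j} + \beta_1 S_{ij} + \beta_2 G_{ij} + \beta_3 A_{ij} + \beta_4 E_{ij} + \beta_5 D_{ij} + \beta_6 DS_{ij})}{1 + \sum_{j=1}^{J-1} \exp(\beta_{0j} + \beta_1 S_{ij} + \beta_2 G_{ij} + \beta_3 A_{ij} + \beta_4 E_{ij} + \beta_5 D_{ij} + \beta_6 DS_{ij})}$$

Where:

$P_i$ = Probability that a consumer uses an online channel (e-mail/retargeting/affiliate/pre-roll/display/website/search) in purchase journey i

$S_{ij}$ = Stage in purchase journey i

$G_{ij}$ = Gender of consumer in purchase journey i

$A_{ij}$ = Age of consumer in purchase journey i

$E_{ij}$ = Completed education of consumer in purchase journey i

$D_{ij}$ = Type of device used by consumer in purchase journey i

$DS_{ij}$ = Interaction variable (stage*device) in purchase journey i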

3.4.3 Model specification hypothesis 5 with mediator

The model specification that is used to test the mediated relationship (stage and device) can be specified as:

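$$P_i = \frac{\exp(\beta_{0j} + \beta_1 S_{ij} + \beta_2 G_{ij} + \beta_3 A_{ij} + \beta_4 E_{ij} + \beta_5 D_{ij})}{1 + \sum_{j=1}^{J-1} \exp(\beta_{0j} + \beta_1 S_{ij} + \beta_2 G_{ij} + \beta_3 A_{ij} + \beta_4 E_{ij} + \beta_5 D_{ij})}$$

Where:

$P_i$ = Probability that a consumer uses an online channel (e-mail/retargeting/affiliate/pre-roll/display/website/search) in purchase journey i

$S_{ij}$ = Stage in purchase journey i

$G_{ij}$ = Gender of consumer in purchase journey i

$A_{ij}$ = Age of consumer in purchase journey i

$E_{ij}$ = Completed education of consumer in purchase journey i

$D_{ij}$ = Type of device used by consumer in purchase journey i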

3.5 Plan of analyses

The following steps briefly summarize the methodology that is used in this study:

Step 1: The first step is to process the data in such a way that the data is ready for analysis. Therefore, the data will be merged, and missing values and outliers will be removed from the dataset.

Step 2: When the data is “clean”, the next step is to define the two stages (search and purchase), as no universal definition is found in literature. Based on common sense, it is assumed that the purchase funnel is a chronological process in which the purchase stage follows the search stage. It is also assumed that people do not switch back between stages. In order to determine the stages, the data will be divided according to several distributions in which the first 50, 60, 70 and 80 percent of the touchpoints in a certain journey are categorized as search stage touchpoints. The last 50, 40, 30 and 20 percent, respectively, are categorized as purchase stage touchpoints.

Step 3: After step 2, data will be analysed in order to see how often, under various distributions, different channels occur in certain stages moderated by the type of device.


4. Data description

This study uses online, event-based data from one of the Dutch travel agents between 1-5-2015 and 31-09-2016. There are 2,456,414 lines of data in total, of which 3647 are purchases.

4.1 Data pre-processing

As the variable PurchaseID is constructed in such a way that one journey consists of multiple touchpoints, a variable is computed that contains the total number of touchpoints used per single journey. In addition, a variable “touch number” is computed from the time variable; it contains the chronological position of each touchpoint within a single journey. If, for example, a journey consists of 4 touchpoints and e-mail is the first channel used in the journey, this touchpoint receives touch number 1.
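The two computed variables can be sketched in Python as follows. The tuple layout of the events, the field names `touch_number` and `total_touchpoints` and the example data are assumptions made for illustration; they do not reflect the actual structure of the GFK dataset.

```python
from collections import defaultdict


def number_touchpoints(events):
    """Add the chronological touch number and journey length to each event.

    `events` is a list of (purchase_id, timestamp, channel) tuples;
    this layout is an assumption of the sketch.
    """
    journeys = defaultdict(list)
    for purchase_id, timestamp, channel in events:
        journeys[purchase_id].append((timestamp, channel))

    numbered = []
    for purchase_id, touches in journeys.items():
        touches.sort(key=lambda touch: touch[0])  # chronological order
        for position, (timestamp, channel) in enumerate(touches, start=1):
            numbered.append({
                "purchase_id": purchase_id,
                "channel": channel,
                "touch_number": position,
                "total_touchpoints": len(touches),
            })
    return numbered


# Hypothetical events, deliberately not in chronological order
events = [
    ("J1", "2015-05-02T10:00", "search"),
    ("J1", "2015-05-01T09:00", "e-mail"),
    ("J1", "2015-05-03T12:00", "website"),
    ("J1", "2015-05-04T08:30", "website"),
    ("J2", "2015-06-01T14:00", "banner"),
]
result = number_touchpoints(events)
```

In this example the e-mail touchpoint of journey J1 receives touch number 1 out of 4, in line with the example in the text.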

Table 2 gives an overview of the online channels in this study and their contribution to the overall touchpoints that were available in the dataset (after data manipulation). The definitions of the different online channels can be found in Table 1 of the literature review. All the FIC’s are based on tagging. It is assumed that either someone has loaded the information or has seen the information. For example, either someone has seen the pre-roll or the pre-roll has been loaded.

The channel search is constructed out of multiple search variables, including “Accommodations Search”, “Information/comparison Search”, “Touroperator/Travelagent Search Competitor”, “Touroperator/travel agent search focus brand”, “flight tickets search” and “generic search”. In a real journey, an information search about a competitor could lead to a purchase of the focal brand. These variables are all captured by the search variable because they all cover search-initiated interaction.


| Online channel | Type of channel | Frequency | Relative % of channel in the dataset |
|---|---|---|---|
| Banner | FIC | 1264 | 0.40% |
| E-mail | FIC | 2518 | 1.00% |
| Affiliate | FIC | 1147 | 0.40% |
| Retargeting | FIC | 34681 | 1.40% |
| Type-in/website | CIC | 176213 | 7.11% |
| Search | CIC | 57818 | 2.33% |
| Pre-roll | FIC | 1036 | 0.40% |

Table 2- Online channels and their contribution in the overall dataset

4.2 Preliminary checks

First, outliers, missing values, odd values and extreme values were identified. Missing values were found for the following demographic control variables: “age”, “gender” and “employment type”. The variables age and gender both have 304929 missing values, and employment type has 376318 missing values. Missing values could lead to biased outcomes and should therefore be imputed. A balance has to be found between the impact of the missing values and the amount of data that would be lost by deleting them. In this study, missing values appear only in the control variables: gender, age and education. The missing values are imputed by means of the distribution that was observed in the specific variable. The missing values in this dataset have technical causes. As not every household was part of the panel sample, not every household participated in every measurement. As a result, GFK could not match this data to the purchase journeys, which caused the missing values.
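One possible reading of imputation “by means of the distribution that was observed” is to replace each missing value with a random draw from the observed values of the same variable. The Python sketch below illustrates this reading; the function name, the use of `None` for missing values and the example data are assumptions, and the procedure actually used in the study may differ.

```python
import random


def impute_from_observed(values, seed=42):
    """Replace None by a random draw from the observed (non-missing) values.

    Drawing from the observed values keeps the imputed values in line
    with the empirical distribution of the variable.
    """
    rng = random.Random(seed)
    observed = [value for value in values if value is not None]
    return [value if value is not None else rng.choice(observed) for value in values]


# Hypothetical gender variable with two missing values
gender = ["female", None, "male", "female", None, "female"]
gender_imputed = impute_from_observed(gender)
```

Observed values are left untouched; only the missing entries are filled in.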


Besides the preparation of the dataset, the average length of a journey is calculated. The average journey contained 83 touchpoints. The purpose of this research is to take into account only the journeys that are purposeful and meaningful. Therefore, the relation between the number of touchpoints and conversion is checked to see whether conversion develops in a reasonable way as the number of touchpoints increases. Table 3 gives an overview.

| Amount of touchpoints | No-conversion | Conversion | Totals | Percentage no-conversion | Percentage conversion |
|---|---|---|---|---|---|
| 1-10 | 46364 | 1758 | 48122 | 96.34% | 3.65% |
| 10-20 | 51685 | 4475 | 56160 | 92.03% | 7.96% |
| 20-30 | 49772 | 5795 | 55567 | 89.57% | 10.42% |
| 30-40 | 47377 | 6199 | 53576 | 88.42% | 11.57% |
| 40-50 | 45543 | 8609 | 54152 | 84.10% | 15.89% |
| 50-60 | 41035 | 9225 | 50260 | 81.64% | 18.35% |

Table 3 – Conversions per points per journey
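The percentages in Table 3 follow directly from the counts, which can be checked with a short Python sketch. The counts are taken from Table 3; the dictionary layout and function name are illustrative.

```python
# (no-conversion, conversion) counts per touchpoint range, taken from Table 3
table_3 = {
    "1-10": (46364, 1758),
    "10-20": (51685, 4475),
    "20-30": (49772, 5795),
    "30-40": (47377, 6199),
    "40-50": (45543, 8609),
    "50-60": (41035, 9225),
}


def conversion_percentage(no_conversion, conversion):
    """Share of journeys in a range that ended in a conversion, in percent."""
    return 100.0 * conversion / (no_conversion + conversion)


percentages = {
    touch_range: conversion_percentage(*counts)
    for touch_range, counts in table_3.items()
}
```

The conversion percentage rises with every range, which is the pattern used in the text to judge which journeys are meaningful.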

A suitable trade-off is needed between the percentage of conversion and the amount of data lost. Therefore, the shortest journeys, consisting of 1-10 touchpoints, are removed from the dataset. This range of 1-10 touchpoints has relatively few conversions, whereas the second range of 10-20 touchpoints has more than double the number of conversions. It is assumed that journeys of 1-10 touchpoints are not serious journeys with an underlying purchase intention. The touchpoints that are not interesting for this research are removed from the dataset. After removing the outlier, the useless touchpoints and the journeys consisting of a single touchpoint, the data set contains 274,667 observations.

Figure 4 - Boxplot of “number of contacts” with outlier

5. Results

5.1 Descriptives

5.1.1 Customer purchase journey

The dataset consists of 3368 journeys in total. Of these journeys 29.25% resulted in a conversion, of which 15.63% for the focal brand. The average length of a journey is 83 touchpoints. A small part of these touchpoints, 14.80%, is firm-initiated, and the larger part, 85.20%, is consumer-initiated. On average 1.768 different channels are used in a journey. The channel that appears most is search focus brand with 74.05% and the channel that appears least is banner with 4.28%. Of all touchpoints, 27.52% happened on a mobile device and 72.48% on a fixed device.

5.1.2 Demographics

In terms of demographics, 2394 people participated in this study, of which 66.88% is female and 33.12% is male. Ages range from 18 to 94 years, with an average of 52.99. This is consistent with 42% of the sample being at least in the mature life cycle stage. The household size ranges from 1 to 6, with an average of 2.75. The education level ranges from 1 to 8, with an average of 4. An extensive overview of all the variables in the dataset, including their mean and standard deviation, can be found in appendix II.

5.2 Assumptions logistic regression

The strong assumptions that hold for Ordinary Least Squares (OLS) do not hold for logistic regression. An example is the assumption of linearity of the relationship between the dependent and independent variables. In fact, “logistic regression can hold non-linear relationships because it applies a non-linear log transformation of the linear regression” (Park, 2013: 156). The assumption of normality of the error distribution does not hold for logistic regression either. Although the standard OLS assumptions do not hold for logistic regression, six other assumptions do apply.


used a channel (1) or someone did not use a channel (0). The second assumption is that the outcome variable must be coded accordingly. This means that the desired outcome must be classified as 1 (Park, 2013: 157). In every model the preferred outcome is classified with a 1. The third assumption concerns the model fit. The model must fit well: it should neither be overfitted, with meaningless variables included, nor underfitted, with meaningful variables left out. When creating the model, every variable, including the control variables, is taken into careful consideration. All the variables in the model are explicitly described in the literature part of this study. The fourth assumption states that the model should not suffer from multicollinearity, which is described by Keller (2012: 680) as “a condition wherein the independent variables are highly correlated”. The existence of multicollinearity could lead to a wrong interpretation of the parameters. To test whether the model suffers from multicollinearity, a variance inflation factor (VIF) test is performed. “A VIF greater than 5 is often taken to signal that collinearity is a problem” (Leeflang et al., 2016: 140). The VIF scores will be discussed in the results part.
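For the special case of two predictors, the VIF reduces to 1 / (1 − r²), where r is the correlation between the two predictors. The Python sketch below illustrates this special case only; the models in this study contain more predictors, for which the VIF is based on an auxiliary regression of each predictor on all others. The data values are hypothetical.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariance = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    variance_x = sum((a - mean_x) ** 2 for a in x)
    variance_y = sum((b - mean_y) ** 2 for b in y)
    return covariance / math.sqrt(variance_x * variance_y)


def vif_two_predictors(x, y):
    """VIF for a model with exactly two predictors: 1 / (1 - r^2)."""
    r = pearson_r(x, y)
    return 1.0 / (1.0 - r ** 2)


# Hypothetical values for two control variables
age = [25, 34, 41, 52, 60, 67]
education = [5, 6, 4, 5, 3, 4]

vif = vif_two_predictors(age, education)
collinearity_problem = vif > 5  # benchmark quoted from Leeflang et al. (2016)
```

A VIF of 1 means the predictors are uncorrelated; the value grows as the correlation between them increases.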


5.3 Model estimation

As mentioned earlier, the logit model is estimated by means of maximum likelihood. To assess the impact of the independent variables in logit models, three methods are common: the coefficients of the original model, odds ratios and marginal effects. When interpreting the coefficients, a positive coefficient means that an increase in the variable leads to an increase in the probability of Y, whereas a negative coefficient means that an increase in the variable leads to a decrease in the probability of Y. The odds measure the likelihood of an event happening versus not happening, and the odds ratio is the ratio between two odds. When this value is 1 there is no relation between the odds, when it is > 1 there is a positive relation and when it is < 1 there is a negative relation. When interpreting odds ratios, it is important to interpret only the odds of that variable, holding the other variables constant (ceteris paribus). The last method, marginal effects, is interpreted as the change in the dependent variable for a marginal change in the independent variable. As both the independent and dependent variables in this dataset are binary, marginal effects do not give meaningful insights. Therefore, marginal effects will not be measured. Note that odds ratios are only interpreted when a significant effect is observed (Dehmamy, 2019).
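The link between coefficients and odds ratios is the exponential function: the odds ratio of a predictor equals exp(b). The Python sketch below applies this to the coefficients reported for model 1 in Table 4; the dictionary layout is illustrative.

```python
import math

# Coefficients of model 1 (e-mail, 50/50 distribution) as reported in Table 4
coefficients = {
    "Purchase 50": 0.126839,
    "Gender ID": -0.223658,
    "Age": 0.029615,
    "Education": 0.205407,
}

# The odds ratio of a predictor is the exponentiated coefficient
odds_ratios = {name: math.exp(b) for name, b in coefficients.items()}

# Percentage change in the odds for a one-unit increase in the predictor
percentage_change = {name: (ratio - 1.0) * 100.0 for name, ratio in odds_ratios.items()}
```

A positive coefficient gives an odds ratio above 1 and a negative coefficient gives an odds ratio below 1, holding the other variables constant.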

5.4 Model validation

There are several methods known to assess the internal validity of a model. The purpose of this study is not to make a predictive model but to test hypotheses with the best model possible. Therefore, the following methods will be used to assess the validity of the models: hit rate, top decile lift (TDL), Gini coefficient and the pseudo $R^2$. When using these methods to test the internal validity of the models, the models are compared to a so-called “NULL” model.

“The hit rate measures the percentage of correctly classified observations” (Leeflang et al., 2016:269). Often the threshold of the probabilities of the hit rate is 0.5. In this case


The TDL is defined by Leeflang et al. (2016: 322) as the fraction of customers in the top decile of the lift curve divided by the fraction of customers in the whole set. The TDL demonstrates the power of the model to be better than a random model (Xie, Li, Ngai & Ying, 2009). Another measure that is used to compare the power of the model is the Gini coefficient. Where the TDL measures the top decile, the Gini coefficient measures twice the area between the cumulative lift curve and the equivalence line (Tziafetas, 1989). It therefore considers the performance of the model across all customers and not only the top decile (Leeflang et al., 2016). The last measure is the pseudo $R^2$, which is based on the log likelihood (Simonetti, Sarnacchiaro & Rodríguez, 2017). There are three log-likelihood-based measures: Nagelkerke $R^2$, McFadden $R^2$ and Cox & Snell $R^2$. The three measures can be compared with each other as they all share the same underlying dataset. As with $R^2$ measures, the value of the pseudo $R^2$ lies between 0 and 1. The closer to 1, the better the model fit, and the closer to 0, the worse the model fit (Dehmamy, 2019). In other words, a model with a pseudo $R^2$ of 1 perfectly captures the variance in the dependent variable associated with the independent variables.
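The hit rate, the TDL and McFadden's pseudo $R^2$ can be sketched in Python as follows. The function names, the 20 hypothetical observations and the hypothetical log-likelihood values are assumptions made for illustration; McFadden's measure is computed as 1 minus the ratio of the model log likelihood to the NULL model log likelihood.

```python
def hit_rate(actual, predicted_prob, threshold=0.5):
    """Share of observations classified correctly at the given threshold."""
    hits = sum(1 for y, p in zip(actual, predicted_prob) if (p >= threshold) == (y == 1))
    return hits / len(actual)


def top_decile_lift(actual, predicted_prob):
    """Response rate in the top 10% of predicted probabilities divided by the overall rate."""
    ranked = sorted(zip(predicted_prob, actual), reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(y for _, y in ranked[:top_n]) / top_n
    overall_rate = sum(actual) / len(actual)
    return top_rate / overall_rate


def mcfadden_r2(loglik_model, loglik_null):
    """McFadden pseudo R-squared: 1 - LL(model) / LL(null)."""
    return 1.0 - loglik_model / loglik_null


# Hypothetical outcomes (1 = channel used) and predicted probabilities
actual = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0]
predicted_prob = [0.90, 0.80, 0.70, 0.65, 0.60, 0.55, 0.52, 0.45, 0.40, 0.35,
                  0.30, 0.28, 0.25, 0.20, 0.18, 0.15, 0.12, 0.10, 0.08, 0.05]

model_hit_rate = hit_rate(actual, predicted_prob)
model_tdl = top_decile_lift(actual, predicted_prob)
model_r2 = mcfadden_r2(loglik_model=-100.0, loglik_null=-110.0)  # hypothetical values
```

A TDL above 1 indicates that the model finds more positive outcomes in its top decile than a random selection would.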

The Akaike Information Criterion (AIC) is an information theoretic criterion for the identification of a parsimonious model used to validate models (Bozdogan, 1987). The AIC is closely related to the likelihood value but penalizes for the number of parameters (Dehmamy, 2019). Therefore, the model with the lowest AIC is the best model.
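The AIC is defined as 2k − 2 ln(L), where k is the number of estimated parameters and ln(L) the log likelihood. A minimal Python sketch of the comparison is given below; the log-likelihood values are hypothetical, and k = 5 assumes an intercept plus the four predictors stage, gender, age and education.

```python
def aic(log_likelihood, n_parameters):
    """Akaike Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_parameters - 2 * log_likelihood


# Hypothetical log likelihoods for two distributions of the same model
log_likelihoods = {"50/50": -13987.0, "40/60": -13984.0}
n_parameters = 5  # assumption: intercept + stage + gender + age + education

aic_scores = {name: aic(ll, n_parameters) for name, ll in log_likelihoods.items()}
best_distribution = min(aic_scores, key=aic_scores.get)
```

As all models in such a comparison have the same number of parameters, the ranking by AIC coincides with the ranking by log likelihood.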

5.5 Hypotheses testing

First, the independent variables of the models will be interpreted. After this, the model will be validated. For every model the four distributions hold and will be treated the same. By means of the AIC it will be determined which distribution performs best.

(35)

Model estimation

Model 1

The output of the first model, with a distribution in which the first 50% of the touchpoints is defined as the search stage and the remaining 50% as the purchase stage, is presented below. The first thing that stands out is that all the variables are significant, indicating that all these variables have a significant effect on the use of e-mail. The search/purchase variable is significant (p < .01) and the coefficient is positive (b = 0.126). Therefore, it can be concluded that there is an increased probability of using e-mail when a person is in the purchase stage. Gender is also significant (p < .001), with a negative coefficient (b = -0.223). Therefore, it can be concluded that there is an increased probability of men using e-mail. Age is also significant (p < .001) and the coefficient is positive (b = 0.029). The last parameter, education, is significant (p < .001) with a positive estimate (b = 0.205). It can be concluded that, on average, as age and/or education level increases, the probability that a certain individual uses e-mail also increases.

| Predictors | Coefficients | z-scores | Std. Error | p | Odds ratio |
|---|---|---|---|---|---|
| (Intercept) | -7.187819 | -58.444 | 0.122987 | <0.001*** | 0.000 |
| Purchase 50 | 0.126839 | 3.156 | 0.040191 | <0.01** | 1.1352 |
| Gender ID | -0.223658 | -5.419 | 0.041273 | <0.001*** | 0.7995 |
| Age | 0.029615 | 18.720 | 0.001582 | <0.001*** | 1.0300 |
| Education | 0.205407 | 18.893 | 0.010872 | <0.001*** | 1.2280 |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 4 – Outcome model 1


In Table 4, all variables besides the distribution variable are discussed as well. As these are control variables and therefore not the main interest of this study, they will not be discussed in the following results.

| Model | Distribution | b | p | Odds | AIC |
|---|---|---|---|---|---|
| Model 1 (e-mail) | 50/50 | 0.126 | <0.001*** | 1.135 | 27984 |
| Model 2 (e-mail) | 40/60 | 0.158 | <0.001*** | 1.172 | 27978 |
| Model 3 (e-mail) | 30/70 | 0.166 | <0.001*** | 1.181 | 27978 |
| Model 4 (e-mail) | 20/80 | 0.153 | <0.001*** | 1.165 | 27984 |
| Model 5 (Retargeting) | 50/50 | 0.357 | <0.001*** | 1.429 | 205708 |
| Model 6 (Retargeting) | 40/60 | 0.340 | <0.001*** | 1.406 | 205793 |
| Model 7 (Retargeting) | 30/70 | 0.348 | <0.001*** | 1.416 | 205836 |
| Model 8 (Retargeting) | 20/80 | 0.370 | <0.001*** | 1.448 | 205911 |
| Model 9 (Affiliates) | 50/50 | -0.063 | 0.287 | | 14783 |
| Model 10 (Affiliates) | 40/60 | 0.011 | 0.850 | | 14785 |
| Model 11 (Affiliates) | 30/70 | 0.105 | 0.089 | | 14782 |
| Model 12 (Affiliates) | 20/80 | 0.209 | <0.01** | 1.232 | 14776 |
| Model 13 (website) | 50/50 | -0.216 | <0.001*** | 0.804 | 356538 |
| Model 14 (website) | 40/60 | -0.260 | <0.001*** | 0.770 | 356239 |
| Model 15 (website) | 30/70 | -0.314 | <0.001*** | 0.730 | 355934 |
| Model 16 (website) | 20/80 | -0.387 | <0.001*** | 0.678 | 355679 |
| Model 17 (search) | 50/50 | 0.077 | <0.001*** | 1.080 | 278509 |
| Model 18 (search) | 40/60 | 0.139 | <0.001*** | 1.149 | 278364 |
| Model 19 (search) | 30/70 | 0.200 | <0.001*** | 1.222 | 278182 |
| Model 20 (search) | 20/80 | 0.275 | <0.001*** | 1.317 | 277986 |
| Model 21 (pre-roll) | 50/50 | -0.131 | <0.05* | 0.876 | 13393 |
| Model 22 (pre-roll) | 40/60 | -0.125 | 0.051 | | 13394 |
| Model 23 (pre-roll) | 30/70 | -0.142 | <0.05* | 0.866 | 13393 |
| Model 24 (pre-roll) | 20/80 | -0.127 | 0.111 | | 13395 |
| Model 25 (banner) | 50/50 | -0.511 | <0.001*** | 0.599 | 15915 |
| Model 26 (banner) | 40/60 | -0.478 | 0.001*** | 0.620 | 15932 |
| Model 27 (banner) | 30/70 | -0.460 | 0.001*** | 0.630 | 15946 |
| Model 28 (banner) | 20/80 | -0.418 | 0.001*** | 0.658 | 15965 |

Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Table 5 – Outcome models hypothesis 1


almost all above 1 indicating a positive relation between the odds. Taking e-mail as an example, the odds of using e-mail are 13.5% higher in the purchase stage than in the search stage under the 50/50 distribution. For website and pre-roll the odds ratio is below 1. Taking website as an example, the odds of using the website are about 20% lower in the purchase stage than in the search stage under the 50/50 distribution.

The bold AIC score highlights the lowest AIC within the models of a dependent variable. The AIC for the pre-roll model is the lowest of all. Which distribution performs best differs per dependent variable. Therefore, it is hard to conclude which distribution performs best for this hypothesis.

Model validation

Table 6 – Model validation outcomes hypothesis 1

| Model | Hit rate NULL model | Hit rate alternative model | TDL | Gini coefficient | McFadden pseudo $R^2$ | Nagelkerke pseudo $R^2$ |
|---|---|---|---|---|---|---|

The Nagelkerke $R^2$, McFadden $R^2$ and Cox & Snell $R^2$ all have the highest value for the models with the 50/50 and the 20/80 distribution, but for some dependent variables this does not hold. When comparing the $R^2$ of the different models, it is hard to conclude which model and accompanying distribution is best, as none of the distributions really stands out.

5.5.2 The use of CIC’s in different stages of the customer purchase journey

The second hypothesis that is tested is a model with the use of CIC’s as a dependent variable and the search/purchase variable (and others) as independent variables. There is no multicollinearity between the independent variables in any of these models, as the VIF test outcomes are below the benchmark of 4. The overall output of the models including the VIF can be found in appendix III.

Table 7 shows that for all models the search/purchase variable is significant. Therefore, it can be concluded that there is a significant relationship between the use of CIC’s and whether individuals are in the purchase or search stage within a certain customer purchase journey. For all the models the coefficient is negative, and it can be concluded that there is a decreased probability of using CIC’s when people are in the purchase stage. The odds ratios of the models are all below 1, indicating a negative relation between the odds. Taking model 29 as an example, the odds of using a CIC are about 26% lower in the purchase stage than in the search stage under the 50/50 distribution.

| Model | Distribution | b | p | Odds | AIC |
|---|---|---|---|---|---|
| Model 29 | 50/50 | -0.296 | <0.001*** | 0.743 | 228008 |
| Model 30 | 40/60 | -0.291 | <0.001*** | 0.747 | 228040 |
| Model 31 | 30/70 | -0.303 | <0.001*** | 0.738 | 228051 |
| Model 32 | 20/80 | -0.329 | <0.001*** | 0.719 | 228093 |


| Model | Hit rate NULL model | Hit rate alternative model | TDL | Gini coefficient |
|---|---|---|---|---|
| Model 29 | 0.50 | 0.56 | 1.060 | 0.036 |
| Model 30 | 0.50 | 0.56 | 1.059 | 0.035 |
| Model 31 | 0.50 | 0.55 | 1.053 | 0.035 |
| Model 32 | 0.50 | 0.55 | 1.055 | 0.033 |

| Model | McFadden pseudo $R^2$ | Nagelkerke pseudo $R^2$ | Cox & Snell pseudo $R^2$ |
|---|---|---|---|
| Model 29 | 0.0099 | 0.0145 | 0.0082 |
| Model 30 | 0.0097 | 0.0143 | 0.0081 |
| Model 31 | 0.0097 | 0.0143 | 0.0081 |
| Model 32 | 0.0095 | 0.0140 | 0.0079 |

Table 9 – Pseudo $R^2$ outcomes hypothesis 2

In table 8 different validation measurements are presented. For all the models the hit rate of the model is (somewhat) higher than the hit rate of the NULL model, so the alternative models are more accurate in predicting observations. The TDL of the models is higher than that of a NULL model. The Gini coefficient for the models is (somewhat) higher than the Gini of the NULL model (0). The performance of the models across all customers is therefore better than that of a NULL model. The Nagelkerke $R^2$, McFadden $R^2$ and Cox & Snell $R^2$


5.5.3 The use of devices in different stages of the customer purchase journey

The following hypothesis that is tested is a model with type of device as a dependent variable and the search/purchase variable (and others) as independent variables. There is no multicollinearity between the independent variables in any of these models, as the VIF test outcomes are below the benchmark of 4. The overall output of the models including the VIF can be found in appendix IV.

Table 10 shows that only two distributions are significant, and for these distributions it can be concluded that the type of device people use differs between the search and the purchase stage. The coefficients of these variables are negative. This means that there is a decreased probability of using a mobile device in the purchase stage. The odds ratios of these models are just below 1. Taking model 33 as an example, the odds of using a mobile device are 6% lower in the purchase stage than in the search stage under the 50/50 distribution.

Although there is a relatively small difference between the different AIC’s, model 33 is best in predicting the type of device.

| Model | Distribution | b | p | Odds | AIC |
|---|---|---|---|---|---|
| Model 33 | 50/50 | -0.061 | <0.001*** | 0.940 | 236877 |
| Model 34 | 40/60 | -0.032 | <0.01** | 0.967 | 236902 |
| Model 35 | 30/70 | -0.018 | 0.114 | | 236909 |
| Model 36 | 20/80 | -0.002 | 0.853 | | 236911 |

| Model | Hit rate NULL model | Hit rate alternative model |
|---|---|---|

| Model | McFadden pseudo $R^2$ | Nagelkerke pseudo $R^2$ | Cox & Snell pseudo $R^2$ |
|---|---|---|---|
| Model 33 | 0.0131 | 0.0195 | 0.0113 |
| Model 34 | 0.0130 | 0.0193 | 0.0112 |
| Model 35 | 0.0129 | 0.0193 | 0.0112 |
| Model 36 | 0.0129 | 0.0193 | 0.0112 |

Table 12 – Pseudo $R^2$ outcomes hypothesis 4

In table 12 different validation measurements are presented. The hit rate of all the alternative models is slightly higher than the hit rate of the NULL model, so the alternative models are (somewhat) more accurate in predicting observations than a NULL model. The TDL of all the alternative models is higher than the TDL of the NULL model, so the alternative models identify the top decile better than the NULL model. The Gini coefficient for the alternative models is higher than the Gini coefficient of the NULL model (0). Therefore, the performance of the alternative models across all customers is better than that of a NULL model. The Nagelkerke $R^2$, McFadden $R^2$ and Cox & Snell $R^2$ (Table 12) all have the highest value for model 33. As this model also has the lowest AIC, the model with the 50/50 distribution is best in predicting the use of devices.

5.5.4 The use of online channels in different stages of the customer purchase journey moderated by device

For this hypothesis multiple logit models are tested with a specific channel as a dependent variable and stage/device as a moderated variable. Table 13 summarizes the outcomes for the performed tests. From these results it can be concluded that there is no multicollinearity between the independent variables in any of these models as the VIF test outcomes are below the benchmark of 4. The overall output of the models including the VIF can be found in appendix V.