SEGMENTING AND TARGETING YOUR MARKET:

Academic year: 2021

SEGMENTING AND TARGETING YOUR MARKET:

The influence of different aspects of online touch points on purchase

Master Thesis Marketing Intelligence

17th of June, 2019

Femke I. Sougé

Jozef Israëlsplein 2A

9718 EN Groningen

+31 6 46 62 28 20

femkesouge@hotmail.com

S2752492

Supervisor (first):

dr. P.S. (Peter) van Eck

p.s.van.eck@rug.nl

Supervisor (second):

M.T. (Martine) van der Heide

m.t.van.der.heide@rug.nl

University of Groningen

Faculty of Economics and Business

ABSTRACT

The customer journey is the interaction between a firm and a customer, consisting of multiple touch points. These touch points can be divided into different categories. A distinction can be made between customer-initiated touch points (CICs) and company-initiated touch points (FICs), both of which are used in this research. Customers come into contact with multiple touch points during the journey, which could lead to a purchase. This research focuses on providing insightful advice on which touch points should be used for which type of customer in order to increase the chance of a purchase. Hence, the analysis starts with a segmentation of the customers in the dataset. To provide marketing advice, several aspects of online touch points are used: the type of touch point, the frequency of touch points, the average duration spent on a touch point and the influence of weather. The data analysis is done using 29,011 customer journeys of 9,678 customers. First, clusters are formed using hierarchical and non-hierarchical clustering methods. Second, per cluster, a logistic regression is performed to determine the effect of the independent variables on the probability of purchase. The results show that the average duration and the weather do not have a significant effect on the probability of purchase in this research. However, most of the touch points do have an influence on the probability of purchase, either in only one of the clusters or in multiple clusters. Whether the effect is positive or negative also differs per cluster. An overview of the effect of the different touch points and the marketing implications per cluster can be found in the discussion chapter.

PREFACE

Dear reader,

Thank you for taking the time to read my thesis. During my Bachelor in International Business at the University of Groningen I developed my interest in marketing. The marketing intelligence courses I followed during my Master year have been interesting, informative and challenging, and I have learned many new aspects of marketing. The lectures and the tutorials have given me new insights and prepared me for my future career. I would like to thank the staff of the marketing department of the Faculty of Economics and Business for all the effort they have put into these courses. Moreover, during the second semester of my Master I wrote my master thesis. Even though this has been a challenging task at times, I truly enjoyed performing my own research project and I am very satisfied with the results. I would like to thank dr. Peter van Eck for supervising me while writing my thesis. His feedback and our meetings have contributed to moving my thesis in the right direction and taking it to a higher level. I would also like to thank Martine van der Heide, my second supervisor, for taking the time and effort to read my thesis. Lastly, I would like to thank my fellow students in my thesis group for all the useful discussions and meetings in the library, which helped us to improve our theses.

TABLE OF CONTENTS

1. Introduction
2. Literature review
2.1 Type of touch points
2.2 Frequency of touch points
2.3 Duration of touch points
2.4 Weather influence
2.5 Conceptual framework
3. Methodology
3.1 Data collection
3.2 List of analysis
3.2.1 Cluster analysis
3.2.2 Binary logit model
3.3 Variables
3.3.1 Variables used for cluster analysis
3.3.2 Variables used for binary logit model
3.4 Model specification
3.5 Plan of analysis
4. Analysis
4.1 Preliminary checks
4.2 Statistical descriptions
4.3 Cluster analysis
4.3.1 Cluster analysis
4.3.2 Cluster descriptions
4.4 Logistic regression
4.4.1 Comparing the results between clusters
4.4.2 Validation of the models
5. Discussion
5.1 Type of touch point
5.2 Duration of touch points
5.3 Weather influence
5.4 Implications per cluster
5.5 Limitations and implications for future research
6. References

1. INTRODUCTION

The customer journey is a sequence of interactions between a customer and a firm, which occurs when a customer is considering making a purchase. In the past, these customer journeys were not that extensive, but as the focus changed from one-way communication to two-way communication and interaction between customers increased, the journeys became more comprehensive. With traditional marketing, such as direct mail and television commercials, the communication mostly came from the firm and the customer could respond to this. Due to the rise of social media, the customer journey and the customer experience have truly expanded. The customer experience is now a multidimensional process, consisting of cognitive, emotional, physical, sensorial and social elements, as De Keyser et al. (2015) describe it. Touch points, which are defined as “episodes of direct or indirect contact with the brand” (Baxendale, Macdonald & Wilson, 2015, p. 236), are a long-existing phenomenon and inseparably connected to the customer journey. The number and variety of touch points are ever growing. Previous research has discussed the effect of touch points and how they can help in building a brand or increasing brand consideration (e.g., Hogan, Almquist & Glynn 2005; Baxendale, Macdonald & Wilson 2015).

Times are changing and the online part of the customer journey is becoming increasingly important. It is a challenging task to determine which touch points actually matter and to decide how to incorporate these touch points (Rawson, Duncan & Jones, 2013). Hence, firms should acknowledge the importance of adopting customer experience management (Homburg, Jozić & Kuehnl, 2017). The purpose of customer experience management is to design, and frequently renew, the touch point journey to achieve long-term customer loyalty (Homburg, Jozić & Kuehnl, 2017). Therefore, it is valuable for firms to know what the effect of online touch points is, and consequently, firms can design the customer journey in such a way that the customer experience is maximized and the appreciation for the brand and/or firm increases.

e.g. if the touch point generic search leads to a high number of purchases for a segment, a firm can decide to invest in sponsored search advertisements, where a firm pays a fee to a search engine operator to display its ad in a prominent position (Ghose & Yang, 2009). Also, a firm could invest in search engine optimization, where a firm tries to improve its position in the unsponsored search results (Skiera, Eckert & Hinz, 2010).

The research of Lemon and Verhoef (2016) states that preferences and influences of touch points change over time, and firms are faced with challenges on how to deliver a beneficial experience. Therefore, research on segmentation and advice on which touch points are most profitable for a certain segment is valuable. Previous research has been done on segmentation in the consumer decision process (e.g., Bhatnagar & Ghose 2004; Konuş, Verhoef & Neslin 2008). Konuş, Verhoef and Neslin (2008) focus on the channels used in the search and purchase phase of the consumer decision process. Segments in Konuş, Verhoef and Neslin’s (2008) research are formed on the basis of attitude toward multiple channels, while segmentation in this research will focus on the behaviour regarding the touch points in the customer journey. Moreover, their research was conducted eleven years ago, and there have been many changes in the decision process over the past decade. Bhatnagar and Ghose (2004) use segmentation to form web shopper segments based on purchase behaviour. They share the motivation of this research, as they believe that it is crucial to understand your customers’ preferences to be able to target them effectively. However, that research focusses on a general segmentation of web shoppers and was conducted fifteen years ago. These studies have contributed to the understanding of the segmentation approach, but this research will contribute by focusing on the online touch points in the customer journey and will give up-to-date advice per segment in the market.

Hence, the main objective of this research is to test what the influence is of different aspects of online touch points on purchase for different segments in the market. To research the influence of the different aspects of online touch points, four sub questions have been defined:

1. Which touch points result in making a purchase?

2. How many touch points per journey result in making a purchase?

3. How much time spent on a touch point results in making a purchase, i.e. what is the influence of duration on purchase?

4. What type of weather results in making a purchase, i.e. during what type of weather should a segment be targeted?

To investigate these questions, data about a travel agency, collected by GfK, will be used. The data consists of 29,011 customer journeys as well as demographic information about the customers. The first part of this research will cover an extensive explanation of previous research on this topic, after which multiple hypotheses will be formulated and a conceptual framework will be presented. The second part of this research will start with cluster analysis to form the segments, after which a more comprehensive data analysis will be done on the segments. Next, the results of the data analysis will be discussed and marketing implications will be provided. Lastly, this research will end with limitations and directions for future research.

2. LITERATURE REVIEW

The customer journey is the interaction between a firm and a customer consisting of multiple moments of contact, which starts when a customer is pursuing a certain goal. Neslin et al. (2006a) and Lemon and Verhoef (2016) both state that the customer journey consists of three phases. Neslin et al. (2006a) refer to these phases as ‘search’, ‘purchase’ and ‘after sale’, and Lemon and Verhoef (2016) call these phases ‘prepurchase’, ‘purchase’ and ‘postpurchase’. In this research, the terminology of Lemon and Verhoef (2016) will be used. The prepurchase phase contains all customer interaction with a brand, category or the environment before a purchase (Lemon & Verhoef, 2016). The second stage, purchase, involves the interaction with the brand and its environment during the purchase, which is characterized by behaviour concerning, among others, the choice of a product or service and the payment (Lemon & Verhoef, 2016). Postpurchase, the last stage, covers the customer interaction after the purchase has been made, such as usage, postpurchase engagement and service requests (Lemon & Verhoef, 2016). Customers move through these three phases at their own pace, and it can take days, weeks or even months before a journey has ended. Thus, all journeys are different and customer-specific.

The following section will further elaborate on the literature per sub question, after which a hypothesis for each sub question will be formed. It should be noted that in this research, cluster analysis will first be conducted based on behavioural variables, after which several clusters will be formed and described with demographic characteristics. It is expected that the results will vary between these clusters, and therefore will most likely not all be consistent with the hypotheses that will be mentioned later in this paper. However, it is still important to give a broad overview of what is currently known about the touch points and the customer journey. Hence, this research can be seen as exploratory research.

2.1 Type of touch points

This paragraph will first give a brief introduction to the different types of touch points discussed in previous literature. Second, the difference in effect between CICs and FICs will be discussed, and third, the contribution of this research will be discussed, followed by the first hypothesis.

A large number of touch points has been recognised in the literature, which are, among others, social media, websites, search engines, referral websites, email, display ads, paid search ads and loyalty programs (Li & Kannan 2014; Lemon & Verhoef 2016; Anderl et al. 2016). All these touch points can be categorized based on several criteria. Lemon and Verhoef (2016) identify four categories of touch points: brand-owned, partner-owned, customer-owned and external. Customers might use touch points from all the aforementioned categories at some point in their journey, and the importance of a touch point within a category might differ per stage of the customer journey. The brand-owned touch points are interactions that are managed and designed solely by the firm, while partner-owned touch points are managed and designed by the firm and one or more of the firm’s partners (Lemon & Verhoef, 2016). Firms cannot influence the customer-owned touch points, which are decisions that the customer takes, and external touch points, which are social influences from other customers or independent information sources (Lemon & Verhoef, 2016). Baxendale, Macdonald and Wilson (2015) divide the touch points into three categories, namely firm-initiated touch points (e.g. brand advertising), retailer touch points (e.g. retailer advertising or in-store communication) and third party touch points (e.g. word-of-mouth received, peer observation or traditional earned media). Anderl et al. (2016) identify only two categories, namely firm-initiated touch points (e.g. displays, social media and newsletter) and customer-initiated touch points (e.g. direct type-in and price comparison). Here, the focus is on whether the firm or the customer has started the interaction. Touch points can also be categorized into personal and non-personal touch points (Payne, Peltier & Barger, 2017). Personal touch points mean that customers have direct contact with the brand, either face-to-face or digitally, while there is no direct contact between the customer and the firm with non-personal touch points (Payne, Peltier & Barger, 2017).

Kannan 2014; Baxendale, Macdonald & Wilson 2015; De Haan, Wiesel & Pauwels 2016). Therefore, it is difficult to determine the effect of online touch points on purchase. Baxendale, Macdonald and Wilson (2015) focus in their research on six types of touch points, and argue that in-store communication is most influential, followed by brand advertising and peer observations. Even though their research emphasises offline touch points, the results can still be used to understand that every touch point has a different effect on purchases. Note that a distinction can also be made in the level of influence between FICs and CICs. CICs have been found to be more effective than FICs (Shankar & Malthouse 2007; Sarner & Herschel 2008; Wiesel, Pauwels & Arts 2011; Li & Kannan 2014; De Haan, Wiesel & Pauwels 2016).

Research on the effect of CICs and FICs is done with data on CICs and FICs of the focus brand (Li & Kannan 2014; De Haan, Wiesel & Pauwels 2016). However, also including competitor touch points could lead to valuable insights into the effect on purchase. Moreover, previous research has focussed on CICs and FICs of a large online retailer (De Haan, Wiesel & Pauwels, 2016). The threshold for frequently searching for, and making a purchase at, an online retailer is rather low. Other sectors, however, such as travel agencies or telephone companies, offer products that are purchased only occasionally. Hence, it is relevant to investigate whether there is a difference in effect between websites where you can often make a purchase (online retailer) and websites where you only make a purchase once in a while (travel agency or phone company). Furthermore, previous research often uses a combination of online and offline touch points (Baxendale, Macdonald & Wilson 2015; De Haan, Wiesel & Pauwels 2016). In the research of Baxendale, Macdonald and Wilson (2015), online touch points are only one subset of all the touch points. No distinction is made between the different online touch points, while currently a broad variety of online touch points is available, which requires a much broader and more detailed explanation of the effect of online touch points. Previous literature has greatly contributed to the knowledge of the effect of CICs and FICs; however, many opportunities are still left open for further research. This research aims to provide an in-depth understanding and to contribute to previous literature by focussing on CICs and FICs of a focus brand as well as its competitors, including a great variety of different online touch points.

In conclusion, this research can give a more detailed description of the effect of online touch points. Using a more extensive selection of online touch points, it will be tested if the findings of previous research also apply to this research. Therefore, based on previous research, the first hypothesis is:

2.2 Frequency of touch points

This paragraph will first give an overview on the psychological side of repetition. Second, the advantages and disadvantages of frequency for the firm and customer will be highlighted, followed by the second hypothesis.

The study of repetition is a well-known topic in psychology and consumer behaviour. The ‘exposure effect’ describes how an individual’s attitude toward an object becomes more positive when that object is repeatedly shown to the individual (Zajonc & Markus, 1982). In research on the repetition of banner advertisements, the authors found that people in the group that was repeatedly exposed to the banner ad have a higher perceptual fluency than people in the group that had not been exposed to the banner ad before (Fang, Singh & Ahluwalia, 2007). Thus, the attitude towards the ad is more positive when it is seen more often (Fang, Singh & Ahluwalia, 2007). These results from studies in consumer behaviour show that frequent exposure leads to more positive associations with a brand. This could lead to a higher chance of making a purchase. In the setting of this research, it would signify that it is beneficial for firms to frequently use banners, email, prerolls or other FICs in the customer journey to increase the number of purchases.

banner ads, and Schlosser, Shavitt and Kanfer (1999) argue that web ads are less preferred than ads in traditional media.

There is a division in previous research between the positive and negative effects of frequent exposure to touch points. One group argues that higher frequency leads to higher commitment, brand consideration and purchase intention, while the other group argues that higher frequency results in advertising clutter, which will worsen the attitude toward the brand and decrease purchase intention. In this research, CICs as well as FICs will be used in the data analysis. CICs start due to an interest and action from the customer (De Haan, Wiesel & Pauwels 2016). Hence, customers make their own decision to interact with the brand, and will most likely not do this if it results in negative experiences. The FICs, on the other hand, are an action of the firm. Nevertheless, many studies have shown that frequent exposure to FICs results in a positive attitude towards the brand, positive word-of-mouth, and a higher purchase probability (Manchanda et al. 2006; Fang, Singh & Ahluwalia 2007; Ieva & Ziliani 2018). However, firms have to be careful not to overdo it, as this could lead to advertising clutter. Thus, both CICs and FICs presumably have a positive effect. Therefore, the second hypothesis is:

H2: The frequency of online touch points has a positive effect on purchase

2.3 Duration of touch points

This paragraph will first give a brief introduction to website visit duration linked to touch points. Second, the influence of visit duration and the factors influencing visit duration will be discussed. This paragraph will be concluded with the third hypothesis.

Website visit duration is a widely investigated research topic. Many authors show that visit duration leads to several benefits to the firm (e.g. Hanson 2000; Bucklin & Sismeiro 2003; Danaher & Mullarkey 2003; Moe & Fader 2004; Lin 2007; Mallapragada, Chandukala & Liu 2016). In this research, website visit duration is mostly interesting for the CICs, as these could include the websites of both the focus brand and competitor brands. However, even though most research is focused on website visit duration, the effect of the duration of a touch point on purchases is also relevant for the FICs. Firms are able to measure how long customers spend on prerolls and banner ads; hence, if firms know whether a longer duration leads to more purchases, they can act on this. Furthermore, Bucklin and Sismeiro (2003) and Danaher and Mullarkey (2003) both report that longer website visit durations make it more likely that customers actively notice banner ads.

(2007), who states that the consumer’s willingness to ‘stick’ to a website is a strong predictor of purchase intention. These results are consistent with Bucklin and Sismeiro (2003), who argue that longer visit duration gives consumers more time to consider and complete a purchase. Furthermore, Hanson (2000) and Bucklin and Sismeiro (2003) also state that longer visit duration helps to retain the consumer’s interest in the website. A greater consumer interest thereafter leads to repeat visits, which are followed by more long-term purchases (Moe & Fader, 2004). Thus, previous studies all acknowledge the importance of duration and its additional advantages. Park and Chung (2009) also argue that customers who spend more time on a website are more likely to make a purchase. Surprisingly, however, they also find that customers who enter the website through a referral are more likely to make a purchase if they spend a shorter time on the website. Customers who directly enter the website have a goal-directed search motivation, whereas entering the website through a referral is associated with an exploratory search motive (Park & Chung, 2009). These findings are insightful for this research, as they could mean that the optimal duration differs per type of touch point. In the research of Park and Chung (2009) the effect of duration is only tested for direct access and referral, whereas this research intends to use many more touch points.

Danaher, Mullarkey and Essegaier (2006) investigate the factors that affect website visit duration. The results show that only two demographic variables, gender and age, have a significant effect on visit duration. Education does not seem to have a significant effect on visit duration. Findings indicate that visit duration is longer for women and that visit duration increases with age (Danaher, Mullarkey & Essegaier, 2006). Moreover, they argue that entertainment and auction websites, as well as websites with high graphical content, have longer visit durations. However, the results also show that websites with a higher level of advertising result in shorter visit durations. A possible explanation could be that, as mentioned before, many consumers actively avoid banner ads (Drèze & Hussherr, 2003).

Research has shown that longer website visits are beneficial for purchases as well as noticing banner ads. In this research, it will be tested whether these positive duration effects hold for all different touch points. Hence, the third hypothesis is:

H3: Duration of online touch points has a positive effect on purchase

2.4 Weather influence

Weather appears to have an influence on consumer behaviour in various ways. Weather can affect consumers’ mood and well-being, but also purchase behaviour, which is confirmed by previous research. The effect of weather has been researched for both offline shopping (Parsons 2001; Murray et al. 2010) and online shopping (Steinker, Hoberg & Thonemann, 2017). Parsons (2001) finds that as rain and temperature increase, fewer consumers go out shopping. Murray et al. (2010), however, argue that a rise in the exposure to sunshine leads to a decline in negative affect, which in turn leads to an increase in consumer spending. These results, though, were found in studies focused on offline purchases.

This research will focus on online purchases, and therefore it is expected that these patterns might be different. For example, on rainy days consumers will not be enthusiastic to go outside, which might increase the probability that consumers will make an online purchase. This expectation is in line with the results found in Steinker, Hoberg and Thonemann’s (2017) research on the effect of weather for an online retailer. The findings suggest that sales decrease on days with good weather, i.e. rainy or cloudy days would be beneficial for online purchases. This corresponds with the conclusions that on days with bad weather mood is affected and as a result, people dedicate their leisure time to more entertaining and rewarding activities such as watching television or online shopping (Eisinga, Franses & Vergeer, 2011).

Based on previous research on the effect of weather on consumer behaviour and purchase behaviour, it can be concluded that bad weather (e.g. rain and cloudiness) has a positive effect on online shopping. The focus of this research is comparable to that of Steinker, Hoberg and Thonemann (2017), as both studies focus on online purchases. Even though Steinker, Hoberg and Thonemann (2017) focus on daily purchases, similar results are expected to be found. Hence, the fourth hypothesis is:

H4: Bad weather (e.g. rain and cloudiness) has a positive effect on purchase

2.5 Conceptual framework

Figure 1: Conceptual framework

3. METHODOLOGY

3.1 Data collection

In this research, data from GfK, Germany’s largest market research institute, will be used to test the hypotheses. The dataset contains data about a large travel agency, consisting of demographic information about the customers as well as data on the customer journeys. The data was collected through the GfK panel between May 31st 2015 and October 31st 2016. The data was collected through a Dutch panel, and hence the dataset contains information about Dutch consumers. The dataset consists of panel data, also called longitudinal data, which means that the dataset consists of “a time series of each cross-sectional member in the dataset” (Wooldridge, 2012, p. 10). Furthermore, it consists of online data and the data is event-based, which means that each observation represents an event, which in this case is an interaction between the firm and the customer through either a CIC or a FIC. This research can be qualified as quantitative research, which is a research methodology that “seeks to quantify the data and, typically applies some form of statistical analysis” (Malhotra, 2009, p. 171).

this dataset will be used in this study: the time stamp, duration, the type of device, the type of touch point (appendix 1) and whether there has been a purchase.
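To make the event-based structure concrete, the following minimal Python sketch shows how event-level observations could be aggregated into journey-level measures such as the frequency of touch points and the average duration. It is only an illustration under stated assumptions: the field names, the `journey_id` identifier, the touch point labels and the toy values are hypothetical and are not taken from the GfK dataset.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class TouchPointEvent:
    journey_id: int    # hypothetical identifier linking events to a journey
    timestamp: str     # time stamp of the interaction
    duration: float    # time spent on the touch point (assumed to be in seconds)
    device: str        # type of device
    touch_point: str   # type of touch point (CIC or FIC)
    purchase: bool     # whether the interaction involved a purchase


def summarise_journeys(events):
    """Aggregate event-level rows into one summary per journey."""
    grouped = defaultdict(list)
    for event in events:
        grouped[event.journey_id].append(event)
    summary = {}
    for journey_id, journey_events in grouped.items():
        summary[journey_id] = {
            "frequency": len(journey_events),
            "avg_duration": sum(e.duration for e in journey_events) / len(journey_events),
            "purchase": any(e.purchase for e in journey_events),
        }
    return summary


# Toy events, invented for illustration only.
events = [
    TouchPointEvent(1, "2015-06-01 10:00", 30.0, "mobile", "generic search", False),
    TouchPointEvent(1, "2015-06-01 10:05", 90.0, "mobile", "focus brand website", True),
    TouchPointEvent(2, "2015-06-02 20:00", 12.0, "desktop", "banner", False),
]
summary = summarise_journeys(events)
print(summary)
```

In this toy example, journey 1 has two touch points with an average duration of 60 and ends in a purchase, while journey 2 has a single touch point and no purchase.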

Furthermore, the variable ‘bad weather’ will be used in this research. Since the dataset contains information regarding Dutch consumers, the data regarding the weather should also be Dutch weather data. Hence, the data is collected through the Royal Dutch Meteorological Institute (Dutch abbreviation: KNMI). The KNMI provides a database with daily weather measurements per weather station in the Netherlands. In this research, the weather station ‘De Bilt’ is chosen, as this is located in the middle of the Netherlands.

3.2 List of analysis

3.2.1 Cluster analysis

Cluster analysis is a technique to classify observations from a heterogeneous sample into homogeneous groups. Cluster analysis can be defined as “given a representation of n objects, find K groups based on a measure of similarity such that the similarities between objects in the same group are high while the similarities between objects in different groups are low” (Jain, 2010, p. 652). The first step in cluster analysis is to decide on which variables the clusters will be formed. Active variables, which are the variables that will be used to form the clusters, and passive variables, which are the variables that are used to describe the customers in the clusters, are both important parts of the cluster analysis. The selection of active and passive variables will be further discussed in section 3.3.1.

potential outliers will be detected and resolved during the preliminary checks, no outliers will remain in this dataset, and therefore Ward’s method will be the most suitable hierarchical clustering method. Ward’s method generates clusters that minimize the within-cluster variance (Punj & Stewart, 1983).

Based on several criteria, the ideal number of clusters can be determined. The first criterion that will be used to determine the appropriate number of clusters is the dendrogram. Using the dendrogram, the optimal number of clusters can be determined by the distance involved in moving from one cluster solution to the next. If this distance is small, this step can be taken; however, if the distance is large, much information will be lost and this step should not be taken. The optimal number of clusters is the cluster solution before much information is lost. The second criterion is the elbow method, which looks at the within-cluster sum of squares. These results can be plotted in a scree plot, and the number of clusters at the bend in the plot is generally considered to be the appropriate number of clusters. The third criterion is the average silhouette method, which computes the average silhouette of observations for different values of K; the optimal number of clusters is the one with the highest average silhouette (Kaufman & Rousseeuw, 1990).
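The elbow criterion described above can be illustrated with a small Python sketch. It is not the procedure used in this thesis: it substitutes a simple K-means with a deterministic farthest-point initialisation (an assumption made only to keep the example reproducible) and runs on invented two-dimensional toy data with three well-separated groups.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def farthest_point_init(points, k):
    """Deterministic start: first point, then repeatedly the point farthest from all chosen centroids."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centroids)))
    return centroids


def kmeans(points, k, max_iter=100):
    """Plain K-means (Lloyd's algorithm); returns labels and centroids."""
    centroids = farthest_point_init(points, k)
    for _ in range(max_iter):
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append(tuple(sum(v) / len(members) for v in zip(*members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    return labels, centroids


def wcss(points, labels, centroids):
    """Within-cluster sum of squares, the quantity plotted in the scree plot."""
    return sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))


# Toy data: three tight groups, invented for illustration.
points = [(0, 0), (0, 1), (1, 0),
          (10, 10), (10, 11), (11, 10),
          (20, 0), (20, 1), (21, 0)]

wcss_by_k = {k: wcss(points, *kmeans(points, k)) for k in range(1, 5)}
for k, value in wcss_by_k.items():
    print(k, round(value, 2))
```

On this toy data the within-cluster sum of squares drops sharply up to K = 3 and barely improves afterwards, which is the bend the elbow method looks for.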

Thereafter, non-hierarchical clustering, often also called K-means clustering, will be applied. Again, the elbow method will be used to determine the appropriate number of clusters. Furthermore, the optimal number of clusters can be assessed through the internal validation of the clusters. Internal validation can be measured with connectivity, the silhouette coefficient and the Dunn index. Connectivity measures to what extent an observation is placed in the same cluster as its nearest neighbours in the dataset. This value should be minimized. The silhouette coefficient compares, for each observation, the average distance to the observations in its own cluster with the average distance to the observations in the nearest other cluster; this number should be maximized. The Dunn index is calculated by dividing the smallest distance between observations of different clusters by the largest distance between observations within the same cluster. Hence, the Dunn index should be maximized.
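As an illustration of two of these internal validation measures, the sketch below computes the average silhouette coefficient and the Dunn index in plain Python, using their standard textbook definitions and Euclidean distance. The four toy points and both labellings are invented for the example and do not come from the dataset.

```python
import math


def average_silhouette(points, labels):
    """Mean of (b - a) / max(a, b), where a is the mean distance to the own
    cluster and b is the mean distance to the nearest other cluster."""
    clusters = sorted(set(labels))
    scores = []
    for i, p in enumerate(points):
        own = [q for j, q in enumerate(points) if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # convention for singleton clusters
            continue
        a = sum(math.dist(p, q) for q in own) / len(own)
        b = min(
            sum(math.dist(p, q) for q, lab in zip(points, labels) if lab == c)
            / labels.count(c)
            for c in clusters if c != labels[i]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


def dunn_index(points, labels):
    """Smallest between-cluster distance divided by the largest within-cluster distance."""
    pairs = [(i, j) for i in range(len(points)) for j in range(i + 1, len(points))]
    between = min(math.dist(points[i], points[j]) for i, j in pairs if labels[i] != labels[j])
    within = max(math.dist(points[i], points[j]) for i, j in pairs if labels[i] == labels[j])
    return between / within


# Toy data: two compact groups, invented for illustration.
points = [(0, 0), (0, 1), (5, 0), (5, 1)]
good_labels = [0, 0, 1, 1]   # groups match the visible structure
bad_labels = [0, 1, 0, 1]    # groups cut across the structure

sil_good = average_silhouette(points, good_labels)
sil_bad = average_silhouette(points, bad_labels)
dunn_good = dunn_index(points, good_labels)
print(round(sil_good, 3), round(sil_bad, 3), dunn_good)
```

The sensible labelling yields a clearly higher silhouette than the labelling that cuts across the groups, which is why both measures are maximized when comparing cluster solutions.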

Once the clusters are formed with K-means clustering, it should be tested whether the clusters are significantly different from each other. This can be tested using Analysis of Variance (ANOVA); however, there are three assumptions that have to be met before ANOVA can be performed. These assumptions are that the error term is normally distributed, that the error terms are uncorrelated and that the categories of the independent variables are fixed (Malhotra, 2009). If all assumptions are met, ANOVA can be applied.
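For readers unfamiliar with the test, the following minimal sketch computes the one-way ANOVA F-statistic for a single variable across clusters. The values per cluster are invented toy numbers, and the sketch stops at the F-statistic; obtaining the p-value would additionally require the F-distribution from a statistics package.

```python
def one_way_anova_f(groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = [v for g in groups for v in g]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(g) / len(g) for g in groups]

    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((v - m) ** 2 for g, m in zip(groups, group_means) for v in g)

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Toy values of one clustering variable in three clusters (illustrative only).
clusters = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
f_stat = one_way_anova_f(clusters)
print(f_stat)  # 27.0
```

A large F-statistic indicates that the variation between cluster means is large relative to the variation within clusters, i.e. that the clusters differ on that variable.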

3.2.2 Binary logit model


[...] the most preferred. The binary logit model is favoured over the linear probability model because the linear probability model does not restrict the predicted probability to lie between 0 and 1, which can result in negative probabilities or probabilities larger than 1. The binary logit model is preferred over the binary probit model because of the mathematical convenience of the logit specification and because its parameters are slightly easier to interpret (Leeflang et al., 2015). Hence, after the cluster analysis, the binary logit model will be used for every cluster to determine the effect of the independent variables on the dependent variable. The binary logit model, also called logistic regression, “commonly deals with the issue of how likely an observation is to belong to each group” (Malhotra, 2009, p. 621). The binary logit model estimates the probability of success, which has 0 and 1 as asymptotic values, and, based on a cut-off value, determines whether an observation belongs to a certain group. The probability, using the binary logit model, can be modelled as:

$$p = \frac{1}{1 + \exp\left(-\sum_{i=0}^{k} \beta_i X_i\right)}$$

where

$p$ = probability of purchase

$X_i$ = independent variable i

$\beta_i$ = parameter to be estimated
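As a minimal sketch of how the binary logit probability is evaluated, the Python function below computes p for given parameters and variable values. It is an illustration only, not the code used in this research (which uses RStudio). The function name and the parameter values are hypothetical, and it is assumed that the first variable equals 1 so that its parameter acts as the intercept.

```python
import math


def purchase_probability(betas, xs):
    """Binary logit probability p = 1 / (1 + exp(-(sum_i beta_i * x_i)))."""
    if len(betas) != len(xs):
        raise ValueError("betas and xs must have the same length")
    z = sum(b * x for b, x in zip(betas, xs))
    # Two algebraically equal forms, chosen to avoid overflow in exp().
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


# Hypothetical parameters and values; xs[0] = 1 so betas[0] is the intercept.
betas = [-2.0, 0.5, 0.1]
xs = [1.0, 3.0, 2.0]
p = purchase_probability(betas, xs)
print(p)
```

Whatever values are supplied, the result stays strictly between 0 and 1, which is the property that motivates the logit model over the linear probability model.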

The validity of the model can be assessed using several criteria. First, the model will be compared to the naïve model, which is a model without explanatory variables. Three measures will be used to check whether the presented model outperforms the naïve model: the hit rate, the top decile lift (TDL) and the Gini coefficient. The hit rate shows how many outcomes of the dependent variable are correctly classified by the model (Leeflang et al., 2015). The TDL focuses on the top 10 percent of the customers, i.e. the customers with the highest predicted probability of making a purchase, and is calculated by dividing the purchase rate in this top 10 percent by the overall purchase rate (Lemmens & Croux, 2006). The higher the TDL, the better the classifier. The Gini coefficient, on the other hand, focuses not only on the top 10 percent but also on the other scores, and is thus a broader measure than the TDL (Lemmens & Croux, 2006; Neslin et al., 2006b). Second, the Nagelkerke R2, McFadden R2 and Cox & Snell R2 will be calculated, whereby the higher the R2, the better the model. These R2s are based on “comparing the likelihood of a model with only an intercept to the


3.3 Variables

3.3.1 Variables used for cluster analysis

Active variables: Punj and Stewart (1983) argue that it is crucial to pay attention to variable selection for cluster analysis, because a single variable can already distort an otherwise useful cluster analysis; therefore, there should be some rationale for the selection of the variables. Demographic segmentation is the most popular form of segmentation (Beane & Ennis, 1987). However, Wells et al. (2010) state that the use of demographic variables for explaining product choice, brand loyalty and price responsiveness has often been accused of oversimplification. Hence, behavioural characteristics will be used as active variables, as they are expected to provide better results, and demographic variables will be used as passive variables to describe the segments.

For cluster analysis the dataset will be aggregated on customer level. This means that all active variables will be formed per unique user ID. The first variable that will be used for segmentation is share of device, which indicates how often the customer has used a mobile device (smartphone or tablet). The share of device is calculated by dividing the total number of times a customer uses a mobile device by the total number of touch points. The research of Ström, Vendel and Bredican (2014) shows that mobile device shoppers are potentially valuable to the firm, due to higher income and/or education. The second variable is the share of customer-initiated touch points, which is calculated as the number of CICs divided by the total number of touch points of a customer. FICs are often pushed to the customer and are not always wanted by the customer (Blattberg, Kim & Neslin, 2010). CICs, on the other hand, require an action, and thus a level of interest, of the customer (De Haan, Wiesel & Pauwels 2016). Hence, one could argue that customers using more CICs are more involved and more beneficial for the firm. The third variable is the share of competitor touch points and the fourth variable is the share of neutral touch points (e.g. accommodation website/app/search, comparison website/app/search, flight tickets website/app/search and generic search). The share of competitor touch points is measured as the total number of competitor touch points divided by the total number of touch points per user. The share of neutral touch points is calculated as the total number of neutral touch points divided by the total number of touch points per user. Baxendale, Macdonald and Wilson (2015) discuss that competitor touch points negatively affect brand consideration for the focus brand. One could argue that a cluster consisting of customers that encounter many competitor touch points could be less interesting for the focus brand, as there is a higher chance that this cluster will choose a competitor’s brand. The final variable that will be used for segmentation is the number of customer journeys per customer.
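The aggregation to customer level described above can be sketched as follows. This is a hedged Python illustration, not the code used in this research; the record layout, the field values and the function name are hypothetical and only serve to show how each share is a count divided by the total number of touch points of a user.

```python
from collections import defaultdict

# Hypothetical touch-point records:
# (user_id, journey_id, device, initiated_by, brand_type)
records = [
    ("u1", 1, "mobile", "customer", "competitor"),
    ("u1", 1, "fixed", "customer", "neutral"),
    ("u1", 2, "mobile", "firm", "focus"),
    ("u1", 2, "mobile", "customer", "neutral"),
    ("u2", 1, "fixed", "customer", "competitor"),
    ("u2", 1, "fixed", "customer", "competitor"),
]


def active_variables(records):
    """Aggregate touch-point records to one row of active variables per user."""
    per_user = defaultdict(list)
    for rec in records:
        per_user[rec[0]].append(rec)
    result = {}
    for user, recs in per_user.items():
        n = len(recs)  # total number of touch points of this user
        result[user] = {
            "share_device": sum(r[2] == "mobile" for r in recs) / n,
            "share_cic": sum(r[3] == "customer" for r in recs) / n,
            "share_competitor": sum(r[4] == "competitor" for r in recs) / n,
            "share_neutral": sum(r[4] == "neutral" for r in recs) / n,
            "n_journeys": len({r[1] for r in recs}),
        }
    return result


variables = active_variables(records)
print(variables["u1"])
print(variables["u2"])
```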


$$z = \frac{X - \mu}{\sigma}$$
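The z-score formula above can be sketched in a few lines of Python. This is an illustration only, not the code used in this research; the function name and the example values are hypothetical. It is assumed here that σ is the population standard deviation; some software standardizes with the sample standard deviation instead, in which case `statistics.stdev` would take the place of `statistics.pstdev`.

```python
import statistics


def standardize(values):
    """Return z-scores (x - mean) / standard deviation for a list of numbers."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        raise ValueError("cannot standardize a constant variable")
    return [(x - mu) / sigma for x in values]


# Hypothetical values of one active variable (mean 5, standard deviation 2).
journeys = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
z_scores = standardize(journeys)
print(z_scores)
```

After standardization every variable has mean 0 and standard deviation 1, so that no variable dominates the distance calculation merely because of its scale.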

Passive variables: to describe the clusters after cluster analysis, passive variables will be used. Passive variables can be used for group identification and describe what type of customers are in the different clusters. The variables that are used to describe the clusters are gender, age, education (ranging from 1 to 8, where 8 is the highest level), income (ranging from 1 to 7, where 7 is the highest level), household size and life stage. These variables are chosen as they can give valuable insights into which customers are most interesting to target. The study by Dittmar, Long and Meek (2004) acknowledges that attitudes toward online buying differ by gender, and Hasan (2010) finds that women value the utility of online shopping less than men. Furthermore, “the higher a person’s income, education, and age, the more likely that person will buy online, and the higher a person’s income, the more online transactions that person is likely to make” (Bellman, Lohse & Johnson, 1999, p. 37). Nagra and Gopal (2013) also confirm that age, gender and income all have an impact on the frequency of online purchases. Clusters that consist of customers with these attributes are probably more valuable to firms. Moreover, family size significantly influences the total spend on internet shopping (Richa, 2012). It should be noted that Bellman, Lohse and Johnson (1999) also mention that demographics only predict a small percentage of the decision to buy or not to buy.

3.3.2 Variables used for binary logit model


| Dependent variable | Description |
|---|---|
| Purchase any | Indicates whether a customer has made a purchase, either at the focus brand or at a competitor. The dependent variable is a dummy variable, with the value 0 meaning no purchase and the value 1 meaning purchase. |

| Independent variable | Description |
|---|---|
| Type of touch point | The type of touch point is an interval variable, indicating on which touch point the customer and the firm interact. Each touch point is a separate variable, showing how often that touch point has occurred in the customer journey (appendix 1). |
| Frequency of touch points | The frequency of touch points is an interval variable, indicating how many touch points, either customer-initiated or firm-initiated, a customer has faced during his or her journey. |
| Duration of touch points | The duration of touch points is an interval variable, indicating the average time a customer has spent on a touch point. The average is obtained by dividing the total duration spent on touch points per customer journey by the number of touch points in the journey. |
| Bad weather | The variable bad weather can be divided into two sub-variables: rain and cloudiness. Rain is an interval variable, measured as the sum of the rainfall per day. Cloudiness is a categorical variable measured as the average cloudiness per day, ranging from 0, which indicates no cloudiness at all, to 9, which indicates that the sky is completely covered by clouds. |

Table 1: Dependent and independent variables

3.4 Model specification

Based on the abovementioned independent variables, the binary logit model in this research will be the following:

$$p = \frac{1}{1 + \exp\left(-\left(\beta_0 + \sum_{j=1}^{20} \beta_j TP_{ij} + \beta_{21} FR_i + \beta_{22} DUR_i + \beta_{23} RA_i + \beta_{24} CL_i\right)\right)}$$

Where

𝑝 = the probability that a customer makes a purchase in customer journey i

𝛽0 = the intercept

𝑇𝑃𝑖𝑗 = frequency of touch point j in customer journey i (overview touch points: see appendix 1)

𝐹𝑅𝑖 = frequency of touch points in customer journey i

𝐷𝑈𝑅𝑖 = average duration of touch points in customer journey i

𝑅𝐴𝑖 = rainfall in customer journey i

𝐶𝐿𝑖 = cloudiness in customer journey i


3.5 Plan of analysis

To properly analyse the data, RStudio will be used. The first step of the data analysis will be preparing and cleaning the data. Data cleaning deals with “detecting and removing errors and inconsistencies from data in order to improve the quality of data” (Rahm & Do, 2000, p. 3). Data quality problems can occur due to misspellings, missing data or invalid data (Rahm & Do, 2000). The data must be arranged in such a way that the aforementioned analysis methods can be applied. Furthermore, new variables have to be created and the weather variables have to be added to the dataset. Thereafter, the data will be tested for outliers and missing values. The cause of the outliers and missing values will be determined and, based on this, a proper solution will be applied. After this, the dataset is ready for the analyses, and an overview with descriptive statistics will be provided to get a feel for the data.

The second step is performing cluster analysis. First, a hierarchical clustering method, Ward’s method, will be performed, and the most preferred number of clusters will be determined. Once it is known how many clusters are optimal, the non-hierarchical clustering method will be performed to optimize the clusters. Then, ANOVA will be used to check whether the clusters are significantly different from each other. The descriptions of the clusters will follow after.

The third step is performing the logistic regression, which provides the results of the effect of the independent variables on the dependent variable. A separate logistic regression will be performed per cluster to check if there are differences in the effect on the dependent variable per cluster. First, it will be checked whether there is multicollinearity and thereafter, the validation of the model will be checked with the hit rate, TDL, the Gini coefficient and the Nagelkerke R2, the McFadden R2 and the Cox & Snell R2.

After the analyses in RStudio, the results will be discussed in the discussion chapter.

4. ANALYSIS

This chapter will describe the procedure and the results of the data analysis. First, the preliminary checks will be performed and the issue of outliers and/or missing values will be solved. Next, the statistical descriptions will be provided to get a feel for the data. Thereafter, the cluster analysis will be performed, followed by a logistic regression for each cluster.

4.1 Preliminary checks


[...] dataset. However, there are outliers and missing values that have to be handled. When checking for outliers, one extreme outlier is detected, which consists of 64.503 touch points for one user ID (graph 1). This number deviates too much from the other observations and hence this user ID will be removed from the dataset.

Graph 1: Boxplot of number of touch points per user

Furthermore, missing values have been detected for several variables. Missing values should be handled, because the analysis procedures are developed for complete datasets; otherwise, inaccurate predictions or biased estimates can occur. The variable ‘duration’ contains 141.030 missing values (graph 2).

Graph 2: Missing values of duration per touch point


[...] extremely low and hence it can be concluded that the variables in this dataset are not suitable for MI of the duration of the CICs. Therefore, mean substitution is used: the mean of a variable is calculated and substituted for its missing values. Again, the mean of the duration is calculated separately for every touch point, i.e. the missing values of the first touch point are imputed with the average duration of the first touch point. After mean substitution, the variable duration no longer contains missing values.
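Mean substitution per touch point can be sketched as follows. This Python code is an illustration only, not the code used in this research. The function name, the touch point names and the durations are hypothetical, and it is assumed that "per touch point" means per type of touch point and that every touch point has at least one observed duration.

```python
def impute_duration_by_touch_point(rows):
    """Replace missing durations (None) by the mean duration of the same touch point.

    rows is a list of (touch_point, duration) pairs; a new list is returned.
    Assumes every touch point has at least one observed duration.
    """
    totals, counts = {}, {}
    for touch_point, duration in rows:
        if duration is not None:
            totals[touch_point] = totals.get(touch_point, 0.0) + duration
            counts[touch_point] = counts.get(touch_point, 0) + 1
    means = {tp: totals[tp] / counts[tp] for tp in totals}
    return [
        (tp, means[tp] if duration is None else duration)
        for tp, duration in rows
    ]


# Hypothetical durations in seconds; None marks a missing value.
rows = [
    ("email", 10.0),
    ("email", None),
    ("email", 20.0),
    ("banner", None),
    ("banner", 4.0),
]
imputed = impute_duration_by_touch_point(rows)
print(imputed)
```

A known drawback of this approach is that it leaves the mean per touch point unchanged but reduces the variance of the variable, since all imputed values are identical.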

Moreover, for 1.603 user IDs no demographic information is available. Again, listwise deletion is not a reasonable approach, as more than 5% of the data points would be lost. Hence, another approach has to be chosen. The demographic information will only be used to describe the clusters after the cluster analysis. Therefore, the cluster analysis will be performed first, and then it will be checked whether the user IDs with missing values are equally distributed over the clusters. Based on that outcome, further steps will be taken, if necessary, to handle the missing values.

After completing the preliminary checks and handling the outliers and missing values, the dataset is ready for analysis.

4.2 Statistical descriptions

To get a feel for the data used in this research, this paragraph will highlight the main characteristics of the variables.

Customers in this dataset are on average 52 years old; the youngest customer is 17 and the oldest is 94 years old. In this dataset, 60% of the customers are female and 40% are male. On average, the customers have a household of 2,5 people; however, a single household is by far the most common, as 21,1% of the customers have a single household. The maximum number of people in one household is eleven. Of all households, 73,1% do not have children and 26,9% have one or more children. The largest group in the data (16,6%) does not know or does not want to share its income, followed by a large group (16,3%) that earns the average gross income. Earning twice the average is least common (4%).


4.3 Cluster analysis

4.3.1 Cluster analysis

For the cluster analysis a separate data frame is created, which is aggregated on customer level (user ID). This data frame contains the active variables that will be used for cluster analysis. The active variables are share of device, share of CICs, share of neutral touch points, share of competitor touch points and the number of journeys per user. Since these variables are not all on the same scale, the variables are standardized. As mentioned in the methodology chapter, the hierarchical cluster method, Ward’s method, will be performed first and based on several criteria the optimal number of clusters will be determined.

The ‘within sum of squares’ method, also known as the elbow criterion, indicates that four is the optimal number of clusters (graph 3). The line bends most at four clusters, which indicates that four clusters will give the best clustering results. Furthermore, the average silhouette method also indicates that four clusters should be used (graph 4). A higher average silhouette width indicates a better cluster solution, and since four clusters has the highest average silhouette width, four clusters is optimal.

Graph 3: Elbow method

Graph 4: Average silhouette method


Graph 5: Dendrogram

In conclusion, the dendrogram does not give a clear cluster solution. Hence, the cluster solutions of the elbow method and the silhouette method will be used, and the optimal number of clusters for hierarchical clustering is four.

Next, the non-hierarchical cluster method, K-means, is performed. The elbow criterion indicates that five is the optimal number of clusters (graph 6). Furthermore, comparing the silhouette coefficient, the Dunn index and connectivity, all three measures indicate that two clusters would be optimal (appendix 3). These results are not in line with the result of the elbow method, and therefore it is difficult to provide one overall conclusion on the best number of clusters for K-means clustering.

Graph 6: Elbow method


[...] clusters, the fifth cluster only has 172 customers, which is almost negligible compared to the sizes of the other clusters. Therefore, it makes sense to combine this cluster with another cluster, which indeed happens when four clusters are chosen. With four clusters, the customers are fairly evenly spread over the clusters: clusters 1, 2, 3 and 4 contain 2.262, 2.462, 3.505 and 1.448 customers, respectively. Therefore, also taking into account the results of hierarchical clustering, K-means will be performed with four clusters.

Hence, the final K-means clustering will be performed with four clusters, and these clusters will be used further in the logistic regression. The cluster membership is added to a dataset with the passive variables to describe the different clusters. An ANOVA test will be used to check whether the clusters are significantly different. First, the three assumptions of ANOVA, mentioned in section 3.2.1, are tested. Testing the first assumption, non-normality is detected. The normality plots already show a non-normal distribution (appendix 4), which is confirmed by the Kolmogorov-Smirnov test and the Jarque-Bera test. Both tests show a significant result (p < 2,2e-16 in both cases), indicating non-normality. Hence, the first assumption is not met and ANOVA cannot be performed. Therefore, the Kruskal-Wallis test is used. The Kruskal-Wallis test is often used when the assumptions for ANOVA are not met, and checks whether the medians of the clusters are significantly different. The test shows a significant result (p < 2,2e-16), meaning that the null hypothesis can be rejected and the clusters are significantly different from each other. Thus, these four clusters can be used for further analysis.
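To make the rank-based logic of the Kruskal-Wallis test concrete, the test statistic H can be sketched in Python as below. This is an illustration only, not the code used in this research; the function name and the toy groups are hypothetical. The sketch uses average ranks for tied values with the usual tie correction and returns only H, not a p-value. Under the null hypothesis H approximately follows a chi-square distribution with (number of groups - 1) degrees of freedom; for three groups the 5% critical value is 5,991.

```python
def kruskal_wallis_h(groups):
    """Kruskal-Wallis H statistic with average ranks and tie correction.

    groups is a list of lists of numbers. Assumes not all values are identical.
    """
    pooled = sorted((value, g) for g, values in enumerate(groups) for value in values)
    n = len(pooled)
    rank_sums = [0.0] * len(groups)
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j shared by the ties
        t = j - i
        tie_term += t**3 - t
        for k in range(i, j):
            rank_sums[pooled[k][1]] += avg_rank
        i = j
    h = 12.0 / (n * (n + 1)) * sum(
        r * r / len(values) for r, values in zip(rank_sums, groups)
    ) - 3 * (n + 1)
    return h / (1 - tie_term / (n**3 - n))


# Hypothetical values of one variable in three clearly separated groups.
groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
h_stat = kruskal_wallis_h(groups)
print(h_stat)
```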

4.3.2 Cluster description

The first step in describing the clusters is checking the distribution of the missing values of the passive variables. As mentioned in paragraph 4.1, the demographic information, which will be used to describe the clusters, has 1.603 missing values. A first look at the clusters indicates that the missing values are fairly evenly distributed, which is confirmed by the percentage of missing values per cluster: clusters 1, 2, 3 and 4 have 17,1%, 12,8%, 19,5% and 15,1% missing values, respectively, relative to the total number of observations per cluster. Therefore, this issue does not need to be handled further, and only the available values will be used to describe the clusters.


| | Cluster 1: “The competitor seekers” | Cluster 2: “The purchase peeps” | Cluster 3: “The explorers” | Cluster 4: “The Modern Mobile (wo)Men” |
|---|---|---|---|---|
| Average age | 54,8 years | 52,6 years | 52,3 years | 46,1 years |
| Gender | 38,9% male, 61,1% female | 38,7% male, 61,3% female | 40,1% male, 59,9% female | 43% male, 57% female |
| Most common household size | 2 people (42,6%) | 2 people (43,5%) | 2 people (38,1%) | 2 people (35,5%) |
| Least common household size | 7 people (0,1%) | 8 people (0,2%) | 10 and 11 people (both 0,07%) | 8 people (0,08%) |
| Most common income level | Between 1 and 2 times average (19,5%) | Between 1 and 2 times average (22,3%) | Between 1 and 2 times average (20,5%) | Between 1 and 2 times average (24,6%) |
| Least common income level | Above 2 times average (2,9%) | Above 2 times average (4,6%) | Above 2 times average (3,6%) | Minimum income (4,6%) |
| Most common education level | Level 4 (31,2%) | Level 4 (28,8%) | Level 4 (27%) | Level 4 (26,4%) |
| Least common education level | Level 1 (2,2%) (no education) | Level 1 (1,9%) (no education) | Level 1 (1,9%) (no education) | Level 1 (1,8%) (no education) |
| Most common life stage | Empty nesters (33,4%) | Empty nesters (32,5%) | Empty nesters (29%) | Empty nesters (20,7%) |
| Least common life stage | Young families (3,3%) | Single parents, adult child(ren) (4,1%) | Single parents, adult child(ren) (4,2%) | Single parents, adult child(ren) (3,1%) |
| Purchase yes or no | 24% yes, 76% no | 54,2% yes, 45,8% no | 19,3% yes, 80,7% no | 20,8% yes, 79,2% no |
| Average share mobile device | 0,06188 | 0,03764 | 0,00718 | 0,9496 |
| Average share CICs | 0,9936 | 0,9700 | 0,9977 | 0,9986 |
| Average share neutral touch points | 0,215244 | 0,6800 | 0,8870 | 0,7755 |
| Average share competitor touch points | 0,7229 | 0,24244 | 0,10031 | 0,19116 |
| Average number of journeys | 2,5 | 5,8 | 2,0 | 3,5 |

Table 2: Statistical descriptions of the clusters


[...] and 3 the smallest group earns more than twice the average income, while in cluster 4 the smallest group earns the minimum income.

However, comparing the behavioural variables reveals some clear differences between the clusters. Cluster 1, the “competitor seekers”, uses many competitor touch points compared to the other clusters. The average share of competitor touch points in cluster 1 is 0,7229, while clusters 2, 3 and 4 have shares of only 0,24244, 0,10031 and 0,19116, respectively. Cluster 1 also has journeys that consist solely of competitor touch points, which does not occur in the other clusters. Cluster 2, the “purchase peeps”, is the only cluster in which a majority of the customers (54,2%) has actually made a purchase. Also, the average number of journeys per customer is much higher than in the other clusters: the customers in cluster 2 have on average 5,8 journeys, while the customers of the other clusters have on average only 2,5 journeys (cluster 1), 2 journeys (cluster 3) or 3,5 journeys (cluster 4). Cluster 3, the “explorers”, uses the most neutral touch points and the fewest competitor touch points compared to the other clusters. In cluster 4, the “modern mobile (wo)men”, customers use a mobile device for almost all touch points in the customer journey (average share of 0,9496), by which they distinguish themselves from the other clusters. In clusters 1, 2 and 3 a mobile device is hardly used in the journey, as shown by the extremely low average shares of mobile device of 0,06188, 0,03764 and 0,00718, respectively. All clusters have a remarkably high average share of CICs; however, this can be explained by the fact that the dataset contains 2.413.345 CICs (98,2%) against only 43.069 FICs (1,8%).

4.4 Logistic regression

Before running the logistic regression, the data of each cluster is split into a training set and a test set. The training set, which consists of 75% of the data, is used to train the model, and the remaining 25%, the test set, is used to test the model.
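A 75/25 split of this kind can be sketched as follows. This Python code is an illustration only, not the code used in this research. The function name and the seed are hypothetical, and it is assumed that the split is a simple random split of the journeys within a cluster.

```python
import random


def train_test_split(rows, train_share=0.75, seed=42):
    """Randomly split rows into a training set and a test set.

    The seed is a hypothetical choice that only makes the split reproducible.
    """
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_share)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical journey IDs of one cluster.
journeys = list(range(100))
train, test = train_test_split(journeys)
print(len(train), len(test))
```

Fitting the model on the training set and evaluating it on the held-out test set guards against judging the model on the same observations it was estimated on.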


[...] which indicates whether a certain touch point has occurred (1) in a journey or not (0). Once more, the variable frequency remains unchanged. Looking at the results of this model, the variable frequency again has a missing value in the output. Therefore, it has been decided to remove the variable frequency from the model, which results in values for all other variables in the model. However, due to the removal of the variable frequency, the model is no longer able to test the second hypothesis.

Furthermore, in cluster 3 perfect separation occurs, which means that the model can indicate with a hundred percent certainty whether an observation will be a 0 or a 1. Perfect separation might lead to inflated coefficients that are no longer reliable. A first look at the output and the validation of this model does not show any odd results; thus, with some caution, one can assume that the results of this model can be used in further analysis. A more comprehensive analysis indicates that the touch point comparison app causes the perfect separation. Running the model without this variable gives coefficients and significance levels similar to those of the model including it, except for the touch point accommodation search, which is no longer significant (p = 0,14965) when comparison app is removed. Moreover, the validation measures of both models are very comparable. Therefore, one can assume that the results of the first model are not affected by the perfect separation and, with some caution, proceed with these results.

Furthermore, a Firth logistic regression can be performed to address the problem of perfect separation. The results of the Firth logistic regression are again comparable with the original model for cluster 3. However, the touch points accommodation search and travel agent website focus brand are no longer significant (p = 0,18726 and p = 0,10094, respectively). The touch point flight ticket app does become significant at the 10% level in the Firth logistic regression (p = 0,05896), while it is not significant in the original model (p = 0,53631). All other coefficients and significance levels are similar to the original model. Nevertheless, continuing with the Firth logistic regression is not preferable, as the validation of this model cannot be determined. Since a majority of the coefficients are comparable with the results of the complete binary logistic regression of cluster 3, it can be concluded that the results of the original model can be used in further analysis. Hence, the results of the logistic regression of cluster 3 will be used to compare cluster 3 to the other clusters. However, these results should be interpreted with some caution, especially for the variable accommodation search, as this variable is significant neither in the model without comparison app nor in the Firth model.


All models have also been checked for multicollinearity by calculating the Variance Inflation Factor (VIF) scores, as multicollinearity makes the coefficients unreliable (Leeflang et al., 2015). A VIF score higher than five is often taken as a sign of multicollinearity. As shown in appendix 5, all VIF scores are below the threshold, and hence, multicollinearity is not a problem in any of the models.
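The idea behind the VIF can be illustrated with a deliberately simplified sketch. In general, the VIF of a variable equals 1 / (1 - R2), where R2 comes from regressing that variable on all other independent variables. The Python code below is an illustration only, not the code used in this research: it covers only the special case of two independent variables, in which R2 equals the squared Pearson correlation between them. The function names and the example values are hypothetical.

```python
import math


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equally long lists of numbers."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def vif_two_predictors(x1, x2):
    """VIF = 1 / (1 - R2); with only two predictors R2 is the squared correlation."""
    r = pearson_correlation(x1, x2)
    return 1.0 / (1.0 - r * r)


# Hypothetical frequencies of two touch points over five journeys.
touch_point_a = [1, 2, 3, 4, 5]
touch_point_b = [2, 1, 4, 3, 5]
vif = vif_two_predictors(touch_point_a, touch_point_b)
print(vif)
```

In this toy example the correlation is 0,8, which gives a VIF of roughly 2,8 and thus stays below the threshold of five mentioned above.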

4.4.1 Comparing the results between clusters

The results of the logistic regression of the four clusters show that cluster 1 has eight significant variables, cluster 2 has ten significant variables, cluster 3 has six significant variables and cluster 4 has only four significant variables.

| Touch point | Variable | Coefficient | P value | Significant in another cluster |
|---|---|---|---|---|
| | Intercept | -2.3641698 | < 2e-16 *** | Yes |
| 2 | Accommodation app | 0.1114907 | 0,07671 . | Yes |
| 4 | Comparison website | -0.0076997 | 0,00369 ** | No |
| 6 | Comparison search | 0.3958013 | 0,02890 * | No |
| 7 | Travel agent website competitor | 0.0022122 | 8,76e-11 *** | Yes |
| 9 | Travel agent search competitor | 0.1388050 | 0,01813 * | No |
| 12 | Flight tickets website | -0.0062403 | 0,05745 . | Yes |
| 18 | Email | 0.0857976 | 0,03100 * | Yes |
| 19 | Prerolls | -0.2559430 | 0,07019 . | No |

Table 3: Significant variables binary logit model cluster 1, the competitor seekers

Significance level: *** 0,1% level, ** 1% level, * 5% level, . 10% level

| Touch point | Variable | Coefficient | P value | Significant in another cluster |
|---|---|---|---|---|
| | Intercept | -2.2350256 | < 2e-16 *** | Yes |
| 1 | Accommodation website | 0.0117289 | < 2e-16 *** | Yes |
| 5 | Comparison app | 0.0216482 | 0,03412 * | Yes |
| 7 | Travel agent website competitor | 0.0042646 | 8.13e-11 *** | Yes |
| 8 | Travel agent app competitor | -0.0807410 | 0,04762 * | No |
| 10 | Travel agent website focus brand | 0.0021272 | 0,00145 ** | Yes |
| 11 | Travel agent search focus brand | 0.2231200 | 0,08475 . | No |
| 12 | Flight tickets website | 0.0039948 | 0,00533 ** | Yes |
| 17 | Banner | -0.0657368 | 0,04972 * | No |
| 18 | Email | 0.0249590 | 0,01640 * | Yes |
| 20 | Retargeting | 0.0037854 | 0,02215 * | Yes |


| Touch point | Variable | Coefficient | P value | Significant in another cluster |
|---|---|---|---|---|
| | Intercept | -2,1923260 | < 2e-16 *** | Yes |
| 1 | Accommodation website | 0,0007348 | 0,00784 ** | Yes |
| 2 | Accommodation app | 0,0163003 | 0,08546 . | Yes |
| 3 | Accommodation search | 0,0496996 | 0,09621 . | No |
| 7 | Travel agent website competitor | 0,0019848 | 0,09396 . | Yes |
| 10 | Travel agent website focus brand | 0,0054625 | 0,05806 . | Yes |
| 15 | Generic search | 0,0113317 | 0,03539 * | No |

Table 5: Significant variables binary logit model cluster 3, the explorers

Significance level: *** 0,1% level, ** 1% level, * 5% level, . 10% level

| Touch point | Variable | Coefficient | P value | Significant in another cluster |
|---|---|---|---|---|
| | Intercept | -2.551e+00 | < 2e-16 *** | Yes |
| 1 | Accommodation website | 3.872e-03 | 1,12e-07 *** | Yes |
| 5 | Comparison app | 1.447e-03 | 0,007477 ** | Yes |
| 18 | Email | -5.529e-01 | 0,003544 ** | Yes |
| 20 | Retargeting | 1.114e-01 | 0,000622 ** | Yes |

Table 6: Significant variables binary logit model cluster 4, the modern mobile (wo)men

Significance level: *** 0,1% level, ** 1% level, * 5% level, . 10% level

Analysing the (in)significant variables of the four clusters shows that the touch points flight ticket app, flight ticket search and affiliates do not have a significant effect in any of the clusters. Hence, it can be concluded that these touch points do not affect the probability of making a purchase. Moreover, the variables average duration, rainfall and cloudiness also do not show a significant result in any of the clusters, which means that duration and bad weather do not affect the probability of making a purchase in the travel industry. It can, therefore, also be concluded that hypotheses 3 and 4 are not supported by this research.


For the competitor seekers, the touch points comparison search and travel agent search competitor both have a significant and positive effect on the probability of purchase, while the touch points comparison website and prerolls both have a significant and negative effect on the probability of purchase. These touch points are not significant in any of the other clusters.

For the purchase peeps, three touch points are significant that are not significant in any of the other clusters. The touch point travel agent search focus brand positively affects the probability of purchase, while the touch points travel agent app competitor and banner negatively affect the probability of making a purchase.

For the explorers, the touch points accommodation search and generic search both have a positive and significant effect on the probability of purchase and these variables are not significant in another cluster.

The cluster of the modern mobile (wo)men only contains touch points that are also significant in other clusters.

4.4.2 Validation of the models

To validate the models, several tests will be run to compare the models to the naïve model of each cluster. First, the hit rate per model will be determined and compared with the hit rate of the naïve model. Furthermore, the TDL and the Gini coefficient will be used to compare the models of the clusters to a random model. Lastly, the Nagelkerke R2, the McFadden R2 and the Cox & Snell R2 will be calculated.

| | Cluster 1 | Cluster 2 | Cluster 3 | Cluster 4 |
|---|---|---|---|---|
| Hit rate | 88,7% | 85,3% | 91,2% | 93,3% |
| Hit rate naïve model | 79,9% | 74,9% | 83,9% | 87,5% |
| TDL | 3,67 | 3,45 | 3,23 | 3,38 |
| Gini coefficient | 0,60 | 0,66 | 0,52 | 0,36 |
| Nagelkerke R2 | 14,97% | 22,09% | 12,59% | 9,89% |
| McFadden R2 | 11,15% | 15,73% | 9,71% | 7,90% |
| Cox & Snell R2 | 7,60% | 12,93% | 5,74% | 3,96% |

Table 7: Validation of the models
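The hit rate, TDL and Gini coefficient reported in table 7 can be illustrated with a minimal Python sketch. It is an illustration only, not the code used in this research. The function names, the cut-off of 0,5 and the toy outcomes and probabilities are hypothetical. For the Gini coefficient, one common definition is assumed, namely 2 × AUC - 1, where AUC is the share of (purchase, no purchase) pairs in which the purchase has the higher predicted probability; the exact definition used in this research may differ.

```python
def hit_rate(y, p, cutoff=0.5):
    """Share of observations classified correctly at the given cut-off."""
    correct = sum((prob >= cutoff) == bool(actual) for actual, prob in zip(y, p))
    return correct / len(y)


def top_decile_lift(y, p):
    """Purchase rate in the 10% highest predicted probabilities / overall rate."""
    ranked = sorted(zip(p, y), reverse=True)
    top_n = max(1, len(ranked) // 10)
    top_rate = sum(actual for _, actual in ranked[:top_n]) / top_n
    overall_rate = sum(y) / len(y)
    return top_rate / overall_rate


def gini_coefficient(y, p):
    """Gini = 2 * AUC - 1 (assumed definition), with AUC from pairwise comparison."""
    positives = [prob for actual, prob in zip(y, p) if actual == 1]
    negatives = [prob for actual, prob in zip(y, p) if actual == 0]
    wins = sum(
        1.0 if pos > neg else 0.5 if pos == neg else 0.0
        for pos in positives
        for neg in negatives
    )
    auc = wins / (len(positives) * len(negatives))
    return 2 * auc - 1


# Hypothetical test set of 20 journeys: predicted probabilities and outcomes.
p = [0.95, 0.90, 0.85, 0.80, 0.75, 0.70, 0.65, 0.60, 0.55, 0.45,
     0.40, 0.35, 0.30, 0.25, 0.20, 0.15, 0.10, 0.08, 0.05, 0.02]
y = [1, 1, 0, 1, 0, 0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]

print(hit_rate(y, p), top_decile_lift(y, p), gini_coefficient(y, p))
```

A TDL above 1 means that the top 10 percent of predicted probabilities contains relatively more purchases than the dataset as a whole, while a Gini coefficient close to 0 corresponds to a random ranking.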


The model of cluster 4 has the highest hit rate, while it also has the lowest Gini coefficient. Cluster 3 also has a very high hit rate, but the lowest TDL. Cluster 1 and 2 have a slightly lower hit rate than cluster 3 and 4, however, cluster 1 has the highest TDL and cluster 2 has the highest Gini coefficient. Hence, cluster 4 is best in predicting the overall hit rate and cluster 1 is best in predicting the top responders.

Moreover, the Nagelkerke R2, the McFadden R2 and the Cox & Snell R2 are calculated and can be found in table 7 above. The R2s cannot be compared across the models, as each cluster uses a different dataset for the logistic regression. The values of the R2s are rather low and it would be worth re-estimating these models to make better predictions. However, since the model specification has to remain the same across all clusters, the re-estimation is not done in this research. Even though the values are not that high, the models can still be very useful and give a better prediction than using no model at all.
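The three pseudo R2 measures can be illustrated with a short Python sketch that uses their standard definitions in terms of the log-likelihood of the fitted model and of an intercept-only (null) model. The function names and the way the null model is built (a constant probability equal to the observed purchase rate) are assumptions made for this illustration. The sketch does not reproduce the estimation performed in this research.

```python
import math


def log_likelihood(y_true, p_hat, eps=1e-12):
    """Bernoulli log-likelihood of observed outcomes (0/1) given predicted
    probabilities. Probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, p_hat):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return total


def pseudo_r2(y_true, p_hat):
    """McFadden, Cox & Snell and Nagelkerke R2 for a binary outcome.
    Assumes both outcomes occur in y_true, otherwise the null
    log-likelihood is (close to) zero and the ratios are undefined."""
    n = len(y_true)
    base_rate = sum(y_true) / n
    ll_model = log_likelihood(y_true, p_hat)
    ll_null = log_likelihood(y_true, [base_rate] * n)

    mcfadden = 1 - ll_model / ll_null
    cox_snell = 1 - math.exp(2 * (ll_null - ll_model) / n)
    # Nagelkerke rescales Cox & Snell by its maximum attainable value.
    nagelkerke = cox_snell / (1 - math.exp(2 * ll_null / n))
    return {
        "mcfadden": mcfadden,
        "cox_snell": cox_snell,
        "nagelkerke": nagelkerke,
    }
```

Because the Nagelkerke R2 divides the Cox & Snell R2 by a value below one, it is always the larger of the two, which matches the ordering of the values in table 7.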

5. DISCUSSION

The aim of this research is to give an in-depth understanding of the influence of different aspects of online touch points on the probability of purchase. The purchase can be made either at the focus brand or at a competitor. The type of touch point, the frequency of touch points, the average duration spent on a touch point and the influence of weather have been identified as aspects that may influence the probability of purchase. For each aspect a hypothesis has been developed. The first hypothesis states that the type of touch point has an effect on purchase, with the CICs expected to have a stronger effect than the FICs. The second, third and fourth hypotheses state that the independent variables (frequency, duration and bad weather, respectively) have a positive effect on purchase. In this research, only the first hypothesis is partially supported. The third and fourth hypotheses are not supported, as the analysis showed insignificant results in all four clusters. Because the independent variable frequency was removed before the logistic regression, the second hypothesis could not be tested. This chapter first provides possible explanations of the results for the different hypotheses. Second, implications are given per cluster and lastly, the limitations of this research are discussed, resulting in implications for future research.

5.1 Type of touch point


points, such as word-of-mouth. It is expected that touch points affect the probability of purchase and this expectation is partially supported by this research. Most touch points indeed have an influence, either positive or negative, on purchase. However, this research shows that the touch points flight ticket app, flight ticket search and affiliates do not have a significant effect on purchase in any of the clusters. Why flight ticket app and flight ticket search are not significant is difficult to determine in this research. A possible explanation for the insignificance of affiliates is that customers do not appreciate being bothered with affiliates. Firms often push their message to the customer, even if they do not know whether it is wanted (Shankar & Malthouse, 2007). An overview of the influence of the touch points on the probability of purchase per cluster can be found in table 8 below.

| Touch point | Cluster 1: The competitor seekers | Cluster 2: The purchase peeps | Cluster 3: The explorers | Cluster 4: The modern mobile (wo)men |
|---|---|---|---|---|
| 1 Accommodation website | - | | | |
| 2 Accommodation app | √ | - | | - |
| 3 Accommodation search | - | - | √ | - |
| 4 Comparison website | X | - | - | - |
| 5 Comparison app | - | | - | |
| 6 Comparison search | √ | - | - | - |
| 7 Travel agent website competitor | | | | |
| 8 Travel agent app competitor | - | X | - | - |
| 9 Travel agent search competitor | √ | - | - | - |
| 10 Travel agent website focus brand | - | | - | |
| 11 Travel agent search focus brand | - | √ | - | - |
| 12 Flight ticket website | X | | - | - |
| 13 Flight ticket app | - | - | - | - |
| 14 Flight ticket search | - | - | - | - |
| 15 Generic search | - | - | √ | - |
| 16 Affiliates | - | - | - | - |
| 17 Banner | - | X | - | - |
| 18 Email | | | - | X |
| 19 Prerolls | X | - | - | - |
| 20 Retargeting | - | | - | |

Table 8: The influence of the touch points on purchase per cluster

‘√’ is a significant positive effect, ‘X’ is a significant negative effect, ‘-’ is no significant effect
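The coding used in table 8 can be expressed as a simple rule on the sign and the p-value of each logistic regression coefficient. The sketch below is illustrative only: the function name `significance_symbol` is hypothetical and the 5% significance level is an assumption, as the level applied in this research is not restated in this section.

```python
def significance_symbol(coefficient, p_value, alpha=0.05):
    """Map a logistic regression coefficient to the table 8 coding.
    alpha=0.05 is an assumed significance level."""
    if p_value >= alpha:
        return "-"  # no significant effect
    return "√" if coefficient > 0 else "X"  # significant positive / negative
```

For example, a coefficient of 0.8 with a p-value of 0.01 would be coded as a significant positive effect, while the same coefficient with a p-value of 0.30 would be coded as no significant effect.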

Previous research also argues that CICs are more effective than FICs (Shankar & Malthouse, 2007; Sarner & Herschel, 2008; Wiesel, Pauwels & Arts, 2011; Li & Kannan, 2014; De Haan, Wiesel & Pauwels, 2016), because FICs are usually imposed by firms even though customers are not always interested
