MAPPING THE USE OF MULTIPLE DEVICES IN THE CUSTOMER JOURNEY:

USING A LOGISTIC REGRESSION AND A LATENT CLASS ANALYSIS

Master thesis of:

E.E.M. Verlinden

Completion date:

Faculty of Economics and Business, Master of Science in Marketing

Abstract: Nowadays, an increasing number of people own multiple internet-connected devices such as a smartphone, tablet, and laptop, and an increasing number of people use these devices to book flight tickets or vacations online. This study examines differences in the use of multiple devices and the impact of these differences on online purchase probability. It maps the impact of visiting specific touchpoints with a mobile or a fixed device on the purchase probability for flight tickets and vacation bookings. Results show that using both device types has a positive effect on purchase probability. Furthermore, the more people switch between devices, the higher the purchase probability. When the device types are compared, visiting any given touchpoint with a fixed device leads to a higher purchase probability than visiting it with a mobile device. Also, using only a mobile device in the customer journey has a more negative effect on the purchase probability than using only a fixed device.

In addition to investigating these effects, a segmentation analysis is conducted to form segments based on demographic variables. This segmentation analysis is performed using a Latent Class Analysis. Three segments are identified: the smart and active switching device users, the lower educated fixed users, and the middle ones. Each segment shows distinct characteristics.

Keywords: customer journey, purchase probability, leisure travel, logistic regression, latent class analysis

PREFACE

My bachelor study of Business at the University of Applied Sciences Groningen sparked my interest in marketing. Since I was not done studying yet, I decided to continue studying marketing at the University of Groningen. During my pre-master year, my interest in marketing shifted toward its more statistical side. The Marketing Intelligence track of the Master in Marketing helped me further develop my skills and knowledge in these areas. The topic of this thesis gave me insights into both the customer journey and the field of data science. It also helped me further improve my skills in the programming language R. This thesis marks the end of my studies and my student life. After graduating, I hope to find a job as a junior marketing intelligence analyst, where I will have enough opportunity to further improve my data science skills and broaden my knowledge of marketing.

I am thankful for the opportunity to write my thesis on this topic. I first want to thank my supervisor, Peter van Eck, for his guidance in writing this thesis. I also want to thank my fellow students who worked with the same database, for their help when I was struggling. Lastly, I would like to thank my family, boyfriend, and friends for their support during my academic career.

TABLE OF CONTENTS

PREFACE

TABLE OF CONTENTS

INTRODUCTION

2. THEORETICAL FRAMEWORK

2.1 MULTIPLE DEVICES AND TOUCHPOINTS IN CUSTOMER JOURNEYS

2.1.1 Customer journey

2.1.2 Online purchase probability in flight/vacations as the dependent variable

2.1.3. Mobile and fixed devices in the customer journey

2.1.4 Switching devices in the customer journey

2.1.5. Touchpoints

2.1.5.1 Partner-owned touchpoints

2.1.5.2 Brand-owned touchpoints

2.1.5.3 Customer-owned touchpoints

2.3 SEGMENTING THE DEVICE USERS

2.4.1 Segmentation

2.4.1.1 Demographical variables as segmenting variables

2.2 CONCEPTUAL MODEL

3. METHODOLOGY

3.1 RESEARCH DESIGN

3.1.1 Data collection

3.2 LOGISTIC REGRESSION MODEL

3.2.1 Logit model

3.2.2 Variables in the logistic regression model

3.3 SEGMENTING CUSTOMERS – CLUSTERING METHODS

4. RESULTS

4.1 PRELIMINARY ANALYSIS

4.2 SAMPLE DESCRIPTION

4.3 LATENT CLASS ANALYSIS

5. CONCLUSION & RECOMMENDATION

6. REFERENCES

7. APPENDICES

APPENDIX A: DESCRIPTIVE STATISTICS

Appendix A1: Descriptive Demographic Dataset

Appendix A2: Descriptive dataset Journeys

APPENDIX B: RESULTS LOGISTIC MODEL

Appendix B1: Results logistic model including Fixed variable.

APPENDIX C: LATENT CLASS ANALYSIS

Appendix CA: AIC and BIC values plotted

INTRODUCTION

People nowadays are more connected to the internet than ever before. The number of internet users worldwide grew from about 1.1 billion in 2005 to about 4.1 billion in 2019 (Statista, 2020). The three most popular online activities are e-mail-related activities, searching for general information, and searching for driving directions (Infoplease, 2019). Over the last decade, the use of the internet for travel-related activities has increased. Today, 73 percent of people use the internet to search for travel information, and 64 percent make travel reservations online (Infoplease, 2019).

As more people use the internet for travel-related activities, the online travel industry is booming, and sales have increased every year to date. In 2018, worldwide digital travel sales were worth 564.87 billion US dollars, 15.4% more than the year before, and worldwide online hotel sales grew by 10.3% (Statista, 2018). This suggests that the internet is becoming an increasingly valuable channel for the travel industry.

Another remarkable development in the last decade is the spread of internet-connected devices. In 2019, around 5 billion people worldwide had a mobile phone, and more than half of them owned a smartphone with an internet connection. Furthermore, 80 percent of all households in America have at least one laptop or desktop, and 68 percent of households have a tablet (Pewresearch, 2019). An increasing number of people own more than one device, for example a smartphone, a laptop, and a tablet. In the UK, 30 percent of people even own more than five internet-connected devices. Figures are of a similar order in other EU countries. In Germany, 20 percent of respondents report owning more than five devices, and in the Netherlands only 3 percent of the population does not own a single device (Statista, 2020). People who own multiple connected devices also appear to use them when booking a trip or a flight. According to figures from Criteo published by PhocusWire (2019), 94 percent of online leisure travelers use multiple devices when booking or planning a trip. According to HEBS Digital client portfolio data, which covers hundreds of hotels and resorts of every size, 58 percent of web visitors and more than 52 percent of page views came from mobile devices in 2018.

The trends described above make it essential for marketers working in the (online) travel industry to understand the online behavior of their customers. These trends are that more people are connected to the internet, more people search online for information about their trip, more people book their holiday or flight online, and people use multiple devices when booking. In particular, the growing ownership of multiple devices and the ease of switching between them make it interesting to learn more about the effect of this behavior on purchase probability.

Nowadays there is an increasing number of online touchpoints involved in booking a flight ticket or vacation. Some examples of these online touchpoints are accommodation websites, tour operator websites, comparison websites, flight ticket websites, and competitors’ websites. These different touchpoints make mapping the online buying process complicated. Customer journeys nowadays can be extremely diverse and long (Hall, Towers, & Shaw, 2017). Since every customer goes through their own journey, journeys can differ greatly from one another, which makes managing the customer journey complicated for marketers. Different sources (Barnes et al., 2007; Rohm & Swaminathan, 2004) emphasize the importance of knowing your audience in order to design effective marketing strategies. However, it is impossible for marketers to develop a marketing strategy for every single individual. Segmentation is an effective marketing tool for solving this problem. A segmentation analysis divides a group of customers into more homogeneous groups, which allows marketers to distinguish target groups and develop a marketing strategy for each group (Malhotra, 2009). This research sheds more light on differences in the use of (multiple) devices in the customer journey based on demographic variables. It identifies how devices are used across the different touchpoints in the customer journey. It also makes visible how this behavior differs between customer segments formed on demographic variables.

Earlier research by Okazaki, Campo, and Andreu (2014) identified four segments based on the timing of mobile app use across a range of activities, including travel planning and the execution of a vacation. The following segments were identified: Savvies, Planners, Opportunists, and Low-techs. Each segment showed different patterns of mobile internet service usage. Planners are heavy mobile device users before their trip and less so after it. Opportunists show more intensive online behavior during their trip and less before it. Low-tech mobile device users are not very active on the mobile internet. Another study by MacKay and Vogt (2012) identified segments based on the frequency of internet use and IT equipment ownership. Eriksson (2014) identified five segments based on the activities performed on mobile devices: all-rounders, bookers, checkers, info-seekers, and non-users. Research on segments based on demographic variables is lacking in the literature, while the use of (multiple) devices in the customer journey keeps rising. Together, these make research on this topic crucial.

This research further investigates the relationship between purchase probability and the use of multiple devices in combination with visits to different touchpoints.

The main research question is: what is the effect of using multiple devices in the customer journey on online purchase probability in flight tickets/vacation bookings? The sub-questions are:

1) What is the effect of using either a fixed or a mobile device on the purchase probability?

2) What is the effect of using both devices on the purchase probability?

3) What are the differences in purchase probability when different touchpoints are accessed with a fixed or a mobile device?

4) What is the effect of switching devices on the purchase probability?

5) What segments can be made based on the use of devices?

2. THEORETICAL FRAMEWORK

2.1 Multiple devices and touchpoints in customer journeys

2.1.1 Customer journey

Nenonen, Rasila, Junnonen, and Kärnä (2008) define the customer journey as the cycle of relationship and buying interactions between a customer and an organization. The customer journey is a transition from never-a-customer to always-a-customer. Christopher, Payne, and Ballantyne (1991) describe the customer journey as a customer staircase or ladder. The essence of this journey is that the value of a customer changes over time. Customer journey maps include people’s mental models, possible touchpoints, and the flow of interactions.

In this study, the customer journey is conceptualized as the journey of a customer with the focal firm and/or its competitors over time across multiple touchpoints. These touchpoints can be either customer-initiated (visiting an accommodation website or comparison website) or firm-initiated (seeing a banner of the focal brand). Prior research by Howard and Sheth (1969) and more recent studies by Neslin (2006) and Puccinelli (2009) conceptualized the customer journey in three overall stages: pre-purchase, purchase, and post-purchase. The first stage, the pre-purchase stage, is characterized by the touchpoints visited before the purchase. These pre-purchase touchpoints can be brand-owned, partner-owned, customer-owned, or social/external. The second stage, the purchase stage, covers all customer interactions with the brand and its environment during the purchase itself. The third and last stage, the post-purchase stage, is characterized by service requests, usage and consumption, and post-purchase engagement (Lemon & Verhoef, 2016). Not every customer goes through all stages. A journey can end before the purchase stage, in which case the pre-purchase stage does not lead to a purchase at all.

2.1.2 Online purchase probability in flight/vacations as the dependent variable

An online purchase is the action of a consumer making a transaction online. The online purchase probability is the probability that a consumer makes such an online purchase. In this research, the online transaction is ‘booking a flight ticket or vacation’. McCole (2014) investigated the role of trust in electronic commerce in services. The findings of this research indicated that tourism transactions such as booking a flight or vacation are well suited to the internet. A study by Kim, Chung, and Lee (2011) came to the same conclusion. They investigated four product characteristics that make a product suitable to sell via the internet: 1) intangibility, 2) inseparability of production and consumption, 3) seasonality, and 4) perishability. These characteristics fit the travel industry well (Kim, Chung, & Lee, 2011).

2.1.3. Mobile and fixed devices in the customer journey

In this research, a distinction is made between mobile devices and fixed devices. Mobile devices are portable and can be used on the go. Examples of mobile devices are smartphones and tablets (Dummies, 2020). Fixed devices are bigger than mobile devices and therefore less easy to carry. Examples of fixed devices are laptops and desktops (Dummies, 2020).

Tablets are designed for consuming media, such as reading e-books, listening to music, or browsing the internet. Tasks such as word processing are not among the main functionalities of tablets and smartphones, and a laptop or desktop is better suited to them. Furthermore, a desktop or laptop is less portable and has a (separate) physical keyboard. Compared to tablets, desktops and laptops have more powerful operating systems, which gives them different functionalities (Dummies, 2020).

The smaller screens of mobile devices make it harder to search for alternatives. Using a mobile device could therefore shorten the evaluation process and increase the probability of buying the wrong product. Another important consequence of the smaller screen is that it is not ideal for the payment transaction. It is harder to fill in banking details on a small screen than on the bigger screen of a fixed device such as a desktop (Shankar et al., 2010). These characteristics make mobile devices less suitable for purchasing, which leads to the first hypothesis of this research:

H1: Using a mobile device only in the customer journey has a negative effect on online purchase probability in flight tickets/vacations bookings.

2.1.4 Switching devices in the customer journey

With the rise of mobile internet as the new standard and the increase in mobile devices (Maurer, Hausen, De Luca & Hussmann, 2010), website developers needed to adapt their websites to all kinds of internet-connected devices. Nowadays, website developers have many tools, such as responsive design, which allows them to build user interfaces that adapt to screen size and orientation. As a result, websites can be accessed from all kinds of devices. Consumers can start their customer journey on one device and continue it on any other device. They can even complete each stage of the customer journey on a different device if they want to. The situational characteristics of the user and the functional components of the devices are essential drivers of the device a consumer uses in each stage of the customer journey (De Haan et al., 2018). Simpleview (2016) investigated the use of devices among travelers and found that leisure travelers in particular use many different devices in their customer journey. For example, they use a mobile device to search for information and a fixed device to book their tickets or accommodation (Simpleview, 2016).

According to research by Xu et al. (2017) on the impact of tablets on e-commerce, cross-device browsing behavior has a significant positive impact on sales outcomes. They also quantified the causal effect of tablet adoption on an e-commerce market. To do so, Xu et al. (2017) analyzed real sales data from Alibaba (the largest e-commerce firm in the world) and exploited a natural experiment involving an iPad app. Results showed that adopting the iPad increased sales on Alibaba’s Taobao marketplace by 6.7% per user on average. Xu et al. (2017) are not the only ones to suggest a relationship between the use of multiple devices and sales. Research on the use of multiple devices in the customer journey (Google, 2012; Charlton, 2013) also suggests a relationship between switching devices and the purchase probability. According to Google (2012), most

device could lead to a higher purchase risk and higher security risk. These kinds of risk are often associated with high-involvement products (iResearch Services, 2020). De Haan and colleagues (2018) found that the risks associated with these products can be reduced by switching to a fixed device. In their sales data, the conversion rate was significantly higher for consumers who started on a more mobile device and switched to a less mobile device. They also found that the effect of switching is stronger when people perceive more risk and when prices are higher. De Haan and colleagues (2018) also emphasize the search behavior associated with switching devices: the more people switch, the more they search for information. Silayoi and Speece (2004) pointed out that the more people are involved in a product, the more they search for information about it. Highly involved people search considerably more than less involved people.

Booking a flight or vacation for holiday purposes is not something most people do in daily life, so their experience with the companies offering these products and services is limited. Booking a vacation is also a considerable expense, which makes it a high-involvement product (iResearch Services, 2020). These characteristics make switching between devices more likely in travel-related customer journeys than in customer journeys for other products and services. Switching between devices is associated with searching for information, and searching reflects involvement. Therefore, the more people switch between devices, the more involved they are expected to be, and the higher their purchase probability.

The theory mentioned above leads to the following hypothesis:

H2: Switching devices has a positive effect on online purchase probability in flight tickets/vacation bookings.
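H1 and H2 rest on two journey-level quantities: which device types a journey uses, and how often the customer switches between them. The sketch below shows one possible way to derive both from a single journey. It is a minimal illustration under stated assumptions, not the coding used for the thesis data. It assumes, hypothetically, that each journey is stored as a time-ordered list of touchpoint visits labelled "mobile" or "fixed". The function name `journey_device_features` is illustrative. A switch is counted whenever two consecutive visits use a different device type.

```python
from typing import Dict, Sequence, Union


def journey_device_features(devices: Sequence[str]) -> Dict[str, Union[str, int]]:
    """Summarise one journey from the device type ("mobile" or "fixed") of each
    touchpoint visit, in chronological order (hypothetical data layout)."""
    used = set(devices)
    if used == {"mobile"}:
        mix = "mobile only"
    elif used == {"fixed"}:
        mix = "fixed only"
    elif used == {"mobile", "fixed"}:
        mix = "both"
    else:
        # Empty journeys or unknown labels are rejected rather than guessed.
        raise ValueError(f"unexpected device labels: {sorted(used)}")
    # A switch is a pair of consecutive visits made on different device types.
    switches = sum(1 for prev, cur in zip(devices, devices[1:]) if prev != cur)
    return {"device_mix": mix, "switches": switches}


print(journey_device_features(["mobile", "mobile", "fixed", "mobile", "fixed"]))
# {'device_mix': 'both', 'switches': 3}
print(journey_device_features(["fixed", "fixed"]))
# {'device_mix': 'fixed only', 'switches': 0}
```

Under this sketch, the device-mix category corresponds to the contrast in H1, and the switch count is the quantity H2 is about. Other operationalisations are possible, such as counting only mobile-to-fixed switches.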

2.1.5. Touchpoints

Court, Elzinga, Mulder, and Vetvik (2009) define a touchpoint as an episode of direct or indirect contact with the focal brand. As mentioned above, there are different types of touchpoints in the customer journey. Baxendale, Macdonald, and Wilson (2015) distinguish three types of touchpoints: brand owner, retail, and third party. Brand owner touchpoints consist of brand advertising, and retail touchpoints include retailer advertising and in-store communications. Third-party touchpoints are word of mouth received, peer observation, and traditional earned media. Lemon and Verhoef (2016) identified four categories of customer touchpoints, namely brand-owned, partner-owned, customer-owned, and social/external/independent. Customers might interact with each of these in every stage of the customer journey. The importance of each touchpoint differs depending on the nature of the product or service, and attribution modeling identifies the most critical touchpoints in the customer journey. In this study, we examine the following types of touchpoints based on Lemon and Verhoef (2016):

2.1.5.1 Partner-owned touchpoints

Lemon and Verhoef (2016) define partner-owned touchpoints as interactions that are jointly designed, managed, or controlled by the firm and one or more of its partners. Partners can be marketing companies, distribution channels, loyalty program partners, or comparison websites. In some cases, the line between partner-owned and brand-owned touchpoints is blurred. For example, when a firm

brand-owned touchpoints (Lemon & Verhoef, 2016). Examples of partner-owned touchpoints in this research are comparison websites and tour operator websites.

Comparison websites

A comparison website compares different products, often based on prices and other characteristics. These websites enable consumers to check the prices of many firms, which is a great benefit for consumers. On the other hand, comparison websites are less beneficial for the firms in the market, since they increase competitive pricing pressure. The internet lowered search costs because consumers can easily search online for (better or cheaper) alternatives, which is more time-consuming without the internet. Comparison websites make searching for alternatives even easier and therefore decrease search costs even further (Ronayne, 2015). Comparison websites are very popular in many markets, including hotels, flights, and services all over the world (Ronayne, 2015). Because comparison sites decrease search costs, visiting a comparison site is expected to have a positive impact on a consumer’s purchase probability. As already mentioned, the smaller screen size of mobile devices makes them less ideal for payment transactions than fixed devices (Shankar et al., 2010). Visiting a comparison website with a mobile device would therefore have a negative impact on the purchase probability. Since fixed devices are more suitable for payment transactions, visiting a comparison website with a fixed device would have a positive effect on the purchase probability.

Based on the theory mentioned above, the following hypotheses are formulated:

H3A: Comparison websites visited by use of a mobile device have a negative effect on the purchase probability in flight tickets/vacation bookings.

H3B: Comparison websites visited by use of a fixed device have a positive effect on the purchase probability in flight tickets/vacation bookings.

Tour operator websites

A tour operator combines travel components, for example a hotel booking and a flight ticket, and offers these together in holiday packages. Tour operators use different marketing instruments to promote their products (Wikipedia, 2020).

The tour operator industry is growing, from 13.4 billion pounds in 2014 to 16 billion pounds in 2020, and it is expected to grow further to 17.4 billion pounds in the 2022-2023 period. Because the market is competitive, more tour operators follow a niche marketing strategy. A niche marketing strategy is a marketing approach that focuses on a unique target group. Its products and services have features that appeal to this particular minority subgroup of the market (BusinessDictionary, 2020). An example of a tour operator that focuses on a specific target group is the Dutch tour operator ‘Tui’, which mainly concentrates on families and adjusts its advertisements accordingly. Consumers who visit such tour operator websites are mostly part of the operator’s target group and therefore have a higher probability of purchasing. Combined with the earlier argument that fixed devices are better suited to payment transactions than mobile devices (Shankar et al., 2010), this leads to the following hypotheses:

H4A: Tour operator websites visited by use of a mobile device have a negative effect on the purchase probability in flight tickets/vacation bookings.

H4B: Tour operator websites visited by use of a fixed device have a positive effect on the purchase probability in flight tickets/vacation bookings.

2.1.5.2 Brand-owned touchpoints

Lemon and Verhoef (2016) define brand-owned touchpoints as interactions that are designed and managed by, or under the control of, the focal firm. These touchpoints include all brand-owned media (advertising, loyalty programs, websites) as well as the elements of the marketing mix that are controlled by the brand. Examples of such elements are packaging, service, price, product attributes, sales force, and convenience. Many studies (Berry, Seiders, & Grewal, 2002; Bitner, 1990; Oliver, 1993) have investigated the impact of perceptions of product and service attributes on satisfaction. More recent studies (Baxendale et al., 2015; Hanssens, 2015) have shown that advertising and promotion positively influence customers’ perceptions and preferences. Examples of brand-owned touchpoints in this research are accommodation websites and flight ticket websites.

Accommodation websites and flight ticket websites

People who do not want to book a package deal that includes flight tickets and accommodation book them separately. A growing number of people no longer book their vacation through a single travel agency or tour operator, but book tickets and accommodation separately themselves (Jacobsen & Munar, 2012). Keshavarzian and Wu (2016) also emphasize this trend and describe a two-stage decision-making process. In the first stage, people book their flight tickets and thereby choose their holiday destination. In the second stage, they book their accommodation.

One reason for purchasing accommodation via an accommodation website and a flight ticket via an airline website separately has to do with trust. Trust plays an essential role in online markets, because risk and uncertainty are more present in the online market than in the offline market (Lewis & Semijn, 1998). Different studies (Lien, Wen, Huang, & Wu, 2005; Everard & Galletta, 2003) argued that when trust in a website is high, the intention to purchase from that specific website is also high. An accommodation website or flight ticket website is managed and controlled by the accommodation provider or airline itself, so consumers perceive the information on these websites as very reliable. Consumer trust in an online purchase would therefore be high on an accommodation website or flight ticket website. Since trust is highly correlated with purchase intention, purchase intention is high on accommodation and flight ticket websites (Kim, Kim, & Park, 2017).

As mentioned earlier, a fixed device is more suitable for making a purchase than a mobile device. Therefore, based on the theory mentioned above, the following hypotheses are formulated:

H5A: Accommodation websites visited by use of a mobile device have a negative effect on the purchase probability in flight tickets/vacation bookings.

H5B: Accommodation websites visited by use of a fixed device have a positive effect on the purchase probability in flight tickets/vacation bookings.

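H6A: Flight ticket websites visited by use of a mobile device have a negative effect on the purchase probability in flight tickets/vacation bookings.

H6B: Flight ticket websites visited by use of a fixed device have a positive effect on the purchase probability in flight tickets/vacation bookings.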
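Hypotheses H3 to H6 treat each touchpoint twice: once for visits made on a mobile device and once for visits made on a fixed device. The sketch below shows how such touchpoint-by-device variables could be built from a journey. It is a minimal sketch under stated assumptions. It assumes, hypothetically, that a journey is a list of (touchpoint, device) pairs. The touchpoint labels in `TOUCHPOINTS` and the helper `touchpoint_device_counts` are illustrative names, not the variable names of the thesis dataset.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

# Illustrative labels for the partner- and brand-owned touchpoints discussed above.
TOUCHPOINTS = ("comparison", "tour_operator", "accommodation", "flight_ticket")
DEVICES = ("mobile", "fixed")


def touchpoint_device_counts(journey: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count visits per touchpoint-device combination.

    Combinations that do not occur get 0, so every journey yields the same set
    of columns. Visits to touchpoints outside TOUCHPOINTS are ignored.
    """
    observed = Counter(f"{tp}_{dev}" for tp, dev in journey)
    return {f"{tp}_{dev}": observed[f"{tp}_{dev}"] for tp in TOUCHPOINTS for dev in DEVICES}


journey = [("comparison", "mobile"), ("comparison", "fixed"), ("accommodation", "fixed")]
counts = touchpoint_device_counts(journey)
print(counts["comparison_mobile"], counts["comparison_fixed"], counts["flight_ticket_mobile"])
# 1 1 0
```

Each resulting count, or a 0/1 indicator derived from it, could then enter a logistic regression as a separate predictor. The coefficient on `comparison_mobile`, for example, could then be compared with the one on `comparison_fixed`.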

2.1.5.3 Customer-owned touchpoints

Customer-owned touchpoints are those that are not controlled or influenced by the firm or its partners. An example of a customer-owned touchpoint is customers sharing information on how to consume certain products. Customers can access a customer-owned touchpoint in the pre-purchase or post-purchase stage (Lemon & Verhoef, 2016). An essential driver of this kind of information sharing is social media, which gives customers the opportunity to share information about specific brands or services outside the firm’s direct control. As a result, measuring the attribution contribution of a customer-owned touchpoint in the customer journey becomes more complex.

Fast-developing technology has strengthened the social structure of the internet and made the customer journey more complex in nature (Lemon & Verhoef, 2016). Examples of customer-owned touchpoints in this research are accommodation search, information search, tour operator/travel agent search (competitor), tour operator/travel agent search (focal brand), flight ticket search, and generic search.

2.3 Segmenting the device users

2.4.1 Segmentation

Segmenting is dividing a heterogeneous sample into homogeneous groups based on a set of (active) variables (Malhotra, 2009). In other words, it forms groups that are internally as similar as possible but as different as possible from other groups. When setting up a marketing strategy, companies want to gain a competitive advantage over their competitors (Bharadwaj & Varadarajan, 1993; Day & Wensley, 1988). There are many ways to achieve this competitive advantage, and one crucial element of success is knowing who the customers are, including what they want and need. Segmenting is a research method to gain insight into customer wants and needs. Based on these wants and needs, homogeneous customer groups are created, which can easily be used to develop tailored marketing strategies. Segmentation is therefore a powerful tool for generating a competitive advantage. According to Hunt and Arnett (2004), three requirements need to be fulfilled for segments to create a competitive advantage. First, the segments need to be identified, which means there is a clear understanding of what the segments are. Second, the identified segments also need to be targeted. Third, effective marketing mixes need to be created to serve profitable segments.

According to Konuş, Verhoef, and Neslin (2008), customers in the segmenting process are grouped based on similar behavioral characteristics. Kumar (2018) pointed out that segmentation gives valuable insights for firms and customers, mainly because creating and investigating customer groups increases the effectiveness of customer relationship management.

Furthermore, customer segmentation leads to a more efficient and effective allocation of marketing mix resources. This is because the information about customers acquired from the segmentation leads to a better understanding of their behavior, wants, and needs. Firms can therefore better determine which marketing instruments are most effective across customer segments. By this, firms can better customize their

The variables used for segmenting can be divided into active and passive variables. Active variables are used to divide a heterogeneous group into smaller homogeneous groups, and passive variables are used to describe these smaller homogeneous groups (Malhotra, 2009). The first step in market segmentation is defining a measure to assess the similarity of customers based on their needs. After the appropriate method has been selected and executed, the results can be interpreted based on the passive variables.

2.4.1.1 Demographical variables as segmenting variables

The use of fixed and mobile devices is distributed differently across age classes. Based on figures from the CBS (Dutch Central Statistical Office) (2016), there is a difference in the use of smartphones, laptops, tablets, and personal computers between age classes. People between 12 and 65 years old use all devices almost equally intensively, but the use of the various devices decreases as people get older. People aged 75 and older use smartphones the least (CBS, 2016). Ratchford and colleagues (2003) found that older people are less likely to use the internet as a search medium, whereas younger and more educated people are more likely to use the internet for searching. In this group, the internet even substitutes for traditional search methods such as encyclopedias.

The Netherlands Institute for Social Research (2016) found that although ownership of smartphones and tablets is increasing, this increase is not the same for all ages and education levels. For example, almost all younger people (96%) in the age classes 13 to 19 and 20 to 34 own a smartphone, whereas 36% of people aged 65 or older own one. Among higher educated people, 86% own a smartphone, compared to 79% of middle-educated people and 57% of lower educated people. A possible explanation for this difference in the adoption of new technology is the Technology Acceptance Model (Davis, 1986). According to this model, two main factors determine whether someone adopts a new technology. The first factor is the degree to which a person believes the new technology will increase his or her (job) performance (perceived usefulness). The second factor is the degree to which a person believes using the new technology is free of effort (perceived ease of use); when a new technology is hard to use for someone, that person will not use it. Overall, people adopt a new technology when they believe it will benefit their performance and that it is easy to use. Because younger people grew up with new technology, they are more familiar with its benefits and usage. This explains why younger people adopt new technology more easily than older people.

Besides education level and age, gender also influences device use. Several studies (Oksman & Rautiainen, 2001; Wilska, 2003) have linked differences in device usage and online search behavior to gender. One difference is that men consider fewer search alternatives than women. Furthermore, women base their decisions more on different information sources and less on their personal opinion, whereas men trust their own opinion more than other sources of information. Since women search more for information before they make a purchase online (Paypal, 2016) and information search is mostly done on mobile devices (De Haar et al., 2018), it is expected that women use their mobile device more often than men.

The theory mentioned above leads to the following research questions: what segments can be


Table 1: Segmentation variables

Demographical variable | Literature

Age | Ratchford and colleagues (2003) found that older people are less likely to use the internet as a search medium; among younger and educated people, the internet even substitutes for traditional search methods such as encyclopedias. Since the mobile phone is mostly used as a search device (De Haar et al., 2018), older people are less likely to use a mobile device than a fixed device. Therefore, it is expected that older people are less likely to switch and more likely to use a fixed device than a mobile phone.

Gender | Women and men behave differently in terms of device use. Women search more for information before a purchase than men (Paypal, 2016). Since a mobile device is often used to search for information (De Haar et al., 2018), it is expected that women switch between devices more often than men.

Education level | Ratchford and colleagues (2003) found that educated people are more likely than less educated people to use the internet for searching. Since a mobile device is often used for searching (De Haar et al., 2018), it is expected that higher educated people switch between devices more often.

The active variables used for segmentation are summarized in Table 1 'Segmentation variables'. In addition to these variables, 'income' and 'size of the municipality' are included as segmentation variables in the analysis.

2.2 Conceptual model

Figure 1 presents the conceptual model of this study for hypotheses one to five.


3. METHODOLOGY

This section starts with a description of the data used in this research. Afterwards, the methods used for the analyses are described in detail.

3.1 Research Design

The goal of this descriptive research is to gain a deeper understanding of the effect of the use of multiple devices in the customer journey within the online travel industry. To answer the research question, this research consists of several quantitative analyses. The final results of this study can be used to describe the impact of the use of multiple devices on the purchase probability. Furthermore, the results can be used to describe consumers in the online travel industry based on their use of multiple devices.

3.1.1 Data collection

The dataset used in this research consists of panel data made available by GfK (Growth from Knowledge). GfK is the largest German research institute and one of the four largest market research organizations in the world (GfK, 2020). Panel data are any set of repeated observations over time for the same individuals. Examples of such individuals include, but are not limited to, households, firms, industries, regions, workers, or countries (Arellano, 2003). Together, these repeated observations of a specific category of individuals (for example, all households) form a panel (Arellano, 2003). In this research, the panel consists of individuals tracked over time through all the touchpoints in their customer journey. The observations in this research are mechanical, because mechanical (tracking) devices are used to collect them. The data collection took place from 1-6-2015 until 31-09-2016 in the Netherlands.

Furthermore, the data consist of two levels. The customer journeys are on touchpoint level. The demographic data are collected via the respondent's log-in plug-in and are therefore on user level. One UserID can have multiple customer journeys. A customer journey ends after four weeks without activity in that journey; when the customer becomes active again after such a four-week period, a new customer journey starts.
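The four-week rule for splitting touchpoints into journeys can be illustrated with a minimal Python sketch. It assumes each touchpoint is available as a (user_id, timestamp) pair and interprets "four weeks of no activity" as a gap of more than 28 days; the function name `assign_journeys` is hypothetical and this is not GfK's actual procedure.

```python
from datetime import datetime, timedelta

# Assumption: a new journey starts when the gap to the user's previous
# touchpoint is strictly longer than 28 days.
INACTIVITY_GAP = timedelta(days=28)

def assign_journeys(touchpoints):
    """Return (user_id, timestamp, journey_number) tuples.

    touchpoints: iterable of (user_id, datetime) pairs, in any order.
    Journey numbers start at 1 for every user and increase whenever the
    gap to that user's previous touchpoint exceeds INACTIVITY_GAP.
    """
    result = []
    last_seen = {}    # user_id -> timestamp of the previous touchpoint
    journey_no = {}   # user_id -> current journey number
    for user, ts in sorted(touchpoints):   # sorts by user, then by time
        prev = last_seen.get(user)
        if prev is None:
            journey_no[user] = 1
        elif ts - prev > INACTIVITY_GAP:
            journey_no[user] += 1
        last_seen[user] = ts
        result.append((user, ts, journey_no[user]))
    return result

if __name__ == "__main__":
    demo = [("u1", datetime(2015, 6, 1)), ("u1", datetime(2015, 6, 10)),
            ("u1", datetime(2015, 8, 1)), ("u2", datetime(2015, 7, 3))]
    for row in assign_journeys(demo):
        print(row)
```

In this sketch the third touchpoint of `u1` starts a second journey, because it follows the previous one by more than four weeks.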

3.2 Logistic regression model


considerations, the logit model is often preferred, as also stated by Leeflang et al. (2015). The selection of appropriate explanatory variables should be based mainly on theory (Pituch & Stevens, 2016). The second assumption is that the observations need to be independent, meaning that the measurement of each sample subject is not influenced by the measurement of other subjects in the data. This assumption cannot be met with certainty because of the panel data used in this research: in this dataset, one user can have multiple purchase journeys. Although the purchase journeys are measured separately, they can be made by the same user. As a result, the behavior of a person with several purchase journeys is over-represented and has a larger influence on the results. To make sure this does not influence the results too much, the number of purchase journeys per UserID is examined. The third assumption concerns the measurement of the explanatory variables: they need to be measured without measurement error. Since the dataset is provided by GfK, an experienced and knowledgeable data collection company, this assumption is assumed to be satisfied. The last assumption concerns the size of the dataset, which should be large enough for the model to produce valuable results. Leeflang et al. (2015) state that there should be at least five observations per parameter. Since the dataset is quite large (about 8000 observations), this assumption is met.
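The check on repeated journeys and the five-observations-per-parameter rule can be sketched in a few lines of Python. The sketch assumes journeys are available as (journey_id, user_id) pairs; all function names are hypothetical. It reports how many users have one, two, or more journeys and which share of all journeys comes from users with more than one journey, which indicates how strongly the independence assumption may be violated.

```python
from collections import Counter

def journeys_per_user(journeys):
    """Count journeys per UserID.

    journeys: iterable of (journey_id, user_id) pairs, one pair per journey.
    Returns (per_user, distribution): distribution maps a number of
    journeys to the number of users with exactly that many journeys.
    """
    per_user = Counter(user_id for _, user_id in journeys)
    distribution = Counter(per_user.values())
    return per_user, distribution

def repeat_user_share(journeys):
    """Share of all journeys made by users who have more than one journey."""
    per_user, _ = journeys_per_user(journeys)
    total = sum(per_user.values())
    repeated = sum(n for n in per_user.values() if n > 1)
    return repeated / total if total else 0.0

def enough_observations(n_obs, n_params, per_param=5):
    """Rule of thumb from the text: at least five observations per parameter."""
    return n_obs >= per_param * n_params
```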

3.2.1 Logit model

This logistic regression is used to model the dependent variable of interest, 'purchase probability'. Since the dependent variable, purchase, is binary, the predicted probabilities must lie between zero and one. To ensure this, the linear predictor is passed through the logistic (inverse log-odds) transformation. After estimating the model, equation (1) below is used to estimate the purchase probabilities (Allison, 2012).

$$\pi_i = \frac{1}{1 + \exp(-x_i'\beta)} \qquad (1)$$
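Equation (1) can be checked numerically with a short Python sketch: the logistic function maps any value of the linear predictor x_i'β to a probability between zero and one, and its inverse, the log-odds, recovers the linear predictor. The function names are illustrative.

```python
import math

def logistic(eta):
    """Equation (1): pi_i = 1 / (1 + exp(-eta)), with eta = x_i' beta."""
    return 1.0 / (1.0 + math.exp(-eta))

def log_odds(p):
    """Inverse of the logistic function: ln(p / (1 - p))."""
    return math.log(p / (1.0 - p))

for eta in (-5.0, -1.78, 0.0, 2.0):
    p = logistic(eta)
    print(f"eta={eta:+.2f}  pi={p:.4f}  log-odds={log_odds(p):+.2f}")
```

Whatever the value of the linear predictor, the resulting probability stays strictly between zero and one, which is why the logit link suits a 0/1 purchase variable.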

A logistic regression uses a latent variable (yi*) that links the dependent variable and the independent variables. In this model, the probability of a purchase depends on the value of the latent variable. The observed dependent variable (Yi) is related to the latent variable (yi*) as follows: Yi = 1 if yi* > 0 and Yi = 0 if yi* ≤ 0. The estimation model that predicts the probability of a purchase is shown in equation (2):

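$$P[Y_i = 1] = \Lambda(\alpha + x_i'\beta) = \frac{\exp(\alpha + x_i'\beta)}{1 + \exp(\alpha + x_i'\beta)} \qquad (2)$$

Where:

α = intercept
x_i = vector of explanatory variables for consumer i
β = vector of coefficients; the purchase probability of consumer i depends on the linear predictor x_i'β
Λ = logistic cumulative distribution function

Using this notation, the model specification for purchase probability is given in equation (3).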

$$\ln\frac{P[Y_i = 1]}{P[Y_i = 0]} = \alpha + \beta_{mc}X_{mc} + \beta_{fc}X_{fc} + \beta_{ma}X_{ma} + \beta_{fa}X_{fa} + \beta_{mf}X_{mf} + \beta_{ff}X_{ff} + \beta_{mt}X_{mt} + \beta_{ft}X_{ft} + \beta_{s}X_{s} + \beta_{f}X_{f} + \beta_{m}X_{m} \qquad (3)$$

Where:

ln(P[Y_i = 1] / P[Y_i = 0]) = natural logarithm of the probability to purchase divided by the probability not to purchase (the log-odds of a purchase)
α = intercept
β_mc = coefficient for mobile device + comparison website & app
β_fc = coefficient for fixed device + comparison website
β_ma = coefficient for mobile device + accommodation website & app
β_fa = coefficient for fixed device + accommodation website
β_mf = coefficient for mobile device + flight tickets website & app
β_ff = coefficient for fixed device + flight tickets website
β_mt = coefficient for mobile device + tour operator website & app
β_ft = coefficient for fixed device + tour operator website
β_s = coefficient for switching devices
β_f = coefficient for using a fixed device only
β_m = coefficient for using a mobile device only
X = the variable belonging to each coefficient

3.2.2 Variables in the logistic regression model

In order to perform the logistic regression, the variables used for the analysis first needed to be created. This paragraph describes in detail how these variables are constructed.

Purchase probability as dependent variable

Purchase probability is created as a dummy variable with a 0/1 value, where 1 indicates that a purchase is made in a purchase journey and 0 indicates that no purchase is made. Because this research does not distinguish whether a purchase was made at the tour operator of interest, the purchase dummy is made by counting all purchases, that is, purchases at any point as well as purchases at the tour operator of interest. After aggregating the dataset on journey level, the purchase count is turned back into a 0/1 variable.
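A minimal sketch of how touchpoint-level purchase flags could be collapsed into the journey-level 0/1 dependent variable described above: all purchases in a journey (at the tour operator of interest or elsewhere) are summed, and the sum is turned back into a dummy. The input layout (journey_id, purchase_flag) is an assumption.

```python
from collections import defaultdict

def journey_purchase_dummy(touchpoints):
    """Collapse touchpoint purchase flags into one 0/1 value per journey.

    touchpoints: iterable of (journey_id, purchase_flag), flag 0 or 1.
    """
    purchase_count = defaultdict(int)
    for journey_id, flag in touchpoints:
        purchase_count[journey_id] += flag   # count all purchases in the journey
    # Any purchase at all -> 1, otherwise 0
    return {j: int(n > 0) for j, n in purchase_count.items()}
```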

Effect of using mobile or fixed devices only in the customer journey

The effect of using mobile devices only in the customer journey is investigated by means of dummy variables. Because using a mobile device only is one of three mutually exclusive behaviors, together with using a fixed device only and using both devices, three dummy variables are created. When a customer used only a mobile device, the dummy 'MobileOnly' takes the value 1; when someone used only a fixed device, the dummy 'FixedOnly' takes the value 1; and when someone used both a fixed and a mobile device in the customer journey, the dummy 'BothDevices' takes the value 1. Since these three behaviors are the only ones possible in this research, one behavior serves as the reference level and two dummies are included in the model.
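The three device-usage categories can be sketched as follows. The sketch assumes each journey is summarized by the set of device types it contains ('fixed' and/or 'mobile'). Consistent with Table 6, which reports coefficients for MobileOnly and FixedOnly, BothDevices is treated as the omitted reference level. The function name is hypothetical.

```python
def device_dummies(devices_in_journey):
    """Return the MobileOnly and FixedOnly dummies for one journey.

    devices_in_journey: set of device types observed in the journey,
    e.g. {"mobile"}, {"fixed"} or {"mobile", "fixed"}.
    BothDevices is the reference category, so it gets no dummy here.
    """
    has_mobile = "mobile" in devices_in_journey
    has_fixed = "fixed" in devices_in_journey
    if not (has_mobile or has_fixed):
        raise ValueError("journey contains no known device type")
    return {
        "MobileOnly": int(has_mobile and not has_fixed),
        "FixedOnly": int(has_fixed and not has_mobile),
    }
```

A journey that uses both devices gets zeros on both dummies, so the two coefficients are interpreted relative to such journeys.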

Type of device used and type of touchpoint accessed

Besides the mobile only, fixed only, and both devices variables, variables for the use of a mobile or fixed device are created on touchpoint level. These variables are combined with the specific touchpoints in the purchase journey. For example, when a customer visited a comparison website on a mobile device, the variable 'MobileComparison' takes the value 1. When creating a dummy variable for visiting a tour operator website on a fixed device, several touchpoints are combined. For example, the touchpoints 'touroperator website


technically not possible to visit an app on a personal computer, the app touchpoints are left out when creating the fixed variables. After aggregating the dataset on journey level, the dummies become count variables; that is, each variable indicates how often a specific touchpoint is visited in a specific purchase journey.
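The device-by-touchpoint count variables can be sketched as follows. The category labels mirror the variable names in Table 6, and app touchpoints are skipped for fixed devices as described above; the input layout and the category list are hypothetical simplifications of the thesis's touchpoint grouping.

```python
from collections import Counter, defaultdict

# Hypothetical touchpoint categories; the thesis combines several raw
# touchpoints (e.g. website and app) into each category.
CATEGORIES = ("Comparison", "Accomodation", "FlightTickets", "Touroperator")

def touchpoint_counts(touchpoints):
    """Count device-by-touchpoint visits per journey.

    touchpoints: iterable of (journey_id, device, category, is_app),
    where device is "mobile" or "fixed". Returns {journey_id: Counter}
    with keys such as "MoComparison" or "FixTouroperator".
    """
    counts = defaultdict(Counter)
    for journey_id, device, category, is_app in touchpoints:
        if category not in CATEGORIES:
            continue
        if device == "fixed" and is_app:
            continue  # apps cannot be visited on a fixed device; left out
        prefix = "Mo" if device == "mobile" else "Fix"
        counts[journey_id][prefix + category] += 1
    return counts
```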

Switching and purchase probability

To construct a switch variable, the column device type is first separated into two columns, 'fixed' and 'mobile'. Next, a lagged device variable (device_lag) is used to create the switch variable, which indicates whether there was a switch between a fixed and a mobile device. The new switch variable contained some NAs. These occur because in some situations it is unknown which type of device a user used first, or because the user simply did not use another device before, as this is their first touchpoint. Since the first touchpoint is not considered a switch, these NAs are imputed as 0.
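A minimal sketch of the lag-based switch variable: touchpoints are ordered within a journey, each device is compared with the previous (lagged) device, and a change between 'fixed' and 'mobile' counts as one switch. Where the lag is missing (the first touchpoint, or an unknown device represented here as `None`), the flag is set to 0, mirroring the NA imputation described above. The thesis does this in R; this Python version and its names are illustrative.

```python
def switch_flags(devices):
    """One 0/1 switch flag per touchpoint, in chronological order.

    devices: device types of one journey ("fixed", "mobile" or None for
    unknown). A missing lag (NA in R) is imputed as 0.
    """
    flags = []
    device_lag = None
    for device in devices:
        if device is None or device_lag is None:
            flags.append(0)              # no comparable previous device
        else:
            flags.append(int(device != device_lag))
        device_lag = device
    return flags

def switch_count(devices):
    """Journey-level Switch variable: number of device switches."""
    return sum(switch_flags(devices))
```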

The logistic regression is conducted in the statistical program R.

3.3 Segmenting customers – Clustering Methods

Segments are created based on demographic variables of the customers, such as age, gender, income, region, household size, kind of work, gross income, number of children in the household, social class, education level, and life stage. These are the active variables, also called the segmenting variables (Malhotra, 2009). The passive variables, which are used to describe the segments, are the number of times a person switched between devices, purchase probability, the use of mobile or fixed devices, and the combination of device use (either fixed or mobile) and the different touchpoints visited. The method used for this analysis is latent class analysis (LCA).

The latent class analysis model is written in equation (4) (Vermunt & Magidson, 2002):

$$f(y_i \mid \theta) = \sum_{k=1}^{K} \pi_k \, f_k(y_i \mid \theta_k) \qquad (4)$$

Where:

y_i = customer i's scores on the set of segmentation variables
K = number of segments
π_k = probability of belonging to latent segment k
θ_k = parameters of segment k (θ denotes all model parameters)

The classification in a latent class analysis uses posterior class membership probabilities: each customer i is assigned to the segment k with the highest posterior probability. Equation (5) is used for classification:

$$\pi_{k \mid y_i} = \frac{\pi_k \, f_k(y_i \mid \theta_k)}{\sum_{k'=1}^{K} \pi_{k'} \, f_{k'}(y_i \mid \theta_{k'})} \qquad (5)$$
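Equations (4) and (5) can be illustrated with a small Python sketch for a single customer. It assumes the segment probabilities π_k and the segment-specific likelihoods f_k(y_i | θ_k) have already been estimated; the numbers in the usage line are hypothetical. The sketch applies Bayes' rule to obtain the posterior membership probabilities and assigns the customer to the segment with the highest posterior (modal assignment).

```python
def posterior_membership(priors, likelihoods):
    """Equation (5): pi_{k|y_i} = pi_k f_k(y_i) / sum_k' pi_k' f_k'(y_i).

    priors: segment probabilities pi_k (summing to 1).
    likelihoods: f_k(y_i | theta_k) for one customer, one value per segment.
    """
    joint = [p * f for p, f in zip(priors, likelihoods)]
    mixture_density = sum(joint)        # equation (4) for this customer
    return [j / mixture_density for j in joint]

def modal_assignment(posteriors):
    """Index (0-based) of the segment with the highest posterior probability."""
    return max(range(len(posteriors)), key=posteriors.__getitem__)

# Hypothetical values for a three-segment model
post = posterior_membership([0.5, 0.3, 0.2], [0.01, 0.04, 0.02])
print([round(p, 3) for p in post], modal_assignment(post))
```

Note that the customer is assigned to the second segment even though the first segment has the largest prior, because the likelihood of the observed scores is much higher there.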

The latent class analysis provides multiple segment solutions. To find the right solution, the different solutions are compared and evaluated against each other.


information criteria is only possible within the same type of criterion, not across types. The criteria penalize the number of parameters and (in the case of the BIC) the sample size in different ways; therefore, using multiple information criteria can be beneficial.


4. RESULTS

This section presents the results of the statistical analyses. First, the preliminary analysis of the data is presented, containing the first checks on the data. Then, descriptive statistics of the data are given, both on touchpoint level and on demographic level. Second, the output of the logistic regression is presented. The section ends with the outcome of the latent class analysis used to segment the customers based on demographic variables.

4.1 Preliminary Analysis

The data used in this research consist of two datasets: one dataset of customer journey data on touchpoint level, and one dataset of demographic data on UserID level.

Before conducting the analyses, it is important to check the completeness of the data. Both datasets were checked separately for inconsistencies and missing values. A first check on the journey data did not show many inconsistencies or errors. This can be explained by the fact that the dataset consists of actual tracked behavior, in which missing values are less likely to be present. Although a first glance did not reveal many inconsistencies, some inconsistencies and missing values were found in variables that were considered problematic. This paragraph discusses these variables and explains how the inconsistencies or oddities were solved.

First, the datasets are checked for missing values. A first check on the customer journey data shows some not-available values (NAs) in the variable duration. In total, the duration was not registered for 141065 touchpoints; it turned out that most NAs are FIC, for which registration is technically not possible. The missing values in this duration variable are not problematic because this variable is not used in any analysis. Therefore, the NAs are neither imputed nor deleted.

A first check on the demographic dataset shows missing values for the same 12 users. This indicates that these users did not fill in their demographic information, either because they did not fill in this information in their account or because they did not create an account. Because the demographic data are not used for the logistic regression, and the accompanying UserID provides valuable information for the cluster analysis, the records with missing values were not deleted.

Second, it is checked whether each PurchaseID has exactly one UserID, as one PurchaseID is expected to be connected to only one UserID. Given that the latent class analysis is performed on user level, it is highly important to make sure that there are no oddities or inaccuracies in the connection between PurchaseID and UserID. However, in some cases one PurchaseID turned out to be connected to multiple UserIDs. Table 2 'Number of people with more than one purchase' shows the distribution of these cases.

Table 2: Number of people with more than one purchase

1 2 3 4 5 6 7 8 9 10 11

0 2870 1914 1443 1209 929 648 299 81 22 3 2

1 2047 569 205 77 34 7 4 2 1 0 0


connected to only one UserID, which was not the case. This problem was fixed by first aggregating the data on customer journey level by PurchaseID and then matching the travel UserIDs to the aggregated dataset on PurchaseID.
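The PurchaseID/UserID consistency check described above can be sketched as follows, assuming the journey data are available as (PurchaseID, UserID) rows at touchpoint level. The function collects the distinct UserIDs per PurchaseID and returns only the PurchaseIDs that violate the expected one-to-one link; its name is hypothetical.

```python
from collections import defaultdict

def purchase_ids_with_multiple_users(rows):
    """Return {purchase_id: sorted user_ids} for PurchaseIDs linked to >1 UserID.

    rows: iterable of (purchase_id, user_id) pairs at touchpoint level.
    """
    users = defaultdict(set)
    for purchase_id, user_id in rows:
        users[purchase_id].add(user_id)
    return {pid: sorted(uids) for pid, uids in users.items() if len(uids) > 1}
```

An empty result would confirm the one-to-one link; a non-empty result lists the cases that need the aggregate-then-match fix.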

Lastly, it is assumed that the data are not normally distributed. Since the data consist of real tracked behavior, most outliers present in the data are expected.

4.2 Sample Description

To gain insight into the sample, a demographic profile is created based on the demographic variables in the dataset. The demographic profile is shown in Table 3 'Population demographics' below. The panel sample is dominated by women (60,11%), and the largest group (40,2%) lives in two-person households.

Furthermore, the largest share of panelists comes from the West of the Netherlands (29,04%). The largest life stage group is the empty nesters (29,70%), that is, people without children living at home. The graphs of the demographic dataset can be found in Appendix A1 'Descriptive Dataset Demographics'.

Table 3: Population demographics

Gender | %
Male | 39,89
Female | 60,11

Number of household members | %
1 | 21,1
2 | 40,2
3 | 14,5
4 | 17,1
5 | 5,3
>6 | 1,8

Life stage | %
1: Young Singles | 6,70
2: Mature Singles | 14,40
3: Young Couples | 7,90
4: Empty Nesters | 29,70
5: Young Families | 5,80
6: Mature Families | 14,10
7: Established Families | 11,90
8: Single Parents, child(ren) | 5,40
9: Single Parents, adult child(ren) | 4,20
97: unknown | 0,01

Region of residence | %
Amsterdam, Rotterdam, and The Hague | 11,36
West | 29,04
North | 11,84
East | 21,61
South | 26,15

After creating the variables needed for further analysis of the travel dataset, boxplots of these variables are made to get a first impression of the data. A boxplot of the behavioral variable switch showed many outliers in switching behavior between devices. The use of a mobile device in the customer journey also shows many observations outside the boxplot range. Since the data are based on real consumer behavior, outliers in these variables are expected; these results are therefore realistic and not considered an oddity. The boxplots can be found in Appendix A2 'Descriptive Dataset Journeys'.


dataset, with only 2087 of the 25204 customer journeys. The numbers of journeys per category are presented in Table 4 'Distribution of customer journeys based on device used' below.

Table 4: Distribution of customer journeys based on device used

Mobile Only Fixed Only Both devices

Number of journeys 4124 18993 2087

In percentage (%) 16.4% 75.4% 8.2%

4.3 Logistic Regression Analysis

A logistic regression is performed to explain purchase probability (binary: 0/1) by the independent variables selected based on the theoretical framework of this thesis. After creating the variables as described in the methodology section, the logistic regression is performed following the steps described below.

First, the variables needed for the logistic analysis are put into a new data frame. Next, a training subset and a test subset are created to test the accuracy of the predictions obtained from the model. The dataset is divided into 75% training data and 25% test data; since the dataset is large, a test set of 25% is suitable (James, Witten, Hastie, & Tibshirani, 2017). Next, the KDiff value is set to 20 instead of 50. The reason for this decrease is that the chance that a variable leads to a purchase probability of 1 is extremely small in this model: first, because visiting a specific touchpoint already has a hypothetically small probability of raising the purchase probability to 1, and second, because this specific touchpoint is combined with the use of a mobile or fixed device, which makes the chance even lower. Decreasing the KDiff value therefore leads to a more reliable model output. Before examining the estimated coefficients of the model, multicollinearity is checked with a VIF test. The risk of multicollinearity in this model is high because all variables in the model are dummy variables. Dummy variables increase the chance of finding equal patterns in the data (Leeflang, Wieringa, Pauwels, & Bijmolt, 2015) and thus increase multicollinearity, which makes investigating multicollinearity particularly important. The VIF test showed no multicollinearity, since all VIF values lie between 1 and 3 (Leeflang et al., 2015). Therefore, multicollinearity between the variables in the model does not need to be investigated further.
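The VIF check can be sketched in plain Python: each predictor is regressed on all other predictors (with an intercept), and VIF_j = 1 / (1 - R_j²). This follows the standard definition of the VIF; it is not the R code used in the thesis, and the small linear solver is included only to keep the example dependency-free.

```python
def _solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular matrix (perfect multicollinearity)")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= factor * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def r_squared(y, xs):
    """R^2 of an OLS regression of y on the columns in xs plus an intercept."""
    n = len(y)
    design = [[1.0] + [x[i] for x in xs] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(design[i][a] * design[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = _solve(xtx, xty)               # normal equations X'X b = X'y
    fitted = [sum(b * d for b, d in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot

def vif(columns):
    """VIF_j = 1 / (1 - R_j^2) for each column in a dict name -> values."""
    names = list(columns)
    return {name: 1.0 / (1.0 - r_squared(columns[name],
                                         [columns[o] for o in names if o != name]))
            for name in names}
```

Uncorrelated predictors give VIF values close to 1, while strongly related predictors push the VIF far above the range of 1 to 3 reported in the thesis.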

After checking the VIF values, a stepwise AIC procedure is applied to the model to obtain the most parsimonious model. The procedure started with an AIC value of 19142.33, and the AIC improved when variables were left out of the model. However, since the purpose of this study is to test the hypotheses in order to answer the research question, all variables are kept in the model. To investigate whether this model performs better than a random classification, its AIC value is compared to the AIC value of a null model. The AIC of the null model is 20574, which is higher than the AIC of the logit model; the null model therefore does not outperform the logit model.


2.2e-16 < 0.05) with a log-likelihood value of -7221.6.1 compared to -7735.5. Finally, the hit rate of the logit model is calculated as follows:

(21411 + 222) / (21411 + 3364 + 207 + 222) × 100 = 85.8%

Where:

Correctly classified as purchase: 222
Correctly classified as no purchase: 21411
Wrongly classified as purchase: 3364
Wrongly classified as no purchase: 207
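The hit rate is the share of correctly classified journeys in the confusion matrix. A minimal sketch that computes it from the four counts listed above:

```python
def hit_rate(correct_purchase, correct_no_purchase, wrong_purchase, wrong_no_purchase):
    """Percentage of correctly classified observations."""
    correct = correct_purchase + correct_no_purchase
    total = correct + wrong_purchase + wrong_no_purchase
    return 100.0 * correct / total

rate = hit_rate(correct_purchase=222, correct_no_purchase=21411,
                wrong_purchase=3364, wrong_no_purchase=207)
print(f"hit rate: {rate:.1f}%")
```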

Table 5: Pseudo R²

Cox & Snell | Nagelkerke | McFadden
0.05730294 | 0.10255443 | 0.07212579
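The three pseudo R² measures in Table 5 are functions of the log-likelihood of the fitted model, the log-likelihood of the null model, and the number of observations n. A sketch of the standard formulas follows; the inputs in the usage line are placeholders, not the thesis estimates.

```python
import math

def pseudo_r2(ll_model, ll_null, n):
    """Return McFadden, Cox & Snell and Nagelkerke pseudo R^2."""
    mcfadden = 1.0 - ll_model / ll_null
    cox_snell = 1.0 - math.exp(2.0 * (ll_null - ll_model) / n)
    max_cox_snell = 1.0 - math.exp(2.0 * ll_null / n)  # upper bound of Cox & Snell
    nagelkerke = cox_snell / max_cox_snell             # rescaled to a 0-1 range
    return {"McFadden": mcfadden, "CoxSnell": cox_snell, "Nagelkerke": nagelkerke}

# Placeholder values for illustration only
print(pseudo_r2(ll_model=-450.0, ll_null=-500.0, n=1000))
```

Because Nagelkerke divides Cox & Snell by its maximum attainable value, it is always at least as large as Cox & Snell, which matches the ordering in Table 5.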

After validating the model with the methods discussed above, the interpretability of the model is examined. This step includes looking at the signs of the coefficients and interpreting the marginal effects and odds ratios of the logit model. The figures can be found in Table 6 'Results of the logit model' below.

Table 6: Results of the logit model

Variable | Estimate | VIF | Marginal effect | Odds ratio
Intercept | -1.780e+00 (***) | - | - | 0.1686372
MobileOnly | -1.013e+00 (***) | 1.854218 | -9.0030e-02 (***) | 0.3629482
FixedOnly | -2.159e-01 (**) | 2.119751 | -2.5499e-02 (**) | 0.8058349
Switch | 9.535e-03 (*) | 1.487859 | 1.0814e-03 (*) | 1.0095808
MoTouroperator | 1.116e-03 (**) | 1.216261 | 1.2653e-04 (**) | 1.0011163
MoFlightTickets | -1.352e-03 | 1.228503 | -1.5332e-04 | 0.9986490
MoAccomodation | 1.064e-03 (**) | 1.301417 | 1.2062e-04 (**) | 1.0010642
MoComparison | 5.486e-04 | 1.155813 | 6.2215e-05 | 1.0005487
FixComparison | 1.237e-04 | 1.140101 | 1.4025e-05 | 1.0001237
FixAccomodation | 4.677e-03 (***) | 1.120617 | 5.3036e-04 (***) | 1.0046875
FixFlightTickets | -1.227e-05 | 1.000607 | -1.3917e-06 | 0.9999877
FixTouroperator | 1.620e-03 (***) | 1.144012 | 1.8378e-04 (***) | 1.0016218

A first look at the estimates already gives valuable information: some signs are expected and some are unexpected based on the hypotheses. The variable 'MobileOnly' has a negative sign, meaning that using only a mobile device in the customer journey has a negative effect on purchase probability. In terms of the odds ratio, the odds of a purchase in a mobile-only journey are 0.36 times the odds in the reference category (journeys using both devices), that is, about 64% lower; the marginal effect of -0.090 corresponds to a purchase probability that is about 9 percentage points lower. Therefore, hypothesis 1 'Using a mobile device only in the customer journey has a negative effect on online purchase probability in flight tickets/vacation bookings' is supported.
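The reading of a logit coefficient can be checked with a short sketch: exponentiating an estimate gives the odds ratio, and combining intercept and coefficient gives predicted probabilities. The example plugs in the Table 6 estimates for the intercept and MobileOnly and, as a simplifying assumption, sets all count variables to zero, so it compares a mobile-only journey with the reference category (both devices). This is a discrete-change illustration, not the average marginal effect reported in Table 6.

```python
import math

def probability(eta):
    """Logistic function: probability implied by the linear predictor eta."""
    return 1.0 / (1.0 + math.exp(-eta))

intercept = -1.780       # Table 6
b_mobile_only = -1.013   # Table 6

odds_ratio = math.exp(b_mobile_only)
p_reference = probability(intercept)                   # both devices, all counts 0
p_mobile_only = probability(intercept + b_mobile_only)

print(f"odds ratio: {odds_ratio:.4f}")
print(f"P(purchase), both devices: {p_reference:.3f}")
print(f"P(purchase), mobile only:  {p_mobile_only:.3f}")
print(f"difference: {100 * (p_mobile_only - p_reference):.1f} percentage points")
```

An odds ratio below 1 multiplies the odds; it is not a percentage change in the probability itself, which is why the probability difference is computed separately.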


The variable 'Switch' shows a significant positive effect on purchase probability, meaning that an increase in the number of times a person switches between devices leads to an increase in purchase probability. In terms of the odds ratio, each additional switch between devices multiplies the odds of a purchase by 1.0096 (about 1% higher odds), with a marginal effect of 0.0011, or about 0.11 percentage points. Therefore, hypothesis 2 'Switching devices have a positive effect on online purchase probability in flights tickets/vacations bookings' is supported.

The variable 'MoComparison' did not turn out to be significant; therefore, its coefficient, marginal effect, and odds ratio cannot be interpreted. Hypothesis 3A 'Comparison websites visited on a mobile device has a negative effect on the purchase probability in flight tickets/vacation bookings' can be neither supported nor rejected based on this model.

The variable 'FixComparison' did not turn out to be significant; therefore, its coefficient, marginal effect, and odds ratio cannot be interpreted. Hypothesis 3B 'Comparison websites visited on a fixed device has a positive effect on the purchase probability in flight tickets/vacation bookings' can be neither supported nor rejected based on this model.

The variable 'MoTouroperator' shows a significant positive effect on purchase probability, meaning that the more often a person visits a tour operator website on a mobile device, the higher the purchase probability. In terms of the odds ratio, each additional visit to a tour operator website on a mobile device multiplies the odds of a purchase by 1.0011 (about 0.11% higher odds), with a marginal effect of 0.00013, or about 0.013 percentage points. Therefore, hypothesis 4A 'Tour operator websites visited on a mobile device have a negative effect on purchase probability in flight tickets/vacation bookings' cannot be accepted: the effect was hypothesized to be negative but turned out to be positive.

The variable 'FixTouroperator' shows a significant positive effect on purchase probability, meaning that the more often a person visits a tour operator website on a fixed device, the higher the purchase probability. In terms of the odds ratio, each additional visit to a tour operator website on a fixed device multiplies the odds of a purchase by 1.0016 (about 0.16% higher odds), with a marginal effect of 0.00018, or about 0.018 percentage points. Therefore, hypothesis 4B 'Tour operator websites visited on a fixed device have a positive effect on the purchase probability in flight tickets/vacation bookings' is accepted.

The variable 'MoAccomodation' shows a significant positive effect on purchase probability, meaning that the more often a person visits an accommodation website on a mobile device, the higher the purchase probability. In terms of the odds ratio, each additional visit to an accommodation website on a mobile device multiplies the odds of a purchase by 1.0011 (about 0.11% higher odds), with a marginal effect of 0.00012, or about 0.012 percentage points. Therefore, hypothesis 5A 'Accommodation websites visited on a mobile device has a negative effect on the purchase probability in flight tickets/vacation bookings' cannot be accepted: the effect was hypothesized to be negative but turned out to be positive.


The variable 'FixAccomodation' shows a significant positive effect on purchase probability, meaning that the more often a person visits an accommodation website on a fixed device, the higher the purchase probability. In terms of the odds ratio, each additional visit to an accommodation website on a fixed device multiplies the odds of a purchase by 1.0047 (about 0.47% higher odds), with a marginal effect of 0.00053, or about 0.053 percentage points. Therefore, hypothesis 5B 'Accommodation websites visited on a fixed device has a positive effect on the purchase probability in flight tickets/vacation bookings' is supported.

The variable 'MoFlightTickets' did not turn out to be significant; therefore, its coefficient, marginal effect, and odds ratio cannot be interpreted. Hypothesis 6A 'Flight tickets websites visited on a mobile device has a negative effect on the purchase probability in flight tickets/vacation bookings' can be neither supported nor rejected based on this model.

The variable 'FixFlightTickets' did not turn out to be significant; therefore, its coefficient, marginal effect, and odds ratio cannot be interpreted. Hypothesis 6B 'Flight tickets websites visited on a fixed device have a positive effect on the purchase probability in flight tickets/vacation bookings' can be neither supported nor rejected based on this model.


4.4 Latent Class Analysis

A latent class analysis (LCA) is performed to segment the consumers based on demographic variables. Before conducting the LCA, a new dataset aggregated on UserID is prepared. Because the segmentation is used to gain insight into how demographic variables are related to device use, the variables used for the logistic regression are merged with the demographic dataset. After the dataset is prepared, the LCA is performed as discussed below.
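A minimal sketch of this data preparation step: journey-level variables are summed per UserID and then left-joined to the demographic records, so that users without demographic information are kept with empty (None) demographics, in line with the earlier decision not to delete them. Summation as the aggregation rule, the record layout, and the function names are assumptions.

```python
from collections import defaultdict

def aggregate_by_user(journeys, variables):
    """Sum journey-level variables per user.

    journeys: list of dicts with a "UserID" key and the listed variables.
    """
    totals = defaultdict(lambda: {v: 0 for v in variables})
    for journey in journeys:
        for v in variables:
            totals[journey["UserID"]][v] += journey[v]
    return dict(totals)

def merge_demographics(user_totals, demographics):
    """Left join on UserID; users without demographics get None values."""
    demo_fields = sorted({k for d in demographics.values() for k in d})
    merged = {}
    for user, totals in user_totals.items():
        demo = demographics.get(user, {})
        merged[user] = {**totals, **{f: demo.get(f) for f in demo_fields}}
    return merged
```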

First, a suitable package for executing the latent class analysis is selected. To choose the right package, it is important to know the measurement level of the variables used for the analysis (Haughton, Legrand, & Wolford, 2009). The variables used for segmentation are ratio or binary scaled; therefore, the 'depmixS4' package is best suited.

Estimating the mlogit model including the segment variables led to problems when running it in R. Therefore, the ‘mix’ function from the ‘depmixS4’ package is used to perform the segmentation. Solutions ranging from two to seven segments are estimated. Table 7 ‘Segment sizes (in users) x number of segments’ below reports the segment sizes for these six models.

In the two-segment solution, users are almost equally distributed over the segments (3226 and 3201 users). In the three-segment solution, the distribution is less equal. In the four- to seven-segment solutions, segment sizes vary considerably. For example, as the table below shows, the first segment of the four-segment solution consists of 58 users, while the second consists of 3827 users.
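The kind of model estimated with depmixS4's mix function is a finite mixture, which depmixS4 fits by maximum likelihood, by default with the EM algorithm. A minimal Python sketch of that idea follows. It assumes a single continuous indicator with normal class densities and uses toy data. The segmentation in this thesis uses several variables, so the sketch illustrates the technique, not the actual model specification. Segment sizes such as those in Table 7 are typically obtained by assigning each user to the class with the highest posterior probability; the hypothetical helper segment_sizes does exactly that.

```python
import math


def normal_pdf(x, mu, sd):
    return math.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))


def fit_mixture(xs, means, sds, weights, n_iter=100):
    """Fit a one-dimensional Gaussian mixture by EM.

    Returns class proportions, means, standard deviations, the log-likelihood
    at each iteration and the posterior class probabilities per observation.
    """
    means, sds, weights = list(means), list(sds), list(weights)
    k, n = len(means), len(xs)
    trace = []
    for _ in range(n_iter):
        # E-step: posterior probability that each observation belongs to each class.
        post, loglik = [], 0.0
        for x in xs:
            dens = [weights[j] * normal_pdf(x, means[j], sds[j]) for j in range(k)]
            total = sum(dens)
            loglik += math.log(total)
            post.append([d / total for d in dens])
        trace.append(loglik)
        # M-step: re-estimate class proportions, means and standard deviations.
        for j in range(k):
            nj = sum(p[j] for p in post)
            weights[j] = nj / n
            means[j] = sum(p[j] * x for p, x in zip(post, xs)) / nj
            var = sum(p[j] * (x - means[j]) ** 2 for p, x in zip(post, xs)) / nj
            sds[j] = math.sqrt(max(var, 1e-6))  # floor avoids a degenerate zero variance
    return weights, means, sds, trace, post


def segment_sizes(post):
    """Assign each observation to its most likely class and count the class sizes."""
    k = len(post[0])
    sizes = [0] * k
    for p in post:
        sizes[max(range(k), key=lambda j: p[j])] += 1
    return sizes


xs = [1.0, 1.2, 0.8, 1.1, 0.9, 5.0, 5.2, 4.8, 5.1, 4.9]  # toy data with two clear groups
weights, means, sds, trace, post = fit_mixture(xs, means=[0.0, 6.0], sds=[1.0, 1.0], weights=[0.5, 0.5])
print([round(m, 2) for m in means], segment_sizes(post))
```

On the toy data the class means converge to about 1.0 and 5.0, with five observations per class. The log-likelihood trace never decreases, which is a useful sanity check for any EM implementation.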

Table 7: Segment sizes (in users) x number of segments

Segment   K=2    K=3    K=4    K=5    K=6    K=7
1         3226   2346   58     3827   3827   189
2         3201   1662   3827   57     304    510
3         -      2419   425    2075   81     98
4         -      -      2117   99     784    1597
5         -      -      -      369    75     195
6         -      -      -      -      1356   3827
7         -      -      -      -      -      11

To determine the optimal segment solution, the information criteria of the different models are assessed. Table 8 ‘Information criteria per segment’ gives an overview of the information criteria and the log-likelihood values per segment solution. The AIC is defined as -2 * LL + 2 * npar and the BIC as -2 * LL + ln(N) * npar. Here LL is the log-likelihood, npar the number of estimated parameters and N the number of users. A graphical overview of the BIC and AIC values is given in Figure 2.


Figure 2: Graphical overview of BIC and AIC values

When deciding on the number of segments, the managerial and operational perspectives should not be underestimated. Having too many segments could, for example, lead to


Table 7: Log-Likelihood, AIC and BIC values

Number of segments   Log likelihood   AIC          BIC
2                    -68402.11        136846.2     136988.3
3                    -1696.889        3457.777     3674.362
4                    123926           -247765.9    -247474.9
5                    122245.4         -244382.7    -244017.3
6                    122545           -244960      -244520
7                    123877.7         -247603.3    -247089
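The information criteria can be recomputed from the log-likelihoods in the table above with the standard definitions AIC = -2 * LL + 2 * npar and BIC = -2 * LL + ln(N) * npar. In the sketch below, N = 6427 is the total number of users in each column of the segment-size table. The parameter count npar = 11K - 1 is an assumption back-solved from the reported AIC values, not a figure given in the text.

```python
import math

N_USERS = 6427  # each column of the segment-size table sums to this number of users


def aic(loglik, npar):
    return -2 * loglik + 2 * npar


def bic(loglik, npar, n):
    return -2 * loglik + math.log(n) * npar


# Reported log-likelihood per number of segments K.
loglik = {2: -68402.11, 3: -1696.889, 4: 123926, 5: 122245.4, 6: 122545, 7: 123877.7}

# Assumption: npar = 11K - 1, inferred from the reported AIC values (not stated in the text).
results = {k: (aic(ll, 11 * k - 1), bic(ll, 11 * k - 1, N_USERS)) for k, ll in loglik.items()}

for k, (a, b) in results.items():
    print(f"K={k}: AIC={a:.1f}  BIC={b:.1f}")

best_by_bic = min(results, key=lambda k: results[k][1])
print("lowest BIC at K =", best_by_bic)
```

The recomputed values match the reported ones up to rounding. On these numbers both the AIC and the BIC are lowest for the four-segment solution. Preferring three segments therefore follows from considerations other than the information criteria alone.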

Based on the decision above, the segments of the three-segment solution are investigated further. Fligner-Killeen tests are performed on all variables to check that the homogeneity-of-variance assumption is not violated. Because the data are not normally distributed, the Fligner-Killeen test is the most appropriate choice, since it is robust against departures from normality. Table 8 ‘Latent Class Analysis Results: variable comparison’ gives an overview of the test results. For the variables with significant results, a Kruskal-Wallis analysis of variance was conducted (Vargha & Delaney, 1998).
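For reference, the Kruskal-Wallis statistic ranks all observations jointly and compares the mean rank of each group. Below is a minimal standard-library Python sketch; the function names and toy data are hypothetical. The closed-form p-value exp(-H/2) holds only for the chi-square approximation with 2 degrees of freedom, that is, for three groups such as the three segments. The approximation is also rough for groups as small as in the toy example.

```python
import math
from collections import Counter


def average_ranks(values):
    """Rank values 1..N; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        i = j + 1
    return ranks


def kruskal_wallis_h(*groups):
    """Kruskal-Wallis H statistic, corrected for ties."""
    values = [v for g in groups for v in g]
    n = len(values)
    ranks = average_ranks(values)
    total, start = 0.0, 0
    for g in groups:
        rank_sum = sum(ranks[start:start + len(g)])
        total += rank_sum ** 2 / len(g)
        start += len(g)
    h = 12 / (n * (n + 1)) * total - 3 * (n + 1)
    # Divide by 1 - sum(t^3 - t) / (n^3 - n), where t is the size of each group of ties.
    correction = 1 - sum(t ** 3 - t for t in Counter(values).values()) / (n ** 3 - n)
    return h / correction if correction > 0 else 0.0


# Toy data for three hypothetical segments (not thesis data).
h = kruskal_wallis_h([1, 2, 3], [4, 5, 6], [7, 8, 9])
# Under H0, H is approximately chi-square with (groups - 1) df; for 3 groups (df = 2)
# the survival function is exp(-H / 2).
p_value = math.exp(-h / 2)
print(f"H = {h:.2f}, p = {p_value:.4f}")
```

For the fully separated toy groups the statistic is H = 7.2. Identical groups give H = 0.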

Gender, Education, Income and Size of municipality are measured on a nominal or ordinal scale, so a different test is needed to investigate statistical differences on these variables. Therefore, a Chi-square test is performed on them. Table 8 ‘Latent Class Analysis Results: variable comparison’ below shows the means and the p-values for the focal variables. All p-values are significant, meaning that for every variable at least one segment differs significantly from the others. However, these overall tests do not show between which segments the variables differ. To investigate this, a pairwise Wilcoxon rank sum test is performed. Values that differ significantly from those of all other segments for that variable are highlighted in green. The values highlighted in red are not significant
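The Chi-square test of independence can be illustrated with the gender counts per segment reported in Table 8. The sketch computes Pearson's statistic without continuity correction. Obtaining the p-value from exp(-x/2) relies on the table having exactly 2 degrees of freedom (three segments by two categories). Other variables would need a general chi-square distribution function, for instance from a statistics library. The function name is hypothetical.

```python
import math


def chi_square_independence(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(table[0]) - 1)
    return stat, df


# Gender counts (male, female) per segment, as reported in Table 8.
gender_by_segment = [
    [959, 1387],  # Segment 1
    [722, 940],   # Segment 2
    [919, 1500],  # Segment 3
]
stat, df = chi_square_independence(gender_by_segment)
# For df = 2 the chi-square survival function has the closed form exp(-x / 2).
p_value = math.exp(-stat / 2) if df == 2 else float("nan")
print(f"chi2 = {stat:.2f}, df = {df}, p = {p_value:.4f}")
```

For the gender counts this gives a statistic of about 12.4 with 2 degrees of freedom and p ≈ 0.002, consistent with the p-value reported for Gender.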


Table 8: Latent Class Analysis Results: variable comparison

                          Segment 1   Segment 2   Segment 3   P-value
Size                      2346        1662        2419
%                         36.5%       25.9%       37.6%
Gender                                                        .002*
  Male                    959         722         919
  Female                  1387        940         1500
Age (mean)                45.75       62.59       52.66       .002*
Education                                                     0.000***
  Low                     92          960         688
  Middle                  1373        702         1298
  High                    881         0           433
Income
  Below average
  Average
  Above average
  Do not want to say(a)

Sign. codes: 0 ‘***’ 0.001
(a) Do not want to say also includes do not know.