The Extent to which Type of Customer and Competitive Effects Influence Purchase Behaviour


A study on online search intermediaries

V. I. Pelousek
University of Groningen
Faculty of Economics and Business
Master Thesis MSc Marketing Intelligence

17th of June, 2019


1. Preface

This is my Master thesis, “The Extent to which Search Characteristics and Competitive Effects Influence Purchase Behaviour”. It marks the end of my academic education, for now. It analyses different types of customers and how their purchase behaviour changes depending on whether or not they participate in cross-over comparison. It is the outcome of a Master's degree in the field of Marketing Intelligence.

This paper has been written under the supervision of dr. Abhi Bhattacharya. His guidance, and especially the fun and enlightening meetings, made the process of writing a thesis enjoyable. Picking him as my supervisor was a good choice, also because of the dataset. I wanted to work with a real dataset originating from a company in order to learn how to handle big datasets. This has been challenging from the beginning, but it has enhanced my coding skills for the future.

I need to thank not only my supervisor, but also my two group members, Bas Vollenbroek and Herman Mull, who were always helpful and up for a discussion related to our theses. Another shout-out goes to Lilas Faham, Elina Marangotidou, Karel Van Duijl, Maud Groot Beumer, Dominika Jakabova, Antje Bieberstein, and Wouter Van Nijendaal, who have always been great supporters throughout my life in Groningen.

My warmest and deepest gratitude goes to my father and his wife, DI Erich Pelousek and Mag. Elsa Königshofer. We have been through a lot together in recent years, and I would never have had the strength to finish this Master without their unconditional love and their support for living my life and making my own decisions. I am very grateful to have you as my parents.

It has been a wild journey for me, and I am grateful for every bit of it. All my professors were dedicated and motivated lecturers, and I always appreciated the friendly atmosphere in the tutorials, even when my group and I kept making the same mistakes and asking for help. And as Franz Josef, an Austrian Emperor, famously said: “Es war sehr schön, es hat mich sehr gefreut!” In English: “It was very nice, it was my pleasure!”


Table of Contents

1. Preface
2. Introduction
2.1. The Age of Technology and the Challenges it Brings
2.2. The Need of Intermediaries
3. Theoretical Framework
3.1. The Purchase Decision Journey
3.2. Online Search Intermediaries
3.3. Online Travel Search Intermediaries
3.4. Cross-Over Comparison and Who Benefits From It?
3.5. Individuum or Family as a Customer Unit
3.6. Short And Long Stays
3.7. Advanced and Spontaneous Planners
3.8. Interactions
3.9. Conceptual Model
4. Data
4.1. The Search Process
5.3.3. Promotion
5.3.4. Place
5.4. Model Estimation
6. Results
6.1. Assessing Mediator Effects
6.2. Validity
6.3. Model Fit and Accuracy
6.4. Models Interpretation
7. Discussion
7.1. Managerial Insights
7.2. Limitations
7.3. Recommendations for Future Research
8. Bibliography


2. Introduction

The aim of this master thesis is to evaluate consumer behaviour and the way it changes based on different search characteristics, as well as the effect of cross-over comparison. This change will be analysed through user data observed from intermediaries in the service industry, such as search engines and online marketplaces. The dataset used has been provided by Expedia; however, the aim is to create a more general overview of how consumer behaviour changes when dealing with comparison shopping.

2.1. The Age of Technology and the Challenges it Brings

In order to get a better understanding of the present situation, it serves us to take a short stroll down memory lane. A little over 10 years ago, the iPhone was invented. At this time Facebook was still competing with Myspace for traffic and Amazon was only known for selling books. The online economy as it exists today was close to non-existent but growing rapidly. Most people, however, found the products they were looking for in large malls where billboards would advertise the latest trends in order to influence consumer behaviour before people ever set foot in the store. In the store, salesmen and women provided a friendly face for the customer as well as the main source of information when it came to making purchases. Websites comparing prices and customized ads were still a niche and not widely used. The journey of a product from production to marketing to eventual sale was pretty well understood. All that started changing with the addition of digital comparison tools and the ability for people to buy things online. This would forever change the classical purchase decision journey from local stores to the information superhighway that is the internet.


2.2. The Need of Intermediaries


sales were $81.6 billion in the same year. The third biggest player on the market is American Express Global Business Travel with sales of $32.7 billion in 2017. In order to compete, companies are constantly looking for new ways to integrate reviews and diversify the products that are being offered. This is done on the assumption that consumers want convenience above all else. They would much rather book everything on one website (Lynch & Ariely, 2003), so finding regular customers in this growing industry is vital for the longevity of a travel company, especially with the increased competition from larger tech giants.

This competition gained another formidable opponent in Google when it launched “Google Travel” on the 14th of May 2019. Google is in the fortunate position of being able to use the vast amount of user data it has, and will continue to collect, to make better and more personalized recommendations. This is a feature most companies cannot compete with, not for lack of trying but strictly from a lack of user data. Why would a customer use Expedia to make travel plans if Google is consistently able to market options that more accurately reflect the individual customer? Mark Okerstrom, CEO of Expedia Group, which spent $1.53 billion on sales and marketing in the first quarter of this year, admitted that Google will be the main source of competition. Ironically, most of these sales and marketing expenditures were made using Google and in part financed the development of Google Travel (skift.com, retrieved on 17th of May 2019 at 2 pm).

This competition highlights the importance of understanding customers and their individual needs in order to be able to market to them effectively. In the current space, marketers and advertisers need a full understanding of the customers and how to affect their decisions. This thesis will focus on analysing the competitive effects for different types of customers in order to develop better strategies for providing information and services. The main question is:

To what extent does the simultaneous comparison of competitive offers affect the purchasing probability for different groups of consumers?


insights. The last part will also deal with limitations and future research possibilities for the industry.

3. Theoretical Framework

In this section, I examine the theoretical background of several topics that are relevant for this paper. Firstly, the change in the purchase decision-making process due to the advent of the internet will be illustrated to emphasize the importance of online intermediaries. Secondly, the concept of cross-over comparison will be explained. Thirdly, important keywords, such as online intermediaries, will be defined by analysing their role for consumers, to ensure a coherent understanding. Fourthly, each customer characteristic will be explained, and hypotheses will be developed regarding differences in consumer behaviour depending on whether or not customers compare with competitors’ websites.

3.1. The Purchase Decision Journey

One of the most famous models explaining the buying process is the “AIDA Model”, developed by E. St. Elmo Lewis in 1898 (Langett, 2018). It illustrates four linear stages a consumer goes through when purchasing a product or service: Awareness, Interest, Desire and Action. A further development is the general buying decision process, which includes five steps: problem recognition, information search, evaluation of alternatives, purchase decision and post-purchase behaviour (p. 166, Kotler & Keller, 2012). The models became more detailed over the years by adding more specific steps to the journey, and both are still relevant today. Nevertheless, neither captures the full spectrum of activities involved in a modern purchase decision.


media influence the decision process. The significance of ZMOT is that it might be the first marketing framework to highlight the relevance of digital channels as a critical part of the customer decision journey. This inspired businesses to begin seeing “buzzwords” such as SEO and SEM as essential (Lecinski, 2011). Moreover, it shows the importance of intermediaries in people’s everyday lives and especially during their purchase decision journey.

3.2. Online Search Intermediaries

Today, online search intermediaries play a vital role for customers by gathering information and comparing alternatives, all on one website, which reduces search costs tremendously. In the past, customers had to visit various places to compare products, services and suppliers in order to make an informed purchase decision. These were brick-and-mortar intermediaries, for example retailers and shopping malls. However, in times of globalisation, round-the-clock interconnectedness and an ever-growing amount of information, the importance of “new economy” intermediaries increases day by day (Hagiu & Jullien, 2011).

Research has shown that the risks and uncertainties of consumer choices rise with the ever-growing number of alternatives or attributes in a choice set (Bettman, Luce, & Payne, 1998; Lurie, 2004), and that this growth brings higher transaction costs: for example, search costs for analysing alternatives, learning costs associated with familiarising oneself with alternatives, and activity costs involved in motivating a change (Schweitzer, 1994). This shows that the most important function of market intermediaries, reducing the search costs of their customers, is needed now more than ever. An intermediary’s success is measured by the amount of this reduction (Spulber, 1996). This means that the easier an online intermediary makes it for buyers to find what they need, to set standards, and to comparison shop, the more successful it will be.


margins, which do not necessarily match the users’ preferences. Moreover, offline intermediaries, such as shopping malls, also have tactics to increase the duration of a consumer’s shopping experience, for example by placing anchor stores far from each other. Similarly, retail shops place the most sought-after items at the far end of the store (Petroski, 2003).

3.3. Online Travel Search Intermediaries

In 2017, the global travel industry became one of the largest and fastest growing sectors in the world, with gross bookings of $1.6 trillion (Langford & Weissenberg, 2018). The World Bank reports that the number of international travel departures across the globe has more than doubled in two decades, from roughly 600 million in 1996 to 1.567 billion in 2017 (World Bank). Travel Weekly releases an annual list of 50 online travel agencies with at least $100 million in annual sales. Expedia has been at the top of this list since 2014; however, new players have just joined the game. Google launched “Google Travel” on the 14th of May 2019. It is a combination of its known features Search, Maps and Trips. All three will feed into each other, so that if a user searches for restaurants or hotels in another city, Google will store this information and populate it into the Travel desktop service. Another new rival is Amazon: stepping into the travel sector together with the Indian online travel agency Cleartrip, Amazon has started offering domestic flights in India (Singh, M., 2019; May, K., 2019).

The most popular travel products purchased online are flights and accommodations (Card, Chen, & Cole, 2003; Morrison, Jing, O’Leary, & Cai, 2001). However, a considerable number of people still inform themselves online but book offline: 25% of people who seek information about accommodation offers online book offline, as do 13% of those who seek information about flights and 14% of those who seek information about car rentals (Jun, Vogt, & MacKay, 2010). One of the reasons for this high share of offline bookings is that accommodations are highly complex products.


The need for greater information detail can be associated with greater perceived risk of the delivered quality. People who book their accommodation online have a higher mistrust towards the end result, which aligns with the above-mentioned 25% of people who seek information about accommodation offers online but book offline. It can be assumed that booking accommodation takes more attention and time from consumers than booking flights or car rentals (Beldona, Morrison, & O’Leary, 2005).

The intense competition, the growing demand and the need to offer precise, well-thought-through search results show how important it is for online travel agencies to strive for competitive advantage by serving their customers with personalised offers and, therefore, more accurate search options and results.

3.4. Cross-Over Comparison and Who Benefits From It?

Search intermediaries are comparison-shopping tools offering cross-over comparison shopping. This is the practice of contrasting product-specific characteristics before buying in order to realise the best purchase and pricing on products or services (businessdictionary.com, 2019). Although research has shown that comparison shopping tools have a considerable influence on online shopping behaviour (Häubl & Trifts, 2000; Kamis, 2006), little is known about the types of customers who are more likely to participate in and profit from it. Anticipated convenience, anticipated ease of use and trustworthiness are key factors of travel meta-search engine adoption (Park and Gretzel, 2006). It has been found that anticipated convenience is consistently the greatest direct predictor of intentions to adopt a technology (Lee, Kozar & Larson, 2003). Agarwal and Prasad (1999) demonstrated that differences between people, such as demographics, cognitive styles and characteristics, influence the anticipated convenience of new technologies. Not much has been found on consumer shopping behaviour regarding specific types of customers and their tendency towards cross-over comparison. It can be argued that whether one perceives a comparison-shopping tool as convenient depends on how one purchases. Therefore, this thesis investigates which types of customers are more prone to cross-over comparison, so that this knowledge can be used for customer-centric marketing strategies.


and product types on their need for diversity (Rohm and Swaminathan, 2004). They simply enjoy the process and find it easy and convenient, as Klassen et al. (2009) showed that online shoppers have a generally positive approach towards the Internet. Correspondingly, elderly Internet users have confidence in online cross-over shopping due to a positive attitude towards the Internet (Iyer and Eastman, 2006).

As shown, many studies have focused on the type of person, such as online shoppers or elderly people, but not many have looked at the comparison behaviour of types of customers defined by their shopping behaviour: people who buy in bulk, who plan in advance, or who buy for others or for themselves only. These customers have different motives to shop and, I assume, will accordingly differ in their shopping behaviour as well. Additionally, it can be suggested that these groups will also differ in their comparison-shopping behaviour.

The focus of this thesis is to analyse different types of customers; the knowledge acquired can enhance customer-centric marketing strategies that better serve their needs. In order to answer the research question thoroughly, several sub-research questions need to be formulated:

• To what extent does cross-over shopping affect the purchasing probability of families?
• To what extent does cross-over shopping affect the purchasing probability of individuals?
• To what extent does cross-over shopping affect the purchasing probability of people who purchase in large quantities?
• To what extent does cross-over shopping affect the purchasing probability of people who plan in advance?

3.5. Individuum or Family as a Customer Unit


The ideologies of familialism and individualism show the core differences between these two target groups. Although one person conducts the acquisition in both cases, purchasing a product for a family differs from purchasing for one individual in intention and process. Families and single consumers have different intentions while purchasing a product: a family will give priority to the needs of all its members (p. 20-21, Ochiai E., Hosoya L.A., 2014) and an individual only to oneself (Lukes S. M., 1998). Next, several factors that differentiate families from individuals as customers will be elaborated, such as the decision process, flexibility and budget. Many studies have looked at the family decision-making process with a specific focus on conflict or conflict resolution (Lackman & Lanasa, 1993; Lee & Beatty, 2002; Martínez-Salinas & Polo-Redondo, 1999). This illustrates, unsurprisingly, that a customer unit with more members, for example a family, will often face conflict or discussion. A family is a group of people with different opinions and preferences, which can lead to a longer evaluation of alternatives compared to a single consumer.

I argue that families will compare more with other websites, for two reasons. Firstly, a family decision is a collective process in which several different hotel preferences need to be considered. This leads to multiple hotel ideas, which need to be researched and compared. It takes more research to find something that everyone likes. In contrast, an individual has only one person’s preferences to consider, which reduces the consideration set and therefore the number of alternatives to compare. Secondly, more than half of parents cannot take their children on holiday abroad due to the high costs: 54% of families choose a staycation because travelling to another country is too expensive. A family, especially one with children attending school, will also have lower flexibility due to school holidays, compared to a single person who is not bound to other family members’ agendas. Research by Expedia revealed that 59% of parents cannot afford to take their children on holiday during the peak season and would rather take their children out of school during the school year (Hampson L., 2018). This shows that many families cannot afford to take the whole family on holiday abroad, which suggests that families are more price sensitive. Customers who are more price sensitive will spend more time comparing offerings.


H1: Families will have a stronger impact on purchase probability than individuals in case of cross-over comparison.

3.6. Short And Long Stays

Another option to personalize the search process and the resulting offerings towards the needs of customers is to differentiate between customers looking for accommodation for a short stay and those looking for a long stay. To be able to generalise the findings, a long-stay customer can be seen as a person who tends to purchase in large quantities, compared to a person who buys in small amounts.

Research has shown that the complexity of the selection process for an accommodation depends on the length of the stay (Lockyer & Roberts, 2009). Lockyer and Roberts show that a customer preparing for a short stay has minimal expectations of the room. In contrast, a visitor planning a longer stay will have higher expectations, because the more time a person spends in a room, the more comfort is appreciated and needed. This illustrates that the length of stay is a crucial factor when booking an accommodation and must be analysed in more detail. Beldona, Morrison, & O’Leary (2005) reported similar findings and characterised accommodation as a highly complex product, which makes it an investment rather than a quick purchasing decision.

Considering that a hotel room is seen as a high expense and becomes even harder to select if the stay is long, it can be assumed that customers will invest more time and effort into the selection of their hotel room in case of a long stay. Visitors who stay longer will spend more time in their room and therefore have higher expectations of the hotel. As part of this more extensive research, customers will visit intermediary and hotel-specific websites to get a better picture of their holiday housing. This implies higher search costs for customers who look for a hotel room for a long stay, or for any person making an investment of some kind. This leads to hypothesis 2. Users who are looking for a longer stay are more likely to participate in comparison shopping than users who are planning a short stay.


3.7. Advanced and Spontaneous Planners

The booking window, or lead time, is the period between the day the accommodation is booked and the actual day of arrival. No research or literature could be found on the specific topic of advanced or spontaneous trip planning and how it correlates with comparison shopping. However, customers who book rooms in advance differ from customers who book a trip spontaneously. It can be assumed that a booking in advance is comparable to a longer stay in a hotel, as both can be characterised as investments. The user is planning something in the more distant future, is looking forward to it and needs to plan ahead for some reason, which means they have more time to compare different products. This means that the ZMOT will take more time and go into more depth for people with advanced bookings than for people with spontaneous trips. In general, they will compare more and visit other intermediary and specific hotel webpages. In contrast, visitors who are going on a spontaneous trip have less time for research.

H3: Advanced planners will have a stronger impact on purchase probability than spontaneous planners in case of cross-over comparison.

3.8. Interactions

Several interactions will be tested as well. For example, families who are planning a long holiday seem to have a very high chance of participating in cross-over comparison. The reasons for this are the time available for comparison, the number of different people involved and the need to save money (Hampson L., 2018). It is assumed that the need to compare offers for a long stay is higher for a family than for an individual.

H4: Families planning a long stay will have a stronger impact on booking probability than an individual planning a long stay, in the case of cross-over comparison.

The second interaction to be tested is that families planning far ahead will have a higher tendency to conduct comparison shopping than an individual planning ahead.

H5: Families planning in advance will have a stronger impact on booking probability than an individual planning in advance, in the case of cross-over comparison.


H6: Planning a long stay far in advance will have a stronger impact on booking probability than a spontaneous short trip, in the case of cross-over comparison.

3.9. Conceptual Model

The hypothesized relationships are visualized in the following graphical summary (Figure 1):

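[Figure 1 placeholder: two-panel path diagram. Upper panel: Search Characteristics (X) and Booking (Y), linked by path c. Lower panel: Search Characteristics (X), Competition (Z) and Booking (Y), with paths a, b and c’. Search Characteristics (X): Family, Individual, Long/Short Stay, Advanced/Spontaneous Planner. Covariates: Price, Product, Place, Promotion.]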

Figure 1: Conceptual Model


4. Data

In this chapter, the dataset used to test the hypotheses empirically will be explained. Firstly, the search process will be outlined to give a better understanding of the actions a user can take during a visit to an intermediary site. This is important, as most actions are captured as a variable. Secondly, the cleaning steps conducted will be examined. Thirdly, an overview of the data and its variables will be presented in the form of descriptive statistics and graphs. Fourthly, transformed and newly coded variables will be presented to give a specific picture of each variable included in the model.

4.1. The Search Process


4.2. Cleaning Steps

Expedia provided a train and a test dataset for potential analysis, but the test dataset cannot be used for the purpose of this paper, as it does not include the variable booking_boolean, which is essential here. Therefore, it was not possible to merge these two sets, nor to test the model on the test set. For this reason, the original train dataset was split into a training and a test set. As the model estimation is done on the search impression level, it is important to ensure that the random splitting selects and orders by search_id. The training set consists of 70% and the test set of 30% of the original dataset; however, each set is drawn separately and at random from the original dataset. This means that the two sets are not complementary to each other and could contain the same search_ids. Nevertheless, this is not a problem, because the original dataset contains 9,917,530 observations, which means that the amount of data outweighs the risk of overlapping search_ids.
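As an illustration of sampling on the search impression level, the following Python sketch draws a share of the distinct search_id values and keeps every row that belongs to them. It is only a sketch under stated assumptions: the analysis in this thesis was run in R, and the function name, the toy rows and the seeds are hypothetical. Because the two sets are drawn independently, as described above, they may share some search_ids.

```python
import random


def sample_by_search_id(rows, fraction, seed):
    """Keep all rows of a randomly chosen `fraction` of the distinct search_ids."""
    ids = sorted({row["search_id"] for row in rows})
    rng = random.Random(seed)
    n_chosen = round(fraction * len(ids))
    chosen = set(rng.sample(ids, n_chosen))
    # Filtering the original list keeps whole impressions in their original order.
    return [row for row in rows if row["search_id"] in chosen]


# Toy data (hypothetical): 10 search impressions listing 3 hotels each.
rows = [{"search_id": s, "hotel": h} for s in range(10) for h in range(3)]

# Two independent draws, so the sets are not complementary.
train_set = sample_by_search_id(rows, 0.7, seed=1)
test_set = sample_by_search_id(rows, 0.3, seed=2)
shared_ids = {r["search_id"] for r in train_set} & {r["search_id"] for r in test_set}
```

If strictly disjoint sets were required, the remaining 30% of the search_ids could be used as the test set instead of a second independent draw.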

The dataset provided by Expedia is reasonably tidy; only a few steps need to be taken to ensure unbiased estimations. Inspired by the paper by de Jonge and van der Loo, “Steps in ‘statistical analysis value chain’”, cleaning steps regarding the accuracy, consistency and completeness of the dataset were conducted. The following cleaning steps and analyses were carried out on the training dataset.

4.2.1. Missings

The dataset contains many NULLs; however, most of them are essential for this paper. The variables comp_rate, comp_inv and comp_rate_percent_diff indicate whether or not a user was comparing Expedia to another intermediary site. This means that these NULLs are missing not at random (MNAR), which is why the author decided not to manipulate these variables. Nevertheless, comp_rate_percent_diff had to be modified. It measures the percentage difference between the Expedia price and the competitor price for each of the eight competitors. However, in their original form these values were absolute numbers and therefore impossible to work with. A decimal comma was introduced to the variable.
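Because the NULLs carry information, they can be turned into an indicator rather than imputed. The Python sketch below shows one possible reading, under the assumption that a row counts as "comparing" when at least one of the competitor fields is present; this rule, the function name and the toy rows are hypothetical and not taken from the thesis code.

```python
# Competitor fields named in the text; None stands in for NULL.
COMP_FIELDS = ("comp_rate", "comp_inv", "comp_rate_percent_diff")


def is_comparing(row):
    """Return True if at least one competitor field is present (assumed rule)."""
    return any(row.get(field) is not None for field in COMP_FIELDS)


# Toy rows (hypothetical values).
with_competitor = {"comp_rate": 1, "comp_inv": 0, "comp_rate_percent_diff": None}
without_competitor = {"comp_rate": None, "comp_inv": None, "comp_rate_percent_diff": None}

flags = [is_comparing(with_competitor), is_comparing(without_competitor)]
```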

4.2.2. Outliers


impression containing one invalid value has to be deleted. The dataset had 279,541 unique search impressions before the treatment and 269,978 afterwards, which indicates that 9,563 queries are invalid. This is a small reduction of approximately 3.42% and therefore not troublesome.

One session has to be deleted due to inaccuracy. The search impression with the srch_id 641,377 shows five different hotels but five identical prices. It is deleted because of these inaccurate values; only five observations are affected, which is likewise not alarming.
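Deleting on the impression level means that a single invalid row removes every row with the same srch_id. A minimal Python sketch of that idea follows; the validity rule (a positive price) and the toy rows are hypothetical stand-ins, since the actual outlier criterion is not restated here. The last lines recompute the reduction from the counts reported above.

```python
def drop_invalid_impressions(rows, is_valid):
    """Remove every search impression that contains at least one invalid row."""
    invalid_ids = {row["srch_id"] for row in rows if not is_valid(row)}
    return [row for row in rows if row["srch_id"] not in invalid_ids]


# Toy example with a hypothetical validity rule: prices must be positive.
rows = [
    {"srch_id": 1, "price": 120.0},
    {"srch_id": 1, "price": -5.0},  # invalid, so all of impression 1 is dropped
    {"srch_id": 2, "price": 80.0},
]
cleaned = drop_invalid_impressions(rows, lambda row: row["price"] > 0)

# Reduction reported in the text.
before, after = 279_541, 269_978
removed = before - after
share_removed = 100 * removed / before
```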

4.3. The Dataset

The dataset had to be reduced to 30% of its size because the modelling process could not run due to the high memory demand. Analysing the entire dataset was simply not possible, as the program R would constantly crash. However, the reduced dataset still has statistical power, and a random selection by search id ensures a good sample of the whole dataset. The following descriptives are based on the reduced and cleaned dataset.

This dataset has 2.9 million observations and originates from customers’ requests for hotels on Expedia, spread over a period from the 1st of November 2012 until the 30th of June 2013. This provides eight months of sales and marketing information; given the substantial number of observations, an analysis of patterns in the intermediary sector will be meaningful. The dataset was contributed by Expedia, one of the leading online travel agencies, and contains details on 120,835 hotels situated in 168 countries. The Expedia data is organized at the level of search impressions. A search impression is an ordered list of accommodations and their features seen by users in response to their travel query.


Many variables are anonymized, such as the name of the country the user is searching from or for. Nevertheless, Ursu (2016) has shown that 80% of the search impressions are made by users that live there, assuming it has a large territory with a large amount of domestic travel. Another point of evidence is that 73% of Expedia’s traffic originates from U.S. visitors (Ursu, 2016).

                                                    Total               Percentage
Families                                            28,259              24%
Families comparing                                  21,167              75%
Families not comparing                              7,092               25%
Individuals                                         20,792              18%
Individuals comparing                               17,443              84%
Individuals not comparing                           3,349               16%
Long Stay (Short Stay)                              11,839 (104,017)    10% (90%)
Long Stay comparing (Short Stay)                    7,118 (82,278)      8% (92%)
Long Stay not comparing (Short Stay)                4,721 (21,739)      18% (82%)
Advanced Planner (Spontaneous Plan.)                61,268 (54,588)     53% (47%)
Advanced Planner comparing (Spontaneous Plan.)      37,553 (51,843)     42% (58%)
Advanced Planner not comparing (Spontaneous Plan.)  23,715 (2,745)      90% (10%)

Table 1: Specific Descriptives
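The percentages in Table 1 can be reproduced from the counts, which also shows which base each row appears to use. The Python sketch below is an arithmetic check only; the helper name is hypothetical, and the choice of bases is inferred from the numbers rather than stated in the text. The "comparing" rows for families and individuals are shares within the group, whereas the long/short and advanced/spontaneous rows are shares among all comparing (or all non-comparing) sessions.

```python
def share(part, whole):
    """Percentage of `part` in `whole`, rounded to whole percent."""
    return round(100 * part / whole)


all_sessions = 11_839 + 104_017        # long + short stays
comparing_sessions = 7_118 + 82_278    # long + short stays that compare
non_comparing_sessions = 4_721 + 21_739

families_pct = share(28_259, all_sessions)
families_comparing_pct = share(21_167, 28_259)           # base: families
long_comparing_pct = share(7_118, comparing_sessions)    # base: comparing sessions
advanced_not_comparing_pct = share(23_715, non_comparing_sessions)
```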


4.4. Transformation Steps

Furthermore, new variables are created to define specific search characteristics and to measure their exact effect on the dependent variable, Booking. Additionally, this keeps the model simple, one of Little’s model specification rules. Each variable represents one type of customer elaborated in the theoretical part of this paper.

Types of Customer Units

A family is defined as at least one child and one adult, and the variable is called Family. A one-person customer unit is characterised as one adult and no children; the variable is called Individual. Both are derived from the variables srch_children_count and srch_adults_count. The author chose to analyse families and individuals because the two target groups need differentiated marketing concepts. Furthermore, research has been done on both groups, which lays the basis for this paper. Moreover, as these groups are fundamentally different, the contrasts will be very apparent and interesting for further marketing activities. Any comparison agent serving these kinds of groups can adapt its service or advertisement accordingly.

Length of Stay

The variable srch_length_of_stay shows the number of holiday days a user defined in their query. A variable named Long has been coded to indicate a long stay with 1 and a short stay with 0. The variable is therefore binary and captures both types of stay. A short stay is defined as a stay of four days or fewer, such as a long weekend trip; such stays are very popular, as seen in the Data chapter. Everything above this is a long stay.
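The coding rules for Family, Individual and Long can be written down compactly. The Python sketch below is an illustration only: the function name is hypothetical, and "at least one child and one adult" is read here as at least one child together with at least one adult, which is an interpretation of the definition above.

```python
def code_search_characteristics(srch_adults_count, srch_children_count, srch_length_of_stay):
    """Derive the binary variables Family, Individual and Long for one search."""
    family = int(srch_children_count >= 1 and srch_adults_count >= 1)
    individual = int(srch_adults_count == 1 and srch_children_count == 0)
    long_stay = int(srch_length_of_stay > 4)  # four days or fewer is a short stay
    return {"Family": family, "Individual": individual, "Long": long_stay}


# Two adults with one child on a week-long trip.
family_week = code_search_characteristics(2, 1, 7)
# One adult on a long weekend.
solo_weekend = code_search_characteristics(1, 0, 4)
# Two adults without children fall into neither customer unit.
couple = code_search_characteristics(2, 0, 2)
```

Note that searches such as the couple above are coded 0 on both Family and Individual, so the two variables are not complements of each other.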

Leading Time


5. Methodology

This chapter examines in detail the method used in this paper. In order to investigate the effects of search characteristics and competition on the dependent variable, a logistic regression is used. This method has been chosen due to the binary nature of the dependent variable, Booking. The chapter is separated into four parts. Firstly, the class bias will be checked. Secondly, both models used for this research will be illustrated: model one analyses the direct effects and model two the interaction effects. Thirdly, the control variables will be portrayed and their use explained. Lastly, the model estimation will be described.

5.1. Class Bias

For a logistic regression to predict well, the ideal dataset would be balanced; this means that the proportions of events and non-events in the dependent variable should be approximately the same. However, this rarely happens in reality and almost never in specific business sectors, such as online search intermediaries. Users often visit these websites only for research purposes without making a purchase, and table 1 confirms this assumption. The dataset includes an obvious class bias, with only 2.89% bookings made by Expedia customers.

Table 1: Class Bias

| No Booking - 0 | Booking - 1 | Percentage of Bookings |
|---|---|---|
| 2,804,150 | 81,087 | 2.89% |
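The size of the class bias can be checked directly from the two counts in the table. The sketch below is only an illustration; the function name class_balance and the dictionary keys are hypothetical. It suggests that the reported 2.89% corresponds to the ratio of bookings to non-bookings, whereas the share of bookings among all observations is about 2.81%. It also shows why accuracy alone is a weak measure on such data: a rule that always predicts "no booking" would already be correct in roughly 97% of the cases.

```python
no_booking = 2_804_150  # non-events, taken from the class bias table
booking = 81_087        # events, taken from the class bias table


def class_balance(events: int, non_events: int) -> dict:
    """Summarise how unbalanced a binary dependent variable is."""
    total = events + non_events
    return {
        # Share of events among all observations.
        "event_share": events / total,
        # Events relative to non-events.
        "event_to_non_event_ratio": events / non_events,
        # Accuracy of always predicting the majority class.
        "majority_baseline_accuracy": max(events, non_events) / total,
    }


balance = class_balance(booking, no_booking)
for name, value in balance.items():
    print(f"{name}: {value:.2%}")
```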

5.2. Model Specification

Model 1 – Direct Effects:

$$P_{Booking} = \frac{\exp(\alpha + \beta_1 Price + \beta_2 Place + \beta_3 ProdB + \beta_4 ProdS + \beta_5 Prom + \beta_6 LongSt + \beta_7 AdvPlan + \beta_8 Fam + \beta_9 Ind + \beta_{10} ShortSt + \beta_{11} SponPlan)}{1 + \exp(\alpha + \beta_1 Price + \beta_2 Place + \beta_3 ProdB + \beta_4 ProdS + \beta_5 Prom + \beta_6 LongSt + \beta_7 AdvPlan + \beta_8 Fam + \beta_9 Ind + \beta_{10} ShortSt + \beta_{11} SponPlan)}$$

Model 2 – Interaction Effects:

$$P_{Booking} = \frac{\exp(\alpha + \beta_1 Price + \beta_2 Place + \beta_3 ProdB + \beta_4 ProdS + \beta_5 Prom + \beta_6 LongSt + \beta_7 AdvPlan + \beta_8 Fam + \beta_9 Ind + \beta_{10} LongSt \cdot AdvPlan + \beta_{11} Fam \cdot LongSt + \beta_{12} Fam \cdot AdvPlan + \beta_{13} Ind \cdot LongSt + \beta_{14} Ind \cdot AdvPlan + \beta_{15} ShortSt \cdot SponPlan)}{1 + \exp(\alpha + \beta_1 Price + \beta_2 Place + \beta_3 ProdB + \beta_4 ProdS + \beta_5 Prom + \beta_6 LongSt + \beta_7 AdvPlan + \beta_8 Fam + \beta_9 Ind + \beta_{10} LongSt \cdot AdvPlan + \beta_{11} Fam \cdot LongSt + \beta_{12} Fam \cdot AdvPlan + \beta_{13} Ind \cdot LongSt + \beta_{14} Ind \cdot AdvPlan + \beta_{15} ShortSt \cdot SponPlan)}$$

Where,

$\alpha$ = Intercept

$\beta_1, \ldots, \beta_{11}$ (Model 1) and $\beta_1, \ldots, \beta_{15}$ (Model 2) = Parameter estimates, effects of treatments

$P_{Booking}$ = Probability of users booking a hotel room

$Price$ = Hotel room price

$Place$ = Attractiveness of hotel location

$ProdB$ = Hotel chain affiliation

$ProdS$ = Hotel number of stars

$Prom$ = Indicated hotel promotion

$Fam$ = Family as a customer unit

$Ind$ = Individual as a customer unit

$LongSt$ = Long Stay

$ShortSt$ = Short Stay

$AdvPlan$ = Advanced Hotel Planner

$SponPlan$ = Spontaneous Planner
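To make the model specification more tangible, the following sketch evaluates the logistic formula for one observation. It is an illustration under stated assumptions: the function name booking_probability, the coefficient values and the example observation are hypothetical and are not estimates from this research. An interaction term is simply the product of the two variables involved.

```python
import math


def booking_probability(alpha: float, betas: dict, x: dict) -> float:
    """Logistic model: P = exp(eta) / (1 + exp(eta)) with a linear predictor eta."""
    eta = alpha + sum(beta * x.get(name, 0.0) for name, beta in betas.items())
    return math.exp(eta) / (1.0 + math.exp(eta))


# Hypothetical coefficients, chosen only for illustration.
betas = {"Price": -0.4, "Prom": 0.3, "Fam": 0.15, "LongSt": -0.25, "Fam*LongSt": 0.05}

# Hypothetical observation: a family looking for a long stay with a promoted offer.
x = {"Price": 1.0, "Prom": 1.0, "Fam": 1.0, "LongSt": 1.0}
x["Fam*LongSt"] = x["Fam"] * x["LongSt"]  # interaction as product of both dummies

print(booking_probability(alpha=-4.0, betas=betas, x=x))
```

Because the linear predictor enters through the logistic function, the predicted probability always lies between 0 and 1, whatever values the coefficients take.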

5.3. Control Variables

significant effects on sales (booking), which need to be controlled for to successfully separate them from the inspected effects to be able to correctly interpret these results. The author will justify the choice of variables in the following part with previous research.

5.3.1. Product

In this research the product is the hotel, which is best portrayed by two variables of the dataset, prop_brand_bool and prop_starrating. Prop_brand_bool signifies whether the hotel belongs to a chain, such as Marriott or Hilton. This variable is Boolean; it specifies chain affiliation with 1 and the opposite with 0. The benefits of corporate affiliation have been studied extensively, also specifically for the hospitality industry. Economic advantages such as geographic dispersion of units and economies of scale in the acquisition of commodities, production and labour utilization are the classical examples (Israeli, 2002). Additionally, there are subtler benefits, such as the ability to use the chain’s name as a symbol of quality (Schelling, 1960). Tourists will return to the same hotel chain in expectation of the same good service quality, and the brand name is an informal signal of this.

However, a brand depends heavily on brand recognition, individual taste and experience. Some tourists may simply not know certain hotel corporations or may not favour them because of poor experiences, whereas a standardized system is able to measure the hotel’s quality more objectively (Israeli, 2002). Therefore, prop_starrating has been chosen as a second variable to illustrate the quality of a hotel by its number of stars, from 1.0 to 5.0. To ease the comparison process for users, Expedia assigns the stars and explains its star classification system on its website (expedia.com, 2019). Together, both variables give a good impression of the hotel and therefore of the product’s quality.

5.3.2. Price

Former research shows that the hotel price is not only one of the main factors in the accommodation selection decision (Lockyer, 2005) but also an influence on perceived service quality (Lewis and Shoemaker, 1997) and consumer satisfaction (Voss et al., 1998). In general, pricing strategies are easy to adjust to changing environments and are therefore seen as flexible. The hotel industry, however, has a very inflexible product supply and is unable to change it overnight.

5.3.3. Promotion

Promotion is any kind of marketing communication to inform or persuade a customer about a product or service. The variable promotion_flag was chosen to control for the third P. This variable is binary and shows whether the user saw a sales promotion displayed specifically for one hotel offer. The user would see this indicated by an extra field in bright colours stating something similar to “Sale: Best Price”.

5.3.4. Place

As the hotel is the product, the place of the product is the hotel’s geographical location. The variable prop_location_score1 has been chosen to measure this. Expedia created this variable to measure location attractiveness with a score ranging from zero to seven. It reflects, among other things, which monuments or types of public transport are nearby. The dataset includes a second score, prop_location_score2; however, it has many missing values and the interpretation of its values is not clear, which is the reason for not including it.

5.4. Model Estimation

The author will compare four models in order to test and validate the hypotheses. All models attempt to estimate the probability of a hotel booking based on a number of explanatory predictors. The binary nature of the dependent variable leads to a logistic regression model. Its coefficients are determined by maximum likelihood estimation, which selects the values that most accurately predict the dependent variable Booking.
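How maximum likelihood determines the coefficients can be illustrated with a small example. The sketch below fits a logistic regression with one binary predictor by Newton-Raphson iterations. It is a simplified illustration and not the estimation routine used for this research; the function names and the toy data are hypothetical. With a single binary predictor, the maximum likelihood estimates can also be computed by hand from the group booking rates, which offers a simple check.

```python
import math


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def fit_logit(x, y, iterations: int = 25):
    """Maximum likelihood fit of P(y = 1) = sigmoid(a + b * x) by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        g_a = g_b = 0.0           # gradient of the log-likelihood
        h_aa = h_ab = h_bb = 0.0  # negative Hessian of the log-likelihood
        for xi, yi in zip(x, y):
            p = sigmoid(a + b * xi)
            w = p * (1.0 - p)
            g_a += yi - p
            g_b += (yi - p) * xi
            h_aa += w
            h_ab += w * xi
            h_bb += w * xi * xi
        det = h_aa * h_bb - h_ab * h_ab
        # Newton step: solve the 2x2 system (negative Hessian) * step = gradient.
        a += (h_bb * g_a - h_ab * g_b) / det
        b += (h_aa * g_b - h_ab * g_a) / det
    return a, b


# Toy data: 2 of 10 bookings when x = 0 and 5 of 10 bookings when x = 1.
x = [0] * 10 + [1] * 10
y = [1, 1] + [0] * 8 + [1] * 5 + [0] * 5

a, b = fit_logit(x, y)
print(f"intercept = {a:.4f}, slope = {b:.4f}")
```

In this toy example the intercept equals the log-odds of booking in the group with x = 0, and the slope equals the difference in log-odds between the two groups.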

not compare. By comparing these models, I will be able to show the differences between customer types and how their behaviour changes when comparing or not.

6. Results

This chapter is separated into four parts. Firstly, the mediator effects will be assessed. Secondly, the assumptions that need to be met to conduct a logistic regression will be examined. Thirdly, the model fit and accuracy of different models are tested to find the best predicting model for the final analysis. Fourthly, the results of the estimated models testing the author’s assumptions will be examined.

6.1. Assessing Mediator Effects

Mediation analysis examines the underlying structure of a relationship in which one variable (exposure/treatment X) influences another variable (outcome Y) through a third variable (mediator Z). This makes the mediator Z a variable on the causal pathway from X to Y. Baron and Kenny (1986) proposed the “causal steps approach” to identify mediation. According to their approach, four conditions must hold to establish mediator effects. The effects should not only be statistically significant, but also substantially different from zero. The assumed mediator model for this research is illustrated in figure 1 (Baron & Kenny, 1986).

1. The effect of the independent variables, search characteristics (X), such as length of stay, lead time and type of customer, on the dependent variable, Booking (Y), has to be significant. This indicates the effect c.

2. The effect of X on the potential mediator, competition (Z), has to be significant and shows the effect called a.

3. The effect of Z on Y has to be significant and illustrates the relationship b.

4. Including Z has to reduce the effect of X on Y; the remaining effect is the path called c’. This relationship can either be:

a. non-significant, which would show a full mediation.

the path c’ would only become visible when the mediator is controlled for. Therefore, testing the pathway a*b directly is more efficient. Another criticism is that Baron and Kenny’s method is not based on a quantification of the intervening effect but on separate tests of the paths a and b, which is not accurate enough. Moreover, it is the method with the lowest statistical power among methods for testing intervening variable effects (Hayes, 2009). Traditional mediation analysis approaches such as Baron and Kenny’s should therefore be replaced by newer methods; for example, causal mediation analysis in a counterfactual framework provides a better approach (Pearl, 2012, 2014). It offers a general, non-parametric calculation of mediation by estimating the direct effect (c’), the indirect effect (a*b) and the total effect (c = c’ + a*b). Furthermore, it improves understanding and allows for more and better estimation methods, enhancing validity, interpretation and use in a broader variety of models than the linear one, such as models with different types of variables and nonlinear effects, for example interactions and moderation.
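The decomposition c = c’ + a*b can be verified numerically for linear models. The sketch below estimates the paths a, b, c and c’ with ordinary least squares on a small hypothetical dataset; the function names and the data are assumptions made for illustration only. The identity is exact for linear regressions, whereas for nonlinear models such as the logistic regression it holds only approximately, which is one reason for using the counterfactual framework.

```python
def cov(u, v) -> float:
    """Sum of cross-products of deviations from the mean (unscaled covariance)."""
    mean_u, mean_v = sum(u) / len(u), sum(v) / len(v)
    return sum((ui - mean_u) * (vi - mean_v) for ui, vi in zip(u, v))


def mediation_paths(x, z, y) -> dict:
    """OLS estimates of the paths a (X->Z), b (Z->Y given X), c (X->Y) and c' (X->Y given Z)."""
    sxx, szz, sxz = cov(x, x), cov(z, z), cov(x, z)
    sxy, szy = cov(x, y), cov(z, y)
    det = sxx * szz - sxz ** 2
    return {
        "a": sxz / sxx,                            # slope of Z regressed on X
        "c": sxy / sxx,                            # slope of Y regressed on X (total effect)
        "c_prime": (sxy * szz - szy * sxz) / det,  # slope of X in Y ~ X + Z (direct effect)
        "b": (szy * sxx - sxy * sxz) / det,        # slope of Z in Y ~ X + Z
    }


# Hypothetical data: treatment x, mediator z, outcome y.
x = [0, 0, 0, 0, 1, 1, 1, 1]
z = [0, 1, 0, 1, 1, 1, 0, 1]
y = [1.0, 2.0, 0.5, 2.5, 3.0, 3.5, 1.5, 3.0]

paths = mediation_paths(x, z, y)
indirect = paths["a"] * paths["b"]
print(f"c = {paths['c']:.4f}, c' + a*b = {paths['c_prime'] + indirect:.4f}")
```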

The concept of counterfactuals provides a better definition of the causal effects involved. The average causal mediation effect (ACME), which is the natural indirect effect, is the expected change in Y (outcome) when Z (mediator) changes as it would if X (treatment) changed, while X itself is held constant; it shows the effect of X on Y through Z. The average causal direct effect (ADE), which is the natural direct effect, is the expected difference in Y when X changes but Z is kept constant. This captures all effects of X on Y other than through Z. The total causal effect is the expected change in the outcome as the treatment changes from zero to one. These effects are obtained by averaging over the levels of the mediator Z and the covariates, based on bootstrapping or other simulation methods (Küpers et al., 2015; VanderWeele & Robins, 2007).
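In potential-outcome notation, these three effects can be written compactly. The notation below follows the common convention in the causal mediation literature (for example Imai, Keele, & Yamamoto, 2010) and is added here as a reading aid; $Y(x, Z(x'))$ denotes the outcome that would be observed under treatment $x$ if the mediator took the value it would have under treatment $x'$.

$$\text{ACME}(x) = E\big[\,Y(x, Z(1)) - Y(x, Z(0))\,\big]$$

$$\text{ADE}(x) = E\big[\,Y(1, Z(x)) - Y(0, Z(x))\,\big]$$

$$\text{Total Effect} = E\big[\,Y(1, Z(1)) - Y(0, Z(0))\,\big] = \text{ACME}(x) + \text{ADE}(1 - x), \quad x \in \{0, 1\}$$

The last line shows that the total effect always splits into one indirect and one direct component, which is the counterfactual analogue of c = c’ + a*b.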

Table 2: Estimates of the Causal Mediation Analysis

| | ACME % | ACME (average) | ADE % | ADE (average) | Total Effect |
|---|---|---|---|---|---|
| Family | 2.40% | < -0.001*** | 102.41% | 0.003*** | 0.003*** |
| Individual | 5.40% | < 0.001*** | 94.60% | 0.003*** | 0.003*** |
| Advanced Planner | 2.20% | < -0.001*** | 102.027% | 0.003*** | 0.003*** |
| Long Stay | 1.07% | < -0.001*** | 99.12% | -0.011*** | -0.011*** |

∗ p <0.05; ∗∗ p <0.01; ∗∗∗ p <0.001

Table 2 illustrates the results of the causal mediation analysis and shows whether the types of customers influence the booking probability of a hotel room directly or indirectly, through competition. Of the estimated total increase in the probability of booking an accommodation due to the customer being a family, an estimated 2.40% is a result of families using cross-over comparison. This is a partial and negative mediation, which increases the direct effect instead of reducing it. A negative mediator is also called a suppressor and occurs when the direct and mediated effects show opposite signs (Mackinnon, Krull, & Lockwood, 2000). It was first defined by Horst (1941) as a variable which increases the predictive validity of another variable by its addition to a regression. The stated 102.41% is a combination of the direct and indirect effect; the direct effect originating from the predictor family equals 100%. This means that a family engaging in cross-over comparison on websites such as booking.com enhances the direct effect of the predictor variable family by 2.40%. This could support the old saying that competition is good for business.
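Why a suppressor pushes the share of the direct effect above 100% can be seen from a short calculation. The sketch below uses hypothetical effect sizes, because the table reports the average effects only in rounded form; the function name effect_shares is likewise an assumption. When the indirect effect has the opposite sign of the direct effect, its share of the total effect is negative and the share of the direct effect exceeds 100%.

```python
def effect_shares(acme: float, ade: float) -> dict:
    """Express the indirect (ACME) and direct (ADE) effect as shares of the total effect."""
    total = acme + ade
    return {
        "total": total,
        "acme_share": acme / total,
        "ade_share": ade / total,
    }


# Hypothetical effects: a small negative indirect effect and a positive direct effect.
shares = effect_shares(acme=-0.00007, ade=0.00307)
print(f"ACME share: {shares['acme_share']:.2%}, ADE share: {shares['ade_share']:.2%}")
```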

From the estimated total effect increase in the probability of booking an accommodation due to the type of customer being an individual, an estimated 5.40% is a result of individuals comparing offerings with other websites. The remaining 94.60% is from the search characteristic individual itself. This shows a partial mediation and positive mediator effect, reducing the direct effect. 5.40% of the entire booking probability of a hotel room done by an individual is mediated by cross-over comparison. Competition has the greatest mediator effect on the variable individual compared to the other predictors.

estimated 2.20% is a result of cross-over comparison generated by these users. Moreover, it is a partial suppressor increasing the direct effect. This means that cross-over comparison enhances the booking probability of a hotel room of advanced planners.

Of the estimated total effect on the probability of booking an accommodation due to customers seeking long stays, an estimated 1.07% is a result of cross-over comparison. This is another observed partial mediation. The mediator reduces the direct effect to 99.12%, which originates solely from the predictor Long Stay.

6.2. Model Validation

Several validity tests are performed on the model to check whether the requirements for a logistic regression are violated, which could bias the results. The necessary assumptions for a logistic regression are the following. First, the dependent variable needs to be binary, meaning that it takes only two values, either 0 or 1. This assumption is met, as the dependent variable of this research, Booking, indicates whether the customer booked on the Expedia website or not.

The second assumption is the absence of multicollinearity, that is, the predictors should not be highly correlated with each other. In general, this is an important issue in regression analysis and can be fixed by removing the identified variables. All variance inflation factor (VIF) scores are below 4, which indicates that no multicollinearity has been detected. As a rule of thumb, VIF scores exceeding 5 indicate moderate multicollinearity, while scores above 10 indicate a collinearity problem for which steps need to be taken.

Table 3: VIF Scores

| Variable | VIF |
|---|---|
| Price | 1.224855 |
| Place | 1.188401 |
| Product Brand | 1.061676 |
| Product Star | 1.369130 |
| Promotion | 1.116570 |
| Long Stay | 1.101862 |
| Advanced Planner | 1.736474 |
| CYN | 1.644274 |
| Family | 1.092581 |
| Individual | 1.103391 |
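For readers unfamiliar with the measure, the following sketch shows one way to compute VIF scores. It relies on the fact that the VIF of predictor j equals the j-th diagonal element of the inverse of the predictors’ correlation matrix, which is equivalent to 1 / (1 - R²) from regressing predictor j on all other predictors. The function names and the toy data are hypothetical; the scores in the table above were not produced with this code.

```python
import math


def correlation_matrix(columns):
    """Pearson correlation matrix of a list of equally long predictor columns."""
    n = len(columns[0])
    centred = [[v - sum(c) / n for v in c] for c in columns]
    norms = [math.sqrt(sum(v * v for v in c)) for c in centred]
    return [
        [sum(a * b for a, b in zip(ci, cj)) / (ni * nj) for cj, nj in zip(centred, norms)]
        for ci, ni in zip(centred, norms)
    ]


def invert(matrix):
    """Gauss-Jordan inversion with partial pivoting for a small square matrix."""
    n = len(matrix)
    aug = [row[:] + [float(i == j) for j in range(n)] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [vr - factor * vc for vr, vc in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def vif_scores(columns):
    """VIF of each predictor: diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[j][j] for j in range(len(columns))]


# Two toy predictors with a correlation of 0.8.
print(vif_scores([[1, 2, 3, 4], [1, 3, 2, 4]]))
```

Uncorrelated predictors yield a VIF of 1, and the score grows as a predictor becomes better explained by the others.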

6.3. Model Fit and Accuracy

McFadden Pseudo R2, Receiver operating characteristics curve (ROC), confusion matrix and differences of null and residual deviance.

The BIC is a common criterion for model selection and is especially recommended for large datasets. Therefore, the author selected it over the Akaike information criterion (AIC), which does not penalise additional parameters as heavily as the BIC. The lower the BIC, the better the model. As the logistic regression model is fitted by maximum likelihood, McFadden’s Pseudo-R2 is also based on the log-likelihood and is often used to judge the prediction power of logistic regressions. A high R2 value indicates that the model has a great likelihood; the maximum is 1 (McFadden & Zarembka, 1973). The ROC curve plots the true positive rate against the false positive rate over all classification thresholds. The curve of a good model should rise steeply; the greater the area under the ROC curve, the better the predictive power of the model. The confusion matrix is commonly used to assess classification models, and many measures indicating the prediction power can be calculated from it, such as accuracy, true positive rate (TPR) and true negative rate (TNR). Accuracy is the overall share of correct predictions, the TPR is the share of actual positives that are correctly predicted and the TNR is the share of actual negatives that are correctly predicted. For all three measures, the higher the value, the better the prediction power of the model. The null deviance shows how well the response is predicted by the null model, which contains only the intercept. The residual deviance shows how well the response is predicted by the model including the independent variables. For both measurements, a lower value indicates a better fit. Therefore, the greater the reduction from the null deviance to the residual deviance, the better the prediction power of the model. The misclassification error shows the percentage of mismatches between predicted and observed values, regardless of events or non-events in the variable. The lower the misclassification error, the better the prediction power of the model.
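The measures described above can be computed from the observed outcomes and the predicted probabilities of a fitted model. The sketch below uses the standard definitions (BIC = k ln(n) - 2 LL, deviance = -2 LL, McFadden’s Pseudo-R2 = 1 - LL_model / LL_null); the function names, the classification threshold of 0.5 and the toy data are assumptions for illustration and do not reproduce the values reported for the four models.

```python
import math


def log_likelihood(y, p) -> float:
    """Bernoulli log-likelihood of outcomes y under predicted probabilities p."""
    return sum(math.log(pi if yi == 1 else 1.0 - pi) for yi, pi in zip(y, p))


def fit_measures(y, p, n_params: int, threshold: float = 0.5) -> dict:
    n = len(y)
    ll_model = log_likelihood(y, p)
    # The null model predicts the overall event rate for every observation.
    ll_null = log_likelihood(y, [sum(y) / n] * n)
    predicted = [int(pi >= threshold) for pi in p]
    tp = sum(1 for yi, hi in zip(y, predicted) if yi == 1 and hi == 1)
    tn = sum(1 for yi, hi in zip(y, predicted) if yi == 0 and hi == 0)
    fp = sum(1 for yi, hi in zip(y, predicted) if yi == 0 and hi == 1)
    fn = sum(1 for yi, hi in zip(y, predicted) if yi == 1 and hi == 0)
    return {
        "bic": n_params * math.log(n) - 2.0 * ll_model,
        "deviance_difference": (-2.0 * ll_null) - (-2.0 * ll_model),
        "mcfadden_r2": 1.0 - ll_model / ll_null,
        "accuracy": (tp + tn) / n,
        "tpr": tp / (tp + fn),
        "tnr": tn / (tn + fp),
        "misclassification": (fp + fn) / n,
    }


# Toy outcomes and hypothetical predicted probabilities.
y = [1, 0, 0, 1, 0, 0, 0, 1]
p = [0.8, 0.2, 0.3, 0.6, 0.1, 0.4, 0.7, 0.4]

measures = fit_measures(y, p, n_params=2)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

On heavily unbalanced data, accuracy and TNR are dominated by the non-events, so the TPR and the area under the ROC curve are more informative about how well bookings are recognised.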

in model 1 plus competition, indicating whether the user compared with other websites while searching on Expedia or not. Model 3 includes the search characteristics, competition and the four covariates controlling for the four Ps: place, product, price and promotion. Model 4 includes the same variables as model 3 plus the possible interactions of the four search characteristics.

Table 4: Model Fit Comparison

| | Model 1 | Model 2 | Model 3 | Model 4 |
|---|---|---|---|---|
| BIC | 736,518.5 | 736,364.5 | 721,944.6 | 721,773.3 |
| Deviance Difference | 2,689 | 2,858 | 17,352 | 17,598 |
| McFadden Pseudo-R2 | 0.004 | 0.004 | 0.024 | 0.024 |
| ROC | 51.62% | 51.62% | 61.15% | 61.22% |
| Accuracy Rate | 96.59% | 97.15% | 97.18% | 97.18% |
| TPR | 3.95% | 4.21% | 13.33% | 8.48% |
| TNR | 97.19% | 97.19% | 97.19% | 97.19% |
| Misclassification Error | 3.41% | 2.85% | 2.82% | 2.82% |

Table 4 gives an overview of all model fit and accuracy measurements for the four models. Models 3 and 4 have the highest prediction power of the four. Five out of eight measurements of Model 3 take very good values, and its TPR in particular is the highest of all models. This model will be used to analyse the direct effects of type of customer, length of stay and lead time on the probability of booking. For Model 4, four of the eight model fit measurements obtain the best values compared to the other models. Therefore, Model 4 will be used to investigate the interaction effects.

6.4. Models Interpretation

Table 5: Model 5 – Parameter Estimates of Cross-Over Comparison Main Effects

| | Estimate | Marginal Effects | Odds Ratio | P-Value |
|---|---|---|---|---|
| Intercept | -4.637 | | 0.001 | 0.000*** |
| Place | -0.052 | -0.001 | 0.949 | 0.000*** |
| Price | -0.438 | -0.011 | 0.645 | 0.000*** |
| Product Brand | 0.177 | 0.004 | 1.193 | 0.000*** |
| Product Star | 0.327 | 0.008 | 1.387 | 0.000*** |
| Promotion | 0.378 | 0.011 | 1.459 | 0.000*** |
| Family | 0.154 | 0.004 | 1.167 | 0.000*** |
| Individual | 0.087 | 0.002 | 1.091 | 0.000*** |
| Advanced Planner | -0.320 | -0.006 | 0.774 | 0.000*** |
| Spontaneous Planner | 0.173 | 0.004 | 1.189 | 0.000*** |
| Long Stay | -0.257 | -0.007 | 0.726 | 0.000*** |
| Short Stay | 0.226 | 0.005 | 1.253 | 0.000*** |

∗ p <0.05; ∗∗ p <0.01; ∗∗∗ p <0.001

In general, odds ratios demonstrate a positive relationship if the values are greater than 1. Values of 1 show no relationship and values smaller than 1 imply a negative relationship. Odds ratios are simply the exponentials of the estimates, and marginal effects and odds ratios illustrate the same results in a different form. Therefore, the following parts will outline the odds ratios of the predictors; a detailed interpretation combined with managerial insights will finalise this thesis in part seven.
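The link between estimates and odds ratios can be reproduced with a few lines of code. The sketch below exponentiates the estimates of the Price, Promotion and Place rows of Table 5 and also expresses each odds ratio as a percentage change in the odds; the function name odds_ratio and the output format are assumptions made for illustration.

```python
import math

# Estimates taken from the Price, Promotion and Place rows of Table 5.
estimates = {"Price": -0.438, "Promotion": 0.378, "Place": -0.052}


def odds_ratio(beta: float) -> float:
    """Odds ratio of a logistic regression coefficient."""
    return math.exp(beta)


for name, beta in estimates.items():
    ratio = odds_ratio(beta)
    change = (ratio - 1.0) * 100.0  # percentage change in the odds per unit increase
    print(f"{name}: odds ratio {ratio:.3f}, odds change {change:+.1f}%")
```

An estimate of zero gives an odds ratio of exactly 1, which is why 1 and not 0 marks the absence of a relationship on the odds ratio scale.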

The results of Model 5 show that all variables are statistically significant and hence influence the dependent variable. Notably, families and individuals have a positive effect, whereas advanced planners and people with a long hotel stay have a negative effect on purchase probability. For users who compare several websites simultaneously, the predictor Advanced Planner has the biggest effect on booking probability, followed by Long Stay and Family; Individual has the lowest impact on purchase probability.

Hypothesis 1 is supported, as families have a stronger impact on purchase probability than individuals in the case of cross-over comparison. Hypothesis 2 is also supported, as people who intend to stay long in a hotel room have a stronger impact on booking probability than users who are looking for a short stay in the case of cross-over comparison. Hypothesis 3 is supported as well, as advanced planners have a stronger impact on purchase probability than spontaneous planners in the case of cross-over comparison.

The control variables, the marketing mix, also show an interesting picture. While visitors are conducting cross-over comparison, price has the strongest impact of all variables on purchase probability and, as expected, its influence is negative. Promotion has the second biggest impact on booking probability, followed by star rating and brand; place has the weakest effect on booking. This ranking was expected.

Table 6: Model 6 – Parameter Estimates of Cross-Over Comparison Interaction Effects

| | Estimate | Marginal Effects | Odds Ratio | P-Value |
|---|---|---|---|---|
| Advanced Planner*Long Stay | -0.156 | < -0.001 | 0.855 | 0.000*** |
| Family*Advanced Planner | 0.056 | < 0.001 | 1.058 | 0.005* |
| Family*Long Stay | -0.048 | < -0.001 | 0.953 | 0.23674 |
| Individual*Advanced Planner | 0.005 | < 0.001 | 1.005 | 0.830 |
| Individual*Long Stay | 0.127 | 0.003 | 1.135 | 0.003** |
| Short Stay*Spontaneous Planner | -0.112 | | 0.894 | 0.015* |

The interactions between Individual and Advanced Planner and between Family and Long Stay are the only interactions which are insignificant. The combination of planning a long stay in advance has the greatest impact on booking probability; however, the relation is negative and therefore decreases the probability. An individual looking for a hotel room for a long stay has the second biggest influence and increases the booking probability. The combination of a family planning ahead has the weakest impact and is less significant than the other two interactions; however, its influence is positive.

Hypothesis 4 cannot be interpreted, as the interaction of individuals planning a long trip is insignificant. Similarly, hypothesis 5 cannot be interpreted, as the interaction of individuals planning in advance is insignificant. Hypothesis 6 is supported, as the interaction of a long stay planned in advance has a stronger impact on booking probability than a spontaneous short trip in the case of cross-over comparison.

Table 7: Model 7 – Parameter Estimates of No Cross-Over Comparison Main Effects

| | Estimate | Marginal Effects | Odds Ratio | P-Value |
|---|---|---|---|---|
| Intercept | -3.914 | | 0.020 | 0.000*** |
| Place | -0.069 | -0.002 | 0.934 | 0.000*** |
| Price | -0.288 | -0.007 | 0.750 | 0.000*** |
| Product Brand | 0.123 | 0.003 | 1.131 | 0.000*** |
| Product Star | 0.210 | 0.005 | 1.234 | 0.000*** |
| Promotion | 0.371 | 0.010 | 1.448 | 0.000*** |
| Family | 0.168 | 0.004 | 1.183 | 0.000*** |
| Individual | 0.100 | 0.002 | 1.106 | 0.000*** |
| Advanced Planner | -0.375 | -0.010 | 0.513 | 0.000*** |
| Spontaneous Planner | 0.126 | | 1.134 | 0.200 |
| Long Stay | -0.667 | -0.013 | 0.687 | 0.000*** |
| Short Stay | 0.395 | | 1.484 | |

∗ p <0.05; ∗∗ p <0.01; ∗∗∗ p <0.001

customer units families and individuals. The customer who seeks rooms for long stays has the biggest impact. The odds ratio of 0.687 indicates a negative relationship with the outcome: the odds of booking are 0.687 times the odds for customers looking for a short stay at a hotel. The advanced planner has the second biggest impact on booking probability, with an odds ratio of 0.513, showing a negative influence on purchase probability: the odds of booking are 0.513 times the odds for customers looking for a hotel room for a spontaneous trip. The predictor family has the third largest impact on booking probability; the odds of booking are 1.183 times as large as the odds for other types of customers, such as individuals or groups. An individual has the weakest impact on booking probability. The odds of booking are 1.106 times as large as the odds for other types of customers, such as families or groups.

It is interesting to point out that promotion has a stronger impact on booking probability than price. This is not the case when people compare with other websites. However, the number of stars always has the third strongest impact on booking probability, whether people conduct cross-over comparison or not.

Table 8: Model 8 – Parameter Estimates of No Cross-Over Comparison Interaction Effects

| | Estimate | Marginal Effects | Odds Ratio | P-Value |
|---|---|---|---|---|
| Advanced Planner*Long Stay | -0.560 | -0.011 | 0.571 | 0.000*** |
| Spontaneous Planner*Short Stay | -0.281 | | | 0.015* |
| Family*Advanced Planner | -0.001 | < -0.001 | 0.999 | 0.992 |
| Family*Long Stay | 0.002 | < 0.001 | 1.002 | 0.968 |
| Individual*Advanced Planner | -0.104 | -0.002 | 0.902 | 0.077 |
| Individual*Long Stay | 0.283 | 0.007 | 1.327 | 0.000*** |

∗ p <0.05; ∗∗ p <0.01; ∗∗∗ p <0.001

7. Discussion

7.1. Managerial Insights

In summary, the most important contribution of this thesis is that price has the strongest impact on purchase probability while users compare with other websites. However, it has also been shown that promotion has the greatest influence on booking probability if visitors did not cross-over shop. This can be a great opportunity to visually highlight the price or reduce it when the website tracks that the user is comparing with other websites. Alternatively, if the user is not visiting other pages at the same time, promotions about the number of stars of the hotel or the attractiveness of the hotel location could be flagged, as these have a bigger impact on purchase probability. The four Ps could be adjusted to the type of customer, depending on whether they compare with other websites or not.

Most of the hypotheses have been supported. This is a good reason to manage the entire customer experience and focus on customer-centric marketing, as some of this information increases booking probability. One marketing option could be to offer specific package deals to each type of customer after recognising which type of user has entered the website. This would already be visible after the input of the search characteristics, such as number of visitors, length of stay and date of visit. This information could be used to morph the website towards the needs and desires of each type of customer (Hauser, 2009). From the information entered by the user alone, the website would know whether a family or a single individual is looking for accommodation. This also holds for the length of stay and the lead time. The more information a website has about the customer, the better it can tailor the query results to them. Combined with the ability of digital platforms to constantly run different versions or tests inexpensively, this opens opportunities for a large volume of experimentation. These techniques should be used to learn the best way to satisfy customers and reduce their search costs.

7.2. Limitations

Many variables, such as country_id, have numeric values although they are actually categorical. Another example is destination_id. As mentioned above, these variables have been anonymized; their actual values, such as country names, would make a very interesting analysis and would enrich information and predictions about booking behaviour even more. Unfortunately, this limited my research and analyses considerably, as great knowledge could have been retrieved from this information. For example, the most and least frequent route patterns could have been analysed. Further variables of this kind are Site_id, visitor location country id, prop country id, prop id and srch_destination id.

Another limitation in the competition analysis was that the dataset gives no indication of whether customers booked with the competitor; it only shows whether the user participated in cross-over comparison and whether visitors booked with Expedia or not.

Unfortunately, the observations are only provided at a search impression level without the possibility to link them to customers. This is a great limitation, as more information could be retrieved if various search queries could be linked to each other. Sequential searches would be easier to analyse, and user behaviour could be analysed on a more detailed level as well. The three variables describing the direct competition of Expedia do not include a lot of information: solely whether users compared or not, whether the hotel is available on the other website and the price difference between Expedia and the rival. Marketing mix information for the competitor similar to that available in this dataset would give a researcher more possibilities to analyse the reasons for users to book with Expedia or the competitor.

7.3. Recommendations for Future Research

8. Bibliography

Agarwal, R. and J. Prasad. “Are Individual Differences Germane to the Acceptance of New Information Technology?,” Decision Sciences, Vol.30, No.2:361-391, 1999.

Baron, R. M., & Kenny, D. A. (1986). The Moderator-Mediator Variable Distinction in Social Psychological Research: Conceptual, Strategic, and Statistical Considerations. Journal of Personality and Social Psychology, 51(6), 1173–1182.

Beldona, S., Morrison, A. M., & O’Leary, J. (2005). Online shopping motivations and pleasure travel products: A correspondence analysis. Tourism Management, 26(4), 561–570.

Bettman, J. R., Luce, M. F., & Payne, J. W. (1998). Constructive Consumer Choice Processes. Journal of Consumer Research, 25(3), 187–217.

Card, J. A., Chen, C. Y., & Cole, S. T. (2003). Online travel products shopping: Differences between shoppers and nonshoppers. Journal of Travel Research, 42(2), 133–139.

https://www.expedia.com/Hotel-Star-Rating-Information, retrieved on 28th May 2019

Hagiu, A., & Jullien, B. (n.d.). Why do intermediaries divert search? RAND Journal of Economics, 42(2), 337–362.

Hampson L., (2018) This is how much the average family holiday costs. EveningStandard. Retrieved from https://www.standard.co.uk/lifestyle/travel/average-cost-family-holiday-a3907191.html on 8th of May, 2019 at 00:33

Häubl, G. and V. Trifts. “Consumer Decision Making in Online Shopping Environments: The Effects of Interactive Decision Aids,” Marketing Science, Vol.19, No.1:4-21, 2000.

Hauser, John R. et al. “Website Morphing.” Marketing Science, 28.2 (2009): 202-223.

Hayes, A. F. (2009). Beyond Baron and Kenny: Statistical Mediation Analysis in the New Millennium. Communication Monographs, 76(4), 408–420.

Imai, K., Keele, L., & Yamamoto, T. (2010). Identification, Inference and Sensitivity Analysis for Causal Mediation Effects. Statistical Science, 25(1), 51–71.

performance of hotels in Israel. International Journal of Hospitality Management, 21(4), 405–424.

Iyer, R. and J.K. Eastman. “The Elderly and Their Attitudes Toward the Internet: The Impact on Internet use, purchase and comparison shopping,” Journal of Marketing Theory & Practice, Vol. 14, No. 1: 57-67, 2006.

Jun, S. H., Vogt, C. A., & MacKay, K. J. (2010). Online Information Search Strategies: A Focus on Flights and Accommodations. Journal of Travel and Tourism Marketing, 27(6), 579–595.

Kamis, A. “Search strategies in shopping engines: An experimental investigation”, International Journal of Electronic Commerce, Vol.11, No.1: 63-84, 2006

Klassen, M., P. Gupta, and M.P. Bunker. “Comparison shopping on the Internet”, International Journal of Business Information Systems, Vol. 4, No.5: 564-580, 2009

Kotler, P., & Keller, K. L. (2012). Marketing Management (14th ed.). Upper Saddle River, NJ: Prentice Hall.

Küpers, L. K., Xu, X., Jankipersadsing, S. A., Vaez, A., La Bastide-van Gemert, S., Scholtens, S., … Snieder, H. (2015). DNA methylation mediates the effect of maternal smoking during pregnancy on birthweight of the offspring. International Journal of Epidemiology, 44(4), 1224–1237.

Lackman, C., & Lanasa, J. M. (1993). Family decision‐making theory: An overview and assessment. Psychology & Marketing, 10(2), 81–93.

Langett, J. (2018). Always Be Converting: Moralizing a Postpurchase Funnel Media Environment. Journal of Media Ethics: Exploring Questions of Media Morality, 33(4), 156–169.

Langford, G., & Weissenberg, A. (2018). Deloitte: 2018 Travel and Hospitality Industry Outlook. Retrieved from www.deloitte.com/us/travel-hospitality-trends

Lecinski, J. (2011). ZMOT: Winning the Zero Moment Of Truth. In Google. Retrieved from https://www.thinkwithgoogle.com/marketing-resources/micro-moments/2011-winning-zmot-ebook/

Lee, Y., K.A. Kozar, and K.R.T. Larson. “The Technology Acceptance Model: Past, Present and Future,” Communications of the Association for Information Systems, Vol. 12: 752-780, 2003.

Lewis, R.C., Shoemaker, S., 1997. Price-sensitivity measurement: a tool for the hospitality industry. Cornell Hotel and Restaurant Administration Quarterly 38 (2), 44–54.

Lockyer, T. (2005). The perceived importance of price as one hotel selection dimension. Tourism Management, 26(4), 529–537.

Lockyer, T., & Roberts, L. (2009). Motel accommodation: Trigger points to guest accommodation selection. International Journal of Contemporary Hospitality Management, 21(1), 24–37.

Lukes S. M. Professor of Sociology, New York University. Author of Individualism and others at www.britannica.com (1998) https://www.britannica.com/topic/individualism

Lurie, N. H. (2004). Decision Making in Information-Rich Environments: The Role of Information Structure. Journal of Consumer Research, 30(4), 473–486.

Lynch, J. G., & Ariely, D. (2003). Wine Online: Search Costs Affect Competition on Price, Quality, and Distribution. Marketing Science, 19(1), 83–103.

Mackinnon, D. P., Krull, J. L., & Lockwood, C. M. (2000). Equivalence of the Mediation, Confounding and Suppression Effect. Prevention Science, 1(4).

Martínez-Salinas, E., & Polo-Redondo, Y. (1999). Determining factors in family purchasing behaviour: An empirical investigation. Journal of Consumer Marketing, 16(5), 461–481.

Marmorstein, H., D.Grewal, and R.P.H. Fishe. “The Value of Time Spent in Price-Comparison Shopping: Survey and Experimental Evidence,” The Journal of Consumer Research, Vol.19, No.1:52-61, 1992.

McFadden, D., & Zarembka, P. (1973). Chapter Four: Conditional logit analysis of qualitative choice behavior. In Frontiers in Econometrics (pp. 105–142). New York: Academic Press.