Are satisfaction patterns better explained by using classic industry characteristics? A different view on sentiment and topic analysis in hotel reviews

Author: Berend Dumas
Student number: 11422718
Date of submission: 22.06.2017
Version: Final
Track: MSc. in Business Administration - Digital Business
University: University of Amsterdam - Amsterdam Business School
Supervisor: Konstantina Valogianni

Statement of originality

This document is written by Berend Dumas who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Acknowledgments

I want to address some special words here to my supervisor Konstantina Valogianni, who supported me a lot and helped me to understand the data-driven world of text analytics better and who sharpened my mind on many aspects. Furthermore, I would also like to thank the strategy team of the NH Hotel Group Benelux for their valuable industry insights. Also, I want to thank my friends for the great meals we had together, and their input from time to time. Finally, I would like to thank my girlfriend and family, who always supported me unconditionally.

Abstract

Data of all sorts is expanding rapidly. Business intelligence departments are making more and more use of big data to make strategic business decisions. A specific form of this data is User Generated Content (UGC), such as consumer generated online product reviews. For the hotel industry specifically, consumer reviews contain a lot of useful information about consumer preferences that could help to better predict demand and to allocate resources more efficiently. The hotel industry must deal with seasonality and fluctuating periods of demand. Consequently, over- or under-utilization of hotel resources can occur, which could impact guest satisfaction. Therefore, it is important to know which specific factors are most important for consumers in different demand periods, and how this varies between different hotel sizes and hotel segments. This paper uses text analytics to analyse hotel reviews on a large scale, and takes the abovementioned industry variables into account. The results of the analysis show that the sentiment in reviews indeed varies between demand periods, high and low season to be specific. This effect is also influenced by hotel size and segment. Furthermore, this paper shows that the distribution of topics in reviews and its influence on the sentiment differ between high and low season, and that this effect is influenced by hotel size and segment as well. Unlike previous research, this paper shows that valuable insights from consumer reviews, such as the sentiment and the distribution of topics, differs

Table of Contents

Statement of Originality
Acknowledgments
Abstract
Table of Contents
Overview of Figures and Tables
1. Introduction
2. Literature Review
2.1 Big Data and Business Intelligence
2.2 User Generated Content
2.3 Text Analytics
2.4 Seasonality in the Hotel Industry
2.5 Research Question
2.6 Hypotheses Development
3. Data Collection and Method
3.1 Data Collection
3.2 Data Cleaning
3.3 Corpus and DTM
3.4 Sentiment Analysis
3.5 Topic Modelling - Latent Dirichlet Allocation (LDA)
3.6 Framing seasonality
3.7 Framing hotel size
3.7 Segmentation of hotels
4. Analyses and Results
4.1 Descriptive statistics
4.2 Hypothesis testing
4.2.1 Hypothesis 1 - Difference in sentiment score between high and low season
4.2.2 Hypothesis 2 - Moderating effect of hotel size
4.2.3 Hypothesis 3 - Effect of topic distribution on sentiment score
4.2.4 Hypothesis 4 - Moderating effect of season
4.2.5 Hypothesis 5 - Moderating effect of hotel segment
4.3 Limitations
5. Discussion
6. Implications
6.1 Theoretical implications
6.2 Practical implications
8. References
9. Appendices
9.1 Appendix A - Data mining code
9.2 Appendix B - Cleaning and preparation code
9.3 Appendix C - Sentiment analysis code
9.4 Appendix D - LDA code
9.5 Appendix E - Residual plots
9.6 Appendix F - Complete hotel list
9.7 Appendix G - Transcript expert interview - NH Hotel Group
9.8 Appendix H - Figures

Overview of Figures and Tables

Figures

Figure 1: Conceptual model visualising H1:H2.
Figure 2: Conceptual model visualising H3:H5.
Figure 3: Part of the DocumentTermMatrix (DTM) of the dataset for this research.
Figure 4: Top 25 occurring words in hotel reviews in High Season.
Figure 5: Output after running Latent Dirichlet Allocation (LDA) model.
Figure 6: Graph showing the three clusters of number of hotel rooms. The two dotted blue lines represent the cut off points, 250 and 500 rooms respectively.
Figure 7: Boxplots showing difference in sentiment score between High and Low Season.
Figure 8: Boxplots showing sentiment score per hotel size.
Figure 9: Interaction plot showing the interaction effect of (season * size) on sentiment score.
Figure 10: Multiple regressions showing correlations between each topic and sentiment score in reviews.
Figure 11: Graph showing the difference of the effect of the topic Facilities on sentiment score between High and Low Season.
Figure 12: Graph showing the difference of the effect of the topic Location on sentiment score between High and Low Season.
Figure 13: Boxplot showing the difference in sentiment score for different hotel segments.
Figure 14: Graph showing the difference of the effect of the topic Service on sentiment score between segments.
Figure 15: Graph showing the difference of the effect of the topic Facilities on sentiment score between segments.
Figure 16: Graph showing the difference of the effect of the topic Comfort on sentiment score between segments.
Figure 17: Graph showing the difference of the effect of the topic Location on sentiment score between segments.
Figure 18: Graph showing the difference of the effect of the topic Experience on sentiment score between segments.

Tables

Table 1: Summary of variables.
Table 2: Descriptive statistics one way ANOVA.
Table 3: Summary statistics of factorial ANOVA.
Table 4: Results of multiple linear regression analysis.
Table 5: Descriptive statistics one way ANOVA.
Table 6: Hypotheses results overview.

1. Introduction

The need for data-driven decisions in businesses is increasing. All sorts of data are generated at a rapid pace in almost all industries. This data is highly accessible for every company (McAfee, Brynjolfsson, Davenport, Patil, & Barton, 2012). Translating all this data into useful business information is the next step. For many companies this is still an extremely difficult task. This is where the area of big data analytics and business intelligence comes into play. This area is dynamic and rapidly growing, and is evolving into an important area of study for both managers in the field and academic researchers. As the Chief Economist at Google, Hal Varian (2008) states:

So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis. So my recommendation is to take lots of courses about how to manipulate and analyse data: databases, machine learning, econometrics, statistics, visualisation, and so on.

Although we are currently already on the verge of Web 4.0, it is Web 2.0 that grew significantly in the past couple of years, and which led to a huge increase of a specific stream of data: user-generated content (UGC) (Aghaei, Nematbakhsh, & Farsani, 2012). As a consequence of Web 2.0, users can now post, share, and access content on the web much more easily than ever before. When consumers shop online they rely heavily on UGC, and on online consumer reviews in particular, prior to making a purchase (Ghose, Ipeirotis, & Li, 2012). Some studies state that UGC may be seen as a new form of word-of-mouth (WOM). The importance of WOM for business has been widely discussed and researched (Mauri & Minazzi, 2013; Ye, Law, Gu, & Chen, 2011).

It is important for managers and product vendors to know more about the impact online consumer reviews can have, as prior research showed that online consumer reviews have a big influence on sales and can even predict sales (Chevalier & Mayzlin, 2006; Dellarocas, 2003; Liu, 2006; Zhu & Zhang, 2010). Furthermore, tracking online UGC is important for managers with regards to brand building, quality assurance and product development (Dellarocas, 2003).

Most review systems, however, are based on a single scalar rating for a product, largely ignoring multidimensional product preferences of consumers (Ghose et al., 2012; Rosen, 1974). For product vendors, it is important to know which specific product features are most important to consumers and focusing on a single review dimension (such as the rating) may thus not convey a lot of information on specific consumer preferences (Ghose, Ipeirotis, & Li, 2009).

Hence, analysing reviewers’ content is of major importance for managers. To extract useful information from a large, consumer-generated “hurricane of noise”, text-mining and content analysis techniques are required. Sentiment analysis, which classifies reviews into positive or negative by analysing the sentiment consumers put in their writing, is a well-known technique. This technique does not, however, reveal much information about specific product features that matter to consumers and thus also matter to managers selling the products (Archak, Ghose, & Ipeirotis, 2011). This problem can be solved by decomposing text reviews into several segments, with each segment describing a different product feature or topic, and analysing the sentiment consumers express about each feature. This reveals the relative preferences of consumers for different product characteristics (Archak et al., 2011).

One of the industries in which online consumer reviews play a major role is the travel industry. Hundreds of millions of travellers consult online reviews when making travel decisions. More specifically, the percentage of consumers that rely on UGC prior to making a purchase is highest with hotels, compared to all other product categories (Lipsman 2007, as cited in Ghose et al., 2012). Considering the high influence of online reviews in the hotel industry, this industry will be the main focus of this paper.

Prior research examined online hotel reviews and specific product features described in these reviews and their influence on online hotel sales (Ghose et al., 2009; Ye, Law, & Gu, 2009). However, prior research has not yet thoroughly examined how seasonality in the hotel industry, high and low season to be more specific, influences the sentiment and topics in reviews (Ghose et al., 2012). It is important to study this aspect as well because hotels have perishable products: the ability to sell hotel room nights for a specific night expires when that night has passed. Next to this, demand fluctuates significantly between high and low season. For these reasons, forecasting this demand accurately and then selling products (hotel room nights) at the right price and the right time is necessary to survive in the hotel industry (Chiang, Chen, & Xu, 2006). This practice may thus be enhanced by improved knowledge about the content and sentiment in reviews in different demand periods.

Therefore, in order to tackle this challenge, this paper will examine how the sentiment in consumer generated hotel reviews differs between high and low season. This may create useful insights for the hotel industry: in high season managers may make different strategic decisions and allocate resources differently compared to low season when it comes to important hotel characteristics, such as size and segment, identified through this paper. Furthermore, this paper also assesses how these characteristics influence the relationship between seasonality and the sentiment score in reviews.

This thesis paper starts with a description of general developments and the importance of big data (analytics) and business intelligence. The second part contains a more in-depth discussion about UGC and the impact this has on business. Next is a reflection on text mining techniques and UGC analytics. Having presented the relevant theoretical background, the research question and hypotheses are presented, after which the data collection and cleaning method are explained. Then, the results of hypothesis testing are presented, after which the paper concludes with a discussion of the results, implications and a conclusion.

2. Literature Review

2.1 Big Data and Business Intelligence

A survey conducted by Bloomberg BusinessWeek (2011) showed that 97 percent of the companies surveyed with revenues over 100 million dollars use some form of business intelligence and analytics. Another study, conducted by the McKinsey Global Institute (Manyika et al., as cited in H. Chen, Chiang, & Storey, 2012), predicts a huge shortage, by 2018, of analytically skilled employees and managers who can handle big data and leverage it to make effective business decisions. The increased interest, opportunities and challenges associated with big data and business intelligence, in all sorts of industries and markets, make an interesting case for studying this subject.

The main goal of big data analytics is to generate new insights by making use of large sets of data, complementing traditional statistics and static data sources. Big data differs from traditional (smaller amounts of) data in that it consists of much larger datasets, with a lot of unstructured data, that often need real-time analysis and cannot be managed by traditional IT tools (M. Chen, Mao, & Liu, 2014). The area of business intelligence exploits big data in order to understand more thoroughly, and gain new insights into, products, customers, markets, competitors and (other) strategic stakeholders, and to solve business problems or discover new ones (Xiang, Schwartz, Gerdes, & Uysal, 2015).

Recently, much attention has been devoted to mining social media and consumer-generated content, and applying so-called content analytics, as these data sources can uncover previously unknown patterns and information. This kind of data is growing at almost twice the rate of structured, conventional databases (George, Haas, & Pentland, 2014). For example, research showed that analysing textual data from financial news articles and messages posted on Internet stock message boards can help to predict stock markets and stock market volatility (Antweiler & Frank, 2004; Schumaker & Chen, 2009). Moving away from the financial sector, mining and analysing social media content in the automotive industry has proven to successfully identify the existence of motor vehicle defects and criticisms, helping automotive professionals to enhance vehicle quality management (Abrahams, Jiao, Wang, & Fan, 2012).

2.2 User Generated Content

User-generated content (UGC), sometimes called consumer-generated content, is a specific source of data boosted by the growth of Web 2.0. Web 2.0 facilitates Internet users and allows for two-way communication of information. UGC and Web 2.0 both have a huge impact on Internet-user behaviour, decision-making and organizations. Web 2.0 and UGC bring organizational challenges as well as opportunities for the e-commerce of businesses (Sigala, 2010). This phenomenon is even more relevant now because the digital world is gradually moving to a Web 4.0 environment, with new networks and advances in technology changing the modern social and business environment (Ribeiro Soriano, Garrigos-Simon, Lapiedra Alcamí, & Barberá Ribera, 2012).

The Internet enables individuals to express their thoughts and opinions, and to make them widely accessible for other Internet users. As a consequence, UGC may be seen as a new form of word-of-mouth (WOM) for products and services (Ye et al., 2011). The importance of WOM for business since the worldwide adoption of the Internet has been researched and discussed extensively. For instance, Anderson (1998) showed that the WOM activity of customers is related to customer satisfaction, and that this activity grows when either satisfaction or dissatisfaction increases. Prior research also showed that WOM is especially important for driving demand for experience goods. For example, Zhu and Zhang (2006) demonstrate the importance of (online) WOM by showing that video game sales are significantly influenced by online reviews.

A specific industry in which both UGC and experience goods are important is the travel industry. Huge amounts of UGC in the form of online reviews on hotels and travel destinations have been, and increasingly are being, produced by consumers. Research showed that one of the most popular online activities is searching for travel-related information, and that consumers are strongly influenced by hotel reviews in making travel decisions (Archak et al., 2011; Ye et al., 2011). Tripadvisor.com (2006, as cited in Ye et al., 2011) reported that online reviews are being consulted by hundreds of millions of travellers each year. Furthermore, Travelindustrywire.com (2007, as cited in Ye et al., 2011) showed that 84% of these travellers were influenced by consumer reviews when making their travel and hotel reservations. More specifically, with regard to consumer purchase decisions in the hotel industry, more than 87% of consumers rely on UGC when selecting hotels. This percentage is higher than in any other product category (Lipsman, 2007, as cited in Ghose et al., 2012). To express this in economic value: over $10 billion in travel revenue is influenced by online reviews (Vermeulen & Seegers, 2009).

Chen and Xie (2008) elaborate on why consumers are so sensitive to other consumers’ reviews. They state that, in contrast with company descriptions that focus on product attributes and performance, consumer reviews are more user-oriented and highlight different perspectives of user experience. More specifically, Chen and Xie argue that consumer reviews may be of particular importance for unsophisticated consumers who may hesitate to buy certain products when only seller-created content is presented. Credibility is also an important aspect in consumer reviews. Consumer reviews may be perceived as more trustworthy and credible than company information because the authors of these reviews are fellow consumers and are perceived to have no specific reasons to manipulate the reader or to promote the product (Bickart & Schindler, 2001).

Taken together, UGC in general, and online hotel reviews for the hotel industry specifically, are important for hotel businesses and can have far-reaching effects on online sales and traveller decisions. Therefore, it is important to analyse and have a clear understanding of patterns in online reviews. Gaining a deeper understanding of this subject might also have important implications for both the research community and managers in the hotel industry.

2.3 Text analytics

Reviews consist of written text and, in most cases, a numerical rating (e.g. 3 out of 5 stars). Previous research examined the effect of numerical ratings and the volume (or simply the amount) of reviews, and found support that these aspects of reviews affect sales. For example, Chevalier and Mayzlin (2006) show causality between the volume and ranking of online reviews of a certain book and sales of that book at a certain website. They show that when a particular book has more reviews and a higher ranking at a certain website, this book also sells better at that website, compared to other websites where the book receives a lower volume of reviews and a lower ranking. In the movie industry, an increase in the volume of WOM appears to significantly increase sales after a movie is released (Liu, 2006). According to Dellarocas, Zhang, and Awad (2007), monitoring WOM in the movie industry can even improve revenue-forecasting accuracy for movie sales.

However, previous research did not account for the written information in these reviews. According to Archak et al. (2011) it is important to analyse the content of this written information because product quality cannot be assessed with only a single number (rating) or the volume of reviews. More specifically, bimodal problems exist with written online reviews. This means that most of the reviews posted in online markets receive either an extremely high or an extremely low rating. When this happens, the average ratings for these products may not reveal much information to prospective buyers, or to the vendor selling the product trying to understand which product characteristics are of highest importance (Ghose & Ipeirotis, 2011). This paper tries to fill this knowledge gap, for the hotel industry specifically, by examining several topics that are most important for consumers, and how they each influence the sentiment in reviews.
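The bimodal rating problem described above can be illustrated with a small worked example. The sketch below uses made-up star ratings for two hypothetical products; it is not data from this study, and the names `polarised`, `consistent` and `summarise` are assumptions introduced only for illustration. It shows that two very different rating distributions can share the same average.

```python
from collections import Counter
from statistics import mean

# Made-up star ratings for two hypothetical products (not data from this study).
polarised = [1, 1, 1, 5, 5, 5]   # reviewers either love it or hate it
consistent = [3, 3, 3, 3, 3, 3]  # every reviewer finds it average


def summarise(ratings):
    """Return the mean rating and the count of each star value."""
    return mean(ratings), dict(sorted(Counter(ratings).items()))


for name, ratings in [("polarised", polarised), ("consistent", consistent)]:
    avg, counts = summarise(ratings)
    print(f"{name}: mean={avg}, counts={counts}")
```

Both lists have the same mean, so the average alone cannot distinguish a polarising product from a uniformly mediocre one. Only the distribution of ratings, or the written text behind them, does.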

It is important to be aware of the managerial problem that ratings and volume of reviews do not clearly show which specific product characteristics are of particular importance for consumers. This is the case because, according to classical economic theory, different aspects and characteristics of a product can have different meanings and different levels of importance for consumers (Rosen, 1974). A single, average rating number, or the volume metric, might thus not contain all the information a consumer is searching for when reading reviews and might not hold all the information managers would like to know about the products they sell. Therefore, it is important to take the written context of reviews into account to properly examine online reviews and to discover patterns in the context that may reveal important information for managers or product sellers (Archak et al., 2011).

Discovering patterns and extracting useful information from various written resources can be done with text mining. More specifically, the field of natural language processing (NLP) is making a lot of progress in extracting useful information from unstructured, textual resources using computational techniques. With NLP techniques many pieces of information can be extracted from text. One of these is inferring the level of emotion, or sentiment, someone is using in his or her text (Hearst, 2003).

Sentiment analysis, a specific form of text analytics, is considered a good approach to extract emotion from unstructured documents authored by humans, such as online hotel reviews (Xiang et al., 2015). Sentiment analysis classifies reviews into positive or negative based on the amount of specific sentiment phrases (Das & Chen 2007, Hu & Liu 2004, as cited in Archak et al., 2011). The problem with this approach, however, is that it does not fully capture the meaning consumers put in reviews. Consumers can criticize certain product features while positively evaluating other product features in the same review (Archak et al., 2011). Meaningful conclusions about the sentiment in text cannot be drawn without context, and it is thus of high importance to quantify evaluations of several product features or to discover certain topics. This can then further reveal patterns and highlight important information about products.
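The limitation described above can be made concrete with a minimal sketch of phrase-counting sentiment classification. This is an illustration under stated assumptions, not the method used later in this paper: the word lists `POSITIVE` and `NEGATIVE`, the keyword map `ASPECTS`, and all function names are hypothetical and far smaller than any real sentiment dictionary.

```python
import re

# Tiny hypothetical lexicons for illustration only; a real analysis
# would rely on an established sentiment dictionary.
POSITIVE = {"great", "friendly", "clean", "comfortable", "excellent"}
NEGATIVE = {"dirty", "noisy", "rude", "small", "terrible"}

# Hypothetical keyword lists that map words to product features.
ASPECTS = {
    "service": {"staff", "reception", "service"},
    "room": {"room", "bed", "bathroom"},
}


def tokens(text):
    """Lower-case the text and split it into alphabetic words."""
    return re.findall(r"[a-z]+", text.lower())


def score(words):
    """Number of positive words minus number of negative words."""
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


def classify(text):
    """Classify a whole review by the sign of its overall score."""
    s = score(tokens(text))
    return "positive" if s > 0 else "negative" if s < 0 else "neutral"


def aspect_scores(text):
    """Score each sentence and credit it to every aspect it mentions."""
    result = {}
    for sentence in re.split(r"[.!?]+", text):
        words = tokens(sentence)
        for aspect, keywords in ASPECTS.items():
            if keywords & set(words):
                result[aspect] = result.get(aspect, 0) + score(words)
    return result


review = ("The staff were friendly and the service was excellent. "
          "The room was small and noisy.")
print(classify(review))
print(aspect_scores(review))
```

For this mixed review the positive and negative words cancel out at the review level, whereas the per-feature scores separate the praise for the service from the criticism of the room. This is the kind of information a single overall label hides.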

2.4 Seasonality in the hotel industry

Prior research mostly focused on analysing the content and sentiment in hotel reviews, but never distinguished between several seasons, such as high and low demand periods. This is an important parameter to include in the current stream of content analyses on hotel reviews, as seasonality is one of the most important characteristics of the hotel industry that shapes the business for a large part. For example, research showed that it is not unlikely that some hotels receive almost two thirds of their yearly revenues in only a couple of high demand months (Jang, 2004).

Seasonality refers to fluctuations in demand of tourism destinations and hotels. These fluctuations in demand come at the price of ineffective resource utilization and low revenue streams in low demand periods. In general, professionals in the industry distinguish between high and low season. In high season, a large number of tourists are expected and high hotel occupancy percentages are often realized. In contrast, low tourism traffic, underutilization of resources and smaller revenue streams characterize low season (Higham & Hinch, 2002).

The phenomenon of seasonality can be attributed to both natural and institutional factors. The natural impact on seasonality is a result of cyclical variations, such as temperature and rain, that impact the attractiveness of certain destinations. The institutional impact on seasonality refers mostly to practices in society, where demand is influenced by cultural, economic and religious considerations (Higham & Hinch, 2002).

Clearly, seasonality has an impact on the hotel business as a whole, but this impact is not the same for all hotels. Hotel size is an important factor that, for a large part, determines how well a hotel can cope with fluctuations in demand. Smaller hotels have fewer rooms, facilities, and resources than their larger competitors. In high season, these small hotels reach their full capacity at an earlier stage than larger hotels. Reaching full capacity may cause overutilization of personnel and resources, which can cause a reduction in the service level experienced by hotel guests (Briggs, Sutherland, & Drummond, 2007). Hotel guests value their experience in a hotel based on a number of areas, which, amongst others, include cleanliness of the room and quality of staff. Several studies showed that an increase in hotel occupancy percentage also increases the level of stress on personnel (Zeithaml & Bitner, 2000, as cited in Mattila & O'Neill, 2003). Increasing stress levels may, as a consequence, negatively impact the key areas on which guests base their satisfaction. In general, smaller hotels have to deal with the effects of seasonality at an earlier stage than larger hotels, as they reach their occupancy ceiling more quickly.

Clearly, there is a generally known distinction between high and low season. To verify this distinction and make it more credible, this paper defines high and low season more specifically with help from industry experts from the upper level management of the NH Hotel Group.

2.5 Research question

Prior research showed the importance of analysing textual content of UGC in the hotel (travel) industry to truly find the aspects that consumers think are important. However, prior research has paid little attention to classical hotel industry phenomena such as high and low season, in which hotel room availability and demand may vary significantly. Ghose et al. (2012) state that incorporating hotel room availability and hotel size in studying the textual content and sentiment in reviews is a promising aspect for future research.

Hotel room availability is affected by seasonality, with different levels of demand as a result. Seasonality has such a large impact on the hotel business that entire so-called ‘revenue management’ teams are devoted to maximizing revenue by selling the right product to different market segments, while making use of flexible pricing, different distribution channels, inventory management and purchase restrictions (Chiang et al., 2006). It is important to address the abovementioned research gap, as high and low season, and hotel size, may significantly influence the content and sentiment in reviews. Extending knowledge about this aspect is valuable for the hotel industry and may enhance the effectiveness of revenue management teams in making pricing and segmentation decisions. Furthermore, this could also be very valuable for hotel managers of both small and large hotels, in different segments, as they then know more precisely which hotel attributes their guests find important. With this knowledge, hotel managers can then make more successful strategic decisions and allocate resources more efficiently where needed.

Hence, this paper takes differences in hotel room availability into account, in a general sense, by distinguishing two demand periods, high and low season, and investigates whether differences exist between these seasons with regard to the textual content of reviews and the sentiment in these reviews. In short, this leads to the following research question:

Do topics and sentiment in online consumer generated hotel reviews differ significantly between high and low season¹? And how is this influenced by hotel size and segment?

Addressing this research question contributes to the literature in several ways. First, it enriches the existing body of research on sentiment and content analysis of consumer generated hotel reviews by incorporating industry characteristics such as seasonality, hotel size and hotel segment. Second, this research also links several topics to the sentiment in reviews. By doing so, a deeper understanding is acquired of which factors help explain the sentiment consumers put in their reviews.

Answering this research question will also contribute, as stated before, to the professional field. The results can possibly indicate that certain aspects are of more importance to some consumers in different demand periods. This is useful information for managers in the hotel industry, as they can then allocate resources more effectively to certain hotel features in periods where these features are of higher importance to consumers.

Furthermore, enhanced knowledge about the overall sentiment and most important topics discussed in reviews can help hotel professionals to set up more effective marketing and revenue management strategies in order to increase tourism flows during low demand season (Spencer & Holecek, 2007).

2.6 Hypotheses development

As stated earlier, the main focus of this paper is to show whether or not high and low seasons influence the average sentiment score in consumer reviews, and to examine the relation between certain topics and the sentiment in consumer reviews. Ghose et al. (2012) note that classical revenue management phenomena are important to address when examining sentiment in reviews. One of these classical revenue management phenomena is that of fluctuating demand periods, or in other words: seasonality.

Seasonality is one of the most important characteristics that make the hospitality industry a unique industry. Next to its uniqueness, seasonality is also a worrisome facet for the hospitality industry, as it causes an imbalance in demand (number of tourists). This imbalance can cause fluctuating revenues, and may lead to over- or under-utilization of personnel, resources, buildings, and facilities. Hence, as demand varies over seasons, so may resources and personnel. Next to this, certain facilities in hotels may become somewhat overcrowded during high season. All this may, consequently, lead to a reduction in service quality and in guest satisfaction (Jang, 2004). This leads to the plausible argument that guests express different feelings and address different topics in their reviews in high season than in low season.

Seasonality clearly has a large impact on the hospitality industry and the way business is conducted. It is thus quite surprising that no previous study differentiated between different demand periods when analysing content and sentiment in consumer reviews. Taking seasonality into account, and building upon previous research, I hypothesise the following:

Hypothesis H1. The average sentiment score in reviews is different in high season compared to low season.

Fluctuating demand may cause, as noted earlier in this paper, over- or under-utilization of personnel, resources, buildings and facilities. It could be argued that smaller hotels experience this problem at an earlier stage when demand increases, as they often have fewer resources, smaller rooms and fewer facilities than their larger competitors (Briggs et al., 2007).

More specifically, the level of capacity utilization may have a negative impact on the perceived quality of services and facilities when hotel occupancy reaches nearly 100%. In general, smaller hotels will reach full occupancy at an earlier stage than large hotels, and will experience the negative effects of full capacity utilization on guest satisfaction sooner when demand increases (Mattila & O'Neill, 2003). Following from this line of reasoning, it could be argued that this effect spills over into hotel reviews. More specifically, it may be possible that the effect of seasonality on the sentiment score in reviews is stronger for smaller hotels. Therefore, I state the following:

Hypothesis H2. The effect of season on the sentiment in reviews is moderated by hotel size.

The first two hypotheses are visualised in the following conceptual model:

Figure 1: conceptual model visualising H1:H2.

In general, travellers posting online reviews are either very satisfied or particularly dissatisfied (Anderson, 1998). People who had a positive experience with a particular product or service tend to refer in their reviews to these favourable experiences as a recommendation of the product or service to other customers. In contrast, in negative reviews, the emphasis is often put on negative experiences meant to discourage other prospective consumers from buying a certain product or making use of a certain service (Mauri & Minazzi, 2013).

Consumers, when selecting a hotel to stay at, have certain expectations of hotel performance. If these expectations are not met at the time of the stay, a negative experience occurs. Managing these expectations and meeting them is crucial for hotel providers, as doing so leads to an increase in guest loyalty, repeated purchases and positive WOM. Certain key attributes have been identified as being most important for developing expectations and obtaining guest satisfaction (Choi & Chu, 2001).

Guest satisfaction is a complex human process in which the guest evaluates whether or not the needs, wants, and expectations have been met throughout the service experience. Satisfaction can be separated into two components: a core and a secondary component. The core component deals with “what” product consumers get from the purchase (e.g. the accommodation, cleanliness, and bed quality in a hotel), whereas the secondary component deals with “how” consumers receive this product (e.g. atmosphere and interaction with the service provider). Cadotte and Turgeon (1988, as cited in Mattila & O'Neill, 2003) suggested that guests are less tolerant with regard to the core attributes. In other words, guests are more easily dissatisfied with core attributes than with secondary attributes. This finding is confirmed by the study of Dube, Enz, Renaghan, and Siguaw (1999, as cited in Mattila & O'Neill, 2003), who found that guest satisfaction is highly correlated with the hotel's physical property and guest room design.

Taking into account that customers posting reviews are either very satisfied or very dissatisfied, and that guests are in general more demanding on the core attributes (e.g. the physical property of the hotel), it is then very likely that the distribution of certain topics in reviews (discussing different attributes of the hotel) is different when comparing positive versus negative reviews. From this reasoning, the following can be hypothesised:

Hypothesis H3. A change in topic distribution in reviews has an effect on the sentiment in reviews.²

² To be very clear, the direction of the effect of certain topics on the sentiment scores (negative/positive) in reviews is not the main focus here. The main interest is how a difference in topic distribution in reviews influences the sentiment in reviews.


According to research carried out by Calantone and Johar (1984), different groups of tourists seek specific benefits across different seasons. Therefore, it is important for hotel professionals to understand these specific needs in order to fully satisfy specific groups of tourists across different seasons. As stated before, consumers share their experiences through online WOM, or hotel reviews. When considering the different utilization of resources, personnel and facilities, as well as different wants and needs of tourists, across different seasons, the following hypothesis can be established:

Hypothesis H4. The effect of topic distribution on sentiment in reviews is moderated by season.

As seasonality may influence the topic distribution in reviews and consequently the sentiment, so may different hotel segments. The hotel industry is divided into several segments, mostly based on the Mobil Travel Guide star rating system (Mobil hotel star rating criteria, 2014). Each segment performs differently on different hotel attributes (e.g. hotel facilities and staff service) (Baloglu et al., 2010). The performance of hotels on these attributes is a large motivator for consumers to post online reviews. Yen and Tang (2010) showed that consumers put different weights on these different hotel attributes while evaluating their stay, and that expectations of the performance of these attributes also vary between hotel segments. According to Oliver’s (1981) expectancy disconfirmation theory, “customer satisfaction is posited to derive from a positive confirmation of company performance against expectations held prior to the purchase.”

Thus, as customers place different weights on different hotel attributes (such as cleanliness and service), and while the expectations of these attributes vary across different hotel segments, it could be argued that different topics are discussed more or less frequently in different segments. Following from this, it could also be argued that the influence of these topics on the sentiment score in reviews also differs between hotel segments. Hence, the fifth hypothesis is proposed to be as follows:

Hypothesis H5. The effect of topic distribution on sentiment in reviews is moderated by hotel segment.

Hypotheses 3 to 5 are visualised in the conceptual model displayed below:


3. Data Collection and Method

The overall design of this research is a quantitative study, making use of big data consisting of unstructured UGC. This is a longitudinal, archival research study containing text mining and analysing techniques, showing:

• a relation between season and the context and sentiment in reviews,

• whether a moderating effect of different hotel sizes exists on the sentiment in reviews in different seasons, and

• whether a moderating effect exists of hotel segment on the relation between topics and sentiment.

3.1 Data collection

The dataset for this study, 121.393 user generated hotel reviews, consists of observations from 117 top reviewed hotels in three major tourist cities in North Western Europe: Brussels, London, and Amsterdam (see appendix F). This dataset can be considered big data as it matches some of the characteristics of big data, namely volume (in this case a very large volume of unstructured text) and variety (the data was generated by consumers without a pre-set format and with varying content). Hotels were selected in different segments, both from large hotel chains and from individual hotels. Selecting different kinds of hotels (varying in size, location, and segment) is necessary in order to arrive at meaningful and generalizable conclusions for all hotels. I accessed this content via an application specifically written for this thesis in the statistical programming language R (see appendix A). With this application, I mined naturally occurring social media conversations about hotels in the form of reviews at Booking.com. Booking.com is a metasearch engine for travel, also facilitating and verifying user generated reviews. More specifically, I manually collected a list of 1703 URLs, each containing around 75 reviews for a specific hotel. My application loops through all given URLs, extracting the review content, hotel name, date posted, and numerical rating. This led me to 48.712 reviews for 49 hotels in Amsterdam, 18.906 reviews for 38 hotels in Brussels, and 53.775 reviews for 30 hotels in London. The application can be configured with a duration parameter. For this thesis this duration is set to 24 months to enable an analysis of two full years.

Next to the textual content of the reviews, I also mined the numerical rating (on a scale from 1 to 10) that was given by the reviewer. Furthermore, I also extracted the date the review was posted, so that a thorough analysis of differences in content and sentiment between different demand periods is possible. For all hotels used in this study I identified the following characteristics as well: number of rooms, number of stars, and location (zip code). The written consumer reviews are the main subject of analysis, and not the hotels for which the reviews were written. Thus, it is important to note that the conceptual models do not account for the dimension of time. Hence, every review in the dataset consists of a written text and is assigned to one season (high or low), one hotel size and one hotel segment.

A specific strength of this research is that the review content is generated by hotel guests themselves, without any influence from me as researcher on what is written. Furthermore, the scale of this research (three cities and 117 hotels) and the large number of reviews (121.393) help in generalizing the results and in drawing precise conclusions from my findings.

3.2 Data cleaning and preparation

Reviews consist of written text with a lot of noise and words that are not particularly helpful when analysing these reviews at a large scale. Hence, before analysing the reviews, some data cleaning had to be done. After scraping the reviews from the Internet, all 121.393 reviews and the other information were stored in simple CSV format. In order to clean the data properly, I first created a so-called “Corpus” of all reviews (see appendix B). A corpus is a large, structured collection of machine-readable texts on which statistical analysis can be carried out. A corpus is thus the basis for most natural language processing systems. One can easily perform transformations on text in a corpus (Indurkhya & Damerau, 2011).

After creating the corpus, a cleaning function removed all punctuation in all reviews, stripped superfluous white space, and converted all text to lower case. Furthermore, the most common English stop words were removed. The stop words that were removed were retrieved from the SMART information retrieval system (obtained from http://jmlr.csail.mit.edu/papers/volume5/lewis04a/a11-smart-stop-list/english.stop). Cleaning and polishing the reviews in this way is necessary to remove noise and unnecessary information and to make the set of reviews ready for analysis.
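The cleaning steps described above can be sketched in a few lines. This is an illustrative Python sketch, not the R code from appendix B; the function name `clean_review` and the tiny `STOP_WORDS` set are assumptions made for the example, whereas the thesis used the full SMART stop word list.

```python
import string

# Tiny illustrative stop-word set (assumption); the thesis used the full SMART list.
STOP_WORDS = {"the", "a", "an", "was", "and", "is", "it", "to", "of", "in", "very", "we"}


def clean_review(text: str) -> str:
    """Lower-case the text, remove punctuation, collapse whitespace, drop stop words."""
    text = text.lower()
    # Remove every ASCII punctuation character.
    text = text.translate(str.maketrans("", "", string.punctuation))
    # split() without arguments also collapses runs of whitespace.
    tokens = text.split()
    return " ".join(token for token in tokens if token not in STOP_WORDS)


print(clean_review("The room was VERY clean,   and the staff... friendly!"))
# prints: room clean staff friendly
```

Note that removing punctuation before tokenising, as here, merges contractions such as "didn't" into "didnt"; whether that matters depends on the lexicon used afterwards.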

3.3 Corpus and DocumentTermMatrix

After cleaning all reviews, the data was ready to be analysed. However, before sentiment analysis and topic modelling, another transformation was necessary. In order to apply functions such as clustering, classifying and all other sorts of operations, all text has to be transformed into some kind of matrix format. In natural language processing such a matrix is called a ‘Document-Term Matrix’ (DTM), which can be created out of a corpus. Within a DTM, each row represents a document (review), each column represents a unique term, and each cell denotes the count of that term in that particular review (Feinerer, 2008). Below is an example of a part of the DTM, showing the structure and the quantitative format each review was presented in after creation of the DTM:


Figure 3: Part of the DocumentTermMatrix (DTM) of the dataset for this research.
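To make the structure of a DTM concrete, the following minimal Python sketch builds one for two toy reviews. It is an illustration only and not the R implementation used for this thesis; the function name `document_term_matrix` and the example reviews are assumptions.

```python
from collections import Counter


def document_term_matrix(docs):
    """Return (vocabulary, matrix); matrix[d][t] is the count of term t in document d."""
    counts = [Counter(doc.split()) for doc in docs]
    # The vocabulary is the sorted set of all terms that occur in any document.
    vocab = sorted(set().union(*counts)) if counts else []
    matrix = [[doc_counts[term] for term in vocab] for doc_counts in counts]
    return vocab, matrix


docs = ["room clean staff friendly", "room small room noisy"]
vocab, dtm = document_term_matrix(docs)
print(vocab)  # ['clean', 'friendly', 'noisy', 'room', 'small', 'staff']
print(dtm)    # [[1, 1, 0, 1, 0, 1], [0, 0, 1, 2, 1, 0]]
```

Each row is one review and each column one term, so the second row shows that "room" occurs twice in the second review.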

After this transformation, some simple exploratory analysis could be done. For example, an examination of the most frequent words used in high season:


However, these descriptive statistics do not explain much. Some further transformations and calculations are needed in order to arrive at meaningful observations.

3.4 Sentiment analysis

After cleaning the textual data and preparing it in a workable format, sentiment analysis could be performed (see appendix C). For this analysis, I made use of the AFINN sentiment lexicon. The AFINN sentiment lexicon is an English word list in which each word is rated with a valence score ranging between minus five (negative) and plus five (positive), specially designed for micro blogs or short pieces of text such as consumer reviews (Hansen, Arvidsson, Nielsen, Colleoni, & Etter, 2011; Nielsen, 2011).

The sentiment score $s_n$ for each review $n$ is calculated as the total of the sentiment scores $s_{n,i}$ of the individual words $i$ in the review:

$$s_n = \sum_{i} s_{n,i} \tag{1}$$
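The review score in equation (1), a sum of word valences, can be illustrated with a short Python sketch. The lexicon below is a toy dictionary with made-up valences in the AFINN range of -5 to +5; it is an assumption for illustration and not the actual AFINN word list, and `sentiment_score` is a hypothetical helper name.

```python
# Toy valence lexicon (assumption): integer scores between -5 and +5, AFINN-style.
TOY_LEXICON = {
    "friendly": 2,
    "clean": 2,
    "great": 3,
    "noisy": -1,
    "dirty": -2,
    "terrible": -3,
}


def sentiment_score(review, lexicon=TOY_LEXICON):
    """Sum the valence of every word in the review; unknown words count as 0."""
    return sum(lexicon.get(word, 0) for word in review.split())


print(sentiment_score("room clean staff friendly"))  # 4
print(sentiment_score("room dirty noisy terrible"))  # -6
```

Words that do not occur in the lexicon, such as "room" and "staff" here, contribute nothing to the score.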

3.5 Topic modelling - Latent Dirichlet Allocation (LDA)

Each review now was assigned a sentiment score. Still, it has to be examined which topics significantly influence these sentiment scores. So, for each review a topic frequency or probability score has to be assigned (see appendix D). In natural language processing, this can be done with a process called topic modelling, which makes use of ‘Latent Dirichlet Allocation’ (LDA), a Bayesian mixture model with discrete data as input (Hornik & Grün, 2011).

Topic models are models that provide a framework with probabilities of the frequency of specific terms in a corpus. Topic models are mixed-membership models, meaning that in these models documents (reviews in this specific case) are assumed to consist of several topics simultaneously, and that the topic distribution varies over the documents in the corpus (Hornik & Grün, 2011).

With LDA, each document in the DTM is treated as a mixture of several topics. For example, in a three topic-model one could note: “Document A is 45% topic 1, 5% topic 2, and 50% topic 3”. Furthermore, LDA treats each topic as a mixture of words. Another important thing to note here is that topics may share words. For example, the word “room”, may be shared by the topics “hotel” and “house” (Hornik & Grün, 2011).
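The mixture idea described above can be written compactly as the standard generative process of LDA. The notation is the conventional one from the topic modelling literature and is an assumption here, as the thesis itself does not define these symbols: $\theta_d$ is the topic mixture of document $d$, $z_{d,i}$ is the topic assigned to the $i$-th word, $w_{d,i}$ is that word, $\beta_k$ is the word distribution of topic $k$, and $\alpha$ is the Dirichlet prior.

```latex
\begin{align*}
\theta_d &\sim \mathrm{Dirichlet}(\alpha)
  && \text{topic mixture of document } d \\
z_{d,i} \mid \theta_d &\sim \mathrm{Multinomial}(\theta_d)
  && \text{topic of the } i\text{-th word} \\
w_{d,i} \mid z_{d,i} &\sim \mathrm{Multinomial}(\beta_{z_{d,i}})
  && \text{word drawn from that topic}
\end{align*}
```

In the three-topic example, $\theta_A = (0.45, 0.05, 0.50)$ for Document A, and a word such as “room” can have non-zero probability under several $\beta_k$, which is why topics may share words.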

It is important to note that in this specific model, the number of topics, k, to be discovered has to be specified a priori. After an interview with some experts from the hotel industry, I decided to let the model determine five topics in the DTM. After running the topic model function with the LDA model, the following five topics with word clusters were found:


After consulting with experts from the industry, I renamed the topics as follows:

Topic 1: Service: this topic refers to the level of service of the hotel and staff.

Topic 2: Comfort: this topic refers to the level of comfort of the hotel rooms and the hotel in general.

Topic 3: Experience: this topic refers to consumers writing about their total experience of their stay.

Topic 4: Location: this topic refers to the location and surroundings of the hotel.

Topic 5: Facilities: this topic refers to the description of facilities in the hotel.

After discovering and naming these five topics I recalled, from the topic model, the probability of each topic occurring in each specific review, and added this to the existing dataset. Thus, next to the sentiment score, each review now also had a probabilistic frequency distribution score for each of the five discovered topics.

3.6 Framing seasonality

In an in-depth interview with the strategic analyst from the NH Hotel Group, region Benelux, France, U.K., South Africa and New York (see appendix G), the phenomenon of seasonality, and what it means in practice, was explained to me very clearly. The strategic analyst stated that for revenue management practices, seasonality is divided into high and low season based on historical data (i.e. demand numbers and occupancy levels), future events (e.g. large business conferences, concerts or large sport events), and forecasting software. Using all this, the strategic analyst and her team use the following practical definition of high and low season in the following cities:


London - HIGH season: April to November, LOW season: December to March

Brussels - HIGH season: April to June & September to November, LOW season: December to March & July to August.³
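Assigning a season to each review then comes down to a lookup on city and month of posting. The Python sketch below encodes only the London and Brussels definitions listed above; the names `HIGH_SEASON_MONTHS` and `season_for` are hypothetical, and the sketch assumes that the month in which the review was posted is used for the classification.

```python
# High-season months per city (1 = January), taken from the definitions above.
# Only the cities listed above are encoded.
HIGH_SEASON_MONTHS = {
    "London": {4, 5, 6, 7, 8, 9, 10, 11},   # April to November
    "Brussels": {4, 5, 6, 9, 10, 11},       # April to June, September to November
}


def season_for(city, month):
    """Return 'HIGH' or 'LOW' for a review posted in `month` for a hotel in `city`."""
    if not 1 <= month <= 12:
        raise ValueError(f"month must be in 1..12, got {month}")
    if city not in HIGH_SEASON_MONTHS:
        raise KeyError(f"no season definition available for {city}")
    return "HIGH" if month in HIGH_SEASON_MONTHS[city] else "LOW"


print(season_for("London", 7))    # HIGH
print(season_for("Brussels", 7))  # LOW
```

The example shows why the definition has to be city-specific: July counts as high season in London but as low season in Brussels.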

3.7 Framing hotel size

Most classifications of hotel sizes are not very uniform and constant. In order to make well-defined assumptions about hotel sizes, I decided to use a graphical approach to determine the exact cut-off points for the hotel sizes. After examining the North Western Europe hotel market, and after some discussion with industry experts, I decided to classify hotels as either Small (S), Medium (M), or Large (L). However, the exact number of rooms denoting the size of a hotel remained unclear. Therefore, I plotted the number of rooms of each hotel against the hotel index and examined the graph. After visualising the distribution of hotel sizes in the sample for this paper, some groups could be defined. The two dotted lines below show the cut-off points for each group, 250 and 500 rooms respectively, and three different groups of hotels can clearly be distinguished. With these known cut-off points, each hotel could now be assigned a size: small (≤ 250), medium (> 250 and ≤ 500), or large (> 500).


Figure 6: graph showing the three clusters of number of hotel rooms. The two dotted blue lines represent the cut-off points, 250 and 500 rooms respectively.
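With the cut-off points fixed at 250 and 500 rooms, the size classification is a simple threshold rule. A minimal Python sketch, in which the function name `hotel_size` is an assumption:

```python
def hotel_size(rooms):
    """Classify a hotel as S (<= 250 rooms), M (251 to 500 rooms) or L (> 500 rooms)."""
    if rooms <= 250:
        return "S"
    if rooms <= 500:
        return "M"
    return "L"


for rooms in (120, 250, 251, 500, 501):
    print(rooms, hotel_size(rooms))
# 120 S, 250 S, 251 M, 500 M, 501 L
```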

3.8 Segmentation of hotels

Booking.com uses a classical hotel industry segmentation system based on the Mobil Travel Guide star rating system to rank hotels on their website. This star rating system segments hotels by assigning them a number of stars, ranging between one and five stars (Mobil hotel star rating criteria, 2014). A one-star hotel is considered as a limited service hotel. Two-star hotels can be considered as basic properties with basic facilities. Three-star hotels offer more and can be considered mid-scale properties. Four-star hotels are upscale hotels, offering an extended range of services and facilities. Five-star hotels are the most luxurious ones, offering excellent service. This paper also uses the segmentation as provided by Booking.com to distinguish between hotels of different segments.


4. Analysis and Results

4.1 Descriptive statistics of dataset

Below is a short summary of all variables used in this paper:

1. Seasonality: a factor with 2 levels: HIGH and LOW.

2. Hotel Size: a factor with 3 levels: S, M, L.

3. Hotel Segment (number of hotel stars): a factor with 4 levels: 2, 3, 4, 5.

4. Topic probability: a numeric: this actually consists of five sub variables (five topics). Per review (row) a probability score of each topic occurring in that review is assigned (probability score ranging between 0 and 1).

5. Sentiment score: a numeric: sentiment score making use of the “afinn” sentiment lexicon algorithm. A sentiment score is assigned to each review measuring the polarity (ranging between -5, negative, and +5, positive).

In R, I checked the dataset for outliers on all numerical variables. After checking for outliers, and removing them, I computed some descriptive statistics, which can be found in the table below. The dataset had no missing values.


4.2 Hypothesis Testing

4.2.1 Hypothesis 1

H1. The average sentiment score in reviews is different in high season compared to low season.

Linear model with categorical predictor (season):

$$\text{Sentiment}_i = b_0 + b_1\,\text{Season}_i + \varepsilon_i \tag{1}$$

The first hypothesis states that a difference could exist in the average sentiment score in reviews between two seasons: HIGH and LOW season. There is one categorical predictor, with two categories, so a t-test is used to see if a significant difference exists between the means of the two categories:

$$t = \frac{\bar{X}_1 - \bar{X}_2}{\sqrt{\dfrac{s_1^2}{N_1} + \dfrac{s_2^2}{N_2}}} \tag{2}$$

The assumption of homogeneity of variances was violated (Levene’s test: p < 0.05), and the t-test was thus adjusted so equal variances were not assumed.

After performing the independent t-test in R, a significant difference in sentiment score between high and low season was found. On average, the overall sentiment score in reviews in high season was lower (M = 1.81, SE = 0.007) than the sentiment score in reviews in low season (M = 1.89, SE = 0.009). This difference, -0.08, 95% CI [-0.098, -0.052], was highly significant, t(91466) = -6.39, p < 0.001; however, the effect size was relatively small, r = 0.02. The difference is visualized in the graph below.
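The reported effect size can be related to the test statistic with the common conversion r = sqrt(t² / (t² + df)). The Python sketch below applies it to the reported values; that the thesis computed r in exactly this way is an assumption, and `effect_size_r` is a hypothetical helper name.

```python
import math


def effect_size_r(t, df):
    """Convert a t statistic and its degrees of freedom into the effect size r."""
    return math.sqrt(t ** 2 / (t ** 2 + df))


# Values reported above: t(91466) = -6.39
r = effect_size_r(-6.39, 91466)
print(round(r, 2))  # 0.02
```

With this many observations, even a very small effect yields a large t statistic, which is why the result is highly significant while r stays close to zero.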


An explanation for this result (a higher average sentiment score in reviews in low season) might well be the fact that hotels and facilities could be overcrowded more often in high season, which may reduce the service level and as a result also overall guest satisfaction (Mattila & O'Neill, 2003). Although research has examined this aspect before, the result presented here adds some new insights, as this has not been examined thoroughly before in a quantitative way by analysing large quantities of reviews and their sentiment scores.

H2. The effect of season on the sentiment in reviews is moderated by hotel size.

The second hypothesis states that the difference in sentiment scores between HIGH and LOW season is moderated by hotel size. To test this hypothesis, I first ran a one-way independent ANOVA in R, to examine the direct effect of hotel size on the sentiment score in reviews.

$$\text{Sentiment} = (b_0 + b_1\,\text{Size}) + \varepsilon \tag{3}$$

The residual plots do not show strange patterns (see appendix E). After running the analysis, the results indicate that, on average, the sentiment scores in reviews differ significantly among different hotel sizes, F(2, 116.290) = 10.28, p < 0.001, η² = 0.0017. Furthermore, Tukey HSD post-hoc tests reveal that sentiment scores in reviews from hotels with size S are significantly higher compared to hotels with size M (0.04, p < 0.001) and hotels with size L (0.07, p < 0.011). No significant difference is present in the average sentiment score in reviews of hotels with size M and hotels with size L (p = 0.5).


Figure 8: Boxplots showing sentiment score per hotel size.

As the results do not indicate a significant difference between hotels of size M and hotels of size L, a clear conclusion about the direction of the effect of hotel size on sentiment cannot be drawn, although the results seem to indicate a negative trend in sentiment scores as hotels become larger.

In order to analyse the interaction I use a factorial ANOVA that combines the condition season and hotel size:


The assumptions of homoscedasticity and linearity are met for the interaction of season * size (see Appendix E). The results of the two-way independent ANOVA show a statistically significant interaction between the effects of season and hotel size on the sentiment in reviews: F(2, 116.287) = 2.19, p = 0.05, η² = 0.001. The interaction effects are visualized in the interaction plot below, and summarized in the table below:


From the plot, a difference can be spotted between high and low season in the way that hotel size affects the sentiment in reviews. The differences between the mean sentiment scores for different hotel sizes are larger in low season.

Furthermore, Tukey’s HSD post hoc tests reveal that the average sentiment score in reviews for hotels with size S in low season is significantly higher than that of hotels with size L in both Low (p = 0.005) and High Season (p < 0.001), hotels with size S in high season (p < 0.001), and hotels with size M in both Low (p = 0.009) and High Season (p < 0.001).

An interesting conclusion about the interaction effects can be drawn from the interaction plot (figure 9). The difference in the average sentiment score in reviews between high and low season is largest for hotels with size S, becomes smaller for hotels with size M, and decreases even more for hotels with size L. The literature states that an overutilization of resources and personnel occurs when a hotel reaches (almost) full capacity. Consequently, guest experience and overall satisfaction decrease (Briggs et al., 2007). Smaller hotels may thus experience this phenomenon at an earlier stage in high season compared to their larger competitors. Similarly, larger hotels are expected to perform more consistently when moving from low to high season, as they have more resources and facilities to handle an increase in demand.


H3. A change in topic distribution in reviews has an effect on the sentiment in reviews.

Multiple regression:

$$\text{Sentiment} = b_0 + b_1\,\text{Service} + b_2\,\text{Facilities} + b_3\,\text{Experience} + b_4\,\text{Comfort} + b_5\,\text{Location} + \varepsilon \tag{5}$$

After checking for homoscedasticity and linearity (see appendix E), I ran a multiple linear regression model in R that predicted the sentiment score in a particular review based on the topic distribution of the following topics: Service, Facilities, Comfort, Location, and Experience. A significant regression equation was present [F(5, 96523) = 21570, p < 0.001] with a total R-squared of 0.528. Table 4 shows the results of the multiple regression model.

The predicted sentiment score in a particular review is equal to 0.480 (constant) + 14.11 (Service) + 10.92 (Facilities) - 6.33 (Comfort) - 12.49 (Experience) + 0.48 (Location), where all variables are measured in a probability score of occurring in a certain review. The topics Service, Facilities, Comfort, and Experience are all significant predictors of the sentiment score in a particular review. The resulting linear regression plot for each topic against the sentiment score in reviews can be seen in figure 10.
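As a worked illustration of the fitted equation, the Python sketch below plugs a topic distribution into the coefficients reported above. The example review, with 45% Service, 5% Facilities and 50% Comfort, is hypothetical and does not come from the dataset; the names `COEFFICIENTS`, `INTERCEPT` and `predicted_sentiment` are assumptions.

```python
# Coefficients as reported in the text above.
INTERCEPT = 0.480
COEFFICIENTS = {
    "Service": 14.11,
    "Facilities": 10.92,
    "Comfort": -6.33,
    "Experience": -12.49,
    "Location": 0.48,
}


def predicted_sentiment(topic_probs):
    """Predicted sentiment for a review given its topic probabilities (each 0..1)."""
    return INTERCEPT + sum(COEFFICIENTS[topic] * p for topic, p in topic_probs.items())


# Hypothetical review: 45% Service, 5% Facilities, 50% Comfort.
example = {"Service": 0.45, "Facilities": 0.05, "Comfort": 0.50,
           "Experience": 0.0, "Location": 0.0}
print(round(predicted_sentiment(example), 4))  # 4.2105
```

Here 0.480 + 14.11 × 0.45 + 10.92 × 0.05 - 6.33 × 0.50 = 4.2105: the positive contribution of Service outweighs the negative contribution of Comfort.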


Figure 10: Multiple regressions showing correlations between each topic and sentiment score in reviews.


The negative relation between the topic ‘comfort’ and the sentiment score in reviews could arise from the fact that comfort is one of the core attributes guests value in their hotel stay. In general, guests have quite high expectations of comfort in a hotel. When these expectations are not met, guests quickly engage in negative word of mouth. This could explain the negative relation between the occurrence of this topic and the sentiment score in reviews (Choi & Chu, 2001). About the same holds for the topic ‘experience’. The experience topic is most likely a summary of the total experience guests had in a hotel. Theory suggests that dissatisfied customers engage in greater word of mouth (Anderson, 1998). Hence, as the topic probability distribution of ‘experience’ increases (more dissatisfied customers engage in greater word of mouth), the sentiment score decreases.

In contrast, the positive relation between the probability distributions of the topics ‘service’ and ‘facilities’ and the sentiment score in reviews may be a consequence of the fact that these are quite tangible attributes, and most hotels put a lot of effort into them. Most hotels are aware that satisfying guests and meeting their basic expectations may not be enough to retain them. Hotels know they have to provide excellent service and facilities in order to reach a high guest satisfaction, and therefore a lot of effort is put into improving the level of service and facilities in hotels (Briggs et al., 2007). Consequently, this could explain the fact that these topics correlate with positive sentiment in reviews.


A possible explanation for the fact that location does not significantly influence the sentiment in reviews could be that people choose the location beforehand. As people then know what to expect, they are most likely neither positively nor negatively surprised by the location of the hotel they have chosen.

H4. The effect of topic distribution on sentiment in reviews is moderated by season.

Here, I show whether the relationship between topic probability distribution and the sentiment score in reviews is different when differentiating between high and low season. I will perform multiple regressions with season as a moderator.

$$\text{Sentiment} = (b_0 + b_1\,\text{Service} + b_2\,\text{Season} + b_3\,\text{Service} \times \text{Season}) + \varepsilon \tag{6}$$

$$\text{Sentiment} = (b_0 + b_1\,\text{Facilities} + b_2\,\text{Season} + b_3\,\text{Facilities} \times \text{Season}) + \varepsilon \tag{7}$$

$$\text{Sentiment} = (b_0 + b_1\,\text{Comfort} + b_2\,\text{Season} + b_3\,\text{Comfort} \times \text{Season}) + \varepsilon \tag{8}$$

$$\text{Sentiment} = (b_0 + b_1\,\text{Location} + b_2\,\text{Season} + b_3\,\text{Location} \times \text{Season}) + \varepsilon \tag{9}$$

$$\text{Sentiment} = (b_0 + b_1\,\text{Experience} + b_2\,\text{Season} + b_3\,\text{Experience} \times \text{Season}) + \varepsilon \tag{10}$$
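How the interaction term in these moderation models changes the slope of a topic can be made explicit with a small sketch. The Python code below assumes that Season is dummy coded as 0 or 1 (which level is the reference is not stated here) and uses made-up coefficients; `simple_slopes` and `predict` are hypothetical helper names and nothing below is an estimate from the thesis.

```python
def predict(b0, b1, b2, b3, topic, season):
    """Moderated regression: Sentiment = b0 + b1*Topic + b2*Season + b3*Topic*Season."""
    return b0 + b1 * topic + b2 * season + b3 * topic * season


def simple_slopes(b1, b3):
    """Slope of the topic for each value of a 0/1 dummy-coded season."""
    return {"season=0": b1, "season=1": b1 + b3}


# Made-up coefficients, purely illustrative (assumption).
b0, b1, b2, b3 = 1.0, 2.0, -0.5, 1.5
print(simple_slopes(b1, b3))  # {'season=0': 2.0, 'season=1': 3.5}
```

A significant b3 therefore means that the slope of the topic differs between the two seasons, which is what moderation by season amounts to in these equations.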

After running the regressions in R, only two interaction effects were found to be significant: those for the topics ‘facilities’ and, interestingly, ‘location’.

The model shows that the effect of the topic ‘facilities’ in reviews on the sentiment score is significantly more positive when moving from Low to High season. In other words, the results indicate that the relationship between the topic probability distribution of ‘facilities’ and the sentiment score is moderated by season, b = -1.16, 95% CI [-2.16, -0.070], t = -2.09, p < .005. This result is visualized in the graph below:


Figure 11: Graph showing difference of the effect of topic Facilities on sentiment score between High and Low season.

The graph shows that the sentiment score increases as the topic probability of ‘facilities’ increases, and that this effect is stronger in high season. A possible explanation for this effect could be that in high season guests may expect less of general hotel facilities (such as the bar, breakfast, and spa) compared to low season, due to crowdedness of the hotel. This could also explain the lower starting point (figure 11) of the sentiment score in high season for the topic ‘facilities’. When a hotel meets or exceeds the expectations of the guest with regards to hotel facilities in high season, this may trigger a larger positive experience compared to low season, where guests may have higher expectations that are consequently harder to meet. This positive experience may then result in a stronger positive effect of the topic ‘facilities’ on the sentiment score in reviews in high season compared to low season.


The other significant interaction effect can be seen with the topic ‘location’. The results clearly indicate that the effect of the probability distribution of ‘location’ on sentiment score is moderated by season, b = 1.53, 95% CI [0.37, 2.69], t = 2.59, p = 0.001.

See the plot below for a visualisation of the result:

Figure 12: Graph showing difference of the effect of topic Location on sentiment score between High and Low season.

From the graph, it can be concluded that an increase in probability distribution of the topic ‘location’ in reviews leads to a lower sentiment score in reviews, and that this effect is much stronger in high season. This effect may be the result of locations being overcrowded in high season, with noise disturbance and other negative aspects as a consequence. The same locations could have significantly fewer negative aspects in low season. This may well explain that the effect of negative sentiment due to an increase of the topic ‘location’ is much stronger in high season.


H5. The effect of topic distribution on sentiment in reviews is moderated by hotel segment.

First, the direct effect of segment (four levels: 2 stars, 3 stars, 4 stars, and 5 stars) on sentiment is analysed using an ANOVA:

$$\text{Sentiment} = b_0 + b_1\,\text{Segment} + \varepsilon \tag{11}$$

A one-way independent ANOVA shows a significant effect of hotel segment on the sentiment score in reviews, F(3) = 91.14, p < 0.001, ω = 0.002. The residual plot does not show any strange patterns, and therefore the assumptions of homoscedasticity and linearity are met (see appendix E).

Figure 13: Boxplots showing difference in sentiment score for different hotel Segments.

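Now, I examine whether differences exist in the effect of topic distribution on sentiment scores in reviews between segments. I perform multiple regressions with segment as a moderator:

$\text{Sentiment} = b_0 + b_1\,\text{Service} + b_2\,\text{Segment} + b_3\,(\text{Service} \times \text{Segment}) + \varepsilon$ (11)

$\text{Sentiment} = b_0 + b_1\,\text{Facilities} + b_2\,\text{Segment} + b_3\,(\text{Facilities} \times \text{Segment}) + \varepsilon$ (12)

$\text{Sentiment} = b_0 + b_1\,\text{Comfort} + b_2\,\text{Segment} + b_3\,(\text{Comfort} \times \text{Segment}) + \varepsilon$ (13)

$\text{Sentiment} = b_0 + b_1\,\text{Location} + b_2\,\text{Segment} + b_3\,(\text{Location} \times \text{Segment}) + \varepsilon$ (14)

$\text{Sentiment} = b_0 + b_1\,\text{Experience} + b_2\,\text{Segment} + b_3\,(\text{Experience} \times \text{Segment}) + \varepsilon$ (15)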

The residual plots do not show any patterns that deviate from normal, and thus homoscedasticity and linearity are assumed (see appendix E).
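
A moderated regression of this form can be sketched as follows. The example assumes that segment is dummy coded with the two-star segment as the reference category, which is an assumption made for illustration and not necessarily the coding used in this study. The function `fit_moderated_regression`, the simulated data, and the "true" slopes are hypothetical, and NumPy is used only as a convenient least-squares solver.

```python
import numpy as np


def fit_moderated_regression(topic, segment, sentiment, reference=2):
    """Fit sentiment ~ topic + segment dummies + topic x segment dummies by OLS.

    Returns a dict mapping each segment level to its simple slope of the topic.
    """
    topic = np.asarray(topic, dtype=float)
    segment = np.asarray(segment)
    sentiment = np.asarray(sentiment, dtype=float)

    # One dummy per non-reference segment (assumed reference: two stars).
    levels = [int(lvl) for lvl in np.unique(segment) if lvl != reference]
    dummies = np.column_stack(
        [(segment == lvl).astype(float) for lvl in levels]
    )

    design = np.column_stack([
        np.ones_like(topic),       # b_0: intercept
        topic,                     # b_1: topic slope in the reference segment
        dummies,                   # b_2: intercept shifts per segment
        dummies * topic[:, None],  # b_3: slope shifts per segment
    ])
    coef, *_ = np.linalg.lstsq(design, sentiment, rcond=None)

    k = len(levels)
    slope_reference = coef[1]
    interaction_terms = coef[2 + k: 2 + 2 * k]

    slopes = {reference: float(slope_reference)}
    for lvl, b3 in zip(levels, interaction_terms):
        slopes[lvl] = float(slope_reference + b3)
    return slopes


# Simulated data with hypothetical slopes per segment (illustrative only).
rng = np.random.default_rng(42)
n = 4000
segment = rng.choice([2, 3, 4, 5], size=n)
topic = rng.uniform(0.0, 1.0, size=n)
true_slopes = {2: 0.5, 3: 2.0, 4: 1.5, 5: 2.5}
slope_per_row = np.array([true_slopes[int(s)] for s in segment])
sentiment = 0.1 * segment + slope_per_row * topic + rng.normal(0.0, 0.1, size=n)

slopes = fit_moderated_regression(topic, segment, sentiment)
for level in sorted(slopes):
    print(f"{level} stars: estimated slope = {slopes[level]:.2f}")
```

In this coding, each interaction coefficient is the difference between the topic slope in one segment and the topic slope in the reference segment, so a negative coefficient indicates a weaker effect of the topic in that segment.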

The results show a significantly weaker effect of the topic ‘service’ on sentiment score for hotels with two stars compared to all other hotels, b = -7.01, 95% CI [-14.86, 0.84], t = -1.75, p = 0.05.

Figure 14: Graph showing the difference of the effect of the topic Service on sentiment score between segments.

The graph shows a clear difference between two-star hotels and all other segments. Hotels with a two-star rating have a substantially lower service level and generally provide very basic accommodation, delivering a less extraordinary guest experience than hotels from higher segments. Hence, it is not surprising that the positive effect of the topic ‘service’ on the sentiment score in reviews is weaker for hotels from the two-star segment than for all other segments.

Furthermore, the effect of the topic ‘facilities’ on sentiment score is significantly weaker for hotels with four stars, b = -2.24, 95% CI [-4.30, -0.17], t = -2.12, p < 0.05.

Figure 15: Graph showing the difference of the effect of the topic Facilities on sentiment score between segments.

The positive effect of the topic ‘facilities’ on sentiment score appears to be significantly different for hotels with a four-star rating. Hotels with a four-star rating are upscale hotels and promise a large range of high-quality facilities, which sets customer expectations at a high level (Baloglu et al., 2010). However, hotels with a four-star rating do not offer the same quality of facilities as the most luxurious hotels from the five-star category. Guests might expect the same quality of facilities in four-star hotels as in five-star hotels, and might then be negatively surprised when these facilities do not match their initial expectations. This may then result in a weaker positive effect of the ‘facilities’ topic on the sentiment score in reviews for hotels with a four-star rating, compared to other hotels.

Next to this, the effect of the topic ‘comfort’ on sentiment score seems to be significantly different for hotels with two stars compared to the other segments. The plot below (figure 16) shows that when the topic probability ‘comfort’ increases, this does not