Academic year: 2021

Consumer reviews and Movies: An in-depth analysis of their box office prediction power

MSc Business Administration: Digital Business

Author: Dragos Munteanu 11789468

Thesis Supervisor: Frederik Situmeang

University of Amsterdam

Statement of originality

This document is written by student Dragos Munteanu, who declares to take full responsibility for the contents of this document. I declare that the text and the work presented in this document is original and that no sources other than those mentioned in the text and its references have been used in creating it. The Faculty of Economics and Business is responsible solely for the supervision of completion of the work, not for the contents.

Table of Contents

Statement of originality
Abstract
1. Introduction
1.1. Research gap
2. Literature review
2.1. The predictive power of reviews
2.2. Investigating Reviews and LDA Topic Modelling
2.3. The effects of attributes and sentiment from reviews on movie performance
2.4. Moderation effect of movie popularity on sentiments
2.5. Effect of product attributes found in movie reviews
2.6. Moderating Effect of Movie Genre/Category
2.6.1. Defining Movie Genre/Category
2.6.2. Moderating effect of movie genre/category on the relation between topics of consumer reviews and movie sale performance
3. Research Design
3.1. Dataset
3.2. Operationalization of constructs
3.3. Methodology
3.4. Latent Dirichlet Allocation (LDA) and Topic Modelling
4. Results
4.1. Latent movie attributes
4.2. PCA, Descriptive statistics and hierarchical linear regression
4.3. Movie popularity as moderator for sentiment
4.4. Moderation effects of genre on the relationship between attributes and sales
5. Discussion and Conclusion
5.1. Theoretical implications of hypotheses
5.2. Conclusion
5.3. Managerial Implications
5.4. Limitations and Future Research
6. References
7. Appendices

Abstract

Machine learning algorithms and big data have given companies and individuals the ability to draw insights from unstructured data in ways previously unthinkable. Consumer reviews have long been thought to offer deep insights into consumer behaviour, and in recent years a proliferation of studies has attempted to quantify them in a way that links them to sales. Few studies, however, have focused on extracting attributes for experience goods such as movies. Our study examined movie reviews and found that positive emotions in reviews can have a positive effect on sales, especially for less popular movies. However, we also find negative emotions to have a positive effect on sales, with only some negative statements about the overall experience having a negative effect. We further identify a set of frequently discussed movie attributes in consumer reviews that can affect sales positively, as well as negative attributes with negative effects on sales. Lastly, we find one attribute, humour, to be moderated by genre. Our study explored the effect of reviews on sales in depth, and we argue the results hold value for the movie industry and for future academic research.

Keywords: Data mining, sales forecasting, movie reviews, consumer reviews, Natural Language Processing, LDA analysis, topic modelling, sentiment analysis.

1. INTRODUCTION

The rise of Big Data has changed the world in previously unthinkable ways. The movie industry, which grew rapidly in the past, has seen an even steeper rise with the spread of online and mobile platforms. The industry is rich in data, which makes it very attractive for data scientists to formulate, through the opportunities brought by data mining and analysis, new and accurate methods for predicting the success of a movie (Singla, 2015). Global box office revenue is predicted to increase from 38 billion dollars in 2016 to 50 billion dollars in 2020 (Statista, 2016). Large production companies control much of the industry, and they spend billions of dollars on advertising alone, which can result in heavy losses if the investment is unsuccessful. If it were possible to accurately predict the future success of a movie, production companies could readjust their investments to maximize profits (Vr, 2014).

An important factor in deciding the success of a movie is knowing which features, or attributes, get people interested in this type of product. By analysing various sources such as expert websites and online reviews, movie-makers can obtain this knowledge with a high degree of accuracy (Singla, 2016). When consumers decide to watch a movie, we can assume they will look for whatever information is available about it. Movie trailers can offer some of that information, especially when they are advertised worldwide, but consumers might need a better indicator of quality. This is when people turn to product reviews, which can offer a more objective view of whether the movie is worth their money (Kennedy, 2008).

Studies have shown that the biggest impact on consumers' purchase decisions comes from positive reviews, and this can be used by producers as a marketing tool (Pentheny, 2015). Much previous research on movie revenue prediction focuses on quantifying sentiment and volume from social media, with results showing a link between the attention a movie receives on social media and its commercial success (Asur, 2010). Other studies analysed objective movie attribute data (such as genre, rating, budget, actors, producer, release date, etc.) from data aggregators like IMDb or Metacritic in order to construct regression models that predicted real-world outcomes of movies; however, these models were not accurate enough for commercial use (Vr, 2014; Im, 2011). What consumers think of a product can without doubt influence sales, which is why understanding the opinions and sentiments expressed in reviews is very important: together, these reviews represent the so-called 'wisdom of the crowds' (Yu et al., 2012).

Consumers can use objective product attributes such as director, actors, MPAA ratings or genre to guide their consumption decisions. However, when it comes to experience products such as movies, consumers prefer choosing movies based on the subjective attributes a movie might have (Cooper-Martin, 1991). For example, is the movie scary, funny, romantic or exciting, or does it have a good story? Consumers also prefer their peers as sources of information over professional movie critics, mainly because they can identify their preferences better in reviews written by their peers (Cooper-Martin, 1991).

Our research aims to extract subjective product attributes from consumer reviews and, drawing on signalling theory (Spence, 1973), analyse whether those subjective movie attributes act as signals to other consumers, thereby positively influencing movie performance. It is worth investigating whether consumers are able to identify and discuss such attributes, and what effect these have on sales. Moreover, these latent product attributes can inform movie producers when designing a movie or making advertising and marketing decisions, by revealing whether and how they influence consumers. This paper is mainly an application of data mining and NLP, using techniques that analyse the semantic and syntactic construction of online word-of-mouth text in order to find words that are good predictors (Joshi et al., 2010). An appropriate model and metrics would be an important tool for firms, which could monitor changing attitudes in real time and adapt their marketing, manufacturing and distribution strategies accordingly (Dellarocas et al., 2007). Because the movie industry has received much attention regarding the effect of consumer reviews, or word of mouth, on sales, we are also able to compare our results in depth with those of previous research (Duan et al., 2008).

1.1. Research gap

There is a research gap when it comes to using text classification, topic modelling and NLP to analyse experience goods such as movies. More specifically, little research has focused on identifying the subjective product attributes found in consumer movie reviews and linking them to movie performance. This makes it an exciting opportunity to show that topic modelling is a viable way to extract insights from large sets of unstructured text, which firms can use to improve sales. Previous studies have focused on the volume and valence of these reviews and their effect on sales (Asur, 2010; Valentine, 2013), analysed the sentiment effect of social media on movie ratings (Kesharwani, 2017), or focused on the role of valence in consumer reviews as an influencer (Purnawirawan, 2015). Other studies also looked at valence but analysed whether the source of the review, critic or consumer, makes a difference (Pentheny, 2015). Other studies that used social media for product sales forecasting did not use actual sales figures (Trang, 2017). For movies, sales information can be retrieved in the form of box office revenues, and a number of studies have used data from Box Office Mojo or The Numbers to improve their models (Joshi et al., 2010). There is also extensive prior work that used data mining to extract objective movie attributes in the form of meta-data, which were then used to build prediction models for movie performance (Vr, 2014; Latif, 2016; Deniz, 2012). Few studies have attempted to quantify whether the textual content of product reviews can explain consumer choices better than valence or volume. Archak (2011) found that review text can be successfully used to learn consumer preferences for product attributes, which in turn can be used to predict sales. Similarly, Tirunillai and Tellis (2014) found that, for certain search goods such as cell phones and computers, dimensions of quality can be extracted from consumer reviews.


Our study aims to extend the limited existing research on the effect of latent product attributes found in consumer reviews on movie sales performance. So far there has been little research on how information found in reviews affects the performance of experience goods, with most existing work focusing on search goods. However, studies in the hospitality industry have successfully employed topic modelling techniques to find subjective product attributes mentioned by consumers in reviews, which were then assessed against the star level of the hotels (Barnes, 2016; Bueschken, 2015). Topic modelling is an advanced linguistic analysis technique that makes it possible to extract meaning from unstructured text (Guo, 2017). Our paper follows similar text analysis and machine learning techniques to those used in the hospitality industry (Barnes, 2016; Bueschken, 2015; Guo, 2017) in order to analyse how subjective product attributes found in consumer movie reviews affect movie performance.

Therefore the following research question can be formulated:

Using topic modelling and sentiment analysis on consumer reviews, how do product attributes and sentiments impact movie performance?

In the next section we discuss existing research on the topic, link it to our research question, and lay out the theory that forms the basis for our hypotheses. In the methodology section we present our chosen dataset, the methods used to operationalize the hypotheses, and the conceptual model. The results section presents the outcomes of our analysis. In the discussion and conclusion we examine the implications of the results for each hypothesis, draw a conclusion, and discuss the managerial implications, limitations and future research opportunities of this study.

2. LITERATURE REVIEW

2.1. The predictive power of reviews

Product reviews, written by experts or consumers, transmit information about products to prospective buyers. This is vital for consumers, especially for products that are difficult to evaluate (such as the experience goods studied here) or in situations with limited reliable information. It has been suggested that these reviews can have a strong impact on consumer behaviour, with 62% of consumers reading online reviews before deciding which product to buy (Situmeang et al., 2014). Another study found that 70% of consumers stated their purchases were influenced by product reviews (Mintel, 2015). This has led scholars to analyse how reviews affect sales, with studies examining the impact of both expert and consumer reviews. From the perspective of signalling theory, reviews signal the quality of the product, which can increase the consumer's willingness to make a purchase (Situmeang et al., 2014). Much of the earlier literature focused on linking product reviews' mean valence or rating scores with sales (Archak et al., 2011).

When specific experience product types like movies are investigated, we find much research dedicated to understanding the link between movie reviews and their effect on consumers. Critics are viewed as experts who take a more objective view than the average consumer and can differentiate between a good and a bad movie, but consumer reviews have recently been receiving more attention (Pentheny, 2015). Online consumer reviews are a good proxy for overall word of mouth and can also influence consumers' purchase decisions (Zhu and Zhang, 2010). If movie reviews can affect how consumers make decisions, movie studios can use them both as marketing tools and as predictors of movie sales performance. Some studies quantify the influence of both professional and consumer reviews on movie performance; however, these studies mainly focus on the type of information present in the text (positive, negative or neutral) and how it can influence the consumer (Pentheny, 2015).

Other, earlier studies focus on the effect of social media 'chatter', or content, in predicting movie performance. Movies are chosen for analysis in a number of studies because they are widely discussed in social media communities and their real-life outcomes are easily observed in box office revenues. Strong positive effects have been found for the volume of tweets prior to a movie's release on its performance, as well as for positive tweets written after the release (Asur, 2010). It is worth adding that previous studies of movie reviews have had mixed results. For example, Dellarocas et al. (2007) find that positive online reviews have a positive effect on box office sales, but Duan et al. (2008) reach a different conclusion: review valence does not have a direct effect on sales, but higher valence does generate higher volume, and volume in turn can affect movie sales positively.

2.2. Investigating Reviews and LDA Topic Modelling

The existing research mainly focuses on determining the polarity or sentiment of reviews and linking them with sales. So far, little research has emphasized mining and classifying reviews in order to understand consumers' opinions towards specific attributes or features of a product (Hur et al., 2016; Ma et al., 2013), while other work has only looked at a few product characteristics, such as product popularity (Zhu and Zhang, 2010).


Some studies argue that the predictive value of product reviews cannot be captured by a single scalar value such as a rating or valence score (Archak et al., 2011). Furthermore, it is argued that product reviews have different dimensions describing many attributes of a product, and that beyond single-scale values or volume, these dimensions can more accurately describe or predict consumer choice. For such an analysis, machine learning techniques were used to decompose text into segments from which the respective product features were extracted (Archak et al., 2011). Dimensions, or latent topics as they are often called, are variables that may not be explicitly mentioned by reviewers but can represent a large number of attributes of product quality (Tirunillai and Tellis, 2014). Such attributes may concern durability, ease of use, performance or overall experience. In the hospitality industry, a number of papers on dimension extraction have had good results analysing unstructured consumer reviews and extracting the latent dimensions, or topics, consumers use to describe their experience; their goal was to identify the topics in consumer reviews that best predict consumer satisfaction (Bueschken and Allenby, 2015).

Latent Dirichlet Allocation (LDA) is an unsupervised generative probabilistic model for text and one of the most widely used topic modelling methods (Jelodar et al., 2017). In earlier work, Tirunillai and Tellis (2014) applied an LDA model to capture latent topics in user-generated content; they assessed topic importance across industries over time and used the topics for brand positioning and segmentation. In LDA modelling, latent topics are defined as collections of words with a high probability of being used together, rather than by the significance or commonness of single words. Sentence-constrained LDA models deliver even better results because they assume a single topic per sentence; when combined with single-scale ratings, this approach delivered better accuracy than the other LDA models tested (Bueschken and Allenby, 2015).


2.3 The effects of attributes and sentiment from reviews on movie performance

There is already evidence that positive word of mouth on social media has positive effects on sales (Asur, 2010; Valentine, 2013). However, findings are mixed regarding the valence of reviews and its effect on sales (Dellarocas et al., 2007; Duan, 2008). Building on the findings of Archak (2011), who stresses that single-value ratings or valence are not enough for a thorough analysis of consumer behaviour, we will use NLP techniques to find specific words that signal positive or negative consumer emotions and examine how they relate to movie performance. A similar study by Ullah et al. (2015) suggests that positive words associated with a movie had a positive effect on the review's perceived helpfulness for the consumer. Drawing on signalling theory (Spence, 1973), we will analyse whether positive or negative sentiments found in consumer reviews can signal quality to readers and affect movie performance. The following hypotheses are formulated:

Hypothesis 1a: High frequencies of positive words found in consumer reviews have a positive effect on movie sale performance.

Hypothesis 1b: High frequencies of negative words found in consumer reviews have a negative effect on movie sale performance.
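At its simplest, the positive- and negative-word frequencies in Hypotheses 1a and 1b can be computed by counting lexicon hits per review. A minimal sketch follows; the word lists here are invented stand-ins for a real sentiment lexicon.

```python
# Illustrative positive/negative word lists; a real study would use a
# full sentiment lexicon rather than these hand-picked examples.
POSITIVE = {"great", "amazing", "love", "funny", "brilliant", "enjoyed"}
NEGATIVE = {"boring", "terrible", "awful", "weak", "predictable", "hated"}

def sentiment_frequencies(review: str) -> tuple[float, float]:
    """Return the share of positive and negative words in one review."""
    tokens = review.lower().split()
    if not tokens:
        return 0.0, 0.0
    pos = sum(t.strip(".,!?") in POSITIVE for t in tokens)
    neg = sum(t.strip(".,!?") in NEGATIVE for t in tokens)
    return pos / len(tokens), neg / len(tokens)

pos_share, neg_share = sentiment_frequencies(
    "Amazing story, great acting, but a boring final act."
)
print(pos_share, neg_share)  # positive words outnumber negative ones here
```

Averaging these shares over all reviews of a movie gives the per-movie sentiment frequencies that enter the regression against sales.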

2.4 Moderation effect of movie popularity on sentiments

Joshi et al. (2010) found that sentiment-related features are not as significant as might be expected, with only some positively oriented sentiments having actual predictive influence. This is in line with other studies that found a significant but not particularly strong relationship between positive reviews and movie success, and differs from earlier studies that found statistical significance between positive critical acclaim and movie performance, some even suggesting a 10% increase in approval would bring millions in return (Neil, 2005). A study done in the Netherlands proposes that critics can act as influencers and predict the success of smaller, independent movies; when it comes to mainstream movies, moviegoers are instead influenced by the various available forms of advertisement (Topf, 2010; Boatright et al., 2007). Another study found similar results for another type of experience good, video games, and also concluded that consumer reviews are more influential for less popular games (Zhu and Zhang, 2010). This might be because consumers receive more quality signals from mainstream movies in the form of advertising, in contrast with the limited information available for smaller movies (Trang, 2017; Boatright et al., 2007). However, other findings suggest the opposite: wide-release movies show a stronger performance correlation with positive reviews than limited-release movies (Kennedy, 2007).

Building on the findings of Boatright et al. (2007) and Zhu and Zhang (2010), we will test whether the role of the review becomes more important in environments with little available information. As found by Caves (2000), we can expect valence to have a greater effect on sales for less popular products. To obtain an accurate result, we will compare this with the effect of positive reviews on popular movies.

Hypothesis 2a: High frequencies of occurrence of positive words found in consumer reviews will have a positive effect on blockbuster movie sales.

Hypothesis 2b: High frequencies of occurrence of positive words found in consumer reviews will have a positive effect on budget/independent movie sales.
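A moderation effect of this kind is typically tested by adding an interaction term to a regression. The sketch below recovers such an interaction from synthetic data; all variables, coefficients and the popularity dummy are invented for illustration, and our actual model is described in the methodology.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 500

# Synthetic data: positive-word frequency and a popularity dummy
# (1 = blockbuster, 0 = budget/independent release).
pos_freq = rng.uniform(0.0, 0.2, n)
popular = rng.integers(0, 2, n).astype(float)

# Invented ground truth: sentiment matters more for less popular movies,
# so the interaction coefficient is negative.
log_sales = (10.0 + 8.0 * pos_freq + 1.5 * popular
             - 5.0 * pos_freq * popular + rng.normal(0, 0.1, n))

# OLS with an interaction term: [intercept, pos, popular, pos * popular].
X = np.column_stack([np.ones(n), pos_freq, popular, pos_freq * popular])
beta, *_ = np.linalg.lstsq(X, log_sales, rcond=None)
print(beta)  # should roughly recover [10, 8, 1.5, -5] given the low noise
```

A negative interaction coefficient here is exactly the pattern Hypotheses 2a/2b probe: positive sentiment still helps blockbusters, but helps less popular movies more.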


2.5 Effect of product attributes found in movie reviews

According to signalling theory, a decision maker faced with limited information about a product will seek signals of quality to aid in the decision (Spence, 1973). Product signals therefore become crucial when consumers cannot evaluate a product from signals sent by the producers, or from the product itself, because such signals reduce uncertainty about the product (Situmeang, 2014a; Situmeang, 2014b). Consumers of experience goods such as movies tend to seek more information before a purchase than consumers of search goods, because experience goods have to be consumed before the consumer can form an opinion (Bei et al., 2004; Caves, 2000).

So far there has been little research on which attributes of a movie give viewers the most satisfaction. Is it character development? A good story, well told? Or is it the most up-to-date special effects? One study proposes using text from critics' reviews, as an application of Natural Language Processing, to extract words and phrases that can predict the movie-going tendencies of consumers (Joshi et al., 2010). In addition, meta-data attributes such as movie name, production house, genre, actors, producer, country of origin, MPAA rating and running time were used to predict sales, although findings suggest MPAA ratings (a US rating system that categorizes a movie's suitability for different categories of viewers based on its content) do not significantly influence movie performance (Greenaway, 2012). The authors conclude that attributes extracted from text analysis perform on par with prediction models using only meta-data, but that combining the two, meta-data and extracted text attributes, leads to better prediction accuracy (Joshi et al., 2010). More recent articles stress that viewers and critics look for different things: while critics want to dissect a movie in order to assess its quality in terms of all its cinematic elements, consumers want to be entertained by a good story that fulfils their expectations (Collazo, 2014).

Perhaps good predictors of movie sales performance can be identified by analysing past successful movies. One article looks at movies that enjoyed long cinema runs because of their popularity (Cain, 2016). The author identifies key qualities in these movies' attributes that enabled their long box office success: innovativeness/novelty, a fresh, original story offering a unique viewpoint or visual experience; delight, movies that charm and entertain viewers with delightful characters and engaging stories; surprise, movies with a surprising story approach; and strong stories, well told, such as adaptations of popular stories or books, or remakes. Although a movie might not tick all of the boxes above, we suspect successful movies contain at least one of these qualities. This is in line with studies from the service industry which found heterogeneity across the most discussed topics in reviews, and that these topics are usually good indicators of service quality (Barnes, 2016). Another approach analysed two main cinematic elements and their psychological effect on the audience, using schema theory to understand how audiences connect with characters' personalities and how musical elements are used to associate emotions with characters (Couch, 2012).

Consumer reviews of movies can also contain rich information about a movie's quality, and readers receive signals about different product attributes from them. Our approach will be to identify the product features consumers mention most frequently in reviews and test whether they are positively linked with movie sales performance. We suspect that high frequencies of topics signalling product attributes within reviews indicate that consumers were able to identify the most important features. This in turn can signal quality to other consumers reading those reviews and positively affect movie sales. Hence, the following hypothesis is formulated:

Hypothesis 3: A high frequency of labelled topics based on movie attributes that are found in consumer reviews will have a positive effect on movie sale performance.
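Hypothesis 3 amounts to regressing a sales measure on per-movie topic frequencies. A synthetic sketch of that setup is shown below; the topic shares, effect sizes and log-sales construction are all invented, and since shares sum to one, only contrasts between topic coefficients are identified.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(7)
n_movies, n_topics = 200, 5

# Per-movie topic shares (each row sums to 1), standing in for LDA
# output aggregated over every review of a movie.
shares = rng.dirichlet(np.ones(n_topics), size=n_movies)

# Invented ground truth: some attributes help sales more than others.
true_effects = np.array([3.0, 1.0, 0.5, -1.0, 0.0])
log_sales = 15.0 + shares @ true_effects + rng.normal(0, 0.2, n_movies)

model = LinearRegression().fit(shares, log_sales)

# Shares are compositional, so read coefficients relative to a baseline
# topic rather than individually.
contrast = model.coef_[0] - model.coef_[3]
print(round(contrast, 2))  # close to the true gap of 4.0
```

In the real analysis the per-topic coefficients (or contrasts) indicate which discussed attributes are associated with higher box office revenue.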

2.6 Moderating Effect of Movie Genre/Category

More recent studies suggest that product characteristics play a role in moderating the effect of earned media on product sales, which makes it worthwhile to look deeper into the effect of product type when analysing consumer reviews (Zhu and Zhang, 2010). Reviewed movies span many genres, and when consumers act as critics and write their reviews, they focus on the aspects of the movie they consider important to mention. Moreover, a sci-fi movie will have different aspects that viewers find exciting than, for example, a comedy or a horror movie: viewers look for the scare factor in a horror movie, while they watch a comedy to laugh or de-stress after a long day. Movie genres are important characteristics for viewers because they often provide a reference point for a particular movie. Because there is a vast number of movies with varied and unique characteristics, it has been argued that consumers form experience-based norms at the level of genre (Redfern, 2012). This in turn signals to consumers that they can expect a level of satisfaction similar to that of previously enjoyed movies from the same genre.

2.6.1. Defining Movie Genre/Category

Categorizing movies by genre, by both audiences and critics, has enabled a much easier way to study movies in general, instead of looking into all the details and different aspects of the industry. The movie industry is also entirely dependent on its audience, and therefore needs to make sure it can meet that audience's expectations. As such, genres are an important scale for how the audience perceives a certain type of product (Fu, 2010). Sometimes movies fall into one category (e.g. drama), but more often than not they are a mixture of two or more; for example, Interstellar is described by IMDb as Adventure/Drama/Sci-Fi (IMDb, 2018). A movie's genre depends on a number of factors. A western (e.g. The Good, the Bad and the Ugly) can be characterised by its time and location, and so can a sci-fi movie (e.g. Star Wars). The plot and structure of the movie are also important factors: a romantic comedy relies on a light-hearted approach to someone trying to win someone else's heart, while a detective movie revolves around solving crime and bringing justice. Themes can also define genres, as in horror movies (e.g. supernatural forces). Because genres match movies to a certain audience, moviemakers can choose to appeal to that audience's expectations and ultimately increase the chances of a movie's financial success (Pucher, 2017).

2.6.2 Moderating effect of movie genre/category on the relation between topics of consumer reviews and movie sale performance

Little research has been devoted to understanding individual preferences for different kinds of entertainment and what drives consumers towards certain genres. Genre has been used in research to improve models predicting movie popularity (Latif, 2016) or as a control variable (Situmeang, 2014; Lash, 2016). Another study found only two genres (sci-fi and drama) to be good predictors of success (Topf, 2010). There is thus a gap in assessing whether movie attributes are more linked to some genres than to others, which makes this an interesting research opportunity.

One extensive study compared entertainment genre preferences with personality traits (Rentfrow et al., 2011). The aim was to discover a structure that can explain underlying consumer preferences for certain genres. Using all genres from music, television, movies and books as variables, the authors defined broad dimensions underlying participants' preferences, identifying factors labelled 'Communal', 'Aesthetic', 'Dark', 'Thrilling' and 'Cerebral'. Genres corresponding to 'Communal' (e.g. family, romance) appeared light-hearted, popular, uncomplicated and focused on people and relationships. Corresponding to 'Aesthetic' (e.g. foreign, classic) were genres described as creative, abstract and cultured. The 'Dark' factor (e.g. horror, cult) was characterized by intensity and edginess. 'Thrilling' (e.g. action, sci-fi) included common themes such as action, adventure, suspense and fantasy, while the last factor, 'Cerebral' (e.g. documentary), was information-oriented. They also found that although there were significant age and gender differences in genre preferences, the factors underlying those preferences remained generally invariant across gender and age. They suggested that consumers are not passive recipients of entertainment but seek out entertainment that reflects and reinforces aspects of their personality. Another study follows the same line of logic, arguing that genre links certain product types with particular utilities despite differences in culture and nationality (Fu, 2010). This may mean that consumers know what they want when looking for a movie and use the genre attribute to find the product best suited to their entertainment preferences. Another article provides a snapshot of genre assessment by movie professionals and investors: for dramas the script needs to be strong and the director talented, while for action movies the concept and star level of the actors matter more than the director (Brown, 2013). Movie genre could thus moderate the relationship between consumer reviews and sales by generating a different set of attributes, or topics of discussion, characteristic of each genre.
Again, building on signalling theory (Spence, 1973; Kirmani, 2000), we hypothesise that by segmenting movie reviews we find certain attributes specific to each genre (e.g. storyline attributes for dramas or thrillers, technical attributes for sci-fi), and that viewers are able to identify these elements and discuss them in reviews, which in turn affects sales performance. Respectively, if we segment consumer reviews by genres such as comedies, action, drama, horror, sci-fi, animation and fantasy, we will find that the same attribute can have a positive or negative effect on sales depending on the genre. In turn, these attributes act as signals that consumers are able to identify and discuss.

Hypothesis 4: Movie genre is a moderator for attributes extracted from movie reviews. A high frequency of genre-specific attributes found in movie reviews will have a positive effect on movie performance.

Figure 1: Conceptual Model

In Figure 1 we have an overview of the conceptual model as discussed in the hypotheses. We proceed to discuss the methodology we will use to implement the above constructs.

3.1 Dataset

The dataset was collected from the review aggregator Metacritic. Each entry contained the review content, review date, movie name, reviewer name and user score. From Box Office Mojo we gathered information about the movies in our dataset such as production budget, release date, domestic gross and worldwide gross, runtime, genre and distributor name. The two datasets were merged into one with a Python function that used movie name as the common column. After closer inspection we realised the dataset contained reviews in different languages, so we removed reviews written in foreign languages, identified with the help of the Python package langdetect. We also implemented a Python function to check for duplicate reviews and eliminated any that were found. From the original set of 120,000 reviews, over 2,000 were found to be in a language other than English. Finally, reviews for movies released before 2010 were eliminated, which resulted in a dataset of 50,036 reviews covering movies released from 2010 to 2017.
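The merging and filtering steps above can be sketched with pandas on toy data. The column names and the tiny example frames below are illustrative, not the exact source schemas, and the langdetect step is only indicated in a comment:

```python
import pandas as pd

# Toy stand-ins for the Metacritic and Box Office Mojo exports
# (column names are illustrative, not the exact source schemas).
reviews = pd.DataFrame({
    "movie_name": ["Inception", "Inception", "Inception", "Avatar"],
    "review": ["Great plot", "Great plot", "Loved the score", "Stunning visuals"],
})
movies = pd.DataFrame({
    "movie_name": ["Inception", "Avatar"],
    "release_year": [2010, 2009],
})

# Merge the two sets on the shared movie-name column.
df = reviews.merge(movies, on="movie_name", how="inner")

# Drop exact duplicate reviews, then keep only movies released from 2010 on.
# (Filtering non-English reviews with langdetect would slot in here.)
df = df.drop_duplicates(subset=["movie_name", "review"])
df = df[df["release_year"] >= 2010].reset_index(drop=True)  # two distinct Inception reviews remain
```

The same three operations (inner merge, exact-duplicate removal, release-year filter) scale directly to the full 120,000-review set.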

3.2 Operationalization of constructs

Positive/Negative words: A sentiment analysis will be performed on the dataset and each review will be categorized as positive or negative. This will be done using LIWC, which captures negative or positive sentiment by using a separate dictionary of statistically related words for each dimension. After the processing module has accounted for all the words in a text, it calculates the percentage of total words that match each dictionary category and assigns a score from 0 to 100 for each category in each text.
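LIWC itself is proprietary, but the scoring logic described above (the percentage of words matching a category dictionary, scaled to 0-100) can be sketched with hypothetical mini-lexicons; LIWC ships its own validated dictionaries:

```python
# Illustrative mini-lexicons; LIWC's real category dictionaries are far larger.
POSITIVE = {"good", "great", "love", "fun"}
NEGATIVE = {"bad", "boring", "awful", "waste"}

def category_score(text, lexicon):
    # Share of words in the text that match the category, scaled to 0-100.
    words = text.lower().split()
    if not words:
        return 0.0
    hits = sum(1 for w in words if w in lexicon)
    return 100.0 * hits / len(words)

review = "a good but boring movie"
pos = category_score(review, POSITIVE)  # 1 of 5 words -> 20.0
neg = category_score(review, NEGATIVE)  # 1 of 5 words -> 20.0
```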


Movie popularity: Movies with big budgets that are released on a large number of screens tend to be more popular (Terry, 2005). We will categorize movies as such by the size and success of the movie distributor: popular movies are widely distributed by a few big distribution houses. We will use the data available on The-Numbers to identify the distributors with the largest market share.

Movie attributes: This independent variable is particularly difficult to define because the existing literature does not focus on cinematic elements that viewers might recognize as signals of quality. By performing an LDA analysis on our dataset we will find topics of discussion in the reviews, which we will label manually. However, it is important to define beforehand the movie attributes we expect to influence viewers, based on the few existing articles and on psychological research. These variables will form the basis for our product attribute variables. The first attribute is Characters. One technique movie makers use to connect audiences to a movie is schemas such as the villain or the hero (Couch, 2012). Schemas are building blocks that viewers assemble from past experiences in order to form a mental representation of unfamiliar situations. By employing them, audiences can better understand the role played by a character, which can lead to a more connected audience (Mcleod, 2015). Topics that contain words associated with character development, heroes, villains, good vs. evil or names of superheroes will be included in this category. A variable that has been found to have a positive effect in a number of studies is Star Power (Lash, 2016; Greenaway, 2012). Here we will include words from topics associated with actors, directors, and specific names of actors and directors, as well as words associated with the performance of the actors.

Good stories that move us emotionally are the ones that make us pay attention and become more involved (Zak, 2015). Story will be our next variable and will include topics and high frequencies of words associated with it, such as storytelling, plot, engagement, surprise or excitement. Because movies based on known books or comics tend to perform well (Greenaway, 2012), we will also look for topics associated with those key words. Our next variable, Innovativeness, is related to the Story variable, but we define it separately because movie writers widely use the rising and falling tension of a performance to facilitate an emotional connection with the audience. In Hollywood this is called surprising familiarity: audiences want to see stories that fit into certain genres, but within those genres they want original stories that offer unique viewpoints or some degree of innovation (Zak, 2015; Cain, 2016). The Innovativeness variable will include topics and high frequencies of words such as novelty, surprise, fresh, unique, innovative or original. As previously emphasized, audiences mostly watch movies to have fun (Cain, 2016). Psychological studies have found that people seek exposure to humour and feel better when they laugh, which in turn reduces stress (Lefcourt, 1991). It therefore makes sense to introduce a variable Humour, which will be associated with topics and high frequencies of words such as comic, comedy, funny, laugh, or names of actors/characters that play comical roles.

Music is an important attribute that can also act as a schema, a framework that supports interpretation of the movie. For example, music can highlight the emotions of the characters, have a foreshadowing effect that builds anticipation about what might happen in the story, or simply transport the audience to a certain setting (Boltz, 2001). It has also been used as a theme associated with characters, settings or emotions (Green, 2010). We therefore define Music as our next attribute variable, containing topics and high frequencies of words associated with it. Other variables that are discussed in movie articles but not extensively researched are Visuals and Setting. Movie directors use visuals such as special effects or filming techniques to engage audiences in different ways (Miller, 2014); we will include in this category topics and high frequencies of words such as imagery or special effects. We believe setting plays an important role in connecting the story to the audience: a 16th-century war setting provides viewers with a different schema than a space setting such as the one in Star Wars. We will therefore include in this variable topics and high frequencies of words associated with a movie's setting. The final product attributes that will guide our analysis of the topics inferred with LDA are: Characters, Star Power, Story, Novelty, Music, Humour, Visuals and Setting.

Innovativeness: Based on two stem words, 'newness' and 'innovation', we developed an exhaustive word list to capture consumer discussion of innovation or newness in reviews. We used synonyms of the two words from The Synonym Finder to compile the list of words in Table 1. To implement this construct we use SentiStrength, which was developed to detect the strength of sentiment in social texts. It uses a lexical approach that exploits a list of sentiment-related terms and includes rules to deal with online language such as emoticons, punctuation and misspellings (Thelwall, 2013). The main advantages of this tool are that it is free of charge and that it allows users to supply a custom dictionary for analysis, which is why we constructed our own dictionary to test for the innovativeness sentiment. The software assigns a score of 1 to 5 depending on the strength of the word; for example, we assigned the word 'new' a score of 3, while 'newest' was assigned the highest score of 5. We proceeded to assign scores to all 96 words in our list.
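The custom-dictionary idea can be illustrated with a small sketch: each term carries a strength from 1 to 5, and a review is scored by its strongest match. The strengths for 'new' and 'newest' follow the text; the other entries and the max-aggregation rule are assumptions for illustration, not SentiStrength's exact algorithm:

```python
# Hypothetical slice of the custom dictionary; 'new' = 3 and 'newest' = 5
# follow the text, the remaining strengths are illustrative.
INNOVATION_STRENGTH = {
    "new": 3, "newest": 5, "novel": 4, "original": 4,
    "fresh": 3, "innovative": 5, "unprecedented": 5,
}

def innovativeness(text):
    # Score a review by the strongest matching innovativeness term (0 if none).
    words = text.lower().split()
    matches = [INNOVATION_STRENGTH[w] for w in words if w in INNOVATION_STRENGTH]
    return max(matches) if matches else 0

innovativeness("a new and original story")  # -> 4 ('original')
innovativeness("nothing special here")      # -> 0
```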


Movie category: It has previously been explained how movie genres can help match audiences with certain types of movies. Various movie database aggregators list different definitions of genre, and movies usually fall within one main genre. We have chosen The-Numbers because they offer a classification of the main genres between 1995 and 2018 ranked by dollar sales (The-Numbers.com). The following genres will therefore be operationalized: Action, Drama, Comedy, Horror, Sci-Fi, Animation and Fantasy. These genres were found to be the most representative in our sample; the less frequent ones were used as the baseline for our regression analysis. Because the genre classification for each movie usually contained a base genre and one or two secondary genres, we used only the base genre to simplify the analysis. For example, Action/Sci-fi was transformed into Action.
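The base-genre reduction can be expressed as a small helper. The '/' separator and the 'Other' baseline label are assumptions based on the description above:

```python
# The seven genres kept in the analysis; everything else becomes the baseline.
MAIN_GENRES = {"Action", "Drama", "Comedy", "Horror", "Sci-Fi", "Animation", "Fantasy"}

def base_genre(label):
    # Keep only the base genre of a compound label such as 'Action/Sci-fi';
    # genres outside the studied set are mapped to an 'Other' baseline.
    base = label.split("/")[0].strip()
    return base if base in MAIN_GENRES else "Other"

base_genre("Action/Sci-fi")  # -> 'Action'
base_genre("Documentary")    # -> 'Other'
```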

Movie performance: To assess movie performance, actual movie sales will be used from Box Office Mojo, one of the most complete box office databases on the internet. For each movie we use domestic box office sales, expressed in dollars.

Table 1: Innovativeness word list

advanced, avant-garde, brand-new, breakthrough, contemporary, creation, creative, creativeness, creativity, cutting edge, developed, development, different, distinct, evolution, futuristic, imaginative, improved, improvement, innovation, innovative, innovatory, invention, inventive, inventiveness, leading edge, metamorphosis, mint condition, modern, modernism, modernistic, modernity, modernization, modernized, modification, mutation, neoteric, new, new-wrinkle, newest, newfangled, newfashioned, newness, novel, novelty, original, originality, origination, originative, progressive, radical, radical change, radically, rebuilt, recast, reconstructed, recreated, reformation, regenerated, remodeled, renascent, renewed, renovation, restyle, restyling, revolution, revolutionary, transformation, ultramodern, unhackneyed, unprecedented, up-to-date, way-out

3.3 Methodology

Topic modelling is a powerful machine learning technique that can be used for data and text mining, latent dimension discovery and finding relationships between text and data. These techniques are widely used because they efficiently find hidden semantics in vast amounts of information (Jelodar, 2017). Metacritic serves as a good environment for our application of topic modelling because it is one of the largest online movie review communities and is available to a large number of consumers. The data from the website should provide a high variety of consumer opinions that we can extract in order to test our hypotheses. We mainly conduct an analysis of customer reviews to identify latent topics and assess their impact on product performance and, secondly, to contrast different methods of creating inferences about the topics. Thus we will test whether word choice probabilities are enough to establish meaning in consumer evaluations, and the extent to which topics can provide insights based on UGC discussions in reviews (Buschken, 2016). Two natural language processing toolkits will be used for mining topics, MALLET and Gensim, in the Python environment. For sentiment analysis we will use LIWC, and SentiStrength will be used to test for innovativeness, which previously discussed articles have linked to a positive effect on sales.

3.3 Text pre-processing

The pre-processing step follows the approach of Tirunillai and Tellis (2014), whose implementation of LDA successfully extracted dimensions of quality from reviews of online products. We start by giving structure to the text; words that are not informative of product quality or dimensions are removed, and the remaining words are transformed so that they can be manipulated statistically. We begin by cleaning and standardizing the text for analysis: non-English words and characters (html tags, numbers, and punctuation) are removed. All sentences are then broken down into lists of words and the remaining messy text is cleaned up. The words are tokenized and cleaned using the Python package NLTK, which is freely available for various NLP applications. As recommended by Kobayashi et al. (2017), texts are then converted to lowercase and stop words such as 'the', 'then', 'where', 'a', 'is', 'on' and 'in' are removed using the NLTK English lexicon, since meaning can be inferred from a document even without them. Next, bigram and trigram models are built, which group words that frequently occur together. We apply part-of-speech (POS) tagging to retain only adjectives, nouns, verbs and adverbs, and lemmatize each word to its root form with another Python package, spaCy. Gensim then creates the corpus that serves as the input for the LDA model: each review is treated as a separate document, and each word within a document is given an id and a frequency count over all occurrences of that word across the input documents. This numerical representation of the documents becomes the dictionary that the LDA model uses to analyse the corpus of processed text.
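A dependency-free sketch of the core pipeline steps follows (lowercasing, tokenization, stop-word removal, and the Gensim-style id/frequency dictionary). The thesis itself relies on NLTK, spaCy and Gensim for these steps; the tiny stop-word list here is illustrative:

```python
import re
from collections import Counter

# A few stop words for illustration; the thesis uses the full NLTK English list.
STOP_WORDS = {"the", "then", "where", "a", "is", "on", "in", "and", "was"}

def preprocess(doc):
    # Lowercase, keep letter runs only, drop stop words and very short tokens.
    tokens = re.findall(r"[a-z]+", doc.lower())
    return [t for t in tokens if t not in STOP_WORDS and len(t) > 2]

def build_corpus(docs):
    # Mimic Gensim's Dictionary + bag-of-words: assign each word an id and
    # represent each document as sorted (word_id, count) pairs.
    tokenized = [preprocess(d) for d in docs]
    word2id = {}
    for toks in tokenized:
        for t in toks:
            word2id.setdefault(t, len(word2id))
    bows = [sorted(Counter(word2id[t] for t in toks).items()) for toks in tokenized]
    return word2id, bows

docs = ["The plot was great!", "Great acting, the plot is weak."]
word2id, bows = build_corpus(docs)
```

Each review becomes a list of (id, frequency) pairs, which is exactly the numerical representation the LDA model consumes.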

3.4 Latent Dirichlet Allocation (LDA) and topic modelling

A topic model is a hierarchical Bayesian model that introduces a latent variable, the topic, between the observed variables (the documents and the words) in order to analyse the semantic distribution over documents (Wang, 2014). The main idea is that documents are assumed to represent a random mixture of topics, and topics are characterised by a distribution over words (Blei et al., 2003). Two of the most widely used models are PLSA (Probabilistic Latent Semantic Analysis) and LDA. In contrast to PLSA, LDA provides a generative process to assign probabilities to new documents outside the training set, which makes this model more fitting for our study (Wang, 2014).

LDA assumes the following generative process for a corpus D that consists of M documents, with document d having N_d words (d ∈ {1, ..., M}):

1. Choose a multinomial distribution φ_t for topic t (t ∈ {1, ..., T}) from a Dirichlet distribution with parameter β.

2. Choose a multinomial distribution θ_d for document d (d ∈ {1, ..., M}) from a Dirichlet distribution with parameter α.

3. For each word w_n (n ∈ {1, ..., N_d}) in document d:
(a) Select a topic z_n from θ_d.
(b) Select a word w_n from φ_{z_n}.

In the process above, the words in the documents are the observed variables, φ and θ are the latent variables, and α and β are the hyperparameters. In order to infer the latent variables, the probability of the observed data D is computed as follows:

p(D | α, β) = ∏_{d=1}^{M} ∫∫ p(φ | β) p(θ_d | α) ( ∏_{n=1}^{N_d} ∑_{z_dn} p(z_dn | θ_d) p(w_dn | z_dn, φ) ) dθ_d dφ
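The three generative steps can be simulated directly with NumPy; the corpus sizes and hyperparameter values below are illustrative, not those of the fitted model:

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes: T topics over a vocabulary of V words, one document of N words.
T, V, N = 3, 8, 20
beta, alpha = 0.1, 0.5

# Step 1: a word distribution phi_t per topic, drawn from Dirichlet(beta).
phi = rng.dirichlet(np.full(V, beta), size=T)  # shape (T, V), rows sum to 1
# Step 2: a topic distribution theta_d for the document, drawn from Dirichlet(alpha).
theta = rng.dirichlet(np.full(T, alpha))       # shape (T,)
# Step 3: for each word, draw a topic z_n from theta, then a word from phi_{z_n}.
z = rng.choice(T, size=N, p=theta)
words = np.array([rng.choice(V, p=phi[t]) for t in z])
```

Inference runs this story in reverse: given only `words` across many documents, LDA recovers plausible `phi` and `theta`.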

The LDA probabilistic model assumes that when writing a review you decide on the number of words N the document will have (drawn from a Poisson distribution) and on a topic mixture for the document (drawn from a Dirichlet distribution over a fixed set of K topics); for example, you might choose to write 33% about the actors/director, 33% about the story, and 34% about special effects. Assuming this generative model, LDA then backtracks from the documents to find the set of topics that was likely to generate them. Given the set of documents and a fixed number K of topics, we want to learn the topic representation of each document (Chen, 2011). One way to do this is collapsed Gibbs sampling, a popular method used to estimate the LDA parameters. It indirectly estimates the document-topic distributions and the topic-word distributions by repeatedly conditioning the probability of assigning each word to a topic on the current topic assignments of all other words (Jelodar, 2017).

Figure 2: Plate representation of the generative LDA model

We can illustrate the probabilistic generative model with the plate notation in Figure 2. In this notation, shaded and unshaded nodes represent observed and latent variables, so here only the words W in documents are observed. The variables θ, β and Z are the ones we would like to infer, while the hyperparameters α and η are treated as constants. The arrows indicate conditional dependencies between variables, and the plates represent repetitions of sampling steps, with the variable in the bottom right corner of each plate giving the number of samples. For example, the plate containing Z and W represents the repeated sampling of topics and words until N words have been generated for document d. The plate around θd illustrates the sampling of a distribution over topics for each document d, for a total of D documents, while the plate around βk illustrates the repeated sampling of word distributions for each topic k until K topics have been generated (Zinman et al., 2010).

MALLET is a popular topic modelling toolkit that provides an implementation of the Gibbs sampling method. A further advantage of this toolkit is that it integrates a topic coherence tool, which allows us to run coherence experiments on the sampled topics and documents in order to test accuracy (O'Callaghan et al., 2015).

Gensim, another popular toolkit for LDA topic modelling, will be used prior to the MALLET method: topics inferred with Gensim serve as a baseline for comparison with MALLET. Gensim implements the online variational Bayes method of Hoffman, Blei, and Bach (2010) and is known to perform well on large datasets. Because we run the algorithm in the Python environment, Gensim is also used as a wrapper for the MALLET package, which was originally written in Java.

3.5 Parameters for the LDA model

The LDA modelling in MALLET and Gensim was performed on a randomly generated training set of 10,000 reviews. The results from the training model were then used to infer topics for our test sample. Although not much coding is needed to implement the model, certain parameters must be specified. Most importantly, k (the number of topics) determines how many topics the model can create from the words in the documents. There is no rule of thumb for computing the number of topics; the goal is to describe the data with as few topics as possible, but with enough topics that no information is lost (Jacobi, 2016). To rate the inferred topics we used automatic topic coherence measures, which score topics by how easy they are to understand (Roder, 2015).

Figure 3: Computing coherence value

As explained by Roder et al. (2015) and illustrated in Figure 3, the word set t is segmented into a set of word subsets S, and word probabilities P are computed from the reference corpus. The subsets S and the probabilities P are consumed by the confirmation measure, which calculates the agreement values φ for the pairs in S. These agreements are finally aggregated into the coherence value c.
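As a concrete instance of this pipeline, a simple UMass-style coherence measure can be written in plain Python: pairs of top words form the segmentation S, document (co-)frequencies supply the probabilities P, a smoothed log conditional probability acts as the confirmation measure, and the mean over pairs is the aggregation. This is a sketch of one coherence variant, not the exact measure used by the CoherenceModel package:

```python
import math

def umass_coherence(topic_words, docs):
    # docs: list of word sets. Segmentation S: all pairs where the second
    # word precedes the first in the top-word ranking.
    def doc_freq(w):
        return sum(1 for d in docs if w in d)
    def co_freq(w1, w2):
        return sum(1 for d in docs if w1 in d and w2 in d)
    scores = []
    for i in range(1, len(topic_words)):
        for j in range(i):
            wi, wj = topic_words[i], topic_words[j]
            # Confirmation measure: smoothed log conditional probability.
            scores.append(math.log((co_freq(wi, wj) + 1) / doc_freq(wj)))
    # Aggregation: the mean of the pairwise agreements is the coherence value c.
    return sum(scores) / len(scores)

docs = [{"plot", "story", "character"}, {"plot", "story"}, {"music", "sound"}]
umass_coherence(["plot", "story"], docs)  # co-occurring words -> higher score
umass_coherence(["plot", "music"], docs)  # unrelated words -> lower score
```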


As seen in Figure 4, we ran the coherence model for the MALLET LDA by using the CoherenceModel package, starting with a model of k=16. We found that the best coherence value was obtained at k=33, where the coherence score peaks before dropping at k=34.

The main hyperparameters of interest are beta, which affects the word-topic distribution, and alpha, which affects the topic-document distribution (Steyvers, 2012). A low beta means topics contain a mixture of just a few words, while a high value means a topic may contain a mixture of most words. For alpha, a low value means documents are described by only a few topics, while a high value allows many topics per document (Jacobi, 2015). As recommended by Wallach et al. (2009), an asymmetric alpha increases the robustness of LDA topics to variations in the number of topics as well as to highly skewed word frequency distributions, while a symmetric beta serves to avoid topics that share similar words. For this reason we implemented an asymmetric alpha and a symmetric beta in the MALLET model. We also ran the model with k = 16, 20, 25, 30, 35 and 40; after manual inspection we found the model with k=35 to have the best number of robust and readable topics.

The code for the LDA implementation in Python we used as well as the inferred topics and the words associated with each can be found in an open repository on Github at:

https://github.com/gosdra/Mallet_NLP_Python

4. Results

In this section we present the results of the LDA analysis and test its validity through human examination. Before presenting descriptive statistics and correlations among the variables, we group the most correlated variables resulting from the LDA analysis through principal component analysis in SPSS and discuss them. We then conduct two moderation analyses for the constructs discussed above.

4.1 Latent movie attributes

By extracting and labelling latent movie attributes found in consumer reviews, this study generated 35 topics, together with the top 20 words associated with each topic and the weight of each word within that topic. Three topics remained unlabelled because their top words did not appear to be related. Each topic was labelled by checking for a connection between its most frequent words; visualisation tools such as word clouds were also used.

Table 2: Topic identification

Topic 9: Bad Watch (word, % weight):
bad 4.68%, make 2.68%, good 1.94%, plot 1.67%, character 1.35%, act 1.22%, terrible 1.21%, watch 1.11%, time 1.11%, waste 1.02%, story 1.01%, stupid 0.09%, movie 0.09%, boring 0.08%, script 0.08%, money 0.08%, awful 0.07%, bore 0.06%, dialogue 0.06%, bore 0.06%

Figure 4: Topic word cloud

In Table 3 we list all 33 identified topics and their distribution as dominant topics across documents, together with the 3 unlabelled topics that we could not categorize because they contained too many unrelated words. Altogether, the unlabelled topics were dominant in 8,916 documents. The labelled topics fall within categories that are in line with the discussed literature: we find many dimensions relating to genre, story, book adaptations, successful movie series, characters and their features, music and setting.

Table 3: The identified topics and their document distribution as dominant topics

Topic name / Num_Documents / Weight_Documents
0 Humour / 1566 / 0.0313
1 Driving violence / 140 / 0.0028
2 Tarantino violence / 135 / 0.0027
3 Action scene / 335 / 0.0067
4 Unlabelled / 3952 / 0.079
5 Superpower / 132 / 0.0026
6 Horror and suspense / 607 / 0.0121
7 Original series / 1986 / 0.0397
8 LOTR and Fantasy / 527 / 0.0105
9 Bad watch / 4165 / 0.0832
10 Superhero animals / 160 / 0.0032
11 Music / 129 / 0.0026
12 Art / 81 / 0.0016
13 Marvel comics / 611 / 0.0122
14 War / 234 / 0.0047
15 History / 547 / 0.0109
16 Dream fantasy / 87 / 0.0017
17 Animation kids / 721 / 0.0144
18 Alien space / 457 / 0.0091
19 Story and character / 5025 / 0.1004
20 Sex / 144 / 0.0029
21 Character develop / 1542 / 0.0308
22 Unlabelled / 98 / 0.002
23 Actor performance / 314 / 0.0063
24 Oscar bait / 126 / 0.0025
25 Robot future / 140 / 0.0028
26 Unlabelled / 4866 / 0.0972
27 Star wars / 592 / 0.0118
28 Good story / 12983 / 0.2595
29 Time / 1395 / 0.0279
30 Book adaptation / 307 / 0.0061
31 Love and family / 877 / 0.0175
32 Good performance / 3974 / 0.0794
33 Spy / 144 / 0.0029
34 Dc comics / 937 / 0.0187

In order to test the validity of the LDA results, we compared them with results derived from human analysis. We asked two researchers with experience in reviews and NLP each to derive topics of discussion from a random sample of 200 reviews. In total 400 reviews were chosen for human analysis, and the results were compared with those from the LDA algorithm. The mean user score was 6.47 for the first sample and 6.57 for the second, so we concluded the samples were quite similar.

The Jaccard coefficient was used to examine the overlap between the dimensions extracted by LDA and those from human analysis. N(Dim_lda) represents the number of dimensions derived from LDA, and N(Dim_exp) represents the number of dimensions identified by the two researchers:

J(Dim_lda, Dim_exp) = |N(Dim_lda ∩ Dim_exp)| / |N(Dim_lda ∪ Dim_exp)|

Researcher A had a Jaccard similarity coefficient of 0.55 and Researcher B one of 0.52. Considering that the main purpose of the algorithm is to identify main topics of discussion rather than movie attributes, we consider this a good and reliable score for further testing. The full results of the reliability analysis can be found in the Appendix.
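The coefficient itself is a one-liner on sets of dimension labels; the label sets below are hypothetical, for illustration only:

```python
def jaccard(dims_lda, dims_expert):
    # |intersection| / |union| of the two dimension sets.
    dims_lda, dims_expert = set(dims_lda), set(dims_expert)
    return len(dims_lda & dims_expert) / len(dims_lda | dims_expert)

# Hypothetical label sets for illustration:
lda = {"story", "humour", "music", "acting"}
expert = {"story", "music", "visuals", "acting", "setting"}
jaccard(lda, expert)  # 3 shared / 6 total = 0.5
```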

4.2 PCA, Descriptive statistics and hierarchical linear regression

Principal component analysis (PCA) was performed in SPSS on the variables derived from the LDA analysis. Topic modelling generates many variables that are similar enough to be considered highly related, so PCA was used to group related variables together (Peladeau, 2018). Because many variables in the correlation matrix (Table 6) had negative weights, we decided to use the top 3 factors grouped in Table 5, despite the low Kaiser-Meyer-Olkin measure of .292 (Table 4). Bartlett's test was also significant (p < .001).

Table 4: KMO and Bartlett's Test

Kaiser-Meyer-Olkin Measure of Sampling Adequacy: .292
Bartlett's Test of Sphericity: Approx. Chi-Square = 29492.247, df = 171


The factors in Table 5 were rotated using Varimax with Kaiser normalization, and the rotation converged in 13 iterations. Factor loadings with coefficients below .40 were suppressed. Only the first 10 components had initial eigenvalues above 1.00, and together these 10 components explained 59.19% of the total variance. In the communalities table we also checked how much of the variance in each item is explained, with all items scoring over .30.

Table 5: Rotated Component Matrix

Component 1: goodStory (-.783), characterStory_connection (.552), Art (.459), loveFamily (.458)
Component 2: badWatch (-.779), goodPerformance (.638)
Component 3: War (.744), History (.526)
Component 4: characterDevelop (-.717), oscarBait (.480)
Component 5: starWars (-.768), Time (.568)
Component 6: Spy (.653), actionScene (.586), HorrorSuspense (-.480)
Component 7: spaceAlien (.839)
Component 8: lotrFantasy (.878)
Component 9: tarantinoViolence (.834)
Component 10: superpower (.781)

Extraction Method: Principal Component Analysis. Rotation Method: Varimax with Kaiser Normalization. a. Rotation converged in 13 iterations.
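The Kaiser criterion used above (retain components whose eigenvalue exceeds 1.00) can be sketched with NumPy on synthetic data standing in for the topic variables; the sizes and correlation structure below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(1)

# Synthetic stand-ins for topic-share variables: two correlated pairs,
# 500 observations x 4 variables (sizes are illustrative).
base1 = rng.normal(size=(500, 1))
base2 = rng.normal(size=(500, 1))
X = np.hstack([
    base1 + 0.1 * rng.normal(size=(500, 2)),  # pair loading on component 1
    base2 + 0.1 * rng.normal(size=(500, 2)),  # pair loading on component 2
])

# PCA on the correlation matrix; the Kaiser criterion keeps components
# whose eigenvalue exceeds 1.
corr = np.corrcoef(X, rowvar=False)
eigvals = np.linalg.eigvalsh(corr)[::-1]  # eigenvalues, descending
n_keep = int((eigvals > 1.0).sum())       # two components retained here
```

Each correlated pair of variables concentrates its variance in one large eigenvalue, which is exactly why PCA can compress related LDA topics into a few factors.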


The dataset contains 50,036 reviews of 1,281 distinct movies, with a mean user score of 6.54. The innovativeness score has a mean of 1.9, while the average sentiment score is 3.08. Before running a bivariate correlation analysis of the dependent and independent variables, we checked that they correspond to a normal distribution. Preliminary analysis was conducted to ensure no violations of normality (Appendix 1), linearity (Appendix 3) and homoscedasticity (Appendix 2).

In Table 6 we report the Pearson correlation coefficients computed in SPSS. Innovativeness has a positive but weak correlation with Domestic Total Gross (r = .167). Production Budget was the only variable strongly correlated with Domestic Total Gross (r = .662), while Runtime (r = .402) and Distributor (r = .316) were moderately correlated. filmSeries (r = .139) had a weak correlation, and negemo (r = .033) a positive but negligible one.

The factors derived from LDA were all negatively, though weakly, correlated with Domestic Total Gross: story (r = -.145), watch (r = -.119), character (r = -.136), actorDirector (r = -.095) and humour (r = -.085). Two of the variables, posemo and MetacriticUserScore, were not significant at the p < .05 level. These results, together with our collinearity statistics (all VIF < 10 and all Tolerance > .10), indicated that multicollinearity is unlikely (Tabachnick and Fidell, 2007). All but two of the predictor variables (posemo, MetacriticUserScore) were statistically correlated with the dependent variable, which indicated that the data was suitable for examination through hierarchical multiple regression.
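The VIF and Tolerance diagnostics reported above can be reproduced with a small NumPy sketch: each predictor is regressed on the remaining ones, and VIF_j = 1 / (1 - R²_j), with Tolerance = 1 / VIF. The synthetic predictors below are illustrative:

```python
import numpy as np

def vif(X):
    # Variance inflation factor per column: regress each predictor on the
    # others (with intercept) and compute 1 / (1 - R^2).
    n, p = X.shape
    out = []
    for j in range(p):
        y = X[:, j]
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        beta, *_ = np.linalg.lstsq(others, y, rcond=None)
        resid = y - others @ beta
        r2 = 1 - resid.var() / y.var()
        out.append(1.0 / (1.0 - r2))
    return np.array(out)

rng = np.random.default_rng(2)
a, b = rng.normal(size=200), rng.normal(size=200)
vif_ok = vif(np.column_stack([a, b]))                                    # independent -> near 1
vif_bad = vif(np.column_stack([a, b, a + 0.1 * rng.normal(size=200)]))   # near-duplicate -> inflated
```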


Table 6: Means, standard deviations and correlations among variables

Variable (Mean; SD), followed by correlations with variables 1 to 13:
1. DomesticTotalGross (157627385.91; 189863670.251)
2. Innovativeness (1.90; 1.370): .167**
3. metacriticUserScore (6.54; 2.950): .008, .029**
4. ProductionBudget (86642458.43; 82036898.769): .662**, .104**, .018**
5. Runtime (120.38; 21.694): .402**, .041**, .103**, .471**
6. Distributor (.83; .375): .316**, .042**, .019**, .391**, .271**
7. posemo (5.6866; 3.23740): -.005, -.037**, .329**, .002, -.061**, .059**
8. negemo (2.5979; 2.26905): .033**, -.052**, -.371**, .023**, -.052**, -.036**, -.218**
9. story (.0000000; 1.0000000): -.145**, .075**, -.155**, -.239**, -.086**, -.188**, -.389**, .073**
10. watch (.0000000; 1.0000000): -.119**, .032**, .604**, -.123**, .071**, -.029**, .308**, -.442**, .000
11. character (.0000000; 1.0000000): -.136**, -.038**, .089**, -.125**, -.065**, -.049**, .044**, -.021**, .000, .000
12. filmSeries (.0606; .0677): .139**, .219**, .024**, .151**, .049**, .079**, -.022**, -.081**, -.017**, .009*, -.015**
13. actorDirector (.0096; .0331): -.095**, -.028**, .035**, -.134**, -.020**, -.015**, .001, -.069**, .068**, .079**, .019**, -.043**
14. humour (.0292; .0650): -.085**, -.058**, -.027**, -.160**, -.226**, .028**, .223**, -.016**, -.022**, -.051**, .006, -.045**, -.024**

**. Correlation is significant at the 0.01 level (2-tailed). *. Correlation is significant at the 0.05 level (2-tailed).

We then implemented a hierarchical multiple regression in order to investigate the ability of meta-data and of the variables extracted through machine learning from the reviews to predict Domestic Total Gross, after sentiment (posemo and negemo) and 3 of the factors extracted earlier (story, watch, character) were controlled for.
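The block structure of such a hierarchical regression (controls first, predictors second, ΔR² as the added explanatory power) can be sketched with NumPy on synthetic data; the variable names and effect sizes below are stand-ins, not the thesis estimates:

```python
import numpy as np

def r_squared(y, X):
    # OLS fit with intercept; returns the coefficient of determination.
    Xd = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(Xd, y, rcond=None)
    resid = y - Xd @ beta
    return 1 - resid.var() / y.var()

rng = np.random.default_rng(3)
n = 1000
controls = rng.normal(size=(n, 2))    # stand-ins for sentiment + LDA factors
predictors = rng.normal(size=(n, 2))  # stand-ins for budget, runtime, etc.
y = (controls @ np.array([0.3, -0.2])
     + predictors @ np.array([1.0, 0.5])
     + rng.normal(size=n))

# Block 1: controls only. Block 2: controls + predictors.
r2_block1 = r_squared(y, controls)
r2_block2 = r_squared(y, np.hstack([controls, predictors]))
delta_r2 = r2_block2 - r2_block1  # variance added by the predictors
```

The comparison of `r2_block1` and `r2_block2` mirrors the jump from the control-only model to the full predictive model reported below.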

Table 7: Predictive Model Summary

Model 1: R = .233a, R Square = .054, Adjusted R Square = .054, Std. Error of the Estimate = 184635451.627; R Square Change = .054, F Change = 575.749, df1 = 5, df2 = 50030, Sig. F Change = .000

Model 2: R = .685b, R Square = .470, Adjusted R Square = .470, Std. Error of the Estimate = 138280409.069; R Square Change = .415, F Change = 4896.596, df1 = 8, df2 = 50022, Sig. F Change = .000; Durbin-Watson = 1.997

a. Predictors: (Constant), character, watch, story, negemo, posemo

b. Predictors: (Constant), character, watch, story, negemo, posemo, filmSeries, actorDirector_performance, Runtime, Innovativeness, Distributor, humour, ProductionBudget, metacriticUserScore

c. Dependent Variable: DomesticTotalGross

As seen in Table 7, the first model, F(5, 50030) = 575.75, explained 5.4% of the total variance. After entering the predictive variables in the second block, the second model, F(13, 50022) = 3408.082, explained up to 47% of the total variance. The predictive variables contributed an additional R² change of .415, significant at p < .001. The Durbin-Watson test gave a value of 1.997 for the second model, which lies between the two critical values of 1.5 < d < 2.5; we can therefore assume there is no first-order linear autocorrelation in our multiple hierarchical regression data.

In the final adjusted model (as seen in Table 8), all variables except actorDirector_performance (β = -.006, p > .05) were found significant after controlling for sentiment and the main factors we derived from the LDA attributes: story, watch and character. The highest beta values were recorded by ProductionBudget (β = .556, p < .01), Runtime (β = .129, p < .01), Innovativeness (β = .098, p < .01), watch (β = -.080, p < .01), character (β = -.055, p < .01) and Distributor (β = .052, p < .01).
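As a consistency check, a standardized coefficient can be recovered from its unstandardized counterpart and the sample standard deviations (β = B · SD_x / SD_y); worked here for ProductionBudget using the descriptives reported above:

```python
# Standardized beta from the unstandardized coefficient: beta = B * (SD_x / SD_y)
B = 1.288                 # ProductionBudget, unstandardized B (Table 8)
sd_x = 82036898.769       # SD of ProductionBudget (descriptives table)
sd_y = 189863670.251      # SD of DomesticTotalGross (descriptives table)
beta = B * sd_x / sd_y    # close to the reported .556 (B is itself rounded)
print(beta)
```

The small residual discrepancy comes from B being reported to three decimals in the SPSS output.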

Table 8: Coefficients for the predictor model

| Model | | Unstd. B | Std. Error | β | t | Sig. |
|---|---|---|---|---|---|---|
| 1 | posemo | -1561378.971 | 294657.787 | -.027 | -5.299 | .000 |
| | negemo | -1352294.503 | 408000.369 | -.016 | -3.314 | .001 |
| | story | -29337690.757 | 905557.125 | -.155 | -32.397 | .000 |
| | watch | -22396618.414 | 958672.118 | -.118 | -23.362 | .000 |
| | character | -25614573.695 | 826658.209 | -.135 | -30.986 | .000 |
| 2 | (Constant) | -169516130.618 | 4688572.328 | | -36.155 | .000 |
| | posemo | 1165869.113 | 231278.638 | .020 | 5.041 | .000 |
| | negemo | 1366558.485 | 309939.093 | .016 | 4.409 | .000 |
| | story | 2829128.282 | 710949.111 | .015 | 3.979 | .000 |
| | watch | -15242895.073 | 853491.910 | -.080 | -17.859 | .000 |
| | character | -10399091.207 | 629121.246 | -.055 | -16.530 | .000 |
| | UserScore | 2322564.018 | 275508.896 | .036 | 8.430 | .000 |
| | Innovativeness | 13583904.642 | 467255.212 | .098 | 29.072 | .000 |
| | ProductionBudget | 1.288 | .010 | .556 | 134.049 | .000 |
| | Runtime | 1127250.929 | 33591.356 | .129 | 33.558 | .000 |
| | Distributor | 26581907.494 | 1830665.059 | .052 | 14.520 | .000 |
| | ActorDirector | -33305765.752 | 18952948.473 | -.006 | -1.757 | .079 |
| | filmSeries | 69584726.443 | 9493117.681 | .025 | 7.330 | .000 |
| | humour | 91869180.428 | 10224760.642 | .031 | 8.985 | .000 |

a. Dependent Variable: DomesticTotalGross

4.3 Movie popularity as moderator for sentiment

We dummy coded the variable Distributor so that movies distributed by large production companies have the value 1 and those from small distributors the value 0. The sample mean for Distributor was 0.83, which means most movies in our sample came from large production houses. We then ran model number 1 in Hayes's PROCESS package v3.0 for SPSS to test for a simple moderation effect, with Domestic Total Gross as the dependent variable, posemo as the independent variable and Distributor as the moderator. It is worth noting that a heteroscedasticity-consistent standard error and covariance matrix estimator (HC3) was used.
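PROCESS model 1 reduces to an OLS regression of the outcome on X, W and their product term, and the HC3 estimator can be formed directly from the residuals and leverages. A minimal sketch on simulated data (numpy only; the coefficients and distributions are illustrative, not our dataset):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
posemo = rng.normal(5.7, 3.2, n)                      # X (focal predictor)
distributor = (rng.random(n) < 0.83).astype(float)    # W (dummy, 1 = large)
# synthetic DV with a known interaction effect of -0.4
y = (1.0 + 0.6 * posemo + 2.0 * distributor
     - 0.4 * posemo * distributor + rng.normal(0.0, 1.0, n))

# PROCESS model 1 = OLS of Y on X, W and the product X*W
X = np.column_stack([np.ones(n), posemo, distributor, posemo * distributor])
XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y

# HC3 covariance: (X'X)^-1 [sum_i e_i^2/(1-h_i)^2 x_i x_i'] (X'X)^-1
e = y - X @ beta
h = np.einsum('ij,jk,ik->i', X, XtX_inv, X)           # leverage of each case
meat = X.T @ (X * (e ** 2 / (1.0 - h) ** 2)[:, None])
se_hc3 = np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))
print("interaction c3 =", beta[3], "HC3 SE =", se_hc3[3])
```

The coefficient on the product term plays the role of c3 in the tables below, and the HC3 standard errors correspond to the SE(HC3) column reported by PROCESS.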

Table 9: Model summary Distributor

| | Coefficient | SE | t | p |
|---|---|---|---|---|
| Intercept | 24910039.4 | 320238.433 | 77.7859 | .0000 |
| posemo (X), c1 | 614186.018 | 104054.186 | 5.9026 | .0000 |
| Distributor (W), c2 | 159907511 | 1022175.20 | 156.4385 | .0000 |
| posemo × Distributor (XW), c3 | -2404314.5 | 297131.874 | -8.0917 | .0000 |

R² = .1005, F(3, 50032) = 8402.7, p < .001

Table 10: Moderation effects for Distributor

Conditional effects of posemo (X) on DTG (Y) at levels of Distributor (W):

| Distributor | Effect | SE(HC3) | t | p |
|---|---|---|---|---|
| Small | 614186.018 | 104054.186 | 5.9026 | .0000 |
| Big | -1790128.5 | 278316.504 | -6.4320 | .0000 |

There is a significant interaction effect between positive emotions and Distributor on domestic total gross (c3 = -2404314.5, p < .001; Table 9). Thus the effect of positive emotions on domestic total gross depends on the distributor of the movie. This model accounts for 10% of the variance in domestic total gross for movies. Table 10 shows the conditional effect of positive emotions (posemo) on the dependent variable at each level of the Distributor moderator: for small distributors posemo has a significant positive effect (effect = 614186, SE = 104054), while for big distributors the effect is significantly negative (effect = -1790129, SE = 278317).
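With a dummy moderator, the conditional (simple) effects follow directly from the regression coefficients: the effect of posemo is c1 when W = 0 and c1 + c3 when W = 1. A quick check against the reported values:

```python
# Conditional effect of posemo at each Distributor level: effect(W) = c1 + c3 * W
c1 = 614186.018      # posemo coefficient (Table 9)
c3 = -2404314.5      # posemo x Distributor interaction (Table 9)
effect_small = c1 + c3 * 0   # small distributor (W = 0)
effect_big = c1 + c3 * 1     # big distributor (W = 1)
print(effect_small, effect_big)   # reproduces the effects in Table 10
```

This is exactly how PROCESS derives the conditional-effects table from the fitted model.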
