What are we watching? Analysing movie and television watching behaviour using Twitter


SUBMITTED IN PARTIAL FULFILLMENT FOR THE DEGREE OF MASTER OF SCIENCE

Guido van Bruggen

6063942

MASTER INFORMATION STUDIES

HUMAN-CENTERED MULTIMEDIA

FACULTY OF SCIENCE

UNIVERSITY OF AMSTERDAM

August, 2015

1st Supervisor: Drs. Isaac Sijaranamual, ISLA, UvA

2nd Supervisor: Prof. dr. Maarten de Rijke, ISLA, UvA


What are we watching?

Analysing movie and television watching behaviour using Twitter

Guido van Bruggen

University of Amsterdam Graduate School of Informatics

Science Park 904, Amsterdam

guidovanbruggen@student.uva.nl

ABSTRACT

The way we watch movies and television is changing. People increasingly use smartphones and tablets as a second screen to gather additional information or to share what they are watching on social media such as Twitter. This provides new possibilities for gathering information on people's watching behaviour. We describe a system that collects tweets and links them to items in the Internet Movie Database. A dataset consisting of tweets collected by this system over four months is analysed. We look for patterns between what is watched and the following three dimensions: time, location and context. Results show that the system finds trends that correlate with other internet sources. Furthermore, we find that 'Game of Thrones' is by far the most popular TV series, that late evening hours are the most popular time to watch, and that drama, comedy and thriller are the most popular genres. Although our dataset is limited in size, we show that it is possible to obtain insights into watching behaviour using Twitter. Future work should primarily focus on increasing the number of incoming tweets, which would enable a variety of other applications, such as more detailed research and movie recommendations.

Categories and Subject Descriptors

H.3 [Information Storage and Retrieval]: H.3.1 Content Analysis and Indexing; H.3.3 Information Search and Retrieval

General Terms

Algorithms, Human Factors

Keywords

Social media, movies, television, Twitter, microblogs

1. INTRODUCTION

Watching television shows or movies is one of the most popular leisure activities. Using portable devices like tablets and smartphones, people increasingly access the internet while watching TV shows and movies.¹ People use these devices as a second screen to gather additional information or to share their thoughts via social media. Television shows encourage this behaviour by presenting hashtags during the show, which can be used to refer to the show on social media. Moreover, next to sharing their thoughts, Twitter users also tend to report on their activities [1]. By composing tweets about the movies or TV series that they are currently watching, users share their activities. The hashtags #nowwatching and #nw (the abbreviation of 'now watching') are used to report on the media people are currently watching. An example of such a tweet is: "#NowWatching Game of Thrones Season 4 Finale."

¹ http://www.nielsen.com/us/en/newswire/2012/report-u-s-media-trends-by-demographic.html

Collecting those tweets allows us to gain insights into what the world is watching at any given moment. Television producers and marketers are interested in the viewing rates of television shows and movies, because this data can be used for several goals, such as scheduling TV broadcast times and advertisements. Traditional metrics on the popularity of TV shows are obtained by placing devices in homes that register the viewing habits of the household. Analytics on the popularity of movies are traditionally based on the number of cinema visitors and the number of DVD sales. However, the way we watch TV shows and movies is changing. Nowadays, we have several options to watch television and movies. People no longer depend on what is broadcast on television channels, but tend to use on-demand streaming services like Netflix, or download services like BitTorrent. Because of this, it becomes more and more possible for people to watch what they want at any given moment. As a result, traditional metrics might no longer be sufficient to gain insights into the watching behaviour of people.²

In this paper we describe a method of collecting and analysing tweets about what media people are watching at any given moment. In contrast to traditional ways of measuring popularity, this method is independent of the way people watch these media and, furthermore, enables us to obtain these analytics in real time and at no cost. This could give a better view of the overall popularity of movies and TV series and, by analysing the metadata of tweets, such as location and time, provide us with insights into the watching behaviour of people.

² http://www.mediapost.com/publications/article/238149/nielsen-calls-for-industry-to-adopt-new-ratings-st.html

We filter the stream of Twitter messages for a number of hashtags used by people reporting on what they are watching right now. By extracting the title of a movie or TV series from the text of tweets, we aim to link tweets to titles in the Internet Movie Database (IMDb). After linking tweets to titles in IMDb, we are able to get insights into what the world is watching at any given moment. By analysing a dataset of collected tweets we aim to get insights into the watching behaviour of people, which we analyse using the following questions:

1. When are we watching movies and TV series?
2. Who is watching and from where?
3. What movies and TV series are we watching?
4. Why are we watching movies and TV series?

2. RELATED WORK

With the vast number of messages streaming through social media nowadays, the challenges of collecting and using this data in various fields form a popular area of research. Examples of these uses include the early detection of new events [2], the prediction of stock markets [3] and the detection of earthquakes [4]. These examples show that messages on social media can provide us with a solid source for measuring what is trending and how people around the world think about certain topics.

The analysis of social media signals about movies is a popular research topic as well, and has been used for a variety of goals. The number and content of movie-related tweets around the release date of new movies have been used to predict box office revenues [5][6]. Moreover, by combining several social media signals from Twitter and YouTube, ratings on IMDb have been predicted [7]. Besides using social media messages for prediction tasks, tweets from users rating movies via the IMDb mobile application have been used to create a data source for movie recommendation algorithms [8].

An increasing number of people use some kind of device while watching television to gather additional information or share their thoughts. Because of this so-called second screening, television programs often appear among the trending topics on Twitter [9]. While most studies on tweets about movies aim at capturing the buzz in the pre-release stage, studies on tweets about television are mainly focused on measuring the sentiment of messages composed during a live broadcast. The content and number of tweets during a live broadcast are indicative of the subjects and structure of the television program, and can therefore be used to generate automatic annotations [10]. Furthermore, the number of tweets about television programs is used to calculate TV ratings in a new way [11]. This system uses the similarity between the text of a tweet and electronic program guide (EPG) information to classify tweets that cover certain television programs.

Our approach differs from related work in that we aim to measure tweets about both movies and television programs. Furthermore, we don't measure the online buzz about movies or television series, but aim to measure self-reports of people actually watching a certain title at that moment. Lastly, because our system is self-contained and does not rely on information in EPGs or pre-defined hashtag lists of specific movies or TV programs, it is not limited to tweets about television shows or movies that are broadcast on a television channel, and it is able to pick up unexpected signals.

Similar systems can be found in the context of measuring the popularity of music through Twitter. A system presented in [12] analyses tweets about music and, by doing so, composes a real-time chart of what music is popular at any given moment. Furthermore, this study examined the content of tweets written by users while listening to music, and found that many people tweet snippets of song lyrics. Using a similar way of inferring music popularity from tweets, [13] focusses on finding patterns in the listening habits of people.

3. METHOD

In this section we describe the methods used in this research. We begin by describing how we filter the Twitter data stream to gather relevant tweets, followed by describing our multi-staged approach to extract titles from the text of a tweet. We report on precision and recall scores to evaluate this extraction pipeline and, lastly, we cover the methods used to enrich the dataset with location and timezone information.

3.1 Data collection

The Twitter Streaming API is used to collect tweets from users that are sharing what movie or TV show they are watching at the moment. More specifically, we follow a number of hashtags and collect and analyse all tweets containing at least one of these hashtags. In order to come up with the hashtags to follow, we queried Twitter for the titles of popular TV series and movies and collected the most popular hashtags among these tweets. In this list, we specifically looked for the hashtags that people use to report what they are watching at the moment. This led to the following list of hashtags that are followed for the data collection: #nw, #nowwatching, #IMDb, #Trakt and #TVshowtime.

The first two hashtags are generic terms used by Twitter users to share their watching behaviour. The other three hashtags are all related to websites. Tweets containing the hashtag #IMDb are usually shared through the IMDb mobile application, which allows users to automatically share activities such as rating a movie on Twitter. TVshowtime³ offers a mobile application and website to keep track of which episodes of TV series a user has watched, and to get notified when new episodes are broadcast. Like IMDb, it can also (automatically) share the watching behaviour of a user through their Twitter account. Trakt⁴ is very similar to TVshowtime and offers the same functionality. However, Trakt is not limited to TV series and covers movies as well. Both TVshowtime and Trakt can be integrated into media center software, making it possible to automatically share the movies or series that are being played to their websites and Twitter.

Accessing the Twitter Streaming API with a filter on this list of hashtags provides us with a real-time stream of tweets mentioning the consumption of media as they appear on Twitter.

³ http://www.tvshowtime.com

⁴

3.2 Title linking

After setting up the incoming data stream of tweets, the second step is to extract the correct movie or series title from the text of a tweet. We aim to link all tweets to the Internet Movie Database (IMDb).⁵ IMDb is the biggest online movie and TV database, including more than 3 million movies and TV programs. We chose IMDb as a knowledge base over alternatives like RottenTomatoes,⁶ Trakt.tv and TheTVDB,⁷ because it offers the most complete database: all alternatives contain fewer titles than IMDb. By trying to match the tweets to the ID of one of the items in IMDb, we are able to aggregate and analyse the information stored in the tweets. In this section we propose a multi-staged approach for linking the text of a tweet to an item in IMDb. Figure 1 shows the overview of our title linking approach.

In the first stage of our extraction method we look at what hashtags are present in the tweet text. We use the hashtag(s) to direct the text to the appropriate next stage in the extraction method, or to ignore a tweet.

3.2.1 Filtering tweets

We look at certain characteristics of a tweet to decide whether to continue with a tweet or to ignore it. During manual observations of the incoming tweets, we found that not all tweets are real reports about watching movies and TV series that can be linked to IMDb. People also use the hashtags we follow to report on watching videos on websites like YouTube and Vevo. Because we aim to measure the popularity of movies and TV series, we choose to ignore tweets containing links to these video services. A second observation we made was that many of the incoming tweets are retweets. Because we only want to measure when users report on themselves watching a movie or episode, we choose to ignore retweets. Moreover, this also prevents spammers, using bots that automatically retweet messages, from influencing the results.
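The two filtering rules above can be illustrated with a minimal sketch. It assumes tweets arrive as dictionaries in the classic Twitter JSON layout, where a retweet carries a `retweeted_status` field and links are listed under `entities.urls` with an `expanded_url`. The domain list and the function names are hypothetical and are not taken from the system described here.

```python
from urllib.parse import urlparse

# Hypothetical block list; the text names YouTube and Vevo as examples.
VIDEO_SITE_DOMAINS = ("youtube.com", "youtu.be", "vevo.com")


def is_video_site(url):
    """Return True if the URL points to one of the blocked video sites."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in VIDEO_SITE_DOMAINS)


def keep_tweet(tweet):
    """Apply both rules: drop retweets and drop tweets linking to video sites."""
    # Assumption: retweets are marked by the presence of 'retweeted_status'.
    if "retweeted_status" in tweet:
        return False
    urls = tweet.get("entities", {}).get("urls", [])
    return not any(
        is_video_site(u.get("expanded_url") or u.get("url", "")) for u in urls
    )
```

Matching on the host name rather than on a substring of the whole URL avoids discarding tweets whose link merely contains the word "youtube" in its path.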

3.2.2 IMDb, NW, NowWatching

Many #IMDb tweets, and some of the #nw and #nowwatching tweets, contain direct URLs to the IMDb website. The titles in these tweets are the easiest to extract, since the IMDb-ID we are looking for is already present in the URL. Using a regular expression, we check for this presence and, if possible, extract the IMDb-ID and save the information to the database. If there is no URL present, we move on to the IMDb query stage.
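As an illustration of this stage, the sketch below extracts an IMDb-ID from a URL. It assumes the common IMDb title URL layout `imdb.com/title/tt<digits>` and that shortened links have already been expanded. The exact regular expression used by the system is not given in the text, so the pattern and function name are assumptions.

```python
import re

# Assumed URL layout: .../title/tt<digits>/ ; the pattern is illustrative only.
IMDB_ID_PATTERN = re.compile(r"imdb\.com/title/(tt\d+)", re.IGNORECASE)


def extract_imdb_id(url):
    """Return the IMDb-ID found in the URL, or None if there is none."""
    match = IMDB_ID_PATTERN.search(url)
    return match.group(1) if match else None


print(extract_imdb_id("http://www.imdb.com/title/tt0944947/"))  # tt0944947
print(extract_imdb_id("http://www.imdb.com/name/nm0000001/"))   # None
```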

3.2.3 TVshowtime and Trakt

The vast majority of tweets with #TVshowtime and #Trakt contain a URL to the respective website, which in some cases contains an IMDb-ID. We fetch the URL and, if an IMDb-ID is present, use a regular expression to grab this ID and save the information to the database. Otherwise, we use the web page to extract the specific title of the movie or episode as used on the TVshowtime or Trakt website, which is then used to query the IMDb database. This step helps to eliminate the noise that users add to their tweets, and therefore helps to improve the results obtained using the IMDb search engine. When the tweet contains no URL, we move on to the IMDb query stage.

⁵ http://www.imdb.com

⁶ http://www.rottentomatoes.com

⁷ http://www.thetvdb.com

3.2.4 IMDb query

The IMDb search engine is used to query the database for a match when the aforementioned methods did not result in a match. The engine is able to deal with many variations in title descriptions; it handles different languages, common abbreviations (e.g. expanding the acronym 'GoT' to 'Game of Thrones') and alternative titles. In order to query the IMDb search engine, the tweets are first stripped down as follows:

1. Hashtags used to query Twitter are removed

2. Hashtag symbols are removed (e.g. #Inception -> Inception)

3. User mentions are removed

4. URLs are removed

5. Punctuation is removed

6. Hashtags written in camel case are split (e.g. #BreakingBad -> Breaking Bad)

After this step, the remaining text is used to send a query to the IMDb search engine. If found, the first result is chosen and the IMDb-ID and tweet information are saved to the database.
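The stripping steps listed above can be sketched as a single function. This is an illustration under assumptions, not the system's code: the order of operations differs from the list (URLs are removed first so that punctuation stripping does not break them apart), punctuation is replaced by spaces, and all names are hypothetical.

```python
import re
import string

# The hashtags followed during data collection, lower-cased.
TRACKED_HASHTAGS = {"nw", "nowwatching", "imdb", "trakt", "tvshowtime"}

# Translation table that turns every ASCII punctuation character into a space.
PUNCTUATION_TO_SPACE = str.maketrans({c: " " for c in string.punctuation})


def split_camel_case(word):
    """Insert a space at each lower-to-upper boundary: BreakingBad -> Breaking Bad."""
    return re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", word)


def clean_tweet(text):
    """Reduce a tweet to the text that is sent to the search engine."""
    text = re.sub(r"https?://\S+", " ", text)  # remove URLs
    text = re.sub(r"@\w+", " ", text)          # remove user mentions

    def replace_hashtag(match):
        tag = match.group(1)
        if tag.lower() in TRACKED_HASHTAGS:
            return " "                          # drop the tracked hashtags
        return " " + split_camel_case(tag) + " "  # drop '#', split camel case

    text = re.sub(r"#(\w+)", replace_hashtag, text)
    text = text.translate(PUNCTUATION_TO_SPACE)  # remove punctuation
    return " ".join(text.split())                # collapse whitespace


print(clean_tweet("#NowWatching #BreakingBad with @friend http://t.co/abc"))
# Breaking Bad with
```

Note that replacing punctuation by spaces turns a title such as "Grey's Anatomy" into "Grey s Anatomy"; whether the search engine tolerates this is not covered here.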

3.3 Evaluation

We evaluate our dataset by manually annotating a test set and comparing this with the results of our extraction pipeline. In order to minimize bias caused by geographic differences, we select an equal number of tweets from every hour of the day. We randomly selected 21 tweets for every hour of the day, resulting in a test set of 504 tweets. We annotated each of these tweets with an IMDb-ID, if possible. Because every episode of a TV series has its own IMDb-ID, we also included the IMDb-ID of the general TV series. For 403 tweets we were able to relate the tweet to a title in IMDb, leaving 101 tweets that did not mention anything that could be related to IMDb. Examples of these cases include sports matches and Bollywood movies that are not present in IMDb.

When comparing the ground truth with the results from our extraction method, we define a match as the case where the IMDb-ID in the ground truth is the same as the result from the extraction method. In the case of episodes of TV series, we also count tweets that mention a specific episode but are linked to the general title of the TV series as correct. Looking at the results of our extraction pipeline for the 403 tweets referring to an item in IMDb, we see that our algorithm linked a title to 338 of the tweets. Of these assignments, 319 were correct, resulting in a precision of 0.94 and a recall of 0.79. This indicates that our linking method produces few wrongly linked tweets, but misses a number of tweets that could have been linked. Combining both evaluation metrics, we obtain an F1-score of 0.86.
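The reported scores follow directly from the three counts given above (319 correct links, 338 linked tweets, 403 tweets with a ground-truth title). The short sketch below reproduces the arithmetic using the standard definitions; the function name is illustrative.

```python
def precision_recall_f1(correct, linked, relevant):
    """Standard definitions: precision = correct / linked, recall = correct / relevant."""
    precision = correct / linked
    recall = correct / relevant
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


p, r, f1 = precision_recall_f1(correct=319, linked=338, relevant=403)
print(f"precision={p:.2f} recall={r:.2f} F1={f1:.2f}")
# precision=0.94 recall=0.79 F1=0.86
```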


Figure 1: Title extraction pipeline

Table 1: Dataset overview

Metric                     Value
Tweets                     955,251
Unique titles              65,296
Unique users               206,754
Max tweets per hour        1,288
Average tweets per hour    506
Max tweets per user        1,585
Average tweets per user    5
Median tweets per user     1
Max tweets per title       10,616
Average tweets per title   15
Median tweets per title    2

3.4 Dataset

The dataset analysed in this paper consists of tweets collected from 24 March 2015 to 16 July 2015. The total number of tweets in the dataset is 955,251, composed by a total of 206,754 users. Every tweet is associated with an IMDb title. The total number of unique IMDb titles that occur in the dataset is 65,296. Table 1 provides an overview of the dataset.

3.5 Location

We are interested not only in what people watch, but also in where they live, and therefore we try to obtain the location of the users in our dataset. Because only 1.7% of the tweets in our dataset contain exact coordinates, we look at other ways to enrich our data with location information. The second option Twitter provides to derive geographic information for a tweet is the location field of a user's profile. In contrast to the exact coordinates field, this field is textual and therefore less precise. Even though Twitter provides an auto-complete function to help users fill in this field, the text field is not restricted to real locations, which results in many people filling in fake locations. In our dataset, around 66% of the unique users have a non-empty value for this location attribute.

In order to map these strings to a valid location, we use a geocoder maintained by Foursquare,⁸ containing location indexes from Geonames.⁹ The geocoder was able to map 71% of these strings to a latitude-longitude pair, resulting in 50% of all tweets in the dataset being assigned to a location. A manual evaluation of 150 geolocated users shows that the geocoder performed well: we find a precision of 0.89 and a recall of 0.93. Most errors are caused by fake locations, like 'Nowhere' or 'City of smiles', which are incorrectly mapped to a location by the geocoder.

3.6 Timezone

We are interested not only in what people watch, but also in when people watch TV series or movies. Therefore we want to know the local date and time when a tweet was written. Because all timestamps provided by the Twitter Streaming API are in UTC, we need to know the timezone of a user to calculate the local date and time of a tweet. Twitter provides users with the ability to fill in their timezone in their profile settings. Users can choose from a list containing major cities and the offset in time between the user's timezone and UTC. The value of the offset can be used to calculate the local time and date of a user's tweet. However, there are two main drawbacks to using this attribute. First, not every tweet contains an entry for the timezone attribute; 82% of all tweets have a timezone assigned to them. Second, looking at the distribution of timezone offsets in our dataset, we see that not all information provided in the timezone attribute is valid.

Therefore, we look at an alternative way to extract the timezone of a user, using the geocoding process. By mapping the location found by the geocoder to a timezone, we are able to obtain timezone information for nearly 50% of all tweets. We assume that this information is less error-prone because users don't provide this information themselves. When comparing the timezone information on users' profiles with the timezone information derived using our geocoding method, we find that only 60% of the tweets match.

⁸ www.twofishes.net

⁹

Figure 2: Tweet volume over time (in UTC)

Figure 3: Number of tweets per day of the week

We use a combination of both methods to calculate the local time and date of a tweet as follows: if we were able to obtain a timezone using our geocoding method, we use this information; otherwise we use the timezone information on the user's profile. Using this method, we are able to calculate the local time and date of 88% of all tweets.
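The fallback rule described above can be sketched as follows. The sketch assumes that the geocoding step yields an IANA timezone name (for example `Europe/Amsterdam`) and that the profile supplies a UTC offset in seconds. Both representations, and the function and parameter names, are assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+; needs time zone data on the system


def local_time(created_at_utc, geocoded_tz=None, profile_utc_offset=None):
    """Convert a timezone-aware UTC timestamp to the user's local time.

    Prefers the timezone derived from geocoding; falls back to the offset
    from the user's profile; returns None if neither is available.
    """
    if geocoded_tz is not None:
        return created_at_utc.astimezone(ZoneInfo(geocoded_tz))
    if profile_utc_offset is not None:
        fixed = timezone(timedelta(seconds=profile_utc_offset))
        return created_at_utc.astimezone(fixed)
    return None
```

A named timezone accounts for daylight saving time, whereas a fixed profile offset does not, which is one practical reason to prefer the geocoded value when both are present.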

4. ANALYSIS

This section describes the results of the analysis of the dataset. By looking at the data from different perspectives, we aim to obtain insights into the watching habits of people.

4.1 When are we watching?

First we analyse the watching behaviour of people by looking at the total volume of the collected tweets over time. Figure 2 shows the volume over time for a subset of two weeks. The graph shows a circadian pattern with a clear peak every day at 19:00 UTC, which corresponds to the evening hours in Europe. Furthermore, we see a smaller peak occurring at 15:00 UTC, which can be related to the evening in Asian countries. The drop in volume on May 29th was caused by an outage of the server. Next to peak hours of the day, the graph also shows peaks for certain days.

4.1.1 Day of week

Looking at Figure 2, we see that certain days of the week tend to have a higher number of tweets than others. While this graph only shows a subset of two weeks, we also observe differences when grouping the complete dataset by day of the week (Figure 3). As expected, we see high values during the weekend, with a maximum on Sundays. Furthermore, we see a high number of tweets on Mondays.

Figure 4: Number of tweets per hour of the day

When filtering the data on the kind of title (movies or series), we see some differences. The data for tweets mentioning TV series shows that Monday is the most popular day to watch. This can be explained by the fact that new episodes of Game of Thrones, the most watched TV series in our dataset, are broadcast on Sunday evenings in the USA. Because this is during the night in Europe, many Europeans watch the new episode on Monday, which explains the maximum on Mondays.

The distribution of the number of tweets for movies is different: during the week the numbers are fairly consistent. However, we do see a tendency towards the weekend. This difference could be explained by the fact that the numbers for movies are probably less influenced by release dates: whereas new episodes of series are released every day, causing the number of tweets to peak, new movies are released less often, which is why we see a more consistent pattern during weekdays. Furthermore, movies usually take more time to watch, which could explain why people watch movies more often during the weekend, and less during weekdays.

4.1.2 Hour of day

By calculating the local times of all the tweets containing timezone information, we are able to see at what hour of the day people watch movies or TV series. Figure 4 shows the distribution of the local times of all tweets. As expected, this shows a peak in the evening around 21:00 and 22:00, followed by a drop to a minimum in the morning around 05:00. After reaching the minimum we can see the world waking up, and starting to tweet about watching movies and TV series again.
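A distribution like the one in Figure 4 amounts to counting tweets per local hour. The sketch below shows one way to do this on a toy list of local timestamps; the sample data and function name are hypothetical and do not come from the dataset.

```python
from collections import Counter
from datetime import datetime


def tweets_per_local_hour(local_times):
    """Return a 24-element list; index h holds the number of tweets in local hour h."""
    counts = Counter(t.hour for t in local_times)
    return [counts[h] for h in range(24)]


# Toy input: three local timestamps, two of them in the 21:00 hour.
sample = [
    datetime(2015, 6, 1, 21, 5),
    datetime(2015, 6, 1, 21, 40),
    datetime(2015, 6, 2, 5, 15),
]
histogram = tweets_per_local_hour(sample)
peak_hour = max(range(24), key=histogram.__getitem__)
print(peak_hour)  # 21
```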

Figure 5: Number of tweets per user rank

[...] patterns for all genres, except for the genre 'Animation'. This genre has a lower peak in the evenings, and more tweets in the afternoon (Figure 4). This observation can be explained by the fact that the main audience of most movies and series in the Animation genre consists of children, who are more likely to watch TV during the afternoon than during the evening. Surprisingly, when comparing the most popular hours of the day between countries we find very similar patterns, indicating that people from all over the world seem to follow similar daily rhythms.

4.2 Who is watching?

Next, we analyse the users sharing what they are watching through Twitter. We look at how the tweets are distributed over the users, which helps in finding spammers or bots. Furthermore, we look at the location of the users.

4.2.1 Users

The dataset consists of tweets composed by 206,754 unique users. To see how the tweets are distributed over these users, we look at the number of tweets per user. We plot the number of tweets per user over the rank of users (Figure 5). The rank of a user is found by ordering the users based on their number of tweets: users with a low rank have a high number of tweets. Plotting data this way helps to find irregular users: in the case of spammers or bots, who automatically send out large numbers of tweets, we would see a hump for the top-ranked users. The graph shows a distribution resembling Zipf's law, indicating that there are no users with an irregular number of tweets.
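The rank-frequency data behind a plot like Figure 5 can be derived as sketched below. The input is assumed to be one user identifier per tweet; the function name and the toy data are illustrative.

```python
from collections import Counter


def rank_frequency(user_ids):
    """Return tweet counts ordered by rank: element 0 belongs to rank 1."""
    counts = Counter(user_ids)
    return sorted(counts.values(), reverse=True)


# Toy input: one user identifier per tweet.
tweets_by_user = ["a", "b", "a", "c", "a", "b"]
print(rank_frequency(tweets_by_user))  # [3, 2, 1]
```

Plotting these counts against their rank on logarithmic axes gives a roughly straight line for a Zipf-like distribution, so a hump at the top ranks would stand out.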

4.2.2 Location

The distribution of tweets over (UTC) time suggests that our system picks up a strong European as well as a weaker Asian signal. In order to confirm this observation and to look at how the watching behaviour of people differs around the world, we look at the location of the users in our dataset.

Table 2 shows the top 10 most frequently occurring countries, as well as the total distribution over continents. This shows us that, while the United States is the country with the most tweets in our data, France has a similar number of tweets. This is not as expected, since the population of the US is almost 5 times as big as the population of France. Furthermore, we see 4 other European countries appearing in the top 10. Combined with the number of tweets from the other remaining European countries, we see that our system picks up a strong European signal, with 39% of the tweets coming from countries in Europe.

Table 2: Top countries and continents

Country          %
USA              14.4
France           13.7
Brazil           9.7
Turkey           5.3
Philippines      5.1
Malaysia         4.8
Great Britain    4.8
Italy            4.5
Spain            2.9
Portugal         2.0

Continent        %
Europe           38.6
Asia             23.5
North America    19.5
South America    12.7
Africa           3.4
Oceania          1.9
Antarctica       0.0

Figure 6: Hashtag usage per continent

This is not as expected, since research on the geographic distribution of the Twitter population shows a much bigger signal from the United States, with 57% of the Twitter users living in the United States [14]. In addition, this research shows that around 70% of all tweets are produced in the US. Furthermore, most popular movies and TV series are produced in the US, which leads to the assumption that the data in our dataset would have a strong North American signal. Because this result is not as expected, we take a deeper look at whether our data acquisition method influences the geographical distribution of the users in the dataset.

One of the reasons for the bias towards European tweets could be the fact that we use certain hashtags that are related to companies with a market penetration that might differ in the various continents. To investigate this assumption, we look at the geographical distribution of the tweets for all hashtags we use in our data collection. Figure 6 shows this distribution for the 4 biggest continents. This graph shows that all of the hashtags have a non-equal distribution between the continents, and therefore suggests that all the used hashtags cause some kind of bias in the geographical distribution of the signal we measure.

The biggest difference can be seen for the hashtag 'TVShowtime', where 50% of all tweets containing this hashtag are composed in a European country. The high percentage in Europe can be explained by the fact that TVShowtime is developed in France, and that therefore its market penetration is higher in Europe than in other continents. The data for Trakt shows similarities with TVShowtime, which also leads to a strong European signal. Furthermore, we see that the number of tweets from Asian countries containing these hashtags is very low, indicating that the market penetration of both companies in Asia is negligible. This also implies that using these hashtags for our data collection introduces a bias towards non-Asian Twitter messages.

While the unequal geographic distribution for these two hashtags can be explained by the fact that they are related to companies, each having their own markets, we also see that the more generic hashtags 'NW' and 'NowWatching' don't result in equal distributions. Furthermore, we see that using both hashtags to collect data still does not lead to a very strong signal from the United States, as we would have expected given that the overall Twitter population is strongly concentrated in the US. What we do see, however, is that the majority of tweets containing these generic hashtags are from Asian countries.

The lack of a substantial majority of users from the United States could indicate that people from the United States are less willing to share what they are watching via Twitter. However, a similar system focussed on measuring music popularity does not support this explanation: the large majority of its data comes from US users [13].

4.3 What are we watching?

This section focusses on insights about what movies and TV series people are watching. First, we look at what movies and TV series are the most popular and how the overall popularity of items is distributed in our dataset. Second, we analyse at a more contextual level and look at the genres assigned to every item and how this data is distributed. We do this on a global level, but also look at differences between continents. Lastly, we look at the overlap between audiences of the most popular titles.

4.3.1 Type of media

We look at the kind of titles in the dataset. IMDb defines a kind for every title in its database. The three most frequently occurring kinds are 'movie', 'TV series' and 'episode'. Since episodes are part of TV series, we grouped both kinds together. In our dataset, we see that a majority (58%) of the titles is of the kind 'TV series' (Figure 7). When grouping users by their continent, we see that Europe has a slightly bigger preference towards series. The data from Asian users show an opposite preference, where 60% of all tweets refer to a movie. While these differences are minor, South America shows a major preference for TV series over movies, with 83% of the tweets referring to TV series. This can be explained by the fact that most of the South American tweets come from following the hashtags 'TVshowtime' and 'Trakt', which are both used primarily for TV series.

4.3.2 Title popularity

In this section we look at the popularity of all the items in our dataset. The most popular TV series and movies are presented in Table 3. To gain insight into how the tweets are distributed over the various titles in the dataset, we look at the popularity distribution. To do so, we grouped episodes of TV series together with the general title of the TV series, which corrects for the fact that some tweets mention the precise episode information, while others only mention the general title of the TV series. We plot the number of tweets per title over the rank of titles (Figure 8). The rank of a title is found by ordering the titles by their number of tweets: a low rank means a high number of tweets. We see a distribution resembling Zipf’s law, indicating that a small number of top titles account for a large part of the tweets in our dataset. This is confirmed by the data: the top ten percent of the titles in the dataset account for 90 % of all tweets. Furthermore, we see a substantial difference between the most popular and the second most popular title: tweets linked to ’Game of Thrones’ occur twice as often as tweets linked to ’Arrow’.

Figure 7: Title type per continent

Table 3: Top 10 series and movies

Rank  Series                    Movies
1     Game of Thrones           Avengers: Age of Ultron
2     Arrow                     Kingsman: Secret Service
3     The Flash                 Interstellar
4     Grey’s Anatomy            Pitch Perfect 2
5     Daredevil                 Mad Max: Fury Road
6     Orange Is the New Black   The Fault in Our Stars
7     Pretty Little Liars       Fifty Shades of Grey
8     The Vampire Diaries       Love, Rosie
9     Once Upon a Time          American Sniper
10    The Walking Dead          Ex Machina
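The ranking of titles and the share of tweets held by the top titles, as described in this section, can be sketched as follows. This is an illustrative example under stated assumptions: episodes are assumed to be already mapped to their series title, and the names `rank_titles` and `top_share` as well as the toy data are hypothetical.

```python
from collections import Counter


def rank_titles(titles):
    """Return [(rank, title, tweet_count)], with rank 1 for the most tweeted title."""
    ordered = Counter(titles).most_common()
    return [(rank, title, n) for rank, (title, n) in enumerate(ordered, start=1)]


def top_share(ranked, fraction=0.10):
    """Share of all tweets that are linked to the top `fraction` of titles."""
    if not ranked:
        raise ValueError("no titles to rank")
    k = max(1, round(fraction * len(ranked)))  # number of titles in the top group
    total = sum(n for _, _, n in ranked)
    return sum(n for _, _, n in ranked[:k]) / total


# Hypothetical toy data: one (series-level) title per linked tweet.
toy = ["A"] * 11 + ["B"] * 2 + list("CDEFGHIJ")
ranked = rank_titles(toy)
share = top_share(ranked)  # share held by the top 10 % of the ten titles
```

Plotting the tweet count against the rank on logarithmic axes is a common way to check visually whether such a distribution resembles Zipf’s law.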

4.3.3 Genres

Every item in IMDb is assigned to one or more genres. The total number of unique genres assigned to titles in the dataset is 28. The distribution of the most frequently occurring genres is shown in Figure 9. This shows that Drama is by far the most popular genre: 23 % of all tweets belong to this genre. Other popular genres are Comedy, Thriller and Action.


Figure 9: Number of tweets per genre

Figure 10: Stacked barplot showing differences between genre preferences per continent and overall

At a more detailed level, we look at how the genres are distributed between different continents. Appendix A contains the distributions of all genres per continent. Figure 10 shows how the genre preferences differ from the overall distribution for the most popular genres. Since the majority of tweets in our dataset are from European countries, we first focus on the genre distribution between Europe and the overall average. By doing this, we see only small differences: in Europe, drama and mystery are more popular while romance, animation and family are of less interest. Furthermore, the distribution for North America shows minor differences, with a tendency towards comedy, animation and family and smaller interest in the genres adventure, fantasy and romance.

Compared to the overall genre distribution, we see that South American countries show a bigger interest in fantasy, mystery and science fiction. The genres comedy, romance and family are less popular among the South American users in our dataset. Asian countries show a tendency towards family, romance and action titles, while drama, sci-fi, crime and mystery titles are less popular.
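The per-continent comparison described above can be sketched as follows. This is a minimal example, not the original analysis code. It assumes that shares are normalised over genre assignments, so that a tweet about a title with two genres counts once for each genre; this is consistent with the columns of Table 5 summing to roughly 100 %. The function names and toy records are hypothetical.

```python
from collections import Counter, defaultdict


def genre_shares(records):
    """records: iterable of (continent, [genres]) pairs, one pair per tweet.

    Returns {region: {genre: share}}, including an 'Overall' region.
    """
    counts = defaultdict(Counter)
    for continent, genres in records:
        for genre in genres:
            counts[continent][genre] += 1
            counts["Overall"][genre] += 1
    shares = {}
    for region, counter in counts.items():
        total = sum(counter.values())
        shares[region] = {genre: n / total for genre, n in counter.items()}
    return shares


def deviation_from_overall(shares, region):
    """Difference between a region's genre shares and the overall shares."""
    overall = shares["Overall"]
    return {g: shares[region].get(g, 0.0) - overall[g] for g in overall}


# Hypothetical toy records.
records = [
    ("Europe", ["Drama", "Mystery"]),
    ("Europe", ["Drama"]),
    ("Asia", ["Romance"]),
]
shares = genre_shares(records)
europe_diff = deviation_from_overall(shares, "Europe")
```

A positive value in `europe_diff` means the genre is more popular in that region than overall, a negative value that it is less popular.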

4.3.4 Audience overlap

By comparing the list of people who have watched a certain title with the corresponding list for another title, we can see whether and to what extent their audiences overlap. We calculated the percentage of overlapping audiences for the top ten most popular titles (Appendix B). This confirms that ’Game of Thrones’ is by far the most popular title: more than 25 % of the audiences of most other titles have also watched ’Game of Thrones’. Furthermore, we see that approximately half of the audience of ’Arrow’ has also watched ’The Flash’. This is expected, since the latter is a spin-off series of the former.
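The overlap measure can be illustrated with a short sketch. It follows the definition given in the caption of Table 6 (number of overlapping users divided by the total viewers of the column’s title); the function name `audience_overlap`, the user identifiers and the toy titles are assumptions made for this example.

```python
def audience_overlap(viewers):
    """viewers: {title: set of user ids}.

    Returns {(row_title, column_title): share}, where share is the number of
    users who watched both titles divided by the audience of the column title.
    """
    table = {}
    for row, row_users in viewers.items():
        for col, col_users in viewers.items():
            if row == col or not col_users:
                continue  # skip the diagonal and empty audiences
            table[(row, col)] = len(row_users & col_users) / len(col_users)
    return table


# Hypothetical toy audiences.
viewers = {
    "X": {"u1", "u2", "u3", "u4"},
    "Y": {"u3", "u4"},
}
overlap = audience_overlap(viewers)
```

Note that the measure is not symmetric: a small audience can be fully contained in a large one, while the reverse share stays low.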

4.4 Why are we watching?

In this section we look at what triggers people to start watching a movie or an episode. Furthermore, we compare data about the popularity of media titles, measured by other internet sources, to see to what extent we measure a similar signal.

4.4.1 Releases and broadcasts

The main triggers we found for people to start watching are releases of movies or episodes. The most notable example of such a trigger is the weekly peak we have seen on Monday evenings, caused by the release of new episodes of ’Game of Thrones’. Another clear example of such a trigger was the release of a complete new season of ’Orange is the New Black’ on Netflix on June 11: whereas the number of viewers was almost zero in the days prior to the release, the series was among the most popular titles watched on this day and in the following weeks.

4.4.2 Correlation with other signals

Because we are interested in the extent to which our system measures signals similar to other sources, we compare our data with data from other websites. Two internet services providing information on the activity of their users are Google Trends (footnote 10) and Wikipedia (footnote 11), offering data on the number of search queries for specific terms and page views of articles, respectively. Both metrics have been shown to be good indicators of the popularity of topics among internet users [15][16].

We compare the data for the top ten most popular titles in our dataset with both sources. The Google Trends website is used to download statistics on the daily volume of queries containing the title of the movie or TV series. A REST interface (footnote 12) that offers the daily page view statistics of Wikipedia articles is used to gather information about the popularity of the movies and TV series on Wikipedia. We analyse the similarities between our data and the two other sources by calculating Pearson correlation coefficients (Table 4; scatter plots in Appendix C). This shows that the number of tweets correlates well with the two other sources in most cases. We also see that the correlation coefficients between the number of tweets and the Google Trends data are usually higher than those relating to the Wikipedia page views.
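The comparison described above can be sketched as follows: two daily series are aligned on their shared dates, after which the Pearson correlation coefficient is computed. This is an illustrative sketch rather than the code used in this study; the helper names `align` and `pearson` and all values in the toy series are hypothetical.

```python
import math


def align(series_a, series_b):
    """Keep only the dates present in both {date: value} series."""
    dates = sorted(set(series_a) & set(series_b))
    return [series_a[d] for d in dates], [series_b[d] for d in dates]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two series of equal length of at least 2")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical daily counts for one title (not taken from the dataset).
tweets = {"2015-04-20": 1, "2015-04-21": 2, "2015-04-22": 3, "2015-04-23": 4}
views = {"2015-04-20": 10, "2015-04-21": 30, "2015-04-22": 20,
         "2015-04-23": 40, "2015-04-24": 99}
r = pearson(*align(tweets, views))
```

Because the coefficient is sensitive to single extreme values, one day with an unusually strong signal in only one of the sources can lower it noticeably.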

We take a closer look at the correlation values for one example with high coefficients and one with low coefficients. For the TV series ’The Flash’ the high correlation with both sources is confirmed by the scatter plots (Appendix C, Figure 11), which both approximate a straight line. The scatter plots for ’Grey’s Anatomy’, the series with the lowest correlation coefficients, both show one clear outlier (Appendix C, Figure 12). Both outliers refer to the same date, the 24th of April, on which both sources pick up a very strong signal that is not as strong in our data. A manual search for news concerning the TV series on that particular day reveals the reason for the peaks found for Wikipedia page views and Google Trends: in the episode released the day before, one of the main characters dies, which led to considerable consternation among followers of the series. This explains why, while both other sources pick up this consternation and show peaks, our system picks up a weaker signal: although people are triggered to start browsing the internet or to communicate about the episode after watching it, they are less likely to immediately watch the same episode again. Because our system only measures the number of people watching the episode, it does not pick up the same strong signal.

10 http://www.google.com/trends/
11 https://dumps.wikimedia.org/other/pagecounts-raw/
12

Table 4: Pearson correlation with other signals

Title                     Tweets - Wikipedia   Tweets - Google Trends
Game of Thrones           0,82                 0,89
The Flash                 0,86                 0,92
Arrow                     0,72                 0,91
Grey’s Anatomy            0,57                 0,63
Daredevil                 0,93                 0,97
Orange is The New Black   0,98                 0,99
Pretty Little Liars       0,75                 0,90
The Vampire Diaries       0,63                 0,88
Once Upon a Time          0,78                 0,78
The Walking Dead          0,95                 0,70

5. DISCUSSION

When studying human behaviour through social media data, it is important to keep in mind that using social media as a source introduces several challenges [17], which also affect this study. These challenges include a population bias, caused by the fact that the majority of social media users are young people living in urban areas. Moreover, the acquisition of data is limited to the tweets of users who choose to publish their messages publicly. Similarly, we are limited to what people want to share with others. Looking at our dataset, we see that although some of the tweets are published automatically by media players, people manually publishing tweets can be selective in when to share what they are watching. For example, people might only want to share messages about watching new or popular titles. This could lead to a discrepancy between what can be measured and what people are really watching, and cause a tendency towards new and mainstream titles.

A main drawback of our research is the sparsity of data; although we have collected tweets over a period of almost 4 months, our dataset size is limited. Whereas a similar system focussed on music [12] collects around 6 tweets per second, our system finds less than 1 tweet every second. Although the characteristics of both domains obviously vary, we also see that studies focussing on measuring the ’buzz’ about movies on Twitter collect substantially bigger datasets [5]. Because we focus on collecting tweets wherein people explicitly report on watching something right at a given moment, our system is not able to take full advantage of the vast amount of TV or movie-related tweets. Because of this sparsity of data, the number of tweets is too small to analyse when looking at small subsets of the data (e.g. comparing the hours of day for a country not occurring in the top 10 countries).

Furthermore, the methods used to extract information from tweets can introduce a geographic bias. We use IMDb as a knowledge base to link tweets to. IMDb is mainly focussed on western movies and TV series, and therefore does not contain all titles originating from Asia, like Bollywood movies. Because of this, linking tweets from Asia to titles in IMDb is more difficult and error-prone than for other tweets, resulting in a smaller number of tweets from countries like India in the dataset. Another problem arises when using the location of tweets in the analysis. Here we are limited by two factors: the number of users who have entered some kind of location text in their profiles and the accuracy of the geocoder used to convert this text to a longitude-latitude pair. Combined, these two factors result in 50 % of the tweets having a location. Therefore, it is important to note that, when analysing differences between countries or continents, we are limited to this subset of the data.

6. CONCLUSION

In this paper we have described the analysis of watching behaviour of people watching movies and TV series. The way we watch television is changing. People increasingly tend to use their mobile devices while watching television, as a second screen. These screens are being used to gather additional information or to share thoughts on social media like Twitter. Using specific hashtags people share what they are watching. Collecting these tweets provides us with new possibilities to infer the watching habits of people.

The system described in this paper collects tweets containing a number of specific hashtags, and aims at linking every incoming tweet to a title in IMDb. The evaluation of the title linking approach showed good results for both precision and recall. Most errors are caused by tweets referring to media that are not present in IMDb, like sport matches. While this system enables various applications, like composing real-time charts and television ratings, this paper focussed on using the data to analyse the watching habits of people. A dataset was constructed, consisting of all the collected tweets linked to IMDb titles for a period of 16 weeks. By enriching the data with location and timezone information, we have looked at several aspects of the watching habits of people.

Looking at the total number of tweets over time, we found peaks during evening hours for European and Asian countries, suggesting that our system picks up a strong signal from both continents. The location information confirms this assumption; whereas we would have expected to see a majority of North American tweets, like similar systems do, we found that people from Europe and Asia are perhaps more willing to share what they are watching using Twitter. Furthermore, we find that the most popular moment to watch movies or TV series is between 21:00 and 22:00. No significant differences were found when comparing countries, suggesting that people from all over the world tend to follow a similar daily rhythm.

When analysing the titles in the dataset, we find a tendency towards TV series over movies. The TV series ’Game of Thrones’ is by far the most watched title in our collection. Combining the genres of all titles with the number of views, we find that drama, comedy and action titles are the most popular. Furthermore, when comparing continents we see several differences in the genre distribution, indicating that the preferences in genres differ between continents. Moreover, by combining lists of people having watched certain titles, we have seen that our system is able to find overlapping audiences.

Looking at what triggers people to start watching a movie or TV series, we find that releases of new episodes or movies prove to be a strong trigger: for example, we have seen a peak on Monday evenings due to the release of new episodes of ’Game of Thrones’. When comparing the data for the top ten titles with activity data from both Google Trends and Wikipedia, we find high correlation coefficients. Since both sources are known to be good indicators of the popularity of topics, we conclude that our system measures similar popularity scores.

Future work should focus on finding other ways to increase the number of tweets flowing through the system. This can be done by finding more hashtags or keywords used by people to share what they are watching. Collecting more information enables various other applications of the system. While we were not able to zoom in on small subsets due to data sparsity, gathering more data enables more detailed analysis and pattern finding, which can be of interest for marketeers and advertisers.

Designing an interface to browse the data flowing through the system can provide an interesting and fun way for people to see what is popular at any moment in time, and might also help in coming up with new ideas about what to watch. Although the dataset is limited in size and in the period of time it covers, we have seen that it is possible to discover overlapping audiences. This information can be used in the popular field of movie recommendation, where similarities between users and items are often used to suggest new titles to people. Although more data is needed for this task, incorporating the information about time and location can be an interesting challenge.

7. ACKNOWLEDGEMENTS

The author would like to thank Isaac Sijaranamual for his feedback as supervisor. Furthermore, the author would like to thank Manos Tsagkias and Wouter Weerkamp for their ideas and feedback in the early stages of this study.

8. REFERENCES

[1] H. Kwak, C. Lee, H. Park, and S. Moon, “What is twitter, a social network or a news media?” in Proceedings of the 19th international conference on World wide web. ACM, 2010, pp. 591–600.

[2] S. Petrović, M. Osborne, and V. Lavrenko, “Streaming first story detection with application to twitter,” in Human Language Technologies: The 2010 Annual Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics, 2010, pp. 181–189.

[3] J. Bollen, H. Mao, and X. Zeng, “Twitter mood predicts the stock market,” Journal of Computational Science, vol. 2, no. 1, pp. 1–8, 2011.

[4] T. Sakaki, M. Okazaki, and Y. Matsuo, “Earthquake shakes twitter users: real-time event detection by social sensors,” in Proceedings of the 19th international conference on World wide web. ACM, 2010, pp. 851–860.

[5] S. Asur, B. Huberman et al., “Predicting the future with social media,” in Web Intelligence and Intelligent Agent Technology (WI-IAT), 2010 IEEE/WIC/ACM International Conference on, vol. 1. IEEE, 2010, pp. 492–499.

[6] F. M. F. Wong, S. Sen, and M. Chiang, “Why watching movie tweets won’t tell the whole story?” in Proceedings of the 2012 ACM workshop on Workshop on online social networks. ACM, 2012, pp. 61–66.

[7] A. Oghina, M. Breuss, M. Tsagkias, and M. de Rijke, “Predicting imdb movie ratings using social media,” in Advances in information retrieval. Springer, 2012, pp. 503–507.

[8] S. Dooms, T. De Pessemier, and L. Martens, “Movietweetings: a movie rating dataset collected from twitter,” in Workshop on Crowdsourcing and human computation for recommender systems, CrowdRec at RecSys, vol. 2013, 2013, p. 43.

[9] R. Deller, “Twittering on: Audience research and participation using twitter,” Participations, vol. 8, no. 1, pp. 216–245, 2011.

[10] D. A. Shamma, L. Kennedy, and E. F. Churchill, “Tweet the debates: understanding community annotation of uncollected sources,” in Proceedings of the first SIGMM workshop on Social media. ACM, 2009, pp. 3–10.

[11] S. Wakamiya, R. Lee, and K. Sumiya, “Towards better tv viewing rates: exploiting crowd’s media life logs over twitter for tv rating,” in Proceedings of the 5th International Conference on Ubiquitous Information Management and Communication. ACM, 2011, p. 39.

[12] W. Weerkamp, E. Tsagkias, and M. de Rijke, “Inside the world’s playlist,” in CIKM 2013: 22nd ACM Conference on Information and Knowledge Management, October 2013.

[13] M. Schedl and D. Hauger, “Mining microblogs to infer music artist similarity and cultural listening patterns,” in Proceedings of the 21st international conference companion on World Wide Web. ACM, 2012, pp. 877–886.

[14] J. Kulshrestha, F. Kooti, A. Nikravesh, and P. K. Gummadi, “Geographic dissection of the twitter network.” in ICWSM, 2012.

[15] H. Choi and H. Varian, “Predicting the present with google trends,” Economic Record, vol. 88, no. s1, pp. 2–9, 2012.

[16] M. Ciglan and K. Nørvåg, “Wikipop: personalized event detection system based on wikipedia page view statistics,” in Proceedings of the 19th ACM international conference on Information and knowledge management. ACM, 2010, pp. 1931–1932.

[17] D. Ruths and J. Pfeffer, “Social media for large studies of behavior,” Science, vol. 346, no. 6213, pp. 1063–1064, 2014.


APPENDIX

A. GENRE DISTRIBUTIONS

Table 5: Genre distribution per continent

Genre        Europe  Asia   North America  South America  Africa  Oceania  Overall
Drama        22,8%   18,8%  20,0%          23,5%          20,4%   18,8%    21,3%
Comedy       10,3%   11,2%  12,1%          9,1%           12,3%   13,9%    10,8%
Thriller     8,0%    8,3%   8,1%           7,7%           8,3%    8,4%     8,1%
Action       8,0%    8,8%   8,1%           8,1%           8,6%    7,6%     8,2%
Adventure    7,1%    7,4%   6,7%           7,6%           6,3%    6,0%     7,1%
Sci-Fi       6,7%    5,7%   6,9%           7,7%           5,4%    6,2%     6,6%
Crime        6,8%    5,7%   6,6%           7,1%           6,9%    6,7%     6,6%
Fantasy      6,0%    5,5%   5,2%           7,2%           4,4%    5,2%     5,8%
Mystery      6,5%    4,4%   5,4%           7,0%           4,9%    6,4%     5,8%
Romance      4,8%    7,0%   4,6%           4,3%           6,4%    5,2%     5,3%
Horror       3,4%    3,5%   3,5%           3,7%           2,4%    3,8%     3,4%
Animation    2,0%    2,3%   2,4%           2,3%           2,1%    2,4%     2,2%
Family       1,5%    2,6%   1,9%           1,2%           1,8%    2,2%     1,8%
Music        0,9%    1,5%   1,2%           0,6%           1,4%    1,1%     1,0%
Biography    0,7%    1,1%   1,0%           0,4%           1,2%    0,7%     0,8%
Reality-TV   0,7%    0,8%   1,1%           0,6%           1,0%    1,0%     0,8%
Documentary  0,7%    0,9%   1,1%           0,3%           1,3%    1,1%     0,8%
War          0,7%    0,8%   0,7%           0,5%           0,9%    0,6%     0,7%
Sport        0,5%    0,7%   0,8%           0,1%           0,8%    0,5%     0,6%
Short        0,5%    0,8%   0,5%           0,1%           1,1%    0,3%     0,5%
History      0,5%    0,6%   0,5%           0,4%           0,7%    0,4%     0,5%
Musical      0,4%    0,6%   0,5%           0,2%           0,5%    0,4%     0,4%
Talk-Show    0,2%    0,4%   0,4%           0,1%           0,4%    0,4%     0,3%
Game-Show    0,2%    0,4%   0,4%           0,2%           0,2%    0,5%     0,3%
News         0,1%    0,2%   0,2%           0,0%           0,3%    0,2%     0,1%
Western      0,1%    0,1%   0,1%           0,0%           0,1%    0,1%     0,1%
Film-Noir    0,0%    0,0%   0,0%           0,0%           0,0%    0,0%     0,0%
Adult        0,0%    0,0%   0,0%           0,0%           0,0%    0,0%     0,0%


B. OVERLAPPING AUDIENCES

Table 6: Overlapping audience: number of overlapping users divided by the total viewers of the column’s title

(Values higher than 25% are marked red)

Column headers are abbreviated; the abbreviations are given in parentheses after the row titles.

                                 GoT     Arrow   Flash   Grey’s  DD      OITNB   PLL     TVD     OUaT    TWD
Game of Thrones (GoT)            x       34,5%   33,1%   19,0%   35,6%   27,1%   18,3%   27,0%   33,4%   34,5%
Arrow                            13,6%   x       47,4%   13,1%   26,2%   11,8%   13,2%   24,8%   29,1%   20,6%
The Flash (Flash)                13,8%   50,4%   x       11,8%   30,2%   12,0%   13,5%   23,5%   29,9%   19,8%
Grey’s Anatomy (Grey’s)          6,1%    10,7%   9,1%    x       6,2%    11,7%   14,2%   16,8%   17,6%   8,6%
Daredevil (DD)                   11,9%   22,3%   24,2%   6,4%    x       12,1%   5,1%    9,8%    15,0%   15,9%
Orange is the New Black (OITNB)  8,2%    9,1%    8,7%    11,0%   11,0%   x       12,6%   10,6%   13,9%   9,6%
Pretty Little Liars (PLL)        7,0%    12,8%   12,3%   16,7%   5,8%    15,9%   x       28,1%   25,7%   10,2%
The Vampire Diaries (TVD)        7,2%    16,8%   15,0%   13,9%   7,8%    9,4%    19,7%   x       24,8%   11,0%
Once Upon a Time (OUaT)          7,5%    16,7%   16,1%   12,3%   10,1%   10,3%   15,2%   20,9%   x       10,1%
The Walking Dead (TWD)           11,2%   17,0%   15,3%   8,7%    15,4%   10,2%   8,7%    13,4%   14,5%   x


C. CORRELATION WITH OTHER SIGNALS

(a) Google Trends (b) Wikipedia page views

Figure 11: Correlations with other signals for ’The Flash’

(a) Google Trends (b) Wikipedia page views