
Predicting the Controversy Level of a Facebook News Post

A study involving Facebook Reactions, Controversy and Topic Modelling

Daphne Groot

Master thesis Information Science
Rijksuniversiteit Groningen

Primary supervisor: Tommaso Caselli
Secondary supervisor: Malvina Nissim

Daphne Groot
s2723468
August 29, 2018


A B S T R A C T

Controversy detection is a subject that has been studied multiple times, but never on Dutch resources. This is despite the clear importance of controversy detection: when it is known whether something (a complete topic, a Facebook post or a news article) is controversial, the necessary precautions can be taken, for example to prevent excessive arguing and fights in the comment section of a Facebook post.

This paper describes research on controversy detection for Dutch resources using a distantly supervised, entropy-based approach, following Basile et al. (2017). The entropy score is calculated using the different reaction types Facebook provides (LIKE, LOVE, HAHA, WOW, SAD and ANGRY). Controversy detection is not the only subject covered in this research; an attempt at topic modelling for the posts in the data set is also made.

The data sets used are posts from Reddit (from the subreddit 'thenetherlands') which contain a link to certain news articles, and posts from the Facebook pages of news sources. The news sources used in this research are NOS, RTL Nieuws, de Volkskrant, het Parool, NRC and de Telegraaf.

The topic modelling is done using LDA in combination with a tf-idf approach. Using the tool MALLET (McCallum, 2002) was also tried, but that gave worse results (micro F1 scores of 0.332 and 0.116 respectively). Predicting the entropy score using the different reaction types already works quite well, with an MSE of 0.272.

Not only is the entropy score of Facebook posts predicted; the total number of reactions and the number of reactions per type are also predicted, both with success (MSE total reactions = 0.211, MSE per reaction type < 0.033).

These results show that there is still room for improvement (especially concerning the topic modelling and predicting the entropy score), but they also show that using the correct features provides a very reliable system for predicting the number of reactions per type.

It also turned out that using an entropy-based approach for detecting controversy on Reddit is not reliable. In order to obtain better, or any, results on this subject, a different approach should be used.


C O N T E N T S

Abstract
Preface
1 Introduction
   1.1 Research and sub questions
   1.2 Hypotheses
2 Background
   2.1 News sources
   2.2 Distant Supervision
   2.3 Controversy
   2.4 Topic modelling
3 Data collection
   3.1 Reddit-data
   3.2 Facebook-data
   3.3 Data set description
      3.3.1 Reddit data set
      3.3.2 Facebook data set
4 Method
   4.1 Reactions as proxies
      4.1.1 Calculating entropy
   4.2 Experimental settings
      4.2.1 Controversy on Reddit
      4.2.2 Controversy on Facebook
         Testing features for significance
         Predicting entropy
   4.3 Topic detection
   4.4 Assembly of experiments
      4.4.1 Predicting total reactions
      4.4.2 Predicting reactions volume
      4.4.3 Predicting entropy and controversial level
      4.4.4 Final Pipeline
         GUI Final Pipeline
5 Results & Discussion
   5.1 Analysing CAPOTE-model
   5.2 Experimental settings
   5.3 Relevant topic for controversy
   5.4 Assembly of experiments
      5.4.1 Topic prediction
      5.4.2 Total number of reactions
      5.4.3 Number of reactions per type
      5.4.4 Predicting entropy
      5.4.5 Final Pipeline
6 Conclusion
   6.1 Limitations
   6.2 Future work
Bibliography
Appendix


L I S T O F F I G U R E S

Figure 1 The different types of reactions Facebook uses
Figure 1 Political distribution of parties
Figure 2 Facebook header section 1: Message 2: Link/Headline
Figure 3 Facebook comment section 3: Reactions given 4: Comment 5: Reactions to comment 6: Comments to comment
Figure 1 Schematic representation final Pipeline
Figure 2 GUI start screen
Figure 3 GUI result screen
Figure 4 GUI Facebook URL
Figure 5 GUI article URL
Figure 6 GUI text


L I S T O F T A B L E S

Table 1 Data distribution Reddit
Table 2 Example of downloaded Reddit data
Table 3 Data distribution Facebook
Table 4 Data distribution Facebook after scraping
Table 5 Data distribution per month
Table 6 Chi-squared results Reddit
Table 7 Number of reactions per source
Table 8 Average number of reactions per type
Table 9 Distribution of reactions
Table 1 Highest and lowest entropy scores using Reddit
Table 2 Two posts with highest entropy scores with LIKE
Table 3 Two posts with highest entropy scores without LIKE
Table 4 Two posts with lowest entropy scores with LIKE
Table 5 Two posts with lowest entropy scores without LIKE
Table 6 Best model settings and features for predicting entropy
Table 7 Chi-squared results Facebook reactions
Table 8 Chi-squared results Facebook comments
Table 9 MALLET training files distribution
Table 10 Complete list of found topics with the percentage of data covered
Table 11 Best model settings and features for predicting Total Reactions
Table 12 Best model settings and features for predicting Total Reactions
Table 1 Average entropy, LIKE included
Table 2 Average entropy, LIKE excluded
Table 3 Top 5 results with different predictive features
Table 4 Cross validated results on all data sets
Table 5 Cross validated cross source results on complete data set
Table 6 Chi-squared results topic
Table 7 Examples significant topics
Table 8 Reactions for Airplane/flight
Table 9 Distribution of controversial posts per source
Table 10 Distribution for each controversial topic per source
Table 11 Micro F1 and MSE results Pipeline
Table B1 Complete list of results with different indicators
Table B2 Complete list of chi-square test results for each topic


P R E F A C E

After seven months of hard work, I am very proud to present to you the final version of my Master Thesis. It was an exciting, educational, interesting, and at times a difficult process, which makes the result very rewarding.

In this section, I would like to take a moment to thank some people; without them I would not have been able to complete this thesis. First, my family, who always supported me, even through the long and difficult days and the irritated moods that came with them. Secondly, my friends, who were always willing to help me when needed and to give advice on handling certain problems. I would also like to thank the Social Media Sensing team of the University of Groningen, who provided pointers on how to handle certain things. Last but definitely not least, I would like to give special thanks to my supervisor Tommaso Caselli for guiding me in the right direction and helping me when I got stuck, confused or in doubt.

I hope you will enjoy reading this thesis.

Daphne Groot
Groningen, August 29, 2018


1

I N T R O D U C T I O N

Social media is not only used for posting pretty pictures and describing day-to-day activities; it has also become a way for people to stay updated on the news. This can happen, for example, via Reddit, where users create their own posts with a link to a news article, or via what news sources post on Facebook about their articles. The opportunity for news sources to post on social media has also created the possibility for users to comment on those posts and engage in discussions about the article or subject.

These comments and reactions can indicate what the general opinion of the users towards the news is. Most of these news posts on Facebook receive multiple reactions, but some topics simply cause more people to react and join the discussion. These discussions can sometimes lead users to a mutual understanding or agreement about the post, but for some posts this is not the case. When this happens, the post can be classified as controversial. More precisely: "a controversy is a situation where, even after lengthy interactions, opinions of the involved participants tend to remain unchanged and become more and more polarised towards extreme values" (Basile et al. (2017), Timmermans et al. (2017)). Examples of controversial topics could be vaccines, abortion or any political subject.

Before moving on, some terminology used in this research will first be clarified:

• News posts: the posts on a public Facebook page of a news source. These posts are purely informative, using a video, a photo or a link to a news article

• Post: a regular Facebook post, not connected to a news source

• Reactions: the different reactions (LIKE, LOVE, HAHA, WOW, SAD and ANGRY) a user can give to a Facebook post

• Comments: the (usually textual) messages users can leave on a Facebook post. Users can also react to these comments with their own comments

• Entropy: the way in which the controversy score is calculated (more on this in Section 4.1.1)

The aim of this research is to create and describe a full pipeline that tries to predict whether news posted on the Facebook page of a news source is controversial. This is done using distant supervision and an entropy-based approach inspired by Basile et al. (2017) (see Section 2.3 for more information), using the different reaction types on Facebook. The entropy score can be used as an indicator of controversy, which means that the reactions can be used as a proxy.

The entropy score was chosen as a measure of controversy because it is easily calculated and its outcome is easy to interpret. The way it is calculated also takes the polarisation of the different users into account (by using the number of reactions per type).

The Facebook reactions that are looked at are LIKE, LOVE, HAHA, WOW, SAD and ANGRY. Each has its own emoticon, visible in Figure 1, and its own connotation: LOVE and HAHA have a positive connotation, SAD and ANGRY a negative connotation, and LIKE and WOW can be used ambiguously.
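As a minimal sketch of how such reaction counts can be condensed into a single score (the exact formula used in this research is given in Section 4.1.1; this sketch simply assumes plain Shannon entropy over the normalised reaction distribution, with the base-2 logarithm as an arbitrary choice):

```python
import math

def reaction_entropy(counts):
    """Shannon entropy of the distribution of reaction types.

    counts maps a reaction type to its count, e.g. {"LIKE": 120, "ANGRY": 95}.
    The more evenly the reactions are spread over the types, the higher the
    entropy; here that spread serves as a proxy for controversy.
    """
    total = sum(counts.values())
    if total == 0:
        return 0.0
    probs = [c / total for c in counts.values() if c > 0]
    return -sum(p * math.log2(p) for p in probs)

# A post whose reactions are spread over several (possibly opposing) types
# scores higher than a post dominated by a single reaction type.
polarised = {"LIKE": 50, "LOVE": 5, "HAHA": 40, "WOW": 5, "SAD": 45, "ANGRY": 55}
one_sided = {"LIKE": 190, "LOVE": 4, "HAHA": 2, "WOW": 2, "SAD": 1, "ANGRY": 1}
print(reaction_entropy(polarised) > reaction_entropy(one_sided))  # True
```

Note that such a score rewards an even spread over any set of types, not only sentiment-opposed ones, a point that returns in the discussion of polarisation in Section 2.3.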

Alongside that full pipeline to predict controversy, the topics of the articles will also be looked at, to see which topics can be discovered and which of those topics



Figure 1: The different types of reactions Facebook uses

can be classified as controversial. Eventually, everything will be merged together into an interactive tool that predicts the values for a post given by the user.

By using the reaction system on Facebook as described above, Facebook can be seen as the main resource for the data set. There is, however, also a secondary focus on Reddit. Reddit is a lesser-known social media platform where users can, for example, create a post that links to an article and discuss that article in the comments on the Reddit post.

Using Reddit as a resource alongside Facebook is done to see whether the reaction-based entropy method is also adaptable to other resources. Reddit is a completely different medium from Facebook, which is also visible in its user demographics (see Section 2.1 for more information). Because Reddit is so different, it is interesting to see whether the entropy score can be calculated on Reddit, and if so, whether the articles that are controversial are the same as those on Facebook. It is important to stress that the experiment on Reddit only concerns predicting the entropy score, whereas the experiments on the Facebook data include much more (e.g. predicting the number of reactions and creating a Final Pipeline).

Researching this subject and creating a Final Pipeline that predicts different aspects of a Facebook news post contributes to the existing work by providing information on how far along controversy prediction is with respect to the Dutch language. The importance of detecting controversy, preferably even before news is posted, is that it can give the creator insight into how the news article will be received. If, for example, the creator finds out that a post will cause controversy, they can choose a different wording or make sure that all sides of a subject are covered, so no reader will feel left out because his or her point of view is not represented.

The following chapters are included in this research:

• Background (Chapter 2)
• Data collection (Chapter 3)
• Methodology (Chapter 4)
• Results & Discussion (Chapter 5)
• Conclusion (Chapter 6)



1.1 research and sub questions

All this information leads to the research question that is the main goal of this research: "To what extent can we predict if news is controversial using reactions from a Facebook news post?" In order to answer this research question, the following sub questions were created:

1. Which topics are more prone to give rise to controversy on Dutch news pages?

2. Can we predict the reactions of a given online community (e.g. readers of a newspaper) on a post?

3. Does a left-winged news source or right-winged news source focus more on controversial topics?

4. Can the created Pipeline for Facebook news posts be applied to other Facebook posts as well?

5. Can we predict controversy using entropy with different proxies on Reddit (so without the reaction system of Facebook)?

1.2 hypotheses

Regarding the different aspects of this research, some hypotheses have arisen. They are listed below:

1. Topics regarding political and religious subjects (e.g. elections, Islam) will be controversial

• Yasseri et al. (2014) find in their study on Wikipedia that "pages related to political topics represent the largest group of contested pages" and that this group is followed by pages related to geographical/country and religious topics. It is therefore expected to find the same in this research.

2. There will be more non-controversial posts than controversial ones

• This is expected, since the majority of the posts will not treat a controversial subject, and because not too many posts can be controversial if the page is to remain an attractive source for its users.

3. LIKE will be the most used reaction, and concerning the other reactions: the ones with a negative connotation (SAD, ANGRY) will be more frequently used than the ones with a positive connotation (LOVE, HAHA)

• According to Thompson et al. (2017), events with a negative subjectivity occur almost twice as often as those with a positive subjectivity, and words with a negative connotation are often used instead of neutral words in order to 'sensationalise' the story. Due to this, one would expect the overall reaction towards the news to be more negative as well.

4. Controversial posts on Reddit will also be controversial posts on Facebook


2

B A C K G R O U N D

In this chapter, previous research will be discussed and linked to the current research. There are sections about the news sources, distant supervision, controversy and topic modelling.

2.1

news sources

In this research, it was chosen to use news articles as (part of) the data set. This was done because news (both the news articles and the news posts on Facebook) was easily and publicly accessible and provided a lot of data to work with.

The news sources used in this research are NOS, RTL Nieuws, Het Parool, De Volkskrant, NRC and De Telegraaf (more on the collection of these sources in Section 3). These sources were chosen because they represent both more left-oriented political views (De Volkskrant and Het Parool) and right-oriented political views (NRC, De Telegraaf) (Vliegenthart et al. (2011), Brown (2011)). RTL Nieuws and NOS are included since they are generally perceived as more objective, although opinions on that differ (Hilderink (2016), De Jesus Madureira Porfírio (2016), Unknown (2016), Geelen (2016)); this is mostly a feeling people have, and in general it cannot be proven.

Another reason to select different news sources is to avoid the research being influenced by the 'filter bubble' effect. The filter bubble is a phenomenon in which a website's personalisation algorithm selectively predicts the information that users will find most interesting based on data about each individual, creating a form of isolation from a diversity of opinions (Seargeant and Tagg (2018), Pariser (2011)). By looking at different sources, this filter bubble effect should be nonexistent, since the sources are of different political orientations (see Section 3), which means that the opinions of different users with different views are taken into account. If, for example, one user only reads NRC and another only De Telegraaf, a filter bubble may arise in how they look at the world and current events, based on how the news is brought. Taking both these sources into account should resolve this.

These news sources are all active on social media (Twitter and Facebook), and they are also all represented on Reddit. Since the sources use Twitter as a medium to profile themselves, to broadcast news and to make predictions from the data, for example in trying to predict the outcome of elections (Broersma and Graham (2012), Broersma and Graham (2013), Sang and Bos (2012)), it is interesting to see whether this prediction possibility is also present on Facebook and even Reddit.

Since the user demographics of Reddit and Facebook differ, comparing both sources can be interesting. The demographics differ, for example, in the most prominently present age group (22-25 years old on Reddit [1, 2] and 40-64 years old on Facebook [3]) and in the gender users identified themselves with (male on Reddit with 85.3% [4], and almost equally distributed on Facebook [5]).

1 https://www.reddit.com/r/thenetherlands/comments/7yqz5i/rthenetherlands_census_survey_2018_the_results/
2 https://docs.google.com/forms/d/12Rmx_6nn86QY4FTZX21T0JiEOCKFD5qLeZCi5JPjou0/viewanalytics
3 https://www.statista.com/statistics/579352/age-distribution-of-facebook-users-in-the-netherlands/
4 see footnote 2
5 https://www.statista.com/statistics/858024/number-of-facebook-users-in-the-netherlands-by-age-group-and-gender/



The user demographics of Reddit are much easier to find and are also better documented. This is because Reddit conducts a survey once a year; the 2018 survey had 4482 participants (see footnote 1).

Facebook does not hold these kinds of surveys, and the data Facebook has on its users' demographics is not accessible, especially when trying to obtain the demographics of a specific country (e.g. the Netherlands).

2.2 distant supervision

In this research, the methodology is partly based on distant supervision (by using Facebook reactions as proxies for controversy). In order to fully comprehend the study, a definition of distant supervision and some works using it will be discussed.

When using a distantly supervised approach, one often makes use of already existing data without the needed labels. From this data, reasonably safe proxies are needed in order to obtain training labels, which can then be used to automatically generate the training data.

Using distant supervision, large amounts of (possibly noisy) data can easily be generated, but it will be silver data instead of gold.

One of the studies that uses distant supervision is the study by Go et al. (2009), in which a novel approach for automatically classifying the sentiment of Twitter messages was created. Their training data contained Twitter messages with emoticons, and using three different algorithms (Naive Bayes, Maximum Entropy and SVM) they obtained an accuracy higher than 80%. This shows that, at least for the task at hand, distant supervision using tweets with emoticons is effective.
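The emoticon-as-proxy idea behind Go et al. can be sketched in a few lines; the emoticon sets and the function below are purely illustrative, not their actual implementation. The proxy (the emoticon) yields a noisy 'silver' label and is stripped from the text so a classifier cannot trivially key on it:

```python
# Illustrative emoticon sets; Go et al. used their own, larger lists.
POSITIVE = {":)", ":-)", ":D"}
NEGATIVE = {":(", ":-("}

def silver_label(tweet):
    """Return (cleaned_text, label) using emoticons as distant labels,
    or None when no emoticon proxy is present."""
    tokens = tweet.split()
    label = None
    for tok in tokens:
        if tok in POSITIVE:
            label = "pos"
        elif tok in NEGATIVE:
            label = "neg"
    if label is None:
        return None  # no proxy: this tweet cannot be auto-labelled
    cleaned = " ".join(t for t in tokens if t not in POSITIVE | NEGATIVE)
    return cleaned, label

print(silver_label("great match today :)"))  # ('great match today', 'pos')
print(silver_label("no emoticon here"))      # None
```

The same pattern applies to the present research: the Facebook reaction counts play the role of the emoticons, yielding silver entropy labels without manual annotation.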

A second paper is by Mintz et al. (2009), who focused on relation extraction using Freebase (a large semantic database containing relations). They trained a relation classifier by extracting textual features from sentences that contained the same Freebase relation. 10,000 instances of 102 relations could be extracted with a precision of 67.6%.

These studies show that distant supervision is a legitimate method to apply in research, as long as the training data is chosen with care. It is thus safe for this research to use the reactions as proxies in a distantly supervised method.

2.3 controversy

In this research the term controversy can, as stated in Section 1, be applied to a situation in which, after long discussions, opinions of the participants remain unchanged and become more and more polarised towards extreme values (Basile et al. (2017), Timmermans et al. (2017)). In the field of controversy detection on the web, multiple studies have already been conducted on different sources, with varying amounts of success (Dori-Hacohen and Allan (2015), Lourentzou et al. (2015), Choi et al. (2010)).

Dori-Hacohen and Allan (2015) use metadata available from Wikipedia articles in a weakly supervised classification approach with a nearest-neighbour classifier. They find that their approach improves substantially on the baseline and that the results are comparable to work that used human annotations. The study by Lourentzou et al. (2015) uses news articles and Twitter data in order to find controversial points in news. They experimented with two different statistical measures, ratio and entropy, and find that both approaches can effectively discover controversial parts of news articles. The last study mentioned,




by Choi et al. (2010), tackles two problems: controversial issue detection and subtopic extraction, both from news articles. Both are approached on a more grammatical level (looking at noun and verb phrases) and using sentiment. They state that their results show that the proposed method is reasonable as a first attempt at both problems.

The study by Zielinski et al. (2018), for example, focused on Wikipedia articles in order to calculate a list of the most controversial categories, since Wikipedia has a rich amount of metadata that can be used to obtain reliable results. Zielinski et al. used a classification-based method for automatic detection of controversial articles and categories in Wikipedia. The three classifiers they use are AFT-based (computing the frequency of each of five available AFT ratings for every article in their data set), emotion-polarity-based (calculating the frequencies of negative/neutral/positive words for each talk page/section/comment and using these as input to the classifier) and the meta classifier from Kittur et al. (2007), which they used as a baseline. They were quite successful in their research, obtaining an F-score of 84.1. They also extended their work, showing that "evaluating controversy of Wikipedia content may also be used for identifying controversial queries in search engines". They state that this worked quite well for most controversial queries, without raising too many false alarms.

When looking at studies more in line with the approach and data set used in this research, Basile et al. (2017) is the most important one. In their research, Basile et al. use Facebook reactions on news posts as proxies for predicting controversy using a distantly supervised approach. They use the reactions to calculate an entropy score (more in Section 4.1.1); the higher the score, the more likely it is that the post is controversial. Basile et al. (2017) thus treated the detection of controversy as a gradual classification problem. The promising results from this study are just a few in a line of recently developed systems created to predict controversy in online resources (Dori-Hacohen and Allan (2015), Lourentzou et al. (2015), Choi et al. (2010); all described above). Another study that can be placed amongst those is the one by Timmermans et al. (2017), which will be highlighted here as well.

In their research, Timmermans et al. provide a model that is rooted in the findings of social science literature and linked to computational methods (Timmermans et al., 2017). They came up with the CAPOTE model, which stands for: Controversy is composed of Actors, Polarisation, Openness, Time-persistence and Emotions. The aspects of this model are discussed below:

• Actors: There must be many participating actors

• Polarisation: Viewpoints must be polarised and not uniform or scattered (participants should be grouped in two or more camps, with few people positioning themselves in between)

• Openness: The controversy plays out in an open and public space (e.g. the web)

• Time-persistence: A controversy persists over longer stretches of time (typically years or more)

• Emotions: Strong sentiments or emotions are expressed and are an important driver

According to Timmermans et al. (2017) at the beginning of their paper, something can only be called a controversy if all five of these aspects are present. They revise this point later in their research, when they do not find evidence that the Actors aspect contributes to the definition of controversy. What they state about how controversy is computed is an important factor in this research, since most of the factors they propose are also looked at here (actors, polarisation,



openness and emotions). The factors looked at in this research are partially based on the findings of Timmermans et al., and it is interesting to see whether their findings match the results of the research in this paper.

When combining the research of Basile et al. (2017) and the CAPOTE model from Timmermans et al. (2017), the usage of the Facebook reactions can mainly be placed under the Emotions aspect, since the reactions present different emotions, whether ambiguous or not. The reactions are also an indication of the Openness aspect, since they could not be given if the post were not available to all different kinds of people. The Polarisation aspect is not completely met, since in the research of Basile et al. (2017) it is not necessary that the reactions fall only into opposing categories. Something can also have a high entropy score (and therefore be marked as controversial) if, for example, only LIKE and LOVE have high numbers of reactions, which are not opposing per se.

Finally, Garimella et al. (2018) also studied controversy, focusing on Twitter for their data set. They state that the difference between their research and previous work is that they do not focus on particular subjects (e.g. political issues) and are not centred around long-lasting major events (e.g. elections). In their paper, Garimella et al. state that they intended to overcome these limitations by developing a framework to identify controversy regarding topics in any domain and without prior domain-specific knowledge about the topics in question (Garimella et al., 2018). They created a pipeline that measures controversy using graphs and the hashtags that are widely used on Twitter. Although their results are good, they also state that the work they looked at using sentiment analysis is promising; hence the decision in this research to use the Facebook reactions, since they represent emotions/sentiment.

When looking at controversy in Facebook news posts, it is important to be able to detect it before a discussion can get out of hand and trigger or give rise to toxic conversation or even hate speech. For example, Anderson et al. (2016) looked at the effects of uncivil comments proximate to a blog post containing hard news content on individuals' perceptions of media, using an online experiment. In this experiment, participants read a neutral blog post and, within each issue, were given one of eight manipulations, each showing a different version of the comment section of the blog post. After reading, participants had to answer some post-test questions. Anderson et al. (2016) found that participants who saw uncivil comments perceived more bias in the news blog post compared to those who saw civil comments.

This shows that the comments on a news article are an important factor in how the post is perceived by news readers. Thus, when a post is controversial and therefore causes the comments to be uncivil, more bias in the news will arise, which can lead to a downward spiral.

2.4 topic modelling

Another important part of this research is topic detection. The topics in this research are used as proxies for events. Because of a lack of resources and training data in Dutch for event extraction, it was chosen to focus on topics instead and let them represent event types (following Ritter et al. (2012)).

The topic modelling is done in order to study controversy in a more structured and clearer way: then only the topic of, for example, a post needs to be looked at to see whether it could be controversial.

For a definition of topic, the definition from MALLET is used: a topic is a semantic representation of the content of a post (e.g. animals, politics, etc.); it consists of a cluster of words that frequently occur together (McCallum, 2002).



In this research, latent Dirichlet allocation (LDA) is used, as described in Blei et al. (2003). They state that LDA "is a three-level hierarchical Bayesian model, in which each item of a collection is modelled as a finite mixture over an underlying set of topics. Each topic is, in turn, modelled as an infinite mixture over an underlying set of topic probabilities".
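LDA itself requires a library or a dedicated sampler, but the term-weighting side of the approach (the abstract notes that LDA is combined with tf-idf here) can be sketched with the standard library alone. This is a generic tf-idf, not the thesis's exact implementation; several tf and idf variants exist:

```python
import math
from collections import Counter

def tfidf(docs):
    """Per-document tf-idf weights for a list of tokenised documents.

    docs is a list of token lists. Uses raw term frequency and the
    idf variant log(N / df(t)). Returns one {term: weight} dict per doc.
    """
    n = len(docs)
    df = Counter()                      # in how many documents each term occurs
    for doc in docs:
        df.update(set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)
        weights.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return weights

docs = [["vaccine", "debate", "vaccine"],
        ["election", "debate"],
        ["football", "match"]]
w = tfidf(docs)
# "vaccine" is frequent in doc 0 and absent elsewhere, so it outweighs
# "debate", which occurs in two of the three documents.
print(w[0]["vaccine"] > w[0]["debate"])  # True
```

Note that under this idf a term occurring in every document gets weight zero, which is why such weighting helps keep uninformative, ubiquitous vocabulary out of the topic representations.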

Since topics are used as proxies for events in this research, some of the methodology of the topic modelling is based on event detection as well.

Ritter et al. (2012) use a very similar approach, LinkLDA, which they applied to Twitter and therefore to an open domain. In advance, Ritter et al. did not know which events would be recognised, which is also the case for the research in this paper. With their LinkLDA approach, Ritter et al. (2012) found 37 distinct labels that represent events in their data set, and from the list they propose it becomes apparent that they used topics as proxies for events as well. Ritter et al. (2012) also added an extra label called OTHER, which represents events that are neither significant nor of general interest. Following this, the label OTHER is also used in the research discussed in this paper.

Ritter et al. (2012) were not the only ones who tried to detect events on Twitter. van Noord et al. (2016) designed a system that assigns topical labels to automatically detected events from the Twitter stream. Van Noord et al. based their method on the one from Ritter et al. and applied it to Dutch resources. The results of both studies are comparable, which implies that the proposed open-domain method is not language dependent.

Another approach is proposed by Huang et al. (2016), who use Liberal Event Extraction, which they claim is “a new paradigm to take humans out of the loop and enable systems to extract events in a more liberal fashion”. In their research, Huang et al. do not address topics but events: they treat their events as topics. Their system performs well, especially since human involvement is non-existent. Huang et al. (2016) use multiple tools in order to be able to exclude human involvement, such as FrameNet, VerbNet, a word sense disambiguation system, OntoNotes, etc. The system is thus primarily focused on English data sets, and is not suited for use on other languages which do not have this broad range of resources (e.g. Dutch). Their approach is interesting, and if, for example, Dutch resources are created that make this kind of work possible, it would be a good way to extend the topic modelling and change it into event detection.

Due to this lack of resources, this research focused on using LDA, as implemented in a tool called MALLET (McCallum, 2002). MALLET was written by Andrew McCallum with contributions from people connected to the University of Pennsylvania, and it can be used for statistical natural language processing, document classification, clustering, information extraction, topic modelling and more. The topic modelling toolkit “contains efficient, sampling-based implementations of LDA, Pachinko Allocation, and Hierarchical LDA”. A topic model can be built from training documents, and later on the found topics can be inferred from a new test data set.


3 DATA COLLECTION

This chapter describes how the data used in this research was collected, together with a description of the data (including some descriptive statistics). Two data sources were used: Facebook and Reddit. Both are open platforms, meaning that everyone can at least read the posts and comments (an account is only needed to reply), and they are therefore easily accessible for all different types of people (see Section 2.1 for more information about the differences in demographics).

Facebook was chosen because it is a well-known platform with many different users, and therefore also many Facebook news posts to meet the information needs of those users. It is, or at least was (more on this in Section 6.1), also an easily accessible data source.

Alongside the Facebook data, Reddit data is used in this research as well. This is done to see whether the developed entropy-based method is suitable for multiple resources. Reddit was chosen since its data is also easily available and because it is a platform with a broad spectrum of users. It might thus be interesting to see whether the Reddit data and the obtained results can be compared to those from Facebook.

By looking at both sources, it can also be seen whether they are comparable concerning controversial news.

This research used downloaded data, in the form of comments and reactions, from both Reddit and Facebook. All of the data belongs to one of the following Dutch news sources: Het Parool, RTL Nieuws, NOS, De Volkskrant, De Telegraaf or Het NRC. Most of these sources can be placed on a political scale from more left-oriented to more right-oriented, as shown in Figure 1.1,2,3,4 NOS is not included in this overview, since it is the national state broadcaster. It is thus funded by the government, and its political orientation is therefore bound to be in the same direction as the political orientation of the government. RTL Nieuws is also not included, because it can be seen as one of the more objective news sources in the Netherlands. Since it is a commercial broadcaster, it is not as heavily influenced by the government as NOS, and it has the reputation of being more objective than the other sources.

The sources are split by political orientation, since it can be interesting to see which sources focus more on controversial topics: right-oriented news sources (NRC, Telegraaf) or left-oriented news sources (Volkskrant, Parool).

Figure 1: Political distribution of parties

Note: the placement of these sources is an estimation

For all sources, the downloaded data was published within a time frame of six months: from August 19th, 2017 until February 19th, 2018. These dates were chosen since downloading the data started shortly after February 19th, so posts from that day would not gain any more reactions or comments; and six months back in time was deemed to contain enough data.

1 http://www.denieuwereporter.nl/2008/07/eerder-een-rechtse-dan-een-linkse-kerk-aan-het-binnenhof/
2 https://www.reddit.com/r/thenetherlands/comments/4nac37/wat_zijn_nou_prima_voorbeelden_van_linkse_en/
3 https://www.nieuwskoerier.nl/news/9339-indeling-kranten-naar-rechts-en-links
4 http://verschillen-tussen.nl/verschil-tussen-kranten/

3.1 reddit-data

For downloading the Reddit data, the Reddit API wrapper PRAW5 was used. To download the data, a Dutch subreddit was used. A subreddit can be seen as a community within the complete Reddit community. It contains only posts and data regarding that specific subreddit (e.g., the subreddit ‘Dogs‘ only contains posts concerning dogs). The particular subreddit used to download data from was ‘thenetherlands‘. All posts on this subreddit which belong to the selected news sources, and fall within the designated time frame, were downloaded.

For each of these posts, the title, article URL, Reddit URL, date, score (the result of the up and down votes on a post) and a controversial tag (yes or no) were downloaded. All comments and comments on comments were also collected per post. The up and down voting that makes up the score is a system designed by Reddit with which users can show whether they like or dislike a post or comment (up voting and down voting respectively); the score is then obtained by subtracting the down votes from the up votes. The higher the score, the more users agree. The controversy tag is also created from these up and down votes: a post or comment is classified as controversial if the numbers of up and down votes lie close together and there is a lot of activity.6,7 This construction of controversy more or less corresponds with the definition used in this research: it shows that opinions are polarised and remain unchanged, but not necessarily after lengthy discussions, since the time frame is not taken into account.
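Reddit does not publish the exact formula behind its controversial indicator, but the idea can be sketched as follows; the thresholds below are illustrative assumptions, not Reddit's actual values:

```python
def score(up_votes, down_votes):
    # Reddit's score: the down votes subtracted from the up votes
    return up_votes - down_votes

def looks_controversial(up_votes, down_votes, min_activity=100, min_balance=0.6):
    # Controversial: lots of activity AND up and down votes lie close together.
    # Both thresholds are made-up stand-ins for Reddit's internal values.
    total = up_votes + down_votes
    if total < min_activity or min(up_votes, down_votes) == 0:
        return False
    balance = min(up_votes, down_votes) / max(up_votes, down_votes)
    return balance >= min_balance
```

A post with 80 up and 70 down votes would count as controversial under these assumptions, while one with 500 up and 10 down votes would not, however popular it is.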

In the script for downloading, the .search() function from PRAW was used to make sure the posts were sorted by ‘new‘, which ensures that the posts were downloaded from most recent to older. The controversial tag was obtained from Reddit’s own .controversial() function, which returns a list of all controversial post IDs for the selected source. If a post ID is in the generated list, the post is (according to Reddit) controversial.

An overview of the number of downloaded posts can be seen in Table 1. It shows that, according to the controversial tag provided by Reddit, Het Parool has the most controversial posts (29.17%), whereas NOS has the least (1.49%). The number for NOS in particular is as expected: as the national state broadcaster it should be one of the more objective and least controversy-causing sources. The low percentage of controversial posts confirms this image.

Also important to note is the total number of controversial posts: out of the 1085 downloaded posts, only 8.76% are marked as controversial.

                  Het Parool   RTL Nieuws   NOS        Volkskrant   Telegraaf    NRC          Total
Number of posts:      48           58       471           273           48       187          1085
Controversial:    14 (29.2%)   13 (22.4%)   7 (1.5%)   31 (11.4%)   11 (22.9%)   19 (10.2%)   95 (8.8%)

Table 1: Data distribution Reddit

Using the comments on a post, multiple features were also obtained: the average depth (how many comments on comments the post has on average, i.e. how ‘deep‘ the thread goes), the number of direct comments, the total number of comments, the number of unique users, the score of the original post and the ‘duration‘ of the post in seconds (how much time there is between the first comment (direct or indirect) and the last). The score is calculated by Reddit, and is a result of the number of up and down votes. An example of what this data looks like can be found in Table 2.

5 https://praw.readthedocs.io
6 https://www.reddit.com/r/explainlikeimfive/comments/1rqjwp/eli5_what_does_top_new_hot_controversial_old_mean/
7 https://www.reddit.com/r/announcements/comments/293oqs/new_reddit_features_controversial_indicator_for/

Post Number   Average Depth   Number of Comments   Number of Direct Comments   Number of Unique Users   Duration   Controversial
1                 1.512               43                      13                         30              125568       False
2                 1.429                7                       2                          5              289200       False

Table 2: Example of downloaded Reddit data
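These features can be computed by walking the comment tree of a post; a minimal sketch, assuming each comment is a dict with a creation time and a list of replies (the field names are hypothetical, not PRAW's):

```python
def thread_features(comments):
    """comments: top-level comments, each {'created': ..., 'replies': [...]}."""
    depths, times = [], []

    def walk(comment, depth):
        # Record this comment's depth and time, then recurse into its replies
        depths.append(depth)
        times.append(comment['created'])
        for reply in comment.get('replies', []):
            walk(reply, depth + 1)

    for top in comments:
        walk(top, 1)

    return {
        'average_depth': sum(depths) / len(depths) if depths else 0.0,
        'total_comments': len(depths),
        'direct_comments': len(comments),
        # time between the first comment (direct or indirect) and the last
        'duration': max(times) - min(times) if times else 0,
    }
```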

3.2 facebook-data

The Facebook data was downloaded using the API provided by Facebook, via the PHP SDK.8 Most of the data collection was done before Facebook put most of their restraints on their API (e.g. requiring page access, limiting how many posts can be downloaded in a small time frame), which means that for the majority of the collection, no major problems arose.

For downloading, the time frame was divided into six parts, each one month long (e.g. August 19th – September 19th). The data collection was split into six separate parts to prevent major setbacks when an error occurred while downloading the data: this way, not as much was lost when a seemingly random error occurred and the program for downloading the data had to be restarted.

The information that was downloaded per post was the message, the URL of the article the post links to, the post identification number, the date, the total number of reactions and the total number of comments. After this information was downloaded, the post ID was used to download the number of reactions for each type (LIKE, LOVE, WOW, HAHA, SAD and ANGRY). Facebook introduced these reaction types on February 24th, 2016, to accommodate users who wanted to be able to express themselves better; LIKE alone was not enough.9

This turned out to be a great success: these reactions were used over 300 billion times in their first year of existence.10

As a seventh reaction, THANKFUL was also downloaded, but since this reaction could only be given on special occasions,11 it was not taken into account in any further calculations. The post ID was also used to download the comments and comments on comments, plus the reactions on each comment. An example of a Facebook post and what was downloaded can be seen in Figures 2 and 3. These images show the message (number 1), which is provided by whoever places the post on Facebook, the link to which the post refers (number 2), the (total) number of reactions given (number 3), a comment (number 4), the reactions given to a comment (number 5) and the comments on a comment (number 6).
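The per-type counts can be requested through the Graph API's reactions edge with a total-count summary. A sketch (in Python rather than the PHP SDK used in this research) of how such request URLs can be built; the API version, post ID and token below are placeholders:

```python
GRAPH = "https://graph.facebook.com/v2.12"  # the exact API version used is an assumption
REACTION_TYPES = ["LIKE", "LOVE", "HAHA", "WOW", "SAD", "ANGRY"]

def reaction_count_url(post_id, reaction_type, access_token):
    # summary=total_count asks the Graph API to return only the aggregate count
    return (f"{GRAPH}/{post_id}/reactions"
            f"?type={reaction_type}&summary=total_count&limit=0"
            f"&access_token={access_token}")

# One request URL per reaction type for a single (hypothetical) post
urls = {r: reaction_count_url("12345_67890", r, "TOKEN") for r in REACTION_TYPES}
```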

Out of all the posts on the Facebook pages of the news sources, only the posts that had over 30 reactions in total were downloaded.

After all posts were downloaded, the linked URLs were checked to see if they send the user to an actual article, or if they were Facebook URLs (meaning they contained a Facebook photo or video). All posts with an article URL were saved and the Facebook URLs were discarded, since this research aims to predict controversy by means of article text, and not by the contents of a photo or video. The posts containing those do not provide useful information for this research and can thus be omitted.

8 https://developers.facebook.com/docs/reference/php/
9 https://newsroom.fb.com/news/2016/02/reactions-now-available-globally/
10 https://blog.hootsuite.com/how-facebook-reactions-impact-the-feed/
11 https://www.facebook.com/help/community/question/?id=10207656192377271

Figure 2: Facebook header section
1: Message 2: Link/Headline

Figure 3: Facebook comment section
3: Reactions given 4: Comment
5: Reactions to comment 6: Comments to comment

An overview of the number of posts downloaded can be found in Table 3. In this overview, it can be seen that especially NOS uses a lot of Facebook videos or photos in its posts, whereas NRC only uses a few. The total number of discarded posts in this stage is 1,544, which is unfortunately quite a big part of the initially downloaded data (roughly 43%).

                          Het Parool   RTL Nieuws   NOS   Volkskrant   Telegraaf   NRC   Total
Number of posts:              437          454       833      669          846      341    3580
Number of posts without
Facebook videos/photos:       377           99       256      416          566      322    2036

Table 3: Data distribution Facebook

From these posts, the article text was scraped. From this text, the article title and summary were also extracted. This is done because the complete article text (the wording and the textual set-up) is hypothesised to influence the reactions given, and thus the corresponding controversial tag (more on this in Section 4).

For each article URL obtained from the downloaded Facebook posts, the corresponding article was also scraped to obtain the full article text (body) and the summary (first paragraph of the article). Scraping the sites was done using BeautifulSoup.12 Unfortunately, due to the different HTML coding of the pages, a different scraper had to be built for each source. Since not all article URLs linked to a correct and ‘scrapable‘ web page, some posts/articles were discarded. Examples of pages that were discarded are liveblogs and certain ‘specials‘ which had a different build-up.
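Per source, such a scraper boils down to knowing which HTML elements hold the title and body text. A minimal sketch with BeautifulSoup; the selectors below are generic for illustration, whereas each real source needed its own:

```python
from bs4 import BeautifulSoup

def scrape_article(html):
    soup = BeautifulSoup(html, "html.parser")
    title = soup.find("h1")
    paragraphs = soup.find_all("p")  # a real scraper would filter by source-specific classes
    if title is None or not paragraphs:
        return None  # page with a different build-up (liveblog, special): discard
    body = " ".join(p.get_text(strip=True) for p in paragraphs)
    summary = paragraphs[0].get_text(strip=True)  # first paragraph serves as the summary
    return {"title": title.get_text(strip=True), "summary": summary, "body": body}
```

Returning None for pages that do not fit the expected structure mirrors how unsuitable pages were discarded here.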

The final number of posts that remained (posts with over 30 reactions and successfully scraped data) can be found in Table 4. This table shows that of the posts that do contain a non-Facebook URL, 83 posts were still discarded because of the HTML build-up of the pages linked to by the URL. The source from which most posts had to be discarded (relatively speaking) is Volkskrant; over 7% of its posts did not have an article link that could be scraped. Het Parool, on the other hand, had only a small number of pages that could not be scraped (1.4% of its total). This shows that Het Parool is the most consistent in the build-up and lay-out of its web pages.

While reshaping and merging the data set multiple times, some data was also lost. One reason was, for example, the mapping of the topics (more in Section 4.3) to the entropy scores. The articles that did not have a match between the two data sets (one with and one without the topic number assigned) were discarded. The final number of posts that remained and that were used in, for example, the final Pipeline is also displayed in Table 4. This table shows that after the reshaping, Volkskrant lost the highest number of posts, whereas Het Parool lost the least. In total, 4.81% of the posts were removed after reshaping.

                                      Het Parool   RTL Nieuws   NOS   Volkskrant   Telegraaf   NRC    Total
Number of posts without
Facebook videos/photos and scraped:      345           95       249       410          539      315     1953
Number of posts after reshaping:         340           90       240       380          500      309     1859
Percentage of posts lost:               1.45%        5.26%     3.61%     7.32%        7.24%    1.90%   4.81%

Table 4: Data distribution Facebook after scraping

Before computing additional information in this research, the following data was downloaded per Facebook post:

• Message
• URL
• Post ID
• Date
• Number of reactions
  – Total number of reactions
  – Number of LIKE-reactions
  – Number of LOVE-reactions
  – Number of HAHA-reactions
  – Number of WOW-reactions
  – Number of SAD-reactions
  – Number of ANGRY-reactions
• Number of comments
  – Total number of comments
  – Number of direct comments
  – Number of indirect comments

A remark should be made, however, concerning the number of downloaded posts per month. Table 5 shows the number of posts downloaded per month (before reshaping the data), the total number of posts downloaded per source and the number of downloaded posts that have over 30 reactions in total (these numbers correspond to the ones in Table 3). From this table, it can be seen that, for some reason, considerably fewer posts were downloaded in the first few months compared to the last two (the last month has in total six times more posts than the first month). To check whether this was correct or whether there was a problem with the created code, downloading the data was redone multiple times, and also tried with a different system. Unfortunately, the outcome stayed the same. This means that the low number of downloaded posts in the first few months must be caused by something in the API of Facebook. There are still enough posts to work with, but it should be taken into account.

Table 5 also shows that De Telegraaf had the most posts both before and after filtering, NRC lost the most posts (closely followed by Het Parool) and NOS lost the least posts.

                                           Het Parool   RTL Nieuws   NOS   Volkskrant   De Telegraaf   NRC    Total
Month 1                                         59          110        50       51            71         37      378
Month 2                                         21           42        68       49            31         42      253
Month 3                                         75           30        57       36            58         37      293
Month 4                                         79           90        40       45            70         47      371
Month 5                                        219           57       232      216           222        216    1,162
Month 6                                        472          149       402      371           502        452    2,348
Total (before filtering)                       925          478       849      768           954        831    4,805
Total (over 30 reactions and all fields)       437          454       833      669           846        341    3,580
Number of posts lost                           488           24        16       99           108        490    1,225

Table 5: Data distribution per month

3.3 data set description

Before performing the actual experiments, it is important to get more insight into the data. This is done by performing some chi-squared tests on the Reddit data and by looking more closely at some aspects of the Facebook data.

The chi-squared test on Reddit was done in order to determine whether a feature can be used as an indicator for the controversy label given by Reddit. Most of the descriptive statistics were measured by performing a chi-squared test in R. The relative distribution of each reaction type on Facebook is also calculated. This is done in order to see which reactions are used most, and whether this distribution could already be an indication of which reactions can be used as a predictor for controversy.

3.3.1 Reddit data set

For Reddit, the chi-squared test was used to see whether there are significant connections between the features (average depth, number of direct comments, total number of comments, number of unique users, score of the post and the duration). Each of these features was tested against the controversial label obtained from Reddit. If a feature comes up as significant at an alpha level of 0.05, it contributes to detecting controversy.
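The same test can be reproduced outside R. A sketch using SciPy that cross-tabulates feature values against the controversial label, mirroring what R's chisq.test does on two vectors (function and variable names are my own):

```python
import numpy as np
from scipy.stats import chi2_contingency

def chi2_feature_vs_label(feature_values, labels):
    # Build a contingency table of unique feature values x label values
    rows = sorted(set(feature_values))
    cols = sorted(set(labels))
    table = np.zeros((len(rows), len(cols)))
    for f, l in zip(feature_values, labels):
        table[rows.index(f), cols.index(l)] += 1
    chi2, p_value, dof, _ = chi2_contingency(table)
    return chi2, p_value, dof
```

A feature whose p-value falls below 0.05 would then count as a significant indicator.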

Table 6 shows the chi-squared results for the different Reddit components against the controversial label assigned by Reddit.

It can be seen that there are only two features that are significant, and thus indicators for controversy: the average depth and the score. The average depth is interesting, since it shows that when more discussion is happening (which could be the reason for the depth), the post is more controversial. This supports the definition of controversy used in this research, in which lengthy discussion plays a role. The fact that the score also indicates controversy is not very surprising, nor is it informative, since Reddit partly uses the up and down votes to determine whether a post is controversial.

                     Average depth   Number of direct comments   Number of comments   Number of unique users    Score   Duration
X-squared                   751.44                      29.945               186.76                   105.83   412.86     1021.1
Degrees of freedom             649                          41                  178                      105      223       1005
P-value                    0.00322                      0.8991               0.3112                    0.459    0.000     0.3548
Significant                      *                                                                                  *

Table 6: Chi-squared results for the Reddit features against the controversial label


3.3.2 Facebook data set

After the final number of posts is known (in total and per source), the total number of reactions per source can be measured, which is displayed in Table 7, and from that the average number of reactions per type (Table 8).

                               Het Parool   RTL Nieuws       NOS   Volkskrant   Telegraaf      NRC     Total
LIKE                               39,268       41,528   135,150       78,157      88,377   30,838   413,318
LOVE                                4,374        6,736    17,617        6,820      12,193    2,370    50,110
HAHA                                1,435        4,724    15,444        4,281      19,363    1,156    46,403
WOW                                 1,490        3,088    10,747        5,265      14,800    1,605    36,995
SAD                                 2,527        3,649    14,722        2,616      18,043    1,392    42,949
ANGRY                               1,663        3,251    14,695        2,938      19,947      914    43,408
Total                              50,757       62,976   208,375      100,077     172,723   38,275   633,183
AVG total reactions per post          149          700       868          263         345      124       341

Table 7: Number of reactions per source

Table 7 shows that NOS is the source with the highest absolute number of reactions, and could therefore be considered the most popular source among them. The least popular source is then NRC. The difference between the most and least popular sources is quite big: NOS has 170,100 more reactions than NRC. When looking at the division within the sources, it can easily be seen that LIKE is still the most used reaction for all sources, with LOVE in second place for most of them. The only exception is De Telegraaf, which has ANGRY as its second most used reaction type. De Telegraaf is in any case the source with the widest division between the given reaction types; for none of the other sources is the number of reactions as evenly distributed as for De Telegraaf. When looking at the average number of reactions per post, it turns out that NOS received by far the most reactions (868). RTL Nieuws also has a lot (700), whereas all other sources have significantly fewer reactions per post. This could be due to the fact that both NOS and RTL Nieuws run the main national news broadcasts, and therefore have the biggest audience.

          LIKE      LOVE     HAHA      WOW      SAD     ANGRY
AVG    224.218    28.068   25.058   19.696   23.156    24.695

Table 8: Average number of reactions per type

The average number of reactions per type in Table 8 gives insight into how often each reaction type is used on average. As expected, LIKE is the most used by a large margin, and LOVE comes in second place. When LIKE is not taken into account, the reactions with a positive connotation (LOVE and HAHA) are used more often than the reactions with a negative connotation (SAD and ANGRY), but most striking is the fact that all of these reactions are used more often than WOW, which has an unclear connotation.

WOW is the kind of reaction that can have multiple meanings, and the fact that it is notably less used can indicate that users also have trouble with deciding how and/or when to use WOW as a reaction.

Finally, Table 9 shows the relative distributions per reaction type for Facebook. Here, it can be seen that, as expected, LIKE is the most used reaction. This could also mean that LIKE will not be an indicator for controversy, since it is so widely used. All the other reactions are more evenly distributed, and could thus all be indicative of controversy. This corresponds to the results found in the chi-squared test.


LIKE LOVE HAHA WOW SAD ANGRY

65.28% 7.91% 7.33% 5.84% 6.78% 6.86%

Table 9: Distribution of reactions
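The shares in Table 9 follow directly from the totals in Table 7; a minimal sketch of the normalisation:

```python
# Totals per reaction type, taken from Table 7
totals = {"LIKE": 413318, "LOVE": 50110, "HAHA": 46403,
          "WOW": 36995, "SAD": 42949, "ANGRY": 43408}
grand_total = sum(totals.values())  # 633,183 reactions in total
# Relative share of each reaction type as a percentage (cf. Table 9)
shares = {r: round(100 * n / grand_total, 2) for r, n in totals.items()}
```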

There was also an attempt to see which articles overlap between Reddit and Facebook. For these posts, the data was merged and the controversial label from Reddit was used. Unfortunately, there were only 63 posts that occurred on both Reddit and Facebook, and only one of those was marked as controversial (with the title ‘NRC checkt: "In België zijn op één bedrijf na alle grote multinationals weg"‘, which translates to ‘NRC checks: "All big multinational companies have left Belgium, except one"‘). Due to this small number of (controversial) posts in both sources, the results obtained by the performed chi-squared test are not very reliable, and were thus discarded.

Since the number of posts that appear on both Reddit and Facebook is too small to perform any calculations on, only Facebook data is used for further experiments. A solution for the needed controversial tag had to be found, since the computed tag from Reddit could not be used: the provided tag covers only a small portion of the data (95 posts for all Reddit data, and only 1 post for the combination of Reddit and Facebook data), which would make the results very unreliable. The entropy score is therefore used as a measurement of controversy.

4 METHOD

In this chapter, all components of the methodology used in this research are treated and explained. It starts by explaining how the Facebook reactions are used in this research, followed by the experimental settings which were used in order to follow the research of Basile et al. (2017). Then the entropy scores of Reddit are treated (to see if the entropy score with different proxies can be predicted), followed by the different components of the Final Pipeline (predicting the topic, predicting the total number of reactions, predicting the number of reactions per type, and predicting the entropy score and controversy level).

For all sections where a training and testing data set is used, the (final) complete data set from Section 3 is split into a training set of 80% (1487 posts) and a test set of 20% (372 posts), filled at random. This was done to prevent the sources from being split between, instead of distributed over, the training and testing data sets. Because some sources had more downloaded posts and reactions than others (see Section 3.2, Table 4 and Table 7), this randomisation should provide more reliable results.
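A sketch of such a shuffled 80/20 split (the seed and variable names are my own):

```python
import random

def train_test_split(posts, train_fraction=0.8, seed=42):
    # Shuffle first so that every news source is spread over both sets
    shuffled = list(posts)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# 1859 posts split into 1487 for training and 372 for testing
train, test = train_test_split(range(1859))
```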

The code of the Final Pipeline is available in a publicly accessible GitHub repository: https://github.com/DaphneGroot/MasterThesis.

4.1 reactions as proxies

Previous research confirms that the Facebook reactions can be used to calculate entropy scores and thus to find controversy. More on this can be read in Section 4.2.2.

Using the reactions as a proxy for controversy, it is possible to come up with a reliable entropy score for each post, and thus also with a controversy-tag.

LIKE will be included in the calculations of the entropy scores. Since it is the most frequently given reaction type, and since the reasons for using it are quite ambiguous (it could be because you actually liked the post, to show ‘respect‘, to acknowledge something, etc.), I believe it is not right to completely abandon the LIKE reaction when predicting controversy.

4.1.1 Calculating entropy

An important part of this research concerns the entropy scores that are calculated from the different Facebook reactions (see Section 4.2); from these entropy scores, the level of controversy was calculated. The entropy score was chosen because it expresses the division of emotions that different users have towards a Facebook news post. These emotions are expressed by using the reaction system on Facebook.

In order to understand this entropy score better, it will be briefly explained here first.

The entropy score can be considered a measure of the uncertainty of a probability distribution (Wang, 2008; Shannon, 2001), and is calculated in this research using SciPy’s entropy function.1 This function normalises the given sequence into a probability distribution, multiplies each probability by its natural logarithm, sums the results and negates the sum: H = −Σᵢ pᵢ ln pᵢ. The sequence that was given was the number of reactions per type (LIKE, LOVE, HAHA, WOW, SAD and ANGRY), and since the reaction types are a good indication of controversy (as also mentioned in Section 4.1), the sequence results in an entropy score that can be used as a measurement for controversy.

1 https://docs.scipy.org/doc/scipy/reference/generated/scipy.stats.entropy.html
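A pure-Python equivalent of what SciPy's entropy function computes (natural logarithm, counts normalised to probabilities); the reaction counts below are invented for illustration:

```python
from math import log

def reaction_entropy(counts):
    # Same computation as scipy.stats.entropy: normalise, then -sum(p * ln p)
    total = sum(counts)
    probabilities = [c / total for c in counts if c > 0]
    return -sum(p * log(p) for p in probabilities)

# LIKE, LOVE, HAHA, WOW, SAD, ANGRY (illustrative counts)
balanced = reaction_entropy([50, 50, 50, 50, 50, 50])   # maximal division of emotions
one_sided = reaction_entropy([300, 0, 0, 0, 0, 0])      # full agreement
```

A post whose reactions are spread evenly over the six types gets the maximal score (ln 6), while a post where everyone gives the same reaction scores 0, which is exactly why the score works as a controversy measure.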

4.2 experimental settings

This section provides information about the settings of the different experiments that were conducted. One experiment was performed on the data from Reddit (calculating the entropy scores), and four on the data from Facebook (calculating the entropy score, predicting the topic, predicting the total number of reactions, and predicting the number of reactions per type).

The goal of the experiment on the Reddit data was to see whether the entropy-based approach designed for Facebook can be used on a different platform. The goal of the experiments on the Facebook data was more directed towards creating a final Pipeline, in which all the separate experiments come together in order to predict the controversy level of a post and how the post will be received (by providing a prediction of the number of reactions).

4.2.1 Controversy on Reddit

When trying to predict controversy on Reddit, the entropy score is used as well, but with a different sequence.

Since the number of posts that appear in both the Reddit data set and the Facebook data set was so low, a separate system for finding controversy on Reddit had to be created. Because the user population on Reddit is probably different from the user population on Facebook, it might be interesting to see on which points the results differ concerning controversial posts or even topics.

The features that are used are:

• Average depth of the interaction belonging to the post
• Total number of comments
• The number of direct comments
• The number of unique users
• The duration in seconds
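These features form the sequence over which the entropy is computed, analogous to the reaction counts on Facebook; a sketch with SciPy (the feature values are illustrative):

```python
from scipy.stats import entropy  # the same SciPy function used for the Facebook reactions

# avg depth, total comments, direct comments, unique users, duration (illustrative values)
features = [1.5, 43, 13, 30, 125568]
reddit_entropy = entropy(features)  # the sequence is normalised to probabilities internally
```

Because the features live on wildly different scales (a duration in seconds dwarfs a comment count), the normalised distribution is dominated by the duration, which already hints at the reliability problems discussed below.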

The two highest and lowest scoring articles can be found in Table 1.

                                        Average depth   Total comments   Total direct comments   Unique users   Duration   Entropy
Highest
Politie deed zich voor als motorclub
om moord op te lossen                        0.0               1                  1                    1           0.0       0.683
AD-baas: als je oliebollen bakt zoals
je praat, snap ik die laatste plaats         0.0               1                  1                    1           0.0       0.683
Lowest
Donorwet in de Eerste Kamer:
spannend tot het laatst                      0.0               0                  0                    0           0.0       0.0
Met PSD2 weet niet alleen uw bank
wat u uitgeeft                               0.0               0                  0                    0           0.0       0.0

Table 1: Highest and lowest entropy scores using Reddit

The first thing that stands out from these entropy scores from Reddit is the fact that all individual numbers are very low, even when the entropy score is not. This already indicates that using the features of Reddit to calculate the entropy score, and thus to predict controversy, is not reliable at all. Looking more closely at the subjects of the posts confirms this: they are not a good indication of controversy. The highest scoring posts, for example, are about the police going undercover at a motorcycle club to solve a murder, and the annual ‘oliebollen test‘ (a contest for deep-fried doughnut balls) from Het Algemeen Dagblad. Neither of these is a subject the public argues over or that is likely to give rise to controversy. The posts with the lowest entropy scores, on the other hand, are about the ‘donorwet‘ (the organ donor law) and the ‘Payment Services Directive‘ (about payments within the European Union). With these two, one would expect more discussion and arguing, and therefore the posts to be more controversial.

So, using the features from Reddit to calculate entropy scores and determine whether a post is controversial appears to be unreliable, and calls for the identification of better factors. To be able to identify whether a post on Reddit is controversial, a completely different system should be created. This corresponds with the findings in Section 3.3.1, where it was found that, apart from the score, only the average depth is a significant indicator of controversy.

This also means that the results cannot be compared to the entropy results obtained from Facebook, and a comparison between the user populations is not possible, at least not by using these entropy scores.

4.2.2 Controversy on Facebook

On Facebook, the entropy score was first calculated for each post using the different reaction types, to get more insight into the data and to check whether the entropy score indeed provides a correct indication of controversy for Dutch news posts on Facebook. This was done both including and excluding the LIKE reaction. Since LIKE has an ambiguous meaning (see Section 4.1), it could be interesting to see how the results differ with and without LIKE. Examples of the highest and lowest scoring posts (with LIKE included and excluded) can be found in Tables 2 to 5 respectively.

For completeness, these entropy scores were also calculated for posts that have thirty or more reactions when LIKE is excluded, since LIKE is usually the most frequently assigned reaction. These results are displayed in Table 3 and Table 5. From these entropy scores, the average entropy was calculated, and this average was then used to determine whether something is controversial: everything at or above the average score is labelled 'controversial' and everything below it 'not-controversial'. The entropy scores were calculated before the last batch of articles was discarded, as explained in Section 3.2, since assigning the topics (which causes some posts to be discarded) was done after calculating the entropy scores.
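The labelling step described above can be sketched as follows (the entropy values are illustrative placeholders, not actual scores from the data set):

```python
# Illustrative per-post entropy scores; in the thesis these are computed
# from the Facebook reaction counts of each post.
entropies = [0.944, 0.941, 0.0, 0.0]

average = sum(entropies) / len(entropies)

# Posts scoring at or above the average are tagged 'controversial'.
labels = ["controversial" if e >= average else "not-controversial"
          for e in entropies]
```

The resulting binary tag is what the chi-squared tests later in this section use as a proxy for controversy.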

Looking at the obtained results, the posts with the highest entropy scores (when LIKE is included) talk about the television channel of the public broadcaster, with some of its popular shows in danger of being cancelled, and about Dutch people who no longer know how to save money. Especially the first post could cause some controversy, since opinions are clearly divided about the shows that are in danger. The second post, about money, is also plausible, since money issues in general can cause controversy as well. The lowest scoring posts, in turn, are not controversial: they are simply not very popular, and people do not react much to them. Their subjects (the question whether too salty food really causes more kidney patients, and treating a psychosis, when LIKE is included; poetry and Ruben Terlou, a documentary maker, when LIKE is excluded and there are thirty or more reactions even without LIKE) are not that controversial either.

From these results, it can thus be said that the entropy score calculated with the Facebook reactions is a good way to indicate controversy, and the tables provide the validation for using the reactions in the rest of this research (which is why they are published in this section instead of in the Results & Discussion chapter).


Post | LIKE | LOVE | HAHA | WOW | SAD | ANGRY | Entropy
Eén tegen 100 en Bed & Breakfast te veel amusement voor publieke net | 170 | 19 | 78 | 20 | 55 | 155 | 0.944
Nederlander is het sparen verleerd | 43 | 2 | 9 | 28 | 10 | 22 | 0.941

Table 2: Two posts with highest entropy scores with LIKE

Post | LIKE | LOVE | HAHA | WOW | SAD | ANGRY | Entropy
'Cariës is een gedragsziekte' | 78 | 2 | 2 | 1 | 2 | 1 | 0.969
Kabinet maakt aantallen telefoontaps AIVD en MIVD toch openbaar | 30 | 1 | 3 | 3 | 2 | 2 | 0.961
>= 30 reactions:
Na aannemen donorwet ruim 30.000 nieuwe nee-registraties | 1149 | 174 | 81 | 154 | 112 | 64 | 0.960
Christelijke scholen hebben geen zin in het Wilhelmus | 104 | 22 | 10 | 24 | 8 | 9 | 0.934

Table 3: Two posts with highest entropy scores without LIKE

Post | LIKE | LOVE | HAHA | WOW | SAD | ANGRY | Entropy
Door te zout eten krijgt Nederland er in tien jaar 150 duizend nierpatiënten bij - klopt dit wel? | 33 | 0 | 0 | 0 | 0 | 0 | 0.0
Psychose behandelen met VR-bril blijkt effectief | 35 | 0 | 0 | 0 | 0 | 0 | 0.0

Table 4: Two posts with lowest entropy scores with LIKE

Post | LIKE | LOVE | HAHA | WOW | SAD | ANGRY | Entropy
Medaillekansen voor schaatsers: favoriet Kramer in tiende rit op 5000 meter | 168 | 21 | 0 | 0 | 0 | 0 | 0.0
'Ik ben nooit het vertrouwen kwijtgeraakt. Als het moet, dan sta ik er' | 37 | 4 | 0 | 0 | 0 | 0 | 0.0
>= 30 reactions:
'Poëzie hoeft niet altijd over je eigen innerlijke wereld te gaan' | 208 | 45 | 0 | 0 | 0 | 0 | 0.0
Iedereen stort zijn hart uit bij Ruben Terlou | 713 | 71 | 0 | 0 | 0 | 0 | 0.0

Table 5: Two posts with lowest entropy scores without LIKE

After the entropy scores per post were calculated, the experiment was extended by creating a system that predicts the entropy scores. For this, the best predictive feature set must be determined. This can be any combination of the following: the summary, the message, the full article text, the named entities from the title and a one-hot encoding of the topic.

To find the optimal combination of features to predict the entropy score, a pipeline was created using character and word n-grams and a Linear SVR (Linear Support Vector Regression). The best settings, found by performing a grid search, are listed in Table 6. Different regressors were also tried, e.g. an SGDRegressor or an SVR with a linear kernel, but the LinearSVR scored best. Note that from here on, LIKE is included in every calculation concerning an entropy score.
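A minimal sketch of such a pipeline, using the settings from Table 6, could look as follows. The toy training texts and entropy targets are placeholders, and the thesis's actual feature extraction (summary, message, full text, named entities, topic encoding) is not reproduced here.

```python
from sklearn.pipeline import Pipeline, FeatureUnion
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.svm import LinearSVR

# Character and word n-gram features combined, with the grid-search
# settings from Table 6, feeding a LinearSVR with C=20.
model = Pipeline([
    ("features", FeatureUnion([
        ("char", TfidfVectorizer(analyzer="char", ngram_range=(2, 5),
                                 binary=False, norm="l2", sublinear_tf=True)),
        ("word", TfidfVectorizer(analyzer="word", ngram_range=(2, 3),
                                 binary=False, norm="l2", sublinear_tf=False)),
    ])),
    ("regressor", LinearSVR(C=20)),
])

# Placeholder data: in the thesis, the texts come from the Facebook posts
# and the targets are the per-post entropy scores.
texts = ["voorbeeldbericht over het nieuws", "nog een facebookbericht"]
entropy_scores = [0.94, 0.0]
model.fit(texts, entropy_scores)
predictions = model.predict(["een nieuw, ongezien bericht"])
```

Grid search over these hyperparameters can be done with scikit-learn's GridSearchCV over the pipeline's parameter names.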

The baseline was created by using a CountVectorizer and a DummyRegressor. This DummyRegressor makes predictions using simple rules; in this case the default strategy was used, which always predicts the mean of the training set.
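A sketch of this baseline, with placeholder texts and targets:

```python
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.dummy import DummyRegressor

# The 'mean' strategy ignores the input features entirely and always
# predicts the mean entropy score of the training set.
baseline = Pipeline([
    ("vectorizer", CountVectorizer()),
    ("regressor", DummyRegressor(strategy="mean")),
])
baseline.fit(["eerste bericht", "tweede bericht"], [0.9, 0.1])
prediction = baseline.predict(["een ongezien bericht"])
```

Because the strategy is constant, the baseline predicts the training mean (0.5 here) for every input, which makes it a sensible lower bound for the MSE of the real model.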


Feature | Parameter | Value
LinearSVR | C | 20
Character | N-gram range | (2,5)
Character | Binary | False
Character | Normalization | l2
Character | Sublinear tf | True
Word | N-gram range | (2,3)
Word | Binary | False
Word | Normalization | l2
Word | Sublinear tf | False

Table 6: Best model settings and features for predicting entropy

Testing features for significance

The results and outcomes above show that the entropy computed with reactions works. Following from that, it is now possible to test which Facebook features are significant indicators of controversy. This was done using a chi-squared test that tests the downloaded features (the total number of reactions, the total number of comments and the numbers of LIKE/LOVE/WOW/HAHA/SAD/ANGRY reactions) and the composite features (the number of reactions with a negative connotation, SAD + ANGRY, and the number of reactions with a positive connotation, LOVE + HAHA) against the controversy tag determined from the entropy scores (the entropy score functions as a proxy for controversy). This controversy tag is based on the average entropy score of the Facebook posts, as discussed above: when a post's entropy score is at or above the average, the post is marked as controversial; otherwise it is not.
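A sketch of such a significance test using scikit-learn's chi2 (one plausible implementation; the exact library call used in the thesis is an assumption here), on hypothetical reaction counts:

```python
import numpy as np
from sklearn.feature_selection import chi2

# Hypothetical per-post reaction counts
# (columns: LIKE, LOVE, HAHA, WOW, SAD, ANGRY).
X = np.array([
    [170, 19, 78, 20, 55, 155],
    [43,   2,  9, 28, 10,  22],
    [33,   0,  0,  0,  0,   0],
    [35,   0,  0,  0,  0,   0],
])
# Controversy tag derived from the entropy scores: 1 = controversial.
y = np.array([1, 1, 0, 0])

# One chi-squared statistic and p-value per feature column.
chi2_scores, p_values = chi2(X, y)
significant = p_values < 0.05  # the alpha level used in the thesis
```

Composite features (SAD + ANGRY, LOVE + HAHA) can be tested the same way by summing the corresponding columns before calling chi2.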

Table 7 and Table 8 show the results of the chi-squared test using the Facebook components and the controversy label that was assigned using the (average) entropy score. The results are given directly here instead of in Chapter 5, since they are needed to get a better picture of the rest of this research.

All reactions are significant at an alpha level of 0.05, except for LIKE. Interestingly, the total number of reactions and the total number of comments are also indicators of controversy. This could be because they indicate more interaction between users (the comments), or that something is very popular (the number of reactions) and because of its popularity also receives more diverse types of reactions.

As can be expected, since the reactions are significant separately, the combinations of reactions (ANGRY + SAD and LOVE + HAHA) also provide significant results. This means that even when the reactions are grouped according to their connotation (positive or negative), they are still a good indication of whether a post is controversial or not.
