
Pluralism in Television Programme Content of the NPO

submitted in partial fulfillment for the degree of master of science

Iris Meerman

10330984

master information studies

data science

faculty of science

university of amsterdam

2018-06-28

Internal Supervisor: dhr. dr. R.N. (Bob) van de Velde (UvA, FNWI, ILPS)
External Supervisor: Robbert van Waardhuizen (NPO)


abstract - The aim of this study is to present a methodology for measuring pluralism in episodes broadcast by public service media. The study proposes a technique to fragment episodes based on semantic similarities between adjacent blocks of text and to measure opinion pluralism within these fragments. Evaluation is performed against pluralism as perceived by the audience. The topic fragmentation algorithm shows an improvement in sum of squared error over the baseline when the right parameter setting is chosen. Also, the opinion clusters show semantic coherence despite considerable overlap. However, the results show no significant correlation between perceived pluralism as reported by the audience and pluralism as derived from the methodology proposed in this study. It is therefore advised to use the results of this study as assistance in reducing content search for pluralism, without replacing a human assessor.

1 INTRODUCTION

The public broadcaster of the Netherlands (NPO) aims to connect its public with programmes that inform, inspire and amuse [16]. In order to achieve that goal, it strives to provide pluralistic content that represents the diversity in people and opinions of Dutch society. In order to improve the quality of its content, it conducts pluralism studies on a regular basis. Pluralism in the context of media can refer to either internal or external pluralism. The scope of this study is internal pluralism. Internal pluralism is understood as the divergent political opinions and ideological points of view reflected in the content addressed by news and opinion programming [23]. Although internal pluralism is a central aim for the NPO, it is not currently being monitored. Instead, pluralism is measured indirectly as external (i.e. perceived) pluralism: the audience is asked whether they were exposed to a multitude of opinions throughout a programme. This is done retroactively, which has inherent disadvantages [23]. Ideally, pluralism is measured directly and instantly, without the need for audience interpretation. Therefore, the aim of this study is to present a methodology that can help determine internal pluralism.

With such a methodology, several new questions related to pluralism can be answered about broadcast episodes. Possible questions are: How plural are different programmes relative to one another? And, has pluralism in programme content changed over time within certain programme titles? Answering these questions enables the broadcaster to analyze and evaluate its content based on a new metric and hence improve its performance towards its goals.

An operationally feasible interpretation of pluralism is understood as the number of different opinions about a topic. This means that an episode that covers many opinions about a topic is said to be more plural than an episode that covers just as many topics from a singular perspective.

The research question of this study is:

RQ1: How can internal pluralism be assessed within a single news and opinion programme episode?

The data consists of subtitle files of opinion programmes broadcast in 2017. The broadcast programmes often cover a variety of topics in consecutive order. The tasks to fulfill in order to answer the research question with this data are as follows. The first is to fragment each episode into parts about a single topic, because opinions should be matched to their respective topic. The next task is to identify the dominant topic, if present, of each fragment. The third task is to extract and distinguish the opinions on the topic within each fragment. Three models are proposed for this. In the final task, the fragments are ranked and compared to perceived pluralism.

A topic is defined as a set of words that co-occur often and can be addressed in multiple episodes. Example topics are the expansion of Schiphol, immigration, and fashion. Opinions are defined as words used to describe the topic with a priori sentiment or subjective meaning.

The following sections describe the study that leads to an answer to the research question.


First, literature related to topic and opinion diversity is described in the related work section. Second, the methodology used for creating and improving the model is explained in the methodology section. Third, the performance results of the model are compared in the results section. Fourth, conclusions are drawn in the conclusion section. Last, issues and ideas for future work are addressed in the discussion section.

2 RELATED WORK

With the rising popularity of blogs and social networks, topic modeling and opinion mining have become a field of interest for many studies. Pang and Lee [17] provide an overview of topic and sentiment analysis techniques explored in research up to 2008. One of the most preferred approaches for topic modeling is Latent Dirichlet Allocation (LDA). LDA is a generative probabilistic model described by Blei et al. [2]. LDA assumes each textual unit covers multiple topics and each topic has all words associated with it to some degree. LDA iteratively updates its word-topic associations over a predefined number of topics. Intuitively, the more often words co-occur in documents, the more likely it is that those words are semantically related. LDA models can be used for dimensionality reduction, information retrieval and automatic summarisation purposes. However, topic models such as LDA do not offer a solution for opinion detection. To perform opinion detection, topic models must be extended.
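To make the LDA mechanics concrete, the sketch below trains a small topic model with gensim on tokenised documents; the toy documents and parameter values are illustrative assumptions, not the settings used later in this study.

    # Minimal LDA sketch with gensim; toy documents and parameters are illustrative.
    from gensim import corpora, models

    documents = [["schiphol", "uitbreiding", "vliegtuig", "geluid"],
                 ["mode", "ontwerper", "collectie", "stof"]]        # tokenised texts

    dictionary = corpora.Dictionary(documents)                      # word <-> id mapping
    corpus = [dictionary.doc2bow(doc) for doc in documents]         # bag-of-words counts

    lda = models.LdaModel(corpus, id2word=dictionary, num_topics=2, passes=10)

    print(lda.get_document_topics(corpus[0]))   # topic mixture of the first document
    print(lda.show_topic(0, topn=4))            # most probable words of topic 0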

Lin et al. [11] implement a joint sentiment-topic model (JST) which distinguishes positive affiliation with a topic from negative affiliation with a topic. Another approach for grasping opinions is Topic-Aspect Modelling (TAM) as described by Paul et al. [18]. They introduce a Bayesian mixture model which jointly discovers topics and aspects. An aspect is interpreted as a perspective on a topic which is consistent over a document. Fang et al. [8] have explored a similar approach that bypasses the need for single-aspect documents. They suggest an opinion mining technique they call Contrastive Opinion Modeling (COM), with which they find contrastive opinions over different predefined perspectives.

Unlike Fang et al., the data in the present study does not have predefined perspectives or singular aspects within one document. This is due to the fact that multiple people are involved in a conversation. The conversation is transcribed and the transcription is put into subtitle texts, but speakers are not annotated throughout the transcription. Another characteristic of the data is that very distinct topics are covered in consecutive order. By means of fragmentation, the sequential nature of the conversation data is exploited.

Adams and Martell [1] have studied topic extraction from conversations. They implemented a TF-IDF vector space model and penalise sentences that are further away in the conversation from being part of a topic. Hearst [10] has described a text tiling algorithm for comparing adjacent blocks of text on lexical and semantic features. The comparison scores are calculated based on how many words the adjacent blocks have in common. In the present study, a similar approach to Hearst's fragmentation is proposed, but more semantic similarities are introduced. Also, similarity scores between adjacent blocks are calculated using sum of squared error.

For opinion classification, multiple techniques have been covered in the literature. The simplest interpretation of opinions is by means of polarity, denoting positive or negative sentiment. An example of sentiment and objectivity classification is given in the research of Yu and Hatzivassiloglou [26]. They propose an approach in which subjectivity or objectivity is detected at sentence level and scaled on a – to ++ scale including neutral opinions. The approach makes use of seed words to indicate positivity, negativity and objectivity. In follow-up research [4, 9, 25], opinions are classified into more sophisticated classes using rule-based subjectivity classifiers. The classifiers aim to identify opinions by means of opinion clues of the types: thought (think, consider, ...), impression (confuse, bewilder, ...), emotion (glad, worry, ...), modality about propositional attitude (should, would, ...), utterance-specific sentence form (however, nonetheless, ...) and certainty/uncertainty (wondering, questioning, ...) [4]. Synonyms of these opinion clues were added to WordNet by Chen et al. [4]. However, to the best of the author's knowledge, opinion finding implementations that make use of opinion clues are not available for the Dutch language at this time.

What is available for Dutch are basic sentiment and polarity classifiers for sentences. However, these classifiers detect sentiment at word level. If a sentiment word is present, the sentence is determined to be subjective and the sentiment score is determined by the sum of its individual sentiment words [14]. Subjective words are to a great extent adjectives [14]. For example, Bruce and Wiebe [3] showed, on the basis of the log-likelihood ratio, that adjectives are statistically significantly and positively correlated with subjective sentences in their corpus. The probability that a sentence is subjective, given only that there is at least one adjective in the sentence, is 55.8% according to Missen et al. [14]. As a result, the majority of opinion words in this study are adjectives. Also, opinions are interpreted at word level. By doing so, no false belief is created that the methodology in the present study can detect more complex opinion structures, such as negation or sarcasm.

The next question is how to compare subjective words in order to determine different opinions. Ideally, opinion words are compared on semantic similarity. This can be done by means of vectorisation. It was discovered by Mikolov et al. [13] that similarities between word vectors are able to reflect semantic as well as syntactic regularities. This is best explained by the example of Mikolov et al. that vector('King') - vector('Man') + vector('Woman') results in a vector that is closest to the vector representation of the word 'Queen'. Vector representations are therefore common practice for automatic detection of semantic similarities between words. Example applications of semantic word vector representations are entity completion in knowledge bases [5], connecting images and sentences [24] and document retrieval tasks [20].
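As an illustration of this vector arithmetic, the sketch below uses gensim to load pretrained embeddings and query analogies and similarities; the file path and the Dutch example words are assumptions, not assets of this study.

    # Word-vector analogy and similarity sketch with gensim (file path is hypothetical).
    from gensim.models import KeyedVectors

    vectors = KeyedVectors.load_word2vec_format("dutch_vectors.bin", binary=True)

    # vector('koning') - vector('man') + vector('vrouw') should be close to 'koningin'
    print(vectors.most_similar(positive=["koning", "vrouw"], negative=["man"], topn=1))

    # Cosine similarity between two opinion words
    print(vectors.similarity("blij", "gelukkig"))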

Therefore, opinionated words in this study are translated into a semantic vector space in order to distinguish similar and divergent opinion words. Clustering in vector space is a common technique, as it is used in various studies mentioned in the survey of Turney and Pantel [22]. In addition to clustering, similarities towards fixed vectors are also calculated. According to Ekman [7], there are six basic human emotions that can be expressed: happiness, surprise, sadness, fear, disgust and anger. As an approximation, the vectors of these six words are used as opinion axes. This is motivated by the fact that emotion often precedes opinion [15]. It can be the case that perceived pluralism is biased towards emotion expression, so mapping the opinion words onto emotions can be a suitable approximation. In summary, inspired by the literature, an approach is conducted consisting of the tasks of text fragmentation, subjective word extraction, and translation into distinct opinions to approximate pluralism.

3 DATA

The data is provided by the public broadcaster of the Netherlands (NPO) and consists of subtitle files of 31 opinion programmes broadcast in 2017 in Dutch. Each subtitle file is a single text file containing all spoken utterances in a single episode of a programme. The utterances can, for instance, be spoken Dutch text of the talk show host, a short video clip that is shown within the episode or English lyrics of a band that is performing. Also, the utterances are not presented one full sentence at a time, but one displayed unit of subtitle at a time. This means that sentences are split into approximately two to three utterances, each with its own time slot. Within this dataset, 20 evaluation episodes are designated. These episodes are assessed by the audience on pluralism.

4 METHODOLOGY

This section explains the method with which the research question is answered. The task is split into three components: fragmentation, opinion mining and evaluation.

4.1 Preprocessing

To prepare the subtitle files, several preprocessing steps are implemented. These steps are extraction of the full sentences, POS tagging, stemming, tokenisation and removal of special characters. In all cases, the programme titles that were overrepresented in the dataset are capped at ten episodes in order to keep the dataset varied and manageable. To give an example, the general news programme (NOS Journaal) has over 1000 episodes in a single year, whereas a programme like Uit Europa consists of four episodes.
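A minimal sketch of such a preprocessing step is shown below, assuming NLTK for tokenisation and Dutch stemming; it is not the exact pipeline of this study (sentence extraction and POS tagging are omitted).

    # Preprocessing sketch: strip special characters, tokenise and stem Dutch text.
    import re
    from nltk.tokenize import word_tokenize
    from nltk.stem.snowball import SnowballStemmer

    stemmer = SnowballStemmer("dutch")

    def preprocess(subtitle_text):
        text = re.sub(r"[^0-9a-zA-ZÀ-ÿ\s]", " ", subtitle_text)   # remove special characters
        tokens = word_tokenize(text.lower(), language="dutch")    # tokenise
        return [stemmer.stem(token) for token in tokens]          # stem

    print(preprocess("De uitbreiding van Schiphol is óók besproken!"))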

4.2 Fragmentation

The unique characteristic of the data is that very distinct topics can be addressed within a single document in consecutive order. As an example, politics and fashion can be discussed directly after one another. Thus, a text tiling approach has been taken to split the episodes into fragments about one topic. Text tiling means that adjacent blocks of text are compared to one another based on lexical and semantic properties. Using these properties, subsequent blocks of text can be grouped by topic without losing chronological order [10]. The optimal splitting criteria are found by a parameter grid search and evaluated on an annotated dataset of 28 samples. Next, a sliding window implementation is used that slides over each preprocessed document with a step size of ten words at a time. Table 1 denotes the parameters that are optimised.

window size             50, 100, 150, 200
named entities          0, 1, 2, 3, 4
word overlap            2, 3, 4, 5, 6
topic overlap           1, 2, 3
word movers distance    0.8 ... 2.0

Table 1. Parameters and their ranges explored in the grid search

The window size denotes the size of each individual block that is compared to the next block in the document. The other four parameters determine whether to cut between blocks. If any of the parameters is not satisfied, the blocks are not about the same subject and should thus be cut. First, the named entities are extracted by means of the Stanford NLTK named entity recogniser that has been trained on a Dutch corpus with a Naive Bayes classifier [12]. Second, the word overlap is a count of the identical words shared between the two blocks. Third, the topic overlap property counts the number of unique words that overlap between the blocks within the same topic obtained with an LDA. The LDA has been trained beforehand on all episodes in the dataset. The LDA is trained on nouns only, since topics become noisy when incorporating all part-of-speech types [4, 8]. The optimal number of topics is assessed with a perplexity score after convergence. Perplexity is a quantitative measure for comparing language models and is often used to compare the predictive performance of topic models. The value of perplexity reflects the ability of a model to generalise to unseen data [8]. Lastly, inspired by the vectorisation of sentences by Adams and Martell [1], the word movers distance is introduced. The word movers distance seeks the shortest path between the word vectors of the words in the two blocks. This means that the word movers distance finds the minimal semantic distance in vector space between the two windows.
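A simplified sketch of the sliding-window cut decision is given below; it only implements the word-overlap criterion (the named-entity, topic-overlap and word-movers-distance criteria follow the same pattern), and the threshold values are illustrative rather than the tuned optimum.

    # Sliding-window fragmentation sketch: cut when adjacent windows share too few words.
    def cut_positions(tokens, window=100, step=10, word_min=6):
        """Return token indices where adjacent blocks are not about the same subject."""
        cuts = []
        for start in range(0, len(tokens) - 2 * window + 1, step):
            left = set(tokens[start:start + window])
            right = set(tokens[start + window:start + 2 * window])
            if len(left & right) < word_min:      # overlap criterion not satisfied
                cuts.append(start + window)       # candidate cut between the blocks
        return cuts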

The objective to minimise is the sum of squared errors between the estimated locations of the cuts and the closest actual locations of the cuts. The squared error is used since two small differences in cutting locations are preferred over one estimate that is far off. The sum of squared errors is defined as:

SSE = \sum_{i=1}^{N} (x_i - y_i)^2    (1)

In case no cuts are found or no cuts were manually annotated, the value of x or y respectively is compared against zero. After determining the optimal split parameter settings, the episodes are cut into fragments.
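A small sketch of this objective is given below; matching each estimated cut to the closest annotated cut is the interpretation taken here, and the no-cut case is simplified to comparing against zero as described above.

    # Sum of squared errors between estimated and annotated cut positions.
    def sum_squared_error(estimated_cuts, annotated_cuts):
        if not estimated_cuts or not annotated_cuts:
            # No counterpart available: compare each cut position against zero.
            return sum(x ** 2 for x in estimated_cuts + annotated_cuts)
        return sum(min((x - y) ** 2 for y in annotated_cuts) for x in estimated_cuts)

    print(sum_squared_error([95, 210], [100, 200]))   # (95-100)^2 + (210-200)^2 = 125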

4.3 Opinion mining

The opinion words are extracted from these fragments. The opinion words in this study are reduced to words that have sentimental polarity or subjectivity, extracted by use of Pattern and Polyglot [6, 19]. The result is a list of opinion words per fragment.
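The sketch below illustrates word-level extraction with Pattern's Dutch sentiment module (Polyglot is used analogously in the study); treating any nonzero polarity or subjectivity as an opinion word is a simplifying assumption.

    # Opinion-word extraction sketch using Pattern's Dutch sentiment scores.
    from pattern.nl import sentiment

    def opinion_words(fragment_tokens):
        """Keep tokens with a priori polarity or subjectivity."""
        selected = []
        for token in fragment_tokens:
            polarity, subjectivity = sentiment(token)
            if polarity != 0.0 or subjectivity > 0.0:
                selected.append(token)
        return selected

    print(opinion_words(["talentvolle", "tafel", "trieste"]))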

From this result, it must be decided what is considered to be the best approximation of pluralism. Three models are proposed hereafter.


4.3.1 Model A: clusters. The first model determines the number of opinion clusters that are addressed in a fragment. The fragment with the highest number of opinion clusters determines the score of the episode. The clusters are obtained via k-means clustering on the whole opinion word vocabulary in semantic vector space. The word vectors are trained on a mixed set of documents containing Dutch Wikipedia pages, news articles and blog posts [21]. The assumption has been made that on average 20 words are semantically related to one another. For this dataset, containing around 5000 unique opinionated words, this results in 250 clusters. The reason for adopting this assumption instead of using a metric that evaluates cluster coherence is that the clusters show much overlap, so no optimum can be identified. The maximum fragment score per episode accumulates into a ranking over multiple episodes, so that they can be compared.
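A sketch of model A's clustering step is shown below, reusing the pretrained vectors from the earlier sketch; opinion_vocabulary stands for the study's roughly 5000 unique opinion words and is a hypothetical variable, and the division by 20 encodes the assumption stated above.

    # Model A sketch: k-means over opinion-word vectors (~20 words per cluster).
    import numpy as np
    from sklearn.cluster import KMeans

    # opinion_vocabulary: assumed list of the unique opinion words (not defined here)
    opinion_vocab = [w for w in opinion_vocabulary if w in vectors]   # words with a vector
    X = np.array([vectors[w] for w in opinion_vocab])

    k = max(1, len(opinion_vocab) // 20)          # assumption: ~20 related words per cluster
    kmeans = KMeans(n_clusters=k, random_state=0).fit(X)

    cluster_of = dict(zip(opinion_vocab, kmeans.labels_))
    # An episode's model A score: the largest number of distinct clusters in any fragment.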

4.3.2 Model B: clusters and sentiment. The second model is an expansion of the first model by adding polarity to the clustering. The downside of the first model is that semantically or syntactically related words are clustered together while their polarity can be opposite. An example is 'groot' and 'klein'. These two words are semantically related, but are opposites. Therefore, each cluster obtained with model A is split into one to three clusters, for positive, negative or neutral sentiment respectively. The sentiment is obtained with Pattern and Polyglot [6, 19]. A ranking is created in a similar fashion as with model A.
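The polarity split of model B can be sketched as below, again using Pattern's Dutch sentiment as a stand-in for the combined Pattern/Polyglot scores used in the study.

    # Model B sketch: split one model A cluster into positive, negative and neutral parts.
    from collections import defaultdict
    from pattern.nl import sentiment

    def split_by_polarity(cluster_words):
        parts = defaultdict(list)
        for word in cluster_words:
            polarity = sentiment(word)[0]
            label = "pos" if polarity > 0 else "neg" if polarity < 0 else "neutral"
            parts[label].append(word)
        return dict(parts)

    print(split_by_polarity(["gelukkige", "hoopvolle", "trieste", "droevige"]))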

4.3.3 Model C: emotions. In the last model, the average cosine similarity of each opinion word towards each of the six base emotion vectors is calculated. These emotions are anger, disgust, happiness, surprise, sadness and fear [7]. The result is, per fragment, a list of six values for the average similarity of its opinion words to each base emotion. A ranking is created based on these values. First, the samples are sorted on whether all six similarities are above average; next, they are sorted on standard deviation. The intuition behind this approach is that the more uniformly distributed the exposed emotions are, the more plural the fragment must be.
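A sketch of model C's per-fragment emotion profile is given below; the Dutch emotion terms and the reuse of the pretrained vectors from the earlier sketch are assumptions.

    # Model C sketch: average cosine similarity of a fragment's opinion words to six emotions.
    import numpy as np

    EMOTIONS = ["boosheid", "walging", "geluk", "verrassing", "verdriet", "angst"]

    def emotion_profile(fragment_opinion_words):
        profile = {}
        for emotion in EMOTIONS:
            sims = [vectors.similarity(word, emotion)
                    for word in fragment_opinion_words if word in vectors]
            profile[emotion] = float(np.mean(sims)) if sims else 0.0
        return profile

    # Ranking intuition: all six averages above the corpus mean combined with a small
    # standard deviation over the six values indicates a more plural fragment.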

4.4 Evaluation of methods

Evaluation of the fragmentation is conducted by means of the sum of squared errors. The opinion clusters for models A and B are evaluated by means of the Silhouette score. The Silhouette score measures the balance between the mean intra-cluster distance and the mean nearest-cluster distance for each sample: it is the difference between the mean nearest-cluster distance and the mean intra-cluster distance, divided by the larger of the two. The higher the score, the better the sample fits its own cluster. The best value is 1 and the worst value is -1. Scores close to zero denote overlapping clusters. However, for the fragmentation as well as the opinion clusters there are no benchmarks. In addition, they are not said to be objective evaluation metrics for measuring internal pluralism. Therefore, the plurality rankings of the models are compared to perceived pluralism questionnaire data. While perceived pluralism and internal pluralism are not the same concepts, it is assumed that they correlate.
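The Silhouette evaluation can be sketched with scikit-learn as below, reusing the vector matrix X and the k-means labels from the model A sketch.

    # Silhouette sketch: per-word scores, overall mean and per-cluster averages.
    import numpy as np
    from sklearn.metrics import silhouette_samples, silhouette_score

    word_scores = silhouette_samples(X, kmeans.labels_)    # one score per opinion word
    overall = silhouette_score(X, kmeans.labels_)          # mean over all opinion words

    per_cluster = {c: float(word_scores[kmeans.labels_ == c].mean())
                   for c in np.unique(kmeans.labels_)}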

The perceived pluralism is obtained from questionnaires sent by the NPO to their panel. The respondents were asked to determine whether, to their estimate, an episode addressed a high number of opinions. The percentage of respondents answering 'yes' has been reported. The minimal number of responses per episode is n=30. In this study, we assume that the higher the ratio of respondents that consider an episode to be plural, the more plural the episode. A ranking is created with these ratios for a test set of 20 episodes broadcast in 2017 and compared to the rankings obtained with the models. The rankings are evaluated with the Spearman rank-order correlation coefficient. Spearman correlation is expressed as a score on a scale from -1 to 1, denoting inverse ranking and identical ranking respectively.
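A minimal sketch of this comparison with SciPy is shown below; the rankings are illustrative, not the study's data.

    # Spearman rank correlation between a model ranking and the perceived-pluralism ranking.
    from scipy.stats import spearmanr

    model_ranking = [1, 2, 3, 4, 5]        # episode ranks according to one model
    perceived_ranking = [2, 1, 3, 5, 4]    # ranks derived from the audience questionnaire

    rho, p_value = spearmanr(model_ranking, perceived_ranking)
    print(rho, p_value)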

5 RESULTS

This section presents the results of the conducted methodology in corresponding order.


5.1 Fragmentation robustness

Figure 1 shows the influence of the window size on the sum of squared errors obtained by means of a grid search. The window size has the most influence on the sum of squared errors, which can be explained by the fact that it determines the number of terms considered for the other parameters. The window sizes explored are 50, 100, 150 and 200. The window size of 100 shows the lowest sum of squared errors.

Fig. 1. Influence of window size on the sum of squared error

The boxplots shown in Figure 2 are obtained for the other parameters with a fixed window size of 100 words. Each boxplot represents the range of sum of squared errors for a specific parameter value obtained by changing the other parameters for a fixed training set. From left to right, these are the sum of squared errors for the named entities in the range 0 to 4, the terms in the range 2 to 6 and the topic overlap in the range 1 to 3. The total number of topics obtained with the LDA is 200, since this resulted in an optimal perplexity score.

It can be seen in the figure that the terms have the most positive influence on their own, with an optimum of 6 for this data. Combining the parameters resulted in a minimal sum of squared errors at 2 topics, 6 terms and 2 named entities for a window size of 100. However, the optimum for the training data is not the same as for the validation data. The training data showed an improvement in sum of squared error by a factor of 8.7, while the improvement for the validation set had a factor of 1.04. With other parameter settings, this factor increased for the validation set, but decreased for the training set. The settings therefore cannot be said to generalise well to new, unseen data.

With these parameter settings, the whole corpus was split into fragments. On average, each document contained 4.86 fragments.

Fig. 2. Individual parameter influence on the sum of squared error for a window size of 100

The word movers distance has been eliminated because it did not improve performance. This is likely due to the fact that it suffered from noise introduced by words with limited semantic added value. On top of that, the time cost of computing the word movers distance for each pair of adjacent windows was high.

5.2 Opinion mining

In this section, results of models A and B and results of model C are presented separately.

5.2.1 Model A and B. Model A is artificially fixed to a total of 250 opinion clusters. Model B uses these clusters and splits them into distinct polarities where present, resulting in 604 clusters in total.

The average Silhouette score for individual opinion words is 0.030, whereas the average Silhouette score for clusters is 0.062. This difference can be explained by the fact that the clusters have different sizes. Clusters with high average Silhouette scores are relatively small compared to clusters with lower Silhouette scores, hence the higher Silhouette average for clusters. The distribution of these averages per cluster is shown in Figure 3, as well as the overall average.

Fig. 3. Distribution of average Silhouette scores per cluster

85.2% of the clusters have a positive Silhouette score; however, the scores are close to zero. This means there is cluster overlap. This can be caused by the fact that a large number of words are adjectives, which are syntactically similar. Two clusters with high average Silhouette scores are shown in Table 2 as an example. In column two, the cluster assignments of model A are presented. What can be seen in this example is that semantically related words are correctly clustered together. However, they have divergent polarities. For this reason model B was developed. The sentiment introduced by model B is shown in column three. 'Pos' denotes positive sentiment, 'Neg' denotes negative sentiment. There is also a class 'Neutral', which means that the word is subjective, but has no sentiment.

5.2.2 Model C. For model C, the emotion imbalance, i.e. the similarities minus the mean, is shown for five fragments for each of the six emotions in Figure 4. The most plural fragment in this example is the fifth, Zomergasten. This is due to the fact that all emotions have positive

Word            Cluster  Sentiment  Silhouette score
talent          201      Pos        0.174
jonge           201      Pos        0.253
veelbelovende   201      Pos        0.165
talenten        201      Pos        0.237
jeugdige        201      Pos        0.207
talentvolle     201      Pos        0.420
ambitieuze      201      Pos        0.181
getalenteerde   201      Pos        0.358
begaafde        201      Pos        0.137
fitte           201      Pos        0.144
piepjonge       201      Pos        0.131
onervaren       201      Neg        0.136
gelukkige       84       Pos        0.111
hoopvolle       84       Pos        0.175
blijde          84       Pos        0.162
trieste         84       Neg        0.193
droevig         84       Neg        0.107
verdrietige     84       Neg        0.290
droevige        84       Neg        0.365
droeve          84       Neg        0.352
treurige        84       Neg        0.273
sombere         84       Neg        0.174
droef           84       Neg        0.113

Table 2. Clusters with high average Silhouette score and their polarity distinction

value, meaning that on average more emotion for each of the six emotions is shown in the document. Also, the standard deviation is smaller than the standard deviation of What the Hague, which is the second most plural in this example. The standard deviation of this Zomergasten fragment is 0.030 and the standard deviation of the What the Hague fragment is 0.034.

5.3 Evaluation of methods

The Spearman scores of the model rankings compared to the rankings obtained from the panel are shown in Table 3. From the table, it can be read that model A shows the most similarity to perceived pluralism. However, the correlation is weak. Also, the p-value is > 0.05, so the null hypothesis that the two rankings are uncorrelated cannot be rejected.

The weak correlation scores can be explained by the following example. The presenter of a


Fig. 4. Deviation of the mean similarity towards each emotion

Model                     Clusters   Spearman score   p-value
A: Clusters               250        0.206            0.383
B: Clusters & Sentiment   604        0.115            0.629
C: Emotion imbalance      6          -0.083           0.729

Table 3. Spearman correlation with the perceived pluralism by the audience

programme aims to convince the viewer to be against duck breeding. The presenter might use words like 'illegal', 'unfair' and 'suffering'. However, he can also use the words 'good' (action) and 'beautiful' (animals) in the same context. These opinion words are classified into different classes, so the episode appears plural while in fact the presenter is still advocating the same opinion. In other words, the methodology helps find different opinion words, but is unable to semantically assess that 'look at these beautiful ducks' and 'the conditions on this farm are awful' entail the same point of view.

The ranking scores of model A and B were slightly different, where A was better than B. The rankings appear to be similar, which can be explained by the fact that models A and B are structurally similar. The difference can be caused by speakers using negations. An example would be 'that is not big, and I would rather say it is small'. In model B, these would be considered as different opinions, while in fact they are the same.

Model C shows the lowest similarity score. This might be caused by the fact that emotion representations are less related to what the audience considers to be plural.

6 DISCUSSION

The proposed methodology aims to find subjective word clusters used in describing a topic within a single news and opinion programme episode. This study therefore proposes a methodology to help assess internal pluralism at episode fragment level.

However, the results of the methodology presented in this study do not show a correlation with perceived pluralism. This could have various reasons. First of all, 28 files were annotated with splits in order to be used in the grid search. This number appeared not to be sufficient to find generalisable optimal parameter settings. In the future, more annotations should be created to overcome this problem. Also, relative distances between adjacent windows could be explored instead of absolute differences to account for different coherences in episodes. Second, the rankings created with the percentages obtained from perceived pluralism, which serve as the evaluation objective in this study, do not represent the internal pluralism the NPO would like to learn about from this study. Also, the evaluation set contained 20 documents, which is a small number. Therefore, evaluation of the methodology is hard.

Considering the techniques, a couple of limitations are present. First, an LDA learns topics from words co-occurring in documents. In order to distinguish sets of words from each other it needs repetition. Therefore, topics that are very rarely addressed in documents have less chance of being detected by the LDA as being a topic. As a result, a fragment about a very unique topic will not be taken into account with this methodology.

Second, some of the packages used are not fully accurate. To give an example, 'zoals' has positive sentiment according to Polyglot, while this is not a sentiment word. Also, the named entity recogniser of the Stanford NLTK trained with Naive Bayes was often inaccurate in determining a named entity. In future work, other taggers could be explored that achieve higher accuracy.

Third, the word vector space aims to represent semantic similarities, but is also influenced by syntactic similarities. To give an example, 'boos' is within the 20 closest vectors to the vector of 'blij'. Hence, there is overlap between the opinion word clusters, as is shown by the Silhouette scores. Within-cluster polarity, as proposed in model B, cannot overcome all errors caused by syntactic influence.

Fourth, considering the data, it should be noted that the subtitles are not in all cases identical to the actual spoken content of an episode. This can be caused by the fact that subtitles are occasionally created live.

7 CONCLUSION

To conclude, the methodology is split into three components: fragmentation, opinion mining and evaluation. The methodology suffered from the lack of a suitable evaluation for internal pluralism. Therefore, the methodology cannot be said to be able to replace a human assessor. However, internal pluralism can be assessed by a human assessor using the results of this study, because the study reduces content search by presenting the topics and opinions per episode.

REFERENCES

[1] Paige H. Adams and Craig H. Martell. 2008. Topic detection and extraction in chat. In Semantic Computing, 2008 IEEE International Conference on. IEEE, 581–588.
[2] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. 2002. Latent Dirichlet allocation. In Advances in Neural Information Processing Systems. 601–608.
[3] Rebecca F. Bruce and Janyce M. Wiebe. 1999. Recognizing subjectivity: a case study in manual tagging. Natural Language Engineering 5, 2 (1999), 187–205.
[4] Bi Chen, Leilei Zhu, Daniel Kifer, and Dongwon Lee. 2010. What Is an Opinion About? Exploring Political Standpoints Using Opinion Scoring Model. In AAAI.
[5] Danqi Chen, Richard Socher, Christopher D. Manning, and Andrew Y. Ng. 2013. Learning new facts from knowledge bases with neural tensor networks and semantic word vectors. arXiv preprint arXiv:1301.3618 (2013).
[6] Yanqing Chen and Steven Skiena. 2014. Building sentiment lexicons for all major languages. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Short Papers). 383–389.
[7] Paul Ekman, Wallace V. Friesen, and Phoebe Ellsworth. 2013. Emotion in the Human Face: Guidelines for Research and an Integration of Findings. Elsevier.
[8] Yi Fang, Luo Si, Naveen Somasundaram, and Zhengtao Yu. 2012. Mining contrastive opinions on political texts using cross-perspective topic model. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining. ACM, 63–72.
[9] Osamu Furuse, Nobuaki Hiroshima, Setsuo Yamada, and Ryoji Kataoka. 2007. Opinion Sentence Search Engine on Open-Domain Blog. In IJCAI. 2760–2765.
[10] Marti A. Hearst. 1997. TextTiling: Segmenting text into multi-paragraph subtopic passages. Computational Linguistics 23, 1 (1997), 33–64.
[11] Chenghua Lin and Yulan He. 2009. Joint sentiment/topic model for sentiment analysis. In Proceedings of the 18th ACM Conference on Information and Knowledge Management. ACM, 375–384.
[12] Christopher Manning, Mihai Surdeanu, John Bauer, Jenny Finkel, Steven Bethard, and David McClosky. 2014. The Stanford CoreNLP natural language processing toolkit. In Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations. 55–60.
[13] Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013).
[14] Malik Muhammad Saad Missen, Mohand Boughanem, and Guillaume Cabanac. 2013. Opinion mining: reviewed from word to document level. Social Network Analysis and Mining 3, 1 (2013), 107–125.
[15] Myriam D. Munezero, Calkin Suero Montero, Erkki Sutinen, and John Pajunen. 2014. Are they different? Affect, feeling, emotion, sentiment, and opinion detection in text. IEEE Transactions on Affective Computing 5, 2 (2014), 101–111.
[16] NPO. 2015. Het publiek voorop, Concessiebeleidsplan 2016-2020. NPO (2015), 122.
[17] Bo Pang, Lillian Lee, et al. 2008. Opinion mining and sentiment analysis. Foundations and Trends in Information Retrieval 2, 1–2 (2008), 1–135.
[18] Michael Paul and Roxana Girju. 2010. A two-dimensional topic-aspect model for discovering multi-faceted topics. Urbana 51, 61801 (2010), 36.
[19] Tom De Smedt and Walter Daelemans. 2012. Pattern for Python. Journal of Machine Learning Research 13, Jun (2012), 2063–2067.
[20] Nitish Srivastava and Ruslan R. Salakhutdinov. 2012. Multimodal learning with deep Boltzmann machines. In Advances in Neural Information Processing Systems. 2222–2230.
[21] Stephan Tulkens, Chris Emmery, and Walter Daelemans. 2016. Evaluating Unsupervised Dutch Word Embeddings as a Linguistic Resource. In Proceedings of the Tenth International Conference on Language Resources and Evaluation (LREC 2016), Nicoletta Calzolari (Conference Chair), Khalid Choukri, Thierry Declerck, Marko Grobelnik, Bente Maegaard, Joseph Mariani, Asuncion Moreno, Jan Odijk, and Stelios Piperidis (Eds.). European Language Resources Association (ELRA), Paris, France.
[22] Peter D. Turney and Patrick Pantel. 2010. From frequency to meaning: Vector space models of semantics. Journal of Artificial Intelligence Research 37 (2010), 141–188.
[23] Dunya van Troost. 2017. Project: What is the bandwidth of opinion encountered when watching news and opinion programs broadcasted by the NPO? (2017).
[24] Jason Weston, Antoine Bordes, Oksana Yakhnenko, and Nicolas Usunier. 2013. Connecting language and knowledge bases with embedding models for relation extraction. arXiv preprint arXiv:1307.7973 (2013).
[25] Theresa Wilson, Paul Hoffmann, Swapna Somasundaran, Jason Kessler, Janyce Wiebe, Yejin Choi, Claire Cardie, Ellen Riloff, and Siddharth Patwardhan. 2005. OpinionFinder: A system for subjectivity analysis. In Proceedings of HLT/EMNLP on Interactive Demonstrations. Association for Computational Linguistics, 34–35.
[26] Hong Yu and Vasileios Hatzivassiloglou. 2003. Towards answering opinion questions: Separating facts from opinions and identifying the polarity of opinion sentences. In Proceedings of the 2003 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 129–136.
