
Stance analysis with supervised machine learning

applied to the news outlets' articles

about Russia's participation in the Syrian civil war

Master Thesis Digital Humanities

By

Ksenia Iakovleva (S3461769)

Supervisor: Dr. Marc Esteve del Valle


Abstract

The topic of this thesis was chosen based on personal experience in the news media field, an interest in how strongly words can influence political issues in the world, and how easily opinions can be skewed by the media.

To illustrate two such different ways of presenting the same information about a current political situation, the Syrian civil war was chosen as an example to investigate further. The Syrian civil war was accompanied by another conflict – between the Russian and the global news media. While global media reported on the devastating airstrikes of the Russian military forces in Syria, Russian newspapers complained about "Western propaganda" against Russia's counter-terrorist operation. In this thesis, automatic stance analysis will be used in order to answer the following research question: "Were the Russian media (RBC and "Russia Today") more inclined to show their stance than the global media (CNN and "Al Jazeera") in the news articles about Russia's participation in the Syrian civil war?".

In this research, the theoretical framework for manual stance analysis of news articles will be combined with automatic stance analysis. First, articles will be scraped from the websites of the newspapers and stored in a data set. Second, each of the scraped articles will be manually classified with one of the three stance categories: “agree”, “disagree” or “discuss”. Then two supervised machine learning (SML) models for stance analysis of news in English and in Russian will be used to predict the stance in the news articles automatically.

Finally, a Chi-square test will be employed to answer the research question and conclude that the two Russian newspapers were not more inclined to show their stance than the global media.

This thesis gives an example of how stance analysis with machine learning can be used to analyze data from news media. It demonstrates some current disadvantages of this approach and gives directions for its future improvement. In the conclusions, some suggestions for future research are offered as guidance.


Acknowledgements

I am eternally grateful to my former colleagues and friends Nikolaos I. Tzanetis and Gerrit Sijberen Luimstra, whose passion for machine learning, deep knowledge of this field and hard work inspired me during the planning and realization of the practical part of this thesis. These people not only gave me inspiration, but also explained fundamental machine learning concepts to me and provided me with invaluable technical help.

Special gratitude goes to Henriette Schoen, whose interest in machine learning, perfectionism and patience helped to make this thesis as coherent and clear as possible. Her extensive experience in writing academic papers was very important for finalizing this thesis.

This thesis is dedicated to the three of you, who helped me so much in achieving the best possible results and supported me during the whole process.


Table of Contents

INTRODUCTION 5

Chapter 1: Stance analysis of media texts 8

1.1. Meaning of the term “stance” 8

1.2. Journalistic stance 9

1.3 Methods of manual stance analysis of news media texts 9

1.4. Application of manual stance analysis in research 11

1.5. Methods of automatic stance analysis of media texts 15

1.5.1. Supervised machine learning models 15

1.5.2. Automatic stance classification with supervised machine learning 16

Chapter 2: Data collection and evaluation 20

2.1. Extracting CNN news from the Kaggle data set 20

2.2. Scraping news 21

2.2.1 Scraper for RT 22

2.2.2. Scraper for RBC 23

2.2.3. Scraper for "Al Jazeera" 23

2.3. The results of collecting data 24

Chapter 3: Annotation 25

3.1 Data set description 25

3.2 Annotation methodology 26

3.3. Inter-annotator agreement 27

3.4. Complicated cases for human annotation 27

3.5. Results of the annotation 29

Chapter 4: Automatic stance analysis 31

4.1. Applying the best model for automatic stance analysis 32

4.1.1. Description of the model 32

4.1.2. Stance detection process 34

4.1.3. Holdout 34

4.1.4. Cross-validation 34


4.1.5. Training and final test 35

4.1.6. Preparing our data set 35

4.1.7. Case studies and results 36

4.1.8. Error analysis of automatic stance prediction in English 41

4.2. Supervised machine learning model for stance analysis in Russian language 43

4.2.1. Preprocessing of the Russian data set 43

4.2.2. Converting words to vectors and feature extraction 45

4.2.3. Logistic regression classifier 46

4.2.4. Cross-validation and grid search 47

4.2.5. Results of stance analysis of Russian news data set 48

4.2.6. Error analysis of automatic stance prediction in Russian 48

Chapter 5: Statistical analysis of the results 52

Chapter 6: Discussion, conclusions and future research 56

6.1. Discussion 56

6.1.1. Patterns of stance found via annotation 56

6.1.2. The results of supervised machine learning 58

6.2. Conclusion 63

6.3. Further research 66

Bibliography 69

Appendices 75


List of Tables

Table 1: The difference between manual vs. automatic stance analysis in referenced research 6
Table 2: Results of the original FNC model with the original training and test sets 33
Table 3: Results of training with FNC data and testing with our data 37
Table 4: Results of training with FNC data and part of our data, testing with another part of our data 39
Table 5: The best results achieved by removing stop words 41
Table 6: Results of our model applied to our Russian data set 48
Table 7: Number (and percentage) of news labeled with "agree" and "discuss" per data set 53
Table 8: Number (and percentage) of news labeled with "agree" and "discuss" per newspaper 53


INTRODUCTION

The Syrian civil war has been one of the most devastating armed conflicts of the 21st century and has been considered one of the worst humanitarian disasters since World War II (Doganay & Demiraslan, 2016). This conflict involved the most powerful countries, which had their own interests in the war, sometimes other than helping one of the sides to win. Russia was one of these countries: its role in the Syrian civil war led to a lot of discussion, mostly negative in the global media, which accused Russia of supporting the totalitarian regime of the Syrian President Bashar Assad (Bagdonas, 2012). Articles about the brutalities perpetrated by the Syrian government forces were often seen in Russia as "Western propaganda" (Bagdonas, 2012).

According to the Ministry of Foreign Affairs of the Russian Federation, Western media articles were usually biased and in some situations involved outright acts of information warfare being waged "to maximally tarnish the image of Syria and its leadership in the eyes of the world and thus achieve the creation of conditions to justify outside intervention in the affairs of Syria to overthrow the existing regime there" (Bagdonas, 2012). At the same time, many of the Russian media's articles presented the country's role in the war as completely positive and focused on fighting terrorists and providing Syrian civilians with humanitarian help (Brown, 2014). Bagdonas (2012) compares Russian media articles to a public relations campaign for Putin. Brown (2014) even implies that Russia historically used propaganda as "a tactic of non-linear war" and expanded its "misinformation campaign" by increasing the funding of some media outlets, like RT ("Russia Today").

The differences in how Western and Russian media reported on the same situation in Syria have been explored in previous research (Brown, 2014; Aldreabi & Haitham, 2019). These scholars showed that there was a difference in the topics of news about Russia's participation in the war published by Russian and global media (some topics were emphasized; others were ignored). However, this difference does not necessarily indicate the newspapers' subjectivity or objectivity: for instance, it is reasonable that Russian newspapers pay more attention to topics regarding Russia, while American media are more focused on the events which influence the US. The method that can determine which newspapers are more subjective is called stance analysis (stance being the attitude of a journalist towards the topic she describes (White, 2004)). There is still no research comparing the stance of Russian and global media articles about the country's participation in the Syrian civil war. Therefore, the following research question was chosen for this thesis: "Were the Russian media more inclined to show their stance than global media in articles about Russia's participation in the Syrian civil war?".

To answer the research question, this thesis will combine two approaches – manual and automatic stance analysis of news articles. The differences between these two methods can be observed in Table 1 below. In previous research, manual stance analysis was conducted by annotators who assigned a label to each word in a corpus of news articles according to specific guidelines. The labels represented not only the stance itself ("agree", "disagree" or "neutral"), but also its specific type (e.g. "affect", "judgement", "appreciation", etc.) (Huan, 2018). This research was also focused on finding patterns which point to the stance (non-neutral adjectives, metaphors, absence of references, etc.) (Teo, 2000; Fairclough, 2010; Zhang, 2013; Liu & Stevenson, 2013). This fine-grained way of doing stance analysis was time-consuming, and researchers therefore could not analyze many articles (a case study usually included 15-20 texts) (Zhang, 2013; Liu & Stevenson, 2013).

Table 1. The difference between manual vs. automatic stance analysis in referenced research

Type of stance analysis | Human annotation needed | Annotation guidelines used | Number of categories (labels) | Number of articles analyzed | Type of analysis | Process speed
Manual | yes | yes | 10 and more | 15-20 | word | low
Automatic | yes | no | 3 | 2,700 | text | high

The first step of automatic stance analysis also included human annotation. However, annotators assigned labels not to individual words, but to whole articles. The annotation guidelines were never specified in this type of research (Riedel et al., 2017; Baird, Sibley & Pan, 2017; Hanselowski et al., 2018). Annotators assigned only one of three labels – "agree", "disagree" or "discuss" (which is the same as the "neutral" category in manual stance analysis) – without any of the subcategories which were used in manual stance analysis research. After that, a supervised machine learning model was applied to the annotated data set of articles and "learned" from them the features of each type of stance. This model could then be applied to other, unlabeled articles and label them automatically. Academic research in automatic stance analysis included large data sets of news articles (e.g. 3,000 texts).

However, a complete combination of manual and automatic stance analysis is not possible, because even the most advanced supervised machine learning models for stance analysis are not able to predict the many specific categories which are used in the manual classification of journalistic stance. Therefore, for this thesis the same categories as were used in previous studies of automatic stance analysis ("agree", "disagree", "discuss") will be kept. The rules for assigning each of these labels to an article will be defined according to the stance categories and patterns from previous research in manual stance analysis (Teo, 2000; Fairclough, 2010; Zhang, 2013; Liu & Stevenson, 2013). All the criteria of stance which are relevant to the analysis of news articles will be combined in the instructions for the annotators, who will label the data.

After the annotation, supervised machine learning models for stance analysis will be applied to the data sets. In this research, automatic stance analysis will be used to analyze "quality media" articles for the first time (previously it was used only for analyzing "fake news" and articles from unreliable sources). In these news articles, the stance can be shown in a "hidden" way, so it might be more complicated to detect automatically than in articles from unreliable sources, where it is demonstrated explicitly. Moreover, in previous research automatic stance analysis was applied to news on many different topics, while in this thesis articles on one general topic will be analyzed. This can also confuse the automatic model for stance detection. The results of this research will either show that supervised machine learning models for stance analysis are able to correctly predict the stance in "quality media" articles labeled with the methodology used in manual stance detection, or that they can detect stance only in news retrieved from unreliable sources ("fake news").

Through theoretical argumentation in Chapter I, this thesis will summarize the main findings of the research in manual and automatic stance analysis of different kinds of texts, specifically news articles. Chapter II will be dedicated to data scraping and cleaning, and Chapter III to the annotation of the data sets. In Chapter IV, five case studies in which machine learning models are trained and applied to the test sets will be conducted. In Chapter V, the statistical analysis of our data and its results will be described. In Chapter VI, a final summary of the research results will be presented.


Chapter 1: Stance analysis of media texts

In this chapter, relevant academic literature about stance, journalistic stance, and manual and automatic stance analysis will be reviewed. The most useful approaches for these research methods (the "Appraisal" framework and Critical Discourse Analysis) will be chosen and applied in the next chapters of the thesis, where manual annotation of the data sets will be conducted and supervised machine learning models will be used.

1.1. Meaning of the term “stance”

The term “stance” has always been associated with the terms “evaluation” and “attitude” in research: “For us… evaluation is the broad cover term for the expression of the speaker or writer’s attitude or stance towards, a viewpoint on, or feelings about the entities or propositions that she is talking about” (Hunston, 2000). Stance encompasses the ways in which a writer expresses attitudes and opinions towards specific propositions and also the overall discursive position that a writer takes in relation to a particular topic.

The meaning of the term "stance" has become more complicated over time (Kockelman, 2004). In the late 1980s, it was considered the overt expression of an author's or speaker's attitudes, feelings, judgments, or commitment concerning the message (Biber & Finegan, 1988). In the 1990s-2000s it was defined as the tendency of humans as a species to interpret behavior in terms of putative mental states, such as belief, desire, and fear (Khalidi, 1995; Reboul, 2000). In the same period, it was called "a cover term for the expression of personal feelings and assessments" (Conrad & Biber, 2000). In the late 2000s researchers still implied that very little was understood about stance: "what it is, how we do it, what role language and interaction play in the process, and what role the act of taking a stance fulfills in the broader play of social life" (Du Bois, 2007). In 2009 stance was defined as "taking up a position with respect to the form or the content of one's utterance" (Jaffe, 2009). In the 2010s stance was defined as the ways in which speakers and writers encode opinions and assessments in the language they produce (Gray & Biber, 2012). Expression of stance varies along two major parameters: the meaning of the assessment (personal feeling and attitude or status of knowledge) and the linguistic level used for the assessment (lexical or grammatical).

These complexities in defining the term “stance” appeared because researchers stopped seeing it only as a linguistic phenomenon and started reflecting on its socio-cultural sides. They described stance as a linguistically articulated form of social action whose meaning is to be construed within the broader scope of language, interaction and sociocultural value (Du Bois, 2007). In defining stance, scholars started paying specific attention to context – social relations, gender roles, cultural values, language ideologies (Kockelman, 2004).

1.2. Journalistic stance

The journalistic stance is always undertaken within a particular social-institutional context (Huan, 2018). For this reason, some scholars not only included linguistic and semantic analysis of media texts in their research, but also analyzed the agenda of the period and pointed out which facts were ignored and not published (Vertommen, Vandendaele & Van Praet, 2012). Journalistic stance refers to how sets of evaluations relating to people, objects or ideas enhance coherence in a text, explicitly or implicitly unveiling particular "lines of vision" (Verschueren, 1999, 2012). These "lines of vision" can include the exploitation of common stereotypes and beliefs in society; newspapers can even create and reproduce their own, which are called "news values" (values that are created and spread by newspapers) (Huan, 2018).

Another important sign of journalistic stance is how journalists evaluate and engage with different news sources. For instance, they can dedicate the biggest part of an article to a source they find trustworthy and leave a small passage at the end of it for a source which they do not find reliable. Giving priority to particular sources is governed by social, institutional and personal factors of values and beliefs (Huan, 2018).


1.3 Methods of manual stance analysis of news media texts

The most widely used method in research on stance analysis of media texts is the "Appraisal" framework (Martin & White, 2005). As one of its authors describes it, this framework "provides a delicate taxonomy of attitudinal meanings that attends to such issues as the basis on which the positive or negative assessment is being made, its target, what is at stake socially, and whether the assessment is explicitly or implicitly conveyed" (White, 2015). This taxonomy groups meaning-making resources (words and expressions that demonstrate the author's stance) together as the "language of evaluation". Through this "language", the speaker's or writer's personal, evaluative involvement in the text is revealed as they adopt stances either toward phenomena (the entities, happenings, or states of affairs being construed by the text) or towards metaphenomena (propositions about these entities, happenings, and states of affairs).

The "Appraisal" framework consists of three sub-systems, which represent different types of stance: attitude, engagement, and graduation (White, 2015). The "Attitude" sub-system refers to evaluations that are associated with speakers' or writers' subjective or affective reactions to people, objects or situations. Engagement is applicable when the author expresses her degree of commitment to facts or points of view (e.g. words and expressions which show how much the author trusts the source she refers to). Graduation includes examples where the author evaluates the scale of the described subject or event (e.g. "a little" and "a lot") and can be applied both to the attitude and engagement sub-systems. For instance, in the following sentence, the word "superhero" can be classified in both the "attitude" and "graduation" categories: "Agent McCarthy, ... in a superhero move, used his body as a shield" (White, 2015).

All the subcategories of attitude, engagement, and graduation can be either positive or negative (White, 2015). The attitude sub-system includes two more levels of multiple specific subcategories. The first level has the following three categories: judgement, appreciation, and affect. Judgement represents all examples where the author evaluates people's behavior and character by reference to ethics/morality and other systems of conventionalized or institutionalized norms (e.g. "elite troops", "swift action", "weary rescuers" or "would-be rescuers"). Appreciation includes evaluations of objects, artifacts, texts, states of affairs, and processes ("advanced technologies", "blocked road" or "shocking sights"). The "Affect" category is used when the author makes subjective statements about other people's feelings, making an assessment of their emotional reactions (e.g. "he was grateful for conversation" and "expressing sympathy and condolences").

Some researchers criticize the "Appraisal" framework: they find this method not complete enough for detecting journalistic stance (Vertommen, Vandendaele & Van Praet, 2012). According to Vertommen et al. (2012), stance should be evaluated not only in terms of the overt or implicit evaluative expressions included in the text, but also in terms of importance – the choice of which information to include in the article and which to ignore. These choices are governed and constrained by the specific socio-economic and professional context in which the journalist operates (Vertommen, Vandendaele & Van Praet, 2012).

The other commonly used method for stance analysis of news media texts is Critical Discourse Analysis (CDA). CDA is a method of critical linguistics, which means an enquiry into the relations between signs, meanings and the social and historical conditions which govern the semiotic structure of discourse (Fowler, 1991). CDA has an emphasis on linguistic manifestations of social and political domination in both spoken and written texts. A major goal of CDA is to develop a framework of analysis that can become "a resource for people who are struggling against domination and oppression in its linguistic forms" (Fairclough, 2010).

CDA does not only identify and label certain key linguistic constructions; it relates them to the context in a special way. The authors of CDA emphasize that the word "critical" in its name could be taken to denote negative evaluation, but this negativity is not necessarily the aim of the analysis. Rather, the word means that CDA "unpacks" the ideological underpinnings of discourse that have become so naturalized over time that we begin to treat them as common, acceptable and natural features of discourse (Teo, 2000). However, Teo argues that there is still much consolidatory work to be done to give CDA a conceptual and analytic unity and coherence (Teo, 2000).


1.4. Application of manual stance analysis in research

This subchapter will describe the ways in which researchers used methods of manual stance analysis and which patterns of each type of stance they found. All the scholars who applied manual stance analysis in their research chose a small number of topics (from one to four) and a limited number of news articles. For instance, one of them examines stance in cross-cultural media discourse by comparing 15 disaster news reports on the Sichuan earthquake of May 2008 in three newspapers (Liu & Stevenson, 2013). The author of the book "Evaluation and Stance in War News: A Linguistic Analysis of American, British and Italian Television News Reporting of the 2003 Iraqi War" analyzed transcriptions of reports from four TV channels on one topic (Haarman, 2009). Other research with manual stance analysis included only two sample texts of two newspapers on one topic (Teo, 2000). A study of manual stance analysis of 24 news headlines in six Australian and Chinese newspapers included four topics (Zhang, 2013). Another paper presents a case study cross-comparing 48 news articles from 16 newspapers reporting on one event (Vertommen, Vandendaele & Van Praet, 2012). Importantly, these studies used different kinds of topics: while large-scale studies used the broad definition of the word "topic" (e.g. "Iraqi war"), which included many different news events, small-scale studies meant by this word a single "news event" (e.g. "Tiger Woods wife Elin wants $750 Million in divorce settlement").

For applying manual stance analysis in news media research, most of the scholars preferred to use the "Appraisal" framework (Zhang, 2013; Liu & Stevenson, 2013; Huan, 2018), while others chose CDA (Teo, 2000; Fairclough, 2010). The "Appraisal" framework was applied by the scholars at the word level: they extracted keywords which showed the author's stance and labeled them with the framework's categories. Some of the researchers did not focus on all of the subcategories of "Appraisal", but used only "Attitude" (Zhang, 2013) or both "Attitude" and "Engagement" (Huan, 2018). C. Huan (2018) classified all the words in the corpora of Chinese and Australian news into parts of speech and labeled them with one of the following subcategories: "affect", "social esteem", "reaction", "social sanction", "appreciation". For each of the categories, he chose a "positive", "negative" or "neutral" label and calculated which news corpus had more positive, negative and neutral words per category. Describing the patterns of the "affect" subcategory of the framework (which covers situations when the journalist describes other people's feelings), Huan pointed out that the verbs "concern", "worry" and "panic" usually referred to ordinary citizens, whereas it was the authorities who were described as fearing risk events. He also paid attention to the different references in the articles (e.g. affiliated with the government or not), to quotations (e.g. some speakers were quoted more than others) and to the level of trust in these sources of information (e.g. whether the journalist emphasized that the source should be trusted or not).

Another way of applying the "Appraisal" framework was to analyze patterns of stance instances (words or phrases which show stance in the texts) for categories in the framework's sub-systems (Liu & Stevenson, 2013). Most of the examples from the "judgement" category emphasized the positive or negative evaluation of rescue workers' professionalism and of the political elite's actions. The examples of the "appreciation" category included descriptions of circumstances and conditions in the earthquake-hit areas, emphasizing the scale of destruction. Sometimes a negative stance in descriptions of destruction was shown specifically in order to later demonstrate a positive stance towards the people who were not afraid to go there (e.g. to emphasize the positive role of political leaders who came to damaged zones even though it was dangerous). In the instances of the "affect" category, journalists usually described the feelings and emotions of people who suffered from the earthquake and of the politicians who demonstrated their engagement in the situation (e.g. "the general secretary said sentimentally…") (Liu & Stevenson, 2013).

Some researchers who applied the "Appraisal" framework went beyond the analysis of the specific words which show stance (Zhang, 2013). They also analyzed the style and "tone" of the texts. They imply that the stance and attitudes of different news agencies are realized through diction and text style: some media use neutral narrative words, others more evaluative words and judgments. For instance, the scholars conclude about one of the newspapers that it "reports the event in a mocking tone". They also pay attention to direct quotes (which sometimes indicate neutrality, while in other situations they only seem neutral and actually show the stance) and to the paraphrasing of them. Moreover, the scholars evaluate the stance according to the background of each topic and to the Chinese and Australian agenda (which topics are more familiar to readers, and what is the difference between the countries' readers' interests).

Critical Discourse Analysis (CDA) was also applied differently by the scholars who used it in their research on news articles. In the study about stance in war news, Haarman created a separate text corpus for each news outlet and removed from them the words which were common to both editions (e.g. words connected with the topic of the Iraqi war) and which, therefore, could not help to show the difference between their stances (Haarman, 2009). Importantly, he did not remove common words such as "say" or "but", because they could show the stance of the journalist (e.g. "The Iraqis were pushed back but not defeated. Neither was the city taken.").

Teo applied CDA for manual stance analysis to reveal racism in news (Teo, 2000). He found a generalization pattern: the journalists tried to generalize the actions of a group of Vietnamese criminals in Australia to all Asian people. Moreover, the research revealed over-lexicalization in the publications: the journalists used descriptive words when they did not have to, which showed their stance towards a specific subject. For instance, if the author of an article writes "female lawyer", it means that, according to the journalist, a lawyer should be male by default.

The completeness of the information in the articles was an important criterion for scholars in manual stance analysis of news: researchers pointed out whether the journalists tried to hide something or, on the contrary, included extra information from the source which they supported (Haarman, 2009; Vertommen, Vandendaele & Van Praet, 2012). Vertommen et al. (2012) tracked down subtle differences in detail among the articles analyzed to see which facts were considered more important by each of the newspapers.

Vertommen et al. (2012) used a multidimensional approach for manual stance analysis of news articles. They discerned "discourse topics" (van Dijk, 1977; Chafe, 1994) related to the general topic of the research: issues such as the reference to and (political) background of political parties, their geographical situation, the mention of side-participants such as the Supreme Court, and so on (Vertommen, Vandendaele & Van Praet, 2012).

Scholars who applied approaches to manual stance analysis other than the "Appraisal" framework paid attention to the sources which were used in the news articles, and to the way they were used (Haarman, 2009; Vertommen, Vandendaele & Van Praet, 2012; Teo, 2000).


Haarman (2009) calculated the distribution of news sources for each of the news companies to see the differences in the number of articles which included each type of source. He also analyzed "framing verbs" in the sentences where the reporter refers to a source, such as "say" or "reveal": this helped to point out which sources were supported by the journalist and which were not. Some scholars had earlier stated that these verbs function as a linguistic device to express stance. Other researchers, who applied CDA, pointed out that sources which represented minorities in society were more "silent" in the news than the majority and were quoted in the press less than a quarter as often as people from the majority group (Teo, 2000). Vertommen et al. (2012) pointed out that during the political conflict in Belgium some journalists completely ignored representatives of one of the parties in their news articles.

Since the data in this thesis will be labeled with only three types of stance – "agree", "disagree" and "discuss" ("neutral") – we cannot apply the "Appraisal" framework, which has three levels of subcategories and cannot be used in this machine learning classification. Neither can CDA be applied in this analysis, because a machine learning model cannot analyze facts which are not mentioned in the texts. However, the "Appraisal" categories can be used in the annotation instructions to create patterns of stances and to define what should be considered a positive or negative stance. Therefore, articles will not be labeled with "Appraisal" subcategories, but these categories' descriptions will be used to define what annotators should consider an "agree", "disagree" or "discuss" stance. Moreover, the annotation instructions will include analysis of news sources, which is a part of CDA.

1.5. Methods of automatic stance analysis of media texts

1.5.1. Supervised machine learning models

Supervised machine learning is the search for algorithms that reason from externally supplied instances to produce general hypotheses, which then make predictions about future instances (Kotsiantis, 2007). The resulting model can be used to assign class labels to new instances where the values of the predictor features are known, but the value of the class label is unknown. Every instance in any data set used by machine learning algorithms is represented using the same set of features. If instances are given with known labels (the corresponding correct outputs, e.g. "agree", "disagree" and "discuss" for stance), the learning is called supervised, in contrast to unsupervised learning, where instances are unlabeled.

Before applying a supervised machine learning model, our data needs to be annotated: each instance of the data set must be labeled with the corresponding category. The second step is feature extraction – looking for patterns which help to distinguish between the labels. The simplest method for this is "brute force", which means measuring everything available in the hope that the right (informative, relevant) features can be found. However, a data set collected by the "brute-force" method will probably contain noise (non-informative features) and missing feature values, and will therefore require significant pre-processing (Zhang et al., 2002).

One of the most important parts of every supervised machine learning model's application is data preparation and pre-processing. There are a number of methods to choose from for handling missing data and for outlier (noise) detection (Batista & Monard, 2003; Hodge & Austin, 2004).

Another important step in creating a supervised machine learning model is choosing a classifier (a function that assigns a label to a data point). This choice should be made simply by comparing the accuracies of different classifiers. One technique for calculating a classifier's accuracy is the holdout method: the training set is split, using two-thirds for training and the other third for estimating performance; the classifier is applied to the held-out third and the number of correct predictions is divided by the total number of instances. In another technique, known as cross-validation, the training set is divided into mutually exclusive and equal-sized subsets, and for each subset the classifier is trained on the union of all the other subsets. Accuracy is calculated for each subset separately, and the average accuracy (or, equivalently, error rate) over the subsets is an estimate of the accuracy of the classifier.
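To make these two evaluation techniques concrete, the sketch below shows a holdout split and a cross-validated accuracy estimate. It is only an illustration, not the pipeline used later in this thesis; it assumes the scikit-learn library, and the toy texts and labels are invented placeholders.

```python
# Illustrative sketch of holdout evaluation and cross-validation (assumes scikit-learn).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline

# Placeholder data: in practice these would be the annotated news articles.
texts = ["Russian jets bombed rebel positions near Aleppo.",
         "Moscow said its aircraft hit terrorist targets only.",
         "The ceasefire in Syria appeared to hold on Monday.",
         "Officials discussed the humanitarian corridors in Aleppo.",
         "Activists accused Russian forces of striking a hospital.",
         "The defence ministry denied reports of civilian casualties."]
labels = ["agree", "agree", "discuss", "discuss", "agree", "discuss"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))

# Holdout: train on two-thirds of the data, estimate accuracy on the remaining third.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=1/3, stratify=labels, random_state=0)
model.fit(X_train, y_train)
print("holdout accuracy:", model.score(X_test, y_test))   # correct predictions / total instances

# Cross-validation: accuracy is averaged over mutually exclusive, equal-sized subsets.
print("cross-validated accuracy:", cross_val_score(model, texts, labels, cv=3).mean())
```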

After the most accurate classifier is chosen, the supervised machine learning model can be applied to unlabeled test data. The result can be evaluated by calculating the accuracy of the labeling (the number of correctly predicted instances divided by the total number of instances).


1.5.2. Automatic stance classification with supervised machine learning

The most accurate machine learning models for automatic stance detection in news were developed during and after the Fake News Challenge (FNC) in 2016 (Riedel et al., 2017; Baird, Sibley & Pan, 2017; Hanselowski et al., 2018). The Fake News Challenge was an online competition which proposed that a solution for fake news detection be composed of a collection of automated tools to support human fact-checkers and speed up their work. Stance detection was considered one of these tools. 50 teams participated in the competition. The goal of the first part of the FNC was to create a supervised machine learning model which could detect stance in news. While stance detection had previously focused on individual sentences or phrases, the systems participating in the FNC had to detect the stance of an entire document, which raised many new challenges (Hanselowski et al., 2018). Sentence-level detection would not work, since one sentence of a disagreeing article could be mistakenly labeled as agreeing if considered in isolation. This is the reason why, in this thesis, stance analysis of the news corpus will also be conducted at the text level and not at the sentence level.

At the start of the competition, teams were provided with 300 claims (news headlines representing topics) and 2,700 article bodies (texts without the headline). The first goal of the participants was to create models that could detect which articles relate to each topic (in other words, to label each pair of headline and article body as "related" or "unrelated"). The next step was to make the models label each of the "related" pairs of headlines and article bodies with one of three categories – "agree" (the journalist agrees with the claim), "disagree" (the journalist disagrees with the claim) or "discuss" (the journalist neutrally describes the claim).

An important part of developing supervised machine learning models for every team was feature engineering. For instance, if a researcher thinks that only verbs and nouns can help to define stance in a text, she can remove from the data set all the words except verbs and nouns; the resulting new data set would be a new set of features. Each team created its own features for developing its model, also tried ready-to-use features, and kept the ones which gave the best results. For instance, the widely used bag-of-words features represent a text as the multiset of its words, disregarding grammar and word order but keeping multiplicity (Riedel et al., 2017).


All three winners of the FNC used a multi-layer perceptron (MLP) classifier or a multi-layer neural network (Riedel et al., 2017; Baird, Sibley & Pan, 2017; Hanselowski et al., 2018). This model consists of a large number of units (neurons) joined together in a pattern of connections. Units in a net are usually segregated into three classes: input units, which receive information to be processed; output units, where the results of the processing are found; and units in between, known as hidden units (or hidden layers). First, the network is trained on labeled data to determine the input-output mapping. Second, the weights of the connections between neurons are fixed and the network is used to classify a new set of data. Properly determining the size of the hidden layer is generally a problem, because an underestimate of the number of neurons can lead to poor approximation and generalization capabilities of the model, while excessive nodes can result in overfitting and eventually make the labeling of new data instances more difficult.
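The snippet below is a minimal, generic illustration of such a multi-layer perceptron classifier; it is not the code of any FNC team. It assumes scikit-learn, and the synthetic data merely stands in for labeled articles; the hidden_layer_sizes parameter corresponds to the hidden units discussed above.

```python
# Generic MLP classifier sketch (assumes scikit-learn); synthetic data replaces real articles.
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=200, n_features=20, n_classes=3,
                           n_informative=5, random_state=0)

# hidden_layer_sizes controls the hidden units: too few can lead to poor approximation,
# too many can cause overfitting.
mlp = MLPClassifier(hidden_layer_sizes=(50,), max_iter=1000, random_state=0)
mlp.fit(X, y)                  # training determines the input-output mapping (the weights)
print(mlp.predict(X[:5]))      # the trained network assigns labels to new instances
```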

The FNC winner team used a combination of deep convolutional neural networks (regularized versions of multi-layer perceptrons) and gradient-boosted decision trees with lexical features. In the first model, the winner of the competition applied several different neural networks used in deep learning. This model uses a one-dimensional convolutional neural net on the headline and body text, represented at the word level using the Google News pre-trained vectors (software which computes vector representations of words). The output of the neural net is sent to a multi-layer perceptron (MLP) and trained end-to-end. The other model presented by the winner of the competition was a gradient-boosted decision trees model. It takes a few text-based features derived from the headline and the body of an article, which are then fed into a gradient-boosted tree classifier to predict the relation between the headline and the body (the stance category). The algorithm produces a prediction model in the form of decision trees (it goes from observations about an instance – "branches" – to conclusions about its target value – "leaves") (Baird, Sibley & Pan, 2017). The general accuracy achieved by the model was 82.02%.

1 Google Code Archive. Retrieved from: https://code.google.com/archive/p/word2vec/

The team that ended as number two (Athene) proposed an ensemble of five multi-layer perceptrons (MLPs), each with six hidden layers and a softmax layer, together with multiple hand-engineered features in addition to the baseline features provided by the FNC-1 organizers. For the prediction, they used hard voting: the label was chosen by a simple majority vote. The general accuracy of the model was 81.97%.

The team that landed in third place, UCL Machine Reading (UCLMR), used a multi-layer perceptron with bag-of-words features (Riedel et al., 2017). The model scores the probability of every label and chooses the one with the highest score (the same way as team Athene's model chose a label). The scholars on this team chose a different method for their model: a multi-layer perceptron (MLP) classifier with one hidden layer. Before applying the classifier, Riedel et al. created two simple bag-of-words (BOW) representations for the text inputs: term frequency (TF) and term frequency-inverse document frequency (TF-IDF). Then they extracted from the data only the following: the TF vector of the headline, the TF vector of the body, and the cosine similarity between the l2-normalised TF-IDF vectors of the headline and body. They also tokenized the headline and body texts and derived the relevant vectors. For the TF vectors, the scholars extracted a vocabulary of the 5,000 most frequent words in the training set and excluded stop words. For the TF-IDF vectors, a vocabulary of the 5,000 most frequent words was defined on both the training and test sets and the same set of stop words was excluded. The scholars achieved a general accuracy of 81.72%.
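As an illustration of the feature setup described above (TF vectors of the headline and body plus the cosine similarity of their TF-IDF vectors, fed into an MLP with one hidden layer), a simplified sketch is given below. It assumes scikit-learn and uses invented example pairs; it is not the UCLMR team's actual implementation. The 5,000-word vocabulary is reflected by the max_features parameter.

```python
# Simplified sketch of UCLMR-style features (assumes scikit-learn); not the original code.
import numpy as np
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.neural_network import MLPClassifier

headlines = ["Russian airstrikes hit Aleppo hospitals",
             "Moscow announces Syria ceasefire"]
bodies    = ["Activists reported that Russian jets bombed two hospitals in Aleppo.",
             "The Russian defence ministry said a ceasefire would begin in Syria."]
stances   = ["agree", "discuss"]       # placeholder labels from the manual annotation

tf    = CountVectorizer(stop_words="english", max_features=5000).fit(headlines + bodies)
tfidf = TfidfVectorizer(stop_words="english", max_features=5000).fit(headlines + bodies)

tf_head, tf_body = tf.transform(headlines).toarray(), tf.transform(bodies).toarray()
ti_head, ti_body = tfidf.transform(headlines).toarray(), tfidf.transform(bodies).toarray()

# TfidfVectorizer l2-normalises its rows, so the dot product equals the cosine similarity
# between the TF-IDF vectors of the headline and the body.
cos_sim = np.sum(ti_head * ti_body, axis=1, keepdims=True)

X = np.hstack([tf_head, tf_body, cos_sim])        # one feature vector per headline/body pair
clf = MLPClassifier(hidden_layer_sizes=(100,), max_iter=500, random_state=0)
clf.fit(X, stances)                               # MLP with one hidden layer
```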

After the end of the challenge, researchers continued building models for classifying stance in news and compared their results to those of the winners of the competition. Hanselowski et al. (2018) reflect on the results of the three FNC winners' models. They compare the features and architectures used, which leads to the creation of a novel feature-rich stacked LSTM (Long Short-Term Memory) model that performs on the same level as the best systems of the competition, but is superior in predicting minority classes. An LSTM model is an artificial recurrent neural network (RNN) architecture which can process not only single data instances (words), but also entire sequences of data. The scholars also proposed a new evaluation metric for the FNC and related document-level stance detection tasks, which was less affected by highly imbalanced data sets (the data set offered by the FNC had the following percentages for the four labels: 77% "unrelated", 12% "discuss", 9% "agree", 2% "disagree"). According to Hanselowski et al. (2018), most of the related articles had the label "discuss", so when the classifier chose this label, the chance of finding a correct match was higher than when it chose "agree". At the end of this evaluation, the positions of the teams changed and the second team became the winner, since the model of the first team achieved higher accuracy at the competition thanks to the prevalence of one class over the others (if their model always labeled every news article with this dominant class, it would still have the highest accuracy). The scholars paid more attention to the minority classes ("agree" and "disagree"), which were more important for the research, and found that the second and the third teams' models were better at classifying them.

Hanselowski et al. (2018) upgraded the model of the team Athene (number two in the competition) and made it more accurate in predicting the minority labels without losing accuracy in predicting the majority labels. The final general accuracy of the upgraded model was 89%.

The supervised machine learning models developed for the Fake News Challenge are expected to be universal and to predict stance in any articles. However, during the challenge they were tested only on the data published by the organizers of the FNC (retrieved from unreliable sources which published "fake news"). After the competition, Hanselowski et al. (2018) tested all the models of the three winners on another data set, which consists of 188 debate topics with popular questions from the user debate section of the New York Times and users' comments on these topics (Hanselowski et al., 2018). The accuracy dropped significantly: for team number one it became 73%, and for the second and third teams 68% and 67% respectively.

For the practical part of this thesis, the model of the Athene team upgraded by Hanselowski et al. was chosen, since it achieved the highest accuracy after the upgrade. However, considering the fact that we are going to apply the model to "quality media" news on one particular topic, annotated according to the manual stance analysis rules (which have not previously been applied to annotating data sets for automatic stance analysis), the resulting accuracy may be much lower.


Chapter 2: Data collection and evaluation

For the manual and automatic stance analysis, two data sets are needed – one consisting of Russian news and one of global news. Each data set will include articles from two newspapers – one known as an independent news outlet, and another which has previously been accused of being biased. Since RBC and CNN are considered independent newspapers, while RT and "Al Jazeera" have received accusations of being subjective in the past, these four news media outlets were chosen for stance analysis in this research. The publication dates of the scraped articles will fall within the following period: from the beginning of June 2016 till the end of September 2016. In the previously collected data sets which were published on the Internet, only CNN articles for this period were found (as a part of the Kaggle "All the news" data set). Therefore, the websites of the other three newspapers will be scraped in order to get the relevant news for the defined period. Only the news articles which are relevant to the topic "Russia's participation in the Syrian civil war" will be used in this research.

First, we will extract the CNN news from the downloaded data set. Second, we will scrape news from the websites of "Al Jazeera", RBC and RT and store all the articles in two data sets – with Russian and global news. Scraping will be done using keyword filters on the websites of the news media. More specifically, among all the news of the defined period, we will use only the articles which contain words connected with Russia (e.g. "Russia, Russian, Moscow, Kremlin, Putin", etc.) and Syria (e.g. "Syria, Syrian, Bashar Assad, Aleppo", etc.).

2.1. Extracting CNN news from the Kaggle data set

"All the news" data set from Kaggle contains 143,000 articles from 15 American newspapers – "New York Times", "Breitbart", CNN, "Business Insider", "The Atlantic", "Fox News", "Talking Points Memo", "Buzzfeed News", "National Review", "New York Post", "The

2 ​All the news. (2017, 20 August). Retrieved from https://www.kaggle.com/snapcrack/all-the-news

(25)

Guardian", NPR, "Reuters", “Vox”, and "The Washington Post". It has the following variables: "id", "title" (article headline), "publication" (newspaper's name), "author", "date", "year", "month", "url" (link to this article on the newspaper's website), "content" (article text). The articles from this data set were published in the period from 2016 to July 2017.

To extract the CNN articles from the data set we created a Python script. In this program, we used the Python library Pandas (McKinney, 2012), which is designed for working with data sets and for data manipulation. First, the script selects the rows where the newspaper's name is "CNN". Second, it extracts the rows where the year is 2016 and the month is one of the following: June, July, August or September. Then it iterates through the column "content" in the extracted rows, makes all the letters lower-case and checks which of them contain both the keywords "Russia" and "Syria". Importantly, it does not limit the search to these two words, but looks for all combinations which can contain them (e.g. "Russian", "Syrian", etc.). We repeat the same process for the rows which contain both the Russian and the Syrian president's name ("Putin" and "Assad"), and for those which contain the Russian minister of foreign affairs' name ("Lavrov") in combination with "Syria". As a result, we got a new data set of 48 CNN news articles related to Russia and Syria.

3 "Pandas". Retrieved from https://pandas.pydata.org/
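A sketch of this extraction step is shown below. It assumes the Kaggle data set has been downloaded to a local csv-file (the file name is a placeholder) and reproduces the filtering logic described above, rather than the exact script written for the thesis.

```python
# Sketch of the CNN extraction step (assumes the Kaggle "All the news" csv has been downloaded).
import pandas as pd

df = pd.read_csv("all-the-news.csv")        # placeholder file name

# Keep CNN articles published between June and September 2016.
cnn = df[(df["publication"] == "CNN") &
         (df["year"] == 2016) &
         (df["month"].isin([6, 7, 8, 9]))]

content = cnn["content"].str.lower()
# Substring matching also captures derived forms such as "russian" or "syrian".
relevant = ((content.str.contains("russia") & content.str.contains("syria")) |
            (content.str.contains("putin") & content.str.contains("assad")) |
            (content.str.contains("lavrov") & content.str.contains("syria")))

cnn[relevant].to_csv("cnn_russia_syria.csv", index=False)
```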

2.2. Scraping news

Since we were not able to find the relevant data for RT, "Al Jazeera" and RBC, we had to build scraping scripts for each of the newspapers' websites (we will call these scripts "scrapers" further on). For all of them, we used the Python library "Selenium", which is designed for web scraping (Jain & Kaluri, 2015). This library has many advantages and supports multiple functionalities as compared with licensed automation tools. It allows the designed scripts to communicate with the browser directly with the help of native methods. For RT and "Al Jazeera" we also used the library "Newspaper3k" (Brena et al., 2019), which has built-in tools for getting the headline and the body of an article separately. However, when we applied it to RBC's website, this library failed to get the articles' text bodies, so we had to build a separate scraper according to its HTML structure.

4 "Selenium". Retrieved from https://selenium-python.readthedocs.io/
5 "Newspaper3k". Retrieved from https://newspaper.readthedocs.io/en/latest/

Before the scraping, we entered the keywords "Russia, Syria, war" into the search engine on each newspaper's website. After trying all the other possible combinations of keywords connected with the Russian participation in the Syrian war ("Putin", "Assad", "Lavrov", "Kremlin", "Moscow", "Damascus"), we concluded that there was no need to use them, because all the articles in which these words appeared also contained the words "Russia", "Syria" and "war".

For all three websites, we used a two-step scraping process. The web page with search results was scraped by the first script to get the links of the articles and to store the results in a csv-file. The second script (which was exactly the same for RT and "Al Jazeera") ran a loop which did the following for every link in the csv-file with the previously scraped URLs: it got the headline and body of every article and stored them in a Pandas data frame (a two-dimensional labeled data structure). After scraping all of the links from the file, it stored the scraped data in the final csv-file with the following columns: "title", "text", "url".
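The second script can be sketched as follows. This is an illustrative reconstruction rather than the original code; it assumes the "Newspaper3k" and Pandas libraries, and the csv file names are placeholders.

```python
# Sketch of the second scraping step, shared by RT and "Al Jazeera" (assumes newspaper3k and pandas).
import pandas as pd
from newspaper import Article

links = pd.read_csv("article_links.csv")["url"]     # links collected by the first script

rows = []
for url in links:
    article = Article(url)
    article.download()
    article.parse()                                  # newspaper3k separates headline and body
    rows.append({"title": article.title, "text": article.text, "url": url})

pd.DataFrame(rows, columns=["title", "text", "url"]).to_csv("articles.csv", index=False)
```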

2.2.1 Scraper for RT

For scraping RT's website we used the module "Webdriver" from the Python library "Selenium" (Salunke, 2014). In order to scrape the web page, we had to close all the pop-up windows first. Selenium has a function which automatically clicks on pop-up windows on the website (RT had two of them – a cookies window, which asks the user to accept cookies, and a subscription window, which can be closed by clicking on the "close" button). Only after closing these windows can the scraper start getting the data.

6 RT in Russian. Retrieved from: https://russian.rt.com/

RT's website has infinite scrolling: its content is loaded as the user scrolls and clicks on the "Load more" button. This means that we cannot get the data from the page without clicking on this button until the end (and the URL does not change after clicking either). Hence we created a loop in which the scraper clicks on the button as many times as needed. When the page is loaded until the end, it gets all the links on it and stores them in a csv-file. Then the second scraper (which was described before) scrapes the articles from each of the links and creates another file with the data.
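A simplified sketch of this link-collecting scraper is given below. The search URL, CSS selectors and number of clicks are illustrative placeholders (the real values depend on RT's page structure); only the general Selenium workflow described above is reproduced.

```python
# Sketch of the RT link scraper (assumes Selenium with a Chrome driver); selectors are placeholders.
import time
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://russian.rt.com/search")            # placeholder: page with the search results

# Close the cookies and subscription pop-up windows before interacting with the page.
for selector in (".cookies-accept", ".popup-close"):   # hypothetical selectors
    for button in driver.find_elements(By.CSS_SELECTOR, selector):
        button.click()

# Click "Load more" until the whole period is loaded.
for _ in range(50):                                     # number of clicks needed for the period
    buttons = driver.find_elements(By.CSS_SELECTOR, ".button_more")   # hypothetical selector
    if not buttons:
        break
    buttons[0].click()
    time.sleep(2)                                       # wait for the next batch of articles

links = [a.get_attribute("href")
         for a in driver.find_elements(By.CSS_SELECTOR, ".search-results a")]  # hypothetical
pd.DataFrame({"url": links}).to_csv("rt_links.csv", index=False)
driver.quit()
```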

2.2.2. Scraper for RBC

To scrape the links of the articles from RBC's website we used the same algorithm as for RT's website: first, automatically closing pop-up windows with Webdriver; second, scrolling and clicking the "load more" button with the script; and then getting the links of the articles using the names of the HTML tags.

7 News of the day about Russia and the world. Retrieved from: https://www.rbc.ru/

The second scraper, which goes to each of the links and scrapes the headline and the body of the article, is different from the one which we used for the RT and "Al Jazeera" websites. The "Newspaper3k" library gets them automatically for the two newspapers mentioned above, but for the RBC website it failed. So we created another scraper, which uses the Python libraries "Requests" (Chandra & Varanasi, 2015) and "Beautiful Soup" (Nair, 2014). "Requests" gets the HTML page of the article, and "Beautiful Soup" searches through the elements of the HTML tags. The scraper looks for the particular tags where the headline and the body of the article are stored, extracts the text from them and stores it in a csv-file.

8 Requests: HTTP for Humans. Retrieved from https://2.python-requests.org/en/master
9 Beautiful Soup Documentation. Retrieved from https://www.crummy.com/software/BeautifulSoup/bs4/doc
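The RBC article scraper can be sketched as follows. The tag and class names are placeholders rather than RBC's actual HTML structure; the sketch only illustrates the "Requests" plus "Beautiful Soup" workflow described above.

```python
# Sketch of the RBC article scraper (assumes requests, beautifulsoup4 and pandas); tags are placeholders.
import pandas as pd
import requests
from bs4 import BeautifulSoup

links = pd.read_csv("rbc_links.csv")["url"]

rows = []
for url in links:
    html = requests.get(url).text                          # "Requests" fetches the HTML page
    soup = BeautifulSoup(html, "html.parser")              # "Beautiful Soup" parses the tags
    title = soup.find("h1").get_text(strip=True)           # headline tag (placeholder)
    body = " ".join(p.get_text(strip=True)
                    for p in soup.select("div.article__text p"))   # body tags (placeholder)
    rows.append({"title": title, "text": body, "url": url})

pd.DataFrame(rows, columns=["title", "text", "url"]).to_csv("rbc_articles.csv", index=False)
```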

2.2.3. Scraper for "Al Jazeera"

The "Al Jazeera" website has a different structure from RT's and RBC's websites. While 10 the last two operate with user's scrolling, The "Al Jazeera"'s search engine returns multiple pages with page numbers with a specific number of articles per page as a result. It also does not allow the user to specify the date of publications. Therefore, in the scraper for this website, we had to go through all of the pages with the results for our keywords until we got the ones with the time period, we were interested in. The script works as follows: it clicks on the page numbers until it gets the first page of our time period (previously we manually found the numbers of the pages

7 News of the day about Russia and the world. Retrieved from: https://www.rbc.ru/ 8 ​Requests: HTTP for Humans. Retrieved from https://2.python-requests.org/en/master

9 ​Beautiful Soup Documentation. Retrieved from https://www.crummy.com/software/BeautifulSoup/bs4/doc 10 Breaking News, World News and Video from Al Jazeera. Retrieved from: https://www.aljazeera.com/

(28)

with news of our particular period). From that point, it starts scraping the links and finishes with the last page of our time period. Then it stores the results in a csv-file.
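A sketch of this paginated scraper is shown below. The search URL, the page range and the selectors are placeholders (the real page numbers were found manually, as explained above); it only illustrates the workflow.

```python
# Sketch of the "Al Jazeera" link scraper (assumes Selenium); page range and selectors are placeholders.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

FIRST_PAGE, LAST_PAGE = 12, 25                      # placeholder page range for the period

driver = webdriver.Chrome()
driver.get("https://www.aljazeera.com/search")      # placeholder: results page for "Russia, Syria, war"

links = []
for page in range(FIRST_PAGE, LAST_PAGE + 1):
    driver.find_element(By.LINK_TEXT, str(page)).click()              # click the page number
    for a in driver.find_elements(By.CSS_SELECTOR, "article a"):      # hypothetical selector
        links.append(a.get_attribute("href"))

pd.DataFrame({"url": links}).to_csv("aljazeera_links.csv", index=False)
driver.quit()
```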

2.3. The results from the data collection

The scraped articles and those extracted from the Kaggle data set are organized into two data sets: one with global news and another with Russian news. They do not have the same size: the Russian news data set includes 280 articles (92 from RBC and 188 from RT), and the global data set includes 200 articles (78 from CNN and 122 from Al Jazeera). Although the goal was to scrape only the news about Russia's participation in the Syrian civil war, the scraping results contained articles which included the words "Russia", "war" and "Syria" but were dedicated to topics different from the target set for this research. These news articles, irrelevant to the topic of the thesis, had to be removed from the data sets manually. The Russian data set had more irrelevant data, because the Russian government had established a law which obliged all newspapers mentioning ISIS (Islamic State) always to follow this abbreviation with the phrase "Islamic State, the terrorist organization, forbidden in Russia". Because of this, we initially scraped many news articles about ISIS which did not mention Russia except in this warning. The global news data set also had some irrelevant data. For instance, CNN articles were often dedicated to Donald Trump and his quotations about the presidential elections, in which he said something about the Syrian war and Russia. Sometimes Russia was mentioned in a context different from the Syrian war, especially when Trump was talking about the future collaboration of the country with the US.

We decided to keep the news articles which were not dedicated to Russia's participation in the Syrian war but mentioned Russia in the context of the conflict, because sometimes such a mention could disclose the newspaper's stance towards the country. For instance, RT published news about some events in the war without Russia's participation, but at the end of these articles, providing background to the war in general, it emphasized the important and positive role of the country in the war. At the same time, "Al Jazeera" could give the same background about the war while emphasizing the destruction caused by the Russian air forces.


We eliminated from both data sets all the news articles which were not relevant to our topic. In the end, the Russian news data set contained 263 articles (85 from RBC and 178 from RT), and the global news data set amounted to 173 articles (71 from CNN and 102 from Al Jazeera), 436 articles in total.


Chapter 3: Annotation

When using supervised machine learning, we need to split our data into a training and a testing set (Kotsiantis & Zaharakis, 2007). The training set should comprise 70-80% of the original data set and must be annotated manually, assigning to each news article one of three labels: "agree", "disagree" or "discuss". The "agree" label is chosen by the annotators if the journalist agrees with the claim she describes, the "disagree" label is used when she shows disapproval of it, and the "discuss" label is used when she describes the claim neutrally.
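A minimal sketch of such a split is shown below, assuming the annotated articles are stored in a csv file with hypothetical "Headline", "Body" and "Stance" columns; the 75% training share and the file name are illustrative choices within the 70-80% range mentioned above.

import pandas as pd
from sklearn.model_selection import train_test_split

# Load the manually annotated articles (column names are assumptions).
data = pd.read_csv("annotated_articles.csv")

# Keep roughly 75% for training and 25% for testing, preserving the
# proportion of "agree"/"disagree"/"discuss" labels in both subsets.
train_set, test_set = train_test_split(
    data,
    train_size=0.75,
    stratify=data["Stance"],
    random_state=42,
)

print(len(train_set), "training articles,", len(test_set), "testing articles")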

The problem in creating a theoretical framework for this research was the absence of annotation rules in previous academic studies about automatic stance analysis (Riedel, Augenstein & Spithourakis, 2017; Zeng, Zhou & Xu, 2017; Hanselowski et al., 2018; Ferreira, 2016). Therefore, we used previous publications about manual stance analysis of news to create a new set of rules (Teo, 2000; Vertommen et al., 2012). These annotation guidelines can also be used for other stance analysis research on news articles.

3.1 Data set description

In previous academic research on automatic stance analysis, the data sets included a large number of article bodies, while the number of "claims" (or article headlines) was nine times smaller (Hanselowski et al., 2018). This happened because one headline was connected with several news articles from different newspapers. It was useful for those particular incidents, because it could show the difference in several newspapers' stances towards each claim. However, for this approach the researcher should be sure that all the newspapers in her data set (or at least several of them) published articles on the same topics. Our data set could not fulfill this requirement. We have chosen newspapers mainly focused on Russia, the US, and the Middle East in order to see how often they demonstrate their stance in describing Russia's role in the Syrian war. This difference in their main focus also influenced their agendas.


For instance, CNN published a lot of articles dedicated to Trump talking about Syria and Russia, the Russian newspapers quoted plenty of Russian officials, and "Al Jazeera" was mostly focused on the military attacks carried out in Syria by the Russian air force. Since the claims of the articles varied a lot, we decided to choose another approach for this research: to create pairs of each article's headline and body.
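The sketch below illustrates what such headline-body pairs could look like once stored in a table; the column names, example values and output file name are purely illustrative.

import pandas as pd

# Each row pairs one article's headline with its own body and a stance label.
pairs = pd.DataFrame(
    [
        {"Headline": "Russian airstrikes hit Aleppo", "Body": "Full article text ...", "Stance": "discuss"},
        {"Headline": "Moscow defends Syria operation", "Body": "Full article text ...", "Stance": "agree"},
    ]
)
pairs.to_csv("headline_body_pairs.csv", index=False)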

3.2 Annotation methodology

Each article will be annotated by two annotators, and for each case of disagreement a third annotator will make the decision. Detailed annotation instructions with examples can be found in Appendix I. These guidelines are a compilation of the work of several researchers, such as Martin and White (2015), Fowler (1991), Teo (2000) and Fairclough (2010), brought together by the author of this thesis.

From the “Appraisal” framework, three categories of the “attitude” sub-system were chosen for the annotation instructions: “affect”, “judgement” and “appreciation”. The annotators will have to look for positive or negative assessments of human behavior and character by reference to ethics or morality and other systems of conventionalized or institutionalized norms (a rule based on the “judgement” category of “Appraisal”). The annotation instructions also define stance as positive or negative assessments of objects, artifacts, texts, states of affairs and processes in terms of how they are assigned value socially (a reference to aesthetics and other systems of social valuation), which are specified in the “appreciation” category of the “Appraisal” framework. The annotators will also have to pay attention to the meanings by which propositions are emphasized or mitigated, and to the meanings by which the boundaries of semantic categories can be blurred or sharpened (the “graduation” category). Stance will also be identified through adverbs which indicate the subjectivity of the journalist’s statement (the “engagement” category).

The annotation instructions also include rules based on previous research in CDA. Since there is no single framework for CDA, we used the patterns from studies in which it was applied to stance analysis of news. According to them, the annotators should determine whether the sources in an article are specific or vague (e.g. a vague source would be "some experts think"). Moreover, words which show how much the journalist trusts the source she refers to will also be considered an indicator of stance in this research.


Another criterion found in CDA research is the completeness of the information provided in the article. In other words, if the text is written in favor of one source while another one is barely presented, the article should be labeled with the "agree" category. The use of unnecessary descriptive words which are not relevant to the topic will also be considered an indicator of stance.

The annotators will also analyse the style and "tone" of the texts. A neutral narrative discourse indicates the "discuss" stance; evaluative words and judgments indicate an "agree" or "disagree" stance.

3.3. The inter-annotator agreement

Each of the two annotators annotated the data sets in a separate file. After the annotation, we created a Python script for calculating the level of agreement between them. From each file, the script extracted the columns with stances (the news articles in the annotators’ files were in the same order, so the indexes of the rows were the same) and compared them. First, it calculated the percentage of agreement, which was between 74% and 75%. However, we had to run a statistical test to decide whether this percentage was high or low. For this, we chose Cohen's Kappa statistic (Brennan & Prediger, 1981). Cohen's Kappa is a statistic which measures inter-annotator agreement for categorical values. The difference between Cohen's Kappa coefficient and the simple percentage of agreement is that Cohen's Kappa also takes chance agreement into account. It estimates the number of times the annotators could agree with each other "by chance"; such agreement is achieved by luck and should not be included in the final statistic. Cohen's kappa is calculated as follows: kappa = (observed agreement - chance agreement) / (1 - chance agreement).

The Python library scikit-learn has a function for calculating this statistic [11], so we applied it in the same script. The result calculated by the program was 0.46. The authors of this statistical model recommended characterizing test results over 0.75 as excellent, 0.40 to 0.75 as fair to good, and below 0.40 as poor (Fleiss & Cohen, 1973). Hence, according to them, our result is acceptable ("fair to good").

11. sklearn.metrics.cohen_kappa_score — scikit-learn 0.21.3 documentation. Retrieved from: https://scikit-learn.org/stable/modules/generated/sklearn.metrics.cohen_kappa_score.html


However, in order to make the annotation more valid, we will ask the third annotator to resolve the cases in which the first two annotators did not agree with each other.
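The following sketch reproduces the agreement calculation described above, assuming each annotator's file is a csv with a "Stance" column; the file and column names are illustrative.

import pandas as pd
from sklearn.metrics import cohen_kappa_score

# Load both annotators' files; rows are assumed to be in the same order.
annotator_1 = pd.read_csv("annotations_annotator1.csv")["Stance"]
annotator_2 = pd.read_csv("annotations_annotator2.csv")["Stance"]

# Simple percentage of agreement.
percent_agreement = (annotator_1 == annotator_2).mean() * 100

# Cohen's kappa corrects the raw agreement for agreement expected by chance.
kappa = cohen_kappa_score(annotator_1, annotator_2)

print(f"Agreement: {percent_agreement:.1f}%, Cohen's kappa: {kappa:.2f}")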

3.4. Complicated cases for human annotation

The cases in which the two annotators chose different labels for the same article were analyzed, because it is necessary to understand which stance patterns are confusing for coders. For instance, the annotators disagreed when the journalist referenced the source not immediately after reporting the claim at the beginning of the article, but only after providing further details. In this case, the reader could think that the information he read at the beginning was the actual stance of the newspaper, since the source was not specified immediately. However, the source was still mentioned, so it was a matter of interpretation.

In a similar case, which caused disagreement between the annotators, the journalist mentioned the source but did not specify that the information was provided by it. For instance, in the following article it is not clear whether the information in the first four sentences was provided by Dr. Hamza al-Khatib or whether he merely commented on it: "The "last hospital" in east Aleppo is a grim site. The building is crowded and the injured lay all over its corridors. Blood lines the floors, and its smell – along with the sound of screams and cries – fill the air. Much of the surrounding area has been destroyed. "It's a small hospital, but it's accepting lots of victims and the wounded daily, because it's the only hospital left in the city. This is putting a lot of pressure on it," Dr. Hamza al-Khatib told “Al Jazeera”."

Articles which were based entirely on a newsmaker's opinion were also interpreted differently by the annotators. The journalist cannot quote every phrase of the speaker, because it would worsen the readability of the article. However, when the author of the article paraphrases a subjective sentence without quotation marks or a reference, even though it is actually the newsmaker's opinion, an annotator may consider it the newspaper's stance.

The annotators also sometimes interpreted the sources' reliability differently. The guidelines specified that phrases such as "some experts say" or "some analysts imply" should not be considered real references, since they are generalizations that do not specify the actual source or explain why it is hidden. However, the annotators interpreted this rule in different ways. For instance, one of them considered "The Ministry of Internal Affairs said" an unreliable source, because it was not specified which representative of the MIA had actually said it. However, it is often the case that such information is provided by the MIA press secretary, even though the journalist does not specify this. Another example was an article based on a report written by an analytical organization. The author of the article properly specified the name of the organization and then wrote "the analysts imply" (meaning the analysts from that organization). However, one of the annotators interpreted it as a generalization of the source.

Another complicated case for the annotators was the interpretation of documents cited in news articles. For instance, journalists usually interpret the results of sociological surveys and look for trends rather than simply announcing the numbers from them. RBC published an article which starts with this phrase: "Vladimir Putin over the past year and a half has become less sympathetic to Russians, indifference is growing towards him, a survey by the Levada Center showed" [12]. One of our annotators labeled this article with the "discuss" stance, because it contains a proper reference to the survey. However, the other thought that the journalist showed the "agree" stance in it, because she does not directly retell the information from the source but draws conclusions about it.

The annotators could also disagree with each other when a journalist used a "double reference": when she referenced a source which itself referenced another source. Stance detection in such a three-layer structure becomes more complicated. For instance, in the following article RT references another newspaper that references the original document on which the article is based: "For the first time since 2006 Germany introduced a new defense doctrine. It says that Russia will present a challenge to European security in the foreseeable future if it does not change its policy. The German government approved a new defense doctrine on Wednesday, which has not been revised since 2006, when Russia was a partner, there was no civil war in Syria and Libya, there was no "Islamic State"..., and there was no conflict in Ukraine and no migration crisis, writes Deutsche Welle". One of our annotators labeled this article with "agree", because the journalist writes about the doctrine as if she had read it, while this information was actually retrieved from a German newspaper (so the author cannot be sure that the doctrine includes this information). The other annotator did not see any evidence of stance in this article.

12. Sociologists have noticed a decrease in the level of sympathy for Putin. Retrieved from: https://www.rbc.ru/politics/08/08/2016/57a730549a7947cd0f5e6c84


3.5. Results of the annotation

As our case study has shown, stance remains a complicated concept and depends a lot on the type of content and on the "human factor". First, defining stance is often a subjective decision, and even with thorough guidelines there are cases when annotators use their "human logic", which can work differently for two people. Second, classifying texts of the "quality press" is more complicated than classifying "fake news", because these newspapers cannot demonstrate their stance explicitly, so they do it in a hidden way, which can be confusing for annotators. While the discourse of "fake news" media is often very informal and emotional, which makes the stance in their articles obvious, the "quality press" is not allowed to use such language, being limited by many rules. This makes stance detection in their news much more difficult.
