Assessing political news quality: An automated comparison of political news quality indicators across German newspapers with different modalities and reach

Nicolas Mattis

Student number: 12283177

Research Master’s Thesis

Graduate School of Communication University of Amsterdam

Research Master in Communication Science

Supervised by Dr. Anne Kroon

Word count: 7,497

Abstract

In order to best perform their societal functions, news media must adhere to certain normative standards for news quality – especially when reporting about events with political significance. While various past studies have examined (political) news quality, they often differ in the indicators and operationalisations that they use, making it difficult to compare findings across studies. Hence, this thesis proposes a comprehensive framework for automatically measuring political news quality that is easily scalable and can be applied in various contexts as well as over longer timespans. It combines existing measures with newly developed classifiers that assess impartiality, thereby highlighting the potential that supervised machine learning has for journalism studies and providing a means for future studies to assess impartiality in an automated manner. Furthermore, this thesis generates new insights into differences in political news quality across German newspapers that differ in their reach (national vs. regional) and modality (online vs. offline). The results indicate that both modality and reach appear to affect newspapers’ performance in terms of political news quality indicators, even though these differences tend not to be particularly pronounced. While online newspapers in particular performed comparatively worse in terms of indicators such as actor diversity, impartiality, and emotionality, the results suggest that modality and reach alone are not sufficient to explain differences across news outlets. On the whole, this thesis highlights the potential that automated research methods have for future research into (political) news quality and urges scholars to employ and advance existing measures to provide a fuller picture of (political) news quality across countries, outlets and, maybe most importantly, over time.

Keywords: Automated content analysis, News quality, Impartiality, Diversity, Supervised machine learning

Introduction

Often referred to as the fourth estate, news media are widely regarded as crucial for well-functioning democracies (Jacobi, Kleinen-von Königslöw, & Ruigrok, 2016). Building on Locke (1967), Strömbäck (2005) argues that one can describe the relationship between news media and democracy as a social contract: Democracy creates the necessary conditions for news media to operate in, while news media contribute to democracy by providing relevant, high-quality information to both the public and the government, as well as by serving as a watchdog of a country’s institutions. To live up to those standards and inform the public both accurately and fairly, news outlets need to adhere to certain normative news quality standards such as diversity and impartiality (Urban & Schweiger, 2014).

Naturally, this raises the question of how well newspapers in a given media market adhere to such standards. While there is ample (comparative) research on different news quality indicators such as diversity, negativity, and objectivity (e.g. Burggraaff & Trilling, 2017; Humprecht & Esser, 2018; Jacobi et al., 2016; Masini et al., 2018), studies often differ in their choice and operationalisation of these indicators. Hence, this thesis proposes a comprehensive framework for assessing news quality through an automated content analysis (ACA) by combining existing measures with newly developed classifiers that assess three key indicators of impartiality on the article level.

ACA constitutes an efficient and affordable research methodology for the analysis of large bodies of data (Grimmer & Stewart, 2013) that can be applied to journalistic content in both an inductive and a deductive manner (Boumans & Trilling, 2016). Given that the field of journalism studies tends to largely neglect automated research methods (Boumans & Trilling, 2016), this thesis hopes to a) drive the field methodologically forward – by illustrating the potential of supervised machine learning (SML) and moving beyond mere case studies – and b) facilitate future comparative research by providing a means to assess and monitor

On a theoretical level, this thesis addresses concerns over an overall decrease in journalistic news quality that a number of scholars have voiced since new technological affordances and increased economic pressures have begun transforming traditional newspaper markets (e.g. Burggraaff & Trilling, 2017; Humprecht & Esser, 2018; Jacobi et al., 2016; Jungnickel, 2011; McManus, 2009; Plasser, 2005). The underlying argument of those concerns is that the current transformation of the newspaper market results in fierce competition for advertising revenue. In order to cope, newspapers attempt to boost their reach to attract advertisers, often at the expense of journalistic quality (McManus, 2009) – a process that scholars refer to as commercialisation (Jacobi et al., 2016) or tabloidization (Esser, 1999).

Commercialisation is often assumed to be especially pronounced in online news content (e.g. Burggraaff & Trilling, 2017). However, existing research on the effects of modality is inconclusive, as some researchers have found evidence for lower news quality online (e.g. Burggraaff & Trilling, 2017; Welbers, Van Atteveldt, Kleinnijenhuis, & Ruigrok, 2018), whereas others have found no notable differences (Ghersetti, 2014) or even contradictory ones (e.g. Humprecht & Esser, 2018). Other important factors that might affect news quality are the structure of a given media market (Esser & Umbricht, 2013) and the size of a newspaper (Masini et al., 2018). For example, Masini et al. (2018) claim that local newspapers can allocate fewer resources to quality reporting, especially about events on the national level. In light of these considerations, this thesis compares political news quality across German newspapers with different modalities (online vs. offline) and reach (national vs. regional). By unravelling the effects that those factors have, this thesis hopes to add to existing research by providing a clearer picture of political news quality in Germany.

In the following, this thesis will a) lay out the theoretical underpinnings of an automated news quality measurement framework, b) apply it to a sample of German newspapers, c) present the differences across newspapers with different reach and modalities, and d) close with implications of ACA and suggestions for future research.

Theoretical Framework

Political news quality and its indicators

What constitutes good political news? The answer to this question will likely depend on who answers it. As Urban and Schweiger (2014) argue, a journalist might judge an article’s quality by the effort that it took to produce, whereas a reader might simply judge it by how enjoyable it is to read. This study builds on McQuail’s (1992) notion of the ‘marketplace of ideas’ and takes a normative perspective on the quality of news accordingly. Following Urban and Schweiger (2014), it posits that high-quality political news should provide accurate and impartial information that gives room to a wide variety of relevant actors and their positions in order to enhance the public’s understanding of important political matters as well as broader societal debates. This perspective builds on Strömbäck’s (2005) idea of a “participatory democracy”, the notion that citizens should (be able to) participate in all aspects of political life. Naturally, to do so effectively, citizens need to have access to high quality political information – not only during and before elections, but all year round.

Over time, various media and journalism scholars have spelled out the elements that constitute (political) quality news. For example, Jungnickel (2011) identified seven quality criteria, namely lawfulness, accuracy, relevance, comprehensibility, transparency, impartiality, and diversity, with various sub-dimensions. Urban and Schweiger (2014) propose a somewhat similar, yet slightly more parsimonious model with six quality criteria: diversity, impartiality, relevance, comprehensibility, accuracy, and ethics. Although many analyses of news quality indicators have relied on manual content analyses (e.g. Esser & Umbricht, 2013; Masini et al., 2018; Ramírez de la Piscina, Gonzalez Gorosarri, Aiestaran, Zabalondo, & Agirre, 2015), several of those indicators can be assessed through ACA. In fact, a few studies have already done so (Burggraaff & Trilling, 2017; Jacobi et al., 2016). ACA constitutes a valuable research methodology in journalism studies as it a) significantly reduces the cost of traditional content analysis, b) provides a means to test hypotheses on a larger scale, and c) might even reveal insights that more traditional methods have missed (Boumans & Trilling, 2016). It also allows researchers to explore over-time developments with comparable ease. In the following, the four core dimensions of the automated measurement approach taken in this study, namely diversity, impartiality, emotionality, and comprehensibility, are discussed.

Diversity

“[D]iversity in public affairs coverage is crucial because the news media are expected to create a mediated public sphere that reflects the diversity of interests, voices, and views in society” (McQuail 1992, as cited in Humprecht & Esser, 2018, p. 1825). However, despite a sharp increase in literature on the topic of news diversity, the concept’s exact definition remains contested (Humprecht & Esser, 2018). Furthermore, diversity can be assessed at different levels of analysis, such as on the article- or newspaper-level (Masini et al., 2018).

Despite these issues, most studies agree on two core dimensions: viewpoint diversity and actor (or source) diversity (e.g. Masini et al., 2018; Urban & Schweiger, 2014; Voakes, Kapfer, Kurpius & Chern, 1996). While these dimensions are undoubtedly intertwined (Masini et al., 2018), they differ in the granularity of their operationalisations. Viewpoint diversity is a multidimensional and context-dependent concept that often refers specifically to frames (e.g. Benson, 2009). Automatically measuring viewpoint diversity is therefore a considerable challenge that exceeds the scope of this project (for an attempt see Czymara & van Klingeren, 2019). Actor diversity in contrast is a more straightforward concept in that it merely measures the quantity and range of different sources. Often, a differentiation is made between elite and non-elite sources (e.g. Humprecht & Esser, 2018). Other studies examine the proportional representation of governing and opposition parties (e.g. van Hoof et al., 2014). The logic underlying both approaches is that elite actors such as governing parties or their representatives are inherently more newsworthy and therefore covered more frequently than opposition parties or laypeople (Castells, 2009).

While the assumption that a greater variety of actors equals a greater variety of news does not necessarily hold true (Carpenter, 2010), actor diversity can have important implications for viewpoint diversity (Bennet, 1996), as it reveals to what extent different actors are given the space to shape public debates (Benson & Wood, 2015). In fact, Masini and van Aelst (2018) showed that actor and viewpoint diversity are strongly intertwined. Hence, actor diversity can be considered a necessary precondition for viewpoint diversity.

Existing research into actor diversity points towards several medium-specific differences. For example, Masini et al. (2018) found that overall, national newspapers exhibit greater actor diversity than local newspapers – supposedly due to differences in capital, staff, and resources (for contradicting findings see Voakes et al., 1996). Regarding differences between modalities, Burggraaff and Trilling (2017) argue that commercialisation affects online news outlets more strongly, as they a) face a higher degree of competition within a dynamic and distraction-provoking environment, b) exhibit a slightly different understanding of their journalistic roles, and c) profit from detailed insights into what types of articles generate the most attention, which allow them to fine-tune news accordingly. In line with this argument, they found online newspapers to amplify differences between elite and popular news outlets. Lastly, Jacobi et al. (2016) demonstrated that online news articles are more likely to focus on leaders and reference elites. Together, these insights motivate the following hypotheses:

H1: National newspapers exhibit greater degrees of actor diversity than local newspapers.

H2: The positive effect of national (vs. regional) newspaper types on actor diversity will be more pronounced for print than online news.

Impartiality

The notion of impartiality emerged as a journalistic norm in the early 20th

ever since (Maras, 2013). It is often equated with objectivity (Boudana, 2016; Maras, 2013), and remains one of the core principles that news editors and journalists around the world operate by (Maras, 2013). However, despite its popularity, impartiality still lacks a clear and agreed-upon definition and operationalisation (Cushion & Thomas, 2019). Prior examinations of impartiality have either examined journalists’ and editors’ selection processes (e.g. Cushion, Kilby, Thomas, Morani, & Sambrook, 2018), or zoomed in on specific indicators such as the proportion of different sources or the use of and elaboration on statistics (Cushion, Lewis, & Callaghan, 2017; Wahl-Jorgensen, Berry, Garcia-Blanco, Bennett, & Cable, 2017).

This thesis focuses on impartiality on the content level. It builds on Urban and Schweiger’s (2014) definition of impartiality as “a neutral and balanced coverage of all facts, demands and positions” (p. 823). Accordingly, it employs three key indicators to assess impartiality: neutrality, balance of viewpoints, and balance of sources. These dimensions are taken from Urban and Schweiger (2014) and lend themselves rather well to content analysis, as articles can be coded according to the presence or absence of each dimension. In its purest form, balance is defined as “the allocation of equal space to opposing views” (Cox, 2007, as cited in Wahl-Jorgensen et al., 2017, p. 783). However, systematically balancing sources and viewpoints might still distort reality and introduce artificial balance (Boudana, 2016). A good illustration of this is climate change journalism: Balancing believers and deniers, as has frequently been done in news media (Hiles & Hinnant, 2014), creates a false image of an open debate that is arguably worse than, for example, a “’weight of evidence’ approach” (Cushion & Thomas, 2019, p. 395). For this reason, this thesis operationalises balance in terms of whether or not an article gives room to challengers of the central actor. An article that does so arguably depicts at least a limited range of sources and viewpoints that a) constitute an attempt by the journalist to create a certain degree of balance, and b) expose citizens to a certain range of views. Neutrality refers to a lack of evaluation by the author, which relates directly to the notion of an objective reporting style (Maras, 2013).

Given the various operationalisations of impartiality, specific insights into differences in impartiality among German newspapers are still missing. Hence, a research question is formulated.

RQ1: Does impartiality differ depending on a) the modality (online vs. print) and b) the type (national vs. regional) of newspaper outlets?

Emotionality & negativity

Various scholars have argued that an increased use of emotions might be one of the ways in which news media react to the economic pressures they are facing (e.g. Burggraaff & Trilling, 2017; Jacobi et al., 2016), as emotional news is more likely to grab people’s attention, therefore maximising readership and advertising revenue (Burggraaff & Trilling, 2017; McManus, 2009). This thesis conceptualises emotionality as a bi-polar concept with positivity on the one, and negativity on the other side of the spectrum. Arguably, negativity has received considerably more scholarly attention than positivity. Negative information has repeatedly been shown to attract more attention and be better remembered (Knobloch-Westerwick, Mothes, & Polavin, 2020; Soroka & McAdams, 2015). This so-called negativity bias provides a strong incentive for journalists to use negativity strategically in order to attract attention. While research also suggests a certain demand for it (Shoemaker & Cohen, 2006, as cited by Burggraaff & Trilling, 2017), similar effects cannot be claimed for positive news. However, it can be argued that strongly positive news still deviates from the ideal of neutrality that is traditionally valued in political news (Jacobi et al., 2016).

Existing research into emotionality has shown that a) regional newspapers employ comparatively high levels of negativity (Boukes & Vliegenthart, 2020), b) print news tends to be more positive than online news (Burggraaff & Trilling, 2017), and c) emotionality is less

to higher reliance on agency material among online newspapers (Jacobi et al., 2016; Welbers et al., 2018). Taken together, these findings motivate the following hypotheses as well as an explorative research question that addresses emotionality across news outlets with a different reach:

H3a: Online news will feature more negativity than print news.

H3b: Regional news will feature more negativity than national news.

H4: Online news will feature less emotionality than print news.

RQ2: (To what extent) does emotionality differ between national and regional newspapers?

Comprehensibility

Especially in light of the vast amount of literature that suggests that the average citizen lacks a detailed understanding of politics (Lau & Redlawsk, 2001), it is easy to argue that in order to live up to their ideal societal role, news media need to convey information in an understandable fashion. Although comprehensibility is determined by several factors such as coherence, conciseness or the use of additional stimuli (see Urban & Schweiger, 2014), this thesis focuses exclusively on readability. Readability refers to how easy or difficult it is to read a given text, thereby capturing quite closely what Urban and Schweiger (2014) term simplicity. Readability has been linked to newspaper circulation in Germany in the past (Schoenbach & Lauf, 2002), as it constitutes not only a normative ideal to evaluate news by, but also appears to be a factor that affects audience evaluation and readership (Humprecht & Esser, 2018). Thus, readability constitutes an important aspect of comprehensibility that can be measured reliably. The readability of German dailies appears to be comparatively high (Björnsson, 1983), but the literature does not yet reveal generalisable differences between various types of outlets. Hence, potential differences are explored by means of the following research question.

RQ3: (To what extent) do German news media differ in their readability depending on a) reach (national vs. regional) and b) modality (online vs. print)?

The Framework

Taken together, these quality dimensions result in a comprehensive framework for automatically assessing news quality (see Figure 1). The framework combines various quality criteria that are largely laid out by Urban and Schweiger (2014) (see Appendix A) and, despite being incomplete, allows establishing a benchmark for assessing news quality in a resource-effective manner.

Figure 1. Framework for automatically assessing news quality.

Methodology

This thesis combined several computational methods in order to tap into four indicators of news quality. Due to a lack of automated measurements for impartiality, a new measurement approach was developed through manual content analysis (MCA) and SML. For the other three quality indicators, this thesis built on and partly adapted previous work (e.g. Burggraaff & Trilling, 2017; Jacobi et al., 2016; Masini et al., 2018).

Sample

The final sample consisted of 11,491 political news articles that were gathered from six German newspapers’ online and print editions over a seven-week period between the 20th of April 2020 and the 8th of June 2020. 8,077 duplicated or incorrectly scraped articles were deleted from the initial dataset (N= 19,568). The newspapers had either a national (“Die Welt”, “Die Süddeutsche”, “Der Tagesspiegel”) or a regional scope (“Aachener Zeitung”, “Rheinische Post”, “Stuttgarter Zeitung”). The national newspapers are usually referred to as elite newspapers (e.g. Masini et al., 2018). For the regional newspapers, a distinction between elite and popular is more difficult (Boukes & Vliegenthart, 2017), if applicable at all.

Importantly, the sampling was conducted during the height of the Covid-19 crisis. As a result, the article content might differ uniquely from comparable samples.

The German media market

Furthermore, it is important to consider two particularities of the German media market that might affect the results and their comparability to other studies. First, German newspapers perform comparatively well in terms of news quality, as they profit from strong levels of professionalisation and institutionalised self-regulation (Hallin & Mancini, 2004), a media culture that values the notion of a marketplace of ideas (Esser & Brüggemann, 2010), and a strong public broadcast sector that appears to have spillover effects on other media (Humprecht & Esser, 2018). Moreover, the challenges brought about by declining readership, increased competition, and the internet are less pronounced in Germany than they are in many other countries (Brüggemann, Engesser, Büchel, & Castro, 2016).

Second, German regional newspapers are not per se localised, but often cover a wide range of topics and reach comparably high levels of readership (Humprecht & Esser, 2018). In fact, regional newspapers constitute about 75% of the total market and even quality papers such as the Süddeutsche “draw a large chunk of their readership from their […] area” (Esser & Brüggemann, 2010, p. 40f). Hence, differences that have been found in other European countries might be less pronounced in the German market.

Data collection

All online content was gathered by means of RSS scrapers within the inca infrastructure for automated content analysis (Trilling et al., 2018). All scrapers were written by the author prior to the data collection. The scrapers accessed each newspaper’s RSS feed on an hourly basis and checked whether new articles were available. If so, the key elements of each article (date, title, teaser, text, category, author) were downloaded, parsed, and stored in a database. However, due to server issues during the sampling period, only very few articles were scraped in the first month (see Figure 2). For the final sample of political online news articles (N= 1,072), only articles that were published in the politics section of a given newspaper were retained. The scrapers are available in a public GitHub repository together with the rest of the code that was run for this thesis (https://github.com/nickma101/Thesis).
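The hourly check described above can be illustrated with Python's standard library alone. The following is a minimal sketch and not the inca scrapers themselves: the feed snippet, the `seen_links` store, and the function name `new_political_items` are assumptions made for this example, and the section filter is applied at collection time purely for brevity.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 snippet standing in for a newspaper feed.
SAMPLE_FEED = """<rss version="2.0">
  <channel>
    <title>Beispielzeitung</title>
    <item>
      <title>Bundestag debattiert Haushalt</title>
      <link>https://example.org/politik/haushalt</link>
      <category>Politik</category>
      <pubDate>Mon, 20 Apr 2020 08:00:00 +0200</pubDate>
    </item>
    <item>
      <title>Derby endet unentschieden</title>
      <link>https://example.org/sport/derby</link>
      <category>Sport</category>
      <pubDate>Mon, 20 Apr 2020 09:00:00 +0200</pubDate>
    </item>
  </channel>
</rss>"""


def new_political_items(feed_xml, seen_links, section="Politik"):
    """Return items from the given section that have not been stored yet."""
    root = ET.fromstring(feed_xml)
    fresh = []
    for item in root.iter("item"):
        link = item.findtext("link", default="").strip()
        category = item.findtext("category", default="").strip()
        # Skip other sections and anything collected on an earlier run.
        if category != section or link in seen_links:
            continue
        seen_links.add(link)
        fresh.append({
            "title": item.findtext("title", default="").strip(),
            "link": link,
            "category": category,
            "date": item.findtext("pubDate", default="").strip(),
        })
    return fresh


seen = set()
first_run = new_political_items(SAMPLE_FEED, seen)
second_run = new_political_items(SAMPLE_FEED, seen)  # nothing new on re-check
```

Keeping a set of already stored links is one simple way to make repeated hourly runs idempotent; a database-backed scraper would typically perform the same check against stored records.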

Figure 2. Number of sampled online and print political news by publication date

Print articles were accessed through Nexis Uni, downloaded manually in sets of 100 articles at a time, and parsed with the LexisNexisTools package (Gruber, 2020) in R. Articles were selected if at least one of the following terms was present in Nexis Uni’s classification section: politik, politische, politisch, partei, parteien, landtag, bundestag, regierung, wahl, wahlen. Arguably, this sampling procedure resulted in a broader scope of articles than the category-based sampling for the online articles. Together with the server issues, this might explain the stark difference in the number of print (N= 10,419) and online news articles (N= 1,072) in the sample. For an overview of the final sample distribution see Appendix B.

Data pre-processing

In order to remove unnecessary noise within the data, several data cleaning steps were performed prior to the hypothesis testing. All article texts were processed with the Python packages spaCy (Honnibal & Montani, 2017) and NLTK (Bird, Klein, & Loper, 2009) in order to remove duplicates, formatting errors, and articles that had not been scraped correctly (e.g. because they were behind a paywall). In addition, a second version of the article text was created by removing stop words (words that are very frequent but not important for the meaning of a sentence) and reducing all words to their stems. This step was necessary to improve the accuracy of the emotionality analyses as well as the overall performance of some of the impartiality classifiers.
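The cleaning steps can be sketched as follows. This is an illustration under simplifying assumptions rather than the code used in the thesis: the stop word list is a tiny hypothetical sample, and `crude_stem` is a naive suffix stripper that merely stands in for a proper German stemmer such as the ones shipped with NLTK.

```python
import re

# Tiny illustrative stop word list; a real analysis would use a full German list.
STOP_WORDS = {"der", "die", "das", "und", "ist", "in", "den", "hat", "eine"}

# Naive suffix stripping as a stand-in for a proper German stemmer.
SUFFIXES = ("ungen", "ung", "en", "er", "es", "e", "n", "s")


def crude_stem(word):
    """Strip the first matching suffix if at least four characters remain."""
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 4:
            return word[: -len(suffix)]
    return word


def clean_text(text):
    """Lowercase, tokenise, drop stop words, and stem the remaining tokens."""
    tokens = re.findall(r"[a-zäöüß]+", text.lower())
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]


def drop_duplicates(articles):
    """Keep the first copy of each article, ignoring case and whitespace."""
    seen, unique = set(), []
    for text in articles:
        key = " ".join(text.split()).lower()
        if key not in seen:
            seen.add(key)
            unique.append(text)
    return unique


articles = [
    "Die Regierung hat eine neue Verordnung beschlossen.",
    "Die  Regierung hat eine neue Verordnung beschlossen.",  # whitespace variant
    "Der Bundestag debattiert über die Parteien.",
]
unique_articles = drop_duplicates(articles)
cleaned = [clean_text(a) for a in unique_articles]
```

Keeping both the original and the cleaned token lists mirrors the two text versions described above, so that later analyses can be run on either one.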

Independent variables

The two independent variables under study were the reach and the modality of a given news article. An article’s reach (M= .56, SD= .50) was determined by the newspaper that published it and assessed by means of a dummy variable that was coded as one for national and zero for regional newspapers. Similarly, an article’s modality (M= .91, SD= .29) was assessed by means of a dummy variable that was coded as one for print and zero for online articles.
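Because both variables are dummies, their means equal the share of articles coded as one, and their standard deviations follow directly from that share. The short sketch below reproduces the reported modality statistics from the print and online article counts given in the data collection section; it assumes a population standard deviation, which at this sample size is indistinguishable from the sample version after rounding.

```python
import math

# Article counts reported for the final sample.
n_print, n_online = 10_419, 1_072

# Modality dummy: 1 = print, 0 = online.
modality = [1] * n_print + [0] * n_online

# The mean of a 0/1 variable is the share of ones.
mean_modality = sum(modality) / len(modality)

# The population SD of a 0/1 variable equals sqrt(p * (1 - p)).
sd_modality = math.sqrt(mean_modality * (1 - mean_modality))

print(round(mean_modality, 2), round(sd_modality, 2))  # prints: 0.91 0.29
```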

Dependent variables

Following Masini et al. (2018), actor diversity was assessed as a count variable on the article level. This thesis differentiated four actor types, namely political elite actors, political opposition actors, persons, and organisations. It thereby accounted for the frequency of not only different types of political actors, but also laypeople and non-political organisations. All actors were detected through spaCy’s NER feature. If an entity that spaCy had classified as a person was present in one of the manually created political actor lists (see https://github.com/nickma101/Thesis), it was coded accordingly. If not, it was coded as a generic person with no particular political significance. For each actor group, the overall number of references to their respective actors was calculated. Next, a dummy variable was created for each entity group with the value one if at least one actor from this group was named and zero if not. Lastly, the four dummy variables were added together into an index that ranged from zero (no actor groups mentioned) to four (all actor groups mentioned) (M= 2.55, SD= .80).
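The construction of the index can be sketched in a few lines. The example assumes that a named entity recognition step has already produced (name, label) pairs; the two actor lists, the `PER`/`ORG` labels, and the function name `actor_diversity` are illustrative assumptions and do not reproduce the lists or code in the repository.

```python
# Hypothetical actor lists; the thesis relies on manually compiled lists.
ELITE_ACTORS = {"Angela Merkel", "Olaf Scholz"}
OPPOSITION_ACTORS = {"Christian Lindner", "Annalena Baerbock"}


def actor_diversity(entities):
    """Return per-group reference counts and the 0-4 diversity index.

    entities: iterable of (name, label) pairs, with label 'PER' for
    persons and 'ORG' for organisations (assumed NER output format).
    """
    counts = {"elite": 0, "opposition": 0, "person": 0, "organisation": 0}
    for name, label in entities:
        if label == "PER":
            if name in ELITE_ACTORS:
                counts["elite"] += 1
            elif name in OPPOSITION_ACTORS:
                counts["opposition"] += 1
            else:
                counts["person"] += 1  # generic person, no political role
        elif label == "ORG":
            counts["organisation"] += 1
    # One dummy per group: 1 if the group is mentioned at least once.
    dummies = {group: int(n > 0) for group, n in counts.items()}
    return counts, sum(dummies.values())


example_entities = [
    ("Angela Merkel", "PER"),
    ("Angela Merkel", "PER"),
    ("Erika Mustermann", "PER"),
    ("Bundesbank", "ORG"),
]
counts, index = actor_diversity(example_entities)
```

In this example, three of the four groups are present, so the index is three even though the elite actor is referenced twice: the index captures the range of actor groups, not how often each is mentioned.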

Impartiality was defined as a balanced coverage of relevant sources and viewpoints in combination with an author that refrains from personal evaluation. As laid out in the theoretical framework, balance was operationalised in terms of whether or not an article gave room to challengers of the central actor. By using a definition that expects journalists to provide more than just a single view and source for a particular topic or standpoint rather than to achieve a (near-) perfect balance, this thesis hoped to avoid earlier mentioned fallacies of assessing balance.

Impartiality was assessed through three indicators: 1) the presence/absence of balanced viewpoints (“Is the standpoint of the central political actor challenged by another actor in the text?”), 2) the presence/absence of balanced sources (“Does the article quote two or more different types of political actors - e.g. a national elite and a national opposition actor?”), and 3) the presence/absence of evaluation by the author (“Does the author personally evaluate anything within the article?”). Added together, these indicators amount to a four-point impartiality index that ranges from a minimum of zero for not impartial, to a maximum of three for very impartial (M= 1.39, SD= .80).
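As a small worked example, the index can be computed from the three binary codes of an article. The sketch assumes that the third indicator enters the index as neutrality, that is, as the absence of author evaluation; this reverse coding is an interpretation of the description above, and the function name is hypothetical.

```python
def impartiality_index(balanced_viewpoints, balanced_sources, author_evaluates):
    """Sum three 0/1 codes into a 0-3 impartiality score for one article."""
    for code in (balanced_viewpoints, balanced_sources, author_evaluates):
        if code not in (0, 1):
            raise ValueError("all indicators must be coded 0 or 1")
    # Assumption: neutrality is the absence of evaluation by the author.
    neutrality = 1 - author_evaluates
    return balanced_viewpoints + balanced_sources + neutrality


# A challenged standpoint, two actor types quoted, no author evaluation.
very_impartial = impartiality_index(1, 1, 0)

# A single unchallenged source combined with author evaluation.
not_impartial = impartiality_index(0, 0, 1)
```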

Manual content analysis

Given the nuance that was necessary for assessing these indicators, dictionary-based measures did not suffice to accurately determine to what extent an article was impartial. Hence, impartiality was assessed through an SML approach, where binary classifiers were trained on manually coded training material. The manual content analysis was performed by a set of four student coders, the researcher being one of them. Before the final coding, all coders received training and the codebook (see Appendix C) was amended in accordance with the problems that had emerged during this training. Overall, a total of 487 articles were coded into three binary categories. See table 1 in Appendix E for their distribution.

Intercoder reliability

Several intercoder reliability tests were performed to ensure a sufficient level of reliability. Overall, three different sets of articles (Datasets A, B, and C) were checked for intercoder reliability: a) a subsample (N= 25) of the initial print data (N= 250) for all coders, b) a subsample (N= 15) of the online data (N= 150) for two coders, and c) a subsample (N= 10) of a second set of print articles (N= 98) for another two coders. The indicator “neutrality” proved to be reliable across all datasets and coders with a Krippendorff’s alpha of .79 or higher and a Cohen’s Kappa of .60 or higher. However, the other two indicators were less reliable. For balance of actors, dataset A (α = .63) and dataset B (α = .61) failed to meet the recommended intercoder reliability threshold of .667 (Neuendorf, 2002). For balance of viewpoints, the same was true for dataset A (α = .53). For a detailed overview of all results see Appendix D.

Given that SML relies on highly reliable data, coders who did not achieve acceptable results in a dataset were excluded from the training data. Specifically, coder 4 was excluded from the training data for balance of both actors and viewpoints, due to the comparably low Cohen’s Kappa results (see Appendix D, table 1). For the same reason, coder 3 was excluded from the online training data for balance of viewpoints (see Appendix D, table 3).
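For readers unfamiliar with the pairwise statistic used here, Cohen’s Kappa corrects the observed agreement between two coders for the agreement expected by chance. The following sketch implements the standard formula with the Python standard library; the two code sequences are invented for illustration and are not taken from the reliability data.

```python
from collections import Counter


def cohens_kappa(codes_a, codes_b):
    """Cohen's Kappa for two coders who coded the same units."""
    if len(codes_a) != len(codes_b) or not codes_a:
        raise ValueError("both coders must code the same non-empty set of units")
    n = len(codes_a)
    # Share of units on which the two coders agree.
    observed = sum(a == b for a, b in zip(codes_a, codes_b)) / n
    # Chance agreement from each coder's marginal category frequencies.
    freq_a, freq_b = Counter(codes_a), Counter(codes_b)
    categories = set(freq_a) | set(freq_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / n ** 2
    if expected == 1:
        return 1.0  # both coders used a single identical category
    return (observed - expected) / (1 - expected)


# Invented binary codes for ten articles.
coder_1 = [1, 1, 0, 0, 1, 0, 1, 1, 0, 0]
coder_2 = [1, 1, 0, 0, 1, 0, 1, 0, 0, 1]
kappa = cohens_kappa(coder_1, coder_2)
```

In this invented example the coders agree on eight of ten articles (observed agreement .80) while chance agreement is .50, which yields a Kappa of about .60.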

Classifier training & prediction

The classifiers were trained in Python using the scikit-learn (sklearn) package. To do so, the training data for each variable was split into a training (80%) and a validation set (20%). The article text was represented in the form of vectors. Specifically, four types of vectors were created and compared: 1) count vectors, 2) Term Frequency-Inverse Document Frequency (TF-IDF) vectors with unigrams, 3) TF-IDF vectors with bigrams, and 4) TF-IDF vectors with both uni- and bigrams. For each indicator, four different types of classifiers were tested: 1) a stochastic gradient descent classifier, 2) a naïve Bayes classifier, 3) a support vector machine classifier, and 4) a k-nearest neighbour classifier. All classifiers were cross-validated and their hyperparameters were tuned using either grid search or randomised search. Furthermore, all classifiers were trained on both the original and the clean text to compare their performance.
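The comparison of vector types for a single classifier can be sketched with scikit-learn as follows. This is a minimal illustration on invented toy sentences, not the training script of the thesis: the texts, labels, the two-fold cross-validation, and the macro-averaged F1 scoring are assumptions, while the stochastic gradient descent settings follow the parameters reported for the balance of viewpoints classifier.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import SGDClassifier
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline

# Invented toy data standing in for the manually coded articles
# (1 = indicator present, 0 = indicator absent).
topics = ["Haushalt", "Klimaschutz", "Rente", "Bildung", "Migration",
          "Gesundheit", "Verkehr", "Steuern", "Wohnen", "Digitalisierung"]
texts = (
    [f"Die Opposition kritisiert den Plan der Regierung zum Thema {t}" for t in topics]
    + [f"Die Regierung stellt ihren Plan zum Thema {t} vor" for t in topics]
)
labels = [1] * len(topics) + [0] * len(topics)

# 80% training data, 20% validation data, stratified by label.
X_train, X_val, y_train, y_val = train_test_split(
    texts, labels, test_size=0.2, stratify=labels, random_state=8
)

pipeline = Pipeline([
    ("vec", CountVectorizer()),
    ("clf", SGDClassifier(loss="hinge", alpha=0.0001, max_iter=200, random_state=8)),
])

# The four vector representations that are compared.
param_grid = {
    "vec": [
        CountVectorizer(),
        TfidfVectorizer(ngram_range=(1, 1)),
        TfidfVectorizer(ngram_range=(2, 2)),
        TfidfVectorizer(ngram_range=(1, 2)),
    ]
}

# Cross-validated grid search over the vector types (cv=2 only because
# the toy data set is tiny).
search = GridSearchCV(pipeline, param_grid, cv=2, scoring="f1_macro")
search.fit(X_train, y_train)
predictions = search.predict(X_val)
```

Swapping the `clf` step for another estimator, or adding classifier hyperparameters to `param_grid`, would extend the same pattern to the other classifier types and to hyperparameter tuning.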

Table 1. Best text classification results for impartiality indicators

Indicator | Classifier | Text | Vector type | Categories | Precision | Recall | F1 |
Balance of viewpoints | Stochastic gradient descent | Original | Count | 0 (N=204) | .82 | .68 | .75 |
 | | | | 1 (N=99) | .52 | .70 | .60 | .69
Balance of actors | K-nearest neighbour | Original | TF-IDF with uni- & bigrams | 0 (N=297) | .77 | .73 | .75 |
 | | | | 1 (N=152) | .43 | .48 | .46 | .66
Neutrality | Support vector machine | Clean | Count | 0 (N=204) | .68 | .60 | .63 |
 | | | | 1 (N=283) | .72 | .79 | .75 | .70

Classifier parameters as follows: 1) Balance of viewpoints: loss="hinge", alpha = .0001, max_iter=200, random_state=8, 2) Balance of actors: default settings, 3) Neutrality: default settings

The final evaluation of the classifiers was based on their precision, recall, and F1-score. Preference was given to balanced results, as both categories were equally important for all indicators. Overall, the distribution of the predicted categories mirrored the distribution of the manually coded categories, except for balance of actors, where the trained classifier reversed the two categories’ distribution (see table 5 in Appendix E). Table 1 provides an overview of the best classification results per indicator as well as the text versions and vector representations that they worked best on. For additional information, see tables 2 through 4 in Appendix E.
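For readers less familiar with these evaluation metrics, the sketch below shows how precision, recall, and the F1-score for a single category follow from the counts of true positives, false positives, and false negatives. The function name and the counts are hypothetical and are not taken from the thesis data.

```python
def precision_recall_f1(tp, fp, fn):
    """Return (precision, recall, F1) for one category from raw counts."""
    # Precision: share of predicted positives that are correct.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    # Recall: share of actual positives that were found.
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    # F1: harmonic mean of precision and recall.
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical counts: 8 correct hits, 2 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=2)
print(round(p, 2), round(r, 2), round(f1, 2))
```

Because F1 is a harmonic mean, it drops sharply when either precision or recall is low, which is one reason to prefer balanced results when both categories matter equally.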

Emotionality was defined as “the presence of positivity and/or negativity as opposed to the absence of both” (Burggraaff & Trilling, 2017, p. 6) in a given news article. It was assessed on the article level through dictionary-based counting of positive or negative words. All analyses were performed based on the Rauh sentiment dictionary (Rauh, 2018), which has been specifically developed for application to political texts. It augments two more general sentiment dictionaries, namely SentiWS (Remus, Quasthoff, & Heyer, 2010) and GPC (Waltinger, 2010), and allows for a better and more valid measurement of sentiment in political texts (Rauh, 2018). To account for article length, the number of emotional words was divided by the number of words in a text, resulting in a final emotionality ratio that was used for the hypothesis testing (M= .11, SD= .03).

In addition to emotionality, this study also measured negativity. Negativity was assessed through the same dictionary-based procedure as emotionality, where all negative words in a text were counted based on Rauh’s (2018) sentiment dictionary. Dividing the sum of negative words by the number of words in an article resulted in a final negativity ratio (M= .05, SD= .02) that was used for the hypothesis testing.
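The two ratios described above can be sketched as follows. This is a simplified illustration under stated assumptions: the two word sets are tiny hypothetical stand-ins for the Rauh (2018) dictionary, tokenisation is a plain regular expression, and any preprocessing beyond lowercasing is ignored.

```python
import re

# Hypothetical mini-dictionaries; the thesis relied on Rauh's (2018) dictionary.
POSITIVE = {"gut", "erfolg", "stark"}
NEGATIVE = {"krise", "schlecht", "streit"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"\w+", text.lower())


def sentiment_ratios(text):
    """Return (emotionality ratio, negativity ratio) for one article."""
    tokens = tokenize(text)
    if not tokens:
        return 0.0, 0.0
    n_pos = sum(token in POSITIVE for token in tokens)
    n_neg = sum(token in NEGATIVE for token in tokens)
    # Emotionality: positive and negative words relative to article length.
    emotionality = (n_pos + n_neg) / len(tokens)
    # Negativity: negative words only, relative to article length.
    negativity = n_neg / len(tokens)
    return emotionality, negativity


emo, neg = sentiment_ratios("Die Krise ist schlecht aber der Erfolg ist gut")
print(round(emo, 2), round(neg, 2))
```

Dividing by the token count is what makes short and long articles comparable, mirroring the length correction described in the text.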

Readability (M= 40.68, SD= 10.67) was used as a single indicator for news articles’ comprehensibility. It was measured with the Flesch reading ease score (FRE), which assigns different weights to a text’s average sentence length (ASL) and the average number of syllables per word (ASW). It was computed with the textstat Python package (https://github.com/shivam5992/textstat). For German texts, the package relies on Amstad’s (1978) adapted formula:

$$\mathrm{FRE}_{\mathrm{German}} = 180 - \mathrm{ASL} - 58.5 \times \mathrm{ASW}$$

The FRE has been shown to be almost identical to other similar readability measures (Štajner, Evans, Orasan, & Mitkov, 2012) and has been applied to news articles before in various countries (e.g. Amstad, 1978; Dalecki, Lasorsa, & Lewis, 2009; Plavén-Sigray, Matheson, Schiffler, & Thompson, 2017). It ranges from a minimum of zero (very difficult) to a maximum of 100 (very easy).
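To make the readability score more concrete, the sketch below computes the commonly cited Amstad adaptation of the FRE, 180 - ASL - 58.5 × ASW, by hand. It is only an approximation of what a package such as textstat does: the sentence splitter and the vowel-group syllable counter are crude assumptions introduced here for illustration, so scores will differ from those of a proper syllable counter.

```python
import re


def amstad_fre(asl, asw):
    """Amstad's German Flesch reading ease from ASL and ASW."""
    return 180.0 - asl - 58.5 * asw


def count_syllables(word):
    # Crude heuristic (assumption): every run of vowels counts as one syllable.
    return max(1, len(re.findall(r"[aeiouyäöü]+", word.lower())))


def readability(text):
    """Estimate the German FRE of a non-empty text."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[^\W\d_]+", text)  # alphabetic tokens only
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    asl = len(words) / len(sentences)  # average sentence length in words
    asw = sum(count_syllables(w) for w in words) / len(words)  # syllables per word
    return amstad_fre(asl, asw)


print(round(readability("Das ist gut. Wir gehen heute."), 2))
```

Longer sentences and longer words both lower the score, which is why values around 40 indicate comparatively demanding text.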

Data analysis & storage

All hypothesis tests were performed in either Python or SPSS. The code for both the data preparation and the analyses is available in a public GitHub repository (https://github.com/nickma101/Thesis). The raw data on which the code was run as well as all relevant SPSS output can be accessed on an OSF server (https://osf.io/rdw9z/).

Results

This thesis explored four automatically measured news quality indicators as well as negativity. Table 2 provides an overview with means and standard deviations for the overall sample and the four subsamples under study (see Appendix F for newspaper comparisons). Since the dependent variables were not normally distributed (see Appendix G), all following analyses relied on statistical approaches that do not require normally distributed data.

Actor diversity

The first political news quality indicator under study was the diversity of actors. H1 assumed that national newspapers exhibit greater degrees of actor diversity than regional newspapers, and H2 assumed that modality moderates this effect in such a way that online news exhibit greater differences than print news. The two hypotheses were tested through an ordinal regression in SPSS, with the reach and modality dummies as predictors, article length as a covariate, and the diversity index as the dependent variable. The results from table 3 supported H2 but not H1: contrary to what H1 had expected, the odds of a regional article exhibiting a higher degree of actor diversity were 1.167 (95% CI [1.082, 1.258]) times those of a national article. H2 was supported, as an interaction effect showed that the odds ratio for an online article by a regional newspaper exhibiting higher levels of actor diversity was .71 (95% CI [.559, .905]).¹

Overall, H2 was thus supported, whereas H1 was rejected, as regional newspapers displayed higher actor diversity when controlling for article length and comparable actor diversity when article length was not taken into account.
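The odds ratios reported for the ordinal regression are exponentiated logit coefficients. The sketch below shows this conversion together with a Wald-type 95% confidence interval. The function name is hypothetical, and because the inputs are the rounded B and SE values from table 3, the resulting interval only approximates the one reported by SPSS.

```python
import math


def odds_ratio_with_ci(b, se, z=1.96):
    """Convert a logit coefficient and its standard error into an odds ratio
    with an approximate Wald confidence interval."""
    odds_ratio = math.exp(b)
    lower = math.exp(b - z * se)
    upper = math.exp(b + z * se)
    return odds_ratio, lower, upper


# Rounded values for "Reach = regional" on actor diversity (table 3).
or_, lower, upper = odds_ratio_with_ci(b=0.154, se=0.04)
print(round(or_, 3), round(lower, 3), round(upper, 3))
```

An odds ratio above 1 indicates higher odds of falling into a higher category of the dependent variable, while an interval that excludes 1 corresponds to a significant effect at the 5% level.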

Table 2. Overall means and standard deviations of dependent variables.

| Group | N | Diversity | Impartiality | Emotionality | Negativity | Readability | Length |
|---|---|---|---|---|---|---|---|
| total | 11,491 | 2.52 (.80) | 1.39 (.80) | .11 (.03) | .05 (.02) | 40.68 (10.67) | 495.96 (709.94) |
| online | 1,072 | 2.27 (.84) | 1.38 (.81) | .12 (.03) | .06 (.03) | 40.66 (10.10) | 460.12 (310.21) |
| print | 10,419 | 2.55 (.79) | 1.39 (.80) | .11 (.03) | .05 (.02) | 40.68 (10.73) | 499.65 (738.81) |
| regional | 5,038 | 2.52 (.84) | 1.49 (.77) | .11 (.04) | .04 (.02) | 40.66 (10.90) | 399.33 (976.15) |
| national | 6,453 | 2.52 (.76) | 1.30 (.81) | .12 (.03) | .05 (.02) | 40.70 (10.90) | 571.40 (375.08) |

Means with standard deviations in brackets. Length is calculated as the average number of words in a text.

¹ The results must be interpreted with caution due to poor model fit and violation of the assumption of proportional odds (see Appendix H). Independent Kruskal-Wallis H tests showed a significant mean difference between online (M= 2.27, SD= .84) and print newspapers (M= 2.55, SD= .79), H(1)= 112.70, p< .001, but an insignificant one between national (M= 2.52, SD= .76) and regional newspapers (M= 2.52, SD= .84), H(1)= .69, p= .407.

Table 3. Ordinal regression results for the effects of reach, modality, and article length on actor diversity and impartiality.

Parameter estimates

| Parameters | Actor diversity: B (SE) | Actor diversity: OR (95% CI) | Impartiality: B (SE) | Impartiality: OR (95% CI) |
|---|---|---|---|---|
| Diversity index = 0 | -4.686 (.12)*** | .009 (.007 – .012) | - | - |
| Diversity index = 1 | -2.236 (.05)*** | .107 (.097 – .118) | - | - |
| Diversity index = 2 | .252 (.04)*** | 1.286 (1.189 – 1.391) | - | - |
| Diversity index = 3 | 2.490 (.04)*** | 12.061 (1.082 – 1.258) | - | - |
| Impartiality = 0 | - | - | -3.457 (.06)*** | .032 (1.189 – 1.391) |
| Impartiality = 1 | - | - | -.890 (.04)*** | .378 (.378 – .446) |
| Impartiality = 2 | - | - | 1.494 (.05)*** | 4.046 (4.046 – 4.905) |
| Reach = regional | .154 (.04)*** | 1.167 (1.082 – 1.258) | -.012 (.04) | .988 (.916 – 1.066) |
| Modality = online | -.514 (.08)*** | .598 (.511 – .700) | -.224 (.08)** | .799 (.682 – .937) |
| Length | .001 (<.01)*** | 1.001 (1.000 – 1.001) | -.002 (<.01)*** | .998 (1.048 – 1.699) |
| Reach x Modality | -.341 (.12)** | .711 (.559 – .905) | .288 (.12)* | 1.334 (1.048 – 1.699) |
| R² | .022 | | .177 | |

N = 11,491. OR (95% CI) = odds ratios with 95% confidence intervals. Function = Logit. Diversity and impartiality indexes are the intercepts. *p< .05, **p< .01, ***p< .001

Impartiality

RQ1 asked whether impartiality differs depending on a) the modality (online vs. print) and b) the type (national vs. regional) of different newspaper outlets. This RQ was answered through an ordinal regression with the modality and reach dummies as well as article length as predictors of the dependent variable impartiality. The results from table 3 showed that the odds of an online article exhibiting a higher degree of impartiality were .80 (95% CI [.682, .937]) times those of a print article. This effect was statistically significant, χ²(1)= 7.65, p= .006. In contrast, the reach of an article had a statistically insignificant effect, χ²(1)= .09, p= .760. Article length had a very small but statistically significant negative effect on impartiality, with an odds ratio of .998 (B= -.002, p< .001; see table 3). Overall, the model explained 17.7% of the variance in the dependent variable and fit the data significantly better than an intercept-only model, χ²(4)= 2007.99, p< .001. While the model failed to pass the test of parallel lines (χ²(8)= 192.04, p< .001), its outcomes still strongly suggest that online news articles are on average less impartial than print articles, whereas the reach of a newspaper does not appear to make a difference when length is controlled for (see Appendix H for model fit).

Emotionality & negativity

This thesis assumed that the use of emotionality and negativity is driven by the amount of competition that newspapers face and the resources that they have at their disposal. Specifically, it argued that online news articles (H3a) and regional news articles (H3b) tend to exhibit significantly more negativity than print and national news respectively, as they try to catch readers’ attention in a way that does not necessarily rely on resource-intensive quality reporting. Furthermore, online news articles were expected to feature less emotionality than print news articles (H4), as they tend to rely more on agency copy.

All hypotheses were tested by means of two linear ordinary least squares regressions in SPSS, one with emotionality and a second one with negativity as the dependent variable. The three predictors, namely modality, reach, and article length, were added stepwise, thereby allowing for a comparison of model fit with different predictor variables. For both regression models, model fit increased significantly with each added predictor. Overall, the regression model for negativity explained 4.1% and the regression model for emotionality explained 2.1% of the overall variance. See Appendix I for an overview of model fit measures.

Table 4. Ordinary least squares (OLS) regressions for emotionality, negativity, and readability

| | Emotionality: b (SE) | Emotionality: β | Negativity: b (SE) | Negativity: β | Readability: b (SE) | Readability: β |
|---|---|---|---|---|---|---|
| Constant | .114 (.001) | | .054 (.001) | | 40.64 | |
| Modality = print | -.006 (.001) | -.053*** | -.011 | -.136*** | .019 | .001 |
| Reach = national | .008 (.001) | .116*** | .007 | .140*** | .033 | .002 |
| Length | .000003 (.000) | .056*** | .000001 (.000) | .041*** | | |
| R² | .021 | | .041 | | <.001 | |

SE: standard error. N = 11,491. *p < .05 **p < .01 ***p < .001.

While the difference in negativity between online (M= .06, SD= .03) and print newspapers (M= .05, SD= .02) was rather small overall, the results of the regression analysis in table 4 show that this difference was indeed significant, even when controlling for an article’s length. Thus, H3a was supported. In contrast, the regression results in table 4 did not support H3b, as national newspapers exhibited slightly more negativity (M= .05, SD= .02) than regional newspapers (M= .04, SD= .02) when controlling for article length. Regarding H4, the results of the regression analysis in table 4 showed that emotionality was higher in online news articles (M= .12, SD= .03) than it was in print news articles (M= .11, SD= .03). While this result contradicts the initial hypothesis, it aligns with the negativity results and indicates that online news articles might generally employ more emotional words in order to attract readers’ attention.

RQ2 examined differences in emotionality between national and regional newspapers. Mirroring the results for negativity, the regression results in table 4 showed that news articles published by national newspapers (M= .12, SD= .03) featured a higher proportion of emotional words than news articles published by regional newspapers (M= .11, SD= .04). This could indicate that regional newspapers rely more on agency copy in order to make up for a lack of resources.

Figure 3. Emotionality ratio by modality

Figure 4. Emotionality ratio by reach

Figure 5. Negativity ratio by modality

Figure 6. Negativity ratio by reach

Figure 7. Readability score by modality

Figure 8. Readability score by reach

However, in light of a) the low percentage of variance that the two models explained and b) the fact that in large samples even small differences can become significant, it is important to note that the differences between newspapers with different reach and modalities were rather small on the aggregate and do not explain the variation in emotionality or negativity very well. This is especially true given the large variation in individual news articles’ scores (see Figures 3 to 8) and the fact that paid-for online content was missing from the sample.

Readability

RQ3 explored differences in readability across modalities (online vs. print) and reach (national vs. regional). It was answered through a linear ordinary least squares regression with the readability score as the dependent and the modality and reach dummies as the independent variables. The independent variables were added step by step. Adding reach to the base model with modality as the only predictor led to a significantly better model. The regression results from table 4 showed that on the aggregate there was no significant difference in readability between online (M= 40.66, SD= 10.10) and print (M= 40.68, SD= 10.73) newspapers, nor was there a significant difference between national (M= 40.70, SD= 10.90) and regional (M= 40.66, SD= 10.90) newspapers. Despite interesting and statistically significant differences on the newspaper level², the overall readability scores for each newspaper were close to the value of 40, indicating that the articles were somewhat difficult to read but still understandable for a larger part of the population.

² See Appendix J for newspaper comparisons.

Conclusion & Discussion

In an ideal democracy, news media should provide citizens with high-quality political information, so that they can best perform their civic duties (Strömbäck, 2005). This thesis automatically measured four important news quality indicators to explore if and to what extent different modalities and types of German newspapers adhere to this ideal. The results revealed that both the modality and the reach of a newspaper play a role in determining its performance in terms of news quality indicators.

Modality affected news quality insofar as print news exhibited a more diverse set of actors and a higher degree of impartiality as well as a less emotional and less negative reporting style. This largely aligns with Burggraaff and Trilling’s (2017) assumption that the comparably high online competition leads to content with lower news quality. However, the results for emotionality and negativity deviate from the outcomes one would expect if online news relied more on agency copy, as suggested by Welbers et al. (2018). That said, the different levels of news quality across modalities might also be (partly) due to only freely available articles being scraped. Possibly, German newspapers offer a certain amount of free but lower-quality content online, whereas high-quality content must be paid for. Future research should try to find ways to overcome the difficulties of sampling paid-for articles in order to investigate whether this assumption is true, especially as it might have important societal implications for news consumers who are not willing to pay for online subscriptions.

For newspapers with different levels of reach, fewer differences emerged, although regional newspapers did appear to report in a less emotional and negative manner. This might either be due to reliance on agency copy as a result of comparably limited resources (Welbers et al., 2018) or attest to the comparably high quality of regional German newspapers (Humprecht & Esser, 2018), a notion that was supported by comparably high levels of actor diversity and impartiality in the regional sample. A third explanation might be that this thesis’ differentiation of reach a) constitutes an oversimplification, as two of the three sampled national newspapers also cater to local audiences, and b) underestimates the high levels of readership that regional newspapers have (Esser & Brüggemann, 2010), which might translate into considerable financial resources that can be invested into quality reporting. Thus, future research should explore different newspaper classifications, for example differentiating them by their number of subscribers in order to account for differences in their available resources.

Apart from the theoretical contributions of this thesis, its arguably biggest value lies in proposing a scalable framework for the assessment of news quality and especially in the creation of classifiers that assess impartiality. While the classifiers did not reach optimal performance in terms of their precision and recall, and while the balance of actors classifier seemed to overstate the prevalence of a balanced set of actors, they still show the potential of SML approaches for journalism studies, especially since such methodological approaches are still under-utilised (Boumans & Trilling, 2016). Future research should build on this work either by advancing the existing classifiers or by developing new and more comprehensive measurements for impartiality and other high-level constructs. Furthermore, scholars should use automated approaches such as the one taken in this study in order to assess the development of news quality indicators over time. In contrast to the comparison of newspaper categories, research into the over-time development of news quality could better address the arguments that have been put forward about the implications of commercialisation (e.g. Burggraaff & Trilling, 2017; Humprecht & Esser, 2018; Jacobi et al., 2016).

Naturally, the results of this thesis must be interpreted in light of several important limitations. First, the server issues led to a comparably small number of online political news articles, which might somewhat impede their comparability with the considerably larger print sample. Second, the classifiers, especially the one for balance of actors, did not reach optimal performance.

Third, the large sample size (N= 11,491) might have turned even small differences statistically significant. Hence, it is important to stress that most mean differences in the sample were rather small, at least across the different modalities and levels of reach. This directly relates to a fourth limitation, namely the classification of newspapers. The regression models with modality and reach as predictors mostly only accounted for a small amount of variance in the dependent variables. Given the at times considerable differences between individual newspapers, this suggests that other factors such as available resources (Masini et al., 2018), journalistic style, and reporting culture might be more important.

Lastly, it is important to stress that dictionary-based approaches cannot substitute for the analytical depth and contextualisation that manual content analyses (MCA) provide (Boyd & Crawford, 2012), as they rely on language models that are at best an approximation of the real phenomenon (Grimmer & Stewart, 2013).

Nonetheless, by combining MCA with ACA through SML, this thesis has shown that automated research can not only provide basic insights into news quality but can in fact also be used for capturing high-level constructs in a resource-efficient and scalable manner. Since doing so holds great potential for comparative studies, this thesis hopes to inspire future applications of SML that draw on and extend the framework employed by this study.

References

Amstad, T. (1978). Wie verständlich sind unsere Zeitungen? [How understandable are our newspapers?]. Unpublished doctoral dissertation, University of Zürich, Switzerland.

Bennett, W. L. (1996). An introduction to journalism norms and representations of politics. Political Communication, 13(4), 373–384. https://doi.org/10.1080/10584609.1996.9963126

Benson, R. (2009). What makes news more multiperspectival? A field analysis. Poetics, 37(5-6), 402-418. https://doi.org/10.1016/j.poetic.2009.09.002

Benson, R., & Wood, T. (2015). Who says what or nothing at all? Speakers, frames, and frameless quotes in unauthorized immigration news in the United States, Norway, and France. American Behavioral Scientist, 59(7), 802-821. https://doi.org/10.1177/0002764215573257

Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the natural language toolkit. O'Reilly Media, Inc.

Björnsson, C. H. (1983). Readability of newspapers in 11 languages. Reading Research Quarterly, 480-497. https://doi.org/10.2307/747382

Boudana, S. (2016). Impartiality is not fair: Toward an alternative approach to the evaluation of content bias in news stories. Journalism, 17(5), 600-618. https://doi.org/10.1177/1464884915571295

Boukes, M., & Vliegenthart, R. (2020). A general pattern in the construction of economic newsworthiness? Analyzing news factors in popular, quality, regional, and financial newspapers. Journalism, 21(2), 279-300. https://doi.org/10.1177/1464884917725989

Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars.

Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information, Communication & Society, 15(5), 662-679. https://doi.org/10.1080/1369118X.2012.678878

Brüggemann, M., Engesser, S., Büchel, F., Humprecht, E., & Castro, L. (2016). Framing the newspaper crisis. Journalism Studies, 17(5), 533–551. http://dx.doi.org/10.1080/1461670X.2015.1006871

Burggraaff, C., & Trilling, D. (2017). Through a different gate: An automated content analysis of how online news and print news differ. Journalism. https://doi.org/10.1177/1464884917716699

Carpenter, S. (2010). A study of content diversity in online citizen journalism and online newspaper articles. New Media & Society, 12(7), 1064-1084. https://doi.org/10.1177/1461444809348772

Carpenter, S., Boehmer, J., & Fico, F. (2016). The measurement of journalistic role enactments: A study of organizational constraints and support in for-profit and nonprofit journalism. Journalism & Mass Communication Quarterly, 93(3), 587-608. https://doi.org/10.1177/1077699015607335

Castells, M. (2013). Communication power. Oxford: Oxford University Press.

Cushion, S., Lewis, J., & Callaghan, R. (2017). Data journalism, impartiality and statistical claims: Towards more independent scrutiny in news reporting. Journalism Practice, 11(10), 1198-1215. https://doi.org/10.1080/17512786.2016.1256789

Cushion, S., Kilby, A., Thomas, R., Morani, M., & Sambrook, R. (2018). Newspapers, impartiality and television news: Intermedia agenda-setting during the 2015 UK general election campaign. Journalism Studies, 19(2), 162-181. https://doi.org/10.1080/1461670X.2016.1171163

Cushion, S., & Thomas, R. (2019). From quantitative precision to qualitative judgements: Professional perspectives about the impartiality of television news during the 2015 UK General Election. Journalism, 20(3), 392-409. https://doi.org/10.1177/1464884916685909

Czymara, C. S., & van Klingeren, M. (2019). New perspective? Comparing frame occurrence in online and traditional news media reporting on Europe’s “migration crisis”. https://doi.org/10.31235/osf.io/h3tpy

Dalecki, L., Lasorsa, D. L., & Lewis, S. C. (2009). The news readability problem. Journalism Practice, 3(1), 1-12. https://doi.org/10.1080/17512780802560708

Esser, F. (1999). 'Tabloidization' of news: A comparative analysis of Anglo-American and German press journalism. European Journal of Communication, 14(3), 291-324. https://doi.org/10.1177/0267323199014003001

Esser, F., & Brüggemann, M. (2010). The strategic crisis of German newspapers. The changing business of journalism and its implications for democracy, 39-54.

Esser, F., & Umbricht, A. (2013). Competing models of journalism? Political affairs coverage in US, British, German, Swiss, French and Italian newspapers. Journalism, 14(8), 989-1007. https://doi.org/10.1177/1464884913482551

Grimmer, J., & Stewart, B. M. (2013). Text as data: The promise and pitfalls of automatic content analysis methods for political texts. Political analysis, 21(3), 267-297. https://doi.org/10.1093/pan/mps028

Gruber, J. (2020). LexisNexisTools. An R package for working with newspaper data from 'LexisNexis'. Retrieved from: https://github.com/JBGruber/LexisNexisTools

Hallin, D. C., & Mancini, P. (2004). Comparing media systems: Three models of media and politics. Cambridge University Press. https://doi.org/10.1017/CBO9780511790867

Hiles, S. S., & Hinnant, A. (2014). Climate change in the newsroom: Journalists’ evolving standards of objectivity when covering global warming. Science Communication,

Honnibal, M., & Montani, I. (2017). spaCy 2: Natural language understanding with Bloom embeddings, convolutional neural networks and incremental parsing.

Humprecht, E., & Esser, F. (2018). Diversity in online news: On the importance of ownership types and media system types. Journalism Studies, 19(12), 1825-1847. https://doi.org/10.1080/1461670X.2017.1308229

Jacobi, C., Kleinen-von Königslöw, K., & Ruigrok, N. (2016). Political news in online and print newspapers: Are online editions better by electoral democratic standards? Digital Journalism, 4(6), 723-742. https://doi.org/10.1080/21670811.2015.1087810

Jungnickel, K. (2011). Nachrichtenqualität aus Nutzersicht. Ein Vergleich zwischen Leserurteilen und wissenschaftlich-normativen Qualitätsansprüchen. M&K Medien & Kommunikationswissenschaft, 59(3), 360-378. https://doi.org/10.5771/1615-634x-2011-3-360

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1), 2053951714528481. https://doi.org/10.1177/2053951714528481

Knobloch-Westerwick, S., Mothes, C., & Polavin, N. (2020). Confirmation bias, ingroup bias, and negativity bias in selective exposure to political information. Communication Research, 47(1), 104-124. https://doi.org/10.1177/0093650217719596

Lau, R. R., & Redlawsk, D. P. (2001). Advantages and disadvantages of cognitive heuristics in political decision making. American Journal of Political Science, 951-971. https://doi.org/10.2307/2669334

Leung, D. K., & Lee, F. L. (2015). How journalists value positive news: The influence of professional beliefs, market considerations, and political attitudes. Journalism Studies, 16(2), 289-304. https://doi.org/10.1080/1461670X.2013.869062

Locke, J. (1967). Locke: Two treatises of government. Cambridge University Press.

Maras, S. (2013). Objectivity in journalism. John Wiley & Sons.

Masini, A., Van Aelst, P., Zerback, T., Reinemann, C., Mancini, P., Mazzoni, M., ... & Coen, S. (2018). Measuring and explaining the diversity of voices and viewpoints in the news: A comparative study on the determinants of content diversity of immigration news. Journalism Studies, 19(15), 2324-2343. https://doi.org/10.1080/1461670X.2017.1343650

Masini, A., & Van Aelst, P. (2017). Actor diversity and viewpoint diversity: Two of a kind? Communications, 42(2), 107-126. https://doi.org/10.1515/commun-2017-0017

McManus, J. H. (2009). The commercialization of news. In The handbook of journalism studies (pp. 238-254). Routledge.

McQuail, D. (1992). Media performance: Mass communication and the public interest (Vol. 144). London: Sage.

Plasser, F. (2005). From hard to soft news standards? How political journalists in different media systems evaluate the shifting quality of news. Harvard International Journal of Press/Politics, 10(2), 47-68. https://doi.org/10.1177/1081180X05277746

Plavén-Sigray, P., Matheson, G. J., Schiffler, B. C., & Thompson, W. H. (2017). The readability of scientific texts is decreasing over time. Elife, 6, e27725. https://doi.org/10.7554/eLife.27725.029

Ramírez de la Piscina, T., Gonzalez Gorosarri, M., Aiestaran, A., Zabalondo, B., & Agirre, A. (2015). Differences between the quality of the printed version and online editions of the European reference press. Journalism, 16(6), 768-790. https://doi.org/10.1177/1464884914540432

Rauh, C. (2018). Validating a sentiment dictionary for German political language—a workbench note. Journal of Information Technology & Politics, 15(4), 319-343. https://doi.org/10.1080/19331681.2018.1485608

Remus, R., Quasthoff, U., & Heyer, G. (2010). SentiWS - a publicly available German-language resource for sentiment analysis. In Proceedings of the 7th International Language Resources and Evaluation (LREC'10) (pp. 1168-1171).

Schoenbach, K., & Lauf, E. (2002). Content or design? Factors influencing the circulation of American and German newspapers. Communications, 27(1), 1-14. https://doi.org/10.1515/comm.27.1.1

Soroka, S., & McAdams, S. (2015). News, politics, and negativity. Political Communication, 32(1), 1-22. https://doi.org/10.1080/10584609.2014.881942

Štajner, S., Evans, R., Orasan, C., & Mitkov, R. (2012). What can readability measures really tell us about text complexity. In Proceedings of the Workshop on Natural Language Processing for Improving Textual Accessibility (NLP4ITA) (pp. 14-21).

Strömbäck, J. (2005). In search of a standard: Four models of democracy and their normative implications for journalism. Journalism Studies, 6(3), 331-345. https://doi.org/10.1080/14616700500131950

Trilling, D., Van De Velde, B., Kroon, A. C., Löcherbach, F., Araujo, T., Strycharz, J., ... & Jonkman, J. G. (2018, October). INCA: Infrastructure for content analysis. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 329-330). IEEE. https://doi.org/10.1109/eScience.2018.00078

Urban, J., & Schweiger, W. (2014). News quality from the recipients' perspective: Investigating recipients' ability to judge the normative quality of news. Journalism Studies, 15(6), 821-840. https://doi.org/10.1080/1461670X.2013.856670

Van Hoof, A. M., Jacobi, C., Ruigrok, N., & Van Atteveldt, W. (2014). Diverse politics, diverse news coverage? A longitudinal study of diversity in Dutch political news during two decades of election campaigns. European Journal of Communication,

Voakes, P. S., Kapfer, J., Kurpius, D., & Chern, D. S. Y. (1996). Diversity in the news: A conceptual and methodological framework. Journalism & Mass Communication Quarterly, 73(3), 582-593. https://doi.org/10.1177/107769909607300306

Wahl-Jorgensen, K., Berry, M., Garcia-Blanco, I., Bennett, L., & Cable, J. (2017). Rethinking balance and impartiality in journalism? How the BBC attempted and failed to change the paradigm. Journalism, 18(7), 781-800. https://doi.org/10.1177/1464884916648094

Waltinger, U. (2010, May). GermanPolarityClues: A lexical resource for German sentiment analysis. In LREC (pp. 1638-1642).

Welbers, K., Van Atteveldt, W., Kleinnijenhuis, J., & Ruigrok, N. (2018). A gatekeeper among gatekeepers: News agency influence in print and online newspapers in the Netherlands. Journalism Studies, 19(3), 315-333.

Appendix A

Appendix B

Sample distribution

Table 1. Sample distribution across news outlets and modalities.

| Newspaper | Print articles | Online articles | Total articles |
|---|---|---|---|
| Der Tagesspiegel (national) | 1,286 | 264 | 1,550 |
| Die Süddeutsche (national) | 3,720 | 175 | 3,895 |
| Die Welt (national) | 831 | 177 | 1,008 |
| Aachener Zeitung (regional) | 970 | 168 | 1,138 |
| Rheinische Post (regional) | 2,375 | 173 | 2,548 |
| Stuttgarter Zeitung (regional) | 1,237 | 115 | 1,352 |

Appendix C

Codebook for manual content analysis of impartiality training material

V1. Coder ID

01 Coder 1

02 Coder 2

03 Coder 3

04 Coder 4

V2. Article identification number

V3. News outlet 1 Aachener Zeitung 2 Stuttgarter Zeitung 3 Rheinische Post 4 Der Tagesspiegel 5 Die Welt 6 Die Süddeutsche

V4. Who is the central political actor in the story? (if in doubt, see list of actors below) 1 A governing party or a member of it on the national level

2 An opposition party or a member of it on the national level 3 A governing party or a member of it on the regional level 4 An opposition party or a member of it on the regional level 5 A foreign/international politician, party, or organisation 6 No political actor mentioned

Indicators of importance are…

… duration, space of information about the actor

… frequency of being mentioned

… mentioned in the headline or teaser

Notes:

➢ If two actors are equally prominent in the article with regard to the above criteria, then count the number of references to each actor and choose the one who is most often referred to. However, this rule only applies if the two actors are exactly equally prominent with regard to the above criteria.

➢ Everything that happens on the federal state level or below is considered regional

➢ If there are two equally central actors of opposing categories, code for the one that is mentioned first (headline included)

➢ Foreign/international actors are all political actors that are not working in German politics. This includes foreign countries, heads of state or other foreign politicians, foreign parties, international political organisations (e.g. NATO, EU) and also German politicians who work on the EU level.

➢ It doesn’t matter if political actors are not very prominent in an article. As long as the article mentions at least one political actor at least once, that is enough to code for a central political actor.

➢ See Appendix A for a list of relevant politicians and parties per category
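The tie-breaking notes for V4 can be read as a small decision procedure. The following Python sketch is only an illustration and not part of the coding instrument. It assumes that the prominence indicators above have already failed to separate the actors, and that a coder has listed every actor reference in order of appearance (headline first) as hypothetical (actor, category) pairs. The function name `central_actor_code` and the input format are assumptions made for this example.

```python
from collections import Counter

NO_ACTOR = 6  # V4 code "No political actor mentioned"


def central_actor_code(mentions):
    """Return the V4 category code of the central political actor.

    `mentions` is a hypothetical input: a list of (actor, category) pairs,
    one per reference to a political actor, in order of appearance
    (headline included). Categories are the V4 codes 1 to 5.
    """
    if not mentions:
        return NO_ACTOR

    # Count how often each actor is referred to.
    counts = Counter(actor for actor, _ in mentions)
    top = max(counts.values())

    # Walk through the references in order, so that a tie in frequency
    # is resolved in favour of the actor mentioned first.
    for actor, category in mentions:
        if counts[actor] == top:
            return category
```

For instance, `central_actor_code([("Actor B", 2), ("Actor A", 1), ("Actor A", 1), ("Actor B", 2)])` gives 2, because both actors are referenced twice and Actor B appears first.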

V5. Balance of political viewpoints

“Is the standpoint of the central political actor challenged by another actor in the text?”

1 Yes

2 No

Notes:

➢ A challenge has to be expressed in the form of a quote (either direct or indirect)

➢ The challenging actor can either be …

… another political actor (of the same or a different party), or

… another actor such as an expert, a journalist, or anyone else who is relevant in the context of the article’s topic

… the author

➢ Challenging a viewpoint means critically engaging with it. Therefore, it encompasses


Example (for code 1):

➢ Berlin - Tübingens Oberbürgermeister Boris Palmer (Grüne) hat Forderungen nach einem Parteiaustritt zurückgewiesen. „Selbstverständlich trete ich nicht aus meiner Partei aus“, sagte Palmer am Freitag der „Bild“-Zeitung. „Ich bleibe weiterhin aus ökologischer Überzeugung Mitglied der Grünen. Da die Vorwürfe gegen mich von meinen Gegnern erfunden beziehungsweise konstruiert worden sind, gibt es überhaupt keinen Grund, darüber nachzudenken.“

Der Landesvorstand der Grünen in Baden-Württemberg hatte den umstrittenen Kommunalpolitiker zuvor zum Parteiaustritt aufgefordert. Mit seinen Äußerungen stelle sich Palmer gegen politische Werte und Grundsätze der Partei und agiere „systematisch“ gegen sie, erklärte der Landesvorstand nach einer Sitzung am Freitagabend. Mit seinem Auftreten diene der Politiker „nicht der politischen oder innerparteilichen Debatte, sondern der persönlichen Profilierung“.

V6. Balance of political sources

“Does the article quote two or more different types of actors (V4)?”

1 Yes

2 No

Notes:

➢ A quote can be either direct, indirect, or a mix of the two; e.g.:

o „Selbstverständlich trete ich nicht aus meiner Partei aus“, sagte Palmer am Freitag der „Bild“-Zeitung

o Flynn’s Eingeständnis, dass er im Dezember 2016, also vor der Amtseinführung Trumps, den russischen Botschafter bei einem zunächst bestrittenen Geheimtelefonat um eine zurückhaltende Reaktion auf die vom amtierenden Präsidenten Barack Obama verhängten Sanktionen bat, war ein wichtiger Beleg für die Zusammenarbeit der Trump-Kampagne mit Moskau

o Mit seinen Äußerungen stelle sich Palmer gegen politische Werte und Grundsätze der Partei und agiere „systematisch“ gegen sie, erklärte der Landesvorstand nach einer Sitzung am Freitagabend.