The Culture Club? A Comparative Big Data Analysis of Country Visibility and Soft Power in European Foreign Reporting


Graduate School of Communication

_____

Master’s Thesis

The Culture Club?

A Comparative Big Data Analysis of Country Visibility and Soft Power

in European Foreign Reporting

_____

Niklas Melcher 11183055

Research Master’s Communication Science

Supervisor: Dr. Penelope Sheets Thibaut


Abstract

This study aims to examine soft power in the European sphere by analyzing country visibility in cultural reporting. To do so, a definition of cultural reporting in contrast to non-cultural reporting is proposed and used to train a supervised machine learning classifier. The two kinds of reporting are then compared in terms of country visibility and its predictors. This comparison is conducted on a dataset of 241,840 news articles published over 15 months in six news outlets based in Germany, the Netherlands and the UK. Results showed an observable difference between cultural and non-cultural reporting in terms of country visibility.

A comparison of the analyzed countries revealed similarities as well as country-specific differences. Most importantly, status and relatedness were established as predictors of visibility in both kinds of reporting, while negative events were, in most cases, a predictor only for non-cultural reporting. The German outlets were the only exception, insofar as negative events also predicted mentions in their cultural reporting. Overall, this confirmed the conceptual implementation of soft power. In a broader sense, the results indicated that European cultural interest is limited to a few high-status or related countries, while countries facing crises did not receive cultural attention.

Keywords: soft power, cultural reporting, machine learning, automated content analysis, country visibility

Word count: 8999 (a word limit of 9000 was agreed on with the supervisor given the complexity of the method)


The Culture Club?

A Comparative Big Data Analysis of Country Visibility and Soft Power in European Foreign Reporting

“Foreign culture is as necessary to the spirit of a nation as is foreign commerce to its industries.” – Ameen Rihani

In a globalized world, news consumers are exposed to a plethora of information on nations from across the globe. This information can relate to any country and cover a vast array of topics. Which countries are prominent in the news influences audiences’ worldview, as for most citizens around the world the news media remain the main source of information about foreign countries (Kim & Barnett, 1996). It also influences which nations are seen as important (Semetko, Brzinski, Weaver & Willnat, 1992), as well as the self-perception of each individual country (UNESCO, 1954). This is relevant because it shows that news coverage influences the global political order as well as individuals’ perception of the world.

The presence of foreign nations in the news has been studied for decades in the field of international news flows, “one of the chief subjects in international communication” (Kim & Barnett, 1996, p.323), by evaluating country visibility in the news. Different scholars have assessed which nations dominate the news and which receive particularly sparse coverage, and have essentially agreed that the world is represented rather unequally (e.g. Wu, 2000; Wu, 2003; Segev, Sheafer & Shenhav, 2013). These studies also established specific determinants such as national traits, relatedness and events to predict country visibility (Sheafer, Ben-Nun Bloom, Shenhav & Segev, 2013). However, studies in this field have typically examined only the overall presence of all or a few selected countries. So far, the visibility of different countries in different kinds of news has been neglected.


This is surprising, given that a country’s appearance in different contexts has different implications. That is, while countries may be prominent due to crises or for reasons of hard power (strength measured in coercive or monetary sway), they can also be visible in the news in ways that convey soft power, or the “ability to obtain preferred outcomes through attraction” (Nye, 2009, p.160). In essence, a country exercises soft power when it is able to exert influence in another country through its appeal, in particular through its culture, values and policies (Nye, 2008). Evidently, inferring soft power by simply evaluating country visibility in all news reporting would be flawed. Such an approach would rank highly those countries that receive attention because of crises, war or other negative events, which constitute the very opposite of attraction. Instead, this paper argues that a proper evaluation of a country’s soft power requires examining its mentions within a certain type of news: given the essential role of culture for soft power (Nye, 2008), cultural news. In other words, cultural reporting about a specific country in any other given country’s news media indicates its attractiveness and, by implication, its soft power.

However, previous research, certainly on international news flows, has largely ignored any difference between cultural and other reporting. Making this distinction would present the opportunity to identify which nations’ news visibility actually indicates soft power, and which nations’ visibility is due to factors more closely connected to either hard power or (negative) events. This would not only help to quantify soft power scientifically; it would also help to understand which nations hold our cultural attention globally.

This research aims to fill this gap by establishing a way to distinguish cultural reporting from non-cultural reporting, and by examining, through a ‘big data’ approach, the visibility of the world’s nations in cultural reporting. After first proposing an operationalization of cultural reporting, this study employs a machine learning approach to train and evaluate a


articles from three different countries are examined for mentions of countries in the two kinds of reporting. To structure this inquiry, the following research questions are posed:

RQ1: To what extent can a difference in country visibility between cultural reporting and non-cultural reporting be detected?

As free access to comprehensive datasets of news articles is limited and every language requires separate manual training of a classifier, the present analysis focuses on news outlets in three countries: Germany, the Netherlands and the United Kingdom (UK). These were chosen because they are all Western European democracies that share some cultural similarities but each have their own distinct culture and language. Within the news of these countries, the visibility of all 193 member states of the United Nations (n.d.) as well as Kosovo, Palestine, Taiwan and Vatican City is compared.

RQ2: To what extent can similarities and differences in both cultural and non-cultural reporting be detected when comparing news from the Netherlands, Germany and the UK?

By conducting these comparisons, it should be possible to understand the differences between the kinds of reporting and the countries mentioned in each. Consequently, it can also be identified which countries receive the majority of attention in cultural reporting.

RQ3: Which countries can claim to exert soft power in the European sphere?

Theoretical framework

Soft power, cultural reporting and the attractiveness of nations

The term soft power, coined in 1990 by Joseph Nye Jr., describes the ability of nations to influence other nations through attraction. Nye developed the concept over the years and explained that the very essence of soft power, given its non-coercive nature, is a nation’s culture (Nye, 2004; Nye, 2008; Nye, 2009). This is complemented by a country’s ability to live up to its political values and by the perceived legitimacy of its foreign policies (Nye, 2008). With these features, soft power has been labeled “well-established” for analyses “between communication and influence in an international context” (Pamment, 2014, p. 52). Hence, although soft power is a rather complex concept to quantify, there have been several efforts to do so (see Pamment, 2014). Most prominently, the Soft Power 30 (2019) ranks countries by soft power, with France, the UK and Germany at the top. However, there are no soft power scales per country or region, nor any that take a fully comprehensive list of countries into account.

Hence, because culture is central to soft power, the attention paid to a foreign nation’s culture indicates its soft power. Given that the presence of culture can be measured through media attention (Nye, 2008), assessing country visibility in cultural reporting will indicate which countries exert soft power. From a societal perspective, this distinction helps to understand which countries are mentioned in the news due to their cultural influence rather than due to crises or other non-cultural reasons.

To identify cultural reporting, the ongoing debate on the complexity of the term ‘culture’ (discussed, for example, by Spencer-Oatey, 2008) has to be acknowledged. While establishing a final definition is beyond the scope of this paper, this study relies on Spencer-Oatey’s rather universal definition of culture as “a fuzzy set of basic assumptions and values, orientations to life, beliefs, policies, procedures and behavioral conventions that are shared by a group of people” (2008, p.3) to detect cultural reporting. A news item can thus be considered cultural reporting when it covers specific cultural elements or the arts, in terms of both high culture and popular culture (Nye, 2008). Cultural elements manifest in the news through the coverage of any of the cultural layers defined by Hofstede, Hofstede & Minkov (2010): values, rituals, heroes and symbols. The arts include, among others, applied arts, music, literature and visual arts (Janssen, Kuipers & Verboord, 2008). Therefore, news articles that are mainly concerned with the arts and cultural elements can be labeled cultural reporting, while all other articles are non-cultural reporting.


News flow theory, country visibility and advancement

To quantify soft power by analyzing country visibility in cultural reporting, the advances of the international news flow theory (INFT) research field can be consulted and developed further. INFT is concerned with news coverage of foreign nations and the analysis of differences in countries’ newsworthiness (Kim & Barnett, 1996; Grasland, 2019). The field originated from two articles proposing general rules to explain discrepancies in news coverage of foreign countries, both published in 1965 in the same volume of the Journal of Peace Research. On the one hand, Galtung and Ruge (1965) proposed 12 hypotheses to explain how certain events turn into news in foreign countries. Eight of these were culture-free, meaning that little variation between reporting nations was expected: frequency, threshold, unambiguity, meaningfulness, consonance, unexpectedness, continuity and composition. For the remaining four, differences between reporting nations were expected: reference to elite nations, reference to elite persons, personalization and negativity. These 12 factors adhered to what they called an additivity hypothesis, assuming that the more of these factors an event combined, the more newsworthy it was, and thus the more likely it was to be covered in another nation’s news. On the other hand, Östgaard (1965) distinguished between external and inherent factors as determinants of news values. With external factors, Östgaard put comparatively more emphasis on the interplay of economic and political factors. Inherent factors consisted of simplification, identification and sensationalism. Regarding the processing of news, Östgaard (1965) proposed that the publication of a news item would depend not only on its own newsworthiness but also on the simultaneous existence of other big news on the international market.

This combination of Galtung and Ruge’s (1965) news values and Östgaard’s (1965) news factors captured a certain zeitgeist in the social sciences. What followed were years of extensive research on newsworthiness and increasing empirical interest in the characteristics of international news flows (Segev, 2015). Examples of advances in the field of global news flows include works on the (systemic) determinants of international news flows, which established the importance of economic strength, location, population and language in determining a country’s visibility in international news (Kim & Barnett, 1996; Wu, 2000; Wu, 2003). They also include space-time interaction models, which found that otherwise less visible countries can temporarily receive excessive coverage due to exceptional events or long-term crises (Grasland, 2019), and early big data approaches, which found additional evidence of disproportionate news attention to wealthy countries (Guo & Vargo, 2017). Furthermore, much comparative work on country visibility has established (foreign) population size, distance, conflicts and GDP per capita as important predictors (Jones, Van Aelst & Vliegenthart, 2011; Segev, 2015). Finally, it has been found that the mechanisms and hierarchies of news flows are largely reproduced in online news (Himelboim, Chang & McCreery, 2010).

Today’s news flow research mainly focuses on three groups of variables to explain a country’s prominence: national traits (e.g. size and power), relatedness (e.g. geographic and demographic proximity between the (in)visible country and the news country) and events (e.g. disasters and conflict; Segev, 2015; Sheafer et al., 2013; Wu, 2000). While national traits can explain which countries generally receive great attention, relatedness is meant to explain the different regional focuses. Events explain regional and global focus alike, depending on the extent and relevance of the event. Different approaches have been taken to put this framework to use: Hur (1984), for instance, distinguished between actual flow analysis, dealing with the direction and magnitude of flows, and coverage analysis, dealing with the amount and characteristics of coverage.

Building on Galtung and Ruge’s (1965) and Östgaard’s (1965) initial ideas as well as years of subsequent research and theoretical advancements, this paper will look at country visibility in the news. However, as most INFT research has analyzed all (foreign) news (see e.g. Segev et al., 2013; Wu, 2003), this study aims to advance the field by employing a coverage analysis that distinguishes between different kinds of news: news of cultural character (cultural reporting) and news of non-cultural character (non-cultural reporting). This distinction is made because important implications for soft power are expected to be tucked away in the often overlooked cultural section. On the one hand, this advancement will be methodological, as a machine learning approach is used to distinguish between the two. On the other hand, it will be theoretical, as the comparison of cultural and non-cultural reporting aims to incorporate the concept of soft power into news flow research.

Hypotheses

To answer the RQs, some exploratory analyses of country visibility and type of reporting are necessary. However, certain assumptions can also be made when comparing cultural with non-cultural reporting. Several predictions from INFT are expected to be equally important for both kinds of reporting. The first concerns national traits. For status, one of the main national traits, foreign country coverage can be assumed to follow a logic of “the rich get richer” (Segev, 2015, p. 425). Scholars found that countries with great economic and political power tend to be emphasized more (Wu, 2000; Wu, 2003), with strong correlations between a country’s GDP and its salience (Guo & Vargo, 2017; Grasland, 2019). Less economically strong, non-Western countries tend to be underrepresented in general, as shown by studies covering 16 and 17 countries from 5 regions, respectively (Segev, 2019; Wilke, Heimprecht & Cohen, 2012). Janssen et al.’s (2008) study of arts and culture coverage generated similar results in terms of status. Given this, and given that politically elite countries like the US hold a unique position in the cultural industries (Wu, 2003), it is assumed that:

H1: Status is a predictor for country visibility in cultural reporting and non-cultural reporting alike.


Relatedness also plays a major role as a determinant in news flow research. Studies found that nearby or neighboring countries are more visible in news reporting (Grasland, 2019; Kim & Barnett, 1996; Janssen et al., 2008; Wilke et al., 2012), and that countries from which many immigrants originate receive more coverage (Wilke et al., 2012). Finally, a shared language appeared to play a role in foreign reporting (Kim & Barnett, 1996; Grasland, 2019). Since arts and culture from countries with close connections, many immigrants or a shared language can be assumed to be more relatable, it may be argued that relatedness is also a predictor for cultural reporting. Hence, the assumption:

H2: Relatedness is a predictor for country visibility in cultural reporting and non-cultural reporting alike.

Differences between the two kinds of reporting are expected for the third important predictor in INFT research: events. In this context, events refer to ‘exceptional’ circumstances (see Grasland, 2019), specifically negative ones. Studies have found that numerous countries are (temporarily) overrepresented in the news due to crises such as terror attacks or civil war (Guo & Vargo, 2017; Grasland, 2019). These countries’ coverage is caused by deviance, the unusual character of what is happening (Shoemaker, Danielian & Brendlinger, 1991; Wu, 2003). Thus, once these unusual events have passed or, more importantly, when the coverage is not about the unusual event itself, these countries receive less coverage. Consequently, while negative events are expected to increase non-cultural coverage, this is not expected for cultural reporting.

H3: Negative events are a predictor for country visibility in non-cultural reporting but not in cultural reporting.

Method

As the essence of this research is to understand information flows in a globalized world, the study was executed through a contemporary approach that enables the examination of extensive amounts of data: supervised machine learning (SML) and automated content analysis. Most of the quantitative steps of this study were executed in the Python programming language, using either Spyder or Jupyter Notebooks. For data processing and analysis, the scikit-learn (Pedregosa et al., 2011) and pandas (McKinney, 2010) packages were used.

This method allows media coverage to be examined in unprecedented ways (Boumans & Trilling, 2016). With some exceptions (Segev, 2015; Guo & Vargo, 2017), most news flow research has relied on manual content analyses of only a limited number of countries or a short period of time. The shortcoming of this is the difficulty of capturing larger-scale trends and overall insights into international news flows (Guo & Vargo, 2017). In contrast, a big data approach offers the advantage of analyzing huge datasets and making comprehensive assertions (Halford & Savage, 2017). In the present case, it meant analyzing a census of 241,840 articles published over 15 months in six news outlets for the presence of 197 countries. Hence, to the best of the author’s knowledge, this paper is the first to apply an SML technique to compare cultural reporting across vast amounts of news articles from several countries.

Sample choice

To compare country visibility, online articles from news outlets in three European countries were analyzed: Germany, the Netherlands and the United Kingdom, referred to as ‘news countries’. This served to assess commonalities regarding the exertion of soft power in Europe, but also to detect patterns specific to each news country. As “they largely determine whether and how other media and the wider community discuss subjects” (Janssen et al., 2008, p. 725), it was decided to focus on high-quality outlets in all countries. To avoid distortion of the results through the unique affiliations or routines of one news outlet, two outlets per country were chosen. Considering that globalization and intercultural interest are deeply interlinked with the world wide web, the analysis focused solely on the output of the online outlets. With Frankfurter Allgemeine Zeitung (FAZ), Süddeutsche Zeitung (SZ), The Telegraph (Telegraph), the Guardian and NRC Handelsblad (NRC), mainly established quality newspapers were selected. Nederlandse Omroep Stichting (NOS) is the one exception, as it is not a newspaper but a Dutch public service broadcaster originally concerned only with television. Nowadays, however, its online news outlet is the only quality news website ranked among the 30 most-visited websites in the Netherlands (Alexa Internet, n.d.). Hence, it was deemed suitable for this comparison.

Coding and procedure

Data retrieval. Except for the Guardian data, all datasets were downloaded in CSV format from the Infrastructure for Content Analysis (INCA) database (Trilling et al., 2018). Among other material, this database includes a vast number of news articles in several languages from numerous sources. The retrieved articles were published between 01.06.2018 and 31.08.2019, the only period for which exhaustive data for all outlets was available. Since the INCA data had been scraped by different scientists at different points in time, some of the datasets had to be tidied up. In a preparatory step, the NOS and NRC files had to be rearranged manually in LibreOffice since not all fields were aligned. In the cases of The Telegraph and NRC, the lead text and full text of the articles were merged, whereas no lead text was available for the other outlets, presumably because it was included in the full text.

The Guardian data was scraped directly from the official API (The Guardian Open Platform, n.d.), to which the Guardian team granted access. This data collection had to be executed separately because INCA lacked access to the full text of Guardian articles. Given the Guardian’s importance and the lack of suitable replacements in INCA, this step was deemed necessary. The different methods of data retrieval were not expected to cause complications, as the number of Guardian articles and the structure of their content were alike in the INCA database and the API data for the same time frame. To scrape the Guardian articles, a scraper was written in Python, with its parameters set to reflect those of the INCA database. Articles were scraped in four steps, as manual tryouts showed that the API could not handle more than approximately 180 pages of content at a maximum of 200 articles per page. The resulting scrapes were subsequently merged into one dataset.
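The chunked retrieval can be sketched as follows. This is a minimal illustration, not the author’s scraper: it assumes the four steps were split by publication date (the text does not say how they were split), and it only builds request URLs for the Guardian Open Platform `search` endpoint using its `from-date`, `to-date`, `page`, `page-size`, `show-fields` and `api-key` query parameters. The API key and the `order-by` value are placeholders chosen for illustration.

```python
from datetime import date, timedelta
from urllib.parse import urlencode

# Content search endpoint of the Guardian Open Platform.
GUARDIAN_SEARCH_URL = "https://content.guardianapis.com/search"
PAGE_SIZE = 200  # articles per page, as described in the text


def split_date_range(start: date, end: date, n_chunks: int) -> list[tuple[date, date]]:
    """Split the inclusive range [start, end] into n_chunks contiguous sub-ranges."""
    chunks = []
    chunk_start = start
    for i in range(n_chunks):
        remaining_days = (end - chunk_start).days + 1
        # Ceiling division spreads the remaining days evenly over the remaining chunks.
        length = -(-remaining_days // (n_chunks - i))
        chunk_end = chunk_start + timedelta(days=length - 1)
        chunks.append((chunk_start, chunk_end))
        chunk_start = chunk_end + timedelta(days=1)
    return chunks


def build_page_url(api_key: str, from_date: date, to_date: date, page: int) -> str:
    """Build the request URL for one results page of one date chunk."""
    params = {
        "from-date": from_date.isoformat(),
        "to-date": to_date.isoformat(),
        "page": page,
        "page-size": PAGE_SIZE,
        "show-fields": "bodyText",  # include the full article text in each result
        "order-by": "oldest",
        "api-key": api_key,  # hypothetical placeholder; a real key is required
    }
    return f"{GUARDIAN_SEARCH_URL}?{urlencode(params)}"
```

Each URL would then be requested page by page (for example with `urllib.request` or the `requests` package) until the `pages` value reported in the JSON response is reached, after which the results of all four chunks are concatenated.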

Data characteristics. For all articles, only the date (YYYY-MM-DD), the title and the full-text content were included in the analysis. Articles with empty titles or texts, dates outside of the range (caused by clutter in the data) and articles shorter than 100 characters were removed (see Burggraaff & Trilling, 2020). As not every individual data point could be rearranged manually (the rearrangement explained above was only done for batches of data), the few single articles that were too disorganized to be processed were also excluded. Duplicates, that is, articles published on the same day with the same title or articles with exactly the same text, were also excluded, as they more likely indicated a scraping error than a true double publication. Articles that only shared the same title on different days were not excluded, as this was caused either by recurring titles with different content (e.g., “The 8 things you might have missed this weekend”) or by recycling, which indicates additional exposure and, in turn, extra visibility. Hence, there were still n = 1,719 (0.71%) cases of identical article titles.

Of the initial N = 271,295 articles, N = 241,840 remained after processing and cleaning. Table 1 shows the number of articles per news outlet in the initial and the cleaned datasets. After the cleanup, the datasets were sorted by date, and every article received an ID consisting of an outlet identifier followed by a consecutive number starting at 0.
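The cleaning and ID steps can be sketched with pandas. This is a hedged illustration under stated assumptions: one DataFrame per outlet with columns named `date`, `title` and `text` (hypothetical names), the 100-character minimum applied to the full text, and IDs formatted as `<outlet>_<number>`; the exact ID format is not given in the text.

```python
import pandas as pd

STUDY_START = pd.Timestamp("2018-06-01")
STUDY_END = pd.Timestamp("2019-08-31")
MIN_CHARS = 100  # assumption: threshold applied to the full text


def clean_outlet(df: pd.DataFrame, outlet: str) -> pd.DataFrame:
    """Filter, deduplicate, sort and label one outlet's articles."""
    df = df[["date", "title", "text"]].copy()
    # Keep the calendar date only (YYYY-MM-DD); unparseable dates become NaT.
    df["date"] = pd.to_datetime(df["date"], errors="coerce").dt.normalize()
    df = df.dropna(subset=["date", "title", "text"])
    # Remove empty titles or texts.
    df = df[(df["title"].str.strip() != "") & (df["text"].str.strip() != "")]
    # Remove dates outside the study period and very short articles.
    df = df[df["date"].between(STUDY_START, STUDY_END)]
    df = df[df["text"].str.len() >= MIN_CHARS]
    # Remove same-day duplicates with identical titles, then identical texts.
    df = df.drop_duplicates(subset=["date", "title"])
    df = df.drop_duplicates(subset=["text"])
    # Sort by date and assign IDs: outlet code plus a consecutive number from 0.
    df = df.sort_values("date", kind="stable").reset_index(drop=True)
    df["id"] = [f"{outlet}_{i}" for i in range(len(df))]
    return df
```

Titles that recur on different days survive the `["date", "title"]` deduplication, matching the decision to keep recycled articles as extra visibility.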


Table 1. Articles per outlet before and after cleaning up

FAZ SZ NOS NRC Telegraph Guardian

Original 30,264 40,414 21,969 28,940 51,190 98,518

Cleaned 26,289 34,047 18,615 23,630 42,670 96,589
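As a quick consistency check on Table 1, the per-outlet counts can be summed overall and per news country; this is plain arithmetic on the published figures, with the outlet-to-country grouping taken from the text.

```python
# Article counts per outlet from Table 1.
ORIGINAL = {"FAZ": 30264, "SZ": 40414, "NOS": 21969, "NRC": 28940, "Telegraph": 51190, "Guardian": 98518}
CLEANED = {"FAZ": 26289, "SZ": 34047, "NOS": 18615, "NRC": 23630, "Telegraph": 42670, "Guardian": 96589}
# Two outlets per news country.
NEWS_COUNTRIES = {"Germany": ("FAZ", "SZ"), "Netherlands": ("NOS", "NRC"), "UK": ("Telegraph", "Guardian")}


def totals_per_country(counts: dict[str, int]) -> dict[str, int]:
    """Sum outlet counts within each news country."""
    return {country: sum(counts[o] for o in outlets) for country, outlets in NEWS_COUNTRIES.items()}


def retention(before: dict[str, int], after: dict[str, int]) -> dict[str, float]:
    """Share of articles kept after cleaning, per outlet."""
    return {o: after[o] / before[o] for o in before}


print(sum(ORIGINAL.values()), sum(CLEANED.values()))  # 271295 241840
print(totals_per_country(CLEANED))  # {'Germany': 60336, 'Netherlands': 42245, 'UK': 139259}
print({o: round(r, 3) for o, r in retention(ORIGINAL, CLEANED).items()})
```

The sums reproduce the reported totals of N = 271,295 and N = 241,840, and give n = 60,336 for the two German outlets combined.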

A preliminary overview of the date distribution showed that the numbers of articles from the INCA database were lower in September 2018 (Appendix A), presumably due to a scraping error. This was unavoidable but is noteworthy. The same applies to the differing lengths of the articles, as illustrated by the mean and median in Table 2. During the manual coding, it became apparent that British newspapers in particular tended to publish extensive articles, potentially due to the popular practice of extensive live feeds. NOS had the shortest articles, which reflects its status as a legacy TV outlet.

Table 2. Mean and median article length per outlet

FAZ SZ NOS NRC Telegraph Guardian

Mean 2,695 3,132 1,862 2,750 4,404 5,169

Median 2,585 3,022 1,541 2,064 3,407 4,140

Creation of classifiers. The next step was to create SML classifiers to distinguish between cultural and non-cultural reporting. To do so, the datasets of the two outlets per language were merged into one. This resulted in news country datasets of n = 60,336 for Germany, n = 42,245 for the Netherlands and n = 139,259 for the UK. These datasets were subsequently shuffled to randomize the order of all articles. The first 1,500 articles of each randomly sorted dataset were then exported as a CSV file including the title, text, date and ID to be manually coded. For this, a codebook was created including an elaborate explanation of the operationalization outlined below. The main researcher manually coded all n = 4,500 articles in three languages; a second coder coded n = 300 articles, of which n = 150 each were in the Dutch and British datasets (13.3% of all training data). Testing inter-coder reliability showed that the coders reached an overall agreement of 91.70% and a Cohen’s κ of .64, which is substantial and, given the complexity of defining culture and the smaller share of cultural reporting, was deemed satisfactory. The manual classifications were then merged with the respective articles in the datasets. Next, the title and full text of each article were merged and preprocessed: all words were set to lower case, and stop words, punctuation as well as special signs and symbols (e.g. HTML coding) were removed to obtain a clean text consisting only of words (see Vermeer, Araujo, Bernritter & Van Noort, 2019).
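A minimal sketch of these preparation steps: drawing the 1,500-article coding sample, computing percentage agreement and Cohen’s κ between two coders, and preprocessing title plus text. The random seed, the column name `id`, the HTML-stripping regex and the use of scikit-learn’s English stop-word list are assumptions for illustration; the German and Dutch datasets would need language-specific stop-word lists.

```python
import re

import pandas as pd
from sklearn.feature_extraction.text import ENGLISH_STOP_WORDS
from sklearn.metrics import cohen_kappa_score

HTML_TAG = re.compile(r"<[^>]+>")
NON_WORD = re.compile(r"[^\w\s]")  # punctuation, special signs and symbols


def draw_coding_sample(df: pd.DataFrame, n: int = 1500, seed: int = 42) -> pd.DataFrame:
    """Shuffle the merged news-country dataset and take the first n articles."""
    return df.sample(frac=1, random_state=seed).reset_index(drop=True).head(n)


def agreement(coder_a: list[int], coder_b: list[int]) -> tuple[float, float]:
    """Return percentage agreement and Cohen's kappa for two coders' labels."""
    matches = sum(a == b for a, b in zip(coder_a, coder_b))
    return matches / len(coder_a), cohen_kappa_score(coder_a, coder_b)


def preprocess(title: str, text: str, stop_words=ENGLISH_STOP_WORDS) -> str:
    """Merge title and text, lowercase, strip HTML and symbols, drop stop words."""
    combined = f"{title} {text}".lower()
    combined = HTML_TAG.sub(" ", combined)
    combined = NON_WORD.sub(" ", combined)
    return " ".join(tok for tok in combined.split() if tok not in stop_words)
```

Because κ corrects for chance agreement, it can be moderate even when raw agreement is high, which is typical when one category (here, cultural reporting) is rare.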

Operationalization and analyses

Cultural reporting. For the manual coding (which, as a matter of course, also determined the automated coding), cultural reporting was coded as present when a news article mainly covered the arts or cultural elements. To specify this, the study relied on the definition of the arts used in Janssen et al.’s (2008) study on cultural coverage in four Western newspapers. The arts contain “applied arts (architecture, arts and crafts, fashion, design), classical music, dance, film, literature, popular music, television, theater, and visual arts” (Janssen et al., 2008, p. 725). This was combined with Hofstede et al.’s (2010) layers of culture: values could, for example, be manifested through trends and fashion, rituals through traditions and religion, heroes through philosophers or pop stars, and symbols through language and tourism. Considering Nye’s definition of culture as a “set of practices that creates meaning for a society” (2008, p. 96), cultural reporting was further defined to manifest in both high culture tailored to elites and popular culture focused on mass entertainment.

During the manual coding, every article was coded as either cultural reporting (= 1) or non-cultural reporting (= 0). For this, the coders were instructed to code an article as cultural reporting when the title or up to the first eight sentences unambiguously indicated culture as defined above. Otherwise, culture was not considered the article’s main topic, and the article was coded as non-cultural reporting. Overall, 13.67% (Germany = 13.13%, Netherlands = 11.47%, UK = 16.4%) of the articles in the three manually coded datasets were coded as cultural reporting.

To assess how well an SML classifier can recreate human coding, a so-called ‘train-test validation’ was performed. The classifier was trained on 1,200 articles per language, and the success of this training was validated on the remaining 300 (20%). As there are different ways to train a classifier, a total of four validations were executed to identify the most successful one. These combined either a logistic regression (LR) classifier or a multinomial Naïve Bayes (NB) classifier with either a count vectorizer (CV) or a TF.IDF vectorizer. An LR classifier learns the probability of a sample belonging to a class (cultural or non-cultural reporting), whereas NB relies on the naïve assumption that every feature is independent of the others given the class. The difference between the vectorizers is that a count vectorizer simply counts word frequencies, while TF.IDF weights these counts: the value of a word increases with its frequency in an article but is offset by the number of articles in which the word appears. Words that occur across many articles hence receive a lower weight.
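The four vectorizer-classifier combinations and the 80/20 train-test validation can be sketched with scikit-learn pipelines. Default hyperparameters, `max_iter=1000` for logistic regression and the random seed are assumptions, as the text does not report them; `texts` would be the preprocessed articles and `labels` the manual codes.

```python
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Factories for the four vectorizer-classifier combinations.
COMBINATIONS = {
    "CV & NB": lambda: make_pipeline(CountVectorizer(), MultinomialNB()),
    "CV & LR": lambda: make_pipeline(CountVectorizer(), LogisticRegression(max_iter=1000)),
    "TF.IDF & NB": lambda: make_pipeline(TfidfVectorizer(), MultinomialNB()),
    "TF.IDF & LR": lambda: make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000)),
}


def compare_classifiers(texts: list[str], labels: list[int], seed: int = 42) -> dict[str, dict[str, float]]:
    """Train each combination on 80% of the coded articles and score it on the other 20%."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.2, random_state=seed
    )
    results = {}
    for name, build in COMBINATIONS.items():
        model = build().fit(x_train, y_train)
        pred = model.predict(x_test)
        results[name] = {
            "accuracy": accuracy_score(y_test, pred),
            # Scores for the positive class (1 = cultural reporting); 0 if nothing is predicted positive.
            "precision": precision_score(y_test, pred, zero_division=0),
            "recall": recall_score(y_test, pred, zero_division=0),
            "f1": f1_score(y_test, pred, zero_division=0),
        }
    return results
```

Wrapping vectorizer and classifier in one pipeline ensures the vocabulary and IDF weights are learned from the training articles only, so the 20% validation set stays unseen.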

These four classifiers were compared on overall accuracy, as well as on precision, recall and the so-called ‘F1 score’ for classifying an article as cultural reporting (which was considerably more difficult due to the low share of these articles). Accuracy describes the share of all articles that were coded as in the manual coding. Precision describes the share of articles coded as positive (= 1) that are actually positive according to the manual coding. Recall describes the share of manually coded positive articles that the machine also coded as positive. The F1 score is the harmonic mean of precision and recall. The results are shown in Table 3. In all cases, the combination of TF.IDF and logistic regression was deemed the most suitable and was subsequently used for the automatic classification of the uncoded data. For the German dataset, this decision was straightforward. For the other two, this decision was made because this classifier achieved the highest recall, which was of the highest importance for this study: to examine visibility within cultural reporting, the most important criterion was that as much actual cultural reporting as possible was correctly coded as such (see Vermeer et al., 2019 for a detailed overview of scores and a similar evaluation).
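The four evaluation measures can be written out directly from the counts of true and false positives and negatives. The sketch below cross-checks them against scikit-learn and recomputes F1 from the rounded precision and recall in Table 3; the example labels are invented.

```python
from sklearn.metrics import f1_score


def f1_from(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def evaluate(y_true: list[int], y_pred: list[int]) -> dict[str, float]:
    """Accuracy, plus precision, recall and F1 for the positive class (1 = cultural)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return {
        "accuracy": (tp + tn) / len(y_true),
        "precision": precision,
        "recall": recall,
        "f1": f1_from(precision, recall),
    }


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0, 0, 0]
scores = evaluate(y_true, y_pred)
assert abs(scores["f1"] - f1_score(y_true, y_pred)) < 1e-12
print(round(f1_from(0.72, 0.53), 2))  # 0.61, the German CV & NB F1 in Table 3
```

Recomputing F1 from the rounded table values can differ from the reported score by .01 (e.g. .68 rather than .69 for English CV & NB), since precision and recall are themselves rounded.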

Table 3. Overall accuracy, and precision, recall and F1 score for classifying cultural reporting, per vectorizer-classifier combination

Language Combination Accuracy Precision Recall F1

German CV & NB .91 .72 .53 .61

German CV & LR .90 .70 .48 .57

German TF.IDF & NB .87 .00 .00 .00

German TF.IDF & LR .93 .73 .73 .73

Dutch CV & NB .92 .68 .62 .65

Dutch CV & LR .94 .79 .65 .71

Dutch TF.IDF & NB .89 .00 .00 .00

Dutch TF.IDF & LR .93 .71 .71 .71

English CV & NB .91 .65 .72 .69

English CV & LR .93 .83 .66 .73

English TF.IDF & NB .85 .00 .00 .00

English TF.IDF & LR .91 .64 .82 .72

The choice of a logistic regression classifier also made it possible to look into the black box of the coding with the help of eli5 (Korobov & Lopuhin, 2016), a Python library for debugging machine learning classifiers and explaining their predictions. By displaying the words with the highest weights for labeling an article as cultural reporting, eli5 showed that the automatic coding was conceptually appropriate and coherent for all three languages. For example, the top 20 words with the highest positive weight included ‘film’, ‘music’, ‘audience’, ‘series’ and ‘novel’ for Germany, ‘book’, ‘museum’, ‘festival’, ‘film’ and ‘show’ for the Netherlands, and ‘film’, ‘music’, ‘show’, ‘festival’ and ‘theatre’ for the UK. Interestingly, the words with the highest negative weight included ‘police’ in all cases and ‘Trump’ for Germany and the UK. The full list of the top 20 weights can be found in Appendix B.
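For a linear model such as logistic regression, the weight table shown by eli5 essentially ranks the vocabulary by the learned coefficients. The pure-Python sketch below reproduces only that ranking step. The vocabulary and coefficient values are hypothetical; in practice they would come from the fitted vectorizer and classifier (e.g. the coef_ attribute of a scikit-learn model).

```python
def top_weights(vocabulary, coefficients, k=20):
    """Return the k words with the highest positive and the k with the most
    negative coefficients (positive = pushes towards 'cultural reporting')."""
    ranked = sorted(zip(vocabulary, coefficients), key=lambda pair: pair[1], reverse=True)
    positive = [word for word, w in ranked[:k] if w > 0]
    negative = [word for word, w in reversed(ranked[-k:]) if w < 0]
    return positive, negative


# Hypothetical coefficients for illustration only.
vocabulary = ["film", "music", "police", "trump", "festival", "budget"]
coefficients = [3.1, 2.4, -2.8, -1.9, 2.2, -0.7]
positive, negative = top_weights(vocabulary, coefficients, k=3)
print(positive)  # ['film', 'music', 'festival']
print(negative)  # ['police', 'trump', 'budget']
```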

Country visibility. To analyze the differences between cultural and non-cultural reporting in terms of country visibility, the mentions of each country were counted. This included all 193 countries recognized by the United Nations (n.d.) as well as Kosovo, Palestine, Taiwan and Vatican City. To do so, for each language a list of so-called dictionaries (dicts) was created based on the Python library ‘country lists’, which included 255 country names and their ISO alpha-2 codes (e.g. MX for Mexico)1. These dicts were first shortened to exclude autonomous or overseas territories belonging to sovereign states (e.g. Gibraltar) and non-countries (e.g. Antarctica). They were then extended with a manually created list that included synonyms for countries (e.g. Holland for the Netherlands) and merged terms for multi-word countries (e.g. south_africa for South Africa). The latter was necessary because the present approach only allowed for the counting of single words; multi-word country names in the articles were therefore also replaced by the merged terms before analysis. Based on these dicts, country mentions were subsequently counted in all articles. Mentions of the previously excluded overseas territories were counted for their sovereign state (e.g. Gibraltar for the UK). Exceptions were the two Virgin Islands and Sint Maarten/St. Martin, where a clear allocation was not possible; mentions of these were not counted. The home countries of the respective news outlets were excluded from the analysis.
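The counting procedure can be sketched as follows. This is a minimal Python illustration, not the study's code: the tiny dictionary is a hypothetical English excerpt (the actual dicts covered all 197 countries in each language), and the regular-expression tokenization is an assumption. Ambiguous territories such as the Virgin Islands are simply left out of the dictionary, so they are never counted.

```python
import re
from collections import Counter

# Hypothetical excerpt of an English dictionary: surface form -> ISO alpha-2 code.
COUNTRY_TERMS = {
    "mexico": "MX",
    "netherlands": "NL",
    "holland": "NL",          # synonym
    "south_africa": "ZA",     # merged multi-word name
    "united_kingdom": "GB",
    "gibraltar": "GB",        # territory counted for its sovereign state
}
# Multi-word names are merged into single tokens before counting.
MULTIWORD = {"south africa": "south_africa", "united kingdom": "united_kingdom"}


def count_country_mentions(text, home_code):
    """Count dictionary matches per country, excluding the outlet's home country."""
    text = text.lower()
    for phrase, merged in MULTIWORD.items():
        text = re.sub(r"\b" + re.escape(phrase) + r"\b", merged, text)
    tokens = re.findall(r"\w+", text)
    mentions = Counter(COUNTRY_TERMS[t] for t in tokens if t in COUNTRY_TERMS)
    mentions.pop(home_code, None)
    return mentions


article = "Holland played South Africa, while Gibraltar and the United Kingdom watched."
print(count_country_mentions(article, home_code="NL"))  # Counter({'GB': 2, 'ZA': 1})
```

Because only full names are matched, an adjective such as 'South African' is not counted, which mirrors the limitation discussed at the end of this thesis.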

Country overview and visualization. In order to understand what predicts mentions in both cultural and non-cultural reporting, it was deemed necessary to take an explorative look at the wealth of data. For this, the top 10 most-mentioned countries for each kind of reporting and news country were visualized with matplotlib (Hunter, 2007). To assess trends, similarities and differences, an overview of the top 25 countries per kind of reporting, overall and for each news country, was created (Appendix C). The overall figures were computed by averaging each country's share (%) of mentions across the three news countries, per kind of reporting. For Germany, the Netherlands and the UK themselves, this average consisted only of the shares in the two other news countries, since the home countries were not counted in their own outlets.

1
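The averaging rule can be written as a short Python function. The shares below are hypothetical placeholders, and the ISO codes DE, NL and GB stand for the three news countries.

```python
def overall_share(shares, country):
    """Average a country's share (%) of mentions across news countries,
    skipping the news country in which it is the home country."""
    values = [
        by_country[country]
        for news_country, by_country in shares.items()
        if news_country != country and country in by_country
    ]
    return sum(values) / len(values)


# Hypothetical shares (%) per news country, for one kind of reporting.
shares = {
    "DE": {"FR": 6.0, "NL": 2.0},
    "NL": {"FR": 5.0, "DE": 8.0},
    "GB": {"FR": 7.0, "DE": 5.0, "NL": 1.0},
}
print(overall_share(shares, "FR"))  # 6.0 (mean of three shares)
print(overall_share(shares, "DE"))  # 6.5 (mean of the Dutch and British shares)
```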

Additionally, an alphabetical list of all countries and their respective share of all mentions was created (Appendix D). Based on the overall scores in this list, two choropleth maps were created with Datawrapper (n.d.). With the same tool, stacked bar charts of countries grouped by region and income level (based on The World Bank, n.d.) were created.

Status, relatedness and negative events. As a final methodological step, a set of external predictor variables was included to test the hypotheses. For this, a CSV dataset was created manually. It contained the values of the predictor variables status, relatedness and negative events for all 197 countries.

To quantify status for H1, the per capita Gross Domestic Product (GDP) in 2018 for all countries was retrieved from The World Bank (n.d.). Where data was not available for 2018, data from the most recent available year was used; only for Syria and Venezuela was this data more than three years old. Where no data at all was available, estimates from other sources were consulted. This was the case for North Korea (Bajpai, 2019), Taiwan (Countryeconomy.com, n.d.) and Vatican City (Encyclopedia.com, 2020). Per capita GDP was measured in 1,000 US$ and ranged from 0.27 (Burundi) to 185.74 (Monaco).

For the quantification of relatedness (H2), a binary variable was manually created for each of the three news countries, Germany, the Netherlands and the UK (a simplified version of Wu’s [2003] approach). This variable was coded 1 for all neighboring countries sharing an open land border with the news country, all countries in which the majority of the population are native speakers of the news country's language, and the top three countries of origin of foreign-born immigrants according to the Federal Office for Migration and Refugees in Germany (2014), the United Nations (2015) and Iamexpat.nl (Blair, 2012). All other countries were coded 0.
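The coding rule for relatedness can be expressed as a small Python function. The inputs below are incomplete, illustrative placeholders for the UK only (one neighbour, a few languages and one immigrant country of origin); they are not the full lists used in the study.

```python
def is_related(country, news_country, neighbours, language, top_origins):
    """Binary relatedness: 1 if the country shares an open land border with the
    news country, has the same majority native language, or is among its top
    three countries of origin of foreign-born immigrants; otherwise 0."""
    if country in neighbours.get(news_country, set()):
        return 1
    lang = language.get(country)
    if lang is not None and lang == language.get(news_country):
        return 1
    if country in top_origins.get(news_country, [])[:3]:
        return 1
    return 0


# Illustrative, incomplete inputs for the UK (hypothetical placeholders).
neighbours = {"GB": {"IE"}}
language = {"GB": "en", "IE": "en", "AU": "en", "BR": "pt"}
top_origins = {"GB": ["IN"]}
for code in ["IE", "AU", "IN", "BR"]:
    print(code, is_related(code, "GB", neighbours, language, top_origins))
```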

To measure negative events (H3), the 2018 Global Peace Index (GPI), developed by the Institute for Economics & Peace (2018) in cooperation with a panel of experts, was consulted. This index measures the relative peacefulness of nations; higher scores indicate less peacefulness. The 2018 report covered 163 independent states, comprising 99.7% of the world’s population, using 23 qualitative and quantitative indicators related to societal safety and security (e.g. crime), ongoing conflict and militarization. Scores ranged from 1.07 (Iceland) to 3.57 (Afghanistan). Given the relative complexity of this scale, countries without a GPI score were excluded from analyses including the GPI rather than being assigned mean or median values. These were exclusively relatively small countries such as Dominica, Kiribati, Palau or Tuvalu.

Regression model. To test the hypotheses, a multiple regression model was estimated for each kind of reporting and news country, as given by the formula below. As country visibility was measured in country mentions, it was a count variable with an irregular distribution, for which a negative binomial regression would usually be appropriate. However, considering the complex interpretation of such regressions, it was assessed whether taking the natural logarithm of the number of mentions plus one, ln(y + 1), would approximate a normal distribution. As illustrated in Figures E1-E12 in Appendix E, this was the case for all models, so that OLS regression could be used:

$$\ln(\text{mentions}_i + 1) = b_0 + b_1\,\text{GDP}_i + b_2\,\text{Relatedness}_i + b_3\,\text{GPI}_i + \varepsilon_i$$

where i indexes countries. To make the coefficients interpretable on the original scale, they were exponentiated (e^b), where b is the coefficient from the log-transformed regression. Because the outcome is ln(y + 1), e^b gives the factor by which the predicted number of mentions plus one is multiplied for each one-unit increase in a predictor, holding the other predictors constant.
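A minimal NumPy sketch of this estimation step, ln(mentions + 1) = b0 + b1 × GDP + b2 × Relatedness + b3 × GPI, is shown below. It assumes one row per country and fits OLS with np.linalg.lstsq; the study does not state which statistics package it used, and standard errors and p-values (which a package such as statsmodels would report) are omitted. The data are synthetic, generated from known coefficients so that the fit can be checked.

```python
import numpy as np


def fit_log_ols(mentions, gdp, related, gpi):
    """OLS of ln(mentions + 1) on GDP per capita, relatedness and GPI.
    Returns the coefficients b (constant first) and their exponentials e^b."""
    y = np.log1p(np.asarray(mentions, dtype=float))  # ln(y + 1), defined for 0 counts
    X = np.column_stack([np.ones(len(y)), gdp, related, gpi])
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    return b, np.exp(b)


# Synthetic check: data generated exactly from known coefficients.
rng = np.random.default_rng(0)
gdp = rng.uniform(0.3, 90, 160)        # GDP per capita in 1,000 US$
related = rng.integers(0, 2, 160)      # binary relatedness (0/1)
gpi = rng.uniform(1.1, 3.6, 160)       # GPI score
true_b = np.array([1.3, 0.04, 1.1, 0.5])
mentions = np.expm1(true_b[0] + true_b[1] * gdp + true_b[2] * related + true_b[3] * gpi)
b, exp_b = fit_log_ols(mentions, gdp, related, gpi)
print(np.round(b, 3))      # approximately [1.3, 0.04, 1.1, 0.5]
print(np.round(exp_b, 2))
```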


Results

As an initial exploration of the data, overall country visibility is presented in Table 4, which shows the percentage of news articles within each kind of reporting that mention at least one foreign country. For all news countries, this share was greater in non-cultural reporting. Finding a consistent difference between the two kinds of reporting in datasets of this size may be deemed meaningful2. Although the gap is not large, the systematically lower visibility of foreign countries points to a relatively stronger local focus in cultural reporting. At the same time, the fact that roughly two-fifths or more of all reporting, and more than one-third of cultural reporting, includes other countries shows that newspapers in all three countries have a somewhat international orientation. British outlets in particular have a high share of articles mentioning foreign countries.

Table 4. Articles mentioning foreign countries: share of all articles per kind of reporting (%) and number of articles (in parentheses)

                               German             Dutch              British
In cultural reporting          34.78% (2,261)     36.30% (1,439)     41.17% (10,558)
In non-cultural reporting      40.60% (21,857)    42.72% (16,355)    53.27% (60,517)
In both kinds of reporting     39.97% (24,118)    42.12% (17,794)    51.04% (71,075)
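Each cell of Table 4 can be derived from per-article mention counts, applied separately to the cultural and non-cultural subsets, as in the sketch below. It assumes the counts produced by the dictionary step with the home country already removed; the four toy articles are hypothetical.

```python
from collections import Counter


def share_mentioning_foreign(article_mentions):
    """Share (%) and number of articles with at least one foreign-country mention.
    Each element is a Counter of country codes for one article, home country removed."""
    n_foreign = sum(1 for mentions in article_mentions if any(v > 0 for v in mentions.values()))
    return 100 * n_foreign / len(article_mentions), n_foreign


articles = [Counter({"FR": 1}), Counter(), Counter({"US": 2, "CN": 1}), Counter()]
print(share_mentioning_foreign(articles))  # (50.0, 2)
```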

With the aim of generating more profound insights for all RQs, the country mentions were compared. Given the large number of data points, this part of the analysis relies mainly on detailed visualizations. To be able to compare cultural and non-cultural reporting despite the distinctly larger amount of the latter, this comparison was based on each country's share (%) of all country mentions per kind of reporting wherever necessary. Figures 1-6 show the top 10 most-mentioned countries per news country and kind of reporting3. Regarding the hypotheses, these visualizations already offer some insights. High-status countries received a high share of mentions in both kinds of reporting: the United States (US), France, China, Germany and the UK are among the top ten countries in all analyses in which they were counted (Germany and the UK were not counted in their own news outlets), and Italy ranked within the top eleven. These are first indications in support of H1.

2

It was consciously decided not to include a chi-square test or a t-test here. For one, these tests are meant for hypothesis testing, which is not the aim here. Moreover, with sample sizes like the present ones, chi-square tests are almost always significant. Since the variables constitute count data, a t-test would also not have been appropriate, and a log-transformation would have meant missing data, for example in terms of standard deviations. Instead, it was deemed that, for sample sizes like the present ones, the differences speak for themselves.

In line with H2, related countries were also prominent in both kinds of reporting. For German news outlets, Austria was in the top 10 for both kinds of reporting, and Switzerland ranked 9th in cultural and 15th in non-cultural reporting. For Dutch news, Belgium ranked 7th and 11th, and Germany 4th and 3rd (cultural and non-cultural reporting, respectively). The same goes for the UK, where Ireland ranked 5th in both cases and India 4th and 7th; India thus ranked highest among the countries related through immigration.

Beyond these similarities between the news countries, notable country-specific observations were the strong visibility of the US in Germany and the Netherlands and the ubiquitous presence of Australia in the UK.

3 Appendix C shows the percentage data of the top 25 countries with the most mentions per kind of reporting, per news country.


Figures 1 and 2. Most-mentioned countries (ISO codes) by total mentions in Germany (panels: cultural reporting; non-cultural reporting)

Figures 3 and 4. Most-mentioned countries (ISO codes) by total mentions in the Netherlands (panels: cultural reporting; non-cultural reporting)

Figures 5 and 6. Most-mentioned countries (ISO codes) by total mentions in the UK (panels: cultural reporting; non-cultural reporting)


Figures 7 and 8 group the three news countries together and map the overall relative visibility of all countries, in % of country mentions, per kind of reporting4. They show interesting similarities between the two kinds of reporting as well as several important differences. For countries facing negative events (H3), the data give a first indication in support of the hypothesis: many countries ranking high on the GPI (indicating less peacefulness) ranked high in non-cultural reporting specifically. For example, Syria had a lower overall share of visibility in cultural reporting (1.15%) than in non-cultural reporting (1.78%), which was also the case for each of the three news countries separately. The same applies to Russia (2.47% vs. 4.76%), Iran (0.72% vs. 2.72%) and North Korea (0.23% vs. 1.17%). Russia even ranked within the top six in non-cultural reporting for all news countries, Iran within the top 12 and Syria within the top 25. All of them ranked lower in cultural reporting.

Further important observations can be made from Figures 7 and 8. First, the US (7.17% vs. 9.33%) and China (4.25% vs. 6.74%) were relatively less visible in cultural reporting. Second, the figures show a striking absence of African countries in both kinds of reporting, as well as a higher share for Middle Eastern countries in non-cultural reporting. Third, both the graphs and the choropleth maps show that some noteworthy countries received visibly more coverage in cultural reporting, most prominently France (6.38% vs. 4.49%), Japan (2.96% vs. 1.45%), Italy (3.73% vs. 2.84%) and Spain (2.48% vs. 1.99%). Israel also ranked considerably higher in cultural news in the Netherlands and Germany. Finally, while an overall strong focus on Europe is notable, it appears even stronger in cultural reporting.
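The side-by-side percentages above can be summarized as a simple difference in percentage points (cultural minus non-cultural share). The sketch below does this for the overall averages quoted in this section; the difference is an illustrative summary, not a measure used in the study.

```python
# Overall shares (%) of country mentions quoted above:
# (cultural reporting, non-cultural reporting).
SHARES = {
    "France": (6.38, 4.49),
    "Japan": (2.96, 1.45),
    "Italy": (3.73, 2.84),
    "Spain": (2.48, 1.99),
    "US": (7.17, 9.33),
    "China": (4.25, 6.74),
    "Russia": (2.47, 4.76),
    "Iran": (0.72, 2.72),
    "Syria": (1.15, 1.78),
    "North Korea": (0.23, 1.17),
}


def rank_by_cultural_surplus(shares):
    """Sort countries by cultural minus non-cultural share, in percentage points."""
    diffs = {country: round(cul - non, 2) for country, (cul, non) in shares.items()}
    return sorted(diffs.items(), key=lambda item: item[1], reverse=True)


for country, diff in rank_by_cultural_surplus(SHARES):
    print(f"{country:12s} {diff:+.2f}")
```

Positive values mark countries that are relatively more visible in cultural reporting; the four European and East Asian countries lead, while the crisis-related countries and the two largest powers trail.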

4 Appendix D includes a full overview of the share of all countries per news country as well as the overall average scores per kind of reporting.


Figure 7. Choropleth map showing share (%) of country mentions in cultural reporting


To formally test the hypotheses about which factors predict country visibility in the two kinds of reporting, two OLS regression models per news country were run. As Tables 5, 6 and 7 show, all models were significant and the results point in a clear direction. For cultural reporting, as expected, status and relatedness were significant predictors for all three news countries, while, also in line with expectations, negative events were not a significant predictor for Dutch and British articles. For the German outlets, however, negative events also significantly predicted the number of a country's mentions in cultural reporting.

For non-cultural reporting, all three variables were significant predictors for all three news countries. Thus, H1 and H2 were fully supported, and H3 was supported for the Dutch and British5 outlets but not for the German ones. In addition, the non-cultural models consistently explained a higher share of variance than those predicting cultural reporting, as shown by their higher adjusted R² values. It needs to be kept in mind that the dependent variable (number of mentions) was log-transformed before the analysis (see methods section). The exponentiated coefficients in the tables therefore indicate the factor by which the predicted number of mentions plus one is multiplied for every one-unit increase in a predictor, holding the other predictors constant.

5

Given Australia's surprisingly strong presence in British outlets, a regression without Australia was run as a robustness check. It showed similar results, but, as could be expected, relatedness became non-significant for non-cultural reporting. This highlights the importance of Australia for the relatedness variable in particular: without it, by far the most prominent related country was missing, while many less attention-grabbing related countries (e.g. Barbados, St. Kitts) remained. This indicates that other related countries may be less important in non-cultural reporting. Otherwise, the robustness check confirms the findings.


Table 5. Regression models predicting the number of mentions for German outlets (N = 161)

                 Cultural reporting              Non-cultural reporting
                 b          SE      exp(b)       b          SE      exp(b)
Constant         1.35*      0.54                 2.22**     0.64
GDP              0.04***    0.01    1.04         0.06***    0.01    1.06
Relatedness      1.12*      0.45    3.07         1.28*      0.53    3.61
GPI              0.58*      0.23    1.78         1.18***    0.27    3.25
R²               .27                             .34
Adjusted R²      .26                             .33
F                19.23***                        27.27***
Note. Dependent variable: ln(mentions + 1). GDP = GDP per capita in 1,000 US$. * p < .05. ** p < .01. *** p < .001.

Table 6. Regression models predicting the number of mentions for Dutch outlets (N = 161)

                 Cultural reporting              Non-cultural reporting
                 b          SE      exp(b)       b          SE      exp(b)
Constant         1.31*      0.57                 1.71**     0.60
GDP              0.04***    0.01    1.04         0.05***    0.01    1.06
Relatedness      2.09***    0.56    8.05         2.06**     0.59    7.84
GPI              0.41       0.24    1.51         1.46***    0.25    4.29
R²               .28                             .35
Adjusted R²      .27                             .34
F                20.70***                        28.18***
Note. Dependent variable: ln(mentions + 1). GDP = GDP per capita in 1,000 US$. * p < .05. ** p < .01. *** p < .001.

Table 7. Regression models predicting the number of mentions for British outlets (N = 161)

                 Cultural reporting              Non-cultural reporting
                 b          SE      exp(b)       b          SE      exp(b)
Constant         3.78***    0.61                 4.15***    0.66
GDP              0.04***    0.01    1.04         0.06***    0.01    1.06
Relatedness      1.26**     0.46    3.53         1.06*      0.50    2.89
GPI              0.27       0.26    1.30         1.06***    0.28    2.88
R²               .26                             .30
Adjusted R²      .24                             .28
F                18.22***                        22.05***
Note. Dependent variable: ln(mentions + 1). GDP = GDP per capita in 1,000 US$. * p < .05. ** p < .01. *** p < .001.
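As a worked illustration of how the exponentiated coefficients can be read, the sketch below uses the rounded coefficients of the non-cultural model for German outlets (Table 5) and a hypothetical country with a GDP per capita of 40 (thousand US$) and a GPI score of 1.5. Being related multiplies the predicted number of mentions plus one by e^1.28, about 3.6 (Table 5 reports 3.61, computed from the unrounded coefficient). The back-transformation ignores retransformation bias and is meant only to show the multiplicative reading.

```python
import math

# Rounded coefficients (b) from Table 5, non-cultural reporting, German outlets.
B = {"const": 2.22, "gdp": 0.06, "related": 1.28, "gpi": 1.18}


def predicted_mentions(gdp, related, gpi, b=B):
    """Back-transform the fitted ln(mentions + 1) to the count scale."""
    log_pred = b["const"] + b["gdp"] * gdp + b["related"] * related + b["gpi"] * gpi
    return math.exp(log_pred) - 1


pred_unrelated = predicted_mentions(gdp=40, related=0, gpi=1.5)
pred_related = predicted_mentions(gdp=40, related=1, gpi=1.5)
# The ratio applies to (mentions + 1) and equals exp(b) for relatedness.
print(round((pred_related + 1) / (pred_unrelated + 1), 2))  # 3.6
```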


Discussion and conclusion

This study developed an operationalization of cultural reporting to distinguish news articles that, by mentioning foreign countries, might indicate those countries’ soft power in European news. Based on a machine learning classification and the automated analysis of almost 250,000 news articles across six news outlets in three countries, it was possible to compare cultural and non-cultural reporting in terms of country visibility. Regarding the questions posed at the outset of the paper, this study found that there is indeed a difference between the two kinds of reporting in terms of foreign country visibility. It could be confirmed that status (H1) and relatedness (H2) are predictors for both kinds of reporting and that, as expected, negative events were only a predictor for non-cultural reporting (H3), with the exception of the German news outlets, where negative events predicted mentions in both kinds of reporting. Given the connection between peace and attraction (Nye, 2008), these results help validate the theorized association between cultural reporting and soft power. The present findings are thus not only in line with previous studies (Sheafer et al., 2013; Segev, 2015; Wu, 2003); they also go beyond them by expanding our understanding of country visibility across different types of news. Likewise, they respond to the call for new models of evaluating soft power (Pamment, 2014).

While the three news countries showed overall similar trends, some country-specific findings deserve elaboration. For the UK, the comparatively higher share of news mentioning foreign countries may be due to the status of English as the global lingua franca: English-language British reporting might not only target a British audience but partly also international readers. Furthermore, Australia’s ubiquitous presence in British outlets is striking and illustrates the importance of cultural and historical ties (Wilke et al., 2012) for foreign coverage. Nonetheless, the magnitude of Australia’s visibility is surprising and demands further examination. The same goes for the position of the US in British outlets: given the outstanding prominence of the US in foreign reporting (Wilke et al., 2019), its great importance for the Netherlands and Germany is less surprising than its relatively small significance for the UK. Lastly, the finding that negative events are also a significant predictor for cultural reporting in Germany hints at the possibility that German news outlets have comparably more cultural interest in crisis countries, something to be investigated further.

Going into detail for both kinds of reporting, the three news countries showed similar trends in terms of country visibility. The vast majority of country mentions concerned Europe, North America and East Asia, complemented by Russia, India and Australia. On the whole, as also shown by the hypothesis tests, high-status countries (measured by GDP per capita in the present study) generally received many mentions, as the examples of France, China, Italy, Germany and the US show. Likewise, some of the respective related countries received much attention. Taken together, this indicates that which countries we see in the news is, regardless of the kind of reporting, shaped considerably by status and relations.

Concerning the differences between the two kinds of reporting, the explorative analyses showed in detail that many countries with higher scores on the negative events variable (GPI), for example North Korea, Iran, Russia and Syria, ranked higher in non-cultural reporting. That the US and China, despite their medium GPI scores, also received more coverage in non-cultural reporting may be connected to their involvement in crises beyond their own borders (Wilke et al., 2012). This reflects the literature on the explanatory power of conflict (Guo & Vargo, 2017; Segev, 2019) but also advances these findings by attributing it to non-cultural reporting specifically. However, negative events do not automatically translate into coverage. Apart from the countries of North Africa and South Africa, Africa as a continent, including many countries ranking high on the GPI, receives almost no attention even in non-cultural news. While the Middle East’s relatively high non-cultural coverage may be connected to its physical proximity and the growing number of immigrants to the West (Segev, 2015), this shows that even for negative events there is a hierarchy: they matter more if the country facing them also matters more in terms of status or relatedness.

Cultural reporting in particular included a generally high proportion of European countries, many of which (France, Spain, Italy) received more attention than in non-cultural reporting, indicating a cultural preference for geographically close countries. Although proportionally less than in non-cultural reporting, the US also played a big role, reflecting its unique position in the cultural industries (Wu, 2003). A special case of considerably more attention in cultural reporting was Japan, presumably due to the combination of high status and a Far Eastern culture perceived as exotic and as a counterpart to Western cultural dominance (Goldstein-Gidoni, 2005), a phenomenon that deserves further attention. In contrast, the comparatively low visibility of high-GPI countries in cultural reporting suggests that news outlets are rarely interested in troubled countries beyond the problem itself (this might not apply to German outlets, given the significance of negative events there also for cultural reporting). Moreover, the relatively lower share of articles mentioning foreign countries in cultural reporting may be explained by the local bias of arts coverage, that is, the tendency of journalists to cover culture at a national level, as it is the realm where their judgment counts (Janssen et al., 2008). By contrast, the increasing globalization of politics may explain the higher international coverage in non-cultural reporting (Bekhuis, Meuleman & Lubbers, 2013).

Putting the detailed explorative findings and the hypothesis tests together yields a clear picture. For both kinds of reporting, a few countries dominate the agenda. On the non-cultural side, those are countries that are either financially mighty, closely related to the reporting countries or involved in crises. On the cultural side, attention is not distributed more evenly; it is merely concentrated even more on Europe. In the analyzed countries, the soft power elite consists mainly of high-status countries from the immediate surroundings, a pattern that has elsewhere been termed cultural provincialism (Grasland, 2019). In essence, this emphasizes that globalization in the news does not mean an equal global exchange but rather a worldwide awareness of the superpowers (Wilke et al., 2012). Moreover, this study finds that, albeit for a somewhat different set of countries, the same holds for cultural reporting, where attention centers on the global soft power superpowers.

Looking at the study as a whole, some inferences can be drawn. Because the sample constituted a census of all articles published in six news outlets between June 2018 and August 2019 (with the exception of the missing data for September 2018), the findings can claim considerable explanatory power. The analysis of almost a quarter of a million articles from the most important quality newspapers of some of the world’s most important countries (Wilke et al., 2012) may, at least for the Western world, give comprehensive insight into cultural reporting and soft power. Similar patterns may be expected in other European countries and the US, while in Asia a smaller focus on Europe and a larger focus on the immediate surroundings can be expected (Segev & Blondheim, 2013; Segev, 2019). As European outlets have been found to be the least unequal in foreign news reporting in global comparison (Segev, 2019), news elsewhere is unlikely to show a more diverse picture. Ironically, especially in under-represented Africa, comparatively little attention is given to the immediate surroundings (Segev, 2019), indicating that even within Africa, African countries receive relatively little coverage.

Methodologically, this study benefited from recent advances in computational social science (Trilling & Jonkman, 2018). The proposed definition of cultural reporting appeared to be replicable with a supervised machine learning classifier, which makes a good case for additional ‘big data’ research in the realms of soft power and country visibility.

Yet, a few limitations remain. For one, it may be questioned why only full country names were taken into account. While country possessives or city names could also have indicated country visibility, the time constraints and methodological steps of this study meant that only full country names were feasible for this first step. However, this was not expected to influence the results substantially, as an article covering a country can be expected to mention the country's full name at least once. Furthermore, as this study was above all comparative, any such omissions would likely affect cultural and non-cultural articles alike and should therefore not have distorted the comparison. Future studies should nonetheless consider adding further indicators, such as possessives or city names. It should also be highlighted again that data for September 2018 were missing; given that this study did not look at changes over time, this is not expected to have changed the results significantly.

Beyond the possibilities already mentioned, isolating cultural reporting opens up further avenues for future research. The success of the classifier offers many opportunities for researching characteristics of soft power, for example through unsupervised machine learning or by combining automated and manual analyses. It would be especially relevant to assess the context (tone, sentiment, valence) in which a country is mentioned. This would help, for example, to advance the soft power evaluation further by assessing which countries receive particularly positive coverage. Future research could also assess which variables predict visibility in cultural reporting only (as a counterpart to negative events, which predicted mainly non-cultural visibility). Finally, longitudinal studies would make it possible to understand changes in cultural reporting and to assess trends over time.

All in all, this study aimed to help understand which countries could claim to exert soft power in the European sphere by distinguishing between cultural and non-cultural reporting and measuring the differences in country visibility. Having assessed these dynamics on a dataset of considerable size, we can conclude that in the central European media landscape, countries of high status as well as those with strong relations to the news country receive the most cultural and non-cultural coverage. Most importantly, however, the results show that many countries receive a lot of coverage due to crises, but that we learn little about these countries from a cultural perspective. This suggests that there is little interest in countries facing crises beyond the crises themselves. The overall results also imply that not only non-cultural but also cultural reporting focuses mainly on a select elite. In conclusion, a few peaceful, rich and closely related countries appear to exert the most soft power on other nations in Europe.


References

Alexa Internet. (n.d.). Top Sites in Netherlands. Retrieved from: https://www.alexa.com/topsites/countries/NL.

Australian Government – Department of Foreign Affairs and Trade. (n.d.). United Kingdom. Retrieved from: https://dfat.gov.au/geo/united-kingdom/Pages/united-kingdom-country-brief.aspx.

Bajpai, P. (2019, October 6). How the North Korea Economy Works. Investopedia. Retrieved from: https://www.investopedia.com/articles/investing/013015/how-north-korea-economy-works.asp.

Bekhuis, H., Meuleman, R., & Lubbers, M. (2013). Globalization and support for national cultural protectionism from a cross-national perspective. European Sociological Review, 29(5), 1040-1052, doi:10.1093/esr/jcs080.

Blair, C. (2012, July 16). Relatively few foreigners living in the Netherlands. I am Expat. Retrieved from: https://www.iamexpat.nl/expat-info/dutch-expat-news/relatively-few-foreigners-living-netherlands.

Boumans, J. W., & Trilling, D. (2016). Taking stock of the toolkit: An overview of relevant automated content analysis approaches and techniques for digital journalism scholars. Digital Journalism, 4(1), 8-23, doi:10.1080/21670811.2015.1096598.

Burggraaff, C., & Trilling, D. (2020). Through a different gate: An automated content analysis of how online news and print news differ. Journalism, 21(1), 112-129, doi:10.1177/1464884917716699.

Countryeconomy.com. (2020). Taiwan GDP - Gross Domestic Product. Retrieved from: https://countryeconomy.com/gdp/taiwan.

Datawrapper [Online data visualization tool]. (n.d.). Retrieved from: https://www.datawrapper.de/.


https://www.encyclopedia.com/places/spain-portugal-italy-greece-and-balkans/italian political-geography/vatican-city.

Federal Office for Migration and Refugees (BAMF). (2014). Migrationsbericht des Bundesamtes für Migration und Flüchtlinge im Auftrag der Bundesregierung. Migrationsbericht 2012 [Migration report of the Federal Office for Migration and Refugees on behalf of the Federal Government. Migration report 2012]. Retrieved from: https://www.bamf.de/SharedDocs/Anlagen/DE/Forschung/Migrationsberichte/migrationsbericht-2012.pdf?__blob=publicationFile.

Galtung, J., & Ruge, M. H. (1965). The structure of foreign news: The presentation of the Congo, Cuba and Cyprus crises in four Norwegian newspapers. Journal of Peace Research, 2(1), 64-90, doi:10.1177/002234336500200104.

Goldstein-Gidoni, O. (2005). The production and consumption of ‘Japanese culture’ in the global cultural market. Journal of Consumer Culture, 5(2), 155-179, doi:10.1177/1469540505053092.

Grasland, C. (2019). International news flow theory revisited through a space–time interaction model: Application to a sample of 320,000 international news stories published through RSS flows by 31 daily newspapers in 2015. International Communication Gazette, 0(0), 1–29, doi:10.1177/1748048518825091.

Guo, L., & Vargo, C. J. (2017). Global intermedia agenda setting: A big data analysis of international news flow. Journal of Communication, 67(4), 499-520, doi:10.1111/jcom.12311.

Hafez, K. (2011). Global journalism for global governance? Theoretical visions, practical constraints. Journalism, 12(4), 483-496, doi:10.1177/1464884911398325.

Halford, S., & Savage, M. (2017). Speaking sociologically with big data: Symphonic social science and the future for big data research. Sociology, 51(6), 1132-1148, doi:10.1177/0038038517698639.


coverage: Old global hierarchies in a new online world. Journalism & Mass Communication Quarterly, 87(2), 297-314, doi:10.1177/107769901008700205.

Hofstede, G., Hofstede, G. J., & Minkov, M. (2005). Cultures and organizations: Software of the mind (3rd edition). New York: McGraw-Hill. Retrieved from: https://www.bookdepository.com/Cultures-Organizations-Software-Mind-Third-Edition-Geert-Hofstede/9780071664189.

Hunter, J. D. (2007). Matplotlib: A 2D graphics environment. Computing in Science & Engineering, 9(3), 90-95, doi:10.1109/MCSE.2007.55.

Hur, K. K. (1984). A critical analysis of international news flow research. Critical Studies in Media Communication, 1(4), 365-378, doi:10.1080/15295038409360047.

Institute for Economics & Peace. (2018). Global Peace Index 2018: Measuring Peace in a Complex World. Retrieved from: http://visionofhumanity.org/app/uploads/2018/06/Global-Peace-Index-2018-2.pdf.

Janssen, S., Kuipers, G., & Verboord, M. (2008). Cultural globalization and arts journalism: The international orientation of arts and culture coverage in Dutch, French, German, and US newspapers, 1955 to 2005. American Sociological Review, 73(5), 719-740, doi:10.1177/000312240807300502.

Jones, T. M., Van Aelst, P., & Vliegenthart, R. (2013). Foreign nation visibility in US news coverage: A longitudinal analysis (1950-2006). Communication Research, 40(3), 417-436, doi:10.1177/0093650211415845.

Kim, K., & Barnett, G. A. (1996). The determinants of international news flow: A network analysis. Communication Research, 23(3), 323-352, doi:10.1177/009365096023003004.

Korobov, M., & Lopuhin, K. (2016). Eli5: Debug machine learning classifiers and explain their predictions. Retrieved from: https://pypi.org/project/eli5/.


the 9th Python in Science Conference, 51-56 (2010). Retrieved from: http://conference.scipy.org/proceedings/scipy2010/mckinney.html

Nye Jr, J. S. (1990). Soft power. Foreign Policy, (80), 153-171, doi:10.2307/1148580.

Nye Jr, J. S. (2004). Soft power: The means to success in world politics. Public Affairs. Retrieved from: https://www.publicaffairsbooks.com/titles/joseph-s-nye/soft-power/9780786738960/.

Nye Jr, J. S. (2008). Public diplomacy and soft power. The Annals of the American Academy of Political and Social Science, 616(1), 94-109, doi:10.1177/0002716207311699.

Nye Jr, J. S. (2009). Get smart: Combining hard and soft power. Foreign Affairs, 160-163. Retrieved from: www.jstor.org/stable/20699631.

Östgaard, E. (1965). Factors influencing the flow of news. Journal of Peace Research, 2(1), 39-63, doi:10.1177/002234336500200103.

Pamment, J. (2014). Articulating influence: Toward a research agenda for interpreting the evaluation of soft power, public diplomacy and nation brands. Public Relations Review, 40(1), 50-59, doi:10.1016/j.pubrev.2013.11.019.

Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., … & Duchesnay, E. (2011). Scikit-learn: Machine learning in Python. Journal of Machine Learning Research, 12, 2825-2830. Retrieved from: http://jmlr.org/papers/v12/pedregosa11a.html.

Segev, E., & Blondheim, M. (2013). America's global standing according to popular news sites from around the world. Political Communication, 30(1), 139-161, doi:10.1080/10584609.2012.737418.

Segev, E., Sheafer, T., & Shenhav, S. R. (2013). Is the world getting flatter? A new method for examining structural trends in the news. Journal of the American Society for Information Science and Technology, 64(12), 2537-2547, doi:10.1002/asi.22932.

Segev, E. (2015). Visible and invisible countries: News flow theory revised. Journalism, 16(3), 412-428, doi:10.1177/1464884914521579.

Segev, E. (2019). From where does the world look flatter? A comparative analysis of foreign coverage in world news. Journalism, 20(7), 924-942, doi:10.1177/1464884916688292.

Semetko, H. A., Brzinski, J. B., Weaver, D., & Willnat, L. (1992). TV news and US public opinion about foreign countries: The impact of exposure and attention. International Journal of Public Opinion Research, 4(1), 18-36, doi:10.1093/ijpor/4.1.18.

Sheafer, T., Ben-Nun Bloom, P., Shenhav, S. R., & Segev, E. (2013). The conditional nature of value-based proximity between countries: Strategic implications for mediated public diplomacy. American Behavioral Scientist, 57(9), 1256-1276, doi:10.1177/0002764213487732.

Shoemaker, P. J., Danielian, L. H., & Brendlinger, N. (1991). Deviant acts, risky business and US interests: The newsworthiness of world events. Journalism Quarterly, 68(4), 781-795, doi:10.1177/107769909106800419.

Spencer-Oatey, H. (Ed.). (2008). Culturally speaking: Managing rapport through talk across cultures (2nd ed.). A&C Black. Retrieved from: https://www.bloomsbury.com/uk/culturally-speaking-second-edition-9780826493101/

The Guardian Open Platform. (n.d.). Retrieved from: https://open-platform.theguardian.com/

The Soft Power 30. (2019). A Global Ranking of Soft Power 2019. Portland, Facebook, USC Center for Public Diplomacy. Retrieved from: https://softpower30.com/wp-content/uploads/2019/10/The-Soft-Power-30-Report-2019-1.pdf.

Trilling, D., & Jonkman, J. G. (2018). Scaling up content analysis. Communication Methods and Measures, 12(2-3), 158-174, doi:10.1080/19312458.2018.1447655.

Trilling, D., Van De Velde, B., Kroon, A. C., Löcherbach, F., Araujo, T., Strycharz, J., ... & Jonkman, J. G. (2018). INCA: Infrastructure for content analysis. In 2018 IEEE 14th International Conference on e-Science (e-Science) (pp. 329-330). IEEE, doi:10.1109/eScience.2018.00078.


United Nations. (n.d.). Member states. Retrieved from: https://www.un.org/en/member-states/.

United Nations. (2015). Trends in International Migrant Stock: Migrants by Destination and Origin (XLS). United Nations, Department of Economic and Social Affairs. Retrieved from: https://www.un.org/en/development/desa/population/migration/data/estimates2/data/UN_MigrantStockByOriginAndDestination_2015.xlsx.

UNESCO. (1954). Convention for the protection of cultural property in the event of armed conflict with regulations for the execution of the convention. The Hague, May 14, 1954. Paris: UNESCO. Retrieved from: http://portal.unesco.org/en/ev.php-URL_ID=13637&URL_DO=DO_TOPIC&URL_SECTION=201.html.

Vermeer, S. A., Araujo, T., Bernritter, S. F., & van Noort, G. (2019). Seeing the wood for the trees: How machine learning can help firms in identifying relevant electronic word-of-mouth in social media. International Journal of Research in Marketing, 36(3), 492-508, doi:10.1016/j.ijresmar.2019.01.010.

Wilke, J., Heimprecht, C., & Cohen, A. (2012). The geography of foreign news on television: A comparative study of 17 countries. International Communication Gazette, 74(4), 301-322, doi:10.1177/1748048512439812.

World Bank. (n.d.). GDP per capita (current US$). Retrieved from: https://data.worldbank.org/indicator/NY.GDP.PCAP.CD?name_desc=false

Wu, H. D. (2000). Systemic determinants of international news coverage: A comparison of 38 countries. Journal of Communication, 50(2), 110-130, doi:10.1111/j.1460-2466.2000.tb02844.x.

Wu, H. D. (2003). Homogeneity around the world? Comparing the systemic determinants of international news flow between developed and developing countries. Gazette (Leiden, Netherlands), 65(1), 9-24, doi:10.1177/0016549203065001134.


APPENDIX

Appendix A. Article date distribution in news outlets

Month     FAZ    SZ     NOS    NRC    Telegraph  Guardian
06.2018   1459   2044   1449   1636   1601       6747
07.2018   1575   2307   1251   1531   1492       6581
08.2018   1938   2714   1280   1446   1426       6207
09.2018     71    128     76    308    300       6373
10.2018   2184   2963   1683   1892   1838       6869
11.2018   2148   2858   1527   1868   1832       6651
12.2018   1896   2390   1404   1733   1705       5759
01.2019   1845   2308   1294   1758   1701       6402
02.2019   1938   2613   1358   1784   1734       6130
03.2019   1816   2520   1306   1801   1741       6564
04.2019   1818   2283   1320   1795   1750       6163
05.2019   1788   2476   1310   1654   1629       6720
06.2019   1880   2375   1010   1822   1791       6631
07.2019   2100   2726   1212   1671   1649       6457
08.2019   1833   1342   1135   1459   1441       6335