A different gate: how platform traits lead to differences in the content of online and print news

Master’s Thesis

Christiaan Burggraaff 10004226

Graduate School of Communication

Master’s Program Communication Science

Supervisor: Damian Trilling

24th June, 2016

Abstract

This study is the first to apply computer-aided quantitative content analysis to Dutch online and print news in order to find differences in news values between articles on the two platforms. A Python script was written and used to analyze 888,057 articles from major Dutch online and print news sources. As expected, online and print news environments were found to react differently to the changes that journalism is confronted with. Commercialization, as well as a higher workload for and a different role conception of online journalists, leads to a more one-dimensional style of reporting and to less control of power elites compared to print news. Implications for journalism's role in modern democracy are discussed.

Introduction

In recent years, news websites have become an increasingly important news source for Dutch news consumers. While the circulation of newspapers continues to decrease (Papieren oplage krant daalt verder, 2015), citizens have become more and more reliant on online news sources (Crossmediaal nieuws lezen is de norm geworden, 2016; Mitchelstein & Boczkowski, 2010). Given the number of users, it is clear that both channels are important factors in the news diet of the average Dutch citizen: about 50 percent of the Dutch read a newspaper, and readership of online news is almost equally high. And while the news coverage on these two channels may seem interchangeable to the average news consumer, it is, in fact, not at all clear to what extent online and print news content differ. Based on past research, three causes of possible differences were identified.

Firstly, journalism is going through an era of commercialization (McManus, 2009). Journalists try to reach as large an audience as possible because media companies depend on large audiences in order to attract advertisers and investors. Yet, while online journalists have detailed insights into their readers' preferences thanks to extensive website metrics (Anderson, 2011; Tandoc & Thomas, 2015; Welbers, van Atteveldt, Kleinnijenhuis, Ruigrok, & Schaper, 2015), print journalists can only rely on occasional reader surveys to find out about those preferences. Past research has shown that this knowledge shapes the news decisions of online journalists (MacGregor, 2007; Lee, Lewis, & Powers, 2014).

Secondly, in an online environment, journalists have a much higher workload than in a print environment. Journalists have to produce more stories per day and usually have to work in a smaller team. As a consequence, they have less time for important journalistic tasks like fact checking (Witschge & Nygren, 2009).

Finally, it has been shown in several studies that online and offline journalists differ when it comes to their role conception. Cassidy (2005) found that in a print environment, the interpretative/investigative role is rated as significantly more important than in an online environment, while in an online environment, the disseminator role is rated most important. Online, journalists want to publish news stories as soon as possible, while for print journalists, the daily deadline marks the first occasion for an article to be published.

All three of these factors (commercialization, differences in workload, differences in news routines) exert an influence on the so-called gatekeeping process, “the process of selecting, writing, editing, positioning, scheduling, repeating and otherwise massaging information to become news” (Shoemaker, Vos, & Reese, 2008). Given that the external influences on the gatekeeping process differ between online and print media, it is an important question whether these differences lead to a difference in news content. Therefore, this study applies a quantitative content analysis of Dutch online and print news articles to answer the following question: to what extent does the difference in influences on the gatekeeping process in online and print news environments lead to a difference in the content of the stories on the two respective channels?

From a scientific point of view, this question is of great relevance: while most scholars agree that journalism fulfills an important task in any (functional) democracy, it has not yet been investigated whether online journalism can meet the standards that print journalism has met for decades. By using a fairly new method, i.e. quantitative, computer-aided content analysis, this study can shed more light on the important, yet barely investigated, question of whether there are content differences between online and print news.

The public relevance of this research is even higher. Given that journalism is a prerequisite for democracy (Strömback, 2005; Siegert, Gerth, & Rademacher, 2011; Jacobi, Kleinen-von Königslöw, & Ruigrok, 2015), and since journalism is the main source of information for society (Zelizer, 2009), it is important to know if there are differences between online and print news. If so, a change in the news diet in favor of online news could lead to a shift in knowledge of news consumers. In order to fulfill its important role for democracy, journalistic content has to meet several requirements: citizens have to know about current (political) affairs, journalism has to control politics and the economy, and journalism must initiate public debate (Siegert et al., 2011). It is not clear if online news can meet these standards the way print journalism did. If not, it is important to be aware of this as soon as possible in order to think of other ways to facilitate the prerequisites for democracy.

Theoretical Framework

Key Concepts. For this study, it is very important to fully grasp the concept of journalism: Weischenberg, Malik and Scholl (2012) define journalism as the act which “provides the public independently and periodically with information and issues that are considered newsworthy, relevant and fact-based”. In the case of online news, this information is brought to the public via the internet, while for print news, the information reaches the audience via the newspaper. The only exception to this rule is the digital copy of a newspaper (readable, for example, on a tablet computer), because its content is identical to the print edition of the respective newspaper.

The difference between online and print journalists is harder to draw. While there are newsrooms in the Netherlands that are solely concerned with the production of online news, all major print newspapers have a website as well. In those cases, there is no longer a strict division between online and print journalists. Those news companies are examples of newsroom convergence. Depending on the level of newsroom convergence, a smaller or larger subset of the employees produces stories both for the website and for the print edition (García Avilés, Meier, Kaltenbrunner, Carvajal, & Kraus, 2009; Vobic, 2011). Of course, there are still specialists that work exclusively for one platform, but they have colleagues that work for both channels. The consequences of newsroom convergence will be discussed later.

When it comes to the content of news, news values are a very useful approach. News values are about the content of news stories, but their essence is that they serve as a heuristic that helps journalists to select newsworthy stories (Galtung & Ruge, 1965; Harcup & O’Neill, 2001), i.e. to decide what is news that is worth publishing and what is not. According to Galtung and Ruge (1965), the more news values are prevalent in a news story, the higher the likelihood that journalists will consider it news.

Effects of commercialization. The gatekeeping process was explained earlier, as was the fact that this process is subject to external influences. Corporate influences are probably among the most influential of these external factors. According to Entman (2005, p. 58), generating advertising revenue is one of two central aims of media companies, the second one being editorial work. In their work, journalists have to consider both aims at all times.

Accordingly, while they will probably be reluctant to admit it, economic considerations are an important factor in the daily work of journalists. At least subconsciously, journalists are influenced by economic demands. For example, when journalists publish a story online, they monitor the number of readers and aim for the highest possible value: while they may not consciously do this with the intention of generating profit for their company, this is exactly what they do. Market considerations affect journalists' work and thus have an influence on the gatekeeping process (Tandoc, 2014).

Yet, online journalists have more possibilities to act upon these considerations than print journalists. On news websites, journalists have a wide set of website metrics which enable them to see exactly which stories are read a lot and which are not (Tandoc & Thomas, 2015; Tandoc & Vos, 2015; Welbers et al., 2015; Anderson, 2011; Karlsson & Clerwall, 2013); in the case of print media, these numbers are not available.

Several studies using different methods have found that website metrics do indeed shape the news selection process, and suggest quite a strong influence of audience clicks on the gatekeeping process (Anderson, 2011; Lee, Lewis, & Powers, 2014; Jacobi et al., 2015; Welbers et al., 2015). Thus, on this point, there is an important difference between the online and print gatekeeping process: online, news decisions are adapted almost immediately to the readers' preferences, while for print media, that is impossible. It is not very surprising that online media try to serve their readers' preferences. Online, it is very easy for news consumers to visit another website, whereas print readers are less flexible because they usually have a subscription which cannot simply be cancelled.

Further, online news consumers do not only read news on the news websites themselves: quite often, they visit such websites through links they find on other platforms like Google News, Facebook and Twitter. Thus, for online journalists, it is very important to monitor and live up to the preferences of news consumers: if they fail to do so, news consumers will not even bother to visit the company’s website.

Workload. The demands of the market are manifold and go beyond the need to attract a large audience. Accordingly, their influence on the gatekeeping process is massive, too. As was outlined in the previous section, online news websites are active in a much more competitive environment than traditional newspapers. Thus, journalists working in an online environment have to produce more stories and have to work much faster than their print colleagues (Witschge & Nygren, 2009; Bivens, 2008). Mitchelstein and Boczkowski (2009), Boczkowski (2009) and Quandt (2008) add to these findings that online journalists usually have to work on several tasks at the same time.

Online journalists not only have to produce news at a very high pace; almost half of them also report that they have less time for fact checking because of the speed at which stories must be published (Cassidy, 2006). Sometimes, even ethical standards are violated in order to be fast enough (Agarwal & Barthel, 2013). This finding is in line with Cassidy (2006), who found that online journalists are less likely to apply general ethical rules than their print colleagues. This is a massive difference in terms of gatekeeping: online journalists have less time to verify facts, and the high workload can eventually lead to the neglect of important journalistic guidelines. It goes without saying that this is a serious downside of online journalism compared to print journalism.

Role Conception. Apart from the fact that commercialization pressures journalists working in an online environment to work fast, it also seems to be part of their ‘DNA’. Past research has found differences when it comes to the role conceptions of online and print journalists (Paulussen, 2004; Beam, Weaver, & Brownlee, 2006). More specifically, online journalists tend to score high on the disseminator role, which means that they see

on the interpretative role dimension (Deuze & Dimoudi, 2002; Cassidy, 2005; Møller-Hartley, 2013; Carpenter, Boehmer, & Fico, 2015). As just pointed out, this may come at a high cost: in order to be faster than their competitors, online journalists sometimes neglect ethical standards.

Newsroom convergence and its consequences. The previous sections assume a clear difference between online and print journalists, as well as the coexistence of two separate newsrooms. Yet, as was already mentioned, this is not the case. Over the last couple of years, the structure of newsrooms has changed in such ways that the division between online and print has become less clear (García Avilés et al., 2009; Vobic, 2011): in an effort to cut costs, media companies aim for a certain degree of convergence between the online and print crew. Journalists of different platforms cooperate, stories are published on both channels and some journalists even work for both platforms.

Of course, newsroom convergence makes a lot of sense from an organizational point of view. For a media company, it does not make sense to produce identical articles twice: sometimes, a print article can be distributed online, and vice versa. This does not only enable news organizations to cut their staff and their budget, it also is a way to reduce the workload for journalists. Yet, while this trend must be accounted for, newsroom convergence does not mean that the sheer existence of a difference between online and print journalists must be questioned. As García Avilés et al. (2009) point out, newsroom convergence is a delicate issue, and many media companies struggle with finding the right level of convergence. Therefore, it is definitely not the case that the division between online and print journalists is obsolete. The borders may have been lowered, but it is still valid to investigate both print news and online news as existing, different entities: merging both platforms completely may be desirable in terms of corporate interests, yet unachievable from a practical point of view.

At the same time, whenever newsroom convergence leads to articles that are published both online and in the respective newspaper, the gatekeeping processes of online and print mingle, creating a different news production process. These articles are not and cannot be assigned to either one of them. Sometimes, these articles are even the result of cooperation between online and print journalists (García Avilés et al., 2009). Thus, in order to be able to investigate the difference between online and print articles, it is important to think of these articles as a third category, next to exclusive online and print news. This is accounted for in the analysis.

Roundup. To sum up, due to the differences between online and print environments when it comes to the gatekeeping process, the content of news on the two channels may be expected to differ. The exterior and interior influences on the gatekeeping process differ between the two platforms, and a convincing argument has been made for the assumption that online and print news are produced in substantially different ways. Given the characteristics and boundaries of online news environments, it may be expected that online news, to some extent, shows greater similarity with popular news than traditional newspapers do. For years, popular (tabloid) newspapers have been amongst the best-selling newspapers, and attracting large audiences is exactly what is expected from online news (Jonsson, 2007). Attracting large audiences is not only feasible for online news sources, but also desirable from a commercial perspective.

News values. As a next step, it is important to investigate how these differences manifest themselves in the news content. As mentioned before, this study investigates the content of news by measuring the prevalence of news values. It is important to be aware of the fact that news values are not an objective criterion. They are applied in order to decide what is or is not newsworthy, but the way they are applied can also be influenced by external influences, for example organizational needs (Staab, 1990).

Given the technical boundaries and the time that was available for this project, it is beyond the scope of this study to investigate all news values identified by Harcup and O’Neill. Because the method of choice was computer-aided content analysis, only those news values that can be translated to computer-readable code could be considered. Further, it was beyond the scope of this study to code those news values that require extremely advanced programming skills or large computational power. Neither the time nor the computational power was available to apply such methods.

Therefore, this study will focus on the presence (or absence) of a subset of news values on which online and print news are likely to differ due to the earlier mentioned differences between online and print news environments. Additionally, those news values whose presence or absence could be a threat to democracy will be analyzed. Given all these considerations and limitations, the following subset of Harcup and O’Neill's news value list will be investigated (Harcup & O’Neill, 2001, p. 279):

1. Power Elite: “Stories concerning powerful individuals, organizations or institutions”
2. Celebrity: “Stories concerning people that are already famous”
3. Entertainment: “Stories concerning sex, showbusiness, human interest, animals, an unfolding drama, or offering opportunities for humorous treatment, entertaining photographs or headlines”
4. Bad News: “Stories with particular negative overtones, such as conflict or tragedy”
5. Good News: “Stories with particular positive overtones, such as rescues and cures”
6. Follow-up: “Stories about subjects already in the news”

Power Elite. Focus on the power elite is probably the single most defining news value: for a story to be newsworthy, it has to deal with important, relevant, well-known entities in order to be of interest. Further, in a democracy, one of the most important tasks of the media is to control the elites and to question their doings (Siegert et al., 2011). Given the importance of this news value, it is important to verify if there are differences. Thus, the following research question was investigated:

RQ1: To what extent is there a difference in the amount of news on the power elite between the various news sources?

Focus on the power elite can mean that only these elites are mentioned, but it can also mean that there is a focus on the people within those elites. The manner in which different media report on elites might differ. Örnebring and Jönsson (2004) found that popular newspapers are more likely to focus on people than elite newspapers. Further, Jacobi et al. (2015) found that the difference in focus on political leaders between elite and popular newspapers is larger online. Thus, it may be expected that popular newspapers are more likely to apply a personalized reporting style, and that this difference is larger online.

H1A: The degree of personalization is expected to be higher for popular newspapers compared to elite newspapers.

H1B: The difference between elite newspapers and popular newspapers in the degree of personalization is expected to be larger in online news compared to print news.

Celebrity news/entertainment news. Celebrity news and entertainment news may be the categories of news most despised by those who see news standards declining: this kind of news deals with topics like sex, show business and human interest (Harcup & O’Neill, 2001). While hardliners say that citizens need the “Full News Standard” (Zaller, 2003) to be able to serve democracy, examples from the tabloid press show that celebrity and entertainment news attract large audiences. Amongst all British newspapers, the tabloids have the largest circulation (Jonsson, 2007), and the German BILD is the largest newspaper in Germany when it comes to circulation (IVW, 2014). Since online journalists need to attract large audiences, the prevalence of news stories on these topics may be assumed to be much higher online.

In his research on the stories in online and print media, Maier (2010) did indeed find that celebrity/entertainment news was one of only three news categories (out of a total of nineteen) in which online newspapers published more stories than print newspapers. Van der Wurff, Lauf, Balcytiene, Fortunati, Holmberg, Paulussen and Salaverria (2008) likewise found that news websites publish more entertainment stories than print newspapers. Apparently, in an online environment, entertainment and celebrity news are considered adequate stories that ‘pass the gate’, while in a print environment, this is less likely.

Further, these kinds of stories are easy to interpret and do not call for a lot of research, which makes them easy to produce (Bird & Dardenne, 2009 in Handbook of Journalism Studies, p. 209; Lehman-Wilzig & Seletzky, 2010). Given the small online staff sizes (Singer, 2006; Mitchelstein & Boczkowski, 2009) and the high work pressure, this is an important asset of these news stories: they do not require a lot of work. Thus, it is likely that especially online, these stories are present because they require less effort and attract many readers.

H2: The amount of stories with celebrity news is expected to be higher in online news compared to print news.

H3: The amount of stories with entertainment news is expected to be higher in online news compared to print news.

Bad News/Good News. It is widely acknowledged that journalists have a tendency to cover bad news (Leung & Lee, 2015). In their paper on news values, Galtung and Ruge (1965) offer several explanations for this: negative news is usually unexpected, unambiguous, it has a higher frequency and it fits into most people’s picture of the world. It has also been shown that negative news generally tends to attract a larger audience than positive news (“if it bleeds, it leads”). Thus, especially journalists who are very aware of the commercial aspects of their job should be expected to publish bad news.

On the other hand, the rise of infotainment and soft news may also contribute to the production of positive news. As Leung and Lee (2015) found, journalists tend to believe that touching, positive stories are popular with the audience. Since online journalists do not have to worry about a lack of space (Jacobi et al., 2015), it is quite possible that online news contains not only more negative emotions, but also more positive emotions. In order to write entertaining articles, journalists may make use of an emotional tone of voice rather than a neutral writing style. Up to this point, research has not yet extensively investigated the amount of good versus bad news that passes the gate in online or print environments. Thus, the following research questions will be investigated:

RQ2A: To what extent is there a difference in the relative amount of negative news between print and online news?

RQ2B: To what extent is there a difference in the relative amount of positive news between print and online news?

RQ2C: To what extent is there a difference in the degree of emotionality between print and online news?

Given the competitiveness in online news environments, it is of extreme importance that titles catch the interest of readers and make them click on a given article. In print news, it is mostly the front page headline that can affect the decision to buy a newspaper, while online, every article title has to attract attention for the corresponding article because otherwise, the article is not read. Therefore, it is important to investigate the amount of positivity and negativity in article titles separately. Thus, the following research question was formulated:

RQ2D: To what extent do online and print news titles differ in terms of the amount of good/bad news?

Follow-up. Given the budget cuts that journalists have to deal with, journalists favor stories that are easy to produce. Labor costs are among the highest expenses, which makes investigative journalism (which leads to unique scoops) more expensive than follow-up stories on issues that are already in the news. Further, journalists in an online environment have to produce far more stories on a daily basis than their print colleagues (Witschge & Nygren, 2009), and they want to publish stories as quickly as possible (Agarwal & Barthel, 2013). Online journalists also report that they feel they are not executing proper (i.e. investigative) journalism, because the workload is too high (Witschge & Nygren, 2009). Of course, this means online journalists have significantly less time to verify facts and to investigate new sources than print journalists. This trend is in line with statements from newsmakers themselves who say that online, they strive for the ideal of ongoing 24-hour coverage, where being just minutes behind the competitors is already seen as failure (Bivens, 2008). Accordingly, online journalists are expected to produce significantly more follow-up stories than their print colleagues.

H4: The amount of follow-up news is expected to be higher in online news than in print news.

News values that were not investigated in this study include surprise, magnitude, relevance, and the newspaper agenda. Unfortunately, it was beyond the scope of this article to investigate more news values, and therefore, those which seemed most promising in terms of possible differences were selected. The only exception to this is surprise, which might seem like a news value that is helpful in order to attract large audiences. Yet, this news value is almost by definition hard to analyze in a computer-based analysis. In order to get reliable results, information has to be organized in certain patterns that the computer can find, which is almost exactly the opposite of surprise. Therefore, although this news value may be of interest, it was not accounted for.

Method

For this study, a dataset created by a research group of the University of Amsterdam was used. The database consists of digital versions of print newspaper articles, as well as free online articles from several national Dutch (newspaper) news websites. The online articles were downloaded by following the links provided in the RSS feeds of the respective online news sources. In the next step, those articles were scraped from the website and stored in the database, along with some other features like the date of the articles (Trilling, 2014). For the print articles, the same additional features were stored alongside the articles.

Instead of sampling, the method of quantitative content analysis allows for investigation of the entire population in a given time frame. The dataset consisted of a total of 899,331 articles, of which some cases were duplicates. Further, in the case of the online version of de Volkskrant, some extremely long articles had the exact same length and scored almost identically across all variables. Because their titles shared great similarity as well (it seemed like they were part of a frequently updated blog), only one copy of them was kept, because such high, identical scores on article length cannot be valid. After removal of these duplicates (N=6,427), the number of remaining articles for the analysis was 892,904. Further, several articles (N=4,847) had a length of less than 100 characters. It seems unlikely that these articles are in fact valid cases, and they were therefore deleted. The analyzed dataset therefore consisted of 888,057 articles, all of which were published between the 1st of January, 2014 and the 12th of April, 2016. The sample contained 444,520 online articles (411,622 exclusively so), 443,537 print articles (410,872 exclusively so), and 65,563 articles that were published both online and in print. In table 1, the exact numbers of articles per source are listed.
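The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration, not the script used in the study: the function name `clean_articles`, the dictionary keys, and the definition of a duplicate as an identical (source, title, text) triple are assumptions made for the example. Only the 100-character threshold and the reported counts come from the text.

```python
def clean_articles(articles, min_length=100):
    """Drop exact duplicates, then drop articles shorter than min_length characters.

    `articles` is assumed to be a list of dicts with the hypothetical keys
    'source', 'title' and 'text'; the real database layout may differ.
    """
    seen = set()
    kept = []
    for article in articles:
        # Assumption: a duplicate is an identical (source, title, text) triple.
        key = (article["source"], article["title"], article["text"])
        if key in seen:
            continue
        seen.add(key)
        # Articles with fewer than 100 characters are treated as invalid cases.
        if len(article["text"]) < min_length:
            continue
        kept.append(article)
    return kept


# Arithmetic check of the counts reported in the text.
total = 899_331
after_duplicates = total - 6_427          # 892,904 articles remain
after_length_filter = after_duplicates - 4_847  # 888,057 articles analyzed
```

The near-duplicate blog articles from de Volkskrant would need an additional rule (for example, matching on length and title similarity), which is left out of this sketch.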

Table 1: Number of articles per source

source        online                  print                   non-exclusive: online (print)*
nu.nl         28,014                  n.a.**                  n.a.
AD            28,422                  92,815                  2,270 (2,269)
Telegraaf     175,162                 103,478                 10,557 (10,568)
NOS           23,844                  n.a.                    n.a.
Volkskrant    69,822                  58,185                  8,740 (8,641)
NRC           22,417                  69,601                  4,700 (4,504)
Trouw         5,270                   56,877                  948 (937)
Metro         54,581                  29,916                  5,683 (5,746)
Geenstijl     4,091                   n.a.                    n.a.
Total         444,520 (411,622)***    443,537 (410,872)***    65,563

* While the number of non-exclusive articles is approximately the same within two platforms of one source, the numbers need not be equal because based on one online article, two print articles may be written or vice versa.

** n.a. = not available

*** numbers in brackets do not include articles published both online and print

In order to investigate the prevalence of different news values on these various platforms, this study applies quantitative content analysis. The articles were automatically coded using a Python script. Each news value was defined in terms of certain criteria that could be translated to Python code. Below, for each news value, the way it was coded will be explained in further detail. While reliability is not an issue in computer coding, validity will be discussed (Boumans & Trilling, 2016).

Article Length. The articles in the sample varied greatly in length. After some impossible values were removed, the overall average article length was 1,725.44 characters (SD=1,950.96). The shortest article was 100 characters long (the title excluded), the longest article was 134,544 characters long.

Power Elite. News on power elite was defined in terms of political, geographical and economic power. In order to find characteristic words of political power, articles about political news were coded and references to political power were stored in a list (Appendix A). The coding was iterative and was continued until no new words came up. The same process was repeated for characteristic words of economic power.

In order to find countries that can be considered to be part of the power elite, countries fulfilling one or more of the following criteria were included: for those countries belonging to the global G20, the country name, capital, seat of government and the biggest city were added to the list with political power references. Then, this list was extended by those countries that have the highest GDP per citizen and the highest overall GDP. Again, the capital, biggest city and seat of government were included. In the coding process, a bag-of-words approach (Harris, 1954) was chosen: every occurrence of one of the words from the list in the body of an article meant an increase in power elite score by 1 point, an occurrence in the title meant an increase by 2 points (M=3.41, SD=4.08, range[0;77]).
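The scoring rule just described (one point per occurrence in the body, two points per occurrence in the title) can be illustrated with a short sketch. The word list below is a hypothetical three-word stand-in for the list in Appendix A, and the simple regular-expression tokenizer is an assumption; multi-word entries such as city names consisting of two words would need extra handling that is not shown here.

```python
import re

# Hypothetical stand-in for the power elite word list (Appendix A).
POWER_WORDS = {"minister", "kabinet", "washington"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (assumed tokenization)."""
    return re.findall(r"\w+", text.lower())


def power_elite_score(title, body, words=POWER_WORDS):
    """Bag-of-words score: +1 per list word in the body, +2 per list word in the title."""
    body_hits = sum(1 for token in tokenize(body) if token in words)
    title_hits = sum(1 for token in tokenize(title) if token in words)
    return body_hits + 2 * title_hits


example_score = power_elite_score(
    "Minister bezoekt Washington",
    "De minister sprak met het kabinet.",
)
```

In this example, the title contributes 2 × 2 points and the body 2 points, so `example_score` is 6.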

Personalization. Personalization was investigated using named entity recognition (Nadeau & Sekine, 2007). Named entity recognition is part of natural language processing, enabling the computer to identify certain entities (like locations, persons) in a text. More specifically, the Python Natural Language Toolkit (NLTK; Bird, Klein & Loper, 2009) was used to train a naive Bayes classifier and chunker in order to find named entities in the articles. As training data, the Dutch version of the conll2002 data was used (Tjong Kim Sang, 2002), which comes with the nltk.corpora package. The trained NER model achieved an F1-score of 69.3 percent (precision=66.9%, recall=71.9%). Using this NER model, the articles were searched for references to persons, which resulted in a personalization score per article by counting the number of entities that were recognized as being persons (M=3.06, SD=4.49, range[0;771] (sic)).
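As a small illustration of the two quantities mentioned above, the sketch below recomputes the F1-score from the reported precision and recall and counts person entities in a tagged sentence. It does not train or call the NER model itself. The helper names are hypothetical, and the counting function assumes that the tagger output uses tags in which every person entity starts with a `B-PER` tag, with `I-PER` marking its continuation.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def count_persons(iob_tags):
    """Count person entities, assuming each entity starts with exactly one 'B-PER' tag."""
    return sum(1 for tag in iob_tags if tag == "B-PER")


# Reported values: precision = 66.9%, recall = 71.9%.
reported_f1 = f1_score(0.669, 0.719)

# Hypothetical tagger output for a sentence mentioning two persons and one location.
tags = ["B-PER", "I-PER", "O", "O", "B-LOC", "O", "B-PER"]
personalization_score = count_persons(tags)
```

Note that this counts mentions rather than distinct persons; whether repeated mentions of the same person were counted once or several times in the study is not stated in the text.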

Celebrity news. In order to find celebrity news, first, articles with celebrity news were coded. In this process, a list of ‘celebrity jobs’ was created in order to find the various


carried out until no further references could be found. In order to make sure that only well-known celebrities were included, the word ‘famous’ was added to most keywords. With the resulting list of potential jobs (Appendix B), a SPARQL query was set up in order to find DB-pedia articles (DB-pedia being the machine-readable form of Wikipedia). The objective of this query was to find Dutch Wikipedia articles about persons where one of the ‘celebrity jobs’ was mentioned in the page's abstract. For some of the more specific keywords (like TV anchormen), a prerequisite for inclusion was that the persons were Dutch, in order to avoid an unnecessarily high number of false positives, while in the case of, say, actors (a term which in Dutch has a less broad meaning than in English), this requirement was not set. Still, the query resulted in a list of almost 10,000 celebrities. The dataset was then searched for the appearance of one or more of those celebrities, resulting in a celebrity score for every article by simply counting the number of occurrences (M=0.31, SD=0.93, range[0;65]). The sample contained a total of 156,158 celebrity articles.

Entertainment news. The database that was used also stores the subdirectories of the website where a given article was published. Not all news sources categorize their articles reliably, but nu.nl does exactly this. Thus, in order to find words characteristic of entertainment articles, the nu.nl articles from the category entertainment news were collected. For these articles, a frequency analysis of the words was executed. In the resulting list, those words that appeared fewer than ten times were deleted, as well as those words that cannot be expected to be characteristic of entertainment news only. An example of this would be the term “vs”, which is an abbreviation of versus as well as the short form of the Dutch translation of the United States (US). This resulted in a list of characteristic words (N=214, Appendix C) that all articles in the total set were scanned for. Because these words are still not all typical of entertainment news only, it was required that articles contained a certain number of different words from the list, depending on the article length in tokens. Articles were then assigned an entertainment score, where words in the body of the article meant an increase of one, while words in the title meant an increase of two (M=2.34, SD=3.02, range[0;116]).


Table 2 shows the minimum entertainment score for an article of a given length to be considered an entertainment article. The false positives were calculated based on the nu.nl data. In the final dataset, the number of entertainment articles was 62,154. Precision of this method was 80.2%, recall was about 98%.

Table 2: Criteria for entertainment articles

article length                  minimum entertainment score   false positives
article length < 500            4                             2.1 %
500 < article length < 1500     6                             1.5 %
1500 < article length < 2500    7                             1.6 %
2500 < article length < 3500    10                            0.6 %
3500 < article length           12                            1.6 %
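The length-dependent criterion of Table 2 could be expressed as follows. This is a sketch under assumptions: the function names are hypothetical, the table does not say how lengths of exactly 500, 1500, 2500 or 3500 were treated (here they fall into the higher bracket), and the minimum score is read as inclusive.

```python
def min_entertainment_score(article_length):
    """Minimum entertainment score per length bracket, following Table 2."""
    if article_length < 500:
        return 4
    if article_length < 1500:
        return 6
    if article_length < 2500:
        return 7
    if article_length < 3500:
        return 10
    return 12

def is_entertainment(article_length, entertainment_score):
    """Assumption: an article qualifies when its score reaches the minimum."""
    return entertainment_score >= min_entertainment_score(article_length)

print(is_entertainment(2000, 7))
print(is_entertainment(2000, 6))
```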

Of course, supervised machine learning could have been applied to classify articles as entertainment or non-entertainment articles. Yet, as not all sources classify their articles, creating a valid training dataset is almost impossible. Still, the approach was tested: after training a naïve Bayes classifier on 14,000 nu.nl articles, a test on another 14,000 nu.nl articles yielded an accuracy of about 91 percent. Given the problems of creating good training data, and because the method described above yielded acceptable results, this approach was not applied.

Positive/negative news. In order to measure the amount of positive and negative news, a sentiment analysis was carried out for each article using the Sentistrength software for Dutch (Thelwall, Buckley, Paltoglou, Cai & Kappas, 2010). Each article was assigned a score for positivity (M=2.02, SD=1.01, range[1;5]) and for negativity (M=-2.80, SD=0.91, range[-5;-1]), which makes it possible to compare the emotionality of different articles. As Thelwall et al. point out in their article, sentiment is not a single scale with positivity at one end and negativity at the other: rather, the two are separate concepts that do not necessarily correlate strongly and as such can (and have to) be measured individually.

Differences in article titles were investigated by calculating the log likelihood in order to find over- or underrepresented words in either of the two corpora (online news titles vs. print news titles; Rayson & Garside, 2000). Appendix I contains a table with those words that were characteristic of one of the two corpora (LL > 500). Several clear differences were found, which are discussed below.
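The log-likelihood statistic for a single word in two corpora, in the form commonly attributed to Rayson and Garside (2000), can be sketched as below. The function and parameter names are hypothetical, and the frequencies in the example are invented for illustration, not taken from the data.

```python
import math

def log_likelihood(freq_a, freq_b, total_a, total_b):
    """Log-likelihood of a word's frequency difference between two corpora.

    freq_a, freq_b: occurrences of the word in corpus A and corpus B
    total_a, total_b: total number of tokens in corpus A and corpus B
    """
    combined = freq_a + freq_b
    expected_a = total_a * combined / (total_a + total_b)
    expected_b = total_b * combined / (total_a + total_b)
    ll = 0.0
    for observed, expected in ((freq_a, expected_a), (freq_b, expected_b)):
        if observed > 0:  # a zero count contributes nothing to the sum
            ll += observed * math.log(observed / expected)
    return 2 * ll

# Invented counts: the word occurs 100 times in 1,000 online title tokens
# and 10 times in 1,000 print title tokens.
ll_value = log_likelihood(100, 10, 1000, 1000)
print(ll_value > 0)
```

A word with identical relative frequency in both corpora yields a value of zero; the larger the value, the more strongly the word is over- or underrepresented in one of the corpora.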

Follow-up news. An article was considered a follow-up article if its topic was covered in another article from the same source that was published up to two days earlier. In order to find such articles, the cosine similarity between a published article and all articles from the following two days was calculated. If the cosine similarity was higher than .5, an article was considered a follow-up article of the previously published article. This threshold was determined by trying several thresholds on a random dataset of 100 different articles; a threshold of .5 turned out to yield the best results. Each article was labeled as being a follow-up article or not.1

In total, 49,975 (5.63%) of the articles were found to be follow-up articles.
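The pairwise comparison can be illustrated with a minimal sketch. It assumes raw term counts as vector weights and a simple regular-expression tokenizer; the text does not specify the weighting that was actually used (for example tf-idf), and the function names are hypothetical. The final script compared larger subsets at once by means of sparse matrix multiplication rather than pair by pair.

```python
import math
import re
from collections import Counter

def term_vector(text):
    """Assumption: raw term counts over lower-cased word tokens."""
    return Counter(re.findall(r"\w+", text.lower()))

def cosine_similarity(vec_a, vec_b):
    dot = sum(count * vec_b[term] for term, count in vec_a.items())
    norm_a = math.sqrt(sum(c * c for c in vec_a.values()))
    norm_b = math.sqrt(sum(c * c for c in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def is_follow_up(earlier_text, later_text, threshold=0.5):
    """True if the later article exceeds the similarity threshold of .5."""
    similarity = cosine_similarity(term_vector(earlier_text),
                                   term_vector(later_text))
    return similarity > threshold

print(is_follow_up("brand in haven rotterdam", "brand in haven geblust"))
```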

Online and print articles (non-exclusive articles). As elaborated in the previous sections, it is important to account for newsroom convergence. Articles that appeared on both platforms of one source were found by using the cosine similarity. For example, in order to find non-exclusive articles on the web version of the NRC on a given day, the cosine similarity was calculated between the online articles of that day and all print articles from that same day, as well as those from the day before and the day after. If the cosine similarity between an online and a print article was above .7, the articles were considered non-exclusive and were analyzed as a separate category of articles. In total, 65,563 articles were found to be published both online and in print.

1 For follow-up news, in a functioning beta version of the script, the time window for follow-up articles could be set with a higher degree of precision by comparing two articles at a time. This way, it could be determined precisely how much time passed between the publication of the first article and that of the follow-up article. Further, for each article, it could be determined how many follow-up articles were based on that very same article. Unfortunately, this script performed very poorly because the number of required calculations exceeded the power of the CPU. An alternative approach with latent semantic indexing, which would have enabled an equal level of accuracy, also failed due to CPU limitations. In the final version of the script, the cosine similarity was used, but implemented in such a way that instead of one-on-one comparisons, larger subsets of the data were compared at the same time by applying sparse matrix multiplication.


Results

Table 4 shows the overall results for all variables, as well as the results for some of the subgroups which are relevant for the analysis.

Research Question 1. Research question 1 was investigated by carrying out a multiple regression analysis (table 3). Regression diagnostics can be found in appendix D. As can be seen in table 4, on average, the number of elite references is higher in print news (M=3.59, SD=4.28) than in online news (M=3.24, SD=3.85). Yet, interestingly, when article length is controlled for, print articles turn out to score lower than online articles. This means that elite references are denser in online articles, but it is important to note that overall, the number of elite references is still higher in print news. Popular newspapers, on average, score lower on elite references than elite newspapers do. Non-exclusive articles did not differ significantly from other articles. The results show that elite news coverage varies between the different sources.

Hypothesis 1. In order to test hypothesis 1A, another multiple regression analysis was executed (table 3). Regression diagnostics can be found in appendix E. Quite interestingly, while the results from table 4 indicate that popular newspapers score lower on personalization than elite newspapers, the results of the regression analysis show that this effect is spurious: when controlling for article length and platform, it turns out that on average, popular newspapers score higher on personalization than elite newspapers. The regression model shows that, as expected, the degree of personalization is higher in popular news than in elite news sources: hypothesis 1A was confirmed. Further, personalization was found to be used more often in print news than in online news.

Concerning hypothesis 1B, there is an interaction effect of platform and news type: as expected, online, the difference between popular news and elite news is bigger than in print. Therefore, hypothesis 1B was confirmed.


Table 3: Results of regression analyses

                      elite news: RQ1           personalization: H1       negative sentiment: RQ2   pos. sentiment: RQ2       emotionality: RQ2
                      b (SE)        β           b (SE)        β           b (SE)        β           b (SE)        β           b (SE)        β
article length a      .001 (.000)   .50 ***     .001 (.000)   .58 ***     -.0001 (.000) -.38 ***    .0002 (.000)  .40 ***     .0003 (.000)  .49 ***
platform_print b      -.54 (.008)   -.07 ***    .44 (.014)    .05 ***     .06           .03 ***     .12           .06 ***     .05           .02 ***
non-exclusiveness c   .01 (.014)    .00 n.s.    -.08 (.015)   .00 ***     -.01          .00 ***     .02           .01 ***     .03           .01 ***
news type d           -.77 (.008)   -.09 ***    .32 (.013)    .03 ***     .09           .05 ***     .07           .03 ***     -.02          -.004 ***
interaction e                                   -.27 (.017)   -.03 ***
R² (N)                .28 (832,108)             .34 (832,108)             .15 (832,108)             .17 (832,108)             .25 (832,108)

a: article length in words, b: 0=online, 1=print, c: 0=exclusive, 1=non-exclusive, d: 0=elite news, 1=popular news, e: 0=popular news; online, 1=popular news; print

n.s.: not significant * p<.05 ** p<.01 *** p<.001

Logistic regression analyses

                      celebrity news: H2            entertainment news: H3        follow-up: H4
                      OR (SE) e       z             OR (SE)         z             OR (SE)         z
article length a      1.70 (.006) f   151.88 ***    1.00 (.000)     90.01 ***     1.00 (.000)     -1.48 n.s.
platform_print b      1.11 (.007)     17.32 ***     .92 (.008)      -9.32 ***     .60 (.006)      -51.21 ***
non-exclusiveness c   1.00 (.011)     .19 n.s.      .97 (.015)      -1.86 n.s.    .83 (.016)      -9.76 ***
news type d           1.07 (.007)     10.53 ***     1.05 (.010)     5.05 ***      .96 (.010)      -3.93 ***
Pseudo-R² (N)         .04 (832,108)                 .02 (832,108)                 .04 (832,108)

a: article length in words, b: 0=online, 1=print, c: 0=exclusive, 1=non-exclusive, d: 0=elite news, 1=popular news, e: OR=odds ratio, f: variable article length was log transformed


Table 4: Overall means and standard deviations, M (SD)

Group                     article            celebrity      entertainment  power elite    positive       negative       follow_up      online and      personalization
                          length             articles       articles       news           sentiment      sentiment      news           print articles
overall (N=888,057)       1725.54 (1950.96)  0.18 (0.38)    0.07 (0.26)    3.41 (4.08)    2.02 (1.01)    -2.80 (0.91)   0.06 (0.23)    0.08 (0.27)     3.06 (4.49)
online (N=444,520)        1312.36 (1546.15)  0.15 (0.36)    0.07 (0.25)    3.24 (3.85)    1.88 (0.96)    -2.77 (0.87)   0.07 (0.25)    0.08 (0.28)*    2.37 (3.88)
print (N=443,537)         2139.64 (2209.62)  0.20 (0.40)    0.07 (0.26)    3.59 (4.28)    2.16 (1.03)    -2.83 (0.96)   0.04 (0.20)    0.07 (0.26)     3.76 (4.93)
only online (N=411,622)   1288.17 (1482.48)  0.15 (0.36)    0.07 (0.25)    3.20 (3.81)    1.87 (0.96)    -2.76 (0.87)   0.07 (0.26)    n. a.**         2.34 (3.83)
only print (N=410,872)    2136.79 (2204.84)  0.20 (0.40)    0.07 (0.26)    3.58 (4.28)    2.16 (1.03)    -2.83 (0.96)   0.04 (0.20)    n. a.**         3.76 (4.92)
non-exclusive (N=65,563)  1894.31 (2236.83)  0.18 (0.39)    0.07 (0.26)    3.60 (4.32)    2.08 (1.01)    -2.83 (0.91)   0.05 (0.21)    n. a.**         3.22 (4.82)
Popular news (N=525,557)  1276.23 (1211.16)  0.16 (0.37)    0.07 (0.25)    2.64 (3.33)    1.96 (1.00)    -2.68 (0.92)   0.06 (0.23)    0.07 (0.26)     2.54 (3.59)
Elite news (N=362,500)    2376.96 (2545.74)  0.19 (0.39)    0.08 (0.26)    4.53 (4.75)    2.11 (1.01)    -2.97 (0.87)   0.06 (0.23)    0.09 (0.29)     3.81 (5.46)

* excluded: nos, nu, geenstijl ** no data available


Hypothesis 2. Hypothesis 2 was investigated using a logistic regression (table 3). For the logistic regression model to converge, the variable article length had to be log transformed. The results show that for print articles, the odds of containing celebrity news are about 11.14% higher than for online articles, so hypothesis 2 was rejected. Further analysis showed that popular news sources are more likely to contain celebrity news than elite news sources. No significant difference was found between non-exclusive articles and the other articles. The likelihood for the various sources to publish articles with celebrity references can be found in table 5. It must be said that the number of celebrity articles found in the dataset was implausibly high. It seems that the number of celebrities that resulted from the DB-pedia queries was too high: instead of measuring celebrity references, the instrument was rather another, worse, measure of personalization than a measure of celebrity news. A logistic regression analysis controlling for personalization showed that the effects of platform are spurious, while the effect of personalization on celebrity news was highly significant.

Table 5: Likelihood for an article to contain celebrity references

Source          Likelihood*
Online          16.13 %
Print           17.61 %
elite news      16.32 %
popular news    5.57 %
elite news      5.36 %

* Likelihood was calculated by centering all other independent variables and then calculating the chance for an article from the respective source to contain celebrity references (i.e. to score 1 instead of 0).
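How an odds ratio translates into the likelihoods of Table 5 can be shown with a short worked example. It assumes an odds ratio of 1.1114 for print, read off the 11.14% reported in the text (table 3 rounds it to 1.11), and takes the online likelihood of 16.13% as the baseline; the function name is hypothetical.

```python
def apply_odds_ratio(baseline_probability, odds_ratio):
    """Convert a baseline probability to the probability implied by an odds ratio."""
    baseline_odds = baseline_probability / (1 - baseline_probability)
    new_odds = baseline_odds * odds_ratio
    return new_odds / (1 + new_odds)

# Baseline: online likelihood from Table 5; odds ratio: assumed 1.1114.
print_probability = apply_odds_ratio(0.1613, 1.1114)
print(f"{print_probability:.2%}")
```

Under these assumptions the result is close to the 17.61% reported for print, which illustrates that an 11% increase in odds corresponds to a much smaller increase in percentage points.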

Conclusion and Discussion

To our knowledge, this paper is the first to apply a quantitative, computer-aided content analysis in combination with DB-pedia and Named Entity Recognition (NER) in order to shed light on differences in the prevalence of news values in online and print news articles. This approach combines several assets of computer-aided coding and as such offers important insights into the differences between online and print news. While earlier studies were mostly limited to analyzing differences in the way online and print journalists do their job or to describing the market of journalism, this study is the first to study the content of Dutch news.

As outlined before, reliability is not an issue in a computer-aided content analysis, while validity can be problematic. Over the given time frame, all articles were included: thus, the results should represent the current situation, and external validity should be guaranteed. Yet, conclusions about the past are dangerous because the media environment is in a continuous state of transition. Concerning internal validity, precision and recall were calculated where possible. While the results were excellent in the case of entertainment articles, the precision and recall for the NER tagger were acceptable but left room for improvement. It has been shown that NER is a useful approach to measuring personalization, but in future research, this method needs some refinement. In the case of positive and negative emotions, Sentistrength has so far mainly been used for the analysis of rather short texts. While the software seems capable of analyzing longer texts as well, it would be useful to compare the method with, for example, a supervised machine learning approach in order to evaluate more precisely whether the measurement is valid. Concerning follow-up news and non-exclusive articles, it may be useful for future research to check whether the results match manual coding, but no reasons were found to seriously question the results of the applied method (cosine similarity).

All analyses show that there are significant differences between online and print news. Therefore, the basic assumption that the differences between online and print news environments lead to a difference in the gatekeeping process and therefore to a difference in news content is confirmed. At the same time, newsroom convergence seems to lead to a mixture of the gatekeeping processes of the two platforms. On all variables of interest, non-exclusive articles were found to score in between the scores for online-only and print-only articles. Apart from this conclusion, newsroom convergence will not be discussed further.

It is important to notice that the high workload of online journalists clearly affects the form and the content of online news (Boczkowski, 2009; Mitchelstein and Boczkowski, 2009; Witschge and Nygren, 2009). Despite the smaller staff sizes, online journalists manage to publish about the same number of articles as their print colleagues. In order to be able to publish so many articles, online journalists publish shorter articles. This may of course have to do with the characteristics of the internet itself, but the workload definitely contributes to this.

The high workload also affects the coverage of the power elite. As outlined, controlling power elites is one of the most important tasks of journalism (Siegert et al., 2011). While the relative number of elite references is higher in online news, the total coverage of the power elite is more extensive in print news and elite news sources. Scrutiny in covering the power elite requires extensive research, and online journalists lack the time for this. This study is the first to show this important difference in a large-scale content analysis, and it is important to realize that online journalism may be less capable of fulfilling this important task for democracy than print journalism. This interpretation requires further analysis, which may be provided by future qualitative research.

The lack of elite news is just one symptom of the problem of high workload. Another point of attention is the overall emotionality of news articles. The fact that print news scores higher on emotionality may not so much be caused by an overly emotional style of writing, but rather by a lack of reflection in online articles. Online and print news have almost equal scores on negativity, but online, the amount of positive news is much lower. While print journalists cover two sides of every story (both positive and negative), online journalists tend to focus on the negative side of every story (“if it bleeds, it leads”). This relates to several earlier studies that found that online journalists lack the time to do research (Witschge and Nygren, 2009), do not always stick to ethical rules (Agarwal & Barthel, 2013) and want to publish articles as quickly as possible (Bivens, 2008; Agarwal & Barthel, 2013). Apparently, the desire (and need) to work fast sometimes keeps online journalists from covering two sides of every story: it almost seems as if online journalists work somewhat sloppily. Of course, the belief that negative news attracts more readers also contributes to its prevalence.

The analysis of the titles further supports this assumption. The major difference between online and print titles is that online, journalists are more likely to use words that create strong negative emotions, while print news titles indicate a higher degree of reflection and background stories. Apparently, online journalists deem it necessary to write in an emotional tone in order to attract readers. Of course, it must be mentioned that online, the choice of an article title is very important: the title is by far the most important factor in a reader's decision to click on an article or not. In order to attract readers and make them click on a given article, the titles have to appeal to the reader. This difference in titles shows that online journalists are aware of the need to attract more readers and act accordingly (Entman, 2005; Tandoc, 2014).


The results also confirm differences in role conception. Online journalists see themselves as disseminators, while print journalists see themselves as interpreters (Deuze & Dimoudi, 2002; Paulussen, 2004; Cassidy, 2005; Beam, Weaver, & Brownlee, 2009; Møller-Hartley, 2013; Carpenter, Boehmer, & Fico, 2015). Online journalists publish their stories as quickly as possible: they do not take the time for reflection or for gathering further information, but rather publish their articles right away. This leads to shorter, sometimes unbalanced articles.

The working speed of online journalists may also explain differences in the score on personalization. Print articles, on average, score higher, but this does not necessarily mean that print articles focus more on private lives. It can also mean that print journalists quote more human sources (e.g. spokespersons), while online journalists tend to quote power elites as a whole instead of people from within these elites. This is a faster way of working because no extra research needs to be done: power elites have access to the media, for example by sending press releases, but individual people do not. This pattern of source usage explains why elite news coverage is so dense in online news, while personalization is rather low. Print journalists tend to make more effort to approach several (human) sources when doing research, while online journalists prefer to mention only the name of a power elite, for example an organization, instead of doing further research (Witschge & Nygren, 2009). The fact that popular news scores higher on personalization than elite news may be due to the fact that, independent of source usage, the focus on people is indeed higher in popular news (Örnebring & Jönsson, 2004). The results for hypothesis 1B show that this is especially true online: while both popular and elite online journalists may use fewer human sources than their print colleagues, popular online journalists still write a fair number of personal stories, while their elite colleagues do not.

These findings on source usage match the finding that online, journalists are more likely to publish follow-up articles. Follow-up articles are easier to produce than other articles, because instead of finding a completely new idea, journalists can work on one story for which they only have to look for updates. Given the small staff size and the high workload, this is a very easy way to produce many articles, even under time pressure. In line with findings of, amongst others, Cassidy (2006), Bivens (2008) and Carpenter, Boehmer and Fico (2015), online journalists always want to publish news as soon as possible and therefore rather publish a second article with new findings than wait until they have had time for further research. Thus, apart from the workload, the role conception of online journalists also plays a key role in this process.

When it comes to commercialization, more entertainment news was found to be published online (van der Wurff et al., 2008; Bird & Dardenne, 2009; Lehman-Wilzig & Seletzky, 2010; Maier, 2010). Besides being easy to write, entertainment articles also offer some light entertainment to the reader and may therefore attract many readers. In future research, it may be interesting to account for the number of readers of a given article in order to further investigate to what extent journalists do indeed consider website metrics at this point.

Contrary to expectations, print news was found to contain more celebrity news than online news. Yet, as already discussed, the measurement instrument seems to be flawed. In future research, the DB-pedia approach should be changed in such a way that the results are more clearly specified.

Overall, a strong argument has been made that there are strong differences between online and print news in terms of coverage, and that these differences are caused by differences in the gatekeeping process. These differences are related to the characteristics of online and print journalists in general, but also to the different characteristics of the two news environments. This study has offered important insights into the consequences of these differences for news content and is the first study to investigate them on a large scale, while at the same time applying techniques that had not been used before. While this study provides a rather general overview and can be seen as a first step in the quantitative analysis of news values in Dutch newspapers, further (qualitative) research is necessary. For future research, it may be fruitful to also investigate the actual number of readers in order to shed further light on the use of website metrics. Given the importance of journalism for democracy, this study should serve as a first wake-up call: there are important differences between online and print journalism, not all of which can be ignored. While online news only partially goes in the same direction as popular news, the traces of commercialization, the workload and different role conceptions are clearly visible: online news is produced at higher speed, resulting in news that seems to be made on the run and lacks reflection. While no ‘quick fix’ for these differences and the resulting problems can be found, it is important to know about these trends in order to be able to react to them. As these developments are still going on, differences may become more extreme in the future, and whether it is good or bad, the position of journalism is changing.

References

Agarwal, S. D., & Barthel, M. L. (2015). The friendly barbarians: Professional norms and work routines of online journalists in the United States. Journalism, 16(3), 376-391. doi:10.1177/1464884913511565

Anderson, C. (2011). Between creative and quantified audiences: Web metrics and changing patterns of newswork in local US newsrooms. Journalism, 12(5), 550-566. doi:10.1177/1464884911402451

Arant, M. D., & Anderson, J. Q. (2001). Newspaper online editors support traditional standards. Newspaper Research Journal, 22(4), p. 57.

Beam, R. A., Weaver, D. H., & Brownlee, B. J. (2009). Changes in professionalism of U.S. journalists in the turbulent twenty-first century. Journalism & Mass Communication Quarterly, 86(2), pp. 277-298. doi:10.1177/107769900908600202

Bird, S. E., & Dardenne, R. W. (2009). Rethinking news and myth as storytelling. In K. Wahl-Jorgensen & T. Hanitzsch (Eds.). The handbook of media studies (pp. 205-217). New York: Routledge.


Bird, S., Klein, E., & Loper, E. (2009). Natural language processing with Python: Analyzing text with the Natural Language Toolkit. Sebastopol: O’Reilly Media.

Bivens, R. K. (2008). The internet, mobile phones and blogging: how new media are transforming traditional journalism. Journalism Practice, 2(1), pp. 113-129. doi:10.1080/17512780701768568

Boczkowski, P. J. (2009). Rethinking hard and soft news production: from common ground to divergent paths. Journal of Communication, 59(1), pp. 98-116. doi:10.1111/j.1460-2466.2008.01406.x

Boumans, J. W., & Trilling, D. (2015). Taking stock out of the toolkit. Digital Journalism, 4(1), pp. 8-23. doi:10.1080/21670811.2015.1096598

Carpenter, S., Boehmer, J., & Fico, F. (2015). The measurement of journalistic role enactments: a study of organizational constraints and support in for-profit and nonprofit journalism. Journalism and Mass Communication Quarterly. doi:10.1177/1077699015607335

Cassidy, W. P. (2005). Variations on a theme: The professional role conceptions of print and online newspaper journalists. Journalism & Mass Communication Quarterly, 82(2), pp. 264-280. doi:10.1177/107769900508200203

Cassidy, W. P. (2006). Gatekeeping similar for online, print journalists. Newspaper Research Journal, 27(2), pp. 6-23.

Crossmediaal nieuws lezen is de norm geworden (2016, January 18). Retrieved from http://www.ndpnieuwsmedia.nl


Deuze, M., & Dimoudi, C. (2002). Online journalists in the Netherlands: towards a profile of a new profession. Journalism, 3(1), pp. 85-100. doi:10.1177/146488490200300103

Donsbach, W. (2004). Psychology of news decisions: Factors behind journalists’ professional behavior. Journalism, 5(2), 131-157. doi:10.1177/146488490452002

Entman, D. (2005). The nature and sources of news. In G. Overholser & K.H. Jamieson (Eds.). The institutions of American democracy: the press (pp. 48-65). New York: Oxford University Press

Galtung, J., & Ruge, M. (1965). The structure of foreign news. The presentation of the Congo, Cuba and Cyprus crises in four Norwegian newspapers. Journal of Peace Research, 2(1), 64-90. doi:10.1177/002234336500200104

García Avilés, J. A., Meier, K., Kaltenbrunner, A., Carvajal, M., & Kraus, D. (2009). Newsroom integration in Austria, Spain and Germany. Journalism Practice, 3(3), pp. 285-303. doi:10.1080/17512780902798638

Harcup, T., & O'Neill, D. (2001). What is news? Galtung and Ruge revisited. Journalism Studies, 2(2), 261-280. doi:10.1080/14616700118449

Harris, Z. S. (1954). Distributional structure. Word, 10(2-3), pp. 146-162. doi:10.1080/00437956.1954.11659520

Jacobi, C., Kleinen-Von Königslöw, K., & Ruigrok, N. (2015). Political news in online and print newspapers: Are online editions better by electoral democratic standards? Digital


Johnson, S. (2007). ‘They just make sense’: tabloid newspapers as an alternative public sphere. In R. Butsch (Ed.). Media and Public Spheres (pp. 83-95). London: Palgrave Macmillan UK.

Karlsson, M., & Clerwall, C. (2013). Negotiating professional news judgment and “clicks”. Nordicom Review, 34(2), pp. 65-76. doi:10.2478/nor-2013-0054

Lee, A., Lewis, S., & Powers, M. (2014). Audience clicks and news placement: A study of time-lagged influence in online journalism. Communication Research, 41(4), pp. 505-530. doi:10.1177/0093650212467031

Lehman-Wilzig, S., & Seletzky, M. (2010). Hard news, soft news, ‘general’ news: The necessity and utility of an intermediate classification. Journalism, 11(1), pp. 37-56. doi:10.1177/1464884909350642

Leung, D. K. K., & Lee, F. L. F. (2015). How journalists value positive news: The influence of professional beliefs, market considerations, and political attitudes. Journalism Studies, 16(2), pp. 289-304. doi:10.1080/1461670X.2013.869062

MacGregor, P. (2007). Tracking the online audience: metric data start a subtle revolution. Journalism Studies, 8(2), pp. 280-298. doi:10.1080/14616700601148879

Maier, S. (2010). All the news fit to post? Comparing news content on the web to newspapers, television, and radio. Journalism & Mass Communication Quarterly, 87(3-4), pp. 548-562. doi:10.1177/107769901008700307

Maier, S. R. (2010). Newspapers offer more news than do major online sites. Newspaper


McManus, J.H. (2009). The commercialization of news. In K. Wahl-Jorgensen & T. Hanitzsch (Eds.). The handbook of media studies (pp. 218-233). New York: Routledge.

Mitchelstein, E., & Boczkowski, P. J. (2009). Between tradition and change. Journalism, 10(5), pp. 562-586. doi:10.1177/1464884909106533

Mitchelstein, E., & Boczkowski, P. J. (2010). Online news consumption research: An assessment of past work and an agenda for the future. New Media & Society, 12(7), pp. 1085-1102. doi:10.1177/1461444809350193

Møller-Hartley, J. (2013). The online journalist between ideals and audiences. Journalism Practice, 7(5), pp. 572-587. doi:10.1080/17512786.2012.755386

Nadeau, D., & Sekine, S. (2007). A survey of named entity recognition and classification. Lingvisticae Investigationes, 30(1), pp. 3-26. doi:10.1075/li.30.1.03nad

Örnebring, H., & Jönsson, A. M. (2004). Tabloid journalism and the public sphere: A historical perspective on tabloid journalism. Journalism Studies, 5(3), pp. 283-295. doi:10.1080/1461670042000246052

Papieren oplage kranten daalt verder (2015, May 15). Retrieved from http://nos.nl/

Paulussen, S. (2004). Online news production in Flanders: How Flemish online journalists perceive and explore the internet's potential. Journal of Computer‐Mediated Communication,