3bij3 – Developing a framework for researching recommender systems and their effects


Felicia Löcherbach

Student number: 11359471

University of Amsterdam

Master’s Thesis

Graduate School of Communication

Research Master’s programme Communication Science

Supervised by Dr. Damian Trilling


Abstract

The online news environment of today is increasingly shifting from one-size-fits-all approaches towards personalized news selections for each user, relying on algorithmic solutions to extract relevant articles and compose an individual’s news diet. However, the impact of such recommendation algorithms on how we consume and perceive news is still understudied, despite having important consequences for journalism practitioners as well as for theoretical developments in the field of (political) communication research. To shed more light on this issue, this article develops one of the first software solutions for such studies: a framework called 3bij3. 3bij3 means three by three in Dutch and refers to the most prominent feature of a Python-based news application that forms the interface for study participants: It displays a 3x3 grid of the nine most relevant news articles to a user, selected by different mechanisms (e.g., explicit customization vs based on past behavior vs random). 3bij3 can be used to conduct large-scale field experiments in which participants’ use of the site is tracked over extended periods of time. Compared to previous work, 3bij3 gives researchers control over the recommendation system under study and creates a realistic environment for the participants. It integrates web scraping and parsing, supervised machine learning for assigning topic labels to articles, different recommender systems, a web interface for participants, gamification elements (earning points for interaction), and a user survey to enrich the behavioral measures obtained. Using two databases (ElasticSearch, MySQL), 3bij3 combines content and user data (incl. clickstreams) in one overarching framework. A prototype of the application was tested on a small set of participants, revealing that the system works as intended and runs on a broad variety of devices, browsers, and operating systems, which makes it a promising approach to be extended in future studies.

Introduction

News usage online has undergone considerable changes over the past few years: Increasingly, the selection and presentation of news are adapted to each individual user (Thurman & Schifferes, 2012). Key to such processes are recommender systems, algorithms that decide which articles are displayed to whom based on criteria such as past behavior and the ratings of similar users. While these systems already form an integral part of online newspapers and especially social network sites, their impact on how we consume and perceive news is still understudied. A better understanding of recommender systems is imperative for practitioners and academics alike: Journalists and editors need more insight into how algorithmic recommenders can complement and guide news selection to better fit their audiences’ wishes while still maintaining the vital democratic functions of journalism through editorial selection (Bhaskar, 2016; Pariser, 2011a; Schlesinger & Doyle, 2015). For political communication research, a better understanding is needed of how recommender systems affect processes of selective exposure and influence political attitudes and knowledge. Studying how these algorithms change our news environment, and what consequences this entails, is crucial for bringing established theories and models into the 21st century.

Therefore, this article sets out to develop one of the first research designs for studying recommender systems in the context of news and political communication. In this paper, I present 3bij3, jokingly dubbed the “3rd best tool for the inquiry of journalistic news consumption”. More literally, 3bij3 means three by three in Dutch and refers to the most prominent feature of a news application developed for this purpose: It displays a 3x3 grid of the nine most relevant news articles to a user. 3bij3 constitutes a new framework for investigating different news recommenders and their impact on news usage and selection. The design stems from the need to use techniques from the computational sciences to inform research on communicative phenomena, merging methodological innovation with theoretical approaches of political communication. It allows different mechanisms for selecting news from various sources to be implemented on the fly while user behavior is tracked. The resulting digital traces are enriched with information about the user through ratings and surveys. Thus, the main contribution of this article is to offer a solution to particular challenges related to the study of recommender systems and their impact, answering the question:

How can news recommender systems and their effects be adequately researched?

The framework is applied exemplarily to the Dutch context, as the Netherlands can be seen as one of the forerunners of online news consumption, with online channels being the most used source of news (Newman, Levy, & Nielsen, 2017) and services such as the news aggregator Blendle popularizing recommendation algorithms (Klöpping, 2016).

Two main theoretical aspects will be discussed. First, algorithm-based selection has so far mostly been viewed negatively with regard to the diversity of the news diet, as limiting the breadth of viewpoints and topics. However, recent studies challenge this conception by showing that, especially compared to other selection processes (e.g. editorial), algorithms might not lead to more narrowed media diets after all (Möller, Trilling, Helberger, & van Es, 2018; Nguyen, Hui, Harper, Terveen, & Konstan, 2014). Shedding more light on this discussion is one of the aims of the developed framework. Second, it is of increasing importance to understand how a tailored news diet is perceived from the user’s perspective. Seeing the news consumer as a self-determined and competent actor requires taking into account the extent to which they should take an active part in the news selection process (Bozdag, 2013).

In the following theoretical section, important definitions are clarified, and the role of diversity and of the user’s perspective in news recommendations is discussed, before an overview of past lines of research on recommenders and their methodological approaches is given.

Theoretical Background and Related Research

Personalization, Recommendation, Customization – Interwoven Concepts

Looking into past research on recommendation systems, it becomes evident that the field is scattered across different disciplines, and the divergent use of terms makes an overarching framework for researching recommenders challenging. For example, personalization is sometimes referred to as an algorithmic process (Bozdag, 2013), while others see its main aspect in user-based decisions (Dylko, 2016). Furthermore, the labeling of various concepts is inconsistent, blurring the lines between theoretically different phenomena. Thus, it is important to disentangle those concepts to allow for a shared understanding.

The general process of adapting content to fit each individual user is termed personalization. One important aspect of personalization concerns the question of who (or what) performs the selection of items. In the following, two types of personalization are described, distinguished by the degree to which the user plays an active role in the process: mostly user-based versus predominantly algorithm-based selection. In reality, these types are not entirely distinct and are often used complementarily. It is, for example, possible that algorithm-based recommenders are continuously adapted through explicit user feedback in the form of ratings. Additionally, other forms of pre-selection and personalization exist, such as those based on editorial decisions in newspapers (e.g. the ideological stance of a publication; Shoemaker, Eichholz, Kim, & Wrigley, 2001).

So far, personalization that strongly involves the user and selection that relies mainly on algorithms have been termed explicit personalization and implicit personalization (Bozdag, 2013; Haim, Graefe, & Brosius, 2018; Thurman & Schifferes, 2012), self-selected personalization and pre-selected personalization (Zuiderveen Borgesius et al., 2016), static personalization and dynamic personalization (Li Kwang Wee, Cheong Tong, & Jung Chng, 1997), or user-driven versus system-driven personalization (Dylko et al., 2017). Another way of expressing these differences is to use the term customization (Beam, 2014; Bright & Daugherty, 2012; Dylko, 2016; Kang & Sundar, 2016; Pazzani & Billsus, 2007) for the “degree to which a user explicitly interacts in the personalization process” (Beam, 2014, p. 1022), in contrast to the term recommendation or recommender system for describing low explicit user involvement in the selection process. As Jonnalagedda, Gauch, Labille, and Alfarhood (2016) put it: “Recommender systems proactively [emphasis added] present users with information related to their interests rather than requiring the user to search for (...) information based on explicit queries” (p. 1). Thus, in the following, personalization is treated as the higher-order concept encompassing customization and recommendation. The former requires active involvement of the user, whereas the latter involves little or no explicit user involvement and gives more influence to algorithmic selection.


This paper focuses mainly on recommender systems and how they impact the news environment. One of the dimensions they presumably affect most is the diversity of the news diet (Helberger, Karppinen, & D’Acunto, 2018; Pariser, 2011b), but in order to understand these effects it is important to clarify what diversity is and why it should matter when researching news consumption.

The Importance of Diversity

Diversity can be conceptualized along different dimensions. This article mostly focuses on topical variety: “This so-called content diversity oftentimes related to the mere appearance of topics in their most basic form, such as ‘public affairs’ or ‘base-ball’ ” (Haim et al., 2018, p. 2). A selection is thus seen as more diverse when it covers multiple news genres, for example politics and entertainment. Other possible dimensions concern the number of different (political) perspectives and plurality in tone or style (Helberger et al., 2018). Two viewpoints underline the importance of (topical) diversity in the context of recommenders: preserving the quality and diversity of democratic discourse, and enhancing the performance of the recommender itself.

How important a diverse media diet is for democracy depends on the perspective taken, such as a deliberative, an adversarial, or an individual autonomy perspective (Helberger et al., 2018): The first stresses that the quality of public debate relies on including the whole range of opinions and topics in the discussion, while the second focuses on diversity as an important factor in bringing less popular topics onto the agenda. Regarding such societal functions of news, the terms Filter Bubbles (Pariser, 2011b) and Echo Chambers (Garrett, 2009) have made their way into academic and public awareness, with Forbes calling 2017 the “Year of the Filter Bubble” (Leetaru, 2017). They describe the assumption that algorithmic recommenders present users only with content that aligns with their own and their social networks’ world view, hindering societal discussion and leading to less diverse viewpoints and polarization. However, both the existence and the effects of filter bubbles are still up for debate (Zuiderveen Borgesius et al., 2016), and the possible impact of recommenders on diversity might not be as one-sided or large as expected (Flaxman, Goel, & Rao, 2016; Haim et al., 2018; Möller et al., 2018; Nguyen et al., 2014).

Lastly, the individual autonomy perspective values diverse exposure “simply because it extends individual choice and affords individuals more opportunities to realize their interests” (Helberger et al., 2018, p. 194). This liberal understanding of democracy holds that individuals have the right to decide autonomously which news to consume out of a broad range of options. This conception might imply letting users make every decision independently, with all options available, to enhance their autonomy and ultimately benefit diversity. However, two aspects have to be considered here: information overload and selective exposure. The former refers to the need for filters that reduce the complexity of all the available information so as not to overwhelm the individual (Haim et al., 2018). Reducing information overload or “choice overload” (Knijnenburg, Willemsen, Gantner, Soncu, & Newell, 2012) is seen as one of the most important features of recommender systems (Bozdag, 2013; Díaz, García, & Gervás, 2008; Konstan & Riedl, 2012). Thus, without a limited set of items to choose from, users’ ability to select the desired and best-fitting content for themselves is also limited. The latter, based on the theory of cognitive dissonance (Festinger, 1957), states that the diversity of topics and viewpoints can also be limited by users themselves, (un)consciously, due to the tendency to select attitude-congruent, familiar content rather than challenging views. Simply providing the highest choice diversity (all news) might thus not enhance exposure diversity (what is chosen). Finally, it can be argued that “people might be interested in things that they did not know they were interested in” (Bozdag, 2013, p. 217); thus, having an algorithm present items from different topics could in some cases give users more opportunities to explore their own interests than having them select the same content over and over again.

With regard to evaluating the performance of an algorithm, the main focus was, especially during the early years of news recommender development, on measures of precision and accuracy (see, for example, Bomhardt and Gaul (2005), Bellogin, Castells, and Cantador (2011), or Bogers and Van den Bosch (2007)), that is, on extracting items that are as close as possible to the user’s profile. However, it has been shown that such measures are not enough to judge the quality of a recommender accurately, as this approach carries the inherent danger of presenting the user with items that are too similar. The purpose of recommender systems lies not only in retrieving the best-matching results but also in showing a variety of items, including serendipitous ones that surprise users or expose them to something unexpected (Kotkov, Wang, & Veijalainen, 2016). Therefore, it has become common to also include an element of diversity in recommendation algorithms while still providing accurate and relevant items to the user (Bozdag, 2013; Bridge & Kelly, 2006).
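One common way of building such a diversity element into a recommender is greedy Maximal Marginal Relevance (MMR) re-ranking, shown here as a minimal Python sketch rather than as the method of any study cited above. It assumes that a relevance score and a set of topic labels are already available for each candidate article; the function names, the Jaccard similarity, and the weight `lam` are illustrative choices.

```python
from typing import Dict, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity between two sets of item features (e.g., topic labels)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def mmr_rerank(relevance: Dict[str, float],
               features: Dict[str, Set[str]],
               k: int,
               lam: float = 0.7) -> List[str]:
    """Greedy Maximal Marginal Relevance re-ranking (illustrative sketch).

    At each step, pick the item maximizing
        lam * relevance - (1 - lam) * (max similarity to items already chosen),
    so lam=1.0 reduces to a pure accuracy ranking.
    """
    selected: List[str] = []
    candidates = set(relevance)
    while candidates and len(selected) < k:
        def mmr_score(item: str) -> float:
            redundancy = max((jaccard(features[item], features[s]) for s in selected),
                             default=0.0)
            return lam * relevance[item] - (1 - lam) * redundancy
        # Iterating over sorted ids makes ties resolve to the smallest id.
        best = max(sorted(candidates), key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


relevance = {"a": 0.9, "b": 0.85, "c": 0.5}
features = {"a": {"sports"}, "b": {"sports"}, "c": {"politics"}}
print(mmr_rerank(relevance, features, k=2, lam=0.7))  # ['a', 'c']
print(mmr_rerank(relevance, features, k=2, lam=1.0))  # ['a', 'b']
```

With `lam=0.7`, the second slot goes to the less relevant but topically different article `c`; with `lam=1.0`, the ranking collapses to pure accuracy and returns the near-duplicate `b`.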

However, the extent to which users actually accept different recommenders and are affected by them might depend on their attitudes and personality. Thus, to fully understand the impact of recommenders, these individual differences need to be taken into account.

The User’s Perspective

Several factors related to the individual user have been shown to influence the usage and experience of news personalization. Users vary in their openness towards personalization and specifically recommender systems. In some instances, as Dylko (2016) suggests, “users might try to sabotage operation of a website that utilizes system-driven customizability (e.g., purposefully ignore precustomized content provided to them)” (p. 398) once they are aware of an algorithm. General attitudes towards personalization (whether it is beneficial for society as a whole as well as useful for the individual) play a role (Newman et al., 2017), and privacy concerns could further affect users’ openness to engage with recommendation systems. There is extensive media attention to privacy and the protection of personal data in the Netherlands (Custers, Dechesne, Georgieva, & van der Hof, 2017). Moreover, many Dutch citizens are concerned about companies using private information to personalize content (European Commission, 2015). Thus, privacy protection and data use for personalization can be seen as important issues to consider.

Apart from this, personality traits have been shown to play an important role in preferences regarding recommender systems (Nguyen, Maxwell Harper, Terveen, & Konstan, 2017). In particular, the preference for being in control of a situation, also called desirability of control (Burger & Cooper, 1979), can be seen as a factor influencing the extent to which the user prefers customization (being in charge of content selection) over recommendation (having less say in the selection of news from which to choose) (Bright & Daugherty, 2012; Sundar & Nass, 2001). Furthermore, individuals who tend to feel overwhelmed by information might be more inclined to have someone (e.g. an editor) or something (e.g. an algorithm) do the selection for them and thereby reduce the feeling of overload.

Researching Recommendation Algorithms

In order to research personalization and its impact on news selection, several strategies have been used. Three overarching groups are summarized in the following. In doing so, their strengths and limitations are outlined, mapping the aspects that need to be addressed by the framework developed in this paper.

Experimental studies. One line of research, located mainly in political and communication science, consists of classical experimental studies. They aim to research the impact of recommender systems on news selection without actually putting realistic recommendation algorithms into action. Beam (2014) as well as Dylko et al. (2017) and Dylko et al. (2018) rely on similar strategies: Participants are asked about their political ideology (e.g. liberal or conservative), which forms the basis for recommendations supposed to “replicate the process used by the real-world Web sites” (Dylko et al., 2018, p. 23) by increasing the number of ideology-congruent articles on a specific topic on the website while decreasing the number of dissonant articles on the same topic. This research is mostly oriented towards studying selective exposure in a narrow political sense (e.g. pro- vs counter-attitudinal articles). Another related set of studies, such as Yang (2016), Messing and Westwood (2014), and Knobloch-Westerwick, Sharma, Hansen, and Alter (2005), focuses more on the impact of displaying recommendation features (such as ‘most-viewed’ tags) on selective exposure.

With regard to internal validity and the control of very specific factors, these studies are indeed promising. Insights into aspects such as the influence of visible cues on users’ selection are important for researching the effects of recommender systems. However, these studies mostly lack external and ecological validity: Participants are exposed to stories all dealing with the same topic in slightly changed versions, a situation that an individual consuming news online will most likely never encounter. Having a choice between two topically identical but politically opposing articles at the same time on one platform can be seen as rather unrealistic. Furthermore, recommendations are usually presented to the user only once and are based on rather isolated, abstract factors (such as ideology). This limits the studies’ value for researching the impact of recommendation algorithms, as the “performance of most recommender systems evolves over time” (Ricci, Rokach, & Shapira, 2011, p. 343), and users’ increasing familiarity with the system also changes how they interact with it (Ricci et al., 2011).

In addition, the article content mostly relates to fictive scenarios and events, not taking into account that news consumption is inherently linked to getting up-to-date, relevant news (Garcin & Faltings, 2013; Sood & Kaur, 2014b). In contrast, in other domains where recommendation algorithms are often used and researched, such as entertainment goods (books, movies) or experience goods (restaurants, events), timeliness is less important: While an old book might still be interesting, yesterday’s news is not (Sood & Kaur, 2014a). Moreover, such setups are inherently tied to a political system with a limited number of parties and clear positions (i.e. the US context) and are only to some extent applicable to multi-party systems with less distinct political stances, as found in the European context.

Thus, to study and measure the real impact news recommenders have on the individual, it is important to use methods that more closely capture the experience users have when selecting news online. Crucial aspects to consider are using actual, real-time news while still controlling for possible confounding factors influencing news selection, and exposing users to recommendations multiple times to capture effects over time. Furthermore, dimensions other than political stance have to be taken into account to go beyond the American context.

Input-output analyses. A second line of studies aims to uncover the workings and effects of existing recommendation algorithms, such as those of YouTube (O’Callaghan, Greene, Conway, Carthy, & Cunningham, 2015) or Google News (Haim et al., 2018), by querying their APIs for recommended items or performing input-output analyses. These studies are of


have to treat the actual algorithm as a ‘black box’ without full understanding of its workings. Thus, once the company changes its algorithm, the value of these studies for explaining the real-world impact of recommender systems diminishes. It is therefore necessary to develop a system that provides control over the actual algorithms selecting the content, so that the effects of possible changes in their workings can be studied.

Simulation studies. Lastly, studies situated in information and computer science focus on evaluating the functioning and outcomes of recommendation algorithms (Bridge & Kelly, 2006; Karakaya & Aytekin, 2017; Möller et al., 2018; Sood & Kaur, 2014b). To this end, the performance of different recommenders in predicting ratings in existing or simulated datasets (such as the MovieLens dataset) is evaluated using measures of accuracy, diversity, and novelty. While this is a good approach for judging the performance of algorithms with regard to certain predefined measures, such studies remain in an “evaluation setting where recommendation approaches are compared without user interaction” (Ribeiro et al., 2014, p. 9).
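As an illustration of the accuracy-oriented measures used in such offline evaluations, the following hypothetical Python sketch computes precision@k: the share of the top-k recommended items that appear in a user's held-out set of relevant items. The function names and the toy data are assumptions; the point is that no user interaction is involved at any step.

```python
from typing import List, Sequence, Set, Tuple


def precision_at_k(recommended: Sequence[str], relevant: Set[str], k: int) -> float:
    """Share of the top-k recommended items that are in the user's held-out relevant set."""
    if k <= 0:
        raise ValueError("k must be positive")
    hits = sum(1 for item in recommended[:k] if item in relevant)
    return hits / k


def mean_precision_at_k(runs: List[Tuple[Sequence[str], Set[str]]], k: int) -> float:
    """Average precision@k over (recommended, relevant) pairs, one pair per user."""
    return sum(precision_at_k(rec, rel, k) for rec, rel in runs) / len(runs)


print(precision_at_k(["a", "b", "c"], {"a", "c"}, 2))  # 0.5
```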

However, user experience and standardized measures cannot be seen as equivalent: An algorithm can score high on diversity or accuracy while users perceive this to a very different degree, or not at all (Knijnenburg et al., 2012; Willemsen, Graus, & Knijnenburg, 2016). Evaluating performance without taking the user into account is thus insufficient to research the actual effects of recommenders (Garcin & Faltings, 2013). In addition, as mentioned above, choice diversity and exposure diversity refer to different concepts: That a recommender provides the user with more diverse options does not necessarily imply that more diverse content is actually chosen (e.g. due to selective exposure and personality factors). Lastly, users’ feedback and reactions cannot be neglected if recommender systems are to be evaluated accurately.

To sum up, these different approaches show varying strengths and limitations, each offering important insights. A useful approach to addressing their limitations is thus to merge aspects from the domains of computational and social sciences.


Towards studying recommender systems from a computational communication science viewpoint.

The usage of computational methods combined with theories of (political) communication science offers considerable advantages for addressing challenges in researching recommender systems. Computational methods facilitate the collection, processing, and enrichment of large-scale content data as well as behavioral data (e.g. clickstreams), which in combination can be used for insightful analyses. At the same time, however, methodological challenges arise: Ethical questions of data collection, the validity and reliability of data, and the representativeness of findings are some of the most pressing issues (Boyd & Crawford, 2012; van Atteveldt & Peng, 2018), calling for these methods to be used in settings clearly identified as research (i.e. with users’ explicit consent). Furthermore, the gap between behavioral data and user experience (Knijnenburg et al., 2012) makes it important to combine behavioral data with other data sources (including surveys and experiments) to gain more precise insights into social phenomena (Shah, Cappella, & Neuman, 2015). Another crucial aspect to keep in mind is that analyses should be embedded in context and informed by profound theoretical knowledge to generate new insights and make substantial use of any kind of data (Boyd & Crawford, 2012; Kitchin, 2014). Thus, the framework proposed in this paper makes use of the advantages that computational methods provide, while keeping in mind that the design of studies conducted within it needs to be grounded in theory to achieve more insights into news selection processes and their effects. It is not primarily aimed at making recommenders more efficient or at improving website designs or user ratings (leaving that to professionals), but rather at providing a means of addressing a range of important questions in a valid way suited to the present time, getting closer to modeling today’s news consumption environment.

A Methodological Agenda

Based on the theoretical background and related research on recommender systems discussed above, I develop a web application in the following methodological section, addressing the question:


How can we develop a method in order to test the perception and impact of different recommender systems?

The main features of the proposed application, derived from the shortcomings identified in past research, are (1) the creation of an environment for the controlled testing of different types of algorithmic news recommenders, allowing for performance testing as well as user interaction; (2) the use of real-time, actual news to test the recommenders, enhancing ecological validity and modeling user experience while remaining in an experimental setting; (3) the inclusion of different elements of personalization, recommendation as well as customization, to enable testing their impact on user experience and diversity measures; and (4) the enrichment of behavioral data with information about users and their feedback to capture the user’s perspective more closely.

In a subsequent evaluation based on a small case study, the following subquestions are answered so that the experiences gained can be used to further develop and implement the framework:

• Which soft- and hardware requirements have to be fulfilled to implement the application successfully (over a longer period of time)?

• How is the application used and evaluated by actual users?

• How (topically) diverse is the content shown and selected in the application compared to the articles retrieved overall?
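Answering the third subquestion requires a measure of topical diversity. One option, assumed here and not necessarily the measure used in this study, is the Shannon entropy of the topic distribution, normalized by its maximum for nine categories so that retrieved, shown, and selected articles can be compared on one scale from 0 (a single topic) to 1 (all topics equally frequent). The function names are hypothetical.

```python
import math
from collections import Counter
from typing import Iterable


def topic_entropy(topics: Iterable[str]) -> float:
    """Shannon entropy (in bits) of the topic distribution of a set of articles."""
    counts = Counter(topics)
    total = sum(counts.values())
    if total == 0:
        return 0.0
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def normalized_entropy(topics: Iterable[str], n_categories: int = 9) -> float:
    """Entropy divided by its maximum, log2(n_categories), giving a value in [0, 1]."""
    if n_categories < 2:
        raise ValueError("need at least two categories")
    return topic_entropy(topics) / math.log2(n_categories)


retrieved = ["sports", "crime", "economy", "foreign news", "sports"]
selected = ["sports", "sports", "sports"]
print(normalized_entropy(retrieved) > normalized_entropy(selected))  # True
```

Applied to the topic labels of all retrieved articles, of the articles shown in the grid, and of the articles clicked, such values would make choice diversity and exposure diversity directly comparable.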

Method

The main strategy for addressing the questions above is to build a news web application with different selection mechanisms. In the past, various attempts have been made at building websites for testing the impact of different forms of personalization, with the earliest working systems dating back more than two decades (Li Kwang Wee et al., 1997). Since then, frameworks such as PNS (Personalized News Service; Paliouras, Mouzakidis, Moustakas, & Skourlas, 2008), PEN (Personalized News; Garcin & Faltings, 2013), or the NZZ Companion (Leuener, 2017) have been developed to test real-time recommendation systems. While PNS was one of the first academic applications to use RSS feeds for gathering content, the latter two systems were implemented in cooperation with specific websites, aiming at improving their traffic and revenue. Important insights can be gained from such frameworks (and many more not mentioned here). However, the main purpose of this paper is not necessarily to improve the recommenders or to draw as much traffic as possible to the application, but rather to research their effects on users and their selections while taking into account important content-related dimensions (such as diversity).

The following sections explain the structure of the web application and its different parts in detail. The application was developed with the Flask microframework (Grinberg, 2014; Ronacher, Lord, Mönnich, & Unterwaditzer, 2010), which allows the whole application to be built in the Python programming language. It is divided into three parts: Content retrieval, processing, and enrichment, which mostly takes place outside of the actual application; Recommendation and customization, describing the different mechanisms via which articles are selected; and Flow of user interaction, elaborating on the intended usage and functions of the application and the final questionnaire. The overall structure with all relevant elements is depicted in Figure 1. The code of the application is publicly accessible on GitHub1. It is deployed on a remote Linux server2 that hosts the application, the models necessary for text enrichment and comparisons, and an ElasticSearch and a MySQL database.
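The nine-article grid that gives 3bij3 its name can be sketched independently of Flask. The following minimal Python example, built around the hypothetical helper `to_grid`, assumes that a recommender has already assigned a relevance score to each candidate article and arranges the nine highest-scoring articles into three rows; in a Flask view, such a nested list could be passed to a template, which is omitted here.

```python
from typing import Dict, List


def to_grid(scores: Dict[str, float], size: int = 3) -> List[List[str]]:
    """Arrange the size*size highest-scoring article ids into rows, best first."""
    n = size * size
    # Sort by descending score; the article id breaks ties deterministically.
    top = sorted(scores, key=lambda a: (-scores[a], a))[:n]
    return [top[i:i + size] for i in range(0, len(top), size)]


scores = {f"a{i}": float(i) for i in range(10)}
print(to_grid(scores))
# [['a9', 'a8', 'a7'], ['a6', 'a5', 'a4'], ['a3', 'a2', 'a1']]
```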

Content Retrieval, Processing, and Enrichment

One of the most important issues to tackle when researching news personalization is the timeliness of the content. Following the method outlined in Trilling (2014), this can be achieved by querying the RSS feeds of newspapers, thus retrieving the URLs of the most recently published articles as well as the content provided by the feed. A Python program was written to query the feeds of predefined sources every thirty minutes, keeping the selection up to date. In a next step, scrapers written for the INCA project at the University of Amsterdam (Trilling et al., 2018)3 are used to extract the full content of the articles, including title, teaser, text, and pictures, by parsing the HTML behind the retrieved links.
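The feed-polling step can be illustrated with a minimal Python sketch using only the standard library. It is not the program used for 3bij3: the function names, the `teaser` field, and the in-memory set of seen URLs are assumptions, and the INCA scrapers that fetch full article texts are not reproduced.

```python
import time
import urllib.request
import xml.etree.ElementTree as ET
from typing import Callable, Dict, List, Set, Union

POLL_INTERVAL_SECONDS = 30 * 60  # feeds are queried every thirty minutes


def parse_rss(xml_data: Union[str, bytes]) -> List[Dict[str, str]]:
    """Extract title, link, and description of each <item> in an RSS 2.0 feed."""
    # For untrusted feeds, a hardened parser such as defusedxml is advisable.
    root = ET.fromstring(xml_data)
    return [{
        "title": item.findtext("title", default=""),
        "link": item.findtext("link", default=""),
        "teaser": item.findtext("description", default=""),
    } for item in root.iter("item")]


def new_items(items: List[Dict[str, str]], seen_links: Set[str]) -> List[Dict[str, str]]:
    """Keep only items whose URL has not been seen before, and remember them."""
    fresh = [it for it in items if it["link"] not in seen_links]
    seen_links.update(it["link"] for it in fresh)
    return fresh


def poll_feeds(feed_urls: List[str], handle: Callable[[Dict[str, str]], None]) -> None:
    """Query each feed every thirty minutes and pass unseen items to `handle` (runs forever)."""
    seen: Set[str] = set()
    while True:
        for url in feed_urls:
            try:
                with urllib.request.urlopen(url, timeout=30) as resp:
                    items = parse_rss(resp.read())
            except (OSError, ET.ParseError):
                continue  # skip a feed that is unreachable or malformed this round
            for item in new_items(items, seen):
                handle(item)  # e.g. scrape the full article and store it
        time.sleep(POLL_INTERVAL_SECONDS)
```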

1 https://github.com/FeLoe/3bij3

2 The application can still be reached via www.3bij3.nl, using the account ‘tester3bij3’ (as both username and password).

3 https://github.com/uvacw/inca


The resulting data is saved in ElasticSearch, a NoSQL database well suited to storing documents, not least due to its high flexibility and scalability (Günther, Trilling, & Van de Velde, 2018). It uses a key-value structure that allows additional information to be stored with each document. Subsequently, several processing steps structure the raw text and prepare it for the algorithms that follow. For instance, the Pattern library is used for part-of-speech tagging (De Smedt & Daelemans, 2012), which allows keeping only the word classes most likely to be substantively relevant for the text (adverbs, adjectives, nouns). The processed text is saved as a separate key in the ElasticSearch database. Furthermore, each article is enriched with a topic indicating its news section (e.g. sports, entertainment).
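The filtering step can be sketched as follows, assuming that tokens have already been tagged with Penn Treebank-style tags (adjectives `JJ*`, adverbs `RB*`, nouns `NN*`). The sketch leaves out the tagger itself and the call that indexes the document in ElasticSearch; the field name `text_pos` and both function names are hypothetical.

```python
from typing import Dict, List, Tuple

# Penn Treebank tag prefixes for adjectives (JJ*), adverbs (RB*), and nouns (NN*).
KEEP_PREFIXES = ("JJ", "RB", "NN")


def filter_pos(tagged: List[Tuple[str, str]]) -> str:
    """Keep only adjectives, adverbs, and nouns from (word, tag) pairs."""
    return " ".join(word for word, tag in tagged if tag.startswith(KEEP_PREFIXES))


def build_document(article: Dict[str, str], tagged: List[Tuple[str, str]]) -> Dict[str, str]:
    """Return a copy of the article with the filtered text under an extra key."""
    doc = dict(article)  # copy, so the raw article stays untouched
    doc["text_pos"] = filter_pos(tagged)  # hypothetical field name
    return doc


tagged = [("De", "DT"), ("nieuwe", "JJ"), ("regering", "NN"),
          ("werd", "VBD"), ("snel", "RB"), ("gevormd", "VBN")]
print(filter_pos(tagged))  # nieuwe regering snel
```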

To create the topic tag, the most convenient approach would be to retrieve the categorization made by the newspaper itself. However, this approach is error-prone: every newspaper uses different categorizations and labels (e.g. culture vs entertainment), the categories within each newspaper tend to change for specific events (e.g. sports becoming Olympics), and new tags appear every day. A static system for assigning topic tags is thus not feasible. The process has to be dynamic and adapt to new content automatically, which calls for (un)supervised machine learning. Supervised machine learning was chosen, applying a passive-aggressive classifier trained on data collected during the NWO-VENI project “The contingency of media’s impact on national parliaments: a comparative study”⁴ and following the exact steps outlined in Burscher, Vliegenthart, and De Vreese (2015) to train the classifier. While the original dataset had 31 different issue categories, these were grouped into nine overarching categories for the application, derived from typical newspaper categories (domestic news, foreign news, economy, entertainment, crime, science, environment, immigration, sports). The trained classifier reached an average F1 score of .68, with no category below .55 (the F1 score is the harmonic mean of precision and recall), which indicates sufficient performance for the automated coding of article topics. This procedure is only possible because of access to this large annotated dataset, which stresses the need for open data resources that make training such classifiers feasible for research projects. Another option would be unsupervised topic modeling via Latent Dirichlet Allocation (LDA) (Blei, Ng, & Jordan, 2003). The trained topic classifier is used to assign a topic key to each retrieved article in the database, after processing the text as described in Burscher et al. (2015) and transforming it with a tf·idf vectorizer.

⁴ A project description can be found here: https://www.nwo.nl/onderzoek-en-resultaten/onderzoeksprojecten/i/26/5226.htm
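How a tf·idf vectorizer and a passive-aggressive classifier fit together can be sketched with scikit-learn, which provides both components. This is a minimal sketch, not the trained model used in the application: a handful of invented toy documents stand in for the annotated VENI data, the preprocessing of Burscher et al. (2015) is omitted, and the `C`, `max_iter` and `random_state` values are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import PassiveAggressiveClassifier
from sklearn.pipeline import make_pipeline

# Toy stand-in for the annotated training data (hypothetical examples).
train_texts = [
    "ajax wint wedstrijd doelpunt competitie",
    "feyenoord verliest wedstrijd trainer",
    "beurs rente inflatie bank",
    "economie groei bank werkloosheid",
    "politie arrestatie verdachte rechter",
    "rechter veroordeelt verdachte overval",
]
train_labels = ["sports", "sports", "economy", "economy", "crime", "crime"]

# tf-idf features feed a linear passive-aggressive classifier.
topic_classifier = make_pipeline(
    TfidfVectorizer(),
    PassiveAggressiveClassifier(C=1.0, max_iter=1000, random_state=0),
)
topic_classifier.fit(train_texts, train_labels)

# Assign a topic label to newly scraped (already preprocessed) articles.
new_articles = ["trainer ajax na wedstrijd", "bank verhoogt rente"]
print(topic_classifier.predict(new_articles))
```

Wrapping both steps in one pipeline ensures that new articles are transformed with the vocabulary and idf weights learned during training before they are classified.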

Recommendation and Customization

General types of recommenders. Broadly, recommenders can be divided into three types: content-based (sometimes also referred to as semantic filtering; Möller et al., 2018), collaborative, and hybrid (Bridge & Kelly, 2006; Knijnenburg et al., 2012). Most recommendation systems rely on building a user profile based on explicit (e.g. ratings) or implicit (e.g. clicking behavior) feedback, except for those that only offer item-to-item recommendations without taking the user’s history into account (Möller et al., 2018).

Content-based systems “recommend an item to a user based upon a description of the item and a profile of the user’s interests.” (Pazzani & Billsus, 2007, p. 339). They rely on identifying attributes of an item to judge how well it fits a user’s profile. The items are annotated with specific features, such as the topic and author of an article, or more formal characteristics such as length, which makes it possible to compare item and user profile.
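This matching step can be illustrated with a small sketch in which items and the user profile are described by the same features (here topic weights), and candidate items are ranked by cosine similarity to the profile. Building the profile by summing the features of previously read items is an assumption made for illustration; features such as author or length could be added as further dimensions.

```python
import math
from collections import Counter
from typing import Dict, List


def cosine(u: Dict[str, float], v: Dict[str, float]) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(u[k] * v.get(k, 0.0) for k in u)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def build_profile(read_items: List[Dict[str, float]]) -> Dict[str, float]:
    """Sum the feature vectors of items the user has read (implicit feedback)."""
    profile: Counter = Counter()
    for features in read_items:
        profile.update(features)
    return dict(profile)


def rank_items(profile: Dict[str, float], candidates: Dict[str, Dict[str, float]]) -> List[str]:
    """Candidate ids sorted by similarity to the user profile, best first."""
    return sorted(candidates, key=lambda cid: cosine(profile, candidates[cid]), reverse=True)


read = [{"sports": 1.0}, {"sports": 1.0}, {"economy": 1.0}]
candidates = {"a1": {"sports": 1.0}, "a2": {"economy": 1.0}, "a3": {"crime": 1.0}}
profile = build_profile(read)
print(rank_items(profile, candidates))  # ['a1', 'a2', 'a3']
```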

In contrast, collaborative recommenders use the profiles of similar users to infer which items fit a particular user. They “automate the process of ‘word-of-mouth’ recommendations: items are recommended to a user based upon values assigned by other people with similar taste.” (Bozdag, 2013). This kind of recommender is of only limited use for this paper due to the cold start problem: no ratings are available at the beginning of the testing phase (Ricci et al., 2011). Furthermore, the proposed application will only be tested on a small sample of users, which makes implementing collaborative recommenders infeasible (Paliouras et al., 2008).
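For contrast, a user-based collaborative recommender can be sketched as follows: an unseen item is scored by the ratings that other users gave it, weighted by how similar each of those users is (cosine similarity over the items both have rated). The ratings are invented for illustration. The sketch also makes the cold start problem visible: a user without any ratings has no similar neighbors and therefore receives no scores.

```python
import math
from typing import Dict

Ratings = Dict[str, Dict[str, float]]  # user -> {item: rating}


def user_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = math.sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = math.sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)


def predict_scores(ratings: Ratings, user: str) -> Dict[str, float]:
    """Similarity-weighted average rating for items the user has not rated."""
    own = ratings.get(user, {})
    weighted: Dict[str, float] = {}
    weights: Dict[str, float] = {}
    for other, theirs in ratings.items():
        if other == user:
            continue
        sim = user_similarity(own, theirs)
        if sim <= 0:
            continue  # no overlap: this user tells us nothing
        for item, rating in theirs.items():
            if item not in own:
                weighted[item] = weighted.get(item, 0.0) + sim * rating
                weights[item] = weights.get(item, 0.0) + sim
    return {item: weighted[item] / weights[item] for item in weighted}


ratings = {
    "anna": {"a1": 5, "a2": 4},
    "ben": {"a1": 5, "a2": 4, "a3": 5},
    "cas": {"a4": 2},
    "newcomer": {},
}
print(predict_scores(ratings, "anna"))      # only a3 is scored
print(predict_scores(ratings, "newcomer"))  # {} (cold start)
```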

Thus, the recommenders used in this specific test instance of the framework are limited to content-based recommenders, but they could be extended to other kinds of algorithms with collaborative elements given a larger sample size and a longer testing period. Indeed, the most common form of recommender today is the hybrid recommender, which combines different approaches and draws on information such as demographics, communities or editorial selections, for example to solve new-item problems in collaborative recommenders by integrating a content-based element (Ricci et al., 2011).

The different groups. Currently, the application includes four groups with different selection mechanisms for articles: one random group, one group with customization, and two recommendation algorithms. These allow for comparing different types of personalization. All of them select nine stories from the most recent articles in the ElasticSearch database (the last 20 published by each newspaper). Limiting the selection to nine articles reduces the choice for the user to a manageable number and allows for presenting all options on one screen, an important factor to consider due to selection and positioning effects (Teppan & Zanker, 2015). For the first group, a random sample is retrieved, making this a baseline or control group. In the other conditions, three articles are retrieved randomly while the other six are selected based on specific rules. This procedure ensures that the selection always contains an element of serendipity and randomness, which avoids trapping the user in an impermeable bubble that only shows content and topics very similar to past selections.
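The composition of a selection described above can be sketched as follows: the candidate pool consists of the 20 most recent articles per outlet, the random group draws all nine articles at random, and the other groups fill six slots with a condition-specific rule and the remaining three at random. The article fields, the `rule` callback and the toy rule `prefer_sports` are hypothetical, and filling any slots the rule cannot fill at random is an assumption.

```python
import random
from typing import Callable, Dict, List, Optional

Article = Dict[str, str]  # hypothetical fields: id, outlet, topic
Rule = Callable[[List[Article]], List[Article]]

N_SHOWN = 9        # articles per selection (3x3 grid)
N_RULE_BASED = 6   # slots filled by the condition-specific rule
PER_OUTLET = 20    # most recent articles considered per newspaper


def candidate_pool(articles_by_outlet: Dict[str, List[Article]]) -> List[Article]:
    """Last PER_OUTLET articles of each outlet (lists assumed newest first)."""
    return [a for recent in articles_by_outlet.values() for a in recent[:PER_OUTLET]]


def compose_selection(pool: List[Article], rule: Optional[Rule] = None,
                      rng: Optional[random.Random] = None) -> List[Article]:
    """Random baseline if rule is None, otherwise six rule-based plus three random articles."""
    rng = rng or random.Random()
    if rule is None:
        return rng.sample(pool, N_SHOWN)
    chosen = rule(pool)[:N_RULE_BASED]
    chosen_ids = {a["id"] for a in chosen}
    rest = [a for a in pool if a["id"] not in chosen_ids]
    # Assumption: slots the rule cannot fill are filled at random as well.
    return chosen + rng.sample(rest, N_SHOWN - len(chosen))


def prefer_sports(pool: List[Article]) -> List[Article]:
    """Toy rule standing in for customization or a recommender."""
    return [a for a in pool if a["topic"] == "sports"]


pool = candidate_pool({
    "nu.nl": [{"id": f"nu{i}", "outlet": "nu.nl",
               "topic": "sports" if i % 2 else "economy"} for i in range(30)],
    "nrc": [{"id": f"nrc{i}", "outlet": "nrc", "topic": "entertainment"}
            for i in range(25)],
})
selection = compose_selection(pool, prefer_sports, random.Random(1))
print(len(pool), len(selection))  # 40 9
```

Because the random part is drawn only from articles the rule did not pick, the nine displayed articles never contain duplicates.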

Customization. By default, the customization condition also displays randomly selected articles, until the user actively intervenes. This can be done by selecting between one and three topic categories in the side menu, which subsequently appear more often. When one topic is selected, up to six articles with that tag are displayed to the user (depending on how many articles with that tag are left in the database). When two are selected, three articles from each of these topics are displayed, and two per topic if three are selected. These topic settings remain in place until the user changes them again, giving the user full control over the weight assigned to each topic in the selection and how often it appears.
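The customization rule amounts to a small mapping from the number of selected topics to the number of articles shown per topic. The sketch below returns only the rule-based part of the selection; the remaining slots are filled at random as in the other conditions. The article format is hypothetical, and taking the first matching articles in the pool is an assumption, since the text does not specify which matching articles are chosen.

```python
from typing import Dict, List

Article = Dict[str, str]  # hypothetical fields: id, topic

# Number of selected topics -> articles shown per topic (six slots in total).
PER_TOPIC = {1: 6, 2: 3, 3: 2}


def customized_articles(pool: List[Article], chosen_topics: List[str]) -> List[Article]:
    """Rule-based part of the customization condition.

    Without chosen topics the condition behaves like the random group,
    so an empty list is returned and all slots are filled at random.
    """
    if not chosen_topics:
        return []
    if len(chosen_topics) > 3:
        raise ValueError("between one and three topics can be selected")
    quota = PER_TOPIC[len(chosen_topics)]
    selected: List[Article] = []
    for topic in chosen_topics:
        matching = [a for a in pool if a["topic"] == topic]
        # "Up to": fewer articles may be available than the quota.
        selected.extend(matching[:quota])
    return selected


pool = ([{"id": f"s{i}", "topic": "sports"} for i in range(4)]
        + [{"id": f"e{i}", "topic": "economy"} for i in range(5)])
print(len(customized_articles(pool, ["sports"])))             # 4 (only four left)
print(len(customized_articles(pool, ["sports", "economy"])))  # 6
```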

Topic-based recommender. The first recommender condition is based on a similar principle, but instead of explicit user action it uses implicit measures (i.e. past selections) to infer topic preferences. In a first step, the topics of all articles the user has selected in the past are retrieved. Next, three topics are randomly drawn from this list, so that frequently appearing topics have a higher chance of entering the final selection. For each of the three topics, two articles are shown. Thus, if a user only selected articles from one topic in the past, they are shown six articles of that topic; with more diverse past preferences, the selection also becomes more diverse. Apart from deliberately selecting certain topics to ‘trick’ the algorithm once they have figured out how it works, users have no active control over the selection of stories.
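Drawing three topics with replacement from the raw list of past topics gives each topic a probability proportional to how often it was read, which is one way to implement the weighting described above. The sketch assumes sampling with replacement (so that a single dominant topic can fill all six slots, as in the one-topic example) and a hypothetical article format; it returns only the rule-based part of the selection.

```python
import random
from typing import Dict, List, Optional

Article = Dict[str, str]  # hypothetical fields: id, topic

N_TOPICS = 3           # topic draws per request
ARTICLES_PER_DRAW = 2  # articles shown per drawn topic


def topic_recommendations(pool: List[Article], past_topics: List[str],
                          rng: Optional[random.Random] = None) -> List[Article]:
    """Rule-based part of the topic-based recommender."""
    rng = rng or random.Random()
    if not past_topics:  # no history yet: all slots are left to the random part
        return []
    # Drawing with replacement from the raw list weights topics by frequency.
    drawn = rng.choices(past_topics, k=N_TOPICS)
    selected: List[Article] = []
    for topic in drawn:
        unused = [a for a in pool if a["topic"] == topic and a not in selected]
        selected.extend(unused[:ARTICLES_PER_DRAW])
    return selected


pool = [{"id": f"{t}{i}", "topic": t}
        for t in ("sports", "economy", "crime") for i in range(6)]
only_sports = topic_recommendations(pool, ["sports"] * 5, random.Random(0))
print([a["topic"] for a in only_sports])  # six times 'sports'
```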

Similarity-based recommender. The third group departs from using the topic variable to determine the most relevant articles for the user. Instead, it uses word vectors to determine the similarity between documents. The general procedure, pairwise cosine similarity, has been applied to recommender systems in various studies. Each document is represented in the Vector Space Model (VSM) as a vector of term weights, and the similarity between two documents is estimated by taking the cosine of the angle between their vectors. This procedure was first explored more than four decades ago (Salton, 1971) and is still popular for calculating document similarities in recommenders.

However, it has limitations when it comes to detecting similarity between documents: the VSM features are considered independent, so any two distinct words are treated as entirely unrelated. While this might be sufficient for other domains, in natural language processing it leads to issues: “For example, words ‘play’ and ‘game’ are of course different words and thus should be mapped to different dimensions in SVM; yet it is obvious that they are related” (Sidorov, Gelbukh, Gómez-Adorno, & Pinto, 2014, p. 492). To address this, Sidorov et al. (2014) introduced the “soft cosine measure” (p. 491), which calculates the soft similarity between documents. By including a measure of the similarity of word vectors derived from a larger training corpus ($s_{ij}$) in the original cosine similarity formula, equivalent words can be detected more accurately, leading to

$$\mathrm{softcosine}_1(a, b) = \frac{\sum_{i,j}^{N} s_{ij}\, a_i b_j}{\sqrt{\sum_{i,j}^{N} s_{ij}\, a_i a_j}\;\sqrt{\sum_{i,j}^{N} s_{ij}\, b_i b_j}}, \qquad \text{where } s_{ij} = \mathrm{sim}(f_i, f_j),$$

with $a_i$ and $b_i$ the weights of feature $f_i$ in the two documents.
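The soft cosine formula can be checked with a few lines of NumPy: in matrix form it is $a^\top S b / (\sqrt{a^\top S a}\,\sqrt{b^\top S b})$. In the sketch, `a` and `b` are term-weight vectors and `S` is a term-similarity matrix with ones on the diagonal; the similarity of 0.5 between ‘play’ and ‘game’ is an assumed toy value. With `S` set to the identity matrix the expression reduces to the ordinary cosine similarity.

```python
import numpy as np


def soft_cosine(a: np.ndarray, b: np.ndarray, S: np.ndarray) -> float:
    """Soft cosine similarity: a^T S b / (sqrt(a^T S a) * sqrt(b^T S b))."""
    numerator = a @ S @ b
    denominator = np.sqrt(a @ S @ a) * np.sqrt(b @ S @ b)
    return float(numerator / denominator)


# Vocabulary: ["play", "game", "budget"]
S = np.array([
    [1.0, 0.5, 0.0],  # 'play' is assumed to be related to 'game'
    [0.5, 1.0, 0.0],
    [0.0, 0.0, 1.0],
])
doc_play = np.array([1.0, 0.0, 0.0])
doc_game = np.array([0.0, 1.0, 0.0])

print(soft_cosine(doc_play, doc_game, np.eye(3)))  # 0.0: plain cosine sees no overlap
print(soft_cosine(doc_play, doc_game, S))          # 0.5
```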

As the soft cosine similarity uses a sparse matrix for similarity queries, it is considerably faster than other approaches in this domain (e.g. Word Mover’s Distance; Kusner, Sun, Kolkin, & Weinberger, 2015) while showing almost no loss in precision⁵. Considering that recommending newspaper articles in real time on user request requires calculations to be as fast as possible, the soft cosine measure is clearly preferable.

The Python library Gensim (Řehůřek & Sojka, 2010) offers an implementation of the soft cosine measure. It requires word vector embeddings, produced by a word2vec model, a technique developed by Mikolov, Corrado, Chen, and Dean (2013). The embeddings used here were trained on a dataset of all print issues of several Dutch newspapers (Telegraaf, NRC, Volkskrant, Algemeen Dagblad) between 2000 and 2015, thus representing the Dutch newspaper market as a whole⁶. Essentially, these embeddings represent each word in relation to the contexts in which it appears. The second element needed for the soft cosine measure is the corpus of documents against which similarity queries are compared; in this application, these are the newest newspaper articles, whose similarity with each of the articles read by a user (saved in the MySQL database) is calculated. From this corpus, a dictionary is built that maps words to integer ids. The dictionary is then used as input for a tf·idf model, a transformation that gives more weight to terms occurring in few documents of the corpus. This tf·idf model and the dictionary are the input for calculating a sparse similarity matrix, which in a final step serves as input for building an index for similarity queries. Executing all of these steps every time a user requests new articles would, however, require the user to wait 40 seconds for the computations to finish before getting new content. To solve this issue, a Python script performs all of the steps outlined above every 30 minutes on the newest content available in the database, outside of the application. During this process, the dictionary, article ids, and index are saved to disk, from where they can be retrieved on user request to perform the similarity queries.
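The split between an offline build step and a fast online query can be sketched as follows. To stay self-contained, this sketch replaces Gensim with plain NumPy and does not reproduce the sparse-matrix speedups of Gensim’s implementation. The dictionary maps words to integer ids, tf·idf weights are computed per document (a smoothed idf formula is assumed), the term-similarity matrix is derived from word vectors (here a hypothetical two-dimensional toy embedding), and the index is pickled to disk. File name, tokens and vectors are illustrative.

```python
import math
import pickle
import tempfile
from pathlib import Path
from typing import Dict, List

import numpy as np


def build_index(docs: Dict[str, List[str]], vectors: Dict[str, np.ndarray]) -> dict:
    """Offline step: dictionary, tf-idf matrix and term-similarity matrix."""
    vocab = sorted({w for tokens in docs.values() for w in tokens})
    word_id = {w: i for i, w in enumerate(vocab)}  # dictionary: word -> integer id
    n = len(docs)
    df = {w: sum(w in tokens for tokens in docs.values()) for w in vocab}
    # Smoothed idf (an assumption; Gensim's default formula differs).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1 for w in vocab}
    tfidf = np.zeros((n, len(vocab)))
    for row, tokens in enumerate(docs.values()):
        for w in tokens:
            tfidf[row, word_id[w]] += idf[w]  # raw term count times idf
    # Term similarities: cosine of word vectors, negatives clipped to 0,
    # 0 for words without a vector, and 1 on the diagonal.
    S = np.eye(len(vocab))
    for w1, i in word_id.items():
        for w2, j in word_id.items():
            if i != j and w1 in vectors and w2 in vectors:
                v1, v2 = vectors[w1], vectors[w2]
                cos = v1 @ v2 / (np.linalg.norm(v1) * np.linalg.norm(v2))
                S[i, j] = max(0.0, float(cos))
    return {"word_id": word_id, "idf": idf, "ids": list(docs), "tfidf": tfidf, "S": S}


def query(index: dict, tokens: List[str], top_n: int = 3) -> List[str]:
    """Online step: ids of the top_n new articles most similar to one read article."""
    q = np.zeros(len(index["word_id"]))
    for w in tokens:
        if w in index["word_id"]:  # words outside the dictionary are ignored
            q[index["word_id"][w]] += index["idf"][w]
    X, S = index["tfidf"], index["S"]
    q_norm = np.sqrt(q @ S @ q)
    if q_norm == 0:
        return []
    doc_norms = np.sqrt(np.einsum("ij,jk,ik->i", X, S, X))  # sqrt(x^T S x) per row
    sims = (X @ S @ q) / (doc_norms * q_norm)
    order = np.argsort(-sims, kind="stable")
    return [index["ids"][k] for k in order[:top_n]]


def save_index(index: dict, path: Path) -> None:
    with path.open("wb") as fh:
        pickle.dump(index, fh)


def load_index(path: Path) -> dict:
    with path.open("rb") as fh:
        return pickle.load(fh)


# Offline (e.g. every 30 minutes): build from the newest articles and save.
vectors = {"voetbal": np.array([1.0, 0.1]), "wedstrijd": np.array([0.9, 0.2]),
           "rente": np.array([0.1, 1.0])}
newest = {"a1": ["voetbal", "ajax"], "a2": ["rente", "bank"], "a3": ["wedstrijd", "psv"]}
path = Path(tempfile.gettempdir()) / "softcos_index.pkl"
save_index(build_index(newest, vectors), path)

# Online (on request): load from disk and query with a previously read article.
print(query(load_index(path), ["wedstrijd", "feyenoord"], top_n=2))  # ['a3', 'a1']
```

In this toy case a1 ranks second although it shares no word with the query, because ‘voetbal’ and ‘wedstrijd’ have similar vectors; plain cosine similarity would score it zero.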

In the application, on every request the articles a user has read in the past are retrieved, and for each of these articles the three most similar new articles are determined. From the resulting list of three new articles per past article, the articles occurring most frequently are selected and presented to the user. This procedure can be seen as superior to simply averaging the similarities of a new article with all past articles, as similarities could ‘cancel each other out’: if a person is, for example, interested in sports and politics, a new sports article could show a high similarity to a sports article read in the past but a low similarity to a politics article, leading to a medium similarity on average. Likewise, a culture article could have a medium similarity to both the sports and the politics article, leading to the same average. However, since the sports article fits one of the user’s interests very closely while the culture article does not, averaging is of only limited use as a selection mechanism in this case. Selecting the articles that are among the most similar for the largest number of past articles is therefore seen as the best way to map the user’s past behavior.

⁵ https://github.com/witiko/gensim/blob/softcossim/docs/notebooks/soft_cosine_tutorial.ipynb

⁶ The classifier was provided by Dr. Anne Kroon from the University of Amsterdam, who trained it on the contents of the INCA project.
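The voting procedure can be sketched with a counter over the top-three lists: each new article receives one vote for every past article for which it is among the three most similar, and the articles with the most votes are shown. The similarity values are invented to mirror the sports/politics example; the averaging alternative is included for comparison, and breaking ties by order of first appearance is an assumption.

```python
from collections import Counter
from typing import Dict, List

# similarities[past_id][new_id] -> similarity score (toy values below)
Similarities = Dict[str, Dict[str, float]]


def top_k(scores: Dict[str, float], k: int) -> List[str]:
    """Ids of the k highest-scoring articles."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def vote_selection(similarities: Similarities, n: int, k: int = 3) -> List[str]:
    """New articles that are among the k most similar for the most past articles."""
    votes: Counter = Counter()
    for scores in similarities.values():
        votes.update(top_k(scores, k))
    return [article for article, _ in votes.most_common(n)]


def average_selection(similarities: Similarities, n: int) -> List[str]:
    """Alternative discussed in the text: rank new articles by mean similarity."""
    new_ids = next(iter(similarities.values()))  # assumes identical keys per row
    means = {a: sum(s[a] for s in similarities.values()) / len(similarities)
             for a in new_ids}
    return top_k(means, n)


sims = {
    "sports_old": {"s1": .9, "s2": .8, "s3": .7, "p1": .1, "p2": .1, "p3": .1, "culture": .5},
    "politics_old": {"s1": .1, "s2": .1, "s3": .1, "p1": .9, "p2": .8, "p3": .7, "culture": .5},
}
print(vote_selection(sims, 6))     # culture article not included
print(average_selection(sims, 3))  # culture ranks alongside s1 and p1
```

In this toy case averaging places the culture article on a par with the best sports and politics matches, whereas voting only selects articles that closely match at least one past interest.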

Flow of User Interaction

While the user is interacting with the application, their behavior is recorded at several steps along the way. After registering and giving informed consent, the user can log in to the website using a username and password. Each log-in is saved to a MySQL database, including information about the device used to access the website. Together with a responsive layout, this allows the appropriate design to be selected for each screen size, ensuring maximum flexibility for the users. After logging in, the user is led to the main page of the application, which shows nine articles either in a 3x3 grid or stacked vertically (Figure 2).


Figure 3. Display of points status and article detail page

For every article, the title and a short teaser are displayed, as well as a color-coded indication of the topic category on top of the article. A sidebar offers options for getting information about the project (FAQs), contacting the researcher, checking the percentage of completion of the study (Figure 3), and inviting friends to use the application. Additionally, the customization group gets the option to select topics from eight different categories. When selecting an article, the user is taken to a detail page where the title, time of publication, teaser, picture, and text are presented (Figure 3). Below the article, a 5-star rating system collects instant feedback about the article from the user. Lastly, in case the article was not displayed as intended (e.g. due to scraping errors), the user is given the option to report the story instead of rating it. This prevents low ratings caused by faulty presentation, which could otherwise be misinterpreted. All actions are recorded in the MySQL database and, adding an element of gamification, the user receives points for all actions (as displayed in Figure 1). Only once the application has been used for a certain period of time (on at least 10 different days) and enough interaction has taken place (100 points) does a link to the final questionnaire appear on the website, allowing the participant to finish the study.

In the final questionnaire, basic socio-demographic characteristics (gender, age, education) are followed by questions about the general interests and personality of the user, covering desirability of control, general news topic interest, and the tendency for information overload. In the next block, the user’s attitude towards personalization is investigated with four questions addressing different dimensions: individual preference for pre-selected news and different sources of pre-selection, a societal dimension, and a privacy dimension. Lastly, the usefulness of the website and the perceived diversity of news items are evaluated before a question about behavioral intentions (using the website in the future) concludes the survey. The full questionnaire and descriptives of the different measures are shown in Appendices A and B.
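The rule that unlocks the final questionnaire can be expressed as a simple check over the logged actions. This sketch assumes a hypothetical log of actions with a day, a type and a point value; the thresholds of 10 distinct days and 100 points come from the text, and the cap of five point-earning article reads per day reflects the limit reported in the evaluation. Action types and point values are invented for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, List

MIN_DAYS = 10          # distinct days on which the application was used
MIN_POINTS = 100       # points required
MAX_READS_PER_DAY = 5  # article reads that earn points per day


@dataclass
class Action:
    day: date
    kind: str    # hypothetical action types, e.g. "login", "read", "rate"
    points: int  # hypothetical point value of the action


def total_points(actions: Iterable[Action]) -> int:
    """Sum points, counting at most MAX_READS_PER_DAY reads per day."""
    reads_per_day: Dict[date, int] = {}
    total = 0
    for action in actions:
        if action.kind == "read":
            reads_per_day[action.day] = reads_per_day.get(action.day, 0) + 1
            if reads_per_day[action.day] > MAX_READS_PER_DAY:
                continue  # the read is logged but earns no points
        total += action.points
    return total


def questionnaire_unlocked(actions: List[Action]) -> bool:
    """True once the usage thresholds for showing the survey link are met."""
    days_used = {action.day for action in actions}
    return len(days_used) >= MIN_DAYS and total_points(actions) >= MIN_POINTS


# Ten days with six reads each (2 points per read, hypothetical).
log = [Action(date(2018, 4, d), "read", 2) for d in range(1, 11) for _ in range(6)]
print(total_points(log), questionnaire_unlocked(log))  # 100 True
print(questionnaire_unlocked(log[:54]))                # False (nine days, 90 points)
```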

Evaluation

Hard- and Software Requirements

The application as described above was tested over a period of eight weeks from April to June 2018, preceded by a pre-test to identify possible problems and improve the visual presentation. Regarding the software and hardware required to implement the application (for the Python modules, see the requirements file on GitHub), the main components for running it continuously on a remote server are described in Grinberg (2014), including the nginx, gunicorn and supervisor packages. In addition, the memory and storage usage of the two databases (ElasticSearch, MySQL) is an important factor to consider when scaling the application up. For this particular instance of the application, the online versions of four major Dutch newspapers (Algemeen Dagblad, De Telegraaf, NRC Handelsblad, De Volkskrant) and one online-only news website (nu.nl) were chosen as sources to ensure a broad range of topics and a sufficient supply of up-to-date content. The number of stories scraped daily per source is depicted in Figure 4, showing considerable differences between the outlets, while all follow a weekly news cycle. Nu.nl, Algemeen Dagblad and De Telegraaf publish considerably more stories via their RSS feeds than NRC and Volkskrant⁷.

⁷ Since the beginning of May, problems with the Volkskrant scraper occurred, which explains the low number of articles. Given the numbers prior to this event, it can be assumed that the actual numbers are close to those of NRC.


Overall, between 114 and 749 stories were collected per day, adding up to around 40,000 documents after eight weeks and taking up 7 GB of disk space. The MySQL database took up much less space (200 MB), and the application itself needs around 1.6 GB. Including all additional files such as system files, libraries and other software, 60-70 GB of disk space would be needed for the application to run for a year. Furthermore, 10 GB of RAM proved sufficient to handle the workload. Overall, the application ran smoothly throughout the testing period, with only a few articles reported by users (mainly due to minor style issues) and no further complaints, indicating that the system works as intended.
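The yearly estimate can be roughly reproduced from the reported figures under the simplifying assumption that both databases keep growing at their eight-week rate and that the application itself does not grow. On that assumption the data alone come to roughly 48 GB per year; the remainder of the 60-70 GB estimate would then cover system files, libraries and other software.

```python
WEEKS_MEASURED = 8
WEEKS_PER_YEAR = 52

elasticsearch_gb = 7.0  # after eight weeks
mysql_gb = 0.2          # after eight weeks
application_gb = 1.6    # assumed constant

growth = WEEKS_PER_YEAR / WEEKS_MEASURED  # 6.5 times the measured period
yearly_data_gb = (elasticsearch_gb + mysql_gb) * growth + application_gb
print(round(yearly_data_gb, 1))  # 48.4
```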

Figure 4. Number of newspaper articles collected per outlet per day

User Interaction and Evaluation

The sample was drawn from students of the University of Amsterdam, who were rewarded with credits after finishing the study. In total, 58 participants started the study and 19 finished it, including the final questionnaire. This already points to one issue: most participants started using the application and then stopped after a few days. A possible reason is the difficulty of integrating the application into everyday life (i.e. remembering to visit the website on a regular basis). To counter this, participants were offered daily email reminders to check the application; however, only three participants signed up for them. Another solution could be an automated reminder system integrated into the application that sends a notification after a period of inactivity. However, this has to remain the user’s choice to avoid being perceived as ‘spamming’. Raising the (monetary) reward for participants is another important lever, as it would give them a stronger incentive to integrate the application into their daily lives. Given the small sample size, comparisons between the four groups are not feasible, so the focus is on overall measures. Users needed around 13 log-ins to complete the study and read 76 articles on average (receiving points for only around 46 of them due to a limit of five stories per day). Participants spent 1.5 hours reading articles before finishing, around 90 seconds per article. However, the time spent per article varies considerably, ranging from 3 seconds to more than four minutes. This indicates that, for the interpretation of results, it is important to identify participants who sped through the study without actually reading the articles.

The average rating of the news stories was 3 stars, and all participants used the full range (between 1 and 5). Thus, the stories were mostly seen as acceptable. On average, participants gave around 80 ratings before finishing the study, while the invitation function was not used. This shows that the measures taken to encourage participants to rate the articles (including a warning before leaving the page without rating) and to follow the intended usage flow were successful, resulting in a rating and direct feedback for almost all stories, which could in the future be incorporated into more refined recommenders.

In total, 15 respondents were assigned to the customization condition, 12 to the topic-based recommender, 9 to the similarity-based recommender and 22 to the random condition. In the customization group, only six respondents used the option to customize at least once. Either the respondents were not aware of the option, which calls for making the category selection more visible and explaining it explicitly, or they did not want to customize, due to the added effort or a lack of interest. In either case, a larger study that actually compares the conditions should ensure that all participants are fully informed about the option to customize, so that the results can be interpreted correctly.

Given the small sample size, the results from the final survey can at most be interpreted tentatively (descriptives in Appendix B). Participants were on average 25 years old and 58% identified as female. All had completed or were currently enrolled in higher education. All in all, they preferred being in control of content selection over editorial or algorithmic selection, and saw access to a broad range of topics as the most important aspect of news supply. On a societal level, all users strongly agreed that everyone needs to be informed about certain news and that all people should have the same access to news. On a more personal level, most users feared missing important information and challenging viewpoints, while privacy concerns were seen as rather unimportant in comparison. The website itself was rated moderately positively, with ratings between 4 and 5 (7-point scale) for aspects such as functionality. Similar ratings appeared for the intention to use a future version of the website, indicating moderate support. Considering that this application prototype is still lacking in visual design and more sophisticated recommenders, the results can be seen as promising, while leaving room for further improvement.

Topic Diversity in Newspapers Overall and in the Application

Looking at the distribution of topics in the different newspapers, it becomes apparent that they focus on different aspects (Figure 5): while Algemeen Dagblad shows a clear preference for sports news, De Telegraaf publishes more news about crime and justice, and nu.nl leads in economic news. In NRC and Volkskrant, entertainment news is most prevalent. This seems surprising at first, considering that these two outlets are generally seen as prestigious quality broadsheets (Burggraaff & Trilling, 2017; Vliegenthart, 2014) and associated with hard news (such as politics or economics). However, the definition that Burscher et al. (2015) employed for coding the topics used to train the supervised machine learning classifier also includes cultural topics such as ‘literature’ and ‘museums’. As Burggraaff and Trilling (2017) note in an analysis of the Dutch media landscape: “In line with the vanishing distinction between low and high culture, we observed that also our classifier picked up both, for instance, rumors about artists, and serious reviews about movies, CDs, or books. (. . .) what we measured should rather be called ‘culture and entertainment’” (p. 14). Thus, a different topic label that also covers the cultural aspect could be considered, so as not to mislead the user. Science, environment and immigration were the least reported topics in all outlets, while also being narrower categories than, for example, domestic news. These overall results have to be interpreted cautiously, as the automatic topic classification cannot provide 100% accurate results; nonetheless, this general overview shows that most topics are sufficiently present in the news, with considerable differences between outlets.

Figure 5. Amount of topics per newspaper overall

Turning to what was shown and selected in the application, for most topics the share of displayed news (Figures 6 and 7) is similar to the share of selected news; only for sports news is there a clear difference. This relates to the results from the questionnaire, in which participants expressed low interest in sports news, and thus did not choose such articles even when frequently exposed to them. Participants were on average exposed to four to five different article topics per selection, showing a broad diversity of topics; in the extremes, selections ranged from a single topic to eight different topics (both occurred five times). Entertainment and sports appeared most often as the prevalent topic in a selection, mirroring the overall topic distribution.

However, overall, the most prevalent topic in a selection was chosen less often than the other topics (712 vs 1141), showing that participants did not simply choose the topic appearing most often on their screen. The same holds for recommended articles: participants chose non-recommended articles more often than recommended ones (386 vs 247, excluding selections without recommended articles). These results stress the importance of testing systems with user interaction to better understand how users actually react to different news environments and options to choose from.

Figure 6. Topics of displayed news

Figure 7. Topics of selected news

Conclusion: A Working System With a Research Agenda

This paper set out to propose an overarching framework for studying different forms of personalization, and especially recommender systems, online in an ecologically and externally valid way. It addressed several challenges that became apparent from past research, especially in the context of news and political communication. So far, communication science research has been rather limited with regard to including actual, realistic recommendation algorithms in experimental settings, as, for example, the studies by Beam (2014) and Dylko et al. (2017) showed. This issue was addressed by building a web application in which the articles are presented to the user in a realistic setting, with the possibility to implement various types of recommendation algorithms for testing. This makes the framework and application very flexible and usable for various questions concerning (news) recommender systems and their effects. Furthermore, the amount of content (and the different cues) shown to the user can be efficiently controlled and varied if necessary, preserving the experimental character of this research.

Going one step further, the actual latest articles from different newspapers are retrieved, processed, and enriched automatically for use in the application, using techniques from the computational sciences such as natural language processing, supervised machine learning and similarity queries. Participants are thus no longer presented with mock stories far removed from their normal news consumption, but with up-to-date content that is actually of interest to them. This can be seen as a crucial factor for effectively studying news recommendations (Garcin & Faltings, 2013; Sood & Kaur, 2014b).

Furthermore, having such a high degree of control over the algorithms implemented in the application makes it possible to go beyond the input-output analyses of existing websites (as done by Haim et al. (2018) or O’Callaghan et al. (2015)); nonetheless, results from such studies could serve as indicators of how to build realistic models for the application. Being able to fine-tune the parameters of specific mechanisms and study their influence on selection processes as well as on users’ attitudes can give more insight into their workings than only trying to uncover existing systems. This is especially important when researching more complex theoretical constructs such as diversity, for which testing and evaluating different recommender systems and their effects is crucial (Möller et al., 2018). Especially in the context of news consumption and its implications on a broader societal level, designs such as the one proposed here are imperative, as they go beyond testing effectiveness or user satisfaction (as is the case for most research on consumer goods or the improvement of website strategies) and instead try to capture the impact of algorithm design on the information environment and on selection.

The fields most involved in recommender systems, information and computational science, are naturally more advanced in studying the workings and effectiveness of these selection mechanisms. However, as shown for example in Bridge and Kelly (2006), Karakaya and Aytekin (2017) or Sood and Kaur (2014b), they often do not sufficiently take into account the one component central to researching effects (especially in the social sciences): the user. Technology and its impact should not be studied in isolation from the humans using it, especially as largely differing results are sometimes found between online and offline evaluation settings (Garcin et al., 2014). The advantage of the proposed framework over these studies is that, by combining the algorithms with the people interacting with them, more insight can be gained into how the algorithms are used and perceived, and what impact they have.


implemented with the first prototype of the application. The general system works, and its basic elements form a solid structure to build on (usable on a broad variety of devices, browsers, and operating systems) that can in the future be applied to large-scale field experiments over extended periods of time. However, to develop it further and use it for substantive analyses, several measures have to be taken, apart from the more practical recommendations given above. First, more advanced recommendation algorithms that model specific aspects such as diversity need to be developed and implemented, a step requiring collaboration with other fields such as the computational sciences (articles discussing the implementation of diversity-sensitive recommenders include, for example, Karakaya and Aytekin (2017) and Bridge and Kelly (2006)), as well as a larger sample of respondents. This would allow for collaborative and hybrid recommenders and for incorporating user feedback, so that the algorithms actually used by news companies can be modeled more closely and their effects studied.

Secondly, to bring the application to a larger scale, copyright matters have to be taken into account. Possible solutions include collaborations with news media (as done by Garcin and Faltings (2013) or Leuener (2017)) or, to avoid narrowing the content to one specific outlet, redirecting participants to the actual source websites (as, for example, Paliouras et al. (2008) did). This would also make the news usage experience more realistic and solve issues with scraping and parsing content from websites with ever-changing layouts, with the downside of partially losing control over the experimental setting, given that participants leave the website and are exposed to other links or stories that differ between the various outlets.

Lastly, an improved application would have to be used by participants over a longer period of time (several months or more) to actually capture changes in aspects such as content diversity, which are assumed to take longer to come into effect (Möller et al., 2018), and to meaningfully evaluate recommendation algorithms. That said, the proposed framework offers great flexibility for testing different recommenders and their effects in a realistic setting, giving the opportunity to further explore the effects of tailored news environments on the news diet and perceptions of individuals.


Acknowledgements

This work was carried out on the Dutch national e-infrastructure with the support of SURF Cooperative.


References

Beam, M. A. (2014). Automating the News: How Personalized News Recommender System Design Choices Impact News Reception. Communication Research, 41(8), 1019–1041. doi: 10.1177/0093650213497979

Bellogin, A., Castells, P., & Cantador, I. (2011). Precision-oriented evaluation of recommender systems: An algorithmic comparison. In Proceedings of the fifth ACM conference on recommender systems (pp. 333–336).

Bhaskar, M. (2016). In the age of the algorithm, the human gatekeeper is back. The Guardian. Retrieved from https://www.theguardian.com/technology/2016/sep/30/age-of-algorithm-human-gatekeeper

Blei, D. M., Ng, A. Y., & Jordan, M. I. (2003). Latent dirichlet allocation. Journal of Machine Learning Research, 3(1), 993–1022.

Bogers, T., & Van den Bosch, A. (2007). Comparing and evaluating information retrieval algorithms for news recommendation. In Proceedings of the 2007 ACM conference on recommender systems (pp. 141–144).

Bomhardt, C., & Gaul, W. (2005). Newsrec, a personal recommendation system for news websites. In C. Weihs & W. Gaul (Eds.), Classification—the ubiquitous challenge (pp. 394–401). Springer.

Boyd, D., & Crawford, K. (2012). Critical questions for big data: Provocations for a cultural, technological, and scholarly phenomenon. Information Communication and Society, 15(5), 662–679. doi: 10.1080/1369118X.2012.678878

Bozdag, E. (2013). Bias in algorithmic filtering and personalization. Ethics and Information Technology, 15(3), 209–227. doi: 10.1007/s10676-013-9321-6

Bridge, D., & Kelly, J. P. (2006). Ways of computing diverse collaborative recommendations. In Lecture notes in computer science (Vol. 4018 LNCS, pp. 41–50). Springer. doi: 10.1007/11768012_6

Bright, L. F., & Daugherty, T. (2012). Does customization impact advertising effectiveness? An exploratory study of consumer perceptions of advertising in customized online environments. Journal of Marketing Communications, 18(1), 19–37. doi: 10.1080/13527266.2011.620767

Burger, J. M., & Cooper, H. M. (1979). The desirability of control. Motivation and Emotion, 3(4), 381–393. doi: 10.1007/BF00994052

Burggraaff, C., & Trilling, D. (2017). Through a different gate: An automated content analysis of how online news and print news differ. Journalism, 1–18. doi: 10.1177/1464884917716699

Burscher, B., Vliegenthart, R., & De Vreese, C. H. (2015). Using Supervised Machine Learning to Code Policy Issues. The ANNALS of the American Academy of Political and Social Science, 659(1), 122–131. doi: 10.1177/0002716215569441

Chen, G. M., Chock, T. M., Gozigian, H., Rogers, R., Sen, A., Schweisberger, V. N., . . . Wang, Y. (2011). Personalizing News Websites Attracts Young Readers. Newspaper Research Journal, 32(4), 22–38. doi: 10.1177/073953291103200403

Custers, B., Dechesne, F., Georgieva, I., & van der Hof, S. (2017). De bescherming van persoonsgegevens: Acht Europese landen vergeleken [The protection of personal data: Eight European countries compared]. Retrieved from https://www.rijksoverheid.nl/documenten/rapporten/2017/10/05/tk-bijlage-de-bescherming-van-persoonsgegevens

Díaz, A., García, A., & Gervás, P. (2008). User-centred versus system-centred evaluation of a personalization system. Information Processing and Management, 44(3), 1293–1307. doi: 10.1016/j.ipm.2007.08.001

Dylko, I. (2016). How Technology Encourages Political Selective Exposure. Communication Theory, 26(4), 389–409. doi: 10.1111/comt.12089

Dylko, I., Dolgov, I., Hoffman, W., Eckhart, N., Molina, M., & Aaziz, O. (2017). The dark side of technology: An experimental investigation of the influence of customizability technology on online political selective exposure. Computers in Human Behavior, 73, 181–190. doi: 10.1016/j.chb.2017.03.031

Dylko, I., Dolgov, I., Hoffman, W., Eckhart, N., Molina, M., & Aaziz, O. (2018). Impact of customizability technology on political polarization. Journal of Information Technology & Politics, 15(1), 19–33.


10.2838/552336

Festinger, L. (1957). A theory of cognitive dissonance. Stanford University Press.

Flaxman, S., Goel, S., & Rao, J. M. (2016). Filter bubbles, echo chambers, and online news consumption. Public Opinion Quarterly, 80(S1), 298–320. doi: 10.1093/poq/nfw006

Garcin, F., & Faltings, B. (2013). PEN recsys. In Proceedings of the 2013 international news recommender systems workshop and challenge on - NRS ’13 (pp. 3–9). doi: 10.1145/2516641.2516642

Garcin, F., Faltings, B., Donatsch, O., Alazzawi, A., Bruttin, C., & Huber, A. (2014). Offline and online evaluation of news recommender systems at swissinfo.ch. In Proceedings of the 8th ACM conference on recommender systems (pp. 169–176).

Garrett, R. K. (2009). Echo chambers online? Politically motivated selective exposure among Internet news users. Journal of Computer-Mediated Communication, 14(2), 265–285. doi: 10.1111/j.1083-6101.2009.01440.x

Gebhardt, W. A., & Brosschot, J. F. (2002). Desirability of Control: Psychometric Properties and Relationships with Locus of Control, Personality, Coping, and Mental and Somatic Complaints in Three Dutch Samples. European Journal of Personality, 16(6), 423–438. doi: 10.1002/per.463

Grinberg, M. (2014). Flask web development. O’Reilly Media.

Günther, E., Trilling, D., & Van de Velde, B. (2018). But how do we store it? Data architecture in the social-scientific research process. In C. Stuetzer, M. Welker, & M. Egger (Eds.), Computational social science in the age of big data: Concepts, methodologies, tools, and applications (pp. 161–187). Cologne: Herbert von Halem.

Haim, M., Graefe, A., & Brosius, H. B. (2018). Burst of the Filter Bubble?: Effects of personalization on the diversity of Google News. Digital Journalism, 6(3), 330–343. doi: 10.1080/21670811.2017.1338145

Helberger, N., Karppinen, K., & D’Acunto, L. (2018). Exposure diversity as a design principle for recommender systems. Information Communication and Society, 21(2), 191–207. doi: 10.1080/1369118X.2016.1271900


Jonnalagedda, N., Gauch, S., Labille, K., & Alfarhood, S. (2016). Incorporating popularity in a personalized news recommender system. PeerJ Computer Science, 2, e63. doi: 10.7717/peerj-cs.63

Kang, H., & Sundar, S. S. (2016). When Self Is the Source: Effects of Media Customization on Message Processing. Media Psychology, 19(4), 561–588. doi: 10.1080/15213269.2015.1121829

Karakaya, M. Ö., & Aytekin, T. (2017). Effective methods for increasing aggregate diversity in recommender systems. Knowledge and Information Systems, 1–18. doi: 10.1007/s10115-017-1135-0

Kitchin, R. (2014). Big Data, new epistemologies and paradigm shifts. Big Data & Society, 1(1). doi: 10.1177/2053951714528481

Klöpping, A. (2016). One million. Medium. Retrieved from https://medium.com/on-blendle/one-million-6390860c2a34

Knijnenburg, B. P., Willemsen, M. C., Gantner, Z., Soncu, H., & Newell, C. (2012). Explaining the user experience of recommender systems. User Modeling and User-Adapted Interaction, 22(4-5), 441–504. doi: 10.1007/s11257-011-9118-4

Knobloch-Westerwick, S., Sharma, N., Hansen, D. L., & Alter, S. (2005). Impact of popularity indications on readers’ selective exposure to online news. Journal of Broadcasting & Electronic Media, 49(3), 296–313.

Konstan, J. A., & Riedl, J. (2012). Recommender systems: from algorithms to user experience. User Modeling and User-Adapted Interaction, 22(1-2), 101–123.

Kotkov, D., Wang, S., & Veijalainen, J. (2016). A survey of serendipity in recommender systems. Knowledge-Based Systems, 111, 180–192. doi: 10.1016/j.knosys.2016.08.014

Kusner, M. J., Sun, Y., Kolkin, N. I., & Weinberger, K. Q. (2015). From Word Embeddings To Document Distances. Proceedings of The 32nd International Conference on Machine Learning, 37, 957–966.

Leetaru, K. (2017). Why 2017 Was The Year Of The Filter Bubble? Retrieved from https://www.forbes.com/sites/kalevleetaru/2017/12/18/