Down The Vaccination Rabbit Hole: A
Study On YouTube’s Recommendation
Algorithm
What kind of content does YouTube present to its users?
Abstract
The spreading of fake news online has been a widely discussed topic recently. As an increasing number of people use the Internet in their quest for health-related topics, the presence of misinformation becomes a threat to public health, especially in relation to vaccination. The anti-vaccination movement is gaining more traction online. YouTube has been facing criticism of its recommendation algorithm, which is said to present users with sensational, controversial and fake news content. This thesis aims to investigate how YouTube presents the topic of vaccination on its platform through the ranking of search results and the recommendation algorithm, thereby either confirming the criticism or revealing that YouTube has indeed changed its algorithm. This is done by crawling data through the YouTube Data Tool from the Digital Methods Initiative and visualizing it in Gephi. All analysed videos among the top 15 ranked search results for three distinct queries ([vaccine], [vaccination] and [immunization]) contained positive messages towards vaccination. How exactly YouTube sorted these videos remained unclear, as various factors (e.g. date, views, likes and dislikes) did not seem to matter for being ranked first. For the related video network, the networks of two different top results for two distinct queries ([should I vaccinate my kids] and [immunization]) were compared. Here, it became evident that the nodes of the anti-vaxx community appeared influential in the [should I vaccinate my kids] query, but less influential in the network of the top video from the [immunization] query. For both queries, it was found that users were not put in filter bubbles or echo chambers, because videos from multiple communities and standpoints were presented.
1. Introduction
2. Theoretical Framework
2.1. The Roles Algorithms Play
2.1.1. Filter Bubbles
2.1.2. Echo Chambers
2.1.3. Algorithms on YouTube
2.2. Fake News on Platforms
2.2.1. Twitter
2.2.2. Facebook
2.2.3. YouTube
2.3. The Anti-Vax Movement Online
2.3.1. Twitter
2.3.2. Facebook
2.3.3. YouTube
2.3.3.1. E-Health Literature
2.3.3.2. Medical Research
3. Method
3.1. The YouTube Data Tool
3.2. YouTube’s Dominant Voice
3.2.1. Query Design
3.2.2. Analytical Procedure
3.3. YouTube’s Recommendation for Videos
3.3.1. Query Design
3.3.2. Analytical Procedure
3.3.3. Data Visualisation
4. Results
4.1. YouTube’s Dominant Voice
4.1.1. Vaccine
4.1.2. Vaccination
4.1.3. Immunization
4.2. Related Video Network
4.2.1. E-Health Related Video Network
4.2.2. Medical Related Video Network
5. Discussion
5.1. Video Lists & YouTube’s Dominant Voice
6. Conclusion
1. Introduction
We use the Internet and all its potential for an ever-growing number of things, including researching our health and related issues (Holone, 299; Mavragani & Ochoa, 2; Donzelli et al., n.pag.). With the increase of misinformation, or, as the popular term goes, ‘fake news’, this could potentially have far-reaching consequences.
Fake news is a concept that has been done to death in the news, on social media and in numerous academic debates, but that does not make the issue any less dangerous. It has been named one of the main current threats to our society (Shao et al., 1) and a key marker in the current media landscape (Corner, 1100), and it could pose serious threats and challenges for online platforms (Hendricks & Vestergaard, 75). Even though fake news existed long before the birth of the Internet (Hendricks & Vestergaard, 63; Burkhardt, 6), the medium did make the spreading of news, real and fake, much quicker and easier. Anyone with access to a computer is able to produce content, and the Internet provides a stage with an immense audience. Also, gatekeeping was (and in some cases still is) very limited (Burkhardt, 7). All these factors contribute to the emergence of online fake news (Hendricks & Vestergaard, 75) and constitute a serious potential hazard to our society. One of the main worries is how quickly and widely misinformation spreads in comparison to actual news, especially through social media platforms. A study showed that on Facebook, fake news generated 8.7 million engagements between February and November 2016, while real news only generated 7.3 million engagements (Levinson, 15).
The gravity of the situation has not gone unnoticed by the European Union. On numerous occasions, it has vowed to restrict the spreading of fake news on the Internet. Notwithstanding the fact that freedom of expression is seen as an important core value in the European Union, a democratic society is dependent upon the availability of multiple verifiable and reliable information sources that citizens can use to form their opinions. The dissemination of reliable information is increasingly threatened because of the “deliberate, large-scale, and systematic spreading of disinformation” (European Commission, 1). Nowadays, social media is not merely used as a way of expressing ourselves and keeping in touch with our network of relations; the platforms now also serve as a vehicle to “mobilize the public around social issues and causes” (Rogers, 1). These platforms also have “the power to influence opinions increasingly”, which allows minorities to “disproportionately impact public sentiment and legislation” (Diresta & Lotan, n.pag.). As the Internet has become an important actor in our quest for information, fake content can potentially have far-reaching consequences. The Web is, for example, used to research the political candidates we are contemplating voting for, but also to look into health-related issues. It therefore acts as an important component in our decision-making process (Love et al., 568; Smith & Graham, 2).
As the threat that fake news imposes upon our society is so unanimously identified, the European Union has formed a special committee tasked with defining a Code of Practice regarding dis- and misinformation (Isitman & Sommer, n.pag.). The committee has proposed a Code of Practice which, among other things, addresses both the rights and the obligations of large online platforms such as Facebook, Google and Twitter, as well as software providers, such as Mozilla, advertisers and trade associations that represent online platforms and advertisers. The aim of this Code of Practice is to produce an online ecosystem that is transparent, trustworthy and accountable, which in turn should ensure that it will be able and willing to protect its users from disinformation. The Code puts the responsibility with the big online platforms and tech companies, which are expected to perform due diligence checks on the origin of advertisements and ensure transparency regarding political advertising. Furthermore, the aforementioned stakeholders will also endeavor to close down fake accounts and identify automated bots. In addition to these mandatory actions, they must cooperate with European and national audio-visual regulators, independent fact-checking organizations and researchers to ensure an effective detection and flagging framework (European Commission, 8-9). Each of the above stakeholders faces its own specific issues to resolve. According to the latest update from the European Union Committee, Facebook, Twitter and Google have taken measures for implementing the commitments that were demanded by the European Union, but the platforms must provide more transparency on how they will implement “consumer empowerment tools and boost cooperation with fact-checkers and the research community across the whole EU” (European Union, n.pag.). Twitter has paid particular attention to undertaking “actions against malicious actors, closing fake or suspicious accounts and automated systems/bots” (n.pag.), while Mozilla has worked on an improved version of its browser which should protect user data in such a way that it cannot be used for misinformation campaigns (n.pag.). Google has prioritized improving its “ad placements, transparency of political advertisement” (n.pag.), and empowering users in their online experience by providing tools, support and information.
YouTube was bought by Google in 2006 (Sweney & Johnson, n.pag.), thereby ensuring Google’s dominance in the online video realm (Gillespie, 2). As a platform, YouTube has replaced traditional media forms and is seen as one of the primary gatekeepers of cultural discussions. Such gatekeeping responsibilities require complex and sturdy control and compliance to protect individual users, key political groups and the public interest. Since YouTube is an important part of the Google empire, it also falls under the European Union Code. Platforms do not just accommodate politically driven clients to ensure their profitability, but are also known for striking a “regulatory sweet spot” between the legislative protections benefiting them and not having to execute obligations that do not work for them (2). While the measures that are being taken will surely help diminish the dissemination of fake news, they do not address all crucial factors in the distribution chain and the role the information plays in the decision-making process of individuals. Perhaps this is because these kinds of measures would not benefit the platforms in terms of profitability. As Tufekci claimed in her article in the New York Times, YouTube promotes content that is extreme and has the potential to radicalize people (n.pag.). The algorithm is designed to let users be “drawn” into the website through its recommendation features, thus resulting in an increase of watch time, and it seems to work: over 70% of the watch time is generated from these recommendations (Nicas, n.pag.). This recommendation of personalized results can lead users into so-called ‘filter bubbles’: an extremely personalized space (Pariser, 75). In this space, algorithms automatically endorse content the individual presumably agrees with and therefore amplify certain beliefs and/or standpoints (Flaxman, Goel & Rao, 299); this, again, has a substantial impact on the bottom line. Another example is the online ‘echo chamber’, in which individuals are predominantly exposed to opinions that resonate with their own (299). YouTube is criticized for providing both through its built-in recommendation algorithm, which “promotes, recommends and disseminates videos in a manner that appeared to constantly up the stakes” (Tufekci, n.pag.). The algorithm thereby diminishes the variation in public knowledge and undermines the (political) dialogue in our society (Gillespie, 2).
An alarming healthcare trend has regained traction thanks to social media: not vaccinating children out of fear of potential risks. The anti-vaccination (or: anti-vaxx) movement can be seen as a relevant case study for addressing the issue of the dissemination of fake news through algorithms, as previous research has shown that the anti-vaxx movement is gaining more territory online every day. Bassi et al. have identified it as a conspiracy theory that continues to attract new followers online (18). Prior to universal vaccination in the 1980s, measles alone claimed the lives of around 2.6 million people each year. Immunization is often regarded as the most effective medical intervention in combating these diseases, as it has drastically decreased the annual number of deaths they cause. However, measles outbreaks still occur annually, even though the largest part of the population has ready access to the vaccines. These outbreaks can be potentially fatal to vulnerable groups, such as children and the elderly. In the United States, seventeen states allow a “belief exemption” from the vaccines for the diseases mentioned above, a provision which is increasingly used by parents who “are often skeptical about the safety of the MMR vaccine and consider mandatory vaccinations as violations of personal freedom of choice” (Yuan & Crooks, 197). Another argument commonly used by parents is that vaccines supposedly have a correlation with autism disorder, a claim that was initiated by Andrew Wakefield in an article published in The Lancet (198). Even though this argument has been scientifically debunked on numerous occasions, the claim is still regularly used by the anti-vaxx movement. This group uses multiple social media channels to promote and disseminate its beliefs (Yuan & Crooks, 198), and with people consulting the Internet more often for health-related advice and information, this could potentially have fatal consequences (Holone, 299; Mavragani & Ochoa, 2; Donzelli et al., n.pag.).
Combining the increase of fake news online and recommended content that puts us in the feared filter bubbles and echo chambers with the rising health trend of not vaccinating children raises alarming questions. As mentioned before, the information we see on the Internet plays an important role in our decision-making processes for multiple issues, including health-related topics. If the information that is provided on the Internet is not factually correct and this faulty information is further amplified through algorithms, this could potentially have far-reaching consequences for numerous issues, including our health. In the case of vaccines, it “can mean the difference between life and death” (Holone, 298). Social media platforms have the power to create positive as well as negative attitudes towards vaccination and can act as a vehicle “for lobbies and key opinion leaders to influence other citizens” (Donzelli, n.pag.). Much of the research that has been undertaken in this field has focused on the anti-vaxx movement on either Twitter or Facebook, and will be discussed in more detail in the theoretical framework. The issue of vaccination information has also been studied on YouTube, but this research has primarily focused on discourse and content analysis, especially in Italy. As mentioned earlier, YouTube has been identified as a space people turn to for health information and plays a crucial role in “divergent practices of community building” (Shifman, 189). It is therefore an important actor and potential vehicle for the online anti-vaxx movement. Because the number of people with potentially fatal, yet through vaccination perfectly preventable, illnesses is rising, it is important to map the spreading of misinformation on all major platforms, including YouTube. It is crucial to gain more insight into what kind of content is presented and how it spreads throughout the platform. In addition to this, the World Health Organization recommends that vaccination hesitancy and all its possible contributing factors be consistently monitored (Covolo, 1693). This monitoring must be done in order to devise countermeasures to change the public opinion. As Smith and Graham also mention in their article, there is a shortage of research that maps the online anti-vaccination community (2).
This thesis aims to fill that gap in the literature by giving insight into how anti-vaccination content is presented on YouTube. This will be done by examining the ranking of the search results and investigating the recommendation system. Whereas previous research has mainly focused on mapping the anti-vaccination movement on Twitter and Facebook and on discourse analysis of YouTube, this thesis will investigate how the theme ‘vaccination’ is presented on YouTube, as the platform has been criticized for promoting anti-vaccination videos (O’Donovan & McDonald, n.pag.). By using the YouTube Data Tool from the Digital Methods Initiative, this research will firstly look at how YouTube ranks vaccination content. Furthermore, two different so-called ‘rabbit holes’ will be compared. By crawling two different queries through the YouTube Data Tool, one derived from the e-health literature and one from the medical literature, the pathways of two different videos will be investigated. By performing such research, one can see where the recommendation algorithm takes viewers: do the different pathways meet? And if so, to what extent? This kind of data can be used to explore the criticism of the YouTube recommendation algorithm: will it reveal an enclosed anti-vaxx space, thereby confirming the criticism that the built-in recommendation algorithm causes echo chambers and filter bubbles and promotes extreme content? Additionally, finding an answer to these questions will map the anti-vaccination network on YouTube and will also provide overall insight into whether there is any form of misinformation regarding vaccination. Therefore, the research question that this thesis aims to answer is as follows: how do YouTube’s search and recommendation algorithms present content to its users?
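As a purely illustrative aside, the listing below sketches roughly how such a list of top-ranked search results could be retrieved programmatically via the public YouTube Data API v3, on which the YouTube Data Tool is built. The API key and the choice of 15 results are placeholders; the data for this thesis were collected with the Data Tool itself.

# Illustrative sketch only: the thesis uses the DMI YouTube Data Tool; this shows
# roughly how the same top-ranked search results could be fetched from the
# YouTube Data API v3. API_KEY is a hypothetical placeholder.
import requests

API_KEY = "YOUR_API_KEY"
SEARCH_URL = "https://www.googleapis.com/youtube/v3/search"

def top_search_results(query, limit=15):
    """Return the top-ranked video results for a query, in YouTube's own order."""
    params = {
        "part": "snippet",
        "q": query,
        "type": "video",
        "order": "relevance",  # YouTube's default ranking
        "maxResults": limit,
        "key": API_KEY,
    }
    response = requests.get(SEARCH_URL, params=params)
    response.raise_for_status()
    return [(item["id"]["videoId"], item["snippet"]["title"])
            for item in response.json()["items"]]

for q in ["vaccine", "vaccination", "immunization"]:
    print(q, top_search_results(q))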
Firstly, the theoretical framework will further elaborate on the characteristics and impact of filter bubbles and echo chambers. YouTube’s recommendation algorithm will also be discussed extensively. Secondly, this chapter will address various research projects that have been undertaken to map both the dissemination of fake news and the anti-vaxx movement on online platforms such as Facebook, Twitter and YouTube. The method chapter will account for the choices that were made in the research design. All findings will be presented in the results chapter and further analyzed and discussed in the discussion chapter. Finally, the research question will be formally answered in the conclusion, accompanied by the most important takeaways.
2. Theoretical Framework
This theoretical framework will elaborate on specific concepts and current research practices that are deemed crucial for the relevance and methodology of this thesis. Firstly, it will address the importance of algorithms on online platforms in order to better understand their implications and potential impact. Secondly, various research on the dissemination of fake news on Twitter, Facebook and YouTube will be explored. Thirdly, the research to date on how the anti-vaxx movement continues to expand and manifest itself on the aforementioned platforms will be discussed.
2.1. The Roles Algorithms Play
Due to the rise of social media, issues of political polarization and the increase in the amount of fake news online, the debate on algorithms has intensified (Rieder et al., 51). As Gillespie describes in his text ‘The Relevance of Algorithms’, this form of technology plays an increasingly influential role in what we encounter online (2). As the online realm and real life become more intertwined, algorithms are also becoming increasingly entangled in our daily lives (Rieder et al., 51). Users interact with this type of technology when they, for example, “search for information, buy and sell products, learn or socialize” (51). Algorithms are the result of various complicated processes which have a profound influence on our decision-making processes, beliefs and opportunities. They decide what sort of information we encounter, the type of people we engage with on social networking platforms and what is “hot” or “trending”: they essentially navigate the Internet for us, often without our “conscious” consent (Gillespie, 2). It is therefore essential to understand how these algorithms can lead to filter bubbles and echo chambers, what these concepts precisely entail and what their implications could be for unrestricted information flows in our contemporary society. Important political actors have called for more transparency on how these technological mechanisms work (Rieder et al., 51). However, the information on how they operate often stays in the “black box” (Wesseling et al., 52), preventing researchers from studying them thoroughly.
Though it has been argued that the abundance and diversity of information will continue to expose individuals to multiple ideas, breaking them free from “insular consumption patterns” (Flaxman, Goel & Rao, 299), the evidence for the existence of filter bubbles and echo chambers and their impact on discussions and information flows should not be taken lightly.
2.1.1. Filter Bubbles
Over the past years, personalization technology has taken over the Internet, producing product and information offerings catered to individual users. As O’Callaghan et al. state, “social media platforms do not merely transmit content, but filter it” (460), thereby laying the responsibility for causing filter bubbles not with the users, but with the platforms. These kinds of technologies play an important role in the profits of online tech companies. Amazon reported that as much as 35% of its sales are generated by recommendation systems, and Netflix stated that 75% of the viewed content derives from the recommendations given to its viewers (Nguyen et al., 677). As Gillespie mentions in his text, certain worries have been addressed concerning information algorithms and how they “could undermine our efforts to be involved citizens” (22). The first, and most prevalent, is the issue of personalized search results. Nowadays, search engines can produce two totally different outputs for two different people who enter the exact same query. On a news site or social networking site, the information that is offered can be catered and customized to the preferences of the user (either set by the user and/or the website itself) (22). When such information is highly personalized, “the diversity of public knowledge and political dialogue may be undermined” (22), because individuals do not see other information that they may need in order to compose a balanced and unbiased view. The personalizing algorithms, and the way they are used and managed, pose an important issue for two reasons. Firstly, most people are not aware of this personalization, and even when they do know about it, it is difficult to grasp the vast technology behind it and the unsolicited choices it makes for them. Secondly, the algorithms can create the so-called filter bubble (Holone, 298).
According to Pariser, algorithms and the individual’s preferences for like-minded information lead us into ‘filter bubbles’ (7), also described as “a self-reinforcing pattern of narrowing exposure that reduces user creativity, learning and connection” (Nguyen et al., 677).
Furthermore, filter bubbles enforce ideological segregation by automatically recommending more content that is in line with previous search results and topics with which the individual is likely to agree (Flaxman, 299). Nguyen et al. have studied this concept by comparing “content diversity of recommended movies at the beginning and at the end of a user’s observed rating history” (683). They found that the variation of recommended movies became narrower over time and therefore provided evidence for a potential filter bubble (683).
In relation to the central research question of this thesis, it is important to investigate the relationship between the filter bubble and the possible implications it might have for issues of personal health. According to Holone, “it becomes important to look at how the filter bubble can play an important, possibly dangerous role in the type and quality of health information people are accessing online” (300). In his text, Holone sketches a common scenario of a parent researching vaccinations for his or her children. When the parent queries ‘vaccines and children’, over 35 million hits are presented, which are not sorted by “objective relevance” (298) but are influenced by what one has searched before, the social network one is active in, when one is searching and where that search is coming from. Holone stated that over 200 factors determine what the searcher will see, inevitably giving different searchers different outcomes. He also states that personalized search can be beneficial in some cases, as it only presents results that are deemed relevant and therefore increases user-friendliness. However, in other cases “it can mean the difference between life and death” (299). Because the search results for queries such as ‘vaccination benefits’ or ‘are vaccinations dangerous’ are largely dependent on (and influenced by) our “search history, social network, personal preferences, geography and a number of other factors” (301), objective results will not come forward, which stimulates the spreading of misinformation on vaccines.
2.1.2. Echo Chambers
The Internet has created an abundance of information and facilitates the creation, sharing and accessing of this information anytime and anywhere. Because there is so much information readily available, users may choose to only consume and engage with content that is in line with their own beliefs and opinions (Flaxman, Goel & Rao, 299). According to Bessi et al., echo chambers are isolated spaces online where ideas and opinions are exchanged and confirmed by others in those same spaces. Flaxman, Goel and Rao agree with this notion, also conceptualizing it as a space where “individuals are largely exposed to conforming opinions” (299). These spaces host like-minded individuals who share, for instance, comparable political views or adhere to the same (conspiracy) theories. Once one is in these spaces, the information that is encountered has the same tone of voice, “basically echoing each other” (1). These echo chambers create “homogeneous groups” whose members “affiliate with individuals that share their political view” (Colleoni, Rozza & Arvidson, 319).
In a study by Bessi et al. from Cornell University for the World Economic Forum, the research team analyzed the dynamics in such an echo chamber. They found that the most discussed topics were the environment, diet, health and geopolitics, and that users consumed the content related to these topics in similar manners. Users also often jump from topic to topic, and once they are engaged with a specific conspiracy theory they are more likely to also engage with another one. According to them, every new like on the same subject of conspiracy theories increases the probability that the user will jump to the next theory by 12% (Bessi et al., 9). They also found that conspiracy theories “create a climate of disengagement from mainstream society and from officially recommended practices” (2), specifically naming conspiracy theories on vaccines here. The conclusion of their study is that every individual is able to produce or find information that resembles their own beliefs. As Quattrociocchi puts it, “an environment full of unchecked information maximizes the tendency to select content by confirmation bias” (n.pag.).
In another study, Vicario et al. performed a quantitative analysis to address the different determinants that play a role in the dissemination of misinformation. They investigated how users on Facebook consumed information related to two different narratives: scientific and conspiracy stories. They found that, even though users consumed scientific and conspiracy content in a similar manner, the cascade dynamics did vary. Vicario et al. also found that “selective exposure to content is the primary driver of content diffusion and generates the formation of homogeneous clusters”, the “echo chamber” (554).
2.1.3. Algorithms on YouTube
YouTube was founded in 2005 and quickly grew to be one of the most used, if not the most used, video websites globally (Davidson et al., 293). Google bought YouTube in October 2006 for $1.65 billion, ensuring its position in the online video realm (Gillespie, 2). On average, more than one billion users watch six billion hours of video each month, and one hundred new hours of video material are uploaded to the platform every single minute (Weiman, 11; Davidson et al., 293).
Therefore, YouTube can be regarded as one of the biggest and most influential actors online. The platform has become a place for political groups to express their opinions, a place where public discussion takes place and a communication tool for certain issues. As the CEO of Google, Sundar Pichai, said: “YouTube is a place where we see users not only come for entertainment. They come to find information” (O’Donovan & McDonald, n.pag.). Davidson et al. describe multiple reasons why users land on the YouTube platform. They, for example, watch one single video that they originally found somewhere else, which the authors call ‘direct navigation’, or they intend to find specific videos that relate to a topic of interest, which is called ‘search’ and ‘goal oriented browse’. A final reason is that users purely want to be entertained by the content they find. This is the ‘unarticulated want’, and it is facilitated by personalized video recommendations (293). This is different from platforms such as Facebook or Twitter, where users mostly see content from accounts they have chosen to follow; YouTube has an important and active role in presenting information to its users that they have not chosen themselves through this recommendation system (Nicas, n.pag.).
The objective of this system is to help users in their quest for videos that they find relevant and engaging, in order to maximize watch time. To meet this objective, it is essential that these recommendations are up-to-date and mirror the user’s recent activity on the website (293) through a well-working algorithm.
The ‘YouTube algorithm’ itself is made up of multiple different algorithms (e.g. “recommended, suggested, related, search, MetaScore” (Gielen & Rosen, n.pag.)), which are specifically designed to meet the objective of maximizing watch time. Watch time is not identical to the actual minutes watched. According to Gielen and Rosen, “It is a combination of the following:
● Views
● View Duration
● Session Starts
● Upload Frequency
● Sessions Duration
● Session Ends” (n.pag.)
The recommendations that the algorithm provides have to be “reasonably recent and fresh, as well as diverse and relevant to the user’s recent actions” (Davidson et al., 294). Even though the exact working of the algorithm is kept secret (Gielen & Rosen, n.pag.), researchers have retrieved some information on the data input by closely examining the available data. The input data is generated through the user’s activity: the watched, favored and liked videos. The system subsequently expands these seeds to other sets of videos by “traversing a co-visitation based graph of videos” (Davidson et al., 294). Once this is effectuated, these videos are ranked by various types of signals for diversity and relevance (294).
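A minimal sketch of this candidate-generation and ranking step, as Davidson et al. describe it, might look as follows. The co-visitation graph, the signals and the weighting used here are invented purely for illustration and should not be read as YouTube’s actual implementation.

# Toy sketch of seed expansion over a co-visitation graph followed by ranking.
# All structures and weights are illustrative assumptions, not YouTube's system.
from collections import defaultdict

def related_videos(co_visitation, video, top_n=10):
    """Videos most often co-watched in the same sessions as `video`."""
    neighbours = co_visitation.get(video, {})
    return sorted(neighbours, key=neighbours.get, reverse=True)[:top_n]

def recommend(seeds, co_visitation, signals, top_n=10):
    """Expand the seed videos one hop and rank candidates by simple signals."""
    candidates = defaultdict(float)
    for seed in seeds:
        for candidate in related_videos(co_visitation, seed):
            if candidate in seeds:
                continue
            s = signals.get(candidate, {})
            # co-visitation strength, nudged by toy relevance/diversity signals
            candidates[candidate] += (
                co_visitation[seed][candidate]
                * (1 + s.get("freshness", 0))
                * (1 + s.get("popularity", 0))
            )
    return sorted(candidates, key=candidates.get, reverse=True)[:top_n]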
This recommendation system appears to work very well for YouTube, as the recommended videos account for 60% of the video clicks (Davidson et al., 296). However, YouTube has been heavily criticised for the design of its built-in recommendation algorithm that controls which videos a user sees. The algorithm personalizes the content that appears in the related video section (Tufekci, n.pag.). It could be argued that this recommendation algorithm guides users into a ‘filter bubble’ (Pariser, 7) and traps them there. The way in which these types of algorithms work can result in users finding and merely viewing videos that depict social and/or political perspectives they would agree with (Gillespie, 2).
Rieder et al. have performed a study in which they observed YouTube’s ranking of search results on seven different sociocultural issues over specific periods of time (50). The search feature is a relevant object of study, as the process includes forms of mediation and the curating of content, and therefore also presents certain perspectives or viewpoints. As a method, they proposed a ‘scraping’ approach for their research, enabling them to observe the performance of the algorithm. This supported them in gaining a better understanding of how the algorithm worked and allowed them to explore the other types of agencies that were involved. The researchers approached the search function as a “socio-algorithmic process involved in the construction of relevance and knowledge around a large number of issues” (52). Rieder et al. organized their research around ‘ranking cultures’, which they conceptualized as the “unfolding processes of hierarchization and modulation of visibility” (52). The ‘relevance’ search algorithm, which provides a list of videos after the user has entered their query, can be seen as a two-stage process consisting of retrieval and ranking. In the retrieval phase, all the videos that correspond to a certain query are singled out. This selection process can rely on a form of text matching in video titles and/or the text in the description box. Other factors, like comments or images, are also scanned. More often than not, the number of matching items is huge, and ranking is therefore crucial. According to Rieder et al., there are multiple approaches to the ordering of these results: “one could distinguish between factors that are query independent and factors that draw on the query and its context” (53). The independent factors are fairly straightforward. These are, for example, channel subscribers, watch time, likes and views. These factors, among others, decide the relevance of a video. However, when personalization or other similar techniques are added to the process of ranking, it becomes more complex (53). As Rieder et al. investigated the ranking of the search results over time, they found that the search function on YouTube reacts heavily both to attention cycles and to whether the content is “YouTube-native” (64). They also found that the algorithm feeds off controversy and loyal audiences, ranking videos that check these boxes much higher than other videos with more views. As the researchers also state, these outcomes provide meaningful insights into how the ranking algorithm on YouTube promotes content that gains the most attention and is controversial.
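This two-stage process can be made concrete with a small sketch. The text-matching rule and the ranking weights below are assumptions chosen only to illustrate the distinction between retrieval and query-independent ranking; they are not YouTube’s real signals.

# Illustrative two-stage search: retrieval by text matching, then ranking by
# query-independent popularity signals. Weights are arbitrary for the example.
def retrieve(videos, query):
    """Stage 1: keep videos whose title or description matches any query term."""
    terms = query.lower().split()
    return [
        v for v in videos
        if any(t in (v["title"] + " " + v["description"]).lower() for t in terms)
    ]

def rank(candidates):
    """Stage 2: order matches by signals such as watch time, views and likes."""
    def score(v):
        return (
            0.4 * v.get("watch_time", 0)
            + 0.3 * v.get("views", 0)
            + 0.2 * v.get("likes", 0)
            + 0.1 * v.get("subscribers", 0)
        )
    return sorted(candidates, key=score, reverse=True)

# e.g. results = rank(retrieve(video_metadata, "vaccination"))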
The outcome of this study corresponds to the observations Tufekci made. Her article lacks data-supported evidence for her claims, mainly because useful data is difficult to come by as the technology behind the algorithm is a very well-kept secret, but the article does provide useful insights on YouTube as “the Great Radicalizer” (n.pag.). The observations that she made on YouTube’s recommendations and auto-playing feature are in line with the findings from the research by Rieder et al., in the sense that she also found that the algorithm promoted controversial material, such as videos on the “existence of secret governmental agencies and allegations that the United States government was behind the attacks of September 11” (n.pag.). Even with topics that were not related to politics the same pattern emerged: after consuming videos on vegetarianism YouTube recommended content about veganism, and after looking at videos on jogging one ends up being exposed to videos about running ultramarathons. According to Tufekci, the YouTube algorithm has decided that people are drawn to content that is more extreme than what they intended to watch in the first place. Because the audience is intrigued, their view time becomes longer. This increase in viewing time is beneficial to YouTube, as it means the company can generate more money from advertising income (n.pag.).
Another study, by O’Callaghan et al., also showed that the recommendation algorithm on YouTube excluded information that did not align with a user’s existing perspective. This could again lead to a filter bubble. Their case study related to extreme right (or ER) content on YouTube, and they found that once a user has watched ER-related content, it is very unlikely that the same user will see any content opposing this ER political standpoint. They therefore state that “this suggests that YouTube’s recommender algorithms are not neutral in their effects but have political articulations” (460). An important takeaway from this research is that YouTube’s recommendation system has an influence on the political thoughts of its audience, and therefore potentially also their political actions.
The statements made in the previous study also correspond with a notion from Jack Nicas. He claims that when users reveal a political bias through their choice of content, YouTube normally recommends videos that confirm these biases, sometimes with even more extreme standpoints, and hence starts creating the filter bubbles. These standpoints can be “divisive, misleading or false” (n.pag.), and given the fact that the website has over 1.5 billion users, YouTube has much power in influencing the opinions of its audiences. As mentioned earlier, there is an important difference between platforms. On sites such as Facebook and Twitter, users get to somewhat choose the content they see by friending and following the people they want. YouTube, however, largely chooses the content its viewers see through its recommendation system. An investigation by Nicas showed that these recommendations often lead users to “channels that feature conspiracy theories, partisan viewpoints and misleading videos” (n.pag.). Merely querying ‘the Pope’, while being logged out of YouTube and having cleared all internet history, produced multiple videos containing conspiracy theories and sensational content. Videos with titles such as “How Dangerous is The Pope” from the ‘Alltime Conspiracies’ channel, or “What If The Pope Was Assassinated?” by ‘LifeBiggestQuestions’, are the top hits. The algorithm does not necessarily choose ‘extreme’ videos, but searches for clips that generate high traffic and keep viewers on the website. These videos tend to be sensational and extreme in nature (n.pag.).
In January 2019, O’Donovan et al. performed a study to check whether the promises made to fix the YouTube recommendation algorithm were actually kept. They found that despite these vows for improvement, YouTube still suggested “conspiracy videos, hyperpartisan and misogynist videos, pirated videos and content from hate groups” (n.pag.) after common news-related queries. O’Donovan et al. found that it takes just nine clicks in the recommendation system to go from “an anodyne PBS clip about the 116th United States Congress” to an “anti-immigrant video from a designated hate organization” (n.pag.). This anti-immigrant video is called ‘A Day in the Life of an Arizona Rancher’ and depicts Richard Humpries, who tells the audience about an incident “where a crying women begged him not to report her to Border Patrol” (n.pag.), while she was unaware of the fact that he already had. It has over 47.000 views and the top comment reads “Illegals are our enemies, FLUSH them out or we are doomed” (n.pag.). This video dates back to 2011, and in 2016 the uploading party was identified as an anti-immigrant hate group. Nevertheless, YouTube still recommended this video in 2019 after a search for ‘US House of Representatives’, whilst operating in a clean research browser. To increase the understanding of such algorithms as well as the alleged changes they have been through, O’Donovan et al. ran a new series of tests. Through Google Trends, they determined which news terms and politics-related search terms were most popular, and then searched these on Google. They watched the result and then clicked the next top recommended video through the ‘up next’ algorithm. Each query was made in a clean research browser that was not linked to any personal accounts. This was done to demonstrate how the algorithm operates when personalization is left out. They searched 50 unique queries and went down the ‘rabbit holes’ 147 times. This resulted in 2.221 videos played. The researchers claimed that the results of this research “suggest that the YouTube users who turn to the platform for news and information (...), are not well served by its haphazard recommendation algorithm, which seems to be driven by an id that demands engagement above anything else” (n.pag.). Observations were made on how the algorithm makes a “decisive but unpredictable” (n.pag.) turn in a certain content direction. While in some cases it was a harmless magic trick tutorial, in other cases it was a “series of anti-social justice warrior or misogynist clips” (n.pag.). Videos containing partisan content and misinformation were also recommended, spreading wrong information to viewers on many occasions. Even though YouTube had promised improvements to its algorithm, the types of videos it proposes are still controversial and often contain wrong information (n.pag.).
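Conceptually, each of these ‘rabbit hole’ walks can be expressed as a simple loop that starts from a search result and repeatedly follows the first ‘up next’ recommendation. The sketch below is only an abstraction of that procedure; top_search_result and next_up are hypothetical helpers standing in for the clean-browser data collection O’Donovan et al. actually performed.

# Abstract sketch of one 'rabbit hole' walk: start from the top search result
# and keep clicking the first 'up next' recommendation, recording the chain.
# The two callables are hypothetical stand-ins for manual/clean-browser collection.
def rabbit_hole(query, depth, top_search_result, next_up):
    chain = [top_search_result(query)]
    for _ in range(depth):
        chain.append(next_up(chain[-1]))
    return chain

# e.g. walks = [rabbit_hole(q, 20, top_search_result, next_up) for q in trending_queries]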
2.2. Fake News on Platforms
As was mentioned in the introduction, fake news is not a new concept that has been conceived over the past couple of years (Hendricks & Vestergaard, 63; Burkhardt, 6). It existed long before the Internet, but the medium does enable misinformation to spread at a much faster pace to a much greater audience (Burkhardt, 7). Gatekeeping is starting to develop but has been very limited on certain platforms to date. All these factors have contributed to high levels of fake news online (Hendricks & Vestergaard, 75) and cause the public to be seriously misinformed about certain issues. One of the main breeding grounds and means of dissemination of fake news are social media platforms, also due to the fact that there is an “asymmetry of passion” on these kinds of platforms. The most compelling type of content is often also the most sensational (O’Donovan & McDonald, n.pag.) and therefore also the most popular.
2.2.1. Twitter
Twitter is often described as a microblogging service (Kwak et al., 591; Love et al., 568). The social network was founded in 2006 and allows users to select people they want to follow and share their own messages publicly. Over a decade later, Twitter has emerged as an enormous platform which people turn to for, amongst other things, their news and health information. There are over 150 million active users, who account for more than 1 billion messages every three days (Love et al., 568). It has also become a breeding ground for fake news (Vosoughi, Roy & Aral, 359) and a place where certain discourse is promoted, including conspiracy theory topics such as vaccinations (Love et al., 568). Twitter has the power to connect and organize users from all over the world, creating online communities rather than physical ones (Diresta & Lotan, n.pag.).
As we saw previously, recent technological developments make fake news spread even faster. An example of such a technology, very prevalent on Twitter, is the bot: a programme that may appear to be a human user but actually spreads automatically generated information through the platform at rapid speed, and is therefore very well suited for disseminating fake news. Emilio Ferrera, a computer scientist at the University of Southern California, has estimated that 15% of all Twitter profiles are bots (Hendricks & Vestergaard, 75). These fake users are not only problematic because they can spread fake news immensely fast, but also because a story that is read and spread by large numbers of people appears more credible. This causes people to believe fake news simply because an overwhelming number of other people seem to believe it as well (75).
Since the American elections in 2016 and the European Code of Conduct, Twitter has taken various measures to hopefully reduce, and ideally eliminate, the widespread misinformation on the platform. Before 2016, Twitter did little to nothing to stop the dissemination of fake news and therefore allowed Russian troll factories to prosper (Collins, n.pag.). Twitter did have some blockbots on its platform since 2012, but these did not specifically target misinformation accounts (Geiger, 794). Over the last couple of years Twitter has increased the number of systems intended to prevent the spreading of misinformation. A Twitter spokesman announced that the rules on the “use of stock or stolen avatar photos” and the “use of intentionally misleading profile information” (Collins, n.pag.) would be updated with the objective of diminishing the network of bots. Twitter may also close fake accounts that engage in malicious behaviour. An example of this is the closing of roughly 10.000 accounts that discouraged voting (Collins, n.pag.). When a user had spread the wrong date for the midterm voting in the United States, Twitter immediately banned him. The user responded: “Were they really banning people for saying [vote on] November 7? Lol, whoops. Maybe that’s what got me shadowbanned.” (Collins, n.pag.). Accounts that purposely targeted Democrats and told them to vote on the day after the actual voting day were stopped by an algorithm that disabled their tweets before they could be published and misinform other users (Collins, n.pag.).
A study by Shao et al. examined the spreading of fake news by bots on Twitter. They utilized two tools that enable a systematic analysis of the spreading of fake news and of how it is manipulated by social bots: Hoaxy and Botometer. Hoaxy is a platform that can be used “to track the online spread of claims” and can therefore be used to track how fake news and fact-checking spread on Twitter (Shao et al., 12). Botometer is a machine learning algorithm that detects social bots. It is used to evaluate the degree to which an account shows particular similarities with a bot account (13). The Twitter Search API was utilized to collect the 200 most recent tweets posted by an account and the 100 most recent tweets mentioning that account. From this data, they extracted “features capturing various dimensions of information diffusion as well as user metadata, friend statistics, temporal patterns, part-of-speech and sentiment analysis” (13). Shao et al. crawled articles that were published by various independent fact-checking organizations and by websites that supposedly publish fake news on a continuous basis. During their research, they collected 15.053 articles which were fact-checked and 389.569 claims that were either unsubstantiated or debunked. Through the Twitter API, Hoaxy collected 1.133.674 public posts that contained links to fact-checks and 13.617.425 posts which linked to claims. On average, fake news websites produced around 100 articles per week and each article had approximately 30 tweets per week (13). They conceptualized success by how many people shared an article and by the number of posts that contained a link. In both cases, there was a wide “distribution of popularity spanning several orders of magnitude” (4), meaning that the majority of published articles go unnoticed, but that a significant part still goes viral.
Most claims were spread through tweets and retweets, and not so much in replies. For many articles, only one or two accounts were identified as accountable for the brunt of all activity. Some accounts even shared the same claim up to 100 times. Shao et al. call these the ‘super-spreaders’ and found that these were “more likely to be bots compared to the population of users who share claims” (5). Their research also showed that the accounts with the highest bot scores targeted users with higher median numbers of followers and a low variance. This could lead to bots exposing claims to highly influential people, like journalists or politicians, which would fabricate the appearance that a claim is widely shared and thus increase the chances that these people will share it.
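As a rough illustration, identifying such ‘super-spreaders’ from a table of posts could look like the sketch below. The column names, file name and the 50% threshold are assumptions for the example and do not reflect the actual Hoaxy data schema.

# Toy sketch: for each claim, find accounts responsible for at least half of all
# posts linking to it. Column names and the threshold are illustrative assumptions.
import pandas as pd

def super_spreaders(posts: pd.DataFrame, min_share=0.5):
    counts = posts.groupby(["claim_url", "account"]).size().rename("n").reset_index()
    totals = counts.groupby("claim_url")["n"].transform("sum")
    counts["share"] = counts["n"] / totals
    return counts[counts["share"] >= min_share].sort_values("share", ascending=False)

# e.g. super_spreaders(pd.read_csv("claim_posts.csv"))  # hypothetical file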
Their proposed solution consists of “curbing social bots”, as this mitigates the risk of the spread of online fake news. For this to be successful, social media platforms and academic researchers should work closely together. The downside of relying on this solution alone is that the algorithms behind these mechanisms could make mistakes. According to Shao et al., “even a single false-positive error leading to the suspension of a legitimate account may foster valid concerns about censorship” (12). This would call for human intervention, but due to the scale and volume of the issue, this would not be feasible. Therefore, research on abuse detection technology becomes increasingly important. Another option would be the deployment of CAPTCHAs. This technology is already deployed on a large scale and has proven to be successful in the fight against “email spam and other types of online abuse” (12).
2.2.2. Facebook
It probably will not come as a surprise to the informed reader, but recent studies have shown that fake news on Facebook remains prevalent (Quattrociocchi, n.pag.; Monti et al., 1). A study focusing on conspiracy theories as a form of misinformation was performed by researchers from the World Economic Forum. Here, they state that one of the dangers of conspiracy theories is that they “tend to reduce the complexity of reality” (Quattrociocchi, n.pag.) and therefore misinform users.
Likes, shares and comments enable us to grasp the social dynamics on Facebook. By using this form of data, the driving forces behind the spreading and consumption of (fake) news can be understood. In his study, Quattrociocchi looked at how 2.3 million Facebook users interacted with political information during the Italian elections in 2013. The researchers followed 50 different sources, which were divided into three categories: 1. mainstream media, 2. online political activism, and 3. alternative information sources (sources that are not recognized by science and mainstream media outlets). The study tracked all the users’ interactions with these pages (e.g. likes, comments and shares) over a period of six months. The outcome of this study was that the actual content, and therefore also the quality of the information it provided, had little to no effect on how it spread. Posts with unsubstantiated claims, political activism and regular news had the same engagement patterns. An explanation for this could be the so-called echo chambers and people looking for information that is in line with their own beliefs, disregarding its source (n.pag.).
Facebook has undertaken certain measures to combat fake news. It created a so-called immune system that is supposed to protect the platform from infections by bots, and it has five prominent fact-checking organizations verifying the information that is posted. The platform has also launched a specific feature, called Related Articles, in parts of Europe that gives readers access to the outcomes of the fact-checkers. In addition to that, a Facebook spokesman said that the company had updated its advertising policy with specific regard to combating fake news, by clearly stating that it will not “display ads that show misleading or illegal content, to include fake news sites” (Wingwield, Isaac & Banner, 1). However, this will not stop the spreading of fake news through posts. For this type of dissemination, other solutions will have to be found.
2.2.3. YouTube
YouTube has been criticized for promoting conspiracy theories, hyperpartisan news, misogynist content, pirated videos and hate groups on its platform after users search for ordinary news-related topics (O’Donovan et al., n.pag.). The recommendation feature especially is under fire, as it often directs users to channels that contain videos with “conspiracy theories, partisan viewpoints and misleading videos” (Nicas, n.pag.), even when the users in question did not show any interest in the proposed content. From these observations, it can be stated that YouTube is deliberately pushing viewers in the direction of controversial and extreme content (n.pag.).
Because of these heavy criticisms, YouTube has tried to make improvements to its advertisement policies and has added extra verified information to conspiracy theory videos. A YouTube spokesperson stated that they “have strict policies that govern what videos we allow ads to appear on, and videos that promote anti-vaccination content are a violation of those policies” (Shu, n.pag.). However, many of the anti-vaccination videos are not made with the intention of generating a profit; their makers are merely out there to let their voices be heard. Having such an advertisement policy in place might be a good start, but it will not stop the spreading of fake news on the platform.
In addition to this, YouTube announced in 2018 that it was looking into the possibility of changing the design of the videos to another format. This format was made to promote relevant and fact-checked information from credible sources next to the videos containing conspiracy theories (Nicas, n.pag.). In May 2019, YouTube added links to background information on Wikipedia next to conspiracy videos, which it calls “information cues” (Etherington, n.pag.). These cues appear as pop-ups in the form of a boxed text, and are intended to provide alternative perspectives on subjects “including chemtrails and the supposedly fake Moon landing” (n.pag.). This has also been the case for a number of anti-vaccination videos. An information panel was added that explained what the MMR vaccine actually does, as well as links to relevant Wikipedia articles. The intention behind this was to debunk any conspiracy theories that were mentioned in the video (O’Donovan & McDonald, n.pag.). While many see this as a solid effort, there are also critics who do not think adding links is enough to fight misinformation. They find that this new tweak is merely a trick from YouTube to relieve itself of the responsibility to take a closer (and more critical) look at what is actually causing this issue. Banning this type of content as a whole is not an option, as YouTube “encourages free speech and defends everyone’s right to express unpopular points of view” and wants to “allow all our users to view all acceptable content and make up their own minds” (O’Callaghan, 460). This is a grounded justification, as freedom of speech is of course a basic human right. But how far do we let this go, given that we can see this ‘freedom of speech’ in the form of fake news taking a toll on our contemporary society?
2.3. The Anti-Vax Movement Online
Since the anti-vaxx movement loses most of its credibility when it comes to scientific evidence, and its supporters are relatively small in number, it has started to fall back on different social media platforms to enhance its presence and get its message across. According to Diresta and Lotan, the anti-vaxx community has leveraged the reach and power of Twitter, Facebook and YouTube for a long period of time. Social media has proven its potential to have a significant impact on debates and versions of events, and therefore also to influence the public’s opinion (n.pag.). In addition to this, people are searching the web more for health-related information and rely more on what they encounter online (Holone, 299; Mavragani & Ochoa, 2; Donzelli et al., n.pag.). It is therefore essential to understand who is participating in the spreading of anti-vaxx misinformation on the different platforms and how this is done.
2.3.1. Twitter
The anti-vaxx movement has been actively using Twitter as a vehicle to get its arguments across. Diresta and Lotan have analysed the hashtags that were used by anti-vaxx Twitter users in their fight against Californian legislation that prevents parents from refusing to vaccinate their children on religious or philosophical grounds. In August 2014, the anti-vaxx activists mainly focused on spreading misinformation about autism and vaccination by using #cdcwhistleblower, a hashtag that refers to the conspiracy theory that the CDC (Centers for Disease Control and Prevention) is covering up information that would prove a correlation between the MMR vaccination and autism. As the legislation progressed, #cdcwhistleblower started to be replaced by #sb277 (the number of the bill). Diresta and Lotan have mapped the various groups within the anti-vaxx community on Twitter and found that the groups are “extremely well organized and passionate” (n.pag.) and have succeeded in tweeting themselves into an echo chamber. Another finding from this research is that users who do not know about these hashtags or do not explicitly look for them are not very likely to end up in this space on Twitter, which can be seen as a good thing. However, the anti-vaxx tweets are designed in such a way that they erode confidence in vaccinations. Dana Gorman, an active anti-vaxx leader, has stated that it is one of her goals to make new parents question everything about vaccinations (n.pag.).
Love et al. have analysed the content driving the discourse about vaccination on Twitter. The study used NodeXL, a network analysis tool, to sample messages containing a specific keyword regarding vaccination (e.g. vaccine, vaccination, immunization). After excluding tweets that were not in the English language, the researchers were left with 6,827 tweets posted from January 8 to January 14, 2012. These tweets were arranged by their activity level (responses from other users, reposting, marking as favorite or other sharing) within a network to establish the “most popular and most socially influential messages” (568). The messages that did not receive any form of engagement were removed from the analysis, since the researchers found that these tweets did not qualify as influential and/or did not spread any information on Twitter or through other linked platforms. After this final extraction, the sample contained 2,580 tweets. These were then coded for a number of key variables:
● Frequency
● Tone toward vaccinations
● Links to sources
If the tweets contained any scientific claims, these were coded as either ‘substantiated’ or ‘unsubstantiated’ and further categorized according to the Centers for Disease Control’s ‘Pink Book’ resource (568). The study showed that “no particular subject, source, or user dominated the conversation” (570). The most popular messages covered the following topics:
● The development of the NeuVax E-75 vaccine (2.5%)
● The effectiveness of the herpes vaccine for women (1.5%)
● The recommendation of the human papillomavirus vaccination from the Center for Disease Control (1.5%)
● The possible approval for the lung cancer vaccine (<1%)
● Sharing blog posts containing content that discredited the alleged link between autism and vaccination (<1%).
The accounts that posted most frequently often included hyperlinks to various websites. There was a total of 341 links, which led to:
● Health websites (16%)
● National media (13%)
● Medical organizations (12%)
● Digital news institutions (10%)
● Alternative therapies (5%)
The tweets containing professional sources were treated either neutrally or positively, which indicates that “individuals employ Twitter to share knowledge useful to other conversations” (570). They found that of the initial 2,580 tweets, 33% took a positive position towards vaccination, 54% had a neutral tone and 13% were negative (570).
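A minimal sketch of the engagement filter and tone summary described above, assuming a hypothetical coded dataset with columns named retweets, favorites, replies and tone (these names are not taken from Love et al.), might look as follows:

# Sketch: keep only tweets with any engagement, then summarise the coded tone.
# File and column names are assumptions for illustration only.
import pandas as pd

tweets = pd.read_csv("vaccine_tweets_coded.csv")

# A tweet counts as influential if it attracted at least one form of engagement.
engagement = tweets[["retweets", "favorites", "replies"]].sum(axis=1)
engaged = tweets[engagement > 0]

# Share of positive / neutral / negative tone among the remaining tweets.
tone_share = (engaged["tone"].value_counts(normalize=True) * 100).round(1)
print(tone_share)

Applied to the coded sample, a summary of this kind would produce the sort of 33% / 54% / 13% distribution reported by the study.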
Another study, performed by Yuan and Crooks, investigated the patterns of communication between anti-vaccine and pro-vaccine users on Twitter during the 2015 national measles outbreak in the United States. They conducted this research by studying a retweet network made up of 660,892 tweets. These tweets were all related to the MMR vaccine and came from 269,623 users. They classified these users into three different groups: 1. anti-vaccination, 2. neutral to vaccination and 3. pro-vaccination. By combining these opinion groups with structural community detection, the researchers found that anti- and pro-vaccine users mainly retweet from within their own opinion groups. The users classified as having neutral opinions were found to be more distributed across communities. The research also concluded that the anti-vaccination Twitter users were “highly clustered” and acted in “enclosed communities” (197). This makes it a difficult task for health organizations and/or governments to penetrate these spaces with scientific evidence and awareness campaigns about the importance of vaccination (204).
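In simplified form, the combination of a retweet network with structural community detection that Yuan and Crooks apply can be sketched as follows; the toy edge list is invented, and the modularity-based algorithm from networkx merely stands in for whichever detection method the authors used:

# Sketch: build a retweet network and detect structural communities.
import networkx as nx
from networkx.algorithms.community import greedy_modularity_communities

# Directed retweet network: an edge (u, v) means user u retweeted user v.
G = nx.DiGraph()
G.add_edges_from([
    ("anti_user_1", "anti_user_2"),
    ("anti_user_2", "anti_user_1"),
    ("pro_user_1", "pro_user_2"),
    ("neutral_user", "pro_user_1"),
])

# Modularity-based community detection on the undirected projection.
communities = greedy_modularity_communities(G.to_undirected())
for i, members in enumerate(communities):
    print(f"Community {i}: {sorted(members)}")

Comparing the detected communities with users’ opinion labels is what allowed the researchers to see that anti- and pro-vaccine users largely retweet within their own groups.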
2.3.2. Facebook
It is 2019, and under pressure from lawmakers, while the United States is in the midst of a nationwide measles outbreak, Facebook has stated it is working on preventing “anti-vaccination content from being algorithmically recommended” (O’Donovan & McDonald, n.pag.). In 2015, Zuckerberg wrote in a Facebook post that “vaccination is an important and timely topic. The science is completely clear: vaccinations work and are important for the health of everyone in our community” (Wong, n.pag.). However, when Facebook users seek information on vaccination on Facebook, they might still be pulled towards anti-vaccination propaganda with no scientific evidence.
In an investigation conducted by the Guardian, it was discovered that the search results Facebook users were presented with often contained pages and groups full of anti-vaccination conspiracy theories, despite Facebook labeling fake news a “real-world harm” (Wong, n.pag.). The first page of ‘vaccination’ search results contained twelve results, eight of which were anti-vaccination pages. In addition to this, the autofill function suggested search terms that could lead users towards anti-vaccination content. After just typing ‘vaccination’, the suggestions for groups showed “vaccination re-education discussion forum”, “vaccine re-education”, “vaccine truth movement” and “vaccine resistance movement” (n.pag.). Each of these groups had over 140,000 members. The Guardian also found that Facebook accepts advertising income from anti-vaccination groups, which boosts the spread of fake news even further. This information was retrieved from Facebook’s advertising archive. These advertisements contained false statements (e.g. “vaccines kill babies” (n.pag)) and were exclusively aimed at women. Allowing anti-vaccination groups and content on the platform itself might still be attributed to free speech and is also not in violation of Facebook’s content rules (n.pag), but accepting advertisements and allowing them to be specifically targeted at women, who are most likely mothers on a quest for vaccination information for their babies, is deliberately influencing users through fake news.
A further mapping of the anti-vaxx movement on Facebook was done in an article by Smith and Graham. The researchers examined the structure and nature of the discourse around anti-vaccination Facebook pages in Australia. By doing this, they aimed to provide meaningful insights into how these communities operate and how these attitudes could possibly be countered (1-2). Six public pages were chosen for further examination: Fans of the AVN, Dr. Tenpenny on vaccines, Great mothers (and other) questioning vaccines, No vaccines Australia, Age of autism and RAGE against the vaccines (3-4). These pages were purposively sampled by triangulating different data between Australia and North America. Sampling in this manner helped the researchers identify the groups most relevant to the anti-vaccination community and produced important insights into this movement. The pages had to meet the following criteria:
1. Explicit anti-vaccination focus
2. Widely available to the greater public
3. Easily found through querying keywords (e.g. “vaccine concern”, “vaccine choice”, “vaccines and autism” and “anti-vaccination”) (4)
4. Updated every (other) day
5. Minimum of 1,000 likes
6. Show proof of user interaction (e.g. at least 10 likes on posts, comments and sharing) (4)
The data from these pages over a period of three years was collected through the Facebook API and “the Social MediaLab package for the R programming language” (5). The data set consisted of posts, comments and likes. This data scraping resulted in “14,736 unique posts, 242,813 unique users, over 2.5 million likes and 291,520 comments” (5), and these posts were shared more than 2 million times. This dataset was analysed through a range of different methods: “social network analysis, gender prediction (...) and generative statistical models for topic analysis” (1). Among the chosen groups there were both so-called ‘community pages’ and pages run by an important public figure in the anti-vaxx community; the latter often had more likes. However, despite the high level of engagement, the anti-vaccination community was found to be relatively loose and sparse. This community does not show many features of “close-knit communities of support with participants interacting with each other in a sustained way over time” (14). However, this does not imply that these communities are not potentially dangerous. As the researchers put it: “simply participating in a community of like-minded others may reinforce and cement anti-vaccination beliefs” (14), hence creating filter bubbles. The textual analysis and topic modelling showed that there is a “prevalence of conspiracy-style thinking”, which is not deemed surprising (15). This research has provided evidence for the filter bubble and addresses the threats these bubbles pose regarding misinformation. Once someone comes across such anti-vaccination content, these attitudes are likely to be reinforced.
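Smith and Graham carry out their topic analysis in R; a rough Python equivalent of such generative topic modelling, using LDA on a few invented example posts rather than the actual Facebook data, might look like this:

# Sketch: LDA topic modelling over post text (toy data, illustration only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

posts = [
    "vaccines and autism cover up by the cdc",
    "big pharma hides vaccine injury data from parents",
    "natural immunity is better than any vaccination schedule",
]

vectorizer = CountVectorizer(stop_words="english")
doc_term = vectorizer.fit_transform(posts)

lda = LatentDirichletAllocation(n_components=2, random_state=0)
lda.fit(doc_term)

# Print the most heavily weighted words per topic.
terms = vectorizer.get_feature_names_out()
for idx, topic in enumerate(lda.components_):
    top_words = [terms[i] for i in topic.argsort()[-5:]]
    print(f"Topic {idx}: {top_words}")

Topics of this kind, computed over the full corpus rather than toy data, underlie the authors’ observation of a “prevalence of conspiracy-style thinking”.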