
Prism and Protest: A Network Analysis of Twitter


Academic year: 2021


Keywords

Twitter, PRISM, NSA, protest, The Day We Fight Back, Discourse, Network Analysis

1. Introduction

One of the first revelations of the Edward Snowden leaks was the existence of a program run by the United States National Security Agency (NSA) codenamed ‘PRISM,’ formally known as SIGAD US-984XN.1 PRISM is designed to collect information from major technology firms through secret orders approved by the Foreign Intelligence Surveillance Court2. While the full extent of the data-collection efforts is still not known, it is known that multinational corporations such as Google, Apple, and Microsoft are targeted and in the past were often complicit in these surveillance activities. The Snowden leaks continue to be reviewed by journalists, and although numerous other programs and possible abuses were revealed, information surrounding PRISM was the catalyst for international discussion about the surveillance reach of the United States and allied countries. This discussion often took place and spread through social media, with a particularly robust use of Twitter to share information, organize around hashtags, and directly address political leaders and members of the intelligence community. As the story was originally broken by Glenn Greenwald of The Guardian newspaper3, much of the information shared included links to news outlets that provided a steady stream of new revelations and in-depth articles. Given this multifaceted use, Twitter can be seen as a news gathering and dissemination engine as well as a platform for communication in general. Theories surrounding this concept and others, as well as related findings, will be reviewed throughout this thesis.

1 Chappell, Bill. "NSA Reportedly Mines Servers Of U.S. Internet Firms For Data." NPR. NPR, 6 June 2013. Web. <http://www.npr.org/blogs/thetwo-way/2013/06/06/189321612/NSA-reportedly-mines-servers-of-u-s-Internet-firms-for-data>.

2 Weiner, Eric. "The Foreign Intelligence Surveillance Act: A Primer." NPR. NPR, 18 Oct. 2013. Web. 21 June 2014. <http://www.npr.org/templates/story/story.php?storyId=15419879>.

3 Ball, James. “NSA collects millions of text messages daily in 'untargeted' global sweep.” The Guardian 16 January 2014. Web. <http://www.theGuardian.com/world/2014/jan/16/NSA-collects-millions-text-messages-daily-untargeted-global-sweep>

After months of continued releases of classified information, public attitudes ranging from disappointment to outrage increasingly developed into organized dissent, eventually leading to the global protests known as The Day We Fight Back. This action was planned to take place both online and offline on February 11, 2014. It attempted to bridge the divide between mainstream advocacy and corporate groups and more alternative activists, such as members of Anonymous and other resistance-affiliated organizations. This story, one of both new and old media, provides an ideal opportunity to apply digital methods, a research approach that will be expanded on further. To examine both how findings generated via these methods relate to the offline context and how they can be applied in a natively digital context, various sources will be reviewed (Rogers 1). One goal of this thesis is to show that digital methods are indeed a formidable way to study modern global events in both online and offline contexts.

Grounded within platform studies, an approach that specifically studies online websites, tools, and networks, this thesis investigates the evolution over time of community discourses on Twitter surrounding the United States National Security Agency’s PRISM program and the subsequent The Day We Fight Back protests. Various network analysis methods will be used to examine the formation of these networks more closely, focusing specifically on co-mention, co-hashtag, hashtag-user, and hashtag-host relationships. For each of these relationships, visualizations will be produced and then analyzed in order to identify meaningful patterns within the data. These findings will include summaries and conclusions relating to various Twitter statistics and the network graphs. These analytical choices are explained in more depth in later sections.
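As a minimal sketch of how the four relationship types named above can be derived from tweet data, the Python below counts weighted edges from a handful of invented tweet records. The record fields (`user`, `mentions`, `hashtags`, `urls`), the sample tweets, and the helper name `build_edges` are hypothetical and do not reflect TCAT's export format. The mention relationship is read here as author-to-mentioned-account, which is only one possible reading of "co-mention."

```python
from collections import Counter
from itertools import combinations
from urllib.parse import urlparse

# Hypothetical tweet records: invented data with assumed field names.
tweets = [
    {"user": "alice", "mentions": ["bob", "eff"], "hashtags": ["PRISM", "nsa"],
     "urls": ["http://news.example.org/story-1"]},
    {"user": "bob", "mentions": ["eff"], "hashtags": ["nsa", "prism", "snowden"],
     "urls": []},
    {"user": "carol", "mentions": [], "hashtags": ["prism"],
     "urls": ["https://blog.example.net/post"]},
]


def build_edges(tweets):
    """Count weighted edges for four relationship types (hypothetical helper)."""
    mention = Counter()       # directed: (author, mentioned account)
    co_hashtag = Counter()    # undirected: alphabetically sorted hashtag pair
    hashtag_user = Counter()  # bipartite: (hashtag, author)
    hashtag_host = Counter()  # bipartite: (hashtag, host of a linked URL)
    for t in tweets:
        author = t["user"].lower()
        # Lower-case and de-duplicate hashtags so each tweet counts once per tag.
        tags = sorted({h.lower() for h in t["hashtags"]})
        for m in t["mentions"]:
            mention[(author, m.lower())] += 1
        for a, b in combinations(tags, 2):
            co_hashtag[(a, b)] += 1
        hosts = {urlparse(u).hostname for u in t["urls"]} - {None}
        for tag in tags:
            hashtag_user[(tag, author)] += 1
            for host in hosts:
                hashtag_host[(tag, host)] += 1
    return mention, co_hashtag, hashtag_user, hashtag_host


mention, co_hashtag, hashtag_user, hashtag_host = build_edges(tweets)
print(co_hashtag[("nsa", "prism")])  # 2: the pair appears in two tweets
```

Counting once per tweet gives edge weights that a graph tool can then lay out and style. In real data, shortened links would likely need to be expanded before the host becomes meaningful.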


Tarleton Gillespie defines platforms as “content-hosting intermediaries,” exemplified by Facebook, YouTube, and Twitter (Gillespie 350). With over 200 million monthly active users, and peak tweet rates reaching into the thousands per second4, Twitter is one of the most active platforms for people around the world to share information. Building on Gillespie’s concept of platforms, danah boyd and Nicole Ellison focus on social media tools, defining “social network sites as web-based services that allow individuals to construct a public or semi-public profile within a bounded system, articulate a list of other users with whom they share a connection, and view and traverse their list of connections and those made by others within the system” (boyd and Ellison 211). Using certain digital methods, and expanding on boyd and Ellison’s concepts of digital connections, this research aims to enhance our understanding of how social media is used in response to major news events. As more findings are discovered, the power of digital methods will only increase, a goal this thesis aims to help achieve. This research will add to existing methods and, in a sense, create a method specific to global news events by combining existing forms of analysis into a system that can be applied to any event with sufficient activity on Twitter.

Using the ‘PRISM’5 and ‘The Day We Fight Back’6 datasets collected with the Digital Methods Initiative’s Twitter Capture and Analysis Tool (TCAT), five network-based approaches and their visualizations will be used to analyze the data. For the PRISM dataset, four days have been chosen for deeper review: June 6, October 22, December 17, and January 17. For The Day We Fight Back dataset, which was considerably smaller, two days were selected: February 6 and February 11. The selection process for these days is discussed further in the methods section of this thesis. For each of the selected days a consistent system of review will be applied. First, a table of the ten most active users by degree will be created. Second, a social graph by mention will be created and reviewed, including a version styled by eigenvector centrality. Third, a co-hashtag graph will be analyzed. Chosen days will also be studied further using hashtag-user and hashtag-host bipartite graphs.

4 During the announcement of Osama Bin Laden’s death, Twitter recorded its highest tweet rate of 5,106 tweets per second. Source: <http://techcrunch.com/2011/05/02/bin-ladenannouncement-Twitter-traffic-spikes-higher-than-the-super-bowl>

5 DMI Twitter Capturing and Analysis Toolset. Source: https://tools.digitalmethods.net/coword/Twitter/analysis/index.php?dataset=PRISMs (Accessed 28 February 2014)

6 DMI Twitter Capturing and Analysis Toolset. Source: https://tools.digitalmethods.net/coword/Twitter/analysis/index.php?dataset=The Day We Fight Back (Accessed 28 February 2014)

The data collected by TCAT included 2,525,154 tweets for the PRISM dataset and 160,222 for The Day We Fight Back. While the large number of tweets may initially seem to complicate the analysis, specific methods of filtering, manipulation, and visualization can effectively parse the information into a form that is easier to analyze. In the age of big data, large datasets like these actually lend themselves to more consequential review, as many network-related findings are strengthened by a larger number of actors, which increases the reliability of calculations such as link weight or degree. The research questions posed below will be answered through a wide range of tools, methods, visualizations, and forms of analysis. It is an exciting time, when research can take place in near-real time, while an issue is still being discussed and the online communities around it are still evolving. With tools like TCAT, data can be captured while an issue is still fresh, which allows for immediate analysis that can in turn inform changes to what is being captured. This cycle improves the overall quality of the information and of the resulting findings.
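The first two review steps, ranking users by degree and styling a graph of @mentions by eigenvector centrality, can be sketched as follows. This is an illustrative implementation under stated assumptions, not the tooling used in this thesis: directed mention counts are collapsed into an undirected weighted graph, degree is taken as the number of distinct accounts a user is linked to, and eigenvector centrality is computed by power iteration. The function names and the toy edge list are hypothetical.

```python
import math
from collections import Counter, defaultdict


def undirected_weights(mention_edges):
    """Collapse directed (author, mentioned) counts into symmetric weights."""
    adj = defaultdict(dict)
    for (a, b), w in mention_edges.items():
        if a == b:
            continue  # ignore self-mentions
        adj[a][b] = adj[a].get(b, 0) + w
        adj[b][a] = adj[b].get(a, 0) + w
    return adj


def top_by_degree(adj, k=10):
    """Rank users by degree (number of distinct linked accounts); ties by name."""
    degree = {u: len(nbrs) for u, nbrs in adj.items()}
    return sorted(degree.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


def eigenvector_centrality(adj, max_iter=1000, tol=1e-9):
    """Power iteration on the shifted matrix (A + I).

    The shift leaves the leading eigenvector unchanged but stops the iteration
    from oscillating on bipartite graphs. Scores are scaled to unit length.
    """
    nodes = sorted(adj)
    if not nodes:
        return {}
    x = {n: 1.0 / len(nodes) for n in nodes}
    for _ in range(max_iter):
        new = {n: x[n] + sum(w * x[m] for m, w in adj[n].items()) for n in nodes}
        norm = math.sqrt(sum(v * v for v in new.values()))
        new = {n: v / norm for n, v in new.items()}
        if sum(abs(new[n] - x[n]) for n in nodes) < tol:
            return new
        x = new
    return x


edges = Counter({("a", "hub"): 2, ("b", "hub"): 1, ("c", "hub"): 1, ("a", "b"): 1})
adj = undirected_weights(edges)
print(top_by_degree(adj, k=2))  # [('hub', 3), ('a', 2)]
scores = eigenvector_centrality(adj)
```

Degree only counts direct links, whereas eigenvector centrality also rewards being linked to well-connected accounts, which is why the two rankings can differ on real mention graphs.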


2.1. How have Twitter discussions and communities formed and evolved surrounding the global debate about the Edward Snowden leaks, specifically the NSA’s PRISM program?

2.2. How have Twitter discussions and communities formed and evolved surrounding “The Day We Fight Back” global protests?

2.3. After applying various analytic tools such as clustering, degree, and centrality algorithms to co-mention, co-hashtag, and bipartite graphs, can the structure of these online discussions be shown to correlate with on-the-ground events?


3. Object of Study and Practices

3.1. Twitter as an Object of Study

Information exchange has long been the keystone of a well-structured society, and the networked digital era we live in today has greatly transformed the ways in which people communicate and interact. It has given people the ability to create, share, and exchange content and ideas across virtual communities, breaking down many traditional barriers to communication such as location. Twitter, the object of study in this thesis, is a social media and microblogging platform that allows its 232 million monthly active users to share what the company describes as “What’s happening?” (Lunden). This information is shared in the form of short texts known as tweets, limited to 140 characters. The platform was co-founded in 2006 by Jack Dorsey in San Francisco, and initially launched as an “urban lifestyle tool for friends to provide each other with updates of their whereabouts and activities” (Rogers 1). With approximately 500 million tweets sent every day by its 100 million daily active users, a key factor behind the platform’s popularity is its ease of use. The site strives to “provide a light-weight, easy form of communication that enables users to broadcast and share information about their activities, opinions and status” (Java, Finin, Song, and Tseng 1). Twitter users can also choose to make their profiles available to the public or only to their contacts. Java et al. studied the intentions of active Twitter accounts at a community level, and observed that these users often join communities that share similar interests. Moreover, users may have different reasons for participating and engaging in these communities. The authors explain this further by writing that “while some act as information providers, others are merely looking for new and interesting information” (6). In comparison to Facebook status updates, Twitter’s strict character limit often produces short and precise messages, which shapes the way in which the medium is used. The popularity of social media sites is evident in a Pew Research Center report, which ranks Facebook first according to traffic figures, attracting 71% of all adult Internet users, whereas Twitter attracts 16%, Pinterest 15%, and Instagram 13%. As Figure 1 below highlights, the plurality of Twitter’s users is based in the U.S., with the remaining 75.7% spread across the rest of the world, most notably in Japan, Indonesia, the United Kingdom, Brazil, Spain, and Saudi Arabia (Fox via PeerReach).

Figure 1: Twitter’s top 5 global markets of active users as of October 2013. Source: http://mashable.com/2013/11/20/Twitter-users-countries (Accessed May 24 2014)

This global reach is important because it means the platform hosts a wide variety of ideas, beliefs, and topics of discussion. With so many users, it is important to look into how the site is actually used, beyond what a cursory review would suggest. As Richard Rogers notes, early Twitter studies focused on the platform’s banal and shallow communicative practices (1). This research will go beyond this perceived banality and study how the platform is used for serious political and social discourse. Such platforms have become increasingly important as the Internet moves away from Web 1.0 concepts toward newer Web 2.0 theories and tools. Web 2.0 approaches are particularly useful for allowing users to produce and share data together, rather than remaining traditional consumers or observers of media (O’Reilly 1-2). This shift has wide-ranging effects. For example, Esteve and Borge write that these new media allow for more rapid and meaningful communication between citizens and their governments and related political parties (2-3). However, the same attributes of Twitter that allow for this also bring a certain loss of control over online perceptions and the dissemination of information. The open nature of social media creates an atmosphere where users can follow, retweet, and participate with groups whether or not they have any real personal connection or obligation to the group (Scarrow 15). It is these new forms of interaction that make sites like Twitter so interesting to study and analyze, as they provide a wealth of data about these digital relationships.

One of the first organizations to conduct an analysis of Twitter was the marketing firm Pear Analytics, which at the time judged the content of tweets to be of little interest and claimed that 40% of them were “pointless babble” (Rogers 4). Other scholars such as Java et al. have considered most tweets “daily chatter,” illustrated by what have become known colloquially as ‘food tweets’ (Java et al. 2). These studies highlight people’s use of Twitter to connect socially through small talk, or what Malinowski describes as ‘phatic communion’ (10). Moreover, others have argued that social media platforms like Twitter should be studied as spaces of ‘networked sociality,’ where neither the dialogue nor the information exchange is the main focus of critique (Miller, Gillespie, and Wittel). The shift toward using Twitter as an object of study for following news events parallels the platform’s tagline change in 2009 from “What are you doing?” to “What’s happening?” (Rogers 1). For David Crystal, this signified “a move from an ego to a reporting machine” (4), and it was accompanied by a shift in focus from “me-tweets” to what Naaman et al. have classified as ‘information sharing’ tweets (Naaman, Boase, and Lai 2010). During the same year, Twitter introduced a ‘Trending Topics’ section to users as an attempt to identify “the hottest emerging topics of discussion” (Twitter Help Center 2014).

Biz Stone, a Twitter co-founder, characterized the platform as a “discovery engine for finding out what is happening right now” (Twitter Help Center 2014). ‘Trending Topics’ builds on the user-supplied ‘#hashtag,’ which connects tweets to specific users, groups, larger themes, and events. Using the hashtag as an identifier effectively sorts tweets into similar subgroups and allows users to easily search for tweets that interest them. Similarly, tweets can be directed to one or more specific individuals, and it is through this ‘@mention’ function that public conversation on Twitter occurs. Scholars have noted Twitter’s capacity to act as a tool for following, reporting and, at times, breaking news events (Arceneaux and Weiss 2010). The platform received significant attention for its use in disseminating information during the 2008 Mumbai attacks (Dolnick), the January 2009 Hudson River landing of a US Airways flight (Beaumont), and the death of Osama Bin Laden in 2011 (Mengdie et al.). In addition, others have debated viewing and studying Twitter as a revolutionary tool, particularly in reference to the 2009 Iranian protests and the Arab Spring (Bruns, Liang, Morozov, and Shirky). In event-related research, the hashtag included in a tweet becomes both the means of following the action and a way to categorize a set of tweets, in order to study an event online and monitor the related events on the ground (Rogers 5). More recently, Twitter has come to be studied as an archived object. Whereas “Twitter Studies II” refers to the study of user accounts for event-following purposes, “Twitter III” treats Twitter as raw data, providing an opportunity to apply new and more advanced methods (7). Raw data, in contrast to past content and sentiment analysis, facilitates more […] as interconnectivity and clustering information derived from @mentions. The relative openness and ease of data collection on Twitter has presented researchers from various fields with attractive datasets for analysis, with lines of inquiry drawing on the platform’s built-in features: retweets for significant posts, hashtags for subject-matter categorization, @mentions/replies, following/followers, and shortened URLs. The strict 140-character limit, which keeps tweets relatively similar in size, has also facilitated textual analysis such as co-word analysis, an approach this research uses as well (7).
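The built-in features listed above are marked in the tweet text itself, which is what makes them easy to extract at scale. The sketch below is a minimal illustration that assumes simplified regular expressions rather than Twitter's own entity-extraction rules, which handle many more edge cases, and it treats the "RT @user:" prefix convention as the retweet marker; the API's own retweet metadata would be more reliable. The pattern and function names, and the sample tweet, are hypothetical.

```python
import re

# Simplified patterns (an assumption for illustration); Twitter's own entity
# extraction handles many more edge cases than these.
HASHTAG = re.compile(r"(?<!\w)#(\w+)")
MENTION = re.compile(r"(?<!\w)@(\w{1,15})")
URL = re.compile(r"https?://\S+")
RETWEET = re.compile(r"^RT @(\w{1,15}):")  # the "RT @user:" prefix convention


def parse_tweet(text):
    """Extract retweet source, hashtags, @mentions and URLs from raw tweet text."""
    rt = RETWEET.match(text)
    return {
        "retweet_of": rt.group(1).lower() if rt else None,
        "hashtags": [h.lower() for h in HASHTAG.findall(text)],
        "mentions": [m.lower() for m in MENTION.findall(text)],
        "urls": URL.findall(text),
    }


example = parse_tweet(
    "RT @SomeOrg: Feb 11 is #TheDayWeFightBack against #NSA surveillance http://t.co/abc123"
)
```

The lookbehind `(?<!\w)` keeps e-mail addresses such as `a@b.com` from being read as mentions. The lower-cased hashtags and mentions can then feed co-hashtag counts or co-word analysis.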

While Twitter research is conducted more and more often within the digital humanities, it is important to be aware of the implications and restrictions associated with this new type of data. As Lev Manovich points out, only the social media companies themselves have full access to large social datasets (Manovich 5). Even though researchers can access a considerable amount of data through Application Programming Interfaces (APIs), only part of the data can be collected this way. For instance, an important limitation of the Twitter Streaming API is the lack of historical data. As danah boyd and Kate Crawford point out, many social media platforms offer poor archiving and search functions (boyd and Crawford 4). Researchers are therefore more likely to focus on topics taking place in the present or immediate past, due to the difficulty, or impossibility, of accessing older data.

For Twitter, the Search API only returns results from the past six to nine days, and the Streaming API can only capture new tweets in near real-time. Currently, one of the only ways to access a collection of older tweets is through commercial social data resellers such as DataSift or GNIP (Wagner), which can charge up to $10,000 a month for a topic sample of around 100,000 tweets (GNIP Pricing Page). Furthermore, Twitter only makes a fraction of its real-time material available through its APIs. The data available from the standard API consists of the so-called “spritzer,” which contains roughly 1% of public tweets (boyd and Crawford 7). Only a handful of companies and startups have access to larger samples such as the ‘gardenhose’ API, which accounts for roughly 10% of all public tweets, or the full ‘firehose’ API, which contains all tweets (boyd and Crawford 7). Twitter is not fully transparent about the inner workings of its APIs, so it is not exactly clear how the samples are created. Given this uncertainty, it is difficult for researchers to make claims about the specific quality of the data they are analyzing (Gerlitz and Rieder 6-8). Nevertheless, Gerlitz and Rieder argue that in the absence of access to a full sample, a random sample provided through the Streaming API can serve as a suitable source of data (13). Despite these limitations, Twitter still offers a wealth of social media data for researchers to analyze.
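To make the sampling point concrete, the sketch below shows how a count observed in a roughly 1% sample scales back to an estimate for the full stream, together with an approximate standard error. It assumes, as an idealization, that each public tweet is kept independently with a fixed probability; since Twitter does not document how its samples are drawn, this is a best case rather than a description of the real API. The helper names `estimate_from_sample` and `simulate` are hypothetical.

```python
import math
import random


def estimate_from_sample(sample_count, rate):
    """Scale a count seen in a sample back up to the full stream.

    Idealized model (an assumption): every public tweet is kept independently
    with probability `rate`, so the sample count K ~ Binomial(N, rate).
    The estimate is N_hat = K / rate, and plugging N_hat into
    Var(N_hat) = N * (1 - rate) / rate gives SE ~= sqrt(K * (1 - rate)) / rate.
    """
    estimate = sample_count / rate
    std_error = math.sqrt(sample_count * (1 - rate)) / rate
    return estimate, std_error


def simulate(true_count, rate, seed=0):
    """Draw a Bernoulli sample from `true_count` tweets and estimate the total."""
    rng = random.Random(seed)
    kept = sum(1 for _ in range(true_count) if rng.random() < rate)
    return estimate_from_sample(kept, rate)


est, se = estimate_from_sample(1500, 0.01)  # a "spritzer"-like 1% rate
```

Under this model, 1,500 matching tweets in a 1% sample point to roughly 150,000 in the full stream, with a standard error of about 3,850. If the real sampling is not uniform, which researchers cannot verify, the estimate may also be biased in ways this formula does not capture.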

Social media has come to play a notable role in shaping political discourse in the US and other parts of the world (O’Connor et al., Conover et al., Tumasjan et al.). In a study of Twitter use surrounding the 2009 German elections, Tumasjan et al. show that Twitter is used extensively for the dissemination of politically relevant information (Tumasjan, Sprenger, Sandner, and Welpe). After analyzing approximately 100,000 tweets containing a reference to either a political party or a politician, they suggest that political tweets mirror the offline political landscape, and that online activity can be used to predict election results to a certain extent. This thesis aims to show that the same mirroring takes place in the online discussion surrounding PRISM and The Day We Fight Back. According to the Pew Research Center, almost three quarters (72%) of online US adults currently use social networking platforms such as Facebook and Myspace, and 18% of them use Twitter (Smith and Brenner 2). When examining political affiliation, the Pew Research Center found that voters were similarly likely to use online platforms to engage with politics regardless of party (among all Internet users, 58% of Democrats and 54% of Republicans do so). This also holds true for both supporters and detractors of the Tea Party movement (7). In their study of social media communication during the 2010 US midterm elections, Conover et al. demonstrated the presence of polarization on Twitter. By analyzing a network of political retweets, they showed that politically active users are organized into homogeneous communities segregated along partisan lines (1).
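One simple way to quantify the kind of partisan segregation described above is to compare the share of retweet edges that stay within one group against the share expected if edge endpoints were paired across groups at random. This is a simplified measure chosen for illustration and not necessarily the one Conover et al. used; the function names, labels, and toy network are hypothetical.

```python
from collections import Counter


def within_group_share(edges, group):
    """Share of retweet edges whose endpoints carry the same label.

    `edges` holds (retweeter, original_author) pairs; `group` maps users to a
    label such as "left" or "right". Edges touching unlabeled users are skipped.
    """
    same = total = 0
    for a, b in edges:
        if a in group and b in group:
            total += 1
            if group[a] == group[b]:
                same += 1
    return same / total if total else float("nan")


def expected_share(edges, group):
    """Within-group share expected if edge endpoints were paired at random,
    given how often each label appears among the endpoints."""
    ends = Counter()
    for a, b in edges:
        if a in group and b in group:
            ends[group[a]] += 1
            ends[group[b]] += 1
    n = sum(ends.values())
    return sum((c / n) ** 2 for c in ends.values()) if n else float("nan")


# Invented toy network: two triangles joined by a single cross-group retweet.
group = {"u1": "left", "u2": "left", "u3": "left",
         "u4": "right", "u5": "right", "u6": "right"}
edges = [("u1", "u2"), ("u2", "u3"), ("u3", "u1"),
         ("u4", "u5"), ("u5", "u6"), ("u6", "u4"), ("u1", "u4")]
observed = within_group_share(edges, group)  # 6/7
baseline = expected_share(edges, group)      # 0.5
```

In the toy network, six of seven retweets stay within a group against 0.5 expected under random mixing. A gap of that kind is what signals homogeneous, segregated communities.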

It is important not only to be aware of what kind of information is shared among Twitter users, but also to examine whether, and in which ways, this information influences them. Influence can be described as the ability of something, in this case tweets, to exert power over other users of the network (Cha, Haddadi, Benevenuto, and Gummadi). As discussed earlier, one of the main channels people use to interact with communities on Twitter is the @mention (a subset of which is the @reply). Through strategic use of @mentions, certain users can gain prominence within their community and the wider Twitter user base, with some accounts attracting millions of followers (Cha et al., 2010). Research has demonstrated that groups with highly interconnected @mentions often share common interests and follow many of the same news sources (Weng, Lim, Jiang, and He). Kwak et al. (Kwak, Lee, Park, and Moon 2) refer to this in their article as homophily: the tendency of people with similar interests to have contact at a higher rate than people who do not share those interests (McPherson, Smith-Lovin, and Cook). As has been shown, this tendency to form insulated groups creates an environment where little is shared between communities, and when sharing does occur, it is often highly polarized.

It is also worth noting that social media platforms like Twitter are not simply benevolent organizations, but rather corporate entities looking to monetize the information they collect (Langlois, McKelvey, Elmer, and Werbin 2). The control inherent in this profit-driven approach to social media has powerful effects on the platforms and how people use them. Advertising, for example, informs much of the design, user interface, and user experience in order to maximize revenue. While forces like this may seem minor at first, they accumulate to have a real impact on how sites like Twitter are used. This juxtaposition within digital objects is known as ‘double articulation,’ the process by which the sharing of information on one level or part of a platform creates new and ever-changing articulations, from the technological to the sociological (5). Concepts such as double articulation demonstrate the need for digital methods researchers, particularly those studying social media, to always take a critical approach and recognize where hidden inconsistencies and power structures may exist. This thesis aims to use Twitter as an object of study, focusing specifically on the discussions surrounding the news of Edward Snowden's leaks, the revelation of the PRISM program in particular, and The Day We Fight Back protests in response to these leaks. In order to study these topics accurately, Twitter must first be examined in the context of a news gathering and dissemination platform. The section below reviews relevant literature and case studies explaining Twitter’s usefulness as a place where news is reported, spread, and discussed.

3.2. Twitter as a News Gathering and Dissemination Platform

In recent years, there has been considerable study of Twitter as a news gathering, dissemination, and discussion platform. Most of this research focuses on Twitter usage during crises. While the PRISM leaks are not directly comparable to on-the-ground events such as the Haitian earthquake or the 2011 London riots, they provide an excellent example of a mainly media-based crisis, one built around issues of public relations, rumors, and concerned citizens around the world. While there are certainly differences between a physical crisis discussed on Twitter and a largely online or media crisis, it is useful to review certain case studies to demonstrate that Twitter is in fact used for this purpose, and quite successfully. However, this is not to say that social media is always accurate in its portrayal of events. This caveat matters given the circumstances of the NSA leaks and the troves of information surrounding them. On social media all news travels faster, for better or for worse. The Westgate Mall attack in Kenya demonstrated just how rapidly and widely rumor-based tweets travelled through the network. These rumors were disseminated by various actors, from individuals interested in the events to the main Kenyan police force account (Mohammed).

This section covers how news spread on Twitter, as well as how a protest was organized on the same platform. These forms of community evolution and specification can be described as one type of online public sphere. Zizi Papacharissi makes a distinction between a public space and a public sphere: whereas a public space hosts discussion, a public sphere “could facilitate discussion that promotes a democratic exchange of ideas and opinions. A virtual space enhances discussion; a virtual sphere enhances democracy” (Papacharissi 11). Angel Adams Parham writes that “expression is central to the very idea of a public...participation occurs by expressing one’s ideas, concerns, and interests within a community of others who have some aspect of life in common” (Parham 202). In the PRISM case, many felt their rights were being violated. While there was some discussion in support of the NSA, the majority of the community rallied around issues of privacy and government overreach. However, Parham also raises issues with defining these online groups as public spheres in and of themselves, stating that the formation of social media communities is not enough to create a true public sphere. In order for a public sphere to be effective, it must include “the combination of such spaces [of expression] with sustainable networks of individuals” (203). Beyond just connecting networks, these spheres must “allow participants to leverage their multiple locales, skills and resources for the benefit of individual users and the community as a whole” (203).

This combination of locales, skills, and resources exists and is demonstrated in the discussions surrounding PRISM and the related protests. The formation of these ‘ad hoc publics’ demonstrates how robust online interaction can be a catalyst for on-the-ground response (Bruns and Burgess). The concept of ad hoc publics is particularly interesting in the context of conversation generated by mainstream media and citizen-based responses. In contrast to communities formed in the traditional, often ‘post hoc’7 manner, platforms like Twitter allow for the real-time communication and organization of people in response to “emerging issues and acute events” (7). However, there are differences between ‘issue publics,’ ‘ad hoc publics,’ and their numerous variations. The one theme that ties these terms together is the dynamic way in which these publics form. This provides researchers with a plethora of data that can be used to analyze community evolution, from determining the early movers in a conversation, to identifying which news agencies are retweeted most, and which accounts are the most influential in the online network.
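Two of the analyses mentioned above, finding the early movers in a conversation and the most-retweeted accounts, reduce to simple passes over timestamped tweets. The sketch below runs on invented records; the field names (`time`, `user`, `hashtags`, `retweet_of`), the placeholder handles, and the function names are hypothetical.

```python
from collections import Counter
from datetime import datetime


def early_movers(tweets, hashtag, k=3):
    """First k distinct users to post with `hashtag`, in timestamp order."""
    seen, movers = set(), []
    for t in sorted(tweets, key=lambda tw: tw["time"]):
        if hashtag in t["hashtags"] and t["user"] not in seen:
            seen.add(t["user"])
            movers.append(t["user"])
            if len(movers) == k:
                break
    return movers


def most_retweeted(tweets, k=3):
    """Accounts whose tweets were retweeted most often within the collection."""
    counts = Counter(t["retweet_of"] for t in tweets if t.get("retweet_of"))
    return counts.most_common(k)


# Invented records; handles and times are placeholders, not real data.
tweets = [
    {"time": datetime(2014, 2, 11, 9, 5), "user": "alpha",
     "hashtags": ["thedaywefightback"], "retweet_of": None},
    {"time": datetime(2014, 2, 11, 8, 0), "user": "beta",
     "hashtags": ["thedaywefightback"], "retweet_of": None},
    {"time": datetime(2014, 2, 11, 9, 30), "user": "gamma",
     "hashtags": ["thedaywefightback"], "retweet_of": "alpha"},
    {"time": datetime(2014, 2, 11, 9, 45), "user": "delta",
     "hashtags": ["thedaywefightback"], "retweet_of": "alpha"},
    {"time": datetime(2014, 2, 11, 10, 0), "user": "epsilon",
     "hashtags": ["nsa"], "retweet_of": "beta"},
]
print(early_movers(tweets, "thedaywefightback"))  # ['beta', 'alpha', 'gamma']
print(most_retweeted(tweets))                     # [('alpha', 2), ('beta', 1)]
```

Sorting by timestamp first means the records do not need to arrive in order, which matters when merging tweets captured by separate queries.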

Twitter’s role as both a news gathering and a news dissemination source makes the platform more difficult to define. Kwak et al., for example, write that Twitter could be both a social media platform and a news media agency at the same time (Hermida). In many ways Twitter combines aspects of both sectors, with its rapid style and informal tone coinciding with the broadcasting of powerful and serious news. Twitter is also an interesting case study because it not only re-broadcasts stories written by mainstream news agencies, but also allows users to exchange firsthand accounts of events. In fact, research has shown that Twitter breaks a news story before a larger outlet such as CNN about half of the time (Hermida). Twitter is particularly strong at providing consistently updated information, especially during events with numerous moving parts and many simultaneous developments. While these two modes of news dissemination might seem to be at odds, this is no longer the case.


Recently, even traditional news organizations have begun forming partnerships around Twitter data, seeing the value in its firehose of information. In early 2014, a partnership was formed between CNN and the New York startup Dataminr. The initiative was designed to help journalists cover breaking news by making sense of the flood of public information on Twitter (“Announcing Dataminr for News” 2014). Dataminr uses machine learning algorithms to analyze the stream of tweets and find the needle in the haystack, so that CNN reporters can identify the most important, relevant, and reliable facts and images from around the world. News groups like CNN have placed Twitter at the center of their news gathering since at least 2009, when a Twitpic of passengers on the wing of a crashed plane floating in the Hudson became the iconic image that helped define the story8. As another example, CNN first learned about a shooting at a mall in Maryland through Dataminr, which had been deployed in-house for several months and picked up on a tweet from a first responder on the scene. The alert helped CNN be one of the first outlets on the story (Chariton). These examples provide context for how Twitter was used when news of the largest leak of classified information in history was reported by The Guardian in June 2013. Section 4 will discuss both the leak itself and the protests in response.

8 Krums, Janis. "There's a Plane in the Hudson. I'm on the Ferry Going to Pick up the People. Crazy. - via @jkrums." TwitPic. N.p., 15 Jan. 2009. Web. <http://twitpic.com/135xa>.


4. Focus of Inquiry

4.1. The Leak

Considered one of the largest releases of classified information in history, Edward Snowden’s revelations about the extent to which the United States National Security Agency (NSA) spied on people around the world put a spotlight on the reach of modern surveillance (Greenwald).

Snowden began collecting and storing vast amounts of classified documents while working as a contractor for the NSA in Hawaii. In May 2013, he flew to Hong Kong to meet with journalist Glenn Greenwald and documentary filmmaker Laura Poitras. In the lead-up to this meeting, Snowden went to great lengths to protect his identity by communicating through encrypted emails, Tor, and other anonymity systems. He went by the codename ‘Verax,’ and specifically asked never to be quoted at length because stylometry9 could be used to determine his identity (Gellman). After The Guardian published the first story based on the documents on June 5, 2013, Snowden attempted to travel to South America via Russia. However, the United States government revoked his passport after determining his identity, leaving him in a state of limbo at the Moscow airport, where he was stuck for 39 days (Owen and Gabbatt). After arduous diplomatic wrangling, Russia eventually offered Snowden asylum for one year so that he could remain in the country until a longer-term plan for his future could be developed. While the final destination of his trip was never revealed, the assumption was that he was travelling to a friendly South American nation, possibly Ecuador (Dangl).

9 From Wikipedia: Stylometry is the application of the study of linguistic style, usually to written language, but it has successfully been applied to music and to fine-art paintings as well. Stylometry is often used to

The first program to be revealed, and the topical focus of this thesis, is PRISM, internally known as SIGAD US-984XN (Chappell). In the months since the initial release, many more programs and covert activities have been described, including Tempora, Boundless Informant, XKEYSCORE, Muscular, and numerous others. As mentioned earlier, the data collected for this research centers on the PRISM program in particular (though many other topics are covered because of the hashtags used to source the tweets). The PRISM program collects stored Internet communications based on demands made to Internet companies such as Google, Apple, Yahoo, AOL, and others under Section 702 of the FISA Amendments Act of 2008 (Gellman and Poitras). This act forces any company served to turn over any data that matches court-approved search terms. While the existence of the Foreign Intelligence Surveillance Court (FISC) has been known for many years, the release of PRISM information showed how rarely the court refused requests from the NSA and other agencies: it denied only 11 of 33,949 requests since 1979, and none since 2010 (Foreign Intelligence Surveillance Act Court Orders 1979-2014). Documents indicate that PRISM is "the number one source of raw intelligence used for NSA analytic reports," and that it “accounts for 91% of the NSA's Internet traffic acquired under FISA section 702 authority” (Gellman and Poitras).


As more and more information became available, citizens and governments around the world spoke out against what they often viewed as an overreach by the NSA and allied intelligence services. This growing discontent may have foreshadowed the eventual protests. According to Wael Salah Fahmi, the development of Internet technology has influenced citizen mobilization. Fahmi goes on to describe how “the age of communications and its associated transnational public spheres has witnessed the emergence of new social movements represented through loosely organized open networks…Flows of people, information, images, easily cross borders with a greater degree of flexibility than ever before” (Fahmi 89). Germany, France, Italy, and other western powers were all under specific surveillance by the NSA, from the tapping of Angela Merkel’s (Chancellor of Germany) phone to members of the French parliament having all of their emails collected and analyzed (Poitras, Rosenbach, and Starck). In order to see how mobilization can be influenced by social media, the framing of these events must be investigated. One definition explains that “At the most basic level, frames are organizing principles that are socially shared and persistent over time, that work symbolically and to meaningfully structure the world. Framing recognizes the ability of a text to define a situation, to define the issues, and to set the term of the debate” (Reese and Lewis 777). Framing is both active and passive: some agencies purposefully attempt to influence the debate, while other concepts and feelings emerge naturally as more people join the conversation and begin to agree on certain terms, ideas, and sentiments. In the subsequent months, government officials and ordinary citizens around the world spoke out with increasingly strong rhetoric, with people calling for both online and offline protests against what was viewed as a violation of privacy and civil rights.

It is important to note that the US government defended its actions vigorously. US officials have disputed some aspects of The Guardian and Washington Post stories and have defended the programs, arguing that they have helped to prevent acts of terrorism and that they receive independent oversight (Gorham). This oversight is meant to be conducted by the executive, legislative, and judicial branches, though evidence about the creation of the program shows that little to no oversight was taking place, and that when concerns were raised they were often ignored by top management at the NSA (United States of Secrets 2014). On June 19 2013, US President Barack Obama, during a visit to Germany, stated that the NSA's data gathering practices constitute "a circumscribed, narrow system directed at us being able to protect our people" (Connolly). One public response to statements like these came in the form of The Day We Fight Back protests, a campaign that began online but quickly spread to include on-the-ground events. It drew numerous participants, ranging from groups such as the EFF and ACLU to more radical Anonymous-affiliated and cyber-anarchist members.

4.2. The Response

One of the first organized responses to the Snowden leaks was The Day We Fight Back, a one-day event to publicly protest the NSA surveillance programs (Gross). The protest was a collaboration between numerous advocacy groups focused on online privacy and, more broadly, civil rights. The event was relatively novel because it was a ‘digital protest,’ with over 6000 websites participating (Brown). Because the protest originated and was carried out online, it provides a unique case study, allowing for expansion on papers such as Learning from the Crowd: Collaborative Filtering Techniques for Identifying On-the-Ground Twitterers during Mass Disruptions, by Starbird et al. In this paper, Starbird and her co-authors describe how Twitter is used during mass disruption events, which they define as “an event affecting a large number of people that causes disruption to normal social routines. Examples of mass disruption events include natural disasters, acts of terrorism, mass emergencies, extreme weather events and political protests” (Starbird, Muzny, and Palen 2). The example in Starbird’s article is the Occupy Wall Street protests, and the authors describe how Twitter was used in the group’s organization and execution. This thesis will expand on this concept, studying what will be referred to as a digital mass disruption event. Social media can provide numerous advantages for people in their communication with each other. By spreading messages electronically, people can share opinions more quickly and cheaply than with more conventional techniques such as distributing leaflets and posters. As these moves toward digital disruption become more common, questions must be raised about the power of social media to create change and the ability of these tools to truly challenge the power of the state.

A history of this protest is needed to provide further context for the goals and inspirations behind the movement. The Day We Fight Back was intended as a day of "worldwide solidarity" in protest against NSA surveillance. The action was both a show of support in the aftermath of the suicide of Aaron Swartz, a young computer programmer and activist for Internet rights, and a collective response to the continued release of information about surveillance programs. In the US, a main goal of the protest was to encourage passage of the USA Freedom Act10, a bill that seeks to limit the power of the government when it comes to telephone data collection (Brown). Additionally, the official website, images, and videos urged people to call Congress and voice opposition to the FISA Improvements Act, which the ACLU called "a dream come true for the NSA" that would "codify the NSA's unconstitutional call-records program and allow bulk collection of location data from mobile phone users" (Wagstaff). After extensive organization, February 11 2014 was chosen as the date of the protest. Throughout the course of the day, #StopTheNSA trended, along with the hashtag for the protest itself. The methods and findings sections of this thesis explore the actual Twitter activity and community formation in much more detail, but it is interesting to note that media sources used trending hashtags as an indicator of success in organizing and reaching people. Bruns and Burgess explain the usefulness of this data: “The dynamic nature of conversations within hashtag communities provides fascinating insights into the inner workings of such ad hoc issue publics. This dynamic nature enables researchers to trace the various roles played by individual participants (for example as information sources, community leaders, commenters, conversationalists, or lurkers)” (7).

Returning to the earlier discussion of ad hoc publics, The Day We Fight Back is a strong example of how communities can form not only in an ad hoc manner, but through a spectrum of timings and methods. Some groups had already found solidarity in countering surveillance, and they were eager to join the new community forming in response to the NSA. These networks could be considered pre-formed, yet looking for a cause to rally around. Ad hoc groups formed in direct response to the trending of the hashtag, drawing people who had not participated before into the discussion. Then there were those who joined the conversation after the fact, having recognized the topic's importance or found it interesting. The largely online protest also had a decidedly offline effect, with marches and other on-the-ground demonstrations and events taking place in 15 countries (Gabbatt). In the end, the protest banner (and other branding) was seen by approximately 30 million people; 84,000 tweets and 550,000 emails were sent, 420,000 Facebook shares took place, and 300,000 signatures were collected (The Day We Fight Back: By The Numbers). The official website collected some of the top tweets, prominently displaying those it found powerful on the front page. These included Tim Berners-Lee writing: “Today I took a stand against mass surveillance. Will you join me? https://TheDayWeFightBack.org/ #stopspying,” and US Senator Tom Udall tweeting: “Proud sponsor of #USAFreedomAct 2 protect privacy & civil liberties, reform the #NSA & #FISA court http://1.usa.gov/1eQJpXl The Day We Fight Back,” along with hundreds of other legislators and thousands of companies (The Day We Fight Back: Notable Tweets). This thesis will explain these concepts further, mainly by studying how communities formed and who controlled the online discourses through the use of digital methods, as well as by determining whether these communities match wider events and discussions. The next section expands on these methods and describes how they will be applied to the research questions posed earlier.

5. Methodology

5.1. Selecting the Datasets

All of the data used for this thesis was gathered using the Digital Methods Initiative ‘Twitter Capture and Analysis Tool’ (DMI-TCAT). The website explains that “The Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT) is a set of tools to retrieve and collect tweets from Twitter and to analyze them in various ways. Captured datasets can be refined in different ways (search queries, exclusions, data range, etc.) and the resulting selections of tweets can be analyzed in various ways.” The captured data is exported in standard file formats such as CSV for tabular files and GEXF for network files. For the PRISM dataset, it is important to note that any Twitter user can select arbitrary hashtags to annotate his or her tweets. This also allows users to produce tweets with multiple unrelated or even contradictory hashtags in order to reach a wider audience, or to amplify their viewpoint within a specific user group. The latter is a common practice that Conover et al. describe as “content injection” (Conover, Ratkiewicz, Francisco, Goncalves, and Menczer 94). They demonstrate that during the 2010 US congressional midterm elections, nearly 30% of all tweets with the generally recognized left-leaning hashtag #p2 (Progressives 2.0) originated from right-leaning users (94-95). Therefore, one needs to take into account that a hashtag does not necessarily mean that the individual tweet belongs to the associated political or ideological viewpoint. However, hashtags that are frequently used together form clusters around similar viewpoints, strengthening the classification of both the wider communication and particular sub-networks.

There are also critiques of using a dataset of this nature that must be addressed. As Borra and Rieder write, “DMI-TCAT relies on Twitter’s APIs and is therefore bound to their possibilities and limitations” (6). The tool works by connecting to both the streaming11 and REST12 Twitter APIs, each providing different information that can be collected and stored. The power Twitter holds over the dissemination of its own information raises an inherent concern that could call certain aspects of the data into question. For example, the process Twitter uses to determine its 1% sample of tweets is unknown (Morstatter, Pfeffer, Liu, and Carley). As information regarding Twitter’s goals, processes, and code in relation to its APIs is scant, new media researchers have to accept many of these critiques and continue with the planned analysis. With such an active user base, even the 1% streaming API provides millions of tweets and robust metadata. The streaming API alone cannot search for or store historical tweets, so a dataset drawn from it can only contain information from the time the capture begins. Borra and Rieder attempt to address this constraint by also using the REST API, which allows historical tweets to be collected going back approximately a week (7). The tool also addresses other concerns through data completion algorithms and code designed to speed up queries on the datasets.

11 https://dev.Twitter.com/docs/api/streaming

12 https://dev.Twitter.com/docs/api
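The combination of streaming and REST collection can be pictured as merging two lists of tweets and keeping one copy per tweet ID. The Python sketch below is illustrative only: it assumes hypothetical tweet dicts with 'id' and 'created_at' keys and a seven-day backfill window, and it is not TCAT's actual merging or data completion logic.

```python
from datetime import datetime, timedelta


def merge_captures(streamed, backfilled, capture_start: datetime, window_days: int = 7):
    """Combine streamed tweets with REST-backfilled tweets, one copy per tweet ID.

    Each tweet is a dict with hypothetical 'id' and 'created_at' (datetime) keys.
    Backfilled tweets older than `window_days` before `capture_start` are
    discarded, mirroring the roughly one-week reach of the REST API.
    """
    earliest = capture_start - timedelta(days=window_days)
    merged = {}
    for tweet in backfilled:
        if tweet["created_at"] >= earliest:
            merged[tweet["id"]] = tweet
    for tweet in streamed:
        merged[tweet["id"]] = tweet  # the streamed copy wins on duplicate IDs
    # Return a single chronological timeline.
    return sorted(merged.values(), key=lambda t: (t["created_at"], t["id"]))
```

Keying the merge on the tweet ID means that a tweet seen by both APIs is counted only once, which matters for the activity counts used later.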

Once a collection topic has been initiated, TCAT begins storing and analyzing the information. The first step in this thesis was determining how to study these massive collections of tweets. Thanks to TCAT’s design, it was easy to determine which days were worthy of further analysis based on tweet activity. It seemed reasonable to choose specific days and apply a consistent method to each in order to demonstrate the effectiveness of these methods. This approach draws interesting themes out of what is, in its raw form, a highly complex dataset. As figure 2 shows, there were numerous spikes in activity surrounding the PRISM discussion on Twitter. The four most active days were chosen for further analysis, given their level of participation and the technical constraints of attempting to analyze every day since TCAT began collecting tweets. The same approach was used to choose which days to review for The Day We Fight Back. This was decidedly easier, as the protest was planned for a specific day (February 11 2014), so the majority of the activity took place on or around that date. As discussed in the literature review, there may be gaps in the information that cannot necessarily be detected. For the purposes of this thesis, the dataset is assumed to be sufficient, or at least it is analyzed as is, without attempting to inject other data into the information that has been recorded. Using numerous complementary methods will aid in extracting the most interesting findings from the data and facilitate a more cohesive conclusion. Each figure below includes, along with the activity chart, basic statistics for each day, such as how many tweets were captured. Once the days had been chosen based on activity, examining the wider media environment helps explain why there was such activity on these dates.
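Choosing the most active days amounts to counting tweets per calendar day and keeping the highest counts. A minimal Python sketch, assuming the capture timestamps are available as datetime objects (the function name is hypothetical); day boundaries depend on the timezone of the timestamps, which this sketch does not adjust.

```python
from collections import Counter
from datetime import date, datetime


def most_active_days(timestamps: list[datetime], n: int = 4) -> list[tuple[date, int]]:
    """Return the n calendar days with the most tweets as (day, count) pairs.

    Ties are broken by the earlier date so the ranking is reproducible.
    """
    counts = Counter(ts.date() for ts in timestamps)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))[:n]
```

With n=4 this mirrors the choice of four peak days for the PRISM dataset; for The Day We Fight Back a smaller n would suffice.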

5.1.1. PRISM

The PRISM dataset included tweets containing the hashtags or keywords ‘cloud, cyberwar, Greenwald, NSA, palanteer, PRISM, Snowden, spying, wiretap.’ These keywords were chosen by the designers of TCAT when they began collecting tweets about PRISM. Figure 2 is a tweet activity chart for the whole dataset, in which each spike corresponds to a specific day. Each of the days labeled in the graph is described in more detail below.
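Because the query terms are ordinary words, a tweet about a pop album can match ‘PRISM’ just as a tweet about the NSA can. The Python sketch below illustrates this with case-insensitive whole-word matching; that matching rule is an assumption made for illustration and not necessarily how TCAT or the Twitter streaming API matched tweets.

```python
import re

# Query terms listed for the TCAT 'PRISM' dataset, lower-cased.
PRISM_TERMS = ["cloud", "cyberwar", "greenwald", "nsa", "palanteer",
               "prism", "snowden", "spying", "wiretap"]


def matched_terms(text: str, terms=PRISM_TERMS) -> list[str]:
    """Return the query terms that appear as whole words or hashtags in a tweet.

    Assumed rule: lower-case the text, split it into word characters
    (so '#PRISM' yields 'prism'), and test each term for membership.
    """
    words = set(re.findall(r"\w+", text.lower()))
    return [term for term in terms if term in words]
```

Under this rule a generic term such as ‘cloud’ would likewise pull in tweets unrelated to surveillance.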

Fig. 2 Tweet activity data for the TCAT ‘PRISM’ dataset. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 19 2014)


5.1.1.1. June 10 2013

Fig. 3 Tweet activity data for June 10 2013. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 19 2014)

June 10, the day after Edward Snowden revealed himself as the source of the leaked NSA documents, was the most active day in the entire dataset, even though The Guardian had been publishing stories since June 6. The unmasking mattered because people then knew exactly who had released the documents. It also created a form of pseudo-entertainment, with the media constructing a narrative resembling a Jason Bourne-esque thriller around his movements and plans to avoid capture by the US or other governments. Theories ranging from the reasonable to the outrageous were posed by mainstream news agencies such as CNN in order to keep attention on what was essentially a man stuck at an airport. The demands of the 24-hour news cycle appear to have driven much of this speculation. As more information became available, the media began focusing on the content of the leaks rather than the drama of the whistleblower's movements.


5.1.1.2. October 22 2013

Fig. 4 Tweet activity data for October 22 2013. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 19 2014)

On October 22, American pop star Katy Perry released her album titled Prism (“Prism” KatyPerry.com). While the majority of PRISM-related discussion that day focused on Katy Perry, targeted data filtering and manipulation can still surface interesting information about the NSA discussion, even though it was almost drowned out by the album release.
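One simple form of such filtering is to exclude tweets that contain album-related words. The Python sketch below uses a hypothetical exclusion list (ALBUM_TERMS); it illustrates the idea and is not the filtering actually applied in this thesis.

```python
import re

# Hypothetical exclusion terms for album-related noise; not taken from the thesis.
ALBUM_TERMS = frozenset({"katy", "perry", "katyperry", "katycats", "album"})


def is_album_noise(text: str, noise_terms=ALBUM_TERMS) -> bool:
    """True if the tweet shares any word, hashtag body, or handle body with noise_terms."""
    words = set(re.findall(r"\w+", text.lower()))
    return not words.isdisjoint(noise_terms)


def drop_album_noise(tweets: list[str], noise_terms=ALBUM_TERMS) -> list[str]:
    """Keep only the tweets that do not look album-related."""
    return [t for t in tweets if not is_album_noise(t, noise_terms)]
```

The trade-off is that a tweet joking about both the album and the NSA would also be removed, so an exclusion list of this kind trades recall for precision.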

5.1.1.3. December 17 2013

Fig. 5 Tweet activity data for December 17 2013. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 19 2014)

On December 17, Snowden wrote an open letter to the people of Brazil about NSA surveillance of their country. In it, he wrote, "Until a country grants permanent political asylum, the U.S. government will continue to interfere with my ability to speak...going so far as to force down the Presidential Plane of Evo Morales to prevent me from traveling to Latin America!" (Levs). Although the effect was less skewing than on October 22, Katy Perry also released a single on this day, adding some users who would otherwise not have appeared in the dataset.

5.1.1.4. January 17 2014

Fig. 6 Tweet activity data for January 17 2014. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 19 2014)

A joint investigation by The Guardian and Channel 4 News revealed the NSA's bulk collection of foreign text messages, including messages between people who had not committed a crime and were not suspected of ties to terrorism. The agency reportedly stores the data, which includes both metadata and content, for years in its DISHFIRE database, where it can be searched by a number of criteria. American data is excluded from this specific database (Ball).

5.1.2. The Day We Fight Back

The Day We Fight Back dataset included tweets containing the hashtags or keywords ‘stopspying, StopTheNSA, The Day We Fight Back.’ As this was a one-day event, there was one very large spike in activity, with a smaller jump in the use of the keywords a few days in advance, when planning was taking place.


Fig. 7 Tweet activity data for the TCAT ‘The Day We Fight Back’ dataset. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 22 2014)

5.1.2.1. February 6 2014

Fig. 8 Tweet activity data for February 6 2014. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 22 2014)

On February 6, the official blog for the protest released a preliminary list of global events taking place in support of the movement (The Day We Fight Back Official Website). Once this list was made public, the related hashtags were used to promote these events and generate further interest in the protest itself, which would take place on February 11.


5.1.2.2. February 11 2014

Fig. 9 Tweet activity data for February 11 2014. Source: http://tcat.digitalmethods.net/analysis/ (Accessed April 22 2014)

February 11 was the official day of The Day We Fight Back protest (The Day We Fight Back Official Website). As mentioned earlier, thousands of online and on-the-ground events took place in support of pro-privacy and often specifically anti-NSA viewpoints. The protest was a collaboration between numerous advocacy groups focused on online privacy and civil rights, with over 6000 websites and numerous offline organizations participating (Brown).


Fig. 10 Screenshot of the Gephi ‘overview workspace’

All of the graphs in this thesis are produced using the Gephi software. Gephi is an open source network analysis and visualization tool. It was designed by students at the University of Technology of Compiègne (Université de Technologie de Compiègne or UTC), and is written in Java on the NetBeans platform (Jacomy, Venturini, Heymann, and Bastian). Gephi is useful for analyzing data from a Social Network Analysis (SNA) perspective. SNA is a discipline that emerged out of the work of social psychologists Jacob Moreno and Kurt Lewin during the 1930s and 1940s (Rieder 6). Social Network Analysis interprets social units as networks: groups formed from direct interactions rather than from classic social categories framed by socio-economic properties (6-7). Figure 10 shows the overview workspace, where most of the analytical tools, algorithms, and back-end programs are located. Gephi has been used for network analysis of Twitter on topics such as Australian elections (Bruns 1335). Bruns used both @mention network and hashtag graphs to identify changes in the online discussion and demonstrate clustering of active participants, a method similar to what this thesis uses for the PRISM program and The Day We Fight Back protests (1323). For the graphs used in this project, which are discussed in more depth later, the “Force Atlas 2” algorithm was used. This algorithm is a “force directed layout: it simulates a physical system. Nodes repulse each other (like magnets) while edges attracts the nodes they connect (like springs). These forces create a movement that converges to a balanced state” (Jacomy et al.). With options such as scaling and prevent overlap, the algorithm can be adjusted as it runs to produce the clearest graph. As with any force-directed algorithm, certain problems arise; for instance, no node can be assigned a fixed position. Rather, each node is placed as a direct result of its attraction to and repulsion from other nodes. The designers of the tool add that “There is at least one issue with this strategy: graphs do not all always converge to the same final configuration. The result [...] approximations of the algorithm. The process is not deterministic, and the coordinates of each point do not reflect any specific variable.” Despite these restrictions, Gephi was extremely useful for producing graphs that helped describe the Twitter discourse surrounding the PRISM program and The Day We Fight Back protests. Each type of graph is described in more depth below.
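To make the attraction and repulsion described above concrete, the following Python sketch implements a simplified force-directed layout using the force model published for ForceAtlas2: repulsion between every pair of nodes proportional to (degree + 1) of each node divided by their distance, attraction along each edge proportional to distance, and gravity toward the center scaled by (degree + 1). It is not Gephi's implementation. Adaptive speed, the Barnes-Hut approximation, and the prevent overlap option are left out, and the function name and parameter values are illustrative assumptions.

```python
import math
import random


def force_directed_layout(edges, iterations=300, repulsion=0.05, gravity=0.5,
                          step=0.02, max_move=0.5, seed=None):
    """Simplified ForceAtlas2-style layout; returns {node: (x, y)}."""
    rng = random.Random(seed)
    nodes = sorted({n for edge in edges for n in edge})
    degree = {n: 0 for n in nodes}
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    # Random starting positions: the reason different runs end up differently.
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}

    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        for i, a in enumerate(nodes):  # repulsion between all pairs, like magnets
            for b in nodes[i + 1:]:
                dx, dy = pos[a][0] - pos[b][0], pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                f = repulsion * (degree[a] + 1) * (degree[b] + 1) / dist
                force[a][0] += f * dx / dist
                force[a][1] += f * dy / dist
                force[b][0] -= f * dx / dist
                force[b][1] -= f * dy / dist
        for a, b in edges:  # attraction along edges, like springs (magnitude = distance)
            dx, dy = pos[b][0] - pos[a][0], pos[b][1] - pos[a][1]
            force[a][0] += dx
            force[a][1] += dy
            force[b][0] -= dx
            force[b][1] -= dy
        for n in nodes:  # gravity keeps disconnected parts from drifting away
            dist = math.hypot(pos[n][0], pos[n][1]) or 1e-9
            g = gravity * (degree[n] + 1)
            force[n][0] -= g * pos[n][0] / dist
            force[n][1] -= g * pos[n][1] / dist
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:  # cap each move so the simulation stays stable
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return {n: (x, y) for n, (x, y) in pos.items()}
```

Calling the function twice with the same seed returns identical coordinates, while a different seed generally yields a rotated or reflected arrangement with different coordinates, which illustrates the designers' point that node positions do not reflect any specific variable.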

5.3. Tweet Statistics Method

In order to provide context for later findings, basic information about the PRISM and The Day We Fight Back datasets was collected using TCAT. This information includes the number of tweets, tweets with links, tweets with hashtags, tweets with @mentions, retweets, and @replies. Statistics such as the number of tweets containing links and the number of retweets give a broader view of how Twitter was used to discuss the Snowden leaks and the subsequent protests.
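TCAT produces these figures itself; the Python sketch below only illustrates how such counts can be derived from raw tweets. It assumes each tweet is a dict with a 'text' field and an optional 'in_reply_to_status_id' field (assumed field names), and it detects retweets with the "RT @" text convention, a heuristic that misses retweets without that prefix.

```python
import re


def tweet_statistics(tweets):
    """Count tweets, tweets with links/hashtags/@mentions, retweets and @replies."""
    stats = {"tweets": 0, "with_links": 0, "with_hashtags": 0,
             "with_mentions": 0, "retweets": 0, "replies": 0}
    for tweet in tweets:
        text = tweet["text"]
        stats["tweets"] += 1
        # Each test adds 1 (True) or 0 (False) to its counter.
        stats["with_links"] += bool(re.search(r"https?://\S+", text))
        stats["with_hashtags"] += bool(re.search(r"(?<!\w)#\w+", text))
        stats["with_mentions"] += bool(re.search(r"(?<!\w)@\w+", text))
        stats["retweets"] += text.startswith("RT @")
        stats["replies"] += tweet.get("in_reply_to_status_id") is not None
    return stats
```

Note that under this rule a retweet also counts as a tweet with an @mention, since the "RT @user" prefix contains one.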

5.4. Gephi Social Graph by Mention Method

This thesis used the ‘social graph by mentions’ option to illuminate communities and key actors in the online debate surrounding PRISM and The Day We Fight Back. In order to show how these conversations and communities changed over time, three date ranges were chosen. Force Atlas 2, a built-in Gephi layout algorithm, allowed the graphs to be laid out consistently across the dataset. Co-mention graphs were clustered and colored based on modularity class. Modularity measures how well a network decomposes into separate communities. This structure, often called a community structure, describes how the broader network is compartmentalized into sub-networks. These sub-networks have been shown to have significant real-world meaning, and, as the third research question stated, one goal of this research is to test this claim. Digital methods have in the past been shown to correlate with certain offline events, but generally retrospectively, after the discussion has ended. By studying a currently active topic, this research aims to show that these methods can mirror events as they take place. After running Force Atlas 2, the nodes were ranked by degree (shown as node size). A review of Swedish elections using co-mention networks demonstrated how useful this method can be in studying political issues (Larsson and Moe). By using co-mention networks, the authors were able to extract key users and clusters, allowing them to “indicate that core users of the #val2010 hashtag employed quite diverse uses and engaged in different varieties of network connections with each other” (740). In order to find similarly interesting results, settings such as scaling, gravity, and prevent overlap were used to make the visualizations more readable and explanatory. Scaling was increased so that clusters could be seen more easily. The ‘prevent overlap’ option was checked to ensure node labels were visible. Ranking settings were also used to make the findings easier to see in the graph. Nodes were ranked by ‘degree,’ the number of times they were connected to another node. Edges were ranked by weight to show how strongly the nodes were interconnected: the more times a connection was made, the greater the weight. In the preview workspace, settings such as node outline color, curved edges, fonts, and transparency were adjusted to make the graphs more readable.
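As a sketch of what the modularity class coloring is based on, the Python code below builds weighted @mention edges from (author, text) pairs and scores a given partition with Newman's modularity Q. The input shape and function names are hypothetical. Gephi's modularity statistic searches for a high-scoring partition (it uses the Louvain method of Blondel et al.); this sketch only evaluates a partition supplied by hand.

```python
import re
from collections import Counter, defaultdict

MENTION_RE = re.compile(r"(?<!\w)@(\w+)")


def mention_edges(tweets):
    """Count directed @mention edges (author -> mentioned user).

    `tweets` is an iterable of (author, text) pairs, a hypothetical input
    shape rather than TCAT's export format. Self-mentions are ignored.
    """
    edges = Counter()
    for author, text in tweets:
        src = author.lower()
        for handle in MENTION_RE.findall(text):
            dst = handle.lower()
            if dst != src:
                edges[(src, dst)] += 1
    return edges


def undirected_weights(edges):
    """Collapse a->b and b->a into one undirected edge whose weight is the sum."""
    weights = Counter()
    for (a, b), w in edges.items():
        weights[tuple(sorted((a, b)))] += w
    return weights


def modularity(weights, community):
    """Modularity Q of a partition of a weighted undirected graph.

    Q = sum over communities c of L_c / m - (d_c / (2m))**2, where m is the
    total edge weight, L_c the weight of edges inside c, and d_c the summed
    weighted degree of the nodes in c.
    """
    m = sum(weights.values())
    inside = defaultdict(float)
    degree_sum = defaultdict(float)
    for (a, b), w in weights.items():
        degree_sum[community[a]] += w
        degree_sum[community[b]] += w
        if community[a] == community[b]:
            inside[community[a]] += w
    return sum(inside[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For example, two disconnected triangles with one community per triangle score Q = 0.5, while a partition that cuts across the triangles scores lower.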

5.5. Eigenvector Centrality

Eigenvector centrality is a measure of the influence of a node in a network. It assigns relative scores to all nodes based on the concept that connections to high-scoring nodes contribute more to a node's score than equal connections to low-scoring nodes. Google's PageRank is a variant of the Eigenvector centrality measure (Austin). Gephi can plot Eigenvector centrality in a visualization, which makes it easier to understand than a bare number. One of these graphs will be generated from each co-mention graph. While much of the information about centrality in general can be extracted from the co-mention graph itself, the centrality-ranked visual can provide a [...] dominated by individual users, or if the debate was more distributed, with no one node taking a centrally powerful position. Each day had a Gephi graph produced with the nodes ranked by Eigenvector centrality, colored from dark blue (most influential) to dark red (least influential). As a method, centrality can reveal contradictions or hidden findings within a specific network. An example of a non-obvious finding would be the detection that a user within a smaller or less connected modularity class is actually an influential member of the overall discussion. This connects with concepts discussed earlier by Weng, namely how certain users, while not particularly powerful in any one community, show their influence by connecting many communities to each other. While degree is often closely correlated with centrality, as the findings will show there are some cases where this does not hold. It is in these cases that the measure demonstrates its worth in enhancing the analysis and results.
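The following Python sketch computes Eigenvector centrality by power iteration on a small undirected graph, to show how a node's score is built from its neighbors' scores. It iterates with the adjacency matrix plus the identity (A + I), a standard adjustment that leaves the leading eigenvector unchanged but prevents oscillation on bipartite graphs such as stars. The input format is an assumption, Gephi's own implementation and normalization may differ, and a directed @mention graph would need to be symmetrized first.

```python
import math


def eigenvector_centrality(adjacency, max_iter=1000, tol=1e-12):
    """Eigenvector centrality of an undirected graph by power iteration.

    `adjacency` maps each node to the set of its neighbors. Scores are
    rescaled at the end so that the most central node gets 1.0.
    """
    x = {v: 1.0 for v in adjacency}
    for _ in range(max_iter):
        # One step of (A + I) x: own score plus the neighbors' scores.
        new = {v: x[v] + sum(x[u] for u in adjacency[v]) for v in adjacency}
        norm = math.sqrt(sum(val * val for val in new.values())) or 1.0
        new = {v: val / norm for v, val in new.items()}
        converged = sum(abs(new[v] - x[v]) for v in adjacency) < tol
        x = new
        if converged:
            break
    top = max(x.values(), default=0.0) or 1.0
    return {v: val / top for v, val in x.items()}
```

For a star with one hub and three leaves, the hub scores 1.0 and each leaf about 0.577 (1/√3). Because each score depends on the neighbors' scores rather than on their number, a node with a few links to central accounts can outrank one with more links to peripheral accounts, which is the divergence from degree discussed above.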

5.6. Gephi Co-Hashtag Method

The co-hashtag graph was produced in Gephi using the Force Atlas 2 layout. The dataset used for this graph was the hashtag co-occurrence information for the selected dates. In order to cluster the hashtag nodes by similar interconnectivity, they were partitioned by modularity class and colored accordingly, using the auto-selected palette. The scaling and prevent overlap settings were adjusted to create a more understandable visualization. Co-hashtag graphs can help highlight how online publics interact within their own communities, and also how their message spreads to other online groups. Since hashtags are an important part of Twitter culture, visual tools showing how they are interconnected provide a simple way to summarize what are often confusing and highly complex actor networks. Each graph is further analyzed by determining which hashtags were most popular and what they refer to, which provides context for the clustering that will be seen. Using the crowdsourced hashtag definition websites www.tagdef.com and www.hashtags.org, and cross-checking their lists with the information gathered from Twitter, a “definition/meaning” was determined for each hashtag.
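Hashtag co-occurrence data of this kind can be pictured as counting, for every pair of hashtags, the number of tweets that use both. The Python sketch below shows one way to compute such edge weights; the input (a list of tweet texts) and the lower-casing and de-duplication rules are assumptions, not TCAT's documented behavior.

```python
import re
from collections import Counter
from itertools import combinations

HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def cohashtag_edges(texts):
    """Weight each pair of hashtags by the number of tweets using both.

    Hashtags are lower-cased and de-duplicated within a tweet, so a tweet
    repeating #NSA twice still adds 1 to each of its pairs.
    """
    edges = Counter()
    for text in texts:
        tags = sorted({tag.lower() for tag in HASHTAG_RE.findall(text)})
        for a, b in combinations(tags, 2):
            edges[(a, b)] += 1
    return edges
```

Edges such as ("katyperry", "prism") would then sit next to ("nsa", "prism") in the graph, which is how unrelated uses of a shared hashtag show up as separate clusters.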

5.7. Gephi Bipartite Hashtag-User Graph Method

The bipartite graph of hashtags and users allows for the identification of how hashtags are used among different Twitter users. The graph shows a connection between a user and a hashtag each time the user co-occurs with that hashtag; the more often the user and hashtag appear together, the stronger the link weight between the two. This reveals clusters of users with similar interests, or at least those who congregate around similar topics. Parham’s concept of content injection applies to this type of graph, as link weight does not necessarily indicate a particular interest in the topic. While interest is often a connecting factor, it is easy and common for users to appropriate trending hashtags to promote unrelated tweets. To create this visualization, the Force Atlas 2 algorithm was used in Gephi. This produced a very complicated network, with numerous hashtags pushed to the outer ring, indicating that they were connected to at most one user. The topology ‘degree range’ filter was then used to keep only hashtags used three or more times. This filter was applied because of processing limitations (computer memory), and to produce a more readable final result, as the graph would be far too cluttered without it. Within Force Atlas 2, various settings were adjusted to provide more understandable results. Scaling was increased to different levels depending on each graph’s attributes. The ‘prevent overlap’ box was checked in every case. In the preview workspace, labels were added, proportional to node size, and edge width and opacity were adjusted to make the graph as readable as possible (including choices such as making the node outlines match the fill color).
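The Python sketch below builds user-hashtag edge weights and then drops weakly connected hashtags. The filtering step is a hypothetical stand-in for Gephi's degree range filter, under the assumption that a hashtag's degree is the number of distinct users linked to it; the (user, text) input shape and function names are also assumptions.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def user_hashtag_edges(tweets):
    """Bipartite edge weights: how many of a user's tweets carry a hashtag.

    `tweets` is an iterable of (user, text) pairs (hypothetical input shape).
    """
    edges = Counter()
    for user, text in tweets:
        for tag in {t.lower() for t in HASHTAG_RE.findall(text)}:
            edges[(user.lower(), "#" + tag)] += 1
    return edges


def keep_hashtags_with_degree(edges, min_degree=3):
    """Drop hashtags linked to fewer than `min_degree` distinct users."""
    degree = Counter(tag for (_user, tag) in edges)
    return Counter({(u, t): w for (u, t), w in edges.items() if degree[t] >= min_degree})
```

Prefixing hashtags with "#" keeps the two node types distinct even when a user name and a hashtag share the same text.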

5.8. Gephi Bipartite Hashtag-Host Graph

The bipartite graph of hashtags and hosts shows how hashtags are networked among different types of sources on the Internet, and how the appearance of hashtags differs among source types. In this graph, a link is drawn between a host (website) and a hashtag each time the hashtag occurs with that host; the more often the host and hashtag appear together, the stronger the link. As with the hashtag-user graph, numerous settings in Gephi’s Overview, Data Laboratory, and Preview workspaces were adjusted to produce the most precise and readable graphs. The hashtags that appear in the original dataset were not excluded from this graph, in order to demonstrate how hosts related to the top hashtags. The Force Atlas 2 algorithm was also used for this visualization because it highlights clusters and relationships between nodes through spatial placement.
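Building the hashtag-host graph requires reducing each linked URL to its host. The sketch below does this with Python's standard `urllib.parse.urlsplit` and counts (hashtag, host) pairs per tweet. Stripping a leading `www.` so that `www.npr.org` and `npr.org` merge is an assumption of this sketch, as are the helper names and the placeholder URLs. Shortened links would need to be expanded before this step, which the sketch does not do.

```python
from collections import Counter
from urllib.parse import urlsplit


def host_of(url):
    """Lower-cased host of a URL, minus a leading 'www.' ('' if none)."""
    host = urlsplit(url).hostname or ""
    return host[4:] if host.startswith("www.") else host


def hashtag_host_edges(tweets):
    """Weight each (hashtag, host) link by the number of tweets pairing them.

    `tweets` is a list of ([hashtags], [urls]) tuples. A host linked twice
    in the same tweet still counts once for that tweet.
    """
    weights = Counter()
    for tags, urls in tweets:
        hosts = {host_of(u) for u in urls} - {""}
        for tag in {t.lower() for t in tags}:
            for host in hosts:
                weights[("#" + tag, host)] += 1
    return weights


# Hypothetical tweets; the URLs are placeholders, not items from the dataset.
sample = [
    (["PRISM"], ["http://www.theguardian.com/a", "https://theguardian.com/b"]),
    (["prism", "NSA"], ["http://www.npr.org/c"]),
    (["NSA"], ["https://www.theguardian.com/d"]),
]
for (tag, host), weight in sorted(hashtag_host_edges(sample).items()):
    print(tag, host, weight)
```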

5.9. Top Users by Degree

Fig. 11 Screenshot of the Gephi ‘Data Laboratory workspace’

In Gephi’s Data Laboratory, Twitter users can be ranked by numerous metrics. To determine who the most important and active users were on a particular day, degree was used. Gephi offers three degree-based options: in-degree, out-degree, and overall degree. For the purposes of this research, overall degree was used to identify top users.

In-degree is the number of times that a user was @mentioned by others, while out-degree is the number of times that user @mentioned someone else (Wolfram MathWorld). Degree, as Gephi computes it, is the sum of these two numbers. For example, if @Test had an in-degree of 15 and an out-degree of 3, its overall degree would be 15 + 3 = 18. While in-degree and out-degree can each bring out interesting findings on their own, adding them together helps normalize the numbers, and assists in removing random users or people attempting to hijack the discussion by endlessly mentioning key actors. A user’s degree therefore depends on the data entered into Gephi. For the purposes of this research, the top-ten ranking was chosen via TCAT, and the degree information is thus tailored to this number of accounts.
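The degree calculation described above can be checked with a short sketch that counts one unit of in-degree for the mentioned account and one unit of out-degree for the author of each @mention, then ranks accounts by the sum. The input format, the function name `rank_by_degree`, and the alphabetical tie-break are hypothetical. Because every mention is counted, this corresponds to Gephi's weighted degree if repeated mentions between the same two accounts are merged into one weighted edge; Gephi's plain degree would then count distinct partners instead.

```python
from collections import Counter


def rank_by_degree(mentions, top_n=10):
    """Rank accounts by overall degree (in-degree + out-degree).

    `mentions` holds one (author, mentioned_account) pair per @mention.
    In-degree counts how often an account is mentioned; out-degree counts
    how often it mentions others. Ties are broken alphabetically.
    """
    in_degree = Counter(target for _author, target in mentions)
    out_degree = Counter(author for author, _target in mentions)
    accounts = set(in_degree) | set(out_degree)
    degree = {a: in_degree[a] + out_degree[a] for a in accounts}
    return sorted(degree.items(), key=lambda item: (-item[1], item[0]))[:top_n]


# Hypothetical data reproducing the @Test example: mentioned 15 times,
# mentioning others 3 times.
mentions = [("someone", "test")] * 15 + [("test", "other")] * 3
print(rank_by_degree(mentions, top_n=3))
# [('test', 18), ('someone', 15), ('other', 3)]
```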

6. Findings

This section will discuss in depth the findings for each day chosen for targeted analysis, and then take a more general view of how the communities and discussions changed across the datasets. For each day reviewed, the statistics and graphs described in the methodology section will be used to shed light on the Twitter discourse. Each subsection below consists of three visualizations created in Gephi: a co-mention network graph with nodes sized by degree and colored by modularity class, a co-mention graph with nodes sized by degree and colored by Eigenvector centrality, and a co-hashtag graph. These graphs will be augmented with more basic Twitter statistics that provide underlying context, such as top users and numbers of tweets. For each table, the top users will be identified using public information on Twitter and other resources such as personal blogs, websites, and other social profiles.
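Coloring by modularity class presupposes a partition of the graph into communities, scored by Newman-Girvan modularity Q. Gephi's Modularity statistic searches for a partition with a high Q; the sketch below does not reproduce that search, but only computes Q for a partition supplied by hand, which shows what a high score rewards: more edge weight inside communities than would be expected at random given the nodes' degrees. It treats the mention graph as undirected, an assumption of this sketch, and the function name and toy graph are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a fixed partition of an undirected, weighted graph.

    `edges` maps (u, v) node pairs to edge weights; `community` maps every
    node to a community label. Computes
        Q = sum over communities c of  L_c / m - (d_c / (2 * m)) ** 2
    where m is the total edge weight, L_c the weight of edges inside c,
    and d_c the summed weighted degree of the nodes in c.
    """
    m = sum(edges.values())
    inside = defaultdict(float)  # L_c: weight of edges within community c
    degree = defaultdict(float)  # d_c: total weighted degree of community c
    for (u, v), w in edges.items():
        degree[community[u]] += w
        degree[community[v]] += w
        if community[u] == community[v]:
            inside[community[u]] += w
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


# Hypothetical toy graph: two triangles joined by a single bridge edge.
edges = {("a", "b"): 1, ("b", "c"): 1, ("a", "c"): 1,
         ("d", "e"): 1, ("e", "f"): 1, ("d", "f"): 1,
         ("c", "d"): 1}
split = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(round(modularity(edges, split), 3))  # 0.357
```

Putting every node in one community gives Q = 0, so the positive score for the two-triangle split reflects genuine clustering rather than an artifact of the formula.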


6.1. June 10 2013

User              Degree
@GGreenwald       43
@LiberationTech   26
@_CypherPunks_    24
@ioerror          22
@Asher_Wolf       18
@Declanm          17
@YourAnonNews     16
@Edrather         15
@Guardian         14
@FloridaJayHawk   14

Fig. 12 Top ten Twitter users by degree for June 10 2013. Source: Gephi data laboratory

For June 10, the top user (fig. 12) was @GGreenwald, journalist Glenn Greenwald’s personal account. Greenwald, as mentioned in the literature review, was the first journalist Edward Snowden contacted, and also the first to report on the PRISM program, in The Guardian on June 6 (Greenwald). Greenwald’s articles became the focal point of the entire story, as he had a direct line of contact with Snowden and was also in possession of the millions of classified NSA documents. The second most mentioned user on the most active day was @LiberationTech. The account describes itself as a “high-volume news feed on tech, democracy, freedom, human rights & development,” based out of Stanford University (@LiberationTech profile). The third most mentioned user on this day, @_CypherPunks_, also appears in the top ten on other days, such as October 22 (fig. 16). This account describes itself as covering “Impacts of new technologies and regulations. Crypto, Privacy, Anonymity, Anticorruption, FreeSpeech and NetFreedom,” with a website (cpunks.wordpress.com) listed as the account’s home page (@_CypherPunks_ profile). Other top-ten users, such as @Asher_Wolf, @YourAnonNews, @ioerror, and @FloridaJayHawk, are associated with individual activists or activist groups such as Anonymous and the Electronic Frontier Foundation. The remaining accounts, @Guardian, @Edrather, and @Declanm, are journalism-related. @Declanm, for example, is the personal account of Declan McCullagh, a prolific blogger who has covered tech topics for CBS News, Wired, and others.


Fig. 13 Social mention graph for June 10 2013. Colored by modularity class

As figure 13 shows, there is clear clustering among the top mentioned users. The orange cluster, with @GGreenwald as the most connected account (indicated by the size of the node), represents mainstream news accounts, with other reporters such as @Declanm and @Edrather highly interconnected. At this point in the online discussion, alternative news groups and activists, such as @_CypherPunks_ and @LiberationTech, are placed in the same modularity class, indicating that they communicate often among themselves but not necessarily with other clusters on a large scale. Later graphs will show that activist groups begin to form their own cluster more strongly while also increasing their cross-group dialogue, a trend that may have been a partial catalyst for the later protests. Other groups, such as the red nodes, are politically motivated users such as @FloridaJayHawk and similarly opinionated bloggers. Results such as these emerged naturally from the method and the data, without any significant human post-processing beyond minor aesthetic changes. Figure 14, seen below, is the same social mention graph with nodes colored by Eigenvector centrality rather than by modularity class. These graphs are included to provide extra information on each day, as the coloration makes it easy to identify which users were most influential. Unlike the modularity-colored graphs, the Eigenvector graphs will not be discussed in detail; rather, they are included to supplement the information discovered in graphs such as figure 13.

Fig. 14 Social mention graph for June 10 2013. Colored by Eigenvector centrality (blue = 1 to red = 0)
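Eigenvector centrality scores a user highly when that user is connected to other high-scoring users, which is why it is used here to identify influential accounts. The sketch below computes it by power iteration and divides by the largest score so the top node ends at 1, which we assume mirrors the 0-to-1 range in the caption of figure 14. It treats mentions as undirected, and the function name, iteration limit, and tolerance are illustrative choices rather than Gephi's settings.

```python
def eigenvector_centrality(edges, max_iter=100, tol=1e-9):
    """Eigenvector centrality by power iteration, scaled so the max is 1.0.

    `edges` is a list of (u, v) pairs, treated as undirected. Each round a
    node's new score is its own score plus the sum of its neighbours'
    scores (iterating A + I rather than A). The shift leaves the leading
    eigenvector unchanged but stops scores oscillating on bipartite graphs
    such as stars.
    """
    neighbours = {}
    for u, v in edges:
        neighbours.setdefault(u, set()).add(v)
        neighbours.setdefault(v, set()).add(u)
    if not neighbours:
        return {}
    score = {n: 1.0 for n in neighbours}
    for _ in range(max_iter):
        new = {n: score[n] + sum(score[nb] for nb in neighbours[n])
               for n in neighbours}
        top = max(new.values())
        new = {n: s / top for n, s in new.items()}
        converged = max(abs(new[n] - score[n]) for n in new) < tol
        score = new
        if converged:
            break
    return score


# Hypothetical star: one hub mentioned by three otherwise unconnected users.
scores = eigenvector_centrality([("hub", "u1"), ("hub", "u2"), ("hub", "u3")])
print(scores["hub"], round(scores["u1"], 4))  # 1.0 0.5774
```

In the star example each peripheral user scores 1/√3 of the hub, since all of its centrality comes from that single well-connected neighbour.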
