Moving Between Quality and Quantity


Pascal Janssens 10067264

Email: pj.pascaljanssens@gmail.com

Supervisor: Esther Weltevrede

MA Thesis

New Media and Digital Cultures, University of Amsterdam

27 June 2014


Abstract

The decentralization of national governmental tasks relating to health care, participation and youth was considered an important issue during the Dutch local elections of 2014. This study analyzes the relation between the tweets of politicians and those of other users on Twitter that relate to this issue of decentralization. The study builds upon the philosophies of the actor-network theory, as well as those of the digital methods, in order to understand how the users and the politicians tweet about the issue and how their tweets relate to each other. The method identifies the sub-issues that the different political parties tweet about and the sub-issues that users associate with those parties. The sub-issues that the politicians tweet about suggest that there are two groups of political parties: one group that tweets about health care related sub-issues and one group that tweets about participation related sub-issues. These two groups of political parties correspond with the left-wing and right-wing distinction that is made between these parties. The study uses the actor-network theory to reflect on its own method in order to question the role of digital tools within the humanities. The argument is that there is an uncomfortable relationship between the humanities researcher and the tools used, which perform complex algorithmic analyses. The question put forward is whether there is a need for a clear set of rules for conducting digital research that preserves the traditional interpretative character of the humanities.

Keywords: actor-network theory, digital methods, Twitter, election, politics, issue-mapping, big data, digital humanities, digital tools

Acknowledgement

I could not have written this thesis without the support of all the students of the New Media Masters program and all the "library friends" I have made during the process. I would like to thank Esther Weltevrede for her supervision and all of her insightful comments. I thank Erik Borra for his technical support on the DMI-TCAT. Special thanks go to Buşra Alparslan for her editorial help and all the years of moral support.


Table of Contents

1. Introduction

2. Politics of the social (media)

2.1 Politics of Twitter Studies

2.2 Studying Politics on Twitter

2.3 Politics of the Social

3. Methods, and its Social Life

3.1 DMI-TCAT and the Politics of Tools

3.2 Data Sets and Scraping Politics

3.3 Issues, Query Design and "spending time with the data"

3.3.1 Sub-Issues

3.4 Tracing Associations

4. Tracing Findings

4.1 Surface Reading, Tracing Correlations and Connections

4.2 Close Reading, Following Actors and Giving Context

4.2.1 The Connection of D66

4.2.2 The Connection of PvdA

4.3 Between Surface and Close Reading

5. Conclusion

Bibliography

Cited Tweets


1. Introduction

On the sixth of January 2014, the Dutch politician Van Toorenburg tweeted that the 'municipal councils are more important than ever when taking into account the decentralization.'1 The tweet refers both to the upcoming local elections in the Netherlands, which took place on the 19th of March, and to an important issue during these elections, namely the decentralization of national governmental tasks to the local governments. The local governments will become responsible for three sets of tasks which are currently still carried out by the national government. These three sets of tasks are themed as "participation", "health care" and "youth". The plans to decentralize these tasks are anything but clear and therefore open to interpretation. Since the newly elected local governments will be the ones who decide how to execute these plans, they play a key role within the political debate in the weeks leading up to the elections. In this study, I will focus on the three themes of decentralization on Twitter and I will analyze the meaning of these three themes through related sub-issues. I will use two data sets: one of tweets from politicians and one of tweets that mention politicians or political parties. The central question of this study is: To what extent do politicians and users on Twitter tweet about similar issues that relate to the decentralization of health care, participation and youth, and how can the relationship between politicians and users on Twitter be characterized? Since the making of these plans is still in an early phase, their precise meaning is as yet unclear. This study will analyze the different sub-topics that users and politicians on Twitter associate with the decentralization. I will propose a quantitative research method for identifying the relationship between the users and politicians, and a qualitative research method for understanding this relationship.

Elections have provided opportunities to study Twitter from several points of view: the tweets are analyzed in attempts to predict election outcomes (e.g. Tumasjan et al. 2011 and Sang and Bos 2012), to characterize the "Twitter behavior" of politicians (e.g. Vergeer, Hermans and Sams 2011), to study social movements during the election period (e.g. Solow-Niederman and Gae 2011) and to explore new (media) campaigning methods (e.g. Vergeer, Hermans and Sams 2013). The studies on data sets of tweets from election periods come from different academic fields, such as the social sciences, the humanities and the computer sciences. Therefore, this type of data (large sets of tweets on politics) has been approached with

1 All the cited tweets in this thesis were personally translated from Dutch. Only in the third section are the original tweets quoted, because they emphasize the use of the Dutch language. All the original tweets can be found in the "Cited Tweets" section.


different methods through which different questions are discussed. This study aims to understand the relationship between politicians on Twitter and the users that tweet about them. I will identify the topics that politicians associate with the three issues of decentralization and relate them to the topics associated by the users. The method used builds on the ideas of the actor-network theory, as described by Bruno Latour (2005), and those of the digital methods, as described by Richard Rogers (2013a). Besides describing the relationship between politicians and users on Twitter, this study aims to reflect on the role of digital tools and platforms that are becoming more prominent in the social sciences and humanities. The actor-network theory as a theoretical framework builds on the idea that the researcher needs to follow the actors and trace their associations. I will combine this framework with that of the digital methods, which states that the researcher needs to follow the medium specificities and especially the different uses of these specificities. I will use the actor-network theory as a reflection on the study itself. I will question how to understand the roles of the researcher, the digital tools and the data as actors within the network of the methodology. This allows me to explore, and give weight to, the agency of not just the researcher, but also the data and the digital tools that are used.

The study is divided into three sections. The following section will discuss previous work on Twitter studies and build towards a theoretical framework. Within that section I will discuss the different usages of Twitter and how these influenced the research of this platform. This contributes to the framework of digital methods. Additionally, the section explores some of the previous studies on Twitter during elections. There, I will identify the problems and obstacles for these studies and how the actor-network theory contributes to a new understanding of Twitter studies. The third section will describe the method that is used for the empirical study. While describing the method I will reflect on the tools that are used and how they shape the research. The discussion focuses on the agency of the platform and the digital tools that are used and on their role within the methodology. Furthermore, I will propose how to let categories that relate to the issues emerge from the tweets themselves. This allows me to understand which sub-issues are associated with the three issues of decentralization. The fourth section will provide two readings of the findings. First, I start with a surface reading whose arguments are built upon quantitative data. This reading aims to understand which of the sub-issues identified in the third section can be related to which actor. This makes it possible to understand where the politicians and the users connect. The second reading will be a close reading. The


close reading, which is a qualitative reading of the data, aims to give meaning to the connections which were identified between politicians and users. It will zoom in on the data and discuss the primary texts of this study, the tweets themselves. The section ends with a discussion of both readings. I will argue that both readings can feed opposing arguments while, at the same time, they are highly dependent on each other. Their combination will inform the answer to the main research question, which will be discussed in the conclusion.


2. Politics of the social (media)

Twitter has been studied from several fields with different perspectives and goals; this also holds for studies on elections and Twitter. In this section I will reflect on previous studies on elections and Twitter and I will discuss how this study relates to them. This study positions itself within the framework of digital methods, as described by Rogers (2013a), and that of the actor-network theory (Latour 2005). This section will question what medium specific objects there are on Twitter and how they can be repurposed for digital research. Furthermore, I will discuss how the actor-network theory and the digital methods framework can complement each other and how this is possible on Twitter. This framework will give a new perspective on the medium specific objects of Twitter. It raises the question, which I will discuss at the end of this section, whether Twitter, as a social medium, is helpful in answering questions about the social.

2.1 Politics of Twitter Studies

Twitter is a mainstream microblogging service which is used by different actors, each for their own specific purpose. Besides small talk, users use Twitter for news sharing, collaboration, self-expression and status updating, but also for marketing and advertisements (van Dijck 2011, 337). Because the platform is what the users make of it, and because users use the platform in different ways, its meaning is an unstable one. According to Rogers (2013a), the digital methods approach is to "follow the medium" (24). Following the medium can be understood as repurposing the medium specific features in order to conduct research. Following the medium 'refers to media's ontological distinctiveness' (Rogers 2013a, 25). For this study, this means that it is necessary to understand what the platform specific objects are, how they are used, and how they can be repurposed for conducting research. The way that Twitter can be studied depends on the way that Twitter organizes the platform and the way that users are using it.

By using the word "platform" the "platform lens" (Gillespie 2010) is being put forward. Gillespie's "politics of platform" perspective is useful to reveal how a specific platform positions itself towards the different actors which are of interest to (or who have an interest in) the platform. The users of Twitter, meaning the ones who use Twitter, include journalists who use Twitter to retrieve and spread news, celebrities who use Twitter to inform their fans, companies


which use Twitter for advertisements, politicians who use Twitter for their campaigns, activists, and even bots, which are registered users that use Twitter to spam; there are many more to name. The "about page" of Twitter.com lists three more users (again, the ones that use Twitter): business clients, media and developers. Twitter Inc. provides the platform for all of these different actors, so that they can benefit from each other. Puschmann and Burgess (2014) study "Twitter and the politics of platforms" by focusing on the platform providers, end users and third parties in relation to data access, ownership and control (45). They discuss how data is the main value for the platform, but not for its individual users. For the users, the data generated by their activity is merely a by-product (Puschmann and Burgess 2014, 47). There are other actors at stake who use the platform mainly because of this data; these are mainly corporate and governmental actors (Puschmann and Burgess 2014, 52). Besides the actors that Puschmann and Burgess describe, it is possible to add the researcher as another actor who has a specific usage of Twitter and who benefits from the other actors.

The way that the researcher can repurpose Twitter as an object of study is highly dependent on the way that the actors use and organize the platform. Rogers (2013b) outlines how Twitter as an object of study has transformed since its launch in 2006. The first Twitter studies characterized the platform as "banal", "phatic" and "shallow". These studies were mainly focused on questioning the value of a message that contains a maximum of 140 characters. One of these "banal" studies suggested that 40% of all the tweets could be characterized as "pointless babble"2 (Rogers 2013b, 357). Miller (2008), who studied Twitter as a "phatic communion", argues that the informational content of the messages is useless and meaningless. For phatic tweets, the sole purpose is to contribute to the process of communication (394). The Twitter platform is mainly used in order to maintain and sustain a "connected presence" (Miller 2008, 394). In general, researchers in these early Twitter studies treated the platform as a shallow social networking site. However, as the users of Twitter started to change their usage of the platform, Twitter as an object of study changed as well.

The second type of Twitter study that Rogers identifies characterizes Twitter as 'a news medium for event-following' (2013b, 359). The changes that Twitter made to its own platform, as well as the

2 It was only later that these tweets became "de-banalized", as Rogers argues. An example is a study by Chen (2012), who analyzed geo-tagged tweets to understand where users used the words soda, pop or coke (all these words refer to the same thing). As it turned out, the use of each of these words is highly regionally dependent. The study de-banalized the tweets that were considered "pointless babble", such as "I'm drinking a coke with my friends."


change in the way that users are using it, are crucial for the transformation of Twitter into a news and event-following platform. In 2009 the users on Twitter started to add the hashtag symbol (#) to their tweets in order to make the topics searchable (Rogers 2013b, 357). Twitter, in its turn, started to hyperlink these hashtags. Hashtags label the tweets with a particular topic, so that users can follow the conversation on this particular topic (boyd, Golder and Lotan 2010, 1). Small (2011) studies the nature of the Canadian political hashtag #cdnpoli and questions to what extent the tweets with this hashtag are conversational (874). Her findings suggest that less than ten percent of the tweets could be considered conversational tweets (889). However, Small only considers @reply and retweeted tweets as conversational tweets. This is questionable because hashtagged tweets are directed to the audience of the particular hashtag. Although these tweets do not have to be directed to an individual, by adding a hashtag a user contributes to the conversation as a whole. The retweet feature also takes an important role within Twitter as a news or event storytelling machine (Rogers 2013b, 361). boyd, Golder and Lotan (2010) argue that 'while retweeting can simply be seen as the act of copying and rebroadcasting, the practice contributes to a conversational ecology in which conversations are composed of a public interplay that give rise to an emotional sense of shared conversational context' (1). The "emotional sense of shared conversational context" that boyd, Golder and Lotan link to retweets could also apply to the hashtag. The hashtag in particular allows users to respond to emerging issues and events (Bruns and Burgess 2011, 7). Hashtag communities enable researchers to study the different actors that are participating and how they are responding to these emerging issues (Bruns and Burgess 2011, 7). Both the hashtag and the retweet enable researchers to understand the dynamics of the conversational space that surrounds these objects.
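
The difference between a narrow and a broad definition of a "conversational" tweet can be illustrated with a minimal sketch. It is not Small's coding scheme nor the method of this study: the sample tweets, the regular expressions and the function names are assumptions made purely for illustration. The narrow definition counts only @replies and retweets, while the broad definition also counts any tweet that carries a hashtag.

```python
import re

# Hypothetical sample tweets; the texts are invented for illustration only.
tweets = [
    "@alice I agree with the council on this",
    "RT @bob: Debate on youth care tonight",
    "Municipal budgets are under pressure #election",
    "Reading the coalition agreement #election",
]

def is_reply(text):
    """Assumption: a tweet that starts with @username counts as an @reply."""
    return re.match(r"@\w+", text) is not None

def is_retweet(text):
    """Assumption: a tweet that starts with 'RT @username' counts as a retweet."""
    return re.match(r"RT @\w+", text) is not None

def has_hashtag(text):
    """True if the tweet contains at least one hashtag."""
    return re.search(r"#\w+", text) is not None

def share(tweets, predicate):
    """Fraction of tweets for which the predicate holds."""
    return sum(1 for t in tweets if predicate(t)) / len(tweets)

# Narrow definition: only @replies and retweets are conversational.
narrow = share(tweets, lambda t: is_reply(t) or is_retweet(t))
# Broad definition: hashtagged tweets also address a shared audience.
broad = share(tweets, lambda t: is_reply(t) or is_retweet(t) or has_hashtag(t))

print(f"narrow: {narrow:.2f}, broad: {broad:.2f}")
```

The share of conversational tweets depends entirely on which platform objects are counted, which is the point of the critique above.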

Rogers' third and final type of Twitter study is "Twitter as (archived) data set" (2013b, 363). Rogers points towards the rise of projects that archive tweets. Tweets have become data sets which can be studied both as banal media and as news media. Twitter has made access to its database of tweets relatively easy. This makes it attractive for research, especially because of the "built-in means of analysis" (Rogers 2013b, 363), such as the retweet, the hashtag, the @mention and the @reply. These platform native objects lend themselves to being aggregated and analyzed. Following the medium here means understanding the specific usage of these platform specific objects in order to understand what questions can be asked and how the objects can be put to use for analysis. The platform has been changed by Twitter, but also by its users, making it a


more event-driven and news-oriented medium. The hashtag and the retweet enabled studies on the conversational aspects of the platform. However, the platform also places limitations on the analysis. The following sub-section will discuss several studies on politics and Twitter. The discussion gives an indication of how these platform specific objects can be put to use for political Twitter studies. Additionally, I will identify the problems and obstacles related to these studies.

2.2 Studying Politics on Twitter

Social media allow for a new kind of interaction between politicians and their audiences. For politicians, social media open up a form of communication that is more direct and unmediated (Lassen and Brown 2011, 421). Additionally, politicians can communicate via social media in a more personalized and individualized campaigning style, which can be seen as more or less detached from the campaign of the party they are associated with (Vergeer 2013, 10). There are several reasons for people to connect with politicians on social media, such as entertainment, guidance and information seeking (Parmelee and Bichard 2011, 37). One important motivation, however, is social utility (Vergeer 2013, 10). With social utility, Parmelee and Bichard (2011) refer to the information people get from social media which they, in their turn, share with their own social network (37). We should be careful not to overestimate the role of social media. Although the use of social media is increasing, for most people TV is still the most important news source (Vergeer 2013, 10). Nevertheless, the increased use and role of social media within politics allows scholars to ask questions which have not been explored before. Between 2005 and 2010 the number of publications per year in ISI-ranked journals about internet and politics increased by over fifty percent. Studies about social media and politics only started in 2008 but show a similar increasing trend (Vergeer 2013, 11). The increased use of Twitter in politics is thus accompanied by an increased interest from the academic sphere. As discussed earlier, Twitter studies vary by field. This leaves the question of how politics on Twitter can be studied from different fields and perspectives and, more importantly, what lessons can be learned from these studies.

One popular strand in the study of politics on Twitter uses tweets to predict election results. Tumasjan et al. (2011) conducted a controversial study which argued that tweets were a good indicator of the election results in Germany's federal election. By counting


the tweets that mention (some of) the different parties, they conclude that the 'mere number of tweets reflects voters' preferences and comes close to traditional election polls' (2011, 414). The study of Tumasjan et al., however, is heavily criticized, most prominently in an article by Jungherr, Jurgens and Schoen (2012). When counting the number of tweets that mentioned the different parties, Tumasjan et al. did not include the German Pirate Party. If they had, their prediction method would have concluded that the Pirate Party won the German elections (Jungherr, Jurgens and Schoen 2012, 232). A second critique refers to the time period during which Tumasjan et al. collected tweets. They did not include tweets which were published during the week before the election (Jungherr, Jurgens and Schoen 2012, 232). As we will see in this study, the week before the elections is the week in which most tweets are published. A successful method for predicting election results using Twitter has not been developed yet. Sang and Bos (2012), who tried to predict the Dutch senate elections in 2011, point to important obstacles within Twitter data. First of all, one person may publish several tweets (Sang and Bos 2012, 56). It can be assumed that people who have a preference for the Pirate Party may be more active on social media than people who would vote for a party for the elderly (such as the 50plus party in the Netherlands). The demographics of Twitter users need to be taken into consideration. A second obstacle Sang and Bos refer to is that when a tweet mentions a specific party, it does not have to be positive about that party (2012, 56). The somewhat easy and direct communication allows citizens to use Twitter to criticize certain politicians or parties. Although this study is not about predicting election results, there are lessons to be learned from these studies. Including all the actors at stake (in this case all the parties) and tracing the meaning and contexts of the tweets are among those lessons.
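
The first obstacle named by Sang and Bos can be made concrete with a small sketch. It is an illustration only, not the procedure of any of the cited studies: the user names, party names, tweet texts and the simple keyword matching are all hypothetical. The sketch counts, for each party, both the number of tweets that mention it and the number of distinct users behind those tweets.

```python
from collections import defaultdict

# Hypothetical (user, text) pairs and party names, invented for illustration.
tweets = [
    ("user_a", "Great plan from PartyX on youth care"),
    ("user_a", "PartyX again with a strong debate"),
    ("user_a", "Still voting PartyX"),
    ("user_b", "PartyY has no answer on health care"),
    ("user_c", "I trust PartyY with participation"),
]
parties = ["PartyX", "PartyY"]

def count_mentions(tweets, parties):
    """Return (tweets per party, distinct users per party)."""
    tweet_counts = defaultdict(int)
    users = defaultdict(set)
    for user, text in tweets:
        for party in parties:
            # Assumption: a case-insensitive substring match counts as a mention.
            if party.lower() in text.lower():
                tweet_counts[party] += 1
                users[party].add(user)
    user_counts = {party: len(users[party]) for party in parties}
    return dict(tweet_counts), user_counts

tweet_counts, user_counts = count_mentions(tweets, parties)
print(tweet_counts)  # number of tweets mentioning each party
print(user_counts)   # number of distinct users mentioning each party
```

In this invented sample, PartyX leads when tweets are counted but trails when distinct users are counted. Neither count registers that the tweet by user_b is critical of the party it mentions, which is the second obstacle: interpreting a mention requires reading its context.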

Other studies focus on creating different networks from tweet collections. Networks are able to tell something about the interaction on Twitter. Vergeer, Hermans and Sams (2011) argue that e-campaigning, and Twitter in particular, has the potential to reduce the gap between politics and citizens. In a different study, the same authors found that the more followers a politician has, the less likely that politician is to send replies to users. Therefore they argue that 'when candidates become more popular on social media, these social media become less social' (Vergeer, Hermans and Sams 2013, 496). As discussed earlier, hashtags can be considered as conversational spaces. Small (2011) studied the number of @replies within a data set of a hashtag relating to Canadian politics and found that less than 10 percent of the tweets were conversational. The study points


towards the problems of hashtags by making clear that many conversations on Twitter do not include hashtags (Small 2011, 889). As discussed, although the hashtag may create a collective, shared conversational dimension, direct conversations between users often happen without the use of a hashtag. Other studies create network graphs to analyze interactions. Paßmann, Boeschoten and Schäfer (2014) created two network graphs from a data set of tweets from Dutch politicians, one of the @replies and one of the retweets. They found that the politicians from different parties are highly connected in the @reply graph, while in the retweet graph there are clear clusters for each party. This indicates that replying is used as a form of communication among colleagues, while retweeting can be seen as a way of spreading messages to a wider audience (Paßmann, Boeschoten and Schäfer 2014, 336). Bruns and Highfield (2013) found similar results in their @reply network graph; however, they included users as well. As they describe, there is 'a clear indication of the relative interest in candidates of different political colours' (680). Although the results are similar to those of Paßmann, Boeschoten and Schäfer, the meaning could be different. While in the first study the @reply is interpreted as a form of communication among colleagues, in the second it is seen as an indication of interest in different parties. However, it might also indicate that the colleagues are entangled in an online debate. The hashtags and replies used within these studies do not speak for themselves. Different studies show that they are used in different ways, and therefore it is important to look at the contexts in which these features are used.
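
The contrast between an @reply graph and a retweet graph can be sketched in a few lines. The cited studies work with visual network graphs; the "cross-party share" below is a simplified stand-in chosen for this illustration, and the politicians, parties and interactions are hypothetical.

```python
# Hypothetical politicians, parties and interactions, invented for illustration.
party_of = {"pol_a": "PartyX", "pol_b": "PartyX", "pol_c": "PartyY"}
interactions = [
    # (source, target, kind)
    ("pol_a", "pol_c", "reply"),
    ("pol_c", "pol_a", "reply"),
    ("pol_b", "pol_c", "reply"),
    ("pol_a", "pol_b", "retweet"),
    ("pol_b", "pol_a", "retweet"),
]

def cross_party_share(interactions, party_of, kind):
    """Fraction of edges of the given kind that connect two different parties."""
    edges = [(s, t) for s, t, k in interactions if k == kind]
    if not edges:
        return 0.0
    cross = sum(1 for s, t in edges if party_of[s] != party_of[t])
    return cross / len(edges)

reply_share = cross_party_share(interactions, party_of, "reply")
retweet_share = cross_party_share(interactions, party_of, "retweet")
print(f"cross-party replies: {reply_share:.2f}, cross-party retweets: {retweet_share:.2f}")
```

A high cross-party share for replies combined with a low share for retweets would correspond to the pattern described above. The number alone does not say whether the replies are collegial exchanges, expressions of interest or heated debate; that still requires reading the tweets in context.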

The studies discussed show only a few of the different angles from which politics on Twitter can be studied. In fact, there are many more. So far, the lessons learned are twofold. Firstly, the demarcation of the study needs to be done properly. The chosen data sets and date range need to be elaborated and well argued. The actors at stake need to be included in a logical time period. Secondly, although the medium specific features of Twitter are useful for research, relying on them should be done with caution. As shown, the hashtags do not always include all the topical tweets that are desired. The @replies are used in different contexts. The affordances of the platform allow users to use these features for different purposes. Including context is crucial. There is a need for a research framework which takes into account these different technical features and relates them to the ways the users are (re)purposing them. Following the medium according to the digital methods is useful for achieving this. The following sub-section relates this framework to that of the actor-network theory (ANT), which will prove to be useful for giving context and meaning to the uses of these objects.


2.3 Actor Network Theory and the Politics of the Social

In his book Reassembling the Social, Bruno Latour (2005) introduces the ANT in order to redefine what the "social" is. Although the ANT should not be treated as a practical method, it does give useful insights into how to practically map out a social controversy. In this sub-section I will explore the ANT and describe its usefulness, especially for doing a study on social controversies by using digital methods. As it will turn out, the obstacles and pitfalls described earlier that occurred in previous studies on politics and Twitter can be (partially) avoided by implementing some of the core ideas of the ANT in the method for collecting and analyzing data. This section ends with a discussion on how to use ANT to critically engage with the new methods of digital research.

In short, the ANT is about the connections between the technical and the social, and the human and non-human (Gane and Beer 2008, 27). For Latour, the "social" or the "society" are not given entities. The social is 'a type of connection between things that are not themselves social' (Latour 2005, 5). The social is not something the analyst should define. According to Latour, the actors within a study are the ones who define what the social is (2005, 23). John Law (1992), one of the key developers of the ANT, argues that the 'social structure is better treated as a verb than as a noun' (389). In other words, the social is not something that the analyst puts as a label on a group of actors. The social is something that the actors need to perform in order to be and remain a group of actors. Thus, the social needs to be traced among the performed connections between the actors. These social traces should be found by "feeding" off controversies (Latour 2005, 16). A controversy takes place whenever actors are in disagreement (Venturini 2010a, 261). Therefore, as Venturini argues, 'controversies remain the best occasions to observe the social world and its making' (2010a, 263). To put this differently, whenever there is a controversy taking place (i.e. when actors are in disagreement), the actors will explain themselves or give arguments. They will need to leave a traceable action in order to participate in the controversy. This makes sense especially for political issues, where there is a lot of disagreement. However, tracing the social is not that straightforward. To get a better understanding it is crucial to understand the ANT interpretation of what an actor is and what a network means.

According to the ANT, anything can be an actor within a network. This means that nonhuman entities or objects are treated as actors as well (Latour 2005, 72). Although this may complicate things, it also makes the ANT very suitable for doing research on digital platforms.


According to the ANT, an actor is something that acts. The actor, in ANT, always has agency, that is, it is always doing something; it is always making a difference (Latour 2005, 52). Objects too can do something; they can make a difference just as humans can. When we think of a tweet, who is the actor? The human who typed the message, the tweet, or both? Without the tweet the message might not have been produced at all, or perhaps with a different choice of words. Furthermore, the message would not reach the same audience if Twitter was not used. The tweet transforms the original message of the human who typed out a thought. The tweet, as a mediator, has agency over the action that is taken. Additionally, the hashtags, replies, and so forth, can all be considered to exercise agency over the action. Therefore, these too are actors within the network. As mentioned earlier, the social is not something given but something that needs to be defined by the actors. For ANT this means that the researcher should "follow the actors" (Latour 2005, 12). This also applies to the nonhuman ones. We need to trace their actions, connections and associations to get towards the ANT understanding of what a network is.

Networks in ANT are not the same as social networks. Neither should a network be understood in the technical sense. Latour points out that a network is 'a tool to help describe something, not what is being described' (2005, 131). Therefore, the network can never be a "social network", because the network is the tool that helps to trace the social. For Latour, a good account of ANT is one that 'traces a network' (2005, 128). The network consists of all the traceable connections among the actors; it is 'the trace left behind by some moving agent' (Latour 2005, 132). The network should be understood as a concept for describing the different relations among the different actors at stake. Therefore the ANT should not be understood as a method, but rather as a concept to frame a method. Combining the ANT with digital methods forces the researcher to engage critically with all the actors at stake and to admit that not only human actors, but also the digital tools and platforms, have agency.

"Following the medium" turns out to be not that different from "following the actors": it means following the medium specific objects and treating them as actors. According to Latour (2005), it is necessary to question whether the actors are behaving as intermediaries or mediators (39). Intermediaries 'carry the meaning or force without transformation: defining its input is enough to define its output', while mediators, on the other hand, 'transform, translate, distort and modify the meaning they are supposed to carry' (Latour 2005, 39). When relating this to Twitter, the question becomes whether tweets can be considered as mediators or as intermediaries. A single tweet, it could

(15)

be argued, does not change the meaning it carries. However, the retweet, for example, although it re-uses the exact same content, the meaning or "force" of the tweet itself does undergo a transformation. The retweet has to possibility to put into action a certain motion. The retweeted tweet suddenly gets a wider audience. Additionally, as discussed, the retweet can 'give rise to an emotional sense of shared conversational context' (boyd, Golder and Lotan 2010, 1). The retweet, as a mediator, enables a single tweet to gain a larger impact. A similar argument could be made about the use of a hashtag. Following the medium, means to understand if the objects of a platform are behaving as intermediaries or mediators. If the object can be considered as a mediator, the question becomes what changes to the meaning are being performed?

ANT and digital methods seem to complement each other. The web seems to be a good space for making an ANT account. As Rogers, Querubín and Kil (forthcoming) describe, 'the web has opened new channels for action, communication and participation for actors involved in a controversy' (23). First, taking Twitter as an example, it is easy to argue that this is a space where many actors participate in a controversy: politicians, citizens, journalists, activists, and so forth all participate within the same platform, and all of these types of actors can communicate about the same controversy. Secondly, applying digital methods on the web has advantages in accessibility, aggregability and traceability (Rogers, Querubín and Kil forthcoming, 29). The web allows following the actors and tracing their relations because most of the actions of the actors are stored and accessible. Every tweet sent is stored on the servers of Twitter. Access to data on Twitter is questionable. However, it is possible to retrieve a live stream of all the tweets which relate to a certain (set of) keyword(s), the so-called "firehose"3. The fact that tweets are text based and that many contain hashtags makes it easy to trace them by using a search engine. Tools for digital methods allow aggregating the data in useful formats. This makes it possible to order a data set of tweets according to platform-specific features such as the user, timestamp, hashtag, retweet, reply, mention, and so forth. The web has made the complicated ANT research a lot easier. However, it is important not to be too optimistic about these new media, for they have their pitfalls too. As Venturini (2010b) reminds us, there are four simple facts that need to be acknowledged: '1. search engines are not the web; 2. the web is not the Internet; 3. the Internet is not the digital; 4. the digital is not the world.' (8). Earlier I discussed some problems and obstacles in earlier work on politics and Twitter. One of the problems with predicting election results on Twitter is that not everyone is active on Twitter. Twitter is not the world; therefore studies should not ask questions about a whole population, especially because it remains difficult to understand who a user is on Twitter, since users are not required to give any demographic information. This is not a problem that can easily be resolved at this moment; rather, it is a problem that should be avoided by asking the right questions. By "following the actors", tracing their connections and letting them define themselves, the researcher avoids making assumptions about who they are.

3

For more details on the Twitter API see https://dev.twitter.com/docs/streaming-apis/streams/public. A more elaborate discussion on the different API statuses can be found in an article by Gerlitz and Rieder (2013), 'Mining One Percent of Twitter: Collections, Baselines, Sampling'.

The ease of collecting large sets of data has given rise to "big data" studies. Are large sets of data contributing to answers about questions on the social? Historically, the analysis of very large (or big) databases has been the domain of the large internet-based companies (Bollier 2010, 3). In recent years, because of the access to these databases through the platforms' APIs and the rise of digital tools for scraping and analyzing, researchers have gained access to this data. Without using the term "big data", Savage and Burrows (2007) point towards the value of digitally generated data as a by-product (891). Digital transactions, mailings, subscription data, and so forth all produce metadata as a by-product. The value of this data for sociologists lies in the fact that the users produce it without knowing that they are being studied. Whereas a traditional survey would pull the participant out of his or her context, for digital transactions there is no experiment or research context. As Savage and Burrows put it, "[t]hey work directly with the real, complete, data derived from all the transactions within their system" (2007, 891). What is important here is that Savage and Burrows refer to "complete data", as opposed to data samples, a notion that is later used to describe "big data" (Mahrt and Scharkow 2013, 25). Others describe big data as data that, because of its size and complexity, cannot be collected and/or analyzed using "normal" consumer computers but only by "supercomputers" (Snijders, Matzat and Reips 2012, 1; Manovich 2011). These descriptions of big data refer mostly to the size of the collected data, with the aim of obtaining full or complete data sets. Since tweets are increasingly being archived, the number of tweets included in scholarly work increased every year from 2007 to 2012 (Zimmerman 2014 preprint). As we can see here, big data earns its name mostly because of its big numbers.
This enthusiasm for big data, however, is met by critics who question the quality of this quantity.


In relation to the philosophies of Latour, the question arises whether larger sets of data are helpful in answering questions about the social. The complete data to which Savage and Burrows refer becomes problematic when related to Twitter data. Twitter data, so far, can never be complete data for two reasons. First of all, and most obviously, not everyone is active on Twitter, and those who are active can hide or fake their identity. Secondly, when studying a particular hashtag, such as #ausvotes (a hashtag widely used by Australian Twitter users during the 2007 federal elections), it should be noted that most tweets about the Australian elections do not contain this hashtag (Bruns and Burgess 2011, 6). #ausvotes is a sample of a sample, in the sense that only a part of the population is tweeting about politics during the elections, and that only a part of those users are using this specific hashtag. Twitter data can never represent a society as a whole.

As noted, Savage and Burrows present transactional data (data as a by-product) as a means to retrieve complete data. They mention that: "...data on whole populations are routinely gathered as a by-product of institutional transactions…" (2007, 891). They refer to the way that Amazon does its marketing research by exploring the browsing behavior of users on its website. Instead of participating in a survey, users are now always participating in a study (commercial or academic) when they are active on Twitter. The advantage here is that users are participating without being in the "survey" context. This implies that tweets are somewhat uncensored. As boyd and Crawford argue: "researchers have the tools and the access, while social media users as a whole do not. Their data were created in highly context-sensitive spaces, and it is entirely possible that some users would not give permission for their data to be used elsewhere" (2012, 673). Besides the issue of privacy that boyd and Crawford highlight, they refer to the context-sensitive space of users of social media. The users of Twitter tweet for a specific audience, which varies from user to user. Users can participate in a hashtag conversation, reply to a comment, inform their list of followers, and so forth. The combination of the user, the platform itself and its features, and the expected audience is what determines the tweet. If a user had known that his or her tweet was going to be part of a study, would that tweet be different? The argument that the tweet is somewhat uncensored is only half the truth. Tweets may be uncensored in the sense that the user does not know that he or she is participating in a study; however, the tweet is always censored for its intended audience. Bruns and Burgess argue that users add the #ausvotes hashtag in order to make their tweets more visible (2011, 6). However, not every user adds a hashtag to participate in the "hashtag conversation"; hashtags can also be used to inform the audience about the topic the user is tweeting about. The point here is that big data studies are useful for understanding what is happening on a platform (for example, what is being tweeted the most); however, why and who questions are more difficult to answer with numbers alone.

Ironically, big data, which is mostly social data, is unable to answer big questions about the social. As Uprichard (2013) puts it, 'big data, little questions'. If we cannot ask questions about the social, we can also question the use of "complete" big data sets. If we do not want to make claims about society as a whole, what questions can we ask? It is necessary to find a balanced question which does not make claims about society, but also does not stick to the numbers alone. Latour and Venturini (2010) argue that the problem here lies in the fact that we still treat the data obtained through digital methods as data we would have obtained through surveys. Questions should not concern how representative the data are of a society. The advantage of digital methods is that we can trace the 'assemblage of collective phenomena' (Latour and Venturini 2010) as it unfolds. Assemblage refers to the network of relations among the different actors. This study, as mentioned earlier, discusses the relationship between the tweets of politicians and those of users during the elections. As will become clear in the next sections, the method balances between numbers of data and interpretations of primary texts. By following both the medium and the actors, this study is able to trace both medium-specific connections among actors and connections in their use of language.


3. Method, and its social life

Digital tools have made it possible to collect and analyze data in a relatively short time. However, using ANT both to do research and to reflect on that research requires stopping at every decision and rethinking every step that is taken. As Latour and Venturini (2010) argue, 'for the new methods to realize their innovative potential, it is necessary that each step in the research chain be rethought in a coherent manner.' This section will take this literally. I will stop at each step and question the method: what are the roles of the digital tools, and what is there to say about objectivity and subjectivity? Implementing an ANT approach in a digital method for empirical research does not make it easier. The method will balance between a qualitative ANT approach and a quantitative big data approach. ANT is used both as a framework for the method and as a framework for criticizing the method. Such a "self reflection" on the researcher's own method fits the idea of the "redistribution of methods" (Marres 2012). The concept of "redistribution of methods" highlights 'processes of exchanges between actors involved in social research' (Marres 2012, 144). It is especially useful to consider the roles of all the actors that are involved in the research when new technologies, and therefore new knowledge, take their place within an academic area. Along similar lines, Savage (2013) points towards the importance of the "social life of methods", which 'is a response to the increasing salience of methodological devices' (5). The "social life of methods" refuses to treat the method as just a technical or instrumental factor in the research. It takes the method as an object of study in order to engage with it critically and politically (Savage 2013, 9). ANT is useful for criticizing the method because it forces the researcher to look at the relations between the researcher, the tools, the data, the users and the platform.
ANT helps us to remember that each of these actors can act; therefore each of them can make a difference in the study. The method described in this section aims to answer the question of which sub-issues are discussed on Twitter by politicians and users when they discuss the bigger issues of decentralization. How can ANT and digital methods be put to use as frameworks in order to let the actors define themselves and the sub-issues? Furthermore, I will question what the actors are within the methodology and what difference they make.

In the following sub-sections, I will elaborate on the data sets I have used and I will make it clear why I refer to users instead of citizens. There, I will also elaborate on the Digital Methods Initiative - Toolset for Capturing and Analyzing Tweets (DMI-TCAT) (Borra and Rieder 2014, preprint), the open source tool I have used for capturing and analyzing data. Within that sub-section, I will discuss "the politics of tools", which refers to the increasing use of digital tools within the humanities and how this is shifting the humanities towards the social sciences. Originally neither an IT field nor a social science, the humanities, or rather the digital humanities, are facing a shift in their episteme. I will refer to the new role of digital tools within the studies of the humanities. The sub-sections that follow from there will elaborate on how I have traced the issues and sub-issues within the data. These sub-sections will elaborate on how to make issues queryable for data sets of tweets. I will discuss how to follow both the medium specificities and the language of the actors themselves. Part of that sub-section will be dedicated to what it means to "spend time with your data", a crucial and complex step which is necessary in order to understand what the data means and how to clean it. For cleaning up data, I will propose a method that starts by collecting data using a "generic query": one that captures all the desired data, but also more. From there on, it is necessary to use the captured data to generate specific queries which will capture only the desired data. Finally, I will elaborate on how to analyze issues within the specific data set.

3.1 DMI-TCAT, and the Politics of Tools

The DMI-TCAT is an open source tool for capturing and analyzing tweets4. Currently the tool is unique in the sense that it allows both capturing and analyzing tweets while the analyst requires little or no programming skill to work with its interface. Therefore this tool will prove to be particularly useful for researchers within fields where digital tools are becoming a more prominent part of the studies, but where at the same time the researchers often do not have the skills to work with complex code. That a tool is easy to use does not mean that its structure is simple. Rieder and Rohle (2012) describe "black boxing" as one of the challenges for digital methods (75). For researchers within the digital humanities it seems awkward to do a study that relies completely on a tool which the researcher does not understand, and therefore cannot explain. For these researchers, the tools are "black boxed". Even for tools such as the DMI-TCAT, which make the source code available, the researchers cannot make sense of it because of the "code illiteracy" among them.

4

The DMI-TCAT and more information about the tool are available on GitHub:


The DMI-TCAT, as mentioned, is a tool for both capturing and analyzing tweets. For capturing tweets, the DMI-TCAT provides three options. First of all, the DMI-TCAT can capture a "one percent" random sample. This is a random sample of all the tweets within Twitter's data set. A second option allows capturing tweets that contain specific keywords. This option is used to capture tweets in real time, since it cannot retrieve tweets from the past. Finally, the DMI-TCAT allows capturing the tweets from a specific set of users, retrieving the last 3200 tweets from each user. The captured data can be analyzed through a variety of options. Borra and Rieder (2014), the lead developers of the tool, explain that besides "tweet statistics and activity metrics, network analysis and content analysis…[the tool] also facilitate[s] geographical analysis, ethnographic research, and even textual hermeneutics" (8). An important feature of the DMI-TCAT is the option to create subsamples of the captured data sets. For example, a data set containing the tweets of all parliament members can be queried with a specific set of keywords, such as "global warming". The tool then generates a subsample of all the tweets that contain the phrase "global warming". This subset can be analyzed further by exporting csv files containing user activity, hashtag frequencies, retweet frequencies, URL or host frequencies and other forms of tweet statistics or activity metrics. Additionally, the DMI-TCAT can export certain .gdf and .gexf files which can be used to create network graphs. One can think of reply and mention graphs, but also co-hashtag or hashtag-user graphs. Finally, the TCAT can export the complete set of tweets, only the IDs of the tweets, or a random sample of a thousand tweets5.

5

There are many more options for analysis; I have only highlighted the ones that are important for this study.

Since the rise of tools such as the DMI-TCAT, the humanities researcher is capable of doing studies on a larger scale and at a faster pace (the DMI-TCAT can generate a subsample of millions of tweets within just a few minutes). There is great interest among humanities researchers in embracing these tools. The list of analyses mentioned above that the tool allows us to perform on tweet data will probably expand greatly within the coming years. However, this interest should be tempered by caution about what Rieder and Rohle (2012) call "the lure of objectivity" (71). The humanities once focused mostly on interpretive knowledge; these new digital tools allow them to produce verifiable knowledge. Therefore, Rieder and Rohle (2012) argue that it might 'indicate a desire to produce knowledge that can compete with the natural sciences on their own terms, by being as 'objective', as 'rigorous', with the help of machines' (72). It is true that these tools allow the researcher to do an empirical study on a larger scale in a relatively short time. However, as mentioned, precisely because we do not know exactly how these tools work for us (they are "black boxed"), we should stay critical of them as well. When it comes to objectivity, we are only capable of what the tools allow us to do. We could ask ourselves how objective these tools are, in the sense of what they can and cannot do. I am not arguing that we should take a step back; on the contrary, we should keep experimenting with, studying and improving these digital tools.

3.2 Data sets, and scraping politics

There are two data sets used for this study: one containing all the tweets of the Dutch parliament members and officials (such as ministers and the parties' chairmen), and one containing all the tweets of users who tweeted about politics. The first data set, which I will refer to as "politicians", contains over 430 thousand tweets from 159 different users. The earliest tweet from this data set was published on the 24th of May 2007 and the last one on the 14th of May 2014. Currently the DMI-TCAT is still automatically collecting the tweets of the parliament members, thus the data set is expanding every day. The second data set, the "users" data set, contains over 1.8 million tweets from over 390 thousand users. Tweets that are collected for this data set include those which contain any mention of a parliament member's user account6, a hashtag of any of the parties or the mention of any official party account.

6

The mentions of six politicians were not included because they are the replacements of some politicians that left the parliament during the period the data was collected.

The DMI-TCAT automatically captures, or "scrapes", the data from Twitter. As Marres and Weltevrede (2013) point out, scraping is a practice which is closely related to the "real-time web" (317). Since access to historical tweets is limited and most often behind a paywall, much of the scraping of Twitter data happens after the decision is made to do research about a certain issue. This clearly has implications for the research being done. Elections seem a logical moment since these are announced events which span a longer period. However, it becomes more difficult to study sudden events. This makes scraping Twitter more complicated, because Twitter is precisely the platform on which events are reported as they happen. Although it is not impossible to scrape yesterday's tweets, it becomes more complicated to get those from a week or month ago. However, once the data is scraped, it is delivered to the researcher in a clean fashion. This means that most scrapers order the data into spreadsheet files where the different elements are presented in different columns and, as the DMI-TCAT does, in chronological order. Marres and Weltevrede (2013) term this "pre-ordered data" (324). This presentation of the data is what the researcher calls his or her Twitter data, ready to be analyzed. However, it is questionable whether the researcher is really looking at Twitter data. Is it not the data produced by the scraper, which based its findings on actual Twitter data? If it were real Twitter data, then it should not make a difference which scraper the researcher uses; the data should always look the same. This is clearly not the case, since each scraper presents the data differently. For creating an output file, the scraper needs to parse (structure) the data; therefore it needs to make an analysis of the data (Marres and Weltevrede 2013, 326). In the case of the DMI-TCAT, the scraper looks for metadata, such as a timestamp and location, but also for specific uses of Twitter such as the retweet and hashtag. Studies with digital tools often do not describe the analytics performed by the scraper.
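The claim that a scraper already analyzes the data while parsing it can be made concrete with a minimal sketch. The Python below is not the DMI-TCAT's code: the function name parse_tweet, the regular expressions and the example tweet are all hypothetical, and real scrapers typically rely on metadata delivered by the platform rather than on pattern matching alone. The sketch only shows that every column in a "pre-ordered" output file results from a decision about what counts as a hashtag, a mention or a retweet.

```python
import re

# Hypothetical patterns: each one encodes a decision about what "counts".
HASHTAG = re.compile(r"#(\w+)")
MENTION = re.compile(r"@(\w+)")


def parse_tweet(text):
    """Turn raw tweet text into the columns of a pre-ordered row (illustrative only)."""
    return {
        "text": text,
        # Lowercasing merges #Wmo and #wmo: an analytical choice, not a given.
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "mentions": [name.lower() for name in MENTION.findall(text)],
        # Assumption: only the classic "RT @user" form is treated as a retweet.
        "is_retweet": text.startswith("RT @"),
    }


# Made-up example tweet, not taken from the data sets.
row = parse_tweet("RT @example_user: Debat over de #Wmo vanavond #GR2014")
print(row)
```

A different scraper could define a retweet differently or keep the original casing of hashtags, and the researcher would be handed a different data set built from the same tweets.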

As mentioned, the DMI-TCAT scraped, and analyzed, data for two data sets, one containing tweets from politicians and one containing tweets about politics. For this study only those tweets which were posted around the local elections of 2014 are of interest. The elections were held on the 19th of March 2014. The tweets that will be analyzed are those that were published between the 2nd of January and the 8th of April. This should be a reasonable amount of time before and after the actual elections. As mentioned, the DMI-TCAT can create subsamples. The subsample of the data set "politicians" includes over 55 thousand tweets from 155 different users (see figure 1). The subsample of the "users" data set includes over 1.3 million tweets from over 294 thousand users (see figure 2). Figures 1 and 2 show the output of the DMI-TCAT after generating a subsample. Both graphs show an increase in the number of tweets towards the elections, where they both peak, and a decrease afterwards. It can already be argued that politicians tweet with a weekly routine, since they tweet substantially less on weekends, while the number of tweets from the users seems to be unaffected by the weekends.
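The selection of tweets by date can be sketched as a simple filter. The snippet below is an illustration in Python, not the way the DMI-TCAT implements subsamples; the function name in_window, the dictionary layout and the three example records are hypothetical, and treating both boundary dates as inclusive is an assumption.

```python
from datetime import date

# Period analyzed in this study; inclusive boundaries are an assumption.
START = date(2014, 1, 2)
END = date(2014, 4, 8)


def in_window(tweet, start=START, end=END):
    """Return True if the tweet was published within the studied period."""
    return start <= tweet["created"] <= end


# Made-up records standing in for a captured data set.
tweets = [
    {"id": 1, "created": date(2013, 12, 31)},
    {"id": 2, "created": date(2014, 3, 19)},  # election day
    {"id": 3, "created": date(2014, 4, 9)},
]

subsample = [tweet for tweet in tweets if in_window(tweet)]
print([tweet["id"] for tweet in subsample])
```

Even this small step involves a decision by the researcher: moving either boundary by a few days changes which tweets count as belonging to the elections.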


Figure 1: Screenshot of the DMI-TCAT. Subsample data set "politicians". x-axis represents the timeline and the y-axis represents the number of tweets. The blue line is the number of tweets, the red line is the number of users, the yellow line is the number of tweets that include a location and green line represents the number of tweets that include a geotag.7

7

The tweets from the data set "politicians" between the second of January and the eighth of April can be found in appendix I.

Figure 2: Screenshot of the DMI-TCAT. Subsample data set "users". x-axis represents the timeline and the y-axis represents the number of tweets. The blue line is the number of tweets, the red line is the number of users, the yellow line is the number of tweets that include a location and green line represents the number of tweets that include a geotag.8

3.3 Issues, sub-issues, query design, and "spending time with the data"

From 2015 onward the Dutch local governments will be responsible for a set of tasks relating to health care, participation and youth. The decentralization of these issues plays an important role in the local elections of 2014. It is the newly elected local governments that are going to decide how to execute these plans. These issues, however, are far from clear. The government has created three websites where the municipalities and citizens can find information about these plans. On www.invoeringwmo.nl the issue of decentralization of health care is vaguely explained: 'The new Wmo9 makes municipalities responsible for the support of citizens. This support is available for different purposes, such as counseling and participation.'10 Counseling and participation, and other "different purposes", can refer to anything. In a different section of the website, the "A to Z section", the different and more specific areas that are affected by the new Wmo are further elaborated. There it becomes clear that the new Wmo will change numerous areas that relate to health care. These range from different issues on health insurance, elderly health care and the "neighborhood nurse" to the abuse of women. On the website for the decentralization of "participation", www.gemeenteloket.minszw.nl, one can find the same type of information: vague and very broad. The decentralization of participation is mostly related to employment issues; however, it also covers issues of social security, integration and discrimination. The decentralization of "youth" is the least vague and broad. On www.voordejeugd.nl it becomes clear that the tasks relate to, among other things, child abuse, youth mental health care, foster parents and education. Because of their vague definitions and the broad spectrum of sub-topics these three issues of decentralization cover, it is a challenge to study them on large data sets of tweets. The question emerges of how to translate these issues into words that are used by both politicians and users on Twitter.

8

The tweets from the data set "users" between the second of January and the eighth of April can be found in appendix II.

9

Wmo is short for the Social Support Act ("Wet maatschappelijke ondersteuning" in Dutch).

10

I have translated this text. The original quote is: "In de nieuwe Wmo zijn gemeenten verantwoordelijk voor ondersteuning van burgers. Deze ondersteuning is voor verschillende doelen beschikbaar, zoals begeleiding en participatie."


In order to understand how the politicians and users discuss the three issues on Twitter, it is necessary to follow the medium-specific features of Twitter such as hashtags and retweets. Furthermore, it is important to understand the language the users and politicians use on Twitter when they tweet about these issues. However, first there is a need to take a step back in order to capture all the tweets in the data sets that relate to these three issues. For that I have created three generic queries, one for each issue, to create subsamples of the two data sets. The goal of the generic query is to cover every tweet relating to the issue. The rule here is that it is better to capture unrelated tweets than to miss related ones. For each of the issues I have used the corresponding website to create a generic query. As mentioned, the websites have a specific "all topics" section where all the related files and information can be found. All these related topics needed to be turned into queries. This is done by using the topic as a query, finding synonyms of these topics in the dictionary and reading the sub-topics on the websites to look for specific keywords. Eventually all these keywords are put together, separated by the OR operator, to create a generic query. For example, on the website that relates to the decentralization of health care, there is a sub-topic called "alzheimer". The keyword "alzheimer" is used as a query, but also the keyword "dementia". Furthermore, the sub-topic mentions the importance of carers; therefore I also use "carers" in the query. The DMI-TCAT also includes tweets where the keyword is only part of a word. Therefore I use only the shortened version of a word to include its different inflections. In Dutch the query for alzheimer is as follows: "alzheimer OR dement OR mantelzorg".
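How such a generic query behaves can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the DMI-TCAT's implementation: the function names parse_query and matches are hypothetical, matching is assumed to be case-insensitive, and the example tweets are made up. The sketch reproduces the two properties described above, namely that keywords are combined with OR and that a keyword also matches when it is only part of a word.

```python
def parse_query(query):
    """Split a generic query such as 'a OR b OR c' into lowercase keywords."""
    return [term.strip().lower() for term in query.split(" OR ") if term.strip()]


def matches(text, terms):
    """Return True if any keyword occurs anywhere in the text, even inside a word."""
    lowered = text.lower()
    return any(term in lowered for term in terms)


terms = parse_query("alzheimer OR dement OR mantelzorg")

# Made-up example tweets, not taken from the data sets.
examples = [
    "Meer steun voor mantelzorgers nodig",   # 'mantelzorg' inside 'mantelzorgers'
    "Dementie vraagt om goede zorg",         # 'dement' inside 'Dementie'
    "Verkiezingen op 19 maart",              # no keyword
]
for tweet in examples:
    print(matches(tweet, terms), tweet)
```

Because the shortened keyword "dement" matches "dementie", "dementerend" and similar forms, one keyword covers several inflections. The same property also lets unrelated words slip in, which is why the captured tweets need to be read afterwards.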

After creating subsamples using the (initial) generic queries, the next step is to "spend time with the data" in order to "get to know the data". This advice has been given to me throughout the years by several teachers, and only after reading the works of Latour was I able to fully understand what these simple phrases mean. As described in an article by Venturini (2010a), when Latour was asked by a student to specify his instructions on doing cartography, his answer was 'just look at controversies and tell what you see' (Latour qtd. in Venturini 2010a, 259), which Venturini translated as "just observe and describe". "Just observe and describe" is simple advice which carries an important and complex meaning. As Venturini (2010a) argues, the meaning of "just" is threefold. By using the word "just", Latour means that the researcher needs to make every effort to be as open as possible to the different perspectives in the established literature. At the same time, the researcher cannot pretend that he or she can be completely objective; everyone carries some prejudices towards a topic. Additionally, "just" means that 'you shall listen to actors' voices more than to your own presumptions' (Venturini 2010a, 260). As with the phrase "just observe and describe", the meaning of the phrase "spend time with the data" is more complex than one first expects. To "spend time with the data" means to know what is inside the data set, and it also means to decide whether the data belongs there. Twitter data sets are messy in nature. The users might use a completely different language than the researcher expects (such as slang). Furthermore, the users might associate surprisingly different subjects with the issue that is being studied. To spend time with your data means that the researcher needs to be aware of his or her own assumptions and be open to other perspectives. The reason the data is there is always the query the researcher used. Thus, whenever false data, or noise, occurs in the data set, the question should always be: does the query need to be adjusted to leave out the false data, or is the query being adjusted in order to retrieve the desired data? When faced with this question, the researcher should be reminded that '[o]ne man's noise is another man's data' (Stenrud qtd. in Bollier 2010, 14).

Is it fair to argue that the study is not objective at all, because cleaning the data is a very interpretive process? Statistician Andersen argues that 'cleaning the data [...] removes the objectivity from the data itself. It's a very opinionated process...the truth is that the moment you touch the data, you've spoiled it. For any operation, you have destroyed that objective basis for it' (Andersen qtd. in Bollier 2010, 13). The problem with Andersen's argument is that he sees objectivity as something that can only be attributed to "raw", untouched data. If that is the case, objectivity is reachable only by the digital (scraping) tools we use and is unreachable for humans. As discussed above, tools are not necessarily objective. Moreover, if humans cannot be objective, neither can the tools we build, which make the analytical decisions that humans programmed them to make. Objectivity becomes an unreachable concept for any actor. The problem here, again, is the desire for objectivity. The tools we use enable us to study bigger sets of data, from a wider variety of actors. However, unlike the data in the natural sciences, the generated social data does not let itself be calculated with standard models.

"Spending time with the data" in practice means that the researcher needs to look at the set of tweets generated by the generic query. Because the data sets are large, it is impossible to read every single tweet. The random sample option of the DMI-TCAT (an export of 1000 random tweets from the subsample) is useful here. While reading the tweets, the researcher needs to constantly ask whether the tweet belongs in the data set or not. If not, the question is why that tweet should not be included. Is it because of the query design? Or might it be that the users associate a different subject with the issue, one that the researcher would not have thought of? If the latter is true, the tweet still needs to be included within the data set. For example, the query for the participation data set included the keyword "werk" (work). The keyword "werk" captures the following tweet: '@RenskeLeijten Er zitten mensen in de regering die zich nergens meer voor schamen marktwerking in zorg is asociaal!'11 (Poldervaart 2014). This tweet contains the string "werk" inside the word "marktwerking" (market forces). The tweet is about the market forces within the health care system and argues that politicians should be ashamed of this. The tweet is included within the participation data set because of the string "werk". However, the word "marktwerking" is not related to "work" as I had intended; therefore, the query design needed to be adjusted so that tweets such as this one would not be captured. The same tweet could also appear within the health care subsample because of the word "zorg". In that subsample the tweet is relevant: it refers to the market forces within the health care system, a sub-topic someone might initially not think of. The final set of generic queries can be found in appendix III.
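The "werk" example can be made concrete with a small sketch. It is an illustration only and does not reproduce how the DMI-TCAT evaluates queries, which is not described here; the function names `matches_substring` and `matches_whole_word` are hypothetical. The sketch contrasts plain substring matching, which captures "marktwerking", with whole-word matching, which does not.

```python
import re


def matches_substring(keyword: str, tweet: str) -> bool:
    """Return True if the keyword occurs anywhere in the tweet (hypothetical helper)."""
    return keyword.lower() in tweet.lower()


def matches_whole_word(keyword: str, tweet: str) -> bool:
    """Return True only if the keyword occurs as a separate word (hypothetical helper)."""
    pattern = r"\b" + re.escape(keyword) + r"\b"
    return re.search(pattern, tweet, flags=re.IGNORECASE) is not None


# The tweet quoted above (Poldervaart 2014).
tweet = (
    "@RenskeLeijten Er zitten mensen in de regering die zich nergens "
    "meer voor schamen marktwerking in zorg is asociaal!"
)

# Substring matching finds "werk" inside "marktwerking".
substring_hit = matches_substring("werk", tweet)

# Whole-word matching does not, because "werk" is not a separate word here.
whole_word_hit = matches_whole_word("werk", tweet)

# "zorg" appears as a separate word, so the tweet still matches the health care keyword.
zorg_hit = matches_whole_word("zorg", tweet)

print(substring_hit, whole_word_hit, zorg_hit)
```

Whole-word matching is only one possible adjustment. It would also leave out related forms such as "werken" or "werkloos", so whether it is the right choice remains an interpretive decision for the researcher.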

As mentioned, the three issues of decentralization are rather vague and broad. This makes them open to different interpretations. By using the tweets that are collected with the generic queries, it is possible to identify sub-issues that the politicians and users on Twitter associate with the issues. This approach lets the actors themselves decide what meaning they give to the issues. To operationalize this, it is necessary to follow the medium's specific objects (Rogers 2013a). The generic queries have created six subsamples of the data set. For each of the issues there are two subsamples, one from the data set "politicians" and one from the data set "users". By using three different methods on each subsample, several sub-issues were extracted. The first method (figure 3) takes all the tweets and analyzes the most used words, which in turn can be categorized as sub-issues. Using the "export all tweets" option of the DMI-TCAT, all the tweets from a subsample are exported into a .csv file. By copying the text of the tweets into the Raw Text To Tag Cloud Engine (developed by the Digital Methods Initiative) (step 1 in figure 1), it is possible to retrieve the most used words and their frequency within the tweets. The next step (2 in figure 1) is to extract the issue-related words. Here I have chosen to

11 Translated into English, the tweet reads: "@RenskeLeijten There are people in the government who are no longer ashamed of anything; market forces in health care are antisocial!"