The Twitter-hype Methodology


Acknowledgements

It is a pleasure to thank everyone who made this thesis possible.

I am indebted to Peter Vasterman, for being my thesis supervisor -- for the second time around! -- and advisor/motivator for this methodological development.

I am grateful to all my lecturers over the years, for kind assistance and wise advice.

I wish to thank my fellow MA thesis students and friends, Danny, Douwe, Maylis, Michelle, and Wessel, for their support and entertainment, and all the lovely people at Off-Screen, for five great years of company, drinks, and distractions.

Lastly, thank you, Faye, Maud, Jan, and Yvonne. I hope you will read this one too.

T.J. Bovelander (6297811)
Master’s thesis New Media and Digital Culture (Media Studies)
University of Amsterdam, 25 June 2015
Supervised by dhr. dr. P.L.M. Vasterman
Second reader: dhr. prof. dr. R.A. Rogers

bovelander@gmail.com
UK English (MLA formatting style)

(4)

INTRODUCTION
1. ON THE CONCEPT, THEORY, AND APPROACH
1.1 BEHIND “THE TWITTER-HYPE”
1.2 THEORETICAL FRAMEWORK
1.3 THE TWITTER-HYPE
1.4 PRACTICAL APPROACH
2. THE TWITTER-HYPE METHODOLOGY
2.1 DATA ANALYSIS
2.2 DATA RESEARCH
2.3 DATA COLLECTION
2.4 DATA CONSIDERATIONS
2.5 PROPOSED RESEARCH STUDY OUTLINE
3. ANALYSING @PHARRELLHAT
4. REFLECTION
CONCLUDING REMARKS
WORKS CITED
APPENDIX A. “GET_TWEETS.PY”
APPENDIX B. @PHARRELLHAT TWITTER DATA


Introduction

An investigator starts research in a new field with faith, a foggy idea, and a few wild experiments. Eventually the interplay of negative and positive results guides the work. By the time the research is completed, he or she knows how it should have been started and conducted.

— Donald Cram

In this thesis the topic of research is the Twitter-hype; to be exact, this thesis proposes a methodology for the investigation and analysis of Twitter-hypes, providing a clear and balanced set of techniques for the qualitative assessment of a popular account on the platform Twitter. While the term might not seem familiar, the concept itself is deeply rooted in media studies research. The concept and its position in theoretical frameworks are explained in depth in paragraph 1.

The motivation behind this thesis is, for the second time around, both academic and personal. In 2014, my Bachelor’s programme in Media Studies (New Media) concluded with a thesis entitled “The Twitter-Hype”, in which the viral Twitter account @SochiProblems was analysed. The account reported on mishaps in and around the 2014 Winter Olympics and amassed over 150,000 followers in just three days, surpassing the follower count of the official @Sochi2014 account.1 While the case and its analysis made for a highly interesting research project, I concluded that thesis by reflecting on the potential for significantly more in-depth research into these Twitter-hypes. I use the plural ‘hypes’ because @SochiProblems was certainly not the first (or last) account to experience an explosion of attention, which accelerated its popularity and thus its newsworthiness -- a vital precondition for the ‘definition’ of a Twitter-hype. Examples include @PharrellHat (covered in this thesis as a case study) and the Dutch #PGBAlarm hype (not covered as a case study, but referred to at times). While popular Twitter accounts are not a unique occurrence, accounts with many followers and significantly popular tweets can be considered important actors within the network that is Twitter -- and at times, the repercussions of these accounts and their popularity extend beyond the platform. As such, it is logical to argue that the question of how these popular Twitter accounts came to be is a highly relevant one, for both academic and social reasons. The importance of Twitter-hype research is perhaps best explained through a quick recap of the @SochiProblems hype. In short, the account’s explosive growth was not only interesting and (admittedly) amusing, but also clearly indicative of a situation behind, and enabled by, its popularity: the self-reinforcing nature of the hype became apparent when news reports covered not the actual problems in Sochi, but the @SochiProblems Twitter account. The account grew rapidly in followers and popularity, and became something of an unofficial aggregator for any Sochi problems. The journalism student behind the account was most likely overwhelmed eventually, as the account failed to generate content systematically after four days of high activity. The patterns surrounding the account were highly interesting and prompted an investigation, which took Peter Vasterman’s media-hype theory to structure an approach and a proposal for the Twitter-media-hype concept.

This thesis aims to engage thoroughly with the concept by proposing a methodology for the investigation and analysis of Twitter-hypes, while also focusing on aspects such as data collection and (field-related) data research. It is perhaps not customary to finish a Master’s programme with a thesis that is a proposition for a methodology rather than an extensive theoretical research report. However, over the course of this thesis I aim to demonstrate that the investigation of Twitter-hypes is of significant importance for media studies, not least because of its strong roots in academic research through qualitative cultural and platform analyses. Because of this rather different approach, and to ensure that the thesis’ investigation and proposed methodology are abundantly clear, this research is structured as follows.

The first paragraph, “On the concept, theory, and approach”, ensures that the research into this proposed methodology for investigating hypes on Twitter is thoroughly explained and justified. It does so by explaining the concept of the Twitter-hype and positioning it in relevant theoretical media studies research. The deliberate approach to data research, combined with this theoretical framework, gives way to the practical approach of this thesis; the latter section explains the reasoning behind the methodology proposed in this dissertation.


The second paragraph covers the actual Twitter-hype methodology. It focuses on a multitude of aspects that are important for critical and qualitative analysis, including core steps such as data retrieval and visualisation, as well as key techniques for data analysis and research. It also reports extensively on the considerations regarding this data: while the methodology aims to guide deliberate research, some aspects of data analysis still need further verification and justification.

In order to demonstrate this methodology and showcase what kind of analysis and research can be achieved, a concise case study is covered in the third paragraph. Focusing on the account @PharrellHat, the aim is to show how even a contained hype can be thoroughly investigated, and how its analysis and outcome can add to the qualitative media studies research agenda. The case study deliberately leaves some analytical elements open to interpretation, so as not to draw attention away from aspects that are covered in the methodology but not applied in this study.2

The fourth paragraph is a reflection on the methodology and the prominent overall questions that arise in Twitter-hype research. The discussion is two-fold, reflecting both on the contribution of this work and on how it can be translated into a broader perspective on future research and on the social repercussions of such hypes.

The last paragraph concludes both the thesis and the methodology, neatly summing up the research and its contribution, in the hope that any aspiring student willing to read a two-page conclusion will afterwards read this thesis and delve into Twitter-hype research too.

2 For a more extensive study and rigorous assessment of a Twitter-hype, see “The Twitter-Hype”.


1. On the concept, theory, and approach

1.1 Behind “The Twitter-hype”

It is always difficult to introduce concepts and vocabulary into a research field; at the same time, it should be the intention of researchers to “help accumulate knowledge” (Rudner and Schafer 1). Similarly, providing a new methodology -- or, at least, critically assessing how different methods can be employed to analyse significantly popular Twitter content -- is a worthwhile exercise. Critical thinking on existing methodology and research agendas is arguably a contribution to any field, in this case to media studies.

While the term itself might be new, the concept I propose as the Twitter-hype is deeply rooted in media research and tied to its existing terminology. In the aforementioned Bachelor’s thesis, the Twitter-hype was defined as “simply a media-hype that originated from the social media platform Twitter” (Bovelander 5). While this definition is to a certain degree suitable and still true, it is rather crude and requires considerable explanation and specification. Perhaps it is best to explain the reasoning behind this new concept before a more extensive definition is offered. The concept is very much influenced by prominent research notions, and the same goes for the approach taken in this thesis, which is inspired by Richard Rogers’ Digital Methods. Rogers introduces his book not as a methods book, but as “a methodological outlook for research with the web” (1), a way to study and engage with online devices in order to understand them fully. By “thinking along” with digital objects, scrutinising them in great detail and recombining their features, the purpose, Rogers argues, is not solely to understand an online device’s workings better, but more so to recognise the potential for the social and cultural research questions that lie underneath. With the Twitter-hype, the aim is to resituate a portion of digital methods in a manner similar to Rogers’ Digital Methods: to show the individual attributes and features of the platform, to highlight which elements can and should be investigated, and to show how the combination of these components gives way to research questions pertaining to the understanding of Twitter-hypes and what they can tell us. For this reason, it is difficult to give a full explanation of what a Twitter-hype specifically is or covers; rather, it is this thesis’ intention to provide a methodology that allows for the definition of what makes a significant burst of attention a true Twitter-hype, or at least to define which aspects or elements of the investigated subject are truly a ‘hype’. In conclusion, the methodology proposed here uses the new term Twitter-hype for two reasons: (1) because of its aim to contribute something new to the field; and (2) because the term is a reference (perhaps even a homage) to its theoretical, but even more so its practical, inspiration: the media-hype.

1.2 Theoretical framework

The media-hype, Twitter studies, and Internet research

Genealogically, the Twitter-hype is inspired by and named after Peter Vasterman’s research on the media-hype. A clear definition of the concept is given in “Media-Hype: Self-Reinforcing News Waves, Journalistic Standards and the Construction of Social Problems”, published by Vasterman in 2005. The media-hype is a “media-generated, wall-to-wall news wave, triggered by one specific event and enlarged by the self-reinforcing processes within the news production of the media” (515). Vasterman’s study and definition of the media-hype are orientated towards traditional journalistic media, focusing mostly on newspapers and television reports. This emphasis on journalistic practices, principles, and consequences makes Vasterman’s media-hype analyses highly qualitative and impressive investigations, but it also creates a small rift when compared to the Twitter-hype. Where the media-hype is mostly concerned with higher-profile news events (although it is reasonably argued that the coverage will change because of self-reinforcing tendencies), a Twitter-hype does not necessarily cover such a high-profile topic. Nevertheless, the media-hype as theorised by Vasterman is deservedly the namesake of the Twitter-hype, as its definition and key dynamics are highly relevant for the qualitative analysis of Twitter news waves. The definition also stresses the “one specific event”; this isolated nature is a key point for the analysis, as Twitter-hype research aims to be as qualitative as possible.

This means that the research does not follow the ‘big-data-related’ data collection principles, but instead focuses on smaller and highly specific datasets, carefully selected and acquired. Consequently, for the Twitter-hype, the “one specific event” translates into the analysis of one or a few related accounts and/or hashtags.


Unlike many contemporary Twitter studies, Twitter-hype analysis does not translate well to extravagantly big datasets, although such an application is possible. Like Vasterman’s media-hype research, this methodology focuses on precise analyses; its emphasis is on very specific elements and aspects, such as classifying content, detecting characteristics, and recognising patterns. There are multiple reasons why Twitter is a viable ground for hype analysis. For instance, Twitter is undeniably a large network platform, ranking ninth on Alexa’s list of global “top sites” (Alexa, “Twitter.com”) and having over 300 million monthly active users (Twitter, “About”). As such, considerable activity and interaction take place on the platform, which makes it an interesting site for analysis.

In “Debanalizing Twitter: The Transformation of an Object of Study”, Rogers divides Twitter into three (historically) different stages. The first stage (Twitter I, 2006-2009) identifies Twitter as an urban lifestyle tool, meant for ‘banal’ interactions: content covered the lives of its early users, who tweeted about casual daily topics (e.g. lunch specifics). Rogers argues that the second stage (Twitter II, 2009-2012) was essentially disruptive, because users started using the platform for elections, disasters and revolutions (5). This change is highlighted by the switch of the placeholder text in the tweet bar from “What are you doing?” to “What’s happening?” While this was explained as merely a change to “make [Twitter] easier to explain to your dad” (Stone), the new phrasing can be seen as indicative of the platform Twitter aims to be: not one for ‘banal’ socialising alone, but almost an investigative platform, to be used for the latest scoop, update or happening. As a more ‘journalistic’ platform, Twitter could be of far more interest to research projects. For Twitter-hype research, the question “What’s happening?” is more fitting than its more personal predecessor. Twitter III, which views the medium “as archived data set” (7), sets the tone for much of this thesis:

Twitter is particularly attractive for research owing to the relative ease with which tweets are gathered and collections are made, as well as the in-built means of analysis, including RT (retweets) for significant tweets, #hashtags for subject matter [categorisation], @replies as well as following/followers for network analysis and shortened URLs for reference analysis. (Rogers 7)
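Several of the in-built means of analysis Rogers lists can be derived directly from tweet text. The following is a minimal sketch, not part of the original methodology and not the script in Appendix A: it assumes tweets are available as plain text strings, the regular expressions are simplified approximations of Twitter's own entity rules, and the function and field names are hypothetical.

```python
import re
from dataclasses import dataclass, field

# Simplified patterns (assumption): Twitter's own entity rules are more
# complex, so these expressions only approximate them.
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w{1,15})")
URL_RE = re.compile(r"https?://\S+")
RETWEET_RE = re.compile(r"^RT @\w{1,15}:")


@dataclass
class TweetFeatures:
    is_retweet: bool
    is_reply: bool
    hashtags: list = field(default_factory=list)
    mentions: list = field(default_factory=list)
    urls: list = field(default_factory=list)


def extract_features(text: str) -> TweetFeatures:
    """Derive RT, @reply, #hashtag and URL features from a tweet's text."""
    return TweetFeatures(
        # Manual retweets conventionally start with "RT @user:".
        is_retweet=bool(RETWEET_RE.match(text)),
        # Heuristic: replies conventionally begin with an @username.
        is_reply=text.startswith("@"),
        # Hashtags are lower-cased so that #Tag and #tag count as one.
        hashtags=[tag.lower() for tag in HASHTAG_RE.findall(text)],
        mentions=MENTION_RE.findall(text),
        urls=URL_RE.findall(text),
    )


# Hypothetical example text, not taken from any dataset.
example = extract_features("RT @someuser: a #Hashtag example https://t.co/abc")
print(example.is_retweet, example.hashtags, example.urls)  # True ['hashtag'] ['https://t.co/abc']
```

Tallied across a whole dataset, such features could feed a quantitative classification of content by retweets, replies, hashtags, and URLs.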

Similarly, this methodology is deeply rooted in Internet research: firstly because Twitter was designed for and exists on the medium; and secondly because conceptualising a new analytical toolset for investigating such significantly popular Twitter accounts is intricately linked with what Markham and Buchanan identify as Internet research:

The internet is a social phenomenon, a tool, and also a (field) site for research. Depending on the role the internet plays in the research project or how it is conceptualized by the researcher, different epistemological, logistical and ethical considerations will come into play. The term “Internet” originally described a network of computers that made possible the decentralized transmission of information. Now, the term serves as an umbrella for innumerable technologies, devices, capacities, uses, and social spaces. Within these technologies, many ethical and methodological issues arise and as such, internet research calls for new models of ethical evaluation and consideration. (Markham and Buchanan 3)

The hype in theory

Perhaps the question of why the Twitter-hype should be considered an important topic for (media) studies research is only partly answered by covering the media-hype and Twitter studies. As was argued earlier, the most important prerequisite for the investigation of what could possibly be a Twitter-hype is a burst of attention; the significance of the burst is for the researcher to assess, critique, and characterise. Attention is, in its own right, a crucial aspect for research as well. Just as Michael Goldhaber argued that the “economy of attention” is the prime economy of the web, the point can be made that basic web (or at least Twitter) interactivity begins with the relatively logical condition of attention (“The Attention Economy and the Net”). Crogan and Kingsley build on this when they state that “where Goldhaber’s analysis rings true is in predictions that we would increasingly place import upon online social networks and the diverse means by which they are accessed” (5). In light of the Twitter-hype, I would extend this further: ‘we’ should not only investigate the means by which Twitter is accessed, but also highlight in great detail what the characteristics of those means are, analysing user attention to see how the specifics of the attention surrounding a topic are indicative of the way a hype works and progresses. Together, the theoretical frameworks behind investigating media-hypes, Twitter as a platform, and the importance of user attention on the web provide an excellent structure for this Twitter-hype methodology. Moreover, as Bruns and Burgess argue, “the time spent developing Twitter research methods remains time well spent” (11).


1.3 The Twitter-hype

Finding a Twitter-hype

As was mentioned earlier, the methodology covered here can be applied to analyses of either @accounts or #hashtags, the prime precondition being that the subject is a well-contained entity. It can be argued that if the case is a properly isolated incident, for instance an account or hashtag dedicated to a specific happening or media event, the analysis of the case will be as precise and accurate as possible. For this reason, this methodology is perhaps, in its current form, more applicable to and useful for the analysis of accounts. Firstly, because the origins of a Twitter-hype are more easily and accurately pinpointed when the content history of a subject is contained within a single account. Secondly, because fewer variables (e.g. fewer accounts) allow for a more structured and focused set of conclusions (more on this in 2.1 “Data analysis”). As investigating hashtag hypes in a similar way is certainly possible, such research receives some attention as well. Another reason why hashtag analysis is arguably more burdensome than the investigation of account hypes is that it is significantly easier to collect a relevant dataset for an account: account hypes are often more accurately measurable than hashtag hypes. Firstly, because account hypes rarely include spam; and secondly, because account hypes do not often deviate from the topic of the hype. Spam is above all an issue for the analysis of hashtag hypes, as it skews the results of a qualitative content analysis. However, it could also be argued that spam is indicative of a Twitter-hype: the accounts (or their instigators) responsible for the spam have ‘noticed’ the considerable attention around the subject too, and try to take advantage of it. If that is the case, the spam presents an interesting opportunity to the researcher (more on this in 2.1 “Data analysis”).
Even though the considerations for the presence of spam in a dataset are interesting, it can be argued that spam is too variable to be considered part of the data collection (further thoughts on this debate can be found in 2.4 “Data considerations”).

The beginning of researching a Twitter-hype is perhaps primarily based on deductive reasoning: a wave of attention is noticeable, and the growth of the account or the popularity of the hashtag in question is significant. This is not too difficult to ascertain, as Twitter displays many numbers in relation to its features: following, followers, and tweets are counted for every user; retweets and favourites are counted for each and every tweet, for which the time and date are recorded as well. When a potential Twitter-hype is recognised, it is (perhaps obvious, but still) important that the researcher starts acquiring a tweet dataset as soon as possible.
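Because follower counts are visible on every profile, a researcher can record them periodically once a potential hype is noticed. The following minimal sketch is an illustrative assumption rather than part of the methodology: it turns hand-recorded snapshots into growth rates and flags intervals of unusually fast growth. The function names, the snapshot format, and the three-times-the-median threshold are all hypothetical choices.

```python
import statistics
from datetime import datetime


def follower_growth(snapshots):
    """Return (time, gain, gain per hour) for each pair of consecutive snapshots.

    `snapshots` is a list of (datetime, follower_count) pairs, e.g. recorded
    by hand from an account's profile at irregular intervals (assumption).
    """
    ordered = sorted(snapshots)
    growth = []
    for (t0, n0), (t1, n1) in zip(ordered, ordered[1:]):
        hours = (t1 - t0).total_seconds() / 3600
        if hours <= 0:
            continue  # skip duplicate timestamps
        gain = n1 - n0
        growth.append((t1, gain, gain / hours))
    return growth


def flag_surges(growth, factor=3.0):
    """Return end times of intervals whose hourly gain exceeds `factor`
    times the median hourly gain (an arbitrary, assumed threshold)."""
    if not growth:
        return []
    median_rate = statistics.median(rate for _, _, rate in growth)
    return [t for t, _, rate in growth if rate > factor * median_rate]


# Hypothetical snapshots: a jump of 1,000 followers in the fourth hour.
snapshots = [(datetime(2015, 1, 1, h), n)
             for h, n in [(0, 100), (1, 110), (2, 120), (3, 1120), (4, 1130)]]
print(flag_surges(follower_growth(snapshots)))
```

Using the median rather than the mean as a baseline keeps a single explosive interval from inflating the threshold it is compared against.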

The Twitter-hype: patterns, indications, and characteristics

Before delving into the practical approach and the methodology itself, it is perhaps best to list the patterns that will be investigated. These patterns are based on the theories discussed earlier, applied to the data that is going to be collected (in the form of the content and other material from and surrounding the account). Taken together, these patterns are what can be understood as the Twitter-hype. Like the other sections in this paragraph, this section is meant as an introduction: the patterns, indicators, and characteristics that can be considered indicative of a Twitter-hype are detailed, explained, and theorised throughout the whole methodology. Overall, a Twitter-hype is (1) inherently self-reinforcing, and (2) indicated by the following patterns: positive feedback loops; a key event; a unifying news theme; lowering of news thresholds; interactive media momentum; and the eventual decline of the news wave (“Media-Hype” 515-16). The goal of the methodology is to present a set of methods that allow researchers to find evidence in the data that supports these patterns.

The key methods explored extensively throughout are (1) identification of the overall time span and individual frames, (2) quantitative research principles, (3) qualitative research principles, (4) follower rise research, and (5) data research, or contextual background research. The quantitative research principles cover aspects such as the content origin diversity ratio (or tweets-to-retweets ratio) and quantitative classifying principles (classification based on replies, hashtags, URLs, et cetera); the qualitative research principles cover content popularity analysis (the importance of retweets and favourites for the account’s own content) and qualitative classifying principles (classification based on the content’s tone, semantics, or whether a tweet is on or off topic in the grand scheme of the account’s content).
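As an illustration of the quantitative principles, the sketch below computes a tweets-to-retweets ratio and simple class counts from a list of tweet texts. It is a hedged sketch under stated assumptions: the exact calculation of the ratio is not specified here, so it is taken as the number of the account's own tweets divided by its retweets, and the classes (retweet, reply, hashtag, URL) are detected with simple text heuristics. All names are hypothetical.

```python
import re
from collections import Counter

HASHTAG_RE = re.compile(r"#\w+")
URL_RE = re.compile(r"https?://\S+")


def is_retweet(text):
    # Assumption: retweets are recognised by the conventional "RT @" prefix;
    # data collected through an API usually marks retweets explicitly instead.
    return text.startswith("RT @")


def origin_ratio(texts):
    """Own tweets divided by retweets (infinite when there are no retweets)."""
    retweets = sum(is_retweet(t) for t in texts)
    own = len(texts) - retweets
    return own / retweets if retweets else float("inf")


def classify_counts(texts):
    """Count tweets per quantitative class; one tweet may fall into several."""
    counts = Counter()
    for text in texts:
        if is_retweet(text):
            counts["retweet"] += 1
        if text.startswith("@"):
            counts["reply"] += 1
        if HASHTAG_RE.search(text):
            counts["hashtag"] += 1
        if URL_RE.search(text):
            counts["url"] += 1
    return counts


# Hypothetical tweet texts for illustration only.
tweets = ["RT @a: hello", "my own tweet #tag", "@b thanks http://t.co/x", "another own tweet"]
print(origin_ratio(tweets))  # 3.0
print(classify_counts(tweets))
```

A high ratio would suggest an account that mostly produces its own content; a low one, an account acting largely as an aggregator of others' tweets.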

Over the course of this methodology, the hype patterns and investigatory methods are discussed extensively, both as independent entities and through their juxtaposition. While different research techniques and analytical angles are also noted throughout, these patterns and methods form the frame around which a Twitter-hype analysis should be built.


1.4 Practical approach

Behind the methodology

As I have argued, the methodology covered here aims to be as comprehensive as possible. As such, the starting point for investigating a Twitter-hype is not a given; there is much leeway in choosing the subject of analysis. As was also stated, this methodology works best for somewhat contained subjects; the case study briefly covered in this thesis concerns one single account, and limits its analysis to investigating its origin and diffusion. Vasterman’s theoretical framework for the media-hype includes a set of dynamics: patterns that emerge from the analysis of the hype (“Media-Hype” 513). The dynamics identified are “positive feedback loops; a key event; the news theme; lowering of news thresholds; interactive media momentum; and the eventual decline of the news wave” (513-15). On Twitter, counterparts of these dynamics can be identified quickly. For instance, positive feedback loops such as the retweet function are indications of popular content. The key event and news theme could arise in the form of a viral account or hashtag, which may come to the researcher’s attention in any number of ways. It is not the intention of this methodology to guide the researcher to a solid subject for a Twitter-hype analysis, but to provide the tools and techniques for analysing an account that has experienced a significant burst of attention, e.g. in the form of excessive retweets or even media attention. After following the methodology, the researcher should be left with an account of why the chosen subject is a Twitter-hype and which characteristics of this individual hype can contribute to a greater understanding of hype methodology and analysis.
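Retweet and favourite counts are the most direct trace of these positive feedback loops in a dataset. A minimal sketch of a content popularity ranking follows; the record type, its field names, and the equal weighting of retweets and favourites are assumptions made for illustration, not prescriptions of the methodology or of any Twitter API schema.

```python
from typing import NamedTuple


class Tweet(NamedTuple):
    # Hypothetical minimal record; field names are illustrative only.
    text: str
    retweets: int
    favourites: int
    is_retweet: bool = False


def most_popular(tweets, n=3):
    """Rank the account's own tweets by combined retweets and favourites."""
    # Retweets of other users' content are excluded, since their counts
    # reflect the original tweet rather than the account's own content.
    own = [t for t in tweets if not t.is_retweet]
    return sorted(own, key=lambda t: t.retweets + t.favourites, reverse=True)[:n]


# Hypothetical records for illustration only.
sample = [
    Tweet("first", 10, 5),
    Tweet("second", 100, 50),
    Tweet("third", 1, 0),
    Tweet("RT @x: borrowed", 1000, 1000, is_retweet=True),
]
print([t.text for t in most_popular(sample, n=2)])  # ['second', 'first']
```

A researcher might arguably prefer to rank on retweets alone, since retweets redistribute content to new audiences and are thus the more direct feedback loop.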

Structuring the Twitter-hype methodology

The methodology is divided into different sections, respectively covering data collection, research, analysis, considerations, and a report structure.

The section on data analysis is arguably the most important part of this methodology, as the methodology’s prime contribution to the field lies in the prospect of high-quality research reports. In this section, the conditions for ‘a Twitter-hype’ are made abundantly clear through its focus on important hype analysis techniques. The analytical strategies covered in this section are greatly inspired by Vasterman’s work on the media-hype, as well as his joint work with Nel Ruigrok on the “Pandemic Alarm in the Dutch Media” (436). As also covered in the previously mentioned early Twitter-hype thesis (Bovelander 2014), the data research procedures include “time framing [... and] classifying principles” (9), but also techniques for the analysis of miscellaneous content, narrative reconstruction techniques, and other thoughts on procedure design.

The section on data research covers the other part of the hype, namely the attention surrounding the subject. As such, it has strong ties with other forms of data research, as it includes techniques for finding information that enriches the picture of the hype; covered are aspects such as alternative media reports, off-platform interactions with the hype, and conducting follower rise research adequately.

The section on data collection covers the principles of Twitter-hype research, by dictating what features and characteristics of the platform should be assessed, and how they can be collected and compiled.

The section on data considerations is an invaluable part of the overall Twitter-hype analysis. As the proposal for this methodology originated from a rather critical outlook on data research in general, a section on considerations proved necessary. The more critical approach to hype analysis was also inspired by Rogers, who asks in the introduction of Digital Methods whether “a particular hashtag, and its set of most retweeted tweets, [organises] a compelling account of an event” (1). As such, this section includes a more ‘philosophised’ and theoretically speculative view of the possible deficiencies of the Twitter-hype methodology. It aims to make researchers following this methodology more critical of their data, and to make them question the outcome of their analysis. It does so by at times proposing, even hypothetically, where data might fall short of expectations or even of ‘the truth’, a line of thought that is continued in the discussion section.

The paragraph finishes with a proposed research report outline, which includes all the methodological elements covered. It can be considered an outline, or a reference sheet, that makes the steps and methods abundantly clear. It also serves as a more ‘natural’ transition into the case study on the @PharrellHat hype that follows the paragraph.

It should also be noted that, as this methodology is already quite extensive, the choice was made not to include any extensive case studies, save for a modest analysis of the viral @PharrellHat account (covered in paragraph 3). As was previously mentioned, a Twitter-hype analysis was envisioned earlier in the form of a Bachelor’s thesis. This methodology greatly extends the research practices of that approach, putting heavy focus on the proposal of new methods, overall clarifications, academic justifications, and the positioning of such research within media studies; nevertheless, the @SochiProblems case study is referred to at times. The study of @PharrellHat should therefore not be seen as a definitive improvement on the @SochiProblems investigation, although it certainly touches on other aspects of the hype and makes a different assessment of the characteristics surrounding this Twitter-hype. Overall, the methodology presented here is in many ways an improvement upon the older work, but the @SochiProblems analysis is still highly relevant because of its links with elements in the methodology and the great care that was put into that case study.3

Finally, it should be stressed that while this thesis is both a methodological outlook, much like Rogers’ Digital Methods, and a theoretical qualitative investigation into hyped content, much like Vasterman’s “Media-Hype”, it does not seek to provide a singular answer as to what Twitter-hypes can tell us. If anything, the methodology proposed here demonstrates that there is no definitive or all-encompassing answer to this question, although it is a very interesting one and certainly worth exploring in the future. The aim is to present a methodology that allows the analysis of a Twitter-hype, regardless of its nature, and to show how the characteristics of hypes can be brought back to an overall claim. This methodology provides both an outlook and a set of methods; it balances between offering tools and practices for studying, analysing and characterising a Twitter-hype, and contributing to the media studies field by partaking in (and highlighting) theoretical discussions of the merits of qualitative Twitter content analysis. As such, its goal and contribution are to encourage research and thought on what a (sudden, isolated, and coherent) burst of Twitter activity can tell us, approached from this Twitter-hype methodology.


2. The Twitter-hype methodology

2.1 Data analysis

‘When is a hype a hype?’

In essence, the methodology proposed here is meant for investigating Twitter content that has proved itself the subject of significant attention, contextualising this data with other reports, and using both qualitative and quantitative research procedures to construct a narrative for the topic. Various conclusions can be drawn from this narrative, for instance whether or not the researched topic can be characterised as a full-blown Twitter-hype, and which elements of the hype can be identified as its key characteristics.

In “Media-Hype”, Vasterman identified a set of criteria for the identification of a media-hype: “a key event; a consonant news wave; a sudden increase in reports on comparable cases; and a strong rise of thematically related news” (516). The latter two criteria are perhaps less likely to be evident in Twitter-hypes, because the analysis of one account (such as @PharrellHat) or one hashtag will most likely result in a very singular thematic structure. Then again, deviation from the topic might occur only in the form of retweets made by the account; related news may become obvious through associated hashtags or constant acknowledgement of the related topic within the content. The identification of the key event could also require more in-depth research; the unifying theme is not necessarily the key event, as the theme could also be a natural progression from the key event, and perhaps even the result of the hype’s own self-reinforcing nature. The consonant news wave could be the reason for investigating the topic in the first place, e.g. when the researcher’s interest is piqued by the topic’s continued appearance on Twitter, or by its prominent presence elsewhere on the web. In researching the topic, the consonance of the news wave should be an important matter of concern. Investigating whether the account or hashtag under study has remained a consistent theme can provide many insights and considerations: why the theme could have changed, how this deviation changes the analysis and outcome of the research, and what kind of reaction to this change can be seen in the data. To maintain theoretical consistency, the criteria defined by Vasterman play a similarly important role in this methodology when considering whether the investigated topic is a true Twitter-hype, or which of its characteristics are indicative of one.

Research analysis procedures

As demonstrated in the earlier sections, the analysis of Twitter-hypes has several vital components: a set of specific analyses (content quantity analysis, content quality analysis, context analysis, and follower rise analysis), research structure characteristics (a specific narrative, research methods and procedures design, and conclusive assessment), and an analysis of the accompanying result characteristics (in which the results are contextualised to make precise demonstrative statements). This section covers each of these components in turn. As in the previous sections, there is again a heavy focus on account analysis; hashtag analysis is also covered, albeit again in a more reserved fashion.

However, the first procedure to be considered is the establishment of a time span4. While the contribution of this methodology in general is the careful analysis of Twitter activity, the backdrop of the Twitter-hype analysis calls for a more detailed analysis of how these surges of attention develop. Any hype undoubtedly has a kick-off, often in the form of a key event. As the topic selected for such research has already been decided on by the investigator, and the dataset has already been collected, this initial start is often fairly straightforward: it is the first tweet or piece of content from the account surrounding this topic. For instance, the @SochiProblems account was created specifically to cover that subject; its first tweet is easily identifiable as the start of the time span (Bovelander 8). The same is true for the @PharrellHat hype: the start of the timeline is marked by the first occurrence of content for that account. But while the start of the timeline might be straightforward, finding a proper justification for dividing it into frames requires more effort. In “Pandemic Alarm in the Dutch Media”, the timeline is divided into three time frames, based on an “alarm [, a] preparatory [, and a] crisis” stage (440): a division based on a combination of news waves and key events. For the analysis of @SochiProblems, the division was based on a combination of content quantity and follower rise. In both cases, the division was based on specific stages in the overall timeline: the former focused on three waves of rise and decline (Vasterman and Ruigrok 440); the latter on one point of significant rise and decline, with the three stages being ‘start to initial rise’, ‘initial rise to peak’, and ‘peak to stagnation’ (Bovelander 225). As such, it can be argued that the division into time frame stages can be based either on initial data research or on eventual data analysis. In the latter case, the only vital attribute this division (or identification) depends on is consistency with the hype methodology: what is specifically hype-related about the time span and frames? Across the vital components covered below, every analysed aspect will also focus on what is indicative of a hype -- as well as on what the analysis in itself can contribute to media studies research.

4 Note that this methodology uses the term ‘span’ for the overall time, whereas ‘frame’ is used to
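The three-stage division used for @SochiProblems (‘start to initial rise’, ‘initial rise to peak’, ‘peak to stagnation’) can be approximated from daily follower gains when the division is based on data analysis. The following is a minimal Python sketch, not the procedure used in the case studies: it assumes the researcher already has a date-sorted list of daily follower gains, and `rise_threshold` (the daily gain that counts as an ‘initial rise’) is a hypothetical parameter that the researcher would need to justify.

```python
from datetime import date


def three_stage_frames(daily_gains, rise_threshold):
    """Split a span into three stages from (date, new_followers) pairs.

    Assumptions: daily_gains is sorted by date; the 'initial rise' is the
    first day whose gain reaches rise_threshold; the 'peak' is the day with
    the largest gain. Adjacent frames share their boundary dates.
    Returns None if no day reaches the threshold.
    """
    start = daily_gains[0][0]
    end = daily_gains[-1][0]
    rise = next((day for day, gain in daily_gains if gain >= rise_threshold), None)
    if rise is None:
        return None
    # max() returns the first day with the largest gain; it can never precede
    # the rise day, because that largest gain also reaches the threshold.
    peak = max(daily_gains, key=lambda item: item[1])[0]
    return [
        ("start to initial rise", start, rise),
        ("initial rise to peak", rise, peak),
        ("peak to stagnation", peak, end),
    ]


# Invented example data, for illustration only.
gains = [
    (date(2020, 1, 1), 5),
    (date(2020, 1, 2), 8),
    (date(2020, 1, 3), 500),
    (date(2020, 1, 4), 2000),
    (date(2020, 1, 5), 300),
]
for label, first, last in three_stage_frames(gains, rise_threshold=100):
    print(label, first, last)
```

A threshold rule is only one possible choice; whichever rule is used, the resulting division still has to be justified by what is hype-related about it.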

Content quantity analysis

Analysing content purely in terms of its quantity forms the basis of this analysis. The collected Twitter data is arguably very accurate (containing very little spam or irrelevant content); the previously mentioned unifying news theme should be present throughout the data set. Not every numerical value in the data set is a quantitative measure in this sense: the retweets and favourites (literal ‘counts’) that a tweet has collected are arguably more indicative of quality. The truly quantitative elements of the data are those that need to be analysed before they mean something: the content origin diversity (or ‘tweet-to-retweet ratio’) and the classification of content are of particular importance for Twitter-hype analysis. Admittedly, it is first necessary to analyse the context in which the hype takes place -- content quantity is arguably more easily analysed and investigated when the researcher is already aware of the ‘bigger picture’, as it were.

Within the isolated news wave that the account represents in such an investigation (or, if executed properly, hashtag research), it is key to identify both the time spans and a possible framing of the Twitter-hype. For @SochiProblems and @PharrellHat, the time spans were based on the periods of significant follower rise in combination with the placement of the most popular content. When considering that “virtual worlds, by their nature capturing a complete record of individual [behaviour], offer ample opportunities for research”, it is important to “understand the impact” of any characteristics, factors, and structure as variables (Lazer et al. 3). Structures such as a possible encompassing time span are crucial to investigate further. In the cases of @SochiProblems and @PharrellHat, the time spans were identified by combining the results from the data analysis with contextual background research. Of equal importance is the identification of a larger span, in order to find the context behind the hype and to investigate which aspects of the account’s popularity can be attributed to factors ‘outside’ the hype: for one case, this proved to be the span of the Olympic Games; for the other, the span of the Grammys live event. It is also possible that an account becomes the centre of such significant bursts of attention that it creates its own specific news span. In the case of the hashtag-hype #PGBAlarm, the hashtag led to official enquiries being made in ‘de Tweede Kamer’ (the Netherlands’ equivalent of the House of Representatives). The use of the hashtag launched an initial hype, but due to its backlash and sensitive nature, the Ministry of Public Health saw fit to respond to the hype publicly (Van der Velden, “#PGBAlarm”), which resulted in another wave of significant attention. Since then, the topic has sporadically resurfaced as a news item, and even in May the hashtag was still being used daily, as it had been since its first occurrence in late January. Ergo, it is of great importance to analyse when an account or hashtag is initiated, when its content is most popular, and any notable changes in the hype’s progress.

One of the first steps is to investigate the content origin diversity ratio (which can also be called the tweet-to-retweet ratio), i.e. the balance between the tweets and retweets that make up the account’s content. As Twitter uses the same term (‘retweet’) both for a content item (‘a retweet of a tweet posted by another user’) and for its function (‘a tweet amassed a number of retweets’), it is important to make it abundantly clear which is being analysed. In this analysis component, the content ratio can help assess whether the account depends on its own content or acts more as a gatekeeper of other (popular) content. Gatekeeping is “the process by which selections are made in media work, especially decisions whether or not to admit a particular news story to pass through the “gates” of a news medium into the news channels” (McQuail 213); for a Twitter-hype account, it would present itself as a higher proportion of retweets. Of the tools mentioned in the section on data collection (see 2.3), both the Python script and the DMI-TCAT can output CSV files6. In these files, a retweet is indicated by the appearance of “RT” in the tweet text, followed by the “@username” of the account that originally posted the tweet. It is also possible that a retweet was posted manually, by copy–pasting the text and adding “RT”: in this instance, the tweet should be counted as own content. The origin of each tweet is easily seen in the results, as the screen_name (“the screen name, handle, or alias that this user identifies themselves with”) and the id_str (“the string representation of the unique identifier for this User”) are gathered by both tools (Twitter, “Users | Twitter Developers”).
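The tweet-to-retweet count can be sketched in Python on top of such a CSV export. This is a minimal sketch under stated assumptions: the tweet text sits in a column named `text` (the actual column name depends on the tool and should be checked), and a retweet is any text beginning with “RT @username”. A text pattern cannot tell a native retweet from a manual copy–paste retweet, so such cases still need the manual check described above.

```python
import csv
import io
import re

# Assumption: retweets in the export begin with "RT @username".
RETWEET_PATTERN = re.compile(r"^RT @(\w+)")


def classify(text):
    """Return ("retweet", original_author) or ("tweet", None)."""
    match = RETWEET_PATTERN.match(text.strip())
    if match:
        return "retweet", match.group(1)
    return "tweet", None


def origin_counts(rows, text_column="text"):
    """Count own tweets and retweets in an iterable of CSV row dicts."""
    counts = {"tweet": 0, "retweet": 0}
    for row in rows:
        kind, _ = classify(row[text_column])
        counts[kind] += 1
    return counts


def tweet_to_retweet_ratio(counts):
    """Own tweets per retweet; infinite if the account never retweeted."""
    if counts["retweet"] == 0:
        return float("inf")
    return counts["tweet"] / counts["retweet"]


# Invented sample export; in practice, open the CSV file with newline="".
sample = "text\nA tweet of our own\nRT @example_user: their tweet\nAnother own tweet\n"
counts = origin_counts(csv.DictReader(io.StringIO(sample)))
print(counts, tweet_to_retweet_ratio(counts))
```

Keeping the original author returned by `classify` also makes it possible to see which accounts the hype-account retweeted most, which is useful for the gatekeeping question.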

The necessary classifying principles are based directly on the combination of these two investigations; analysing the content in the context of a media-hype, live event, or any other (news) span can lead to a greater understanding of the tweets and their popularity. The classification can be based on the different time frames identified within the overall span; the analysis should focus on characterising the activity surrounding the content for both the individual time frames and the overall span. Firstly, the tweet-to-retweet ratio should be ascertained for every frame. Differences in the ratio between time frames can either be attributed to a change within the time span (e.g. the emergence of an important actor within the debate, or of another possible hype account), or have to be analysed for qualitative values (methods for which are discussed later on). Of equal importance is the overall size of the data set, and how it is distributed across the time frames and span. For instance, if the account generates most of its content over a short period or early in the overall span, it can be argued that the hype is most likely centred around a live event or recent happening (again, qualitative assessment should make the reasons quite clear). The quantitative side of the content is also important for contextualising any findings, as it reports on how the hype progressed in terms of ‘just’ data. As such, it is important to find the structure within the content, for instance by investigating how much content there is and when it was generated. Through the framing, any rise and decline in post quantity can be rationalised or contextualised as significant; an important research procedure could be to investigate when and why the account was most prolific, and when and why its content consisted more of own tweets or of retweeted content from other accounts.
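Ascertaining the ratio and the amount of content per frame amounts to grouping the classified content by date. A minimal sketch, assuming each content item has already been reduced to a `(date, kind)` pair (with `kind` either "tweet" or "retweet") and that frames are given as hypothetical `(label, start, end)` tuples with an inclusive start and an exclusive end:

```python
from datetime import date


def counts_per_frame(items, frames):
    """Count tweets and retweets per time frame.

    items: iterable of (date, kind) pairs, kind being "tweet" or "retweet".
    frames: list of (label, start, end); start inclusive, end exclusive.
    Items outside every frame are ignored.
    """
    result = {label: {"tweet": 0, "retweet": 0} for label, _, _ in frames}
    for day, kind in items:
        for label, start, end in frames:
            if start <= day < end:
                result[label][kind] += 1
                break
    return result


# Invented frames and items, for illustration only.
frames = [
    ("frame 1", date(2020, 1, 1), date(2020, 1, 3)),
    ("frame 2", date(2020, 1, 3), date(2020, 2, 1)),
]
items = [
    (date(2020, 1, 1), "tweet"),
    (date(2020, 1, 2), "tweet"),
    (date(2020, 1, 2), "retweet"),
    (date(2020, 1, 10), "retweet"),
    (date(2020, 3, 1), "tweet"),  # outside both frames
]
for label, counts in counts_per_frame(items, frames).items():
    total = counts["tweet"] + counts["retweet"]
    print(label, counts, "total:", total)
```

Comparing each frame’s total with the overall total gives a first indication of whether content was concentrated early in the span or around a live event.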

Content quality analysis

The quantitative analysis of the content is arguably the structure of the hype: a rigid shape formed by a bulk of content that proved popular -- when contextualised and analysed, a physical pattern of this popularity should be apparent within this structure. In order to characterise this pattern, it is necessary to analyse the content’s qualitative values. Similarly, the qualitative analysis of the content is arguably the tone of the hype: very specific characteristics that need to be identified to understand the details behind the popularity of the content. As such, the content quality analysis focuses on aspects such as the retweets and favourites that an account’s own content generated, and on specifying a number of different principles for classifying the content.

An important feedback loop for Twitter content takes the form of the retweet and favourite functions. In the section on data collection it is stated that these are of lesser importance for retweeted content than for the account’s own content. While this is perhaps put somewhat bluntly, the reasoning behind it is logical: retweeted content has been exposed to a significantly different environment than own content. The tweet was not sent by the same account, could have been exposed to a significantly different group of followers, and could break with identified classifications, e.g. in terms of style. Instead, the popularity of these retweets should be considered in the context analysis, where there is less emphasis on precise data and more on what this extra information can tell us. In order to gain a clear sense of which content is most popular, it is important to structure this data, as tables or figures, in (at least) these three ways: (1) the total of retweets and favourites; (2) the average number of retweets and favourites; and (3) a top ten list (or equivalent ‘top-hashtags-list’) of the most popular tweets, for both retweets and favourites. As in the previous section, there should be a strong focus on the different time frames within the overall span, so the same three analyses should be applied to any time frames that were identified. However, it is also possible that contextual and quantitative analyses have not (yet) resulted in a division of the overall span into specific time frames. In such an instance, the outcome of this analysis might itself be considered indicative of important spans or events. In the case study on @PharrellHat, a second key event was identified when the overall retweets and favourites data was visualised as a graph that showed a notable spike. Upon inspection, the spike turned out to follow a key event -- and the popularity could also be seen in the follower rise analysis.
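The three structures listed above (totals, averages, and top ten lists) can be produced with a few lines of Python. A minimal sketch, assuming the account’s own tweets (retweets excluded) are available as dicts with hypothetical `id`, `retweets`, and `favourites` keys that would need to be mapped from the export actually used:

```python
def popularity_summary(tweets, top_n=10):
    """Totals, averages, and top-n lists for the account's own tweets.

    tweets: list of dicts with keys "id", "retweets", and "favourites"
    (hypothetical field names; map them from the export being used).
    """
    n = len(tweets)
    totals = {m: sum(t[m] for t in tweets) for m in ("retweets", "favourites")}
    averages = {m: (totals[m] / n if n else 0.0) for m in totals}
    top = {
        m: sorted(tweets, key=lambda t: t[m], reverse=True)[:top_n]
        for m in totals
    }
    return {"total": totals, "average": averages, "top": top}


# Invented tweets, for illustration only.
own_tweets = [
    {"id": "1", "retweets": 120, "favourites": 80},
    {"id": "2", "retweets": 30, "favourites": 90},
    {"id": "3", "retweets": 0, "favourites": 10},
]
summary = popularity_summary(own_tweets, top_n=2)
print(summary["total"], summary["average"])
print([t["id"] for t in summary["top"]["retweets"]])
```

To apply the same three analyses per time frame, the tweets can first be filtered by frame and the function called once for each subset.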
Visualisations, then, can not only support and clarify the investigation’s results and argumentation, but also serve an analytical function. It can be stated that grouping data in graphs, tables, spreadsheets, and charts is a highly worthwhile activity. However, the key condition for these visualisations is that they serve the report: the analysis should not be limited to just findings, but should instead serve the report’s overall research claim -- “do not include a table or graph unless it is discussed in the report” (Rudner and Schafer).
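The analytical use of a graph, such as the spike that revealed the second key event for @PharrellHat, can be complemented by a simple numerical check. The sketch below is a swapped-in heuristic, not the method used in the case study: it flags any day whose combined retweets and favourites exceed `factor` times the median of the preceding `window` days. Both parameters are hypothetical and would have to be tuned to the data set.

```python
from statistics import median


def flag_spikes(daily_totals, window=3, factor=3.0):
    """Return the labels of days that stand out from the preceding days.

    daily_totals: list of (label, value) pairs in chronological order,
    where value could be a day's retweets plus favourites.
    """
    spikes = []
    for i in range(window, len(daily_totals)):
        previous = [value for _, value in daily_totals[i - window:i]]
        baseline = median(previous)
        label, value = daily_totals[i]
        if value > factor * baseline:
            spikes.append(label)
    return spikes


# Invented daily totals, for illustration only.
series = [("day 1", 10), ("day 2", 12), ("day 3", 11), ("day 4", 9),
          ("day 5", 10), ("day 6", 80), ("day 7", 12)]
print(flag_spikes(series))
```

A flagged day is only a prompt for contextual research; as in the case study, the spike still has to be explained by inspecting what happened around it.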

(24)

Unlike the approach taken for quantitative analyses, the classification principles for qualitative analysis are mostly based on subject matter. One of the key classifications to consider is whether the content is ‘on-topic’ or ‘off-topic’. As Vasterman notes, because of the pressure of popularity surrounding these hypes, “news thresholds” will be lowered (“Media-Hype” 514), resulting in the production of content that might not be a reaction to the actual hype, or in the same tone as the rest of the content, but rather a reaction to the popularity surrounding the hype. It is for the researcher to assess whether the content (both tweets and retweets) is in line with the other content items. The classification of content can also be performed at a more recognisable level, for instance through the ratio of stand-alone tweets to interactions with other Twitter accounts (so-called @-replies), or the ratio of text-only content to content with images. It is also possible to produce a different classification based on a linguistic or semantic format, as was done for @SochiProblems (Bovelander 10-11), and to use this in a qualitative analysis to ascertain which kinds of tweets, such as content in a specific tone (Bovelander 19-21; 25-27), are most popular. If “linguistic change [...] has cultural roots” (Michel et al. 176), it follows by similar logic that web-specific (or even Twitter-specific) linguistic characteristics have cultural roots that are telling for the ‘culture’ of the web medium. While linguistic analysis is admittedly difficult and could leave a lot of room for interpretation, the semantic structure of a content item can also be understood as the inclusion of other elements, such as memes, Internet jargon, or other cultural references. As “data sets [...] offer some qualitatively new perspectives on collective human [behaviour]” (Lazer et al. 3), it is equally important to translate the ‘most popular class’ within a set of classifications into a meaningful conclusion for the whole data set.
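Finding the ‘most popular class’ can be sketched as averaging a popularity count per class. A minimal sketch, assuming each own tweet is a dict with `text`, `retweets`, and a hypothetical boolean `has_media` field (how images are flagged differs per export), and treating any text that begins with “@” as an @-reply. Manually coded classes, such as tone, could be plugged in the same way through another classifying function.

```python
from collections import defaultdict


def reply_class(tweet):
    """Stand-alone tweet or @-reply (assumed: replies start with '@')."""
    return "@-reply" if tweet["text"].lstrip().startswith("@") else "stand-alone"


def media_class(tweet):
    """Text-only or with image, based on a hypothetical 'has_media' flag."""
    return "image" if tweet.get("has_media") else "text only"


def average_by_class(tweets, classify, metric="retweets"):
    """Average of `metric` for each class produced by `classify`."""
    sums = defaultdict(int)
    counts = defaultdict(int)
    for tweet in tweets:
        cls = classify(tweet)
        sums[cls] += tweet[metric]
        counts[cls] += 1
    return {cls: sums[cls] / counts[cls] for cls in sums}


# Invented tweets, for illustration only.
tweets = [
    {"text": "Hat joke", "retweets": 300, "has_media": True},
    {"text": "@someone thanks!", "retweets": 4, "has_media": False},
    {"text": "Another hat joke", "retweets": 100, "has_media": False},
]
by_form = average_by_class(tweets, reply_class)
most_popular = max(by_form, key=by_form.get)
print(by_form, most_popular)
```

Averages rather than totals are used here so that a class is not judged more popular merely because it contains more tweets.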

Context analysis

This last point, that it is key to explain what the popularity of one particular category of tweets means for the entire researched hype, is reflected in another methodological strategy in this analytical toolkit: context analysis. Simply stated, all conclusive findings based on well-supported methods should be contextualised in light of the Twitter-hype in which they appear. This presents the opportunity to make definitive claims about the Twitter-hype (and its methodology) in general, and to contribute to its understanding within the research field of media studies. Bearing in mind that “methodological and theoretical issues are supremely important when it comes to studying a complex communication system such as Twitter” (Rieder, “The Refraction Chamber”), the researcher should strive to construct further methods of analysis that could yield substantial knowledge on how Twitter-hypes behave, and on how this methodology can be extended. In a way, results should be questioned -- much like this methodology is based on a questioning of data. The ‘questioning’ of data, which is “closely related to research methods”, ostensibly results in data-driven research -- and for data-driven research, “methods are most valuable when they enable scholars to ask new questions in new ways” (Borgman 19). Context analysis should not be limited solely to the results, but is also a technique for analysing miscellaneous content. In the (upcoming) case study of @PharrellHat, initial background research led to the discovery of an extensive blog post: a retrospective account by the initiator of the Twitter handle, which gave vital information for establishing the time frames within the span, as well as the context behind some of the account’s content and popularity.

Context analysis: follower rise analysis

A prime example of a popularity aspect that requires thorough context analysis is the follower rise: it is perhaps one of the most challenging aspects to research. As the follower history is neither a built-in feature of the platform nor retrievable through the Twitter API (which only reports an account’s current follower count), research into follower rise most likely has to be done manually. While the hype is still happening, one could log the follower count daily in a spreadsheet. For account hypes that lie further in the past, research into follower rise can be conducted on three different levels: (1) within the Twitter account’s own tweets; (2) within other Twitter accounts’ tweets; and (3) by studying off-platform content. As the popularity of the account and the topic or event rises, it is logical to assume that the number of users following the account will rise too. In many cases, either the account itself or another account will actively keep an eye on the follower count7; the follower count almost resembles a reward model of some kind, with users paying excessive attention to higher numbers. Followers have become a literal commodity, with a multitude of companies, such as Devumi8, FastFollowerz9, or TwitterBoost10, offering varied numbers of followers for a one-time flat fee or another form of payment.

It is also possible that the follower count is monitored off-platform. In the case of @SochiProblems, the account became an unwitting rival of the official Olympic Games Twitter account -- and as soon as the mock account garnered more followers than the official account, multiple news articles were released, often detailing the exact follower count at the moment of writing (Bovelander 17-29). Accordingly, all articles that included the account name and the query “followers” in either title or body were collected using specific search engine queries. This was combined with a Twitter site search for the queries “@SochiProblems”/“SochiProblems” and “followers”/“following”/“follow”, which resulted in many instances of users reporting the follower count. When the results were put together and visualised in a graph (save for a few findings that conflicted with other data, such as belated retweets), this form of data research resulted in a fairly accurate representation of the follower build-up. The same research, visualisation, and analytical method was used in the case study of @PharrellHat, which further supports the argument that for highly popular Twitter accounts, even challenging data can still be found and included in the qualitative and quantitative analyses. While it can prove quite difficult to gather a representative progression of the follower count, it has proved possible if the hype is substantial enough.

If the researcher is technologically adept enough, it is possible to extend the Python script11 to retrieve the account’s “followers_count” and to run it with a Python scheduling library, such as the Advanced Python Scheduler12 -- simply put, this prompts the Python script to gather the follower count automatically, e.g. every other hour or day. Admittedly, this is only an option for the investigation of a (potential) hype that is still going on, and still in its early phases. Follower rise is of importance for the investigation of Twitter-hypes, as “[follower] relationships are usually based on a longer-term interest in updates from the followee” (Bruns and Burgess 2). This longer-term interest can easily be translated into a willingness to know more about the hype, or into a feedback model for the account, as it is both (1) an indication that the account is popular and that users like its content, and (2) a ‘push’ for the account to continue creating content. As such, it is also possible to base the time frames within the overall span on the rise (and potential falls or stagnation) of the follower count. It is similarly a relevant exercise to contextualise popular content (through the retweets and favourites counts) with the follower rise. Follower rise analysis also offers the researcher the opportunity to claim that the investigated account is truly a Twitter-hype, e.g. when strong initial exponential growth is evident in the follower graph.

7 As can for instance be seen from the Twitter results for “just passed ... followers”:

8 http://devumi.com/twitter-followers/ 9 https://www.fastfollowerz.com/ 10 http://twitterboost.co/

11 See appendix A, “get_tweets.py”.

12 Advanced Python Scheduler. Eds Alex Grönholm et al. APScheduler. 9 March 2015. Web. Accessed
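Once follower counts have been gathered from tweets and articles, the reports can be cleaned and turned into growth figures. A minimal sketch, assuming observations are `(date, follower_count)` pairs and treating any report lower than an earlier, higher one as stale (e.g. a belated retweet); that rule only holds during a rise and is an assumption, not the procedure used in the case studies. A roughly constant daily growth factor above 1 across consecutive observations would point to the exponential growth mentioned above.

```python
from datetime import date


def clean_observations(observations):
    """Sort (date, follower_count) reports and drop stale ones.

    Same-day reports keep the highest count. Assumption: during a rise,
    a count lower than an earlier, higher count is a stale report
    (e.g. a belated retweet) and is dropped.
    """
    best_per_day = {}
    for day, count in observations:
        best_per_day[day] = max(count, best_per_day.get(day, count))
    cleaned = []
    running_max = None
    for day in sorted(best_per_day):
        count = best_per_day[day]
        if running_max is None or count >= running_max:
            cleaned.append((day, count))
            running_max = count
    return cleaned


def daily_growth_factors(cleaned):
    """Average per-day growth factor between consecutive observations."""
    factors = []
    for (day0, count0), (day1, count1) in zip(cleaned, cleaned[1:]):
        days = (day1 - day0).days
        if count0 > 0 and days > 0:
            factors.append((day1, (count1 / count0) ** (1 / days)))
    return factors


# Invented follower reports, for illustration only.
reports = [
    (date(2020, 1, 2), 1000),
    (date(2020, 1, 1), 10),
    (date(2020, 1, 3), 4000),
    (date(2020, 1, 4), 900),  # stale: lower than the 4,000 already reported
    (date(2020, 1, 5), 8000),
]
cleaned = clean_observations(reports)
for day, factor in daily_growth_factors(cleaned):
    print(day, round(factor, 2))
```

Because reports arrive at irregular intervals, the growth factor is normalised per day, so that a gap of several days between reports does not inflate the apparent growth.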

‘When a hype is a hype’

Overall, all the specific methods covered in the case study’s methodology should serve the overall purpose of a Twitter-hype methodology: to assess whether or not the investigated account is a Twitter-hype. While all the methods mentioned above can be used to answer this question, they are also applicable to Twitter analysis that does not aim to identify and characterise a Twitter-hype. This was, in part, also the methodology’s goal: a multipurpose analytical toolset is not only easier to propose as a (new, whole) methodology, but is also a greater contribution to the research field. By reflecting on the methods individually before bringing them together, a researcher can -- regardless of the Twitter-hype identification goal -- take any of the techniques and use them for quantitative or qualitative analysis. Certainly, the methodology’s strong emphasis on strictly popular content narrows down the possibilities to some extent, though it can be argued that research into more unembellished or banal activity (Rogers, “Debanalizing Twitter” 2) requires an entirely different analytical tactic. An example of this is the “Mining One Percent of Twitter” research, in which a “one percent sample of all tweets posted during a 24-hour period” was analysed, regardless of content, origin, or popularity (Gerlitz and Rieder).

In order to support the claim that an investigated account is a true Twitter-hype, the previously covered methods should all indicate that the popularity of the account (in terms of retweets and favourites, and follower rise) was an extensive and accelerated event. This should be indicated by strong rises in certain time frames and lesser popularity in others, reviewed against the overall time span. For both @PharrellHat and @SochiProblems, the hypes were characterised by explosive growth in follower count and highly popular initial content, followed by stagnation on every possible front: the accounts generated less content, gained followers more slowly, and procured fewer retweets and favourites for their own content (both on average and overall). This is indicative of true hype behaviour -- the true Twitter-hype is a dynamic wave, “created by the self-reinforcing processes” (Vasterman, “Media-Hype” 527). A Twitter-hype reinforces itself by producing popular content or, at the most basic level, by continuing to produce content at all once the hype is ongoing. Another factor in the reinforcement of the hype is coverage of the account by alternative media sources: these can be found either in the data set, in the form of RT (retweeted) content, or through the contextual background research covered earlier.
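The pattern described here, a strong rise followed by stagnation in every measure, can be checked frame by frame. The sketch below is one crude, assumed operationalisation, not a test proposed in the source: a measure counts as hype-like if its peak falls before the last frame and no later frame reaches that peak again. The measure names and numbers are invented.

```python
def peaks_then_declines(values):
    """True if the series peaks before its last frame and never recovers.

    values: one number per time frame, in chronological order.
    """
    if len(values) < 2:
        return False
    peak_index = values.index(max(values))
    later = values[peak_index + 1:]
    return bool(later) and all(v < values[peak_index] for v in later)


def hype_profile(measures):
    """Apply the check to each measure, e.g. follower gain per frame."""
    return {name: peaks_then_declines(series) for name, series in measures.items()}


# Invented per-frame measures, for illustration only.
measures = {
    "new followers": [2000, 15000, 1200],
    "own tweets": [40, 25, 10],
    "average retweets": [150, 400, 30],
    "steady account": [10, 20, 30],
}
print(hype_profile(measures))
```

Such a check can only support, not replace, the qualitative and contextual judgement that the methodology asks for.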

These dynamics already match some of those in the set identified by Vasterman: positive feedback loops; a key event; the news theme; the lowering of news thresholds; interactive media momentum; and the eventual decline of the news wave (“Media-Hype” 513-15). The positive feedback loops can be identified in the most retweeted content: a tweet that is referenced over and over again, thereby becoming the most ‘important’ form of content. The key event is obvious in the data set -- in the form of content, a #hashtag, or sometimes even the account name -- and marks the starting point for the hype, e.g. the miserable conditions journalists encountered in Sochi for @SochiProblems, and Pharrell Williams sporting an eye-catching fashion accessory during the Grammys for @PharrellHat. While the event triggers the hype, the other dynamics shape its overall nature. The news theme is what follows: “it structures the hunt for newer news about the case” (Vasterman, “Media-Hype” 514; Brosius and Eps 395), and it is characterised by the tone that the Twitter-hype takes; whichever classifying theme proves most popular is likely to dominate the hype. The consonance of the news wave should be clear here; at least for one time frame, the account should focus on its actual theme and produce content. While @PharrellHat stayed true to its character during its most crucial time frame (the Grammys), @SochiProblems broke with its theme (problems at the Olympics) and changed its production of content significantly -- the latter stalled the hype, as is evident in the follower counts and the retweets/favourites averages for the content items. The sudden increase in reports on comparable cases should be analysed by looking at any replies and retweets the account made in regard to external content -- when other Twitter accounts become aware of the hype, they might seek to interact with the hype-account and/or post similar content. In the case of @PharrellHat, Arby’s was a major actor that released related ‘news’, joining in on the humorous content surrounding the hat and later even instigating a key event that resulted in another surge of attention for the hype-account. The use of these dynamics is covered in more depth in the @PharrellHat paragraph, which serves as further clarification of some of the dynamics covered here; it is also used to illustrate the proposed methods.


2.2 Data research

Contextual and background research

In addition to the Twitter data, other information is worth collecting in order to gain a clearer understanding of how a potential hype has behaved -- or, perhaps, is behaving. To approach the hype fully, it is necessary to conduct extensive research that either enriches or contextualises the Twitter data. Background research is crucial for gaining a clearer insight into the origin, evolution, and specifics of the topic, as the collected Twitter data alone is not sufficient for the analyses this methodology proposes. Relevant research into the hype should at least cover the initial investigation into the (potential) hype, further background information that could contextualise the event, and reports on the account’s follower rise. A form of contextual background research was already included in the previous section, as a method for investigating follower rise. In the analysis of @PharrellHat, the data showed an unusual spike in follower rise and content popularity (in the form of a significant boost in retweets and favourites for a tweet). Upon inspection, the context of this sudden, belated wave of popularity was easily explained and could even be considered a Twitter-hype indicator (see “The case of @PharrellHat”). During this contextual and background research, different kinds of news articles can be identified -- much like in “Media-Hype”, where Vasterman notes:

In order to classify the collection of news articles, different ‘layers’ of news were identified. The first distinction is the difference between incident-related news and thematically incident-related news. The former category is defined as factual reports about actual events: the key event and similar events. The latter is defined as reports that are not factual but only related to the central news theme in the construction: background articles, features, interviews, announcements, etc. During a media-hype it is expected that thematically related news will dominate very soon after the start of the news wave. (Vasterman, 521)

The same was obvious for the @SochiProblems hype, as a hashtag of the same name was already quite popular: the articles reporting on the early #SochiProblems centred on the initial main actors (the journalists who arrived at the scene early), while later articles reflected much more on the popularity of both account and hashtag than on the actual problems in Sochi. Both news layers can nevertheless provide valuable background or context information for the analysis and characterisation of the hype.


2.3 Data collection

Significance of, and essentials for, data collection

While collecting data is in itself quite straightforward, it can be argued that the method of dataset collection is of significant importance for multiple reasons. The overarching reason is that web data, like Twitter data, and content analyses, like a Twitter-hype methodology, are in conflict; their workings and principles are at odds. Twitter data can change continuously: users can delete their accounts; they can ‘clean up’ their user history by deleting data from their lists of retweets or favourites; usernames can be altered or switched, et cetera. As this data deteriorates, the validity and effectiveness of the data and the consequent content analysis are affected. Data research requires stability, at least in the sense that a dataset should not change during the course of the research analysis. A static dataset decreases the chance that data no longer corresponds: not because of a data anomaly worth investigating, but because of a faulty dataset and thus faulty research. A second reason for acquiring a proper data collection is to ensure that any future research will not suffer from the same problems: when the exact same dataset is used for different analyses, the comparisons of and juxtapositions between their outcomes will be more balanced and justifiable as a proper contribution to the initially conducted research. Thirdly, the justification of research is highly dependent on it being assessed by another party -- e.g. in the form of a review or examination. In sum:

[...] choices of data sources, research methods, and research problems are inextricably linked. Research methods in the sciences and in the humanities are becoming more data-driven. The key to “better” data – that is, data suitable for curation, reuse, and sharing – is capturing data as cleanly as possible and as early as possible in its life cycle. (Borgman 13)

To defend a particular dataset as indicative of the hype, the researcher should aim to acquire two datasets, collected on different dates. For the original @SochiProblems hype investigation, two datasets were gathered: one on March 1, 2014, and a second a month and a half later, on April 15. Analysis showed that there was “a high degree of [information] durability” (12), as the data losses for content, retweets, and favourites were 2.8%, 1.39%, and 1.32% respectively. Proving data durability (or disproving data decay) should be considered an important task when covering a still-operational platform such as Twitter.
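Durability figures of this kind compare two collections of the same account. The Python sketch below shows one way such percentages could be computed; it is an illustration under stated assumptions, not the calculation used in the original investigation. It assumes each collection maps tweet ids to dictionaries holding `retweet_count` and `favorite_count` (the field names used in Twitter API v1.1 tweet objects). Content loss is taken as the share of first-collection tweets missing from the second; retweet and favourite loss are taken as the relative drop in summed counts over the first collection's tweets, with a deleted tweet contributing zero. The function names are hypothetical, and the sample data is invented purely for illustration.

```python
def percent_loss(before, after):
    """Relative decrease from `before` to `after`, in percent (negative = gain)."""
    if before == 0:
        return 0.0
    return (before - after) * 100 / before


def durability_report(first, second):
    """Compare two collections of one account's tweets, keyed by tweet id.

    Only tweets present in the first collection are considered, so tweets
    posted after the first collection date do not mask any losses.
    """
    def total(collection, field):
        # Sum a count over the first collection's tweets; deleted tweets add 0.
        return sum(collection[tid][field] for tid in first if tid in collection)

    kept = sum(1 for tid in first if tid in second)
    return {
        "content": percent_loss(len(first), kept),
        "retweets": percent_loss(total(first, "retweet_count"),
                                 total(second, "retweet_count")),
        "favourites": percent_loss(total(first, "favorite_count"),
                                   total(second, "favorite_count")),
    }


if __name__ == "__main__":
    # Invented sample data: tweet 3 was deleted between the two collections.
    first = {
        1: {"retweet_count": 100, "favorite_count": 200},
        2: {"retweet_count": 50, "favorite_count": 50},
        3: {"retweet_count": 10, "favorite_count": 10},
        4: {"retweet_count": 40, "favorite_count": 40},
    }
    second = {
        1: {"retweet_count": 98, "favorite_count": 197},
        2: {"retweet_count": 50, "favorite_count": 50},
        4: {"retweet_count": 40, "favorite_count": 40},
    }
    report = durability_report(first, second)
    print({k: round(v, 2) for k, v in report.items()})
    # {'content': 25.0, 'retweets': 6.0, 'favourites': 4.33}
```

Restricting the comparison to the tweets of the first collection is a design choice: it keeps new activity on the account from hiding data decay among the tweets that were originally analysed.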

If a single account is investigated, data collection is relatively straightforward: all the necessary data (the content, its retweets, and its favourites) is neatly contained within that one account. It is of course also possible to investigate more than one account when analysing a Twitter-hype, but the point still stands. The content used in the analysis should consist of both tweets and retweets: the account likely generated its own content (the tweets), but also retweeted content posted by other accounts (the retweets). For the former, it is crucial to record the individual retweet and favourite counts, as these indicate the attention and popularity the account attracts. The retweet and favourite counts of the retweeted content are of lesser importance: although certainly worth investigating, these counts are exposed to too many unknown variables outside the investigated account.
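The distinction between an account's own tweets and its retweets can also be made programmatically. The Python sketch below assumes tweets are stored as dictionaries shaped like Twitter API v1.1 tweet objects, in which a retweet carries a `retweeted_status` field and the counts are held in `retweet_count` and `favorite_count`; the function names and sample records are hypothetical. In line with the argument above, it totals attention counts for the account's own tweets only.

```python
def split_timeline(tweets):
    """Separate an account's own tweets from the retweets it made."""
    own, retweets = [], []
    for tweet in tweets:
        # In v1.1-style tweet objects, a retweet embeds the original tweet
        # under "retweeted_status"; the account's own tweets lack this key.
        if "retweeted_status" in tweet:
            retweets.append(tweet)
        else:
            own.append(tweet)
    return own, retweets


def attention_summary(tweets):
    """Count tweets and retweets, and total the attention on own tweets only."""
    own, retweets = split_timeline(tweets)
    return {
        "own_tweets": len(own),
        "retweets_made": len(retweets),
        "retweet_count": sum(t.get("retweet_count", 0) for t in own),
        "favorite_count": sum(t.get("favorite_count", 0) for t in own),
    }


if __name__ == "__main__":
    timeline = [
        {"id": 1, "retweet_count": 120, "favorite_count": 300},
        {"id": 2, "retweet_count": 80, "favorite_count": 150},
        {"id": 3, "retweet_count": 5000, "favorite_count": 0,
         "retweeted_status": {"id": 99}},
    ]
    print(attention_summary(timeline))
    # {'own_tweets': 2, 'retweets_made': 1, 'retweet_count': 200, 'favorite_count': 450}
```

The retweets are set aside rather than discarded, so their counts remain available should the researcher wish to investigate them separately.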

Scraping, copy-pasting, downloading, harvesting... Collection methods

The methods and techniques for collecting content will likely change over time, as researchers continuously program and create new tools for data collection, including tools for Twitter. A notable example is the Digital Methods Initiative Twitter Capture and Analysis Toolset (DMI-TCAT for short), an open-source tool for data collection. Its initiators and lead programmers argue in an introductory paper that the DMI-TCAT was designed to be as comprehensive as possible, utilising as many of the options available through the Twitter API as it can, and that it can be improved upon with that same aim (Borra and Rieder, “Programmed Method”). The paper also identifies other routes to data capture: (expensive) data resellers, commercial research and academic online analytics platforms, and other open-source capturing software initiatives (264-66). The strength of the DMI-TCAT lies not only in its being an easy-to-use tool with a plethora of useful features and input and output options, but also in the fact that users can install and modify it to suit their own purposes (“DMI-TCAT”). Admittedly, installing this tool, and methodological devices like it, requires at least some technical know-how and programming skill. So that the Twitter-hype methodology remains accessible to any researcher, it does not require extravagant or specially obtained datasets. All that is necessary for