
Investigating Sentiment Analysis Techniques as a Method for Internet Research



Graduate School of Humanities

Faculty of Humanities, University of Amsterdam

Investigating Sentiment Analysis Techniques as a Method for Internet Research

P.S. (Pieter) Vliegenthart B.A.
Hamburgerstraat 2B, 3512 NR, Utrecht
pietervliegenthart@gmail.com
UvA ID: 11398760
03-09-2018
Word count: 22,133

Study Programme: Media Studies (Research)
Supervisor: Dr. B. (Bernhard) Rieder
Second Reader: Dr. T. (Thomas) Poell


The advantage of the emotions is that they lead us astray, and the advantage of science is that it is not emotional.


Acknowledgments

The epigraph of this thesis is invalidated twice within the context of this research. First of all, the subject of this thesis locates itself at the intersection of two research paradigms: the world of logic, rules, and mathematics on the one hand, and the world of cultural analysis, emotions, and affect on the other. By investigating sentiment analysis techniques for internet research, science thus becomes emotional. Secondly, the research that accompanied the writing of this thesis can itself be called emotional. Exploring other research traditions, programming an algorithm in Python and searching for a methodology that enabled both practical and theoretical work turned out to be a journey with many ups and downs. Nevertheless, it was an enriching journey that I had to take in its entirety and that, even though I sometimes thought otherwise while writing, I would not have wanted to miss.

Fortunately, I did not have to go down this route without support. First of all, I would like to thank my thesis supervisor Bernhard Rieder. Throughout this thesis, and in the preceding two years, he has always shared his knowledge in an inspiring way. It is thanks to his enthusiasm about the intersection between programming and the humanities that I ended up in this field of research. Secondly, I would like to thank the second reader, Thomas Poell, for taking the time to read this thesis. I would also like to thank him for all the practical tips on academic life, which have made writing this thesis a little easier. Furthermore, I would like to thank everyone I have encountered over the past two years at the Department of Media Studies: students and teachers alike. Finally, I would like to thank my girlfriend, friends, family, and employer for the sympathy they have shown during the writing of this thesis.


Abstract

This thesis investigates the potential of sentiment analysis techniques (SAT) as a method for internet research. Sentiment analysis could be considered a computational tool that holds much promise: it enables (humanities) scholars to expand their research toolkit with automated techniques that enable the processing of large quantities of data. However, both computer scientists and media scholars alike have raised critical remarks about the use of SAT for internet research. These include issues such as (1) the potentially problematic use of SAT, (2) the opaqueness of algorithmic black boxes present in the tools and (3) insufficient understanding of the methodological and theoretical assumptions that are built into tools (Feldman; Ribeiro et al.; Passman and Boersma; Röhle and Rieder; Tenen).

Using an experimental methodology, in which theoretical and historical research is combined with a critical practical engagement with SAT in the context of a case study on the Brexit on Twitter, this thesis unravels the methodological, theoretical and epistemological perspectives that are present in SAT. This study found that lexicon-based SAT appear to be ill-suited for the analysis of complex cultural phenomena, since the technology can only measure static constructs that have been defined a priori. Machine learning based SAT are more suitable for complex forms of cultural analysis, since they enable researchers to investigate custom, context-specific classification categories. Even though SAT have potential as a method for internet research, there are at least four issues that need to be carefully evaluated when outsourcing qualitative analysis to algorithms.

First, programming skills are necessary to deploy SAT, but also to make transparent the methodological and theoretical assumptions that are folded into the tools. Second, lexicon-based SAT are developed for very specific purposes and thus contain epistemological perspectives. It is therefore essential to select a tool that is appropriate for the type of research at hand. Third, the classification mechanisms in SAT influence the usefulness of the analysis. This study found that a probabilistic classification is more useful than a binary one, since it enables a more nuanced perspective. Fourth, the usability of machine learning based SAT stands or falls with the quality of the operationalization of context-specific classifications. When insufficient unique properties can be attributed to a specific category, the reliability of the algorithm is significantly reduced.

Keywords


Contents

Introduction
Studying Emotions
Towards a Critical Technical Perspective
Towards Computational Cultural Analysis

Part 1: Theoretical Engagement

1. Towards a Critical Research Practice
1.1 Towards Digital Methods
1.2 Tool Criticism
1.3 What to Study with Sentiment Analysis
1.4 Twitter as Research Object
1.5 Sentiment Analysis and Twitter Research

2. The State of the Art: Sentiment Analysis Techniques
2.1 Sentiment Analysis: An Introduction
2.2 Different Levels of Analysis
2.3 Different Categorization Strategies
2.4 Lexicon-Based Sentiment Analysis
2.5 Machine Learning Based Sentiment Analysis
2.6 Feature Selection: Bag of Words
2.7 Feature Vector
2.8 Most Informative Features
2.9 Naïve Bayes Classifier
2.10 Bernoulli Naïve Bayes
2.11 Summary of Characteristics
2.12 Selecting the State of the Art: Sentiment Analysis Benchmark
2.13 Problematizing Sentiment Analysis for Internet Research

Part 2: Practical Engagement

3. Methodology
3.1 An Alternative Classification Framework
3.2 Grounded Theory
3.3 Open, Axial, and Selective Coding
3.4 Data Collection
3.5 Programming Environment and Software
3.6 Sentiment Analysis Using VADER
3.7 Sentiment Analysis Using the Bernoulli Bayes Classifier
3.8 Further Processing the Data

4. Case Study: The Brexit on Twitter
4.1 The Article 50 Procedure
4.2 Initial Data Set
4.3 Lexicon-Based Sentiment Analysis
4.4 Machine Learning Based Sentiment Analysis
4.5 Complicating Hashtag Research

5. Findings
5.1 Generic Issues with the Quantification of Sentiment
5.2 Data Modification
5.3 Probability Scoring
5.4 The Potential of Sentiment Analysis Techniques for Internet Research

6. Conclusion


Introduction

Studying Emotions

"Emotion" is a term frequently used by both scientists and laymen to describe a specific feeling or mental state that an individual might experience at a particular time (Keltner and Kring). Although emotions are a widely researched topic, there is no consensus within academia on what emotions exactly are. Paul R. Kleinginna and Anne M. Kleinginna recognized the lack of an unambiguous definition of emotion and performed a meta-analysis in which an attempt was made to draw up a generic definition (Kleinginna and Kleinginna). Based on their research, emotion can be broadly understood as a complex set of interactions among subjective and objective factors, mediated by neural and hormonal systems, which can lead to affective experiences, cognitive processes, physiological experiences and typical behaviors (Kleinginna and Kleinginna 355). To clarify this definition, without entering the debate on the biological, physiological, or psychological grounding of emotions, it can thus be said that experiencing an emotion can influence and shape human thought and behavior (Ekman 171).

Because emotions are thus attributed such fundamental properties, they are a widely researched topic within a variety of research traditions, ranging from medicine, the neurosciences, psychology and the social sciences to engineering, biochemistry and the computer sciences (Scopus)1. For example, Charles Darwin argued in "The Expression of the Emotions in Man and Animals" that emotions play a significant part in the survival of species, since they influence the behavior of both humans and animals through bodily reactions (Darwin 349). Darwin's work would later be considered the precursor of affective neuroscience, a research tradition that seeks to clarify emotions through a multidisciplinary approach (Panksepp). Emotions are, logically, studied by researchers engaged in understanding, explaining, and treating emotions and the behaviors associated with them, such as psychologists, psychiatrists, biologists and social scientists. However, emotions, or theorizing emotions, are also an important source of inspiration in other research paradigms.

The affective turn is perhaps the most striking example of a renewed theoretical research perspective based on the shifting of attention from rhetoric and semiotics to affect (Kim and Bianco 1). With this shift from critical theory to affective theory, bodily, autonomic responses and unconscious experiences are used to theorize the social (Kim and Bianco 2). Even though emotions do not classify as affect, since they are cognitively processed and therefore not an autonomic bodily experience2, the affective turn does illustrate an important scientific perspective: the acknowledgment of feeling and affect in theorizing the political, economic and cultural spheres (Kim and Bianco 3).

1 Scopus is the self-proclaimed largest database of peer-reviewed literature. Scopus indexes 21,548 peer-reviewed journals (Elsevier, Scopus | The Largest Database of Peer-Reviewed Literature | Elsevier).

2 According to Brian Massumi, emotions are not an autonomic bodily response since they are social-linguistically processed (85). As has already become clear, there is no unambiguous definition of emotion, and there are also scientists (neuroscientists) who speak of "emotion" before it is social-linguistically processed (Panksepp).

It is thus evident that emotions are a legitimate research object within different research paradigms, where each paradigm adopts a different mode of analysis. This also applies to the field of the computer sciences, which studies emotions through sentiment analysis. Sentiment analysis can be defined as the automated processing of large quantities of unstructured data in order to find the opinions of authors about specific entities (Aggarwal and Zhai 1–2; Feldman 1). Simply put, sentiment analysis technologies assume that the emotions or feelings experienced by individuals are reflected in the texts they write. By systematically extracting those emotions and feelings from texts, these technologies aim to provide insight into sentiment towards certain entities. These sentiment analysis techniques and methods, developed by computer scientists, are used for research by social scientists, economists and humanities scholars alike (Cambria; Feldman).
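To make the basic intuition of this kind of extraction concrete, the simplest (lexicon-based) variant can be sketched in a few lines of Python. The word list and weights below are invented purely for illustration; real tools such as VADER, discussed later in this thesis, use far richer lexicons and additional heuristics:

```python
# Toy lexicon-based sentiment scorer. The lexicon is made up for illustration
# and is NOT taken from any real sentiment analysis resource.
TOY_LEXICON = {
    "good": 1.0, "great": 2.0, "love": 2.0,
    "bad": -1.0, "terrible": -2.0, "hate": -2.0,
}

def score(text: str) -> float:
    """Sum the lexicon weights of every known word; unknown words count as 0."""
    words = (w.strip(".,!?") for w in text.lower().split())
    return sum(TOY_LEXICON.get(w, 0.0) for w in words)

print(score("I love this great idea!"))    # 4.0: scored as positive
print(score("What a terrible, bad plan"))  # -3.0: scored as negative
```

Even this toy version surfaces the kind of methodological assumption this thesis interrogates: sentiment is reduced to a sum of context-free word weights that the lexicon's authors have fixed a priori.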

Sentiment Analysis in Practice

In addition to the growing use of sentiment analysis within the academic community, the technology has also found its way into the commercial world. One of the most common applications of sentiment analysis is online brand and reputation management (Tromp et al. 1). By applying sentiment analysis to social network sites, customer service departments can respond immediately when a problem with one of their products or services is reported (Gupta). However, sentiment analysis is also used for more complicated business intelligence questions. As Al-Kharusi et al. describe, sentiment analysis can be deployed to gain insight into market segmentation, to determine which features should be added to a product or service, or to target potential customers on the basis of their sentiment towards a particular product or service (55). In addition to these commercial purposes, the technology is also applied in other fields of work.

In recent years, sentiment analysis has made an appearance within the legal industry. Whereas junior lawyers and paralegals used to manually process hundreds or even thousands of pages of case files in search of evidence, today technology-assisted review software is used (Nolan). The technology is deployed to detect sentiment in case law, analyze contracts and extract evidence from large dossiers (Nolan; Hoadley).

Sentiment analysis is also deployed in the financial world to provide analytical perspectives. By analyzing the sentiment on social network sites like Twitter3, scientists try to predict how stock market prices will develop (Bollen et al.; Oliveira et al.). Sentiment analysis has also reached the political domain. Studies by, inter alia, Choy et al. and Akitako and Benoit deploy sentiment analysis techniques on Twitter to find out whether positive and negative sentiment is a good predictor of, respectively, presidential elections and the Brexit referendum (Choy et al.; Akitako and Benoit). The application of sentiment analysis is thus widespread in commercial, legal, political and academic circles.

3 Twitter is a microblogging service that allows its users to send 140- to 280-character messages (Rosen). Twitter users can follow other users, retweet messages and favorite specific messages from other users (Kwak et al.). Twitter's mission statement: "Let people tell their story about what's happening in the world right now" (Twitter).

Towards a Critical Technical Perspective

Sentiment analysis has moved from the mathematically oriented domain of the computer sciences to the commercial, political and academic world, where the technology has proven itself a useful extension to existing research practices. However, there is also criticism of the use of sentiment analysis in academia and beyond. From a computer science point of view, studies by Filipe Ribeiro et al. and Ronen Feldman conclude that researchers often misuse sentiment analysis techniques (SAT). Due to a lack of knowledge about the algorithms deployed in SAT, and due to misinterpretations of what SAT can do in specific research contexts, the deployment of sentiment analysis is potentially problematic (Feldman; Ribeiro et al.). In addition to this criticism of the technological implementation of sentiment analysis, there is also critique from other perspectives.

Traditionally, classification practices have been carried out by human actors. However, as Jenna Burrell discusses, mechanisms of classification are nowadays increasingly reliant on computational algorithms (1). This is also the case for sentiment analysis, which, as described in the first section of this introduction, is strongly rooted in the computer sciences paradigm. Sentiment analysis is partly an automated practice that uses computational algorithms for the classification of human emotions and feelings. According to Burrell, this is problematic, since there is a mismatch between human-scale reasoning and the logic of algorithms (2). In other words: a computational algorithm "thinks" differently than a human being. Burrell therefore argues for studying classification algorithms as distinct phenomena with their own logic, since classification mechanisms like sentiment analysis hold tremendous power (Burrell 1).

Another scientist who pleads for the critical study of algorithms is Phil Agre. As a mathematician and artificial intelligence scholar, Agre eventually turned to the social sciences to investigate the social and political aspects of networking and computing. According to Agre, this was necessary because the technological field of the computer sciences is mainly concerned with designing and implementing working systems (6). This focus on functional productivity means that there is insufficient critical, reflexive practice in the field, which thereby fails to pay sufficient attention to the potential social and political consequences of technology (2). Agre therefore proposes a critical technical practice: a research perspective that relies, on the one hand, on a technical background to make the operations of algorithms transparent and, on the other, on a theoretical background that makes it possible to criticize technology from a social-political perspective (18). In order to fully understand how computational algorithms in sentiment analysis tools influence the production of knowledge and society in general, it is thus necessary to follow a research methodology that is both practical and theoretical.

Towards Computational Cultural Analysis

Sentiment analysis could thus be considered a computational tool that holds much promise: it enables (humanities) scholars to expand their research toolkit with automated techniques that enable the processing of large quantities of data. This is exactly what is happening today: sentiment analysis technologies developed by computer scientists are being used by internet researchers to study a variety of cultural phenomena (Bollen et al.; Choy et al.; Akitako and Benoit). However, both computer scientists and media scholars alike have raised critical remarks about the use of sentiment analysis tools for internet research. These include issues such as (1) the potentially problematic use of sentiment analysis technologies, (2) the opaqueness of algorithmic black boxes present in the tools and (3) insufficient understanding of the methodological and theoretical assumptions that are built into tools (Feldman; Ribeiro et al.; Passman and Boersma; Röhle and Rieder; Tenen). This thesis therefore investigates the potential of sentiment analysis techniques for complex forms of cultural analysis in internet research.

The answer to the above-raised question will be obtained through an experimental methodology: instead of only conducting theoretical and historical research, this thesis will also be a critical practical investigation into SAT. By conducting a case study in which I program a sentiment analysis tool myself, I aim to reduce methodological obscurity. The practical approach that manifests itself in programming a tool builds on the train of thought set out by Burrell, Agre, and Tenen. Dennis Tenen argues that merely reading the source code of a tool is not enough to fully understand how it affects epistemic processes. We need to master the tools from the inside out: "[t]he best kind of tools are therefore the ones that we make ourselves" (Tenen 1).

***

This introduction has outlined how the study of emotion has been quantified into automated computational techniques known as sentiment analysis. As has become clear, these technologies are already being applied by social scientists, economists and humanities scholars alike. By introducing an emerging critical technical perspective, issues have been raised about what scientists need to consider when investigating tools that perform automated forms of cultural analysis. This thesis aims to investigate the potential of computational sentiment analysis techniques for internet research. In order to investigate this potential, it is necessary to reduce the methodological obscurity present in SAT. This will be achieved by following an experimental methodology that is both theoretical and practical. In this practical experiment, where a sentiment analysis tool will be programmed, the Brexit on Twitter will serve as a case study.

The first chapter develops a theoretical framework that highlights the problems with studying tools from a digital humanities perspective. After the framework for tool criticism is constructed, both emotions and Twitter are justified as research objects. The first chapter continues by stressing the urgency of studying sentiment analysis techniques, by identifying trends in commonly used research methods on Twitter. Finally, related sentiment analysis research on Twitter is discussed.

In the second chapter, The State of the Art: Sentiment Analysis Techniques, both lexicon-based and machine learning based sentiment analysis techniques are introduced. Since this thesis aims at investigating the potential of sentiment analysis techniques by reducing methodological obscurity, the most important algorithmic techniques used in sentiment analysis will be made transparent. Subsequently, the performance of lexicon-based sentiment analysis will be evaluated, after which problems with lexicon-based sentiment analysis in relation to the media studies paradigm will be identified.

In chapter three, The Methodology, an alternative classification strategy based on grounded theory is proposed, in order to overcome the shortcomings formulated at the end of chapter two. The chapter then outlines how the experimental lexicon-based and machine learning based tools are developed, and how the tools function from a theoretical perspective.

Since the theoretical functioning of the technology does not provide a conclusive answer to the research goal of this thesis, the tools are then tested in the context of an actual case study in chapter four. The topic here is the Brexit on Twitter, in which the various sentiment analysis tools described in chapter three are put to the test. The case study aims to test in practice whether the alternative classification framework implemented in the machine learning algorithm can generate new insights concerning sentiment on Twitter. The case study also aims to verify current hashtag studies, which assume that hashtags correlate with sentiment and can therefore serve as predictors of it. The practical insights into the functioning of automated classification technologies gained during this case study will be used in the next chapter to answer the research question.

After sentiment analysis technology has been tested in practice in chapter four, the framework for tool criticism formulated in chapter one will be used to gain insight into how the scientific process changes through the use of automated computational technologies in the media studies paradigm. The chapter provides insight into how the original data is treated, which algorithmic techniques play a role and which theoretical and methodological assumptions are present in sentiment analysis techniques. Finally, it assesses the usefulness of SAT for internet research.

In the conclusion, the question of whether complex forms of cultural analysis can be automated utilizing computational techniques is addressed. It also shows how the process of knowledge production changes with the deployment of these technologies, and what needs to be critically monitored when deploying them for internet research in the digital humanities.


1. Towards a Critical Research Practice

This theoretical chapter will first introduce the academic tradition to which this thesis aims to contribute. Secondly, it will further elaborate on the challenges involved in the deployment of digital tools within humanities research. These issues will be used as a guideline throughout this thesis, since they enable the critical unraveling of the methodology present in SAT. Thirdly, since sentiment analysis tries to measure emotions, it will be explained how emotions are operationalized in this research. Fourthly, Twitter is historically analyzed as a research object. This thesis is primarily concerned with investigating the potential of sentiment analysis techniques, but since Twitter, in this thesis and other research alike, is often the medium studied with sentiment analysis techniques, it is important to understand how Twitter can be placed in the tradition of internet research. The fifth section of this chapter contextualizes popular research methodologies on Twitter, after which related academic work on sentiment analysis on Twitter is discussed in the final section.

1.1 Towards Digital Methods

Whereas the study of society in the humanities has traditionally focused on non-digital objects and phenomena, the advent of digital technology has complicated this approach. According to David Weinberger and Eric T. Meyer et al., the large amounts of digital data generated by, for example, Twitter have changed the way in which knowledge is produced (Meyer et al.). In order to study this knowledge, it is necessary to deploy methods that enable researchers to handle digital objects and the artifacts that they produce (Schäfer and van Es 14). Mirko Tobias Schäfer and Karin van Es describe the changing field of work of the humanities as follows:

For the humanities, this transformation requires not only that we critically inquire into how technology affects our understanding of knowledge and how it alters our epistemic processes, but that we also employ the new data resources and technologies in new ways of scholarly investigation. (14)

In order to make sense of how digital technologies and the artifacts they produce influence our society, we must thus adapt our research methods to those digital environments. This is where digital methods come in. As Rogers defines digital methods:

Broadly speaking digital methods may be considered the deployment of online tools and data for the purpose of social and medium research. More specifically, they derive from online methods, or methods of the medium, which are reimagined and repurposed for research. (Rogers, “Foundations of Digital Methods: Query Design” 75)


In line with Schäfer and van Es's train of thought on digital methods, it can therefore be said that the use of computational techniques like sentiment analysis is a logical strategy for humanities scholars to deal with the changing and expanding research landscape. However, as also indicated by Schäfer and van Es, it is equally important to critically investigate how these methods of the medium alter the production of knowledge (14).

To clarify how methods influence epistemological processes, it is first important to distinguish between methods and tools, as the two are often confused (Tenen 1). A method can be understood as a systematic way of pursuing knowledge, while a tool, within the context of this thesis, can be described as a piece of software that performs certain analytical tasks (Röhle and Rieder 67). In traditional research, most of the analytical steps are carried out manually or under the direct supervision of scientists. With digital methods, however, parts of these analytical steps are replaced by tools that consist of data structures and algorithms (Röhle and Rieder 69–70). Johannes Passmann and Asher Boersma equate tools with algorithmic black boxes and argue that digital methods are becoming more and more dependent upon those tools (Passman and Boersma 139). This is problematic for at least three reasons.

First, tools are often considered to be opaque (Röhle and Rieder 70). Since part of the analytical steps are programmed, it becomes more difficult to understand, reproduce and criticize the methodological procedures. Even if the source code of the algorithms is publicly available, it is still hard for scholars who are not experienced in writing code to gain insight into the operations of the automated techniques used in the tools (Röhle and Rieder 70). This "transparency problem" is thus twofold: both (1) the availability of source code and (2) the ability to understand source code play a role.

Secondly, most of the tools used in digital methods are not specifically developed for media studies research. As Tommaso Venturini et al. describe:

To produce useful and interesting findings, digital methods require the extra care needed for the secondary analysis of inscriptions that have not been created by or for the social sciences and thus bear the imprint of the particular purposes (whether political, commercial or otherwise) and technical infrastructures through which they were created. (4)

It is thus important to be critical of the tools that we borrow from other (research) paradigms and repurpose for internet research, since the original purpose of a tool may differ from the purpose for which it is used in media studies research.

The third problem relates to tools "[that] encourage intellectual laziness by obscuring methodology" (Tenen). Although there is a difference between tool and method, tools are sometimes treated as part of the methodology, without proper attention being paid to the theoretical and methodological assumptions that are present in the tool itself (Tenen 2; Röhle and Rieder 68). As Theo Röhle and Bernhard Rieder emphasize: "[...] theory is already at work on the most basic level of methodology, i.e., when it comes to defining units of analysis, algorithms and visualization procedures" (69). In order to truly understand how tools affect epistemic processes, we must thus thoroughly examine how tools function and expose which theoretical and methodological assumptions are folded into them.

1.2 Tool Criticism

The emergence of digital technology in our society has led, among other things, to an enormous amount of digital data (Meyer et al.). Schäfer and van Es recognize this development and argue that the datafication4 of society opens up possibilities to study culture through data (11). David M. Berry argues that this datafication of society also influences academia: research is increasingly being mediated by digital technology (1). However, the transformation from humanities to digital humanities has not come about suddenly.

Jeffrey Schnapp and Todd Presner historically distinguish two waves of digital humanities research (2). The first wave, which occurred between the late 1990s and early 2000s, is characterized by the digitalization of traditional research methods and the creation of technological infrastructures. Presner considers the second wave to be more revolutionary: the development of digital methods for researching digitally native data, interdisciplinary research and other hybrid research methods characterize the "Digital Humanities 2.0" (Presner 6). Although David M. Berry criticizes the strict dichotomy of Schnapp and Presner, since historical developments are seldom linear, he does propose a possible path for a third wave of digital humanities (4–5).

Where the second wave of the digital humanities is thus concerned with studying "digitally born materials" via digital methods, or methods of the medium, Berry argues that the third wave of research within digital humanities should focus on studying the underlying computationality of the digital methods that are being employed for research (4–5). As Berry puts it: "[...] we need an additional third-wave focus on the computer code that is entangled with all aspects of culture and memory, including reflexivity about how much code is infiltrating the academy itself" (5). Berry's line of thought is similar to that of many other (humanities) scholars since the field of tool criticism is steadily establishing itself within digital humanities.

This thesis will not be able to create an exhaustive overview of the factors in tools that affect the epistemological process, nor is that the purpose of this research. However, in order to critically assess the methodological implications of the sentiment analysis tool that is to be developed, and in order to answer the research question of this thesis, it is important to discuss the most obvious and important problems with tooling. The focus is to identify these "tooling problems" from the outside in: like peeling off the layers of an onion. Through transparency and code literacy, methodological problems regarding black-boxing and implicit theoretical and methodological imprints are eventually discussed.

The outer layer of the onion to be peeled refers to transparency. Before it is possible to understand how a tool works and how it affects epistemological processes, it is necessary to have access to the source code and to be competent in reading and understanding it (Röhle and Rieder 69). As Taina Bucher demonstrated in her research into Facebook, it is often difficult to obtain the algorithms of commercial applications (1176). According to Jenna Burrell, the opacity of algorithms (or source code) can have three causes: (1) intentional opacity in order to protect commercial interests, (2) unintentional opacity due to a lack of specialized skills and (3) opacity that arises from a mismatch between mathematical concepts and human reasoning (1). If the problems regarding transparency are overcome, the next layer of opaqueness arises: black boxing.

Gregory Bateson describes a "black box" as "[...] a conventional agreement between scientists to stop trying to explain things at a certain point [...]" (39). As Knorr Cetina points out, this approach might be productive in some cases, for example when the functioning of the black-boxed "thing" or technology does not affect the practice of research (Cetina 99). However, in the case of automating cultural analysis through tooling, black boxes should be opened since they might contain methodological and theoretical assumptions that do influence the epistemological process (Tenen; Venturini et al.). As is already clear in Bateson's description, a black box is thus a dynamic concept that can be applied at different levels. Tenen clarifies this dynamic character by introducing "layers of encapsulation" (3). For example, tools do not only consist of a programming language; many mathematical concepts and calculations are present as well. So even if the code layer of encapsulation is unboxed, the mathematical layer of encapsulation might be left unopened (Tenen).

Closely related to both transparency and black boxing are the theoretical imprints that might be present in both the tools and the technological infrastructures that we are studying (Venturini et al. 4). As Venturini et al. describe, often neither tool nor object of study is created for scientific research. It may thus well be that political, commercial or other interests are implicitly incorporated in the tool or the object of study (4). By opening up the black boxes present in tools, parts of these imprints might be rendered visible. However, as Venturini et al. point out, "Using digital methods, we are always at risk of mistaking the characteristics of medium for the signature of the phenomena we wish to observe" (4). It is thus important to critically define and analyze the phenomena that are being studied, and what role the medium plays in shaping these phenomena (Venturini et al. 17).

The tool criticism described above by Röhle and Rieder, Bucher, Burrell, Bateson, Cetina, Tenen and Venturini et al. identifies issues that may affect epistemological processes through the use of tools in digital methods. It is therefore important to investigate these issues in the tool that is to be developed. Given the practical character of this thesis, the checklist in Table 1 summarizes the above-raised tooling issues and will be used to reduce methodological obscurity in sentiment analysis techniques.

Table 1

A checklist of tooling issues that can affect the epistemological process with digital methods research.

Issue                                      Items to address

Object of study in relation                • What phenomena are being studied?
to digital media                           • What kind of technological infrastructure is studied?

Transparency of the tool                   • Is the right skillset available to understand the tool?
                                           • Is the source code of the tool available?

Black boxing in the tool                   • Which layers of encapsulation are present?
                                           • Which black boxes need to be opened?

Epistemological perspective                • For which purpose is the tool developed?
of the tool                                • Which cultural, theoretical, commercial or other
                                             imprints are present in the tool?

As has become clear in the previous section, the methodological and theoretical assumptions that are folded into tools that are being used in internet research must be thoroughly examined. If these implicit assumptions are not made transparent, it is impossible to determine how our epistemological processes change through the use of automated methods. As Tenen argues, the best way of achieving this transparency is to get your hands dirty: develop a tool from scratch (Tenen). This research follows Tenen's advice and will use the above-raised tooling critique as a guideline for reducing methodological obscurity in the tool that is to be developed.

1.3 What to Study with Sentiment Analysis

Since sentiment analysis aims to extract emotions from a text, it is important to make clear why emotions are an important object to study and to define what will be studied with SAT. This is especially important since Venturini et al. stress that what is being studied with digital methods must be defined precisely, since otherwise there is the risk that medium-specific properties are mistakenly considered to be a result of the analysis at hand (17).

As discussed in the introduction, emotions are a widely researched topic resulting in the absence of an unambiguous definition. Kleinginna and Kleinginna have, however, produced a generic definition based on 67 definitions of emotions from different research paradigms:


Emotion is a complex set of interactions among subjective and objective factors mediated by neural/hormonal systems, which can (a) give rise to affective experiences such as feelings of arousal, pleasure/displeasure; (b) generate cognitive processes; (c) activate widespread physiological adjustments to the arousing conditions; and (d) lead to behavior that is often, but not always, expressive, goal-directed, and adaptive. (365)

Within the scope of this research, it is not necessary to further elaborate on the neural/hormonal systems that are described in Kleinginna and Kleinginna's definition of emotions. What is important, however, is that experiencing an emotion can influence and shape human thought and behavior (Kleinginna and Kleinginna). This is in line with the view of Paul Ekman, who argues that emotions can occur without interaction with other people and even then have the ability to influence and shape human thought and behavior (Ekman 171). This is an important notion since the emotions that will be studied with sentiment analysis techniques are emotions that are mediated by technological infrastructures. Emotions are thus attributed with fundamental properties, which makes extracting emotions by means of sentiment analysis a legitimate research design. However, what kinds of emotions can be distinguished?

Paul Ekman and Wallace Friesen showed in 1971 that there are at least six distinctive, measurable and culturally independent emotions: anger, disgust, fear, happiness, sadness, and surprise (Ekman and Friesen). Robert Plutchik builds on the work of Ekman and Friesen by acknowledging that there are basic, distinctive emotions. According to Plutchik, however, there are eight basic emotional dimensions that can be divided into positive-negative pairs: ecstasy versus grief, admiration versus loathing, terror versus rage and amazement versus vigilance (Plutchik 349). Plutchik further argues that there are more than eight emotions, but that these other emotions are formed by combining the basic emotions (Plutchik 349). This psychologically oriented operationalization of emotion creates clarity but limits the applicability of sentiment analysis, as the research scope is narrowed down by the finite list of emotions. In order to be able to classify other categories as well, a different psychological concept offers a solution: affect.

Affect within psychology is considered to be a mental state that occurs before we experience emotions (Lerner and Keltner 474). However, the concept is not uncontroversial within psychology. According to Robert B. Zajonc, affect is experienced without cognitive processing of stimuli, herewith showing parallels with "the missing half second" of Brian Massumi (Zajonc 119; Massumi 89). Richard Lazarus, however, argues that stimuli are cognitively processed before affect is experienced (Lazarus). Later work by Lerner and Keltner takes a more nuanced stance: depending on the type of affect, a priori cognitive processes can play a role in experiencing affect (Lerner and Keltner). The research of Zajonc, Lazarus, and Keltner does agree on one thing: affect is considered to be an important influencer of the attitudes5 that we can have towards entities (Zajonc; Lazarus; Lerner and Keltner). As the discussion about affect has indicated, there are thus other feelings than emotions that are also important to study. This opens up possible classification strategies for SAT since not only emotions are an interesting object to study: affect is too.

1.4 Twitter as Research Object

As has become clear in the first section of this chapter, it is important to be critical of the tools that are being used in digital methods research. However, it is equally important to investigate the object that is being studied with these tools (Venturini et al. 17). Without understanding how and why Twitter should be studied, it is pointless to reduce methodological obscurity in technologies that do study the medium.

Since Twitter was founded in 2006, more than 19,000 articles have been published about the social networking platform, according to Scopus6 (Elsevier, “Scopus - Analyze Search Results”). The majority of the research, however, has its roots in the computer sciences, a scientific tradition that in general does not show any specific interest in studying the relationship between technology and society7. Similarly, Twitter itself has not always been equally useful as an object for studying society. According to Richard Rogers, the history of Twitter research can be described in three phases: Twitter Studies I, II and III (Rogers, Debanalizing Twitter).

During the early years of Twitter, from 2006 to 2009, research on the medium mainly considered Tweets to be banal (Rogers, Debanalizing Twitter 2). A 2007 study by Java et al. manually categorized Tweets based on their content. Most of the posts on Twitter talked about daily routine and were therefore considered to be "daily chatter" (7). One could thus argue that Twitter performed poorly as a legitimate research object in the early days of its existence.

According to Richard Rogers, this changed during Twitter Studies II. When Twitter changed its tagline from "What are you doing?" to "What's happening?", the microblogging site evolved into a medium that enabled event-following, since the banal character of tweets made way for information sharing and news updates (Dybwad; Naaman et al.). The change in the use of Twitter also resulted in a change in the research angle. Where Twitter was first studied primarily on the basis of the content of its messages, Twitter was now seen as a news medium that gave a stage to the voices on the ground (Rogers, Debanalizing Twitter 5). However, the legitimacy of Twitter as a high-quality event-following news medium should not be assumed to be an absolute truth, according to Rogers, since one has to think critically about the quality, accuracy, and professionalism of the reporting (Debanalizing Twitter 5). The second wave of Twitter research is, according to Rogers, thus characterized by the contemporary temporal characteristics of the medium, enabling researchers to study cultural and societal phenomena as they occur in real-time (Debanalizing Twitter 7).

6 Scopus is the self-proclaimed largest database of peer-reviewed literature. Scopus indexes 21,548 peer-reviewed journals (Elsevier, Scopus | The Largest Database of Peer-Reviewed Literature | Elsevier).

7 With the exception of Software Studies, a multidisciplinary research tradition that deals with how algorithms, logical functions, aesthetics, computing and programming subcultures and other soft- and hardware-related entities and phenomena create, influence, multiply or control reality (Fuller). An interesting read to explore this field is Matthew Fuller's "Software Studies \ A Lexicon."

In addition to the contemporary temporal character of Twitter, the medium has also evolved into a historical dataset. Twitter itself has a historical archive, of course, but The Library of Congress also archived all Tweets between 2006 and 2018, in close collaboration with Twitter (Osterberg). It did so for the following reason: "The Library took this step for the same reason it collects other materials – to acquire and preserve a record of knowledge and creativity for Congress and the American people" (Osterberg). This stresses once more the value that a medium like Twitter has in relation to society. The evolution from contemporary medium to archived dataset is seen by Rogers as Twitter Studies III, which is characterized by its shifting research focus from Twitter as a medium to Twitter as a historical object of study (Debanalizing Twitter 7). Although the Library of Congress stopped archiving all tweets on January 1st, 2018, due to technological constraints, there are other solutions (including Twitter itself) that enable researchers to use Twitter as a historical database (Borra and Rieder; GNIP). The research in this thesis also uses historical Twitter data and could thus be considered part of the so-called third wave of Twitter research.

1.5 Sentiment Analysis and Twitter Research

Now that it has become clear how the research object Twitter has been studied in the past decade, it is necessary to identify which methods have been deployed when investigating Twitter. This is important since it will stress the urgency to investigate SAT for Twitter research. Afterward, related research with sentiment analysis on Twitter is discussed. Since the scope of this thesis is focused on investigating the potential of automated forms of cultural analysis, purely qualitative research is ignored since it does not contain a tooling element. To further narrow the scope, only research performed on Twitter around the Brexit is discussed, since it corresponds with the case study of this thesis.

Due to the vast number of academic publications on Twitter, it is practically impossible to obtain a holistic summary of applied research methods. Still, studies by Williams et al. and Zimmer and Proferes have tried to summarize the research on Twitter between the years 2006-2011 and 2006-2012 respectively (Williams et al.; Zimmer and Proferes). Especially Zimmer and Proferes' research provides clarity about methodological traditions in Twitter research. Based on a qualitative content analysis of 380 articles on Twitter, they clearly show which types of analysis have been deployed over the past years (Zimmer and Proferes 254). Their findings are set out in Table 2.


Table 2

Count of Twitter studies by discipline (2007-2012).

                        2007  2008  2009  2010  2011  2012  Total
Content analysis           1     2     4    53    93    81    234
Event detection                        2     6    11     7     26
GIS analysis                                 1     4     3      8
Influence study                        1     6     4     4     15
Predictive/correlation                 1    11    26    13     51
Sentiment                              4    14    23    22     63
Traffic analysis                 1     8    20    38    13     80
User study                       1    11    13    21    14     60
Other                                        1     4     3      8

Source: Table 1 in Zimmer, Michael, and Nicholas John Proferes. “A Topology of Twitter Research: Disciplines, Methods, and Ethics.” Aslib Journal of Information Management, edited by Axel Bruns and Katrin Weller, vol. 66, no. 3, May 2014, pp. 250–61. Crossref, doi:10.1108/AJIM-09-2013-0083.

The categorization used by Zimmer and Proferes is non-exclusive, meaning that one article can deploy multiple modes of analysis. As Zimmer and Proferes found in their research, almost two-thirds of all Twitter research uses some form of content analysis (253). Another striking trend is the application of sentiment analysis to tweets. Content analysis and sentiment analysis combined make up 70% of the conducted modes of analysis in the research corpus of Zimmer and Proferes' study. Zimmer and Proferes thus illustrate that sentiment analysis methods are an increasingly common research method, which makes investigating this technology relevant.

In Twitter research, three relevant research approaches that aim at extracting sentiment can be differentiated. The first tradition can be described as hashtag research, the second as sentiment analysis research and the third as a combination of the first two approaches. Although there are many more approaches within the digital methods domain, the previously described categories are discussed since this research will build on them.

As Schäfer and van Es made clear, we must adapt our research methods to the digital environments that we are studying, in order to understand them (14). A sound operationalization of this proposition is the tradition of hashtag research. Clare Llewellyn and Laura Cram have researched hashtags in relation to the Brexit (Llewellyn and Cram). Using expert classification, Llewellyn and Cram tried to measure pro- and anti-Brexit sentiment of British Twitter users. In their first study, they found a pro-Brexit sentiment on Twitter, based on the assumption that hashtags are a good indicator of sentiment (Llewellyn and Cram). Even though Llewellyn and Cram are critical in generalizing findings on Twitter to a larger population, they do not provide insight into how the classification of pro- and anti-Brexit sentiment has been operationalized. It is therefore not possible to verify whether their assumption regarding the relationship between hashtags and Brexit sentiment is correct. This might be problematic because hashtags may be used differently by Twitter users than the researchers expect.

A study that tried to overcome this presumption is the research of Akitako and Benoit. Besides analyzing the opinion of Twitter users on the Brexit by means of sentiment analysis, they have mapped out the use of hashtags per sentiment using manually classified data (3). However, their research only identified the most important themes per camp using hashtags and did not assess whether the presumed relationship between hashtags and sentiment attribution actually held (Akitako and Benoit 3–4).

The second tradition is characterized by the deployment of computational SAT. Lansdall-Welfare et al. deployed an off-the-shelf, lexicon-based sentiment analysis technique that measured five different moods: positive, negative, anger, anxiety and sadness (437). The application of this technology generates interesting results: Lansdall-Welfare et al. find that on the day of the referendum, June 23rd, 2016, and the day after, negative emotions prevailed on Twitter (439). Although this study is very clear about its methodology and does deploy a more extensive measure for classifying sentiment than a mere positive-negative dichotomy, it fails to take the context of the Brexit into account because a standard tool is used.

The research of Porcaro and Müller does consider the context of the Brexit debate on Twitter. Although the specific methodology of this research is unclear, the authors do elaborate on the use of expert classification to label pro-, anti- and neutral sentiment regarding the Brexit on Twitter (Porcaro and Müller). One of their key findings: pro-Brexit sentiment is on the rise. Even though their approach recognizes the contextual nature of the debate on Twitter, the precise methodology is unclear, and a limited classification is used.

Another example of a study that does use a context-specific variant of sentiment analysis is the deployment of a multinomial Naïve Bayes classifier that is trained with data manually classified by experts (Akitako and Benoit 1). Again, using a pro-, anti-Brexit and neutral categorization, Akitako and Benoit found that most of the tweets in their research corpus contained a pro-Brexit sentiment (Akitako and Benoit 2). As in Porcaro and Müller's research, context-specific classification is used, but the scale on which it is measured is again limited.

The third tradition combines both techniques to map out sentiment on Twitter in relation to the Brexit. Celli et al. used an approach in which a classifier was trained on the basis of data that was labeled as pro- or anti-Brexit based on hashtag categorization (Celli et al. 114). The classifier was trained to recognize two sentiments: in favor of or against the Brexit. Besides calculating a sentiment polarity score for each message, they calculated the level of agreement of a message in relation to the hashtag (Celli et al. 115). The study found that calculating agreement in relation to hashtags is more reliable than a sentiment polarity score. Also, this study found a small majority of anti-Brexit related feelings in the data which they examined.

The study of Khatua and Khatua followed a similar approach; using hashtags, data was classified as either favorable or unfavorable to the Brexit (Khatua and Khatua 430). However, the researchers noticed that some hashtags were ambiguous and did not formulate a clear opinion on the Brexit. Therefore, a category of mixed feelings was added to the classification (Khatua and Khatua 430). Khatua and Khatua claim that their approach was accurate in predicting the outcome of the referendum: they found a small majority to be in favor of the Brexit (433). Khatua and Khatua indicate that they have categorized hashtags with the greatest possible care and that they have excluded ambiguous hashtags from the research to ensure validity. However, despite the accurate classification of hashtags by Khatua and Khatua, both the research of Celli et al. and the previously mentioned authors assume that hashtags are a good indicator of sentiment.

***

This chapter has contextualized the investigation of sentiment analysis within the third wave of digital humanities research. It also introduced a framework of tooling issues that need to be addressed when investigating potential epistemological changes through the application of tools in digital methods. Since defining research constructs is important for valid digital methods research, both emotion and affect have been operationalized. Finally, Twitter has been justified as a (historical) research object. The urgency to study sentiment analysis techniques has been made clear, and related sentiment analysis research on Twitter has been discussed. Now that all related theoretical domains have been introduced, it is time to discuss sentiment analysis techniques in more detail to provide insight into their underlying methodology.


2. The State of the Art: Sentiment Analysis Techniques

In order to determine the potential of SAT for internet research, it is necessary to reduce methodological obscurity in such techniques. As described in the introduction, this goal will be achieved through both a theoretical and a practical approach. While coding the sentiment analysis tools, which will be discussed in chapter 3, it has become clear that there are many layers of encapsulation present in sentiment analysis techniques. Even though these black boxes have been identified during the practice of coding the actual tool, they do require a theoretical explanation in order to properly understand the methodological and theoretical assumptions they contain. Eventually, the algorithmic techniques that are discussed in this chapter will be put to the test in a practical case study. Thus, before the actual tool is deployed in a case study on Twitter in the next chapter, it is important to open up the different black boxes through theoretical research.

2.1 Sentiment Analysis: An Introduction

Sentiment analysis has a variety of names: "sentiment analyses", "opinion mining", "opinion extraction", "sentiment mining", "subjectivity analysis", "affect analysis" and so forth. All these terms, even if they may differ slightly in their approach, fall under the general heading of "sentiment analysis" or "opinion mining" (Liu 7). The most important common denominator of all these different applications is the automated processing of large quantities of unstructured data in order to find the opinions of authors about specific entities (Aggarwal and Zhai 1–2; Feldman 1).

Sentiment analysis could thus be considered the automated practice of extracting meaning, like opinions, from unstructured data, like Tweets, with digital technologies (Aggarwal and Zhai 1–2; Feldman 1). Sentiment analysis is a widely used technology, both in academia and beyond (Williams et al.; Zimmer and Proferes). This is partly due to the relatively easy application of the technology, and because large amounts of data can be processed in a short period of time, which would be impossible with qualitative research (Celli et al. 116). But how does sentiment analysis technology, in general, work?

As stated earlier, sentiment analysis is a widely used technique that is deployed in a multitude of research paradigms, often employing very specific and different technologies. Without pretending that the following statement is exhaustive, we can generally distinguish between two widely used technologies: sentiment analysis based on machine learning algorithms and sentiment analysis using a lexicon-based approach (Ribeiro et al. 2). Sentiment analysis that builds on machine learning algorithms often requires manually classified data in order to build a model that can be used for predicting sentiment (Ribeiro et al. 3). Lexicon-based technologies rely on a pre-classified list of words (which could be considered a model) in order to determine sentiment, and could thus be deployed without manually classifying data (Ribeiro et al. 3). Before setting out in more detail how the technologies work and what the advantages and disadvantages are, it is important to note that a distinction can be made between the various levels of analysis that can be applied by both sentiment analysis technologies.
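To make the machine-learning branch more concrete, the following sketch hand-rolls a tiny multinomial Naive Bayes classifier of the kind referred to above. All training texts, labels and the test sentence are invented for illustration; an actual study would train on a large, expert-classified corpus and would typically use an established library rather than this toy implementation.

```python
import math
from collections import Counter, defaultdict

# Invented, manually labeled training examples (the "manually classified data").
train = [
    ("great result very happy with the outcome", "positive"),
    ("this is wonderful news", "positive"),
    ("terrible decision a sad day", "negative"),
    ("awful outcome very disappointed", "negative"),
]

# Count word occurrences per class and build the vocabulary: this is the "model".
word_counts = defaultdict(Counter)
class_totals = Counter()
vocabulary = set()
for text, label in train:
    words = text.split()
    word_counts[label].update(words)
    class_totals[label] += len(words)
    vocabulary.update(words)

def predict(text):
    """Pick the class with the highest log-likelihood, using add-one smoothing."""
    scores = {}
    for label in word_counts:
        score = math.log(sum(1 for _, l in train if l == label) / len(train))
        for w in text.lower().split():
            if w in vocabulary:  # words unseen during training are ignored
                p = (word_counts[label][w] + 1) / (class_totals[label] + len(vocabulary))
                score += math.log(p)
        scores[label] = score
    return max(scores, key=scores.get)

print(predict("what a wonderful result very happy"))  # -> "positive"
```

The key point of the sketch is that the classification model is derived entirely from the labeled examples: no word carries a predefined sentiment score, in contrast to the lexicon-based approach discussed below.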

2.2 Different Levels of Analysis

Bing Liu distinguishes three levels of analysis: document level, sentence level and entity and aspect level (10). Document-level analysis means that texts are analyzed as a single logical text consisting of several sentences. This means that this approach assumes that there is one clear sentiment in the text directed at a single entity (B. Pang et al.; Turney). This form of analysis is also known as document-level sentiment classification and is thus not suitable to distinguish sentiment in texts that evaluate multiple entities (Liu 10).

Sentence-level analysis partly overcomes the limitation of document-level analysis concerning the requirement of sentiment towards one entity. As the name suggests, this analysis level measures sentiment at the level of the sentence. This approach assumes that each sentence expresses sentiment towards one single entity in the sentence itself. According to Bing Liu, this mode of analysis is closely related to subjectivity classification. In subjectivity classification, a distinction is made between subjective and objective statements (or sentences), where it is assumed that objective statements do not express sentiment towards an entity (Wiebe et al.). In sentence-level sentiment analysis, this assumption is operationalized by classifying objective observations as neutral sentiment (Liu 11). Even though it can be argued that this form of analysis is more accurate than document-level analysis, it still assumes that per sentence sentiment is attributed to a clear entity, something that does not have to be the case in practice.
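The difference between the first two levels of analysis can be illustrated with a minimal sketch. The mini-lexicon, its scores and the example text are invented; real lexicons contain thousands of entries.

```python
# Invented mini-lexicon mapping words to polarity scores.
LEXICON = {"love": 3, "great": 2, "hate": -3, "boring": -2}

def score(text):
    """Sum the polarity scores of all lexicon words found in the text."""
    return sum(LEXICON.get(w.strip(".,!?").lower(), 0) for w in text.split())

doc = "I love the campaign. The debate was boring."

# Document level: one score, one assumed target for the whole text.
print(score(doc))  # 3 + (-2) = 1 -> "positive" overall

# Sentence level: each sentence is assumed to target a single entity,
# so the negative evaluation of the debate is no longer averaged away.
for sentence in doc.split(". "):
    print(sentence, "->", score(sentence))
```

The document-level pass collapses two opinions about two different entities (the campaign, the debate) into a single mildly positive score, which is exactly the limitation described above.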

The third level of analysis is done on entity and aspect level, or feature level (Liu 11). Where document- and sentence-level analysis look at language constructs that contain a sentiment, like documents, paragraphs or sentences, entity- and aspect-level analysis looks directly at the sentiment itself (11). Within this form of analysis, it is assumed that opinions consist of two parts: a sentiment and a target. A target could be understood as the entity towards which sentiment is expressed. This method of analysis, therefore, enables researchers to classify the most meaningful vehicles of sentiment on the basis of classified data. In this case, vehicles of sentiment are words that express a certain type of sentiment (Joshi and Itkat 5422). This approach is theoretically capable of providing detailed insight into sentiment. However, according to Liu, this method is the most challenging one from a technological perspective (11–12).


2.3 Different Categorization Strategies

Since sentiment analysis aims to classify linguistic constructs, sentiment analysis methods use a classification mechanism that generates a certain sentiment score as output. Depending on the type of technology used by the sentiment analysis method, the algorithm assigns a predefined value to words (lexicon-based methods) or a value from the classification model that is created by training the classifier (machine learning based methods). Since the different applications use different calculations for generating the output, it is important for now to understand the basic principles of classification (Ribeiro et al. 9–10). In general, there are two types of classification output: (1) binary, e.g., positive/negative, and (2) numerical, i.e., polarity scores (Ribeiro et al. 8).

In the classification process, a classification model (either a predefined lexicon or a custom model that is generated based on training data) is used to assign polarity scores to the words in the documents that are being classified. Using a series of calculations, the polarity scores are calculated. For example, in Figure 1, the fictional sentiment analysis method assigns values between -4 and +4 to words that are categorized as either positive or negative. Depending on the methodology, the sentiment analysis method can either output the average sentiment scores for both positive and negative sentiment or, using a method-specific threshold, print the dominant sentiment as classification in the output.

Figure 1. Tweet with positive and negative words highlighted, including fictitious sentiment polarity score ranging from -4 to +4 per word.

Depending on the type of calculations performed by the specific technique, the binary sentiment will probably be classified as positive since, on average, the tweet is more positive than negative. Both classification strategies are applied in sentiment analysis methods, as shown in Table 4 on page 34. Where reporting a binary category (e.g., positive sentiment) is easier to understand, a polarity score per measured construct provides a clearer and more nuanced picture of the sentiments present in a text (e.g., positive: 0.6, negative: 0.4). In the example in Figure 1, the negative sentiment is likely to be ignored by a binary classification.
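The two output strategies can be sketched side by side. The word scores below are fictitious, in the same -4 to +4 range as the example in Figure 1, and the threshold is an arbitrary illustration of a method-specific choice.

```python
# Invented example lexicon with per-word polarity scores in the -4..+4 range.
word_scores = {"fantastic": 4, "win": 2, "sadly": -2}

def classify(text, threshold=0.0):
    hits = [word_scores[w] for w in text.lower().split() if w in word_scores]
    positives = [s for s in hits if s > 0]
    negatives = [s for s in hits if s < 0]
    # Numerical output: an average polarity score per measured construct.
    pos_score = sum(positives) / len(hits) if hits else 0.0
    neg_score = sum(negatives) / len(hits) if hits else 0.0
    # Binary output: the dominant sentiment, decided by a threshold.
    label = "positive" if pos_score + neg_score > threshold else "negative"
    return pos_score, neg_score, label

print(classify("fantastic win sadly lost"))  # (2.0, -0.666..., 'positive')
```

The numerical output keeps both the positive and the negative component visible, while the binary label collapses them into a single category, which is how the negative sentiment in Figure 1 ends up being ignored.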


2.4 Lexicon Based Sentiment Analysis

Lexicon-based sentiment analysis methods use a classified vocabulary to assign sentiment to text (Ding et al.; Taboada et al.; Tan et al.). The basic framework of sentiment analysis that uses a vocabulary consists of three steps: (1) the classification model uses a predefined glossary that contains words with a polarity score; (2) the classification model assigns a polarity score to words that appear both in the word list and in the text to be examined, where, depending on the methodology, a binary label or a numerical polarity score is assigned; (3) the algorithm calculates the sentiment polarity and assigns this score to the text as a whole (Medhat et al. 1094). Depending on the methodology, this is based on different calculations and a different classification strategy (binary versus numerical). The simplest calculation consists of summing up the positive and negative sentiment polarity scores.
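These three steps can be sketched as follows, using the simplest calculation (summing the scores). The glossary and its polarity scores are invented for illustration.

```python
# Step 1: a predefined glossary that contains words with a polarity score.
GLOSSARY = {"good": 1, "happy": 2, "bad": -1, "angry": -2}

def lexicon_classify(text):
    words = text.lower().split()
    # Step 2: assign a polarity score to every word that appears in the glossary.
    scores = [GLOSSARY[w] for w in words if w in GLOSSARY]
    # Step 3: aggregate the word scores into one score for the text as a whole;
    # here the simplest calculation, summing, with a sign-based label.
    total = sum(scores)
    return "positive" if total > 0 else "negative" if total < 0 else "neutral"

print(lexicon_classify("happy but a bit angry"))  # 2 + (-2) = 0 -> "neutral"
```

Note that no training data is needed: all classification knowledge sits in the predefined glossary, which is precisely what distinguishes this approach from the machine-learning variant.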

This approach was introduced by Andrew Ortony et al., who created an "affective" glossary in 1987 (Ortony et al. 341). The affective glossary consisted of 500 words taken from the literature on emotion. The goal of the research was to find generic words that represent a certain affective or emotional condition, so that emotion could be determined for a particular research object without analyzing its specific conditions in detail (Ortony et al. 342). Ortony et al. found the following emotional words particularly useful:

Within the taxonomy we propose, the best examples of emotion terms appear to be those that (a) refer to internal, mental conditions as opposed to physical or external ones, (b) are clear cases of states, and (c) have affect as opposed to behaviour or cognition as a predominant (rather than incidental) referential focus. (341)

The final product of Ortony et al.'s work is an affective dictionary that complies with the above-stated definition, containing words like "angry," "afraid," "calm," "fine," "lucky," etcetera (Clore et al. 764). However, although the glossary of Ortony et al. establishes the basis for sentiment analysis using a lexicon, there are still some steps to be taken before automated classification based on a lexicon can be applied (Clore et al.; Ortony et al.).

One of the first methodological frameworks for applying sentiment analysis by means of a lexicon comes from Hatzivassiloglou and McKeown, who developed an algorithm that could classify words as positive or negative with an accuracy of 90% (Hatzivassiloglou and McKeown 174). The limitation of this application, however, is that it requires structured data containing a limited number of adjectives, since the adjectives are used to classify a text. In Twitter data, for example, this is often not the case, and the approach does not work. Hu and Liu propose an alternative: instead of only looking at classified words that carry sentiment, they propose a lexicon based classifier that uses feature extraction in order to determine sentiment in more complex sentences (Hu and Liu 8). Using this approach, their lexicon-based classifier can process unstructured texts that contain multiple affective words. Ding et al. build on this methodological approach and have introduced a framework that can process unstructured language for both lexicon-based language expressions and context-dependent emotional words, thereby refining the performance of lexicon-based SAT (Ding et al.).

2.5 Machine Learning Based Sentiment Analysis

In addition to the easy-to-apply sentiment analysis methods that use a predefined vocabulary, the machine learning approach is a widely used tradition in sentiment analysis (Medhat et al.; Ribeiro et al.). As pointed out by Ribeiro et al., the main limitations of these technologies are that they are often not freely available and that researchers have to train a custom algorithm, which requires a certain level of programming knowledge (3). However, once these challenges have been overcome, sentiment analysis based on machine learning offers a great advantage: it is possible to classify context-specific emotion, since custom categorizations can be added. But why does this technology fall under machine learning, and how does it work exactly?

Machine learning algorithms are computational techniques that transform raw unstructured data into suitable internal representations, or feature vectors, from which a learning subsystem, such as a classifier, can detect or classify patterns in the input data (LeCun et al. 436). The big advantage of this technology is that these internal representations, or feature vectors, are not created by humans: the algorithm learns them from the data using a general-purpose learning procedure (436). Whereas sentiment analysis methods that use a lexicon are based on human-selected words or a human-selected feature vector, machine learning based sentiment analysis takes away this labor-intensive step through an algorithmic technique. This approach does, however, require classified data from which the algorithm can learn. The basic functioning of a machine learning algorithm that uses a classifier is displayed in the model in Figure 2 on page 29.

When applying a machine learning algorithm, two phases can be distinguished: training the classification model and deploying it. First, classified training data [step 1 in Figure 2] is fed into the algorithm [step 2]. The algorithm performs a feature extraction process in which specific attributes of the classified training data are collected for a given classification. Those specific attributes, or features, are stored as a feature vector model [step 3]. The algorithm [step 2] uses this model [step 3] when it is confronted with new unclassified data [step 4] at deployment time. Using a classifier, the algorithm predicts the likelihood that a text belongs to a certain class and generates an output that contains the original unclassified data with probability scores for each class (Troussas et al. 3). But how are the most important properties of a text identified, stored, and deployed?
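The two phases can be illustrated with a deliberately tiny, hand-rolled Naive Bayes classifier; this is one possible instance of the generic scheme in Figure 2, not the specific technique of any cited author, and the four training tweets are invented.

```python
from collections import Counter
import math

# Phase 1, step 1: classified training data (invented examples).
train = [("i love this", "positive"), ("great fun", "positive"),
         ("i hate this", "negative"), ("so boring", "negative")]

# Steps 2-3: "feature extraction" here is word counting per class;
# the counts act as the stored feature vector model.
word_counts = {"positive": Counter(), "negative": Counter()}
class_totals = Counter()
for text, label in train:
    word_counts[label].update(text.split())
    class_totals[label] += 1
vocab = {w for c in word_counts.values() for w in c}

def predict_proba(text):
    """Phase 2, steps 4-5: score new, unclassified data against the model."""
    log_probs = {}
    for label, counts in word_counts.items():
        lp = math.log(class_totals[label] / sum(class_totals.values()))
        total = sum(counts.values())
        for w in text.split():
            # Add-one smoothing so unseen words do not zero out a class.
            lp += math.log((counts[w] + 1) / (total + len(vocab)))
        log_probs[label] = lp
    # Normalize log-probabilities into class probabilities.
    m = max(log_probs.values())
    exp = {k: math.exp(v - m) for k, v in log_probs.items()}
    z = sum(exp.values())
    return {k: v / z for k, v in exp.items()}

print(predict_proba("i love fun"))  # probability scores for each class
```

The output matches the format described above: the original text is associated with a probability score for each class rather than a single hard label.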


Figure 2. Model of methodological steps in a machine learning algorithm that uses a classifier to classify a document.

2.6 Feature Selection: Bag of Words

An essential step in the knowledge distillation process of sentiment analysis is to bring a form of structure to the unstructured original data (Tan 2). For example, a corpus of tweets could be the original data on which we want to deploy sentiment analysis, but tweets do not have a fixed shape and are therefore difficult for algorithms to process. As described in the previous section, this process is referred to as feature extraction. A feature can be understood as a distinctive aspect, quality, or characteristic (Gutierrez-Osuna 2). Features can thus be a multitude of entities, but in the case of sentiment analysis they are generally words (Medhat et al. 1094). A commonly used method to isolate these so-called features from the original unstructured text is an algorithmic technique known as the "bag of words" model (Manning et al. 107).

In essence, the bag of words model summarizes the content of a document by counting how often each word appears in it. This process results in a term frequency for each term in a given document. The number of occurrences within a document is also known as its weight. Based on this quantitative summary of the text, which is not affected by word order, the algorithm can store this information in a feature vector for use in future calculations (Manning et al. 107).
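The bag of words step amounts to little more than counting; a minimal sketch, using an invented example sentence:

```python
from collections import Counter

def bag_of_words(document):
    """Reduce a document to term frequencies (weights), discarding word order."""
    return Counter(document.lower().split())

bow = bag_of_words("Good movie good story")
print(bow)  # Counter({'good': 2, 'movie': 1, 'story': 1})
```

Note that "good movie good story" and "good good movie story" yield the same bag: the representation is insensitive to word order, exactly as described above.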

2.7 Feature Vector

Feature vectors are used to represent numeric or symbolic characteristics of features, such as term frequencies in a text (H. Pang et al.). The advantage of feature vectors is that they store qualitative properties as numerical values, so that this transformed qualitative data can easily be used in calculations. A feature vector should be understood as an item that contains multiple elements about an object (H. Pang et al.). When several feature vectors are combined, each containing specific properties for the feature concerned, a feature space is created. Figure 3 shows a feature vector on the left side and a feature space on the right side.

Figure 3. On the left, a conceptual feature vector containing the most informative features for angry language. On the right, a conceptual representation of a feature space built of multiple feature vectors. Based on Pang, Hannah et al. Feature Vector | Brilliant Math & Science Wiki. https://brilliant.org/wiki/feature-vector/. Accessed 22 May 2018.

Applied to the Twitter case study, we can understand feature space as follows: using the bag of words methodology, characteristic features are isolated from the text. Features, in this case, are therefore words that belong to a certain classification category. By using the weight of the words, or word frequency, their significance for a certain classification category is expressed as a quantitative value in vector space (Perone). Based on the feature space, which can be considered a model, the machine learning algorithm can thus determine which characteristics are associated with a certain class.
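How several bag-of-words feature vectors over a shared vocabulary combine into a feature space can be sketched as follows; the three toy documents are invented.

```python
from collections import Counter

docs = ["good happy movie", "bad sad movie", "happy happy day"]
counts = [Counter(d.split()) for d in docs]
# The shared vocabulary defines the dimensions of the feature space.
vocabulary = sorted({w for c in counts for w in c})

# One row per document: a feature vector of term frequencies (weights).
# Together, the rows span the feature space a classifier would learn from.
feature_space = [[c[w] for w in vocabulary] for c in counts]

print(vocabulary)       # ['bad', 'day', 'good', 'happy', 'movie', 'sad']
for row in feature_space:
    print(row)
```

Each column corresponds to one word, and each cell holds that word's weight in a document; a classifier operates on these rows rather than on the raw text.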
