
Using supervised machine learning to measure resistance towards persuasion

Submitted in partial fulfillment for the degree of Master of Science

Henna Lee

10915842

Master Information Studies: Data Science

Faculty of Science, University of Amsterdam

2018-06-26

Internal Supervisor: Dr. Bob van de Velde (UvA, FNWI, IvI)

(2)

Contents

Abstract
1 Introduction
2 Related Work
3 Methodology
  3.1 Data Collection
    3.1.1 News Comments
    3.1.2 Manual Annotations
    3.1.3 Inter-coder Reliability
    3.1.4 Internet Argument Corpus (IAC)
  3.2 Variables
  3.3 Infrastructure
  3.4 Pre-processing
  3.5 Feature Extraction
    3.5.1 Term Frequency-Inverse Document Frequency (TF-IDF)
    3.5.2 Sentiment Analysis
    3.5.3 Word Embedding Techniques
  3.6 Learning Algorithms
    3.6.1 Multinomial Naïve Bayes
    3.6.2 Support Vector Machines
    3.6.3 Decision Trees
    3.6.4 Stochastic Gradient Descent
    3.6.5 Perceptron
    3.6.6 AdaBoost
    3.6.7 Majority Voting
  3.7 Performance Evaluation Metrics
4 Results
  4.1 First-level Classification
  4.2 Second-level Classification
  4.3 Sanity Check
  4.4 Variances
    4.4.1 Correlation
    4.4.2 Majority Voting Method
  4.5 Additional Dataset
  4.6 Hypotheses Testing
    4.6.1 H1
    4.6.2 H2
    4.6.3 H3
5 Discussion
6 Conclusion
References
A Appendix

(3)

Using supervised machine learning to measure resistance towards persuasion

Henna Lee

University of Amsterdam
hye.lee@student.uva.nl

ABSTRACT

People are motivated to resist persuasion in many situations. In recent years, scholars have identified and categorized several resistance strategies in the field of Communication Science. Yet, resistance to persuasion has been difficult to measure due to certain response biases, such as social desirability. This study aims to fill the academic gap by automating the measurement of resistance strategies towards persuasion, answering the first research question: a) Can resistance attempts and strategies be adequately classified? By reaching an adequate level of automated classification for the strategies, this study further seeks to gain a practical understanding of how everyday people use different ways to resist persuasion depending on domains and types of information. Thus, another substantial research question is: b) Do people use resistance strategies more or less depending on certain domains and types of online news? Using a manually structured dataset of online news comments, multiple supervised machine learning classifiers were trained and tested. In a stepwise method, the comments were first classified according to the presence or absence of a resistance attempt. Second-level classification was based on the extent of the three resistance strategies: "contesting," "empowerment," and "negative affect." Results show that the best performing models for the first-level classification and for the contesting, empowering, and negative affect strategies achieved F1 scores of .78, .75, .78, and .79, respectively. Moreover, people's use of resistance strategies was found to depend on domains (politics & economy vs. lifestyle & health) and types of news (regular vs. opinion).

KEYWORDS

Machine learning, Text classification, Data science, Persuasive communication, Resistance strategies, Communication science

1 INTRODUCTION

In recent decades, a number of different resistance strategies that audiences use to resist persuasion have been identified and categorized [10, 11, 31, 35]. For instance, the four main strategies in the context of persuasive health communication were identified as follows: avoidance, denial, cognitive reappraisal, and suppression [31]. More recently, avoidance, contesting, and empowering strategies were introduced and defined as the three types of overarching resistance strategies that consumers use when resisting advertisements [11].

Yet, there are known issues with measurement in an experimental setting, such as social desirability [16] and other response biases [21]. This study intends to reduce the problem by taking a data science approach. The study implements a supervised machine learning technique to detect resistance attempts using real-life examples of comments on major newspaper articles. Previous studies in the political domain successfully automated the classification of agreement and disagreement in online discussions using Support Vector Machines (SVMs) with sentiment and word-embedding features [22] and a deep neural network with lexical and word-vector-based features [14]. Similarly, this study proposes to detect resistance attempts, but takes a stepwise approach, further classifying the resistance attempts into the three overarching resistance strategies: contesting, empowering, and negative affect.

A dataset of 1,778 online news comments was manually structured. Several classifiers and methods of feature extraction are tested and compared. Also, an existing dataset from a related previous study is incorporated to determine whether it improves the overall performance of the models. In order to guide the model-building process, the following research question and sub-questions are outlined:

RQ1 Can resistance attempts and strategies be adequately classified?

(1) Which feature and classifier perform the best?
(2) How much variance is there within classifiers?
(3) Do models improve using related datasets?

Previous studies have mainly focused on measuring one or two strategies at a time within a single domain. This makes it impossible to assess the relative use of the different strategies, given that resistance can be expressed in different ways depending on the domain of interest [11]. Thus, the scope of detection ought to be expanded to classifying the disagreeing texts in multiple domains into different resistance categories. The presence of resistance attempts and the use of strategies in different domains and types are compared based on the hypotheses argued in the next section. Below, the second research question is stated, followed by the three corresponding sub-questions:

RQ2 Do people use resistance strategies more or less depending on certain domains and types of online news?

(1) Do comments on news in the domain of Politics and the Economy show more of the contesting strategy than comments on news in the domain of Lifestyle and Health?
(2) Do comments on news in the domain of Lifestyle and Health show more of the empowering strategy than comments on news in the domain of Politics and the Economy?
(3) Do comments on opinion news show more resistance attempts against persuasion than comments on regular news?


2 RELATED WORK

To answer the second research question, substantial arguments must first be established with regard to the use of resistance strategies depending on different domains and types of news. The present study explores the three resistance strategies: contesting, empowerment, and negative affect. The contesting strategy is defined as when individuals actively object to (a) the content of the message, (b) the source of the message, or (c) the persuasive strategies used in the message [11]. The empowerment strategy involves emboldening one's existing attitudes to reduce vulnerability to external influence [11]. Finally, the negative affect strategy involves responding to attempted persuasion by becoming angry, irritated, or using sarcasm [35].

The first two sub-questions of RQ2 concern how the news audience resists in different ways with regard to different domains. News in different domains may deliver very different concepts to audiences with different characteristics [6]. News about politics and the economy and news about lifestyle and health might affect commentators in different ways. For instance, news in the domains of politics and the economy might be heavily related to the audience's "concerns of deception" [8]. Previous literature has suggested that people contest content and sources of persuasion when they have suspicions and/or skepticism [11, 35]. Counter-arguing the content was found to be the most effective strategy for resisting persuasion about political issues such as the death penalty [35]. Additionally, in terms of political beliefs and judgments, the audience might consume news content with disconfirmation bias, such that they spend more time and cognitive resources counter-arguing attitudinally incongruent rather than congruent arguments [28]. Based on these views, the first hypothesis was formulated:

H1 Comments on news in the domain of Politics and the Economy will show more of the contesting strategy than comments on news in the domain of Lifestyle and Health.

On the other hand, news topics related to lifestyle and health are more likely to cause "threats to freedom" and "reluctance to change" because the topics deal with specific behavior and individual threats [7, 34]. It is common for people to respond defensively by avoiding, dismissing, or denying a threat to maintain a sense of self-worth, and persuasive messages are an inherent threat to autonomy [17]. According to communication science scholars, when an individual feels threats to their freedom and a reluctance to change, they might empower themselves and their sense of self-worth [11]. Additionally, when perceiving a higher risk of persuasion, people feel the need for more self-protection and show more self-affirmation [30]. The second hypothesis was formulated accordingly:

H2 Comments on news in the domain of Lifestyle and Health will show more of the empowering strategy than comments on news in the domain of Politics and the Economy.

The third sub-question examines to what extent commentators show resistance to persuasion depending on the type of news. With regard to opinion news, people might perceive persuasive intent more than in regular news, since opinion news delivers journalists' opinions in addition to factual information [5, 12]. The perception of persuasive intent is crucial in the processing of persuasive messages. When people perceive a persuasive intent, they tend to avoid the message, counter-argue the message, or engage in biased processing to protect their pre-existing beliefs [15]. A boomerang effect may also occur, since this persuasion knowledge increases resistance attempts when processing the messages [25]. Furthermore, persuasion knowledge leads to a more negative sentiment, since being aware of the persuasive intent stimulates reactance from the audience [11]. Thus, comments on opinion news might show more negative sentiment, which will be a measure of the negative affect strategy. The following hypothesis was formulated:

H3 Comments on opinion news will show more resistance attempts against persuasion than comments on regular news.

3 METHODOLOGY

3.1 Data Collection

3.1.1 News Comments. Initially, there was a small existing dataset of 686 comments from newspapers like The Guardian (n=439), The New York Times (n=57), and The Washington Post (n=190). The news articles were published from November 2016 to January 2017, and the comments were manually coded by the researcher in 2017 for an academic purpose. Its coding scheme was identical to the coding scheme used in the current study; no discrepancy between datasets is expected. The news articles were from sections like Health, Lifestyle, Politics, Current Affairs, Economy, and Science and Technology. Out of 686 labeled comments, 294 showed the contesting strategy, 212 showed the empowering strategy, and 210 showed the negative affect strategy. These were used as a preliminary sample dataset.

At the beginning of this study, another set of news articles was newly created. For the time span of roughly three months from January 1st to March 10th of 2018, all news articles and comments for each news section (Politics, Economy, Lifestyle, Health, and Opinion(s)) were scraped from the website of The Guardian (www.theguardian.com). The links of news articles were archived and used as landing pages. Some articles did not allow comments; these were skipped during the scraping procedure. Next, only the articles that had a minimum number of comments showing at least one of the resistance strategies were selected. In total, 303 news articles from The Guardian were scraped: 118 from Lifestyle & Health, 135 from Politics & Economy, and 50 from the Opinion section.

3.1.2 Manual Annotations. The articles were read first to grasp the main arguments. Only the top-level comments about the article were selected, disregarding replies to other comments. Up to two sentences were selected from each comment that showed a resistance attempt using any of the resistance strategies. Resistance attempts were coded according to a binary classification. Comments that were coded as "1" were further categorized with binary classifications for each resistance strategy. A codebook was developed based on definitions and examples of each class, as shown in Table 2. The preliminary set of comments was annotated again together with the newly collected comments during this process. Multi-labeling was allowed, since more than one strategy could be present in a single comment. For example, the comment "What an insulting article. Really, WP? No, Dr. Martin Luther King, Jr. was not a Conservative" shows both irritation and objection towards the newspaper, and can be classified simultaneously as the negative affect and contesting strategies. As the first coder, the researcher manually coded a total of 1,778 comments showing a resistance attempt and classified them into the three resistance strategies: contesting (n=913), empowering (n=594), and negative affect (n=389).

3.1.3 Inter-coder Reliability. To ensure adequate quality of the coded data, a second coder was included. The second coder was a native English speaker with experience in assisting academic research projects in the field of Communication Science. For each stepwise classification process, 200 randomly selected examples of the collected news comments were coded. The inter-coder reliability results for the first-level and second-level classifications are found in Table 1. The results were not fully satisfactory, and the Cohen's Kappa value for the first-level classification was higher than for the second-level classification. However, both Kappa values can still be considered acceptable according to a widely used standard under which values higher than .40 are interpreted as a fair level of agreement between coders [32].

Classification    Cohen's Kappa
First-level       0.59
Second-level      0.48

Table 1: Cohen's Kappa results for the inter-coder reliability test

3.1.4 Internet Argument Corpus (IAC). In order to aid the effectiveness of the first-level classification, an extra dataset was used. This was to answer the third sub-question of RQ1, whether using a related dataset could improve the performance of the first-level classification. The Internet Argument Corpus (IAC) is a corpus containing texts of political debates on internet forums, annotated using Mechanical Turk [33]. Among the 390,000 posts available, a part of the texts was annotated on a scale from -5 (disagreeing) to 5 (agreeing). First, the texts annotated as "0", which showed neither agreement nor disagreement, were excluded. Next, 7,716 texts with an annotation in the range of -5 to -1 were labeled as "resistance attempt," and 1,973 texts with an annotation in the range of 1 to 5 were labeled as "no resistance attempt." An example of a text labeled as showing a resistance attempt is "Actually, they didn't. The whole tragedy was caused by gun control," and an example of a text labeled as showing no resistance attempt is "Well, you got that one exactly right."

Resistance Strategies — Definitions / Examples

Contesting

  Contesting the Content: Directly counter-arguing against the claims made in messages by giving counter-arguments. "It is impossible to choose what is more sickening in this editorial: its distortion of what Rev. King stood for and fought for, which was certainly to change, not conserve, US institutional and legal racism."

  Contesting the Source: Countering messages by questioning the expertise and validity of the source. "The research is discredited thus I can't believe it."

  Contesting the Strategies Used: Resisting by focusing on persuasive strategies used in messages, detecting persuasion tactics. "The author is using his personal story to persuade our attitude change."

Empowerment

  Attitude Bolstering: Generating thoughts that are consistent with the existing opposite attitudes (without refuting the arguments in messages). "Yes, but there is another important factor that should be taken into account, which is. . . "

  Social Validation: Counter-arguing messages by seeking validation from significant others who also disagree. "There are still many other people who don't believe in this."

  Self Assertions: Asserting the self by reminding that nothing can change their attitudes or behavior because they are confident about themselves. "I voted for Remain and my opinion hasn't changed, and it will not change."

Negative Affect: Responding to the persuasive attempt by getting angry, irritated, sarcastic, or upset. "1000 unlikes, so much views. Congratulations Sir."

Table 2: Definitions and examples of the three overarching resistance strategies and their sub-strategies


Figure 1: Overview of inferential relationships between comments in different news domains and resistance strategies proposed in H1 and H2

Figure 2: Overview of inferential relationships between comments on different news types and resistance attempts proposed in H3

3.2 Variables

An overview of the inferential relationships proposed by the three hypotheses of RQ2 is shown in Figures 1 and 2. The first independent variable, "news domain," will be operationalized as the articles found in different sections of the newspapers' websites. The second independent variable, "news type," will be operationalized as the articles found in the "opinion(s)" section of the newspapers' websites. The dependent variable is the presence of a resistance attempt or a certain resistance strategy measured with the best performing supervised learning algorithm.

3.3 Infrastructure

All steps from pre-processing to machine learning modeling were executed on a Google cloud computing instance with 16 CPU cores, 60 GB of memory, and two NVIDIA K80 GPUs.

3.4 Pre-processing

The scraped texts of comments are assumed to be unclean, as they usually contain punctuation, emojis, URLs, and typos [20]. To clean the texts, the NLTK library was used to filter out stopwords and lemmatize the words.
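As an illustration of this pre-processing step, the sketch below removes URLs and punctuation and then filters stopwords and lemmatizes with NLTK. The helper name clean_comment and the exact cleaning order are assumptions made for illustration; the thesis only states that NLTK was used for stopword removal and lemmatization.

```python
import re

import nltk
from nltk.corpus import stopwords
from nltk.stem import WordNetLemmatizer

nltk.download("stopwords", quiet=True)
nltk.download("wordnet", quiet=True)

STOPWORDS = set(stopwords.words("english"))
LEMMATIZER = WordNetLemmatizer()

def clean_comment(text: str) -> str:
    """Strip URLs and punctuation, drop stopwords, and lemmatize the remaining tokens."""
    text = re.sub(r"http\S+", " ", text.lower())   # remove URLs
    tokens = re.findall(r"[a-z']+", text)          # keep alphabetic tokens only
    kept = [LEMMATIZER.lemmatize(t) for t in tokens if t not in STOPWORDS]
    return " ".join(kept)

print(clean_comment("The research is discredited thus I can't believe it! http://example.com"))
```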

3.5 Feature Extraction

3.5.1 Term Frequency-Inverse Document Frequency (TF-IDF). The TF-IDF approach is a common way of extracting features in text classification studies. By weighting each word in a text document according to its uniqueness, the approach captures the relevancy among words, text documents, and particular categories [19].
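A minimal sketch of this feature-extraction step with scikit-learn's TfidfVectorizer is shown below; the two example comments and the parameter values are illustrative, not the grid-searched settings reported in the Appendix.

```python
from sklearn.feature_extraction.text import TfidfVectorizer

comments = [
    "the research is discredited thus i cannot believe it",
    "there are still many other people who do not believe in this",
]

# Each word is weighted by its term frequency and inverse document frequency.
vectorizer = TfidfVectorizer(ngram_range=(1, 1), norm="l2", use_idf=True)
X = vectorizer.fit_transform(comments)   # sparse document-term weight matrix
print(X.shape)
```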

3.5.2 Sentiment Analysis. Previous studies have shown that sentiment is useful for classifying whether there is a notion of agreement or disagreement in texts from online discussions [14, 22]. To extract a sentiment feature, the TextBlob Python library was used. Sentiment for given sentences is measured with two parameters: a "polarity" score in a range from -1.0 (negative) to 1.0 (positive), and a "subjectivity" score in a range from 0.0 (very objective) to 1.0 (very subjective) [4].
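A minimal sketch of this sentiment feature, assuming the two TextBlob scores are used directly as feature columns:

```python
from textblob import TextBlob

comment = "What an insulting article. Really, WP?"
sentiment = TextBlob(comment).sentiment

polarity = sentiment.polarity          # -1.0 (negative) to 1.0 (positive)
subjectivity = sentiment.subjectivity  # 0.0 (very objective) to 1.0 (very subjective)
print(polarity, subjectivity)
```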

3.5.3 Word Embedding Techniques. Two word embedding techniques were implemented and compared. First, 300-dimensional vectors from the pre-trained GloVe model [24], a Word2Vec-style technique, were averaged for each comment. Second, the Doc2Vec technique was used to vectorize the texts in 300 dimensions.
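The sketch below averages pre-trained GloVe vectors per comment. The gensim model name "glove-wiki-gigaword-300" is an assumption about which pre-trained vectors are loaded; the thesis only states that 300-dimensional pre-trained GloVe vectors were averaged.

```python
import numpy as np
import gensim.downloader as api

glove = api.load("glove-wiki-gigaword-300")   # pre-trained 300-dimensional vectors (assumed model)

def comment_vector(tokens):
    """Average the GloVe vectors of in-vocabulary tokens (zero vector if none match)."""
    vectors = [glove[t] for t in tokens if t in glove]
    return np.mean(vectors, axis=0) if vectors else np.zeros(glove.vector_size)

print(comment_vector("the research is discredited".split()).shape)   # (300,)
```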

3.6 Learning Algorithms

3.6.1 Multinomial Naïve Bayes. Multinomial Naïve Bayes (NB) was used since the technique has been found appropriate for text classification with a TF-IDF weight matrix as the word representation [13].
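A minimal sketch of fitting Multinomial Naive Bayes on a TF-IDF weight matrix; the toy comments and labels are illustrative, while alpha=2 follows the grid-searched value reported in Table 9.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB

texts = [
    "the research is discredited, i cannot believe it",
    "well you got that one exactly right",
    "what an insulting article",
    "thanks for a balanced piece",
]
labels = [1, 0, 1, 0]   # 1 = resistance attempt, 0 = no resistance attempt (toy labels)

vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(texts)
clf = MultinomialNB(alpha=2).fit(X, labels)
print(clf.predict(vectorizer.transform(["i do not believe this claim"])))
```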

3.6.2 Support Vector Machines. Support Vector Machines (SVMs) are known as among the best classifiers for text categorization [18, 23]. An SVM seeks the hyperplane (decision surface) that best separates positive from negative data [19].

3.6.3 Decision Trees. Decision tree estimators categorize texts by constructing true/false queries in a tree structure [19]. The following three tree-based ensemble variations are tested and compared: Random Forest, Extra Trees, and Gradient Boosting.

3.6.4 Stochastic Gradient Descent. This estimator implements regularized linear models with stochastic gradient descent learning, where the gradient of the loss is estimated one sample at a time and the model is updated along the way with a learning rate [3].

3.6.5 Perceptron. The Perceptron is also a linear classifier that shares the same underlying implementation as SGD, with the loss function set to the perceptron loss [2].

3.6.6 AdaBoost. An AdaBoost classifier is an estimator that fits a classifier on a dataset and then fits additional copies of the classifier on the same dataset, adjusting the weights of incorrectly classified instances so that subsequent classifiers focus more on difficult cases [1].

3.6.7 Majority Voting. To challenge the use of a single classifier in the models, a majority voting classifier that combines a set of classifiers is also tested. Selection criteria are based on measures of diversity, such as choosing the classifiers whose evaluation metrics are least correlated [26].
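A sketch of such a hard-voting ensemble with scikit-learn's VotingClassifier is given below; the three member classifiers mirror one row of Table 6, while the settings and toy data are illustrative.

```python
from sklearn.ensemble import ExtraTreesClassifier, VotingClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Perceptron
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

voting = make_pipeline(
    TfidfVectorizer(),
    VotingClassifier(
        estimators=[
            ("etc", ExtraTreesClassifier(n_estimators=30)),
            ("per", Perceptron()),
            ("mnb", MultinomialNB()),
        ],
        voting="hard",   # each member casts one vote; the majority label wins
    ),
)

texts = ["i do not believe this source", "good article, thanks",
         "this claim is simply false", "well written and fair"]
labels = [1, 0, 1, 0]   # 1 = contesting, 0 = not contesting (toy labels)
voting.fit(texts, labels)
print(voting.predict(["i refuse to believe this claim"]))
```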


3.7 Performance Evaluation Metrics

The hyper-parameters were selected for each model after performing a grid search that was optimized based on F1 scoring. The test set for the evaluation was gathered only from the news comments, excluding the extra IAC texts. This was to ensure that the models were tested only on the data relevant to the purpose of the study.
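A sketch of this tuning set-up, assuming feature and classifier hyper-parameters are searched jointly in one scikit-learn pipeline; the small grid shown here is illustrative, and train_texts/train_labels are placeholders for the labeled comment data.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import GridSearchCV
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

pipeline = Pipeline([("tfidf", TfidfVectorizer()), ("clf", MultinomialNB())])
param_grid = {
    "tfidf__ngram_range": [(1, 1), (1, 2)],
    "tfidf__max_df": [0.5, 1.0],
    "clf__alpha": [0.5, 1, 2],
}
search = GridSearchCV(pipeline, param_grid, scoring="f1", cv=3, n_jobs=-1)
# search.fit(train_texts, train_labels)            # fit on the labeled comments
# print(search.best_params_, search.best_score_)   # selected settings and best F1
```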

For performance evaluation of the supervised machine learning algorithms, the following metrics are considered and reported: precision, recall, F1, and the AUC measure. Precision represents the ratio of the number of comments correctly labeled as positive to the total number of positively classified comments [29]. Recall represents the ratio of the number of correctly labeled positive comments to the total number of comments that are truly positive [29]. The F1 measure is the harmonic mean of precision and recall, and reaches its best value at 1 [27]. The Area Under the Curve (AUC) measure represents the probability that the algorithm will rank a randomly chosen positive comment higher than a randomly chosen negative comment [9].
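For reference, these metrics can be written in terms of true positives (TP), false positives (FP), and false negatives (FN):

\[
\text{Precision} = \frac{TP}{TP + FP}, \qquad
\text{Recall} = \frac{TP}{TP + FN}, \qquad
F_1 = 2 \cdot \frac{\text{Precision} \cdot \text{Recall}}{\text{Precision} + \text{Recall}}
\]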

4 RESULTS

Figure 3 displays the overall structure of the two stepwise classifications. Both first- and second-level classifications were binary, detecting either the presence of a resistance attempt or a certain resistance strategy. Table 3 shows the best-performing results for each estimator from the grid search, optimized by F1 scoring with three-fold cross-validation, for the sentiment, TF-IDF, Word2Vec, and Doc2Vec features. The classifiers with the highest F1 scores are indicated in bold for each method of feature extraction and type of classification. Their optimal parameters are found in Tables 9 and 10 in the Appendix.

Figure 3: Overview of a stepwise classification

4.1 First-level Classification

The first-level classification detected the presence of a resistance attempt in comments. For training the models, the 7,716 positive labels and 1,973 negative labels from the IAC dataset were used in addition to the manually structured dataset of 1,778 comments. A balanced set was made by adding 7,521 randomly selected comments to The Guardian news articles that were annotated as not showing resistance attempts. A total of 9,494 positive and 9,494 negative labels were used.

The eight classifiers were tested along with the three types of extracted features: sentiment, TF-IDF, and word-embedding features. The results were based on 17,089 train samples and 1,899 test samples for TF-IDF features, 16,976 train samples and 1,887 test samples for Word2Vec features, and 17,089 train samples and 1,899 test samples for Doc2Vec features. The number of samples used for Word2Vec features differed from that used for TF-IDF and Doc2Vec features, as some samples were removed: only samples containing at least one word from the vocabulary corpus were kept. The results showed that the overall evaluation metrics of the classifiers were satisfactory. The best-performing model for the first-level classification used an SVM classifier with TF-IDF features, which yielded the highest F1 score.
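A minimal sketch of this train/test procedure for the best first-level model (an SVM on TF-IDF features), using a held-out split and the F1 score; the toy data and the default SVC settings are illustrative.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

texts = ["i do not believe this source", "good article, thanks",
         "this claim is simply false", "well written and fair"] * 10
labels = [1, 0, 1, 0] * 10          # toy balanced labels (1 = resistance attempt)

X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.1, stratify=labels, random_state=42)

model = make_pipeline(TfidfVectorizer(), SVC(kernel="rbf"))
model.fit(X_train, y_train)
print(f1_score(y_test, model.predict(X_test)))   # F1 on the held-out comments
```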

4.2 Second-level Classification

The second-level classification aimed to detect the extent to which the three resistance strategies were present in comments that showed a resistance attempt in the earlier step of classification. To train the models, a manually structured dataset of 860 contesting samples, 568 empowering samples, and 350 negative affect samples was used as positive labels. In total, 1,600 samples were used as a train set and 178 samples as a test set for TF-IDF features; 1,578 train and 176 test samples were used for word-embedding features. The results showed that the MultinomialNB classifier with TF-IDF features yielded the best F1 score for classifying the contesting and empowering strategies, and the SGD classifier with Word2Vec features had the best F1 score for classifying the negative affect strategy.

4.3 Sanity Check

As a sanity check of the first-level classification, the percentages of positively labeled comments in the annotated articles were averaged and found to be 0.094 per article. This roughly means that about one in every ten comments shows a resistance attempt. To assess the external validity of the predictions for each model and method of feature extraction, the predicted means of the unlabeled comments (n = 14,742) from 12 randomly selected articles were compared, as shown in Table 4. Overall, these scores were higher than those of the annotated labels. The best-performing model based on F1 score, using the SVM classifier with TF-IDF features, had a mean score of .207 per article, which was moderately acceptable. For models using sentiment and Doc2Vec features, inconsistent scores were found among different classifiers.


Estimator          | Any Resistance (P / R / F1 / AUC) | Contesting (P / R / F1 / AUC) | Empowering (P / R / F1 / AUC) | Negative Affect (P / R / F1 / AUC)

Sentiment features
SVM                | .59 .51 .59 .51 | .56 .53 .56 .53 | .60 .37 .60 .51 | .31 .26 .31 .48
Random Forest      | .53 .53 .53 .53 | .56 .54 .56 .53 | .57 .61 .57 .51 | .64 .70 .64 .52
Extra Trees        | .52 .52 .52 .52 | .53 .53 .53 .53 | .60 .62 .60 .54 | .64 .70 .64 .52
Adaboost           | .53 .53 .53 .53 | .60 .55 .60 .54 | .52 .63 .52 .49 | .72 .74 .72 .52
GB                 | .47 .47 .47 .48 | .58 .55 .58 .54 | .55 .62 .55 .50 | .65 .67 .65 .55
SGD                | .25 .50 .25 .50 | .26 .51 .26 .50 | .44 .66 .44 .50 | .53 .73 .53 .50
Perceptron         | .56 .52 .56 .52 | .26 .51 .26 .50 | .58 .54 .58 .53 | .53 .73 .53 .50

TF-IDF features
MultinomialNB      | .78 .76 .78 .77 | .75 .72 .75 .72 | .78 .66 .78 .53 | .63 .79 .63 .50
SVM                | .78 .78 .78 .78 | .58 .56 .58 .54 | .58 .50 .58 .54 | .59 .40 .59 .39
Random Forest      | .72 .72 .72 .72 | .72 .72 .72 .72 | .61 .65 .61 .54 | .76 .80 .76 .58
Extra Trees        | .74 .74 .74 .74 | .74 .74 .74 .74 | .66 .68 .66 .60 | .74 .74 .74 .60
Adaboost           | .70 .67 .70 .67 | .66 .66 .66 .65 | .63 .65 .63 .58 | .71 .72 .71 .56
GB                 | .72 .72 .72 .72 | .74 .74 .74 .74 | .63 .63 .63 .59 | .78 .78 .78 .66
SGD                | .76 .73 .76 .74 | .75 .53 .75 .51 | .13 .36 .13 .50 | .63 .79 .63 .50
Perceptron         | .70 .68 .70 .67 | .35 .42 .35 .44 | .60 .62 .60 .56 | .63 .79 .63 .50

Word2Vec features
SVM                | .63 .52 .63 .51 | .69 .68 .69 .68 | .61 .60 .61 .57 | .68 .68 .68 .57
Random Forest      | .72 .72 .72 .72 | .69 .68 .69 .68 | .66 .68 .66 .60 | .74 .77 .74 .57
Extra Trees        | .73 .73 .73 .73 | .70 .69 .70 .69 | .68 .69 .68 .59 | .78 .78 .78 .57
Adaboost           | .71 .71 .71 .71 | .70 .69 .70 .69 | .12 .34 .12 .50 | .75 .77 .75 .59
GB                 | .74 .74 .74 .74 | .41 .41 .41 .41 | .69 .71 .69 .63 | .73 .75 .73 .62
SGD                | .73 .73 .73 .73 | .68 .60 .68 .60 | .72 .72 .72 .70 | .79 .80 .79 .62
Perceptron         | .62 .51 .62 .52 | .75 .52 .75 .51 | .72 .63 .72 .67 | .75 .77 .75 .58

Doc2Vec features
SVM                | .52 .51 .52 .51 | .42 .48 .42 .48 | .71 .37 .71 .55 | .72 .71 .72 .59
Random Forest      | .66 .66 .66 .66 | .57 .56 .57 .56 | .58 .63 .58 .47 | .66 .75 .66 .51
Extra Trees        | .66 .66 .66 .66 | .56 .56 .56 .55 | .61 .67 .61 .51 | .75 .79 .75 .58
Adaboost           | .67 .67 .67 .67 | .56 .56 .56 .56 | .07 .27 .07 .50 | .72 .25 .72 .51
GB                 | .69 .69 .69 .69 | .60 .60 .60 .59 | .63 .53 .63 .54 | .67 .52 .67 .53
SGD                | .66 .64 .66 .64 | .60 .60 .60 .60 | .53 .73 .53 .50 | .73 .77 .73 .57
Perceptron         | .52 .52 .52 .52 | .26 .51 .26 .50 | .67 .67 .67 .57 | .71 .71 .71 .59

Table 3: Best performing sentiment, TF-IDF, Word2Vec, and Doc2Vec features tested using the estimators


Estimator           Sentiment  TF-IDF  W2V   D2V
MultinomialNB       N/A        .288    N/A   N/A
SVM                 1.000      .207    .960  .996
Random Forest       .495       .176    .207  .405
Extra Trees         .338       .243    .201  .399
Adaboost            .533       .168    .216  .367
Gradient Boosting   .779       .154    .219  .417
SGD                 .937       .968    .223  .999
Perceptron          .000       .154    .018  .976

Table 4: Mean predicted scores of unlabeled sample comments for first-level classification

4.4 Variances

4.4.1 Correlation. In order to answer the second sub-question of RQ1, variances within models using different classifiers were examined. First, correlation coefficients were computed to assess the relationship between the models. Table 5 shows a correlation matrix of the evaluation metrics (precision, recall, F1, and AUC scores) of the models. The features that were found to perform best in the earlier section were used in the models: those for detecting any resistance, contesting, and empowering strategies used TF-IDF features, while the model for detecting the negative affect strategy used Word2Vec features.

Overall, there was a strong, positive correlation between the evaluation metrics of the models for classifying any resistance strategy and the negative affect strategy. However, the classifiers for the contesting and empowering strategies showed a relatively wider range of correlations. For instance, Perceptron seemed to be negatively correlated with the other classifiers for both the contesting and empowering models.
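The thesis does not state how the coefficients in Table 5 were computed; the sketch below assumes a Pearson correlation between classifiers over their four evaluation metrics, using three illustrative rows taken from Table 3.

```python
import pandas as pd

# Rows: precision, recall, F1, AUC (TF-IDF features, any-resistance task, from Table 3).
metrics = pd.DataFrame(
    {
        "MNB": [0.78, 0.76, 0.78, 0.77],
        "SGD": [0.76, 0.73, 0.76, 0.74],
        "ADA": [0.70, 0.67, 0.70, 0.67],
    },
    index=["precision", "recall", "f1", "auc"],
)
print(metrics.corr())   # pairwise (Pearson) correlation between classifiers
```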

Figure 4: Comparison of evaluation metrics of classifiers for first and second-level classifications using best performing features

Any Resistance Strategy

Estimator  MNB   GB    SVM   ETC   SGD   RF    ADA   PER
MNB        1.00  .93   .94   .54   1.00  .98   .98   .95
GB         .93   1.00  .75   .19   .96   .99   .98   1.00
SVM        .94   .75   1.00  .79   .91   .85   .87   .80
ETC        .54   .19   .79   1.00  .47   .35   .37   .26
SGD        1.00  .96   .91   .47   1.00  .99   .99   .97
RF         .98   .99   .85   .35   .99   1.00  1.00  1.00
ADA        .98   .98   .87   .37   .99   1.00  1.00  .99
PER        .95   1.00  .80   .26   .97   1.00  .99   1.00

Contesting Strategy

Estimator  MNB    GB    SVM   ETC   SGD    RF    ADA   PER
SGD        .99    .87   .96   .87   1.00   .57   .72   -1.00
RF         .67    .90   .77   .08   .57    1.00  .98   -.64
SVM        .99    .97   1.00  .69   .96    .77   .89   -.98
GB         .93    1.00  .97   .51   .87    .90   .97   -.91
ETC        .79    .51   .69   1.00  .87    .08   .28   -.82
PER        -1.00  -.91  -.98  -.82  -1.00  -.64  -.78  1.00
ADA        .81    .97   .89   .28   .72    .98   1.00  -.78
MNB        1.00   .93   .99   .79   .99    .67   .81   -1.00

Empowering Strategy

Estimator  MNB   GB    SVM   ETC   SGD   RF    ADA   PER
MNB        1.00  .80   .62   .77   -.99  .67   .67   .62
SVM        .62   .02   1.00  -.03  -.73  -.17  -.17  -.24
GB         .80   1.00  .02   1.00  -.70  .98   .98   .97
SGD        -.99  -.70  -.73  -.66  1.00  -.55  -.55  -.49
ADA        .67   .98   -.17  .99   -.55  1.00  1.00  1.00
ETC        .77   1.00  -.03  1.00  -.66  .99   .99   .98
PER        .62   .97   -.24  .98   -.49  1.00  1.00  1.00
RF         .67   .98   -.17  .99   -.55  1.00  1.00  1.00

Negative Affect Strategy

Estimator  GB    SVM   ETC   RF    SGD   ADA   PER
GB         1.00  .98   .99   1.00  .99   1.00  1.00
SVM        .98   1.00  1.00  .98   1.00  .99   .99
ETC        .99   1.00  1.00  .99   1.00  .99   .99
RF         1.00  .98   .99   1.00  .99   1.00  1.00
SGD        .99   1.00  1.00  .99   1.00  .99   1.00
ADA        1.00  .99   .99   1.00  .99   1.00  1.00
PER        1.00  .99   .99   1.00  1.00  1.00  1.00

Table 5: Correlation matrix of evaluation metrics of the best performing models of first and second-level classifications per classifier

4.4.2 Majority Voting Method. Figure 4 displays the amount of variance in each evaluation metric within classifiers for the first- and second-level classifications. The boxplots show that there is a fair amount of agreement within the classifiers for the empowering and negative affect strategies. However, the classifiers for the contesting strategy vary much more.

To determine whether the classifiers could aid one another for the contesting strategy, a hard majority voting method was implemented. The single best-performing classifier, MultinomialNB, its least correlated classifier, Perceptron, and each of the other classifiers were combined and tested. As shown in Table 6, the combined models did not notably improve the performance of classifying contesting strategies.

estimator_1  estimator_2  estimator_3  precision  recall  f1   roc_auc
ETC          PER          MNB          .73        .73     .73  .73
RF           PER          MNB          .72        .72     .72  .72
GB           PER          MNB          .72        .71     .72  .71
ADA          PER          MNB          .72        .71     .72  .71
SGD          PER          MNB          .70        .68     .70  .68
SVM          PER          MNB          .68        .68     .68  .68

Table 6: Evaluation metrics of majority voting models using multiple classifiers

4.5 Additional Dataset

Figure 5 shows comparisons of the evaluation metrics of the eight classifiers with TF-IDF features for the first-level classification between the original data and the data combined with the additional IAC dataset. This answers the third sub-question of RQ1 by clearly showing that adding the IAC dataset improves the overall performance of the models. Thus, the additional dataset was incorporated into the final model of the first-level classification used for hypothesis testing in the following sections.

Figure 5: Comparison of evaluation metrics of eight classifiers with TF-IDF features for first-level classification between the original and combined data with the additional IAC dataset

4.6 Hypotheses Testing

To test the hypotheses, comments were randomly sampled from news articles in the politics & economy (n=20,000), lifestyle & health (n=20,000), and opinion sections (n=40,000). The sample comments from the politics & economy and lifestyle & health sections were merged into the samples of regular news comments (n=40,000). The texts were pre-processed in the same way as the texts used to train and test the supervised machine learning models. Only the samples that were classified as showing a first-level resistance attempt were used for hypotheses testing. Table 7 shows the number of sampled comments that were classified as showing any, contesting, empowering, and negative affect strategies.

An independent-samples t-test was used with a significance threshold of .05 to compare the means of the resistance scores calculated over the sampled comments. Table 8 shows the mean resistance scores of the sampled comments from news in different domains and types; estimated standard errors are given within brackets.

Strategies       Pol/Eco  Life/Health  Opinion  Regular
Any Strategy     2,868    6,240        9,720    8,932
Contesting       2,175    3,478        7,031    5,609
Empowering       317      1,699        2,030    1,966
Negative Affect  312      505          708      780

Table 7: Number of sampled comments from news in different domains and types classified by first and second-level classifications

Strategies       Pol/Eco       Life/Health   Opinion       Regular
Any Strategy     .143* (.002)  .312* (.003)  .243* (.002)  .223* (.002)
Contesting       .758* (.008)  .557* (.006)  .723* (.005)  .628* (.005)
Empowering       .111* (.006)  .272* (.006)  .209* (.004)  .220* (.004)
Negative Affect  .111* (.006)  .082* (.003)  .073* (.003)  .088* (.003)

Note: * indicates significance at the 99% level. Standard errors in parentheses.

Table 8: Mean resistance scores of sampled comments from news in different domains and types

4.6.1 H1. The first hypothesis stated that comments on news in the domain of Politics and the Economy will show more of the contesting strategy than comments on news in the domain of Lifestyle and Health. This was confirmed: there was a significant difference in the resistance scores between the Politics & Economy (M=.758, SE=.008) and Lifestyle & Health (M=.557, SE=.006) conditions; t(9,106)=18.709, p<.001. These findings suggest that comments on Politics & Economy news are more likely to show contesting strategies than comments on Lifestyle & Health news. Figure 6 compares the predicted means of comments from news on Politics & Economy and Lifestyle & Health. Error bars indicate the standard error of the mean; the standard errors are very small due to the large sample sizes.
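A minimal sketch of the independent-samples t-test used for these comparisons, using SciPy; the two score lists are illustrative placeholders for the per-comment predictions (1 = strategy present) in each condition.

```python
from scipy import stats

scores_pol_eco = [1, 0, 1, 1, 0, 1, 1, 0]      # placeholder predictions, Politics & Economy
scores_life_health = [0, 1, 0, 0, 1, 0, 0, 1]  # placeholder predictions, Lifestyle & Health

t_stat, p_value = stats.ttest_ind(scores_pol_eco, scores_life_health)
print(f"t = {t_stat:.3f}, p = {p_value:.3f}")   # compared against the .05 threshold
```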

4.6.2 H2. The second hypothesis stated that comments on news in the domain of Lifestyle & Health will show more of the empowering strategy than comments on news in the domain of Politics & Economy. This was also confirmed. There was a significant difference in the resistance scores between the Politics & Economy (M=.111, SE=.006) and Lifestyle & Health (M=.272, SE=.006) conditions; t(9,106)=-17.558, p<.001. This suggests that comments on Lifestyle & Health news are more likely to show empowering resistance strategies than comments on Politics & Economy news.

Figure 6: Bar chart of predicted means of politics/economy vs. lifestyle/health news comments

Figure 7: Bar chart of predicted means of regular vs. opinion news comments

4.6.3 H3. Lastly, the third hypothesis stated that comments on opinion news will show more resistance attempts against persuasion than comments on regular news. As shown in Figure 7, there was a significant difference in the resistance scores between the Opinion news (M=.243, SE=.002) and Regular news (M=.223, SE=.003) conditions; t(79,998)=-6.591, p<.001. These results suggest that news type has an influence on the resistance attempts of commentators: comments on opinion news are more likely to show resistance attempts than comments on regular news.

5 DISCUSSION

For each type of classification, a satisfactory level of performance was achieved. However, there were high variances between models using different classifiers for the first-level and contesting classifications, with the models for classifying contesting strategies showing the most discrepancies. One reason for this might be that the use of contesting strategies depends to some extent on the context of the persuasive information presented in the news articles. Thus, it may be difficult to detect the linguistic patterns of texts that contest the authors or the persuasive content.

The three substantial hypotheses were all confirmed. Using the real-life examples of resistance attempts and strategies, a comparison of multiple domains and types of news was successfully made. In terms of the quantity of classified comments for the three strategies, contesting strategies were classified as positive the most, followed by empowering and negative affect strategies. This was similar to the manually annotated sample: contesting strategies were found the most, and negative affect strategies the least, during the coding process. Moreover, there were some interesting findings that were not covered by the hypotheses. Comments on Lifestyle & Health news showed more resistance attempts on average than comments on Politics & Economy news. In addition, contesting strategies were found more in Opinion news than in regular news.

Meanwhile, limitations of the current study should be discussed. First, the selection of news comments could be biased because newspapers' websites filter their comment sections for offensive or insulting comments made by users. This could have led to a failure to capture the whole range of resistance attempts in the comments. Secondly, the moderate value of inter-coder reliability could be improved by spending more time on coder training. Thirdly, the additional IAC dataset was used with the models for the first-level classification, since it improved the overall performance. However, since the IAC dataset was composed only of political arguments, it could have biased the models toward detecting resistance to political persuasion. Fourth, the evaluation of the models could be richer; for instance, the time needed to train the models could be considered. Fifth, the selection of sources for comments could be expanded, since there might be differences between users reading major newspapers and those reading other types of outlets, such as specialized news outlets and online news magazines.

For future research, a further analysis of the subcategories of the three overarching strategies might yield insightful findings. A greater number of annotated samples per strategy might lead to better predictions. In addition, adding sentiment features to the TF-IDF and word-embedding features did not improve the performance; thus, those results were not reported. The models using only the sentiment features did not perform well. However, since one's resistance is closely related to one's sentiment, measuring the sentiment of texts in different ways could be interesting for future studies. Moreover, the majority voting classifier model did not outperform the single classifier model. This is in line with previous studies suggesting that there is no reliable finding that combined classifiers always outperform the best individual classifier [26]. Weighting the classifiers could be explored in future studies. Lastly, a deep neural network approach would be interesting for this text classification task, as done in a previous study [14].

The substantial findings from the second research question are applicable not only to academic scholars but also to practitioners in the fields of persuasive communication, marketing, and public relations. In a general sense, understanding how people use different resistance strategies depending on domains and types of information could help communication specialists overcome resistance from audiences by positioning persuasive messages accordingly. For instance, when a message in the domain of Lifestyle & Health is to be delivered, it should focus more on overcoming the empowering strategy. On the other hand, a persuasive message in the domain of Politics & Economy should contain more information that can overcome people's contesting attitudes toward authors and content.

6 CONCLUSION

By using online news comments, this study aimed to automate the measurement of the presence of resistance attempts and three overarching resistance strategies: contesting, empowering, and negative affect. A stepwise classification model was successfully built: the presence of resistance attempts was predicted at the first level, and then the presence of the three strategies was predicted at the second level. Several classifiers and features were used and compared. Furthermore, inferential relationships between news domains and types and the use of resistance strategies were examined using the classification models. The results showed that people's use of different resistance strategies depends on the domains and types of online news. Firstly, comments on news in the domain of Politics & Economy showed significantly more contesting strategies than comments on news in the domain of Lifestyle & Health. Secondly, comments on news in the domain of Lifestyle & Health showed significantly more empowering strategies than comments on news in the domain of Politics & Economy. Lastly, comments on Opinion news showed significantly more resistance attempts against persuasion than comments on regular news.

REFERENCES

[1] [n. d.]. sklearn.ensemble.AdaBoostClassifier — scikit-learn 0.19.1 documentation. http://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html. (Accessed on 06/13/2018).
[2] [n. d.]. sklearn.linear_model.Perceptron. http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.Perceptron.html. (Accessed on 05/01/2018).
[3] [n. d.]. sklearn.linear_model.SGDClassifier. http://scikit-learn.org/stable/modules/generated/sklearn.linear_model.SGDClassifier.html. (Accessed on 05/01/2018).
[4] [n. d.]. Tutorial: Quickstart — TextBlob 0.15.1 documentation. http://textblob.readthedocs.io/en/dev/quickstart.html. (Accessed on 06/04/2018).
[5] Fatima A Al Kohlani. 2010. The function of discourse markers in Arabic newspaper opinion articles. Georgetown University.
[6] Kevin Coe, Kate Kenski, and Stephen A Rains. 2014. Online and uncivil? Patterns and determinants of incivility in newspaper website comments. Journal of Communication 64, 4 (2014), 658–679.
[7] Geoff Cooper, Nicola Green, Kate Burningham, David Evans, and Tim Jackson. 2012. Unravelling the threads: Discourses of sustainability and consumption in an online forum. Environmental Communication: A Journal of Nature and Culture 6, 1 (2012), 101–118.
[8] Claes H De Vreese. 2005. News framing: Theory and typology. Information Design Journal & Document Design 13, 1 (2005).
[9] Tom Fawcett. 2006. An introduction to ROC analysis. Pattern Recognition Letters 27, 8 (2006), 861–874.
[10] Marieke Fransen, Claartje ter Hoeven, and Peeter Verlegh. 2013. Strategies to resist advertising. ACR North American Advances (2013).
[11] Marieke L Fransen, Edith G Smit, and Peeter WJ Verlegh. 2015. Strategies and motives for resistance to persuasion: an integrative framework. Frontiers in Psychology 6 (2015), 1201.
[12] Guy J Golan. 2013. The gates of op-ed diplomacy: Newspaper framing the 2011 Egyptian revolution. International Communication Gazette 75, 4 (2013), 359–373.
[13] Aysun Güran, Selim Akyokuş, Nilgün Güler Bayazıt, and M Zahid Gürbüz. 2009. Turkish text categorization using N-gram words. In Proceedings of the International Symposium on Innovations in Intelligent Systems and Applications (INISTA 2009). 369–373.
[14] Sushant Hiray and Venkatesh Duppada. 2017. Agree to Disagree: Improving Disagreement Detection with Dual GRUs. arXiv preprint arXiv:1708.05582 (2017).
[15] R Lance Holbert, John M Tchernev, Whitney O Walther, Sarah E Esralew, and Kathryn Benski. 2013. Young voter perceptions of political satire as persuasion: A focus on perceived influence, persuasive intent, and message strength. Journal of Broadcasting & Electronic Media 57, 2 (2013), 170–186.
[16] Allyson L Holbrook, Melanie C Green, and Jon A Krosnick. 2003. Telephone versus face-to-face interviewing of national probability samples with long questionnaires: Comparisons of respondent satisficing and social desirability response bias. Public Opinion Quarterly 67, 1 (2003), 79–125.
[17] Mikayla Jenkins and Marko Dragojevic. 2013. Explaining the process of resistance to persuasion: A politeness theory-based approach. Communication Research 40, 4 (2013), 559–590.
[18] Thorsten Joachims. 1998. Text categorization with support vector machines: Learning with many relevant features. In European Conference on Machine Learning. Springer, 137–142.
[19] Aurangzeb Khan, Baharum Baharudin, Lam Hong Lee, and Khairullah Khan. 2010. A review of machine learning algorithms for text-documents classification. Journal of Advances in Information Technology 1, 1 (2010), 4–20.
[20] Varada Kolhatkar, Hanhan Wu, Luca Cavasso, Emilie Francis, Kavan Shukla, and Maite Taboada. 2018. The SFU Opinion and Comments Corpus: A Corpus for the Analysis of Online News Comments. (2018).
[21] Neil A Macmillan and C Douglas Creelman. 1990. Response bias: Characteristics of detection theory, threshold theory, and "nonparametric" indexes. Psychological Bulletin 107, 3 (1990), 401.
[22] Stefano Menini and Sara Tonelli. 2016. Agreement and disagreement: Comparison of points of view in the political domain. In COLING 2016, the 26th International Conference on Computational Linguistics. 2461–2470.
[23] Bo Pang, Lillian Lee, and Shivakumar Vaithyanathan. 2002. Thumbs up?: sentiment classification using machine learning techniques. In Proceedings of the ACL-02 Conference on Empirical Methods in Natural Language Processing - Volume 10. Association for Computational Linguistics, 79–86.
[24] Jeffrey Pennington, Richard Socher, and Christopher Manning. 2014. GloVe: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP). 1532–1543.
[25] Debra Jones Ringold. 2002. Boomerang effects in response to public health interventions: Some unintended consequences in the alcoholic beverage market. Journal of Consumer Policy 25, 1 (2002), 27–63.
[26] Dymitr Ruta and Bogdan Gabrys. 2005. Classifier selection for majority voting. Information Fusion 6, 1 (2005), 63–81.
[27] Marina Sokolova, Nathalie Japkowicz, and Stan Szpakowicz. 2006. Beyond accuracy, F-score and ROC: a family of discriminant measures for performance evaluation. In Australasian Joint Conference on Artificial Intelligence. Springer, 1015–1021.
[28] Charles S Taber and Milton Lodge. 2006. Motivated skepticism in the evaluation of political beliefs. American Journal of Political Science 50, 3 (2006), 755–769.
[29] Abinash Tripathy, Ankit Agrawal, and Santanu Kumar Rath. 2016. Classification of sentiment reviews using n-gram machine learning approach. Expert Systems with Applications 57 (2016), 117–126.
[30] Anne-Marie Van Prooijen, Paul Sparks, and Donna C Jessop. 2013. Promoting or jeopardizing lighter carbon footprints? Self-affirmation can polarize environmental orientations. Social Psychological and Personality Science 4, 2 (2013), 238–243.
[31] Jonathan van 't Riet and Robert AC Ruiter. 2013. Defensive reactions to health-promoting information: An overview and implications for future research. Health Psychology Review 7, sup1 (2013), S104–S136.
[32] Anthony J Viera, Joanne M Garrett, et al. 2005. Understanding interobserver agreement: the kappa statistic. Family Medicine 37, 5 (2005), 360–363.
[33] Marilyn A Walker, Jean E Fox Tree, Pranav Anand, Rob Abbott, and Joseph King. 2012. A Corpus for Research on Deliberation and Debate. In LREC. 812–817.
[34] Itzhak Yanovitzky and Courtney Bennett. 1999. Media attention, institutional response, and health behavior change: The case of drunk driving, 1978-1996. Communication Research 26, 4 (1999), 429–453.
[35] Julia Zuwerink Jacks and Kimberly A Cameron. 2003. Strategies for resisting persuasion. Basic and Applied Social Psychology 25, 2 (2003), 145–161.

A APPENDIX


TF-IDF Models

Any Resistance Strategy
Estimator | Parameters (TF-IDF features) | Parameters (classifier)
MultinomialNB | {'ngram_range': (1, 1), 'max_df': 0.5, 'use_idf': True, 'norm': 'l2'} | {'class_prior': None, 'fit_prior': True, 'alpha': 2}
SVM | {'ngram_range': (1, 2), 'max_df': 1.0, 'use_idf': True, 'norm': 'l2'} | {'kernel': 'rbf', 'C': 1.0, 'gamma': 1}
GradientBoostingClassifier | {'ngram_range': (1, 2), 'max_df': 1.0, 'use_idf': True, 'norm': 'l2'} | {'max_depth': 5, 'learning_rate': 0.1, 'min_samples_split': 200, 'n_estimators': 300, 'min_samples_leaf': 1}
ExtraTreesClassifier | {'ngram_range': (1, 1), 'max_df': 0.5, 'use_idf': True, 'norm': 'l2'} | {'n_estimators': 30, 'criterion': 'gini'}
SGD | {'ngram_range': (1, 1), 'max_df': 1.0, 'use_idf': True, 'norm': 'l2'} | {'alpha': 0.01, 'max_iter': 1}
RandomForestClassifier | {'ngram_range': (1, 1), 'max_df': 1.0, 'use_idf': True, 'norm': 'l2'} | {'min_samples_leaf': 8, 'n_estimators': 30, 'max_depth': 20}
AdaBoostClassifier | {'ngram_range': (1, 2), 'max_df': 0.75, 'use_idf': True, 'norm': 'l2'} | {'n_estimators': 30, 'learning_rate': 1.0}
Perceptron | {'ngram_range': (1, 1), 'max_df': 1.0, 'use_idf': False, 'norm': 'l2'} | {'penalty': 'l2', 'alpha': 0.01}

Contesting Strategy
Estimator | Parameters (TF-IDF features) | Parameters (classifier)
SGDClassifier | {'smooth_idf': False, 'max_features': 30000, 'norm': 'l2', 'max_df': 0.25, 'analyzer': 'word', 'sublinear_tf': False, 'ngram_range': (1, 1), 'use_idf': True} | {'max_iter': 20, 'alpha': 0.01}
GradientBoostingClassifier | {'smooth_idf': False, 'max_features': 10000, 'norm': 'l2', 'max_df': 0.5, 'analyzer': 'word', 'sublinear_tf': False, 'ngram_range': (1, 1), 'use_idf': True} | {'max_depth': 5, 'learning_rate': 0.1, 'min_samples_split': 200, 'n_estimators': 300, 'min_samples_leaf': 1}
MultinomialNB | {'max_features': 20000, 'smooth_idf': True, 'norm': 'l2', 'max_df': 0.5, 'analyzer': 'word', 'sublinear_tf': False, 'ngram_range': (1, 1), 'use_idf': True} | {'fit_prior': True, 'alpha': 2, 'class_prior': None}
RandomForestClassifier | {'smooth_idf': True, 'max_features': 20000, 'norm': 'l1', 'max_df': 0.25, 'analyzer': 'word', 'sublinear_tf': False, 'ngram_range': (1, 1), 'use_idf': False} | {'max_depth': 20, 'n_estimators': 30, 'min_samples_leaf': 2}
ExtraTreesClassifier | {'smooth_idf': True, 'max_features': 30000, 'norm': 'l2', 'max_df': 0.75, 'analyzer': 'word', 'sublinear_tf': False, 'ngram_range': (1, 1), 'use_idf': False} | {'criterion': 'gini', 'n_estimators': 30}
AdaBoostClassifier | {'smooth_idf': True, 'max_features': 10000, 'norm': 'l2', 'max_df': 0.75, 'analyzer': 'char', 'sublinear_tf': False, 'ngram_range': (1, 2), 'use_idf': True} | {'learning_rate': 0.3, 'n_estimators': 30}
SVM | {'smooth_idf': True, 'max_features': 10000, 'norm': 'l2', 'max_df': 0.75, 'analyzer': 'char', 'sublinear_tf': False, 'ngram_range': (1, 3), 'use_idf': True} | {'gamma': 1, 'max_iter': 100, 'probability': True, 'C': 0.1, 'kernel': 'linear'}
Perceptron | {'smooth_idf': True, 'max_features': 10000, 'norm': 'l2', 'max_df': 0.5, 'analyzer': 'char', 'sublinear_tf': True, 'ngram_range': (1, 2), 'use_idf': False} | {'penalty': 'l2', 'alpha': 0.1}

Empowering Strategy
Estimator | Parameters (TF-IDF features) | Parameters (classifier)
MultinomialNB | {'max_features': 20000, 'analyzer': 'word', 'smooth_idf': True, 'norm': 'l2', 'sublinear_tf': False, 'use_idf': True, 'max_df': 0.5, 'ngram_range': (1, 1)} | {'fit_prior': True, 'class_prior': None, 'alpha': 2}
AdaBoostClassifier | {'max_features': 30000, 'analyzer': 'word', 'norm': 'l1', 'smooth_idf': False, 'sublinear_tf': False, 'use_idf': True, 'max_df': 0.25, 'ngram_range': (1, 3)} | {'n_estimators': 30, 'learning_rate': 1.0}
GradientBoostingClassifier | {'max_features': 20000, 'analyzer': 'word', 'norm': 'l1', 'smooth_idf': True, 'sublinear_tf': False, 'use_idf': True, 'max_df': 0.75, 'ngram_range': (1, 3)} | {'n_estimators': 100, 'learning_rate': 0.3, 'max_depth': 10, 'min_samples_split': 1000, 'min_samples_leaf': 10}
ExtraTreesClassifier | {'max_features': 20000, 'analyzer': 'word', 'norm': 'l2', 'smooth_idf': False, 'sublinear_tf': True, 'use_idf': True, 'max_df': 0.25, 'ngram_range': (1, 1)} | {'n_estimators': 30, 'criterion': 'entropy'}
SVM | {'max_features': 10000, 'analyzer': 'word', 'norm': 'l2', 'smooth_idf': False, 'sublinear_tf': True, 'use_idf': True, 'max_df': 0.5, 'ngram_range': (1, 1)} | {'kernel': 'rbf', 'probability': True, 'gamma': 0.001, 'C': 1.0, 'max_iter': 100}
RandomForestClassifier | {'max_features': 30000, 'analyzer': 'char', 'norm': 'l1', 'smooth_idf': False, 'sublinear_tf': True, 'use_idf': False, 'max_df': 0.25, 'ngram_range': (1, 3)} | {'n_estimators': 10, 'min_samples_leaf': 1, 'max_depth': 20}
Perceptron | {'max_features': 20000, 'analyzer': 'word', 'norm': 'l2', 'smooth_idf': True, 'sublinear_tf': True, 'use_idf': False, 'max_df': 0.75, 'ngram_range': (1, 3)} | {'penalty': 'l2', 'alpha': 0.01}
SGDClassifier | {'max_features': 20000, 'analyzer': 'word', 'norm': 'l2', 'smooth_idf': True, 'sublinear_tf': False, 'use_idf': True, 'max_df': 0.5, 'ngram_range': (1, 1)} | {'max_iter': 10, 'alpha': 10.0}

Negative Affect Strategy
Estimator | Parameters (TF-IDF features) | Parameters (classifier)
GradientBoostingClassifier | {'max_features': 1000, 'norm': 'l1', 'smooth_idf': True, 'max_df': 0.25, 'sublinear_tf': True, 'analyzer': 'word', 'ngram_range': (1, 3), 'use_idf': False} | {'n_estimators': 300, 'min_samples_leaf': 1, 'max_depth': 10, 'learning_rate': 1.0}
MultinomialNB | {'norm': 'l2', 'max_features': 20000, 'smooth_idf': True, 'max_df': 0.5, 'sublinear_tf': False, 'use_idf': True, 'ngram_range': (1, 1), 'analyzer': 'word'} | {'fit_prior': True, 'alpha': 2, 'class_prior': None}
AdaBoostClassifier | {'max_features': 1000, 'norm': 'l2', 'smooth_idf': True, 'max_df': 0.25, 'sublinear_tf': True, 'analyzer': 'char', 'ngram_range': (1, 3), 'use_idf': True} | {'n_estimators': 30, 'learning_rate': 1.0}
RandomForestClassifier | {'max_features': 1000, 'norm': 'l1', 'smooth_idf': True, 'max_df': 0.75, 'sublinear_tf': True, 'analyzer': 'char', 'ngram_range': (1, 3), 'use_idf': True} | {'n_estimators': 10, 'min_samples_leaf': 1, 'max_depth': 20}
SVM | {'max_features': 2000, 'norm': 'l2', 'smooth_idf': True, 'max_df': 0.75, 'sublinear_tf': False, 'analyzer': 'char', 'ngram_range': (1, 3), 'use_idf': False} | {'max_iter': 100, 'gamma': 1, 'C': 0.1, 'kernel': 'linear', 'probability': True}
ExtraTreesClassifier | {'max_features': 1000, 'norm': 'l1', 'smooth_idf': True, 'max_df': 0.25, 'sublinear_tf': False, 'analyzer': 'word', 'ngram_range': (1, 3), 'use_idf': True} | {'criterion': 'entropy', 'n_estimators': 10}
SGDClassifier | {'max_features': 2000, 'norm': 'l2', 'smooth_idf': False, 'max_df': 0.5, 'sublinear_tf': False, 'analyzer': 'word', 'ngram_range': (1, 2), 'use_idf': True} | {'alpha': 10.0, 'max_iter': 1}
Perceptron | {'max_features': 2000, 'norm': 'l1', 'smooth_idf': False, 'max_df': 0.25, 'sublinear_tf': True, 'analyzer': 'char', 'ngram_range': (1, 3), 'use_idf': True} | {'alpha': 0.01, 'penalty': 'l2'}

Word2Vec Models

Any Resistance Strategy / Contesting Strategy
Estimator | Parameters (classifier), Any Resistance | Parameters (classifier), Contesting
SGDClassifier | {'alpha': 0.001, 'penalty': 'l2', 'loss': 'log', 'n_iter': 1} | {'penalty': 'l2', 'n_iter': 10, 'loss': 'log', 'alpha': 0.1}
SVM | {'kernel': 'rbf', 'C': 100.0, 'gamma': 0.0001, 'max_iter': 1000} | {'kernel': 'rbf', 'C': 1.0, 'max_iter': 1000, 'gamma': 0.1}
ExtraTreesClassifier | {'criterion': 'entropy', 'max_depth': 20, 'n_estimators': 1000} | {'criterion': 'entropy', 'max_depth': 10, 'n_estimators': 1000}
GradientBoostingClassifier | {'learning_rate': 0.1, 'min_samples_leaf': 10, 'min_samples_split': 200, 'max_depth': 10, 'n_estimators': 1000} | {'max_depth': 10, 'learning_rate': 3.0, 'min_samples_leaf': 1, 'n_estimators': 10, 'min_samples_split': 1000}
RandomForestClassifier | {'min_samples_leaf': 1, 'max_depth': 10, 'n_estimators': 1000} | {'max_depth': 3, 'n_estimators': 1000, 'min_samples_leaf': 2}
AdaBoostClassifier | {'learning_rate': 0.1, 'n_estimators': 1000} | {'learning_rate': 0.1, 'n_estimators': 300}
Perceptron | {'alpha': 0.001, 'penalty': 'l2', 'max_iter': 1000} | {'penalty': 'l2', 'max_iter': 100, 'alpha': 0.01}

Empowering Strategy / Negative Affect Strategy
Estimator | Parameters (classifier), Empowering | Parameters (classifier), Negative Affect
GradientBoostingClassifier | {'max_depth': 10, 'learning_rate': 0.1, 'min_samples_leaf': 10, 'n_estimators': 100, 'min_samples_split': 200} | {'max_depth': 5, 'learning_rate': 1.0, 'min_samples_leaf': 10, 'n_estimators': 10, 'min_samples_split': 200}
SVM | {'kernel': 'rbf', 'C': 1.0, 'max_iter': 100, 'gamma': 1.0} | {'kernel': 'rbf', 'C': 100.0, 'max_iter': 100, 'gamma': 1.0}
ExtraTreesClassifier | {'criterion': 'gini', 'max_depth': 20, 'n_estimators': 10} | {'criterion': 'gini', 'max_depth': 20, 'n_estimators': 100}
RandomForestClassifier | {'max_depth': 10, 'n_estimators': 10, 'min_samples_leaf': 2} | {'max_depth': 20, 'n_estimators': 10, 'min_samples_leaf': 2}
SGDClassifier | {'penalty': 'l2', 'n_iter': 20, 'loss': 'log', 'alpha': 0.0001} | {'penalty': 'l2', 'n_iter': 50, 'loss': 'log', 'alpha': 0.0001}
AdaBoostClassifier | {'learning_rate': 3.0, 'n_estimators': 10} | {'learning_rate': 1.0, 'n_estimators': 300}
Perceptron | {'penalty': 'elasticnet', 'max_iter': 50, 'alpha': 0.001} | {'penalty': 'l2', 'max_iter': 50, 'alpha': 0.01}

Table 9: Best performing TF-IDF and Word2Vec features and estimators for first- and second-level classification
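To make the parameter names in Table 9 concrete, the listing below is a minimal sketch of how the best-found TF-IDF configuration for the any-resistance MultinomialNB model could be assembled as a scikit-learn pipeline. It is an illustration rather than the study's actual training script; `comments` and `labels` are placeholder names for the annotated comment texts and their labels.

# Minimal sketch (illustrative): TF-IDF features combined with MultinomialNB,
# using the best-found hyperparameters for the "any resistance" classifier in Table 9.
# `comments` and `labels` are placeholders, not objects from the study's code.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.model_selection import cross_val_score

pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 1), max_df=0.5, use_idf=True, norm="l2")),
    ("clf", MultinomialNB(alpha=2, fit_prior=True, class_prior=None)),
])

# Evaluation with the F1 metric reported in the paper, for example:
# scores = cross_val_score(pipeline, comments, labels, cv=5, scoring="f1")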


Doc2Vec Models

Any Resistance Strategy / Contesting Strategy
Estimator | Parameters (classifier), Any Resistance | Parameters (classifier), Contesting
SVM | {'max_iter': 50, 'C': 1.0, 'gamma': 1.0, 'kernel': 'linear'} | {'kernel': 'rbf', 'gamma': 1.0, 'max_iter': 100, 'C': 10.0}
ExtraTreesClassifier | {'n_estimators': 1000, 'criterion': 'entropy', 'max_depth': 20} | {'max_depth': 20, 'n_estimators': 1000, 'criterion': 'gini'}
RandomForestClassifier | {'min_samples_leaf': 4, 'n_estimators': 1000, 'max_depth': 10} | {'max_depth': 20, 'n_estimators': 1000, 'min_samples_leaf': 2}
SGDClassifier | {'loss': 'log', 'n_iter': 10, 'alpha': 1.0, 'penalty': 'l2'} | {'loss': 'log', 'alpha': 1.0, 'n_iter': 10, 'penalty': 'l2'}
AdaBoostClassifier | {'n_estimators': 1000, 'learning_rate': 0.1} | {'learning_rate': 0.3, 'n_estimators': 30}
Perceptron | {'max_iter': 1, 'penalty': 'l2', 'alpha': 0.1} | {'alpha': 1.0, 'max_iter': 20, 'penalty': 'l2'}
GradientBoostingClassifier | {'min_samples_leaf': 1, 'n_estimators': 1000, 'min_samples_split': 200, 'learning_rate': 0.1, 'max_depth': 10} | {'max_depth': 10, 'learning_rate': 0.1, 'n_estimators': 10, 'min_samples_split': 1000, 'min_samples_leaf': 100}

Empowering Strategy / Negative Affect Strategy
Estimator | Parameters (classifier), Empowering | Parameters (classifier), Negative Affect
SVM | {'kernel': 'rbf', 'gamma': 1.0, 'max_iter': 100, 'C': 1000.0} | {'kernel': 'rbf', 'gamma': 0.1, 'max_iter': 20, 'C': 1.0}
GradientBoostingClassifier | {'max_depth': 5, 'learning_rate': 3.0, 'n_estimators': 1000, 'min_samples_split': 1000, 'min_samples_leaf': 1} | {'max_depth': 5, 'learning_rate': 3.0, 'n_estimators': 300, 'min_samples_split': 200, 'min_samples_leaf': 1}
ExtraTreesClassifier | {'max_depth': 20, 'n_estimators': 10, 'criterion': 'entropy'} | {'max_depth': 20, 'n_estimators': 10, 'criterion': 'entropy'}
RandomForestClassifier | {'max_depth': 20, 'n_estimators': 10, 'min_samples_leaf': 2} | {'max_depth': 20, 'n_estimators': 10, 'min_samples_leaf': 8}
SGDClassifier | {'loss': 'log', 'alpha': 100.0, 'n_iter': 50, 'penalty': 'l2'} | {'loss': 'log', 'alpha': 0.001, 'n_iter': 20, 'penalty': 'l2'}
AdaBoostClassifier | {'learning_rate': 3.0, 'n_estimators': 10} | {'learning_rate': 3.0, 'n_estimators': 1000}
Perceptron | {'alpha': 0.0001, 'max_iter': 50, 'penalty': 'l2'} | {'alpha': 0.01, 'max_iter': 10, 'penalty': 'l2'}

Sentiment Models

Any Resistance Strategy / Contesting Strategy
Estimator | Parameters (classifier), Any Resistance | Parameters (classifier), Contesting
SVM | {'max_iter': 100, 'C': 0.1, 'probability': True, 'gamma': 0.1, 'kernel': 'rbf'} | {'max_iter': 100, 'C': 0.1, 'probability': True, 'gamma': 0.1, 'kernel': 'rbf'}
GradientBoostingClassifier | {'max_depth': 10, 'n_estimators': 10, 'learning_rate': 3.0, 'min_samples_split': 1000, 'min_samples_leaf': 100} | {'max_depth': 5, 'n_estimators': 10, 'learning_rate': 0.1, 'min_samples_split': 1000, 'min_samples_leaf': 1}
RandomForestClassifier | {'max_depth': 3, 'n_estimators': 30, 'min_samples_leaf': 8} | {'max_depth': 3, 'n_estimators': 30, 'min_samples_leaf': 2}
AdaBoostClassifier | {'n_estimators': 10, 'learning_rate': 0.1} | {'n_estimators': 10, 'learning_rate': 0.1}
ExtraTreesClassifier | {'criterion': 'gini', 'n_estimators': 30} | {'criterion': 'gini', 'n_estimators': 30}
SGDClassifier | {'max_iter': 1, 'alpha': 0.1} | {'max_iter': 20, 'alpha': 0.01}
Perceptron | {'alpha': 0.01, 'penalty': 'l2'} | {'alpha': 1.0, 'penalty': 'l2'}

Empowering Strategy / Negative Affect Strategy
Estimator | Parameters (classifier), Empowering | Parameters (classifier), Negative Affect
SGDClassifier | {'max_iter': 1, 'alpha': 100.0} | {'max_iter': 10, 'alpha': 100.0}
GradientBoostingClassifier | {'max_depth': 5, 'n_estimators': 10, 'learning_rate': 3.0, 'min_samples_split': 1000, 'min_samples_leaf': 1} | {'max_depth': 5, 'n_estimators': 10, 'learning_rate': 3.0, 'min_samples_split': 1000, 'min_samples_leaf': 1}
SVM | {'max_iter': 100, 'C': 0.1, 'probability': True, 'gamma': 0.1, 'kernel': 'rbf'} | {'max_iter': 100, 'kernel': 'rbf', 'probability': True, 'gamma': 0.1, 'C': 0.1}
AdaBoostClassifier | {'n_estimators': 30, 'learning_rate': 1.0} | {'n_estimators': 30, 'learning_rate': 1.0}
RandomForestClassifier | {'max_depth': 20, 'n_estimators': 10, 'min_samples_leaf': 1} | {'max_depth': 20, 'n_estimators': 30, 'min_samples_leaf': 1}
ExtraTreesClassifier | {'criterion': 'entropy', 'n_estimators': 30} | {'criterion': 'gini', 'n_estimators': 30}
Perceptron | {'alpha': 0.01, 'penalty': 'l2'} | {'alpha': 0.1, 'penalty': 'l2'}

Table 10: Best performing Sentiment and Doc2Vec features and estimators for first- and second-level classification
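The Doc2Vec and sentiment models in Table 10 take precomputed document-level feature vectors as input, so only classifier hyperparameters are reported. As a hedged illustration of how parameter dictionaries of this form can be obtained, the sketch below tunes an SVM over an example grid with scikit-learn's GridSearchCV; the grid values, cross-validation setting, and the names `doc_vectors` and `labels` are assumptions made for illustration, not the study's exact setup.

# Illustrative sketch: hyperparameter search for an SVM on precomputed document vectors
# (e.g. Doc2Vec embeddings or sentiment features). The grid values are examples only;
# `doc_vectors` and `labels` are placeholder names.
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

param_grid = {
    "kernel": ["linear", "rbf"],
    "C": [0.1, 1.0, 10.0, 100.0],
    "gamma": [0.001, 0.01, 0.1, 1.0],
}

search = GridSearchCV(SVC(probability=True), param_grid, scoring="f1", cv=5, n_jobs=-1)
# search.fit(doc_vectors, labels)
# print(search.best_params_)  # yields dictionaries like those listed in Table 10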
