MSc ARTIFICIAL INTELLIGENCE
MASTER THESIS

Modeling the Language of Populist Rhetoric

by
Pere-Lluís HUGUET CABOT
12345466

March 1, 2021

48 ECTS, Nov 19 - Jun 20
Supervisor: Dr. Ekaterina SHUTOVA
Co-supervisor: Dr. David ABADI
Assessor: Dr. Giovanni COLAVIZZA

INSTITUTE FOR LOGIC, LANGUAGE AND COMPUTATION


UNIVERSITY OF AMSTERDAM

Abstract

Institute for Logic, Language and Computation

Master of Science

Modeling the Language of Populist Rhetoric by Pere-Lluís HUGUET CABOT

In recent years, populism has taken the spotlight with its growth and media presence across various countries worldwide. While socio-economic factors have been considered key in populist attitudes, lately the interaction between emotions and social identity has been scrutinized as crucial to explain populist attitudes and their rhetoric. At the same time, Natural Language Processing (NLP) has recently provided computational models that tackle more ambitious tasks, enabling the in-depth study of political discourse and populist rhetoric. In this thesis, we provide one of the first computational approaches to populist attitudes and political discourse through the use of deep learning architectures. We incorporate Multi-task Learning (MTL) with the use of auxiliary tasks that act symbiotically with political discourse. We create a new populism-centered dataset (PopulismVsReddit) that enables us to model social identity in social media comments (Reddit) and the influence of biased news. In our work, we observe that metaphors and emotions play an important role when addressing political discourse. Moreover, we found evidence that emotions interact with the attitude different social groups receive online, and we provide significant improvements in identifying out-group sentiments in Reddit comments. Overall, we highlight the importance of emotions in political discourse and the use of multi-task approaches that incorporate them to assess social identity and populist rhetoric.


Acknowledgements

First of all, I want to acknowledge and express my gratitude to both my supervisor Ekaterina Shutova and co-supervisor David Abadi for trusting in me and offering me a project that would spark my interests, academically and personally. Thank you for going the extra mile, especially as your help and commitment weren't affected by the circumstances around the COVID-19 pandemic.

Katia, thank you for the patience and dedication; you helped me to keep on track while encouraging my freedom and curiosity within the research conducted. In challenging times, as someone with many responsibilities, you have always given a human factor to your supervision, and I really appreciate that. Thank you for your commitment to teaching NLP; it is because of your enthusiasm that I am invested in this field.

David, I will remember our lengthy meetings fondly. You have provided me the crucial perspective from different fields (psychology and communication science), while being invested in learning about Natural Language Processing and Artificial Intelligence. Thank you for pushing the collaboration between our fields, which has made this thesis possible.

I want to particularly thank Verna Dankers, who laid the ground of what this thesis is about, first by helping to spark my interest in NLP as a TA, and afterwards with her invaluable feedback, ideas and collaboration.

I want to thank anyone else who has provided information, perspectives or ideas that have contributed in any way in this work, either consciously or not, through conversation, sharing a beer or an email. Thank you to the friends back home (Àdel, Gerard, Xavi, etc) for not holding a grudge for going abroad and being a remote moral support.

Finally, I must thank colleagues within our MSc program. Our course has been a great example on how collaboration can be a boost for anyone who takes part in it. And I enjoyed sharing ideas and discussions in our Slack channel.

Thank you Christina for making life easier and happier, those are perhaps the most important aspects for a successful thesis. Finally thank you to my mother, my father and my sister, because even if family isn’t chosen, I would choose them anyways if I had the chance and I feel immensely lucky for having their support.


Contents

1 Introduction 1

1.1 Motivation and Research Questions . . . 1

1.2 Methodology and Contributions . . . 2

1.3 Thesis Structure . . . 3

2 Related Work 5

2.1 Neural Network models in NLP . . . 5

2.2 Populism . . . 7

2.3 Political Bias . . . 8

2.4 Emotions . . . 9

2.5 Metaphors . . . 11

3 Modeling Metaphor and Emotion in Political Discourse 13

3.1 Related work . . . 14

3.2 Tasks and Datasets . . . 15

3.3 Methods . . . 15

3.4 Experiments and Results . . . 17

3.5 Discussion . . . 19

3.6 Conclusion . . . 21

4 PopulismVsReddit: A Dataset on Emotions, News Bias and Social Identity 23

4.1 Dataset Creation . . . 24

4.2 Data Analysis . . . 30

4.3 Conclusion . . . 38

5 Modeling the Out-Group through Multi-task Learning 39

5.1 Tasks . . . 40

5.2 Methods . . . 41

5.3 Experiments . . . 42

5.4 Results . . . 43

5.5 Discussion . . . 43

5.6 Conclusion . . . 49

6 Conclusion 55

6.1 Future Work . . . 56

6.2 Social Impact and Responsible Use . . . 56

6.3 Publications . . . 57

6.4 Funding statement . . . 57


A Extra Material 75

A.1 System . . . 75

A.2 Lists . . . 75

A.3 Additional Tables . . . 76


Chapter 1

Introduction

Populism has taken the spotlight in political communication in recent years. Various countries around the globe have experienced a surge of populist rhetoric (Inglehart and Norris, 2016) in both the public and political space. Populism, when understood as a communication strategy, employs political discourse as a channel through different types of media, such as news and social networks (Jagers and Walgrave, 2007). Across platforms, populist rhetoric revolves around social identity (Hogg, 2016; Abadi, 2017) and the Us vs. Them argumentation (Mudde, 2004). Social psychological and emotional perspectives have described populist communication strategies (Rico, Guinjoan, and Anduiza, 2017) and demonstrated through experimental research that their operationalization is successful in inducing emotions (Wirz et al., 2018). Moreover, emotions have been shown to be crucial in shaping public opinion (Demertzis, 2006; Marcus, 2002; Marcus, 2003). At the same time, metaphors also serve as a mechanism to influence public opinion within political discourse (Lakoff, 1991; Musolff, 2004).

Natural Language Processing (NLP) has a wide range of applications, aiming at understanding text and performing language processing tasks computationally, which often involve comprehension of complex language, including political discourse. Previous work has been successful in determining the political affiliation of politicians using parliamentary data (Iyyer et al., 2014) and the political bias of news sources (Li and Goldwasser, 2019; Kiesel et al., 2019). However, there is a lack of computational modeling approaches for populist rhetoric and populist attitudes, the closest approach being hate speech detection (Silva et al., 2016).

1.1 Motivation and Research Questions

This thesis has populism as its focus through the lens of Natural Language Processing (NLP). Because populism is an umbrella term usually interpreted from different perspectives, such as ideology, political discourse or rhetoric, exploring it leads to several questions regarding how to model populist rhetoric computationally.

Our main research question is:

1. How can we capture populist rhetoric using Deep Learning models within Natural Language Processing?

In previous research, emotions have been shown to be tied to populist rhetoric (Demertzis, 2006). Recent examples of using emotions in multi-task learning models have shown how they can improve performance on other tasks such as metaphor detection (Dankers et al., 2019). Certain metaphors (implicit vs. explicit) are used within populist rhetoric to evoke certain emotional reactions. Since emotions have a strong relation to populist attitudes, we aim to explore whether they offer any computational advantage in MTL setups with populist rhetoric, leading to the research question,

2. How do emotions interact with populist rhetoric and can they contribute to modeling it?

Metaphors are used as mechanisms to engage and convince (Flusberg, Matlock, and Thibodeau, 2018), and are ubiquitous within political discourse (Beigman Klebanov, Diermeier, and Beigman, 2008). We expect them to be beneficial in the context of populist rhetoric, and we also intend to explore the role of metaphor in multi-task learning setups.

3. How do metaphors interact with populist rhetoric and can they contribute to modeling it?

Populist rhetoric adapts according to the communication channel it deploys. Biased news sources and fake news are ways of circulating populist attitudes (Schulz, Wirth, and Müller, 2020). Social media interactions show both reactions to and the spread of populist rhetoric (Engesser et al., 2017; Mazzoleni and Bracciale, 2018). Political speeches include populist rhetoric and provide an opportunity to identify populist actors (Hawkins et al., 2019). Due to the lack of available data explicitly addressing populist rhetoric, we intend to explore the discourse using populist rhetoric, and to model political discourse in different media.

4. How to model political discourse or populist rhetoric used in different media?

Social media constitute a crucial platform for the spread of populist rhetoric (Postill, 2018; Alonso-Muñoz, 2019). Biased news sources have been shown to incite populist rhetoric and are often spread through social media, which act as their amplifier (Speed and Mannion, 2017). Reddit is an online social news platform that allows for such phenomena through threads started by posting a news article URL. We intend to explore how populist rhetoric is spread in social media as a reaction to biased news sources, in order to model such communication behavior.

1.2 Methodology and Contributions

New Transformer-based models, starting with BERT (Bidirectional Encoder Representations from Transformers) (Devlin et al., 2018), allow us to utilize large-scale language models, pretrained in an unsupervised fashion, and fine-tune them on new settings and downstream tasks. We base our models on RoBERTa (Liu et al., 2019b), fine-tuning its pre-trained base model for the different setups and tasks.

Multi-task Learning (MTL) (Caruana, 1993) provides a paradigm within Deep Learning for simultaneously training models on multiple tasks. MTL can provide a novel way to incorporate new information into modeling political discourse and populist rhetoric. Therefore, we utilize data from different sources and MTL approaches with emotions and metaphors as auxiliary tasks. We explore political bias in political and mainstream discourse (politicians vs. newspapers) as well as framing in news sources.

We also collect and annotate a new dataset of Reddit comments posted in response to shared news articles, to study the relation of populist rhetoric and social


identity. We call this new dataset PopulismVsReddit and employ it in an MTL setting, in order to exploit the interactions between populist rhetoric, social identity and emotions.

We present the first models to jointly learn political discourse related tasks with emotion or metaphor detection. Finally, we provide a model trained simultaneously on emotions and group identification, which recognizes the attitude towards a social group, providing a valuable tool to assess the Us vs. Them rhetoric in social media.

We find that MTL provides a successful setup to boost performance on political discourse tasks as well as populist attitude. Moreover, we show the interaction between the auxiliary tasks and the way information is shaped within the network, along with a qualitative analysis of the results. These insights demonstrate the importance of emotions in populist rhetoric. Furthermore, a data analysis of the annotation results reveals significant differences in the attitudes towards social groups online, as well as the influence of news source bias on shaping those attitudes.

1.3 Thesis Structure

Chapter 2. Provides an overview of background work, introducing work related to the main body of the thesis and useful context on the topics it revolves around.

Chapter 3. Focuses on political discourse and its use in different media as posed by question 4. We explore how MTL can aid in its comprehension by showing to what extent emotions and metaphors play an important role in tasks such as political bias, framing in news or political affiliation (questions 2 and 3).

Chapter 4. Discusses the creation of a new dataset through crowd-sourced annotation, motivated by the lack of available data on populist rhetoric for training Deep Learning models. We describe the data gathering process, the nature of the data and the design of the annotation task. We analyze the annotation results, which contain valuable information on how populist rhetoric spreads within social media. Our annotation procedure includes emotions closely related to populist attitudes. To the best of our knowledge, this dataset is the first of its kind. It allows us to answer question 1 and to further explore the relationship between populist rhetoric and emotions (question 2).

Chapter 5. We deploy the dataset created in the previous chapter to train several Neural Networks to model the out-group attitude and tackle question 1, and we explore the use of MTL setups using emotions (question 2) and group identification.


Chapter 2

Related Work

2.1 Neural Network models in NLP

The recent explosion of Deep Learning approaches to NLP, triggered by word embeddings and the Skipgram model (Mikolov et al., 2013), has brought a diverse spectrum of new tasks and models. Convolutional Neural Networks (CNNs) are ubiquitous within Computer Vision, being fairly lightweight thanks to their shared-weight architecture and translation invariant. As early as 2008, Collobert and Weston, 2008 used a CNN in NLP in a Multi-Task Neural Network that encoded sentences to predict six different tasks for a given word. CNNs in NLP are mostly applied in classification tasks because of their fixed-size output, but they have also been used in other applications, as discussed in Moreno Lopez and Kalita, 2017.
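The fixed-size output mentioned above comes from max-over-time pooling: a filter slides over the token sequence and only the maximum activation is kept, so sentences of any length map to the same number of features. A minimal pure-Python sketch, using toy 2-dimensional embeddings and a single hand-picked filter rather than a trained model:

```python
import math

def conv1d_max_pool(embeddings, kernel, bias=0.0):
    """Slide one convolutional filter over a token sequence and
    max-pool over time, yielding one feature regardless of length."""
    k = len(kernel)        # filter width in tokens
    dim = len(kernel[0])   # embedding dimension
    feats = []
    for start in range(len(embeddings) - k + 1):
        window = embeddings[start:start + k]
        act = sum(kernel[i][d] * window[i][d]
                  for i in range(k) for d in range(dim)) + bias
        feats.append(math.tanh(act))
    return max(feats)      # max-over-time pooling

# Toy 2-dimensional "embeddings" for sentences of different lengths.
short = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
long_ = short + [[0.5, 0.5], [1.0, 0.0]]
filt = [[1.0, 0.0], [0.0, 1.0]]   # width-2 filter

# Both sentences map to a single scalar feature: a fixed-size output.
f_short = conv1d_max_pool(short, filt)
f_long = conv1d_max_pool(long_, filt)
```

With one filter per feature, a bank of filters of different widths yields the fixed-size feature vector a classifier layer expects.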

For a long time, Recurrent Neural Networks (RNNs) were the most used networks in NLP. Their ability to capture long-term dependencies in text, a property that CNNs cannot achieve to the same degree, is key in several NLP tasks. Long Short-Term Memory (LSTM) cells (Hochreiter and Schmidhuber, 1997) and Gated Recurrent Units (GRU) (Cho et al., 2014) extend the idea behind RNNs to avoid common problems such as the vanishing gradient.

ELMo (Embeddings from Language Models) (Peters et al., 2018) made use of Bidirectional LSTMs trained on a large corpus to obtain context-aware pre-trained embeddings, which can be used in other tasks through Transfer Learning. However, RNNs by themselves may fall short of capturing very long dependencies, since all the information is "bottle-necked" in the hidden output of the previous time-step. This leads to a decline in performance for longer sentences.

Attention is a mechanism that computes a linear transformation of hidden representations, learned from previous states in the network, providing an explicit way to encode information from previous steps. It was first introduced in Bahdanau, Cho, and Bengio, 2015 for an encoder/decoder architecture. Attention allows the decoder to use a linear transformation of the encoder's hidden representations as context vectors, computed for each new word generated in the decoder, letting the decoder pay attention to different units from the encoder at each generation step. Since then, different versions of Attention have been implemented with significant success.
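The context-vector computation described above can be sketched numerically: score each encoder state against the current decoder query, softmax the scores into weights, and return the weighted sum of states. The sketch below uses a simple dot-product score; Bahdanau, Cho, and Bengio, 2015 actually use an additive, learned scoring function, for which the dot product stands in here:

```python
import math

def softmax(scores):
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention(query, keys, values):
    """Dot-product attention: score each encoder state against the
    decoder query, normalize with softmax, and return the weighted
    sum of values as the context vector."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    weights = softmax(scores)
    dim = len(values[0])
    context = [sum(w * v[d] for w, v in zip(weights, values))
               for d in range(dim)]
    return context, weights

# Three encoder hidden states; this query attends mostly to the second.
states = [[1.0, 0.0], [0.0, 4.0], [1.0, 1.0]]
query = [0.0, 1.0]
context, weights = attention(query, states, states)
```

The weights sum to one, so the context vector is an interpretable mixture of encoder states, recomputed at every decoding step.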

Yang et al., 2016 made use of Hierarchical Attention LSTMs to classify longer texts. In their work, words are first encoded using a Bidirectional LSTM (BiLSTM); a sentence representation is then learned through Attention over the words, and the sentence representations are in turn fed to another BiLSTM and encoded into a document representation by using Attention once again.


Vaswani et al., 2017 introduced this type of Attention module in a seq2seq model for translation. The model comprises a stack of multi-head self-attention mechanisms as an encoder, followed by the same number of decoder layers, allowing Attention over the output of the encoder. The model achieved state-of-the-art results and inspired a series of models called Transformers.

By training the multi-head self-attention encoder of the Transformer on a masked language modeling objective, the authors behind BERT (Devlin et al., 2018) translated the success of Vaswani et al., 2017 into word- and sentence-level embeddings, achieving state-of-the-art performance on several NLP tasks and providing a Transfer Learning model as ELMo did. The model was trained on a massive generic dataset but has since been applied to many other tasks through fine-tuning, without the need to train from scratch.
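The masked language modeling objective can be illustrated with a toy sketch: a fraction of tokens is hidden, and the model must recover them from the surrounding context. This simplified version performs only the masking step; real BERT additionally sometimes replaces a selected token with a random word or leaves it unchanged:

```python
import random

def mask_tokens(tokens, mask_prob=0.15, seed=1):
    """BERT-style masking (simplified): hide roughly mask_prob of the
    tokens and record their positions and original values as targets."""
    rng = random.Random(seed)
    masked, targets = [], {}
    for i, tok in enumerate(tokens):
        if rng.random() < mask_prob:
            targets[i] = tok          # the model must recover this token
            masked.append("[MASK]")
        else:
            masked.append(tok)
    return masked, targets

sentence = "the model learns to predict the hidden words".split()
masked, targets = mask_tokens(sentence)
```

The loss is then computed only at the masked positions, which is what lets the objective scale to unlabeled text.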

2.1.1 Multi-task Learning

Multi-task Learning, or MTL, is the area of ML where a model is trained jointly on more than one task to improve performance on one or more of those tasks. MTL is based on the principle that the information behind learning multiple tasks can benefit the understanding of those tasks more than if they had been learned independently from each other. The assumption is usually that related tasks can provide regularization and generalization. When we use MTL to improve performance on the main task by using related tasks, we refer to it as auxiliary learning, where the related tasks are considered auxiliary tasks. One of the first to point out the relevance of MTL within Machine Learning was Caruana, 1993. By examining the use of MTL in different tasks, he describes in which ways MTL may be helpful. Citing his article, "MTL can provide a data amplification effect (1), can allow tasks to eavesdrop on patterns discovered for other tasks (2), and can bias the network towards representations that might be overlooked if the tasks are learned separately (3)."

In Caruana, 1997, he sums these up as the auxiliary tasks providing an inductive bias that improves generalization, since the model will be biased to prefer features that are useful across multiple tasks. As he also points out, it is hard to know where and how MTL helps on the main task, even after its benefits are proven. Nevertheless, MTL is shown to be a useful approach, in particular for tasks where data may be scarce or where tasks are complex and prone to overfitting.

It is also worth mentioning the similarity between Transfer Learning and MTL. Transfer Learning learns a task whose information is then used on other ones, while MTL learns those tasks simultaneously. Their benefits lie in very similar principles of information sharing. The success of Transfer Learning in the recent NLP models discussed in the previous section is at the same time an argument in favor of MTL. Successful Transfer Learning models such as BERT (Devlin et al., 2018) use an MTL approach at pre-training to learn different levels of language tasks, which helps the model generalize better once it is fine-tuned on high-level tasks like Question Answering or lower-level ones like SST-2.

In Deep Learning, MTL models can have two types of sharing mechanisms between the tasks. In hard-parameter sharing, the tasks share the hidden network and only the last layers are task-specific, producing each task's output. This setup can help overcome overfitting issues, since the same hidden representations are shared through the network for all tasks. It assumes that tasks are similar enough that the shared hidden layers can encode meaningful information for all of them, benefiting from domain information shared between tasks. The task-specific layers, known as critical layers, must then perform the task-specific transformation. In some situations, the difference between the tasks may call for hierarchical sharing of the layers, where the first layers perform a lower-level shared task and some tasks have more depth in the network before their critical layer (Søgaard and Goldberg, 2016). This could be the case where one task is performed on the word level, and therefore has its critical layer earlier in the network, while the other is sentence classification, which needs to encode a sentence-level representation before performing any inference in its last layer.
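A hard-parameter sharing setup can be sketched with plain linear maps: one shared "encoder" matrix and one critical layer per task. The task names and weight values below are purely illustrative:

```python
def linear(W, x):
    """Multiply a weight matrix (list of rows) by a vector."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]

class HardSharedMTL:
    """Hard-parameter sharing: one shared encoder, one task-specific
    ('critical') layer per task. Every task's loss would update the
    shared weights, while each head is updated by its own task only."""
    def __init__(self):
        self.W_shared = [[0.5, -0.2], [0.1, 0.3]]    # shared hidden layer
        self.heads = {
            "political_bias": [[1.0, 0.0]],           # task-specific layers
            "emotion":        [[0.0, 1.0], [1.0, 1.0]],
        }
    def forward(self, x, task):
        h = linear(self.W_shared, x)   # same representation for all tasks
        return linear(self.heads[task], h)

model = HardSharedMTL()
x = [1.0, 2.0]
bias_out = model.forward(x, "political_bias")
emo_out = model.forward(x, "emotion")
```

Both forward passes compute the same shared representation h; only the final critical layer differs, which is exactly the regularization hard sharing relies on.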

In soft-parameter sharing, instead of sharing the network, the information is shared between tasks through sharing mechanisms at different points of the network. This can be through an exchange of information encoded in their hidden representations, or through a shared layer between the tasks. The difference from hard-parameter sharing is that each task still has its task-specific network, and the shared layer adds information from the other tasks. For instance, in Liu, Qiu, and Huang, 2016, a Coupled-Layer Architecture is used, with task-specific LSTMs that can share information between tasks.
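The soft-sharing idea can be illustrated with a cross-stitch unit in the style of Misra et al., 2016: each task keeps its own hidden state, and a small mixing matrix decides how much information flows between the two networks. The alpha values below are illustrative constants, not learned parameters:

```python
def cross_stitch(h_a, h_b, alpha):
    """Cross-stitch unit: each task's next hidden state is a linear
    blend of both tasks' current hidden states."""
    new_a = [alpha[0][0] * a + alpha[0][1] * b for a, b in zip(h_a, h_b)]
    new_b = [alpha[1][0] * a + alpha[1][1] * b for a, b in zip(h_a, h_b)]
    return new_a, new_b

h_bias = [1.0, 0.0]   # hidden state of the political-bias network
h_emo  = [0.0, 2.0]   # hidden state of the emotion network
# alpha near the identity -> nearly independent tasks; the off-diagonal
# entries let information leak between the two networks.
alpha = [[0.9, 0.1],
         [0.1, 0.9]]
h_bias2, h_emo2 = cross_stitch(h_bias, h_emo, alpha)
```

In a real cross-stitch network, alpha is trained along with the rest of the weights, so the model itself learns how much to share at each layer.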

Within NLP, Collobert and Weston, 2008 trained a model using hard-parameter MTL on six different tasks, leading to state-of-the-art results at the time. In Dankers et al., 2019, an MTL approach is used to model both emotions and metaphors at different levels; several MTL setups are used, from hard-parameter sharing to soft sharing using a Cross-Stitch Network (Misra et al., 2016). In Liu et al., 2019a, a hard-parameter sharing method across the GLUE (Wang et al., 2018) tasks achieved state-of-the-art results by training all tasks at the same time while fine-tuning BERT. MTL models can be hard to tune and train successfully; Chen et al., 2018 proposed the GradNorm algorithm, which automatically balances training in deep MTL models by dynamically tuning gradient magnitudes. This can be helpful in both soft- and hard-parameter sharing networks.

Here, we are interested in the interplay between populism, political bias, emotions and metaphors. Since the focus is populism and political bias, we consider metaphors and emotions as auxiliary tasks. This is known as Auxiliary Learning, where the main task, here political bias or populism, is improved by the use of the auxiliary tasks, metaphor and emotion detection.

2.2 Populism

Populism can be described from different perspectives. Essentially, it is described not as a fully developed political ideology, but as a series of background beliefs and techniques, traditionally centered around the Us vs. Them dichotomy. In one of the first attempts to fully define populism (Mudde, 2004), it is described as a thin ideology around the distinction between "the people", which includes the Us, and the elites, which include the Them, with politics being a tool for the people to achieve the common good or will.

Over time, these descriptions have evolved and the understanding of the Us vs. Them has adapted across countries and ideologies. But the general framework in which populist actors operate can be understood as a populist rhetoric: the use of language to elicit (emotional) responses and gain support. Previous work to outline populist rhetoric focused on a general description of populism to determine whether a certain text, such as a party manifesto or a political speech, contains what is understood as populist rhetoric or attitudes (Hawkins, 2009; Rooduijn and Pauwels, 2011; Manucci and Weber, 2017). Manual annotation was necessary to perform this analysis, often by experts, which also limited the scope and amount of data used. While this work has been crucial to describe and determine how politicians or the media have used populist rhetoric, the data gathered is too limited to train machine learning models to capture what populist rhetoric contains. Even further, the sole description of what constitutes populist rhetoric is still diffuse and covers many different aspects. Works like Hawkins et al., 2019 attempt to use holistic grading to assess whether a text is populist, in order to later determine the degree of populism of certain political leaders. This work resulted in the Global Populist Database, which contains political discourses from different countries labeled by their content of populist rhetoric. It is one of the most ambitious projects to systematically approach the use of populist rhetoric across the globe. However, the issue still lies in the nature of such a project: labeling the data requires a lot of time and expertise to assess extensive texts, resulting in a rather specific and multilingual dataset for a few populist actors, hence too limited for the purpose of training Deep Learning models.

2.2.1 The Us vs. Them rhetoric

Social identity explores the relations of individuals to social groups. Turner and Reynolds, 2010 study the evolution of research into social identity, explaining the Us vs. Them as an inter-group phenomenon: the self is hierarchically organized, and it is possible to shift from intra-group (we) to inter-group (us versus them) and vice versa.

Within the Us vs. Them concept, the inter-group has two levels: the in-group assimilation, where one identifies with a group through a shared experience or sense of belonging; and the interaction with the out-group, where outside groups are seen as antagonistic or contrary to the in-group by identifying them as threats. These aspects are mainly explored within social identity theory; here, however, we pragmatically refer to them as the Us vs. Them rhetoric. In our work we focus on the out-group interaction, assuming the in-group to be implicit by using data from online communities (i.e., sub-Reddits). While there is no explicit out-group sentiment identification work within the context of populism and Deep Learning, there is some closely related work on abusive language detection and hate speech (Silva et al., 2016).

2.3 Political Bias

Populism is not exclusive to a certain political perspective or side; there is right-wing as well as left-wing populism. However, the main aim of populist rhetoric is to shift public opinion towards a certain agenda (Alonso-Muñoz, 2019; Schroeder, 2019; Hameleers and Vliegenthart, 2020). The recent rise of biased media, through hyperpartisan news (i.e., exhibiting blind, prejudiced, or unreasoning allegiance to one party, faction, cause, or person), misinformation and fake news, is tied to populism. Moreover, the great majority of fake news are hyperpartisan (Potthast et al., 2018) and are tied to populist communication (Speed and Mannion, 2017).


That is why in this work we also explore the political bias behind textual content, such as news sources or political speeches. This is not a new task within NLP. In most cases, data for the task is obtained by Distant Supervision, where labels are obtained indirectly from the political bias of the authors or publishers rather than by labeling the textual data itself. Political speech can be assigned a certain bias through the party membership of the politician, and news media publishers often have an established political bias. AllSides (AllSides Media Bias Ratings) or Media Bias/Fact Check (Search and Learn the Bias of News Media) are common sources to learn about the bias of various news sources. The Convote dataset is one of the first examples (Thomas, Pang, and Lee, 2006) of established text datasets to include political bias. It consists of US congressional-speech data, where each speech contains a party spokesman's support of or opposition to a bill. Together with RNNs and word2vec embeddings, it was used in the work of Iyyer et al., 2014 to detect party membership based on political speeches. In the same paper, data from the Ideological Books Corpus (Sim et al., 2013) is used, which contains a collection of books and magazine articles by authors with a publicly known political ideology.
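The distant supervision described above reduces to propagating a publisher-level label onto every article from that publisher, skipping articles whose publisher has no known rating. A minimal sketch, in which the publisher names and leanings are placeholders rather than real ratings:

```python
# Known (or assumed) publisher leanings, in the style of AllSides-type
# ratings; the entries here are placeholders for the sketch.
PUBLISHER_BIAS = {
    "examplepost.com": "left",
    "examplewire.com": "center",
    "exampletimes.com": "right",
}

def distant_label(articles):
    """Distant supervision: label each article with its publisher's
    known bias instead of annotating the text itself."""
    return [(a["text"], PUBLISHER_BIAS[a["publisher"]])
            for a in articles if a["publisher"] in PUBLISHER_BIAS]

corpus = [
    {"publisher": "examplewire.com", "text": "Parliament passed the bill."},
    {"publisher": "unknownblog.net", "text": "You won't believe this!"},
]
labeled = distant_label(corpus)
```

The trade-off is scale versus noise: labels come for free at corpus size, but an individual article may not share its publisher's leaning.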

The same issue has been tackled for news media. Recently, SemEval19 Task 4 (Hyperpartisan News Detection) (Kiesel et al., 2019) revolved around the challenge of predicting the hyperpartisan argumentation of a given news article. The task involved a dataset containing 645 manually annotated articles, of which 238 were labeled as hyperpartisan, to create a classification model. The first team (Jiang et al., 2019) constructed a sentence representation as an average of pre-trained ELMo embeddings of its words and then predicted the label with a five-layer CNN with different filter sizes. The second team (Srivastava et al., 2019) used a set of handcrafted features, such as polarity and bias from lexicons and the use of superlatives and comparatives, as well as semantic features from word and sentence encoders: GloVe, Doc2Vec and the Universal Sentence Encoder (Pennington, Socher, and Manning, 2014; Le and Mikolov, 2014; Cer et al., 2018).

Li and Goldwasser, 2019 use Twitter social information encoded with a Graph Neural Network combined with a Hierarchical LSTM (HLSTM) to predict the bias of news articles, obtained by distant supervision from the publishers' known bias. To encode the Twitter social information, they used the known bias of public figures and of the users who shared each article to predict the bias of the article itself, which, combined with the embedding from the HLSTM, led to improved results.

2.4 Emotions

2.4.1 Emotions and Populism

Emotions constitute part of the populist rhetoric and have been essential for information processing as well as the formation of (public) opinion among citizens (Marcus, 2002; Götz et al., 2005; Demertzis, 2006). While social identity and economic factors have been considered the main indicators of populist parties' growth (Rooduijn and Burgoon, 2018), emotional factors have lately become a focus within empirical studies, in particular regarding the reactions to and spread of populist views.

The latest attempts to scrutinize populism from the social psychological and emotional perspective have described populist communication strategies (Abadi et al., 2016; Rico, Guinjoan, and Anduiza, 2017) and demonstrated through experimental research that their operationalization is successful in inducing emotions (Wirz et al., 2018). According to the concept of media populism (Krämer, 2014; Mazzoleni and Bracciale, 2018), media effects can further evoke hostility toward elites and (ethnic/religious) minorities, because they contribute to the construction of social identities, such as in-groups and out-groups (i.e., Us vs. Them). Emotions have been characterized by certain appraisal patterns, e.g., a negative event for which one blames another is felt as anger. A pattern of appraisals is referred to as a Core Relational Theme (Lazarus, 2001): the central (therefore core) harm or benefit that underlies each of the negative and positive emotions (Smith and Lazarus, 1993).

It is also important to distinguish between different types of emotions through these Core Relational Themes, particularly in their relation to populism. For instance, contempt is more strongly associated with illegal and violent actions, while anger is present in legal protests (Tausch et al., 2011). Furthermore, different emotions may play different roles within left- or right-wing populism, as proposed by Salmela and Scheve (2017), and negative emotions appear to play a bigger role in right-wing populism (Nguyen, 2019).

2.4.2 Emotion Classification

Emotion classification is closely related to one of the most prominent fields within NLP, sentiment analysis. While sentiment analysis focuses on assessing whether a piece of text is positive or negative, emotion classification focuses on classifying a text based on the emotions it contains. Emotions have been described with different scales, the most common being Ekman's Basic Emotions (Ekman, 1992), which encompasses six categories: Anger, Disgust, Fear, Happiness, Sadness and Surprise.

Other scales map emotions to a three-dimensional space, such as the Valence-Arousal-Dominance (VAD) model (Russell and Mehrabian, 1977). The VAD approach states that all emotions can be mapped to a point in a three-dimensional space composed of three independent dimensions: Valence, which encompasses positive or negative sentiment; Arousal, which shows the degree of engagement; and Dominance, which indicates the control or dominance over emotions.

These models have been translated into different datasets in order to assess emotions in text. The ISEAR dataset (Scherer and Wallbott, 1994) used Ekman's model, with a single discrete label assigned to each text sample (obtained from cross-cultural studies in 37 countries). Each emotion is roughly equally represented, with 1,093–1,196 samples each.

EmoBank (Buechel and Hahn, 2017) is a recent dataset that includes more than ten thousand sentences manually annotated on each dimension of the VAD model. It contains text from a variety of sources, such as headlines, blogs and books.

One of the most common approaches to identifying emotions in text is, similarly to sentiment analysis, to use a lexicon. There are various General-purpose Emotion Lexicons (GPELs), as explored in Bandhakavi et al. (2017). For example, EmoLex (Mohammad and Turney, 2013) has achieved a certain degree of success in identifying emotions within textual data with the use of a lexicon. Similarly, software products such as LIWC (Linguistic Inquiry and Word Count) (Tausczik and Pennebaker, 2010) use word frequency and a proprietary dictionary to categorize text, including its emotional content.

The SemEval (Semantic Evaluation) workshop has hosted emotion-related tasks, such as SemEval-2007 Task 14: Affective Text (Strapparava and Mihalcea, 2007), which revolved around the classification of emotions in news headlines using Ekman's categories, where the best system improved over Naive Bayes and Support Vector Machine baselines with an F1 of 33.22%. SemEval-2018 Task 1: Affect in Tweets (Mohammad et al., 2018) included another emotion-related task, in which a dataset of tweets was labeled for eleven emotions through crowdsourced annotation. Each tweet could be annotated with more than one emotion, including intensity and valence, therefore comprising more than one task within the same dataset. It included Arabic, English and Spanish tweets, with the English portion containing 10,097 tweets.

Recently, Zhang et al. (2018) approached emotion classification using a Multi-task Convolutional Network, introducing the task of Emotion Distribution Learning, which consists of mapping each sentence to an emotion vector where each dimension represents the intensity of one emotion. This approach achieved state-of-the-art results on some of the previously mentioned datasets, such as EmoBank and ISEAR.

2.5 Metaphors

2.5.1 Metaphors and Political Language

Metaphors are often considered prominent in the political domain (Beigman Klebanov, Diermeier, and Beigman, 2008) and are known to reflect or reinforce a particular viewpoint. For instance, war metaphors are commonly used in political language, particularly in populist rhetoric. Flusberg, Matlock, and Thibodeau (2018) discuss the use of such metaphors with historical examples, such as the "War on Drugs" declared by former US president Ronald Reagan. Moreover, the "War on Christmas" is mentioned as an alleged assault on public displays of Christianity by the political Left, decried by right-wing pundits. This exemplifies how out-group threat is commonly viewed through the Us vs. Them dichotomy and to what extent populist rhetoric is connected to social identity. Multivocal appeals through the use of metaphors reinforce these perceptions, as they can also serve as in-group signals (Albertson, 2014).

"Dog Whistle Politics" refers to the use of coded or hidden-meaning language aimed at a particular audience to stir support. During his presidency, Ronald Reagan used the cryptic term "Welfare Queens", considered a dog whistle to middle-class white Americans, to gain support and antagonize minorities (Lopez, 2013). Dog whistling has lately gained traction on social media, where such coded messages are used to appeal and signal to a certain group, such as the Alt-Right. These dog whistles can be understood as metaphorical language by the in-group. Therefore, we hypothesize that tasks like hyperpartisan news detection could benefit from being combined with metaphor detection in an MTL setup.

2.5.2 Computational Modeling of Metaphors

Metaphors can be interpreted at different levels of language. Whether a word has a metaphorical meaning and whether a sentence is metaphorical are different tasks, and various computational methods have used specific approaches according to the task. At the same time, identifying when a word or a sentence has a metaphorical meaning is a distinct task from assessing the intended meaning behind the metaphor. The latter is similar to word-sense disambiguation, where the system has to identify which sense of a word is used, and becomes a much more complex task. In this thesis, we focus on metaphor identification as an auxiliary task at the word level. However, all metaphor-related tasks have the relevance of context in common, which is needed to assess the metaphoricity (i.e., the quality of being metaphorical) of language even when it occurs at the word level.

Early approaches used hand-engineered features derived from annotated texts, exploiting patterns or properties for which metaphors behave differently, and then trained machine learning models on those features. Mohler et al. (2013) used semantic signatures based on domain concepts extracted from Wikipedia and then trained a Random Forest classifier. Several computational approaches have been successful in capturing metaphors in different domains, as discussed in Shutova (2015) and Veale, Shutova, and Klebanov (2016).

Deep learning systems improved upon handcrafted features and form the basis of most recent computational metaphor detection approaches. Rei et al. (2017) presented the first deep learning architecture designed to capture metaphorical composition. Their Supervised Similarity Network was inspired by Shutova, Kiela, and Maillard (2016): learned word embeddings are compared using cosine similarity, with a gating mechanism that modulates each representation by incorporating the one from the preceding word.

Gao et al. (2018) made use of bidirectional LSTMs to capture context and predict metaphors at the word and sentence levels. In recent work by Dankers et al. (2019), a multi-task neural network achieved state-of-the-art results on metaphor classification by using emotions as an auxiliary task.


Chapter 3

Modeling Metaphor and Emotion in Political Discourse

While the core definition of populism is still debated, Mudde (2004) provides a formal one. In his discussion on populism, he poses the question: is populism an ideology, a syndrome, a political movement or a political style? Moreover, he defines populism as an ideology that considers society to be ultimately separated into two homogeneous and antagonistic groups, the pure people versus the corrupt elite, and which argues that politics should be an expression of the volonté générale (general will) of the people. In that definition, it is clear that populism has a rhetorical component, which uses political discourse as a mechanism (Jagers and Walgrave, 2007). Therefore, in this chapter we focus on different aspects and channels of political discourse where populist rhetoric takes place, and consider several prominent tasks therein.

The role of metaphor and emotion in political discourse has been investigated in fields such as communication studies (Weeks, 2015; Mourão and Robertson, 2019), political science (Charteris-Black, 2009; Ferrari, 2007) and psychology (Bougher, 2012; Edwards, 1999). Political rhetoric may rely on metaphorical framing to shape public opinion (Lakoff, 1991; Musolff, 2004). Framing selectively emphasizes certain aspects of an issue that promote a particular perspective (Entman, 1993). For instance, government spending on the wealthy can be portrayed as a partnership or bailout, spending on the middle class as simply spending or a stimulus to the economy, and spending on the poor as a giveaway or a moral duty, the former corresponding to the conservative and the latter to the liberal point of view (Peters, 1988). Metaphor is an apt framing device, with different metaphors used across communities with distinct political views (Kövecses, 2002; Lakoff and Wehling, 2012). At the same time, metaphorical language has been shown to express and elicit stronger emotion than literal language (Citron and Goldberg, 2014; Mohammad, Shutova, and Turney, 2016) and to provoke emotional responses in the context of political discourse covered by mainstream newspapers (Figar, 2014). For instance, the phrase "immigrants are strangling the welfare system" aims to promote fear of immigration. On the other hand, the experienced emotions may influence the effects of news framing on public opinion (Lecheler, Bos, and Vliegenthart, 2015), and individual variations in emotion regulation styles can predict different political orientations and support for conservative policies (Lee Cunningham, Sohn, and Fowler, 2013). Metaphor and emotion thus represent crucial tools in political communication.

At the same time, computational modeling of political discourse and its specific aspects, such as political bias in news sources (Kiesel et al., 2019), framing of societal issues (Card et al., 2015), or prediction of political affiliation from text (Iyyer et al., 2014), has been gaining momentum in NLP. However, none of this research has incorporated the notions of metaphor and emotion in modeling political rhetoric.

In this chapter, we present the first joint models of metaphor, emotion and political rhetoric within a multi-task learning (MTL) framework. We make use of auxiliary learning, i.e. training a model on more than one task to improve the performance on a main task. We experiment with three tasks from the political realm, predicting (1) the political perspective of a news article; (2) the party affiliation of politicians from their social media posts; and (3) the framing dimensions of policy issues. We use metaphor and emotion detection as auxiliary tasks and investigate whether incorporating metaphor- or emotion-related features enhances the models of political discourse. Our results show that incorporating metaphor or emotion significantly improves performance across all tasks, emphasizing the prominent role they play in political rhetoric.

3.1 Related work

Modeling political discourse encompasses a broad spectrum of tasks, including estimating policy positions from political texts (Thomas, Pang, and Lee, 2006; Lowe et al., 2011), identifying features that differentiate the political rhetoric of opposing parties (Monroe, Colaresi, and Quinn, 2008) or predicting the political affiliation of Twitter users (Conover et al., 2011; Pennacchiotti and Popescu, 2011; Preoţiuc-Pietro et al., 2017; Rajamohan, Romanella, and Ramesh, 2019). Deep neural networks have been widely used to model political perspective, bias or affiliation at the document level: Iyyer et al. (2014) used a Recurrent Neural Network (RNN) to predict political affiliation from US congressional speeches. Li and Goldwasser (2019) identified the political perspective of news articles using a hierarchical Long Short-Term Memory (LSTM) network and social media user data modeled with Graph Convolutional Networks (GCN). Lastly, a recent shared task presented a multitude of deep learning methods to detect political bias in articles (Kiesel et al., 2019). Framing in political discourse is a relatively unexplored task. Hartmann et al. (2019) classified frames at the sentence level using bidirectional LSTMs and GRUs. Ji and Smith (2017) trained Tree-RNNs to classify the framing of policy issues in news articles.

Approaches predicting emotions for a given text typically adopt a categorical model of discrete, prototypical emotions, e.g. the six basic emotions of Ekman (1992). Early computational approaches employed vector space models (Danisman and Alpkocak, 2008) or shallow machine learning classifiers (Alm, Roth, and Sproat, 2005; Yang, Lin, and Chen, 2007). Examples of deep neural methods are the recurrent model of Abdul-Mageed and Ungar (2017), who classified 24 fine-grained emotions, and the transformer-based SentiBERT architecture of Yin, Meng, and Chang (2020).

Computational research on metaphor has mainly focused on detecting metaphorical language in text. Early research performed supervised classification with hand-engineered lexical, syntactic and psycholinguistic features (Tsvetkov et al., 2014; Beigman Klebanov et al., 2016; Turney et al., 2011; Strzalkowski et al., 2013; Bulat, Clark, and Shutova, 2017). Alternative approaches perform metaphor detection from distributional properties of words (Shutova, Sun, and Korhonen, 2010; Gutiérrez et al., 2016) or by training deep neural models (Rei et al., 2017; Gao et al., 2018). Dankers et al. (2019) developed a joint model of metaphor and emotion by fine-tuning BERT in an MTL setting.


3.2 Tasks and Datasets

Political Perspective in News Political news can be biased towards the left or right wing of the political spectrum. To model such biased perspectives computationally, we classify articles as left, right or center using data from Li and Goldwasser (2019).1 The articles are from the website AllSides2 and are annotated based on their source's bias. The training and test sets contain 2008 and 5761 articles, respectively. We use 30% of the training data for validation. The splits are stratified based on the bias; however, they do not take into account the news source. This may lead to articles of the same source appearing in different sets, which can cause some degree of data contamination, since the labels are based on sources.

Political Affiliation For this task, we use the dataset of Voigt et al. (2018)3, which was created to explore gender bias in online communication. The data comes from different sources: Facebook posts by politicians and public figures, responses to TED speakers, responses to Fitocracy fitness posts, and Reddit comments from selected subreddits, the latter being user-created areas of interest.

In our case, we use the portion of the dataset that contains public Facebook posts from 412 US politicians. The training, validation and test sets contain 9792, 2356 and 2458 posts, respectively. The task is to predict republican or democrat for posts of unseen politicians. The classes are balanced in each set. We split the data such that no set includes posts by politicians present in the other sets.

Framing The Media Frames Corpus4 (Card et al., 2015) contains news articles discussing five policy issues: tobacco, immigration, same-sex marriage, gun control and the death penalty. There are 15 possible framing dimensions, e.g. economic, political, etc. (see Appendix A, A.2). We use the article-level annotation to predict the framing dimension. Out of 23,580 articles, we use 15% as the test set and 15% of the training data for validation. The use of framing is based on the definition provided by Entman (1993): articles are labeled according to how they frame a topic.

Metaphor For metaphor detection we use the VU Amsterdam dataset (Steen et al., 2010), a subset of the British National Corpus (Leech, 1992). The dataset contains 9,017 sentences with binary labels (literal or metaphorical) per word. We use the data split of Gao et al. (2018), which includes 25% of the sentences in the test set.

Emotion For emotion classification, we use the dataset from SemEval-2018 Task 1 (Mohammad et al., 2018), in which tweets were labeled for eleven emotion classes or as neutral (see Appendix A, A.2). We use the English portion of the dataset (10,097 tweets) and the shared task splits.

3.3 Methods

We employ the Robustly Optimized BERT Pretraining Approach (RoBERTa-base) presented by Liu et al. (2019b), through the library provided by Wolf et al. (2019)5.

1 https://github.com/BillMcGrady/NewsBiasPrediction
2 https://www.allsides.com/unbiased-balanced-news
3 https://nlp.stanford.edu/robvoigt/rtgender/
4 https://github.com/dallascard/media_frames_corpus
5 https://huggingface.co/transformers


Framing (articles per frame):
  Legality, constitutionality and jurisprudence   4920
  Political                                       4762
  Crime and punishment                            2187
  Policy prescription and evaluation              2116
  Cultural identity                               1633
  Economic                                        1536
  Health and safety                               1380
  Public opinion                                  1122
  Quality of life                                 1084
  Fairness and equality                            849
  Morality                                         811
  Security and defense                             609
  External regulation and reputation               312
  Capacity and resources                           249
  Other                                             10

Political Perspective in News (articles per class):
  Left    3931
  Center  4164
  Right   2290

VUA Metaphors: 200K words, of which 11.6% are metaphorical.

SemEval Emotions (10,097 tweets; per-label proportions):
  Anger 36.1%, Anticipation 13.9%, Disgust 36.6%, Fear 16.8%,
  Joy 39.3%, Love 12.3%, Optimism 31.3%, Pessimism 11.6%,
  Sadness 29.4%, Surprise 5.2%, Trust 5%, Neutral 2.7%

Political Affiliation (posts per party):
  Republican Party  8132
  Democratic Party  8132

TABLE 3.1: Dataset contents.

RoBERTa consists of twelve stacked transformer layers and assumes the input sequence to be tokenized into subword units called Byte-Pair Encodings (BPE). A special <s> token is inserted at the beginning of the input sequence to compute a contextualized sequence representation.

Our tasks are defined at three levels of the linguistic hierarchy. The auxiliary tasks of metaphor detection and emotion prediction are defined at the word and sentence level, respectively, while the main political tasks are defined at the document level. For word-level metaphor identification, the subword encodings from RoBERTa's last layer are processed by a linear classification layer. A word is considered metaphorical provided that any of its BPEs is labeled as metaphorical. We assume that BPEs from inflections are unlikely to cause a word to be incorrectly labeled as metaphorical. Figure 3.1 visualizes metaphor detection on its right side.
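The any-BPE-metaphorical aggregation rule can be sketched as follows (a minimal illustration in plain Python; in practice the logits come from the linear layer over RoBERTa's subword encodings and the word-index mapping from the tokenizer, so the names and values here are assumptions):

```python
def word_level_predictions(bpe_logits, word_ids):
    """Aggregate per-BPE binary metaphor logits to word-level labels.

    A word is labeled metaphorical if ANY of its BPEs is predicted
    metaphorical (logit > 0, i.e. sigmoid probability > 0.5).
    word_ids maps each BPE position to a word index, with None for
    special tokens such as <s>.
    """
    n_words = max(i for i in word_ids if i is not None) + 1
    labels = [0] * n_words
    for logit, wid in zip(bpe_logits, word_ids):
        if wid is not None and logit > 0:
            labels[wid] = 1
    return labels

# "immigrants are strangling the welfare system", where "strangling"
# is split into two BPEs (positions 3 and 4); the logits are made up.
logits = [-9.0, -2.0, -1.0, 3.5, -4.0, -1.5, -2.0, -0.5, -9.0]
word_ids = [None, 0, 1, 2, 2, 3, 4, 5, None]
print(word_level_predictions(logits, word_ids))  # [0, 0, 1, 0, 0, 0]
```

Note that a single metaphorically-predicted BPE ("strangling"'s first piece) suffices to mark the whole word as metaphorical.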

For the sentence-level emotion prediction task and the document-level tasks of political affiliation and framing, the <s> encoding serves as the sequence representation and is fed to a linear classification layer. For political perspective in news, the average document length exceeds the maximum input size of RoBERTa. Therefore, we split these documents into sentences and collect them into a maximum of 5 subdocuments with up to 256 subwords each. After applying RoBERTa to the subdocuments, their <s> encodings are fed to an attention layer yielding a document representation to be classified. Figure 3.1 depicts the classification of sentences or short documents on the right, and the processing of longer documents on the left.
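The attention layer over subdocument encodings can be sketched as follows (an illustrative PyTorch module, not the thesis code; the single-query additive scoring and the masking scheme are assumptions):

```python
import torch
import torch.nn as nn

class SubdocAttentionPool(nn.Module):
    """Attention over subdocument <s> encodings (illustrative sketch).

    A long article is split into up to 5 subdocuments; RoBERTa yields
    one <s> encoding per subdocument. A learned scorer weighs each
    encoding and the document representation is their weighted sum.
    """
    def __init__(self, hidden=768):
        super().__init__()
        self.score = nn.Linear(hidden, 1)  # attention scorer

    def forward(self, s_encodings, mask):
        # s_encodings: (batch, n_subdocs, hidden)
        # mask: (batch, n_subdocs), 1 for real subdocs, 0 for padding
        scores = self.score(s_encodings).squeeze(-1)           # (B, N)
        scores = scores.masked_fill(mask == 0, float("-inf"))  # ignore padding
        weights = torch.softmax(scores, dim=-1)                # (B, N)
        return torch.einsum("bn,bnh->bh", weights, s_encodings)

# Toy usage with random tensors standing in for RoBERTa outputs:
pool = SubdocAttentionPool(hidden=8)
enc = torch.randn(2, 5, 8)
mask = torch.tensor([[1, 1, 1, 0, 0], [1, 1, 1, 1, 1]])
doc_repr = pool(enc, mask)
print(doc_repr.shape)  # torch.Size([2, 8])
```

The resulting document representation is then passed to the task's linear classification layer, exactly as the <s> encoding is for short inputs.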

All task models use the cross-entropy loss with a sigmoid activation function. For political perspective detection, the loss function includes class weights to account for class imbalance.
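One common way to set such class weights is inverse frequency (an illustrative assumption; the thesis only states that class weights are used, not the exact formula). The counts below are the perspective-class counts from Table 3.1:

```python
import torch
import torch.nn as nn

# Weights inversely proportional to class frequency, so the rarer
# Right class contributes more per example to the loss.
counts = torch.tensor([3931.0, 4164.0, 2290.0])   # left, center, right
weights = counts.sum() / (len(counts) * counts)   # rarer class -> larger weight

loss_fn = nn.CrossEntropyLoss(weight=weights)
logits = torch.randn(4, 3)                        # (batch, n_classes)
labels = torch.tensor([0, 2, 1, 2])               # gold perspective labels
loss = loss_fn(logits, labels)
```

With these counts, the Right class receives the largest weight, counteracting its under-representation in training.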


3.3.1 Multi-task Learning

The MTL architecture uses hard parameter sharing for the first eleven transformer layers. The last layer of RoBERTa, as well as the classification and attention layers, are task-specific to allow for specialization, similar to the approach of Dankers et al. (2019).
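This sharing scheme can be sketched as follows (an illustrative module using small stand-in `nn.TransformerEncoderLayer` blocks instead of the actual RoBERTa layers; all names and sizes are assumptions, not the thesis code):

```python
import copy
import torch
import torch.nn as nn

class HardSharingMTL(nn.Module):
    """Hard parameter sharing sketch: eleven shared encoder layers,
    plus one task-specific encoder layer and classification head
    per task."""
    def __init__(self, tasks, hidden=64, n_shared=11):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=hidden, nhead=4,
                                           batch_first=True)
        self.shared = nn.TransformerEncoder(layer, num_layers=n_shared)
        self.task_layer = nn.ModuleDict(
            {t: copy.deepcopy(layer) for t in tasks})
        self.head = nn.ModuleDict(
            {t: nn.Linear(hidden, n) for t, n in tasks.items()})

    def forward(self, x, task):
        h = self.shared(x)               # parameters shared by all tasks
        h = self.task_layer[task](h)     # task-specific last layer
        return self.head[task](h[:, 0])  # classify the <s> position

model = HardSharingMTL({"perspective": 3, "metaphor": 2})
x = torch.randn(2, 10, 64)               # (batch, seq_len, hidden)
print(model(x, "perspective").shape)     # torch.Size([2, 3])
```

Gradients from every task update the shared stack, while each task keeps its own specialized top layer and head.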

The main political tasks are paired with the metaphor and emotion tasks one by one. The task losses are weighted with α for the main task and 1−α for the auxiliary task. We include an auxiliary warm-up period, during which α = 0.01, for some tasks. This allows the model to initially learn the (lower-level) auxiliary task while focusing mostly on the main task afterwards. This approach is similar to Kiperwasser and Ballesteros (2018).
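The weighted objective with its auxiliary warm-up can be written compactly (the function name and signature are illustrative, not from the thesis code):

```python
def mtl_loss(main_loss, aux_loss, epoch, alpha=0.7, warmup_epochs=5):
    """Weighted MTL objective with an auxiliary warm-up period.

    During warm-up the main-task weight is fixed at 0.01, so the
    model mostly learns the (lower-level) auxiliary task; afterwards
    the main task dominates with the tuned alpha.
    """
    a = 0.01 if epoch < warmup_epochs else alpha
    return a * main_loss + (1 - a) * aux_loss

print(mtl_loss(1.0, 1.0, epoch=2))   # 1.0 (the weights sum to one)
print(mtl_loss(2.0, 0.0, epoch=2))   # 0.02: warm-up downweights the main task
print(mtl_loss(2.0, 0.0, epoch=10))  # 1.4: after warm-up, alpha = 0.7
```

For example, for political affiliation prediction the values reported below correspond to alpha = 0.9 with 5 warm-up epochs.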

3.4 Experiments and Results

3.4.1 Experimental Setup

For α, values were tried at intervals of 0.1. To set the warm-up period for scheduled learning, 3, 4 or 5 epochs were tried. For the political affiliation task, dropout probabilities of 0.1, 0.2 and 0.3 were tested. The hyperparameters were chosen by manual tuning based on the accuracy score on the validation sets.

Hyperparameters that were shared between MTL and STL for the same main task were selected based on the performance on STL.

The models are trained with the AdamW optimizer, a learning rate of 1e−5 and a batch size of 32. The learning rate is annealed through a cosine-based schedule, with warm-up ratios of 0.2, 0.3 and 0.15 for the political perspective in news, the political affiliation and the framing tasks, respectively. Dropout is applied per layer with a probability of 0.3 for political affiliation and 0.1 otherwise.

FIGURE 3.1: Schematics of the MTL model. The left side shows the path for longer documents from the Political Perspective in News dataset, while the right side is the path for the rest of the datasets and tasks.

                                          Framing   Affiliation   Perspective
Li and Goldwasser (2019)
  - HLSTM (text-based)                       -           -            .746
  - GCN-HLSTM (using social information)     -           -            .917
STL                                        .707        .794          .848
MTL, Metaphor                              .716        .805          .854
MTL, Emotion                               .708        .802          .860

TABLE 3.2: Accuracy scores for the main political tasks. Significance compared to STL is bolded (p < 0.05).

                 Perspective   Affiliation   Framing
STL                 .832          .804        .699
MTL, Metaphor       .835          .804        .703
MTL, Emotion        .838          .811        .704

TABLE 3.3: Accuracy validation scores for the main political tasks.

The auxiliary warm-up period and α values are estimated per main task, for metaphor (αM) and emotion (αE) separately. For political perspective in news, αM = 0.7, αE = 0.8, and models were trained for 20 epochs with early stopping. For political affiliation prediction, αM = αE = 0.9 and the first 5 epochs are used for auxiliary warm-up; the models were trained for 20 epochs in total. For the framing task, αM = αE = 0.5, with 5 epochs of auxiliary warm-up for metaphor. Training lasted at most 10 epochs, with early stopping.

We average results over 10 random seeds. We perform significance testing using an approximate permutation test with 10,000 permutations. Our work used PyTorch and the Hugging Face library to load the pretrained models and train the MTL setup. Some training code was adapted from the utils_nlp6 library. Data splits and code are attached with the submission.

3.4.2 Results

Table 3.2 summarizes our results. For political perspective in news, the STL model improves over the text-based method of Li and Goldwasser (2019). This illustrates that RoBERTa provides an enhanced document encoding for predicting political perspective. Both MTL setups significantly improved over the STL model. Joint learning with emotion proved most beneficial and significantly outperformed the metaphor detection setup. While neither outperformed the GCN-HLSTM model, that model used social information, which by itself already outperformed the text-only models in the original work, and it requires such information to exist for the articles. For political affiliation prediction, both MTL setups improve over STL significantly, although there is no significant difference between them. While there is no previous work on this dataset for political affiliation, the performance is on par with previous work on the task.


For the framing task, joint learning with metaphor significantly outperformed STL. MTL using emotion, on the other hand, yielded results on par with STL.

n  Document Piece                                                        Gold Label     MTL, Metaphor   STL
1  ". . . the anger simmering just below the surface in the U.S.         Right          Right           Centre
    is beginning to boil over."
2  ". . . and DNA evidence does not match. What once was considered      Fairness and   Fairness and    Leg., Constit.,
    an airtight case, Devine said, has evaporated into nothing"          Equality       Equality        Jurisdiction
3  ". . . border security long have been sticking points in the          Security       Security        Political
    immigration debate. Bowing to those concerns, Presidents Bush . . ." & Defense      & Defense

TABLE 3.4: Political perspective (1) and framing (2, 3) examples of metaphor-MTL improving over STL. Underlined are words predicted as metaphorical.

3.5 Discussion

Political Perspective in News For the political perspective task, the performance improvements of the MTL models stem mostly from improved predictions for the right-wing class. Example 1 of Table 3.4 presents an emotive article snippet containing the metaphors "boil over" and "simmering anger", for which joint learning with metaphor corrected the STL prediction. Table 3.6 presents a breakdown of the performance per class. In both MTL models, there was an improvement for right-wing articles. It is worth mentioning that this was the least represented class during training, which may indicate how MTL helped overcome issues of data imbalance without hurting performance on the other classes. For instance, Wu, Wu, and Liu (2018) successfully used an MTL approach to improve performance on unbalanced sentiment analysis tasks.

Political Affiliation Improvements from the auxiliary tasks are due to more accurate identification of the democrat class. According to Pliskin et al. (2014), liberals are more susceptible to emotions, which could in part explain this result. Figure 3.2 visualizes the performance across the political spectrum, from which we infer that politicians at the center are harder to distinguish, while those on the left are better identified by our MTL models. We explored the emotions predicted by the MTL model in politicians' posts, as shown in Table 3.5. We found that emotions typically associated with conservative rhetoric, e.g. anger, disgust or fear (Jost et al., 2003), were more frequent in republicans' posts. Conversely, emotions associated with liberals, e.g. love (Lakoff, 2002) or sadness (Steiger et al., 2019), are more often predicted for democrats' posts. Table 3.7 contains example posts where joint learning using emotion corrected the STL setup, and where the predicted emotions align with those usually associated with each affiliation.

FIGURE 3.2: Average performance across the political spectrum.

             Anger   Anticipation   Disgust   Fear    Joy     Love    Optimism   Pessimism   Sadness   Surprise   Trust
Democrat     34.0%      42.9%        42.2%    23.1%   61.9%   73.6%    54.0%      82.5%       76.4%     75.4%     41.6%
Republican   66.0%      57.1%        57.8%    76.9%   38.1%   26.4%    46.0%      17.5%       23.6%     24.6%     58.4%

TABLE 3.5: Proportion of posts predicted for each emotion, using the best-performing emotion-MTL model.

                                   STL    Metaphor   Emotion
Political Perspective
  - Center                        .874      .879      .885
  - Left                          .860      .863      .871
  - Right                         .774      .784      .798
Political Affiliation
  - Democrat                      .788      .806      .799
  - Republican                    .802      .805      .800
Framing
  - Economic                      .747      .759      .758
  - Capacity and Resources        .601      .604      .602
  - Morality                      .646      .662      .648
  - Fairness and Equality         .502      .527      .511
  - Crime and Punishment          .719      .721      .717
  - Security and Defense          .554      .577      .560
  - Health and Safety             .683      .694      .684
  - Quality of Life               .572      .554      .556
  - Cultural Identity             .690      .703      .695
  - Public Sentiment              .670      .678      .675
  - Political                     .808      .815      .812
  - Legality, Constitutionality
    and Jurisdiction              .787      .795      .784
  - Policy Prescription
    and Evaluation                .525      .538      .530
  - External Regulation
    and Reputation                .695      .675      .681

TABLE 3.6: Average F1 for each class and task.

Framing For the framing task, MTL with metaphor prediction yielded the largest improvements for the frames of security and defense, morality, and fairness and equality, particularly in articles on the metaphor-rich topics of immigration, gun control and the death penalty. We excluded the Other category from Table 3.6, as it was misclassified by all models, having only 2 samples in the test set. We automatically annotated metaphorical expressions in these articles to conduct a qualitative analysis. We observe that correct identification of linguistic metaphors often accompanies correct frame classification by the MTL model. Examples of such cases are shown in Table 3.4. In Example 2, metaphors such as "airtight case" and "evaporated" aided the model in identifying the fairness and equality framing within the topic of the death penalty. Similarly, presenting border security in Example 3 as a "sticking point in the immigration debate" improved the classification of the security and defense framing of an article on the topic of immigration. Table 3.8 presents detailed results per policy issue. While emotions did not improve results significantly overall, they showed the biggest improvement on gun-control-related articles, which can be emotionally charged. The metaphor-MTL showed an improvement on all policies except articles related to same-sex marriage, indicating how closely related metaphors and framing are, with metaphors often referred to as a framing mechanism.

Emotions: Anger, Disgust and Fear. Gold label: Republican; Emo. MTL: Republican; STL: Democrat.
  "Last week, I held a Congress on Your Corner event in Frankfort. Monica was upset by the recent deal between the United States, our global partners, and Iran. The deal provides $7 billion in sanction relief in exchange for Iran limiting, but not halting, its nuclear activities. I am skeptical of this deal. In the words of my friend Eric Cantor, I believe we must distrust and verify in this case. I believe it is imperative that we stand with Israel against the very dangerous threat posed by Iran's nuclear activities. I do not believe that Iran has given us any reason to trust that it will not continue pursuing nuclear weapons."

Emotions: Love, Joy and Optimism. Gold label: Democrat; Emo. MTL: Democrat; STL: Republican.
  "I'll be spending most of my day tomorrow opposing Paul Ryan's cuts-only budget in committee. In the name of deficit reduction, Mr. Ryan is once again proposing to eliminate one of the few pieces of good news we have in reducing healthcare costs that are driving the deficits: Obamacare (aka, the Affordable Care Act). We should be expanding its reforms, not trying to repeal them. For example, the CBO estimates that adding a public plan option to the health insurance exchanges would save another $88 billion and that the plan would have premiums 5-7% lower than private plans, which would increase competition in the marketplace and result in substantial savings for individuals, families, and employers purchasing health insurance through an exchange."

TABLE 3.7: Examples where emotion-MTL improved the predictions over STL.

                     Single   Metaphors   Emotions
Immigration          0.689      0.700      0.686
Tobacco              0.718      0.721      0.717
Death Penalty        0.690      0.704      0.690
Gun Control          0.704      0.717      0.711
Same-Sex Marriage    0.744      0.741      0.745

TABLE 3.8: Average accuracy values across different policies for Framing.

3.6 Conclusion

In this chapter, we introduced the first joint models of metaphor, emotion and political rhetoric. We considered predicting the political perspective of news, the party affiliation of politicians and the framing dimensions of policy issues. MTL using metaphor detection resulted in significant performance improvements across all three tasks. This finding emphasizes the prevalence of metaphor in political discourse and its importance for the identification of framing strategies. Joint learning with emotion yielded significant performance improvements for the political perspective and affiliation tasks, which suggests that the use of emotion is an important political tool, aiming to influence public opinion. Future research may explore further tasks such as emotion and misinformation detection, which social scientists have found to be interrelated, and deploy more advanced MTL techniques, such as soft parameter sharing. Our code and trained models will be made publicly available.
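The hard-parameter-sharing setup used throughout this chapter can be sketched in a few lines. The following is an illustrative skeleton only, not the thesis code: the encoder and task heads are toy stand-ins (the actual models use a transformer encoder with one classification head per task), and the alternating-batch schedule shown here is just one common way to mix main- and auxiliary-task updates.

```python
import random

# Toy sketch of hard-parameter-sharing MTL: one shared encoder feeds
# separate task heads, and batches from the main and auxiliary tasks
# are interleaved so both tasks update the shared parameters. All
# components are hypothetical stand-ins for the real models.

def shared_encoder(text):
    """Placeholder shared representation: a single scalar feature."""
    return float(len(text.split()))

def main_head(rep):
    """Placeholder head for the main task (e.g. framing)."""
    return "frame_A" if rep > 5 else "frame_B"

def aux_head(rep):
    """Placeholder head for the auxiliary task (e.g. metaphor)."""
    return rep > 3  # "contains a metaphor"

def run_epoch(main_batches, aux_batches, seed=0):
    """Route each batch through the shared encoder and its task head."""
    schedule = [("main", b) for b in main_batches] + \
               [("aux", b) for b in aux_batches]
    random.Random(seed).shuffle(schedule)  # interleave the two tasks
    outputs = []
    for task, batch in schedule:
        head = main_head if task == "main" else aux_head
        outputs.append((task, [head(shared_encoder(x)) for x in batch]))
    return outputs
```

In actual training, each step would also compute the current task's loss and back-propagate it through the shared encoder; the sketch only shows the batch routing that characterizes hard parameter sharing.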


Chapter 4

PopulismVsReddit: A Dataset on Emotions, News Bias and Social Identity

In general, there is a lack of datasets that tackle populist rhetoric. As seen in Chapter 2, the few existing datasets, such as the Global Populist Database (GPD), are not fit to train a deep learning model. Populist rhetoric includes a range of properties and argumentations that are used across the political spectrum. Manichean Outlook, Anti-Elitism and People-Centrism are discussed in Castanho Silva et al., 2019, where populism is measured as an attitude. Modeling these aspects computationally can be challenging from a Natural Language Processing (NLP) perspective, where the text is the subject rather than a respondent of a survey or an individual in an experimental setting. When these sub-scales of populist rhetoric are measured in text such as political discourses or party manifestos (Hawkins et al., 2019; Hawkins and Silva, 2015), they need complex annotation procedures as well as expert annotators, which limits the scope of the data to annotate, or requires assessing extensive pieces of text to obtain a single metric.

Instead of attempting to capture the whole complexity of populist rhetoric, we choose to focus on a specific aspect within its nature. As discussed in the literature review, populism has traditionally pivoted around social identity and the Us vs. Them dichotomy. While other aspects are relevant, the in-group/out-group aspect of populism is common across its different attitudes. In Anti-Elitism, the Them concept encompasses the so-called Elites as the out-group. In People-Centrism, the people become the focus as the Us, the in-group. Finally, for Manichean Outlook it is present behind the moral simplification of both groups as good (Us) or bad (Them), and the rivalry between both.

Nevertheless, addressing the Us vs. Them conceptualization through NLP and deep learning raises some legitimate questions. Since we decided to tackle the task with an annotation procedure, we need a systematic approach that reduces the task to a single question. Similar approaches have been successful in addressing emotional content in political discourse (Redlawsk et al., 2018). We also need to decide the nature of the data to annotate and how to ensure those comments refer to specific groups.

To address the connection between emotions and populism, we also include emotions in the annotation procedure, with a special focus on those usually associated with populist attitudes. At the same time, given the relevance of news in the spread of such attitudes, we tackle news bias and its relation to the Us vs. Them rhetoric through distant supervision.
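The distant-supervision idea can be illustrated with a minimal sketch: a comment thread inherits a bias label from the news outlet its submission links to. The domains and ratings below are hypothetical placeholders, not the actual source list used for PopulismVsReddit.

```python
from urllib.parse import urlparse

# Hypothetical outlet-to-bias mapping used only for illustration.
OUTLET_BIAS = {
    "left-example.com": "left",
    "center-example.com": "center",
    "right-example.com": "right",
}

def bias_label(url):
    """Return the bias rating of the outlet a URL points to, if known."""
    domain = urlparse(url).netloc.lower()
    if domain.startswith("www."):
        domain = domain[len("www."):]
    return OUTLET_BIAS.get(domain)  # None for unrated outlets

print(bias_label("https://www.left-example.com/some-story"))  # -> left
```

The appeal of this scheme is that no manual annotation is needed: any comment posted under a submission linking to a rated outlet receives that outlet's label automatically, at the cost of some label noise.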


In this chapter, we describe all the steps taken to create such a dataset and present it as the PopulismVsReddit dataset. Next, we analyze its contents using statistical methods in order to assess them and draw conclusions.

4.1 Dataset Creation

4.1.1 How to annotate populist rhetoric?

We decided to use social media data for our task. This decision is motivated by the lack of existing datasets that focus on social media data and by its relevance for the spread of populist rhetoric. At the same time, comments in social media are usually of short to medium length, which makes them fit for crowd-sourced annotation. As discussed in Chapter 3, political discourse, such as parliamentary data, has already been explored to detect political bias. For similar reasons, we discarded the idea of annotating news articles; however, we do consider them indirectly in our approach.

There are different precedents on how to tackle populist rhetoric, from expert annotation of specific items appearing in a text to holistic grading schemes that compare different texts. The first method yields a general interpretation of a text in the context of populist rhetoric, but not a single label or value tied to populist rhetoric that could be used to train a model.

Holistic grading provides a grade or score for the amount of populist rhetoric a text contains (Hawkins and Silva, 2015). Holistic grading of a complex concept like populist rhetoric can be expensive to annotate and is very susceptible to the interpretation of the annotators.

We focus on the Us vs. Them aspect of populist rhetoric. The framing of an out-group as a threat is easy to identify and widespread in online media. While hate speech or abusive language can be mechanisms for such Us vs. Them rhetoric, they do not fully cover its use from the social identity perspective.

It is difficult to identify the in-group aspect from a single online comment. Online social media have clusters of online communities, often leading to bubbles of perception. This situation is sometimes described as an echo chamber (Barberá et al., 2015), where social media users seek out information that confirms their preexisting beliefs, hence increasing political polarization and extremism. By annotating comments that refer to an out-group, we can monitor how these groups are targeted in online discussions and whether the text shows a positive or negative attitude towards them, ranging from support to discrimination. This does not guarantee capturing the full complexity behind the Us vs. Them rhetoric; however, it gives us a tool to detect comments directed at certain groups (out-groups) within an online community (in-group) and the attitude towards them. We restrict this to specific groups that populist rhetoric has targeted as an out-group:

1. Immigrants
2. Refugees
3. Muslims
4. Jews
5. Liberals
6. Conservatives
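Retrieving candidate comments for these groups can be sketched as a simple keyword matcher. The term lists below are illustrative shorthands, not the retrieval vocabulary actually used to build the dataset:

```python
import re

# Hypothetical term lists per target group; a real pipeline would use
# much richer vocabularies (synonyms, slurs, demonyms, etc.).
GROUP_TERMS = {
    "Immigrants": ["immigrant", "immigration"],
    "Refugees": ["refugee", "asylum seeker"],
    "Muslims": ["muslim", "islam"],
    "Jews": ["jew", "jewish"],
    "Liberals": ["liberal"],
    "Conservatives": ["conservative"],
}

def mentioned_groups(comment):
    """Return the target groups a comment mentions, by keyword match."""
    text = comment.lower()
    return [group for group, terms in GROUP_TERMS.items()
            if any(re.search(r"\b" + re.escape(t), text) for t in terms)]

print(mentioned_groups("Refugees and immigrants were discussed"))
# -> ['Immigrants', 'Refugees']
```

Keyword matching only decides which comments are shown to annotators; whether the comment expresses support or discrimination towards the matched group is then left to the human annotation step.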

While these groups have often been targets of populist rhetoric, we emphasize that this is far from being a complete list.2 At the same time, some of these labels

2 Our list was decided upon in December 2019, hence before the murder of George Floyd gave rise to
