
Political Self-Presentation on Twitter Before, During and After Elections

A diachronic analysis of political Twitter data with predictive models

Harm-Jan Setz (2381311)
Master Thesis Information Science
September 10, 2019


ABSTRACT

During election time the behaviour of politicians changes. This study investigates behavioural changes of politicians on Twitter by using the accuracy of pre-trained language models as a proxy for measuring diachronic change in self-presentation. A data-set was collected containing all tweets written between October 2016 and July 2017 by candidates for the Dutch parliamentary election of March 2017. This data-set contains a total of 567,443 tweets written by 686 politicians. A variety of dimensions was used to represent self-presentation in Twitter data. Predictive algorithms were then trained and tested on these dimensions in a variety of ways, to see how the accuracy of these models changes before, during and after election time. This study shows that this quantitative approach can provide insight into diachronic change in behaviour on a macro level.


CONTENTS

Abstract

Preface

1 Introduction

2 Theoretical background and related work
  2.1 Social science
  2.2 Natural language processing

3 Approach
  3.1 Dimensions
    3.1.1 Party
    3.1.2 Gender
    3.1.3 Incumbency
    3.1.4 Age
    3.1.5 Chance of being chosen
    3.1.6 Political position: Left-right
    3.1.7 Political position: Progressive-Conservative
  3.2 Time
    3.2.1 Experiment 1: Monthly train and test
    3.2.2 Experiment 2: Weekly train and test
    3.2.3 Experiment 3: Month (train) to month (test)

4 Data
  4.1 Existing
  4.2 Collected
  4.3 Filtering
  4.4 Distributions

5 Experiments
  5.1 Classifier
  5.2 Pre-processing
    5.2.1 Use parties method
    5.2.2 Merge tweets method
    5.2.3 Equal data method
  5.3 Other settings
    5.3.1 Split data method
    5.3.2 Cross-validation
  5.4 Unbalanced data
  5.5 Features
    5.5.1 Word embeddings
    5.5.2 Best features

6 Results and discussion
  6.1 Results per experiment
    6.1.1 Experiment 1: Monthly testing
    6.1.2 Experiment 2: Weekly testing
    6.1.3 Experiment 3: Month to month testing
  6.2 Results per dimension
    6.2.1 Political position dimensions (Party affiliation, Left-Right, Progressive-Conservative)
    6.2.2 Gender
    6.2.3 Incumbency
    6.2.4 Chance
    6.2.5 Age & Age(bin)
  6.3 General discussion
  6.4 Other interesting findings

7 Conclusion
  7.1 Future work


PREFACE

A little while ago a post popped up on my timeline about a woman who did a maternity photo shoot with her dissertation. The rationale being that the 'birth' of her thesis was the longest and hardest labor she ever did and that it should be commemorated as such. Although obviously a joke, I share her sentiment that bearing a thesis can be a long and sometimes painful process. However, the end result can bring you a lifetime of happiness, and an added bonus is that apparently one quickly forgets about the pain after delivery. Therefore, I am proud to present to you, my very own little one.

I want to thank Malvina Nissim for helping me nurse my brainchild from a small idea into a full-grown thesis that is ready to take on the world. Furthermore, I would like to thank Marcel Broersma for giving me tips and a lot of literature on how to manage the social side of my little one. I would also like to thank all the other teachers from the Information Science master's programme at the University of Groningen for sharing their wisdom and know-how with me during the past year. As they say: 'It takes a university to raise a thesis'.

Jokes aside, I thoroughly enjoyed applying my knowledge, discovering new insights and watching the slow but steady emergence of this thesis.

I really hope you enjoy reading it as much as I enjoyed seeing it all come together!

All the best,

Harm-Jan Setz

Groningen, July 9th, 2019.


1 INTRODUCTION

I’ll be glad to reply to or dodge your questions, depending on what I think will help our election most. - George H. W. Bush

As this quote by a former U.S. president illustrates well, politicians tend to change their behaviour based on what they feel will be beneficial for future election results. During political campaigns, the pressure on politicians to stay in the public eye is high. Parties need to be present to make the public aware of their plans and vision for the future, and to do so, politicians present themselves more prominently to get the attention of the public. The present study investigates how the self-presentation of political candidates during election time differs from their self-presentation during non-election time. We can define the act of self-presentation as "to convey an impression to others which is in one's interest to convey" (Goffman, 1959).

The assumption made in this research is that the self-presentation of political candidates has a large impact on the course of an election. The worthiness of a political candidate is not only determined by their plans for the future, but also by their perceived likability and trustworthiness (Schütz, 1995). Political candidates become entertainers during election time. They make appearances on TV and radio shows, make funny or bold speeches, or share their private life with the public. Often they present themselves as an average, likeable person who wants to change the world for the better. Political candidates who succeed in portraying this image are often successful in winning approval (Schütz, 1995). Therefore, it is reasonable to assume that politicians emphasize certain parts of their character during election time while suppressing other parts, to appear more likeable and sway public opinion in their favor. The interest of this study is to find out how politicians do this.

Twitter is uniquely suitable for answering these kinds of questions, since the medium functions as a direct link between the public and political candidates. A candidate can directly communicate his or her opinion on public affairs and news to their audience at any time through the platform. Twitter can, therefore, be seen as a direct channel for self-presentation without intervention or filtering by other factors. Twitter also facilitates direct communication with the press, and there is a growing trend among journalists to incorporate tweets in their coverage of political events (Jungherr, 2016). Therefore, Twitter has become a vital part of the campaign repertoires of parties and candidates. There are currently very few serious candidates who don't use Twitter as a means to communicate with the public.

It can therefore be stated that Twitter holds a treasure of information on social and political behaviour. The challenge is to extract this treasure from the data in a useful and reliable way. With regard to this study, two questions are vital to solving this challenge:

1. How can self-presentation of political candidates be represented in Twitter data?

2. How do we create reliable measures to find and analyze changes in self-presentation on Twitter over time?

These two questions have been vital in the development of the approach used in this study. The study uses the accuracy of predictive models, which are trained on various author profiling dimensions, as a proxy for measuring change in behaviour. Self-presentation of politicians is represented by the eight dimensions that are used to train the models. The change in accuracy of these models over time is then used to analyze how self-presentation changes. With this approach, this study aims to shed light on how politicians change their self-presentation during an election cycle.

The following central research question was devised as a guideline for this study: ’How does the self-presentation of political candidates change diachronically during an election cycle?’ To help answer this question, the following sub-questions were defined:

1. To what extent can we build models that predict different aspects of political candidates based on their Twitter messages?

2. How does the diachronic accuracy of these models change before, during and after election time?

3. What do the differences in accuracy tell us about the self-presentation of politicians during an election cycle?


2 THEORETICAL BACKGROUND AND RELATED WORK

Over the last ten years, research on Twitter has grown exponentially in popularity among scholars. This is largely due to the wealth of data that is available through the platform. In principle, any user interaction with Twitter is documented and accessible to researchers through scraping or the use of the Twitter API. This easy accessibility of large amounts of user data makes Twitter ideally suited for research at the intersection of social and computer science (Jungherr, 2016). Since this study takes place at the intersection of social science and natural language processing (NLP), a brief description of relevant theory and related work in both fields, related to the task at hand, is offered here.

2.1 Social science

In the social sciences, a lot of research has been done into how certain personality traits affect a political candidate. This research can give insight into how politicians change their behaviour during election time. In general, the volume of messages referring to politics tends to rise toward the end of a campaign (Jungherr, 2016). As we can see in Table 3, this is also clearly the case for tweets written by Dutch politicians. We can safely assume that this rise is due to the political campaigns of the candidates. Therefore, it is reasonable to expect a rise in accuracy for predicting the party affiliation of candidates around election time.

An important and often emphasized facet of a politician is their ideological stance on important issues. Election campaigns are all about parties and candidates communicating to the public what they stand for. According to the literature, ideological party positions tend to shift when public opinion shifts away from the party (Adams et al., 2004) and in reaction to changing global economic conditions (Adams et al., 2009). Therefore, how parties and politicians change their ideological stance during election time is inherently connected with the performance of a party and changes in its environment. Since candidates tend to emphasize their ideological position during election time, the expectation is that the predictability of ideological stance will rise during election time.

Another interesting characteristic of a politician, which the literature suggests can influence their behaviour, is whether or not a politician is an incumbent. The literature suggests that incumbents have a clear 'incumbency advantage' over non-incumbents in getting elected (Levitt and Wolfram, 1997). This is due to name recognition and general experience, both of which non-incumbents often lack. Non-incumbents have to compensate for this disadvantage in their campaign strategy somehow. It can be hypothesized that non-incumbents will try to bridge this disadvantage by portraying themselves more like incumbents, or by imitating the style of incumbents. If this hypothesis holds true, a fall in accuracy for the predictability of incumbency can be expected during election time, since incumbents and non-incumbents will behave more similarly.

Research also shows some interesting findings on how personal characteristics can influence the behaviour of a political candidate. For instance, Lee and Lim (2016) found that during election time, female politicians emphasize their masculine traits while downplaying their feminine qualities. This is because stereotypical female traits like 'compassion', 'understanding' and being 'family oriented' generally work to a candidate's disadvantage (Meeks, 2012), since these traits are not viewed as desirable for a politician. If female politicians suppress their female traits and emphasize their male traits, it could be hypothesized that gender will be hard to predict for politicians in general, and even harder to predict during election time.

A study by Alesina et al. (2015) found that young politicians often spend more money in pre-election years, and attributed this to young politicians having stronger career-concern incentives than older politicians. Young politicians being more fearful for their position can also lead them to adopt newer or bolder strategies and techniques. This is backed up by the fact that young politicians appear to be more likely to use Twitter than older politicians (Jungherr, 2016). Therefore, how a politician emphasizes or suppresses their age, or age-related language use, during election time can also be an interesting facet to study. However, it is hard to articulate a clear expectation of how the predictability of age will change during election time.

2.2 Natural language processing

In the field of natural language processing, Twitter data is very popular for creating predictive political models. For instance, Sang and Bos (2012) used sentiment analysis to predict the Dutch Senate elections of 2011 with fairly good results, and Burnap et al. (2016) predicted the parliamentary election of the United Kingdom with a similar approach. A large part of the approaches in these studies is based on identifying traits of an author, which is known in computer science as 'author profiling'.

Author profiling aims to discover as much information as possible about an author by analyzing texts written by that person (Weren et al., 2014). Author profiling distinguishes classes of authors, for instance 'male' and 'female' in gender profiling, and classifies authors into one of these classes (Rangel et al., 2013). There are a number of studies that use language models to profile authors and classify characteristics like age and gender (Sap et al., 2014; Rangel and Rosso, 2013) or personality (Schwartz et al., 2013). The PAN 2013 shared task on author profiling classified blogs on both age (10s, 20s, 30s) and gender (male/female), with 0.65 being the best score for age and 0.59 the best score for gender. Weren et al. (2014) obtained scores of 0.68 and 0.62 on the same data-set. They recommended Bayes classifiers or decision trees for gender and logistic regression for age.

However, gender predictive performance on Twitter data is usually a lot higher. Burger et al. (2011) experimented with Support Vector Machines, Naive Bayes and Balanced Winnow2 and reported an accuracy of 0.75 using word and character n-grams on the text of a user, going up to 0.92 when including profile information. Halteren and Speerstra (2014) even reached an accuracy of 0.95 on gender with the use of character and token n-grams on Dutch tweets of 600 labelled users, using an SVM classifier.

The difference between this study and the above-mentioned studies in the field of natural language processing is that NLP studies are usually focused on the technical aspect of improving the accuracy of author profiling. This study aims to use the predictive performance of author profiling language models to make inferences about the data.

Getting quality data for training a predictive language model is one of the main challenges in developing accurate language models in natural language processing. Noise in data can ruin accuracy, and a model trained on, for instance, Twitter data will perform significantly worse on Facebook data. This is well illustrated by the scores of the 2019 CLIN shared task on cross-genre gender prediction (Bos, Dekker, and Setz), where accuracy scores rarely exceeded 0.60. The idea behind this paper is that if we can select 'quality' data for a specific language model by inspecting accuracy scores, then we can also make inferences about the world by reviewing the accuracy scores of language models trained on different dimensions of a data-set. Therefore, predictive accuracy can be used as a methodological tool to investigate a data-set and draw conclusions about a specific situation or point in time.


3 APPROACH

As mentioned in the introduction, this study uses a largely new approach to study diachronic behavioural change. The approach leans heavily on two pillars: 'dimensions' and 'time'. It follows the development of the accuracy of different author profiling dimensions (see Table 1) over time and uses the results to make general inferences about the change in behaviour of politicians during an election cycle.

A comparable approach was adopted by Jaidka et al. (2018), who used the diachronic accuracy of pre-trained language models to measure semantic drift in natural language over a span of multiple years. That study was based on the observation that the predictive language models of Sap et al. (2014) degrade in performance with the passing of time. This is because natural languages constantly evolve to fit the needs of their users (Frermann and Lapata, 2016). If predictive performance can be used to measure changes in natural language, the same technique might be usable in social science research to track behavioural change.

Dimension      Classes
Party          'GL', 'SP', 'FvD', 'PvdA', 'D66', 'CDA', 'VVD', 'PvdD', 'SGP', 'CU', '50PLUS', 'PVV'
Gender         'Male', 'Female'
Incumbency     'Yes', 'No'
Age            '<30', '30-40', '40-50', '50>'
Age(bin)       '<40', '40>'
Chance         'Likely', 'Unlikely'
Left-Right     'Left', 'Left-center', 'Right-center', 'Right'
Progr-Conser   'Conservative', 'Cons-center', 'Prog-center', 'Progressive'

Table 1: Dimensions and classes

3.1 Dimensions

The approach in this study uses multiple SVM models trained on Twitter data generated by political candidates before, during and after the Dutch parliamentary elections of 2017. They were trained to recognize and predict eight different dimensions. The models were then used to make predictions on these dimensions. In this section, each dimension is described in more detail.

3.1.1 Party

Twenty-eight parties took part in the Dutch parliamentary elections of 2017. Some of these parties had very few tweets, and others had a very small number of politicians who used Twitter. To make the classifier behave consistently, the choice was made to focus on parties that obtained at least one seat in parliament in 2017 and had over 1,000 tweets in a given month. Because of this choice, twelve parties remained: 'VVD', '50PLUS', 'ChristenUnie', 'CDA', 'D66', 'Forum voor Democratie', 'GroenLinks', 'PVV', 'PartijvdDieren', 'SP', 'PvdA', 'SGP'.


3.1.2 Gender

The gender of a politician is classified into two groups, with each individual being either ’male’ or ’female’.

3.1.3 Incumbency

Incumbency was classified into two groups: 'incumbents' and 'non-incumbents'. It must be noted that the incumbency dimension was not updated with the newly elected incumbents after the election. Therefore, the results after election time were not taken into account.

3.1.4 Age

Intuitively, the profiling of age seems like a regression task. However, the labels for the age dimension were provided in seven categories (0: <30, 1: 30-40, 2: 40-50, 3: 50-60, 4: 60-70, 5: 70>, 6: missing), so regression could not be applied and classification was used to predict the age category instead. A similar approach was taken in the PAN 2013 age profiling task (Rangel et al., 2013). The categories were very unevenly distributed (see Figure 4). To make the age dimension more valuable in terms of providing insight, the choice was made to classify this dimension in two ways: 'four-way' classification and 'binary' classification.

Four-way classification

For the four-way classification of age, every instance with a missing age category was removed from the data-set. Thereafter, the categories 50-60, 60-70 and 70+ were combined into a 50+ category. The classifier was then trained on the remaining four categories, as in the sketch below.
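
A minimal sketch of this relabelling step, assuming the tweets live in a pandas data-frame with a hypothetical 'age_cat' column holding string labels (the thesis does not document its exact data structures):

```python
import pandas as pd

def four_way_age(df: pd.DataFrame) -> pd.DataFrame:
    # drop instances whose age category is unknown
    df = df[df["age_cat"] != "missing"].copy()
    # collapse the three oldest categories into a single 50+ class
    merge = {"50-60": "50+", "60-70": "50+", "70+": "50+"}
    df["age_cat"] = df["age_cat"].replace(merge)
    return df  # remaining classes: <30, 30-40, 40-50, 50+
```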

Binary classification

Since the four-way classification of age often barely performed better than the baseline of 0.25, the choice was made to also classify age in a binary fashion, with the categories '<40' and '40+'. The choice to divide the data at 40 was made because the resulting classes would be roughly equal in size.

3.1.5 Chance of being chosen

The chance of being chosen was also provided by the annotators of the existing data-set. Originally, this dimension had three classes: 'Sure', 'Possible' and 'Almost impossible'. However, the data distribution in Figure 5 shows that the 'Possible' class has very little data. This is because per party there were often only two people marked as 'Possible'. To ensure a valid performance result, the classes 'Sure' and 'Possible' were combined into 'Likely' and the 'Almost impossible' class was relabelled 'Unlikely'.

3.1.6 Political position: Left-right

The left-right political position measurements from Kieskompas came as scores between -2 (left) and 2 (right) for each party. These scores were binned into four categories: 'Left', 'Left-center', 'Right-center', 'Right'. The standard deviation from the mean of the scores was used to put them into categories, with mean + standard deviation being the bound for the label 'Right' and mean - standard deviation being the bound for the label 'Left'. The mean was used as the divider between left-center and right-center; a sketch of this binning is given below. The distribution of the data looked like this:


• Left: 2 parties
• Left-center: 5 parties
• Right-center: 4 parties
• Right: 2 parties

Unfortunately, Kieskompas did not yet have a score for 'Forum voor Democratie', which was therefore not included in the classification of this dimension.
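
A minimal sketch of the mean/standard-deviation binning described above; the input `scores` maps each party to its Kieskompas score, and the function name is an assumption for illustration:

```python
import numpy as np

def bin_left_right(scores: dict) -> dict:
    """Bin party scores (-2..2) into four labels around the mean."""
    values = np.array(list(scores.values()))
    mean, sd = values.mean(), values.std()

    def label(x):
        # mean - sd and mean + sd bound the outer labels; the mean
        # divides left-center from right-center
        if x <= mean - sd:
            return "Left"
        if x <= mean:
            return "Left-center"
        if x <= mean + sd:
            return "Right-center"
        return "Right"

    return {party: label(score) for party, score in scores.items()}
```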

3.1.7 Political position: Progressive-Conservative

The progressive-conservative scores were binned in the same way as the left-right scores. The distribution of the data looked like this:

• Progressive: 2 parties
• Progressive-center: 5 parties
• Conservative-center: 2 parties
• Conservative: 4 parties

Unfortunately, Kieskompas did not yet have a score for 'Forum voor Democratie', which was therefore not included in the classification of this dimension either.

3.2 Time

Since the approach of this study is relatively new, there were no guidelines on how to use the data to get the most insight into the change in behaviour of politicians over time. Therefore, a variety of combinations of data and time intervals were tried for training and testing the classifier. Three of these experiments returned interesting results and were, therefore, developed further. The results can be found in Chapter 6. Table 2 shows the division of train and test data, how each experiment is validated and what data is used. Furthermore, a short description of the settings of each experiment is given in this section.

Experiment   Train         Test          Validation method              Data
Exp 1        Month (0.8)   Month (0.2)   5-fold cross-validated f1     Oct 2016 - Jul 2017
Exp 2        Week (0.8)    Week (0.2)    5-fold cross-validated f1     Feb 2017 - Apr 2017
Exp 3        Month         Next month    f1-score                      Oct 2016 - Jul 2017

Table 2: Train/test settings for each experiment

3.2.1 Experiment 1: Monthly train and test

This experiment uses a monthly train and test cycle for each month, with a test size of 0.2. The model was trained on the data of each month, and the five-fold cross-validated macro f1-score for each of the dimensions for each of the ten months was recorded. These scores were plotted and can be found in Figures 8 and 9.
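
A minimal sketch of one such monthly evaluation round, assuming `texts` and `labels` hold the merged-and-split user documents for a single month (feature settings varied per dimension; character 3-6-grams are used here as an example):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

def monthly_score(texts, labels):
    # linear-kernel SVM over tf-idf weighted character n-grams
    model = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(3, 6)),
        SVC(kernel="linear"),
    )
    # five-fold cross-validated macro f1, recorded once per month
    return cross_val_score(model, texts, labels,
                           cv=5, scoring="f1_macro").mean()
```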

3.2.2 Experiment 2: Weekly train and test

For most dimensions, predictive performance seemed to peak and/or fall around election time. Therefore, the choice was made to run the model on the weeks of February, March and April, to have a more detailed look at this time period. The data of these months was split into weeks, and the full weeks (weeks 6-16) were taken as input for the classifier (test size = 0.2), in the same fashion as the monthly train and test experiment.

3.2.3 Experiment 3: Month (train) to month (test)

Since this research aims to study diachronic change in behaviour, it is interesting to investigate how similar the months are to each other. This study measures similarity by training the classifier on all data of a month and then testing it on the next month. The difference in accuracy between months shows where change accelerates or slows down. Macro f1-scores were collected for each month except October (since that was the first month) and plotted in Figures 11 and 12.
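
A minimal sketch of this month-to-month scheme, assuming `months` is an ordered list of (texts, labels) pairs starting with October 2016 and `make_model` builds a fresh classifier pipeline like the one sketched in Section 3.2.1:

```python
from sklearn.metrics import f1_score

def month_to_month(months, make_model):
    scores = []
    # fit on month m, score on month m+1
    for (train_X, train_y), (test_X, test_y) in zip(months, months[1:]):
        model = make_model()
        model.fit(train_X, train_y)
        scores.append(f1_score(test_y, model.predict(test_X),
                               average="macro"))
    return scores  # one score per month except the first
```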


4 DATA

The data used in this study comes partly from an existing Twitter data-set that was provided by Marcel Broersma of the University of Groningen. This data served as a starting point for this study and helped to devise the research direction. However, for the experiments described in Section 3.2, additional Twitter data over a longer period of time had to be collected.

4.1 Existing

The existing data-set from the University of Groningen contains all tweets written by politicians during the month leading up to the 2017 Dutch parliamentary election. The data-set contains 49,711 tweets and is annotated with the following information: party affiliation; rank (within their party); name; gender; incumbency (currently holding office or not); and some extra information about the politician's current job.

4.2 Collected

Using the handles from the original data-set, ten monthly data-sets were created containing all tweets of the same politicians from October 2016 to July 2017. This was done by querying another data-set at the University of Groningen that contains all tweets written in the Dutch language. Together, these ten monthly data-sets contain a total of 567,443 tweets written by 686 politicians. A description of the data can be found in Table 3.

Month           Tweets   Words      Avg words/tweet   Avg characters/tweet
October 2016    51,000   797,874    15.64             101.69
November 2016   60,491   952,990    15.75             102.09
December 2016   56,546   885,224    15.65             100.57
January 2017    60,184   955,037    15.87             102.68
February 2017   82,783   1,340,410  16.19             105.38
March 2017      97,037   1,578,300  16.26             104.65
April 2017      36,783   595,244    16.18             106.10
May 2017        40,016   647,150    16.17             105.57
June 2017       43,996   714,421    16.24             105.69
July 2017       38,607   623,954    16.16             104.50

Table 3: Available data per month

The data-set has eight different dimensions that the classifier can be trained on: 'Party affiliation', 'Gender', 'Incumbency', 'Age (four-way)', 'Age (binary)', 'Chance of being chosen', 'Political position (left-right)' and 'Political position (progressive-conservative)'. The first three were already in the original data-set, and the others were added later. The classes of the age and chance dimensions were annotated by the same people who annotated the original data-set. The political position data came from an independent Dutch research institute called 'Kieskompas', which has ties to the VU Amsterdam.

4.3 Filtering

In the Dutch political system, any party can participate in the parliamentary election if it is registered and has at least 30 support statements in a certain constituency. There are a total of 20 constituencies, but a party doesn't have to participate in all of them. This causes the total number of parties that participated in the 2017 elections to be very high, namely 28. Many of these parties are not serious candidates for a seat in parliament, have very few candidates and tweets, or have no clear political agenda. In order to safeguard the validity of this study, the choice was made to only include parties that obtained a seat in parliament in the Dutch parliamentary election of 2017 and had a cumulative number of tweets of over 1,000 per month. See Table 4 for a description of the data-set after filtering.

Month           Tweets   Words      Avg words/tweet   Avg characters/tweet
October 2016    35,840   567,712    15.84             103.36
November 2016   43,565   694,351    15.94             103.42
December 2016   38,689   614,162    15.87             102.24
January 2017    40,101   650,996    16.23             105.15
February 2017   55,384   911,436    16.46             107.31
March 2017      68,255   1,117,503  16.37             105.73
April 2017      24,607   400,311    16.27             106.66
May 2017        27,197   442,029    16.25             105.58
June 2017       31,683   519,803    16.41             107.65
July 2017       23,174   383,051    16.53             107.71

Table 4: Used data per month

4.4 Distributions

The experiments described in Section 3.2 were carried out on all dimensions described in Section 3.1. Each dimension has a different data distribution, which should be taken into account when analyzing the results. Measures were taken to balance the data (see Section 5.4). The figures below show how the data is distributed over the labels of each dimension.


Figure 1: Party

Figure 2: Gender

Figure 3: Incumbency

Figure 4: Age

Figure 5: Chance


5 EXPERIMENTS

In Chapter 3, a broad explanation is given of the workings and purpose of the approach taken in this study. This chapter discusses the technical choices that were made to complete the different experiments.

5.1 Classifier

All models use an SVM classifier with a linear kernel. Research in the area of NLP reveals that an SVM classifier often performs best when it comes to author profiling tasks. However, experience shows that different classifiers can perform well with this kind of data. Therefore, experimentation was done with a Naive Bayes classifier, a LinearSVC classifier, and a standard SVM classifier with the kernels 'linear', 'rbf' and 'polynomial'. These classifiers were trained and tested on the original existing data-set of 49,711 tweets, with character n-grams in the range 3-6. The LinearSVC classifier and the SVM classifier with linear kernel performed best in these experiments. The choice was made to proceed with the SVM classifier, since the literature suggests that this classifier often performs best in author profiling.
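
A minimal sketch of such a comparison; the exact hyper-parameters used in the thesis are not documented, so scikit-learn defaults are assumed:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.model_selection import cross_val_score
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC, LinearSVC

def compare_classifiers(texts, labels):
    candidates = {
        "naive_bayes": MultinomialNB(),
        "linear_svc": LinearSVC(),
        "svm_linear": SVC(kernel="linear"),
        "svm_rbf": SVC(kernel="rbf"),
        "svm_poly": SVC(kernel="poly"),
    }
    for name, clf in candidates.items():
        # same character 3-6-gram features for every candidate
        model = make_pipeline(
            TfidfVectorizer(analyzer="char", ngram_range=(3, 6)), clf)
        score = cross_val_score(model, texts, labels, cv=5,
                                scoring="f1_macro").mean()
        print(f"{name}: {score:.3f}")
```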

5.2 Pre-processing

The first step of the pre-processing stage was to filter out any tweets written by parties that did not obtain any seats in parliament in the 2017 elections, or that had fewer than 1,000 tweets for a specific month. After this, the tweets of each user were merged into a single string. For monthly testing, the data was split into a train and a test set (test size = 0.2). After this, the split data method was used to split a user's tweets into multiple user objects with 20 tweets each (see Section 5.3 for a description of this method). This is done after the train/test split, to ensure that one politician cannot be in both the train and test set.

All experiments were run with multiple different settings, since different dimensions performed better with different settings. Tokenization and removal of stopwords were used in some experiments to obtain better accuracy scores. In each experiment, the best score was taken per dimension and then plotted. See Table 5 for the settings that were used for each dimension in each experiment.

5.2.1 Use parties method

As already mentioned in Section 4.3, the original data-set was filtered with a 'use parties' method. This method takes the data-set of a specific month, counts the number of tweets for each party, and only returns instances of parties that have over 1,000 tweets and obtained a seat in parliament during the election. For reasons of consistency and validity, the choice was made to apply this method not only when classifying the party dimension, but for all dimensions. On dimensions like age and gender, the classifier might perform better without applying this method. However, this study is interested in the diachronic change in accuracy of these dimensions for politicians, and since this method filters out the non-serious parties, the assumption is that it makes the data-set more representative of politicians in general.


5.2.2 Merge tweets method

A merge tweets method combined all tweets for each politician. An extra argument can be given so that the resulting data-frame can later be split into individual tweets again. This extra feature was written to be able to use the 'split data' method (Section 5.3.1) after the division of the data into a train and test set.

5.2.3 Equal data method

In Table 3 we can see that the amount of data rises around election time (March). This rise in data could skew the results. Therefore, an 'equal data' method was written that can be used to down-sample the data to the level of the smallest set (July). With this method, experimentation was done to find out if and how much the results are influenced by the number of data instances.
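
A minimal sketch of this down-sampling step, assuming `monthly` maps month names to pandas data-frames (a hypothetical structure for illustration):

```python
import pandas as pd

def equalize(monthly: dict) -> dict:
    # down-sample every month to the size of the smallest month (July)
    n_min = min(len(df) for df in monthly.values())
    return {month: df.sample(n=n_min, random_state=42)
            for month, df in monthly.items()}
```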

5.3 Other settings

Especially when training and testing on the same month, the data-set was very small, which made the results very volatile. To stabilize the results, a split data method was written and five-fold cross-validation was applied.

5.3.1 Split data method

Because the tweets of each politician were merged, the number of data instances was limited. To counter this, a split data method was written that splits the tweets of a user into blocks of n tweets. With this method, a user with 100 tweets for a certain month can be split into 5 users with 20 tweets each. This increases the number of data instances (instead of one per politician, we now have multiple). After some experimentation, the number of tweets that were combined into one object was set to 20. Special care was taken to ensure that the same user would not end up in both the train and the test set for a specific month. Although the average accuracy dropped a little when using this method, the results became more stable. Therefore, it was used in all final models.
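
A minimal sketch of this splitting step; the function name and input format are assumptions for illustration:

```python
def split_user(tweets: list, n: int = 20) -> list:
    """Turn one user's tweets into documents of n tweets each."""
    blocks = [tweets[i:i + n] for i in range(0, len(tweets), n)]
    # each block of n tweets becomes one train/test instance
    return [" ".join(block) for block in blocks]
```

Because this is applied after the per-politician train/test split, all blocks derived from one author stay on the same side of the split.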

5.3.2 Cross-validation

Because of the small number of data instances, results could also vary depending on the random assignment to the train and test set. Therefore, five-fold cross-validation was applied, to ensure that the accuracy score would be representative of the whole data-set.

5.4 Unbalanced data

For some dimensions it was necessary to balance the data. Experimentation was done with random down-sampling of the majority class, to balance the data and improve accuracy. Although results improved in terms of average macro f1-score, they became more unstable (running the algorithm with the same settings twice in a row could yield very different results). Therefore, the decision was made to use the built-in class_weight parameter of the SVM classifier instead to deal with class imbalances. For most dimensions, setting the weight to 'balanced' yielded the best results, except for 'party' and 'incumbency'. For the party dimension no weight was used, and for the incumbency dimension 'Yes' was set to 3.5 and 'No' to 1.
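
A minimal sketch of these three weighting settings in scikit-learn:

```python
from sklearn.svm import SVC

# most dimensions: weights inversely proportional to class frequencies
clf_default = SVC(kernel="linear", class_weight="balanced")

# party dimension: no class weighting
clf_party = SVC(kernel="linear")

# incumbency dimension: manual 3.5:1 weighting, as described above
clf_incumbency = SVC(kernel="linear", class_weight={"Yes": 3.5, "No": 1})
```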


5.5 Features

A combination of character and word n-grams was used as features. These features were passed to a tf-idf vectorizer, and the resulting tf-idf vectors were then passed to the classifier. A number of different combinations of features were tried for each dimension to find the best performing settings. See Section 5.5.2 for the best performing features for each dimension.
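
One way to combine the two n-gram views in scikit-learn is a FeatureUnion; the thesis does not state how the combination was implemented, so this is a sketch using feature setting 2 from Table 5 (1-3 word n-grams plus 3-8 character n-grams):

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import FeatureUnion, make_pipeline
from sklearn.svm import SVC

# word and character tf-idf vectors are concatenated per document
features = FeatureUnion([
    ("word", TfidfVectorizer(analyzer="word", ngram_range=(1, 3))),
    ("char", TfidfVectorizer(analyzer="char", ngram_range=(3, 8))),
])
model = make_pipeline(features, SVC(kernel="linear"))
```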

5.5.1 Word Embeddings

Some experimentation was done with word embeddings, created as described by van der Goot and van Noord (2017). Each token of each instance was transformed into its 100-dimensional embedding, and for each instance the average embedding was computed and its dimensions used as features. When a token had no equivalent embedding, the token was skipped. Unfortunately, the number of data instances was too small for the classifier to pick up much through the embeddings. For instance, when training and testing on the 'Party' dimension, it would simply classify everything as 'VVD'. Therefore, the decision was made to use n-grams as features instead of embeddings.
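
A minimal sketch of this averaging step, with the embedding table represented as a plain dict from token to vector (an assumption; the thesis does not document the lookup format):

```python
import numpy as np

def average_embedding(tokens, embeddings, dim=100):
    # look up each token; tokens without an embedding are skipped
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    if not vectors:
        return np.zeros(dim)  # no token had an embedding
    return np.mean(vectors, axis=0)  # one 100-dimensional feature vector
```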

5.5.2 Best Features

The best features for each dimension were selected by first running the models with both 'words + 3-6 character n-grams + tokenization + removal of stopwords' and '1-3 word n-grams + 3-8 character n-grams + tokenization + removal of stopwords'. Thereafter, experimentation was done with other combinations of the features that seemed promising. The numbers of word and character n-grams were determined by the results of earlier testing. The best combination of features for each dimension can be found in Table 5.

             Monthly          Weekly           MtM
             All      Equal   All      Equal   All      Equal
Party        2+3+4    2+4     1+3+4    5       2+3+4    1+3+4
Gender       2        1+3+4   2+4      5       2+3+4    1+3+4
Incumbency   1+4      1+3+4   1+3      5       1+3+4    2+3+4
Age          2+3+4    2       1+3+4    5       1+3+4    2+3+4
Age(bin)     2+3+4    2+3+4   1+3+4    5       2+3+4    2+3+4
Chance       2        2+3+4   1+3+4    5       1+3+4    1+3+4
Left-Right   2        2+3+4   1+3+4    5       1        1+3+4
Prog-Cons    2        2+3+4   2+4      5       1+4      1+3+4

Table 5: Best features (legend below)

1. Words & 3-6 character n-grams
2. 1-3 word n-grams & 3-8 character n-grams
3. Tokenization
4. Removal of stopwords
5. Insufficient data


6 RESULTS AND DISCUSSION

The clearest result of this study is the peak that 'party affiliation' and both 'political position' dimensions show during election time, followed by a sharp fall in accuracy after the election. This seems logical when looking at the data-set, since political candidates would emphasize their party and ideological stances in their tweets. Therefore, this result was expected, but it still lends some validity to the approach. Furthermore, the study found that gender behaves adversely to the political position dimensions mentioned above. The results also indicate that the classes of the incumbency, chance and age dimensions grow more similar prior to election time. This chapter will first discuss the results per time experiment and then per dimension.

6.1 Results per experiment

This section shows the results for each experiment. Overall, all three experiments show similar trends when it comes to the political dimensions. The monthly testing experiment seems to be the most valid and stable.

6.1.1 Experiment 1: Monthly testing

A clear fall in accuracy on all dimensions can be found when classifying the month of April, which is the month after the election. However, as can be seen in Table 4, there is also a sharp drop in the amount of data that is available for this month. To further investigate whether or not the rise and fall in accuracy is due to the varying amount of data, the data for each month was randomly down-sampled to equal the month with the fewest total tweets, and the experiment was run again. The results can be found in Figures 8 and 9.

6.1.2 Experiment 2: Weekly testing

The results for the weekly train/test split were less informative than those of the monthly train/test split. We can still see a clear rise of the 'party affiliation', 'left-right' and 'progressive-conservative' dimensions towards the election week, followed by a sharp fall, but the other dimensions show no clear trend. One thing that might be of influence is that there was very little data available per week, especially in the weeks after the election. This is also visible when looking at the binary dimensions (Gender, Age(bin), Incumbency, Chance), which in some weeks fall below the baseline of 0.5. Most models could not perform when the data of the weeks was equalized to the smallest week. Therefore, the choice was made not to include this graph.

We can conclude that for this data-set the weekly train/test split was, for most dimensions, not a valid measure of self-presentation.

6.1.3 Experiment 3: Month to month testing

The month-to-month testing was also done both with and without the 'equal data' method. The results of this train/test split were a lot more stable and higher in accuracy. This can probably be explained by the fact that there is more data available for training, and by the fact that the same people are in both the train and the test set. Therefore, the results should be viewed as an indication of similarity between months. The results can be found in Figures 11 and 12.

Figure 8: Monthly test & train with equal data

Figure 10: Weekly test & train

6.2 Results per dimension

A brief description of the results for each dimension is given here:

6.2.1 Political position dimensions (Party affiliation, Left-Right, Progressive-Conservative)

These three dimensions show the most distinct trend in all graphs. All three show a rise in predictive performance during the months of January, February and March, followed by a sharp fall in April. Overall, the progressive-conservative dimension performs a little better than the left-right dimension. This might be because the progressive-conservative dimension was more polarized, with a very small conservative-center class. The left-right dimension was more evenly distributed and might, therefore, be harder to predict.

6.2.2 Gender

The Gender dimension seems to show a decrease in predictive performance before the election, but an increase in March. It could be hypothesized that this is because the elections were held on the 15th of March, and gender predictive performance recovers right after election time. Experiment 2, with the weekly train/test split, tried to prove this, but the results of this data division proved to be untrustworthy (see Section 6.1.2). However, both month-to-month graphs show a decrease in predictive performance in January and February, and the monthly graph also shows a large decrease in predictive performance in February. Also, the overall scores on the gender dimension are a lot lower than the scores reported by Burger et al. (2011) and Halteren and Speerstra (2014). This supports the findings of Lee and Lim (2016) that female politicians emphasize their male traits while downplaying their female qualities. Furthermore, these results suggest that this trend becomes stronger around election time.

Figure 11: Month to month

6.2.3 Incumbency

Something strange happens with the incumbency dimension before the election. In the monthly train/test graphs, incumbency goes down as the election approaches. However, in the month-to-month graphs, the predictive performance of incumbency stays the same or even goes up a little in the three months prior to the election. This means that the data of the months before the election becomes less equipped for making predictions on itself, but better equipped for making predictions on the next month.

An explanation for this might be found in the diachronically increasing similarity of the classes of this dimension. At the beginning of this study, some experimentation was done with the original data-set of the month prior to the election and the month of June 2017. First, the classifier was trained on the election month and tested on June, and then the other way around, to find out whether the election data or the regular data would be better for training a classifier to recognize political affiliation. It turned out that the model trained on the election data performed terribly on the June data-set, but the other way around performed well. The election data was full of clues about the party affiliation of a candidate, while the June data had far fewer references to this dimension.

In the binary classification of incumbency, this could mean that the next month has more clues to incumbency or non-incumbency, but the data instances belonging to the two classes become more similar to each other before election time. For a model trained on the previous month, the classes are further apart and, therefore, better distinguishable than for a model trained on the month itself.

This would mean that incumbents adapt their tweets to be more similar to those of non-incumbents before election time, or the other way around. If we take into account that incumbents have an 'incumbency advantage' over non-incumbents (Levitt and Wolfram, 1997), this probably means that non-incumbents try to present themselves more like incumbents during election time, to compensate for the incumbency advantage.

6.2.4 Chance

Something similar happens with the chance dimension as with the incumbency dimension. There is a fall in predictive performance in the monthly predictions before the election, but a rise in predictive performance in the month-to-month predictions. If we follow the logic that was applied to the incumbency dimension, this would mean that the language use of candidates in the 'likely' and 'unlikely' classes becomes more similar prior to the election.

6.2.5 Age & Age(bin)

The predictive performance of age seems to stay the same or fall slightly prior to election time in the monthly graphs, and to stay the same or rise slightly in the month-to-month graphs. It also shows a sharp fall in accuracy right after the elections in all graphs except the weekly one. This suggests that the classes grow more similar to each other prior to election time, just as we see with the chance and incumbency dimensions. However, except for the sharp fall in April, both age dimensions show a lot of similarities with the gender dimension in most graphs. Age(bin) and gender are often very close together and follow a similar line to age (four-way). Therefore, it could be hypothesized that age traits are also being suppressed during election time.

6.3 General discussion

The research question for this study was defined as: 'How does the self-presentation of political candidates change diachronically during an election cycle?' To answer this question, we first have to establish that change in self-presentation can be measured with the approach that was adopted. The results of this study indicate that politicians emphasize their party and political position when election time approaches. The large increase in predictive performance for these dimensions, followed by a sharp fall, is a clear indication of change in self-presentation. This can also easily be linked to reality, since the time prior to an election is dominated by political campaigns, which emphasize the positions of a party. This result is, therefore, hardly surprising, but it does support the validity of the approach used in this study.

With this in mind, strong evidence was found that gender becomes less distinguishable prior to election time. Both the monthly and the month-to-month results show a fall in predictive performance on this dimension prior to the election. This is in line with what was expected based on the findings of Lee and Lim (2016) about the suppression of stereotypical female traits in politics. Therefore, it can be stated that the self-presentation of politicians changes through the suppression of gender traits during election time.

Furthermore, the research seems to indicate that the classes of the incumbency, age and chance dimensions grow more similar to each other during election time. In the case of incumbency, this can be linked to compensating for the 'incumbency advantage' as described by Levitt and Wolfram (1997). Chance is very closely linked to incumbency, because incumbents are often high on a party's list. Therefore, this might be the same mechanism at work. Age, on the other hand, is not linked to the political career of a candidate. But the age dimension also seems to be closely related to the gender dimension and could likewise be suppressed during election time.

6.4 Other interesting findings

When looking over the graphs, it becomes clear that almost all dimensions in all different train/test settings show a fall in accuracy in December. This month has an average amount of data, so this fall seems odd. December is a holiday month, so it could be hypothesized that tweets during the holiday season (like 'Happy New Year' or 'Merry Christmas') give fewer indications of the author's character traits and are, therefore, less useful for author profiling tasks.


7 CONCLUSION

This study investigated the change in self-presentation by politicians during an election cycle. A largely new approach was developed to perform a quantitative analysis of tweets written by politicians before, during and after the 2017 Dutch parliamentary election. The validity of this approach was supported by the expected curve in accuracy of the dimensions party affiliation and political position ('Left-right' and 'Progressive-conservative'), which shows that inferences about reality can be made with this approach based on predictive performance. However, the more interesting results are those of the dimensions 'gender' and 'age', for which this study found evidence of suppression during election time. The study also seems to indicate that the classes of the incumbency and chance dimensions grow closer to each other during election time. However, further study is needed to confirm this. These results give insight into how political candidates change their self-presentation during an election cycle.

7.1 Future work

This study investigates how accuracy changes for politicians in general, but the different dimensions also offer an excellent way to further isolate demographics and then measure differences in accuracy on other dimensions between these demographics. Is it harder to predict the gender of an incumbent or a non-incumbent? Is party affiliation better predictable in older age demographics? Is it easier to predict the gender of a progressive or a conservative? These kinds of questions can all be answered by splitting the data-set on one of the dimensions and then measuring the accuracy for each class on the other dimensions.

Another interesting thing to look into is how the data points shift before, during and after the election. This study found evidence that the classes of the incumbency, age and chance dimensions grow closer together, but does not specify in which direction this happens. Do non-incumbents grow more similar to incumbents, as was hypothesized, or is it the other way around? Are all age groups becoming more similar to the 30-40 demographic, or more similar to 50+? Also, it might be interesting to see which parties are becoming more similar and which grow more dissimilar.

Furthermore, it might be interesting to use the approach adopted in this study in other social science research. As illustrated in Section 6.4, this approach can detect a variety of behavioural patterns. In a world with an abundance of social media data, a quantitative approach like the one adopted in this study might be very effective in analyzing social phenomena because of its speed and scale.


BIBLIOGRAPHY

Adams, J., M. Clark, L. Ezrow, and G. Glasgow (2004). Understanding change and stability in party ideologies: do parties respond to public opinion or to past election results? British Journal of Political Science 34(4), 589–610.

Adams, J., A. B. Haupt, and H. Stoll (2009). What moves parties? The role of public opinion and global economic conditions in Western Europe. Comparative Political Studies 42(5), 611–639.

Alesina, A., T. Cassidy, and U. Troiano (2015). Old and young politicians. Economica.

Bos, R., K. Dekker, and H.-J. Setz. Rob's angels: Embedding and clustering for cross-genre gender prediction.

Burger, J. D., J. Henderson, G. Kim, and G. Zarrella (2011). Discriminating gender on Twitter. In Proceedings of the Conference on Empirical Methods in Natural Language Processing, pp. 1301–1309. Association for Computational Linguistics.

Burnap, P., R. Gibson, L. Sloan, R. Southern, and M. Williams (2016). 140 characters to victory? Using Twitter to predict the UK 2015 general election. Electoral Studies 41, 230–233.

Frermann, L. and M. Lapata (2016). A Bayesian model of diachronic meaning change. Transactions of the Association for Computational Linguistics 4, 31–45.

Goffman, E. (1959). The Presentation of Self in Everyday Life, Volume 13. Garden City, NY: Doubleday.

Halteren, H. v. and N. Speerstra (2014). Gender recognition on Dutch tweets.

Jaidka, K., N. Chhaya, and L. Ungar (2018). Diachronic degradation of language models: Insights from social media. In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 195–200.

Jungherr, A. (2016). Twitter use in election campaigns: A systematic literature review. Journal of Information Technology & Politics 13(1), 72–91.

Lee, J. and Y.-s. Lim (2016). Gendered campaign tweets: The cases of Hillary Clinton and Donald Trump. Public Relations Review 42(5), 849–855.

Levitt, S. D. and C. D. Wolfram (1997). Decomposing the sources of incumbency advantage in the US House. Legislative Studies Quarterly, 45–60.

Meeks, L. (2012). Is she "man enough"? Women candidates, executive political offices, and news coverage. Journal of Communication 62(1), 175–193.

Rangel, F. and P. Rosso (2013). Use of language and author profiling: Identification of gender and age. Natural Language Processing and Cognitive Science 177.

Rangel, F., P. Rosso, M. Koppel, E. Stamatatos, and G. Inches (2013). Overview of the author profiling task at PAN 2013. In CLEF Conference on Multilingual and Multimodal Information Access Evaluation, pp. 352–365. CELCT.

Sang, E. T. K. and J. Bos (2012). Predicting the 2011 Dutch Senate election results with Twitter. In Proceedings of the Workshop on Semantic Analysis in Social Media, pp. 53–60. Association for Computational Linguistics.

Sap, M., G. Park, J. Eichstaedt, M. Kern, D. Stillwell, M. Kosinski, L. Ungar, and H. A. Schwartz (2014). Developing age and gender predictive lexica over social media. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1146–1151.

Schütz, A. (1995). Entertainers, experts, or public servants? Politicians' self-presentation on television talk shows. Political Communication 12(2), 211–221.

Schwartz, H. A., J. C. Eichstaedt, M. L. Kern, L. Dziurzynski, S. M. Ramones, M. Agrawal, A. Shah, M. Kosinski, D. Stillwell, M. E. Seligman, et al. (2013). Personality, gender, and age in the language of social media: The open-vocabulary approach. PLoS ONE 8(9), e73791.

van der Goot, R. and G. van Noord (2017). MoNoise: Modeling noise using a modular normalization system. arXiv preprint arXiv:1710.03476.

Weren, E. R., A. U. Kauer, L. Mizusaki, V. P. Moreira, J. P. M. de Oliveira, and L. K. Wives (2014). Examining multiple features for author profiling. JIDM 5(3), 266–279.
