Analyzing Complexity of Social Media Communication by 2016 US Presidential Candidates



submitted in partial fulfillment for the degree of master of science

Leony Brok

10767215

master information studies data science

faculty of science university of amsterdam

2021-01-28

First Supervisor: Dr. S. Rudinac (UvA, FEB)
Second Supervisor: E.J.G. van Gerven MSc (UvA, FEB)


Analyzing Complexity of Social Media Communication by 2016

US Presidential Candidates

Leony Brok

leonybrok@gmail.com University of Amsterdam

ABSTRACT

Text complexity has a strong influence on how the conveyed message is perceived. Typically, simple messages are evaluated more favorably compared to complex messages. Politicians can use this effect to their advantage. How complex is the political communication on social media and does complexity influence the success of a message? In this paper, tweets by 2016 US presidential candidates Trump and Clinton are analyzed to answer these questions. The Flesch reading-ease score is used to determine the complexity of tweets by Trump and Clinton. Most tweets are easy or very easy to read. Our large-scale analysis shows that the tweets by Trump are slightly easier to read than tweets by Clinton. For Clinton, tweets that are easier to read get more likes and get retweeted more often. For Trump, the complexity of a tweet has no effect on the number of likes or retweets.

KEYWORDS

Text Complexity, Readability, Social Media, Twitter, Donald Trump, Hillary Clinton, Elections

1

INTRODUCTION

In recent years, social media has started to play an increasingly important role in American national politics. During the 2016 US Presidential Elections, the social media platform Twitter became particularly important [2, 10]. Twitter is a micro-blogging platform where users can send out short messages with a maximum of 140 characters¹. These messages are called tweets and can include URLs or have images or videos attached to them.

Both the Republican and the Democratic candidate in the 2016 Elections used Twitter as part of their campaign. Republican candidate Donald Trump often sent out insulting tweets and received a lot of media attention for this [13, 20]. Overall, he can be described as an "unconventional and spontaneous candidate" [2]. Democratic candidate Hillary Clinton also used Twitter. Unlike Trump, she used the medium to build her reputation as a "solid and experienced candidate" [2].

The way that candidates use language to express themselves has also drawn attention. In particular, the language used by Donald Trump stands out because it is much less complex compared to language used by other candidates [17, 35]. A more complex text is more difficult to understand. This relates to the concept of fluency, i.e. how easy or hard it is to process information. This is affected by numerous factors, including text complexity [32]. By analyzing text complexity, we can gain new insights into how politicians communicate about topics and how this affects the response to their communications.

¹ After the period of data collection, the limit was changed to 280 characters [31].

Figure 1: Average Flesch reading-ease score of tweets by Trump and Clinton in the twelve months before the election, per week. Higher score means easier to read

Previous research on the complexity of language used by candidates in the 2016 US Presidential Election has focused on traditional political communication, such as speeches and interviews [17]. In this study, the complexity of the Twitter communication by Hillary Clinton and Donald Trump during their time as presidential candidates is analyzed. This is done using data from Twitter gathered during the 2016 US Presidential Elections.

The complexity of tweets will be measured using readability scores. These scores are based on features of complex texts, such as the length of a sentence and the number of syllables in a word. As can be seen in Figure 1, the readability of tweets by both candidates (measured by the Flesch reading-ease score [12]) varies per week. In the final weeks before Election Day in November 2016, tweets by both candidates became easier to read.

The relationship between the complexity of a tweet and the popularity of a tweet will also be studied. The popularity of a tweet will be predicted with ridge regression models. On Twitter, the number of likes and retweets a tweet receives indicate how well a tweet is received by the general public; in other words, how ‘popular’ a tweet is [19]. ‘Liking’ a tweet can be done by clicking on the heart icon that is present near every tweet. It is a way of showing appreciation. ‘Retweeting’ on the other hand is done by clicking the retweet icon and is a way to share a tweet on your own timeline. When a tweet is retweeted often, it is possible for this tweet to go viral, because many Twitter users will see it in their timeline.

The research question that will be used in this project is: "Does Text Complexity of Tweets by US Presidential Candidates Influence Tweet Popularity?". The following subquestions will be used:


(1) How does the complexity of tweets by Clinton and Trump compare?

(2) Is there a relationship between tweet complexity and tweet topic for either candidate?

(3) Is there a relationship between variation in complexity and tweet topic for either candidate?

(4) Is there a relationship between the complexity of a tweet and the popularity of a tweet?

The related work section gives an overview of previous research on fluency and text complexity and on determining the topic of a tweet. The method section describes how the data was gathered, how the complexity and topic of a tweet were determined, how complexity was analyzed and how likes and retweets were predicted. This is followed by the section summarizing the results of extensive experimentation. In the discussion section, limitations of the conducted research are described and finally a conclusion is drawn.

2

RELATED WORK

This section provides an overview of recent literature on the following subjects that are relevant for this study. First, fluency and its role in communication will be discussed and related to text complexity. This is followed by a description of different ways to measure text complexity, with a focus on measuring text complexity on Twitter. Finally, ways to determine the topic of a tweet will be described.

2.1

Fluency

In communication literature, fluency refers to how easy or hard processing new information is. Fluency is influenced by many different variables, including how information is presented (text complexity, font, visibility), semantic relatedness of the information and previous exposure to the information [32]. Fluency is relevant in the context of communication and leadership because it influences how people evaluate stimuli. Fluent processing causes a more positive evaluation of the stimulus [26, 32, 38].

Different explanations exist as to what causes the relationship between high fluency and positive evaluations of stimuli. Fluent processing itself might cause a positive affective response [32], because high fluency is associated with "positive states of the environment or the cognitive system" [38]. This positive response then translates to a positive evaluation of the stimulus. Another explanation considers familiarity as the reason why high fluency fosters positive responses. High fluency can be interpreted as a sign that a stimulus has been encountered before and/or has a sense of familiarity, which is experienced as positive because people have an instinctual "fear of the unknown" [38].

2.1.1 Fluency and Social Media. Fluency has also been found to influence people’s behavior on social media. In a study on the shareability of tweets, fluency was found to be one of the factors influencing how likely participants were to share a tweet [21]. However, a study based on the Chinese platform Sina Weibo could not reproduce this effect [22].

The use of emojis also influences fluency [16]. In a study on the effect of emojis on Twitter, participants looked at different versions of tweets: with the original emoji (congruent condition), with a context-inappropriate emoji (incongruent condition), or without the emoji (neutral condition). Tweets with the original emoji were perceived as easier to understand and more believable, compared to tweets without an emoji or with an incongruent emoji. This effect can be explained by fluency: a congruent emoji makes processing a tweet easier, which in turn causes higher believability and shareability [7].

2.1.2 Fluency and Text Complexity. Fluency and text complexity are related concepts. A complex text is more difficult to process, meaning it is lower in fluency. This was studied by Oppenheimer [26], who conducted a series of experiments and found that, in line with the fluency research described before, "needless complexity leads to negative evaluations".

2.2

Text Complexity

How can we measure text complexity? This question has attracted the attention of many scholars. The goal of this section is to present an overview of complexity measures used in previous research.

2.2.1 Readability Scores. Most traditional readability scores were created with text comprehension in mind. They are based on surface features of a text such as the average length of sentences and words [14]. The assumption is that long sentences and long words indicate that a text is more difficult to read [33]. The features are weighted and combined in a score [5]. Most scores output the grade level required for understanding the text.

The Flesch reading-ease score is a very commonly used readability score. It is calculated with the average number of words per sentence and the average number of syllables per word [12]. Usual scores range from 0 to 100, where a higher score means a text is less complex. The maximum score is 121.22; there is no minimum score. See Appendix C for how to interpret the Flesch reading-ease score.

The Fog index is another commonly used readability score [8]. Like the Flesch score, it uses the average number of words per sentence. It also uses the percentage of hard words. Hard words are defined as words that are three syllables or longer. The Fog index outputs a score that is interpreted as the US grade level required to easily read the text.
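As a sketch, the Fog index described above can be computed directly, assuming the standard Gunning Fog weighting of 0.4 over the two components:

```python
def fog_index(total_words, total_sentences, hard_words):
    """Gunning Fog index: 0.4 * (avg words per sentence + % words with 3+ syllables).

    The result is interpreted as the US grade level required to read the text easily.
    """
    words_per_sentence = total_words / total_sentences
    percent_hard = 100 * hard_words / total_words
    return 0.4 * (words_per_sentence + percent_hard)

# One 10-word sentence containing one hard (3+ syllable) word:
print(fog_index(total_words=10, total_sentences=1, hard_words=1))  # -> 8.0
```

So a text with 10-word sentences and 10% hard words reads at roughly a US 8th-grade level.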

Instead of defining hard words as words with three syllables or more, like the Fog index does, they can also be defined as words that do not appear on a list with easy words. This is also called the ‘OOV rate’ which stands for Out Of Vocabulary rate. If a vocabulary of the most common words in a language is used for comparison, the expectation is that the OOV rate for a more complex text is higher compared to a less complex text. This is because a complex text will contain less common words [9]. The New Dale-Chall Readability score makes use of this method [4].
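The OOV rate described above is straightforward to sketch; the tiny easy-word set below is illustrative, whereas the New Dale-Chall score uses a curated list of roughly 3,000 familiar words:

```python
def oov_rate(text, easy_words):
    """Fraction of words not found in a set of 'easy' words (Out Of Vocabulary rate)."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    words = [w for w in words if w]
    oov = sum(1 for w in words if w not in easy_words)
    return oov / len(words)

easy = {"the", "cat", "sat", "on", "mat"}
print(oov_rate("The cat sat on the mat.", easy))             # -> 0.0
print(oov_rate("The perspicacious cat equivocated.", easy))  # 2 of 4 words OOV -> 0.5
```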

2.2.2 Measuring Text Complexity on Twitter. When applying complexity measures to messages from social networking sites (and Twitter in particular), characteristics of the platform that might limit the validity of the measures should be considered [16, 30]. This is especially the case if the measure was designed to work on traditional media sources or on longer texts.

One of the characteristics of Twitter is that messages are very short. When the social networking site started out, messages could not be longer than 140 characters. In 2017, this limit was changed to 280 characters [31]. Another characteristic of tweets is that abbreviations are common.

What have other researchers done in order to deal with these characteristics? Pogrebnyakov and Maldonado [29] used three existing text complexity measures (Flesch reading-ease score, Automated readability index and New Dale-Chall score) to analyze Facebook posts by US police departments. They argued that text size should not limit the validity of applying readability scores to social media messages, because the scores do not consider text size.

Abbreviations and (words collated in) hashtags might also be a problem, because words with more syllables or words not appearing on a list of easy words (relevant to the New Dale-Chall score, see section 2.2.1) are considered more complex by some scores [29]. Jacob and Uitdenbogerd [16] did not find a relationship between readability and tweets with and without hashtags, but they did find that mentions (using the @-sign to refer to other users) increased complexity. Temnikova, Vieweg and Castillo [33] also found that mentions increased complexity and they found the same results for hashtags, especially hashtags at the beginning of a tweet.

Furthermore, Pogrebnyakov and Maldonado [29] noted that because their research focuses on comparing scores of messages from a single source, instead of getting an absolute score or comparing messages from different sources, the measures used are an appropriate choice.

2.3

Tweet Topic

When we think about a text document, like a movie review or a newspaper article, we often think of the document having a certain subject or topic. The same can be said about tweets. For example, the following tweet by Trump from October 20, 2016 can be categorized as having ‘attack on opponent’ or ‘policy issue’ as topic:

"Crooked Hillary promised 200k jobs in NY and FAILED. We’ll create 25M jobs when I’m president, and I will DELIVER!"

Researchers have shown interest in uncovering the topic of a tweet. Topics of tweets can be used to group tweets together to discover correlations or patterns over time [2]. In the following paragraphs, different approaches to determine the topic of a tweet that are relevant to this study will be described.

2.3.1 Manual Classification. Content analysis is a way to get the purpose or meaning from documents in a corpus. This usually involves multiple human ‘coders’ adhering to a coding scheme which contains the rules for labelling documents and going through all documents manually. Content analysis is commonly used in the social sciences and has also been applied to tweets by Clinton and Trump. For example, Lee and Lim [18] used two coders to determine whether a tweet by Clinton or Trump mentions a gendered trait, a gendered issue and the type and content of a tweet. Buccoliero et al. [2] assigned tweets by Trump and Clinton to categories that emerged in the process of reading the tweets.

When done well, content analysis is transparent and reliable. However, it is also very costly because all of the work is done by hand and this can take up a lot of time. In studies with a corpus of multiple thousands of tweets, classifying topics by hand might take too much time or cost too much money. In this situation, automated text classification methods might offer a solution.

2.3.2 Automated Text Classification. Automated text classification methods include (but are not limited to) machine learning methods. An example of an automated classification method that does not rely on machine learning is the study by Pancer and Poole [27]. In this study, tweets by Trump and Clinton are categorized using the MeaningCloud plugin for Excel and an existing subject taxonomy designed for use with news objects. This way 64% of tweets were classified into a topic, with the remaining tweets being too short or too generic to classify.

A study by Fromm et al. [13] provides another example of an automated classification method that does not use machine learning. In this study, the most frequently used words in tweets by Clinton and Trump were categorized. Then tweets were classified based on the words in them. This method uses QDA Miner and WordStat software.

The machine learning field provides many methods to determine the topics of documents, also called topic modeling. An important step in these methods is to create a mathematical representation of the documents in a corpus. This representation can then be used to ‘learn’ with. The bag-of-words model is a common representation that consists of a vocabulary of all words in a corpus. Each document is represented by the count of words in that document.

In addition to a bag-of-words model, TF-IDF (term frequency–inverse document frequency) weighting can be used to represent how relevant a word is to a document. The higher the TF-IDF value of a word, the more relevant this word is to the document. The TF-IDF value is calculated by multiplying two values: the count of a word in a document (which was acquired when making the bag-of-words model) and the inverse document frequency, i.e. the total number of documents in the corpus divided by the number of documents in the corpus that contain that word. The latter is high for words that are rare in the corpus.
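The TF-IDF calculation above can be sketched in a few lines; this uses the common log-scaled variant of the inverse document frequency (the raw ratio can also be used):

```python
import math
from collections import Counter

def tfidf(corpus):
    """TF-IDF per document: term count * log(N / number of docs containing the term)."""
    n_docs = len(corpus)
    docs = [Counter(doc.lower().split()) for doc in corpus]
    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for counts in docs:
        df.update(counts.keys())
    return [{term: count * math.log(n_docs / df[term])
             for term, count in counts.items()} for counts in docs]

corpus = ["make america great again", "great debate tonight", "america votes tonight"]
scores = tfidf(corpus)
# "great" appears in 2 of 3 docs (IDF log(3/2)); "make" in only 1 (IDF log(3/1)),
# so "make" is weighted as more relevant to the first document:
print(scores[0]["make"] > scores[0]["great"])  # -> True
```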

Latent Dirichlet Allocation (LDA) is a very common topic modeling technique that uses a bag-of-words or TF-IDF representation of documents. One of the assumptions of this method is that every document in a corpus is composed of multiple topics. Because of this assumption, LDA does not work well when it is applied to a corpus of tweets [24]. Tweets are very short and often contain only one or even no identifiable topic.

General clustering methods can also be used for topic modeling. Clustering means grouping unlabeled data together. A clustering algorithm will try to find the best way to group data points. The resulting model can be used to determine the group or cluster a data point belongs to. Many different clustering algorithms exist. In theory, any of them could be applied to (a mathematical representation of) a corpus to cluster documents.
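As a toy illustration of the clustering idea, a minimal k-means (one of the simplest such algorithms, applied here to 2-D points standing in for document vectors) can be sketched as follows; a real study would use a library implementation:

```python
def kmeans(points, k, iters=20):
    """Minimal k-means: init centroids with the first k points, then alternate
    assignment and centroid updates."""
    centroids = [list(p) for p in points[:k]]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared Euclidean distance).
        labels = [min(range(k), key=lambda c: sum((a - b) ** 2
                  for a, b in zip(p, centroids[c]))) for p in points]
        # Move each centroid to the mean of its assigned points.
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Two obvious groups of 2-D "document vectors":
pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
print(kmeans(pts, k=2))  # -> [0, 0, 0, 1, 1, 1]
```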


Topic: Example Hash Tags
Slogans: #imwithher #strongertogether #hillary2016 #trumppence16 #maga
Debates: #demdebate #debatenight #gopdebate #debate
Nomination: #demconvention #iowacaucus #nyprimary #primaryday #supertuesday
Political Party: #gopdebate #rncincle
Negative about Opponent: #crookedhillary #neverhillary #rattledhillary #hillarycarefail
US Government: #sotu #scotus #draintheswamp #riggedsystem
Voting: #nationalvoterregistrationday #ohvotesearly #ivoted #voting #electionday
Minorities: #blackhistorymonth #hispanicheritagemonth #blacklivesmatter
Women's Rights: #shewon #equalpayday #womensequalityday
Foreign Policy: #brexitvote #venezuela #irandeal #turkey #nato
Geographic: #newyork #southcarolina #az
Other Policy Issues: #economy #jobs #veterans #immigration #fightforfamilies #raisethewage
Media: #msm #foxnews #media #snl
Event: #grammys #kentuckyderby #superbowlsunday
Miscellaneous: #nationalpantsuitday #throwbacktuesday #dumpmacys

Table 1: Topics used and examples of hash tags belonging to that topic

3

METHOD

3.1

Data

The data set is sourced from the Trump Twitter Archive². All tweets from the year before election day are included. The first tweet in the data set was sent out on November 8, 2015 at 6 pm and the last one on November 8, 2016 at 5 pm (US Eastern timezone).

The data set contains a total of 10,971 tweets, of which 6,002 are by Clinton and 4,969 by Trump. In addition to the text of the tweet, the following metadata are included: the source, the timestamp of the creation of the tweet, whether the tweet is a retweet from another Twitter user, the number of favorites, the number of retweets and a unique id number. Some tweets are missing. Tweets were collected a few times a day, so if a candidate created a tweet and deleted it before the next moment when tweets were collected, it is not included in the data set.

3.2

Text Complexity

The complexity of a tweet was measured with the use of readability scores. The Readability package for Python [6] was used for this. First, every tweet was tokenized into words, sentences and paragraphs. Then URLs were removed from the tweet. After these pre-processing steps the number of paragraphs, sentences, words, syllables and characters were counted. The number of complex words (according to the Dale-Chall index) was also counted. These counts were used to get the ratios of words per sentence, syllables per word and characters per word. These features were then used to calculate readability scores (see section 2.2.1).

All the features and readability scores were added to the data set of tweets and used in the exploratory analysis. However, for the regression models, the Flesch reading-ease score was used [12]. This score was chosen because it has been extensively evaluated. It is also the most commonly used score in research on readability on Twitter [11, 16, 29]. See Appendix C for a table that helps with the

² http://www.trumptwitterarchive.com

interpretation of the Flesch reading-ease score. The Flesch reading-ease score is calculated as follows:

FRE = 206.835 − 84.6 × (syllables / words) − 1.015 × (words / sentences)
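The formula can be sketched directly in Python; in the actual pipeline the counts come from the tokenizer, so the averages used below are illustrative (taken from the per-tweet averages reported for Clinton in section 4.1):

```python
def flesch_reading_ease(syllables, words, sentences):
    """Flesch reading-ease: higher scores mean a text is easier to read."""
    return 206.835 - 84.6 * (syllables / words) - 1.015 * (words / sentences)

# A hypothetical single sentence with Clinton's reported averages of
# 1.25 syllables per word and 12.48 words per sentence:
score = flesch_reading_ease(syllables=1.25 * 12.48, words=12.48, sentences=1)
print(round(score, 2))  # -> 88.42, close to Clinton's reported mean of 88
```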

3.3

Topic

On Twitter, a hash tag can be added to a tweet by adding the #-symbol before a word or phrase written as a continuous string of characters. Any word can be made into a hash tag and hash tags can be used anywhere in a tweet. Hash tags are used to categorize a tweet or to join a discussion [27, 39]. This way, hash tags allow users to easily find tweets on the same topic. This function of hash tags on Twitter is leveraged by this study to assign tweets to topics.

First, a list was made with all unique hash tags (N = 593) that were used in the tweets in the data set. Two coders assigned these hash tags to topics. The topics were chosen from a list which was roughly based on the topics used in studies by Lee and Xu [19] and by Fromm, Melzer, Ross and Stieglitz [13]. When the topics from this list were deemed insufficient, topics were added or modified. Not all topics from the list were used. The final list of topics that was used (with examples of hash tags belonging to each topic) can be found in Table 1. After all hash tags were associated with a topic, tweets were assigned to a topic based on the topic of the first hash tag in a tweet.
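The first-hash-tag assignment can be sketched as follows; the mapping dict is a small hypothetical excerpt of the coder-built mapping in Table 1:

```python
import re

# Hypothetical excerpt of the coder-built hash-tag-to-topic mapping (see Table 1):
HASHTAG_TOPICS = {
    "#maga": "Slogans",
    "#demdebate": "Debates",
    "#crookedhillary": "Negative about Opponent",
}

def tweet_topic(text):
    """Assign a tweet to the topic of its FIRST hash tag, if that tag was coded."""
    for tag in re.findall(r"#\w+", text.lower()):
        return HASHTAG_TOPICS.get(tag)  # only the first hash tag is considered
    return None

print(tweet_topic("Join the movement! #MAGA #demdebate"))  # -> Slogans
print(tweet_topic("No hash tags here"))                    # -> None
```

Note that a tweet whose first hash tag was not coded gets no topic, consistent with using only the first hash tag.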

Using this method, the topic of 15.2% (N = 911) of tweets by Clinton (N = 6,002) and 29.5% (N = 1,465) of tweets by Trump (N = 4,969) was determined. How often each topic occurs is plotted in Figure 2. There are large differences in the frequency of topics. Both candidates have topics that they have never tweeted about, and many topics are tweeted about less than 20 times.

3.4

Additional Features

In addition to text complexity and topic features, a number of other features (to later be included in a regression model) were created for every tweet. Whether the tweet contains a hash tag, mention (using the @-symbol) or URL is extracted using regular expressions and added as a Boolean variable. Whether Trump, Clinton, a Democratic candidate, a Republican candidate, or Barack Obama is mentioned in a tweet is also added as a Boolean variable.

Figure 2: Count of topics in tweets by Clinton and Trump

Some features related to date and time were also included. The feature ‘days before election’ counts the days before election day, meaning that November 8, 2016 is encoded as 0 and the day before as -1, the day before that as -2, et cetera, to the first day included in the data (November 8, 2015) which is encoded as -366.
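The encoding described above can be sketched with the standard library; election day and the data start date are taken from section 3.1:

```python
from datetime import date

ELECTION_DAY = date(2016, 11, 8)

def days_before_election(tweet_date):
    """Encode election day as 0 and each earlier day as a negative count."""
    return (tweet_date - ELECTION_DAY).days

print(days_before_election(date(2016, 11, 8)))  # -> 0
print(days_before_election(date(2016, 11, 7)))  # -> -1
print(days_before_election(date(2015, 11, 8)))  # -> -366 (2016 was a leap year)
```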

3.5

Analyzing Complexity

3.5.1 Comparing Complexity. To compare the complexity of the tweets by Clinton and Trump, an independent two sample t-test was conducted. Cohen’s d was calculated to measure effect size. On top of this, a box plot was created to show the distribution of the readability of tweets by Clinton and Trump. Finally, the average number of syllables per word and words per sentence were compared. The Flesch reading-ease score is based on these scores (see section 3.2), so this comparison will provide further insight into why the complexity of tweets by Clinton and Trump differs or does not differ.
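The two statistics above can be sketched in plain Python, assuming the common pooled-variance forms of the independent t-test and Cohen's d (in practice a library such as scipy would be used):

```python
import math

def ttest_and_cohens_d(a, b):
    """Independent two-sample t-test (pooled variance) plus Cohen's d effect size."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    va = sum((x - ma) ** 2 for x in a) / (na - 1)  # sample variances
    vb = sum((x - mb) ** 2 for x in b) / (nb - 1)
    pooled_var = ((na - 1) * va + (nb - 1) * vb) / (na + nb - 2)
    t = (ma - mb) / math.sqrt(pooled_var * (1 / na + 1 / nb))
    d = (ma - mb) / math.sqrt(pooled_var)  # mean difference in pooled-SD units
    return t, d

t, d = ttest_and_cohens_d([1, 2, 3], [2, 3, 4])
print(round(d, 2))  # -> -1.0 (the first group scores one pooled SD lower)
```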

3.5.2 Complexity, Variation and Topic. A chart showing the average Flesch reading-ease score of all tweets in a topic was created for both candidates (see Figure 5). Only topics with 20 tweets or more were shown. The charts allow easy comparison of the readability of different topics. Additionally, tables with the same topics were created for both candidates (see Appendix B). These tables contain the number of tweets in the topic, the mean, variation and kurtosis of the Flesch reading-ease score of tweets in the topic and the mean of likes and retweets of tweets in the topic.

3.6

Predicting Likes and Retweets

One of the goals of this study is to find out whether characteristics of a tweet influence the popularity of a tweet (subquestion 4). The number of likes and retweets of a tweet can be used as a proxy for the popularity of a tweet [19]. Characteristics of a tweet include how easy to read it is, what the topic of a tweet is and whether a mention of (another) Twitter user is included. Multiple regression models were fitted to determine the effects of these characteristics. Regression models have successfully been used in previous research to predict the popularity of a tweet [19, 27, 36].

Specifically, ridge regression was used. Ridge regression is a variation of Ordinary Least Squares regression that includes regularization. The goal of regularization is to reduce over-fitting. In the case of ridge regression this is done by adding a penalty term to the loss function. This term ‘punishes’ high coefficients, thus mitigating the effects of multicollinearity and reducing model complexity. How much influence the penalty term has is determined by the value of alpha: the higher alpha, the bigger the penalty. The value of alpha is not determined by learning, but has to be chosen manually or by cross validation. In this study, leave-one-out cross validation was used to determine the value of alpha that leads to the results with the lowest Mean Squared Error.

3.6.1 Pre-processing. Features were pre-processed before training. In the case of categorical features (such as ‘topic’), pre-processing is necessary to be able to include them in the model. The topic of a tweet is a category with a label. However, the ridge regression model can only learn from numerical features. Thus, all categorical features were transformed to numerical features in the form of dummy variables. Every category of a categorical feature is represented by a dummy variable. This dummy variable can take two values: 0 if the category is absent and 1 if it is present. For example, the feature ‘topic’ can take on one of 16 values. For this feature, 16 dummy variables were created. Every tweet has a value of 1 in one of those dummy variables and 0 in all others.
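The dummy-variable expansion can be sketched as a small one-hot encoder (libraries such as pandas provide this as get_dummies; the topic names below are illustrative):

```python
def one_hot(values):
    """Expand a categorical feature into one 0/1 dummy variable per category."""
    categories = sorted(set(values))
    return [{f"topic_{c}": int(v == c) for c in categories} for v in values]

rows = one_hot(["Debates", "Slogans", "Debates"])
print(rows[0])  # -> {'topic_Debates': 1, 'topic_Slogans': 0}
```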

In the case of ridge regression, some form of standardization or normalization is necessary. This is because the model is sensitive to the scale of variables, meaning that measuring a variable on a different scale will lead to a different coefficient. Normalization ensures that features that are measured on different scales contribute equally to the training of the model. In this study, all numerical features were normalized using Min-Max Scaling. This means that they were scaled to a range of 0 to 1, so the lowest value of a feature is 0 and the highest value is 1.

Furthermore, output features were log transformed. Both the number of likes and the number of retweets were used as an output feature in regression models. Tweets have a lower boundary: they cannot have fewer than 0 likes or retweets. For most tweets the number of likes and retweets is in a range close to other tweets, but the numbers can be very high for tweets that go viral. Because of this, both features are strongly positively skewed. Taking the natural logarithm of these features results in a more balanced distribution.

Figure 3: Distribution of likes per tweet by Clinton before and after logarithmic transformation

After training, a coefficient is learned for each feature. Before these coefficients can be used to interpret the regression models, they have to be transformed back. Because the output features were log transformed before training, the coefficients are transformed back by exponentiating them, since this is the inverse of the log transformation.

[Figure 4 shows box plots of five grade-level readability scores (Kincaid, ARI, Coleman-Liau, Fog, SMOG; higher is more difficult to read) alongside the Flesch reading-ease score, for Clinton and Trump.]

Figure 4: Box plot for Flesch reading-ease score, mean is indicated by white dashed line

After pre-processing, the data was randomly split up in a training and a testing set. 80% of the data was assigned to the training set and the remaining 20% was assigned to the test set. During the training phase, only the data in the training set is used. After the model is done training, the model is tested on the testing set. This strict separation of the data set ensures that the final evaluation of the model is unbiased. The ‘random state’ variable is used when dividing the data in a train and a test set so all models use the same train and test set.

Multiple models were created: separate models were created for the number of likes as outcome variable and for the number of retweets as outcome variable. Separate models were also created for Trump and for Clinton. This results in a total of 4 different models. All models use the same features: Flesch reading-ease score, whether a URL, mention or hash tag is present, whether the other candidate, another Republican or Democratic candidate, or Obama is mentioned, the topic of the tweet and the number of days before the election. The Flesch reading-ease score and topic of a tweet were used as features because they are the focus of this study. The features whether a URL, mention or hash tag is present and whether a candidate or President Obama is mentioned were included because they might offer interesting insights into what determines the popularity of a tweet. Finally, the number of days before the election was included as a feature because there is a clear relation with the outcome variables.

4

RESULTS

4.1

Comparing Complexity

How similar or different are tweets by Clinton and Trump with regards to complexity? A box plot (Figure 4) shows the distribution of the readability of tweets by both candidates. In addition to the Flesch reading-ease score, five other readability scores are plotted. These scores use grade level to indicate how easy a text is to read. A higher grade level means a text is more difficult to read.

Tweets by both candidates have a very similar distribution of readability. For both candidates, the majority of tweets (between the lower and upper quartiles) have a Flesch reading-ease score between 75 and 105. A higher Flesch reading-ease score indicates that a text is easier to read. Scores between 80 and 90 are interpreted as ‘easy to read’ (see Appendix C for how to interpret scores). Texts with these scores are easy to read for children around the age of twelve. Scores between 90 and 100 are ‘very easy to read’ and easy to understand for children around the age of eleven.

Tweets by both Clinton and Trump have a median Flesch reading-ease score of 89. The white dashed line shows the mean of the reading scores. This is 88 for Clinton and 91 for Trump. The t-test (t = 5.71, p < 0.001) shows that the difference in readability is significant. However, the effect size as measured by Cohen’s d is 0.11, which indicates that the effect is very small. Tweets by Trump are easier to read than tweets by Clinton, but only slightly.

The Flesch reading-ease score is based on the number of words per sentence and the number of syllables per word (see section 3.2). Comparing these values offers more insight into why the Flesch reading-ease score is higher for Trump than for Clinton. Tweets by Trump have an average of 11.20 words per sentence, compared to 12.48 words per sentence for Clinton. The words used in tweets by Trump have 1.23 syllables per word on average, compared to 1.25 for tweets by Clinton. The difference in Flesch reading-ease score is thus mostly due to tweets by Clinton having more words per sentence.


4.2 Complexity and Topic

The bar plots (Figure 5) show the average Flesch reading-ease score for every topic that a candidate has tweeted about at least 20 times. Exact values can be found in the table in Appendix B. There is some variation in the average scores per topic, but all averages are well within the interquartile range of readability (see section 4.1).

For Clinton, the topic with the lowest readability score (more difficult to read) is ‘minorities’. The topic ‘voting’ also has a relatively low score. The topic with the highest score is ‘political party’, followed by the topic ‘US government’. For Trump, the topic with the lowest score is ‘negative about opponent’, closely followed by the topic ‘other policy issues’. ‘US government’ is the topic with the highest readability score. On average, tweets about all topics are easy or very easy to read, except for tweets by Clinton on the topic ‘minorities’: these are fairly easy to read (see Appendix C for how to interpret readability scores).

4.2.1 Variation in Complexity and Topic. Comparing the topics based on the variation of complexity goes one step further than comparing based on complexity alone. See Appendix B for a table with the variance, kurtosis and more per topic for Trump and Clinton. Some topics have greater variance than others. Trump’s tweets with the topic ‘negative about opponent’ have greater variance compared to other topics. ‘Media’ is a topic with a relatively lower variance. Overall, variance of complexity within topics is lower for Clinton than for Trump. For Clinton, the topics with the highest variation in complexity are ‘debates’ and ‘minorities’. The topic with the lowest variance is ‘nomination’.

Figure 5: Average Flesch reading-ease score per topic, for every topic with 20 or more tweets, for Clinton (top) and Trump (bottom)

4.3 Predicting Likes and Retweets

The R² score was used to interpret the predictive power of the models. This score indicates how well a model is able to predict the value of the outcome variable. More precisely, R² is the proportion of variance in the outcome variable that is explained by the input features in the model. The higher this proportion, the better the model is able to predict the outcome variable. An R² value of 0 (or 0%) means the model explains none of the variance in the outcome variable and a value of 1 (or 100%) means the model explains all variance in the outcome variable.
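The definition of R² above can be written out directly. The following illustrative implementation is equivalent to scikit-learn’s `r2_score`, which the thesis models use via `model.score`:

```python
import numpy as np

def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 minus unexplained over total variance."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = ((y_true - y_pred) ** 2).sum()          # residual (unexplained) sum of squares
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()   # total sum of squares
    return 1 - ss_res / ss_tot

print(r_squared([1, 2, 3, 4], [1.1, 1.9, 3.2, 3.9]))  # close to 1: good predictions
```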

The exponentiated coefficients were used to interpret the influence of individual features. The higher the (absolute) coefficient, the more this feature contributes to the model. A positive coefficient means that a higher value of the input feature is associated with a higher value of the output feature. A negative coefficient means a higher value of the input feature is associated with a lower value of the output feature.

The models are visualized in Figure 6 (coefficient values can be found in Appendix D). Separate plots for the models for predicting likes and for predicting retweets show the coefficient magnitudes. In each plot, the red circles show the coefficients for Trump and the blue triangles show the coefficients for Clinton. The value for alpha that was chosen using leave-one-out cross validation and the test score are also shown.
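A minimal sketch of how such a model could be fitted with scikit-learn [28]: `RidgeCV` with its default `cv=None` selects alpha by an efficient leave-one-out cross-validation over a candidate grid. The data here is synthetic, not the actual tweet features used in the thesis:

```python
import numpy as np
from sklearn.linear_model import RidgeCV

# Synthetic stand-in for the tweet feature matrix and (log-transformed) like counts.
rng = np.random.default_rng(42)
X = rng.normal(size=(200, 4))
y = X @ np.array([2.0, -1.0, 0.5, 0.0]) + rng.normal(scale=0.5, size=200)

# cv=None (the default) triggers efficient leave-one-out cross-validation over alphas.
model = RidgeCV(alphas=np.arange(0.5, 10.5, 0.5)).fit(X, y)
print(model.alpha_)       # selected regularization strength
print(model.coef_)        # per-feature coefficients, as plotted in Figure 6
print(model.score(X, y))  # R^2 of the fit
```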

4.3.1 Predicting Likes. The model predicting the number of likes that tweets by Clinton received was trained with an alpha value of 5.5 and reached a test score of 0.36. The three features with the highest coefficients in this model are the number of days before the election, the Flesch reading-ease score and whether President Obama is mentioned. The coefficient for the feature ‘days before the election’ is positive. A higher value of this feature indicates being closer to election day, so the positive coefficient indicates that tweets receive more likes the closer they are posted to election day. The coefficient for the Flesch reading-ease score is also positive: the higher the score (indicating a text is easier to read), the more likes a tweet receives. Mentioning President Obama also has a positive effect on the number of likes.

The model to predict the number of likes on tweets by Trump was trained with an alpha value of 5.0. This model reached a test score of 0.67. The three features with the highest coefficients in the model are the number of days before the election, whether the topic of the tweet is ‘minorities’ and whether the tweet contains a mention. However, it has to be noted that only 1 tweet by Trump has ‘minorities’ as its topic. When this topic is disregarded, whether a url is present in the tweet ranks third among the features with the highest coefficients. Both the presence of a mention and the presence of a url have a negative coefficient, meaning that a tweet that includes a mention or a url gets fewer likes than a tweet without one.

The Flesch reading-ease score has a very small negative effect on the number of likes of a tweet. In other words, tweets with higher scores (that are easier to read) receive slightly fewer likes than tweets with lower scores (that are more difficult to read).

For both Clinton and Trump, the topic of a tweet has little to no effect on the number of likes a tweet receives. Some topics have small effects, such as ‘nomination’ and ‘slogans’ for Clinton, which


Figure 6: Coefficients of the Ridge Regression Models


both have a negative effect on the number of likes. Furthermore, both for Clinton and for Trump, the number of days before the election is the feature that contributes most to the model. However, the feature is (relatively) more important for Trump’s model compared to Clinton’s model.

Another difference between the models for Clinton and Trump is that while the Flesch reading-ease score is the second most important variable that predicts likes for Clinton, this feature has little to no predictive power in the model for Trump. There is also a large difference in the predictive power of both models. Where the model for the number of likes on tweets by Clinton only reaches a test score of 0.36, this is 0.67 for the model for Trump.

4.3.2 Predicting Retweets. The model to predict how often tweets by Clinton get retweeted was trained with an alpha value of 6.5. This model reached a test score of 0.32. In this model, the three features with the highest coefficients are the number of days before the election, the Flesch reading-ease score and whether President Obama is mentioned. These are the same features that contribute most in the model to predict likes (see section 4.3.1).

The model predicting the number of retweets on tweets by Trump was trained with an alpha value of 6.0 and reached a test score of 0.62. The three features with the highest coefficients are the number of days before the election, whether the topic of the tweet is ‘minorities’ and whether the tweet contains a mention. These are also the same features that contribute most in the model to predict likes (see section 4.3.1). When the feature ‘topic minorities’ is disregarded, whether President Obama is mentioned in a tweet takes the third place. Tweets by Trump that mention President Obama get retweeted more than tweets that do not mention Obama.

It is interesting to note that mentioning Trump has a positive effect for Clinton (meaning it results in more retweets) and a negative effect for Trump. If Clinton is mentioned in a tweet, this has a positive effect for Trump and a negative effect for Clinton. In other words, for both candidates, mentioning the other candidate results in more retweets, while mentioning yourself results in fewer retweets.

On top of that, all similarities and differences between the models for Clinton and Trump for predicting likes described before (see section 4.3.1) also hold for the models for predicting retweets. The topic of a tweet has little to no effect, the number of days before the election contributes most to the models, the Flesch reading-ease score is important for Clinton but not for Trump, and there is a large difference in the predictive power of the models for Clinton and Trump, with the models for Trump performing better.

5 DISCUSSION

Many different ways exist to determine the topic of a text. Manual and automated classification methods such as using human coders, Latent Dirichlet Allocation and clustering have been described (see section 2.3). In this paper, hashtags were used to determine the topic of a tweet. A consequence of this is that only tweets with a hashtag were assigned to a topic. Only 15.2% (N = 911) of tweets by Clinton and 29.5% (N = 1,465) of tweets by Trump contain a hashtag. All other tweets are left out of the topic classification, so nothing is known about their topics. Not having information on the topic of such a large proportion of tweets is a limitation of the chosen method. For example, the average Flesch reading-ease score for a topic or the contribution of a topic to a ridge regression model might change considerably if more tweets were assigned to that topic.
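The exact preprocessing is in the linked repository (Appendix A); as an illustration only, hashtag extraction from a tweet’s text could be sketched as follows, where the regex is an assumption rather than the thesis code:

```python
import re

def extract_hashtags(tweet):
    # Illustrative: treat a hashtag as '#' followed by a run of word characters.
    return re.findall(r"#(\w+)", tweet)

print(extract_hashtags("Thank you Iowa! #MakeAmericaGreatAgain #Trump2016"))
# -> ['MakeAmericaGreatAgain', 'Trump2016']
```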

The medium Twitter dictates that all messages on it follow a specific format. Tweets have a character limit of 140, and features like hashtags and mentions follow a specific syntax. On top of these formal restrictions, informal rules about how to use the platform guide how tweets are written. Because of both the formal and informal rules of Twitter, tweets tend to be similar to each other. This has implications for their complexity and might explain why the difference in complexity between tweets by Trump and Clinton was found to be very small, despite large differences in the complexity of the candidates’ communication in interviews and debates [17]. Moreover, in a study where participants were asked to rate the difficulty of tweets, most tweets were rated as ‘very easy’ [16]. This indicates that there is little variation in the complexity of tweets overall. If all tweets are similar to each other with regards to complexity, the results of methods that compare complexity or rely on variation in complexity will be limited as well.

6 CONCLUSION

Tweets by Donald Trump and Hillary Clinton are mostly easy or very easy to read. Previous research [17, 35] shows that the language Trump uses is less complex than language used by other candidates, based on speeches, interviews and debates. This difference is also seen in communication on Twitter: tweets by Trump are easier to read than tweets by Clinton. However, the difference is very small.

The average readability per topic was compared to find out whether candidates use more complex language when tweeting about some topics and less complex language when tweeting about others. There is some variation in complexity across different topics. Both Trump and Clinton use less complex language when tweeting about the US government compared to other topics. Clinton uses more complex language in tweets about minorities, while Trump uses more complex language in tweets that are negative about Clinton. Topics do not only vary in complexity, but also in how much the complexity varies within a topic. Overall, tweets by Clinton vary less in complexity per topic compared to tweets by Trump.

Is there a relationship between the complexity of a tweet and the popularity of a tweet? Two ridge regression models were trained for each candidate (resulting in four models total) to answer this question: one model to predict the number of likes a tweet receives and one model to predict the number of times a tweet is retweeted. Overall, there is little difference between the models to predict likes and the models to predict retweets. The models are similar both in test scores and in the coefficients of their features. This is expected, because the numbers of likes and retweets are strongly correlated and because both likes and retweets are seen as a proxy for the popularity of a tweet.

The best predictor of the popularity of a tweet is the number of days before the election: the closer to the election, the more a tweet is liked and retweeted. For both Trump and Clinton, this


feature contributes a lot more to the model than all other features. Moreover, this contribution is even larger for the models for Trump than for the models for Clinton.

For this study, the complexity of the language of a tweet is the feature of most interest. For Trump, the readability of a tweet does not influence its popularity. For Clinton, however, the easier a tweet is to read, the more likes and retweets it receives. The topic is another aspect of a tweet that might explain why one tweet is more popular than another, but this feature is found to have little influence on popularity. Whether a tweet contains a hashtag also does not influence popularity.

ACKNOWLEDGEMENTS

I would like to thank my supervisors Stevan Rudinac and Emma van Gerven for their support, insights and unfailing enthusiasm during this thesis.


A SOFTWARE

For analyzing the data, Python is used in a Jupyter Notebook [34]. The following Python packages are also used: pandas [23], numpy [25], scikit-learn [28], nltk [1], unidecode [3], readability [6], pyplot [15] and seaborn [37].

The code for this project can be found in this public GitHub repository: https://github.com/brleony/Thesis-Analyzing-Complexity

B READABILITY AND POPULARITY PER TOPIC

Trump:
Topic                  N     FRE M  FRE Var  FRE Kurt  Mean Likes  Mean RTs
No Topic               3508  88     753      3.09      15779       5942
Slogans                860   101    634      0.62      12838       4796
US Government          116   103    638      0.57      18782       9739
Debates                107   94     804      1.40      14476       7447
Nomination             95    99     788      -0.47     9375        3320
Miscellaneous          81    89     840      1.02      14211       5713
Media                  46    88     454      0.63      10633       3809
Neg. About Opp.        39    82     940      1.50      16082       7602
Other Policy Issues    32    83     642      0.24      16228       7460
Geographic             25    92     426      4.18      4196        2889

Clinton:
Topic                  N     FRE M  FRE Var  FRE Kurt  Mean Likes  Mean RTs
No Topic               5093  88     571      1.45      5235        2667
Debates                345   89     747      1.79      9037        4983
Nomination             107   89     389      0.43      1988        1017
Slogans                98    95     536      0.57      2013        2199
Political Party        85    98     528      -0.05     5586        3126
Miscellaneous          62    90     579      0.10      3766        2343
Women’s Rights         47    93     505      1.80      5603        3020
Other Policy Issues    41    86     492      0.25      2930        1743
Minorities             31    78     752      -0.54     4876        3050
US Government          25    95     647      0.55      3862        1809
Voting                 24    82     462      -0.22     5581        5451

Table 2: Flesch reading-ease score statistics (mean, variance, kurtosis) and mean number of likes and retweets per topic, for topics with 20 or more tweets, for Trump (top) and Clinton (bottom)

C READABILITY INTERPRETATION

Score     Description       Grade Level
0 - 30    Very difficult    College graduate
30 - 50   Difficult         College
50 - 60   Fairly difficult  Grade 10 to 12
60 - 70   Standard          Grade 8 and 9
70 - 80   Fairly easy       Grade 7
80 - 90   Easy              Grade 6
90 - 100  Very easy         Grade 5

Table 3: How to interpret the Flesch reading-ease score [6]

D RIDGE REGRESSION COEFFICIENTS

                           Likes             Retweets
Feature                    Trump    Clinton  Trump    Clinton
Flesch Reading-Ease        -0.073   1.456    -0.092   0.941
Presence Url               -0.335   0.042    -0.176   0.187
Presence Mention           -0.339   -0.102   -0.376   -0.169
Presence Hashtag           0.060    0.032    0.081    0.052
Days Before Election       6.966    4.817    5.725    3.320
Mention Trump              -0.215   0.101    -0.210   0.340
Mention Clinton            0.111    -0.202   0.294    -0.272
Mention Dem Candidate      0.221    0.144    0.206    0.221
Mention Rep Candidate      0.002    0.006    0.054    0.004
Mention Obama              0.250    0.827    0.362    0.779
Topic Debates              -0.119   0.265    -0.058   0.312
Topic Event                0.179    0.077    0.133    -0.009
Topic Foreign Policy       0.075    0.245    0.144    0.403
Topic Geographic           0.192    -        0.115    -
Topic Media                -0.007   -0.031   -0.034   -0.005
Topic Minorities           0.545    0.258    0.570    0.269
Topic Miscellaneous        -0.016   -0.025   -0.004   -0.060
Topic Neg. About Opp.      -0.181   -        -0.177   -
Topic No Topic             -0.025   -0.099   -0.030   -0.122
Topic Nomination           0.036    -0.275   -0.098   -0.281
Topic Other Policy Issues  -0.109   -0.015   -0.036   0.017
Topic Political Party      -0.004   0.071    -0.047   0.044
Topic Slogans              0.060    -0.373   -0.028   -0.374
Topic US Government        -0.274   0.131    -0.146   0.140
Topic Voting               -0.121   -0.066   -0.114   -0.074
Topic Women’s Rights       -        0.085    -        0.044

Table 4: Coefficient values for all features in the ridge regression models for predicting likes and retweets for Clinton and Trump


REFERENCES

[1] Steven Bird, Ewan Klein, and Edward Loper. 2009. Natural language processing with Python: analyzing text with the natural language toolkit. O’Reilly Media, Inc.

[2] Luca Buccoliero, Elena Bellio, Giulia Crestini, and Alessandra Arkoudas. 2020. Twitter and politics: Evidence from the US presidential elections 2016. Journal of Marketing Communications 26, 1 (2020), 88–114.

[3] Sean M. Burke and Tomaz Solc. 2008. Unidecode. https://github.com/avian2/ unidecode

[4] Jeanne Sternlicht Chall and Edgar Dale. 1995. Readability revisited: The new Dale-Chall readability formula. Brookline Books.

[5] Kevyn Collins-Thompson. 2014. Computational assessment of text readability: A survey of current and future research. ITL-International Journal of Applied Linguistics 165, 2 (2014), 97–135.

[6] Andreas van Cranenburgh. 2013. Readability. https://github.com/andreasvc/ readability

[7] Thomas A Daniel and Alecka L Camp. 2018. Emojis affect processing fluency on social media. Psychology of Popular Media Culture (2018).

[8] William H DuBay. 2004. The Principles of Readability. Online Submission (2004).
[9] Carsten Eickhoff, Pavel Serdyukov, and Arjen P. de Vries. 2011. A Combined Topical/Non-Topical Approach to Identifying Web Sites for Children. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining (Hong Kong, China) (WSDM ’11). Association for Computing Machinery, New York, NY, USA, 505–514. https://doi.org/10.1145/1935826.1935900

[10] Gunn Enli. 2017. Twitter as arena for the authentic outsider: exploring the social media campaigns of Trump and Clinton in the 2016 US presidential election. European journal of communication 32, 1 (2017), 50–61.

[11] Lucie Flekova, Daniel Preoţiuc-Pietro, and Lyle Ungar. 2016. Exploring stylistic variation with age and income on twitter. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers). 313–319.

[12] Rudolph Flesch. 1948. A new readability yardstick. Journal of applied psychology 32, 3 (1948), 221.

[13] Jennifer Fromm, Stefanie Melzer, Björn Ross, and Stefan Stieglitz. 2017. Trump versus Clinton: Twitter communication during the US primaries. In European Network Intelligence Conference. Springer, 201–217.

[14] Norbert Fuhr, Anastasia Giachanou, Gregory Grefenstette, Iryna Gurevych, Andreas Hanselowski, Kalervo Järvelin, Rosie Jones, Yiqun Liu, Josiane Mothe, Wolfgang Nejdl, Isabella Peters, and Benno Stein. 2018. An Information Nutritional Label for Online Documents. SIGIR Forum 51, 3 (Feb. 2018), 46–66. https://doi.org/10.1145/3190580.3190588

[15] John D Hunter. 2007. Matplotlib: A 2D graphics environment. Computing in science & engineering 9, 3 (2007), 90–95.

[16] Patrick Jacob and Alexandra L Uitdenbogerd. 2019. Readability of Twitter Tweets for Second Language Learners. In Proceedings of the The 17th Annual Workshop of the Australasian Language Technology Association. 19–27.

[17] Orly Kayam. 2018. The readability and simplicity of Donald Trump’s language. Political Studies Review 16, 1 (2018), 73–88.

[18] Jayeon Lee and Young-shin Lim. 2016. Gendered campaign tweets: the cases of Hillary Clinton and Donald Trump. Public Relations Review 42, 5 (2016), 849–855. [19] Jayeon Lee and Weiai Xu. 2018. The more attacks, the more retweets: Trump’s and Clinton’s agenda setting on Twitter. Public Relations Review 44, 2 (2018), 201–213.

[20] Jasmine C. Lee and Kevin Quealy. 2019. The 598 People, Places and Things Donald Trump Has Insulted on Twitter: A Complete List. https://www.nytimes.com/interactive/2016/01/28/upshot/donald-trump-twitter-insults.html
[21] Huaye Li, Yasuaki Sakamoto, Rongjuan Chen, and Yuko Tanaka. 2014. The psychology behind people’s decision to forward disaster-related tweets. Howe School Research Paper 2014-36 (2014).

[22] Jilei Lin, Yipei Huang, Ying Gao, and Rongjuan Chen. 2017. Predicting Information Popularity: A Study of Sina Weibo. In Proceedings of the 2017 2nd International Conference on Communication and Information Systems. 335–339.

[23] Wes McKinney et al. 2010. Data structures for statistical computing in python. In Proceedings of the 9th Python in Science Conference, Vol. 445. Austin, TX, 51–56.
[24] Rishabh Mehrotra, Scott Sanner, Wray Buntine, and Lexing Xie. 2013. Improving LDA topic models for microblogs via tweet pooling and automatic labeling. In Proceedings of the 36th international ACM SIGIR conference on Research and development in information retrieval. 889–892.

[25] Travis E Oliphant. 2006. A guide to NumPy. Vol. 1. Trelgol Publishing USA.
[26] Daniel M Oppenheimer. 2006. Consequences of erudite vernacular utilized irrespective of necessity: Problems with using long words needlessly. Applied Cognitive Psychology: The Official Journal of the Society for Applied Research in Memory and Cognition 20, 2 (2006), 139–156.

[27] Ethan Pancer and Maxwell Poole. 2016. The popularity and virality of political social media: hashtags, mentions, and links predict likes and retweets of 2016 US presidential nominees’ tweets. Social Influence 11, 4 (2016), 259–270.

[28] F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, J. Vanderplas, A. Passos, D. Cournapeau, M. Brucher, M. Perrot, and E. Duchesnay. 2011. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research 12 (2011), 2825–2830.
[29] Nicolai Pogrebnyakov and Edgar Maldonado. 2018. Didn’t roger that: Social media message complexity and situational awareness of emergency responders. International Journal of Information Management 40 (2018), 166–174.
[30] Marten Risius and Theresia Pape. 2016. Developing and Evaluating a Readability Measure for Microblogging Communication. In E-Life: Web-Enabled Convergence of Commerce, Work, and Social Life, Vijayan Sugumaran, Victoria Yoon, and Michael J. Shaw (Eds.). Springer International Publishing, Cham, 217–221.
[31] Aliza Rosen. 2017. Tweeting Made Easier. https://blog.twitter.com/en_us/topics/product/2017/tweetingmadeeasier.html

[32] Norbert Schwarz. 2011. Feelings-as-information theory. In Handbook of theories of social psychology, A. Kruglanski P. Van Lange and E. T. Higgins (Eds.). Sage, 289–308.

[33] Irina Temnikova, Sarah Vieweg, and Carlos Castillo. 2015. The Case for Readability of Crisis Communications in Social Media. In Proceedings of the 24th International Conference on World Wide Web (Florence, Italy) (WWW ’15 Companion). Association for Computing Machinery, New York, NY, USA, 1245–1250. https://doi.org/10.1145/2740908.2741718

[34] Guido Van Rossum and Fred L. Drake. 2009. Python 3 Reference Manual. CreateSpace, Scotts Valley, CA.

[35] Yaqin Wang and Haitao Liu. 2018. Is Trump always rambling like a fourth-grade student? An analysis of stylistic features of Donald Trump’s political discourse during the 2016 election. Discourse & Society 29, 3 (2018), 299–323.

[36] Yu Wang, Jiebo Luo, Richard Niemi, Yuncheng Li, and Tianran Hu. 2016. Catching fire via “likes”: Inferring topic preferences of Trump followers on Twitter. In Tenth International AAAI Conference on Web and Social Media.

[37] Michael Waskom, Olga Botvinnik, Joel Ostblom, Saulius Lukauskas, Paul Hobson, Maoz Gelbart, David C Gemperline, Tom Augspurger, Yaroslav Halchenko, John B. Cole, Jordi Warmenhoven, Julian de Ruiter, Cameron Pye, Stephan Hoyer, Jake Vanderplas, Santi Villalba, Gero Kunter, Eric Quintero, Pete Bachant, Marcel Martin, Kyle Meyer, Corban Swain, Alistair Miles, Thomas Brunner, Drew O’Kane, Tal Yarkoni, Mike Lee Williams, and Constantine Evans. 2020. mwaskom/seaborn: v0.10.0 (January 2020). https://doi.org/10.5281/zenodo.3629446

[38] Piotr Winkielman, Norbert Schwarz, Tetra Fazendeiro, and Rolf Reber. 2003. The psychology of evaluation: Affective processes in cognition and emotion. In The psychology of evaluation: Affective processes in cognition and emotion, J. Musch and K. C. Klauer (Eds.). Lawrence Erlbaum.

[39] Michele Zappavigna. 2015. Searchable talk: The linguistic functions of hashtags. Social Semiotics 25, 3 (2015), 274–291.
