
Dutch opinions about climate: a comparative analysis of tweets from 2014 and 2019

Anne-Claire de Vries

11305304

July 3rd, 2020

Abstract

People and ecosystems all over the world will suffer severe and irreversible impacts from continued climate change. With research showing that public attitude can influence climate change policy support and behavior, it is important to be able to effectively bring public opinion on climate into perspective. In this study, volume, topic and sentiment analysis were performed on climate-related tweets from 2014 and 2019 in order to research how the nature of the Dutch climate discussion on Twitter has changed in five years. Results from the different types of analyses show that notable differences exist between the Dutch climate discussions on Twitter in 2014 and 2019. Not only has the proportion of anti-climate tweets increased; strong language used in 2019 also indicates that the discussion between pro and anti-climate individuals was heated and often political in nature, whereas climate was discussed more neutrally in 2014.

Supervisor:

T. Schilstra

Second examiner:

L. Stolwijk

Bachelor thesis Information Science

Faculty of Science


1 Introduction

Climate, climate change, and global warming are all heavily discussed topics, both politically and socially. According to the Intergovernmental Panel on Climate Change’s Fifth Assessment Report (2014), both people and ecosystems all over the world will suffer severe and irreversible impacts from continued climate change. Greta Thunberg, the Paris Agreement and frequent climate strikes are all examples that indicate that there is global awareness and concern regarding climate issues. There also seems to be more talk about recycling, vegetarianism, sustainable clothes and energy than ever before. However, there is also much dissension surrounding the topic of climate. In spite of the current scientific consensus, there are many climate change sceptics and deniers worldwide.

In the Netherlands, too, a discussion persists between pro and anti-climate individuals. On the one hand, steps are being taken to mitigate impacts on climate, such as lowering the national maximum speed and investing in sustainable alternatives to agriculture, natural gas, transport and construction (Rijksoverheid, 2020). On the other hand, some political parties that are notoriously anti-climate, such as Forum voor Democratie, are growing in popularity (NOS, 2019). Although it is evident that a two-sided discussion is taking place about climate and climate change, the proportions of pro and anti-climate opinions in the Netherlands are unclear. With humans being 90-100% likely to be responsible for global warming (IPCC, 2014), it is vital to gain better insight into these proportions. It is also relevant to study which topics are being talked about on the pro and anti-climate sides of the discussion.

Traditionally, surveys are used in order to gain insight into public opinion. However, with the growing popularity of social media, many people are expressing their opinions and beliefs online, where they are openly accessible. With the right tools, massive numbers of opinions can be analyzed quickly and effectively. This method, known as text mining (Hearst, 2003), is applicable to many different contexts, but especially useful for studying public opinion. In this paper, three different text mining approaches are used to look at the Dutch climate discussion on Twitter.

Twitter is a popular social media platform used by millions of users worldwide. People use Twitter for a number of reasons: to talk about day-to-day life, to express religious and political views or to stay up to date on the latest news. Because of the sheer volume of data, its subjective nature and easy accessibility, Twitter is a valuable source for opinion mining and can thus be useful in bringing the Dutch climate discussion into perspective.

The goal of this study is to perform volume, topic and sentiment analysis on climate-related tweets from 2014 and 2019 in order to research if and how the Dutch climate discussion on Twitter has changed in five years.

2 Background

Due to its pressing nature, climate has been studied extensively from many different perspectives. Most research has been done by the Intergovernmental Panel on Climate Change (IPCC), a specific body of the United Nations assigned to provide policymakers with regular scientific assessments on climate change (IPCC, 2020). The IPCC has also published special reports on specific climate issues, such as the impact of a global warming of 1.5 ºC (IPCC, 2018) and the implications of climate change for the ocean and cryosphere (IPCC, 2019).

There have also been many human-centered studies on climate. For example, research has found that public attitudes, beliefs and risk perceptions influence climate change policy support and behavior (Leiserowitz, 2005; Lorenzoni & Pidgeon, 2006; Weber & Stern, 2011). This signifies the importance of studies that bring public opinion on climate into perspective.

There are different ways of studying public opinion, one of them being text mining. Text mining has proven to be an effective research method, especially as a substitute for expensive and time-consuming polling and surveys (O'Connor et al., 2010). Because of its large availability of subjective text, Twitter has been the subject of many text mining studies. For example, Twitter has been used to detect influenza outbreaks (Aramaki et al., 2011), assess public health (Paul & Dredze, 2011) and predict the stock market (Bollen et al., 2011). Some specific text mining applications, such as sentiment and topic analysis, are especially popular in Twitter studies. Sentiment analysis refers to any technique that deals with the computational treatment of opinion, sentiment, and subjectivity in text (Medhat et al., 2014). Topic analysis refers to any technique used to automatically extract meaning from texts by identifying recurrent themes or topics (Blei, 2012).

Cody et al. (2015) applied sentiment analysis to a large dataset containing English climate-related tweets to study how emotions about climate had changed over time. Similarly, Dahal et al. (2019) used sentiment analysis and topic modeling on Twitter data to compare pro and anti-climate opinions between countries and over time. Many studies researching public sentiment, however, use their own unique method and terminology when performing sentiment analysis. In some studies, for example, sentiment might refer to the emotion behind a tweet, while other studies denote sentiment as meaning attitude or opinion. This can lead to confusing results within the literature, especially with regard to the climate discussion, where negative emotion might actually signify a pro-climate opinion, or vice versa. Moreover, studies often make use of prefabricated sentiment analysis tools, such as the Hedonometer (Cody et al., 2015). These tools were not made for climate text in particular and are therefore not the most reliable method to use for studying climate-related sentiment. One contribution of this study was therefore to create a reliable and reproducible method for performing opinion-oriented sentiment analysis on Dutch climate text. To this end, a sentiment training corpus containing more than 10,000 labeled Dutch pro and anti-climate tweets was created.

The terms ‘pro’ and ‘anti’ climate in this study are used for convenience and legibility and refer to the different sides in the climate discussion. In this study, being ‘pro’ climate means to acknowledge climate change and/or support the need for climate-related measures. Being ‘anti’ climate means to deny or be sceptical about climate change or human involvement in climate change, and/or to oppose the need for climate-related measures.

3 Methodology

In this study, the goal was to perform volume, topic and sentiment analysis on climate-related tweets from 2014 and 2019 in order to compare results and gain insight into if and how the Dutch climate discussion has changed over the past five years.

The different types of analysis in this study were performed using the programming language Python. The process for this study consisted of several steps that needed to be executed sequentially. In order to perform the different types of analyses, two datasets containing all climate-related tweets from 2014 and 2019 needed to be obtained. A third dataset, known as a sentiment training corpus, needed to be created as well in order to train a classifier for sentiment analysis. The collected datasets were then cleaned in order to optimize results and ensure their validity. After this, the datasets were prepared according to each type of analysis that would be performed. Finally, results for both years were analyzed and evaluated. Figure 1 shows an overview of the research workflow in this study.


Figure 1 | Research workflow for this study

The main purpose of volume analysis was to look at the datasets from a volume perspective, such as counting the total number of tweets per year and occurrences of tweets per day/month, as this alone can produce valuable information. Topic analysis was performed to provide insight into the nature of both datasets and was useful for detecting differences in climate-related topics. The main goal of performing sentiment analysis was to detect whether there had been a change in the proportion of anti-climate tweets between 2014 and 2019.

3.1 Data collection and cleaning

In order to obtain all climate-related tweets from 2014 and 2019, a Python package named ‘GetOldTweets3’ was used. This package works like a web scraper and imitates Twitter’s advanced search option, saving tweets and their corresponding timestamps based on certain search criteria. The search criteria used for collecting the tweets in this study were that a tweet must contain the word ‘klimaat’ and fall between January 1st and December 31st of 2014 or 2019. This method returned 22,990 and 312,621 climate-related tweets for 2014 and 2019 respectively. The collected climate tweets for both years were saved to separate datasets. Figure 2 shows a sample of tweets in the dataset collected for 2019.

date | tweet
2019-01-09 23:44:22+00:00 | Natuurlijk verandert klimaat. Het is namelijk al miljarden jaren verandert. Klimaat verandert ALTIJD. Jij wilt veel geld betalen om het klimaat te laten stoppen te veranderen. Maar dat zal niemand lukken. :) Maar het is jouw geloof dus geef je geld uit aan wat je wilt
2019-01-09 23:30:37+00:00 | Spijbelen voor 't klimaat? Kan het nog zotter worden Daar moeten zéér zware straffen op staan !!!! #KlimaatHype gaat te ver, veel te ver! Rapporten over arctic ijs, temperatuurverloop, zeewaterspiegel,... #FakeNews Kan je ook je werk verzuimen voor 't klimaat???
2019-01-09 19:51:03+00:00 | We kunnen ook concluderen dat mensen wel invloed op t klimaat hebben maar de huidige klimaatplannen van NL waanzin zijn.

Figure 2 | Sample of climate tweets collected from 2019.
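As an illustration of this collection step, the sketch below shows how such a query could look with GetOldTweets3, assuming its TweetCriteria/TweetManager interface; the function name, file name and exact criteria are illustrative rather than the exact code used in this study.

import GetOldTweets3 as got
import pandas as pd

def collect_climate_tweets(since, until, query="klimaat"):
    # Imitate Twitter's advanced search: all tweets containing `query`
    # posted between `since` and `until` (yyyy-mm-dd strings).
    criteria = (got.manager.TweetCriteria()
                .setQuerySearch(query)
                .setSince(since)
                .setUntil(until))
    tweets = got.manager.TweetManager.getTweets(criteria)
    # Keep only the timestamp and the raw text, as in figure 2.
    return pd.DataFrame({"date": [t.date for t in tweets],
                         "tweet": [t.text for t in tweets]})

tweets_2019 = collect_climate_tweets("2019-01-01", "2019-12-31")
tweets_2019.to_csv("climate_tweets_2019_raw.csv", index=False)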

Although all climate-related tweets from 2014 and 2019 had been collected, performing any type of analysis at this point would be ineffective, because tweets in the datasets were copied directly from Twitter and unsuitable for analysis in their raw, unedited form. If left in the data, capitalization, links, punctuation, Dutch stop words and other Twitter jargon would influence the validity of the analysis results. In order to determine the cleaning steps necessary for obtaining valid results from each type of analysis, a manual evaluation of the data was performed. Based on this evaluation, a cleaning procedure was formulated that would ensure reliable results from each type of analysis. For all three types of analyses, tweets were lowercased and non-climate related tweets were removed. The removal of non-climate related tweets was based on personal belief about what was relevant for the scope of this study. For example, tweets containing the word combinations ‘economisch klimaat’, ‘sociaal klimaat’, ‘pedagogisch klimaat’, ‘ondernemend klimaat’, ‘vakantie’, ‘sport’ and ‘vacature’ were removed from the datasets. Before performing additional cleaning functions, an initial part of topic analysis was performed that was aimed at finding the most important hashtags, user mentions and retweeted users. After this had been completed, the data was further cleaned by applying the following rules:

1. remove all punctuation
2. remove all non-ASCII characters (e.g. emoticons)
3. remove all links
4. remove all user mentions
5. remove all retweeted users
6. remove all Twitter-specific words (e.g. ‘via’, ‘rt’)
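A minimal sketch of how rules 1-6 could be implemented in Python is given below; the order of operations, the regular expressions and the list of Twitter-specific words are assumptions and not the exact cleaning function used in this study.

import re
import string

TWITTER_WORDS = {"via", "rt"}  # assumed list of Twitter-specific words (rule 6)

def clean_tweet(text):
    text = text.lower()
    text = re.sub(r"https?://\S+", " ", text)       # rule 3: remove links
    text = re.sub(r"rt @\w+", " ", text)            # rule 5: remove retweeted users
    text = re.sub(r"@\w+", " ", text)               # rule 4: remove user mentions
    text = text.encode("ascii", "ignore").decode()  # rule 2: remove non-ASCII characters
    text = text.translate(str.maketrans("", "", string.punctuation))  # rule 1: remove punctuation
    tokens = [w for w in text.split() if w not in TWITTER_WORDS]       # rule 6
    return " ".join(tokens)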

An example of a raw tweet from 2019 and its cleaned version is illustrated in figure 3.

date | tweet | clean_text
2019-01-09 19:51:03+00:00 | En maar volhouden dat het klimaat en natuur niet zo snel kunnen veranderen. 'Honderden plantensoorten uitgestorven door toedoen van de mens' https://www.nu.nl/klimaat/5930913/honderden-plantensoorten-uitgestorven-door-toedoen-van-de-mens.html via @nunl | volhouden klimaat natuur snel veranderen 'honderden plantensoorten uitgestorven toedoen mens'

Figure 3 | Example of a tweet from 2019 before and after cleaning.

After having performed all necessary cleaning functions, both datasets were filtered so that they only contained unique tweets.

Finally, the cleaned datasets were prepared slightly differently for each type of analysis in order to obtain the best possible results. Before performing topic analysis, for example, some additional stop words, such as ‘gewoon’, ‘niet’ and ‘geen’, were removed from the datasets because they would otherwise negatively influence the results. These stop words would appear too dominantly in visualizations, such as most common words or word clouds, while providing no added value. Only the basic stop words, however, were removed from the dataset before performing sentiment analysis, as the training algorithm might find that some stop words, such as ‘niet’ or ‘geen’, are more strongly associated with either pro or anti-climate tweets, which could play a role in accurately classifying tweets.

No additional preparation was performed for volume analysis, as no added value would be gained from removing certain words when drawing conclusions about the total number of tweets.

3.2 Corpus collection and cleaning

To test whether the proportion of pro and anti-climate opinions has changed between the two years, a machine learning algorithm needed to be trained that could classify tweets in the datasets as being either pro or anti-climate. To train a classifier, a corpus consisting of example climate tweets and given sentiment scores is required. Because no corpus existed which could be used for the classification of Dutch climate tweets, one was created specifically for the purpose of this study.


As a basis for the corpus, tweets from Dutch political parties that are famously pro or anti-climate were collected using the Twitter Application Programming Interface (API). Tweets from the parties GroenLinks, Partij voor de Dieren and D66 were chosen as examples of pro-climate sentiment, whereas tweets from Forum voor Democratie and Partij voor de Vrijheid served as examples of the opposite. A quick evaluation of the data was performed to ensure the collected tweets accurately represented pro and anti-climate opinions. After this, negative sentiment scores (-1) were assigned to anti-climate tweets and positive scores (1) were given to pro-climate tweets.

Because the corpus at this point was neither large enough nor evenly distributed (346 negative and 825 positive tweets) to train a reliable classifier, additional pro and anti-climate tweets were added. This was done by studying a large number of climate tweets from 2014 and 2019 and noting commonly used language on the pro and anti-climate sides of the discussion. For example, tweets containing the words ‘klimaathysterie’, ‘klimaatgekte’ and/or ‘hoax’ almost exclusively expressed anti-climate views, while terms such as ‘duurzaam’, ‘actie’, ‘ontbossing’ and ‘#veranderingbeginthier’ were more common in pro-climate tweets. An additional 4581 positive and 4960 negative tweets containing these keywords were retrieved from the 2014 and 2019 datasets and added to the corpus. The final corpus contained 5406 positive and 5306 negatively classified tweets. A sample of the final corpus is included in figure 4.

Figure 4 | Sample of the sentiment corpus containing negatively (-1) and positively (1) classified climate tweets.

3.3 Volume Analysis

An important first step in comparing the climate discussions in 2014 and 2019 was to look at the volume of tweets in each dataset. To do this, the number of unique tweets that remained after the cleaning process was counted for each year. After this, timestamps corresponding to each tweet (see figure 3, ‘date’) were converted to a ‘yyyy-mm-dd’ format. This made it possible to count and visualize the number of tweets per day. For example, if the date ‘2014-01-01’ appeared in the dataset 95 times, it meant 95 climate-related tweets were posted on that day. A similar method was used to count and visualize the number of tweets per month.
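A minimal sketch of this counting step is shown below, assuming the cleaned dataset is stored as a CSV file with a ‘date’ column as in figure 3; the file name is hypothetical.

import pandas as pd

tweets = pd.read_csv("climate_tweets_2019_clean.csv", parse_dates=["date"])

# Convert the timestamps to plain yyyy-mm-dd dates and count tweets per day.
tweets["day"] = tweets["date"].dt.strftime("%Y-%m-%d")
tweets_per_day = tweets.groupby("day").size()

# The same idea, aggregated per month.
tweets["month"] = tweets["date"].dt.strftime("%Y-%m")
tweets_per_month = tweets.groupby("month").size()

print("unique tweets:", len(tweets))
print(tweets_per_day.sort_values(ascending=False).head())  # busiest days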

3.4 Topic Analysis

The second step to bringing the climate discussions into perspective was to look at the nature of the content for both years.

Several different functions were used in order to uncover and visualize important information in both datasets. Together, the different functions were able to find the most important hashtags, retweeted users, user mentions, words, and topics for each year.

In order to find the most important hashtags, retweeted users, and mentioned users, a regular expression function was used to extract all text directly following ‘#’, ‘@’ and ‘RT @’. The number of unique hashtags, user mentions, and retweeted users could then be counted and visualized.
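A possible implementation of this extraction step is sketched below; the exact regular expressions and the variable names are assumptions based on the description above.

import re
from collections import Counter

hashtag_pattern = re.compile(r"#(\w+)")
mention_pattern = re.compile(r"@(\w+)")
retweet_pattern = re.compile(r"rt @(\w+)", re.IGNORECASE)

def most_common(tweets, pattern, n=10):
    # Count every match of `pattern` across all tweets and return the n most frequent.
    counts = Counter(match.lower() for tweet in tweets for match in pattern.findall(tweet))
    return counts.most_common(n)

# e.g. most_common(raw_tweets_2019, hashtag_pattern) -> the ten most popular hashtags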

After this, word clouds were generated for each dataset. Word clouds are a straightforward and visually appealing method for visualizing text (Heimerl et al., 2014). They can be used in various contexts and provide a good overview of the most important words in a body of text by filtering down to those words that appear with the highest frequency. The 20 most frequent words were also visualized for each year (figure 10).


Although these steps alone could reveal some important differences between the two years, a final function was created to mathematically calculate differences between words used in 2014 and 2019. With this function, it was possible to find words that were uniquely important to either 2014 or 2019.
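A sketch of such a function is shown below; it compares the mean per-tweet frequency of each word between the two years and reports the log2 fold-change, as in figure 11. The smoothing constant and the minimum word count are assumptions, not values taken from this study.

import numpy as np
from collections import Counter

def log2_fold_changes(tweets_2014, tweets_2019, min_count=10, eps=1e-6):
    counts_14 = Counter(w for t in tweets_2014 for w in t.split())
    counts_19 = Counter(w for t in tweets_2019 for w in t.split())
    n_14, n_19 = len(tweets_2014), len(tweets_2019)

    scores = {}
    for word in set(counts_14) | set(counts_19):
        if counts_14[word] + counts_19[word] < min_count:
            continue  # skip rare words
        mean_14 = counts_14[word] / n_14   # mean occurrences per tweet in 2014
        mean_19 = counts_19[word] / n_19   # mean occurrences per tweet in 2019
        scores[word] = np.log2((mean_19 + eps) / (mean_14 + eps))

    # Positive scores: words characteristic of 2019; negative: characteristic of 2014.
    return {w: s for w, s in scores.items() if abs(s) > 2}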

3.5 Sentiment Analysis

As a final step, sentiment analysis was performed on the datasets in order to detect how many climate deniers took part in the climate discussions in 2014 and 2019. There are many different machine learning approaches to sentiment analysis. For this study, a Multinomial Naïve Bayes classifier was trained on a sentiment corpus and applied to the collected datasets for 2014 and 2019.

Naïve Bayes classifiers are known to be among the most effective inductive learning algorithms for machine learning (Zhang, 2005). Naïve Bayes classifiers are a set of supervised machine learning algorithms based on Bayes’ Theorem. In short, this theorem provides a way of calculating the probability of a piece of data belonging to a certain class, given prior knowledge about features belonging to that class. Multinomial Naïve Bayes implements Bayes’ Theorem for multinomially distributed data and is therefore often used in text classification (Scikit-Learn, 2020).

Before a Naïve Bayes algorithm could be trained, the collected sentiment corpus needed to be cleaned. The same cleaning function was used that was also applied to the datasets for 2014 and 2019. This helped ensure that the classifier would not be influenced by variations in text when applied to tweets from 2014 and 2019. Although the classifier could be trained using clean text and sentiment scores alone, an additional feature containing the number of words per tweet was added, because exploratory data analysis showed that, on average, negative tweets (M=27.6, SD=12.1) contained significantly fewer words than positive tweets (M=33, SD=11.5), t(10,710) = -23.93, p < .05. Figure 5 shows an example of the features in the sentiment corpus.

Figure 5 | Sample of the final sentiment corpus used for training the Naïve Bayes classifier

After this, the optimal parameters for Scikit-Learn’s MultinomialNB algorithm were calculated using a GridSearch function, which uses cross validation to test all possible parameters and reports back those parameters that produce the best overall performance of the classifier based on accuracy, F1-score, recall and precision.

Finally, a Multinomial Naïve Bayes classification model based on the optimal parameters was trained and tested using the sentiment corpus. The classifier produced an overall accuracy of 86% on the test data.
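A condensed sketch of this training step is given below, using scikit-learn's Pipeline, ColumnTransformer, GridSearchCV and MultinomialNB; the parameter grid, file name and train/test split are assumptions rather than the exact settings used in this study.

import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import Pipeline

# Hypothetical file; columns as in figure 5: clean_text, word_count, sentiment (-1 or 1).
corpus = pd.read_csv("sentiment_corpus.csv")
X = corpus[["clean_text", "word_count"]]
y = corpus["sentiment"]

features = ColumnTransformer([
    ("bow", CountVectorizer(), "clean_text"),   # bag-of-words features from the cleaned text
    ("length", "passthrough", ["word_count"]),  # tweet-length feature
])
model = Pipeline([("features", features), ("nb", MultinomialNB())])

# Grid search with cross validation over the smoothing parameter alpha.
grid = GridSearchCV(model, {"nb__alpha": [0.1, 0.5, 1.0, 2.0]}, scoring="accuracy", cv=5)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
grid.fit(X_train, y_train)
print("test accuracy:", grid.score(X_test, y_test))  # reported as roughly 86% in this study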

In order to classify the climate tweets from the 2014 and 2019 datasets, both datasets were transformed to contain features similar to those of the sentiment corpus in figure 5. Tweets with fewer than five words were removed, as they would be too difficult to classify accurately. The Multinomial Naïve Bayes classifier was then applied to both datasets. In order to increase the validity of the classification results, a decision was made to only include tweets that were classified with a confidence greater than 80%. This step helped ensure that tweets with low confidence scores, such as tweets unrelated to climate, illegible tweets, tweets in other languages or neutral tweets without opinion, could be filtered out, improving the validity of the results. The remaining sentiment scores were then added to the datasets. Of the 18,851 tweets in 2014, the classifier was able to confidently classify 10,455 of them. In 2019, 182,738 of 275,295 tweets were classified. Because the dataset for 2019 was much larger than that for 2014, samples containing 10,000 tweets were taken from both 2014 and 2019 in order to make the final results of the classifier easier to interpret. A final, manual evaluation of 300 tweets in each dataset was performed to ensure the classifier was working as expected. The sentiment scores for 2014 and 2019 were then visualized and compared.
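A sketch of this classification step is shown below, assuming the fitted model from the previous sketch; the use of predict_proba for the 80% confidence threshold and the variable names (such as tweets_2019_clean) are assumptions.

best_model = grid.best_estimator_  # fitted pipeline from the grid search above

def classify_confidently(df, model, min_words=5, threshold=0.80):
    # Drop tweets with fewer than five words; they are too short to classify reliably.
    df = df[df["word_count"] >= min_words].copy()
    proba = model.predict_proba(df[["clean_text", "word_count"]])
    confidence = proba.max(axis=1)
    df["sentiment"] = model.classes_[proba.argmax(axis=1)]
    # Keep only tweets classified with a confidence greater than 80%.
    return df[confidence > threshold]

classified_2019 = classify_confidently(tweets_2019_clean, best_model)
share_anti = (classified_2019["sentiment"] == -1).mean()  # proportion of anti-climate tweets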

4 Results

4.1 Volume Analysis

Climate was discussed over fourteen times more often in 2019 than in 2014. The exact number of unique climate tweets in 2019 was 275,295, while 2014 counted 18,851 tweets. Figure 6 shows a side-by-side comparison of the number of climate tweets per day in 2014 and 2019; figure 7 shows the same comparison per month. These figures indicate that for both years, the number of climate tweets per day was heavily influenced by climate-related news or events. No significant difference in the number of peaks was found between the two years. In 2014, December 2nd counted the most climate-related tweets with a total of 337. In 2019, February 2nd counted the most climate-related tweets with a total of 5,469.

Figure 6 | Side-by-side comparison of the number of climate tweets per day in 2014 and 2019.


Figure 7 | Side-by-side comparison of the number of climate tweets per month in 2014 and 2019

4.2 Topic Analysis

The different types of topic analyses performed showed that the two years varied on a substantive level. Figure 8 shows the ten most popular hashtags in climate-related tweets from 2014 and 2019. As opposed to hashtags in 2014, climate hashtags in 2019 seem to be rather subjective, with hashtags such as #klimaathysterie and #stemzeweg signifying strong anti-climate beliefs, and #youthforclimate and #klimaatmars representing pro-climate beliefs. Figure 8 also shows the ten most popular user mentions. For both years, Dutch news outlets and politicians are amongst the most often mentioned users in climate tweets. In 2019, fewer news outlets and more politicians and political parties were mentioned than in 2014. ‘@groenlinks’, a notoriously pro-climate party, was the most mentioned user in 2019. In 2014, ‘@nunl’, a popular Dutch news outlet, received the most mentions. Apart from ‘@nos’, the ten most important user mentions in the climate discussion have changed completely in five years. No relevant data was found when looking at the most retweeted users in 2014 and 2019.


Figure 8 | Most popular hashtags and user mentions in climate-related tweets from 2014 and 2019.


Figure 9 | Visualizations of the most important words used in the climate discussion in 2014 and 2019.

The word cloud visualizations in figure 9 provide an overview of the most important words and topics used in the climate discussions in 2014 and 2019. The size of a word indicates its importance within that year. Figure 10 contains ordered graphs depicting the 20 most common words used in 2014 and 2019. These graphs show more clearly than the word clouds that some words, such as ‘co2’, ‘klimaatverandering’, ‘energie’ and ‘milieu’, have persisted since 2014. It can be presumed that these words are fundamental to the Dutch climate discussion. Interestingly, the word ‘klimaathysterie’, which is heavily associated with anti-climate opinions, is among the most used words in 2019. With the word ‘politiek’ also being among the 20 most common words, it is once again implied that politics has taken on a more important role in the climate discussion in 2019.

Figure 10 | The 20 most common words used in climate tweets from 2014 and 2019.


Finally, figure 11 shows the calculated differences between words in 2014 and 2019. Positive log2 scores signify climate-related words and topics that were very important in 2019 but not in 2014, and negative log2 scores signify the opposite. Only words with an absolute log2 fold-change larger than 2 are shown. One explanation for some of the high-scoring words in 2019 is that some of the concepts or people these words refer to did not yet exist in 2014. Examples are the political party Forum voor Democratie (‘fvd’ in figure 11), which was founded in 2016, and Greta Thunberg (‘greta’ and ‘thunberg’ in figure 11), who started her climate strike in 2018. The presence of some strong, anti-climate words such as ‘klimaathysterie’, ‘leugens’, ‘waanzin’, ‘onzin’ and ‘gekte’ in figure 11, however, cannot be explained this way.

Based on the results from this topic analysis, it seems that the climate discussion in 2019 was, in fact, more of a discussion between two opposing views, unlike in 2014, when climate seemed to be a more neutrally discussed topic. Moreover, strong, anti-climate language was detected in the 2019 dataset, where no such language was found in 2014. Politics, too, seems to have grown in importance, with many mentions of notoriously pro and anti-climate politicians and parties.

Figure 11 | Log2 fold-change between the mean word count in tweets from 2019 and tweets from 2014 about climate change. Only words with an absolute log2 fold-change larger than 2 are shown.

4.3 Sentiment Analysis

Sentiment analysis of the 2014 and 2019 datasets shows that the proportion of anti-climate opinions has increased since 2014. In the 10,000-tweet samples taken from 2014 and 2019, the Naïve Bayes algorithm classified 29.5% of climate tweets in 2014 as being negative, while in 2019, negative tweets made up 48.3% of the climate discussion. Figure 12 shows the proportion of negatively and positively classified tweets in the samples taken from 2014 and 2019.

Figure 13 shows a visualization of the number of positive and negative tweets per day for 2014 and 2019. Again, peaks for both pro and anti-climate tweets seem to coincide with climate-related news or events. This most likely indicates that climate-related news or events trigger a discussion between the two sides. Nearly every peak in 2019 appears to be dominated by anti-climate sentiment. This suggests that anti-climate tweets overshadow pro-climate tweets in response to climate-related news or events.

In 2019, the most notable peak in anti-climate tweets was on February 7th. On this day, thousands of students took part in a climate strike movement on the Malieveld in The Hague (NOS, 2019). In 2014, the biggest peak in anti-climate tweets was on September 23rd, when a United Nations conference on climate was held in New York (NOS, 2014).


Figure 12 | Proportions of classified pro and anti-climate tweets in 2014 and 2019.


Figure 13 | Number of pro and anti-climate tweets per day in 2014 and 2019

5 Discussion

5.1 Conclusion

The goal of this study was to analyze climate-related tweets from 2014 and 2019 using different text mining tools in order to research if and how the nature of the Dutch climate discussion on Twitter had changed between the two years. To this end, two datasets containing all Dutch climate-related tweets from 2014 and 2019 were analyzed using volume, topic and sentiment analysis. A sentiment corpus was created to train a classifier and to provide other studies with a reliable tool for performing sentiment analysis on any Dutch, climate-related text.

Volume analysis was used to study differences in volume between the years, and showed that in 2019, climate was discussed over 14 times more often than in 2014. The number of tweets per day and month seemed to be strongly influenced by climate-related news and events.

Topic analysis helped bring the nature of the discussions into perspective and showed that in 2014, the most important words and hashtags used to talk about climate were entirely objective, whereas many of the top words and hashtags used in 2019 were subjective and indicative of strong pro or anti-climate opinions. This indicates that in 2019, a strong discussion took place between pro and anti-climate individuals. The role of politics seems to have grown as well, with many of the top mentioned users in 2019 being politicians and political parties, as opposed to mainly news outlets in 2014.


Finally, sentiment analysis indicated that since 2014, the proportion of anti-climate tweets has grown from 29.5% to 48.3%. Peaks in negative sentiment coincided with peaks in volume, indicating that whenever there was a climate-related discussion on Twitter, anti-climate opinions were dominant.

The results from the different types of analyses show that there are notable differences between the Dutch climate discussions in 2014 and 2019. Not only has the proportion of anti-climate tweets increased; strong, subjective language used in 2019 also indicates that the discussion between pro and anti-climate individuals was heated and often political in nature, as opposed to 2014, when there was no clear indication of dissension between pro and anti-climate individuals.

5.2 Limitations and Future Work

Unfortunately, it was beyond the scope of this study to look at the years between 2014 and 2019. This poses a major limitation, as it may be possible that there was less cause for discussion in 2014, as climate issues were probably less prevalent. Studying more years could have improved the robustness of the results found in this study. Future research could study climate discussions in the years between 2014 and 2019 in order to gain insight into when and why some of the differences found in this study occurred.

Another major limitation of this study is the lack of user data on the collected tweets. Ideally, the number of unique users would have been counted for each dataset in order to gauge how many individuals took part in the climate discussions for both years and whether this corresponded to the number of collected tweets. Because only raw tweets and their timestamps were collected, however, it is possible that overly active Twitter users are over-represented in the data. Furthermore, although Twitter may be publicly accessible to anyone, it is plausible that older, less technologically advanced age groups are not being represented as well as others. For this reason, caution should be taken when making inferences about the Dutch population based on the results of this study.

Moreover, despite efforts to filter out irrelevant tweets from the datasets, some tweets unrelated to the climate discussion remained and were classified, even with the confidence threshold of 80%. This could have been largely prevented if, initially, only tweets containing ‘#klimaat’ instead of ‘klimaat’ had been collected. However, this may have introduced other limitations: not everyone, especially anti-climate individuals, might feel inclined to specifically reference ‘#klimaat’, particularly during discussions.

Another limitation concerning accuracy is the Naïve Bayes classifier’s inability to handle more complex tasks, such as recognizing sarcasm and hidden meaning. When using sarcasm, often words associated with an opposing view are used in a derogatory manner. This is confusing to Naïve Bayes classifiers, and often leads to tweets, such as the one in figure 14, being confidently misclassified.


Figure 14 | Example of a sarcastic anti-climate tweet wrongly classified as being pro-climate.

In future work, ways of improving classifiers in order to handle more complex tasks, such as sarcasm, should be researched.

Although this study was able to present a dependable and easily reproducible method for performing opinion-oriented sentiment analysis on Dutch climate text, it is advisable that future research be conducted which focuses on studying data from more years and improving classification accuracy in order to validate the results found in this study. Under these circumstances, it is expected that the methods used in this study could produce a highly accurate reflection of the Dutch public opinion about climate, which is relevant for many different reasons, such as being able to predict climate change policy support and behavior.

References

Aramaki, E., Maskawa, S., & Morita, M. (2011). Twitter catches the flu: detecting influenza epidemics using Twitter. In Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (pp. 1568-1576).

Blei, D. M. (2012). Probabilistic topic models. Communications of the ACM, 55(4), 77-84.

Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of computational science, 2(1), 1-8.

Dahal, B., Kumar, S. A., & Li, Z. (2019). Topic modeling and sentiment analysis of global climate change tweets. Social Network Analysis and Mining, 9(1), 24.

Hearst, M. (2003). What is text mining. SIMS, UC Berkeley, 5.

Heimerl, F., Lohmann, S., Lange, S., & Ertl, T. (2014). Word cloud explorer: Text analytics based on word clouds. In 2014 47th Hawaii International Conference on System Sciences (pp. 1833-1842). IEEE.

IPCC. (2014). Climate Change 2014: Synthesis Report. Contribution of Working Groups I, II and III to the Fifth Assessment Report of the Intergovernmental Panel on Climate Change [Core Writing Team, R.K. Pachauri and L.A. Meyer (eds.)]. IPCC, Geneva, Switzerland, 151 pp.

IPCC. (2018). Global Warming of 1.5°C. An IPCC Special Report on the impacts of global warming of 1.5°C above pre-industrial levels and related global greenhouse gas emission pathways, in the context of strengthening the global response to the threat of climate change, sustainable development, and efforts to eradicate poverty [Masson-Delmotte, V., P. Zhai, H.-O. Pörtner, D. Roberts, J. Skea, P.R. Shukla, A. Pirani, W. Moufouma-Okia, C. Péan, R. Pidcock, S. Connors, J.B.R. Matthews, Y. Chen, X. Zhou, M.I. Gomis, E. Lonnoy, T. Maycock, M. Tignor, and T. Waterfield (eds.)]. In Press.

IPCC. (2019). IPCC Special Report on the Ocean and Cryosphere in a Changing Climate [H.-O. Pörtner, D.C. Roberts, V. Masson-Delmotte, P. Zhai, M. Tignor, E. Poloczanska, K. Mintenbeck, A. Alegría, M. Nicolai, A. Okem, J. Petzold, B. Rama, N.M. Weyer (eds.)]. In press.

IPCC. (2020). History of the IPCC. Retrieved from: https://www.ipcc.ch/about/history/

Leiserowitz, A. (2005). American risk perceptions: Is climate change dangerous? Risk Anal. 25, 1433–1442.

Lorenzoni, I. & Pidgeon, N. (2006). Public views on climate change: European and USA perspectives. Climatic Change 77, 73–95.


Medhat, W., Hassan, A., & Korashy, H. (2014). Sentiment analysis algorithms and applications: A survey. Ain Shams Engineering Journal, 5(4), 1093-1113.

NOS. (2014). VN begroet 125 klimaatsprekers. Retrieved from: https://nos.nl/artikel/701865-vn-begroet-125-klimaatsprekers.html

NOS. (2019). Duizenden klimaatspijbelaars lopen protestmars door Den Haag. Retrieved from: https://nos.nl/artikel/2270865-duizenden-klimaatspijbelaars-lopen-protestmars-door-den-haag.html

NOS. (2019). Zege Forum voor Democratie ongekend: 'Dit doet denken aan intrede LPF'. Retrieved from: https://nos.nl/artikel/2276974-zege-forum-voor-democratie-ongekend-dit-doet-denken-aan-intrede-lpf.html

O'Connor, B., Balasubramanyan, R., Routledge, B. R., & Smith, N. A. (2010). From tweets to polls: Linking text sentiment to public opinion time series. In Fourth international AAAI conference on weblogs and social media.

Paul, M. J., & Dredze, M. (2011). You are what you Tweet: Analyzing Twitter for public health. In Fifth International AAAI Conference on Weblogs and Social Media (pp. 265-272).

Rijksoverheid. (2020). Klimaatverandering. Retrieved from: https://www.rijksoverheid.nl/onderwerpen/klimaatverandering

Scikit-learn. (2020). Naive Bayes. Retrieved from: https://scikit-learn.org/stable/modules/naive_bayes.html#naive-bayes

Weber, E. U. & Stern, P. C. (2011). Public understanding of climate change in the United States. Am. Psychologist 66, 315–328.

Zhang, H. (2005). Exploring conditions for the optimality of naive Bayes. International Journal of Pattern Recognition and Artificial Intelligence, 19(02), 183-198.

Author’s note

The sentiment corpus used in this study is available to anyone who wishes to use it and can be obtained by contacting anneclaire.dvrs@gmail.com.
