
Where the Chirping Bird Flocks: Using Twitter to investigate the spatial distribution of the scales of meaning in the Groningen gas extraction discourse.


Academic year: 2021



[Figure: word cloud created from tweets concerning earthquakes and gas-extraction in Groningen; masked image from edteacher.org]

By: Jurrian Doornbos – 11256486 Supervisor: Rowan Arundel


‘Birds are the most popular group in the animal kingdom. We feed them and tame them and think we know them. And yet they inhabit a world which is really rather mysterious.’ (David


Abstract

This thesis presents the spatial distribution of the scales of meaning in the Groningen gas-extraction and earthquake debate on Twitter, even though the question ‘After scale is produced, where in the world is it?’ (Delaney & Leitner, 1997) was posed as a theoretical standpoint on the presence of scale. After creating an algorithm that classifies scales of meaning in the tweets, it was possible to geolocate tweets across the Netherlands, creating a spatial pattern for the complete dataset. Most tweets are found near the origin of the debate and in population centres. The distribution of the different scales of meaning, however, is context dependent. Using rural-urban theory, a context was created around the scales of meaning in Twitter discourse. Presenting the tweets in rural and urban contexts shows that rural areas tweet more at the local scale, while urban areas lean more towards the national scale, as expected from theory. Tweets, however, do not show a complete picture: the lack of proper demographic metadata, combined with the use of algorithms to analyse tweets, creates a process as complex as the real world it tries to explain.


Table of Contents

1. Introduction
1.1. Meaning in discourse; scales of meaning
1.2. Social media research
1.3. Groningen case
2. Theoretical Framework
2.1. Discourses
2.1.1. Rural and urban
2.1.2. The meaning of scales
2.1.3. Scales of meaning
2.2. GIS and social media
2.2.1. Twitter
3. Methodology
3.1. Operationalizing the main concepts
3.2. Twitter Data Acquisition
3.3. Tweet Classification
3.4. Opening a black box
3.5. From Tweets to GIS and beyond
4. Results
4.1. Total tweets
4.1.1. Total tweet hot spots
4.2. Scale distribution
4.2.1. Local scale
4.2.2. National scale
4.2.3. Global scale
4.2.4. Scaled hotspot maps
4.3. Determinants
4.3.1. Distance
4.3.2. Temporal aspects
4.3.3. Rural and urban
4.4. Interpreting
5. Conclusion
6. Acknowledgements
7. References


List of tables, figures and maps in text

Tables:

Table 1. Conceptualization table

Table 2. Manually classified tweets in training and testing set

Table 3. Accuracy Report Random Forest Classifier on the testing set

Figures:

Figure 1. Top 20 important words.

Figure 2. Regression plot between tweets origin and mean earthquake origin.

Figure 3. Stacked area chart of the scaled tweets and earthquake magnitude over time. Green for local, red for national and purple for global.

Figure 4. The scales of tweets per 1000 inhabitants in Urban and Rural areas.

Maps:

Maps 1 and 2. Distribution of total tweets in dataset, Netherlands 2010 – 2019. 1: Absolute number of tweets. 2: Normalized over inhabitants.

Map 3. Getis-Ord Gi* hot and cold spots in gas-extraction and earthquake discourse tweet distribution.

Maps 4 and 5. Distribution of local tweets in the Netherlands 2010 – 2019. 4: Local tweets normalized over inhabitants. 5: Local tweets as share of total tweets in dataset.

Maps 6 and 7. Distribution of national tweets in the Netherlands 2010 – 2019. 6: National tweets normalized over inhabitants. 7: National tweets as share of total tweets in dataset.

Maps 8 and 9. Distribution of global tweets in the Netherlands 2010 – 2019. 8: Global tweets normalized over inhabitants. 9: Global tweets as share of total tweets in dataset.


1. Introduction

In 2018, a broad report was published by the Rijksuniversiteit Groningen (Mulder & Perey, 2018), covering the economic incentives behind gas extraction in the province of Groningen, public safety perspectives and policy reflection. The same diversity can be found in the public discourse. Housing damage, safety, economic and strategic benefits as well as climate impacts have all featured in the debate. These different discussions make Groningen an interesting case study for the analysis of public discourse.

1.1. Meaning in discourse; scales of meaning

In this thesis, different scales of meaning are used to categorize this public discourse. Scales of meaning are the scales at which an individual relates to a discourse; the scale is used to give meaning to the discourse (Towers, 2000). Kurtz (2003) used this framework to analyse an environmental movement that increased its scale of meaning to reframe the environmental debate from a local siting problem to a national environmental racism debate. Sze et al. (2010) built on this theory from the perspective of the state in a similar case, in which the state abused the scales to displace minority groups. However, little is understood about the scales of meaning outside of the environmental justice debate, even though theory suggests that the scales are omnipresent in political discourse (Agyeman, 2002). More research is needed into the scales of meaning in other cases and their broader implications.

1.2. Social media research

Parallel to the Groningen gas-extraction debate, social media platforms have gained widespread adoption in society. These outlets form a relatively new location for debate, and the platforms usually store all their information. Scholars have until now focused on simpler analyses, for example mapping sentiments (Bertrand et al., 2013), predicting stock markets (Bollen, Mao & Zheng, 2011) or election results (Sang & Bos, 2012), and mapping tourism activity using pictures taken on Flickr (Sobolevsky et al., 2015).

These new data sources, however, are not without their drawbacks. Gayo-Avello (2012) calls ‘I predicted x using Twitter’ studies a scholarly ‘fad’, because scholars do not acknowledge the inherent demographic characteristics of Twitter users and the self-selection bias. Current research shows that the average user is younger than the general population, tends to live in urban areas and is better off economically (Li, Goodchild & Xu, 2013).

Even with these drawbacks, Anderson predicted in 2008 that Big data would cause the ‘end of theory’. This prediction holds that vast amounts of data speak for themselves and that theory is superfluous. Big data would also cause ‘data-avalanches’ for geographers and related scholars (Miller, 2010). Kitchin (2013) suggests that a lack of methods for handling Big data sources is preventing the field of geography from capturing the true potential of Big data for its scholarly inquiries. More advanced methods are therefore the next step in harnessing Big data for geography.

1.3. Groningen case

In the 1960s a gas-field was discovered in the north of the Netherlands, near the village of Slochteren. The national government, in combination with Royal Dutch Shell, started to exploit this field (Mulder & Perey, 2018). Back then, this natural gas source was used to supply northwest Europe with gas during peak hours, when demand was highest. The total production of gas in the Netherlands was capped by the national government at 80 billion cubic meters (bcm).

This policy changed in 2012, when an earthquake near Huizinge, measuring 3.6 on the Richter scale, shook the northern province (KNMI, 2012). The relation between the earthquakes and gas-extraction in the area has been understood since the 1990s (Van der Sluis, 1989). However, this larger quake shifted policy towards risk minimization for the inhabitants of the area (Mulder & Perey, 2018), after a heated discussion between the inhabitants of Groningen and the national government. A reduced extraction level for the field was set at 27 bcm per year. Further gas-extraction was deemed ‘societally irresponsible’ in 2018 by the Ministry of Economic Affairs (Ministry of Economic Affairs, 2018), which therefore decided on a gradual reduction towards 0 bcm in 2030.

Application of the above

Tweets have the right attributes for examining the spatial distribution of discourses, because of their textual nature and spatial metadata. In order to attribute meaning to the Dutch Twitter discourse on the Groningen gas-extraction, the scales of meaning are applied through a spatial approach using innovative GIS techniques. This leads to the main research question:

How is the Twitter discourse on the Groningen gas-extraction represented by the different scales of meaning and how is this spatially structured in the Netherlands?

The main research question is divided into sub-questions, showing the two-fold nature of the study. On the one hand:

What are the scales of meaning, and how can they be applied to Twitter data?

How are the tweets regarding earthquakes and gas-extraction distributed within the Netherlands?

Which determinants are related to this distribution?

On the other hand, it is a systematic analysis of Twitter data as a research subject:

What are the shortcomings of using Twitter as a data source and how are these apparent in the Twitter dataset used in this study?

The thesis first presents the theoretical framework. This is followed by the methodology, which explains the empirical methods used to answer the sub-questions. The third section is the analysis of the constructed data, presenting a variety of maps, statistical measures and graphs. Finally, conclusions are drawn from the results, and the use of Twitter data is reflected upon.


2. Theoretical Framework

In this framework, the thesis is embedded within social science research, especially the field of political geography, and the contemporary views within this field are reviewed. Furthermore, the framework follows the same two-fold structure as presented in the introduction. This means that, besides political geography, big data and its future in GIS research are elaborated upon.

2.1. Discourses

Political geography focuses on the context and location of political engagement. Discourses are used most often to analyse the context and actors within the study area.

Discourses in a Foucauldian sense are understood as communicative processes expressed through language (Foucault, 1973). Contemporary scholars suggest that the local context in which these discourses are employed and interpreted has been neglected in most discourse analysis (Elliker, Coetzee & Kotze, 2013). This creates a gap in which the field of geography has a role to fulfil in clarifying why local context matters for discourses (Agnew, Mitchell & Toal, 2003).

2.1.1. Rural and urban

One interpretation of the local context is the geographic rural-urban division. Rural areas are understood as peripheral areas, mostly focused on agricultural production (Krugman, 1991), whereas urban areas are understood as locations with a core function for the surrounding area (Krugman, 1991). Urban areas form the basis for economic capital to accumulate (Harvey, 2004), are more culturally diverse (Amin, 2002) and are more connected to globalization processes (Sassen, 2013).

2.1.2. The meaning of scales

One way of going about contextualizing discourse is using the highly contested term scale. Ever since scale entered scholarly debate it has been poorly understood (Howitt, 2003). The following section will not give a comprehensive and concluding answer to this debate but show the different conceptions of scale.

At the start of the debate, scale was mostly presented as a nested hierarchy, such as global, national and local (Howitt, 2003). This three-level hierarchy was ever present within geopolitical analysis. Activist scholars found this hierarchy useful in analysing local action groups which challenge global pressures. However, an editorial in Society and Space (Jonas, 1994) challenged this nested hierarchy and moved beyond the rigidity of the concept. The rigidity was challenged using the constructivist approach to scale, showing that scale is ‘periodically transformed and constructed’. Delaney and Leitner (1997) presented the central conflict in defining scale: ‘After scale is produced, where in the world is it?’ No pictures can be taken of the concept, like borders and jurisdictional areas. It is an elusive term on its own; its meaning arises when it is employed within a context.

Swyngedouw (1997) added that the nature of scale lies not in its theoretical debate, but in its real-world presence in political struggle. Abstract discussions on the topic have since been abandoned. Leitner suggests using this constructivist perspective and defines scale as ‘a nested hierarchy of political spaces’. The question remained, however, why this perspective leads to a hierarchy in scales. Howitt (1993) suggested that ‘awkward juxtapositions’ and ‘cross-scale linkages’ are where the origin of the hierarchical scales is to be found. Nevertheless, the term is ever present and helps build an understanding of complex relations and dynamic processes (Howitt, 2003). Howitt concludes that any struggle that divorces itself from scale, be it political, economic or environmental, has limited value.


2.1.3. Scales of meaning

The effort to contextualize discourses continues with Towers’ scales of meaning (Towers, 2000). Towers is part of a group of scholars concerned with environmental justice. His framework for analysing unjust siting decisions presents two scales. One is the scale of meaning, understood as the scale at which a problem is framed and experienced in political discourse; this scale therefore gives meaning to the discourse (Towers, 2000). The other is the institutional scale: the regulations and frameworks that the state employs to maintain the discourse.

Kurtz (2003) used this framework to analyse an environmental movement that increased its scale of meaning to reframe the environmental debate from a local problem to a national environmental racism debate (also known as rescaling; Swyngedouw, 2009). The movement did not want a chemical production facility near their town. The county, however, continued with the siting process near the village. The environmental movement realized that their siting problem related to a national debate on racism in environmental hazard distribution. Kurtz uses scale frames to combine theory on the politics of scale from geography with framing literature from sociology. She theorizes that the different actors employ scales within discursive frames in order to legitimize their actions. Each actor has their own scale frame within the policy process.

Ever since Kurtz’ 2003 article, the field of environmental justice has seen a rise in scale frame analysis within a wide variety of contexts and case specificities. Examples are an analysis of a Delta Vision in the Sacramento San Joaquin Valley (Sze et al., 2010), where the state ignores local minority groups in favor of regional benefits from economic and ecological projects, and a Dutch mega farm policy process (Van Lieshout et al., 2011), where the municipality employs a scale frame of sustainable balance, the entrepreneurs a national and global scale frame of progress and the activists a scale frame of local injustice.

Walker (2009) suggests in a broad overview of the environmental justice field, that scale has become integral in the analysis of environmental struggles. However, little is understood about the scales of meaning outside of environmental justice, even though theory suggests that scales are omnipresent in political discourse (Agyeman, 2009; Bickerstaff & Agyeman, 2002).

The Groningen case has not seen an analysis in which the different scale frames of each actor are presented, with their effects on the injustice done to those in a minority. The Groningen case might be too complex to even see where to start the analysis and how to demarcate the boundaries. This thesis will not present the different scale frames, such as those presented in the analysis of Kurtz.

However, it uses the underlying foundation of Towers’ (2000) scales of meaning and Kurtz’ (2003) scale frames, which poses that scales are present in discourses around environmental struggles. These scales present themselves through different actors, each of whom has their own scale in which they frame their stance on the problem. In this thesis the scales of meaning theory is used to justify the classification of meaning in the tweets. This is not without its problems; the next section presents more on the limitations of this data source.

2.2. GIS and social media

Geo-information science is in a period of embracing the potential data sources of online media platforms. These platforms provide large amounts of data, at a rapid velocity and in a wide variety (Kitchin, 2013). Big data can answer some of the most intriguing questions in geography, due to the amount of information on a large population. As Sui and Goodchild (2011) put it: ‘Big data is deep data on the many’. This is new in geography, as populations in research are either small and offer deep information, as with interviews and observations, or are large but do not offer deep information, as with surveys and census data.

As stated by John O’Loughlin in 2003 (when discussing GIS for political geography): "We will retreat to the margins of academic debate, denying the notion that (spatial) measures and analyses can ever mean anything, and sniping at the successes of nongeographers when even quite rudimentary (spatial analytical) techniques are shown to be applicable" (brackets added). This citation is applicable to either GIS in political geography or using social media as a source for research (Kitchin, 2013).

However, Big Data has so far not given the GIS and geography field the information avalanche that was expected (Miller, 2010). Most research to date has either focused on the theoretical potential of Big Data or used it to predict election results (Gayo-Avello, 2012) and stock market fluctuations (Bollen et al., 2011), or as an addition to crisis management (Zook et al., 2010).

Kitchin (2013) argues that geographers and GIS researchers alike do not have the training and toolset required to work with these datasets. Furthermore, methods for analysing these datasets for geographic purposes are underdeveloped (Sui & Goodchild, 2011).

Grimmer (2015) argues for the use of machine learning algorithms. It has been shown that they can be highly effective at handling large datasets, as they can infer causal relations within the data. However, they are not without their problems. Algorithms are usually a black box: something is put in, and the solution rolls out. The decision tree behind the solution is as complex as the causal relation itself. It is argued that for proper research, the algorithm should be analysed thoroughly to find out how the decision came to be (Shalev-Shwartz & Ben-David, 2014).

2.2.1. Twitter

The data source for this research is Twitter, a platform on which users post microblogs. These microblogs carry a lot of information, which makes them ideal for research. However, the source is not without its flaws. As identified by Gayo-Avello (2012), researchers generalize from their group of Twitter users to a larger population without acknowledging that this subsample of the population is inherently biased. Furthermore, self-selection bias is ignored in most research. Tweets are sent only by those who are willing to create a tweet; the data is therefore produced by users who are already politically active.

Scholars, especially those of COSMOS (Collaborative Online Social Media ObServatory), are actively researching the differences between Twitter demography and the actual demography. Their findings and methods are crucial to the field of Twitter analysis in demarcating the boundaries of future research. Sloan et al. (2015) use the biography of the Twitter user to infer information on their occupation. A biography can describe what a user does on a daily basis; on this assumption they created a text classifier to categorize the users into NS-SEC groups, a standard grouping of occupations based on socio-economic standards. They found that NS-SEC groups which are easier to describe, such as engineer, doctor or carpenter, were classified with higher accuracy. As such, these groups were overrepresented in the analysis. After occupation, age was determined using the same biography, by interpreting integers before or after words like ‘age’, ‘I am’ or ‘years’. With this analysis they found that the user population was heavily skewed towards younger ages, between 15 and 25. However, they concede that younger people are more likely to state their age in a public biography. In their conclusion they note that better methods for Twitter user demographic analysis are possible and that further research in this field will continue.

Studies using Twitter are all experimenting with methodologies that can best capture the potential of the information for their research. No doubt better methodologies will arise to show the demographic characteristics of Twitter users, and these would become a standard in Twitter-based research to explain the demography. Until that point, though, definitive statements from Twitter research are of limited value. The shortcomings of Twitter data lie, on the one hand, in a lack of proper methodologies to extract the best information for geographical inquiry and, on the other hand, in a shortage of demographic metadata. Even though methodologies exist to derive additional user demographic information from Twitter biographies, these methodologies are also lacking.

This thesis will therefore limit its scope to tweet text classification in line with the scales of meaning. The main hypothesis is that different scales are present throughout the Netherlands in the Twitter gas-extraction discourse. On the one hand, the lack of demographic information limits the extent to which the results can be generalized. On the other hand, the thesis demonstrates how Twitter is used and what Twitter data can represent.


3. Methodology

Acquiring, handling and presenting the data is elaborated upon in this methodology, as well as the operationalization of the concepts. First, the data itself is explained. Next, the data acquisition for all sources is elaborated upon, as well as the pre-processing necessary for the data. Finally, the steps used for data analysis are explained.

3.1. Operationalizing the main concepts

Table 1. Conceptualization table.

The data used in the thesis consists of Twitter discourse, defined as Dutch tweets containing ‘gas-extraction’ and/or ‘earthquake Groningen’ in the Netherlands, from the first of January 2010 to the first of May 2019. There were no tweets before 2010 in the discourse. The tweets were then classified according to the scales of meaning, a theory which hypothesizes that different geographical scales are present in discourses (Towers, 2000; Kurtz, 2003). The scales chosen for this thesis are Global, National and Local.

The tweets of the three different scales were counted per municipality and combined with a CBS (Centraal Bureau voor de Statistiek) municipality file for 2017 (acquired from CBS Statline in 2019). This file has more information on each region: inhabitant size, postal code density and inhabitant density. This is useful for the analysis of determinants and the rural-urban division.
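The counting and joining step described above can be sketched as follows. This is a minimal illustration, not the code used in the thesis: the tweets, municipality attributes and field names (`inhabitants`, `address_density`) are invented for the example, and only the threshold of 1000 addresses per square km is taken from Table 1.

```python
from collections import Counter

# Hypothetical classified tweets as (municipality, scale) pairs.
tweets = [
    ("Groningen", "local"), ("Groningen", "local"), ("Groningen", "national"),
    ("Amsterdam", "national"), ("Amsterdam", "global"), ("Loppersum", "local"),
]

# Hypothetical CBS-style attributes per municipality (all values invented).
cbs = {
    "Groningen": {"inhabitants": 200000, "address_density": 3000},
    "Amsterdam": {"inhabitants": 850000, "address_density": 6000},
    "Loppersum": {"inhabitants": 10000, "address_density": 200},
}

URBAN_THRESHOLD = 1000  # addresses per square km, as in Table 1

# Count tweets per (municipality, scale) combination.
counts = Counter(tweets)

rows = {}
for name, attrs in cbs.items():
    per_scale = {s: counts[(name, s)] for s in ("local", "national", "global")}
    total = sum(per_scale.values())
    rows[name] = {
        **per_scale,
        "total": total,
        # Normalize over inhabitants (tweets per 1000 inhabitants).
        "per_1000": 1000 * total / attrs["inhabitants"],
        # Table 1 does not say how exactly 1000 is treated; here it counts as rural.
        "type": "urban" if attrs["address_density"] > URBAN_THRESHOLD else "rural",
    }
```

Each entry in `rows` then holds the three scale counts, the normalized total and the rural or urban label for one municipality.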

| Concept | Dimensions | Indicator | Variable (unit) |
|---|---|---|---|
| Scales of Meaning in the Groningen Gas-extraction and Earthquake discourse | Local, National and Global | Gas-extraction and earthquake related Tweets in the Netherlands, classified with a supervised learning algorithm | Number of tweets in the dataset relating to each scale per municipality |
| Rural-Urban | Rural | CBS OAD | Less than 1000 addresses per square km in a municipality |
| Rural-Urban | Urban | CBS OAD | More than 1000 addresses per square km in a municipality |
| Determinants | Inhabitant size | Number of inhabitants per municipality | CBS inhabitant size |
| Determinants | Earthquakes | Time and intensity of earthquakes | KNMI earthquakes measured in the province of Groningen |
| Determinants | Earthquakes | Distance to earthquakes per municipality | Centroid of each municipality to the mean centroid of all |


Earthquake data from the Koninklijk Nederlands Meteorologisch Instituut (KNMI, 2019) was also added. This has a date and time signature to visualize the relation between Twitter activity and earthquake occurrences. This data is also used to show the distance between municipalities and the earthquake-affected area.
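One way to compute such a distance is sketched below. It assumes that locations are given as latitude and longitude in degrees, that the mean earthquake location is a simple arithmetic mean of the epicentre coordinates, and that great-circle (haversine) distance is an acceptable measure; the thesis does not specify these details, and all coordinates are invented or approximate.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two points in degrees."""
    earth_radius_km = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * earth_radius_km * math.asin(math.sqrt(a))

# Hypothetical earthquake epicentres (lat, lon); values are invented.
quakes = [(53.35, 6.67), (53.30, 6.75), (53.25, 6.80)]
mean_lat = sum(lat for lat, _ in quakes) / len(quakes)
mean_lon = sum(lon for _, lon in quakes) / len(quakes)

# Approximate municipality centroids (lat, lon), for illustration only.
municipalities = {"Groningen": (53.22, 6.57), "Amsterdam": (52.37, 4.90)}

# Distance from each municipality centroid to the mean earthquake location.
distances = {
    name: haversine_km(lat, lon, mean_lat, mean_lon)
    for name, (lat, lon) in municipalities.items()
}
```

In a GIS workflow the same distance would normally be computed in a projected coordinate system; the haversine formula is used here only to keep the sketch free of dependencies.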

3.2. Twitter Data Acquisition

The following section explains the process of the Twitter data acquisition. The goal of this section is to provide a comprehensive explanation of how Twitter data can be acquired. The code used for each step is presented in Code Boxes, found in the Appendix. Below each code box is a short explanation of what the code means. For this research Python was used, a programming language widely used in the scientific community. Python is relatively easy to learn and offers suitable functionality for acquiring the data, as well as for categorizing and analysing it (Oliphant, 2007).

Twitter provides open access to their API (Application Programming Interface). This is a code-based online service with which developers can create apps with Twitter integration (Twitter, 2019). Its usefulness for research comes in the form of the Twitter archive. The archive stores all tweets, which can be acquired with queries from programming languages that can access the API (e.g., R, Matlab, Python). A query is an elaborate search term which the API can understand and respond to accordingly (see Code Box 1, Appendix 1 for the query).

The API responds to the query with 100 tweets, including all metadata (such as the location of the tweet, time, username and user biography). Code Box 2 (see Appendix 2) shows how this is implemented in the code. For every page of results from the API a new .csv file (comma separated values, a file format for datasets) is created to store the 100 tweets. Each file gets an assigned number, so as not to overwrite the previous files. This method yielded a total of 4796 tweets (‘gas-extraction’ yielded an N of 3605, ‘earthquake Groningen’ an N of 1191).
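The paging logic described above can be illustrated without contacting Twitter. In the sketch below, `fake_fetch_page` is a hypothetical stand-in for the API call (it is not part of any Twitter library), and the file naming scheme is an assumption; the actual implementation is the one in Code Box 2.

```python
import csv
import tempfile
from pathlib import Path

def fake_fetch_page(query, page, page_size=100):
    """Hypothetical stand-in for an API call; returns at most page_size tweets."""
    total = 250  # pretend the archive holds 250 matching tweets
    start = page * page_size
    return [
        {"id": i, "text": f"{query} tweet {i}", "location": "Groningen"}
        for i in range(start, min(start + page_size, total))
    ]

def download(query, out_dir, fetch_page=fake_fetch_page):
    """Store every page of results in its own numbered .csv file."""
    out_dir = Path(out_dir)
    written = []
    page = 0
    while True:
        tweets = fetch_page(query, page)
        if not tweets:  # an empty page means there are no more results
            break
        # The page number in the file name prevents overwriting earlier files.
        path = out_dir / f"{query}_{page:04d}.csv"
        with path.open("w", newline="", encoding="utf-8") as fh:
            writer = csv.DictWriter(fh, fieldnames=list(tweets[0].keys()))
            writer.writeheader()
            writer.writerows(tweets)
        written.append(path)
        page += 1
    return written

files = download("gas-extraction", tempfile.mkdtemp())
```

With the stand-in above, the call produces three files: two holding 100 tweets each and a final one holding the remaining 50.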

The tweets were stored per 100 tweets in .csv files. Code Box 3 (see Appendix 3) shows the code for the pre-processing. This code combines the .csv files into a single .csv file and a pandas dataframe. Furthermore, the pre-processing functions were applied, which cleaned the data somewhat and removed excess columns. The main dataset now holds the 4796 tweets in a single file, ready for classification.
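A minimal sketch of this merge step with pandas is given below. The file names, the column names and the removal of duplicate tweet ids are assumptions made for the example; the cleaning functions actually used are those in Code Box 3.

```python
import glob
import os
import tempfile

import pandas as pd

# Create two small example files standing in for the per-page .csv files.
tmp = tempfile.mkdtemp()
pd.DataFrame({"id": [1, 2], "text": ["a", "b"], "extra": [0, 0]}).to_csv(
    os.path.join(tmp, "tweets_0000.csv"), index=False
)
pd.DataFrame({"id": [2, 3], "text": ["b", "c"], "extra": [0, 0]}).to_csv(
    os.path.join(tmp, "tweets_0001.csv"), index=False
)

# Read every page file and stack them into one dataframe.
paths = sorted(glob.glob(os.path.join(tmp, "tweets_*.csv")))
combined = pd.concat([pd.read_csv(p) for p in paths], ignore_index=True)

# Drop a column that is not needed and remove tweets that occur twice
# (both choices are assumptions for this illustration).
combined = combined.drop(columns=["extra"]).drop_duplicates(subset="id")

# Store the result as a single .csv file.
combined.to_csv(os.path.join(tmp, "all_tweets.csv"), index=False)
```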

3.3. Tweet Classification

In order to analyse tweets, it is crucial to categorize the tweets. For this research the categories are the local, national and global scales.

The most common way of analysing text data is to create an algorithm that does the sorting, as sorting manually takes a long time due to the amount of data. An algorithm ‘learns’ a pattern in text. With the pattern in mind it can classify any tweet. The first step for this research was teaching an algorithm to classify the Twitter dataset according to the different scales. This is called supervised classification, where the algorithm is given categories which it needs to learn to recognize.

Before the algorithm could learn, it needed classified text to learn from. This is called a training set. For this research, a number of tweets were manually categorized into local, national and global scales. Tweets were classified when they mentioned certain aspects of the debate; the precise aspects can be found in Table 2. In total, 1113 tweets were manually classified. As Table 2 shows, the global scale is not mentioned often in the gas-extraction debate on Twitter. This makes teaching the algorithm the correct patterns of text harder for the global scale. Local and national scale tweets are far more common in the dataset.

Table 2. Manually classified tweets in the training and testing set

The next step was to preprocess the textual data in the tweet text column. Preprocessing consists of cleaning up the text and then converting it to numbers. The text is cleaned with regular expressions, a powerful set of rules with which to select or delete parts of the text within a column. Code Box 4 (see Appendix 4) shows and explains the expressions.
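As an illustration of cleaning with regular expressions, the sketch below removes links, user mentions, punctuation and digits from a tweet. The patterns are assumptions chosen for this example and are not the expressions from Code Box 4.

```python
import re

URL = re.compile(r"https?://\S+")       # links
MENTION = re.compile(r"@\w+")           # user mentions
NON_LETTER = re.compile(r"[^\w\s]|[\d_]")  # punctuation, digits, underscores
WHITESPACE = re.compile(r"\s+")         # runs of whitespace

def clean_tweet(text):
    """Lowercase a tweet and strip links, mentions, punctuation and digits."""
    text = text.lower()
    text = URL.sub(" ", text)
    text = MENTION.sub(" ", text)
    text = NON_LETTER.sub(" ", text)
    return WHITESPACE.sub(" ", text).strip()

example = "RT @user: Schade aan huizen in #Groningen! https://t.co/abc 3.6"
cleaned = clean_tweet(example)  # "rt schade aan huizen in groningen"
```

The order of the substitutions matters: links and mentions are removed before punctuation, because otherwise the `@` and `/` characters that identify them would already be gone.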

For the algorithm to find relations between a sentence and the scale, it applies statistical methods; the sentences are therefore converted to 0’s and 1’s (Pavlidis, 1986). Python has a module that does this automatically, called a count vectorizer. A count vectorizer places all words in the data in sequence and counts the occurrences of each word per tweet. Within this vectorizer, Dutch stop words³ were removed from the data. Furthermore, the dataset was split into two sections: one for training (80%) and one for testing (20%).
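The vectorizing and splitting step might look like the sketch below. It assumes that the count vectorizer is scikit-learn's `CountVectorizer`, which the thesis does not state explicitly; the tweets, labels and the abbreviated stop word list are invented for the illustration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split

# Invented, already cleaned example tweets with manual labels.
texts = [
    "schade aan huizen in groningen",
    "aardbeving schade in loppersum",
    "groningers melden schade aan het huis",
    "scheuren in de muur na de beving",
    "minister kamp verlaagt de gaswinning",
    "kabinet besluit over de gaswinning",
    "tweede kamer debat over gas en economie",
    "de regering en de economie",
    "klimaat en duurzame toekomst",
    "gas uit rusland en het klimaat",
]
labels = ["local"] * 4 + ["national"] * 4 + ["global"] * 2

# Abbreviated, illustrative stop word list (not the list from Appendix 7).
dutch_stop_words = ["de", "het", "een", "en", "in", "aan", "over", "uit", "na"]

# Each row is a tweet, each column a word, each cell the word count.
vectorizer = CountVectorizer(stop_words=dutch_stop_words)
X = vectorizer.fit_transform(texts)

# 80% of the tweets for training, 20% for testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, labels, test_size=0.2, random_state=0
)
```

`CountVectorizer` stores word counts by default; passing `binary=True` would store only the presence or absence of a word as 1 or 0.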

With this cleaned data the algorithm could learn. The chosen algorithm is a Random Forest Classifier. This classifier has been used in a large number of previous Twitter sentiment analysis studies (such as Bertrand et al., 2013). Bertrand et al. (2013) used it to extract positive and negative sentiment from tweets, to create a sentiment map for New York City. They classified their learning dataset according to emoticons and classified the other tweets with their trained algorithm.

This algorithm works by creating many decision trees, each trained on a random sample of the pre-processed and count-vectorized tweets. Together, this large number of trees forms the so-called ‘forest’. Each tree consists of branches that split on features (single words in the tweets). The features considered at each split are chosen randomly, which increases the diversity among the trees and thus the performance of the model.
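A minimal sketch of training such a classifier is shown below, assuming scikit-learn's `RandomForestClassifier` and invented example tweets. The settings shown are library defaults apart from the fixed random seed, not the tuned settings of the thesis.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import accuracy_score

# Invented, already cleaned example tweets with manual labels.
train_texts = [
    "schade aan huizen in groningen",
    "aardbeving schade in loppersum",
    "groningers melden schade",
    "minister kamp verlaagt gaswinning",
    "kabinet besluit over gaswinning",
    "tweede kamer debat over economie",
    "klimaat en duurzame toekomst",
    "gas uit rusland en klimaat",
    "duurzame energie voor de toekomst",
]
train_labels = ["local"] * 3 + ["national"] * 3 + ["global"] * 3
test_texts = ["schade in groningen", "kabinet en economie", "klimaat en toekomst"]
test_labels = ["local", "national", "global"]

# Learn the vocabulary on the training tweets only, then reuse it for testing.
vectorizer = CountVectorizer()
X_train = vectorizer.fit_transform(train_texts)
X_test = vectorizer.transform(test_texts)

# 100 trees; each tree sees a bootstrap sample of the tweets and a random
# subset of the words at every split.
forest = RandomForestClassifier(n_estimators=100, random_state=0)
forest.fit(X_train, train_labels)

predicted = forest.predict(X_test)
accuracy = accuracy_score(test_labels, predicted)
```

The final prediction for a tweet is the class that receives the most support across all trees, which is why a forest is usually more stable than a single decision tree.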

The algorithm was tuned with a randomized search over its settings (Code Box 5, Appendix 5). The accuracy of this model is presented in Table 3. Accuracy is tested with the test dataset⁴ and the accuracy score function, which compares the algorithm's prediction to the manual classification. These are completely new tweets to the algorithm. The model seems proficient in classifying national scales, because more tweets in the dataset relate to the national scale. At the opposite end is the global scale, with low scores across the board, due to the small amount of testing data. The local scale sits in between. To improve the model significantly, more data needs to be classified manually.

| Scale | Tweets mentioning (and/or) |
|---|---|
| Local (n = 408) | ‘housing damage’; ‘earthquake damage’; ‘groningers’; RTV Noord news items on damage |
| National (n = 575) | National politics: persons, policy, news items; ‘economy’ |
| Global (n = 130) | European politicians: Merkel, May, Putin; ‘Sustainable action’; ‘Russia’; ‘Climate Change’; ‘environmental activism’ |

3 Stop words are words that occur a lot but do not carry a lot of inherent meaning, like ‘and’, ‘it’, ‘if’, ‘yes’. For the complete list, see Appendix 7. ‘Gas-extraction’ was also present in most tweets and was therefore removed as well.

4 The test dataset is 20% of the manually classified tweets; the other 80% was used for the learning algorithm.
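The randomized search for settings mentioned above can be sketched with scikit-learn's `RandomizedSearchCV`. The parameter ranges, the number of iterations and the toy data are assumptions for this illustration; the search actually used is the one in Code Box 5.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

# Toy count vectors: rows are tweets, columns are word counts (invented).
X = [
    [2, 0, 0], [1, 0, 1], [3, 0, 0], [1, 1, 0],  # local
    [0, 2, 0], [0, 1, 0], [1, 3, 0], [0, 2, 1],  # national
    [0, 0, 2], [0, 1, 3], [0, 0, 1], [1, 0, 2],  # global
]
y = ["local"] * 4 + ["national"] * 4 + ["global"] * 4

# Candidate settings; a random subset of the combinations is tried.
param_distributions = {
    "n_estimators": [10, 25, 50],
    "max_depth": [None, 3, 5],
    "min_samples_split": [2, 4],
}

search = RandomizedSearchCV(
    RandomForestClassifier(random_state=0),
    param_distributions=param_distributions,
    n_iter=4,       # number of random combinations to evaluate
    cv=2,           # two-fold cross-validation on the training data
    random_state=0,
)
search.fit(X, y)

best_settings = search.best_params_
```

A randomized search evaluates only a sample of the possible combinations, which keeps the tuning time manageable when the number of settings is large.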

Table 3. Accuracy report of the Random Forest Classifier on the testing set (n = 192 tweets). The first column lists the categories and the average score of the whole model. Precision is the ability of the classifier not to label a negative sample as positive, that is, the share of tweets assigned to a category that truly belong to it. Recall is the ability of the classifier to find all positive samples. The F1-score is the harmonic mean of precision and recall. Support is the number of test tweets per category.

Overall, the algorithm seemed fairly proficient at classification. This is similar to findings from Hahmann, Purves & Burghardt (2014), who mention that scores of text classifiers often hit a ceiling around 70%.
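The measures reported in Table 3 can be made concrete with a small sketch. The function names and the toy labels below are illustrative assumptions, not the thesis' code; the definitions themselves are the standard ones for accuracy, precision, recall and the F1-score.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that equal the manual classification."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def precision_recall_f1(y_true, y_pred, label):
    """Precision, recall and F1 for one category."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == label and p == label)
    fp = sum(1 for t, p in pairs if t != label and p == label)
    fn = sum(1 for t, p in pairs if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall, f1_score(precision, recall)


# Toy example (hypothetical labels, not thesis data).
y_true = ["local", "local", "national", "national", "national", "global"]
y_pred = ["local", "national", "national", "national", "local", "national"]

overall = accuracy(y_true, y_pred)
national = precision_recall_f1(y_true, y_pred, "national")

# Applying the harmonic mean to the rounded local-scale values in Table 3.
local_f1 = f1_score(0.72, 0.49)
```

Applied to the rounded precision (72%) and recall (49%) of the local scale, the harmonic mean gives roughly 58%, which agrees with the F1-score in the table.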

3.4. Opening a black box

Understanding how the algorithm makes the predictions on scale is a science in and of itself. However, the Random Forest Classifier has a few features which help this process. For this thesis two aspects will be highlighted. One is how the algorithm values the different words; the other aspect is how the model deals with double scales in a tweet.

The categorized bar chart above shows the 20 most important words⁵ for categorizing the tweets, as identified by the algorithm. Feature importance indicates how strongly a word influences the predictions of the model. Sadly, it does not show towards which scale the importance of a word counts. This could be overcome by training a separate classifier for each scale, which is beyond the scope of this thesis.

However, educated guesses on the different scales can be made. ‘Future’, ‘sustainable’ and ‘climate’ (‘toekomst’, ‘duurzaam/duurzame’ and ‘klimaat’) are all present in this list, showing that the classifier uses the correct terms for the global scale. Furthermore, ‘Minister Kamp’, ‘VVD’, ‘government’ and ‘politics’ (‘kamp’, ‘kabinet’, ‘politiek’) most likely belong to the national scale. The local scale is represented by ‘house’, ‘earthquake’, ‘loppersum’ and ‘groningers’. Most other words stem from the manually classified tweets, such as the usernames ‘earthmattersnl’ and ‘politiekinnederland’, which were often mentioned in the tweets besides the actual content. Furthermore, topics that were current in the period covered by the manual classification are overrepresented, for example ‘profiteering’, which was a trending hashtag for a short period. This shows that manual classification was the most crucial step for the algorithm: a solid foundation for the classification will heavily improve the accuracy of the model.

5 Every unique word has its own score; the full list has around 300 items.

Table 3, values (accuracy report on the testing set):

| Category | Precision | Recall | F1-score | Support |
|---|---|---|---|---|
| Global | 70% | 32% | 44% | 6 |
| Local | 72% | 49% | 58% | 69 |
| National | 64% | 88% | 75% | 117 |
| Average/total | 67% | 67% | 67% | 192 |

To understand how the algorithm deals with multiple scales in a tweet, four test tweets were written: three that combine two scales and one that combines all three scales. The same pre-processing and prediction steps were applied (see Appendix 7 for these tweets and the outcome of the prediction). All four tweets were classified as local tweets, which is accurate for three out of four of the tweets.

More interesting, however, are the odds the model gives each category, which are determined by the feature importance shown above. One would expect the scales that are present in a tweet to have the highest odds, and the tweet with all three scales to have roughly equal odds. This is largely the case. The most extreme example is the local-global tweet, with a confidence of 48% local and 42% global. The national-global tweet is also interesting: it has a confidence level of 38% local, with the other two scales trailing close by, but lower nonetheless. This shows that the algorithm can classify a single scale, and that it also gets the double scales mostly correct.
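How a forest arrives at such per-category confidences can be sketched as follows. This is a simplification under stated assumptions: it counts one hard vote per tree, whereas implementations such as scikit-learn average the class probabilities of the trees. The 100 trees and the 10% national share are hypothetical; only the 48% and 42% figures come from the text above.

```python
from collections import Counter


def forest_probabilities(tree_votes, classes):
    """Share of trees voting for each class (simplified hard-vote version)."""
    counts = Counter(tree_votes)
    total = len(tree_votes)
    return {c: counts[c] / total for c in classes}


classes = ["local", "national", "global"]

# Hypothetical forest of 100 trees reproducing the local-global test tweet.
tree_votes = ["local"] * 48 + ["global"] * 42 + ["national"] * 10

probabilities = forest_probabilities(tree_votes, classes)
predicted = max(probabilities, key=probabilities.get)
```

The predicted label is simply the class with the highest share, which is why a tweet with two almost equally strong scales still receives a single label.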

The trained algorithm can be reused to classify a dataset that has not yet been classified, just as was done for the four test tweets. It was applied to the other 3650 tweets, as can be seen in Code Box 6 (see appendix 6). In this dataset, 1762 tweets were classified as local, 233 as global and 2801 as national.

3.5. From Tweets to GIS and beyond

Tweets have different forms of location metadata. The first is coordinates, which users themselves can turn on or off; according to Li and Goodchild (2011), only 5% of users have this feature turned on. The second is a place name, for which Twitter uses a fuzzy selection method. The place name metadata was used, because each tweet has it. A new table was made with the location metadata against the occurrences of local, national and global tweets. The exact process can be seen in Code Box 6 (see appendix 6).
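The table of place names against scale occurrences can be sketched as below. The input format (pairs of place name and predicted scale) and the example rows are assumptions for illustration; the thesis' actual procedure is in Code Box 6.

```python
from collections import Counter, defaultdict


def scale_counts_by_location(tweets):
    """Count local, national and global tweets per place name."""
    table = defaultdict(Counter)
    for place, scale in tweets:
        table[place][scale] += 1
    return table


# Hypothetical (place name, predicted scale) pairs.
tweets = [
    ("Groningen", "local"),
    ("Groningen", "national"),
    ("Groningen", "local"),
    ("Loppersum", "local"),
    ("Amsterdam", "national"),
]

table = scale_counts_by_location(tweets)

# One row per location: (place, local, national, global).
rows = [
    (place, counts["local"], counts["national"], counts["global"])
    for place, counts in sorted(table.items())
]
```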

To create a map of the tweets, the 2017 municipality shapefile was taken from CBS StatLine (2017). This file offers the best basic match between municipality names and Twitter location names. However, the municipalities in the Netherlands change somewhat every year. Twitter does not update its locations accordingly, but uses its own definitions, which are unknown to the public.

With a simple join on Twitter names and CBS names, only 70% of the locations find a match. This is due to differences in precision. A handful of tweets are more precise than the municipal level and have an address or building name as location. Other tweets are less precise and show a province or ‘The Netherlands’ as location. The less precise tweets were dropped from the dataset, except for Groningen and Utrecht: these provinces share their name with a municipality and are impossible to distinguish from it. The more precise tweets were manually located and scaled up to the municipal level. Luckily, there are only around 350 municipalities, which kept this manual work feasible. Of these, 283 municipalities have tweets originating from them; the other 70 are labelled as No Data. This dataset can be joined to the CBS location file using ArcGIS. If the sample were larger (>1000 locations), more sophisticated fuzzy matching methods could have been used to bridge the gap between the Twitter location definitions and the CBS ones.
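The join described above, including the kind of fuzzy matching that was not used in the thesis, could look roughly like the following sketch. It uses `difflib` from the Python standard library as a stand-in for a fuzzy matching method; the name lists and the cutoff of 0.8 are hypothetical choices.

```python
import difflib


def normalize(name):
    """Lower-case and trim a place name before comparing."""
    return name.strip().lower()


def match_locations(twitter_names, cbs_names, cutoff=0.8):
    """Exact join first, then a fuzzy fallback; returns (matches, unmatched)."""
    lookup = {normalize(n): n for n in cbs_names}
    matches, unmatched = {}, []
    for name in twitter_names:
        key = normalize(name)
        if key in lookup:
            matches[name] = lookup[key]
            continue
        close = difflib.get_close_matches(key, list(lookup), n=1, cutoff=cutoff)
        if close:
            matches[name] = lookup[close[0]]
        else:
            unmatched.append(name)
    return matches, unmatched


# Hypothetical name lists for illustration.
cbs_names = ["Groningen", "Utrecht", "Loppersum", "'s-Hertogenbosch"]
twitter_names = ["Groningen", "s-Hertogenbosch", "The Netherlands"]

matches, unmatched = match_locations(twitter_names, cbs_names)
```

Names that stay unmatched correspond to the locations that were either dropped or located manually in the thesis.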

Python plot constructing

Two plots were created in Python: a scatterplot of tweets against the distance to the earthquake area, and a plot of tweets and earthquakes over time. GIS functionality was used to construct the earthquake and distance variables. The Dutch weather institute provides a dataset on all seismic activity in and around the Netherlands (KNMI, 2019). It publishes this data as a netCDF file, a format used in GIS data publishing, so GIS software was used to extract the useful information. The data was opened with a selection of date-time and magnitude per earthquake. After opening, the data was reduced to earthquakes since 2010, to match the timeframe of the tweets. Furthermore, all earthquakes not near the gas field area were removed, such as seismic activity in Limburg, Germany and the North Sea. This selection was exported as a table for the time plot.

The same selection was also used for the distance to earthquake area plot. With the mean centre function in ArcMap, a mean centre point of the earthquakes was calculated. The centroid of every municipality was calculated as well. The distance between each municipal centroid and the mean centre was then calculated and added to the table, giving a distance to the earthquake area per municipality.
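The distance variable can be approximated outside ArcMap with the sketch below. It assumes an unweighted mean centre and projected coordinates in metres (for example the Dutch RD New system), so that straight-line distance is meaningful; all coordinates are hypothetical.

```python
import math


def mean_centre(points):
    """Unweighted mean centre of (x, y) points."""
    xs = [x for x, _ in points]
    ys = [y for _, y in points]
    return sum(xs) / len(xs), sum(ys) / len(ys)


def distance_km(a, b):
    """Straight-line distance in km between two projected points in metres."""
    return math.hypot(a[0] - b[0], a[1] - b[1]) / 1000.0


# Hypothetical earthquake epicentres and municipal centroids (metres).
epicentres = [(240000, 590000), (250000, 600000), (245000, 595000)]
centroids = {"Municipality A": (245000, 625000), "Municipality B": (155000, 475000)}

centre = mean_centre(epicentres)
distances = {name: distance_km(xy, centre) for name, xy in centroids.items()}
```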

In Python, the total number of tweets was plotted over time, and the earthquake magnitudes were added to the same plot. However, this gave a messy plot, as there were a lot of earthquakes. Every earthquake below 2.5 on the Richter scale was therefore removed from the plot.
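The selection of earthquakes for the time plot amounts to two filters, sketched below. The record layout is an assumption and the spatial selection around the gas field is taken to have happened already. Apart from the 2012 Huizinge earthquake (magnitude 3.6, mentioned in the results), the records are made up.

```python
from datetime import date


def select_earthquakes(quakes, start=date(2010, 1, 1), min_magnitude=2.5):
    """Keep earthquakes on or after `start` with at least `min_magnitude`."""
    return [
        q for q in quakes
        if q["date"] >= start and q["magnitude"] >= min_magnitude
    ]


quakes = [
    {"place": "Huizinge", "date": date(2012, 8, 16), "magnitude": 3.6},
    {"place": "Example A", "date": date(2009, 5, 8), "magnitude": 3.0},   # too early
    {"place": "Example B", "date": date(2014, 2, 13), "magnitude": 1.8},  # too weak
]

selected = select_earthquakes(quakes)
```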

The distance to the earthquake area and the number of tweets in each municipality were plotted in a scatterplot and a regression plot; the regression plot shows the scattered dots and a regression line. In SPSS⁶ a Spearman’s Rho correlation was calculated. Spearman’s Rho shows the strength of association between two ranked (ordinal) variables.
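Spearman's Rho was calculated in SPSS, but the statistic itself is simple to reproduce: it is the Pearson correlation of the ranks of both variables. The sketch below uses average ranks for ties; the distances and tweet rates are hypothetical.

```python
import math


def average_ranks(values):
    """Rank values from 1..n, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        rank = (i + j) / 2 + 1  # average of the 1-based positions i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman_rho(x, y):
    """Spearman's Rho: Pearson correlation of the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical data: distance (km) and tweets per 1000 inhabitants.
distance = [5, 20, 60, 120, 200]
tweets_per_1000 = [9.0, 4.0, 1.5, 0.6, 0.2]
rho = spearman_rho(distance, tweets_per_1000)
```

Because the hypothetical tweet rate falls strictly with distance, the ranks are perfectly reversed and Rho is -1; real data gives weaker values, such as those reported in Appendix 14.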

Rural Urban bar chart

The exported data file from which the maps were made was opened in Python. CBS (2019) denotes areas with an OAD (omgevingsadressendichtheid, the surrounding address density) below 1000 as low density (500-1000) or non-urban (less than 500), and everything higher as mildly to very dense urban areas. All municipalities with an OAD value greater than 1000 were therefore placed in a variable Urban, and all municipalities with lower values in a variable Rural. Every column except the scales and the scales per 1000 inhabitants was removed. The remaining columns were aggregated to the Rural and Urban level. These bar charts are presented in Appendix 9.
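The rural-urban split can be sketched as follows. The field names and the example municipalities are hypothetical, and the aggregation shown (summing tweets and inhabitants per group before normalizing per 1000 inhabitants) is one plausible reading of the step described above, not necessarily the exact procedure used.

```python
SCALES = ("local", "national", "global")


def aggregate_rural_urban(municipalities, threshold=1000):
    """Tweets per 1000 inhabitants per scale for Rural and Urban groups."""
    totals = {
        group: {"inhabitants": 0, **{s: 0 for s in SCALES}}
        for group in ("Rural", "Urban")
    }
    for m in municipalities:
        # OAD above the threshold counts as urban, everything else as rural.
        group = "Urban" if m["oad"] > threshold else "Rural"
        totals[group]["inhabitants"] += m["inhabitants"]
        for scale in SCALES:
            totals[group][scale] += m[scale]
    result = {}
    for group, t in totals.items():
        if t["inhabitants"] == 0:
            continue  # skip a group without municipalities
        result[group] = {s: 1000 * t[s] / t["inhabitants"] for s in SCALES}
    return result


# Hypothetical municipalities.
municipalities = [
    {"oad": 2500, "inhabitants": 200000, "local": 100, "national": 300, "global": 20},
    {"oad": 400, "inhabitants": 10000, "local": 40, "national": 10, "global": 0},
    {"oad": 800, "inhabitants": 30000, "local": 20, "national": 10, "global": 0},
]

per_group = aggregate_rural_urban(municipalities)
```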

GIS Map constructing

In this thesis there are three categories of maps that examine spatial patterns of tweets. The first looks at both the total gas-extraction related tweets and the tweets separated by scales of meaning, in terms of tweets per number of inhabitants per municipality. The second looks at the share of the different scales of meaning as a percentage of all gas-extraction related tweets. The last presents the tweets through hot spot maps, which show a clustering of higher and lower values in the data. This section will elaborate on the construction of these maps.

Counting all scales in the dataset together gives the total tweets from an area. The total number of tweets originating from an area is mostly determined by how many inhabitants there are: more inhabitants, more potential Twitter users. To combat this bias, the maps show the count of tweets per 1000 inhabitants. The total number of tweets and the number of tweets per scale are all shown by dividing the tweets by the number of inhabitants and multiplying by 1000. Percentage shares were calculated by dividing the tweets of a scale by the total tweets and multiplying this number by 100.
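Both normalizations are one-line formulas. The sketch below applies them to figures that appear elsewhere in this thesis: the 61 tweets and 24 529 inhabitants of Baarn, and the 233 global tweets out of the 4796 classified tweets (1762 local, 2801 national, 233 global). The function names are illustrative.

```python
def per_1000_inhabitants(tweets, inhabitants):
    """Tweets normalized per 1000 inhabitants."""
    return tweets / inhabitants * 1000


def percentage_share(scale_tweets, total_tweets):
    """Share of one scale in all tweets, as a percentage."""
    return scale_tweets / total_tweets * 100


baarn_rate = per_1000_inhabitants(61, 24529)

total_classified = 1762 + 2801 + 233
global_share = percentage_share(233, total_classified)
```

For Baarn this gives roughly 2.5 tweets per 1000 inhabitants, and the global share comes out just under 5%.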

Hot spot maps were created with the Getis-Ord Gi* method (Ord & Getis, 1992). Getis-Ord Gi* is an analysis method in which the data from each vector⁷ is compared to the data of the neighbouring vectors. This assumes a certain contextual overlap between these areas, which can possibly explain the presence of a hot spot. A confidence level is calculated and represented for each hot spot. In an inverse manner, cold spots are calculated, which are the opposite of a hot spot. Hot and cold spots show the confidence in the significance of a value compared to its neighbouring areas, creating a relative measure. With this method, four maps were created: the total tweet density and the three scales. The data used for these maps is the percentage share of each scale.
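The Gi* statistic compares the sum of values in a neighbourhood with what would be expected from the overall mean, and expresses the difference as a z-score. The sketch below is a minimal version under stated assumptions: binary weights, each area counted as its own neighbour, and a hypothetical row of seven areas. The maps in this thesis were made in ArcGIS, whose neighbourhood settings are not reproduced here.

```python
import math


def getis_ord_gi_star(values, neighbours):
    """Gi* z-score per area, with binary weights and the area itself included.

    Assumes the values are not all equal and that no area neighbours
    every other area (both would make the denominator zero).
    """
    n = len(values)
    mean = sum(values) / n
    s = math.sqrt(sum(v * v for v in values) / n - mean ** 2)
    scores = []
    for i in range(n):
        members = set(neighbours[i]) | {i}
        w_sum = len(members)      # sum of binary weights
        w_sq_sum = len(members)   # sum of squared binary weights
        local_sum = sum(values[j] for j in members)
        numerator = local_sum - mean * w_sum
        denominator = s * math.sqrt((n * w_sq_sum - w_sum ** 2) / (n - 1))
        scores.append(numerator / denominator)
    return scores


# Hypothetical row of seven areas: high values on the left, low on the right.
values = [10, 9, 8, 1, 1, 1, 1]
neighbours = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3, 5], 5: [4, 6], 6: [5]}

scores = getis_ord_gi_star(values, neighbours)
```

Positive scores indicate a cluster of high values (a hot spot) and negative scores a cluster of low values (a cold spot); the confidence levels on the maps follow from how extreme the z-score is.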


4. Results

This section first presents the created maps and graphs. The first subsection presents the distribution of the whole dataset, called total tweets. Next, the three different scales are examined. As explained in the theoretical framework and used in the methodology, the tweets have been categorized into three different scales. Afterwards, the scales are compared using hot spot maps. Then three determinants of the spatial distribution are presented. The last subsection places the results in a theoretical context.

4.1. Total tweets

The first maps show the spatial distribution of all tweets related to the gas-extraction and earthquake discourse across the Netherlands during the period 2010-2019. As expected, tweet numbers roughly follow population sizes per municipality. All municipalities with a large number of inhabitants show a high number of tweets, as seen in the left map. Similarly, a significant Spearman’s Rho of 0.3 is found between these two variables (Appendix 14), showing a positive correlation: municipalities with more inhabitants tend to have more tweets.

The municipality with the highest number of tweets is Groningen, with 781 tweets mentioning ‘gas-extraction’ or ‘earthquake’. The second map shows the tweets normalized per 1000 inhabitants. One can see that tweets per 1000 inhabitants are higher in the northern area of the Netherlands than anywhere else. The No Data visualization shows this as well: regions further away from the earthquake area do not have any tweets coming from them. This shows that being near the origin of the discussion affects the number of tweets mentioning it.

Map 1 and 2. Gas-extraction and earthquake related tweets per municipality, 2010-2019. 1: total tweets in dataset; 2: normalized over inhabitants.


The dark blue spot in the middle of the Netherlands is Baarn. It stands out from the other municipalities due to a high number of tweets (61) for an ordinary population size (24 529 inhabitants in 2017). Looking deeper into the data reveals a very active Twitter user, who posted 60 out of the 61 tweets. A similar story occurs in Baarle-Nassau.

4.1.1. Total tweet hot spots

With the Getis-Ord Gi* method, similar conclusions can be drawn as above. A big hot spot can be seen in the northern area of the Netherlands, where tweets are significantly more present. The other hot spots of tweets can be found near Amsterdam and The Hague. No cold spots are present in the tweet distribution.

Map 3. Getis-Ord Gi* Hot and cold spots in gas-extraction and earthquake discourse tweet distribution.


4.2. Scale distribution

First the different scales will be presented individually, then in a side-by-side comparison of hot spots.

4.2.1. Local scale

First is the local scale. These tweets mentioned earthquake-induced housing damage, as well as the safety of the inhabitants in the earthquake area. Represented below is how it was classified by the algorithm and manual classification.

Map 4 and 5, local scale tweets distribution. Left is the distribution normalized per 1000 inhabitants. The right map shows the share of local tweets for all tweets in that municipality.

The left map shows that the distribution of local tweets per 1000 inhabitants roughly follows the distribution of the total tweets in the dataset. Being closer to the affected area increases the number of locally scaled tweets coming from an area.

The right map shows that the share of locally scaled tweets varies across the Netherlands. In regions further away from the gas-extraction area, people are less likely overall to tweet about it. But when they do tweet, there is no clear pattern that they are less or more likely to tweet about the local scale than people in the earthquake area.


4.2.2 National scale

Second is the national scale. These tweets mentioned national politics and national economic benefits, as well as national news items. Represented below is how it was classified by the algorithm and manual classification.

Map 6 and 7, national scale tweets distribution. Left is the distribution normalized per 1000 inhabitants. The right map shows the share of national tweets for all tweets in that municipality.

The distribution of national scale tweets per 1000 inhabitants is still concentrated mostly in the northern area, but darker red spots are present throughout the Netherlands. This scale seems to have the most interaction across the Netherlands, as most people who tweeted about the gas extraction discuss its national politics side. In a similar manner, most tweets are critical of the government and show a certain mistrust in its handling of cases such as the gas extraction. The relative distribution of tweets, shown on the right, seems quite uniform. The south-western area of the Netherlands seems to tweet more about the national scale than about the other scales.


4.2.3. Global scale

Third is the global scale. As explained in the methodology, these tweets mentioned climate change, sustainable action, Russia and EU politicians. Represented below is how it was classified by the algorithm and manual classification.

Map 8 and 9, global scale tweets distribution. Left is the distribution normalized per 1000 inhabitants. The right map shows the share of global tweets for all tweets in that municipality.

The first thing that stands out from these maps is the number of locations that have not mentioned the global scale at all. This is because this scale was mentioned a lot less altogether: around 5% of tweets were classified as global. The pattern on the left shows that the global scale is present across the country, but the lack of tweets might also show that it is not a widely used scale in the discussion.

The right map shows that three areas in the southern part of the Netherlands have a high share of global tweets. Looking further into these tweets shows that each of these areas has a single tweet originating from it. One of them mentions what the future might hold and the sustainable improvements the author made to their house. The one in Nuenen has been wrongly classified: it is a tweet about the decision of Eric Wiebes to reduce the gas extraction levels.


4.2.4. Scaled hotspot maps

Getis-Ord Gi* is an analysis method in which the data from each municipality is compared to the data of the neighbouring municipalities. Confidence levels for each municipality are represented below. A hot spot in this case means a relatively high share of tweets of a scale in the municipality, and a cold spot a relatively low share.

The local and national tweet shares exclude each other, as seen by the red areas in the local map and the blue areas in the national map. Furthermore, the global map shows a concentration of tweet shares in Limburg. This is mostly due to the low number of tweets there altogether, which makes it easier for a single scale to reach a 100% share. In this map, the northern area of Groningen is not presented as a hot spot, as the different scale shares are similar throughout the area. These findings suggest that the scales are currently distributed fairly equally across the Netherlands. This does not mean that this share is the optimal share.

4.3. Determinants

To understand the underlying factors behind the distribution of tweets, three determinants have been picked: distance to the earthquake area, a timeline and the rural-urban divide.


4.3.1. Distance

Figure 2. Regression plot between tweets origin and mean earthquake origin.

The regression in Figure 2 shows a logarithmic relation between the tweets per inhabitant and the distance to the mean centre of the earthquake area. Twitter users discuss the earthquake and gas-extraction topic more actively when they are near the area itself. This matches the significant Spearman’s Rho value of -0.39 (Appendix 14), which shows that the further a municipality is from the mean earthquake area, the fewer tweets are present.

4.3.2. Temporal aspects

Figure 3. Stacked area chart of the scaled tweets and earthquake magnitude over time. Green for local, red for national and purple for the global scale. Earthquakes presented on the left axis in Richter scale.

Looking at the temporal aspects of total tweets in Figure 3, one can see a highly uneven distribution of tweets over time. This might have to do with the occurrence of different earthquakes causing spikes in the gas-extraction debate. The increasing number of tweets may also reflect the widespread adoption of the Twitter platform altogether.

The left axis presents the earthquake magnitude on the Richter scale; the right axis presents the number of tweets per week. There seems to be a connection between the earthquakes and Twitter activity, especially for the earlier tweet peaks. The gas-extraction debate really took off after 2015; from this point, continuous activity is present.


The different scales do not seem to overtake one another to form a single dominant scale in the discourse. However, peaks in the data coincide with earthquakes with a magnitude above 3 on the Richter scale, such as the 2012 Huizinge earthquake, with a magnitude of 3.6. This event also shows a high number of locally scaled tweets, relating to earthquake effects such as damage to housing. Further along, the national scale starts entering the discussion. The global scale does not have much of a presence in the Twitter debate, as it does not seem to exceed 10 tweets per week at all. National tweets have dominated the Twitter debate since the start of 2017, except for the times an earthquake hits Groningen.

The time-based context shows the relation between the ongoing discourse and the amount of Twitter activity within this discourse. Furthermore, earthquakes seem to impact tweet activity heavily, mostly affecting the earthquake related tweets.

4.3.3. Rural and urban

The rural and urban division as a context in which the discourse plays out has been presented in the theoretical framework (Sassen, 2013; Krugman, 1991). After data analysis, the following bar plot was constructed. It shows the tweets per 1000 inhabitants for rural and urban areas in the Netherlands, separated by scale.

Figure 4. The scales of tweets per 1000 inhabitants in Urban and Rural areas.

The above bar chart shows a clear division. Per 1000 inhabitants, urban areas tweet more on the national scale, while rural areas tweet more on the local scale. For the rural areas, most tweets per 1000 inhabitants come from the rural areas in the province of Groningen, where the earthquakes occur. For the urban areas, however, the pattern is more elusive.

4.4. Interpreting

The following section will present the results in the context of the literature present on scales of meaning and rural urban theory.

The total tweet distribution mostly seems to follow the number of inhabitants, in line with findings from Hahmann, Purves & Burghardt (2014). However, the distance to the earthquake area is an interesting addition, suggesting that distance to the context of the discourse matters for activity in the Twitter discourse. Elliker et al. (2013) argue for a more context dependent discourse analysis. The findings strongly support this case. The gas-extraction and earthquake Twitter discourse shows that distance to the origin of the discussion is an important factor in the activity of the discourse and its scaled contents. The rural-urban divide tells a similar story: the context of an urban or rural area seems to cause different scales to occur. However, to understand this difference, more context dependent discourse analysis is recommended. The lack of the global scale in tweets altogether is another important finding from the data. Even though the tweets do not represent the complete discourse, the lack of this scale opens a path for environmental activists. Scholars of environmental activism have found that increasing scales helps activists attain their goals (Swyngedouw, 1997; Kurtz, 2003). Increasing activity around the global scale might help the Groningen earthquake activists in attaining their goals and in transitioning to a new period in the discourse.


5. Conclusion

This thesis presented the spatial distribution of the scales of meaning in the Netherlands, using the Groningen gas-extraction and earthquake discourse as a case study. After creating an algorithm to classify the tweets, the tweets were geolocated to municipalities across the Netherlands, showing a spatial pattern for the complete dataset as well as for the individual scales.

Analysing the total tweets in the dataset showed that most tweets are present near the origin of the debate as well as in population centres. People are more active in the Twitter discourse if its origin is closer to their location. The local scale also seems mostly present in areas around the earthquake centre, while the global scale is largely lacking altogether. The different scales are also found to be related to whether an area is rural or urban. In the context of the Groningen gas-extraction and earthquake discourse, urban areas tweet more about the national scale, which is also the most present scale in the Twitter discourse altogether, while rural areas are more focused on the local scale. Finally, peaks in Twitter activity follow the earthquakes in Groningen.

These findings are somewhat in line with what theory suggests. According to Sassen (2013), however, urban areas would be tweeting more about the global scale. In this dataset, the global scale has a very low number of tweets altogether, but the next scale down, the national scale, is mostly present in Dutch urban areas. The rural findings are in line with theory, which suggests that rural areas are more locally connected and relate more to the local discourse (Krugman, 1991).

Scale is a highly contested concept, and using it to represent tweets in a GIS analysis does not avoid the complexity of the term. Howitt (1993a) suggested that ‘awkward juxtapositions’ are where the origin of the scales is to be found. The awkward juxtapositions present themselves in this thesis when manually classifying the data. Tweets are awkward to classify. Twitter is still a platform of 280 characters per message, and adding the scale of meaning and its accompanying theoretical weight sometimes seems contrived and sometimes completely accurate. Manual classification forms the basis of the algorithm and the subsequent analysis. Clear classification rules and careful examination of tweets are recommended to maintain a clear view of what exactly the algorithm shows.

However, the findings are limited. The use of Twitter in research is still in its infancy compared to established fields and methods. Twitter data is not interesting enough on its own. Tweets do not show gender, age, ethnicity or income as external features to analyse, as is customary for geographical analysis. Demographic methods to demarcate the Twitter population, be it the whole population or the population of a specific dataset, are not sophisticated enough to cause the ‘end of theory’, as suggested by Anderson in 2008. For now, Twitter can be used as an addition to current methods, as it shows a completely different view on the topic than is currently present in academia. This is always welcome, as it refreshes and adds to current theories.


6. Acknowledgements

This thesis was a diverse project that started with the idea of using Twitter. This idea grew and became a Bachelor thesis to end three years of studying at the University of Amsterdam. For their help, I would like to thank a few people.

First, I would like to thank my supervisor Rowan Arundel from the department of Planning and Geography for accepting the idea that Twitter would be an interesting research subject, as well as for providing many feedback sessions with specific, highly useful feedback.

Secondly, I would like to thank Michiel Boswijk for helping me start with acquiring the Twitter data and giving me a jumpstart lesson in Python. Without the data, a thesis on Twitter would be impossible. Finally, I would like to thank my family. The countless messages between my sister and me on the specifics of the research and on demarcating what I was going to do are highly appreciated. Furthermore, the importance of the extra eyes reading with me cannot be overstated, as on your own you always seem to end up stuck in your line of thought.

My thanks for reading this thesis, Jurrian Doornbos


7. References

Adnan, M., Lansley, G., & Longley, P. A. (2013). A geodemographic analysis of the ethnicity and identity of Twitter users in Greater London. In Proceedings of the 21st Conference on GIS Research UK (GISRUK) (pp. 1-6).

Agnew, J. A., Mitchell, K., & Toal, G. (Eds.). (2003). Introduction. A companion to political geography, 1-9. John Wiley & Sons.

Agyeman, J. (2002). Constructing environmental (in)justice: transatlantic tales. Environmental Politics, 11(3), 31-53.

Amin, A. (2002). Ethnicity and the multicultural city: living with diversity. Environment and planning A, 34(6), 959-980.

Anderson, C. (2008). The end of theory: The data deluge makes the scientific method obsolete. Wired magazine, 16(7), 16-07.

Attenborough, D. (1998). The life of birds. BBC.

Bertrand, K. Z., Bialik, M., Virdee, K., Gros, A., & Bar-Yam, Y. (2013). Sentiment in new york city: A high resolution spatial and temporal view. arXiv preprint arXiv:1308.5010.

Bickerstaff, K., & Agyeman, J. (2009). Assembling justice spaces: the scalar politics of environmental justice in north‐east England. Antipode, 41(4), 781-806.

Bollen, J., Mao, H., & Zeng, X. (2011). Twitter mood predicts the stock market. Journal of computational science, 2(1), 1-8.

Bryman, A. (2016). Social research methods. Oxford university press.

Centraal Bureau voor de Statistiek. (2017). Wijk en Buurtkaart 2017. Retrieved from https://www.cbs.nl/nl-nl/dossier/nederland-regionaal/geografische%20data/wijk-en-buurtkaart-2017

Delaney, D., & Leitner, H. (1997). The political construction of scale. Political Geography, 16(2), 93-97.

Elliker, F., Coetzee, J. K., & Kotze, P. C. (2013, July). On the interpretive work of reconstructing discourses and their local contexts. In Forum Qualitative Sozialforschung/Forum: Qualitative Social Research (Vol. 14, No. 3).

Foucault, M. (1973). The birth of the clinic. Routledge.

Gayo-Avello, D. (2012). "I Wanted to Predict Elections with Twitter and all I got was this Lousy Paper": A Balanced Survey on Election Prediction using Twitter Data. arXiv preprint arXiv:1204.6441.

Grimmer, J. (2015). We are all social scientists now: How big data, machine learning, and causal inference work together. PS: Political Science & Politics, 48(1), 80-83.


Hahmann, S., Purves, R., & Burghardt, D. (2014). Twitter location (sometimes) matters: Exploring the relationship between georeferenced tweet content and nearby feature classes. Journal of Spatial Information Science, 2014(9), 1-36.

Harvey, D. (2004). The 'new' imperialism: accumulation by dispossession. Socialist register, 40(40).

Howitt, R. (1993). "A world in a grain of sand": towards a reconceptualisation of geographical scale. Australian Geographer, 24(1), 33-44.

Howitt, R. (2003). Chapter 10. Scale. In Agnew, J. A., Mitchell, K., & Toal, G. (Eds.). (2003). A companion to political geography, 138-157. John Wiley & Sons.

Kitchin, R. (2013). Big data and human geography: Opportunities, challenges and risks. Dialogues in human geography, 3(3), 262-267.

Koninklijk Nederlands Meteorologisch Instituut. (2019). Earthquakes – Complete catalogue for the Netherlands and near surrounding. Retrieved from https://data.knmi.nl/datasets/aardbevingen_catalogus/1?q=aardbeving

Krugman, P. (1991). Increasing returns and economic geography. Journal of political economy, 99(3), 483-499.

Kurtz, H. E. (2003). Scale frames and counter-scale frames: constructing the problem of environmental injustice. Political geography, 22(8), 887-916.

Lees, L. (2004). Urban geography: discourse analysis and urban research. Progress in human geography, 28(1), 101-107.

Li, L., Goodchild, M. F., & Xu, B. (2013). Spatial, temporal, and socioeconomic patterns in the use of Twitter and Flickr. Cartography and Geographic Information Science, 40(2), 61-77.

Miller, H. J. (2010). The data avalanche is here. Shouldn’t we be digging?. Journal of Regional Science, 50(1), 181-201.

Mulder, M., & Perey, P. (2018). Gas Production and Earthquakes in Groningen; Reflection on Economic and Social Consequences.

Oliphant, T. E. (2007). Python for scientific computing. Computing in Science & Engineering, 9(3), 10-20.

O’Loughlin, J. (2003). Chapter 3. Spatial Analysis in Political Geography. In Agnew, J. A., Mitchell, K., & Toal, G. (Eds.). (2003). A companion to political geography, 30-46. John Wiley & Sons.

Pavlidis, T. (1986). A vectorizer and feature extractor for document recognition. Computer Vision, Graphics, and Image Processing, 35(1), 111-127.

Perlaviciute, G., Steg, L., Hoekstra, E. J., & Vrieling, L. (2017). Perceived risks, emotions, and policy preferences: A longitudinal survey among the local population on gas quakes in the Netherlands. Energy


Sang, E. T. K., & Bos, J. (2012). Predicting the 2011 dutch senate election results with twitter. In Proceedings of the workshop on semantic analysis in social media (pp. 53-60).

Sassen, S. (2013). The global city: New York, London, Tokyo. Princeton University Press.

Shalev-Shwartz, S., & Ben-David, S. (2014). Understanding machine learning: From theory to algorithms. Cambridge University Press.

Sloan, L., Morgan, J., Burnap, P., & Williams, M. (2015). Who tweets? Deriving the demographic characteristics of age, occupation and social class from Twitter user meta-data. PloS one, 10(3), e0115545.

Sobolevsky, S., Bojic, I., Belyi, A., Sitko, I., Hawelka, B., Arias, J. M., & Ratti, C. (2015). Scaling of city attractiveness for foreign visitors through big data of human economical and social media activity. In 2015 IEEE International Congress on Big Data (pp. 600-607). IEEE.

Stefanidis, A., Crooks, A., & Radzikowski, J. (2013). Harvesting ambient geospatial information from social media feeds. GeoJournal, 78(2), 319-338.

Swyngedouw, E. (1997). Excluding the Other: the production of scale and scaled politics. In R. Lee and J. Wills (eds.) Geographies of Economies. London: Arnold, 167-76.

Sze, J., London, J., Shilling, F., Gambirazzio, G., Filan, T., & Cadenasso, M. (2009). Antipode, 41(4), 807-843.

Towers, G. (2000). Applying the political geography of scale: Grassroots strategies and environmental justice. The Professional Geographer, 52(1), 23-36.

Trouw. (2018). Verbijsterend, hoe snel de regering van gedachten veranderde over gaswinning. Retrieved from https://www.trouw.nl/opinie/verbijsterend-hoe-snel-de-regering-van-gedachten-veranderde-over-gaswinning~a676db2e/

Sluis, M. van der. (1989). Aardbevingen in Noord-Nederland. Hoogezand

Van der Voort, N., & Vanclay, F. (2015). Social impacts of earthquakes caused by gas extraction in the Province of Groningen, The Netherlands. Environmental Impact Assessment Review, 50, 1-15.

8. Appendix

Appendix 1. Code Box 1.

bearer_key = ''
endpoint = "https://api.twitter.com/1.1/tweets/search/fullarchive/app.json"
headers = {"Authorization": "Bearer {}".format(bearer_key), "Content-Type": "application/json"}

query = "\"(gaswinning OR aardbeving groningen OR groningenveld OR groningerveld OR groninger gasveld) place_country:nl\""
from_date = "\"201501010000\""
to_date = "\"201601010000\""
max_results = '100'

next_token = ''
counter = 42

data = '{"query":' + query + ', "fromDate":' + from_date + ', "toDate":' + to_date + ', "maxResults":' + max_results + '}'

Code Box 1. Python code used for accessing the Twitter API. The bearer_key is an identification key issued by Twitter to identify which app is accessing the API. The endpoint is the URL of the full-archive search. The headers wrap the bearer key in a format the API can understand. The query restricts what the API searches for; for this research it filters on the content of the tweet and its location, while from_date and to_date set the time window. The counter keeps track of how many requests were made. Finally, data combines the query, the date range and the maximum number of results into a single request body.
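The request body in Code Box 1 is assembled by string concatenation, which is why the query and the dates carry escaped quotes. As a minimal sketch of an alternative, and not the code used in this research, the same body could be built from a dictionary and serialised with the standard json module. The helper name build_search_body is hypothetical; the field names are those used in Code Box 1.

```python
import json


def build_search_body(query, from_date, to_date, max_results=100, next_token=None):
    """Hypothetical helper: return the request body as a JSON string."""
    body = {
        "query": query,
        "fromDate": from_date,
        "toDate": to_date,
        "maxResults": max_results,
    }
    # The 'next' field is only added when a follow-up page is requested.
    if next_token is not None:
        body["next"] = next_token
    # json.dumps takes care of quoting and escaping.
    return json.dumps(body)


data = build_search_body(
    "(gaswinning OR groningenveld) place_country:nl",
    "201501010000",
    "201601010000",
)
```

With this approach the query can be written as an ordinary string, and the serialiser adds the quotes that Code Box 1 adds by hand.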

Appendix 2. Code Box 2.

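import requests
import pandas as pd

# extract_data() and columns are defined in a separate code snippet.
while True:
    # Send the request body from Code Box 1 and decode the JSON response.
    response = requests.post(endpoint, data=data, headers=headers)
    response_json = response.json()

    # Store this page of results as a numbered .csv file.
    extracted_data = extract_data(response_json['results'], columns)
    data_df = pd.DataFrame(extracted_data)
    data_df.to_csv('extracted_data/tweet_data_{}.csv'.format(counter))
    counter += 1

    try:
        # Rebuild the request body with the token for the next page.
        next_token = "\"{}\"".format(response_json['next'])
        data = '{"query":' + query + ', "fromDate":' + from_date + ', "toDate":' + to_date + ', "maxResults":' + max_results + ', "next":' + next_token + '}'
        print('Next token found :D')
    except KeyError:
        # The last page of results carries no 'next' key.
        print('No more next token found')
        break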

Code Box 2. Python loop to request information from the API. The response is a POST request built from the definitions in Code Box 1. Response_json stores the decoded response from the Twitter API. Extracted_data is the response_json stored into the preset columns by the code snippet called extract_data. Data_df is a pandas dataframe with the extracted_data, which is written to a .csv file named tweet_data_x, where x is the current value of the counter from Code Box 1. The try block prepares the follow-up request: the next_token is inserted into the request body to gain access to the next page of results. The print() calls show the user whether another page was found and when the code is done running.
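The pagination logic of Code Box 2 can also be separated from the network call, which makes it possible to try the loop without spending API requests. The following is a minimal sketch under stated assumptions, not the code used in this research: collect_pages and its post argument are hypothetical names, and the max_requests cap is an added safeguard that does not appear in Code Box 2.

```python
import json


def collect_pages(post, base_body, max_requests=50):
    """Follow 'next' tokens until none is returned or the cap is reached.

    post is a hypothetical callable that takes a JSON string and
    returns the decoded response as a dictionary.
    """
    pages = []
    body = dict(base_body)  # copy, so the caller's dictionary is not modified
    for _ in range(max_requests):
        response_json = post(json.dumps(body))
        pages.append(response_json.get("results", []))
        next_token = response_json.get("next")
        if next_token is None:
            break  # last page reached
        body["next"] = next_token
    return pages


def fake_post(data):
    """Offline stand-in for the API: two pages, then no 'next' token."""
    sent = json.loads(data)
    if "next" not in sent:
        return {"results": ["tweet 1", "tweet 2"], "next": "token-1"}
    return {"results": ["tweet 3"]}


pages = collect_pages(fake_post, {"query": "gaswinning place_country:nl"})
```

For real requests, post could wrap the call from Code Box 2, for example lambda data: requests.post(endpoint, data=data, headers=headers).json(). Each entry in pages could then be passed to extract_data and saved as in Code Box 2.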
