Representing the Sustainable Development Goals: mapping the discourse on Twitter


Master’s Thesis MA New Media & Digital Culture

Name: Leon Smits

Student number: 10500731

Supervisor: Davide Beraldo

Second reader: Bogna Konior


Abstract

To increase efficiency in the worldwide battle against complex problems like climate change and poverty, the United Nations (UN) has formulated the Sustainable Development Goals (SDGs), seventeen goals for 2030 that should make the world a better place. Research has shown that social media can be used to study discourses on a particular topic, and data collection on Twitter is particularly accessible. While much research has been done regarding discourses on individual aspects of the SDGs, little is known about the discourse on the SDGs as a whole. Therefore, this research aims to provide insight into the discourse on the SDGs on Twitter, asking how the SDGs are represented on the platform. The results are then put into perspective with regard to awareness creation on social media.

To reach this goal, this research uses concepts of issue mapping to present the results. To this end, tweets about the subject were collected during a one-month period. Subsequently, the tweets and the users posting them were categorized, thereby also providing a new framework for categorizing tweets and Twitter users. Analysis of the categorizations showed that activists and UN-related users are the most vocal on the subject, and that no negative tweets are posted about the SDGs. These results indicate that the discourse on Twitter is exclusively positive and dominated by actors that are very engaged with the subject. This suggests a positive impact on awareness creation about the SDGs. However, as a relatively low number of Twitter users engage with the SDGs on Twitter, this impact is presumably rather small.


4.1.2 Activist
4.1.3 UN/Government
4.1.4 Business
4.1.5 NGO
4.1.6 Education
4.1.7 News
4.2 Clusters of actors
4.2.1 Local
4.2.2 UN/Government
4.2.3 SDG topic
4.2.4 Business
4.2.5 News
4.2.6 Undefinable
4.3 Tweet categorization
4.3.1 Activist
4.3.2 Promotional
4.3.3 News
4.3.4 Event update
4.3.5 Business
4.3.6 Political
4.3.7 Other
4.3.8 Sustainable Development Goals categorization
4.4 Frequent hashtags
5 Discussion
5.1 Limitations
5.2 Future work
6 Conclusion
References
Appendices


1 Introduction

Despite extraordinary economic growth in large parts of the world, approximately 8% of the world’s population still lives in extreme poverty,¹ and problems like hunger, the violation of human rights and lack of access to clean water and good healthcare are still ubiquitous. On top of that, humanity is taking its toll on the planet, causing the extinction of numerous species, exhaustion of natural resources and, most importantly, climate change. This is why, in September 2015, all member states of the United Nations (UN) adopted the Sustainable Development Goals (SDGs), which came into effect in 2016. The SDGs are seventeen goals for 2030 that aim to solve complex problems like climate change, poverty and hunger, to make sure that no one on our planet is left behind. Almost four years later, the UN is still working hard to spread the message and get everyone, whether they are countries, corporations or individuals, to contribute to reaching the SDGs and leaving the world a better place.

As the SDGs are an intriguing and important subject, this research aims to provide some insight into the state of the discourse on the SDGs on social media, specifically Twitter. As it is currently relatively easy to collect large amounts of user-generated content on Twitter, for example with the help of a tool like DMI-TCAT,² the platform lends itself well to research (Borra & Rieder, 2013), and numerous studies have used Twitter to explore discourses on particular subjects (see e.g. Kirilenko & Stepchenkova, 2014; Larsson & Moe, 2012; Thackeray, Burton, Giraud-Carrier, Rollins & Draper, 2013). However, it should be emphasized that, because of the relatively small number of people on Twitter, conclusions based exclusively on Twitter data cannot be carelessly applied to society as a whole. That is why this study focuses solely on Twitter and only presents assumptions regarding the application of the results outside of Twitter.

To reach the aforementioned goal, this study aims to find an answer to the following research question: “How are the Sustainable Development Goals represented on Twitter?” To do this, some methods of the concept of issue mapping (Marres, 2015; Rogers, Sánchez-Querubín & Kil, 2015) are used to clearly present who is tweeting about the SDGs and why. Three sub-questions have been defined to provide a structure towards answering the research question:

• Which actors are tweeting about the Sustainable Development Goals?

• How are the actors clustered together in the social network, and what groups them?

• To what end are actors tweeting about the Sustainable Development Goals?

¹ https://worldpoverty.io/index.html

² See methodology chapter

By answering these sub-questions, this research shows which actors are involved with the SDGs on Twitter, how these actors are connected to each other, and what the sentiment and goal of their tweets are.

This thesis is structured as follows: first, a theoretical framework is presented, explaining the SDGs, the characteristics of Twitter, theories of awareness creation on social media, the concept of issue mapping and finally previous work on the categorization of tweets and Twitter users. Subsequently, the methods are introduced, explaining the data collection and the approach used to categorize the tweets and Twitter users in order to answer the three sub-questions. Next, the results are presented in detail, providing a clear image of the Twitter discourse on the SDGs and elaborating on particularly interesting findings. Finally, the results are put into perspective using the theoretical framework to situate the findings, ultimately answering the research question.


2 Theoretical framework

2.1 The Sustainable Development Goals

The Sustainable Development Goals (SDGs) are seventeen goals for the year 2030, formulated by the United Nations (UN) in 2015 as the successor to the Millennium Development Goals (MDGs) formulated in 2000. The seventeen goals are based on a total of 169 targets, each of which has one or more indicators for tracking its progress (United Nations, n.d.). The SDGs, as defined by the UN, are listed below:

1. No poverty: end poverty in all its forms everywhere.

2. Zero hunger: end hunger, achieve food security and improved nutrition and promote sustainable agriculture.

3. Good health and well-being: ensure healthy lives and promote well-being for all at all ages.

4. Quality education: ensure inclusive and equitable quality education and promote lifelong learning opportunities for all.

5. Gender equality: achieve gender equality and empower all women and girls.

6. Clean water and sanitation: ensure availability and sustainable management of water and sanitation for all.

7. Affordable and clean energy: ensure access to affordable, reliable, sustainable and modern energy for all.

8. Decent work and economic growth: promote sustained, inclusive and sustainable economic growth, full and productive employment and decent work for all.

9. Industry, innovation and infrastructure: build resilient infrastructure, promote inclusive and sustainable industrialization and foster innovation.

10. Reduced inequalities: reduce inequality within and among countries.

11. Sustainable cities and communities: make cities and human settlements inclusive, safe, resilient and sustainable.

12. Responsible consumption and production: ensure sustainable consumption and production patterns.

13. Climate action: take urgent action to combat climate change and its impacts.

14. Life below water: conserve and sustainably use the oceans, seas and marine resources for sustainable development.

15. Life on land: protect, restore and promote sustainable use of terrestrial ecosystems, sustainably manage forests, combat desertification, and halt and reverse land degradation and halt biodiversity loss.

16. Peace, justice and strong institutions: promote peaceful and inclusive societies for sustainable development, provide access to justice for all and build effective, accountable and inclusive institutions at all levels.


17. Partnerships for the goals: strengthen the means of implementation and revitalize the global partnership for sustainable development (Sustainable Development Goals Knowledge Platform, n.d.).

The goals were formulated as a framework for governments and policymakers on all levels to consider while working on improving policies or developing new ones. However, the SDGs are not legally binding, and governments are therefore not obligated to govern according to the targets of the goals. This means that, strictly speaking, governments that fail to develop their policies in accordance with the SDGs cannot be penalized (Biermann, Kanie & Kim, 2017). While it would be hard to disagree that successfully achieving the SDGs by 2030 would be beneficial for the future of the planet, critics have argued that there is an excessive number of goals with priorities that are difficult to define, which may distract from the goal of eliminating poverty (The Economist, 2015). Moreover, the goals may be unrealistic, especially in terms of costs, and the UN has allegedly stated that it did not expect that each country in the world would achieve them; this means that, in addition to legal penalties not being applicable, shaming countries that fail to achieve these goals is also unlikely to be an option. Others have claimed that the indicators used for measuring the targets are insufficient and should be reevaluated and reformulated by experts (Hák, Janoušková & Moldan, 2016). Hickel (2019) argues that the goals are even contradictory, with, for example, the goal of a global economic growth of 3% (SDG 8) making it impossible to meet the climate-related goals (i.e. SDGs 13, 14 and 15).

Although concerns about the SDGs may be valid, most opinions are positive. In a study that investigated the links among the various goals, Le Blanc (2015) used network analysis to show that the SDGs are more interconnected than their predecessors and that this could significantly improve the efficiency of policymaking. While he also argues that policymaking in accordance with the SDGs will require substantial effort, he believes that the goals represent an excellent blueprint. Biermann et al. (2017) argue that goal-setting is a new and effective form of global governance; they see the SDGs as a good example of how such governance could be implemented. Although the goals will need significant support and depend on the willingness of governments to engage in global governance collaborations, the authors see the SDGs as a significant improvement on previous policies and predict a promising future for them. This study investigates the discourse concerning the SDGs on social media, focusing on Twitter.

2.2 The basics of Twitter

Twitter is a social media platform that allows its users to post so-called micro-blogs. Tweets are messages of up to 280 characters that answer the question “What’s happening?” This, however, has not always been the case. Originally, the maximum length of a tweet was 140 characters, and users would answer the question “What are you doing?” (Marwick & boyd, 2011). Using hashtags, users can identify the particular topic or trend that their tweets refer to, and, while consuming content, users can search for topics of interest using hashtags. Users can make use of the @ symbol, followed by a username, to directly address another user (mentioning), and they are able to share each other’s messages by retweeting, thereby increasing the reach of a particular tweet. Like Instagram, Twitter uses following as a form of connection between users, which means the relationship is directed: if you follow another user, it does not automatically mean that the other user also follows you. While Twitter users have a timeline that displays updates from the users whom they follow, your followers are not the only people who can read your tweets. Unless your Twitter account is set to private (which very few users choose to do), people without a Twitter account can also view your tweets. This means that a single tweet potentially has a large audience. Currently, Twitter has approximately 330 million monthly active users (Statista, 2019a).
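The mechanics described above (hashtags, mentions and directed following) can be illustrated with a minimal Python sketch. It is not part of the thesis's method. The function names, the example tweet and the usernames are hypothetical, and the regular expressions are a simplification of Twitter's own parsing rules (they would, for instance, also match the part after the @ in an e-mail address).

```python
import re

# Simplified patterns (an assumption, not Twitter's official grammar).
HASHTAG_RE = re.compile(r"#(\w+)")
MENTION_RE = re.compile(r"@(\w+)")


def parse_tweet(text):
    """Return the lower-cased hashtags and mentioned usernames in a tweet."""
    return {
        "hashtags": [h.lower() for h in HASHTAG_RE.findall(text)],
        "mentions": [m.lower() for m in MENTION_RE.findall(text)],
    }


# Following is directed: the pair (a, b) means "a follows b" and implies
# nothing about whether b follows a. The usernames are invented.
follows = {("alice", "un_news"), ("bob", "alice")}


def is_mutual(a, b, edges):
    """True only if both users follow each other."""
    return (a, b) in edges and (b, a) in edges


example = parse_tweet("Great panel on #SDGs and #ClimateAction with @UN_News")
```

In this toy data, alice follows un_news without being followed back, which is exactly the asymmetry that distinguishes a directed follower network from an undirected friendship network.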

2.2.1 How is Twitter being used?

While the question of how Twitter is being used might seem relatively easy to answer, it is less straightforward than one might think. Based on the fact that user relationships on Twitter are directed, the platform seems less suited for a network of friends; Facebook’s (undirected) user relationships, for example, would be better suited for this. However, research in a workplace context shows that users do in fact use Twitter to update their followers on personal life events (Zhao & Rosson, 2009). Moreover, the authors argue that Twitter is perceived as a rather trustworthy source for personal updates due to the real-time nature of the platform. They believe that Twitter could represent a valuable channel for informal communication in a workspace and a means by which colleagues could get to know each other better. An important nuance of these findings is that this study was conducted in 2009, before Twitter changed its question and maximum tweet length. The original question “What are you doing?” seems much more personal than the new “What’s happening?”, and the added length of a tweet allows for more in-depth messages.

Marwick and boyd (2011) identified several key uses of Twitter: as a broadcast medium, a marketing channel, a diary, a social platform and a news source. Most other studies have focused on the use of Twitter as a news source. Kwak, Lee, Park and Moon (2010) found that 85% of tweets are news-related, and Matsa and Shearer (2018) found that 71% of Twitter users use the platform to read news. Murthy (2011) showed that traditional media channels have started to use (individuals’) tweets as sources of information, especially for breaking news, but the author also noted that the influence of such tweets is still limited. In cases involving quickly developing news stories, individuals who have directly witnessed an event serve as good sources for updates on breaking news. Based on these findings, it can be expected that there is a strong focus on news among tweets regarding the SDGs, especially considering that (certain aspects of) the SDGs are very frequently featured in the media (e.g. climate change). The use of Twitter in activist and political contexts is discussed in section 2.3; the former is especially relevant to the SDGs.

2.2.2 Tweeting for an (imagined) audience

In research centered on individuals’ imagined audiences in communication, Marwick and boyd (2011) found that, like authors, Twitter users have an imagined audience to whom they post tweets. While a user’s followers can give an indication of who his or her audience is, this indication is often inaccurate: as described previously, practically anyone can read a tweet. The authors found that many of their respondents used Twitter to establish a “personal brand” by sharing tweets tailored to what they feel would be of particular interest to their followers. They use hashtags to guide their readers towards appropriate content and thus manage audience groups with different interests. The authors call this “micro-celebrity,” a practice in which users “consciously use Twitter as a platform to obtain and maintain attention” (p. 122); interestingly, it is not only users with a high number of followers who engage in this behavior.

Research on tweets containing URLs by Wu, Hofman, Mason and Watts (2011) showed that there is a strong active elite on Twitter, with 0.05% of users being responsible for approximately 50% of all posted URLs. Their audiences were also found to be very homogeneous, as it transpired that members of the same user groups were the largest consumers: celebrities interacted with celebrities, bloggers with bloggers, and so on. Similar findings were reported by Kirilenko and Stepchenkova (2014), who traced tweets concerning climate change for a year. It was discovered that the discourse was dominated by a small group of elite users. Moreover, half of the tweets investigated referred to only 0.28% of the external domains considered in this research. Taking into account the aforementioned findings, it can be expected that actors who tweet about the SDGs are strongly influenced by an elite group of active users and that actors have particular audiences in mind when they tweet.
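To make concentration figures of this kind concrete, the sketch below computes the smallest fraction of users, counted from the most active downwards, that accounts for a given share of all posts. It is only an illustration under assumed names and invented toy counts; it does not reproduce the procedure of Wu et al. (2011) or Kirilenko and Stepchenkova (2014).

```python
from collections import Counter


def smallest_share_covering(counts, target=0.5):
    """Fraction of users (most active first) needed to cover `target` of all posts."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = 0
    for rank, (_, n) in enumerate(counts.most_common(), start=1):
        covered += n
        if covered / total >= target:
            return rank / len(counts)
    return 1.0


# Toy data (invented): one very active account and nine occasional ones.
posts_per_user = Counter({"elite": 10, **{f"user{i}": 1 for i in range(9)}})
share = smallest_share_covering(posts_per_user)
```

In this toy example a single account out of ten produces more than half of the nineteen posts, so `share` is 0.1. The smaller this value, the more strongly the discourse is dominated by an elite.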

2.3 Awareness creation through social media

Likely the most significant example of awareness creation through social media is the “Kony 2012” video produced by Invisible Children. The video went viral, and, within several days, millions of people around the world became aware of a Ugandan war criminal whom they had likely not known about previously. Through the power of social media, Invisible Children was able to increase global awareness of this issue in a very short time.

While it is relatively easy to determine the success (in terms of views) of a viral video concerning a little-known issue like that described above, it is more difficult to determine the effect (in terms of awareness creation) of social media when it comes to an issue that initially has a broad audience and is not the subject of a viral campaign. In the absence of an obvious viral campaign and with a broader initial interest, it is challenging to determine who is discussing a certain topic: are these just actors who were already involved with the topic, or do other actors gradually become involved? While, in the case of the “Kony 2012” campaign, the first shares were most likely also from involved actors, the pace at which the content spread indicates that “new” actors soon became involved.

2.3.1 Social media activism and action

Based on an Indonesian case study, Lim (2013) argued that, while social media activism often generates “many clicks,” “little sticks” (p. 653), which is consistent with the now widely used concept of “slacktivism.” Slacktivism refers to wide online support for a topic in the form of, for example, retweets, shares or joining a supportive Facebook group, which does not translate into any actual impact (Morozov, 2009). However, Lim believes that social media activism can actually have an impact if certain criteria are met. She argues that, first of all, a message must be simple, catchy and easy to consume; secondly, taking action should not involve too much effort; and finally, the issue should not be contested by the mainstream media. When considering these requirements, the SDGs certainly satisfy the first point. While the underlying problems are immensely complex, the SDGs present a coherent framework of 17 goals that is easy to grasp.

In research on the usage of social media for advocacy purposes among non-governmental organizations (NGOs), Lovejoy, Waters and Saxton (2012) argue that Twitter is a good medium for improving direct engagement with stakeholders on a daily basis. However, their findings indicated that the majority of the sample used Twitter only as a one-way communication channel, meaning that they share information with their followers but do not engage in conversation with them. In a subsequent study, Guo and Saxton (2014) found comparable results. Their research formulated a three-step process by which NGOs can engage new actors on Twitter: (1) reaching out to people by generally educating and informing them on the topic in question (one-way communication), (2) keeping them engaged by aiming deeper communication at more informed and involved actors and (3) getting the more involved actors to act by, for example, promoting events and direct action. The process is hierarchical not only in the manner in which it progresses but also in message density, as, in line with the aforementioned results, the one-way educational step is most emphasized. A nuance of these findings is that both studies focused exclusively on large and well-known NGOs, whose behavior on Twitter might significantly differ from that of smaller organizations, as larger NGOs most likely have many more followers.

2.3.2 Retweets, followers and influence

While advocacy through social media or Twitter in particular may not be very effective in terms of leading to action, the results discussed above suggest that it is effective when it comes to creating awareness. Taking into account the large audience that could potentially be exposed to a tweet, this makes sense; when one also takes into consideration the power of retweets, this makes even more sense. Kwak et al. (2010) found that, due to a cascading effect, a tweet that is retweeted reaches approximately 1,000 extra users on average. This number is stable for users with up to 1,000 followers, which means that, even when users do not have many followers, retweets can make a significant difference in terms of reach. A study specifically on retweeting identified several reasons why users retweet (boyd, Golder & Lotan, 2010). Particularly relevant to the case of the SDGs is “retweeting for social action,” where users retweet a message to support the cause that it refers to. In some cases, the poster of the original tweet will also specifically request that others retweet the message in order to increase the likelihood that the topic becomes trending. Other relevant reasons are “amplifying tweets to a new audience,” “publicly agreeing with someone,” “validating others’ thoughts” and “referring to less popular or visible content.” While many Twitter users provide a disclaimer in their biographies that retweets do not necessarily imply endorsement, retweeting still serves to support and increase the reach of a tweet. Suh, Hong, Pirolli and Chi (2010) found that specific characteristics of tweets and the users posting them that are unrelated to their content influence retweetability. Most importantly, the use of hashtags and URLs in a tweet increases retweetability; the user factors that have an effect are (unsurprisingly) the number of followers and account age. The first of these findings indicates why trending topics often have a hashtag and why using a hashtag is effective in terms of increasing reach. In terms of increasing awareness, research shows that, while the absolute audience might be larger for users with a high number of followers, this is not the most important factor (Cha, Haddadi, Benevenuto & Gummadi, 2010). The authors argue that, when attempting to obtain an impression of a user’s influence, retweets and mentions are more relevant than followers. Moreover, their research shows that a high follower count does not necessarily lead to a significant amount of engagement in terms of retweets and mentions. Consistent with the findings of Suh et al. (2010), Cha et al. (2010) found that mainstream news media generally receive many retweets, even though they are not the most popular in terms of followers. Their tweets are, however, rather retweetable, as they often contain URLs that lead to potentially interesting information for other Twitter users. While this study shows that popular users are not necessarily the most influential, it should be emphasized that users with more followers are often more influential than those with fewer followers.

2.3.3 Cases of awareness creation on Twitter

In research on using Twitter to increase breast cancer awareness, the previously described emphasis is confirmed (Thackeray et al., 2013): In their dataset, the authors found that users with a higher number of followers were retweeted and mentioned more frequently. The research was conducted in the context of Breast Cancer Awareness Month, which led to increased activity related to the topic (although this activity significantly declined immediately after the first day). In terms of awareness, the authors suggest that tweets concerning raising funds are the most effective, and, consistent with this observation, tweets of this nature were found to be the most common. As their findings show that popular users are the most influential, the authors also suggest that advocates attempt to involve more celebrities in spreading awareness.

A study that focused on the correlation of Twitter use and participation in Earth Hour³ in Australia showed a positive effect of Twitter activity on energy savings (Cheong & Lee, 2010). Twitter activity on the topic of Earth Hour in five Australian states was measured and compared to the estimated energy savings that resulted from the event. The positive effect found in the study shows that, in this case, the awareness creation on Twitter even led to action. As the action in this case did not require much effort, it is consistent with the criteria for successful use of social media for action formulated by Lim (2013).

2.3.4 Measuring the effect of Twitter using political campaigns

While, in many cases, it is difficult to quantify the actual effect of awareness creation on Twitter, election campaigns can provide clear evidence of the impact of Twitter activity. While this is slightly outside the scope of this thesis, it is relevant to briefly note some findings with regard to the effect of Twitter in politics. Although they refrained from claiming that their findings were completely valid, Larsson and Moe (2012) found little to no Twitter influence on the Swedish election campaign that they investigated. The most active politician in their dataset did not manage to win a seat in the Swedish Parliament; similarly, the most active party did not manage to secure representation in Parliament. This can be explained by the fact that only an estimated 1–8% of the Swedish electorate was active on Twitter at the time. Similarly, Hong and Nadler (2012) showed that high levels of activity on Twitter did not lead to significantly increased attention for individual politicians. They did, however, identify a correlation between the amount of attention a politician received in traditional media and the attention that he or she received on Twitter, which indicates that Twitter responds to traditional media. Stier, Schünemann and Steiger (2018) recognized a similar trend in their research on political policy debates. Their research shows that traditional authoritative actors in policy debates keep their strong voice on Twitter, regardless of how active they are. As was the case in the studies considered previously, the authors recognized a certain degree of responsiveness on the part of Twitter users to both real-life events and traditional media. This is again in line with the previously described findings that the popularity of a topic on social media does not necessarily lead to action. However, the reach that is generated by actively tweeting about a certain topic should be able to gradually increase awareness of that topic.

2.4 Issue mapping

As this study does not look at the actual (measurable) effect of tweets about the SDGs on public awareness, the concept of issue mapping is used to present the results. This means that the discourse on the SDGs is presented in a way that emphasizes which actors play a role in the discourse, what the connections between those actors are and why these actors are involved.

2.4.1 Actor-network theory and the cartography of controversies

Issue mapping is a technique that stems from Bruno Latour’s actor-network theory (ANT) and the related cartography (or mapping) of controversies (Latour, 2005; Venturini, 2010). Actor-network theory builds upon the view that “everything in the social and natural worlds [is] a continuously generated effect of the webs of relations within which they are located. It assumes that nothing has reality or form outside the enactment of those relations” (Law, 2008, p. 141). This means that the situation of an actor within its network(s) is crucial for its characteristics and that an actor can, for example, also be a material object. The method, which is widely used, uses these concepts to describe technological processes and knowledge creation. Venturini (2010) describes the cartography of controversies as a vehicle for education on ANT, as the practice builds upon the same ideas, but it is less theoretical: “As such, the cartography of controversies may appeal to those who are intrigued by ANT, but wish to stay clear from conceptual troubles” (p. 258). The essence of the method can be summarized as “Just look at controversies and tell what you see” (p. 259), which suggests a certain degree of ease of application. However, bearing the definition of ANT in mind, one cannot “just” describe what one sees when working with this theory. Controversies are described as a state of disagreement in the broadest sense of the word: “controversies begin when actors discover that they cannot ignore each other and controversies end when actors manage to work out a solid compromise to live together” (p. 261). To describe complex controversies, the best approach seems to be to map them out, clearly identifying different perspectives. With the greater availability of digital data, the task of mapping out controversies becomes much easier (Venturini, 2012). While manually investigating all of the sides of a controversy without the use of (online) digital data may prove almost impossible, the ready availability of large amounts of information changes that. It is in this regard that the cartography of controversies is especially relevant to this research.

2.4.2 From controversies towards issue mapping for the Sustainable Development Goals

While working with controversy analysis, Marres (2015) suggests a shift from mapping controversies to mapping issues, especially when using digital media as a source of data (as is the case in this research). She argues that switching to issue mapping broadens the range of topical possibilities, as “controversy analysis used to begin with a robust controversy in order to detect given actor relations, issue mapping begins with a given topic in order to detect emerging issue formations” (p. 672). While the mapping of controversies is specifically aimed at disputes, issue mapping remains more neutral in its essence. Rogers et al. (2015) define the goal of issue mapping as follows:

to produce mappings that will aid in identifying and tracing the associations between actors involved with an issue, and to render them both in narrative and visual form so that they are meaningful to one’s fellow issue analysts and their audiences. (pp. 9–10)

While the SDGs address topics that are particularly controversial (climate change in particular), as a whole, the goals do not appear to have prompted much controversy. Although the SDGs are not free of criticism, public opinion appears to be largely positive. However, this does not mean that the topic is not worth investigating. Even though they are strongly interconnected (Le Blanc, 2015), the SDGs can still be seen as seventeen individual goals that, for various reasons, may prove more or less important to particular actors. This means that the SDGs might have slightly different meanings for different actors. While, strictly speaking, the SDGs form a cohesive whole that does not emphasize any particular individual goal, actors may attribute varying meanings to this whole based on how they perceive certain aspects thereof. An example of this phenomenon would be an individual with a high standard of living who lives in a coastal area at risk of flooding versus an individual whose primary daily concern is obtaining enough to eat; it would not be surprising were the first individual to perceive the SDGs as being primarily associated with climate change, whereas the second would primarily associate them with combating poverty and hunger. To explore how the SDGs are represented online, this research uses Twitter to investigate the materialization of this issue. While there is no mapping in the literal sense of the word involved, the process of describing the characteristics of the discourse on the SDGs on Twitter is based upon the aforementioned concepts.

2.5 Categorizing Twitter

To be able to effectively characterize the discourse on the SDGs, the relevant actors and their tweets need to be categorized. Since the foundation of Twitter, several studies have been conducted on the categorization of tweets and Twitter users. Some of these are extremely broad, whereas others are more detailed or aimed at a specific topic (as is the case in this study). Adopting a chronological approach, this section elaborates on these studies and their findings.

Java, Song, Finin and Tseng (2007) were the first to attempt a broad categorization of tweets; their study was conducted only one year after the launch of Twitter. It focused on understanding this new social media platform by examining the different types of tweets and users found on it. Based on manual categorization, they found four (broad) main types of tweets: ‘daily chatter,’ ‘conversations,’ ‘sharing information/URLs’ and ‘reporting news.’4 Undefinable tweets were omitted from the analysis. While Twitter has obviously evolved since then, these categories can still generally be applied. For example, where Java et al. (2007) recognized conversations between users when they mentioned each other with the @ symbol, directly replying to another user’s post is now also possible and widely practiced. Despite the fact that this categorization is broad, overlaps between categories can also be identified. Java et al. defined the ‘sharing information/URLs’ category as including tweets that contain a URL. When the URL leads to a news article, the tweet could also be defined as ‘reporting news,’ or it could be part of a conversation. With regard to Twitter users, the authors used an even broader categorization. They divided their dataset into three primary types of users: ‘information source,’ ‘friends’ and ‘information seeker,’ with the first generally being users with a high number of followers who actively share tweets and the last generally being users who are less active but follow a high number of users and consume shared content. The authors define ‘friends’ as an extremely broad category containing a wide range of social relationships.

4 To establish a distinction between quotations and names of categories, throughout the thesis single quotation

In research focusing on conversation and collaboration on Twitter, Honeycutt and Herring (2009) built upon the work of Java et al. (2007). Their study revolved around the use of the @ symbol for conversational and collaborative purposes and criticized Twitter’s lack of better options (this has been improved with the introduction of the reply function). The study also considered the differences between tweets containing the @ symbol and those without. The authors manually coded a sample of the collected tweets and categorized them into twelve categories: ‘about addressee,’ ‘announce/advertise,’ ‘exhort,’ ‘information for others,’ ‘information for self,’ ‘metacommentary,’ ‘media use,’ ‘opinion,’ ‘other’s experience,’ ‘self experience,’ ‘solicit information’ and ‘other.’ While the approach to choosing these categories seems to differ from that used by Java et al. to identify the four used in their work, some of the categories identified by Honeycutt and Herring could be classified as subcategories of ‘daily chatter’ (i.e. ‘metacommentary’ and ‘media use’), ‘sharing information/URLs’ (i.e. ‘information for others’ and ‘self experience’) or ‘reporting news’ (i.e. ‘other’s experience’), while ‘about addressee’ and the study as a whole are closely linked to the ‘conversations’ category (characterized by the use of the @ symbol). Unlike Java et al., however, Honeycutt and Herring did not categorize the users of the platform.

In a study focused on corporate branding, Jansen, Zhang, Sobel and Chowdury (2009) analyzed the contexts in which brands are mentioned on Twitter and how corporations use this communication channel. As a secondary goal of the research, a sample of tweets mentioning brands was manually coded into four categories linked to branding: ‘sentiment,’ ‘information seeking,’ ‘information providing’ and ‘comments.’ The first category focuses on tweets that express a certain sentiment with regard to a brand (these tweets could also fit into other categories). The next two categories are self-explanatory, and, as coding was performed hierarchically, all of the residual tweets were placed in the final category; in these cases, the brand was not the central focus of a tweet. Jansen et al. (2009) also refrained from categorizing users. As their study focused on a very specific topic and the categorizing was done with the goal of understanding brand-related content, it is not possible to directly apply these categories in the present study. However, since one could argue that awareness creation and sentiments regarding the SDGs could be compared to brand awareness and sentiment, the reasoning of these authors is taken into consideration while categorizing the Twitter corpus used in the present study.

Naaman, Boase and Lai (2010) published a study focused on the message content of tweets in the context of “social awareness streams” (streams of short, personal messages that can be consumed by a large group of users as they appear on Twitter). Unlike Honeycutt and Herring (2009), who focused specifically on tweets directed at specific users, Naaman et al. (2010) focused on tweets that are not directed at anyone in particular and are not replies. In addition, tweets posted by corporate accounts or other accounts that are primarily used for marketing purposes were filtered out. However, despite the differences between the types of tweet samples used, after manual coding, the authors developed a coding scheme with categories that has many similarities to that formulated by Honeycutt and Herring. Naaman et al. used nine categories to classify the tweets in their dataset: ‘information sharing,’ ‘self promotion,’ ‘opinions/complaints,’ ‘statements and random thoughts,’ ‘me now,’ ‘question to followers,’ ‘presence maintenance,’ ‘anecdote (me)’ and ‘anecdote (others).’ In the research conducted by Naaman et al., tweets could also fall into more than one category to avoid ambiguous choices related to overlaps between categories. Based on the chosen categories, the authors also defined two groups of users: ‘meformers’ (by far the largest group), who mainly post tweets about themselves, and ‘informers,’ who mainly share information. While these user categories are again very broad, they represent a clear division in the behavior of Twitter users, a division that might very well be identified in the data analyzed in this study as well.

In a study specifically focused on the classification of tweets, Dann (2010) built upon the previously mentioned studies. Based on 2,841 tweets from his personal Twitter account, the author developed a system for classifying tweets. Once again, the method used was manual coding, which the author based on previous studies. However, not every category was eventually filled, which indicates that the author also added certain categories based exclusively on the results of other studies. Dann (2010) suggests six main categories, which are divided into several (less important) sub-categories: ‘conversational,’ ‘pass along,’ ‘news,’ ‘status,’ ‘phatic’ and ‘spam.’ ‘Spam’ was among the categories with no assigned tweets and was also not particularly relevant, since it only focused on messages created (automatically) without a user’s consent. Five relevant categories remain, of which only ‘pass along’ and ‘phatic’ require explanation. ‘Pass along’ includes retweets, endorsements and the sharing of one’s own content (outside of Twitter). It specifically does not include the sharing of news items, as these tweets have their own category. ‘Phatic’ tweets are those that do not have a specific goal and are merely shared as a form of social communication. As this research only focused on tweets from a single user, user categories are not addressed.

Larsson and Moe (2012) conducted a study on Twitter users in the context of the 2010 Swedish election campaigns (they did not categorize individual tweets). While their study focused on the political spectrum, the categories of users that the authors identified are broad and generally applicable. Within the dataset, their focus was on “high-end” users and the ways in which such users interact with each other. Therefore, the authors analyzed network data on mentions and retweets. Their approach to the categorization of the users investigated focused on the latter’s behaviors within their networks. The authors made a distinction between behavior involving the use of mentions (@) and that involving the use of retweets and identified three categories of users for each group: ‘senders,’ ‘receivers’ and ‘sender-receivers’ for mentions and ‘retweeters,’ ‘elites’ and ‘networkers’ for retweets. While the names of the categories are different, ‘senders’ and ‘retweeters’ can be seen as the active group (as they send many tweets containing mentions and actively retweet), ‘receivers’ and ‘elites’ can be seen as the visible group (as they receive many directed tweets and are often retweeted) and ‘sender-receivers’ and ‘networkers’ are both active and visible. These last groups were therefore considered the most “high-end” in this study and are also generally very central in a social network. Although it can be valuable in terms of analysis to identify users as being any of these types, such an approach does not indicate anything about a user’s type in terms of profession or assumed purpose in being on Twitter (e.g. whether a user is a political actor, a news medium, an activist, etc.). The findings of this research can therefore be used to determine which actors would be valuable to analyze, but the categories do not fit the goal of the present research very well. 
Finally, the most recent study to address the topic of categorization focused on the influence that individual activists and advocacy groups can have on policymakers through Twitter (Stier et al., 2018). The authors collected tweets on two policy debates and, like Larsson and Moe (2012), took a sample of the 500 most “high-end” users, defined by their centrality as determined by the PageRank algorithm. In order to identify differences among them, they categorized these users as belonging to different actor groups through manual coding. The groups that they defined are: ‘politics,’ ‘media’ (divided into ‘traditional,’ ‘online’ and ‘citizen media’), ‘NGO,’ ‘industry celebrity’ and ‘individual activists.’ Categories that may require explanation are ‘industry celebrity’ and the three media categories. To be identified as ‘industry celebrity,’ a minimum of 100,000 followers is required, and politicians are not included (as they fall within the ‘politics’ category). Generally speaking, this group consists of, among others, CEOs of large firms, actors and musicians. Two of the three media categories are professional media, either referring to print, radio and TV (‘traditional’), or to channels that solely operate online (‘online’). ‘Citizen media’ refers to unprofessional media, which relies on unpaid submissions; however, for an actor to be included in this category, there must be more than one person involved. Unlike the previously mentioned studies, the actor groups used in Stier et al. (2018) were linked to the professions and/or assumed goals of users, making them more applicable to the present study.

When reviewing the aforementioned studies, it became clear that none had used the same categorizations, primarily because the various researchers focused on different topics, aimed to obtain other results or simply conducted their work from different angles. One could argue that developing categorizations for tweets and Twitter users that always apply and are at the same time appropriate for any research goal is not possible. However, the authors have used the work of others for inspiration and to obtain background information, identified partially matching categories and used similar methods to develop approaches to categorizing their datasets. Although the combined findings of these previous studies were unable to provide a suitable framework for the categorization of the actors and tweets investigated in this study, they did serve as inspiration, both in terms of methodology and category choice. Moreover, there are several similarities to be identified between the categories that were ultimately used in this study and those used in previous research. The choices with regard to categorization made for this study and the similarities with those of the reviewed studies are presented in the methodology section.


3 Methodology

This chapter describes the methods used in this research. To obtain answers to the research questions, this research first categorizes the types of Twitter users (actors) who have tweeted about the SDGs. Second, it attempts to identify the clusters of actors that emerge from the social network investigated and the reason(s) for this clustering. Third, it aims to categorize the types of tweets posted by the aforementioned users in order to identify the goal(s) behind the publication of these tweets. Finally, the results are considered in the context of existing literature on awareness creation on social media and the initial goals that informed the development of the SDGs by the UN. This chapter is divided into five parts: the first describes the data collection process, the following three sections address the three steps identified above in turn and the last discusses the ethical decisions made in the process of this research.

There are several different studies in which Twitter users and their tweets have been categorized (Dann, 2010; Honeycutt & Herring, 2009; Jansen et al., 2009; Java et al., 2007; Larsson & Moe, 2012; Naaman et al., 2010; Stier et al., 2018). These studies have adopted different approaches to categorizing actors and tweets, with these approaches ranging from being very broad to being specifically tailored to a particular topic. The original objective of researching previous work on this topic was to derive a structured coding scheme from the literature alone. However, after carefully reviewing the approaches to categorization found in these studies, it transpired that they were not well suited to the data analyzed in this research. Even though certain categorization choices were inspired by previous work, this research has an exploratory nature. Each of the previous studies to some extent used a grounded theory approach that involved inductively building a set of categories, and this study follows the same approach. However, while a strict grounded theory method has a wide set of requirements and specifically does not involve consulting existing work beforehand (Glaser & Strauss, 1967), the grounded theory approach applied in the context of this research focuses on the inductive nature of developing a coding scheme and eventually a categorization process. For each of the three data samples that were categorized, the coding scheme was built by categorizing single entries on the fly and iteratively analyzing the data to account for changes during the process. The categories chosen for the three samples also share several similarities. As the above makes clear, this research adopted a qualitative approach. Following Gaffney and Puschmann (2014), this choice was made to avoid the risk of misinterpreting content through the adoption of a (computational) quantitative approach.

While the more nuanced nature of a qualitative approach was more suited for this research, a quantitative note was added in the form of the analysis of the overall hashtag frequency in the dataset used, which provided perspective on the findings from the three samples. The Excel files that were used during the process of analysis can be found in the appendices.

3.1 Data collection

In order to achieve the goals identified above, tweets were collected over a one-month period (March 5 to April 5, 2019) using the DMI-TCAT tool developed by the Digital Methods Initiative (Borra & Rieder, 2013). This tool tracks tweets based on a search query and saves every tweet that matches the query to a dataset. The starting point was a sustainability dataset of over 500,000 tweets, collected using the keywords “circular economy,” “global goals,” “sustainable development goals,” “sustainable development,” “circulareconomy,” “globalgoals,” “sdg,” “sdgs,” “sustainability,” “sustainabledevelopment” and “sustainabledevelopmentgoals.” From this dataset, 86,838 tweets from 40,291 individual users were selected based on a search for tweets containing the hashtags #SDG or #SDGs. When the selection was examined, it immediately became clear that it contained a high number of retweets: 65,930, which is approximately 76% of the dataset.
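The selection step and the retweet share reported above can be illustrated with a minimal sketch. It is not the DMI-TCAT implementation: the record layout (dictionaries with `user` and `text` keys), the regular expression and the function names are assumptions made purely for illustration. The only detail taken from this thesis is that DMI-TCAT stores a retweet with “RT @user:” at the beginning of the tweet text (see section 3.3.1).

```python
import re

# Assumed pattern: matches #SDG or #SDGs as complete hashtags, ignoring case.
# Longer hashtags such as #SDG4 are deliberately not matched.
HASHTAG_PATTERN = re.compile(r"#sdgs?\b", re.IGNORECASE)


def has_sdg_hashtag(text):
    """Return True if the tweet text contains #SDG or #SDGs."""
    return HASHTAG_PATTERN.search(text) is not None


def is_retweet(text):
    """Retweets are assumed to be stored with an 'RT @user:' prefix."""
    return text.startswith("RT @")


def summarize(tweets):
    """Select hashtag tweets and report basic counts for the selection."""
    selected = [t for t in tweets if has_sdg_hashtag(t["text"])]
    retweets = sum(1 for t in selected if is_retweet(t["text"]))
    users = {t["user"] for t in selected}
    share = retweets / len(selected) if selected else 0.0
    return {
        "tweets": len(selected),
        "users": len(users),
        "retweets": retweets,
        "retweet_share": share,
    }


# The retweet share reported in this section, recomputed from its two counts.
reported_share = 65_930 / 86_838  # roughly 0.759
```

Applied to the figures in this section, the same division gives the retweet share of roughly 76%.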

3.2 Actor categorization

To obtain insight into the types of actors in this dataset, an export was made of all the users in the dataset, including those who did not themselves tweet but were mentioned in a tweet. This means that the number of users (49,218) is higher than the figure mentioned previously. This DMI-TCAT export listed both the activity (number of posted tweets) and the visibility (number of mentions) of each user. As suggested in research by Larsson and Moe (2012), the most “high-end” users were used for the analysis; thus, a sample of the most visible and active users was categorized, starting with the 100 most visible and the 100 most active users. To add an extra dimension to the sample, a third number was added: the sum of each user’s mentions and tweets. To account for overlap within the samples, the samples were selected in a hierarchical order, which means that after the 100 most visible users were selected, the 100 most active users who were not already part of the sample were added. Finally, the 50 users with the highest sum of tweets and mentions who were not part of the previous groups were also included in the sample. This means that a total of 250 users were individually categorized. These 250 users account for 13,411 of the tweets in the dataset (approximately 15% of the total tweets) and 50% of the mentions.
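The hierarchical selection described above can be sketched as follows. This is an illustration only, not the procedure as it was carried out in Excel: the input format (a mapping from username to a pair of tweet and mention counts), the function name and the group labels are assumptions, and ties are broken by input order, which the thesis does not specify.

```python
def hierarchical_sample(users, n_visible=100, n_active=100, n_combined=50):
    """Select users in three passes; later passes skip users already chosen.

    users: dict mapping username -> (tweets, mentions)  [assumed layout]
    Returns a dict mapping username -> label of the pass that selected them.
    """
    sample = {}

    def add_top(score, n, label):
        # Rank all users by the given score, highest first, then take the
        # first n users who are not yet part of the sample.
        ranked = sorted(users, key=score, reverse=True)
        for name in [u for u in ranked if u not in sample][:n]:
            sample[name] = label

    add_top(lambda u: users[u][1], n_visible, "mentions")             # most visible
    add_top(lambda u: users[u][0], n_active, "tweets")                # most active
    add_top(lambda u: users[u][0] + users[u][1], n_combined, "sum")   # combined
    return sample
```

With the default sizes, the three passes yield 100, 100 and 50 users respectively, which corresponds to the 250 users categorized in this study.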

Users were categorized by manual coding in Excel, first by reviewing the biographies provided on their Twitter accounts (which were, where necessary, translated with Google Translate). Whenever this approach did not provide enough information, tweets posted by the user were examined; alternatively, as suggested by Stier et al. (2018), Google Search and LinkedIn were used to find additional information about the actor in question. This resulted in eight distinct categories of actors, three of which were subcategorized as being either an individual or an official organization account. As not every user could be sufficiently described using one of these eight categories, in some cases a user was identified as belonging to two categories. The categories, along with brief descriptions of their contents, are listed below:

• Activist: users who actively share content supporting the SDGs. They do not have to specifically state that they are advocating for the SDGs, but their tweets and/or biography should show strong affinity.

• Business (individual or official account): users who are involved with the SDGs through their businesses. They could, for example, consult on the topic or work at a company that contributes to the achievement of the SDGs. An important distinction is that this should be for-profit work.

• Education: users who have a strong relation to education, either specifically concerning the SDGs or in general. This category consists of teachers, researchers and official university accounts.

• Influencer: users who share content for financial purposes. Their behavior fits in the ‘sender-receiver’ and ‘networker’ categories (Larsson & Moe, 2012). They tweet and retweet frequently and mention other groups of influencers to widen the reach of their messages. They have large numbers of followers, ranging from 2,000 to over 100,000. Those included in this dataset had also shared content related to the SDGs.

• News: users who share updates from news outlets; this can either be aggregated news from several sources or news from a single source. The group also includes official Twitter accounts of news agencies.

• NGO (individual or official account): users who represent or work for an NGO.

• UN/Government (individual or official account): users who represent or are related to governmental or UN entities. Included are, among others, official accounts and both high-ranking and low-ranking officials.

• Other: the residual category for users who did not fit into one of the other categories, for example because they had no clear relation to the SDGs.

A second category was added, for example, for users who were connected to the SDGs through their businesses, and were therefore included in the ‘business’ category, but who also strongly advocated for the SDGs. These users were additionally included in the ‘activist’ category. Aside from the possibility of classifying a user as belonging to a second category, two extra tags were added to account for very specific users. They are as follows:

• Bot: a “user” that automatically shares messages;

• Filipino project: users who belong to a project concerning the SDGs conducted at a university in the Philippines. These users and their tweets were significantly represented in the dataset, but they were rather disconnected from the rest of the users and only active for a short period of time, which is why a separate tag was established.

To identify potential differences in the categorizations for each of the three sample groups, the users were also tagged with the sample group to which they belong (either the mentions group, the tweets group, or the group with the sum of mentions and tweets).

Despite the fact that the findings of previous studies could not be applied to the dataset and the goals of this study, similarities can be found in terms of approaches to categorization. As the study by Stier et al. (2018) features the most similarities, this is discussed first. The ‘politics’ category is similar to the ‘UN/Government’ category, while the ‘industry celebrity’ category is similar to the ‘influencer’ and ‘UN/Government (individual)’ categories, even though the 100,000 follower requirement for the ‘industry celebrity’ category is not applied in this study. Moreover, both studies include ‘activist’ and ‘NGO’ categories. Finally, the ‘media’ category resembles the ‘news’ category. The present research also has some similarities to the earlier studies. For example, the ‘influencer’ category is similar to the ‘information source’ (Java et al., 2007), ‘informers’ (Naaman et al., 2010) and ‘networker’ (Larsson & Moe, 2012) categories. The other categories of the present study also all fit rather well with the ‘informers’ and ‘information source’ categories, as the nature of this dataset is generally rather informative. ‘Meformers’ (Naaman et al., 2010), ‘friends’ and ‘information seekers’ (Java et al., 2007) would generally fall in the ‘other’ category. Finally, users belonging to the ‘UN/Government’ category could also be seen as ‘receivers’/‘elites’ (Larsson & Moe, 2012), as they are mentioned and retweeted frequently. Some concepts and categorizations from previous studies are used to obtain additional insights into the results presented in this study; these contributions are discussed in the results section.

3.3 Clusters of actors

In order to identify clusters in the social network of actors that used the hashtags #SDG and #SDGs on Twitter, a network file was extracted from the dataset collected using DMI-TCAT. This file contains nodes and edges, which can be transformed into a social network graph using visualization software. In this research, Gephi was used to visualize network files. Gephi is an open-source network visualization tool developed to speed up the exploration and analysis processes when working with large network files. Beyond the broad customization options Gephi offers when it comes to analyzing a network, it has several integrated algorithms that can be executed simultaneously (Bastian, Heymann & Jacomy, 2009). These features make this software a good fit for this research.

3.3.1 Creating the social network graph

The graph was created based on mentions. A mention is a tweet that is specifically directed at one or more users through the use of the @ symbol before another user’s username. When a tweet contains a mention, a directed edge is created from the user (node) who shared the tweet to the mentioned user. An important detail is that DMI-TCAT saves the content of a retweet with “RT @user:” added to the beginning of the tweet text. This means that every retweet also appears in the social network graph as if it were a mention, with a directed link from the retweeting user to the user who originally posted the tweet. The social network graph contained 46,592 nodes (individual users) and 94,323 edges (directed links between two users). These numbers are slightly different from the numbers in the original dataset; this is due to the fact that one tweet containing, for example, five mentions will result in five nodes and five edges. In contrast, tweets that featured the hashtag #SDG or #SDGs but did not contain a mention were not included in the social network file (10,730 tweets did not contain mentions). This means that the network contains users who were mentioned in a tweet containing the hashtag #SDG or #SDGs but did not post a tweet containing either of the hashtags themselves; however, it also resulted in users who did use the hashtags but did not mention another user being omitted. The social network graph was inspected for nodes that clung together and formed clusters within the social network graph. To keep large social network graphs (and the clusters within) clear, DMI-TCAT sets the maximum number of users (nodes) to 500 by default and selects the top users based on number of mentions. For a dataset of over 45,000 nodes, however, 500 is a rather low number.
Despite the fact that most users were only included based on either having sent one tweet mentioning another user or having been mentioned in a tweet without having tweeted themselves, more users needed to be included to prevent an arbitrary selection. For this reason, several alternative selections were tried to ensure that the final social network graph was a fair representation of the network: the top 2,000 users, users with at least ten mentions, users with at least seven mentions and users with at least five mentions. In addition, Gephi’s giant component filter was used to filter out nodes that were not connected to the central (giant) part of the network. This was done to prevent very small and irrelevant clusters from creating noise within the network. All of these options resulted in different numbers of nodes and recognized clusters. The networks and clusters found using the different selection methods were compared with each other in order to identify significant differences in the results that might argue for the use of a particular method. Despite the fact that these selection methods may have proven useful in making the graph clear and readable, they were still arbitrary and focused on facilitating the process of analysis. Moreover, both the “top user selection” within DMI-TCAT and the filters applied within Gephi focus on the number of mentions a user has (the in-degree in the network). This means that active users who send a high number of tweets (the out-degree in the network) but who are not mentioned very frequently are filtered out. This is problematic, as an active user can also serve as a central node within a cluster by connecting other users. Therefore, to prevent an arbitrary selection of data, which may have made the results less valid, the complete dataset was eventually used, without the giant component filter being applied. Using the complete dataset produced results that differed significantly from those obtained using other selections, which justified this choice.
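The way in which mentions and retweets become directed edges, and how in-degree and out-degree follow from them, can be sketched as follows. This is a simplified illustration and not the DMI-TCAT export routine: the record layout, the regular expression, the lowercasing of usernames and the aggregation of repeated mentions into edge weights are all assumptions. Note that, in this sketch, a retweet also produces edges to any users mentioned inside the retweeted text.

```python
import re
from collections import Counter

# Assumed pattern for a mention: an @ symbol followed by a username.
MENTION_PATTERN = re.compile(r"@(\w+)")


def mention_edges(tweets):
    """Build weighted directed edges (sender, mentioned user) from tweets.

    A stored retweet starts with 'RT @user:', so it is picked up as a
    mention of the original poster. Tweets without mentions add no edges.
    """
    edges = Counter()
    for tweet in tweets:
        source = tweet["user"].lower()
        for target in MENTION_PATTERN.findall(tweet["text"]):
            edges[(source, target.lower())] += 1
    return edges


def degrees(edges):
    """Return weighted in-degree (mentions received) and out-degree (mentions sent)."""
    in_degree, out_degree = Counter(), Counter()
    for (source, target), weight in edges.items():
        out_degree[source] += weight
        in_degree[target] += weight
    return in_degree, out_degree
```

A selection based only on in-degree would drop a user who sends many mentions but receives none, which is the limitation of the mention-based filters discussed in this section.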

3.3.2 Recognizing clusters

In order to identify clusters in this social network, the modularity algorithm was used (Blondel, Guillaume, Lambiotte & Lefebvre, 2008). This algorithm has proven to be well-suited to extracting community structures from within large networks and has been tested on networks with millions of nodes. The modularity algorithm is integrated into Gephi and was therefore easy to apply to the dataset. The algorithm includes a randomizing option, which is intended to produce a better decomposition at the cost of increased computation time; this functionality was used to obtain the most valid results possible. It does, however, mean that re-running the algorithm produces slightly different results. Other options are including or excluding the weight of the edges (taking into account how strong the relationship between two nodes is) and changing the resolution. The resolution lets users decide whether they want a higher or lower number of clusters,5 with the default setting being 1.0. Both of these options were left at their default settings (i.e. with edge weight being included and a resolution of 1.0).
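To make clear what the modularity algorithm optimizes, the sketch below computes the modularity score of a given division of a network into clusters. It is not the algorithm of Blondel et al. (2008) itself, which searches for a division with a high score, and it is not Gephi’s implementation. It assumes an undirected, weighted network without self-loops and does not model the resolution setting; the function name and input format are hypothetical.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity Q of a partition of an undirected, weighted network.

    edges: iterable of (u, v, weight) with each edge listed once  [assumed]
    community: dict mapping node -> cluster label

    Uses Q = sum over clusters c of (L_c / m) - (d_c / (2 * m)) ** 2,
    where m is the total edge weight, L_c the weight of edges inside c
    and d_c the summed (weighted) degree of the nodes in c.
    """
    total_weight = 0.0
    internal = defaultdict(float)    # L_c per cluster
    degree_sum = defaultdict(float)  # d_c per cluster
    for u, v, weight in edges:
        total_weight += weight
        degree_sum[community[u]] += weight
        degree_sum[community[v]] += weight
        if community[u] == community[v]:
            internal[community[u]] += weight
    if total_weight == 0:
        return 0.0
    return sum(
        internal[c] / total_weight - (degree_sum[c] / (2 * total_weight)) ** 2
        for c in degree_sum
    )
```

A division that keeps densely connected nodes together scores higher than one that places all nodes in a single cluster, for which the score is zero.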

Using these settings, 1,374 clusters were found among the 46,592 nodes. The majority of these included very few nodes, each containing less than 0.1% of the nodes in the network, whereas the largest clusters contained around 5,000 nodes (around 10% of the network). The large number of clusters and the extremely small sizes of the majority of the clusters made some arbitrary data selection unavoidable. As there was a gap in cluster size after a cluster containing 39 nodes (0.08%), clusters with fewer than this number of nodes were excluded from the analysis. This resulted in 61 clusters being analyzed.

5 The modularity algorithm calls the clusters “communities,” but, for the sake of consistency, “clusters” is used
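The size threshold described above amounts to a simple filter over the cluster assignments. The sketch below assumes that the assignments are available as a mapping from node to cluster label; the function name and this input format are hypothetical.

```python
from collections import Counter


def large_clusters(membership, min_size=39):
    """Return the sizes of all clusters with at least min_size nodes.

    membership: dict mapping node -> cluster label  [assumed layout]
    Clusters with fewer than min_size nodes are excluded.
    """
    sizes = Counter(membership.values())
    return {label: size for label, size in sizes.items() if size >= min_size}


# Share of the network represented by the smallest retained cluster.
smallest_share = 39 / 46_592 * 100  # roughly 0.08 percent
```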

3.3.3 Categorizing the clusters

The process of finding a common denominator within these clusters relied on manual research, mainly by reviewing the details of the Twitter accounts of the users within a cluster and, whenever more detailed information was needed, by examining the posted tweets and links in a user’s biography or through the use of Google Search or LinkedIn to obtain additional information (Stier et al., 2018). Again, foreign language information was translated with Google Translate. To most efficiently find a common denominator across clusters, the categorization process started with the most important users in the cluster: those who were the most visible (receiving the most mentions and retweets6), those who were the most active (posting the most tweets) and those who were both visible and active. Larsson and Moe (2012) categorized these groups as ‘senders’ (active), ‘receivers’ (visible) and ‘sender-receivers’ (both visible and active), with all of these groups generally being very central in a cluster. However, analyzing clusters based only on user data can be challenging, as some important users may transpire to be outliers within a cluster and may cloud one’s judgment when categorizing a cluster. Therefore, visual analysis was used as an aid during the process of categorizing clusters.

Visual analysis can be very valuable when analyzing networks (Venturini, Jacomy & Pereira, 2014). Gephi was therefore also used to create a clearer visual representation of this social network of mentions. This was done using the ForceAtlas2 algorithm (Jacomy, Venturini, Heymann & Bastian, 2014), which causes “nodes [to] repulse each other like charged particles, while edges attract their nodes like springs” (p. 6). In this way, the algorithm creates a spatialization of the network that makes it much more readable; moreover, it places densely connected nodes close together, producing visual clusters that correspond with the clusters identified by the modularity algorithm. To make the visualization even clearer, each cluster was given a distinct color to facilitate identification (despite the fact that many clusters overlap). As a final visual aid, nodes were sized based on the number of mentions. Even though visual analysis was not the main method used in this research, it was a valuable help when the results from the modularity algorithm combined with the user data were not sufficiently clear. This was particularly the case for clusters that consisted of several smaller subclusters (outliers), which were not directly visible in the tabular data but could easily be identified in the visual representation of the network.
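The principle quoted above, in which nodes repel each other while edges pull connected nodes together, can be illustrated with a generic force-directed sketch. This is not ForceAtlas2 itself, which adds further refinements, and it is not the Gephi implementation used in this research. The function name `force_layout` and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def force_layout(nodes, edges, iterations=300, repulsion=1.0,
                 attraction=1.0, step=0.1, max_move=0.1, seed=0):
    """Simplified force-directed layout (illustrative, not ForceAtlas2)."""
    rng = random.Random(seed)
    pos = {n: [rng.uniform(-1, 1), rng.uniform(-1, 1)] for n in nodes}
    for _ in range(iterations):
        force = {n: [0.0, 0.0] for n in nodes}
        # Every pair of nodes repels, more strongly at short distances.
        for i, a in enumerate(nodes):
            for b in nodes[i + 1:]:
                dx = pos[a][0] - pos[b][0]
                dy = pos[a][1] - pos[b][1]
                dist = math.hypot(dx, dy) or 1e-9
                fx = repulsion * dx / dist ** 2
                fy = repulsion * dy / dist ** 2
                force[a][0] += fx
                force[a][1] += fy
                force[b][0] -= fx
                force[b][1] -= fy
        # Every edge pulls its two endpoints together like a spring.
        for a, b in edges:
            dx = pos[b][0] - pos[a][0]
            dy = pos[b][1] - pos[a][1]
            force[a][0] += attraction * dx
            force[a][1] += attraction * dy
            force[b][0] -= attraction * dx
            force[b][1] -= attraction * dy
        # Move each node a small, capped distance along its net force.
        for n in nodes:
            mx, my = step * force[n][0], step * force[n][1]
            length = math.hypot(mx, my)
            if length > max_move:
                mx, my = mx * max_move / length, my * max_move / length
            pos[n][0] += mx
            pos[n][1] += my
    return pos


# Invented example: two tightly connected groups joined by a single edge.
nodes = ["a1", "a2", "a3", "b1", "b2", "b3"]
edges = [("a1", "a2"), ("a2", "a3"), ("a1", "a3"),
         ("b1", "b2"), ("b2", "b3"), ("b1", "b3"),
         ("a1", "b1")]
layout = force_layout(nodes, edges)
```

Because attraction acts only along edges while repulsion acts between all nodes, densely connected groups tend to end up close together in such a layout, which is why the visual clusters can be compared with the clusters found by the modularity algorithm.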

Using both the tabular data and the visual analysis, it was generally rather easy to identify a common denominator for the clusters. As no previous studies were found on the categorization of clusters of actors, the coding had to be done from scratch, although the process was informed by the categories used in the actor categorization. As clusters were often difficult to capture in one (or even two) categories, the decision was made to choose a broad primary categorization scheme and to also use the more detailed descriptions of each cluster in the analysis. As there were only 61 clusters, it was within the scope of this study to consider such detailed descriptions for the clusters (given the size of the samples, it was not possible to offer such descriptions for the actors and tweets). The clusters were coded into six categories; these, along with brief explanations, are listed below:

• Business: clusters that are centered around businesses, accounts that tweet for marketing purposes and conferences.

• Local: clusters that are centered around organizations and individuals based in the same country or region.

• News: clusters that are centered around one or more news agencies/networks.

• SDG topic: clusters that are centered around a specific topic related to the SDGs. (This topic does not necessarily have to be one [or more] of the seventeen goals.)

• UN/Government: clusters that are centered around several UN or government agencies. This category has three subcategories that further specify the nature of a cluster:7

o Activist interaction: where the cluster is formed by interactions between individuals and the UN or government accounts;

o Topic: where the involved UN or government accounts, as well as the rest of the cluster, focus on a specific topic (e.g. forestry or health);

o Local: where the involved UN or government accounts, as well as the rest of the cluster, are focused on a specific region.

• Undefinable: clusters that appear to consist of several subclusters. These clusters have relevant content but they are hard to place in one category. More detailed information about these clusters is used in the analysis.

7 The subcategories partially overlap with the previously mentioned clusters. The fact that these clusters are
