An Approach to Localness Assessment of Social Media Users


AN APPROACH TO

LOCALNESS ASSESSMENT OF SOCIAL MEDIA USERS

YAN ZHANG 25 February, 2019

SUPERVISORS:

Dr. F.O.Ostermann

Dr. C.P.J.M. van Elzakker

Enschede, The Netherlands, 25 February, 2019

Thesis submitted to the Faculty of Geo-Information Science and Earth Observation of the University of Twente in partial fulfilment of the requirements for the degree of Master of Science in Geo-information Science and Earth Observation.

Specialization: Geoinformatics

THESIS ASSESSMENT BOARD:

Prof. Dr. M.J. Kraak (Chair)

Dr. J. Krukar (External Examiner, Institute for Geoinformatics, Spatial Intelligence Lab, WWU Münster)


DISCLAIMER

This document describes work undertaken as part of a programme of study at the Faculty of Geo-Information Science and Earth Observation of the University of Twente. All views and opinions expressed therein remain the sole responsibility of the author, and do not necessarily represent those of the Faculty.

ABSTRACT

related to one city and the relationship between people and city is necessary. Social media data is a potential source of local knowledge, and identifying the users who are related to a city is a precondition for extracting local knowledge from such data. Localness, defined as the result of accumulating life experiences in a local environment, represents the relationship between people and cities and indicates people’s potential local knowledge. Localness types describe the different forms this relationship can take: long-term resident, temporary or short-term resident, seasonal resident, non-local commuter, visitor and tourist.

The aim of this study is to design an approach to assess the localness of social media users. User features are devised mainly from three perspectives to reflect localness. Temporal features capture how long and when a user stayed in the city; spatial features describe users in terms of activity scope, activity concentration and tourism interest; and social features represent the connection between social media users and local society. Other useful features include the user-defined location in the profile and language. User features and thresholds together form the conditions for each localness type, and the more reliable features are selected as strong conditions. Conditions are combined to select users step by step, from strict to loose selection conditions, and every selected user is assigned one localness type as the output of the approach.

Twitter data in the London region are used as a case study to implement the approach. About 86% of Twitter users’ localness can be assessed. Long-term residents and temporary/short-term residents account for 29% and 22% respectively among all assessed users. Compared with ground truth data, the accuracy of localness assessment is 69%. Localness assessment of long-term residents is the best of all the localness types, but seasonal residents and visitors have a relatively lower performance based on F1-measure. In the implementation, the assessment based on strict selection conditions shows better accuracy.

It is difficult to find a clear relationship between features and localness types because social media data provide limited information, especially for spatial features. The selection of strong conditions and the thresholds used in strong conditions have a great influence on the assessment result. The approach is generic for geo-social media data and cities, and it works well in the case study. However, more ground truth data is needed to fit data sets and devise more reliable conditions.

Keywords

Localness, Social media user, Local knowledge, Twitter, London

ACKNOWLEDGEMENTS

My greatest thanks go to my first supervisor, Dr. F.O. Ostermann, and my second supervisor, Dr. C.P.J.M. van Elzakker, for their encouragement, guidance and comments throughout the whole research. Your rigor and speciousness not only helped me to complete the thesis, but also pushed me to grow up. I also thank all ITC staff members and all my classmates for their help and goodwill.

My deep gratitude goes to my parents for their great support. My special thanks go to my boyfriend, my best friend, my sister and anyone who trusted me, encouraged me and comforted me in the past year. I also thank my friends in ITC for their food and companionship. Special gratitude goes to anyone who has ever given me a hug in the past year.

Thanks to myself for coming out of the darkness and the loneliness and beginning to grow up.

Yan Zhang

Enschede, the Netherlands.

February 2019.

TABLE OF CONTENTS

1. Introduction
1.1. Background
1.2. Research identification
1.2.1. Research Objectives
1.2.2. Research Questions
1.3. Thesis structure
2. Localness studies
2.1. Localness Definition
2.2. Local knowledge and localness
2.3. Localness of Social Media Users
2.4. Localness and Mobility
3. Individual localness and types
3.1. What is Individual Localness?
3.2. Relationship between Localness and Mobility
3.3. Localness Properties
3.4. Localness Types
4. Localness assessment approach
4.1. Data Collection and Filter
4.2. User Feature Extraction
4.2.1. Temporal feature
4.2.2. Spatial feature
4.2.3. Social feature
4.2.4. Other feature
4.2.5. Summary of user features
4.3. Rule-based Localness Assessment
4.3.1. Conditions
4.3.2. Sequence
5. Case Study-Twitter Data in London
5.1. Study Area and Data
5.1.1. Study Area
5.1.2. Dataset Description
5.1.3. Sampling and Data Filtering
5.1.4. Target Localness Types
5.2. Feature extraction
5.2.1. Feature extraction implementation
5.2.2. Results
5.3. Localness assessment
5.4. Validation of localness assessment result
5.4.1. Ground Truth
5.4.2. Validation and discussion
6. Discussion
6.1. Localness definition and types
6.2. Localness assessment approach
7. Conclusion and recommendations
7.1. Conclusion
7.2. Recommendations
REFERENCE
APPENDIX

LIST OF FIGURES

Figure 3.2: The relationship between localness and mobility
Figure 3.3: Localness Properties
Figure 4.1: Overall process of localness assessment approach
Figure 4.2: Flowchart of temporal feature extraction
Figure 4.3: Flowchart of spatial feature extraction
Figure 4.4: Flowchart of social feature extraction
Figure 4.5: Localness assessment sequence
Figure 5.1: Map of the study area
Figure 5.2: Filters in data pre-processing
Figure 5.3: Box plot of tweeting frequency
Figure 5.4: Histogram of tweeting frequency
Figure 5.5: Flowchart for spatial feature extraction in the case study
Figure 5.6: Histogram of overall duration of Twitter users
Figure 5.7: Histogram of short duration of Twitter users
Figure 5.8: Histogram of the maximum interval of Twitter users
Figure 5.9: Histogram of average visit time of Twitter users
Figure 5.10: Histogram of tourist attraction proportion of Twitter users
Figure 5.11: Histogram of ellipse area
Figure 5.12: Histogram of core point proportion
Figure 5.13: Histogram of local follower proportion of Twitter users
Figure 5.14: Pie Chart of percentage of localness type
Figure 5.15: Pie Chart of percentage of localness steps
Figure 5.16: Scatter plot of maximum interval and duration of users with unknown localness
Figure 5.17: Spatial Distribution of typical users’ tweet points
Figure 5.18: Confusion Matrix of localness assessment result
Figure 5.19: Confusion Matrix of each step in localness assessment

LIST OF TABLES

Table 3.1: Localness type description
Table 4.1: User features and feature description
Table 4.2: Conditions of user features for localness types
Table 4.3: Condition combinations used in step 3 of localness assessment
Table 5.1: Attributes in the data set
Table 5.2: Description of Dataset
Table 5.3: Percentiles of Ellipse Areas
Table 5.4: The number of assessed users for each localness type in each step
Table 5.5: Maximum interval and duration of users with unknown localness
Table 5.6: User number of each localness type
Table 5.7: Confusion Matrix of a binary case
Table 5.8: Evaluation measures for each localness type
Table 5.9: Ground truth of users assessed as unknown

1. INTRODUCTION

1.1. Background

In the last few decades, the tide of constructing smart cities has swept the globe and cities have become more efficient with the aid of Information Communication Technology (ICT) (Silva, Khan, & Han, 2018).

A smart city architecture should offer users digital, efficient and reliable services, and it usually consists of five components: application plane, sensing plane, communication plane, data plane and security plane (Habibzadeh, Soyata, Kantarci, Boukerche, & Kaptan, 2018). Among these components, applications are designed to meet social needs, and the rest of the smart city architecture is determined by the technical requirements of the applications. Typical applications include smart environment, smart home and smart building, smart surveillance and smart transportation (Habibzadeh et al., 2018). Although the wide application of technologies is the foundation of realizing a smart city, the social infrastructure, such as the awareness and commitment of citizens, has a significant influence on the quality of life (QoL) in the city (Silva et al., 2018), and the QoL should be one of the main attributes of smart cities (Mohanty, Choppali, & Kougianos, 2016). In ideal conditions, people in smart cities should have a higher level of qualification, creativity and open-mindedness, and participate more in public life (Giffinger & Gudrun, 2010). The intelligence of smart cities should reside in the combination of technologies and human brains (Nam & Pardo, 2011). But using the intelligence of people who live in smart cities raises some questions: which kind of information from people can be used in smart cities, where can we get the information, and who can supply it?

Local knowledge of citizens can be considered as important information which can contribute to smart cities. Local knowledge is the information directly related to the local contexts or settings, including the knowledge of “specific characteristics, circumstances, events, relationships and the important understandings of their meaning” (Corburn, 2003, p. 421). In urban contexts, local knowledge helps to solve social problems in fields such as public health, environment monitoring, community governance and urban planning (Black & McBean, 2016; Brabham, 2009; Díez et al., 2018; Kim, 2016; Scott, 2015). However, due to the intangibility of local knowledge, obtaining it is not easy for municipal politicians, urban planners and anyone else who needs the knowledge related to one local context.

Social media data can be one possible data source of local knowledge. According to Kaplan & Haenlein (2010, p. 61), social media are a group of Internet-based applications built on Web 2.0 with which people can create and exchange user-generated content. Social media platforms have millions of users, who create content at any time and in any place. Due to the open and public nature of social media, it is possible to collect massive amounts of data from them, and some platforms, such as Twitter and Flickr, provide APIs for people who want to use their data. Georeferencing and geocoding turn citizens into human sensors who provide geographic information about local activities and life (Goodchild, 2007). Social media that contain both users’ comments and geographic information can be called geosocial media (Zhang & Feick, 2016). From geosocial media data, local knowledge can be discovered by combining what people posted with the locations to which this content relates. For instance, Konsti-Laakso (2017) used Facebook data to reveal security issues and events in a neighbourhood. Once the data source has been determined, another problem arises: local knowledge refers to information related to specific areas, so how can the people who can provide that information be identified among the millions of social media users?

The identification of people related to one area and of the relationship between those people and the area is necessary for local knowledge extraction from social media data. A common assumption in social media studies is that users are local to the places from which they post, but in fact it is quite possible that users post comments while travelling to other places (Johnson, Sengupta, Schöning, & Hecht, 2016). The appearance of a person in an area therefore cannot by itself prove that the person is local to that area; it is only evidence that the person has a relationship with the area. The relationship between a person and an area indicates how much local knowledge the person can provide.

Undoubtedly, compared to short-term visitors and tourists, long-term residents can often provide more reliable local knowledge about the city. Human mobility leads to a diversity of this relationship, especially in global cities. People may visit another area for various purposes such as education, health care, work, leisure and so on. Once they visit an area, they build a relationship with that area, and this relationship may change over time. To meet the needs of globalized production, people with specialized skills tend to cluster in a limited number of cities, and the populations of global cities change frequently (Sassen, 2001). Those global cities usually have large populations and more resources, which means that they have a stronger motivation to construct smart cities and gather local knowledge from people who have a relationship with these cities.

In previous research, the relationship between people and an area was identified by binary classification, and the diversity of this relationship was ignored (Andrienko, Andrienko, Fuchs, & Jankowski, 2016; Grace et al., 2017; Ostermann et al., 2015). In such a binary classification, social media users are classified as local or non-local. Identification based on the location field in the user profile is the simplest way, but this information is not very reliable: Hecht, Hong, Suh, & Chi (2011) reported that over one-third of Twitter users provided fake locations or words unrelated to their location, or left the location field empty, and that some users entered multiple locations. All this noise decreases the credibility of the location information in the user profile, so this information alone is not enough to identify the relationship between social media users and areas. Another commonly used method is classification by temporal criteria, where authors usually apply a simple filter to separate local people from visitors. For example, Andrienko et al. (2016) considered a user a local person if the time span between the user’s first and last tweets was longer than 100 days, and Li, Goodchild, & Xu (2013) used 10 days as the threshold in their filter.

Johnson et al. (2016) summarized four common criteria for identifying local people: whether a user stays at least n days in the region, where most of the user’s comments are posted, where the user’s home location is, and the location information in the user’s profile. However, they applied each criterion on its own when identifying local people, and they did not consider the diversity of the relationship between people and an area either.
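The temporal filters described above can be sketched in a few lines of Python. This is a minimal illustration, not the implementation used in the cited works: the function names `tweet_time_span` and `classify_by_time_span` and all example timestamps are hypothetical, and only the 100-day and 10-day thresholds come from the text.

```python
from datetime import datetime, timedelta


def tweet_time_span(timestamps):
    """Return the time between a user's first and last tweet in the area."""
    if not timestamps:
        return timedelta(0)
    return max(timestamps) - min(timestamps)


def classify_by_time_span(timestamps, threshold_days=100):
    """Label a user 'local' if the span exceeds the threshold, else 'visitor'."""
    span = tweet_time_span(timestamps)
    return "local" if span > timedelta(days=threshold_days) else "visitor"


# Hypothetical users, used for illustration only.
resident = [datetime(2018, 1, 5), datetime(2018, 3, 20), datetime(2018, 9, 1)]
tourist = [datetime(2018, 7, 1), datetime(2018, 7, 4)]
short_stay = [datetime(2018, 7, 1), datetime(2018, 7, 21)]  # 20-day span

labels_100_days = [classify_by_time_span(u) for u in (resident, tourist, short_stay)]
labels_10_days = [classify_by_time_span(u, threshold_days=10)
                  for u in (resident, tourist, short_stay)]
```

In this sketch the user with a 20-day span is a visitor under the 100-day threshold but a local under the 10-day threshold, which shows how strongly a single threshold shapes a binary result.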

“Localness” is a noun which refers to the quality or state of being local (Merriam-Webster, 2019). The state of being local in an area is one form that the relationship between a person and that area can take. The purpose of identifying this relationship is to extract local knowledge, so to fit this purpose the term “localness” should be redefined based on the generic definition and on local knowledge. Localness should be an attribute of people that represents the relationship between people and areas. Considering the diversity of the relationship, localness in this study should be able to indicate the potential local knowledge of people and to distinguish groups of people based on the local knowledge they may have. Given such a localness definition, and taking social media data as the source of local knowledge, an approach to localness assessment of social media users is needed to identify users based on the potential local knowledge they have.

1.2. Research identification

This section specifies the research objectives of this study and the corresponding research questions for each objective. The research objectives are built on the research problem described in the previous section, and they are achieved by answering the research questions that follow them.

A localness assessment approach for social media users will be the outcome of this study. First, existing localness definitions and related criteria will be reviewed and used as the basis of a new localness definition and assessment. Localness will be redefined from a local knowledge perspective, and localness types will be specified to represent the relationship between people and areas. In the approach, users are represented by their features, and conditions for each localness type will be designed. The localness assessment is based on comparing user features with these conditions. To evaluate the approach, it will be implemented in one case study to check its shortcomings and application effect.

1.2.1. Research Objectives

The overall objective of this study is to design an approach to assess the localness of social media users and implement the approach in one global city.

To achieve the overall objective, it is split into four sub-objectives:

1. To evaluate existing localness definitions and assessment criteria

2. To define individual localness and specify localness types

3. To design an approach to assess the localness of social media users

4. To implement and evaluate the approach using real-world data in a global city

1.2.2. Research Questions

Research questions related to sub-objective 1:

1. How do related works define localness?

2. Which are the criteria used in related works to assess the localness of individuals?

Research questions related to sub-objective 2:

1. How to define localness of individuals?

2. How to conceptualize different types of individual localness?

Research questions related to sub-objective 3:

1. Which user features can be used to determine the localness type of users?

2. How to assess the localness of a social media user based on user features?

Research questions related to sub-objective 4:

1. To what extent can the approach assess the user’s localness correctly?

2. What are the application conditions and the limitations of the approach?

1.3. Thesis structure

The research consists of four objectives and the thesis is organized by the sequence of objectives.

After the background and research identification in Chapter 1, the first objective is addressed in Chapter 2 by a literature review of existing localness definitions and criteria.

As for the second objective, localness will be defined based on local knowledge, and the properties and types of localness are specified in Chapter 3.

The description of localness is applied in Chapter 4 to assess the localness of social media users. The user feature extraction in the approach is based on the localness properties and the information available in social media data. Localness assessment in the approach is based on user features and localness types. Conditions are designed for each localness type, and whether a user’s features meet the conditions of a localness type determines whether the user is assigned that type.

To implement and evaluate the approach, Chapter 5 presents a case study using Twitter data in Greater London. For the data set, the user features are calculated, and the approach designed in Chapter 4 will be used. The process of approach implementation and the result will be interpreted and discussed, and the evaluation of the case study results is based on a small sample which includes the labelled localness of users.

Chapter 6 discusses the study, and Chapter 7 contains the overall conclusion and the recommendations for further research.

2. LOCALNESS STUDIES

In this chapter, the first research objective will be addressed: reviewing and evaluating existing localness definitions and assessment criteria. First, localness definitions in different fields are reviewed, especially localness as used for user-generated content. Second, to define localness based on local knowledge, definitions of local knowledge are summarized. Third, studies that use localness or identify local people are reviewed, and existing assessment criteria are evaluated. Fourth, to describe localness systematically, the commonly used concept of “mobility” is adopted in this thesis, and mobility studies are reviewed in the last section of this chapter.

2.1. Localness Definition

Localness as a relatively common term has been used in several fields. In the field of economy, localness is used as a statement related to the local market and it is comparable to globalness or internationalization. For example, Persky and Wiewel (1994) used localness as an indicator to demonstrate the trend of being more local in global cities. Swoboda, Pennemann and Taube (2012) used localness and globalness to analyse how consumers perceive retail brands. Schmitt, Dominique, and Six (2018) proposed five criteria to assess the degree of localness of food production. In computer science, Tu, Su, and Devanbu (2014) used localness to represent a property of source code meaning that the code contains local regularities. Ballatore, Graham, and Sen (2017) defined a localness indicator as the ratio between the number of local Google search results and the total number to reveal the unevenness of digital geography. In politics, Gschwend, Shugart, and Zitte (2009) use the term “localness” to imply the phenomenon that some legislators have a subjective local focus and please local constituents.

Although these localness usages or definitions are proposed by authors in different fields, all of them try to represent the relationship between their objects of study and the local environment. Usually, they have an opposite concept or situation that expresses a lesser degree of localness, such as globalness or a situation with fewer local connections.

Compared to the applications of localness in the above fields, the term “localness” is used more often in the analysis of user-generated content (UGC), especially in the analysis of Volunteered Geographic Information (VGI). Hecht et al. (2010) paid attention to the “localness” of participation across entire UGC repositories and introduced spatial content production models to describe the proportion of locals in repositories of Flickr and Wikipedia. Tahara and Ma (2014) used the term “localness” as the main indicator to extract regional terms from linguistic features of tweets in a method of local Twitter search. In their work, the value of localness is the product of four indicators: the frequency of appearance of one regional term, whether users in other areas post the term, the number of users in the target area posting the term and the number of days users in the target area post the term. Regional terms with high localness values are used in the local area feature vector to identify local users. This is an application of localness in social media user identification, but the localness is relative to areas instead of users. Sen et al. (2015) treated localness as a property of VGI, and localness in their work was used to demonstrate how much VGI about one place originated from this place. To study geographic content biases in VGI, the authors collected Wikipedia editions in 79 different languages and examined the data from two relationships: (1) the relationship between the geographic articles and the locations of their editors; and (2) the relationship between the geographic articles and the location of sources cited in the articles. In the work of Johnson et al. (2016), localness is a common assumption in social media studies. Under the localness assumption, people who post content with geographic information in social media are considered locals to the region of the corresponding geographic information. Johnson et al. also examined this assumption based on some existing local user identification criteria, and localness was treated as the indicator in this examination, calculated as the relative proportion of users who were classified as locals. Following this work, Kariryaa, Johnson, Schöning, & Hecht (2018) defined localness more clearly by identifying the definition of “local” in earlier works. They present three definitions of localness: a person is local only to the region where they live, a person is local to the region where they vote, and a person is local to a region if they have enough knowledge about the region. Huang and Wang (2016) held that the localness of users depends on whether the user is a local resident in one city, and they identified locals using the local attractiveness of venues.

In summary, so far, localness is treated as an abstract noun to show how local the study objects are. The study objects can be a whole, like a VGI repository and a group of social media users. Localness is then the proportion of local elements in the whole. The study objects can also be individuals, like a social media user and a regional term. In such a case, localness means the relationship between the individual and one area, i.e. whether one user is local to one area and to what extent one term can represent one area.

In this thesis, the localness of individuals represents the relationship between people and one area. It is defined to extract local knowledge, and the diversity of the relationship can be embodied in the localness properties and localness types which I will describe in Chapter 3.

2.2. Local knowledge and localness

As mentioned in section 1.1, local knowledge of citizens is an important information source which can contribute to smart city construction. One motivation of this study is to find social media users who might have some local knowledge about one area, and identifying the localness of users should be one way to achieve that. Local knowledge and localness are closely related because both are rooted in the same local environment: localness should indicate some types of local knowledge, and the accumulation of local knowledge is one requirement for a change in localness. Therefore, knowing the exact meaning of local knowledge is necessary for the definition of localness.

The term “local knowledge” has been defined by different authors. Lindblom and Cohen (1979, p. 12) characterized local knowledge as “common sense, casual empiricism, or thoughtful speculation and analysis”. Geertz (1983, p. 75) defined local knowledge as an organized body of thought which is “practical, collective, strongly rooted in a particular place and based on immediacy of experience”. Corburn (2003, p. 421) defined local knowledge by comparing it with professional knowledge; that comparison is shown in Table 2.1. He indicated that both geographically located community members and the members of specific context groups can hold local knowledge, and that local knowledge can come from the tactile and emotional experiences in their lives.

Table 2.1: Differences between local knowledge and professional knowledge

Local knowledge
Who holds: Community members, geographically located or contextual to specific groups
How to gather: Life experience
How to be credible: Tactile and emotional experiences
How to be tested: Public forums (public narratives, community stories, street theatre)

Professional knowledge
Who holds: Members of a profession, discipline, or research institution
How to gather: Experimental methods and disciplinary tools
How to be credible: Scientific discussion
How to be tested: Peer review

Therefore, local knowledge about one specific area comes from people’s life experiences in the area and this is the basis of localness definition in this thesis.

2.3. Localness of Social Media Users

Localness assessment in geosocial media data is a relatively new problem that has attracted attention from only some authors in recent years. Their work is reviewed below.

Some studies focus on the content from local people but do not try to solve the local people identification problem, and use a simple way to filter the locals. For example, Andrienko et al. (2016) separated the local people from visitors using a simple filter based on the time span of the tweets. Ostermann et al. (2015) used the overall duration and the number of distinct days with tweets to identify residents, and filtered out the tourists using a 30-day time window. Kumar, Bakhshi, Kennedy, & Shamma (2017) classified locals and tourists in the same way.

Besides the traits of users’ posting behaviour, other data can also be useful in this field. Grace et al. (2017) came up with the “Social Triangulation” method to identify local citizens using the local organizations they follow on Twitter, assuming that the more local organizations a user follows, the more likely the user is to be a local. Huang, Wang, & Tao (2017) used online check-in data for venues in social media to identify local and non-local people, and assumed that local people visit more venues in the city.
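The assumption behind following local organizations can be illustrated with a short sketch. It only mirrors the idea stated above and is not the Social Triangulation method itself: the account handles, the function names and the `min_local_follows` threshold are hypothetical.

```python
def count_local_follows(followed_accounts, local_organizations):
    """Count how many of the accounts a user follows are known local organizations."""
    return len(set(followed_accounts) & set(local_organizations))


def is_likely_local(followed_accounts, local_organizations, min_local_follows=2):
    """Treat a user as a likely local once enough local organizations are followed."""
    return count_local_follows(followed_accounts, local_organizations) >= min_local_follows


# Hypothetical account handles, used for illustration only.
local_orgs = {"@city_library", "@borough_council", "@local_football_club"}
user_a = ["@city_library", "@borough_council", "@world_news"]
user_b = ["@world_news", "@celebrity"]

user_a_local = is_likely_local(user_a, local_orgs)
user_b_local = is_likely_local(user_b, local_orgs)
```

A count-based rule like this needs a curated list of local organizations, so its quality depends on how complete that list is for the study area.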

Huang, Wang, & Zhu (2017) proposed a framework called “Diversified Local Users Finder” to identify the set of local users from check-in data. They estimated users’ home locations from check-in traces with an unsupervised framework, computed diversity scores from the geographical distance between users’ home locations, and identified local users by maximizing their diversity scores. Tahara and Ma (2014) proposed a method for local Twitter user identification based on linguistic features of tweets. They constructed local area vectors and user vectors based on the extracted regional terms and then calculated the cosine similarity of local area vectors and user vectors. Local users are the users whose vectors are similar to the local area vector.
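The vector comparison in the last step can be sketched as follows. This is a generic cosine similarity over sparse term-weight vectors, written under the assumption that both vectors are dictionaries mapping regional terms to weights; the terms, weights and variable names are hypothetical and do not reproduce the feature construction of Tahara and Ma (2014).

```python
import math


def cosine_similarity(vec_a, vec_b):
    """Cosine similarity between two sparse term-weight vectors given as dicts."""
    dot = sum(weight * vec_b[term] for term, weight in vec_a.items() if term in vec_b)
    norm_a = math.sqrt(sum(w * w for w in vec_a.values()))
    norm_b = math.sqrt(sum(w * w for w in vec_b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # An empty vector is treated as dissimilar to everything.
    return dot / (norm_a * norm_b)


# Hypothetical regional terms and weights, used for illustration only.
area_vector = {"tube": 3.0, "borough": 2.0, "thames": 1.0}
local_user = {"tube": 2.0, "thames": 1.0, "coffee": 1.0}
other_user = {"subway": 2.0, "coffee": 1.0}

local_score = cosine_similarity(area_vector, local_user)
other_score = cosine_similarity(area_vector, other_user)
```

Users could then be ranked by their score against the area vector, with a cut-off chosen for the data set at hand.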

Johnson et al. (2016) verified the localness assumption and summarized four common criteria (n-day, plurality, geometric median and location field) which had been used to determine localness in other works. They found that about 25% of the users who are not local posted tweets in the study city. In the paper, they only used the four criteria separately and compared the results. They used four datasets to test the four criteria, and the comparison of the results showed that no single criterion had a stable performance across the four datasets and that there is substantial disagreement between the four criteria. Considering the criteria of localness in parallel, the authors suggested that a combination of multiple criteria may have a more robust performance. Following Johnson’s work, Kariryaa et al. (2018) indicated that the definition of people’s localness should contain information about where people currently live, where people currently vote and which places people are familiar with. They used the existing criteria mentioned in Johnson’s work to test the new definitions, and the ground truth data used in the tests were collected from Twitter’s ad platform.
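Two of the four criteria can be sketched to show how they may disagree for the same user. The code reflects one possible reading of the n-day and plurality criteria (first and last posts in a region at least n days apart; the region with the most posts) and is not the implementation of Johnson et al. (2016); the function names, the value of `n` and the example posts are hypothetical.

```python
from collections import Counter
from datetime import date


def n_day_local(post_dates, n=10):
    """n-day criterion (one reading): first and last posts in the region are >= n days apart."""
    if len(post_dates) < 2:
        return False
    return (max(post_dates) - min(post_dates)).days >= n


def plurality_region(post_regions):
    """Plurality criterion: the region in which the user posted most often."""
    if not post_regions:
        return None
    # On ties, Counter.most_common returns the region that was encountered first.
    return Counter(post_regions).most_common(1)[0][0]


# Hypothetical user: (region, date) of each geotagged post.
posts = [
    ("London", date(2018, 5, 1)),
    ("Paris", date(2018, 5, 10)),
    ("Paris", date(2018, 5, 12)),
    ("Paris", date(2018, 5, 15)),
    ("London", date(2018, 6, 20)),
]
london_dates = [d for region, d in posts if region == "London"]
is_n_day_local_to_london = n_day_local(london_dates, n=10)
most_posted_region = plurality_region([region for region, _ in posts])
```

For this user the n-day criterion marks London as local while the plurality criterion points to Paris, a small instance of the disagreement between criteria reported above.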

From this literature review of localness assessment of social media users, we may conclude that there are some significant shortcomings in the existing methods. First, existing localness assessment methods only focus on the identification of either local or non-local. The relationship between people and areas is simplified to only two choices and any diversity is ignored. So, the existing methods cannot distinguish, for example, long-term residents, short-term residents, visitors and any other potential situations of user localness. Second, in the existing methods only single criteria were applied. According to Johnson et al. (2016a), the results of using single criteria are not reliable and the combination of multiple criteria can lead to a more robust performance. Third, existing methods did not take full advantage of all characteristics of social media users. Some user characteristics may be useful but were never taken into account, such as the temporal distribution of posting behaviours, the spatial distribution of locations, local social networks, and so on.

In Chapter 4, an approach to localness assessment will be designed, with the aim to make up, to a certain extent, for the above shortcomings of existing methods.

2.4. Localness and Mobility

Human mobility is one of the reasons for the diversity of localness. To some extent, the diversity of localness can measure a population’s heterogeneity, which is partly caused by human mobility. At the individual level, mobilities are the “means to combine goals in space”, and these spatial behaviours combined over time can demonstrate the life course trajectories of individuals (Bell & Ward, 2000). Both human mobility and localness represent the relationship between humans and space: the former focuses on location changes over time from the individual perspective, while the latter can be considered as a person’s specific state at certain times from the location perspective. Because of the complexity of the relationship between humans and space, human mobility has various forms and localness also has different types. So, a review of mobility studies can be helpful for the understanding of localness.

Bell and Ward (2000) compared permanent migration and temporary mobility in terms of key concepts (usual residence and return) and three dimensions (duration, frequency and seasonality). Then they used boundaries in space and time to classify all population movements into detailed mobilities such as commuting, seasonal work and so on. This is the first work to compare permanent migration and temporary mobility in a systematic way. Williams and Hall (2000) linked tourism and migration and specified some migration forms, such as labour migration, retirement migration and so on. Montanari (2005) proposed an approach to analyse mobility flows at the territorial level in phases of local development related to the social and economic development of the territories, such as detailed mobility forms related to labour.

Williams et al. (2012) illustrated the pattern of population centralization and decentralization in urban contexts by analysing the migration in Portsmouth UK, and paid attention to the fluidity of urban populations which manifested the importance of temporary population movements, but the analysis of mobility forms was limited by the secondary data used. King (2012) reviewed the migration theory from a geographic perspective and summarized a typology of migration which is a relatively comprehensive typology and can be the basis of mobility form analysis. Novy (2018) proposed a pentagon of mobility from the place consumption perspective to illustrate the diverse tourism mobilities.

Although the emphases in these works are dissimilar, the classifications are consistent, and the main factors considered in the mobility form conceptualization are time, purpose and path. Permanent mobility and temporary mobility are the most common classifications using the time factor. Temporary mobility indicates that people will stay in the destination of the movement for varying durations and return thereafter, while permanent mobility means a change of usual residence and the last relocation (Bell & Ward, 2000). Migration can be considered as one result of permanent mobility and commuting is a special case of temporary mobility. The second mobility classification is production-related versus consumption-related movement. This classification is based on the purpose of the mobility: production-related mobilities occur for an economic contribution, while consumption-related mobilities occur to obtain a good or service. The distinction between them is fuzzy because production and consumption are concurrent in most cases. A path indicates the geographical relationship between the origin and destination of a movement, and the distance or geographical scale of movement is the main indicator (King, 2012; Williams et al., 2012).


The literature presents mobility forms based on different topics, so they all have advantages and limitations and none of them is comprehensive. Some mobility forms are put forward based on the specific purpose of mobilities, such as business, healthcare, second home or visits to relatives, and this is the mainstream of mobility form identification. In other studies, the forms are identified by temporal or spatial traits, like (temporary) migration and internal/international migration. Moreover, all the identified forms are the result of a qualitative analysis of mobilities and lack a clear definition and threshold, which means that they cannot provide distinct mobility forms covering all the mobilities without overlap.

To find a systematic way to identify mobility forms, it is necessary to know the properties of mobilities. Karamshuk et al. (2011) reviewed the progress in the field of human mobility and classified the related findings along spatial, temporal and social properties. The spatial properties are related to the travel distance of people, the temporal properties pertain to the time and frequencies of the visit and the social properties are related to the social interaction between persons. These three properties can also connect to the mobility forms mentioned before. Therefore, the properties can be used as the three fundamental directions to identify mobility forms.

Since both localness and mobilities are a representation of relationships among space, time and humans, they will have similar properties and forms. The relationship between localness and mobilities will be explained in detail in the next chapter.

In this chapter, existing works related to the definition of localness and the localness assessment of social media users were reviewed, and local knowledge definitions and mobility studies were summarized to support the localness definition, properties and types in the next chapter. In Chapter 3, localness will be defined and described based on local knowledge.


3. INDIVIDUAL LOCALNESS AND TYPES

In this chapter, the second research objective will be addressed: defining individual localness and specifying localness types. Figure 3.1 shows the relationship among the concepts used in this chapter. Local knowledge and concepts about mobility have been defined clearly in existing works and they are used in the localness definition. To describe localness, the relationship between mobility and localness is explained first, and then the properties of localness are illustrated based on the localness definition and mobility properties. The localness type conceptualization is based on the localness definition and some mobility forms, and localness properties are used to describe localness types.

Figure 3.1: Concepts relationship in Chapter 3

3.1. What is Individual Localness?

As mentioned in section 2.2, local knowledge is generated based on life experiences. People develop local knowledge over time in a given environment based on their experience and they hold local knowledge individually and dynamically (FAO, 2004). Not only the original inhabitants of an area have local knowledge, but also all people who have experience related to the area. For instance, migrants or visitors may hold local knowledge about this area. Because people develop and possess local knowledge individually, people who differ in age, gender, educational background, occupation or socio-economic status have different kinds of local knowledge. Local knowledge can be broad (covering many aspects) and/or deep (knowing a lot about a single aspect), and the amount and types of local knowledge of one person are affected by many factors, all of which are related to the generation of local knowledge.

The generation of local knowledge is closely related to how long people stay in one area and local knowledge is attached to the physical area where they have activities. For one city, tourists and long-term residents are likely to have different types of local knowledge, and commuters not living in this city and visitors staying in the city only one weekend also have different impressions of the city. In addition to the temporal perspective, the validity of local knowledge has obvious spatial boundaries. For instance, the information about an unsafe area in one city is useless for the people in another city unless they plan to visit that city.

Moreover, based on different experience in the same area and period, people may generate local knowledge from different perspectives. One example can be that visitors with health care purposes will pay more attention to local medical information, while tourists will only possess local knowledge about local tourist attractions in which they are interested, and about other limited information related to their journeys.

In summary, life experiences in local environments are the source of local knowledge, and individuals accumulate their local knowledge through familiarity with the local environment from different perspectives based on different life experiences. Therefore, I define individual localness, based on local knowledge, as the result of accumulating life experiences in a local environment.


This definition should be elaborated on from three angles. First, localness is related to local knowledge, but different from the latter. Local knowledge is a concept that was developed to describe a kind of knowledge of individuals and this knowledge is about specific areas. Localness is a concept that describes the relationship between individuals and areas, and the relationship indicates the potential local knowledge of individuals from an area perspective. Localness can be helpful to identify the people who might have specific local knowledge, but this identification is the selection of individuals instead of local knowledge. Localness only indicates that some people are more likely to have specific local knowledge but cannot guarantee that.

Localness can be a form of representing how people connect to one specific area and it can also be an attribute of individuals in the discussion of the relationship between people and one specific area. Second, to fit the mobile and globalized world, this definition assumes that one person can possess local knowledge about all areas he or she has ever stayed in, while the common explanation of local knowledge emphasizes knowledge only about the area where people live. So, in this definition, the “local environment” may vary for one specific individual, which means that one person can have a localness attribute for different areas and the value of such a localness attribute will depend on the life experiences in each of these areas. Third, to keep the definition generic, the local environment can be an area at different scales and the choice of the localness scale depends on the objectives of different studies. Intuitively, a city will more easily make an impression on people than a bigger administrative region or a smaller unit like a postal district, and a city is the common choice when a person introduces his or her location to others. In addition, given the background of smart and global cities in this study, choosing the city as the localness scale is appropriate. So, localness in the rest of the thesis means localness at city scale.

3.2. Relationship between Localness and Mobility

To conceptualize localness types systematically, I will first link localness to the concept of “mobility” mentioned in section 2.4. Both localness and mobility are a representation of the relationship between humans and space. Considering the change of the human-space relationship over time, the temporal dimension is also an integral part of this relationship. Figure 3.2 demonstrates how humans, space and time determine localness, as well as the relationship between localness and mobility.

Figure 3.2: The relationship between localness and mobility

As for the time factor, the localness of a person for a city can change if he/she stays there in the future or visits the city more often, because persons accumulate more life experiences when they stay in one city for a longer accumulated time. Mobility means a change of the space factor, and mobility between cities is the precondition for people to have life experiences in another city. Compared to the localness relative to the original city, persons visiting a new city will have a different localness relative to the new city due to their life experiences in that city during the visit period. As for the human factor, Figure 3.1 demonstrates the situation of one person, while different persons have different life experiences in one city, so their localness also differs. Thus, localness as an attribute of persons can represent the relationship between persons and cities, and a change of any factor among humans, space and time can change the localness.


Mobility focuses on how people move, rather than on how people interact with one city. One person and one movement between two cities determine one mobility. Except for permanent migration, every mobility has a return time and a visit time, and these time factors indicate the return mobility or the next mobility in the trajectory of one person. So, mobility represents the relationship between humans and space by describing the process of change of a person’s location.

The relationships between localness and mobility are the following. First, localness can be considered as an attribute of persons that represents a person’s state, while mobility shows the process of movement and can be considered as an action of people instead of a feature of people. Second, mobility is the precondition for a person to change his/her localness for different cities by accumulating life experiences in those cities. Third, except for permanent migration, the visit time in each mobility is the time a person stays in one city during one trip. If a person has not permanently migrated to a city during a period, every visit to the city means one mobility of him/her to this city, and the more he/she visits it, the more life experiences about this city he/she has.

3.3. Localness Properties

Localness as a comprehensive result should be able to describe the life experiences of individuals from many aspects. The time a person stays in a city is one factor of an individual’s localness, and his/her activities in the city, including visiting places inside the city, and the interaction with other people living in the city are also important aspects of individual localness. As mentioned in section 2.4, the properties of mobility include three aspects: spatial, temporal and social. Like the properties of mobility, the localness of individuals can also be analysed from these three aspects.

These properties should be adjusted to fit the definition of localness. Mobility studies pay more attention to how people move between different cities, while localness studies should be concerned with how people generate life experiences in one city. For the temporal properties of localness, information about the duration of visits is needed and the numeric result of visit frequencies alone is not enough. To get insight into how people stay in one city, the temporal distribution of people’s stays should be described in detail, for instance through some statistics of the intervals and visit times of individuals. In addition, localness is closely related to local knowledge, but some types of local knowledge will decay over time, such as the location of some restaurants or recreation places. For example, a person might have lived in a city as a resident and possessed some local knowledge about the city, but then migrated to another city and did not stay in the original city for a long time anymore. In this case, the person’s localness relative to the original city depends not only on the life experiences about the city but also on the time elapsed after leaving the city. Thus, there should be a temporal property to describe the last time a person stayed in a city, and the duration of the period since he/she left the city will be an important factor of an individual’s localness. For the spatial properties, localness studies should focus on the geographical distribution of a person’s activities inside the city instead of the travel distance of movement, and both the scope and the concentration of activity locations are indications of localness. For the social properties, the interaction with other people alone cannot describe the social property of localness comprehensively. Localness studies should pay more attention to the overall social network of people and the common interests of people, because localness is concerned more with the final state of the relationship between people and city than with how people build this relationship.


Figure 3.3: Localness Properties

Figure 3.3 shows the properties of localness. Localness should reflect the life experiences of people in a local environment from spatial, temporal and social perspectives and it is a comprehensive result to assess how local one person is. From a temporal perspective, localness indicates when a person stayed in this city, how long he stayed here, and what was the temporal distribution of his/her visit time if he/she visited the city more than once. From a spatial perspective, localness describes whether one person ever stayed in this city, the geographical scope of his life in the city and the degree of concentrated activities. From a social perspective, localness describes how one person interacts with people who live in the city. Social networks contain all friends, relatives and anyone the person knows. Interests can result in online or offline groups even if people do not know every member in the group personally and people in the same group may have similar activities, especially in the Web 2.0 era. Moreover, interests can reflect the socioeconomic status of people and the purpose of visits like education and health care which are an indication of which kinds of local knowledge they have.

3.4. Localness Types

Localness types mean the different possibilities of people’s localness. Localness types are the result of a localness assessment of people and the types are also candidate values of localness attributes of people. The purpose of this section is to conceptualize types of localness. To simplify the localness types, I will ignore the situation that the last time people stayed in the city is so long ago that their local knowledge about the city became obsolete.

Intuitively, long-term residence is a necessary type of localness. Long-term residents of a city usually have their usual residence in this city and almost all their activities happen in the city. From a localness perspective, people can get used to a city if they stay there for one year, because one year is enough for people to integrate into life in the city, explore the city and lead a regular life there. So, in this thesis one year is an important condition to identify long-term residents. Their activities have obvious centres, which are their home location, work place or any place they visit at a higher frequency. They usually have visited more places in this city and constructed stronger local social networks than the following localness types.

For people other than long-term residents, mobility to a city is a precondition for possessing local knowledge about this city. Furthermore, visiting a city gives people chances to gain life experiences in the city, and the longer a person stays there the more life experiences he/she may gain. As mentioned in section 3.2, the visit time of a mobility is the same as the time people stay in a city, and the total time a person stayed in a city is an indication of the life experiences of that person. So, mobility forms based on visit time provide an important reference for localness type conceptualization. Bell and Ward (2000) listed population mobilities in a time-space table. After dividing by time, mobilities are at four temporal levels: within one day (shopping, commuting), within one week (visit, excursions, health care and business travel), within one month (study and vacations) and within one year (seasonal work). These temporal mobilities can be a reference for localness type conceptualization.

For the mobility forms within one day, some other authors also mentioned similar forms in their works, such as shopping and leisure (Montanari, 2005). But most of these mobilities are part of daily life for every person staying in a city and cannot be used to identify people with different levels of life experience accumulation (i.e. localness). However, given the city localness scale, commuting from other cities is a special kind of one-day mobility form. Some people may reside in one city but work in another city. Most of the activities of their daily life may happen near their home location in the other city, and in the city where they work they may focus only on work-related activities, which can lead to a relatively small activity scope near their work place. So, these people may have a different perception of the city compared to people who both live and work in the city. Another trait of these people is that they usually stay in their working city during the daytime and on weekdays. The life experiences about the city and the local social network they have depend on how long they have been commuters. Based on the above traits, the non-local commuter is treated as one localness type.

Mobility forms with a short-term visit time may have different purposes such as health care, business, friend/relative visiting and so on. People visit the city to achieve their goals, and most of them will focus on their core activities. Compared to resident localness types, visitors do not have a usual residence and daily life experiences in the city. How long visitors stay in the city depends on their visit purposes, and 30 days is a time limit used in existing works to distinguish visitors from residents (Girardin, Calabrese, Fiore, Ratti, & Blat, 2008; Vikas Kumar, Bakhshi, Kennedy, & Shamma, 2017; Sun, Fan, Helbich, & Zipf, 2013). In these works, the authors did not distinguish visitors from tourists, and they used one of these two terms to indicate all people who had short-term visits to one city. It is noteworthy that visitors may go to one city more than once based on their purposes, but they will not stay in the city for more than a month in each visit. Both one-time visitors and more-than-once visitors are treated as the visitor localness type in this thesis.

Among these short-term mobilities, tourism is a special one and some authors reported that tourism mobility has a significant influence on the processes of urban change and many other mobility forms (Novy, 2018; Williams & Hall, 2000). Tourists also have distinct characteristics that distinguish them from visitors with other purposes. Tourists usually pay attention to the tourist attractions in the city and stay in the city for a shorter time. The number of tourists is large but the tourism population changes constantly. Lau et al. (2006) reported that tourists who spent several nights in Hong Kong usually stayed in the city for less than seven days. Combining Lau’s finding with the mobility forms in the work of Bell and Ward (2000), I propose an upper limit of seven days for the visit time of tourists. Based on the above characteristics of tourists, treating tourists as a separate localness type is meaningful, especially for tourist cities.

The visit time of the other mobilities ranges from weeks to months and there are two possibilities in this situation: mobility for work or study, and seasonal migration. Some people may migrate to another city for work or study, and they will live in the city as temporary residents for several months until they finish their work or study. Some people may move to the city and start a life there, but they should be treated as short-term residents until they have stayed in the city long enough. Compared to visitors, these people will have some basic life experiences about city life, and they will accumulate these life experiences over time. Compared to long-term residents, these temporary or short-term residents usually have a smaller activity scope and a weaker local social network because they have not had enough time to explore more places and get to know more local people in the city.

The other situation is seasonal migration. The reasons for seasonal migration might be vacation, seasonal work or a second home (Bell & Ward, 2000; King, 2012; Williams & Hall, 2000). People will go to another city seasonally and stay there for weeks or months and then they will return instead of staying in the city for a long time and becoming long-term residents. Seasonal residents will have moderate local social networks, stay in the city for months but less than one year and have an intermediate activity scope. Longitudinal data sets, including the movement or activity data for a group of persons for years, could be used to identify seasonal residents. Because of the special patterns of seasonal residents and the widespread existence of seasonal mobilities, the seasonal resident is one localness type that should not be overlooked.

Mobilities only focus on how people move between cities instead of how people live in one city and perceive the city, so the localness type conceptualization only based on the mobility will not be complete. For example, the localness of indigenous residents is not related to any mobility form. In addition, there should be a special localness type for people without life experiences in one city. Some people may never have been to a city but may know something about the city, and other people may even have never heard about it. All these people do not have any life experiences with the city, therefore, there should also be a localness type named “no experience”.

To ensure the comprehensiveness of the localness type conceptualization, the types should be distinguished based on the properties of localness. Intuitively, spatial and temporal information about people’s activities in one city can be used for localness assessment because the information is measurable, closely related to activities and can be divided meaningfully. Once a person visits a place in the city, there must be some corresponding spatial and temporal information about where and when he/she goes there, and the summary of all the spatial and temporal information of activities can reflect the patterns of people’s life. People with different localness types will have different life patterns in the city.

Social properties are also useful in localness determination as long as they can be represented as comparable data. Local social network of one person is one of the results of life experiences accumulation, which can reflect the connection between people and local environments from a social perspective. Interest as the other aspect of social properties can indicate the source of different local knowledge types and can be the basis of a detailed typology for local knowledge discovery.

Table 3.1 summarizes the localness types that can be distinguished. The description along localness properties elaborates localness types and enables the individual distinction by localness.


Table 3.1: Localness type description

Localness type description based on properties:

| Localness Type | Temporal | Spatial | Social |
|---|---|---|---|
| Long-term resident | >12 months | Wide activity scope, home & work place | Strong local social network |
| Temporary or short-term resident | 1-12 months | Moderate activity scope, home & work place | Moderate local social network |
| Seasonal resident | 1-3 months each year | Moderate activity scope, home & work place | Moderate local social network |
| Non-local commuter | Periodic, working hours and work days | Variable activity scope, work place | Variable local social network |
| Visitor | <30 days for one visit, may visit more than once | Variable activity scope, activity centre depends on visit purpose | Variable local social network |
| Tourist | <7 days, weekend & holiday | Small activity scope, near tourist attractions | Weak local social network |
| No experience | Have never been there | - | - |
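
The temporal column of Table 3.1 can be read as a set of thresholds. The sketch below illustrates this reading only: it is not the assessment approach of Chapter 4, it ignores the spatial and social columns, and the conversion of one month to 30 days and one year to 365 days, as well as the two boolean flags, are assumptions.

```python
def temporal_candidate(stay_days, recurs_yearly=False, commuter_pattern=False):
    """Suggest a candidate localness type from temporal information alone.

    stay_days: length in days of a typical continuous stay in the city.
    recurs_yearly: True if a similar stay is repeated every year (assumed flag).
    commuter_pattern: True if presence is limited to working hours on work days.
    """
    if stay_days <= 0:
        return "no experience"
    if commuter_pattern:
        return "non-local commuter"
    if stay_days < 7:
        # Stay length cannot separate tourists from short visits with other purposes.
        return "tourist or visitor"
    if stay_days < 30:
        return "visitor"
    if stay_days <= 365:
        if recurs_yearly and stay_days <= 90:
            return "seasonal resident"
        return "temporary or short-term resident"
    return "long-term resident"


for days in (0, 3, 20, 200, 400):
    print(days, temporal_candidate(days))
```

A stay shorter than seven days is returned as "tourist or visitor" because the temporal property alone does not separate the two types; the spatial property (activities near tourist attractions) would be needed for that.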

In Chapter 3, I proposed a generic definition of localness and its properties, and defined a localness type for each potential form of localness. The localness definition, properties and types can be used in any study related to local knowledge discovery and the identification of people by their life experiences in cities. As mentioned in section 1.1, social media data is one possible data source of local knowledge and the localness assessment of social media users is helpful for local knowledge discovery from social media data. Therefore, in Chapter 4, I will design an approach to assess the localness of social media users.


4. LOCALNESS ASSESSMENT APPROACH

In this chapter, the third research objective will be addressed: designing an approach to assess the localness of social media users. Figure 4.1 demonstrates the overall process of the approach. The approach is divided into three steps: first, social media data will be collected as input data of this approach and the data will be cleaned by filters; second, user features of four categories will be extracted from the data; third, localness types will be assigned to each user based on the result of a comparison between user features and the localness type conditions designed in the approach. The output of following this approach is the allocation of localness types to social media users.

Figure 4.1: Overall process of localness assessment approach

4.1. Data Collection and Filter

To collect social media data, the use of Application Programming Interfaces (APIs) provided by social media platforms, such as the Twitter API or Flickr API, is to be preferred. When collecting social media data through APIs, time and space filters can be used and the data will be in a consistent format like JSON. There are also libraries for accessing the APIs in common programming languages, like tweepy (http://www.tweepy.org/) in Python. For the purpose of user localness detection, the time period of a dataset should be longer than the longest duration with which target localness types have been defined, or long enough to find out about the visit patterns of users. Otherwise, the identification will not be credible. In other words, if the target localness type is e.g. long-term resident, the dataset should contain data for at least a one-year period, and if the target is a seasonal resident the dataset should cover at least two years to find the seasonal visit patterns as mentioned in section 3.4. The spatial filter should be used in data collection to make sure that all the social media contents in the dataset are located in the target area (in this study: a city), and using a spatial filter also means that the dataset will only contain geo-tagged postings and exclude other social media data without spatial information.
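
The spatial filter and the check on the covered time period can be sketched as follows. The sketch works on posts that have already been collected and does not call any platform API; the record fields ("lon", "lat", "created_at"), the bounding box and the sample posts are hypothetical.

```python
from datetime import datetime, timedelta


def in_bbox(lon, lat, bbox):
    """bbox is (west, south, east, north) in decimal degrees."""
    west, south, east, north = bbox
    return west <= lon <= east and south <= lat <= north


def spatial_filter(posts, bbox):
    """Keep only geo-tagged posts located inside the target area."""
    return [
        p for p in posts
        if p.get("lon") is not None and p.get("lat") is not None
        and in_bbox(p["lon"], p["lat"], bbox)
    ]


def covers_period(posts, min_days):
    """True if the posts span at least min_days between first and last post."""
    if not posts:
        return False
    times = [p["created_at"] for p in posts]
    return max(times) - min(times) >= timedelta(days=min_days)


# Hypothetical bounding box and posts.
city_bbox = (6.75, 52.15, 7.00, 52.30)
posts = [
    {"lon": 6.89, "lat": 52.22, "created_at": datetime(2017, 1, 1)},
    {"lon": 6.90, "lat": 52.21, "created_at": datetime(2018, 2, 1)},
    {"lon": 4.90, "lat": 52.37, "created_at": datetime(2018, 3, 1)},  # outside the box
    {"lon": None, "lat": None, "created_at": datetime(2018, 4, 1)},   # not geo-tagged
]

in_city = spatial_filter(posts, city_bbox)
print(len(in_city), covers_period(in_city, 365), covers_period(in_city, 730))
```

With these sample posts the filtered dataset would be long enough to look for long-term residents (one year) but not for seasonal residents (two years).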

Three additional data sources are used in the localness assessment if all features are to be extracted. The first is the locations of tourist attractions in the target city, which are used to measure the proportion of activities near tourist attractions. The second is the follower information of social media users, which is the data source for measuring the connection between target users and local people. Here, the APIs provided by social media platforms are used again: with the names or IDs of target users as input, the names or IDs of each user’s followers are the output. The third is the list of local organization accounts on the social media platform, which is used to measure the connection between users and local society.
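As an illustration of how the first additional data source might be used, the following Python sketch computes the proportion of a user's post points that lie near a tourist attraction. The 200 m radius, the function names and the use of the haversine great-circle distance are assumptions of this sketch; the threshold actually used is not stated in this section.

```python
import math

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6371000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def proportion_near_attractions(post_points, attractions, radius_m=200.0):
    """Share of (lat, lon) post points within radius_m of any attraction.

    radius_m is a hypothetical threshold chosen for illustration only.
    """
    if not post_points:
        return 0.0
    near = sum(
        1
        for lat, lon in post_points
        if any(haversine_m(lat, lon, a_lat, a_lon) <= radius_m
               for a_lat, a_lon in attractions)
    )
    return near / len(post_points)
```

For larger datasets a spatial index would be preferable to this pairwise comparison, but the pairwise form keeps the idea visible.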

After data collection, not all the data in the set will be useful for the study at hand, so the dataset has to be cleaned. First, due to errors in social media APIs, some coordinates may not be located in the study area, so the spatial filter should be applied again in this data cleaning step to remove those outliers. Second, to obtain enough information for each user, there should be a lower limit on the number of geo-tagged tweets. In this step, users who have fewer than three distinct post points are discarded, given the requirements of the spatial feature extraction in the next section. Third, in order to filter out social media accounts that are not controlled by real persons or that post excessive content irrelevant to the user’s life experiences, the frequency of tweeting behaviour is calculated, frequency outliers are detected with a box plot, and the accounts with outlying frequencies are discarded. The last step in data cleaning concerns the additional data source of the users’ followers. Some users protect their accounts against this kind of data collection. The social network information is not accessible for these users and, therefore, they should also be removed from the dataset.
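The second and third cleaning steps can be sketched in Python as follows. This is only an illustration under stated assumptions: posts are dictionaries with `lat` and `lon` keys, the posting frequency per user is taken as already computed (its exact definition is not given here), and the box-plot rule is taken to be the common Tukey fence of Q3 + 1.5 × IQR, checked on the upper side only.

```python
from statistics import quantiles

def has_enough_points(posts, min_points=3):
    """True if the user has at least min_points distinct (lat, lon) post points."""
    return len({(p["lat"], p["lon"]) for p in posts}) >= min_points

def frequency_outliers(freq_by_user, k=1.5):
    """Return the users whose posting frequency exceeds the upper box-plot fence.

    The fence Q3 + k * IQR and the upper-side-only check are assumptions
    of this sketch, not details confirmed by the text.
    """
    values = sorted(freq_by_user.values())
    if len(values) < 2:
        return set()  # quartiles cannot be computed from fewer than two values
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    upper_fence = q3 + k * (q3 - q1)
    return {user for user, f in freq_by_user.items() if f > upper_fence}
```

Users for whom `has_enough_points` is false, or who appear in the set returned by `frequency_outliers`, would then be dropped from the dataset.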

4.2. User Feature Extraction

Based on the properties of localness mentioned in section 3.3, three types of information can help to identify the localness of social media users: temporal, spatial and social. These three types of information will be used to derive a set of features for the subsequent user localness assessment. Besides information related to these three properties, some other information about social media users is also useful in localness assessment; in this approach, it is grouped into a fourth category, other features.

4.2.1. Temporal feature

Temporal features describe social media users from a temporal perspective. There are five temporal features: duration, maximum interval, average visit time, night posting proportion and weekend posting proportion. With these features, it is possible to answer how long a user stayed in the city and when the user was in the city.
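To make some of these temporal features concrete, the following Python sketch derives three of them from a user's post timestamps. It is a rough illustration only: maximum interval is assumed here to be the longest gap between consecutive posts, and weekend is assumed to mean Saturday and Sunday. The definitions used in this approach may differ, and the function name is hypothetical.

```python
from datetime import datetime, timedelta

def temporal_features(timestamps):
    """Compute duration, maximum interval and weekend posting proportion.

    timestamps: non-empty list of datetime objects for one user.
    """
    if not timestamps:
        raise ValueError("at least one timestamp is required")
    ts = sorted(timestamps)
    # Duration: time between the first and last post in the dataset.
    duration = ts[-1] - ts[0]
    # Maximum interval (assumed): longest gap between consecutive posts.
    gaps = [later - earlier for earlier, later in zip(ts, ts[1:])]
    max_interval = max(gaps) if gaps else timedelta(0)
    # Weekend proportion (assumed): share of posts made on Saturday or Sunday.
    weekend_share = sum(1 for t in ts if t.weekday() >= 5) / len(ts)
    return {
        "duration": duration,
        "max_interval": max_interval,
        "weekend_proportion": weekend_share,
    }
```

A night posting proportion could be computed in the same way as the weekend share once the hours that count as night have been fixed.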

- Duration

Duration is the time difference between the first and last post of one user in the dataset. Longer durations indicate that users stay in the study area for longer times. The duration value does not always mean that
