
Your Data, Your Choice? Exploring the relations between citizens’ willingness to share data, awareness, trust, and demographics in a virtual smart city


Academic year: 2021


Your Data, Your Choice?

Exploring the relations between citizens’ willingness to share data, awareness, trust, and demographics in a virtual smart city

Warner Hoekstra, 1628984

January 10th, 2020

Master thesis Public Administration, Economics & Governance track

Leiden University

Supervisor: Prof. dr. M.G. Knoef

Second reader: dr. E. Suari Andreu

Abstract

Big Data is becoming a well-known phenomenon, but we know relatively little about the attitudes of citizens towards the increasing application of Big Data technologies in the public environment. This research aims to explore patterns and associations between personal characteristics, the type and context of data collection initiatives, and the willingness of individuals to share their personal data. Using unique data gathered through a gamified survey, this study finds that, in general, data awareness is negatively related to willingness to share data, while trust in government is positively associated with it. When focusing on specific data sharing situations, the results show that the relation between the willingness to share data and both data awareness and various demographic characteristics depends on the context of the situation. For instance, data awareness is related to the willingness to share data only in situations in which there is no clear personal benefit as compensation, and not in situations with a personal (financial or convenience) benefit. These results highlight the context dependency of data collection and the need for governments to take heed of the specific risks – e.g. in terms of privacy and intrusion – and attitudes of citizens on a case-by-case basis. Otherwise, governments risk failing to address the concerns of citizens, and potentially harming public support for the development of Big Data in public policy.


Acknowledgements

I would first like to thank my thesis advisor Prof. dr. Marike Knoef of the Department of Economics at Leiden University for giving me invaluable feedback, all the while encouraging me to pursue my academic interests. I would also like to express my sincerest gratitude towards the Centre for BOLD Cities, and Prof. dr. Liesbet van Zoonen and Luuk Schokker specifically, for granting me access to the data used in this study and sharing their insights and information. Finally, my gratitude goes out to my friends, family, and partner for all their support, brainstorming sessions, and encouragement.


Abstract
Acknowledgements
1 Introduction
2 Background
2.1 Big Data and the role for citizens
2.2 Data awareness
2.2.1 Socio-economic characteristics
2.3 Willingness to share data
2.3.1 Data awareness
2.3.2 Trust in government
2.3.3 The purpose of data collection
3 Method
3.1 Gamified survey
3.2 Measurements
3.2.1 Data awareness
3.2.2 Willingness to share data and purpose of data collection
3.2.3 Trust in government
3.3 Analytical model
3.3.1 Robustness checks
4 Data
4.1 Data collection and sample demographics
4.2 Data awareness
4.2.1 Awareness of data points
4.3 Willingness to share data and trust in government
4.4 Purpose of data collection
5 Results
5.1 Data awareness
5.2 Willingness to share data
5.2.1 Socio-economic characteristics
5.2.2 Data awareness and trust in government
5.3 Purpose of data collection
5.3.1 Financial incentives
5.3.2 Convenience incentives
5.3.3 Security incentives
6 Conclusion
7 Bibliography
Appendix A
Appendix B


1 Introduction

Big Data has become a well-known concept in the past years, and Big Data analyses are now performed in virtually every sector. For instance, Big Data analyses are used in the retail sector to determine whether new stores would be profitable at a certain location (Cardinaels, 2019), or by the Dutch government to gain new insights that can help to prevent crime (Zicht op Ondermijning, n.d.). Accordingly, the amount of data that is being gathered, stored, and processed grows larger year by year. Local governments are aware of this trend, and have started to implement their own data projects, with hopes of eventually transforming into ‘smart cities’ that monitor many aspects of public life, such as infrastructure, welfare, and tourism. The application of Big Data technologies carries large implications for citizens and their privacy, since the data that these applications process carry valuable information that might be attributable to individual citizens in specific cases. The creation of the European regulation on personal data privacy, the General Data Protection Regulation, is an acknowledgement of the importance of privacy for citizens.

However, it is unlikely that every citizen is equally willing to share their personal data, nor would every citizen be convinced by the same arguments. While one person may be willing to share information to prevent crime or terrorism, others would not be willing to sacrifice part of their privacy. Knowing the preferences of citizens, and the distribution of those preferences, with respect to data collection and analysis is crucial for the success of these data projects, since gathering more information than people are willing to share could have detrimental consequences for the overall acceptance of data initiatives. Therefore, knowing when and why different people are willing to share their information could provide invaluable information for the success of many smart city initiatives.

Despite the implications that potential differences between (groups of) citizens in their acceptance of data collection could have for maintaining public support, there has been little research on citizen behaviour with regard to sharing their data in the public domain. For example, research on multi-service smartcards found that usefulness and security are important for intentions to continue using the service (Belanche-Gracia, Casaló-Ariño, & Pérez-Rueda, 2015). Other scholars have highlighted how concerns – and consequently behaviour – might be dependent on the type of data being collected, and the purpose for which it is analysed (Van Zoonen, 2016). However, no research comparing different groups of people across a range of data sharing options has been conducted as yet. This paper therefore attempts to fill this gap by exploring the relationship between the data sharing behaviour of citizens in eight specific situations and a wide range of factors, such as socio-economic characteristics, an individual’s awareness of data collection, and trust in the government. Analysis of these factors could provide new insights and further research lines on how different types of data projects generate support or resistance among groups of respondents, thereby revealing some of the risks of these projects.

In exploring these associations, this paper uses data gathered with “Jouw Buurt, Jouw Data” (https://www.jouwbuurtjouwdata.nl/), a gamified survey. In the game, the players walk through a virtual city and encounter various situations in which they are asked to share their data. Furthermore, using a number of questions and puzzles, the degree of knowledge and awareness of respondents is measured. This innovative method of data collection further contributes to extant literature by providing increased immersion to respondents, possibly leading to more genuine and ‘real-life’ answers.

The results of this study show that patterns of data sharing behaviour can differ considerably from case to case. In general, I find that when there is a clear personal benefit associated with sharing data, there is no relation between one’s awareness of data collection and the willingness to share data. However, when personal benefits are absent, lower levels of awareness are associated with a higher likelihood of sharing data. Trust in the government is always significantly and positively related to the willingness to share data, but the dimensions of trust that matter most differ per case. Other factors, such as age and education, are generally negatively related to the willingness to share data, although this relation does not occur in every specific situation. Finally, urbanised residence is only related to sharing personal information for added convenience in public transport, implying that the frequency of use also plays a role.

This paper is structured as follows. In the next chapter I will provide an overview of the relevant literature, as well as the conceptualisations of Big Data, the smart city, and other concepts of interest such as data awareness, purpose of data collection, and the willingness to share data. Chapter three will then give some background information on gamified surveys, which are an important aspect of this study. Additionally, the measurement of each concept is discussed, as well as the analytical models used in this study. Next, Chapter four discusses the data that was gathered, giving descriptive statistics on demographics and on all explanatory and outcome variables. Chapter five will give an analysis of the data, provide general answers to the hypotheses, and make a selection of the most interesting results. Finally, Chapter six concludes the study with a discussion of the limitations and results of this study, and implications for future research.

2 Background

This chapter provides an overview of the relevant literature, and develops hypotheses which will be explored in later sections. First, the concepts of Big Data and smart cities will be discussed, as well as criticisms of these concepts and new perceptions and responses. Then, for each concept that is operationalised and measured in this study, an explanation is given that leads up to a hypothesis.

2.1 Big Data and the role for citizens

With the advent of a new era, sometimes coined ‘the age of information’, ‘the information revolution’ (Bruncko, 2015), or ‘the age of analytics’, Big Data has become associated with a wide range of uses that can improve the efficiency of resource utilization, quality of life of citizens, and transparency and openness of government policy (Al Nuaimi, Al Neyadi, Mohamed, & Al-Jaroodi, 2015; Bertot & Choi, 2013). Big Data itself is an elusive concept, and can be defined in general as “vast datasets that cannot be analysed using conventional software and analytical tools” (Bertot & Choi, 2013, 2). Additionally, these datasets contain different types of data, such as texts, numbers, photos, and videos from a wide range of sources: social media, sensors, surveillance cameras, smartphones, and many more (Al Nuaimi et al., 2015; Bertot & Choi, 2013; Khan, Uddin, & Gupta, 2014). Big Data is often understood using the ‘V’s’ of Big Data, such as volume (extremely large quantities of data), variety (combining different types of data), and velocity (the immediate collection and interpretation of data) (Khan et al., 2014; Sivarajah, Kamal, Irani, & Weerakkody, 2017). Furthermore, Big Data is modular, and the composition of elements can differ based on the purpose. For instance, traffic management systems would rely more on high velocity, real-time interpretation of data than Big Data projects aimed at detecting fraud with unemployment benefits. The complexity of these datasets and the difficulty of analysing and extracting value from them explains why the application of Big Data is still in its infancy, despite its popularity among both scholars and policy-makers. The desired outcome is to reach the final Big Data ‘V’, value, but it can be very costly to retrieve value from data. In other words, the value-cost ratio has to be sufficient in order to warrant the collection, storage, and processing of the data (Khan et al., 2014).

Nevertheless, Big Data is increasingly being implemented by both companies and public organisations, sometimes in joint partnerships in so-called ‘smart city’ projects. These smart cities make use of Big Data to manage urban transportation, utility networks, and many other aspects of urban life. To illustrate, the development of Dholera, a planned smart city in India, includes the management of supply and waste removal chains through Big Data analytics for maximum efficiency (Greenfield, 2015). Another example of a smart city initiative is a mobile application that would train marginalised, low-literacy citizens of Philadelphia to gain the necessary skills for a job in the 21st century (Shelton, Zook, & Wiig, 2015). As these examples illustrate, the smart city uses the potential that Big Data offers to increase the efficiency of its service provision, in both infrastructural and social welfare services.

However, these types of Big Data initiatives have also been criticized for their top-down approach and their technocratic orientation (Cardullo & Kitchin, 2019; Greenfield, 2015; Shelton & Clark, 2016). For instance, Kitchin (2014) wrote that “technocratic forms of governance are highly narrow in scope and reductionist and functionalist in approach”, and that they focus only on very specific types of information and disregard the effects of culture, politics, and policies on life in the city (p. 9). Without the awareness of this limitation, smart city efforts that address inequality could turn out to be ineffective, as the true (cultural or political) factors of inequality are not addressed (see Shelton et al., 2015). Furthermore, Kitchin (2014) perceives an inherent tension between Big Data analytics and individual and societal rights due to the surveillance potential of Big Data. These issues must be addressed if the goal is to keep public support for the use of Big Data in smart cities.

In response to this criticism, many developers of smart city initiatives have reframed their Big Data projects as citizen-centred, citizen-engaged, or something similar. However, in practice smart city initiatives mostly use citizen engagement in an instrumental sense – providing feedback on an application or collecting data for analytical purposes – rather than in a political or normative sense (Cardullo & Kitchin, 2019). Greenfield (2019) also questions who benefits from the ‘improvements’ that Big Data offers, and which criteria are being used to define improvement, while stating that the public is almost never involved in these considerations. Especially when smart city programmes rely on private companies for data collection and processing – which occurs more often than not, due to the lack of technical knowledge within local government administrations – they are susceptible to serving corporate or state interests, rather than the interests of citizens. Some have expressed concern about what happens to the use of Big Data when progressive governments make way for authoritarian regimes. For instance, Anthony Townsend eloquently questioned this vulnerability: “In our rush to build smart cities on a foundation of technologies for sensing and control of the world around us, should we be at all surprised when they are turned around to control us?” (Townsend, 2013, p. 276). A lack of regulatory oversight to prevent abuses of data could then lead to increased resistance against Big Data analytics by citizens (Kitchin, 2014).

The question then becomes how the smart city can prevent these issues. First, we need to eliminate the myth of data as a politically neutral decision-making tool, and instead be aware that data are socially constructed (Shelton et al., 2015). As Lisa Gitelman argues in her book “Raw data is an oxymoron” (2013), data has to be generated before it exists, and is therefore already subject to bias by the data collector: it is already ‘cooked’ to a certain extent. Creators of Big Data projects make decisions on which type of data (not) to collect, the period of collection, and where to collect data from, and these decisions are rarely free from bias. This bias can only be seen from an analysis of the raw data, and not from the output of Big Data analysis. Therefore, open data activists see making raw data publicly available as a prerequisite for the public generation of knowledge. This democratisation of knowledge then allows citizens to participate in decision-making processes, since, as an open data activist phrased it, in order “to participate, people need information” (in Baack, 2013, p. 5). Without citizen involvement in this debate, it is likely that Big Data applications will serve to control, rather than empower, citizens (Cardullo & Kitchin, 2019; Sieber & Johnson, 2015).

2.2 Data awareness

A first step in generating knowledge on data collection is becoming aware of data collection in one’s own environment. In other words, before one can know what data is being collected, one needs to be aware that one’s data is being collected at all. Not much is known about awareness of data collection in the urban environment (hereafter: data awareness). However, research on store discount cards in 2002 showed that only a few consumers associated those discount cards with the practice of personal data collection for ‘database marketing’, while most of the respondents identified these cards with loyalty and competitiveness motives (Graeff & Harmon, 2002). More recently, in an investigation of digital health platforms, Deborah Lupton (2014) found that many online platforms that portrayed themselves as patient-centred forums in fact had increasingly commercial purposes. Since the disclaimer on the commercial use of personal data is often ‘hidden’ within the fine print of the terms and conditions, Lupton (2014) concludes that “it is likely that many of the people who engage in patient-experience and opinion platforms for personal or altruistic reasons are not fully aware of the extent to which their accounts have become valuable commodities” (p. 865). While these results are not directly translatable to the context of the smart city, they do raise the question of to what extent people in a smart city environment are actually aware of data collection around them.

Data collection in the smart city, as opposed to data collection on the internet or through government services, differs in that awareness and explicit consent are not necessarily present. For instance, while an application for subsidies would ask the consent – and therefore awareness – of the citizen to process personal information, being recorded by CCTV (surveillance cameras) in the shopping centre does not require awareness or explicit consent. Instead, a notification at the start of the area that is under surveillance suffices. This brings us to the question of to what extent people are aware of data collection in the smart city. With surveillance cameras, it is likely that awareness of these data collection points is higher, as they have been used for security reasons in the Netherlands since 1998 (Korthals, 1999). These more or less ‘conventional’ methods for data collection have received attention in public debate, and people have had time to get acquainted with their presence in the public space. For newer, more innovative initiatives, however, one would assume that awareness is lower, since people have had limited opportunity to inform themselves. This brings us to the first hypothesis:

H1. Data awareness is relatively high for conventional data collection points, and relatively low for innovative ways for data collection in the smart city.

2.2.1 Socio-economic characteristics

Next, potential factors that could influence data awareness are personal socio-economic characteristics, such as age, education, gender, and income. In an investigation on participation in projects aimed at promoting citizen engagement, Wijnhoven, Ehrenhard, and Kuhn (2015) did not find evidence of an influence of socio-economic characteristics on the willingness to participate. Research in the context of the smart city did not establish an association of socio-economic characteristics with participation either (Belanche-Gracia et al., 2015). Nonetheless, individual characteristics such as education and age might be associated with data awareness. In the aforementioned research on store discount cards, older respondents were less likely to know the purpose of these cards (Graeff & Harmon, 2002). Additionally, some experts on smart city technology have argued that there is a risk that new technologies only serve the already privileged groups, whereas the most vulnerable – the elderly, the disabled, and the poor – risk being excluded (Rowling, 2019). Smart cities, like any other city, are geographically uneven in many ways, and not every area in a smart city will therefore be equally smart (Shelton et al., 2015). The hypotheses centre around some of the socio-economic characteristics that are most often associated with vulnerable groups:

H2a. Having a lower income is related to a lower data awareness.

H2b. Age is negatively related to data awareness.

2.3 Willingness to share data

In this study, participation in Big Data in the context of the smart city is defined as citizens choosing to share their data. To illustrate, in the case of surveillance cameras, if an individual chooses to walk through an area recorded by CCTV, the person implicitly chooses to share their personal information, and therefore chooses to participate. Often, data collection initiatives can be complemented by service provision, such as a public WiFi network in the city centre. In these cases, use of the service is then a choice to share personal data with the service provider. The question here is which factors are related to the willingness of an individual to share personal information, and whether these factors are only relevant in certain situations. For instance, income could be a relevant factor when data sharing is coupled with the provision of a financial service, but less so without financial incentives. The following paragraphs describe some of the concepts that could be associated with the willingness to share personal data, specifically data awareness, trust in government, and the purpose of data collection.

2.3.1 Data awareness

Let us briefly return to data awareness. As yet, no research has investigated the relation between awareness and willingness to share data in data initiatives in the smart city. Is this willingness regarding Big Data projects dependent on awareness of data collection, or would awareness in fact lead to a lower willingness to share data? As mentioned previously, information is seen as a condition for participation, and in order for citizens to have access to information, they must first be aware of its existence. This leads to the following hypothesis:

H3. Awareness of data collection is positively associated with willingness to share data in the smart city.

2.3.2 Trust in government

In choosing to share or withhold their personal information, people often make calculations based on the perceived benefits and risks. In the context of personal information, the risks can manifest in the potential that personal information is gathered without authorization, that unintended users – e.g. hackers – gain access to sensitive information, or that sensitive information is not transmitted securely (Belanche-Gracia et al., 2015; Li, 2012). The perception of these risks causes privacy concerns and could harm the individual’s willingness to share data. Furthermore, other approaches emphasize the perception of ‘procedural justice’, that is: the belief that the procedures in place are adequate for the protection of individual privacy (Li, 2012). However, in order for citizens to make these types of calculations, they need extensive knowledge of both the types of risks they are exposed to in specific situations, as well as the technologies and laws that aim to minimize the risks of abuse of personal information. According to Alan Westin (2001), trust is an important mediating factor for the majority of citizens. When distrust is high, people become less accepting towards collection of information, and when distrust is low, people are more accepting. Additionally, Westin (2001) identified three types of privacy attitudes: 1) privacy fundamentalists, who are very concerned about their privacy and sceptical towards data collection and data collectors; 2) the privacy unconcerned, who neither know nor care about privacy issues; and finally 3) privacy pragmatists, who weigh the risks and the benefits, and assess their trust in the data collector to decide whether or not to disclose their personal information.

Trust could therefore be considered an addition to knowledge, as it includes an individual’s expectation about the behaviour of the data collector, based on past experiences of the individual or those of others. The influence of trust on willingness to share data in the smart city environment has not yet been examined. Regarding online transactions, however, Dinev, Hart, and Mullen (2008) found that concerns about government intrusion were not directly related to a lower willingness to provide personal information. Instead, government intrusion concerns were associated with privacy concerns, which in turn negatively influence the willingness to disclose information. Besides the aforementioned research, no investigation of the direct relation between trust in or perception of government and willingness to disclose information has been conducted in the context of the smart city. This discussion leads to the following tentative hypothesis:

H4. Trust in the government is positively associated with willingness to share data in the smart city.

2.3.3 The purpose of data collection

Research has also shown that the purpose of the data collection affects the willingness to share personal information. For instance, Van Zoonen (2016) writes that collecting personal data for service purposes is likely to be less contested than collecting personal data for surveillance purposes. This is due to the service purpose being a more straightforward trade-off between the benefit of the service and the cost of sharing personal information, whereas data collection for surveillance purposes entails a wider social goal, with less straightforward benefits for the individual. These findings also shed some light on the phenomenon of the ‘privacy paradox’, which states that although privacy is a large concern of people, individuals are often willing to disclose private information for small rewards (Kokolakis, 2017). If the purpose of data collection is beneficial to an individual, the individual could be more willing to share their personal information, even if privacy concerns are present. Finally, when individuals perceive a Big Data initiative as useful, they are more likely to continue using it, and thereby continue to share their data (Belanche-Gracia et al., 2015). The hypothesis is as follows:

H5. Citizens are more willing to share personal data if there is a clear personal benefit, than if they share data for a wider, societal purpose.


3 Method

This chapter introduces the methodology of this study. The first section of this chapter describes the unique gamification element of the survey, the implications it carries, and how this contributes to extant literature. Next, each relevant concept is introduced in its operationalised form, and an explanation is given on how these concepts are measured within the game. Finally, the models used in answering the hypotheses are presented.

3.1 Gamified survey

The data used in this article was gathered through an online gamified survey called “Jouw Buurt, Jouw Data” (Your Neighbourhood, Your Data), which is freely accessible at www.jouwbuurtjouwdata.nl. This study is unique in that it is the first to analyse data sharing behaviour using data gathered through a game. The gamification element is intended to immerse the player in the virtual environment, so that the player becomes more likely to give a ‘natural’ response to a request to share data than when presented with a similar option in a conventional survey. Consequently, this innovative method could provide new insights in this field of study.

The concept of gamification has gained increased scholarly attention in recent years, being linked to increased user engagement and positive effects on use of the service (Hamari, Koivisto, & Sarsa, 2014). Gamification, according to Hamari, Koivisto, and Sarsa, refers to “enhancing services with (motivational) affordances to invoke gameful experiences and further behavioural outcomes” (2014, p. 3036). Alternatively, Looyestyn et al. define gamification as software that contains game elements (2017). More specifically, such elements can be goals, challenges, levels, points, progress, feedback, rewards, badges, leaderboards, and stories or themes (Cugelman, 2013).

The main goal of gamification is to improve so-called ‘respondent engagement’ (Downes-Le Guin, Baker, Mechling, & Ruyle, 2012). This has mostly been investigated through two types of outcome variables: psychological outcomes, operationalised as e.g. motivation, attitude, and enjoyment, and behavioural outcomes, operationalised as e.g. completion rates and data quality (Hamari et al., 2014). Regarding behavioural outcomes, Downes-Le Guin and colleagues (2012) found that completion rates for the gamified version of a survey were much lower than for other versions, namely text-only, decoratively visual and functionally visual surveys (58% as opposed to 94%). Many of these non-completers quit during the introduction of the game or during the loading process, but when these non-completers are excluded, the rate is still significantly lower (72%) than for other versions of the survey. However, they did not find evidence for a demographic bias caused by the low completion rate. Additionally, the authors could not find evidence of increased data quality, although respondent enjoyment did increase. Important to note here, however, is that the authors chose to have text-only questions in the gamified survey, so that the only difference is the presentation style. Furthermore, Keusch and Zhang (2017, p. 157) comment that the gamification narrative was not related to the survey topic, which could explain the lack of positive effect on data quality, and the low completion rate.

Other authors did find an increase in engagement due to gamification (Looyestyn et al., 2017). Reviewing literature on multiple gamification elements in combination showed that, especially in a short period, engagement effects were positive. However, the effect of gamification on user engagement seems to wear off over time as the novelty fades. Additionally, some researchers investigating the influence of gamification on massive open online courses (MOOCs) found that gamification can increase retention rates and average scores (Krause et al., 2015). Furthermore, in an experiment involving the effects of gamified surveys on youth from seven to fifteen years old, Mavletova found evidence for a lower occurrence of straight-lining, less burdensome evaluations of the survey, and more enjoyment and ease of answering, although nonresponse rates were higher for gamified surveys (2014). The author attributes this to the inclusion of flash-based questions, as mobile users had a higher nonresponse rate than PC users (mobile devices generally do not support flash). Finally, Turner, Van Zoonen, and Adamou (2013) showed high satisfaction rates, with 96 per cent of respondents indicating that they would like to take a gamified survey again, which suggests that it might be an effective way to prevent survey fatigue and increase completion rates.

In sum, gamification features can be a welcome addition to increase user engagement and to enhance the experience, but attention should be paid to the completion rate and whether or not there is self-selection present that causes demographic bias. Gamification is more likely to have positive outcomes on data quality if multiple gamification elements are combined, complete, and used to increase engagement in a short time, or in a one-time survey. Finally, gamification of surveys seems to increase enjoyment, satisfaction, and ease of answering for those who completed the survey, and could therefore be meaningful to prevent occasions of survey fatigue.

3.2 Measurements

3.2.1 Data awareness

For the operationalisation of data awareness, a small task within the game is used. At two moments while playing the game, respondents are asked to recognise data points in a picture of an area within the city: a park and a town square. Respondents have 90 seconds to recognise all 10 data points in each picture, varying from security cameras to smart street lights, and from traffic loop sensors to parking meters. Respondents are also able to select objects that do not collect data, such as a taxi. The more data collection points a person recognised, the higher the score, up to a maximum of 20 data points. Figure 1 shows one of these puzzles. The scores are subsequently standardised in order to allow comparison. Respondents with a score more than one standard deviation below the mean were defined as having low data awareness, while respondents with a score more than one standard deviation above the mean were attributed high data awareness. These classifications are used as the measure of data awareness. Finally, in order to verify the validity of this classification, all regressions are also estimated with 0.7 standard deviation as the threshold for low and high data awareness.
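The classification rule described above can be illustrated with a minimal sketch. The scores below are hypothetical and are not taken from the survey, the function name `classify_awareness` is introduced here for illustration only, and the use of the population standard deviation is an assumption, since the text does not state which variant was used.

```python
from statistics import mean, pstdev


def classify_awareness(scores, threshold=1.0):
    """Label each score 'low', 'average', or 'high' from its z-score."""
    mu = mean(scores)
    sigma = pstdev(scores)  # assumption: population SD; the text does not specify
    labels = []
    for score in scores:
        z = (score - mu) / sigma  # standardised score
        if z < -threshold:
            labels.append("low")
        elif z > threshold:
            labels.append("high")
        else:
            labels.append("average")
    return labels


# Hypothetical recognition scores (0 to 20 data points), not the survey data.
scores = [3, 6, 7, 9, 10, 10, 11, 13, 14, 17]
main_labels = classify_awareness(scores, threshold=1.0)   # main classification
check_labels = classify_awareness(scores, threshold=0.7)  # robustness threshold

print(main_labels.count("low"), main_labels.count("high"))
print(check_labels.count("low"), check_labels.count("high"))
```

With the narrower 0.7 threshold, respondents near the cut-off move from the average group into the low or high group, which is the intended effect of the validity check.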

3.2.2 Willingness to share data and purpose of data collection

At varying times during the game, players are asked whether or not they are willing to share their data for a specific reason. After learning about the data sharing activity, players either participate and share their data, or abstain. These activities are grouped by their underlying rationales, namely: data sharing with financial incentives, convenience incentives, security incentives, and social incentives. Table 1 displays the eight data sharing activities, as well as the incentive that is implied in the situation. These incentives accordingly refer to the purpose of data collection. Whereas data that are shared for financial and convenience incentives are identified as reasons of personal benefit, data that are shared for security and social reasons have no clear personal benefit, and mainly serve a wider social goal.

Note that one of the data sharing choices for security incentives, sharing your location data, did not explicitly indicate the presence of a choice for the respondents. Instead, respondents could either accept the request for data, or click on a link to know more, as is shown in the table. After clicking on this link, they could reject the request. This was done to make the option more similar to a real-world scenario, where citizens are regularly requested to share their data through these techniques. Furthermore, letting the security guard scan the respondent’s identity card was a condition for accessing the pier in the smart city, which had a number of attractions, stalls, and shops on it.


Figure 1. Data points recognition mini-game

Three possible data points have been selected on this picture, indicated by a white circle (drone, public transport gates, parking surveillance car). An illustration of the other data recognition mini-game can be found in Appendix A.

Table 1. Data sharing choices

| Data sharing activity | Description of data-sharing choice | Incentive |
|---|---|---|
| Neighbourhood discount card | With this neighbourhood card you can get a discount on activities in this area. Fill in your email address and sign here, and receive your card instantly! | Financial |
| Smart bin | You drank something, and discard your empty container at a smart bin. Scan your bank card, and the packaging deposit will automatically be refunded. | Financial |
| Public transport card | You are taking the metro to [location], will you check in with your personalised public transport card, or do you buy a separate ticket? | Convenience |
| Surveillance route | You are heading towards [location], will you take the short route with surveillance cameras, or do you take the long route to avoid cameras? | Convenience |
| Smartphone location data | The city centre is expecting many visitors today. Your phone’s location data are being used to guide visitors safely through the centre. If you want to know more, go to settings [link]. | Security |
| Identity card scan | Security guard: “For security reasons I need to scan your identity card. Otherwise you cannot enter the pier.” | Security |
| Souvenir picture | You passed a number of surveillance cameras today. Can the municipality save images with your facial features? As a thank you, you will receive a nice video of these images to share with family and friends! | Social |
| Rating | Do you wish to rate your visit today and share it on social media? | Social |


Table 2. Trust in government questions on a scale from 0 to 10

| Westin question | Privacy concern |
|---|---|
| I believe that people have no control over the personal data that the government collects of them. | Intrusion |
| I believe that the government deals with citizens’ personal data in a neat and trustworthy manner. | Manipulation |
| I believe that the laws and rules in our country ensure the protection of the privacy of citizens. | Discrimination |

3.2.3 Trust in government

To measure the degree of the respondent’s trust in the government, the game asks respondents to answer three questions regarding their attitude towards the government, shown in Table 2. Westin (2001) notes three components of privacy concern from his surveys: concern for intrusion, manipulation, and discrimination. The questions were answered on a slider scale from 0.0 to 10.0. The first question, relating to intrusion concerns, was reverse-coded before analysis, such that high scores reflect high degrees of trust. The average of the answers to these questions was then used to create a general measure of the individual’s trust in the government.
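As a small illustration of the reverse coding and the composite measure, consider the sketch below. The function name and the example ratings are hypothetical, and reversing a rating as `10 - rating` is an assumption about how the reverse coding was carried out; the text only states that the intrusion item was reversed.

```python
def trust_scores(intrusion_raw, manipulation, discrimination, scale_max=10.0):
    """Return per-dimension trust and the unweighted mean on a 0-10 scale."""
    # The intrusion item is worded negatively ("people have no control"),
    # so it is reversed (assumed here as scale_max - rating).
    intrusion = scale_max - intrusion_raw
    overall = (intrusion + manipulation + discrimination) / 3
    return {
        "intrusion": intrusion,
        "manipulation": manipulation,
        "discrimination": discrimination,
        "overall": overall,
    }


# Hypothetical respondent: strongly agrees that people have no control (8.0).
example = trust_scores(intrusion_raw=8.0, manipulation=6.0, discrimination=4.0)
print(example["intrusion"], example["overall"])
```

After reversal, a high rating on every dimension consistently indicates high trust, so the three items can be averaged into one composite.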

3.3 Analytical model

First, two simple OLS regressions are estimated to explore the association between data awareness and socio-economic characteristics, and to test hypotheses 2a and 2b. These models are then expanded to control for trust, to see whether an individual’s degree of trust is related to their awareness, which leads to the following models.

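$$\mathit{HAWARE}_i = \beta_0 + \beta_1 \mathit{TRUST\_INT}_i + \beta_2 \mathit{TRUST\_MAN}_i + \beta_3 \mathit{TRUST\_DIS}_i + \beta_4 X_i + \varepsilon_i$$

$$\mathit{LAWARE}_i = \beta_5 + \beta_6 \mathit{TRUST\_INT}_i + \beta_7 \mathit{TRUST\_MAN}_i + \beta_8 \mathit{TRUST\_DIS}_i + \beta_9 X_i + \mu_i$$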

Here, the outcome variables HAWARE and LAWARE indicate whether or not an individual has a high data awareness score and a low data awareness score, respectively. Both outcome variables are binary, which makes these linear probability models. TRUST_INT, TRUST_MAN, and TRUST_DIS refer to the trust of individual i regarding intrusion, manipulation, and discrimination by the government, and X is a vector of socio-economic characteristics.

Then, a similarly basic regression of the willingness to share data on socio-economic variables is estimated. Next, I expand this model, first to include dummies for high and low awareness, then with a composite of all measurements of trust, and finally with each aspect of trust measured separately. In its final form, the model is specified as follows:

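$$\mathit{WILLING}_i = \gamma_0 + \gamma_1 \mathit{HAWARE}_i + \gamma_2 \mathit{LAWARE}_i + \gamma_3 \mathit{TRUST\_INT}_i + \gamma_4 \mathit{TRUST\_MAN}_i + \gamma_5 \mathit{TRUST\_DIS}_i + \gamma_6 X_i + \mu_i$$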

Here, the outcome variable WILLING is not a binary variable, but an interval variable measuring the number of times the respondent shared their data, with a maximum of eight times.

Finally, to check for differences in patterns between each choice to share data, the following model is estimated simultaneously using eight equations, using Conditional Mixed-Process (CMP) regression (Roodman, 2011):

$$\mathit{WILLING}_{ip} = \theta_0 + \theta_1 \mathit{HAWARE}_{ip} + \theta_2 \mathit{LAWARE}_{ip} + \theta_3 \mathit{TRUST\_INT}_{ip} + \theta_4 \mathit{TRUST\_MAN}_{ip} + \theta_5 \mathit{TRUST\_DIS}_{ip} + \theta_6 X_{ip} + v_{ip}$$

Here, WILLING refers to the willingness of individual i to share data for purpose p, which are eight different choices grouped in four categories of two choices each: financial, convenience, security, or social. HAWARE and LAWARE refer to the dummies for high and low awareness, respectively, while TRUST_INT, TRUST_MAN, and TRUST_DIS refer to trust regarding intrusion, manipulation, and discrimination by the government.
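To make the two outcome definitions concrete, the sketch below derives both the overall count of shared data (the WILLING count, 0 to 8) and the per-purpose counts (0 to 2) from the eight choices in Table 1. The identifiers and the example respondent are hypothetical and only serve as an illustration; they are not the variable names used in the actual dataset.

```python
# Purpose of each of the eight data sharing choices, following Table 1.
PURPOSE = {
    "neighbourhood_discount_card": "financial",
    "smart_bin": "financial",
    "public_transport_card": "convenience",
    "surveillance_route": "convenience",
    "smartphone_location_data": "security",
    "identity_card_scan": "security",
    "souvenir_picture": "social",
    "rating": "social",
}


def summarise_sharing(choices):
    """choices maps each activity to True (shared) or False (abstained)."""
    per_purpose = {p: 0 for p in ("financial", "convenience", "security", "social")}
    for activity, shared in choices.items():
        if shared:
            per_purpose[PURPOSE[activity]] += 1
    willing = sum(per_purpose.values())  # overall count, between 0 and 8
    return willing, per_purpose


# Hypothetical respondent who shares in four of the eight situations.
choices = {name: False for name in PURPOSE}
for name in ("smart_bin", "surveillance_route",
             "public_transport_card", "smartphone_location_data"):
    choices[name] = True

willing, per_purpose = summarise_sharing(choices)
print(willing, per_purpose)
```

The overall count corresponds to the outcome of the single-equation model, while the eight underlying binary choices are the outcomes of the simultaneously estimated equations.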

3.3.1 Robustness checks

Using OLS with binary outcome variables has two main advantages: the relative ease of interpreting the coefficients, and a lower computational demand in the case of CMP. However, linear probability model (LPM) estimates are vulnerable to heteroskedasticity, and could additionally lead to nonsensical expected probabilities (p > 1 or p < 0) in extreme cases. To control for heteroskedasticity, robust standard errors are estimated. To check the robustness of the eight simultaneously estimated equations, they are also estimated with probit models, as this model ensures that the predicted probabilities lie between 0 and 1. Finally, to test the validity of the classification of high and low data awareness, regressions are also estimated with 0.7 standard deviation from the mean as the threshold for low and high data awareness. This classification enlarges both the high and low data awareness groups, while reducing the size of the average data awareness group.
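The concern about out-of-range probabilities can be illustrated with a toy example. The data below are invented for illustration and have nothing to do with the survey; the sketch fits a one-regressor linear probability model by ordinary least squares and contrasts it with the probit link, which maps any linear index into the unit interval. No probit coefficients are estimated here, since that would require maximum likelihood estimation.

```python
from statistics import NormalDist, mean

# Hypothetical data: x is a 0-10 rating, y equals 1 if the respondent shared.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]

# Linear probability model: plain OLS on the binary outcome.
x_bar, y_bar = mean(x), mean(y)
slope = (sum((a - x_bar) * (b - y_bar) for a, b in zip(x, y))
         / sum((a - x_bar) ** 2 for a in x))
intercept = y_bar - slope * x_bar


def lpm_probability(value):
    """Fitted 'probability' from the linear model; not bounded to [0, 1]."""
    return intercept + slope * value


def probit_probability(index):
    """Standard normal CDF: maps any linear index into (0, 1)."""
    return NormalDist().cdf(index)


print(lpm_probability(0), lpm_probability(10))          # below 0 and above 1
print(probit_probability(-3.0), probit_probability(3.0))  # both inside (0, 1)
```

In this toy sample the linear fit already leaves the unit interval at the ends of the observed range, which is the situation the probit re-estimation guards against.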


4 Data

This chapter first gives an overview of the demographic characteristics of the sample, after which averages on data awareness, willingness to share data, and trust are given per socio-economic characteristic. Finally, descriptive statistics on data sharing percentages per purpose of data collection are shown and discussed.

4.1 Data collection and sample demographics

The data used in this paper were collected from a representative panel by Motivaction between the 23rd and 26th of May 2019 in the Netherlands. In total, 2800 respondents started the game, 2118 of whom completed the game, a completion rate of around 75%. This is in line with the completion rate found in research on gamified surveys (Downes-Le Guin et al., 2012). Next, all respondents with incomplete information regarding their income were removed, leaving a sample of 1646 respondents. Since the answer on one measure of trust was missing for one individual, this respondent was excluded from regressions including trust.

A number of personal socio-economic characteristics were measured. For age, respondents are categorised in five age groups, from 16 until 80 years old. Income levels were coded as a dummy variable (lower than modal income = 1, modal income or higher = 0), in order to analyse the effects for citizens who are more restrained by their level of income. Highest level of attained education was operationalised as a dummy variable, being 1 for higher education (applied university or university degree), and 0 for lower levels. Additionally, place of residence was also included as a dummy, being 1 for citizens living within the agglomeration of the three largest cities in the Netherlands (Amsterdam, Rotterdam, and The Hague), and 0 for anywhere else. Finally, gender was coded as a dummy variable, with 1 for male respondents, and 0 for female respondents. Table 3 presents these demographics and compares them with population percentages from the Netherlands. While the surveyed sample is on average higher educated and older than the Dutch average population, the percentages of gender and urbanised residence are similar to the population average.
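The coding scheme above can be summarised in a short sketch. The function and field names are hypothetical, and treating income as a yearly amount compared with the modal income of €36,500 (the figure given in the note to Table 3) is an assumption about how the raw income answers were recorded.

```python
MODAL_INCOME = 36_500  # euros, 2019 value from the note to Table 3

AGE_GROUPS = [(16, 34), (35, 44), (45, 54), (55, 64), (65, 80)]


def age_group(age):
    """Return the label of the age group that contains the given age."""
    for low, high in AGE_GROUPS:
        if low <= age <= high:
            return f"{low}-{high}"
    raise ValueError("age outside the 16-80 range of the sample")


def encode(age, yearly_income, has_higher_education, in_big_three_area, is_male):
    """Turn raw respondent characteristics into the dummies used in the models."""
    return {
        "age_group": age_group(age),
        "low_income": int(yearly_income < MODAL_INCOME),  # 1 = below modal
        "high_education": int(has_higher_education),      # 1 = (applied) university
        "urban": int(in_big_three_area),                  # 1 = Amsterdam, Rotterdam, The Hague area
        "male": int(is_male),                             # 1 = male, 0 = female
    }


# Hypothetical respondent, not taken from the sample.
row = encode(age=52, yearly_income=30_000, has_higher_education=True,
             in_big_three_area=False, is_male=False)
print(row)
```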


4.2 Data awareness

Figure 2 shows the distribution of scores on data awareness, as well as the average number of times data was shared for each possible score. The blue vertical lines represent the borders between low, average, and high levels of data awareness, based on one standard deviation from the mean. All respondents who identified fewer than seven data points were classified as having low data awareness, while a score higher than twelve was defined as a high data awareness score. To verify the results, regressions are also run with the thresholds for low, average, and high data awareness on the orange lines, which places low and high data awareness scores below eight and above eleven points, respectively. Additionally, the blue dotted line shows the average number of times data was shared for each score on the data recognition puzzle. Disregarding the averages for extremely high and low scores on the data recognition puzzle, a general downward trend is visible, indicating that as people recognised more data points, they shared their data slightly less frequently.

Section A of Table 4 shows the percentages of respondents that ranked low, average, and high on the data recognition task within the gamified survey. These results give a first impression of the different levels of awareness between various groups of citizens. First, men had high scores on data awareness slightly more frequently than women, but the percentages for low data awareness were similar. For respondents under 35 years, high scores were much more frequent (36.04%) than for older categories (13.27% for 65-80 years). As we move towards older age groups, the relative frequency of high data awareness scores drops, while low scores become more frequent. Higher educated respondents more frequently scored relatively high on the data recognition puzzle, whereas lower educated respondents scored low and average more frequently. Next, the percentages for income levels give some first relevant insights for hypothesis H2a: the percentage of high-scoring respondents is much lower among respondents with a low income than among individuals with higher income levels. Last, the respondents living in more urbanised environments scored more extremely; both high-scoring and low-scoring percentages were higher than for other respondents.

Table 3. Sample demographics

| Key demographics | Sample frequency (%) | Dutch population, %* |
|---|---|---|
| Gender | | |
| Male | 808 (49.09) | 49.65 |
| Female | 838 (50.91) | 50.35 |
| Age | | |
| 16-34 years | 197 (11.97) | 31.51 |
| 35-44 years | 276 (16.77) | 14.96 |
| 45-54 years | 330 (20.05) | 18.28 |
| 55-64 years | 436 (26.49) | 16.94 |
| 65-80 years | 407 (24.74) | 18.30 |
| Education | | |
| Higher education | 677 (41.13) | 30.41 |
| Secondary/vocational school or lower | 969 (58.87) | 69.59 |
| Income | | |
| Lower income (<modal**) | 687 (41.74) | N/a |
| High income (≥modal**) | 959 (58.26) | N/a |
| Residence | | |
| Urbanised*** | 266 (16.16) | 15.91 |
| Other | 1380 (83.84) | 84.09 |

Note: * Data obtained from CBS Statline, 2019.

** Defined as 79% of the average income per fte by CBS; €36,500 in 2019.

*** The three largest cities of Amsterdam, Rotterdam, and The Hague and their neighbouring municipalities.

Figure 2. Histogram of data point recognition; average data shared

[Axes: data recognition score (0 to 20); Frequency; Average times data shared (max. 8)]

Note: Horizontal gridlines represent intervals on the left axis. The blue vertical lines on the histogram show the borders between low, average, and high data awareness scores based on a standard deviation of 1, while the orange vertical lines represent the borders when 0.7 standard deviation is set as the threshold.

4.2.1 Awareness of data points

Next, we look at the recognition of each data point separately. The first hypothesis proposes a relation between the conventionality of points of data collection and the recognition of these points as data collectors by respondents. Figure 3 shows the percentage of respondents that were able to recognise each point that collects data. In general, the figure indicates support for the hypothesis: among the most recognised data points are well-known and often-used data points, such as a public WiFi network, public transport gates, and parking meters. Objects that interact with citizens in an indirect manner, such as WiFi trackers, traffic loop sensors, and a bodycam on a police officer, were generally less often recognised. There are some interesting outliers, however. First, the smartphone was least recognised, despite being an enormous source of data in the smart city. This could be partly explained by its inconspicuousness within the image. Second, only around 17 percent of the respondents recognised the city hall as a data collector. Furthermore, while surveillance cameras have received much attention in public debate and media since they first appeared, less than half of the respondents identified them as data collection points. In sum, it seems that conventionality alone is not a sufficient explanation for awareness, but that other factors, such as the degree of interaction of citizens with the data collection point, are also relevant for the level of awareness. As an example, the low recognition score for the weather station data point could be explained by the fact that it is an object that gathers data without interaction with citizens.

[Figure 3. Vertical axis: Percentage, from 0 to 100.]


Table 4. Sample demographics and scores on awareness, data sharing, and trust

| Demographic characteristics | (A) Low data awareness, % | (A) Average data awareness, % | (A) High data awareness, % | (B) Mean number of times data shared (a) | (C) Mean trust: intrusion (b) | (C) Mean trust: manipulation (b) | (C) Mean trust: discrimination (b) |
|---|---|---|---|---|---|---|---|
| Total | 17.68 (N=291) | 61.72 (N=1016) | 20.60 (N=339) | 4.10 (1.91) | 3.95 (2.50) | 5.57 (2.54) | 5.39 (2.42) |
| Gender | | | | | | | |
| Male | 17.57 | 60.40 | 22.03 | 4.08 (2.03) | 3.95 (2.69) | 5.56 (2.72) | 5.48 (2.59) |
| Female | 17.78 | 63.01 | 19.21 | 4.12 (1.80) | 3.96 (2.31) | 5.59 (2.36) | 5.30 (2.25) |
| Age | | | | | | | |
| 16-34 years | 9.64 | 54.32 | 36.04 | 4.41 (1.64) | 3.97 (2.21) | 5.58 (2.22) | 5.16 (1.98) |
| 35-44 years | 15.22 | 57.97 | 26.81 | 4.16 (1.86) | 4.13 (2.45) | 5.50 (2.42) | 5.35 (2.24) |
| 45-54 years | 17.27 | 61.21 | 21.52 | 3.94 (1.97) | 3.97 (2.51) | 5.63 (2.64) | 5.30 (2.57) |
| 55-64 years | 18.12 | 66.05 | 15.83 | 4.06 (2.00) | 3.86 (2.59) | 5.42 (2.57) | 5.39 (2.46) |
| 65-80 years | 23.10 | 63.63 | 13.27 | 4.10 (1.94) | 3.91 (2.59) | 5.75 (2.65) | 5.60 (2.57) |
| Education | | | | | | | |
| Higher education | 14.77 | 59.23 | 26.00 | 3.91 (1.87) | 4.05 (2.50) | 5.60 (2.51) | 5.28 (2.35) |
| Secondary/vocational education or lower | 19.71 | 63.47 | 16.82 | 4.24 (1.94) | 3.89 (2.51) | 5.56 (2.57) | 5.47 (2.47) |
| Income | | | | | | | |
| Low income (<modal) | 21.40 | 63.03 | 15.57 | 4.07 (1.88) | 3.81 (2.47) | 5.40 (2.60) | 5.25 (2.49) |
| High income (≥modal) | 15.02 | 60.79 | 24.19 | 4.13 (1.94) | 4.05 (2.52) | 5.70 (2.50) | 5.49 (2.37) |
| Residence | | | | | | | |
| Urbanised | 21.05 | 56.77 | 22.18 | 4.38 (2.03) | 4.03 (2.51) | 5.58 (2.49) | 5.49 (2.42) |
| Other | 17.03 | 62.68 | 20.29 | 4.05 (1.89) | 3.94 (2.50) | 5.57 (2.55) | 5.37 (2.42) |

Numbers in parentheses are standard deviations, unless stated otherwise. (a) Means show how many times respondents chose to share their data out of 8 choices. (b) Ratings for trust vary from 0.0 to 10.0. Sections (A) and (B): N=1,646, section (C): N=1,645.


4.3 Willingness to share data and trust in government

Figures 4-6 show the histograms for each of the questions measuring trust in the government, where 0 is absolutely no trust and 10 is complete trust. The figures show that trust regarding intrusion is much lower than trust regarding manipulation and discrimination. Although the averages for the latter two dimensions are similar, the histograms show that ratings for trust regarding manipulation are more dispersed than those for discrimination, where respondents gave average ratings slightly more often instead. Returning to Table 4, higher educated citizens reported higher ratings for intrusion, but lower ratings for discrimination than lower educated citizens. Finally, low income respondents gave lower ratings for each measure of trust than higher income citizens. The correlation matrix of the ratings for all dimensions of trust can be found in Table 8 of Appendix B.

Sections B and C of Table 4 show the average number of times data was shared and the average ratings for trust, respectively. Regarding the average number of times data was shared, the most willing respondents were either young citizens or citizens living in an urbanised environment. Higher educated individuals shared data less frequently than lower educated citizens. Low income respondents also shared data slightly less often on average.

Figure 4. Trust in government: Histogram of intrusion ratings

Figure 5. Trust in government: Histogram of manipulation ratings

Figure 6. Trust in government: Histogram of discrimination ratings

[Figures 4-6: horizontal axis shows the rating, in bins from 0-0.9 to 9-10.]

4.4 Purpose of data collection

Next is the effect of the various purposes of data collection on the willingness of an individual to share their data. Figure 7 shows the percentages of respondents who chose to share never, once, or twice per category, and Figure 8 shows the percentage of respondents who were willing to share their data for each choice they were given in the gamified survey. About half of the respondents were willing to share personal information for financial incentives, both in the case of the neighbourhood discount card (50.7%) and the smart recycling bin (51.0%). More citizens were prepared to share personal information in return for extra convenience, such as a shorter route with CCTV (67.1%) or faster and easier access to public transport (60.8%). If data was requested for security reasons, such as a scan of an identity card, over half of the respondents were willing to do this (62.9%). The other security-related data sharing activity, smartphone location data, was shared considerably more often (85.3%). This can likely be explained by the fact that the option to reject was hidden behind a “learn more” button. Nevertheless, this gives some valuable insight into the real-life behaviour of citizens, since the option of sharing location data is often presented using these kinds of nudges. Finally, data was only rarely shared when coupled with social incentives: only 9.1% chose to share their information by rating their visit to the virtual city on social media, and somewhat more were willing to let their facial features be saved in return for surveillance footage of themselves (23.3%). These differences in sharing percentages indicate that there might be a relation between the purpose of a data collection initiative and the willingness to share data.

Figure 7. Relative frequency of amount of data shared per purpose

[Bars: financial, convenience, security, and social data shared, on a scale from 0% to 100%. Legend: no data shared, data shared once, data shared twice.]

Figure 8. Data sharing per choice

[Bars, on a scale from 0% to 100%: Smart bin (FIN), Neighbourhood discount card (FIN), Surveillance route (CON), Public transport card (CON), Identity card scan (SEC), Smartphone location data (SEC), Rating (SOC), Souvenir picture (SOC).]


5 Results

This chapter discusses the results of the data analysis. First, the relation between the various socio-economic measurements and measurements of trust, and data awareness is explored, to see whether certain groups within the population are associated with relatively high or low data awareness. Then, the results of the regression models with the willingness to share data as the outcome variable are presented, first with only socio-economic characteristics as the explanatory variables, and then expanded with variables for data awareness, and trust in the government. Finally, the willingness to share data is analysed for each specific data sharing choice, first using only socio-economic characteristics, and then including variables for data awareness and trust.

5.1 Data awareness

Results of the first regression explore the association between socio-economic characteristics and having high or low data awareness. Table 5 shows the coefficients of gender, age, education, income, and urbanisation on having high and low data awareness. First, all significant coefficients are either positive for low data awareness and negative for high data awareness, or vice versa. Next, coefficients are more often significant and stronger for high data awareness than for low awareness. For instance, while there is no gender distinction within respondents with low data awareness, men were 5 percentage points more likely to have a high data awareness score than women. Additionally, the youngest age group was more than 10 percentage points more likely to have high awareness than the reference group, and less likely to have low awareness scores. The eldest respondents show an opposite association, being 13 percentage points less likely to have high data awareness, and almost 7 percentage points more likely to have low awareness scores. High education was only significantly associated with high data awareness. Finally, Hypothesis 2a proposed a positive relation between having a low income and having low data awareness. Results from Table 5 show support for this hypothesis, indicating that low income respondents were 5 percentage points more likely to have low data awareness than higher income respondents, and were over 5 percentage points less likely to have high data awareness.

Additionally, the models including measures of trust indicate that having more trust regarding the manipulation of data by the government is associated with an increased likelihood of having low data awareness. By contrast, for trust regarding government intrusion, lower ratings of trust were associated with higher likelihoods of having high data awareness.


This implies that worried citizens know more, and those who do not worry, know less, although the association is limited in size and tied to specific dimensions of trust. These results remain consistent when the threshold for having a low or high data awareness score was expanded with one point.

5.2 Willingness to share data

Table 6 shows the results of the models with the willingness to share data interval variable as the outcome variable. The first model estimates the association between socio-economic variables and the willingness to share data. The willingness to share data is measured in this table as the number of times the respondent chose to share his or her data in the game at the options shown in Table 1. In total, a citizen can share their data 8 times, and a coefficient of -0.446 for high education indicates that high educated citizens on average shared data 0.446 times less than non-high educated citizens. The second column expands the model by including the relation of high and low data awareness with the willingness to share data. Model 3 includes a measure of trust in government, using the average scores on all Westin questions. Finally, the fourth column shows the association of each aspect of (dis)trust in government with the willingness to share data.

Table 5. Regressions of socio-economic variables and trust on data awareness

| | Low data awareness (1) | Low data awareness (2) | High data awareness (1) | High data awareness (2) |
|---|---|---|---|---|
| Trust in government: | | | | |
| Intrusion | | 0.001 (0.004) | | -0.011*** (0.004) |
| Manipulation | | 0.011** (0.005) | | 0.001 (0.005) |
| Discrimination | | -0.003 (0.005) | | 0.003 (0.005) |
| Gender (Reference = Female) | -0.01 (0.020) | -0.008 (0.020) | 0.051** (0.021) | 0.050*** (0.020) |
| Age (Reference = 35-44 years) | | | | |
| 16-34 | -0.062* (0.035) | -0.063* (0.035) | 0.107*** (0.037) | 0.106*** (0.037) |
| 45-54 | 0.016 (0.031) | 0.015 (0.031) | -0.047 (0.032) | -0.047 (0.032) |
| 55-64 | 0.019 (0.030) | 0.020 (0.030) | -0.101*** (0.031) | -0.103*** (0.031) |
| 65-80 | 0.068** (0.030) | 0.065** (0.030) | -0.131*** (0.032) | -0.134*** (0.032) |
| High education | -0.027 (0.020) | -0.028 (0.020) | 0.049** (0.021) | 0.050** (0.021) |
| Low income | 0.050** (0.020) | 0.054*** (0.020) | -0.056*** (0.021) | -0.057*** (0.021) |
| Urban residence | 0.038 (0.025) | 0.038 (0.025) | 0.021 (0.027) | 0.022 (0.027) |
| Constant | 0.148*** (0.028) | 0.095** (0.038) | 0.237*** (0.029) | 0.262*** (0.039) |
| R2 | 0.019 | 0.024 | 0.050 | 0.054 |
| N | 1646 | 1645 | 1646 | 1645 |


5.2.1 Socio-economic characteristics

Starting with individual characteristics, gender is not found to be significantly related to willingness to share data in any of the models. For age, only respondents between 45 and 54 years scored consistently significantly negative on the willingness to share compared to the reference group. Although coefficients for the younger group were consistently positive, and consistently negative for the older age groups, only some were statistically significant, and only when measures for data awareness and trust were included. This provides some evidence in support of the hypothesis that older respondents are less willing to share their data, more so since the effects more frequently become significant when the other explanatory variables are included and the coefficient of determination improves. Higher education is highly correlated to a lower willingness to share data in all four models, while living in (the neighbourhood of) the three largest cities is associated with higher willingness to share data. A lower income is significantly related to a lower willingness to share in the first and second column, but this effect disappears when a measure of trust in government is included in the other columns. This could indicate that having an income below modal level is related to having lower trust in the government, and through this channel their willingness to share data is lower.

5.2.2 Data awareness and trust in government

In all regression models, high data awareness is strongly related to lower willingness to share data, meaning that respondents who identified more data collection points within the game were less inclined to share their data. Inversely, low data awareness scores are associated with a higher willingness to share data, although results were significant at lower levels (p<0.05 in model 2, p<0.10 in models 3 and 4). These results remained significant when the threshold for having a low or high data awareness score was expanded with one point, although the correlation coefficients expectedly converged to 0. Interestingly, these results contradict the relation that is put forward in hypothesis 3. Although awareness can be considered as a prerequisite for participation in data initiatives, these results in the context of the smart city seem to indicate that awareness has a deterring effect on participation. The more aware citizens are of data collection in the smart city, the less they choose to share their data. This association could also be explained through reverse causality, however, implying that people who are less inclined to share their data, read up and increase their awareness on data.

Column 3 includes a composite measure of trust in government, and the final column includes each measure of trust separately. Note that for the models including a measure of trust, the coefficient of determination increased significantly, indicating that trust is a relatively important explanatory factor in determining the willingness to share data. In both estimations, an increase in trust in the government was strongly related to an increase in the number of times data was shared. In the last column, where trust regarding intrusion, manipulation, and discrimination were estimated separately, all dimensions of trust were highly significantly related to the willingness to share data. Of all three, manipulation is the dimension with the strongest relation with the willingness to share data. Meanwhile, intrusion concerns affect the willingness of citizens to share their data least of all. However, in general these results provide support for the hypothesis that trust in government positively influences the willingness to share data.

Table 6: Relation of awareness and trust on willingness to share data

| Independent variables | (1) Coeff. | (1) S.E. | (2) Coeff. | (2) S.E. | (3) Coeff. | (3) S.E. | (4) Coeff. | (4) S.E. |
|---|---|---|---|---|---|---|---|---|
| Low data awareness (Reference = average data awareness) | | | 0.314** | 0.126 | 0.215* | 0.116 | 0.198* | 0.116 |
| High data awareness (Reference = average data awareness) | | | -0.350*** | 0.121 | -0.316*** | 0.112 | -0.342*** | 0.111 |
| Overall trust in government | | | | | 0.382*** | 0.022 | | |
| Trust regarding intrusion | | | | | | | 0.065*** | 0.018 |
| Trust regarding manipulation | | | | | | | 0.183*** | 0.023 |
| Trust regarding discrimination | | | | | | | 0.119*** | 0.024 |
| Gender (Reference = Female) | 0.009 | 0.099 | 0.030 | 0.098 | 0.035 | 0.091 | 0.039 | 0.090 |
| Age (Reference = 35-44) | | | | | | | | |
| 16-34 | 0.282 | 0.178 | 0.338* | 0.178 | 0.350** | 0.164 | 0.336** | 0.163 |
| 45-54 | -0.264* | 0.156 | -0.285* | 0.155 | -0.275* | 0.143 | -0.297** | 0.143 |
| 55-64 | -0.171 | 0.149 | -0.213 | 0.148 | -0.175 | 0.137 | -0.187 | 0.136 |
| 65-80 | -0.116 | 0.152 | -0.183 | 0.152 | -0.222 | 0.140 | -0.250* | 0.140 |
| High education (Reference = Lower education) | -0.446*** | 0.101 | -0.421*** | 0.101 | -0.395*** | 0.093 | -0.387*** | 0.093 |
| Low income (Reference = higher income) | -0.190* | 0.101 | -0.225** | 0.101 | -0.106 | 0.093 | -0.106 | 0.093 |
| Urban residence | 0.372*** | 0.128 | 0.368*** | 0.127 | 0.341*** | 0.117 | 0.349*** | 0.117 |
| R2 | 0.021 | | 0.032 | | 0.180 | | 0.188 | |
| N | 1646 | | 1646 | | 1646 | | 1645 | |

Standard errors are in the second column of each regression; level of significance is indicated as *<0.10, **<0.05, ***<0.01. Results are coefficients, and show the association between independent variables and the willingness to share data. With dummies for data awareness and the socio-economic variables, the coefficient for e.g. 16-34-year-olds shows how many more or fewer times this group shared their data, out of a maximum of 8, than the reference group. In the third model, 16-34-year-olds shared data 0.35 times more than 35-44-year-olds. With the ratings of trust, coefficients show the association between an increase of 1 in the rating and the number of times data was shared.


5.3 Purpose of data collection

Next is the final hypothesis, which concerns the dependency of willingness to share on the purpose of data collection. Table 7 offers a more detailed account of the influence of each factor per data sharing activity. The table is divided into two sections. Section A controls only for the relation with socio-economic characteristics. Section B also includes measures of data awareness and trust in government in the regression. The table displays the coefficients of the independent variables on the decision whether or not to share personal data at that specific data point. A gender coefficient of 0.075 therefore indicates that men are 7.5 percentage points more likely to share their data than women (the reference group).
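The percentage-point reading of these coefficients can be illustrated with a minimal sketch of a linear probability model, i.e. ordinary least squares on a binary outcome. The toy data, the function name `lpm_coefficients` and the variable names are hypothetical and are not taken from the survey; the sketch only shows that, with a single dummy regressor, the coefficient equals the difference in sharing rates between the two groups.

```python
import numpy as np


def lpm_coefficients(X, y):
    """Fit OLS on a binary outcome (linear probability model).

    Returns the coefficient vector with the intercept first.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    # Prepend a column of ones so the first coefficient is the intercept.
    design = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta


# Hypothetical toy data, not the thesis data.
# male: 1 = man, 0 = woman (reference group)
# shared: 1 = respondent shared data at this data point, 0 = did not
male = np.array([0, 0, 0, 0, 1, 1, 1, 1])
shared = np.array([1, 0, 0, 0, 1, 1, 0, 0])

intercept, gender_coef = lpm_coefficients(male, shared)

# Sharing rates per group, computed directly for comparison.
share_women = shared[male == 0].mean()
share_men = shared[male == 1].mean()

print(f"Gender coefficient: {gender_coef:.3f}")
print(f"Difference in sharing rates: {share_men - share_women:.3f}")
print(f"In percentage points: {gender_coef * 100:.1f}")
```

In this toy example the intercept is the sharing rate of the reference group and the gender coefficient is the gap between men and women. Once further controls are added, as in Table 7, the coefficient is the gap conditional on those controls rather than the raw difference.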

Some additional comments are in order before the results are discussed. First, the correlation coefficients reported in Tables 9 and 10 of Appendix C are consistently positive and generally significant. This is an indication that there are unobserved factors that have a significant correlation with multiple data sharing choices, and that they have the same association – either positive or negative – with both options. Second, in order to test whether the results remain consistent with different thresholds of low and high data awareness, the regressions were re-estimated with the expanded groups (lower than eight or higher than eleven data points). These changes did not lead to different results for the coefficients. An additional re-estimation using the probit model did not lead to meaningful changes in the size and significance of the coefficients. The next sections contain a discussion of the most important results per purpose of data collection.
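The threshold check described above can be sketched as a simple grouping rule. The expanded cut-offs (lower than eight, higher than eleven) come from the text. The baseline cut-offs used here (lower than seven, higher than twelve) are an assumption inferred from the statement that the thresholds were expanded by one point, and the function name `awareness_group` is hypothetical.

```python
def awareness_group(points_found, low_below, high_above):
    """Classify a respondent by the number of data collection points identified.

    low_below:  scores strictly below this value count as low awareness.
    high_above: scores strictly above this value count as high awareness.
    Everything in between is the reference group (average awareness).
    """
    if points_found < low_below:
        return "low"
    if points_found > high_above:
        return "high"
    return "average"


# Assumed baseline thresholds (not stated explicitly in this section).
BASELINE = {"low_below": 7, "high_above": 12}
# Expanded thresholds as described in the text.
EXPANDED = {"low_below": 8, "high_above": 11}

# Hypothetical awareness scores, for illustration only.
scores = [5, 7, 9, 12, 14]
for score in scores:
    base = awareness_group(score, **BASELINE)
    wide = awareness_group(score, **EXPANDED)
    print(score, base, wide)
```

Respondents scoring exactly seven or twelve are the ones who move from the reference group into the low or high group under the expanded definition, which is why the coefficients of both dummies would be expected to shrink somewhat.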

5.3.1 Financial incentives

Columns one and two of Table 7 show the regression estimates for both financial data sharing choices, respectively the neighbourhood discount card and the smart recycling bin. Looking at Section A, having completed higher education is significantly related to a 10 percentage points lower likelihood of sharing data (p<0.01). Respondents in the 65-80 years group were less likely to share their information, and the coefficients became stronger in Section B. An interesting observation is that gender was only a significant explanatory factor for the smart recycling bin option, with men being more likely to share their personal information.

Next are the variables included in Section B. The lack of significant coefficients indicates that data awareness is not related to the likelihood of an individual to share their data for a financial incentive. Although coefficients for citizens with high data awareness were still negative, congruent with the results from Table 6, they lost their statistical significance. Citizens with different levels of awareness seem to make similar decisions when financial benefits reward citizens for sharing their data.
