
mobility turns to data

tracing the data protection programs of smart mobility developers

master thesis by Dennis Alexander Leeftink (2018)

supervisor Thomas Poell
second reader Charles Forceville

Copyright © 2018 Dennis Alexander Leeftink

The author grants the University of Amsterdam the non-exclusive right to make this work available for non-commercial, educational purposes, provided that this copyright statement appears on the reproduced materials and notice is given that the copying is by permission of the author. To disseminate otherwise or to republish requires written permission from the author.

ABSTRACT

Connected and ‘smart’ vehicles can greatly extend what types of information can be obtained from people and their environments, potentially undermining data protection efforts such as the General Data Protection Regulation (GDPR): not only because their mobile sensory capacities expand the scope of what data might possibly be acquired, but also because the regulation itself may not be properly attuned to how these technologies operate. Where the GDPR mandates that data collection capacities be limited to what is ‘strictly necessary’, developers can often be observed wanting to expand these capacities, if not for securing and optimising (vehicle) systems then for secondary uses beyond system functioning (e.g. targeted advertising, user segmentation, behavioural prediction). It seems that currently, actors even speak and maintain their own data protection languages, with ideals not easily translatable between the two. To bridge this ‘language gap’ new approaches need to be found to align the mindsets of actors, especially when considering that most data protection measures can be tacitly implemented into products and services while providing collectors similar processing capabilities as before. By turning to the languages of smart mobility developers such as Alphabet, Tesla and Uber, this thesis traces the data protection strategies (‘programs’) that can be observed in their discourse, while introducing a new set of target programs that can be more easily communicated to ‘non-speakers’. I ask: How key is data protection to smart mobility discourse? Building on Bruno Latour’s actor network theory, quantitative (co-word analysis) and qualitative (KWIC-coding) methods are combined to trace the programs these developers align with, constructing eight large repositories of developer blogs to uncover which data protection programs are key to their efforts. It is found that data protection programs are not quite key to their discourse: developers align with data expansion, tacit data acquisition, profiling techniques and fusing disparate data sources rather than minimising their collection efforts, letting individuals roam unobserved or keeping data isolated, while giving moderate attention to informing users about what data is collected and how users can set their privacy preferences. In addition, semantic backing is provided to each program in the form of keyword lists, which can be used to interrogate data-related activities and pursuits beyond the investigated sample.


foreword

acknowledgements

This thesis would not have been possible without the aid of my supervisor Thomas Poell, who always came quickly to my aid when I got stuck, provided essential feedback and guided me through the increasingly tangled world of platforms and data protection. I would also like to thank Bernhard Rieder for providing me ample room to experiment and learn new analytical skills during our tutorial sessions, which have been immensely valuable to this thesis. I am also sincerely indebted to Thomas Mol for helping me build a more than capable web scraper, which has allowed me to retrieve textual data in any shape and form. In addition, I would like to thank Joost Oliemans for running through and checking the mathematical side of this thesis (who would have thought WhatsApp would be the ideal medium for exchanging models and calculations…). I would also like to express my gratitude to Tony de Die and Robbert Werker for extensively checking my work and providing me with valuable insights from outside the field of media studies, my mum and dad for their support over the last year, my sister Julia for her endless visual inspiration and my sisters Nicole and Leonie for making sure each visualisation could be easily understood. Lastly, I want to thank Seastar So, Charlie Vielvoye and the other ‘research punks’ for the memorable group work during my master’s.


foreword

programming a language

What if we used our programming capacities to target the world of language, rather than the world of software?


DEFINITIONS

SMART MOBILITY

A type of mobility enhanced by information technologies, including smart mobility technologies (SMT’s) that enhance people’s mobile capacities while collecting any data when in use.

DISCOURSE

The culmination of actors talking about certain topics in particular balanced or unbalanced ways—in key at times and dissonant at others.

PROGRAMS

A plan of action or set of strategies put forward by actors to realize certain goals, with actors aligning with different programs in varying degrees.

STABILISATION

A process of actors aligning with similar programs until stable viewpoints/languages emerge.

KEYNESS

introduction
    a new language / quantifying discourse / thesis overview / limitations

one    data protection research
    1.1 data-as-protection / 1.2 data-sans-protection / 1.3 data-ought-protection / 1.4 data-and-protection / 1.5 turning to latour

two    tracing discourse
    2.1 identifying programs / 2.1.1 roots and targets / 2.1.2 translations / 2.2 selecting sources / 2.3 analysing texts / 2.3.1 quantifying co-words / 2.3.2 qualifying keywords / 2.3.3 summarisation

three    collection programs
    3.1 scope / 3.1.1 limit / 3.1.2 expand / 3.2 acquisition / 3.2.1 inform / 3.2.2 acquire / 3.3 monitoring / 3.3.1 drift / 3.3.2 track

four    processing programs
    4.1 fusion / 4.1.1 isolate / 4.1.2 fuse / 4.2 attribution / 4.2.1 detach / 4.2.2 derive / 4.3 guidance / 4.3.1 set / 4.3.2 nudge

five    maintenance programs
    5.1 shielding / 5.1.1 shield / 5.1.2 scatter / 5.2 wardship / 5.2.1 open / 5.2.2 closed / 5.3 storage / 5.3.1 transist / 5.3.2 persist

six    making up the balance
    6.1 key programs / 6.2 final remarks / 6.3 what’s next

references
appendix a    network measurements
appendix b    information retrieval
appendix c    technical schematic


introduction

mobility turns to data

With autonomous and ‘smart’ vehicles promising a trillion dollar industry, it is no surprise that developers such as Alphabet, Tesla and Uber are competing for a share in this emerging market (Colbert; McKinsey & Co). In recent years, these companies have been amassing data to unseen proportions, with Alphabet’s prototypes already generating multiple petabytes of sensor information a year, Tesla tacitly tracking vehicular performance and Uber aggregating driving behaviour to streamline its ride-hailing service (Lafrance Privacy Apocalypse). Although these developers refrain from using the smart moniker, their prototypes and production vehicles possess many of the familiar characteristics, combining arrays of sensors, ambient communication systems and machine learning techniques to paint a picture of their environments while increasingly making autonomous decisions (Hubaux et al.; Edwards 4-5; Anderson et al. 58). In effect, smart vehicles can radically expand issues of privacy and surveillance, as they capture data that are indiscriminate and exhaustive, recording any interaction of their users and their environments; mobile and distributed across cities, roads and highways, widening the scope of what can possibly be recorded; platformized and interconnected, enabling data to easily flow between vehicles and places; and continuous, routinely collecting and transferring data to maintain safe functioning (Kitchin Getting Smart 30).

These are not developments of a far future: it is estimated that current production vehicles already generate thousands of data points while accounting for more than a third of new cellular connections (West 1; Colbert 2). Despite collecting and generating copious amounts of data, current discourse often forgoes issues inherent to vehicular datafication, focusing on increases in safety and efficiency rather than the ‘fuel’ that’s driving them: the data collected at every turn. On the one hand, these technologies have the potential to restructure many aspects of life, enabling smarter modes of travel that can reduce congestion, increase vehicular safety, redefine city infrastructures and equalise access to transport (Hill; Featherstone & Thrift; Sheller & Urry). On the other hand, their sensory capacities stand to greatly affect people’s privacy, enabling developers to paint increasingly detailed pictures of one’s movements (places/duration of visit, real-time whereabouts, driving history, cabin activity) which, when combined with other datasets (mobile devices, smart infrastructures), enable a degree of inference beyond what information originally has been obtained—both for good and ill (Kitchin Getting Smart 31).

Alphabet, Uber and Tesla are no strangers to this. Alphabet has been involved in numerous data collection scandals, including the collection of home wifi addresses without permission (Kravetz), taking part in covert surveillance programs (Lee) and using its subsidiary services to resell user preferences and provide targeted ads (Angwin). Uber has become known for various data collection scandals of its own, involving its ‘God View’ mode that enables staff to track any user travelling with its ride-hailing app (BBC), identity breaches that affected millions of driver-partners (Statt) and shadowing authorities through its infamous Greyball security tool (Lafrance Uber’s Secret). While lesser known for data malpractices, Tesla has admitted to tracking the vehicle logs of journalists (Peckham; Muller), locking users off from accessing data recorded by their vehicles (Thielman) and using fleet data to scout cities and highways (Lambert Autopilot).


Such activities are not at all exclusive to mobility developers: today’s society has to increasingly cope with previously unmonitored activities being captured and processed by external parties (Kitchin Real-Time City). In an effort to curb such unbridled collection practices, the European Union has recently called into life a new regulation mandating data protection by design—a ‘data protection language’ to be understood by all (Fusters). This General Data Protection Regulation (GDPR) requires data collectors to implement data protection measures from the outset, mandating developers to minimise data collection to ‘what is strictly necessary’, enable users to consent to/decline data collection activities, pseudonymise personal data, store data for short amounts of time and enable users to erase their data upon request (EC). While commendable in aim, a current drawback lies in how data collectors can hide behind a veil of correctly implemented technical procedures while continuing to collect and process data in similar ways as before (Koops).1 In reviewing the GDPR’s ‘privacy by design’ clause, Koops and Leenes argue that ‘hard coding’ data protection into technologies may not be the most effective approach, as a patchwork of technical dependencies, social considerations and formal laws introduces conflicting and opposing requirements not easily translatable to technological designs (Koops & Leenes 166-168).

Edwards (5-8) and Colbert (13-14), for instance, point to some of the conflicting requirements raised by smart vehicles: Who determines what sensor data is ‘strictly necessary’ for vehicular functioning? How can users consent to vehicles passing down the road? If people travel on smart roads or make use of connected transport systems, should these be part of the same privacy bubble as people’s homes? What happens to vehicles crossing borders with different legislations? And to what extent can data generated by these vehicles be considered personal, especially when these are likely to be owned and operated in fleets? Even the EU seems conflicted: where multiple briefings have pointed towards the significant potential of smart vehicle data to be ‘misused’ (Pillath), elsewhere they are lauded as “data golden mines” (Bonneau et al). A reframing of data protection discourse is then urgently needed, as stakeholders maintain different positions on both data’s collection and its protection. Koops and Leenes suggest:

What is often missing in the design of information systems is a way to bridge the abstract idea of data protection with the very concrete formulation of techno-rules. [D]ata protection is unlikely to be achieved by focusing on rule compliance through techno-regulation, which leads system developers into a morass of selecting and translating rules that cannot be simply translated into system design requirements. Instead, taking privacy seriously requires a mindset of both system developers and their clients to take into account privacy-preserving strategies (167-168).

Currently, stakeholders seem to be lacking a shared mindset on data and its protection, with different (regulatory) desires not easily translatable to technological design. On the one hand, regulatory desires seem to rest on a data minimisation mindset increasingly misplaced in the 21st century, as modern life has come to increasingly rely on end-to-end information technologies (Leenes et al.). On the other, developers often perceive data protection as a ‘red light’ halting technological progress rather than a “reasonable set of rules guaranteeing personal freedoms” (Koops 11; cf Lambert Data protection; Weiland). Instead of dwelling on explicit technical specifications, Koops and Leenes then suggest that it is time to reach ‘the minds, rather than the computers of actors’ (168). In other words, fostering an environment upholding data protection may not so much be located in targeting technologies as in reaching the mindsets of actors. Following their reasoning, the success of data protection reform then depends on properly aligning stakeholders and their pursuits: current data protection/collection clashes may entirely rest on actors not yet agreeing on a shared cause, with developers often seeking to expand technological capabilities while regulators are pressing for their limitation.

1 User consent can be implicitly ‘granted’ (or rather, ‘taken’) by letting people access a service for free, pseudonymisation can be easily reversed by triangulating data points, and data is transferable between storage facilities, preventing true erasure (Koops).


Taking this further, it seems that current stakeholders even speak and prefer their own data protection/collection languages, with ideals not easily translatable between the two (Fusters). Fusters, for instance, demonstrates how the transfer of privacy-related concepts across languages introduces problems of ‘nonequivalence’: the issue of terms in one language not corresponding to terms in another (9). Although Fusters specifically focusses on translation difficulties across European languages, problems of equivalence may just as well apply to data protection itself: its ‘language’ may simply miss equivalent translations across domains. These misequivalences are at the root of what data protection research needs to address: “A promising way forward is that the people who commission IT systems, together with the system developers, try to internalise the data protection framework as part of their mindset,” Koops and Leenes suggest (168). If data protection is to materialise in practice, more effort then needs to be put towards aligning the mindsets of actors, targeting the misequivalences between them.

A new language

In other words, data protection efforts may only succeed when its strategies are understood by many; the more people ‘speak’ its language, the more stable a shared mindset might become. To address this issue, this research sets out to trace the different ‘data protection languages’ of developers, specifically those of Alphabet, Tesla and Uber. By doing so I aim to demonstrate that for data protection to be understood by all, existing dialects need not be forgotten, as imposing a regulatory language without taking note of existing peculiarities only serves to reinforce differences, or worse, make it as effective as actors communicating in a foreign language. That is not to say that regulatory efforts are or may be altogether ineffective, but that until now different ‘data dialects’ have not been thoroughly mapped—surely required for a language understood by all. To reveal current language differences I ask: How key is data protection to smart mobility discourse? As will turn out, even this question’s wording contains a language of its own, approaching discourse as a culmination of actor voices, in key at times and dissonant at others (I will return to this in a moment).

Where initially only smart vehicles were investigated, it became clear that such inquiries could not easily be separated from larger ‘smart mobility discourse’, if not for the many intelligent transit technologies embedded in society then for the extensive pursuit of people’s movement data. This makes smart mobility technologies (SMT’s)2 and their surrounding discourse fertile grounds for priming current data protection/collection activities: they are designed to be as unobtrusive and seamless as possible, weaving themselves into the fabric of daily life while capturing any data about their use (Edwards 17), combining various modes of transit (smart vehicles, public transport, ride-hailing services, bicycles) and technologies (sensors, beacons, networks) to provide personalised mobility needs (Van Dijck, Poell & De Waal 98-101). Like other successful mobile technologies (cars, mobile phones, etc.), SMT’s have the potential to recede from people’s attention while saturating their lives, once more introducing a plethora of privacy and surveillance related problems often invisible to the public (Feenberg; Kitchin & Dodge Code/space). Where prior mobile technologies offered users a choice whether a product or service could collect their data (at least, in principle), ubiquitous SMT’s make this increasingly hard if not downright impossible to do, often leaving users in the dark about what data they collect (Bloom et al.). Think Google Streetview, in real-time and at every corner.

How developers frame these technologies provides clues on current developments and potential collection/protection implementations (Birnhack et al.). More specifically, the technological traces actors (and as will be shown, artefacts) leave behind in their languages offer rich sites for gauging technological thought and developments (Coeckelbergh). By following these linguistic traces, this research then firmly builds on the ideas and methods proposed by Bruno Latour’s actor network theory, assuming that “relations between humans and artefacts do not develop separately, but are the coevolving result of [t]echnological translations” (Schulz-Schaeffer 131). In short, this approach perceives humans and technologies as a mesh of interconnected actors acting and pursuing certain programs: strategies working with and against one another to realise preferred outcomes (Latour Durable 105). In case of this research, broadly speaking, developers’ data expansion programs and related technologies work against regulators’ data limitation programs and related governing instruments, and vice versa.

2 Often used to indicate a fusion of IT and mobile technologies, ‘ensuring greater and more effective types of mobility while improving access of opportunities to urban populations’ (Batty et al.). However, I take SMT’s to indicate any technology related to optimising the mobility of people based on collecting and processing their transit data.

By introducing ‘program thinking’, Latour posits language as a system of signs mediating socio-technical interactions, where meaning is not atomized from the social and even technologies are part of the meaning-making process (Latour Modern 63-64; Latour Missing Masses). This makes Latour’s vision of language not unlike the programming languages of software developers, comprising both technological operations (i.e. software code, processors) and higher level concepts (i.e. programmer visions, operating systems). In this view, a stable (programming) language arises from chaining and designing subprograms in certain ways—just as words within sentences can be chained to relay specific ideas. In other words, languages stabilise the more their subprograms align, for instance by minimising ill-behaved or erroneous code(s) or by addressing ineffective programs. To feed this back to the research question, a stable data protection language then requires its programs to optimally align technical operations and higher level ideals, as actors from different backgrounds need to be able to connect with its contents; otherwise, their interaction may simply misalign.

A shared language may then only arise when many actors are able to connect with its programs, a process similar to people adopting a new language: the more it is spoken, the wider its reach, becoming a way of viewing and interacting with the world (Coeckelbergh ch. 6). This points to the urgency of investigating the programs actors align with, as once stabilized, their languages may become increasingly hard to transform (Latour Durable 122). A successful data protection language thus requires its programs to be easily understood, enabling both speakers and ‘non-speakers’ to connect with its contents and ideally, foster easier communication among them. Similarly, how actors connect to current data protection programs then provides a coarse indicator of their understanding: if data protection is already on the minds of developers, one would expect this to be visible in their ways of speaking rather than an afterthought unworthy of attention. A large part of this research then rests on tracing the connections developers make when discussing smart mobility, potentially revealing which programs are key to their discourse.

Quantifying discourse

In effect, a Latourian discourse analysis comprises an analysis of the programs observable in the language of actors. If actors connect to certain programs, a natural extension is looking at the words actors connect when speaking; the stronger certain links, the more stable certain concepts, topics or ideas (He). As such, even language itself might be approached as a networked phenomenon, taking the words people use in their communication as literal links between concepts and ideals (Danowski Network Analysis). These connections help reveal which topics are most tightly knit, for instance how data protection and smart mobility are intertwined, to what extent they are associated in practice and if not, what issues are associated with either. By drawing on the research tradition of ‘co-word analysis’, these questions might then be answered by looking into the frequent association patterns people make. Similar to bibliometrics and other disciplines where word linkages are used to map ‘semantic fields’, this method uses large textual repositories (corpora) to trace the association patterns actors (un)consciously make (Callon; Leydesdorff; Stubbs). James Danowski, one of the earlier researchers to apply these methods, describes:


A networked perspective can capture relationships among words. Using word-pairs as input, you map the language landscape. On the map, instead of cities, the nodes are words. Rather than roads, there are links among them. Travelling through the network are fleets of social objects, the vehicles of concepts, ideas, or physical things that people linguistically describe. As people link words to vehicles in their everyday communication, this propels them through the network, steering them in the flow of traffic, away from certain words or towards them. (Danowski Network Analysis 197; transcribed for clarity)

Where other statistical techniques are often used to ground and postulate ‘language truths’ (Porter & Hagety; Boyd & Crawford), Danowski’s approach is rather one of semantic equivalence: which words are strung together by actors and how can one inquire into these linkages? Rather than considering direct links between actors, this effectively considers actors linked by the words they use instead. At its core, Danowski proposes to start at the level of word pairs, iteratively following links to other pairs. Repeat this for many terms and semantic structures emerge; repeat this for many texts and overarching networks are revealed. These networks have great analytical potential, as they comprise the hubs, cores and peripheries of how topics are associated in practice, revealing which topics are at the (literal) centre of people’s language. While this method forgoes direct links between actors (social groups, communities), co-word analysis helps reveal how sometimes people associate topic x with topic y while other times they do not, revealing which connections can be considered weakest/strongest in their discourse—in case of this research, data protection, smart mobility and any topic in between.
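To make the mechanics of word-pair counting concrete, the sketch below builds a minimal co-word network from a few invented sentences: words co-occurring within a sliding window are counted as weighted links, from which the strongest associations can be read off. This is an illustrative reconstruction, not the thesis’s own pipeline; the window size and tokenisation are assumptions, and the actual method is detailed in chapter two.

```python
from collections import Counter
from itertools import islice
import re

def coword_network(texts, window=5):
    """Count how often two words co-occur within a sliding window.

    Returns a Counter mapping alphabetically ordered word pairs to
    their co-occurrence weight across all texts.
    """
    links = Counter()
    for text in texts:
        tokens = re.findall(r"[a-z]+", text.lower())
        for i, word in enumerate(tokens):
            # Link each word to the words that follow it within the window.
            for other in islice(tokens, i + 1, i + window):
                if other != word:
                    links[tuple(sorted((word, other)))] += 1
    return links

corpus = [
    "our vehicles collect sensor data to improve road safety",
    "we protect user data and limit what sensors collect",
    "sensor data helps us map every road in real time",
]

network = coword_network(corpus)
# The heaviest links hint at the most stable associations in the corpus.
for (a, b), weight in network.most_common(5):
    print(f"{a} -- {b}: {weight}")
```

Feeding such weighted pairs into a graph layout then yields the hubs, cores and peripheries described above, with frequently co-occurring terms pulled towards the centre of the map.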

Thesis overview

To trace the data protection programs of smart mobility developers and their (stable) languages, this thesis is structured as follows. Chapter ONE investigates current and prior data protection research, outlining advances and shortcomings. By comparing research rationales, it is argued that instead of focussing on economic (risks, technical measures), social (privacy transformations) or normative contexts (formal rules), more effort needs to be put towards the discursive aspects of data and protection: the ways data protection is (and has been) rationalized by actors. This shifts attention from technological implementations, social transformations or normative pursuits to the discursive intersections and gaps between them, trying to bridge different mindsets by means of language. It moves data protection research away from strict technological approaches and towards the strategies and rationales observable in the language of actors. For this, Latour’s notions of ‘programs’ and ‘stabilisation’ are introduced, providing the conceptual backing for tracing actor languages.

Chapter TWO builds on this premise and formalises a new set of root and target programs: a set of data protection strategies targeting currently rooted practices. This model enables actor languages to be systematically compared, comprising a wide range of activities and ideals developers might be involved in. Essentially, these programs provide eighteen simplified data collection/protection strategies useful for tracing actor alignments, combining the works of Spiekermann & Cranor, Solove and Hoepman to cover a wide range of data related activities developers might partake in. To demonstrate program efficacy, these programs are used to categorise the language of three smart mobility developers, for which eight large-scale blog corpora are constructed and categorised (in addition, reasons for choosing Alphabet, Tesla and Uber are discussed). The amount of attention developers give to each identified root and target program is quantified by categorising keyword clusters via a novel interface, resulting in a fine-grained view of smart mobility discourse. Lastly, statistical summaries are simplified by considering how actors might voice their writings: a crowd of voices sounding in key or out of tune. This formalises Latour’s notion of stabilisation into a statistical model, where ‘languages’ stabilise from actors discussing (‘aligning with’) programs in a balanced fashion across different domains.


Chapters THREE, FOUR and FIVE discuss program observations, charting the data protection languages of smart mobility developers. These chapters are structured according to the ‘data life-cycle’ as indicated by Spiekermann & Cranor, starting with data’s collection, followed by its processing and ending with its storage and maintenance. For each stage, related root and target programs are uncovered, comparing language differences and overlaps between developers through alignment charts. For this, an in-depth demonstration is provided in 3.1. Chapter SIX concludes with program balances, revealing which protection/collection programs are key to smart mobility developers and whether a data protection language might have stabilised in their discourse. It closes with future research directions.

General limitations

There are many pitfalls to language research, not the least due to how languages are spoken and understood differently. How people interpret, translate and engage with languages can differ greatly, often making corpus based analyses quite a subjective affair (Bednarek 21). Should this exclude language research from grounding technological developments? Quite the contrary: when technologies are increasingly concealing their inner workings (i.e. how data is collected, processed, stored), the only thing we are left with are the languages surrounding their use (‘opening the black box’; Latour; Winner). As interpretations vary from person to person, one may then wonder why it is useful to use statistics to ground such a reading: for one, because reading, comparing and categorizing a representative language sample of current discourse by hand is not something one researcher can aspire to accomplish, and two, because it enables breaking down a complex and multivariate issue (data protection) into smaller analytic units that can be more easily interpreted and compared. In addition, the resulting research data can be easily shared with others (all annotated corpora can be easily reviewed), enabling some form of reproducibility as results can be systematically reviewed by third parties. In any case, here the use of statistics is not to determine a final, ground-truth reading of what smart mobility developers deem key, but rather to provide a structured reading of actor statements while measuring language difference in terms of keyness.
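As background on what measuring ‘keyness’ can look like in practice: corpus linguists commonly score how characteristic a word is of one corpus relative to a reference corpus, for instance with Dunning’s log-likelihood ratio. The sketch below is a generic version of that calculation, offered purely as illustration rather than as the thesis’s own measure; the counts in the example are hypothetical.

```python
import math

def log_likelihood(freq_a, size_a, freq_b, size_b):
    """Dunning's log-likelihood keyness for a word occurring freq_a
    times in corpus A (size_a tokens) and freq_b times in corpus B
    (size_b tokens). Higher scores mean the word is more
    characteristic of one corpus than of the other."""
    # Expected frequencies if the word were spread evenly over both corpora.
    expected_a = size_a * (freq_a + freq_b) / (size_a + size_b)
    expected_b = size_b * (freq_a + freq_b) / (size_a + size_b)
    ll = 0.0
    if freq_a > 0:
        ll += freq_a * math.log(freq_a / expected_a)
    if freq_b > 0:
        ll += freq_b * math.log(freq_b / expected_b)
    return 2 * ll

# Hypothetical counts: 'privacy' in a developer-blog corpus versus a
# larger general reference corpus.
print(round(log_likelihood(120, 500_000, 60, 1_000_000), 2))  # ~83.18
```

The higher the score, the more a word’s frequency departs from what an even spread would predict, which is one way of putting a number on how ‘key’ a term is to a given discourse.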

Two larger issues persist in the assumptions that ‘what is said’ offers sufficient evidence for what actors mean and that discourse offers accurate representations of technological functioning. Both are problematic, as often one has to read ‘between the lines’ to pinpoint actor views and assumptions (‘what is absent may tell more than what is present’), just as how actors speak of things seldom has a one-to-one relation with how technologies operate (some aspects may simply be not worth discussing or might be explicitly left out). Apart from crawling under the skin of developers or deconstructing their technologies, these issues are not easily resolved. However, one cannot ignore that language that is ‘uttered and out there’ may provide clues on how developers think of/use their technologies (in case of this research, developers’ views expressed through various blogging channels), just as their blogs may be the closest thing available to what happens ‘behind the screen’ besides direct interviews or surveys.

However, the current focus on English speaking developers also omits local variations in how data protection/collection is spoken of around the globe, limiting the generalizability of observations (a ‘data Esperanto’ remains beyond the scope of this research). Other problems with Latour’s actor network theory and corpus based methods are discussed more in-depth in their respective chapters, mostly coming down to the issue of ‘where’ meaning is located (In texts? In the minds of readers? Somewhere in between?). While these questions are touched upon, this research does not intend to explore them in full, rather assuming that how things are spoken of can be used for grounding a shared understanding of a technology (for a more in-depth analysis of ‘technological narratives and framing’, see Coeckelbergh). I thus do not intend this analysis to be final, but rather a stepping stone in diagnosing data protection concerns by means of ANT. Mirroring the words of Annemarie Mol:


The strength of ANT is not that it is solid, but rather that it is adaptable. The terms and texts that circulate in ANT are co-ordination devices. They move topics and concerns from one context to another. They sharpen the sensitivity of their readers, attuning them/us to what is going on and to what changes, here, there, elsewhere. They care, they tinker. They shift and add perspectives (265).

If anything, ANT’s potency lies in coordinating topics and concerns across domains, shifting and moving between sites of interest to explore new practices and perspectives. It strives to map relations that are simultaneously technical and social, just as applicable to mapping agendas as to providing insights into (and even tinkering with) actor programs. It offers to engage with issues of privacy, security and surveillance not only in strict technical, social or normative terms, but as phenomena embedded and contested at various sites of techno-social activity. Or as John Law puts it: ANT tells stories about how relations assemble, or don’t.

chapter one

data protection research

In the literature, different strands of data protection research have been identified, mapped extensively by the likes of Philip Agre, Roger Clarke, Paul Dourish and recently by Daniel Solove, Robert Kitchin and Bert-Jaap Koops. It is fruitful to return to some of their main lines of inquiry, enabling us to ground some key terms while highlighting shortcomings. For this, I build on the work of Dourish and Anderson, who identify four main approaches to data protection, namely economy, sociality, normativity and discourse.3

As the authors demonstrate, each approach is distinguishable in terms of episteme, principles and practices, objects of study and most of all, language, making their framework a promising stepping stone for language inquiries. They note: “Language does not simply describe the world, but is also part of the process of constituting and shaping the world that we experience. The issue here is to understand how the notion of privacy and security are used to categorize activities, events, and settings, separating acceptable actions from unacceptable ones” (328). In their view, data protection efforts can then be understood by looking at the ways actors have framed data collection/protection and how related activities have been rationalized (329). In this regard, three main research strands can be identified, which I deem data-as-protection (economic approaches §1.1), data-sans-protection (social approaches §1.2) and data-ought-protection (normative approaches §1.3), with conjunctions reflecting how researchers pursue data protection (Table 1). Where each offers certain ways of thinking about data protection, I argue that instead of focussing on economic, social or normative pursuits, more effort needs to be put towards investigating matters of data-and-protection: a research direction focussing on the discursive and relational contexts in which data and its protection are embedded [1.4]. For this, it is argued that Latour’s actor network theory provides a suitable starting point [1.5].

1.1 DATA-AS-PROTECTION

Data protection research has many actors adopt (either directly or indirectly) an approach to data that is broadly informed by matters of economy. In this sense ‘economy’ is to be understood en principe rather than by its purely financial connotation, positioning data protection in terms of risks, threats, rewards and costs-benefits (Dourish and Anderson 325-326). From this perspective, data might be used as a means for securing systems; in other words, a way of using data-as-protection. This has seen personal data being conceptualized as “an individual, commodified good that can be traded for other market goods”, inflecting that revealing one’s private life is a conditional necessity for accessing ‘free’ products or services (Hull 89). Debates on ‘implicit consent’ for instance, largely focus on the balance between revealing oneself and what one might expect in return, targeting developers that regard consent as ‘implicitly granted’ when using a product or service (Barocas and Nissenbaum). More broadly, economic rationales approach data protection in terms of increasing security rather than preserving privacy, the former focussing on securing technological features and the latter somehow resulting from optimally configuring the former (Dourish and Anderson 322). Here, security is often conceptualised as the state of being free from danger, which might be ensured by designing appropriate technical measures (322-323). Although technological prerequisites (e.g. data encryption/anonymisation) are certainly required for securing data against unintended uses,4 a sole focus on security measures forgoes the social waywardness these measures cannot account for. This is exemplified by the so-called privacy paradox: people’s higher willingness to disclose personal information when given more control over their personal data.5

3 The authors use a slightly different terminology, namely ‘economic rationality’, ‘practical action’ and ‘discursive practice’ (320).
4 ‘Without security, there can be no guarantee that the data is reliably protected’ (Leenes et al. 144).
5 Consent measures are perceived as an act of trustworthiness by users (Brandimarte, Acquisti & Loewenstein).

TABLE 1
A general overview of data protection research, based on Dourish & Anderson’s taxonomy (325-329)

Pursuit               | Principles                          | Objects of study                   | Typical projects
Data-as-protection    | Economy, costs, risks, rewards      | Artefacts, techniques, systems     | Security, optimisations
Data-sans-protection  | Sociality, agency, privacy, freedom | Individuals, identity, communities | Social transformations
Data-ought-protection | Normativity, ethics, power, rule    | Policies, governance, regulations  | Governing devices

Regarding SMT’s and data protection, much effort has come from the engineering community, often resulting in comprehensive protocols and ‘privacy enhancing technologies’ (PET’s) that promise to ensure safer communications between vehicles (Heurix et al.), provide security against road-side hackers (Hubaux et al.), nudge users to make more informed privacy decisions (Acquisti), relay consent through connected beacons when in transit (Maglaras et al.) or allow users to set their trade-off between location accuracy and service quality (Gruteser & Grunwald). Despite such advances, security issues are far from solved, mainly arising from the plethora of different standards and practices between engineers, city planners, programmers and legislators, as smart technologies can lead to many interdependent and failure-prone systems where security issues can easily cascade into the next (Kitchin & Dodge Transduction).

An often-cited approach for ensuring personal privacy while mitigating security issues comes from Spiekermann & Cranor, introducing a three-layered approach to securing data processing activities [Table 2]. By dividing the process into three stages of data collection, processing and storage, the authors present a framework where data-related activities are plotted against identifiability, proposing various strategies for minimizing personal identification. On this gradient, the authors place privacy-by-policy on one end of the spectrum (written agreements, consent measures) and privacy-by-design (technical measures) on the other, arguing that identifiability is minimised when either end of the spectrum is satisfied in company policy or technological design (75). “Because privacy is technically enforced in such systems, little reasonable threat remains to a user’s privacy,” the authors remark (79).

However, enforcing such technical measures then often rests on optimally balancing efficiency, security and privacy in a zero-sum or game-based approach, where increases in functionality are weighted against losses in privacy, often approaching security and privacy as stable ideals (Dourish & Anderson 325-327). While intuitively appealing for matters of safety and efficiency, skewing towards costs and risks forgoes how individuals may have different attitudes towards risks—if perceived at all. Just as problematic is that focusing on technological security forgoes the simple fact that systems can be (and often are) misused, and that even the simplest privacy preserving technologies often require more time than most people are willing or able to give (Hull). Furthermore, technical anonymisation measures can be easily reversed, especially people’s movement data (Narayanan & Shmatikov; Ohm; Yakowitz). Approaching data-as-protection may then perpetuate ideals of technological security offering optimal solutions to largely social problems, which are key to the next research pursuit.

TABLE 2
Spiekermann & Cranor’s stages of identifiability (75)

Stage           | Identifiability | Type of measure | Techniques
Data collection | Full            | Policy-based    | User consents to collection of unique identifiers
Data processing | Medium          | Combination     | De-linking unique identifiers from user data

1.2 DATA-SANS-PROTECTION

This second strand of research sees scholars and practitioners turning away from conceptualizing data as a means for securing systems, towards investigating data-sans-protection: the social issues arising when data are inadequately protected. The proliferation of contemporary data collection technologies has led many scholars to investigate the social transformations arising from their use, turning to agency rather than costs, identity rather than risks, and privacy rather than security. This questions the supposed ‘neutrality’ of technical measures and technologies at large, approaching technological design as inherently shaped by subjective decisions that can profoundly affect social life. For instance, automatically sorting people on the basis of their driving behaviour when determining insurance rates (Graham) or using ‘Big Data’ techniques to predict people’s behaviour (Dalton & Thatcher). “This is a significant departure from technical models that suggest that your security or privacy needs can be ‘set up’ through a control panel and then left alone”, Dourish and Anderson observe, suggesting that privacy issues are ongoing ‘accomplishments’ constantly produced and reproduced (328).

However, as information technologies have become increasingly part of people’s lives, past decades have seen large shifts in how individuals can accomplish their privacy needs, with some going as far as declaring privacy a dead affair (Rambam). Yet, declaring the death of privacy may be counterproductive to the cause, as maintaining functioning societies requires some private boundaries—a private (head)space for entertaining one’s beliefs and questioning those of others (Reiman). Balancing what to ‘hide and share’ has made some theorize privacy as a gradient from anonymous to fully-known, informing one’s actions by the degree to which others can inquire into one’s personal life (Kitchin Getting Smart 25; Reiman in Nissenbaum 70; Solove Taxonomy 480). Privacy in this sense is a state of revealing/concealing oneself against the scrutiny of others (Nissenbaum 115). Reiman argues that sustaining this gradient is essential for maintaining democratic societies, as constant observance limits the space for roaming and acting freely—at odds with liberal ideals such as individual decision making and the freedom of movement (Reiman 42).

Regarding SMT’s and data protection, scholars have put significant efforts towards tracing the transformative capacities of smart technologies, from wide-scale social surveys on how users perceive the surveillance capacities of smart vehicles (Bloom et al.) to technographies of how people use and experience new mobility technologies (Kien), and from theorizing the car as an extended private sphere away from home (Hay & Packer) to producing extensive taxonomies of privacy configurations in smart cities and beyond (Van Zoonen; Solove). Solove for instance, intricately maps how personal privacy might be ‘violated’ by contemporary information collection procedures, providing extensive examples and descriptions of the ‘privacy harms’ posed by modern society [Table 3]. Mirroring the three-pronged approach by Spiekermann & Cranor, Solove presents an in-depth look at how different techniques affect privacy, moving away from focussing on what privacy is towards “the specific activities that pose privacy problems” (Taxonomy 482). Instead of dwelling on abstract conceptions of privacy, Solove argues that the “activities that affect privacy are not necessarily socially undesirable or worthy of sanction or prohibition … in many instances, there is no clear-cut wrongdoer, no indisputable villain whose activities lack social value” (559). He suggests that by cataloguing these activities privacy problems might be mitigated, minimising ‘conceptual confusions’ between stakeholders on reasons for its protection (558).


Although Solove sees that not only technologies but also the ways people use technologies introduce privacy problems (560), social inquiries often im/explicitly regard privacy harms as resultant from how technologies are designed—rather than entertaining the possibility of technologies revealing societal problems in situ, as painful as these may be (cf Rieder). In Privacy in Context for instance, Helen Nissenbaum argues that “it is not enough for proponents to point to the connections between privacy and values like autonomy, freedom, and democracy … they must be able to address conflicts between privacy and competing values served by the offending technologies” (Nissenbaum 112). This fixation on ‘technological offences’ reveals a problem with many social inquiries, often taking new technologies as forces upsetting social life. Rieder expounds on this, arguing that next to introducing machine biases, statistical errors or epistemological inadequacies (51), automated data collection could rather be understood as a reaction to the information overload of contemporary society. He argues that from a developer’s perspective, “systems are not conceptualized as convenient access points to stable trees of knowledge, but as engines of order, capable of projecting latent structures present in data” (12). These ‘ordering engines’ may merely reveal what’s there—in the end it might be society that is biased, not our data. Merely highlighting biases inherent to personal data collection may then not be enough for bringing about change in a society where data collection takes centre stage, requiring new norms and acceptable practices to be negotiated and disseminated, which are central to the next research pursuit.

1.3 DATA-OUGHT-PROTECTION

Where economic and social approaches focus on technological/social transformations, central to normative pursuits is negotiating the norms through which data ought to be protected. Central to this pursuit are issues of power and rule, often approaching privacy issues as “the costs of state interference […] measured against the consequences of failing to intervene” (Post 2096). Certainly rooted in economic rationales, normative approaches differ from economic approaches in the sense that they aim to delineate ‘right’ from ‘wrong’ practices, with costs and benefits supplemented by moral and ethical concerns and a hefty focus on shoulds (Dourish & Anderson 329; cf Alpert; Post; Leenes et al.; Jones). Typically targeting the surveillance practices of businesses and institutions, this strand focusses on how institutions are able to withdraw themselves from daily life while observing and controlling people through automated means (Barry et al.). Here surveillance is often conceptualised as the monitoring of everyday life for the purpose of managing people and their behaviours (Lyon Surveillance Studies), nowadays often extended with the capture model that describes how technologies are not only monitoring but increasingly structuring how activities come to unfold (for example, facial recognition systems to unlock devices and services and personalised smart cards to access public transport; Agre; Dodge & Kitchin Automatic Management).6

6 “Cars are now being designed in a way that activities which external systems sought to self-discipline (by fear of being caught by the police) are reshaped by the car itself – such as the code in the vehicle’s ECU will not start the engine unless it senses that the driver’s seat belt is clicked in place” (Dodge & Kitchin 270).

TABLE 3
Solove’s taxonomy of privacy harms, adopted from Kitchin (Getting Smart 27)

Stage       | Privacy harm   | Description
COLLECTION  | Surveillance   | Watching, listening to, or recording of an individual’s activities
            | Interrogation  | Various forms of questioning or probing for information
PROCESSING  | Aggregation    | The combination of various pieces of data about a person
            | Identification | Linking information to particular individuals
            | Insecurity     | Lacking data security measures
            | Secondary use  | Re-using data without individual permission
            | Exclusion      | Excluding individuals from interfering in data handling process
MAINTENANCE | Breaches       | Breaking a promise to keep a person’s information confidential
            | Disclosure     | Revelation of sensitive information about a person
            | Exposure       | Revealing another’s nudity, grief, or bodily functions
            | Accessibility  | Amplifying the accessibility of information
            | Blackmail      | Threat to disclose personal information
            | Appropriation  | The use of people’s identity to serve the interests of another
            | Distortion     | Dissemination of false or misleading information about individuals

Regarding SMT’s and data protection, much attention has been given to establishing new governing devices and regulatory frameworks, from ‘pre- and sticky’ consent measures that let users relay their privacy preferences across smart cities (Edwards) to co-ordinating regulatory bodies to adopt privacy strategies attuned to connected vehicles (Jones), and from harmonizing international perspectives on vehicular privacy (Colbert) to developing comprehensive ‘privacy design patterns’ that bridge regulators and engineers (Hoepman). Hoepman for instance, builds on the work of Spiekermann, Cranor and Solove to derive eight generic data protection patterns, simply termed ‘minimise’, ‘hide’, ‘separate’, ‘aggregate’ and so on [Table 4]. Rather than focussing on specific technical implementations (Spiekermann & Cranor) or privacy violations (Solove), these strategies group various information collection/processing activities into broad categories from which designers can choose.7 ‘Hide’ for example, might be achieved using “data encryption, onion routing or anonymous credentials”; techniques appropriate under different circumstances (Hoepman in Koops & Leenes 167). The point being that developers can choose the procedures most adequate to the task at hand, rather than having a law dictate the implementation of technical measure x for data protection problem y. The role of the regulator then becomes one of relaying ideals and aligning mindsets, rather than imposing them from above.

7 “The specific activities identified by Solove are too fine grained. Although they may be interesting from a legal perspective, many of them involve basically the same methods at a technical level.” (Hoepman 4)

TABLE 4
Hoepman’s data protection strategies (2012: 5-8)

Strategy    | Example technique        | Description
Minimise    | Selective collection     | The amount of personal information that is processed should be minimal
Hide        | Encryption and delinking | Any processed personal information should be hidden from plain view
Separate    | Decentralisation         | Information processing should be done in a distributed fashion
Aggregate   | Dynamic granularity      | Personal information should be processed with the least possible detail
Inform      | Privacy dashboards       | Individuals should be informed whenever their information is processed
Control     | Dynamic consent settings | Individuals should have control over how their information is processed
Enforce     | Rights management        | Enforcing privacy policies compatible with legal requirements
Demonstrate | Data auditing            | Demonstrate compliance with legal requirements

Others have investigated how governing instruments operate in practice. White & Case point to how data has been defined as either being personal or non-personal: “This [distinction] is critical because EU data protection law only applies to personal data. Information that does not fall within the definition of personal data is not subject to EU data protection law” (White & Case 1). This is problematic as in current form data protection only applies to data that is personally identifiable, meaning that stripping data from identifiers (‘pseudonymisation’) enables collectors to “justify processing that would otherwise be deemed incompatible with the purposes for which the data were originally collected” (1). Koops observes how pseudonymisation still allows individuals “to be recognised as a previously known, without (necessarily) being able to associate identifiers with named individuals,” allowing similar levels of processing as before—the only difference being that no explicit names are attached (Koops 10). Regarding smart vehicles such definitions become even more complex: “Data produced by the sensors on a vehicle may be classified as non-personal data or machine-generated data, insofar as they do not relate to an identified or identifiable individual which are not covered by data protection legislation” (EC 11).
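To make concrete why pseudonymised records can still support the same processing, consider the minimal sketch below, an illustration with made-up data rather than anything drawn from Koops or the thesis: replacing names with a keyed hash removes explicit identifiers, yet every record of the same person still carries the same pseudonym, so trips remain linkable into a profile, and joining that profile with an auxiliary dataset (here, a hypothetical known home address) can re-attach a name, echoing footnote 1’s point about triangulation.

```python
import hashlib
import hmac

SECRET_KEY = b"collector-internal-key"  # hypothetical key held by the collector

def pseudonymise(name: str) -> str:
    """Replace a name with a stable keyed hash (a pseudonym)."""
    return hmac.new(SECRET_KEY, name.encode(), hashlib.sha256).hexdigest()[:12]

# Made-up trip records: explicit names removed, pseudonyms kept.
trips = [
    {"user": pseudonymise("Alice"), "from": "Elm Street 1", "to": "Office Park"},
    {"user": pseudonymise("Alice"), "from": "Office Park", "to": "Elm Street 1"},
    {"user": pseudonymise("Bob"), "from": "Main Square", "to": "Airport"},
]

# The records remain linkable: all of one person's trips share a pseudonym,
# so a detailed movement profile can still be built per individual.
profiles = {}
for trip in trips:
    profiles.setdefault(trip["user"], []).append((trip["from"], trip["to"]))

# Triangulation: an auxiliary dataset (a known home address) re-attaches
# a name to a pseudonym, reversing the apparent anonymity.
home_addresses = {"Elm Street 1": "Alice"}
for pseudonym, journeys in profiles.items():
    for origin, _ in journeys:
        if origin in home_addresses:
            print(f"{pseudonym} is likely {home_addresses[origin]}")
```

The processing capabilities are untouched throughout: profiling, prediction and segmentation all work on the pseudonym exactly as they would on the name, which is precisely the concern raised above.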


Where currently the GDPR draws a hard line on what does and does not constitute personal data (and by extension what types of data are/are not protected), in practice personal data is rather scaled between different degrees of identification, limiting the scope of governing instruments that rely on strict definitions. Koops for instance, draws a parallel to light: “Just as light sometimes acts as a particle and sometimes as a wave, data sometimes act as personal at other times as non-personal, and we simply cannot always predict which of the two occurs” (13). Unfortunately, this fluidity is often lost when formalised in the text of law, making it no surprise that strict regulatory approaches have failed to establish appropriate data protection conditions; especially when most regulatory devices only attain legal force after data has already been collected (‘after the damage has been done’, cf Bennet and Raab; Birnhack et al.; Weis & Archick; Moerel; Privacy International). If anything, one can question the feasibility of regulating a technological environment that is based on acquiring more data, not less. Even the soundest regulatory frameworks may be toppled by the sheer amount of data collection technologies society has come to rely on, just as the rule of law might be too slow to tackle newly developing technologies. What then is to be done?

1.4 DATA-AND-PROTECTION

It seems current debates are often missing a way of bridging their heterogeneous pursuits, at present either focussing too much on strict technical/economic solutions, laying blame on inadequate or biased designs or putting faith in normative rules not easily adaptable to shifting socio-technical needs (Koops 12-14). A promising approach is then shifting from separate data protection contexts to the discursive intersections between them. Dourish and Anderson note: “Approaching [data protection] as a discursive phenomenon places primary emphasis on broader social and cultural logics” (329), suggesting data related issues are not to be analysed in solely economic, social or normative terms but rather to be considered in tandem (335). When investigating matters of data and protection, attention then needs to be given to the contexts in which both are embedded, be it technical (Spiekermann & Cranor), social (Solove) or normative (Hoepman) in scope. Where these authors have intricately traced data protection strategies within certain (research) contexts, combining their works then promises to build on their strengths while supplementing blind spots. Together, their works comprise a wide array of data related activities, enabling us to move beyond abstract conceptions of security, privacy and surveillance and towards the ways actors perceive data protection in practice, be it technical, social or normative in scope. In other words, how data protection is perceived in practice.

This shift draws attention to data’s discursive aspects: the elements actors include/exclude when rationalising data-related activities (cf Jørgensen & Phillips 27). Interestingly, in the research literature these discursive aspects are severely underexposed: in a corpus of 4910 smart mobility related articles, just 29 mention discourse related keywords (0.59%).8 However, recent work by Birnhack and colleagues traces the mindset of regulators/developers regarding privacy enhancing technologies (2014). To gain insight into different rationales, they present an in-depth reading of the text of law (law’s technological mindset) and canonical privacy engineering texts (technology’s privacy/security mindset) to reveal where regulators and engineers misalign. Here the authors use mindset to refer to “the overall doctrine that emerges from the texts, which has its own objectives, language, and characteristics” (40), suggesting that different mindsets can be bridged by “reverse engineering each field to expose underlying assumptions” (68). In any case, the authors demonstrate that constricting one’s view to either mindset may be too limited when trying to understand issues of data protection at large, as it ignores how data collection activities can transform society far beyond issues of security and privacy—think worker automation, nudging people or excluding individuals from accessing products and services (Kitchin & Dodge Automated).

8 Derived from Scopus using the keywords ‘smart mobility|vehicle’ / ‘data|protection’ and counting articles containing ‘discourse’.
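The footnoted count is simple to verify on any local export of article metadata; the sketch below is my own illustration, assuming a hypothetical file with one abstract per line (the Scopus query itself is not reproduced here).

```python
# Sketch: count how many corpus records mention a discourse-related term.
# The filename and one-abstract-per-line layout are hypothetical.
import re

with open("smart_mobility_abstracts.txt", encoding="utf-8") as f:
    abstracts = [line for line in f if line.strip()]

pattern = re.compile(r"\bdiscourse\b", re.IGNORECASE)
hits = sum(1 for a in abstracts if pattern.search(a))

print(f"{hits} of {len(abstracts)} abstracts ({hits / len(abstracts):.2%})")
# With the figures above: 29 of 4910 abstracts, or 0.59%
```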


Focussing on what data is rather than what data does then misses how, even when protected, data can still affect society in material ways. In The Data Revolution Rob Kitchin explores this ‘performativity’: from data’s Latin root dare (to give), Kitchin observes that data has often been conceptualized as a neutral carrier of observed truth, implying ‘something given by nature’ (2). However, more often than not data is not given by but rather taken from others, with capta (‘that which has been taken’) better describing its ontological qualities (2). This shift from giving to taking lends a definition of data that comprises “all information that is taken from all that could potentially be given” (2-3). In effect, Kitchin’s move entails a research direction concerned with every instance, setting or activity where data is collected: not just privacy-contexts, not just surveillance-contexts but any context transformed by capturing data (25: data assemblages). Kitchin then proposes a research direction that concerns itself with “all technological, political, social and economic apparatuses and elements that constitutes and frames the generation, circulation and deployment of data” (Kitchin & Lauriault 1). By ‘unpacking’ these assemblages through ethnographies, interviews or readings, one can then gain insights into the heterogeneous activities and discourses in which data are embedded (under the name of critical data studies, 14).

1.5 TURNING TO LATOUR

This once again brings issues of language and technology to the fore, asking who speaks on behalf of data and what data does to people’s ways of speaking (Dalton & Thatcher pt. 2). This asks for a mode of inquiry that considers both language and technology as features inherent to technological developments, for which Bruno Latour’s actor network theory (ANT) offers a comprehensive approach, specifically its concepts of programs and stabilisation. Not so much a theory as a ‘methodological toolbox’, ANT puts forward a way of thinking about the visible and invisible networks binding actors and things (Latour Durable). By assigning agency to non-human actors (‘technologies’), Latour wishes to break down the social/material divide often observed in accounts on technology:

“Society and technology are not two ontologically distinct entities but more like phases of the same essential action ... If we abandon the divide between material infrastructure on the one hand and social superstructure on the other, a much larger dose of relativism is possible. When actors and points of view are aligned we enter a stable definition of society; when actors are unstable and the observers’ points of view shift endlessly, we are entering a highly unstable and negotiated situation” (129; Latour’s use of ‘domination’ is omitted).

By assigning agency to both humans and artefacts, Latour proposes a networked view of society that stabilizes/destabilizes depending on the degree of alignment between actors and artefacts (122). From this perspective, successful technologies are the stable hubs where viewpoints have been aligned and controversies are accounted for: “It is as if we might call technology the moment when social assemblages gain stability by aligning actors” (129). Although simple in premise, ANT yields great analytical power, as it moves away from sole technological or social pursuits to the very networks in which these are embedded. Riis elaborates on this premise, using Latour’s metaphor of the speed-bump to illustrate how ‘languages’ are spoken between actors and things:

Speed bumps speak their own unmistakably concrete-language … Instead of having a police officer standing by, day and night, to control speed, traffic engineers delegate this work to speed bumps. The agency of the speed bump can be understood as a complex of a number of agents ranging from police officers, engineers, politicians and construction workers to different sorts of materials taken from various places and times. The speed bump is a ‘technical delegate’ that redistributes the absence and presence of various agents and interferes directly with the daily life of urban car-humans (Riis 287-288).


This example demonstrates how technologies come to function in societies by means of ‘speaking’, comprising the ‘governing languages’ of police officers and politicians, the ‘construction languages’ of engineers and road-workers and the ‘driving languages’ of cars and chauffeurs to generate a new ‘concrete language’ of the speed-bump, embedding different ideals into a material artefact. While surely this idea of ‘talking speed-bumps’ seems odd at first, over time everyone comes to understand its vocabulary: mind your speed or damage your vehicle. These vocabularies can be said to become more ‘firm’ or ‘stable’ the more they are connected to by different actors, a process that has actors pursuing different programs (‘do not speed’) or anti-programs (‘speeding’)—the strategies put forward to realise certain goals (Schulz-Schaeffer 131). These techno-languages are said to stabilize when a large range of programs are accounted for, be it spoken/written statements (STOP signs, police tickets, a judge’s sentence), objects (thicker concrete, speeding cameras) or institutes (driver’s licenses, the DMV), either desired or undesired based on one’s perspective (Latour, Bijker & Law 261). That is not to say that stability is easily reached (if ever) but points to the urgency of investigating current ‘vocabularies’, as once stabilized these may become increasingly hard to transform (Latour Durable 105-106). Think how the ‘concrete language’ of the speed-bump can only be directly influenced by a small set of actors and will certainly not prevent all drivers from speeding; driver inattention, vehicle suspension or the existence of alternative routes act against a speed-bump’s desired outcome, requiring languages to be constantly renegotiated.

Latour’s concepts of programs and stabilisation then enable us to see that data protection might only materialise when linked to in practice: the more actors and their programs interlink, the more stable a language might become. Language in this sense can thus be understood as both technological practices and ways of speaking, comprising both the enunciations of actors and the range of activities that technologies enable (Coeckelbergh ch. 6: “It suggest that language is a technology; we use tools and words, sentences, etc. as much as we are used by them”). From this perspective, languages take on similar characteristics as programming languages, comprising multiple subprograms that need to work together to realize functioning systems, just as words within sentences can be combined to relay different concepts and ideas. Where Spiekerman & Cranor, Solove and Hoepman have outlined various data protection strategies [1.1-1.4], from a Latourian perspective these might best be understood as a set of distinct programs actors might want to pursue, each characterised by certain collection/protection activities and specific ways of speaking (‘risks vs security’, ‘privacy violations’, ‘shoulds’, and so on). Although ANT covers a lot more ground, this broad conception of ‘language’ already yields great analytical power, enabling technologies and their surrounding languages to be traced in terms of the programs actors align with. In fact, this is exactly what ‘tracing’ entails: a mode of inquiry that ‘retraces’ human and technological strategies by decomposing the networks in which they are embedded.9 While this manner of research has been rightly criticised (most notably for ignoring processes happening outside actor networks and for equating humans and artefacts; assigning ‘desires’, ‘interests’ and ‘strategies’ to technologies risks dismissing the human hardships and struggles defining and embattling the systems exerting control over people’s lives; Müller 31; Whittle & Spicer; Winner), I will not use Latour’s concepts as determinants of technological developments but rather as coarse indicators of how stable (or unstable) current data protection languages might be. To explore this premise, the next chapter formalises some of Latour’s notions, resulting in a framework for tracing actor programs and their languages.

9 Latour: “There is no difficulty in seeing that [ANT] is not about traced networks but about a network-tracing activity. As I said above there is not a net and an actor laying down the net, but there is an actor whose definition of the world outlines, traces, delineate, limn, describe, shadow forth, inscroll, file, list, record, mark, or tag a trajectory that is called a network. No net exists independently of the very act of tracing it, and no tracing is done by an actor exterior to the net” (Clarifications 14).


chapter two

tracing discourse

This chapter describes methods for tracing the languages of actors. From a Latourian viewpoint, this asks to describe which programs actors might pursue, who the pursuing actors are and how their programs might be visible in their language (Law; Coeckelbergh). For this, the research question’s wording yields some clues, specifically its usage of key.10 Rather than posing this question from the perspective of an existing language (‘Does a data protection language exist?’) or homing in on one actor group (‘How does Alphabet discuss data protection?’), the usage of key is highly specific. By using this term I approach discourse as a composition of actor voices: in key at times and dissonant at others. Conceptualising discourse this way has two advantages. First, it stays close to its chief definition (“a continuous stretch of language”; Haase) and its humanist interpretation (“a particular way of talking about and understanding the world”; Jørgensen & Phillips 1), positioning discourse as a composition of voices speaking of things in particular balanced/unbalanced ways. Second, it simplifies (statistical) summaries, describing discourse in terms of notability: Which voices sound loudest? Which are lost in the cacophony? Are they balanced or out of tune? For this, metaphors such as volume and balance are introduced to tailor Latour’s notions of programs/stabilisation to language inquiries, to which I will return in a moment.
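To preview how such summaries might look, the sketch below is my own illustration of what volume could mean statistically; the program keyword lists and the toy corpus are invented, not taken from the corpora analysed later. Volume is expressed as the share of corpus tokens matching a program’s keywords, with a skewed distribution across programs signalling an unbalanced discourse.

```python
# Sketch: the 'volume' of program keywords in a toy corpus.
# Keyword lists and corpus text are invented for illustration.
from collections import Counter
import re

programs = {
    "hide":   ["encrypt", "anonymise", "pseudonym"],
    "inform": ["notify", "transparency", "dashboard"],
    "expand": ["collect", "insight", "optimise"],
}

corpus = ("We collect trip data to optimise routing and gain insight. "
          "Sensitive fields are encrypted; users can notify us via the dashboard.")

tokens = re.findall(r"[a-z]+", corpus.lower())
counts = Counter(tokens)

# Volume: share of all tokens matching a program's keywords
# (prefix matching as a crude stand-in for stemming).
volume = {name: sum(c for tok, c in counts.items()
                    if any(tok.startswith(kw) for kw in kws)) / len(tokens)
          for name, kws in programs.items()}

for name, share in sorted(volume.items(), key=lambda kv: -kv[1]):
    print(f"{name:<8}{share:.1%}")  # loudest voices first
```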

Using this terminology also marks a decided shift from ‘traditional’ actor-network analyses. Where for Latour, discourse is but one level at which to analyse actors (“Latour would analyse relationships between media, scientists, politicians, technologies, and the changes in nature itself”; Bryant), I take on a smaller task, limiting myself to actor voices instead.11 Taking this further: if we take discourse to be a composition of actor voices, its decomposition helps reveal what these voices speak of (to quote Latour’s Compositionist Manifesto, “What is to be composed may, at any point, be decomposed” 474). Asking whether data protection is key to smart mobility discourse then asks for three elements to be decomposed: current data protection programs, smart mobility discourse and ways of gauging keyness.12 The rest of this chapter is structured according to these objectives. First, current data protection programs are identified [2.1]. Second, language corpora are constructed, using the ‘web as a corpus’ to represent a slice of discourse [2.2]. Third, a method for tracing the alignments and stable languages of developer texts is proposed, merging quantitative (co-word analysis) and qualitative (KWIC-coding) methods to identify which programs can be considered key to the language of developers [2.3].
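As KWIC-coding carries the qualitative half of this method, a bare-bones keyword-in-context extractor may help fix the idea. The sketch below is my own; the window size and sample sentence are arbitrary choices rather than material from the developer corpora.

```python
# Sketch: keyword-in-context (KWIC) extraction. Window size and
# sample text are arbitrary illustrations.
import re

def kwic(text, keyword, window=5):
    """Yield (left context, keyword, right context) for each match."""
    tokens = re.findall(r"\w+", text)
    for i, tok in enumerate(tokens):
        if tok.lower() == keyword.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            yield left, tok, right

sample = ("We encrypt all trip data before storage and encrypt backups, "
          "so personal information stays hidden from plain view.")

for left, kw, right in kwic(sample, "encrypt"):
    print(f"{left:>35} | {kw} | {right}")
```

Each concordance line can then be coded by hand, linking the occurrence to the program (or anti-program) it aligns with.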

2.1 IDENTIFYING PROGRAMS

The first part of the question asks to identify current data protection programs, providing the strategies (or simply, ‘categories’) that will be assessed in the language of actors. This is no trivial task, as the regulatory ideals of the GDPR might be regarded as anti-programs to the pursuits of data collectors just as developers’ data collection efforts can be regarded as anti-programs to the privacy desires of users. This perceptual ambiguity may be due to Latour’s distinctive use of ‘anti’, often connoting a ‘negative or op-

10 How key is data protection to smart mobility discourse?

11 ‘Following the text’ rather than ‘following the actor’ (see 2.2).
