Privacy Protection and the Digital Trail

A Critical Assessment of the Idea of Personal Information and the Control Paradigm for the Protection of Privacy

Mark W. van Dorp - 10090436
Master Philosophy 2015
University of Amsterdam

Supervisor: prof. dr. B. Roessler
Second Reader: dr. G. van Donselaar


“Yeah, yeah, but your scientists were so preoccupied with whether or not they could that they didn't stop to think if they should.”


Acknowledgments

If ever there was a time I have felt the importance of some measure of privacy it has been during the process of writing this thesis. The opportunity to be alone with one’s thoughts I have found to be of fundamental importance in the process of philosophical deliberation. An opportunity I perhaps would have liked to have a bit more of over the last couple of months. Fortunately there is much more to writing a thesis than solitary deliberation. Perhaps even more important is the chance to debate your argument and have your thoughts challenged. I would like to thank my supervisor, Beate Roessler, for doing just that. Our vivid and informative meetings helped me to organize my thought and structure my argument. I would also like to thank my second reader, Gijs van Donselaar, who for the second time during my academic career at the UvA has been so kind to take the time to read my contemplations. A special thank you to my colleagues, friends and family, who put up with me during this time, both in discussions as well as absence. And last but not least to my girlfriend for her patience and support. I hope our little girl will inherit your cheerful demeanor.


Table of Contents

Introduction ... 5

Chapter 1. Not All Information is Created Equally ... 8

Personal Information and Identifiability ... 9

Challenging Identifiability: The Digital Trail ... 10

The Data – Information – Knowledge Triad ... 17

Big Data, Data Mining and the Digital Trail ... 18

Conclusion ... 21

Chapter 2. How We Currently Protect Our Online Privacy ... 22

Online Privacy Strategies ... 22

The Control Paradigm ... 25

The Control Paradigm and the Data Trail ... 28

Conclusion ... 33

Chapter 3. Two Alternative Approaches to the Protection of Privacy ... 35

Contextual Integrity ... 35

Accountability, Agency and Algorithmists ... 39

Conclusion ... 42

Chapter 4. Protecting Privacy in Light of the Digital Trail ... 44

Accountability ... 44

Method ... 46

Values, Ends and Purposes ... 48

Conclusion ... 52


Introduction

New technologies appeal to me. Not only do I own an iPhone; I can operate my thermostat when I'm away from home and have my lights switch on when I arrive. When I have an appointment I let Google Maps tell me how to get there and how long it will take. And when I need to get gas along the way, I like that I don't have to have cash on me. What I don't like, however, is the fact that all these things generate data about me. Not so much because I feel watched, because I don't, but because I cannot be sure what this data will eventually be used for. Apparently the type of oil I buy isn't just information about the fact that my car needed oil; it can be interpreted to reveal so much more about me: "[P]eople who bought cheap, generic automotive oil were much more likely to miss a credit-card payment than someone who got the expensive, name-brand stuff. People who bought carbon-monoxide monitors for their homes or those little felt pads that stop chair legs from scratching the floor almost never missed payments. Anyone who purchased a chrome-skull car accessory or a "Mega Thruster Exhaust System" was pretty likely to miss paying his bill eventually." This is one of the many cautionary tales told by a well-known commentator on privacy and big data, Charles Duhigg (2009). It paints a dystopic picture of a world where corporations know more about us than our loved ones, or even we ourselves, do. New technologies have often been cause for privacy concerns. Given the enormous impact digital and online technologies have on our lives, it shouldn't come as a surprise that the topic of privacy has earned a firm place as a recurring subject in current public debates. Academically it seems to be a veritable Hydra of a subject; you cut off one head and two seem to grow back. In practice, however, people seem to be a lot less worried and trade their privacy, without too much thought, for other goods like convenience, safety or economic benefit1. In the meantime the guardians of privacy are increasingly outraged, while their opponents offhandedly dismiss privacy as an outdated concept; and get away with it too. All of this leads to a polarized discourse, mostly resulting in irreconcilable views, like the call for strong privacy protection on the one hand2 versus, for example, a denial of privacy as a contemporary social norm on the other3.

1 “Seem” to be a lot less worried; as a recent report suggests people are more resigned than willing to trade personal information for certain benefits (Turow, Hennesy, Draper, 2015).


When we try to resolve these tensions, the first issue we face is that a right to privacy is invoked in very disparate cases. Privacy seems to be a concern when a neighbor peeks through our window, when the police tap our phone, when a database with medical data is hacked, but also when a government interferes with the choice of religion; all very distinct cases where different things seem to be at stake, yet all considered under the one umbrella of a "right to privacy". In this paper I will limit myself to one of these issues, particularly the one that deals with personal information about individuals: so-called informational privacy.

Many interests seem to collide around the subject of online privacy. While e-commerce, online marketing and the promise of big data produce significant economic and societal benefits, people's privacy interests seem to be increasingly at stake. This is caused, among other things, by the fact that the nature of much of the information is relatively new and its impact on privacy unclear. Personal information appears in many ways, shapes and forms, but should all its manifestations be considered private by default? It is intuitively clear that my medical or financial information as held by my doctor or my bank is personal information and in at least a sense private, due to its sensitivity in this case. But what about the information I voluntarily and openly post on a social networking site? Personal, yes, but is it still private? It becomes even harder when we consider pieces of information that aren't personal in the traditional sense of being intimate or confidential (Wacks, 2010) or identifiable to the person (EU Directive 95/46/EC). The technologies of digital storage of information as well as the extensive interconnectedness through the Internet have created many new ways of gathering different types of data. As the wealth of literature on the subject suggests, the ability to combine and interpret all these types of data in novel ways seems to warrant attention in terms of privacy. This paper adds to that discussion and deals with part of the data generated and made possible by these technologies, the data we generate in the wake of using the Internet and digital devices: our "digital trail" or "data exhaust" (Mayer-Schönberger & Cukier, 2013).

Partly responsible for the difficulties with the protection of our privacy is the conception of what to protect, namely "personal information". As I will try to show, the idea that it is a "specific something" we can protect seems to have become problematic in certain cases, especially because the data we store can be used in so many ways (many of which are beneficial) and at unspecified times in the future. One of the interesting aspects of the digital trail is the fact that the need for mediation between harms and benefits is so omnipresent, as big data offers so many (potential) advantages on both a societal and an individual level. So while we want to protect our privacy in a meaningful way, we at the same time want to avoid being too restrictive towards the many advantageous uses of data. Both the "what" and the "how" will therefore be the main questions I will be preoccupied with in this paper, or: "When considering the digital trail, how can we protect our privacy in a meaningful way without being too restrictive towards its beneficial uses?"

I will start my inquiry by discussing some of the difficulties with the idea of "personal information". Then I will try to show why the currently widely followed paradigm of privacy as personal control is less than helpful in the case of the digital trail. Once I have done this I will offer an alternative in data-user accountability and investigate what values are invoked through our privacy concerns when dealing with this type of data. The idea is that getting a clearer image of what exactly it is we are protecting will make it easier to inform decisions and protective actions. The ultimate goal is the ability to make informed decisions on how to handle cases where our privacy is at stake, without being too restrictive towards beneficial uses of all the data we dissipate on a daily basis. I will follow this structure: chapter one will be about data. I will start by clarifying the current state of affairs, what conception of data/information I find to be problematic and what is new about this situation. In chapter two I will discuss protection. I aim to show why the control paradigm fails in the case of the digital trail. This will be followed in chapter three by the discussion of two alternative approaches to protection, informed by the work of Helen Nissenbaum and Viktor Mayer-Schönberger & Kenneth Cukier, by way of contextual integrity, delegated control and accountability. I will bring all this together in chapter four by offering a synthesis of these two approaches and underpinning the resulting perspective with prevalent norms and values. By way of this line of reasoning I hope to offer a useful guide for the appropriate and legitimate use of the digital trail and big data's many benefits, while in the meantime offering meaningful protection of people's privacy.


Chapter 1. Not All Information is Created Equally

There is no denying that one of the signs of our times is that we increasingly live our lives online. We communicate, shop, entertain ourselves and get informed in the digital realm. In the process of these activities we continuously communicate a lot of information about ourselves to the world around us. We post photos or messages to social media; we leave our address and financial details with the sites we shop at and we chat about our day-to-day lives with friends and family. But what many do not realize is that by far not all the information we communicate is communicated consciously, let alone willingly. While in many aspects of our lives we share information unconsciously, the online domain has given rise to some technology-based systems and practices that fundamentally transform this condition and therefore stimulate public controversy (Nissenbaum, 2010, p. 19ff). First of all, there are the ways in which the information can be gathered (surreptitiously or otherwise): companies like Google continuously monitor and track people and thereby amass enormous amounts of data, of most of which people are unaware. Secondly, there are the capabilities for storing and analyzing this information: with storage becoming cheaper every day the gathered data can be stored indefinitely and accessed at any time. And through the enormous computing power companies increasingly have at their disposal, the ability to analyze this data is snowballing. It is unclear what, if any, are the limits to this ability, and so the future (or option) value of data has become enormous (Mayer-Schönberger & Cukier, 2013, p. 102). Thirdly and lastly, the increased capacity for publication and dissemination brings endless new varieties of the distribution of information.

What this short overview shows is that the structural differences between the situation in which we unwittingly share information offline and the one where we disseminate data online are becoming vast, as are the differences between the applications of all this data. As is often pointed out, the upsurge in the scale of data gathering is causing the state of the data to change: the quantitative change is said to cause a "qualitative change" (Mayer-Schönberger & Cukier, 2013; Nissenbaum, 2010; Millar, 2009). And while authors like Mayer-Schönberger & Cukier assess this situation for its positive sides and mention the "ability of society to harness information in novel ways to produce useful insights or goods and services of significant value" (2013, p. 2), to many others the changing role and use of data should be considered a "serious social problem" (Turow, 2001, p. 7).

This first chapter will be about data. Or rather, it will be about data, information and whether we should consider them to be the same thing. I will evaluate a commonly held view, as expressed in much of the literature and legislation, on what type of data should be considered personal and for that reason warrants privacy consideration. As a first step I will reflect on the fact that for many such views identifiability plays a crucial part in demarcating the personal from the impersonal. I will then introduce a type of data I will call the digital trail, which is a type of data that in at least some ways seems to challenge the logic of identifiable personal information. What makes this type of data notable is that, because of, amongst other things, its seemingly innocent nature, we do not normally consider it to be private. However, because of ever-increasing analyzing capabilities it can yield privacy-sensitive insights and therefore perhaps deserves reappraisal. I will clarify this claim by way of the data-information-knowledge distinction and a short assessment of the technology of predictive data mining. I aim to show that because of the technology of predictive data mining people's privacy can be affected without the source data having to be personal (i.e. identifiable), thereby undermining the concept of personal information as the focal point of protective measures.

Personal Information and Identifiability

Many of the things we do online leave traces, traces that can relate to what we do, where or who we are and even what we want. Not all these traces can be connected to us as individuals, but surprisingly many can. Nowadays companies have a lot of tools at their disposal to know who is at the "other end of the line". IP addresses, cookies, beacons and digital fingerprints, to name just a few, are all ways to identify a specific user (Turow, 2011, p. 167). When contemplating privacy concerns, the primary distinction that is made is which of these traces should be considered personal and which should not. Whether information is personal, and therefore of interest in privacy terms, is often interpreted in one of two ways. One is substantive and looks at the "sensitivity" of the information, in other words: is the information intimate or confidential (Wacks, 2010, p. 114)? The other way is procedural and concerned with identifiability: can the information be related to an identifiable person (EU Directive 95/46/EC)?


As the substantive account constitutes a subset of the procedural account – for information to be intimate or confidential for a person it has to at least be relatable to her – I will take the procedural account as my starting point. So: for a piece of "trace data" to be considered "personal information" according to this procedural account it should somehow be connected and traceable to a natural person. Identifiability is therefore often seen as the defining feature of personal information, as is reflected in the EU Directive on data protection:

"[P]ersonal data" shall mean any information relating to an identified or identifiable natural person (“Data Subject”); an identifiable person is one who can be identified, directly or indirectly, in particular by reference to an identification number or to one or more factors specific to his physical, physiological, mental, economic, cultural or social identity.

(EU Directive 95/46/EC - The Data Protection Directive, Article 2a)

Once information is recognized to be of a personal nature it is, or at least should be, protected from certain appropriation and/or use. This protection is typically offered by either limiting the possibility of collecting said data (by way of prohibition) or having people control said collection (by, for example, notice and/or consent). In most of the currently used schemes for the protection of online data the emphasis is primarily on the latter measure, focused on the moment of collection of said data. That is, when data about a person is collected, she needs to be informed about the purpose for which this data is gathered and she has to consent to its storage. A lot of doubt has been cast on the effectiveness of informed consent as an appropriate protective measure (Millar 2009; Solove, 2013; Zuiderveen Borgesius, 2014; Turow, Hennesy, Draper, 2015) and in chapter two I will go into the aptness of the control paradigm, but I will start out with an analysis of the focus on "personal information" as a distinct entity that should be protected. I cast doubt on the notion of personal information as an adequate concept for the protection of privacy by looking at all the small pieces of information that are not normally considered personal: our digital trail.

Challenging Identifiability: The Digital Trail

What can be wrong with determining which information should be considered private based on the idea of identifiability? A lot of legislation is in place that regulates the collection and processing of many types of information, but the question arises whether those policies can – and should – be used to protect the collection and processing of all data in the same way.


The main reason for this is that under this notion of privacy all data properly stripped of its identifiers is of no privacy concern. So all data and connected identifiers like IP addresses, cookies and phone numbers are classified as being of privacy interest, but data that is properly anonymized is not.

[W]hereas the principles of protection shall not apply to data rendered anonymous in such a way that the data subject is no longer identifiable.

(EU Directive 95/46/EC - The Data Protection Directive, Recital 26)

So under this scheme a lot of data can be collected legitimately, but as we will see in the coming paragraphs the advances in technologies and practices, often called big data, challenge the legitimacy and desirability of this situation. What then can be considered an adequate measure for the protection of our privacy in light of these new technologies and practices? Why wouldn't these measures be sufficient? Don't our current notions of privacy already consider this data to be personal and private? In order to understand the problems we are facing I would like to examine an oft-heard assumption that much of the data that is gathered about us is rather innocuous in and of itself and hardly personal. The Interactive Advertising Bureau (IAB), the organization that looks out for the interests of the online advertising industry, for example relies on the concept of identifiability to defend online targeting practices and states the following about the nature of the data used in online advertising:

Generally, the Advertising Industry relies solely on non-personally identifiable information that it collects through a computer's browsing experience, so they don't actually know [sic] identity of individual consumers. Many consumers do not realize that the web data collection capabilities are much more limited and less granular in nature than more traditional offline practices. For example, for many years offline companies have been analyzing our shopping behavior across retail through the use of data captured through our credit card spending. Yet, the idea that data is captured online feels more personal even though it is not personally identifiable, and instead used for anonymous and browser-based targeting4.

Identifiability seems to be an intuitively sound point of demarcation for what should be included in the personal. It can be stated that when, for example, the fact that a person was at a website at a specific time is stored on a server somewhere, but that data is properly stripped of its identifier, it seems unreasonable to claim someone's privacy was breached by doing so. Anonymized data on, for example, the type of software version or phone people use while browsing the web, what they click on or how they move their mouse is not normally considered personal or private; nor, perhaps, should it be? And this is exactly what happens at an ever-increasing scale: companies hungrily gather all these "snippets" of data, mostly of this seemingly anonymous, innocuous and initially innocent nature.

One specific manifestation of this type of data is the data generated as a byproduct of our actions online: the data we can call the digital trail, digital footprint or data exhaust.

A term of art has emerged to describe the digital trail that people leave in their wake: “data exhaust.” It refers to data that is shed as a byproduct of people’s actions and movements in the world. For the Internet, it describes users’ online interactions: where they click, how long they look at a page, where the mouse-cursor hovers, what they type, and more.

(Mayer-Schönberger & Cukier, 2013, p. 113)

The terms digital trail, digital footprint and data exhaust can all be used interchangeably. The trail or footprint analogies should however not be understood as a reference to a linear, step-by-step process or the idea that "if you trace the trail long enough you end up with the person". The exhaust comparison does justice to the more molecular and untraceable nature of this type of data, but is lacking, at least in my view, on the aesthetic level. What all three terms try to convey is the granularity and ostensible insignificance of this particular sort of data.



Now what then is this data we leave in the wake of our digital activities? A good place to start getting a clearer image of what this category of data entails is by looking at what one of the main online players says in this regard: Google. Google is relatively open about what data it collects. In its privacy policy5 Google specifically mentions the different types of data it stores6:

• Device information; information regarding the device you use to go online, including hardware, software and phone number.

• Log information; details about how you use the Internet, like search queries, IP address and cookies, but also your call list including time and duration of calls.

• Location information; Information on where you are when you are using Google’s services, like location, GPS information, sensor information and IP address.

• Unique application numbers; Information regarding what applications you use and what version they are.

• Local storage; certain information, including personal information, stored on the device you are using, like browser history and application usage history.

• Cookies and anonymous identifiers; various technologies to identify who you are when you are using Google services or sites that do.

Obviously much of this data is already considered private under current protective schemes, mostly protected by the method of notice and consent. This consent a company like Google often obtains by offering free services in return. This is a trade that may be considered illegitimate (Roessler, 2015), but that debate lies outside the scope of our current project. What is significant for our purposes here is that a rather large part of this data, when properly anonymized7, can be obtained legitimately even without the subject's consent (see the previous paragraph). The following pieces of data can be considered to be part of the digital trail:

• Device hardware;
• Device software;
• Search queries;
• Duration of calls;
• Location information;
• GPS information;
• Sensor information;
• Application information;
• Browser history;

5 http://www.google.com/policies/privacy/#infocollect

6 The reader should take note that not all the listed data should be considered part of the digital trail or part of the problem we are discussing here.

7 It should be noted that in many cases the same data can also be used un-anonymized once consented to by the individual. This doesn't change the nature of the data or disqualify it as part of the digital trail.

And one type of data Google omits, but which is of considerable importance:

• Purchase information.

The above is not intended as a complete list8, but what it is meant to show is that a lot of information, under privacy protection schemes that focus on identifiability, is up for grabs and can be used freely and unlimitedly once collected.
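To make the granularity of this kind of data concrete, the following is a minimal, hypothetical sketch of what a single anonymized trail "snippet" might look like; the field names and values are invented for illustration and do not reflect any actual Google schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TrailEvent:
    """One hypothetical, anonymized snippet of the digital trail.

    Note the absence of a name, address or account number: under an
    identifiability-based scheme none of these fields counts as personal.
    """
    timestamp: str                       # when the event happened
    device_model: str                    # device hardware
    os_version: str                      # device software
    search_query: Optional[str] = None   # log information
    gps_lat: Optional[float] = None      # location information
    gps_lon: Optional[float] = None
    clicked_url: Optional[str] = None    # part of a clickstream
    purchase_item: Optional[str] = None  # purchase information

# A single, seemingly innocuous event:
event = TrailEvent(
    timestamp="2015-03-02T23:41:00Z",
    device_model="iPhone 5s",
    os_version="iOS 8.1",
    search_query="cheap motor oil",
    gps_lat=52.37,
    gps_lon=4.89,
)
```

Each field on its own says very little; the privacy questions discussed below only arise once millions of such events are stored indefinitely and combined.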

Now let us delve deeper into the possible uses of these data, in both their beneficial and harmful ways. For the sake of brevity I will make a selection of three of the previously mentioned types of data. The choice for these three is based on the fact that de-anonymized they are mostly of an intuitively personal and private nature, while anonymized they become "snippets" of information that many people share in the exact same way, making this data get "lost in the crowd", so to speak, and thereby apparently rendering it harmless. Furthermore, these types of data can be processed into information with clearly beneficial as well as harmful uses. The three types of data I will be considering are:

• Location information

• Log information; Clickstreams and Search queries

• Purchase information

Now with these pieces of “trace data”, or parts of the digital trail, in hand we can have a look at ways this data can be used when combined or analyzed by way of one of the novel systems and practices I mentioned in the introduction of this chapter.


8 And in the case of Google it is doubtful anonymization is sufficient protection as it is likely Google would be able to recombine the data in sensible ways.


Location information - Because of the widespread use of smartphones with GPS capability, many companies can keep track of my location over time. Not only my mobile operator, but also companies like Google, Apple and many more can keep track of my whereabouts and those of many others. On an individual level, sharing this information enables us to use a variety of mostly free tools and services that greatly enhance our lives, one obvious example being free navigation software. On a collective scale, sharing this information offers even more benefits. Letting companies store and analyze this data enables them to give us highly accurate and up-to-date traffic information and even predictions9. Information that before was not available, or only at great cost, is now freely available to everyone.

It becomes less harmless when this information is combined with, for example, other, often openly available, information sources. When we link the anonymous location information to a database of crime rates, for example, correlations might be unearthed that were not visible before. It could be shown that people driving a certain route at a certain time of night have an increased risk of committing a crime, prompting for example invasive police inspections (see Reiman's extrinsic loss of freedom in this regard: 2004, p. 201ff).

Log Information; Clickstreams and Search queries - The storage and analysis of log information like, for example, clickstreams and search queries have many, often unexpected, benefits. For an individual website, overall usability can be improved by analyzing clickstreams. Or, by analyzing search queries, a search engine's results can be improved, thereby improving people's access to information. An example of a more indirect benefit is the fact that Google uses the typing errors that people make in their search queries to build one of the most powerful spelling checkers available today (Mayer-Schönberger & Cukier, 2013, p. 39). And even more ancillary is the fact that researchers have been able to accurately predict flu outbreaks in certain areas based on people's search queries10.

The possible harms are clear once we recognize that clickstreams, especially when combined across websites, can paint an insightful picture of my current wants, needs and interests. And what is even more unnerving is the fact that the seemingly anonymous search queries can be traced back to the person with relative ease.

9 An example of this is http://livetraffic.tomtom.com

10 https://www.google.org/flutrends/


This was shown by the NY Times, which was able to connect an anonymized set of search queries to 62-year-old Thelma Arnold, revealing, as she stated, "her whole life"11.

Purchase information - Purchase information is the last example of data that can have both beneficial and harmful uses. Amazon and Spotify are well-known examples of companies that use purchase information to enhance their customers' experiences. By employing big data analysis on their vast databases of purchase information they can offer shoppers valuable recommendations on what else they might be interested in, thereby helping them discover new books or music at a whim, or even predict what they might like tomorrow (see Amazon's predictive analytics: Burg, 2014).

A negative effect of this recommendation model is the so-called "filter bubble" (Pariser, 2011). By making assumptions about someone's preferences based on what other people viewed, liked or purchased, and recommending certain content based on those assumptions, we are offered a limited version of the world. And once we act on that version of the world we are offered new recommendations, based on the same assumptions, capturing us in a bubble of reduced options based on computerized filters and models.
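To make the mechanism behind such recommendations, and the loop that produces the bubble, more tangible, here is a deliberately simplified sketch of item-to-item recommendation based on co-purchases; it is a toy illustration with invented data, not Amazon's or Spotify's actual algorithm.

```python
from collections import Counter
from itertools import combinations

# Toy purchase histories (invented data, not a real dataset).
baskets = [
    {"book_a", "book_b"},
    {"book_a", "book_b", "album_x"},
    {"book_b", "album_x"},
    {"book_a", "album_y"},
]

# Count how often pairs of items are bought together.
co_occurrence = Counter()
for basket in baskets:
    for pair in combinations(sorted(basket), 2):
        co_occurrence[pair] += 1

def recommend(item: str, top_n: int = 2) -> list:
    """Recommend the items most often co-purchased with `item`."""
    scores = Counter()
    for (a, b), count in co_occurrence.items():
        if a == item:
            scores[b] += count
        elif b == item:
            scores[a] += count
    return [other for other, _ in scores.most_common(top_n)]

print(recommend("book_a"))  # e.g. ['book_b', 'album_x']
```

Because the function only ever recommends what similar baskets already contain, acting on its suggestions feeds the same co-occurrence counts back into the model, which is the feedback loop described above.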

I hope two things have become clear from this more detailed look at these three types of data. The first is that a data-set can just as easily be used in beneficial as in harmful ways, and the second is that what is significant about the current state of data and its analysis, generally catalogued under the term "big data", is that it matters less and less if it can indeed be linked directly to a person by way of a unique identifier like, for example, a name or an address. Millar (2009, p. 111) shows us that through predictive data mining and psychological profiling a lot of accurate predictions can be made about habits, interests, beliefs, intentions, desires and background, and therefore in a very real sense about "who I am". But not only that: all these initially innocuous pieces of data can be processed into many new and different pieces of information with widely different effects and purposes. So while in most literature on privacy the terms data and information are used interchangeably, I find this to lead to confusion. Therefore I would now like to take a closer look at what we can understand data and/or information to be in this regard. Is data essentially the same thing as information and deserving of the same treatment, or can we discern differences that warrant another approach?

The Data – Information – Knowledge Triad

As has been observed by many, information is not "intrinsically anything" (Nissenbaum, 2010, p. 153). Information has no distinct ontological status; instead what it is – its meaning – is determined by, for example, its place in a certain context. The same piece of information can signify many different things depending on the subjects, time and situation it is shared in or how it is interpreted. My health status for my doctor is information on a course of treatment, while the same might be a reason for my insurance company to increase my premium, and for my friends and family it can be a cause for concern. That is why it can be confusing to locate the source of privacy concerns in the type of information, for example the type categorized as "personal". To clarify why this is, let us take a closer look at the Data-Information-Knowledge triad (Ackoff, 1989)12; not as individual ontological categories, but as distinctions we can use in the evaluation of privacy matters.

What then are the distinctions we can draw between data, information and knowledge? Data we can understand as the as yet unorganized, "atomical" facts of the world we store somewhere, for example on a server. As Gitelman (2013) warns us, we should be careful with too atomical or "raw" a view of data. Data, she points out, is always "cooked", in that it is always collected, stored and transmitted under predefined circumstances and situated within certain institutions and practices (ibid, p. 3). I will not be defending one conclusive perspective on the ontological status of data here, nor do I need to. What I want to argue for is that there is a way of looking at data as the building material for information, and that therefore a distinction can be made between the two terms. Under this view, for example, the specifics of the length of one's body or the number of days since birth (not yet interpreted as height or age) can be considered pieces of data. For this data to become information it has to be interpreted, organized and put into context: "Information is data endowed with relevance and purpose" (Drucker, 2006, p. 129). For example, the individual heights and ages of a group of children in a certain age range can be put into a table so as to discover statistical relationships. This information can subsequently become knowledge, like for instance the fact that a specific child is behind on the growth curve. And knowledge can in turn lead to action13, like the administering of growth hormones. It is important to keep in mind that this data-information-knowledge distinction is not meant to be strictly uniform or clearly demarcated. What it tries to show is that the same piece(s) of data can be interpreted in multiple ways and lead to different kinds and pieces of information, which in turn can lead to multiple courses of action. It helps to turn our gaze to the step where data is turned into information and subsequently processed into knowledge, because this is where it can lead to the morally relevant step of action14.

12 Ackoff additionally adds the stage of Wisdom to this list, but we will not need this stage for our purposes here.
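To illustrate the data, information and knowledge steps just described, here is a small sketch in which raw measurements are organized into a table and then interpreted into an actionable flag; the numbers and the cut-off rule are invented purely for illustration.

```python
# A toy illustration of the data -> information -> knowledge steps
# described above. All figures and the cut-off rule are invented.

# DATA: uninterpreted facts (days since birth, body length in cm).
raw = [(1500, 98.0), (1480, 104.5), (1530, 101.0), (1495, 89.0)]

# INFORMATION: the same facts organized and given relevance,
# here as age in years and height relative to the group mean.
ages = [days / 365.25 for days, _ in raw]
heights = [h for _, h in raw]
mean_height = sum(heights) / len(heights)
table = [
    {"age_years": round(a, 1), "height_cm": h, "deviation_cm": round(h - mean_height, 1)}
    for a, h in zip(ages, heights)
]

# KNOWLEDGE: an interpretation that can ground action, e.g. flagging
# a child who is well below the group's growth curve.
flagged = [row for row in table if row["deviation_cm"] < -5]
print(flagged)  # the child measuring 89.0 cm is flagged here
```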

Understood like this, we can appreciate the problems that arise with current schemes that focus on protecting a particular type of information, i.e. the "personal". Much of what is collected about people is very rudimentary data, which is not interpreted into information and therefore often not (yet) personal in the traditional sense. A further complicating matter is that these schemes don't take into account the option value of data, as their focus is primarily on the initial point of collection and not on any later use. This is problematic, because much of the significance and many of the challenges of the "datafication" of our world lie exactly in this potential for use and reuse of data (Mayer-Schönberger & Cukier, 2013, p. 73). So the challenges I put before many of the current protective schemes concern their conception of what information should be considered private and at what point we should protect it. More on the point of protection will follow in chapter two; I will now conclude this chapter by elaborating on the idea that "personal information" is inherently problematic in light of the digital trail and data mining practices.

Big Data, Data Mining and the Digital Trail

When we now combine the pieces of the puzzle laid before us, we get a hint of what can be considered problematic about our current situation. The nature of the digital trail, its lack of identifiability and the data-information distinction, combined with the new technologies and practices belonging to the science of "big data" that enable piecing together data in new and unforeseen ways, have caused a change in the way we should perceive certain data and the way it is and can be handled. What remains for this chapter is to take a closer look at these technologies and practices called big data and data mining and how they bring about the changes that deserve our attention. Defining what big data is appears to be just as difficult a task as finding a catchall definition of privacy. An intuitive first step is to look at the size of the datasets, but as we have seen, Mayer-Schönberger & Cukier (2013, p. 6) point out that the change caused by big data is not only quantitative, but qualitative as well. They mention three major moves away from the traditional way of thinking about and dealing with data: more is better (2013, p. 19ff), messiness is okay (2013, p. 32ff) and what is important is correlation rather than causation (2013, p. 50ff). Big data in these regards fundamentally departs from entrenched practices, because while data used to be about carefully creating statistically representative samples from a population with a predetermined question in mind, one of big data's main aims is unearthing – or mining – new and unforeseen insights from datasets: datasets that should be as large as possible (i.e. close to n=all), that can have a certain degree of messiness and that can possibly be combined with several other datasets. These departures from the old way of handling data also bring about changes in the collection of data. The more is better attitude is an incentive for ever-increasing data gathering and "datafication" (ibid, p. 15), all with the ultimate goal of "Knowledge Discovery in Databases" (KDD; Millar, 2009, p. 105).

13 Marx (2012), while not calling it DIK, does discuss a similar move in his seven surveillance strips, where step 4 is data processing/analysis of raw data, step 5 data interpretation and step 6 uses/action.

14 Scholars like Reiman (2004) would disagree with this view, as for them the storing of data in itself has a moral component, because of matters like panoptification and chilling effects. While not unsympathetic to these views, when followed consistently I do see them leading to an overly inclusive view on what data should be considered private and therefore too limiting to beneficial uses of big data.

This is the main component that sets apart big data and its mining for knowledge from traditional ways of dealing with data: the advent of emergent data (ibid, p. 111ff). Or rather, to keep in line with the DIK-triad, emergent information. The concept of emergent information points to the possibility of discovering new information in existing data and thereby splits apart data and information into distinct entities more fundamentally than before. Previously data and information weren't as distinct, as data was gathered in such a way that it was more or less directly information. In the big data mindset, data should be seen as building blocks and as a source that can be mined over and over again for all kinds of new and unpredictable information. This brings about new challenges to the way we handle this data and its resulting information. For example, data that is not considered private can yield private information by the process of data mining. Or private information that was legitimately obtained can generate new private information about the data subject, which should lead to questions about the legitimacy of this new information.
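A deliberately simple sketch can make this notion of emergent information concrete: none of the stored records below identifies anyone or contains anything traditionally sensitive, yet mining their correlations yields a new, arguably private, inference. The records, features and outcome are invented for illustration, loosely echoing the Duhigg example from the introduction; they are not taken from any real dataset.

```python
# A toy sketch of "emergent information": a sensitive inference
# (credit risk) mined from innocuous, non-identifiable trail data.
# All records and values below are invented.

records = [
    {"oil": "generic", "chair_pads": False, "missed_payment": True},
    {"oil": "brand",   "chair_pads": True,  "missed_payment": False},
    {"oil": "generic", "chair_pads": False, "missed_payment": True},
    {"oil": "brand",   "chair_pads": False, "missed_payment": False},
]

def risk_rate(rows, **conditions):
    """Share of rows matching `conditions` that missed a payment."""
    matching = [r for r in rows if all(r[k] == v for k, v in conditions.items())]
    if not matching:
        return None
    return sum(r["missed_payment"] for r in matching) / len(matching)

# No record names an individual, yet the mined correlation yields a new,
# arguably private, piece of information about anyone whose trail shows
# the same purchases.
print(risk_rate(records, oil="generic"))    # 1.0 in this toy data
print(risk_rate(records, chair_pads=True))  # 0.0 in this toy data
```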

Now one predictable move to deal with this problem is to simply broaden the definition of “personal information” to include more types of data and/or information into the personal, thereby including much of the problematic data that is the digital trail:

[P]ersonal data ... is a broad concept, which includes, by way of example, the following types of personal data: User generated content, including blogs and commentary, photos and videos, etc. – Activity or behavioral data, including what people search for and look at on the Internet, what people buy online, how much and how they pay, etc. – Social data, including contacts and friends on social networking sites; – Locational data, including residential addresses, GPS and geo-location (e.g. from cellular mobile phones), IP address, etc. – Demographic data, including age, gender, race, income, sexual preferences, political affiliation, etc. – Identifying data of an official nature, including name, financial information and account numbers, health information, national health or social security numbers, police records, etc.

(OECD 2013: 7)

Whereas such a broader definition would (partly) solve the issue that certain data currently is not considered private while it perhaps should be, it is to my mind not an adequate solution to our problem with the digital trail. The main reason for this is that it is a substantive view of what private information consists of and is aimed at protecting certain data and not its use. And because it includes more in the personal, it would be even more restrictive towards the gathering of data than the current conception. What it fails to consider is that by far not all data is gathered and transformed into information for nefarious purposes. On the contrary, much of the information that is gathered, stored and analyzed is used in ways that make our lives easier and more comfortable. This is part of what makes it so difficult to regulate through limiting collection, because this approach cannot leave enough room for the beneficial use of the data trail. So because we want to protect our privacy, but we don't want to be too restrictive towards the beneficial uses of, for example, data mining, looking at the data is not the route I think should be taken.


Conclusion

To conclude: more than ever before in history, vast amounts of data are being amassed, waiting to be linked, analyzed and interpreted. Ever-cheaper storage, easier access and increasing computing power have opened up all sorts of data sources to novel ways of interpretation. We find ourselves in a situation in which we often don't know we are leaving data, we aren't clear on what this data is used for and there is no way to know what it can be used for in the future, because all sorts of quantitative changes have caused a qualitative change where information can be mined out of data that before would have stayed hidden (Wacks, 2010, p. 126). Where the traditional way of acquiring knowledge was through looking for causation, we increasingly have algorithms look for correlation. Seemingly immaterial pieces of data, when amassed in sufficient quantity and combined in the right way, offer valuable insights that were simply not possible before. And because this data never spoils, new value can be found in it at any time in the future.

Many approaches to privacy look at a specific conception of information that should be shielded from use by certain parties, and they focus protection on the point of collection. This protection has an emphasis on limiting collection and putting the control of one's information in the hands of the individual. The nature of the data exhaust poses significant threats to this method in terms of both feasibility and desirability, as I will try to show in the following chapter.


Chapter 2. How We Currently Protect Our Online Privacy

"The difficulty in articulating what privacy is and why it is important has often made privacy law ineffective and blind to the larger purposes it must serve" (Solove, 2002, p. 1090). Solove points us to the fact that privacy arguments are generally developed along one of two lines; one focuses on what it means to be private and the other on why privacy should be considered important. This chapter will center on the first line of argument: what does it mean to be private, or more specifically, what does it mean to enjoy privacy in the context of the digital trail? Up until now I have been discussing the role of different conceptions of data and information in the protection of privacy, or the "what" we should protect. I will now turn to the "how" we should protect it. Getting a clearer idea of the different substance that has been given to this "how" should be the first step towards a better understanding of what it would mean to have privacy considering our digital trail. Let me start by taking a look at when, for example, the IAB considers people's privacy sufficiently protected:

The AdChoices icon, also known as the Advertising Option Icon (which has its own video on the DAA's website here) is all about transparency and control. Whenever you see the Icon, you’ll know two things: (1) You can find out when information about your online interests is being gathered or used to customize the Web ads you see, and (2) you can choose whether to continue seeing these types of ads. So Laura, and you, are in control of when the right ads find you15.

As we can see here, according to the IAB two things are important in the data exchange that takes place between consumer and advertiser: there should be a level of transparency and the consumer should have a level of control. Let us review how this connects to some other views on online privacy.

Online Privacy Strategies

"Data protection law aims to strike a balance between protecting and empowering the data subject," Zuiderveen Borgesius writes. "On the one hand, data protection law aims to empower the data subject by fostering individual control over personal data. On the other hand, data protection law contains many safeguards that the individual can't waive" (2014, p. 162). What we see here is the split between protecting and empowering the individual. While for (European) online privacy protection the initial position is one of protection, as by default a prohibition on collection is put in place, most online privacy protection actually follows the logic of individual control, as this prohibition can be lifted with the consumer's consent (ibid, p. 240). "Data protection law is deeply influenced by the privacy as control perspective and the concept of informational self-determination," as Mayer-Schönberger (1997, p. 232) puts it. This view finds its manifestation in a two-pronged privacy strategy: individual notice & consent and opting out. One additional strategy that needs mentioning here focuses on the information itself and its protection by way of anonymization. This strategy is connected to the idea that if information is not identifiable it is not private (see the previous chapter) and thereby bypasses the data subject's consent. The reason I mention this here as a protective strategy is that so many companies use this method as a justification for extensive data collection. As we have seen in the previous chapter: "Generally, the Advertising Industry relies solely on non-personally identifiable information that it collects through a computer's browsing experience, so they don't actually know [sic] identity of individual consumers."16 Now let us have a look at each of these three options individually.

Opting In: Individual Notice and Consent

The most widely used way of protecting people’s privacy online is by way of a default ban on data collection that can be lifted by individual notice and consent. One example of this is the way the use of cookies is currently regulated in the Netherlands: Websites that use certain types of privacy sensitive cookies have to inform their visitors of this fact and the visitor has to consent before the cookies can be placed17.

The underlying rationale is that if I am aware of the fact that information about me is gathered and stored, and I have the ability to consent to or refuse this collection, my privacy is sufficiently protected. This clearly is a manifestation of the notion of privacy as control over personal information. In theory this individual notice and consent approach can be seen to offer ample protection of a person's privacy. In practice, however, this approach is fraught with problems (Millar 2009; Solove, 2013; Zuiderveen Borgesius, 2014; Turow, Hennesy, Draper, 2015). How informed can we expect people to be in technically complex cases like the use of cookies? And in what way do people need to give consent; does it need to be explicit or is implicit consent sufficient? An approach as seemingly straightforward as informed consent runs into a multitude of difficulties when we try to apply it to actual real-world cases.

16 http://www.iab.net/data/definition.html#consumer

17 http://www.rijksoverheid.nl/onderwerpen/internet/bescherming-privacy-op-internet/cookiewet-regels-en-richtlijnen

Opting Out

A logical consequence of a control approach is that besides opting in, an individual has to be able to opt out. One has to be able to reverse a situation in which one opted in, or to opt out of a situation one did not opt in for. Besides the right to reverse an opt-in, opting out is often used in situations in which the privacy concerns are not that severe and implied consent is seen as sufficient for acceptable storage and use of certain data18. Email marketing for example is required to have an easy and clear opt-out procedure. And for certain on-site tracking methods an opt-out option is sufficient to properly adhere to privacy regulations (see the previously mentioned Dutch cookie regulation). But the opting-out strategy becomes a lot less potent when transparency decreases. With all the sites we visit and tools and apps we use, how do we keep track of what we opted in for? And once certain data is stored, which future uses did I opt in for? Opting out (and in, for that matter) on an individual level might be too cumbersome to be a viable protection of privacy: "A person may be able to manage her privacy with a few entities, but privacy self-management does not scale well. Even if every entity provided people with an easy and clear way to manage their privacy, there are simply too many entities that collect, use, and disclose people's data for the rational person to handle." (Solove, 2013, p. 1888).

Anonymity

One last widely used technique for the protection of privacy, albeit on the data collector's side, is anonymization. As I mentioned previously, many of the online advertising companies that use customer profiling and behavioral targeting techniques anonymize the data in order to comply with privacy regulations, thereby de facto bypassing the necessity for the data subject's consent. The idea is that once information is stripped of its identifying component(s), it cannot be connected to an individual and consequently should not be considered personal. This is possibly true for traditional pieces of sensitive information handled in traditional ways, for example an anonymized list of account balances at a bank. But as we have seen in the previous chapter, the many new ways of interpreting and analyzing data and the extensive predictive models that can be built upon them make anonymization a far less effective protector of privacy. Data doesn't need to have an identifier to be very useful for organizations in their interactions with individuals. A lot can be known about someone, and this knowledge used to influence her or her situation, without having to be able to pinpoint by way of an identifier who the subject is. Today even anonymized information, interpreted in the right way, can reveal many personal facts. After all, as Turow reminds us: "If a company can follow and interact with you in the digital environment—and that potentially includes the mobile phone and your television set—its claim that you are anonymous is meaningless, particularly when firms intermittently add off-line information to the online data and then simply strip the name and address to make it "anonymous." (2011, p. 178)
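A toy sketch can illustrate the kind of linkage Turow is pointing at: an "anonymized" set of trail records is joined to an openly available offline dataset on shared quasi-identifiers, restoring the connection to a named person. All names, fields and values below are invented for illustration.

```python
# Toy illustration of re-identification by linkage; all data is invented.

# "Anonymized" trail records: the name is stripped, but quasi-identifiers
# (postcode area, birth year) travel along with the behavioral data.
anonymized_trail = [
    {"postcode": "1012", "birth_year": 1953, "queries": ["felt chair pads", "generic motor oil"]},
    {"postcode": "1075", "birth_year": 1988, "queries": ["mega thruster exhaust system"]},
]

# An offline dataset (e.g. a purchased marketing list) that does contain names.
offline_list = [
    {"name": "A. de Vries", "postcode": "1012", "birth_year": 1953},
    {"name": "B. Janssen",  "postcode": "1075", "birth_year": 1988},
]

# Joining on the shared quasi-identifiers undoes the "anonymization".
for record in anonymized_trail:
    for person in offline_list:
        if (record["postcode"], record["birth_year"]) == (person["postcode"], person["birth_year"]):
            print(person["name"], "searched for", record["queries"])
```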

The Control Paradigm

"A core principle of data protection law," Zuiderveen Borgesius cites Bygrave, "is that persons should be able to participate in, and have a measure of influence over, the processing of data on them by other individuals or organizations." (2014, p. 162). We can safely say the protection of online privacy is strongly influenced by the control paradigm. Over the last decades a certain development in this idea of what it means to enjoy privacy can be discerned. This development I would like to interpret here as a gradual move from protection to empowerment, and therefore from certain safeguards (that the individual can't waive) to a form of personal control. In the first stage privacy is understood as a right to be let alone; in the second stage it is understood that people should enjoy a state of limited access to their information; and the third and final stage considers privacy to mean that one has control over one's information. Privacy in this context is often envisioned as a type of protective sphere or bubble surrounding the individual. Solove (2002, p. 1131) notes: "The notion of public and private spheres understands privacy by way of a spatial metaphor. Often, theorists speak of privacy as a spatial realm, a sort of bubble zone that surrounds a person." While we should keep in mind that this metaphor has limitations, because it doesn't specify shape or contents (ibidem), it does help in determining what is theoretically at stake, i.e. imagining what could be considered to be the content of this bubble and/or the access others have to it. With this comparison in mind we can start evaluating the aforementioned three stages of the control paradigm.

The "right to be let alone" perspective originates in what can be seen as the start of the modern privacy debate. Likewise motivated by the arrival of a new technology (the invention of the portable camera), Warren and Brandeis reflected on what could be considered the foundation of people's claims to not having their information (like a photograph) taken and transferred without their approval. This led them to state that this foundation can be found in people's right to be let alone:

These considerations lead to the conclusion that the protection afforded to thoughts, sentiments, and emotions, expressed through the medium of writing or of the arts, so far as it consists in preventing publication, is merely an instance of the enforcement of the more general right of the individual to be let alone. It is like the right not to be assaulted or beaten, the right not to be imprisoned, the right not to be maliciously prosecuted, the right not to be defamed.

(Warren & Brandeis, 1984, p. 81)

This notion has a certain intuitive appeal; in many contexts it makes sense to think of privacy as an impenetrable bubble others should not be able to enter willy-nilly (think for example of the context of privacy in one's home). Theoretically, however, this concept has been criticized for being too broad and too vague (Solove, 2002). If we think of privacy in the informational sense, it would confront us with a sphere that is so undefined in both shape and contents that it would leave us with a completely unwieldy concept. What we would have is therefore too general and inclusive a term to be of use for us here: almost everything could be considered to be private. As Solove puts it ever so eloquently: "Even a punch in the nose could be considered a violation of privacy" (2002, p. 1102). A lot of the information currently stored about us online would be off limits; for example, a website owner would not be allowed to use analytical software to improve the usability of his site. Many of the current practices and benefits of the online world would collapse. In short: a right to be let alone seems to me to be an unfeasible approach when we try to protect informational privacy and at the same time still want to leave room for the use of certain data. What we need is something a bit more refined.

Solove remarks that the limited access principle perhaps offers such a refinement (2002, p. 1102). Under this notion informational privacy can be seen as a condition in which people can undertake certain actions without others having the ability to collect information about them (Solove, 2008, p. 19). Not so much a fixed configuration as a gradual distinction (i.e. “limited” access), this perspective permits certain forms of access while prohibiting others. In the territory of digital data, for example, the fact that a website owner stores certain usage information I disseminate (such as my visit to a certain part of the website) could be allowed, while his storing of cookie information could be interpreted as a breach of privacy. Keeping in mind our previous considerations on data and information and the contents of the private sphere, we can see why this protective measure may raise issues. As privacy is still inherently linked to what should be the contents of the sphere, and therefore depends on a specific idea of what data/information is, we need to specify in advance to what information parties can and cannot have access. Such an approach would conceivably struggle to keep up with the ever-changing structure of online data and information.

In the move away from privacy as a type of seclusion or secrecy (notions that have often been considered too broad and vague) and from protection by overall prohibition, many theorists have defined privacy as a measure of control. “Privacy,” notes Fried for example, “is not simply an absence of information about us in the minds of others; rather it is the control we have over information about ourselves.” (1984, p. 209). Westin writes: “Privacy is the claim of individuals, groups or institutions to determine when, how and to what extent information about them is communicated to others.” (1967, p. 7). Privacy defined like this is control over access to one's information or, differently put, I have privacy when I am in control of the information others have about me. This empowering perspective puts the emphasis less on the “content of the sphere”, i.e. the information, and more on the role of the data subject in the information exchange. While it on the one hand empowers people and takes their individual preferences into account (for example about whom they want to share information with and whom not), we can see that it also puts the burden of actively protecting her privacy largely on the shoulders of the data subject. And this can be quite cumbersome, if not impossible, as was previously noted. As we have seen, current online privacy protection schemes are predominantly based on this control approach; let us now see how this relates to the digital trail.

The Control Paradigm and the Data Trail

Now, with this strong penchant towards control in our liberal market-oriented societies in mind (Zuiderveen Borgesius, 2014, p. 91) – which is partly understandable, as the idea of control connects to fundamental notions of freedom and autonomy – it has become clear that this approach has certain practical drawbacks. Reasoning from a situation in which data and information could be considered more or less the same thing, because information was mostly stored immediately in its final form, limiting access for example made sense. In this context this perspective on privacy would be sufficient, as information mostly did have a more or less fixed status and context. My physical medical dossier in a hospital is sufficiently protected by putting it in a locked filing cabinet and limiting access to this cabinet to authorized personnel, for example. One of the features of the digital trail, as we have seen in the previous chapter, is that because of the indeterminate nature of data this approach no longer holds. Data is stored indefinitely and can be processed into many new types of information, allowing the use and context to change all too easily. In this new situation the ways in which this personal control is supposed to be achieved have serious limitations in applicability and in many, if not most, cases do not actually realize its ultimate goal: the protection of people's privacy. This is true in light of online privacy issues, like how to regulate the use of cookies, but perhaps even more so in the case of our topic at hand, the digital trail. I would like to distinguish two main points of critique of the control approach when considering the digital trail: feasibility and desirability.

Feasibility of the Control Approach

Let me briefly recap and outline the problem we are dealing with. By the digital trail we mean all the data that is generated as a byproduct of our digital or online activities, the traces we leave in our digital and online lives. In and of themselves these pieces of data are not normally seen as personal or private, but when subjected to big data analysis they might reveal information that is. Three properties of this type of data make it harder to regulate than other forms of data: its surreptitious collection, the pervasiveness and accessibility of the data, and the ever-increasing analytical power applied to it. As it is “merely” a byproduct of our activities, the existence of this data, let alone its collection, is often unknown to us. The pervasiveness and accessibility of this information through digital storage make its (future) uses unpredictable, as do the ever-increasing analytical capabilities brought on by growing computing power.

The feasibility of personal control over information of a more directly personal nature is rather questionable in and of itself. One look at the recently launched Google privacy dashboard19 or the Microsoft privacy statement20 makes it abundantly clear that there are severe knowledge and time constraints connected to the idea of personal control. These problems are mainly of a practical nature, though. With proper education, simplification and presentation they could, at least in theory, be surmountable. The problems become more fundamental when we consider the digital trail: first of all, the collection of a lot of data is mostly considered legitimate under current regulation, and secondly, because of its pervasiveness, accessibility and analyzability, its future use is necessarily unpredictable. These issues cause fundamental problems for the personal control approach and its strategies of anonymity, notice and consent, and opting-out.

19 https://myaccount.google.com/privacy

The Nature of the Data in the Digital Trail

The digital trail consists mainly of the seemingly insignificant snippets of data discussed in chapter one; it comprises databases of coordinates, search terms, so-called clickstreams or lists of products people purchased. Most of this data is anonymized and therefore, at least in theory, not identifiable to natural persons. Considered like this, the digital trail falls mostly outside the scope of current privacy protection, as that only covers information relating to identified or identifiable natural persons (as we recall: EU Directive 95/46/EC - The Data Protection Directive, Article 2a). Obviously there is more to identity than a name, an address or a phone number, as we have seen Turow remark (2011, p. 178). The EU directive recognizes this when it says that identification can also be achieved by one or more factors specific to physical, physiological, mental, economic, cultural or social identity. So identifiers include fingerprints, pseudonyms or combinations of specific social factors. What it currently does not include, however, are people's habits, interests, intentions, propensities and the like (as long as they are not traceable back to us individually). And this is exactly what the data exhaust subjected to big data analysis is particularly good at: unearthing these characteristics. And while for my account balance or medical history to be private it is essential that it can be connected to me through an identifier like my name (otherwise it is, for example, just a number), my interests or propensities do not need an identifier to possibly be of use to someone and therefore to be personal and perhaps private.

Let me explain by way of an example why I think this to be true. In the past a credit company would need actual information about me and my past behavior to assess my risk profile. There would have to be actual missed payments in my past for the credit company to assess my risk as too great to offer me a loan, for example. With current big data techniques credit companies like Wonga.com21 (about which more in chapter three) can use certain markers in my behavior (i.e. how or when I use their website) to do the same thing, without having to know my history or even who I am. Traditionally, information had to be associated with my person to be of use to third parties, because its fundamental supposition is that of causation. Information about my past or current situation or behavior informs assumptions about what I will do in the future. My not having paid all my installments on a loan, for example, informs the credit company of an increased risk of missed payments in the future. Big data relies on correlation rather than causation. Seemingly completely unrelated data can in many cases be revealing of, for example, intentions or propensities. Therefore there often is no need to identify me in the traditional sense in order to be able to say something about me. The way I use the credit company's website can, through extensive analysis of existing data, tell them my risk of defaulting on my loan. They can derive this information from the aforementioned use without knowing anything else about me. Even if we disregard the possibility of de-anonymization, which is a real threat as we saw in the paragraph on search queries, anonymity loses a lot, if not all, of its strength when the data can be subjected to big data analysis. So data that is anonymous is no longer necessarily “safe”, and more needs to be considered when we want to protect our privacy in a meaningful way.

21 See for example http://www.wonga.com and http://www.slate.com/articles/technology/future_tense/2013/01/wonga_lenddo_lendup_big_data_and_social_networking_banking.html

The Option Value of Data

Another fundamental issue facing the control-over-information approach to privacy is that it is impossible to know all possible future uses of data in advance. Even if and when we consent to the collection of our data by a third party, we cannot know in advance what this data might be used for in the future. Certain guidelines and principles can be – and have been – put into place that try to tackle this problem (see for example the OECD Privacy Principles22 or Zuiderveen Borgesius' list, 2014, p. 114ff). A purpose specification might have to be given, which means the data collector needs to specify at the time of collection what he will be using the data for. A use limitation can be in place, where data collectors can only use data for a predefined purpose unless they get renewed approval from the data subject. There can be a temporal limit to the use of the data, an expiration date of sorts. And openness and accountability of the data collector try to safeguard fair and lawful use of data. But pre-set limitations on the use of the data in the data exhaust directly contravene the workings of this type of big data analysis. When part of our goal is not to be so restrictive as to hamper all the possible beneficial uses of the data exhaust, this preventive approach that limits use in advance will not work. We need to find a way of separating the desirable from the undesirable, without being too limitative from the get-go. A case-by-case individual permission model is utterly unfeasible, and one additional complicating factor is that, because big data often tries to unearth unspecified correlations, it is impossible to know the outcome in advance and therefore what one would be agreeing to.

Protection at the Moment of Collection

The option value of data points to one more fundamental shortcoming in the currently employed control schemes. We have seen that, since everything digital is so easily collected, stored, retrieved and analyzed, and many of the things we do leave traces, a lot of our activities unknowingly become data for others to turn into valuable information. Valuable information that is not necessarily private and that in many
