
REGULATING ALGORITHMS

An ethical analysis of the General Data Protection Regulation

Thesis research by Job Vis (P.J.M. Vis), S1307894

Master Crisis and Security Management Leiden University

Supervisor: Dr. J. Shires

Second reader: Dr. C.W. Hijzen

Word count: 17,496 (excluding references, tables, footnotes, contents, abstract)
Date: 26-01-2020

Contents

1. Introduction
- Public versus private use of algorithmic technology
- Scope of algorithms
- Societal relevance
- Case selection and research question
- Academic relevance
2. Literature review
- Conceptualization
- Ethical issues surrounding big data
- Accountability
- Transparency
- Fairness
3. Theoretical framework
- Transparency model of Zarsky
- Six ethical concerns: Mapping the debate
- Organizing and interaction of the two theories
4. Research design
- Sub-questions
- Methodology
- Validity and reliability
- Case selection
5. History of data protection in the EU
- Before the ‘Directive’
- The ‘Directive’ 95/46
- The Right to be Forgotten
6. Analysis
- Analysis of the GDPR
- Analysis of the Ethics Guidelines for Trustworthy AI
7. Conclusion
8. Bibliography

Abstract

In the last decades society has become more and more involved in the online sphere, a place where value is created from the digital footprints ordinary people leave behind. Huge amounts of knowledge gained from these online footprints are now in the hands of technology companies and governments, which allows them to encroach on privacy and affect individuals' lives, positively but also negatively, at an unprecedented scale (Flyverbom, Deibert, & Matten, 2019). Algorithmic decision-making (ADM) is of particular interest to this research because it removes humans from the decision-making process. Algorithms are mathematical processes that can impact individuals by revealing correlations in the information they take as input. Ethical problems arise when people don't know they are providing digital information, or don't know their information is being used. Other ethical problems are associated with biased input data, which produces biased decisions by ADM, with a lack of transparency on the part of data-controlling companies, and with a lack of responsibility and accountability on the part of institutions whose decisions are unethical. The relevant EU legal framework here is the General Data Protection Regulation (GDPR), which went into effect in 2018. This research aims to provide the reader with sufficient information about the adverse effects unethical ADM can have on data subjects. In order to better understand the current EU legal framework, this research also provides a historical chapter in which EU data protection is described and analyzed: from non-binding initiatives and the ‘Directive’ towards the Right to be Forgotten and the GDPR. This research applies two different ethical theories to the current GDPR in order to answer the research question: why has the EU sought to regulate algorithmic decision-making through the GDPR, and how successful has this been?

1. Introduction

“Can machines predict human behavior? Vast datasets of personal information available to commercial and governmental entities combined with advances in mathematics and computer science enhance not only the ability to but also the appetite for predictive computer programs”

(T. Z. Zarsky, 2013: 1505).

Although society as a whole has become more digital in the last decades, it is only recently that internet users have come to know that (mostly private) companies gather comprehensive data about their lives, because not only social media but other online services as well required citizens to give up part of their privacy in order to use free, convenient online services (van Dijck, 2014). Users leave digital footprints, and (mostly) private companies gather this data about people's shopping, their location and much more. An increasing number of people are connected to the internet for daily use, which highlights the potential to create value from these users by analyzing their digital footprints. The capturing, storing, aggregating and eventually analyzing of these enormous amounts of data is called ‘big data’. Big data sets serve as the basis from which data technology companies can use algorithms, and algorithmic decision-making (hereafter ADM), to analyze and combine huge amounts of (meta)data, which provides extremely precise knowledge about individuals and their (personal) lives (Lyon, 2014). This raises moral and ethical questions because it has left the average internet user at the mercy of these technology companies: on the one hand they want to continue using their online applications, but as a result there is a huge amount of knowledge now in the hands of said companies, which allows them to encroach on privacy and affect individuals' lives, positively but also negatively, at an unprecedented scale (Flyverbom, Deibert, & Matten, 2019). In order to limit the power of data companies while simultaneously giving citizens more protection, the European Union (hereafter EU) updated its data protection regulation from 1995, the Data Protection Directive (hereafter ‘Directive’), in 2018 into the General Data Protection Regulation (hereafter GDPR). This meant that across Europe citizens could enjoy the same fundamental rights with regard to data protection, and it allowed closer monitoring of data collection companies from within the EU.


Public versus private use of algorithmic technology

The increase in the number of people now using online services also means an increase in the amount of digital trace data collected through digitized devices (Newell & Marabelli, 2015: 1-2), which gives companies like Google the opportunity to use this data, in this case for positive action. Mayer-Schonberger & Cukier (2013) describe in their book1 a situation where the Centers for Disease Control (CDC) in the United States wanted doctors to report new flu cases in order to better prevent them from spreading. Engineers from Google published a paper stating that they could predict the spread of flu in the US, not just nationally but down to specific regions. They achieved this by analyzing online data (search queries) and looking for correlations between the frequency of certain queries and the spread of flu over time and space. By analyzing the data through different mathematical models, they could predict in near real time where the flu had spread (Mayer-Schonberger & Cukier, 2013). It proved that Google's system was more useful than government statistics because it used data in combination with algorithms to make decisions, which proved faster and more accurate (Mayer-Schonberger & Cukier, 2013: 2-4).

The scope of algorithms

Algorithms are complex not only from a mathematical standpoint but also from a social standpoint. Algorithmic work can be seen in Twitter Trends, in Facebook or Twitter news, adverts and recommendations, and in personalized Google search results or suggested Google Maps directions (Willson, 2017). They have a big impact on people's everyday (online) social lives, but only a handful of people can write the code used for algorithms, data collection, data aggregation and machine learning. According to the Evans Data Corporation, in 2018 there were approximately 24 million software coders in the world2, which translates to only ~0.3% of the entire world population. By contrast, in 2000 there were approximately 300 million users of the internet, corresponding to around 5% of the world population at the time. Since then these numbers have gone up to 4.5 billion users and around 60% of the population in 2019 3. “Today we are now generating 2.5 quintillion bytes of data which means that 90% of all digital data in the world today has been created in the last two years alone” (Herschel & Miori, 2017: 31).

1 Mayer-Schönberger, V., & Cukier, K. (2013). Big data: A revolution that will transform how we live, work, and think.

2 Data gathered from Evans Data Company: https://evansdata.com/reports/viewRelease.php?reportID=9
3 Data from Internet World Stats: https://www.internetworldstats.com/emarketing.htm


The impact of algorithms can be enormous, and because only a very limited number of people are capable of writing code, the quality of algorithms needs to be of the highest standard, as they can affect a lot of people.

Societal relevance

Big data is often obtained in unethical ways: people don't know they are providing information or how much information is gathered, and they often don't know what is happening with their gathered information. Algorithms and algorithmic decision-making (ADM) raise ethical issues as well, in that algorithmic judgements are made by drawing correlations between inputs and outputs, based on the data gathered in ‘big data’. The tracking and recording of people's online activities can happen through social network activity, online shopping or ATM withdrawals, but also through other ‘traceable’ activities like the use of GPS tracking on smartphones. The use by governments, through private data collection companies, of data collected on their citizens allows substantial opportunities for the surveillance of communications, movements, behavioral patterns, and political activities of citizens. All this individual data left by people online as digital footprints is gathered in big datasets, or ‘big data’, which “gives society the ability to harness information in new ways to produce useful insights or goods and services of significant value” (Mayer-Schonberger & Cukier, 2013: 2). Following the example of Google and the CDC, using algorithms on this data creates “a process that leverages massive data sets and algorithmic analysis to extract new information and meaning” (Martin, 2015). They use captured data to process input (data) and deliver output (predictions on human behavior). But these predictions often represent a ‘black box’: humans produce algorithms to measure and analyze data from big datasets, but algorithms involve no understanding of the causes and consequences of the behavior they identify (Newell & Marabelli, 2015; Mayer-Schonberger & Cukier, 2013); they work within the parameters in which they are programmed, specified by developers and configured by users with desired outcomes in mind. This privileges some values and interests over others, which means that even if algorithmic outcomes fall within accepted parameters, this does not guarantee ethically acceptable behavior (Dann & Haddow, 2008; Mittelstadt et al., 2016). The impact and influence these algorithms can have on people's lives, combined with their ethically and morally questionable applications, constitutes the societal relevance of this research.


Case selection and research question

Currently technology provides us with increasing numbers of tools, processes and methods to analyze data produced by ordinary people in order to extract value from them. But as “technology moves faster than the law” (Fenwick, Kaal, & Vermeulen, 2016), keeping up with technological changes in the field of data protection and ADM regulation is becoming increasingly complex. The main focus of this research is the GDPR, which has “entered into force in an important time for the ever changing digital economy we are part of because although ADM and algorithms produce big opportunities to create online value, create welfare and enhance digital mechanisms, the complexity of the field in this rapidly changing environment is a challenging task for data protection regulators like the EU” (T. Zarsky, 2017: 996). The goal of this research is to evaluate why and how EU regulatory influence through data protection regulation like the ‘Directive’ and the GDPR has minimized the risks associated with the use of big data, ADM, AI and algorithms in relation to ethics. The EU is an interesting case because “EU policy-makers have adopted a frame of analysis to differentiate the EU strategy on AI from the US strategy (developed mostly through private-sector initiatives and self-regulation) and the Chinese strategy (essentially government-led and characterized by strong coordination of private and public investment into AI technologies)” 4. The case selection therefore centers on EU data protection from its earliest initiatives at the end of the 20th century, through the first legal data protection basis of the Directive, all the way to the recently adopted GDPR (2018). The research question for this research is:

Why has the EU sought to regulate algorithmic decision-making through the GDPR, and how successful has this been?

The research question is formulated in this way because it remains explanatory: although the question asks why the EU has influenced ADM, this research also examines EU data protection in a historical context, which helps answer how the EU has influenced ADM. Although the research question does not mention ethics directly, the goal of this research is to evaluate (1) whether the EU has sought to influence regulation of ADM for ethical reasons, which this research argues it has, and (2) how successful the EU has been at regulating ADM in order to minimize ethical issues. The first question (further explained in the research design chapter) looks at the bigger historical data protection timeframe of the EU in order to analyze what kinds of reasons the EU has to influence ADM (ethical, economic, societal), while the second question focuses on analyzing the EU with the help of two theories that are applied to ethical concerns. Where the historical chapter is of a descriptive nature, the evaluation of the GDPR will be analytical and will apply the theories discussed in the theoretical framework chapter.

Academic relevance

Within Europe there is a tendency to consider the social impact of collecting big data, and the usage of algorithms on big data, to fall within the scope of privacy and data protection laws and regulations. In 1998 the EU Directive on the Protection of Individuals with regard to the Processing of Personal Data and on the Free Movement of Such Data (the ‘Directive’) became effective. It signaled significant regulatory controls over business processing and use of personal data. It provided the EU with “more regulatory power towards third countries, mainly the US with regards to data collection companies (or technology companies), to ensure an adequate level of protection of data” (Shaffer, 2000). Since 2018 there has been a new regulation in place: the General Data Protection Regulation (from now on GDPR). The aim of the GDPR is to protect all EU citizens from privacy and data breaches in today's data-driven world. Although the key principles of data privacy still hold true to the ‘Directive’, many changes have been proposed to the regulatory policies5. Many authors evaluate certain aspects of data protection; some look at specific ethical concerns relating to big data and ADM, while others only look at the GDPR, what it means and what implications its adoption brings with it. This research aims to bring these different views and opinions together in order to fully comprehend the influence the EU has had on ADM. The EU itself is an interesting case because it is a unique organization; comparing data protection in the EU with other countries or organizations would simply not work because they differ too much. There are authors that write about the GDPR (Vedder & Naudts, 2017; Wachter, 2018; Wachter, Mittelstadt, & Floridi, 2017) and there is literature available on the ethical issues surrounding algorithms, so combining these two to examine EU influence on the use, adaptation and application of algorithmic decision-making, by analyzing the Directive and GDPR with a focus on ethics, will add to the existing literature.

2. Literature review

Important for the literature review regarding big data, algorithms, ADM, data protection and the GDPR is that, although relatively ‘new’, all these concepts are fundamentally changing our knowledge and actions. Algorithms do not operate in isolation but perform functions as part of a larger structure. They are parts of computer systems, and without data to pair algorithms to they are without meaning (Vedder & Naudts, 2017). This means that their utility depends on data, but also that the quality of data and the way in which data is used can have a big influence (positive or negative) on the algorithmic outcome/output. The goal of the literature review is to clarify the relevant concepts used within this research. After that, the ethical issues relating to ‘big data’, the source of algorithmic decision-making, will be discussed. Algorithms should operate with respect for the fundamental principles of personal data protection, which means algorithms have to abide by the concepts of fairness, transparency and accountability (Vedder & Naudts, 2017: 6), which will also be discussed.

Conceptualization

Personal data is “any information relating to an identified or identifiable natural person” and the data subject is the natural person to whom the data relates (Goodman & Flaxman, 2017). Big data is the collection of data in large data sets. ‘Big data’ refers to “gathering, storing, sharing, evaluating and preparing the information created by humans through online networks or connected devices” (Herschel & Miori, 2017: 31). Laney divides ‘big data’ into three segments that assess the ‘worth’ of big data. These three V's are Volume (the amount of data determines value), Variety (data arise from different sources/databases and are cross-matched to find relationships) and Velocity (data are generated quickly) (Laney, 2001). The definition used in this research follows the one mentioned in the introduction, in that ‘big data’ “gives society the ability to harness information in new ways to produce useful insights or goods and services of significant value” (Mayer-Schonberger & Cukier, 2013: 2), and is connected to the central concept of algorithms through the fact that using algorithms on this data creates “a process that leverages massive data sets and algorithmic analysis to extract new information and meaning” (Martin, 2015).

Artificial Intelligence (AI) is the theory and development of computer systems able to perform tasks normally requiring human intelligence, such as visual perception, speech recognition and decision-making6.

Algorithms operate on data gathered in ‘big data’ sets. In the broadest sense, algorithms are encoded procedures for transforming input data into a desired output, based on specified calculations (Gillespie, 2012). Algorithmic processes are often described through the analogy of a recipe. A recipe has an end point, a meal for example, but to get there you must follow a list of ingredients (variables) through a step-by-step description of what needs to happen, and when, in a very specific and detailed order (Willson, 2017: 4-5). The algorithms this research focuses on are statistical, meaning that through correlation of input and output they are right a percentage of the time. They are mathematical processes which can impact individuals through this correlation. Algorithms are therefore made to produce: “they are designed to bring about particular outcomes according to certain desires, needs and possibilities” (Willson, 2017: 4) by which the algorithm is coded. An algorithm is designed with a specific function, steps it needs to follow, but it also needs to be able to communicate with and within other systems; algorithms do not operate in isolation, but perform functions as part of a larger structure (Vedder & Naudts, 2017: 3; Willson, 2017: 4-5).
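
To make the recipe analogy concrete, below is a minimal sketch of an algorithm in this sense: fixed ingredients (input variables), a strict step order and a defined end point. The function name, variables and thresholds are invented for illustration and do not come from the literature discussed here.

```python
# A minimal sketch of the 'recipe' analogy: variables as ingredients,
# a strict step order, and a defined output. The procedure is
# deterministic -- it does exactly what it encodes, nothing more.
# All names and thresholds are hypothetical.

def brew_score(water_temp_c: float, grams_coffee: float, grams_water: float) -> str:
    # Step 1: check an ingredient constraint.
    if not 90 <= water_temp_c <= 96:
        return "reject: water temperature out of range"
    # Step 2: derive an intermediate quantity from the variables.
    ratio = grams_water / grams_coffee
    # Step 3: map the quantity onto the desired output categories.
    return "balanced" if 15 <= ratio <= 17 else "adjust ratio"

print(brew_score(93, 18, 290))  # balanced
```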

Algorithmic decision-making system (ADMS). Algorithms that support humans in decision-making, by taking over tasks normally carried out by humans, fall under the concept of the algorithmic decision-making system (ADMS). “Algorithmic decisions are based on rules as to what should happen next in a process, making calculations over massive amounts of data”, but these decisions also involve what information to select first, what values to attach to certain information and making connections between data points (Diakopoulos, 2014: 3-4). Diakopoulos names four different decisions ADMS can make: prioritization, classification, association and filtering of information.

Prioritizing within algorithms happens when certain information gets emphasized at the expense of other information. Prioritizing algorithms are coded with certain criteria in mind, which allows them to make choices and value ‘judgements’ (so to speak) to determine whether information is placed high or low in a ranking (Diakopoulos, 2016: 57). An example would be Google search results, where an algorithm has decided which websites (apart from sponsored sites/advertisements) will be placed higher than others for a certain search query. Classification is the second process, which consists of marking people or entities as belonging to one class or another. Belonging to a certain class can impact decisions made by algorithms, and there is plenty of opportunity for bias, uncertainty and mistake within automated classifications. Association decisions are focused on finding relationships between subjects, things or people. The association can vary from directly related to vaguely similar, and can lead to connotations when humans interpret these associations. Just as with classification and prioritizing, a lot of the power of this process is left to the people who develop these algorithms. Filtering is defined by the inclusion and exclusion of information. Filtering depends on rules and criteria captured within the algorithm itself. Too much filtering can lead to censorship, seeing as the algorithm can red-flag certain information, resulting in that type of information being withheld from people (Diakopoulos, 2016).
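
The four decision types can be illustrated with a toy sketch. The records, cutoff and criteria below are hypothetical; the point is that each operation embeds designer-chosen criteria: what to rank by, where a class boundary lies, what counts as related, and what gets excluded.

```python
# Toy sketch of Diakopoulos's four algorithmic decision types.
# All data, cutoffs, and criteria are invented for illustration.

records = [
    {"name": "Ann", "clicks": 120, "topic": "sports"},
    {"name": "Ben", "clicks": 45,  "topic": "politics"},
    {"name": "Cys", "clicks": 300, "topic": "sports"},
]

# Prioritization: rank by a designer-chosen criterion, highest first.
ranked = sorted(records, key=lambda r: r["clicks"], reverse=True)

# Classification: assign each record to a class via a chosen cutoff.
classes = {r["name"]: ("active" if r["clicks"] > 100 else "passive") for r in records}

# Association: relate records that share an attribute (same topic).
pairs = [(a["name"], b["name"]) for i, a in enumerate(records)
         for b in records[i + 1:] if a["topic"] == b["topic"]]

# Filtering: include or exclude records by a rule.
visible = [r for r in records if r["topic"] != "politics"]

print([r["name"] for r in ranked], classes, pairs, [r["name"] for r in visible])
```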

Machine learning (ML):

Whereas ADMS can produce outcomes important enough that human operators have to check/control them (Diakopoulos, 2014: 3), machine learning algorithms are slightly different: they are autonomous. Machine learning is based on building computers/algorithms that improve automatically through experience (Jordan & Mitchell, 2015). Machine learning is the next step up from ADMS. “Instead of programming an algorithm manually to anticipate certain response (output) for possible inputs, meaning ADMS, it can be easier to train a system, to let the ‘machine learn’, by showing it examples of desired input-output behavior” (Jordan & Mitchell, 2015: 255).
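
A minimal sketch of this contrast, assuming scikit-learn is available: instead of hand-coding a response for every possible input, the system is shown examples of desired input-output behavior and infers a decision rule itself. The data and labels are fabricated for illustration.

```python
# The 'machine learns' from examples rather than hand-coded rules.
# Features and labels are made up; scikit-learn is assumed available.

from sklearn.linear_model import LogisticRegression

# Training examples: (hours online, purchases) -> clicked ad (1) or not (0).
X = [[0.5, 0], [1.0, 1], [6.0, 4], [8.0, 5], [2.0, 1], [7.0, 6]]
y = [0, 0, 1, 1, 0, 1]

model = LogisticRegression()
model.fit(X, y)                   # infer a decision rule from the examples
print(model.predict([[5.0, 3]]))  # apply the learned rule to a new input
```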


Ethical and moral issues surrounding big data

“What makes ethics so valuable is that it helps us to frame our arguments about what is right or wrong using logical, rational arguments. One can use them to understand and evaluate whether they think the use of big data is morally right” (Herschel & Miori, 2017: 35).

As discussed in the previous section, algorithms work on data collected through the use of online services and devices. Because algorithms depend on this data for their predictive outcomes, and the decisions made by algorithms are also based on this data, it is essential that the data used is of the highest ethical quality. Therefore, this section briefly discusses the ethical concerns that might influence algorithmic output and decision-making. Some of the problems with big data and data collection come from the pervasive technologies and techniques used for data collection: a lack of transparency as to when people are giving up information, how much, and what for; how is it being used? Figure 1 serves as an example of how big data (or training data in this example) functions as the input on which algorithmic analysis and later decision-making work to deliver an outcome/output. If the input data already contains biases or unfairness, this will undoubtedly show up in the outcome. Aspects of this figure such as fairness, accountability and ethical norms will be discussed later, as will the question of whether appropriate rules and policies are in place to limit these ethical concerns in relation to the GDPR.

Figure 1: Algorithmic decision-making, including ethical concerns and implications that arise from the use of big data (training data) and algorithms (from K. Martin, 2019: 842, slightly adjusted for this research)


Ethical issues with regard to big data and the gathering of information usually concern the violation and/or intrusion of privacy, a lack of consent from the data provider, and problems with the transparency of data companies that collect personal information (Flyverbom et al., 2019: 11). In short, “the use and availability of data and the way in which data is compiled can interfere with norms of privacy and consent, as well as transparency, for the people providing the data” (Flyverbom et al., 2019: 12). These (big) data sets can also lead to new concentrations of power because “many of the most revealing personal data sets such as call history, location history, social network connections, search history, purchase history, and facial recognition and other information is already in the hands of governments and corporations” (Herschel & Miori, 2017: 32), and these data sets are (at the moment) “never methodologically removed from human design and bias" (Crawford, 2013; Crawford, Miltner, & Gray, 2014). The amount of data is enormous, which creates big opportunities as well as big risks. One of these risks is that secondary use of data can reverse-engineer past, present and future breaches of privacy and confidentiality (Herschel & Miori, 2017: 32).

Big data sets can also present blind spots, because they cannot account for people who do not leave behind digital footprints but do need to be incorporated within the application of algorithmic decision-making. The lack of transparency towards outsiders regarding big data makes it difficult to understand what is happening. At the same time, the reliance of big data on the three V's (volume, variety and velocity) has a tendency to “neutralize discourse surrounding big data and the possible errors it can contain” while also “failing to account certain peoples and communities that are not incorporated in big data which is an example of discrimination” (Crawford et al., 2014: 1667). It is also important to note that big data places profound faith in predicting behavior through correlation, but this ignores the important adage that correlation does not equal causation (Diakopoulos, 2016).


Ethical issues surrounding algorithms

Accountability, transparency and fairness

Algorithms use the data gathered in big data sets; therefore it was important to describe these risks first, because if the data gathered does not meet ethical standards or does not correspond with the appropriate legal/privacy rules, then decisions made by algorithms based on this data face similar problems. A key component of ADM and machine learning should be explainability: being able to explain the actions taken by algorithms. Explainability, which is closely related to interpretability and transparency, means that systems “are interpretable if their operations can be understood by a human, either through introspection or through a produced explanation” (Biran & Cotton, 2017: 1). Another related concept besides transparency/explainability is justification or accountability, which should explain “why a decision is a good one, but it may or may not do so by explaining exactly how it was made” (Biran & Cotton, 2017: 1); it revolves around justifying certain actions taken, but also being accountable for the overall result or prediction made by an algorithm. Lastly, fairness revolves around the idea of an absence of bias and discrimination in algorithmic input and output, but a more subjective idea of fairness also exists, in which the outcome delivered by an algorithm should be ‘fair’ in that the risk of being wrong should be as small as possible for the outcome to be perceived as fair. Because all three central concepts have multiple interpretations and are quite broad, they will be discussed in detail in this chapter. A report by the EPRS is used in this research for the working definitions of these three concepts: accountability relates to justifying the conclusions or predictions made by algorithms; transparency relates to showing how an algorithm has come to certain conclusions, where the process needs to be understandable; and fairness is the absence of bias and discrimination within the entire process of ADM (input, analysis, output/predictions)7.

7 European Parliamentary Research Service (2019) Understanding algorithmic decision-making: Opportunities and challenges.

Accountability

Algorithms are designed to be executed and to bring about particular outcomes according to certain desires, needs and possibilities. They [algorithms] are relational in that they need to communicate with other systems with which they interact. For outsiders they are difficult to describe and understand, not only because of the technicality of coding but also due to a lack of understanding about their functions and impact on society. This raises questions about accountability: how did an algorithm come to a certain prediction or conclusion (output)? Who or what can justify the actions and predictions of algorithmic agents?

If one wants to understand the decision-making capabilities of algorithms, and justify the actions algorithms take, one must possess expert knowledge of the system language and mathematics in which an algorithm is written. Seeing as computer coding and algorithmic writing is “a complex and difficult field even for experts making their living out of it, it is understandable that a majority of the general public will not be in possession of the knowledge needed to truly understand algorithms” (Vedder & Naudts, 2017: 3). Another problem in the relation between algorithms and the general public is that algorithms are rigid codes: they are precise, whereas public policy, which affects the general public, “is characteristically imprecise which means that even when a well-designed piece of software does assure certain properties, there will always remain some room to debate whether those assurances match the requirements of public policy” (Kroll et al., 2016: 646). Because algorithms are so efficient and accurate, they are found in anything from very complex cars to basic home appliances; however, the accountability mechanisms that have to govern ADM may not have kept pace with the technological advancements of recent years (Kroll et al., 2016: 636).

The perception of human beings is influenced by norms, emotions and certain values. But the idea that the “computer knows best” and that algorithms are ‘neutral’ mechanisms that use data to provide us with new insights or even ‘undeniable truths’ (Vedder & Naudts, 2017: 4) leaves little room for justifying them or holding them accountable. That algorithms can make many more calculations and predictions than humans does not exempt them from critique, because in the end algorithms are only as good as their code allows them to be, which depends on the coders that produce them. Therefore, algorithms have to be viewed and analyzed as objects created by humans, which means that individual, group or institutional influence can have shaped the design and therefore the capabilities of algorithms (Diakopoulos, 2014: 10). This illustrates the importance of the next concept, transparency.


Transparency

Algorithmic decision-making is being embedded in more public systems—from transport to healthcare to policing—and with that has come greater demand for algorithmic transparency (Diakopoulos, 2016; Flyverbom et al., 2019; Martin, 2015; Pasquale, 2015). Many authors see transparency, or at least an increase in transparency, as the solution for automated processing (ADM), but it is unclear in what form transparency leads to more accountable and understandable algorithms (Kroll et al., 2016: 638). Some argue that transparency follows “a certain chain of logic in that observation produces insight which creates knowledge to hold systems accountable” and that “observations can be seen as a diagnostic measure for ethical action because observers with more access to the facts describing a system will be better able to judge whether a system is working as intended” (Ananny & Crawford, 2018: 2). But does this logic, that more facts lead to more truths, hold true for algorithms as well?

Diakopoulos argues that transparency relating to algorithmic power is useful if we consider its bounds and limitations. “The objective of any transparency policy is to clearly disclose information related to a consequence or decision made by the public—so that whether voting, buying a product, or using a particular algorithm, people are making more informed decisions” (Diakopoulos, 2014: 11). But corporations often benefit from limited transparency, since exposing too many details about the algorithms being used may hurt them competitively, hurt their reputation and ability to do business, or leave the system open to manipulation (Diakopoulos, 2014: 12).

Another approach to transparency with regards to algorithms is described by Kroll, in which the most obvious step would be to disclose a system's source code. But this is, at best, only a partial solution to the problem of accountability and transparency: as mentioned above, companies would lose their competitive edge, and because of the high complexity of algorithms the source code would be too complex for the general public to understand (Kroll et al., 2016: 638). Machine learning, a relevant concept within this research, is also ill-suited to source code analysis, because the decision (output) emerges automatically from what the system has learned by itself, which means that humans often cannot explain the outcome of machine learning algorithms. “Source code alone teaches a reviewer very little, since the code only exposes the machine learning method used and not the data-driven decision rule” (Kroll et al., 2016: 638).

Fairness

According to Lee, “society evaluates fairness by looking at the procedures used to come to a certain decision and that algorithmic decision makers will have a higher perceived fairness due to algorithms following the same process or procedure every time while not being influenced by human emotions” (Lee, 2018: 4). Although in this context this idea is not wrong, it misses the fact that algorithms themselves are produced with certain values in mind, and that coders can have biases that they mirror onto algorithms. To reiterate: algorithms are coded to attach certain values to specific information and to search for correlations between factors, and this means that even if algorithmic outcomes fall within accepted parameters, and the algorithm follows the same procedure every time, this does not guarantee ethically acceptable behavior (Dann & Haddow, 2008; Mittelstadt et al., 2016). Algorithms are applied to the most difficult and complex tasks, and at the same time to everyday tasks and practices as well. “They function within the social, cultural and political spheres which inescapably results in biases being enacted and although this bias can be beneficial or detrimental or both, similarly they could be intentional and unintentional” (Willson, 2017: 9), and even when bias is unintended, the fairness of the algorithm is still not guaranteed.

Because an algorithm is programmed/coded with specific goals and values, it can prioritize certain information over other information. “As a result of the algorithmic process, individuals might be treated differently than their peers—people similar to them in every relevant aspect—on the basis of irrelevant differences” (T. Zarsky, 2016: 127). Fairness is therefore closely related to transparency: there is a lack of understanding of why autonomous algorithms have come to certain conclusions or predictions, and there is usually no human counterpart to explain these processes to the people affected. An algorithm does what it is programmed to do and does it well, but it does nothing more. Many concerns surrounding the concept of fairness also relate to the opaque nature of algorithms, where “affected persons find ADM arbitrary due to lack of transparency and their ability to question the outcome and understand the process is also affected due to the opaque nature of algorithms” (T. Zarsky, 2016: 129). This arbitrariness can be seen when, for example, two similar people apply for a loan but the algorithm grants the loan to only one of them. The criteria within the algorithm have decided that one person is better suited for a loan because some indicators within the algorithm have produced this outcome. “These criteria essentially embed a set of choices and value propositions that determine what gets pushed to the top of the ranking” (Diakopoulos, 2014: 5). But what these criteria are, how they are weighted and whether they even matter for getting a loan is unclear from an outsider's point of view, which gives credence to the idea that bias or discrimination could have occurred, especially if the ‘inner workings’ of the algorithm that produced this outcome are not disclosed or discussed with the people affected by its decisions.
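
The loan example can be made concrete with a toy sketch. The attributes, weights and cutoff below are invented; the point is that an opaque, designer-chosen weighting can treat two otherwise similar applicants differently without the affected persons ever seeing the criteria.

```python
# Two applicants identical on every obvious attribute, but a hidden
# weighted score pushes one just below the cutoff. All attributes,
# weights, and the cutoff are hypothetical.

def loan_score(income: float, zip_risk: float, years_at_address: int) -> float:
    # Hidden value choices: the weight on 'zip_risk' can proxy for
    # protected characteristics without ever naming them.
    return 0.5 * (income / 1000) - 30 * zip_risk + 2 * years_at_address

applicants = {
    "A": {"income": 42000, "zip_risk": 0.2, "years_at_address": 3},
    "B": {"income": 42000, "zip_risk": 0.4, "years_at_address": 3},
}
for name, a in applicants.items():
    score = loan_score(**a)
    print(name, round(score, 1), "approved" if score >= 21 else "denied")
# A 21.0 approved / B 15.0 denied -- identical except for a postcode-based factor.
```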

3. Theoretical framework

The call for and use of transparency as a concept to control or check (parts of) the ethical nature of ADM has already been discussed in the literature review. Zarsky acknowledges that blanket calls for transparency with regards to ADMS are ineffective, and proposes to divide the process of algorithmic decision-making into three segments. He argues that every segment has different challenges, and that understanding these challenges is important in order to understand the relative tension between transparency and ADMS (Zarsky, 2013: 1521). The figure below illustrates these segments; this theoretical section focuses on A, B and C.

Figure 2: Zarsky model

Transparency stages

Transparency is relevant in all three steps of the predictive modeling process, or what in this research is called ADM. The first segment Zarsky analyzes focuses on (A) the collection and aggregation of datasets, in which transparency needs to provide information regarding the kinds and forms of data and databases used in the analysis (Zarsky, 2013: 1523). This also means that human decisions made in this stage need to be transparent as well, because human intervention or bias can impact the usefulness of an algorithm. Zarsky uses the example that humans must decide how similar records in different datasets are matched into one source (T. Z. Zarsky, 2013), meaning that if this is not done properly it can lead to the exclusion of certain data (i.e. individuals). Finally, Zarsky highlights the importance of clear protocols relating to the role of humans in data collection, of presenting the actual data used by algorithms (the input, or big data), and of the creation and updating of law, and the enforcement of elements not currently present in data law (Zarsky, 2013: 1524).

The next segment is (B) data analysis, which incorporates human as well as technical aspects. Transparency here is proposed through releasing the source code of the programs used. Human analysts should also be able to establish the level of support algorithms have in delivering results, relating to “how frequent or obscure the uncovered pattern can be within the database so as to be further considered” (Zarsky, 2013: 1525). As mentioned earlier, correlation produced by algorithms is no proof of causation. This means that analysts have to establish sufficient (1) ‘support’, meaning what level of support would be acceptable for the outcome produced by algorithms, and (2) ‘confidence’, which refers to the accuracy of the produced output, meaning that there need to be agreed-upon levels of correlation before one can act on the produced output of an algorithm (Zarsky, 2013: 1525).
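
Zarsky's ‘support’ and ‘confidence’ correspond to the standard data-mining measures of those names, which can be sketched as follows: for a rule X → Y over a set of records, support is the share of records containing both X and Y, and confidence is the share of records containing X that also contain Y. The transactions and thresholds below are invented for illustration.

```python
# Support and confidence for an association rule, in their standard
# data-mining definitions. Transactions and thresholds are made up.

transactions = [
    {"bread", "butter"}, {"bread", "butter", "jam"},
    {"bread"}, {"butter"}, {"bread", "butter"},
]

def support(lhs: set, rhs: set) -> float:
    both = sum(1 for t in transactions if lhs <= t and rhs <= t)
    return both / len(transactions)

def confidence(lhs: set, rhs: set) -> float:
    has_lhs = sum(1 for t in transactions if lhs <= t)
    both = sum(1 for t in transactions if lhs <= t and rhs <= t)
    return both / has_lhs if has_lhs else 0.0

# Only act on 'bread -> butter' if both agreed-upon thresholds are met.
s, c = support({"bread"}, {"butter"}), confidence({"bread"}, {"butter"})
print(s, c, "actionable" if s >= 0.5 and c >= 0.7 else "too weak")
# 0.6 0.75 actionable
```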

Transparency with regards to the actual (C) usage of models refers to disclosure of the strategies and practices for using data; these are the predictive models, or ADM, formulated through data mining to produce “profiles” which technology companies use to target certain individuals or events (Zarsky, 2013: 1526). There is a tendency of reluctant government transparency and a “disparity between the language of regulations and its actual implementation” (Zarsky, 2013: 1526-1527). Information related to ADM should be conveyed to the public in understandable ways; Zarsky even proposes that all “relevant processes should be interpretable” for the broad public, leading to new regulatory paradigms that should explain these processes to the public (Zarsky, 2013: 1528), something this research aims to investigate with regards to the GDPR.

Mapping the debate

The second ethical theory is the one proposed by Mittelstadt et al., which describes six types of ethical concerns (see figure 3). In this article they follow Hill's definition of an algorithm as “a mathematical construct with ‘a finite, abstract, effective, compound control structure, imperatively given, accomplishing a given purpose under given provisions’” (Hill, 2015: 47), although they do not limit themselves to this mathematical definition, stating that in public discourse algorithms are usually referred to by their particular implementations on individuals, not just their underlying mathematical code.


1. Inconclusive evidence

Inconclusive evidence means that the conclusions (output) an algorithm produces are probable, but not certain, knowledge. Statistical methods like ADM can produce (highly) significant correlations, but as mentioned earlier correlation does not mean causal connection, and neither does it here. Algorithms are often used in settings where alternative techniques are not available or too costly, but this does not mean algorithms are infallible (Mittelstadt, Allo, Taddeo, Wachter, & Floridi, 2016: 4). The risk of being wrong should be considered when dealing with algorithms, which overlaps with Zarsky's idea that sufficient support and confidence need to be present.

2. Inscrutable evidence

This ethical concern relates to the fact that there needs to be evidence connecting the data used (input) and the conclusion or decision made by an algorithm (ADM), while at the same time the decision should be accessible, intelligible and open to critique. The problems here are that the data being used by algorithms can be flawed, but more importantly that it is usually unknown how the algorithm's many data points contribute to the overall conclusion (output), or which values have had the biggest impact, and this creates practical limitations in the use of ADM if not checked properly (Mittelstadt et al., 2016: 4).

3. Misguided evidence

Algorithms, like all data-processing constructs, are limited in the sense that their output (a conclusion/decision) can never exceed the input (data being used). This relates back to the transparency and consent issues raised earlier in the literature review, in which it was stated that “if society contains inequality, exclusion or other traces of discrimination, so too will the data” (Goodman & Flaxman, 2017). Mittelstadt et al. use the informal example of ‘garbage in, garbage out’: conclusions can only be as reliable or as neutral as the data they are based on (Mittelstadt et al., 2016: 5).
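
The ‘garbage in, garbage out’ point can be demonstrated with a toy model, assuming scikit-learn is available: a classifier trained on historically biased labels reproduces that bias faithfully, even though the learning step itself works exactly as designed. The data is fabricated for illustration.

```python
# A model trained on biased historical labels carries the bias over.
# All data is invented; scikit-learn is assumed to be available.

from sklearn.tree import DecisionTreeClassifier

# Features: (qualification score, group) -- group 0/1 is irrelevant to
# merit, but past decisions (labels) systematically disfavored group 1.
X = [[8, 0], [9, 0], [7, 0], [8, 1], [9, 1], [7, 1]]
y = [1, 1, 1, 0, 0, 0]  # biased historical outcomes, not ground truth

model = DecisionTreeClassifier().fit(X, y)
# Two equally qualified applicants, differing only in group membership:
print(model.predict([[8, 0], [8, 1]]))  # [1 0] -- the bias carries over
```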

4. Unfair outcomes

Unfair outcomes relate to the concept of fairness discussed in the literature review. The first three concerns relate to the quality of evidence and the accountability of algorithms, whereas unfair outcomes look at the outcome (prediction/conclusion) of an algorithm in relation to ethical criteria, such as whether there is bias or whether the results are discriminatory. “An action, produced by an algorithm, can be found discriminatory solely from its effect on a protected class of people, even if made on the basis of conclusive, scrutable and well-founded evidence” (Mittelstadt et al., 2016: 4).

5. Transformative effects

Transformative effects mean that ethical challenges cannot always be retraced to epistemic or ethical failures (points 1-4), because autonomous decision-making algorithms (ADM) can be questionable yet appear ethically neutral because no obvious harm is detected (Mittelstadt et al., 2016). Algorithms can therefore transform the way we conceptualize and apply value to the outcomes and insights they produce. This is because “value-laden decisions made by algorithms can nudge the behavior of data subjects and human decision-makers by filtering information” (Ananny, 2016; Mittelstadt et al., 2016: 9).

6. Traceability

Traceability refers to the complexity of assigning blame when algorithmic decisions cause harm: what is the cause of the harm? Is it a bias in the code, or a problem with the quality of the data used to train a machine learning algorithm? Algorithmic complexity also makes it difficult to hold people or entities accountable. Ethical assessment requires both the cause of and the responsibility for the harm to be traced if any or all of these concerns are identified (Mittelstadt et al., 2016: 5).

Ethical concerns / three main concepts | Accountability | Fairness | Transparency
A – Collection and aggregation of datasets | x | | X
B – Data analysis | x | | X
C – Usage of models | | x | X
1 – Inconclusive evidence | | X |
2 – Inscrutable evidence | X | |
3 – Misguided evidence | | X |
4 – Unfair outcomes | | X |
5 – Transformative effects | X | |
6 – Traceability | X | |

Table 1: Organizing the ethical concerns from Zarsky and Mittelstadt under the three main concepts. (Primary focus of an ethical concern marked with X; secondary focus, of less(er) importance, marked with x.)

Organizing and interaction of the two theories

In order to make the theories by Zarsky and Mittelstadt more applicable for the analysis later on, the ethical concerns raised in both theories have been organized so that they align with the three central concepts of accountability, fairness and transparency. This creates a narrower focus when analyzing the data and helps with categorizing ethical issues under one of the three main concepts. Therefore, justification is needed for why certain concerns are categorized the way they are. It speaks for itself that the three points raised in Zarsky's transparency theory are categorized under transparency, seeing as that is the main concern of the entire theory. It can be argued, as seen in the table, that the collection and aggregation of datasets also involves some form of accountability, because Zarsky includes that human decisions made in the algorithmic process should be transparent while at the same time “access should be provided to the working protocols analysts use” (T. Z. Zarsky, 2013: 1524), which can be seen as a form of justification of the methods used; therefore (A) can also be seen as a measure of accountability. The focus, however, is on transparency, indicated by the X, with a secondary focus assigned to accountability. For data analysis (B), transparency should be realized by “disclosing the names of software/source code used (technical side) while also establishing sufficient support and confidence (human side)” (T. Z. Zarsky, 2013: 1524-1525). Support and confidence set certain parameters in which the algorithms operate; therefore, data analysis can also be seen as (semi-)accountable. The usage of models (C) requires full transparency according to Zarsky, in which all relevant processes and outcomes produced by algorithms should be accessible and interpretable for the general public (T. Z. Zarsky, 2013: 1528). With the actual usage of predictive modeling, Zarsky also states that the actors using algorithms should be required “to assure that the prediction scheme does not involve the use of factors that are considered off limits because they are discriminatory and unethical” (T. Z. Zarsky, 2013: 1528), which are general considerations with regard to the concept of fairness. Overall, the theory of Zarsky, as expected, accounts for three different types of concerns and solutions surrounding transparency (as can be seen in table 1).

Inscrutable evidence, transformative effects and traceability are all considered in this research to fall under the concept of accountability. Inscrutable evidence relies on a clear connection between the data used by an algorithm (the input) and the conclusion it produces (output). The fact that inscrutable evidence requires justifying both input and output makes it fall under the concept of accountability. Transformative effects mean that “algorithms can affect how we conceptualize the world, and modify its social and political organization” (Floridi, Fresco, & Primiero, 2015; Mittelstadt et al., 2016); therefore, even if algorithms act within agreed-upon parameters and do not seem to malfunction or cause any ‘obvious harm’, this does not mean they are perfect and their outcomes flawless. The people who produce the algorithms, and who deliver and check the data on which they operate, still need to be accountable for the decisions ADMS make. Lastly, traceability involves tracing the cause of and responsibility for any harm that has occurred through the use of algorithms (Mittelstadt et al., 2016: 5), which corresponds with the definition of accountability used here: justifying the actions made by algorithms.

Inconclusive evidence, misguided evidence and unfair outcomes are all considered to be aspects of fairness in this research. Inconclusive evidence relates to the idea of fairness because a probable connection based on algorithmic output is not certain knowledge, and because, again, correlation does not mean causation. The idea of fairness as the absence of bias and discrimination should be respected: evidence may be probable, but this does not exempt it from close evaluation to ensure fair procedures are being followed. Misguided evidence relates more clearly to fairness in that inequality will be apparent if input data (big data) is not checked properly; the ‘garbage in, garbage out’ principle stands, because the output of algorithms can never exceed the input data, and if that data contains biases or discrimination then so will the output. Lastly, unfair outcomes clearly relate to fairness because even though an algorithm works as designed, it can still affect certain protected classes of people, or people underrepresented within the data, in a disproportionate manner.

4. Research design

The literature review and theoretical framework highlight the importance of ethical concerns regarding the collection of data and the subsequent analytical use of that data through algorithms to produce conclusions or predictions. The focus of this research is on the GDPR, but it also acknowledges the importance of Directive 95/46 as the foundation on which the GDPR is built; this will therefore be discussed and analyzed as well. The literature review and theoretical framework serve as the arena of information in which European decision-making regarding ADM must prove its worth. The research question is a ‘why’ question: we want to know the reasons the EU has for wanting to influence ADM, and also how it wishes to influence ADM (mainly through regulation, of course), but through what regulations, and how do these regulations hold up against the available literature? A ‘why’ or ‘how’ question results in explanatory research (Yin, 2003), and that is exactly the aim of this research: to explain why the EU has sought to influence ADM through the GDPR, and whether its influence has been successful in limiting, reducing or minimizing the ethical risks attached to the use of algorithms.

Sub-questions

This research uses ethical concerns as the main reason for EU influence through regulations like the GDPR, but it also acknowledges that ethical concerns are not the only reason for EU action on data protection. There are also economic reasons: the use of big data and algorithms has shown that more calculations and observations, with increased accuracy and capabilities, promote economic efficiency, as more valuable data is gathered, used and analyzed with less human interaction, resulting in lower costs (Crawford et al., 2014: 1666). Therefore, when analyzing the data, this research will address two sub-questions:

1) Did the EU influence ADM for ethical reasons or also other reasons like economic incentives?

2) Where the data shows that EU influence was aimed at limiting ethical concerns, which of the two theories used is best suited to explain that influence?


Question 1 serves the purpose of inclusivity, which means that this research will look at the bigger historical perspective of data protection within the EU in order to see where, when and how the EU has sought to influence ADM because of ethical concerns. Other reasons for EU influence are to be expected, but ethics remains the focus of this question, which will be answered at the end of the fifth chapter. Sub-question 2 aims to clarify which of the two ethical theories used is best suited to explain EU influence, while acknowledging that the answer can be a bit of both or even neither of them. The answer to this question will be given in the analysis (chapter 6). Both questions help answer the research question because they further expand and elaborate on the notion that the EU has influenced ADM through different means.

Methodology

The research design follows a single case-study design, more specifically a within-case study design. The reason for this is simple: the EU, as an economic, monetary and societal supranational organization, is unique in the world; there are no other organizations or institutions to compare it with. For example, if we were to compare European influence on ADMS with the situation in the United States, many more factors could contribute to the outcome: different laws, different views, and different private-public relationships to consider, which would hinder the research. It is important to note that within-case methods are less useful for developing generalizations about their findings. “Neither across-case nor within-case approaches alone enable the researcher to interpret an experience both through its parts and as a whole, such that readers can recognize individual experience in a generalizable way” (Ayres, Kavanaugh, & Knafl, 2003: 873). On the other hand, if done correctly, a single-case study can represent a significant contribution to knowledge and theory building, to the point that such a study can even help refocus future investigations in an entire field; it can represent a critical test of significant theory (Yin, 2003: 40-41). Seeing as this is a unique case, this could be relevant.

Validity and reliability

With regards to construct validity, this research places EU influence on ADMS in a longer historical context. Therefore, the basis of data protection law in Europe, the Data Protection Directive 95/46 from 1995, will be analyzed as well, in order to form more comprehensive conclusions when addressing certain regulatory changes relating to ethical concerns. The analysis section in the next chapter will start by placing the GDPR in this longer historical context of EU data protection. The internal validity of this research will be guaranteed as much as possible by the introduction of sub-question 1, which acknowledges that other factors have been, or could have been, a reason for the EU influencing, or increasing its influence on, data protection through the GDPR. At the same time, the focus of this research remains on ethical concerns, though it will, where relevant or necessary, make comments and arguments relating to other drivers of EU influence, such as economic issues. The external validity of a within-case study like this remains quite low, as is the problem with many case studies, and this one is no exception: as mentioned earlier, the EU as a supranational, regulatory and economic organization is unique, and generalizing findings towards other ‘similar’ organizations will be difficult. Nevertheless, some generalizability can be obtained if the findings result in implementations for several EU institutions, but as of now this is unclear. Reliability can be obtained by using the same documents (DPD, RTBF, GDPR) and the same ethical theories, but this research also acknowledges that human bias is hard to minimize; a limitation could therefore be that certain aspects of the articles/documents examined and compared with the theories are overlooked, or their influence underestimated. The goal is of course to be as thorough and meticulous as possible when examining these documents, in order to minimize researcher bias and errors and thereby maximize the reliability of this research.

Case selection

As mentioned in the introduction, the main reason for looking at EU influence on algorithms and ADM is that AI is a relatively new field that is rapidly changing. The task of regulating AI, ADM and algorithms by providing fundamental rights and protection for data subjects against unethical or immoral behavior is therefore a difficult one. In order to better understand EU data protection initiatives and their attempts at regulating ADM, this research has adopted an approach in which the longer historical process of data protection is analyzed first, in order to see what has changed and how far the EU has come. This historical chapter will mainly look at the ethical concerns and how the EU has tried to address them, but it will also be of a descriptive nature. The analysis chapter will be purely analytical, analyzing the newest form of EU data protection, the GDPR, with the help of the theoretical framework. Table 2 shows the documents that will be used in the next chapters. The final document will be analyzed because it is one of the more recent advisory reports written for EU institutions and could therefore provide this research with interesting information regarding the future of data protection and ADM regulation by EU institutions.

Name of document | Date of publishing | EU institutions involved

Data Protection Directive (‘DPD’) 95/46 EC | Published: 23-11-1995; in effect since: 13-12-1995 | European Parliament, European Commission, Council

Right to be forgotten (RTBF): Case of Google vs. Mario Costeja González | 13-5-2014 | Court of Justice of the European Union (CJEU)

General Data Protection Regulation (‘GDPR’) | Published: 4-5-2016; in effect since: 25-5-2018 | European Parliament, European Commission, Council

High Level Expert Group on A.I. (HLEGAI) | Non-binding document, 8-4-2019 | The AI HLEG is an independent expert group that was set up by the European Commission in June 2018⁸

(Table 2: presentation of the documents that will be used in this research)

8 The High Level Expert Group on AI has made the Ethics Guidelines for Trustworthy AI.


In order to evaluate current European legislative work like the GDPR with regard to the ethical and moral issues introduced in the literature review, it is important to understand how European data protection has evolved throughout the years. Although the European Union started to get involved in the regulation of data processing in the early 1970s (González Fuster, 2014: 111), it took until 1995 for the first real legislative regulation concerned with data protection to be in place. Up until the introduction of the GDPR in 2018, “the Directive 95/46 has been the main EU data protection instrument which incorporates the regulatory model and the guiding principles that characterize the EU data protection approach” (De Hert & Papakonstantinou, 2016: 131). Even though the focus of this research and the research question is mainly aimed at the GDPR and ethics, it is important to see the historical progression of data protection in the EU and the roots it is based on. Even the change from a Directive to a Regulation is a revolutionary change (Albrecht, 2017: 287), and to see why, we first need to discuss the history of data protection in the EU before the analysis and empirical research can take place. This chapter analyses the progression from the Data Protection Directive 95/46 up until the General Data Protection Regulation, with the goal of answering sub-question 1: Did the EU influence ADMS for ethical reasons, or also for other reasons, like economic incentives? The analysis that follows in the next chapter is aimed at answering sub-question 2 and therefore focuses more on the research question. Nevertheless, this historical chapter is necessary in order to provide insight into why the EU sought to influence data protection as a whole and how much of this influence was geared towards limiting ethical concerns.

Historical overview of European data protection: Before the Directive

As mentioned above, the earliest involvement of the EU regarding data protection came in the 1970s, but it took until 1995 for legally binding rules to be in effect. The Data Protection Directive (‘the Directive’) was influenced by seven principles outlined by the Organization for Economic Cooperation and Development (OECD), more specifically the OECD’s Recommendations of the Council Concerning Guidelines Governing the Protection of Privacy and Trans-Border Flows of Personal Data from 1980 (Lord, 2018). The seven principles were non-binding at first, but when the European Commission realized that “data protection changed on where you were located within the EU” and that “data flows were being hindered by different privacy laws within the EU”, they adopted these guidelines into the Directive (Lord, 2018). The seven principles, followed below by a brief illustrative sketch, were:

- Notice – individuals should be notified when their personal data is collected

- Purpose – use of personal data should be limited to the express purpose for which it was collected

- Consent – individual consent should be required before personal data is shared with other parties

- Security – collected data should be secured against abuse or compromise

- Disclosure – data collectors should inform individuals when their personal data is being collected

- Access – individuals should have the ability to access their personal data and correct any inaccuracies

- Accountability – individuals should have a means to hold data collectors accountable to the previous six principles (Lord, 2018)
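As a purely illustrative aid, and emphatically not any real compliance API, the minimal sketch below shows how a data controller might encode some of these principles as explicit checks before processing; every name, field and rule is a hypothetical reading of the principles listed above.

    # A minimal, hypothetical sketch of the OECD principles as code checks;
    # all names and rules are invented for illustration only.
    from dataclasses import dataclass, field

    @dataclass
    class PersonalDataRecord:
        subject_id: str
        data: dict
        notified: bool = False        # Notice/Disclosure: subject informed of collection
        stated_purpose: str = ""      # Purpose: express purpose at collection time
        consented_parties: set = field(default_factory=set)  # Consent: approved recipients
        secured: bool = False         # Security: protected against abuse or compromise

    def may_share(record: PersonalDataRecord, recipient: str, purpose: str) -> bool:
        # Sharing is allowed only when notice was given, the purpose matches
        # the one stated at collection, the recipient has the subject's consent
        # and the data is secured. Access and Accountability would live in
        # subject-facing endpoints and audit logs, omitted here for brevity.
        return (record.notified
                and purpose == record.stated_purpose
                and recipient in record.consented_parties
                and record.secured)

    # Usage: sharing fails until the controller satisfies the principles.
    rec = PersonalDataRecord(subject_id="s-001", data={"email": "x@example.org"})
    print(may_share(rec, "ad-broker", "marketing"))  # False: no notice, consent, etc.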

These principles were considered the building blocks upon which the Directive was built. Much of the available literature today describes problems with almost all of these principles: a large part of the literature review and theoretical framework is based upon the fact that principles like accountability, disclosure, access and purpose are not present to the extent they should be, thereby allowing ethical concerns regarding data protection and the use of algorithms to persist. Data protection was a new legal field that relied on privacy law in the last three decades of the 20th century. Although countries like Sweden and Germany enacted data protection laws in the 1970s, it was important, as can be seen from the title of the OECD guidelines, that data protection and data flows be addressed on an international rather than a national level, because according to the OECD this would lead “to economic and social development” (Birnhack, 2008: 6). This was needed in order to keep up with the technological advances made in the field of data use and data flow. Besides these particular OECD guidelines there were also other initiatives aimed at more control over and protection of data (subjects) within Europe, like the UN Guidelines Concerning Computerized Data Files and the Fair Information Practices (FIP), and “in 1985 the Council of Europe promulgated a convention For the Protection of Individuals with Regard to Automatic Processing of Personal Data” (Cate, 1994: 431-432), but all of these initiatives lacked ‘teeth’; they were non-binding ‘soft law’ (Birnhack, 2008: 6-7). And although these initiatives did not result in uniformity of data protection law, they did set the stage for further protection that would lead to the creation and eventual implementation of the Directive 95/46.

The Data Protection Directive 95/46

As seen in the previous section, the initiatives towards more data protection had good intentions but lacked legally binding measures. The idea of the Directive “was to create a uniform market at the European level for personal data ensuring a high level of protection for data subjects” (Poullet, 2006: 207). The Directive contains provisions aimed at “data quality, special categories of processing, the rights of data subjects, confidentiality, security, liability and sanctions, codes of conduct and supervisory authorities” (Robinson, Graux, Botterman, & Valeri, 2009: 7). Furthermore, the Directive did not only serve to protect data (subjects): from an EU point of view the economic idea of facilitating free trade was also important, although this free trade was aimed at the internal EU market rather than at the global market. The Directive combined “the right to privacy (recitals 2, 9-11, 68, art. 1(1)), alongside economic and social progress and trade expansion (recitals 2, 56) and the free flow of personal data” (art. 1(2)) (Bainbridge, 1997: 18; Birnhack, 2008: 8). The right to privacy (article 1) is considered a fundamental human right, as the article states the aim to “protect the fundamental rights and freedom of natural persons, and in particular the right to privacy with respect to the processing of personal data”, while the overarching “central concept of the Directive is concerned with the ‘processing of personal data’”, which differs from previous attempts that were more “focused on the recording of data” (Elgesem, 1999: 284).

The Directive does, however, have a problematic trait in common with previous attempts at data protection within the EU, namely that, although legally binding, a directive by definition “sets out goals (the goals are related to increased data protection) that all EU countries must achieve but every individual country may devise their own laws on how to reach these goals”9. This has created a lack of uniformity between Member States regarding their adaptation of data protection law as prescribed in the Directive. Some countries have made only minor modifications, “whereas others have deeply modified the structure, added new definitions or principles or sometimes adopted sectoral or specific legislation” (Poullet, 2006: 207), which not only made comparing the effectiveness of different EU Member States difficult, but also meant that similar cases in different countries could have different processes or even different outcomes.

9 Information gathered from the official European Union website, at:

With regard to the use of automated algorithmic decision-making, the Directive does mention this, for the first time, albeit only on a few occasions. The original text of the Data Protection Directive 95/46 does not mention algorithms once, and the word ‘automated’ is encountered only six times. The relevant article here is Article 15, “Automated Individual Decisions”, of which the first paragraph states that:

“Member States shall grant the right to every person not to be subject to a decision which produces legal effects concerning him or significantly affects him and which is based solely on automated processing of data intended to evaluate certain personal aspects relating to him, such as his performance at work, creditworthiness, reliability, conduct, etc.”10

This article within the Directive is rather unique because, instead of focusing on data processing, it has a singular focus on a decision being made that affects data subjects. Furthermore, article 15 is the only provision that directly addresses the concept of automated profiling, a concept previously mentioned in Zarsky’s transparency theory, in which data mining is used by technology companies to create a ‘profile’ of a certain data subject. Article 15 provides restrictions on decisions based solely on profiling, in which profiling is considered “the process of inferring a set of characteristics (typically behavioral) about an individual person or collective entity and then treating that person/entity (or other persons/entities) in the light of these characteristics” (Bygrave, 2001: 17). Article 15 can therefore be considered the first legal basis for addressing ethical concerns within European data protection history, and an early instance of EU influence driven by ethical concerns.
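To make concrete the kind of system article 15 targets, consider the following minimal, purely illustrative sketch: a loan decision produced solely by an automated profiling step, with no human involvement. All names, features, weights and the threshold are hypothetical and invented for illustration; no real scoring system is implied.

    # Hypothetical illustration of a decision "based solely on automated
    # processing of data intended to evaluate certain personal aspects"
    # (article 15 DPD). Features, weights and threshold are invented.
    from dataclasses import dataclass

    @dataclass
    class Applicant:
        income: float          # yearly income in euros
        missed_payments: int   # missed payments on record
        years_employed: float  # employment history in years

    def creditworthiness_score(a: Applicant) -> float:
        # A profile is inferred from behavioral data (Bygrave's definition):
        # the person's characteristics are reduced to a single number.
        return 0.5 * (a.income / 10_000) - 2.0 * a.missed_payments + 1.0 * a.years_employed

    def decide_loan(a: Applicant) -> str:
        # The decision with legal/significant effect is taken solely by the
        # algorithm; no human reviews the outcome before it is applied.
        return "approved" if creditworthiness_score(a) >= 5.0 else "rejected"

    print(decide_loan(Applicant(income=32_000, missed_payments=3, years_employed=2.0)))
    # -> "rejected": the data subject is treated in light of an inferred
    #    profile, which is exactly the situation article 15's right covers.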

However, article 15 is not without limitations or faults, which greatly diminish the value of the article. Four conditions must be met in order to make use of the right listed in the article. These four conditions are:

- A decision must be made;

- The decision concerned must have legal or other significant effects on the person whom the decision targets;

- The decision must be based solely on automated processing of data;

- The data processed must be intended to evaluate certain personal aspects of the person, such as their performance at work, creditworthiness, reliability or conduct.
10 Article 15 (1) from the official EUR-LEX text of the Data Protection Directive 95/46, at:
