
Using or Being Used by Algorithms – The Ethical Concerns to Be Aware Of

“Any tool can be used for good or bad. It's really the ethics of the artist using it.” (John Knoll, American artist)

Thesis MBA Big Data & Business Analytics

By: Marlies Veltheer
Student number: 11165111
Contact: marvelaa@gmail.com
Supervisor: Marc Salomon


Abstract

Over the last years a great deal has been written and said about ethical concerns in algorithmic decision-making processes. It can be overwhelming to find one's way to an overview of current ethical concerns and issues; one can even find (mostly US) websites giving overviews of relevant articles and suggestions for background reading. In this thesis I will present and elaborate on the prevalent perspectives given in current literature on the dominant ethical concerns and issues in algorithms.

I aim to present the concerns, ideas and guidelines such that every reader will be able to reflect on his own activities or organization. Understanding the limits, ethical threats and opportunities of algorithmic processes will provide opportunities for corporations and individuals to engage in more meaningful dialogues about what we really want and value, as corporations, individuals and society (Van Lier, 2016).

I will briefly introduce both the concepts of ethics and algorithm. One cannot discuss ethical concerns in algorithms in depth without a sound understanding of these concepts, specifically as they are used in the context of this thesis.

To find the ethical aspects of algorithmic processes, I will link the concerns and issues to the Cross-Industry Standard Process for Data Mining (CRISP-DM) (Larose, 2004). CRISP-DM gives the six phases of a standard data mining process. The purpose of linking to the CRISP-DM steps is to create insight into where in the algorithmic decision-making process ethical concerns may arise. It will become clear that ethical concerns can materialize in any step of the process.

These insights may help to identify opportunities to mitigate the concerns and to propose adequate solutions. It will become clear that there is currently no uniformity in how to identify ethical concerns: organizations leave this open to individuals, departments or project teams to interpret and judge. Finally, I will present some instruments companies can use to identify potential ethical concerns early in the creation phase. These techniques are useful, but should only be seen as a step towards a more intelligible society.


Table of Contents

Introduction
I. Scoping and Key Definitions
   A. Ethics
   B. Algorithm
   C. Machine Learning / AI
II. Methodology
   A. The CRISP-DM
III. Perspectives on Most Reported Concerns
   A. Privacy and Data Protection
   B. Autonomy
   C. Digital Divides, Disparate Treatment
   D. Professionalism
   E. Bias in the Data Set
IV. Linking the Ethical Concerns to the CRISP-DM Model
   A. Business/Research Understanding Phase
   B. Data Understanding Phase
   C. Data Preparation Phase
   D. Modeling Phase
   E. Evaluation Phase
   F. Deployment Phase
V. Business Application
   A. How Organizations Deal with Ethical Concerns
   B. Instruments and Techniques to Identify Ethical Concerns
Conclusion


Introduction

It is nearly impossible not to be exposed to algorithmic decision-making processes; they are practically everywhere. Organizations collect all kinds of data and use it to optimize algorithmic processes built to predict when to offer products and services to a specific person. Algorithms compile our search results when we browse the internet. With algorithms, organizations try to find out what products or services users need next, when they are ready to buy, and what they are willing to pay. All actions individuals take online are recorded. There are surveillance cameras and sensor networks recording people's whereabouts and the pills they take, and data brokers and cookies recording what books users read and which sites they visit. Algorithms target citizens for police scrutiny, decide on granting or denying immigration visas, and help tax departments select which taxpayers to audit (Kroll et al., 2017). Important decisions that used to be made by people are now made by algorithms.

Without the databases algorithms work on, though, algorithms are meaningless machines (Gillespie, 2014). Data and algorithms act together in processes. But do we know what data is available to which organizations, what they use it for, how the algorithmic decision processes are constructed and how they work, and how long they use and keep the data?

Algorithms are incredibly useful tools that can optimize nearly everything, from making things easier to conquering chaos and saving lives. But an algorithmic decision-making process can also have negative side effects: it can perpetuate bias, limit choices and autonomy, stimulate disparate treatment and create filter bubbles.

It is this not knowing, this feeling in the dark about potentially real and crucial consequences, that gets ever more attention in both science and the media. Apparently one can teach a soulless machine intellect and intuition, but what about ethics (Aharouay, 2017)? When governments and companies collect data, they do not just collect data, they collect interpretations of behavior (Februari, 2017). And translating behavior into data cannot be done free of values. The professionals writing the algorithmic code and generating the models applied include their moral values and their biases in every step of the development. Sometimes algorithms that are created with good intentions lead to unintended wrongful consequences when applied in daily life.


In 2016 Stahl and others (Stahl, Timmermans, & Mittelstadt, 2016) published what they called the first systematic and comprehensive review of the literature on the ethics of computing. As they wanted to understand the subject area from the perspective of computer scientists and members of related communities, they explored the relevant literature available to these communities. They reviewed publications published between 2003 and 2012, and finally included 599 papers. The dominant ethical issue they found was by far the issue of privacy (including data protection), followed by professionalism/work-related issues, autonomy, agency and trust. Organized into five major categories, the types of ethical issues discussed proved generally consistent over the research period.

At the start of 2017 Pew Research Center published “Code-Dependent: Pros and Cons of the Algorithm Age” (Rainie & Anderson, 2017). The researchers conducted a large-scale canvassing of technology experts, corporate practitioners, scholars and government leaders to elucidate current views on the potential impact of algorithms. The experts were asked the question: “Will the net overall effect of algorithms be positive for individuals and society or negative for individuals and society?” (p. 5). The reported concerns, although presented differently and not uniquely focused on ethics, are quite similar to the ethical issues Stahl and others found, as mentioned above.

Both papers were published recently, are built on extensive research, and the concerns they report about the algorithmic era show considerable similarity. From this, combined with the findings alluded to above, one could deduce that not only in the research period 2003-2012, but also since 2012, the type of ethical issues and concerns discussed has not changed substantially but has remained relatively static. Building on these two papers, I will elaborate on the dominant ethical concerns in algorithms and present the perspectives given in the literature on these ethical concerns and issues.

I aim to present the concerns, ideas and guidelines such that every reader will be able to reflect on his own activities or organization, whether he is currently active in designing and operating algorithmic decision-making processes, active in the field of risk or reputation management, policy making or leadership, or is the actual user of algorithmic decision-making processes. Understanding the limits, ethical threats and opportunities of algorithmic processes will provide opportunities for corporations and individuals to engage in more meaningful dialogues about what we really want and value, as corporations, individuals and society (Van Lier, 2016).

Not surprisingly, new technology will create new ethical issues. Things that were not possible in the past are now, such as biomedical questions, intellectual property of digital content or even just big data. New ethical issues can also arise in the computer applications of new technologies. For a better understanding of the presented perspectives on the dominant ethical concerns and issues, I will first briefly introduce both the concepts of ethics and algorithm. One cannot discuss ethical concerns in algorithms in depth without a sound understanding of these concepts, specifically as they are used in the context of this thesis.

Since algorithms increasingly rely on learning capacities, I will also briefly introduce the concept of machine learning. Machine learning is “defined by the capacity to define or modify decision-making rules autonomously” (Mittelstadt, Allo, Taddeo, Wachter, & Floridi, 2016, p. 3), and due to this degree of autonomy it can be challenging for humans to predict how inputs will be handled or understand how decisions were made.

To find the ethical dimensions in algorithmic development processes, I will link the concerns and issues to the Cross-Industry Standard Process for Data Mining (CRISP-DM) (Larose, 2004). CRISP-DM gives the six phases of a standard data mining process, and it will be interesting to see how the ethical concerns and issues fit this process. The purpose of linking to the CRISP-DM steps is to create insight into where in the data mining process ethical concerns may arise. These insights may help to identify opportunities to mitigate the behavior leading to the concerns and to propose adequate solutions, and they give a generally applicable overview which any company or professional may use to its own benefit.

Finally, I will give some insights into how companies deal with the identification of potential ethical concerns, and I will present some instruments and techniques that can help companies alleviate potential concerns early in the development process. With staff well educated on the potential for ethical concerns arising in algorithmic decision-making processes, companies can arrange for checks and balances within the company and organize meaningful dialogues about what the company really wants and values.


I. Scoping and key definitions

It is nearly impossible not to be exposed to algorithms and automated decision-making processes; they are practically everywhere. Important decisions that used to be made by people are now made by algorithms, such as how data should be interpreted and which actions should be taken as a result (Mittelstadt et al., 2016). Understanding the nature (what it is) and the effects (how they work) of algorithms is important to fully grasp their power, influence and consequences. Not fully understanding what an algorithm is and does could lead us “to misjudge their power, to overemphasize their importance, to misconceive of the algorithm as a lone detached actor, or to miss how power might actually be deployed through such (new) technologies” (Beer, 2017, p. 3).

Not surprisingly, new technology will create new ethical issues. Things that were not possible in the past are now possible; think of biomedical questions, intellectual property of digital content or even just big data. New ethical issues can also arise in the computer applications of new technologies. Ethics has a major impact on the degree of acceptance of new technologies, as well as upon legislative and other responses (Stahl et al., 2016). In line with Wakunuma and Stahl (2014), I believe that one can only make normative recommendations once one understands ethics and moral positions. A sound understanding of ethics and of the concept of algorithms is key when discussing and reviewing the ethics of algorithms or potential ethical concerns and issues with regard to algorithms. For a better understanding of the presented perspectives on the dominant ethical concerns and issues, I will briefly introduce both the concepts of ethics and algorithm. One cannot discuss ethical concerns with algorithms in depth without a sound understanding of these concepts, specifically as they are used in the context of this thesis.

Since algorithms increasingly rely on learning capacities, I will also briefly introduce the concept of machine learning. Machine learning is “defined by the capacity to define or modify decision-making rules autonomously” (Mittelstadt et al., 2016, p. 3), and due to this degree of autonomy it can be challenging for humans to predict how inputs will be handled or to understand how the algorithm came to its decisions.


A. Ethics

Not every reader will be familiar with the concept of ethics, nor with the relevance of the ethical aspects of computing. One might feel that discussing ethics will bring deep philosophical discussions, detached from real and practical relevance. Wakunuma and Stahl (2014) found that computing professionals are most of all interested in the job they are doing, and less in the ethical issues that might surface; others should deal with those. Stahl et al. (2016) even wonder whether technology-oriented communities are aware of, and engage with, ethics discourses. When Wakunuma and Stahl (2014) asked professionals how they identify ethical issues, the answers ranged from knowing intuitively which issues are ethical concerns, to training that had enabled them to identify ethical issues, to identification of ethical issues being required only for specific projects run by the company. To cultivate one's ethical sensitivity, one should start caring about something one did not care about before, or, with regard to something one does care about, start realizing the full implications (Volkman, 2015). Caring involves inquiry-based teaching and mentoring; it cannot be internalized by content-based legislating and enforcement. This teaching and mentoring will strengthen one's ability to identify moral values when analyzing an issue.

According to Maxim (2014, p. 554), “Ethics is a systematic reflection on the principles and moral values as well as on real life and the spiritual practice of individuals and the community, in relation to Good and Evil”. More basically, ethics is “doing the right thing” (Baase, 1997, p. 333). It is theory about morals, with ethical rules to follow when interacting with other people: ethical rules that are intended to reach good results for people in general, and for situations in general. These rules should elucidate our obligations and responsibilities, while still acknowledging that each human has his own set of values and goals, judgment and will. All decisions and actions, whether in personal life or business, are taken by individual people (Baase, 1997). In ethical theory a distinction is made between descriptive and normative ethics (Baase, 1997). Just as the word says, descriptive ethics is about describing what ethical rules a particular society, person or company has adopted: “how people behave”. Normative ethics, on the other hand, tries to formulate moral standards for right and wrong behavior: “what we should do”. Stahl et al. (2016) refer to the ethics of computing as a component of applied ethics, which is concerned with the analysis of particular moral and ethical issues in particular areas of society. Some examples are bioethics, business ethics, environmental ethics, clinical ethics and technology ethics.

Giving some action the ethical quality good or bad, right or wrong, is usually based on explicit norms and values accepted within a social group or culture. When norms and values clash, ethical judgment is required. This involves explicit reflection on the bases and assumptions that lie underneath the values and norms. Humans have the ability to use reason and logic. These capabilities are used in daily interactions with others and are important in decision making; they help humans make sense of the world around them and achieve the desired results from a problem or issue (Kizza, 2013). When making decisions, humans tend to rely on a variety of biases and heuristics. For many real-life problems there are systematic structures and rules to follow when searching for a solution, take for example a mathematical problem. Unlike these problems, ethical problems cannot be solved by following rules (Kizza, 2013). When ethical or moral reasoning is required, ethical principles are integrated into the reasoning process. Kizza (2013, p. 36) defines ethical decision making as “the process of making a decision which may result in one or more moral conflicts”. This process involves thinking through options and alternatives, making value judgements (how things ought to be or not, what is good or bad, desirable or undesirable), weighing each alternative against the others, drawing conclusions and finally making a decision that provides the decision maker with a safe or valid alternative. As a result of the process, either a solution to the ethical problem is reached or, at least, the understanding of the ethical problem at hand is deepened, which may lead to a resolution at some future date. Stahl et al. (2016) give the example of ethical questions in big data. Large data sets can be used for all kinds of applications that promise mutual benefits for the public and the company. At the same time, the application may raise questions concerning privacy and ownership, which may lead to negative effects, especially for the public whose data is used. Obviously there can be strikingly contradictory interests and values for the company versus the public. The identification of the ethical issues in such a case will be a challenge; striking a balance, subsequently, might be even more difficult (Stahl et al., 2016). Ethical dilemmas occur in situations where the right decision is not instantly clear. To get to the right or, at least, optimal course of action, an ethical impact assessment might be required, probably engaging different stakeholders (Wright et al., 2014).

The key component is the ethical dilemma and what is perceived as ethically problematic by the public and in public discourse. Technological innovations tend to create tempting situations and new possibilities that we have not encountered before and that may fall outside our current set of moral principles. These may raise new ethical concerns and moral dilemmas that we cannot solve with society's basic sets of moral values (Kizza, 2013; Wright et al., 2014). This makes the implementation of technological innovation into society a complex and unpredictable process.

I have briefly introduced the ethics concept here with the purpose of creating a better understanding when the dominant ethical concerns and issues are presented. When presenting them, I will not “apply” substantive ethical principles to judge these concerns. As alluded to above, I will show and clarify what is at stake in the various concerns and issues, from the various angles and perspectives found in the available literature.

B. Algorithm

Algorithms are used for a wide array of applications and a wide range of problems. They are important to our technological age and essential to computer work. Some algorithms can have a huge impact on human lives. They autonomously perform tasks we never expected they could do without a human in charge. What is more, the same algorithm can serve multiple purposes: a selection algorithm used for selecting films, for example, can also be useful in cancer research (Jaume-Palasí & Spielkamp, 2017). So understanding the nature (what it is) and the effects (how they work) of algorithms is important to fully grasp their power, influence and consequences. One cannot discuss ethical concerns with algorithms in depth without a sound understanding of the concept, specifically as it is used in the context of this thesis.

In daily speech, algorithm is often used as a synonym for code, software, computers, computer programs or machines. Most users tend to treat algorithms as unproblematic tools that one uses for solving a problem or finding an answer (Gillespie, 2014). There are many definitions in circulation. To practitioners, the algorithm is what they design and implement, “the thing that gets data processing and other computation done” (Hill, 2016, p. 35). Basically, algorithms are sets of instructions to perform a task, transforming input data into a desired output (Doneda & Almeida, 2016; Gillespie, 2014). Hill defined an algorithm as “a finite, abstract [so no space-time locus], effective [so requiring no judgement, learning, insight or understanding], compound control [so from one state to another] structure [so organized; steps under partial order], imperatively given [so a how-to; gives directions or orders], accomplishing a given purpose under given provisions” (Hill, 2016, p. 47). The important subject here is the instructions (the ordering of an action or issuing of a command), not the execution. And Hill adds that something is not an algorithm when the set of instructions can be said to be followed well or badly. An algorithm cannot be just right or wrong in its application, but rather appropriate or inappropriate, relevant or misguided. Algorithms have two characteristics: (1) they do exactly what you ask them to do; and (2) they do not explain their outcomes, so they are black boxes (Luca, Kleinberg, & Mullainathan, 2016). So when an algorithm acts unexpectedly, this can be due to human miscoding or other errors. According to Kitchin (2017), an algorithm has two components: logic and control. Logic is the problem domain-specific component and specifies what is to be done; control is the problem-solving component and describes the steps by which it should be solved.
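To make this anatomy concrete, here is a minimal sketch of my own (in Python; the risk-classification scenario and the threshold are hypothetical, not taken from the cited sources). The logic is the domain-specific rule, the control is the ordered steps that apply it:

```python
# Minimal sketch of an algorithm as a finite set of instructions that
# transforms input data into a desired output (illustration only).
def classify_customers(customers, threshold=0.5):
    """Logic: a customer is 'high risk' if their score exceeds `threshold`.
    Control: iterate over the inputs, apply the rule, collect the outputs."""
    results = {}
    for name, score in customers.items():  # control: the ordered steps
        results[name] = "high risk" if score > threshold else "low risk"  # logic
    return results

# The algorithm does exactly what it is asked, no more and no less; note that
# the choice of `threshold` is a human, value-laden decision.
print(classify_customers({"alice": 0.9, "bob": 0.2}))
```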

Algorithms are not static but are regularly updated and ‘tweaked’, with major updates happening less frequently (Gillespie, 2014). Their designers can easily, instantly, radically and also invisibly change an algorithm. And an algorithm can be part of another algorithm, so that in practice we may be dealing with many algorithms when we refer to ‘an algorithm’.

Algorithms are not neutral: for companies they need to create value and capital, nudge user behavior in preferred ways, or sort and classify people. The way they work is not impassive (Kitchin, 2017). They take all kinds of decisions. According to Kitchin (2017), algorithms search, collate, group, match, sort, profile, analyze, model, simulate, categorize, visualize and regulate people, processes and places. Latzer, on the other hand, looked at internet-based applications and distinguished nine different purposes in a functional typology of algorithmic selection applications ([Latzer (2015)] in Saurwein, Just, & Latzer, 2015), the types being: search, aggregation, observation/surveillance, prognosis/forecast, filtering, recommendation, scoring, content production and allocation. Diakopoulos (2016) distinguishes four functions: prioritization (ranking information or results based on pre-defined criteria), classification (categorizing information into classes based on features), association (creating relationships between entities) and filtering (including or excluding information based on various rules or criteria). In Lepri, Staiano, Sangokoya, Letouzé, & Oliver (2016), the authors adapted the ideas of Diakopoulos and Latzer into Table I below.

Table I. Algorithmic functions and examples. Source: (Lepri et al., 2016).
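For illustration, each of Diakopoulos's four functions can be expressed in a few lines. The sketch below is my own, on invented toy news items, and is not taken from Lepri et al.:

```python
# Toy illustrations of the four algorithmic functions (hypothetical data).
items = [
    {"title": "A", "clicks": 120, "topic": "sports"},
    {"title": "B", "clicks": 340, "topic": "politics"},
    {"title": "C", "clicks": 45, "topic": "politics"},
]

# Prioritization: rank results based on a pre-defined criterion (clicks).
ranked = sorted(items, key=lambda it: it["clicks"], reverse=True)

# Classification: categorize items into classes based on a feature.
classes = {it["title"]: "popular" if it["clicks"] > 100 else "niche" for it in items}

# Association: create relationships between entities (items sharing a topic).
pairs = [(a["title"], b["title"]) for a in items for b in items
         if a["title"] < b["title"] and a["topic"] == b["topic"]]

# Filtering: include or exclude items based on a rule.
filtered = [it for it in items if it["topic"] != "politics"]

print(ranked, classes, pairs, filtered, sep="\n")
```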

In line with Gillespie (2014) and Sandvig, Hamilton, Karahalios, & Langbort (2016), I will assume for this thesis, as computer scientists do, that an algorithm exists independently and can be analyzed as a method, separately from the particular computer system upon which it may be implemented and the data it works on. Choosing an algorithm is indeed making an explicit choice. This will enable me, later on, to find the ethical issues raised by the modeling of the algorithm itself; the modeling of algorithms requires expertise, judgment, choice and constraints. Thus I will follow Hill's (2016) formal definition of an algorithm, as described above. An algorithm is meaningless without the data it works on. To take action and have effects, an algorithm must also be implemented and executed; actually, the model is meaningless without a use case. Discussing and judging the ethics of an algorithm by just looking at its composition is likewise meaningless (Kitchin, 2017; Sandvig et al., 2016), without taking into account the particular inputs and potential outputs, the consequences, the data it is applied on, and how the algorithm is implemented and executed in software, programs and information systems. That algorithms learn from historical cases and from the system's operators is likewise important in judging the ethics of the system (Sandvig et al., 2016).

For this thesis, the algorithms in scope are not defined in great detail. Referring to algorithms, I consider algorithms that can be operated via digital computers, that can interact with or directly impact the individual user, and that thus have the ability to affect large numbers of individuals while making socially consequential decisions; for example, but not limited to, the types and examples of algorithms mentioned in Table I. Not in scope are, for example, industrial algorithms optimizing industrial processes, such as optimizing maintenance work or performing product quality control.

C. Machine Learning / AI

Machine learning is “any methodology and set of techniques that can employ data to come up with novel patterns and knowledge, and generate models that can be used for effective predictions about the data” (cited [Van Otterloo, 2013] in Mittelstadt et al., 2016, p. 3). In his blog, Hardt (2014) describes a machine learning algorithm as “any algorithm that takes historical instances (so-called training data) of a decision problem as input and produces a decision rule or classifier that is then used on future instances of the problem”. Machine learning algorithms are good at predicting things. They are designed to pick up statistical patterns from historical data and can inform decisions that hinge on a prediction, as long as the thing that needs to be predicted is clear and measurable (Kleinberg, Ludwig, & Mullainathan, 2016). Machine learning can define or modify decision-making rules autonomously. The model learns and updates either via hand-labelled example inputs, to find a general rule that maps inputs to outputs (supervised learning), via making sense of and finding structure in the sets of data inputs itself (unsupervised learning), or via reinforcement learning, where the model interacts with a dynamic environment and is provided reward and punishment feedback while navigating the problem. The learning first involves existing data sets that are prepared as training data. This training data behaves in a certain way and impacts the lessons the model happens to learn, which may lead to a discriminatory model. According to Barocas and Selbst (2016), this can come from two different things: first, when prejudice has played some role in the example training data, the model may reproduce this prejudice; second, when the example training data is a biased sample of the population, any decision that rests on inferences drawn from this sample “may systematically disadvantage those who are under- or overrepresented in the data set” (p. 681).

To validate the model, a test set is next prepared with data the model has not seen yet. During deployment, the learning and updating of the algorithm and model further involves new live data streams. One can imagine that if the training and/or test data sets, for example, reflect existing social biases against a disadvantaged group in the input, the algorithm is likely to incorporate these biases in the output. This will depend on the correlation between the target variable and the input feature: when there is no correlation, including the biased input feature will not change the output, and thus there is no biased effect. When a model has a large number of features as input, it will determine the extent to which a feature reflecting existing social bias against a disadvantaged group is relevant to the target variable, and will decide whether or not this feature informs the output (Barocas & Selbst, 2016).
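A small simulation can make this point tangible. The sketch below is my own, on synthetic data (the group names, sample sizes and score model are invented): the overall training error looks acceptable because the underrepresented group barely contributes to it, while per-group test error reveals the disparity:

```python
# Synthetic demonstration: a fixed decision rule evaluated on a sample in
# which group "b" is underrepresented and measured with less informative data.
import random
random.seed(0)

def make_person(group):
    qualified = random.random() < 0.5
    # The recorded score is far less informative for group "b"
    # (e.g. a thinner data history), a form of measurement bias.
    informativeness = 1.0 if group == "a" else 0.3
    score = (0.5 + (0.3 if qualified else -0.3) * informativeness
             + random.uniform(-0.15, 0.15))
    return {"group": group, "score": score, "qualified": qualified}

train = [make_person("a") for _ in range(900)] + [make_person("b") for _ in range(100)]
test = [make_person("a") for _ in range(500)] + [make_person("b") for _ in range(500)]

def decide(p):
    return p["score"] > 0.5  # the "model": a single global threshold

train_err = sum(decide(p) != p["qualified"] for p in train) / len(train)
print(f"overall training error: {train_err:.1%}")  # looks fine: group b is only 10%

for g in ("a", "b"):
    members = [p for p in test if p["group"] == g]
    err = sum(decide(p) != p["qualified"] for p in members) / len(members)
    print(f"test error, group {g}: {err:.1%}")  # group b fares markedly worse
```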

The complexity for humans of understanding the inner workings of an algorithm has increased with the growing use of machine learning techniques. Since a machine learning algorithm learns from the data, it can rearrange and morph itself, to a point where humans can hardly understand or explain the inner workings of the algorithm, and how it arrived at its decision, any longer (Doneda & Almeida, 2016; Van Lier, 2016).

II. Methodology

Without the data sets algorithms work on, algorithms are meaningless machines (Gillespie, 2014). Data and algorithms act together in processes; it is the algorithm that gives the data purpose and direction. “Critical decisions are made not on the basis of the data per se, but on the basis of data analyzed algorithmically: that is, in calculations coded in software” (Pasquale, 2016, p. 21). And by adding “failing clear understanding of the algorithms involved – and the right to challenge unfair ones – disclosure of underlying data will do little to secure reputational justice” (p. 22), Pasquale clearly demonstrates the interdependency and association between algorithms and (big) data. So “an algorithm is only as good as the data it works with” (Barocas & Selbst, 2016, p. 671).

Data analytics is defined by Mittelstadt et al. (2016, p. 3) “as the practice of using algorithms to make sense of streams of data”. Analytics illuminates how algorithms can challenge human decision making, even for processes and decisions previously made by humans. To perform the task, algorithms determine which features in the data are relevant to a given target variable. The number of features considered by the algorithm can easily run into the tens of thousands (Mittelstadt et al., 2016). So a type of task previously done by humans, say classifying an insurance customer into a risk class, is replaced by an algorithm doing the same type of task, but with a qualitatively different decision-making logic.

Discussing and judging the ethics of an algorithm by just looking at its composition is meaningless without taking its context into account. I propose to use the CRISP-DM process framework to map the prevalent ethical issues and concerns about algorithms. CRISP-DM distinguishes the six phases of the life cycle of a standard data mining (data analytics) process, and it is a widely accepted methodology. Organizations might in practice not use CRISP-DM for their data mining activities but some other methodology; the steps to go through in a data mining project, though, are similar. Some models or organizations might aggregate two steps into one, such as data understanding & preparation, or modeling & evaluation, but the activities still need to be done. The use of CRISP-DM here is intended as a prescriptive and practical framework, not a theoretical one. It will be interesting to see how the ethical concerns and issues about algorithms fit the phases of the data mining process. The goal of linking the concerns to the CRISP-DM steps is to create insight into where in the data mining process ethical concerns may arise. These insights may help to identify opportunities to mitigate the behavior leading to the concerns, and to propose adequate solutions.

A. The CRISP-DM

The Cross-Industry Standard Process for Data Mining (CRISP-DM) (Larose, 2004) was developed by analysts from Daimler-Chrysler, SPSS and NCR, and is freely available to use. CRISP-DM distinguishes the six phases of the life cycle of a standard data mining (data analytics) process: business/research understanding, data understanding, data preparation, modeling, evaluation and deployment. Larose describes data mining as the process during which useful trends and patterns are discovered in large data sets. So both the data sets and the modeling and deployment of algorithms are integral parts of a typical data mining process. CRISP-DM puts the relationship between the data and the algorithms (the modeling) neatly in context.
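As a sketch of how the linking exercise in this thesis could be operationalized inside a pipeline (my own illustration; the phase names are CRISP-DM's, but the checkpoint questions are examples I chose, not a published standard):

```python
# Attaching an ethics checkpoint to each CRISP-DM phase (illustrative only).
CRISP_DM_PHASES = [
    ("business understanding", "Does the problem definition affect groups differently?"),
    ("data understanding", "Is the data a biased sample of the population?"),
    ("data preparation", "Do labels or proxy features encode past prejudice?"),
    ("modeling", "Which features drive the model's decisions, and why?"),
    ("evaluation", "Are error rates acceptable per subgroup, not just overall?"),
    ("deployment", "Can affected individuals contest an outcome?"),
]

def run_phase(phase: str, question: str) -> None:
    """Placeholder for the real work of a phase; here it only surfaces the
    ethics question that should be answered before moving to the next phase."""
    print(f"[{phase}] ethics checkpoint: {question}")

for phase, question in CRISP_DM_PHASES:
    run_phase(phase, question)
```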


Figure 2. CRISP-DM process (Larose, 2004).

As illustrated by the arrows in Figure 2, there are dependencies between the different phases: a next phase often depends on the outcomes of the previous phase. Take for example the modeling phase: depending on the way the model behaves and on its characteristics, one might have to return to a previous phase before moving on, such as the data preparation phase, to bring the data into line with specific model requirements. Human involvement is essential in every phase of a data mining process, from the design to the ongoing functioning (Diakopoulos & Koliska, 2017). Different functions might even play a role: for the task of understanding the business objective and formulating it into a data mining process, one needs specific capabilities, different from those of the person developing the model and choosing or building the (data mining) algorithms. Essentially, modeling an algorithm is building a simplified version of the world, normally using data and probably some process that ranks, classifies or predicts. Key to any model is the input scheme: the features to be used by the algorithm. So with different steps, strong interdependencies between steps and active involvement of humans in each step, it is easy to grasp that a data mining process can very easily be done badly and that errors can be made in each step. Such errors can have a disproportionately disparate impact on disadvantaged groups in society. These errors can be created “by specifying the problem to be solved in ways that affect classes differently, failing to recognize or address statistical biases, reproducing past prejudice, or considering an insufficiently rich set of factors” (Barocas & Selbst, 2016, p. 675). Even unintentionally, the model could pick out proxy features for disadvantaged groups. Barocas and Selbst additionally note that designers and data scientists have the ability to obfuscate intentional discrimination as accidental.

III. Perspectives on most reported concerns

Looking at the effects of algorithms, several experts in the (Rainie & Anderson, 2017) canvassing mentioned “the many positive impacts that aren't even noticed” (p. 31): they will make life easier, bringing “better shopping experiences, better medical experience, even better experiences with government agencies” (p. 32). But most experts also point out dark sides and threats. As one expert representatively wrote: “The overall effect will be positive for some individuals. It will be negative for the poor and uneducated. As a result, the digital divide and wealth disparity will grow. It will be a net negative for society” (p. 65). Like these experts, several other researchers and scholars have identified and discussed ethical issues and concerns about algorithms. To name a number of the broad range of recent studies: Wakunuma and Stahl (2014) looked at ethical issues in the field of information systems; Bozdag (2013) analyzed biases present in filtering processes; Mittelstadt et al. (2016) reviewed the current discussion of ethical aspects of algorithms; Zuiderveen Borgesius et al. (2016) analyzed whether empirical evidence exists to support concerns about filter bubbles; Dörr and Hollnbuchner (2017) discussed the ethical challenges of algorithmic journalism; Jaume-Palasí and Spielkamp (2017) introduced a taxonomy to discuss ethical criteria for technology; Saurwein et al. (2015) took a risk view on algorithmic selection; Ananny (2016) examined ethical dimensions of networked information algorithms; Citron and Pasquale (2014) discussed algorithmic scoring and the variety of (ethical) problems it brings about; Sula (2016) discussed the challenges for research ethics introduced by big data; Cantrell, Salido, and Van Hollebeke (2016) discussed why organizations should embrace data ethics; and Sandvig et al. (2016) investigated how data scientists can determine whether an algorithm is unethical.

In 2016 Stahl and others (Stahl et al., 2016) published what they called the first systematic and comprehensive review of the literature on the ethics of computing. They reviewed publications published between 2003 and 2012, and finally included 599 papers in their study. Organized into five main categories, they found that over the research period the type of ethical issues discussed was generally consistent, as shown in Figure 1 below.

Figure 1. Five main categories of ethical issues, development over time (Stahl et al., 2016).

At the start of 2017 Pew Research Center published “Code-Dependent: Pros and Cons of the Algorithm Age” (Rainie & Anderson, 2017). The researchers conducted a large-scale canvassing of technology experts, corporate practitioners, scholars and government leaders to shed light on current views on the potential impact of algorithms. The experts were asked the question: “Will the net overall effect of algorithms be positive for individuals and society or negative for individuals and society?” (p. 5). Nearly all experts indicated some negative aspects of algorithmic functions. The overall predictions were 38% versus 37% versus 25%: the first group predicting that the positive aspects will outweigh the negatives, the second predicting that the negative aspects will outweigh the positives, and the remaining 25% predicting an equal balance of positive and negative aspects (p. 5). The reported concerns, although presented differently and not uniquely focused on ethics, are quite similar to the ethical issues Stahl and others found in their literature survey. Combining the findings of the two studies, one may infer that the type of ethical issues and concerns discussed in research has remained relatively static over the last 14 years. Building on these two papers, I will elaborate on the dominant ethical concerns in algorithms. I will also include the perspectives given in other current literature on these top ethical concerns and issues. In the available literature, the ethical concerns and their consequences are often blended. Based on what I found in the literature, combined with the findings of the two studies mentioned above, five ethical themes emerged. Below, I will discuss these themes in more depth.


A. Privacy and data protection

It is not surprising that privacy was found in the (Stahl et al., 2016) survey to be the dominant ethical concern and concept discussed. Privacy can be seen as a moral right of individuals, often related to concepts such as autonomy and freedom. With regard to computing technologies, two types of privacy can be distinguished: data privacy and personal privacy. Data privacy, obviously, concerns the data about a person himself, the control a person has over his personal information and informational representations, and the right not to have his data collected. Personal privacy includes forms of privacy other than data, such as the “right to be left alone” and the division between public and private spaces. Personal privacy and data privacy are closely interrelated and in debates are often discussed with a focus on the right of the individual (Jaume-Palasí & Spielkamp, 2017). Algorithmic decision-making processes, though, make decisions based on group characteristics. The two concepts (personal and data privacy) get blurry in the debates, and it is difficult to really separate the privacy-only concerns. In the (Rainie & Anderson, 2017) canvassing, privacy did not show up as a separate concern, but several experts expressed the fear that humanity could be lost when people put too much trust and faith in data and algorithmic decision-making processes.

One of the issues considered in the literature is the digital format of data. This digital format makes it less likely that data will disappear over time: even when users delete information, the data may still remain in data sets that were collected earlier (Sula, 2016). This timeless availability of digital data leads to a (perceived) loss of control over their data for individuals. As more and more data is shared and multiplied digitally, more and more data will be included in algorithmic decision-making processes, and it will be harder to protect privacy. When data from different sources is combined, this allows for new inferences about private information, potentially inferences an individual would never disclose himself. For algorithms to function well, a lot of information about individuals is needed, which requires a lot of information sharing with the marketplace. An expert in the (Rainie & Anderson, 2017, p. 53) canvassing expressed a big downside: “It may be that, in time, people are – in practical terms – unable to opt out of such marketplaces.” He fears that to receive services the old-fashioned way, an individual might need to pay a premium: “[T]here's a danger that we sleepwalk into things without realizing what it has cost us” (p. 53).

Privacy is often discussed together with trust. Trust is a characteristic of the relationship between two or more parties (Stahl et al., 2016). There is a mutual understanding and confidence that the other party will use the algorithmic decision-making system appropriately and process data in an agreed and acceptable manner. Trust is given while accepting the risk that it may be betrayed. When users trust data processors, this helps diminish concerns about the opaqueness of algorithmic decision-making (Mittelstadt et al., 2016). Users may even perceive algorithms as trustworthy without knowing who the data processors are or ever having placed any trust in them. While interacting in the online world, it is difficult to build trusting relationships: with every visit to a website new relationships are created, and users interact directly with the algorithmic decision-making process. Knowingly or unknowingly, users interact with, or are subject to, curating algorithms. Without any transparency, users simply must trust the algorithm to act properly and not, for instance, skew the results it delivers. Several experts in the (Rainie & Anderson, 2017) canvassing doubt that the companies creating and operating the algorithms will act in the interest of the users rather than serving their own interests first. As one expert expressed: “It is difficult to imagine that the algorithms will consider societal benefits when they are produced by corporations focused on short-term fiscal outcomes” (p. 44). So, though there is no such thing as complete privacy, since people will share information with others, individuals need to be in control of when and in which context they are prepared to share information and when not.

Another privacy-related concern is the online ‘freely available’ data sets. The opportunity to ‘scrape’, or automatically collect, data from the web, without the web developer knowing it or the web user's consent, is a concern and a challenge (Krotoski, 2012). Not all algorithms use personal data; some use only non-personal data. And not all data collected or released on the web is controversial. But scraped data that initially seems innocuous may cause harm when integrated with other data sources (Booch, 2014). Unexpected personal information may be exposed or identifiable individual profiles might be generated. Furthermore, the data might be analyzed outside the context and purpose for which it was collected.


Pseudonymization, the process of removing personally identifiable information from collected data, and anonymization, when all direct and indirect identifiers have been removed, have long been the holy grail for organizations that want to operate on data sets while mitigating risks and dealing with fewer legal requirements (Polonetsky, Tene, & Finch, 2016). Although advances have been made in algorithmic anonymizing techniques, it is more and more feasible to undermine these techniques and infer identities from anonymized data (Lepri et al., 2016). Sula (2016, p. 19) warns that “what is anonymous today may become personally identifiable tomorrow based on integration with new data sets and the introduction of new analytical methods”.
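To clarify the distinction, below is a minimal sketch of my own (the secret key, field names and record are hypothetical) of pseudonymization via a keyed hash. The remaining quasi-identifiers can still enable re-identification when joined with other data sets, which is exactly the risk described above:

```python
# Pseudonymization: replace a direct identifier with a stable keyed hash so
# records can still be linked without exposing names. This is NOT anonymization:
# quasi-identifiers (zip code, birth year) may still allow re-identification.
import hashlib
import hmac

SECRET_KEY = b"store-and-rotate-this-separately"  # hypothetical key

def pseudonymize(identifier: str) -> str:
    """Map a direct identifier to a stable, non-reversible pseudonym."""
    return hmac.new(SECRET_KEY, identifier.encode(), hashlib.sha256).hexdigest()[:16]

record = {"name": "Jane Doe", "zip": "1011", "birth_year": 1985, "diagnosis": "X"}
safe_record = {**record, "name": pseudonymize(record["name"])}
print(safe_record)
```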

B. Autonomy

Autonomy “is the ability to construct one's own goals and values, and to have the freedom to make one's own decisions and perform actions based on these decisions” (cited [Brey 2005] in Stahl et al., 2016, p. 55:23). In Western society autonomy is an important value for people's self-determination, often strongly related to free choice and control. These aspects of free choice and control in decision-making mean that we bear responsibility for the consequences of our actions.

When discussed in relation to technology and algorithmic decision-making processes, the impact on the individual's autonomy can be seen as both positive and negative (Stahl et al., 2016). When algorithms autonomously decide which film one should see or what news content one is shown, that may feel like, and lead to, paternalism, loss of control and dependence. An individual might experience these algorithmic processes as having a negative influence on his autonomy. At the same time, algorithmic processes may increase independence and enable human enhancement: one might become a better doctor by using knowledge offered by curating algorithmic processes when making a diagnosis.

One may discuss autonomy from both the angle of technology, i.e. the capability of autonomous decision-making, and that of the individual, i.e. how it affects the individual's (perceived) autonomy. The complexity for humans of understanding the inner workings of algorithmic processes will increase with the growing use of machine learning techniques. Since a machine learning algorithm continues to learn from the (new) data, it can rearrange and morph itself. Due to this growing degree of autonomy and the tendency towards centralization of decision-making, it can be challenging for humans to predict how inputs will be handled, or to understand how decisions were made and how they will impact them. The algorithms respond to inputs independently and may even stray from the guidelines and assumptions initially given by their designers. And no human can monitor or influence these changes, or determine any longer whether they are legal and ethical (Etzioni & Etzioni, 2016).

It is this opacity that may negatively impact the user's perceived autonomy and freedom of choice. It will also make it much harder for individuals to stay true to their own values when using these self-learning algorithmic systems. In (Rainie & Anderson, 2017) a professor of computer science at George Mason University gave a crystal-clear example (p. 75) of an individual driving a self-driving car, which obviously has collision-avoidance and risk-mitigation algorithms: “Suppose a pedestrian crosses in front of your vehicle. The embedded algorithm may decide to hit the pedestrian as opposed to ramming the vehicle against a tree because the first choice may cause less harm to the vehicle occupants. How does an individual decide if he or she is OK with the myriad decision rules embedded in algorithms that control your life and behavior without knowing what the algorithms will decide? This is a non-trivial problem because many current algorithms are based on machine learning techniques, and the rules they use are learned over time. Therefore, even if the source code of the embedded algorithms were made public, it is very unlikely that an individual would know the decisions that would be made at run time.” Upon deciding to drive a car, an individual can well enough assess the car, as to whether he will get to his destination safely and comfortably, without understanding the inner workings of the car engine. Upon deciding to use a self-driving car, though, the individual cannot properly assess the ethical decision rules embedded in the algorithms, nor can he test how these fit his own values and ethical decision rules. It is not just the self-driving car that behaves as a black box to its user; most curation algorithms act like black boxes. And it is not only his technical illiteracy that prevents the user from understanding the details and renders the algorithmic process opaque. This effect may also come from the scale of the algorithmic application, where, either due to the machine learning capabilities in the algorithms or to the number of designers involved in designing the process, the algorithms become opaque even to the designers (Etzioni & Etzioni, 2016).


With regard to algorithmic processes dedicated to individuals, Jaume-Palasí and Spielkamp (2017) distinguish “self-directed services” (such as fitness trackers, games or music apps) from “services that relate to individuals but are used by third parties” (e.g. scoring or support systems used to make a decision on granting a visa) (p. 13). Using self-directed services, individual users have a certain degree of control over the services. Many self-directed services, though they may be sensitive, one can do without: one can use an alternative or decline the use of the service. For these services, users have freedom of choice, which impacts their level of autonomy positively. On the contrary, services that relate to individuals but are used by third parties are rarely services one can freely choose to use. These kinds of services are used in both the public and the private sector, and will be obligatory when, for example, applying for a visa, social benefits or a loan. So there is no freedom of choice for users, which impacts their level of autonomy negatively.

The loss of autonomy directly resonates with concerns of dependency and control (Wakunuma & Stahl, 2014). The loss of autonomy and freedom of choice will lead to dependency on technology. “Such dependency may be a catalyst for control because the technology is then able to anticipate and make decisions on behalf of the user” (p. 390). And the users will passively follow.

C. Digital Divides, Disparate Treatment

When automated decision-making processes step in and start making the decisions about an individual's life, using algorithms and eventually artificial intelligence, this raises ethical concerns such as discrimination, exclusion of disadvantaged groups, societal divisions, social stigmatizing and the narrowing of choice. The inability to participate in social interaction or economic activities enabled by computing technologies can be a significant ethical concern (Stahl et al., 2016). Not having ready access to computers and the internet may affect individual and collective life chances. Causes of the digital divide can be found in types of inequality brought about by socioeconomic status, age or education.

The (Rainie & Anderson, 2017) canvassing found two connected lines of thinking with regard to societal divisions. First, that algorithmic processes will widen the gap between those who have the capabilities to deal with these technologies and those who do not. Second, that algorithms will deepen the social and political divisions, since they will “encourage people to live in echo chambers of repeated and reinforced media and political content” (p. 63).

When data is used predictively to assist decision-making, sorting and/or selecting algorithms are probably involved, which results in deciding on winners and losers. If the data mining is not done carefully, this can result in disproportionately disparate effects for disadvantaged groups in society, in ways that look quite similar to discrimination (Barocas & Selbst, 2016). The winners will succeed further, following past successes; the disadvantaged will suffer from cascading disadvantages (Pasquale, 2016). The professionals, the model designers, have to convert a (sometimes amorphous) business problem into a data mining problem definition, expressed in formal language that computers can parse. Next they have to define the outcomes of interest, i.e. the target variable, the specification of which is frequently not obvious (Barocas & Selbst, 2016). Especially when defining the target variable means creating classes, this process involves arbitrary decisions, since the professional must translate classes into measurable outcomes. Next, the existing examples in the data sets need to be labeled with a class, and one needs to exercise judgment to decide which of the available labels best fits the particular case. This labeling can be controversial, with different choices of class having different impacts on disadvantaged groups. According to Barocas and Selbst (2016), subjective labeling of examples will skew the resulting findings and will influence any decision taken from them. Future cases, as continuous inputs for the machine learning algorithm, will be characterized in the same way.

Basically, algorithms do not allow for individual perspective; they are impersonal and based on large data sets and generalized assumptions. Predictive modeling, specifically, is based on statistical analysis. Algorithms look for correlations in the data, and an observed correlation between two variables does not necessarily mean that there is a causal relationship between them. Making decisions on the results as though a found correlation equals causation will lead to wrongful or unfair decisions, possibly harming the individual user. Actually, algorithms are not interested in the individual, but rather in the statistical person. Algorithms rely on data sets based on non-random samples, since the data sets do not include data on the whole population. So there is a gap between the individual user and the statistical person. As one of the experts in the (Rainie & Anderson, 2017, p. 60) canvassing wrote: “Real life does not always mimic mathematics. Algorithms have a limited number of variables, and often life shows that it needs to factor in extra variables” and “A person may be otherwise a good person in society, but they may be judged for factors over which they do not have any control.” Algorithms produce probabilistic results (Rainie & Anderson, 2017). Depending on the purpose of the algorithmic process, this might be inappropriate when accountable results are expected, as in automated processes selecting jurisprudence.
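The gap between correlation and causation is easy to demonstrate. The sketch below is my own, on purely synthetic data: two variables that never influence each other correlate strongly because both are driven by a hidden confounder, so a decision rule built on one to “predict” the other would rest on no causal link at all:

```python
# Two variables with zero causal relationship, driven by a shared confounder.
import random
random.seed(1)

n = 10_000
confounder = [random.gauss(0, 1) for _ in range(n)]
x = [c + random.gauss(0, 0.5) for c in confounder]  # e.g. an observed proxy
y = [c + random.gauss(0, 0.5) for c in confounder]  # e.g. the outcome of interest

def pearson(a, b):
    """Plain-Python Pearson correlation coefficient."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((ai - ma) * (bi - mb) for ai, bi in zip(a, b))
    norm_a = sum((ai - ma) ** 2 for ai in a) ** 0.5
    norm_b = sum((bi - mb) ** 2 for bi in b) ** 0.5
    return cov / (norm_a * norm_b)

print(f"correlation(x, y) = {pearson(x, y):.2f}")  # ~0.8, with no causal link
```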

Once the data is in the public or semi-public sphere, it will be hard to recapture. Even data that has been deleted by the person himself might already have been included in one or more data sets, and will still be used in algorithmic decision-making processes. So individuals must be careful, and there is a real threat that any mistakes made, leaving evidence online, will have consequences for a (very) long period. An expert in the (Rainie & Anderson, 2017) canvassing called to mind the difficulties stemming from automated algorithmic credit scoring in the US. The credit scoring models were, in addition, used for purposes far removed from loan decisions, such as employment decisions. Private information was leaked, intentionally or by negligence, to organizations that did not have the best interests of their customers at heart. For a customer it was very difficult and time-consuming to correct any data on file or in the data set. If faulty data cannot be corrected, the algorithmic decision-making will lead to wrongful outcomes, potentially stigmatizing individuals by segmenting them into unfit buckets from which they cannot escape. Certainly this is not a consequence of the algorithmic decision-making process in itself, but follows from the choices and decisions the humans involved make. As (Barocas & Selbst, 2016, p. 692) point out, “any form of discrimination that happens unintentionally can also be orchestrated intentionally”, such as only taking into account variables that will result in higher rates of incorrect calculations for individuals of disadvantaged groups. In acting this way the decision makers find their justification to write off complete groups of individuals, not by engaging in open discrimination but by masking these efforts. Obviously this behavior harms both the individual user of the disadvantaged group and society overall.
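One way such effects can be surfaced is a simple per-group error audit: computing how often a decision rule is wrong for each group reveals when a chosen set of variables performs systematically worse for one of them. The records, group labels and threshold below are invented; the sketch illustrates the auditing idea only, not a method from the cited literature.

```python
# Hypothetical audit: per-group error rates of a scoring rule.
# Records are (group, true_outcome, model_score); all values invented.
records = [
    ("A", 1, 0.9), ("A", 0, 0.2), ("A", 1, 0.8), ("A", 0, 0.3),
    ("B", 1, 0.4), ("B", 0, 0.6), ("B", 1, 0.3), ("B", 0, 0.7),
]

THRESHOLD = 0.5  # assumed decision cut-off

def error_rate(group: str) -> float:
    """Share of wrong decisions (score vs. true outcome) within a group."""
    rows = [(y, s) for g, y, s in records if g == group]
    wrong = sum(1 for y, s in rows if (s >= THRESHOLD) != (y == 1))
    return wrong / len(rows)

for group in ("A", "B"):
    print(f"group {group}: error rate = {error_rate(group):.0%}")
# group A: error rate = 0%   -> the chosen variables work well for A
# group B: error rate = 100% -> the same variables fail group B
```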

Another concern is the power imbalance between the professionals deploying the curating algorithms and the individuals using them. When companies have full access to users’ data and transactions, they might even know an individual better than he knows himself.


Algorithms are often opaque to the users, which might be due to a lack of technological literacy or a consequence of seamless design (Eslami et al., 2016). Seamless design makes the process effortless for the user, but at the same time less transparent. It is doubtful that users can fully assess and anticipate the implications of using the curating algorithms. This remains doubtful even when they are asked to sign a consent form before using an algorithmic service, regardless of the language used in the form. It is complicated for a user to grasp the idea, and the consequences, that apparently innocuous data, combined for lots of people, may create new sensitive facts, and that companies will act upon those new facts.
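The point that innocuous data can combine into a new sensitive fact can be shown with a toy inference rule. The user id, data and rule below are invented (loosely inspired by widely reported retail cases); real systems infer such facts statistically rather than through hand-written rules.

```python
# Two separately 'harmless' data sets about the same hypothetical user.
purchases = {"user42": ["unscented lotion", "prenatal vitamins"]}
searches = {"user42": ["maternity leave policy"]}

def infer_pregnancy(user: str) -> bool:
    """Toy rule: combine individually innocuous signals into a
    sensitive inference the user never disclosed."""
    signals = purchases.get(user, []) + searches.get(user, [])
    markers = {"prenatal vitamins", "maternity leave policy"}
    return len(markers.intersection(signals)) >= 2

print(infer_pregnancy("user42"))  # True: a new, sensitive 'fact'
```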

D. Professionalism

The (Stahl et al., 2016) survey found professionalism to be a key concern. They also raised the question whether computing can be seen as a profession, stating responsibility towards the public, with attention to ethics, as a typical defining feature of a profession. So they regarded professionalism as an ethical issue “insofar as it has material consequences for the way that ethics of computing is dealt with in the real world” (p. 55:23). For my thesis I will cover professionalism from the angle of both the (professional) actors involved in the design and operation of algorithmic processes and the actual (non-professional) users of those processes.

Algorithmic decision-making processes are often not designed in isolation by just one individual professional; various actors are involved in the process from idea, via developing and building, to applying the decision-making systems (Jaume-Palasí & Spielkamp, 2017). Involved in the different stages are the business, which comes up with the new business proposition or innovative idea; the designer (or code writer), who translates the business requirements into the algorithm; and the data scientist, who works on the data and trains the algorithm on specific data only. Both the designer and the data scientist work on the further (and continuous) development of the algorithm during its lifespan. And then there are the actual users, towards whom the algorithm is directed and adapted. All these actors have an influence on the algorithmic process and bring their own biases. Consistent with this perspective, the (Rainie & Anderson, 2017) canvassing found among experts a line of thinking with regard to biases: algorithm designers have their own perspectives and normative values and take these along while building their algorithms, even when they strive for neutrality, objectivity and inclusion. Since all the actors named above have an influence, they all bear some part of the responsibility for these algorithmic decision-making processes. But what about that responsibility when an algorithm has flaws or makes wrong decisions that are harmful to individual users or society at large?

First concern: the professionals are the designers of the algorithms, who write the code, and the data scientists, who work on the data and train the algorithms. One of their important roles is to avoid harm to users and society caused by their algorithms. When working on the development of algorithmic processes they bring their own normative values and biases, stemming from their cultural background and socialization processes (Jaume-Palasí & Spielkamp, 2017; Mittelstadt et al., 2016; Sandvig, Hamilton, Karahalios, & Langbort, 2014). Quite a few experts in the (Rainie & Anderson, 2017) canvassing mentioned the limited diversity among designers and data scientists. I’ll give two examples: “The algorithms will be primarily designed by white and Asian men – with data selected by these same privileged actors – for the benefits of consumer like them” (executive director at the MIT Teaching System Lab, p. 12). “Built-in biases (largely in favor of those born to privilege such as Western Caucasian males, and, to a lesser extent, young south-Asian and east-Asian men) will have profound, largely unintended negative consequences to the detriment of everybody else: women, especially single parents, people of color (any shade of brown or black), the ‘olds’ over 50, immigrants, Muslims, non-English speakers, etc.” (anonymous expert, p. 57). As algorithms reflect normative values created by human designers around race, gender or other social justice related areas, these values will actually be institutionalized into the code (Citron & Pasquale, 2014; Mittelstadt et al., 2016). One should realize that an algorithm does exactly what the designer has programmed it to do. So if an algorithm is programmed with inequality, and the algorithm is used to decide on the likely outcome for a person, it will probably reinforce inequality. This could lead to controversial ethical issues, like individuals categorized in a non-fitting category being unnecessarily denied a loan, a flight, or ads for interesting jobs. As one expert in the (Rainie & Anderson, 2017, p. 59) canvassing stated: “Algorithms value efficiency over correctness or fairness, and over time their evolution will continue the same priorities that initially formulated them”. So this is a controversial issue.
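As a hypothetical illustration of values being institutionalized into code: a single hard-coded penalty on a proxy variable is enough to deny one of two otherwise identical applicants, and once shipped it is applied uniformly and invisibly. The variables, weights and thresholds below are invented for this sketch.

```python
# Hypothetical loan pre-screening rule. The postcode penalty is a
# normative design choice, but in production it looks like neutral code.
PENALIZED_POSTCODES = {"1103", "1104"}  # invented proxy for a disadvantaged area

def prescreen(income: float, postcode: str) -> bool:
    score = income / 1000.0
    if postcode in PENALIZED_POSTCODES:
        score -= 25.0  # the 'institutionalized' normative choice
    return score >= 40.0

# Two applicants with identical income, different postcodes:
print(prescreen(50_000, "1071"))  # True  -> approved
print(prescreen(50_000, "1103"))  # False -> denied, purely via the proxy
```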


In their choice of how to build and deploy the algorithmic process, the designers and data scientists gain an unfair position of power. The algorithm will be capable of influencing and moderating the behavior and decisions the individual user makes based on its output, without the user even knowing that their choices are manipulated. A user might well assume that the implicit ethical assumptions made by the designer are in line with his own ethical norms and values, where the contrary might be true (Kraemer, van Overveld, & Peterson, 2011). A user will then base his normative choices on wrong ethical assumptions. This is not too big of an issue when the stakes are low, as with a film recommendation. When the stakes are high, though, as in the case of predictive policing or credit scoring, more care is needed. Especially since the incentives of the designers, as company employees, are commercially rooted, and not necessarily aligned with the interests of the individual user or society. When the market pressure to outdo competitors is high, when companies transfer this pressure to their professionals, and when there is a possibility to dupe the user, there will be an incentive to use that possibility (Rainie & Anderson, 2017; Volkman, 2015). (Wakunuma & Stahl, 2014) showed in their study that computing professionals realize the relevance and importance of ethical issues, but mainly understand the ethical issues that are widely discussed in discourse and well-established in their daily practice. They also found that these professionals are foremost interested in the job and problems at hand and not as much in the ethical concerns the job might bring. They prefer resolution of acute problems above deeper ethical root-cause analysis of the problem (Pasquale, 2016; Wakunuma & Stahl, 2014). One might question whether this behavior will lead to responsible innovations and responsible algorithmic processes. As a solution, (Kraemer et al., 2011) are of the opinion that ethical decisions should be left as much as possible to the users. The users should be able to set the ethical preferences. When designers are forced to make controversial ethical assumptions when building the algorithm, at least these assumptions should be transparent and identifiable by the user.
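A minimal sketch of what leaving such decisions to users could look like in practice: rather than the designer silently fixing the trade-off between false positives and false negatives, the decision threshold is exposed as an explicit, named user preference. The presets and API below are an illustrative assumption, not a design prescribed by (Kraemer et al., 2011).

```python
# Hypothetical user-facing ethical preference: where to place the
# decision threshold of a risk model. A cautious user accepts more
# false alarms; a permissive user accepts more misses.
PRESETS = {
    "cautious": 0.3,    # flag earlier: fewer misses, more false alarms
    "balanced": 0.5,
    "permissive": 0.7,  # flag later: fewer false alarms, more misses
}

def decide(risk_score: float, preference: str = "balanced") -> str:
    """Turn a model's risk score into a decision, using the user's
    explicitly chosen (and therefore transparent) threshold."""
    threshold = PRESETS[preference]
    return "flag" if risk_score >= threshold else "pass"

score = 0.55  # assumed model output for some case
for pref in PRESETS:
    print(pref, "->", decide(score, pref))
```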

Some of the experts in the (Rainie & Anderson, 2017) canvassing take an overall more positive view. An associate professor at Carnegie Mellon University predicts that algorithmic processes will be a net positive for individuals and society. He predicts that in most situations not just a person or an algorithm will make the judgement, but that expert and algorithm will be combined to form a decision: “We have several thousand years of human history showing the severe limitations of human judgement. Data-driven approaches based on careful analysis and thought design can only improve the situation” (p. 35).

That brings me to the second concern: who will be responsible when decisions are wrong and harmful? Is there a responsibility for the actors designing, developing and operating the algorithmic decision-making processes? Or for the users of the algorithms? Or can responsibility be attributed to the algorithmic decision-making process itself? (Jaume-Palasí & Spielkamp, 2017) present that intentions control and motivate certain behavior, which leads to actions, and that the person showing this behavior and these actions bears responsibility to control his intentions. As alluded to above, all actors involved in the algorithmic process have an influence, so they all bear some part of the responsibility for these algorithmic decision-making processes. The opportunity to control or influence a certain step in the process means that, at least at a moral level, responsibility can be attributed, even if the person is not in control of the overall design and development. This is in line with our current society, where the one held to account is the ultimate decision-maker, and there are liability insurances to mitigate the risk: for example the driver of a vehicle, the manager deciding on business proposals, a doctor making a diagnosis. There is no solution yet for when algorithmic decision-making processes decide autonomously. To what extent will the (passive) passenger enjoying the ride in an autonomously driven vehicle be responsible when the vehicle collides with another (autonomously driven) vehicle or a pedestrian? An expert in the (Rainie & Anderson, 2017, p. 78) canvassing stresses the importance of legal reform: “The legal concepts around product liability closely define the accountabilities of failure or loss of our tools and consumable products. However, once tools enter the realm of decision-making, we will need to update our societal norms (and thus laws) accordingly. Until we come to a societal consensus, we may inhibit the deployment of these new technologies, and suffer from them inadvertently”. That there is no clarity yet is in itself an ethical concern.

Third: as alluded to above, the professionals involved in the design and development of algorithmic decision-making processes have a (powerful) influence on the behaviors and actions of the users of the algorithms. Obviously this power can be misused, and the scandals repeatedly occurring show this misuse in practice. One can question whether the real issue isn’t being missed: the professionals aren’t held to account, and still treat their innovations as merely technological. Besides moving fast, these professionals should also be thinking about the ethical implications, current and future, of their actions and innovations (Fung, 2015).

To minimize bias, professionals should look beyond the features of the system. They should envision not only the intended situation and context of use of the algorithmic process, but also reasonably anticipate probable contexts of use. Professionals should take diverse contexts of use into account when designing the algorithmic process (Bozdag, 2013).

Fourth: in the (Stahl et al., 2016) survey, education-related issues were named in the top 20. The specific educational concerns are broadly addressed in the available literature. More training and basic education for current and future professionals and users is regularly presented as an essential need for society. These decision-making models and data science are becoming more ubiquitous, and individuals and society at large should improve their knowledge of, and critical thinking on, how algorithms and models work, why they can be a threat, and how they may shape social change and opportunity. The complexity for humans of understanding the inner workings of an algorithm will increase with the growing use of machine learning (and more advanced) techniques. Since a machine learning algorithm learns from the data, it can rearrange and morph itself to a point where humans can hardly understand or explain the inner workings of this algorithm, and how it arrived at its decision, any longer (Doneda & Almeida, 2016; Van Lier, 2016). As a professor and director at the University of South Carolina School of Library and Information Science predicts: “That said, unless there is an increased effort to make true information literacy a part of basic education, there will be a class of people who can use algorithms and a class used by algorithms” (Rainie & Anderson, 2017, p. 75).

It is not only the user’s technical illiteracy, whereby his comprehension falls short of understanding the complexity and function of the algorithms, that renders the algorithmic process opaque. This effect may also stem from the scale of the algorithmic application, where, either due to the machine learning capabilities in the algorithms or to the number of designers involved in designing the process, the algorithms become opaque even to the designers (Etzioni & Etzioni, 2016).


E. Bias in data set

Bias is not an ethical concern specifically named in the (Stahl et al., 2016) survey. Bias is not a result in itself, but it influences the algorithmic decision-making process, whose results may subsequently cause harm to individuals. With regard to biases, the line of thinking the (Rainie & Anderson, 2017) canvassing found concerned the data sets themselves, which have their own limits and deficiencies. Algorithms depend upon data that is often limited, not representative, deficient, incorrect or biased, and such data sets are regarded as a serious problem. Data is the key driver in algorithmic decision-making processes, and thereby as valuable as the algorithms themselves. Several ethical concerns with regard to data and data sets can be found.

First: the data sets do not include data inputs from everyone, or even from a representative sample (Gillespie, 2014). Part of the population might not be included at all, depending on the origin of the data set. Certain services or products might only be used by parts of the population. When data sets are based on a non-random sample, the outcomes cannot be projected onto the population beyond the individuals in the sample.
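A small simulation makes the non-random-sample problem tangible: when inclusion in the data set depends on the very attribute being estimated, the sample average is confidently wrong. The population and selection rule below are invented assumptions.

```python
import random

random.seed(7)

# Hypothetical population: incomes of 10,000 people, where a
# low-income subgroup rarely uses the service that generates the data.
population = [random.gauss(30_000, 5_000) for _ in range(5_000)] + \
             [random.gauss(60_000, 8_000) for _ in range(5_000)]

# Non-random sample: only service users end up in the data set, and
# the probability of usage rises with income (an invented selection rule).
sample = [x for x in population if random.random() < min(1.0, x / 80_000)]

def mean(xs):
    return sum(xs) / len(xs)

print(f"true population mean: {mean(population):,.0f}")
print(f"biased sample mean:   {mean(sample):,.0f}")  # noticeably higher
```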

Second: algorithms run on digital data, but not all information is digitally available, so non-digital data is absent from the data the algorithms run on (Bozdag, 2013). Not even all digital information is available to algorithmic services. Internet services might not index all data on the internet, which leads to coverage bias. Internet search services might consider certain pages too similar, not relevant to the users, or not in the interest of the users when the pages have a bad reputation (Bozdag, 2013). There might also be technical reasons blocking these services from crawling the data.

Third: data sources, like internet services, might not want their data to be included or indexed, for various reasons. Data scientists have an influence on the source selection and can choose which data (elements) to include and which to leave out. So they can leave data sources out on purpose, which will influence the outcomes of the decision-making process.

Fourth: although data sets may contain thousands or more data elements, these will not reflect the fullness of people’s lives and experiences, their needs, hopes, wants and desires. Nor will they fully reflect the complexity of the problem that the data scientist is trying to solve.
