TOWARDS UNBIASED ARTIFICIAL INTELLIGENCE: LITERARY REVIEW OF DEBIASING TECHNIQUES

Smulders, C.O., Ghebreab, S.

Abstract

Historical bias has been feeding into the algorithmic bias inherent in artificial intelligence systems. When considering negative social bias, this process becomes indirectly discriminatory and leads to faulty artificial intelligence systems. This problem has only recently been highlighted as a possible way in which bias can propagate and entrench itself. The current research attempts to work toward a debiasing solution which can be used to correct for negative bias in artificial intelligence. A literary analysis was done on technical debiasing solutions and the actors which affect their implementation. A recommendation is made for the creation of an open source debiasing platform, in which several proposed technical solutions should be implemented. This would offer companies a low-cost way to use debiasing tools in their application of artificial intelligence systems. Furthermore, development would be sped up, taking proposed solutions out of the highly restricted academic field and into the world of external application. A final note is made on the feasibility of eliminating bias in artificial intelligence, as well as in society at large.

1: Introduction

“Artificial intelligence presents a cultural shift as much as a technical one. This is similar to technological inflection points of the past, such as the introduction of the printing press or the railways” (Nonnecke, Prabhakar, Brown, & Crittenden, 2017). The invention of the printing press was the cause of much social progress: the spread of intellectual property and the increase in social mobility being just two facets of this breakthrough. It allowed for a shake-up of the vestiges of absolute power derived from medieval society, and reduced age-old economic and social biases (Dittmar, 2011). Conversely, the invention of the steam engine, and the industrial revolution that followed, was the cause of great social division: economic domination of the many by the few, the formation of perennial economic groups, and arguably the invention of modern slave labor. While this distinction is not clear-cut, it serves as an easy reminder of the possibilities and the dangers that a massive paradigm shift in the form of artificial intelligence could bring about (figure 1). The introduction of artificial intelligence into the commercial market (Rockwood, 2018), political systems (Accenture, 2017), legal systems (Furman, Holdren, Munoz, Smith, & Zients, 2016; Nonnecke et al., 2017), warfare (DARPA, 2016), and many other fields is indicative of its future widespread usage. Indeed, sources from both commercial (Coval, 2018; Gershgorn, 2018; Nadella, 2018) and political institutions (Daley, 2018; Goodman & Flaxman, 2016; STEM-C, 2018) highlight the need for fast and successful implementation of artificial intelligence systems. At a keynote address at the 2017 World Economic Forum, the growing AI economic sector, as well as the market efficiency achievable through implementation of AI systems in other sectors, was highlighted as the most important economic change for the coming decades (Bossmann, 2017). Similarly, the European Union (Goodman & Flaxman, 2016), the United States (Furman et al., 2016), Great Britain (Daley, 2018), and representatives of the G20 (Accenture, 2017) have all identified the use of artificial intelligence and big data in decision-making as key issues to be legislated in the coming decades.

The large-scale realization that society is on the brink of an industrial-revolution-level paradigmatic shift is partly due to the widely discussed book “Weapons of Math Destruction” by Cathy O’Neil (O’Neil, 2016). In her book, O’Neil outlines the proliferation of artificial intelligence in many sectors of society, a process that has ramped up significantly since the economic crash of 2008. While artificial intelligence is often marketed as efficient, effective, and unbiased, O’Neil criticizes its use as markedly unfair, explaining that it propagates existing forms of bias and drives increasing economic inequality. While she highlights specific cases, such as the racial bias in recidivism models and the economic bias of credit scoring algorithms, the book’s message revolves around the larger danger that artificial intelligence poses for ingraining existing inequalities. The main message propagated by O’Neil is ‘garbage in, garbage out’, later hyperbolized to ‘racism in, racism out’. If algorithms are made without regard to previously existing biased social structures (e.g. financial segregation of minority groups in the US), then those biases will be propagated and enforced. Examples and an explanation of how this phenomenon occurs, both mechanistically and practically, will be expanded upon in the section on bias in AI.

In the two years following the publication of “Weapons of Math Destruction”, a host of literature, both scientific and newsprint, has been published on this subject (O’Neil, 2017). Many of these articles focus on identifying other segments of society where artificial intelligence is currently causing biased classification (Buranyi, 2017; Knight, 2017), while others focus on potential solutions to the problem (Howard & Borenstein, 2017; Sherman, 2018). Unfortunately, as this field of literature is very young, no synchronicity exists amongst the different proposed solutions, nor has much criticism of existing solutions been levied. This leaves a gap in the literature on how different actors can work towards reducing the effects of negative bias in artificial intelligence systems.

Figure 1: model of resource and appropriation theory, adapted from van Dijk (2012). Information and Computer Technologies (ICTs) have technological aspects (characteristics) and relational aspects (e.g. participation), which affect expanded inequality (i.e. bias). The current research will focus mainly on the characteristics of artificial intelligence (one possible ICT).

In a recent article in the New York Times, Kate Crawford, an industry specialist currently working on the issue of ethical artificial intelligence, mentions the urgency of this problem (Crawford, 2018). She compares the issue of biased artificial intelligence to the problem of creating autonomous artificial intelligence noted by tech giants like Elon Musk and Warren Buffett (i.e. the problem of emergence), and notes that those concerns are only relevant if the current problem of biased artificial intelligence can be solved. Indeed, the danger of unquestionable mathematics created under the guise of science resulting in massive disparity is a pressing concern. It is therefore important to understand how we can prevent artificial intelligence from learning our worst impulses.

The current study will try to fill this research gap by answering the following question:

How can we reduce the effects of negative bias in our artificial intelligence systems?

This thesis is structured by first elaborating on the definitions of artificial intelligence and bias, creating a clear understanding of the terminology used throughout the paper and delineating the framework of the inquiry. Secondly, an account is given of the processes and mechanisms involved in bias creation in human cognition, in artificial intelligence, and at a more conceptual philosophical level. Concrete examples from a variety of disciplines will be explored to elucidate the depth and severity of the problem. Subsequently, proposed solutions will be discussed at multiple stages of the implementation of artificial intelligence: technical solutions, market solutions, and political solutions (figure 2).

Figure 2: model of the four topics of previously proposed solutions which will be discussed in the current paper: commercial interest, political interests, technical implementation, and transparency laws, all acting on artificial intelligence bias. It is important to note that all proposed solutions will be discussed as to their effect on, and viability for, any technical implementation of artificial intelligence.


The section on solutions will be concluded by a discussion of transparency solutions that exist in all previously mentioned relevant sectors. Finally, a discussion section will highlight the synthesized proposed solution that resulted from this literary analysis, and a recommendation will be made on how different levels of society (figure 2) can work towards creating more ethically conscious artificial intelligence programming.

2: What is Artificial Intelligence?

There is a myriad of examples in popular culture of what Artificial Intelligence can look like, from the 1968 HAL9000 to the eastern Puppetmaster in Ghost in the Shell. Unfortunately, these portrayals often don’t do justice to the wide breadth of technological innovations which fall under the rubric of artificial intelligence. Definitionally, artificial intelligence covers all technology that artificially mimics a certain aspect of cognitive intelligence, be it sensory intelligence, visuospatial intelligence, emotional intelligence, or any other form of intelligence (Gordon-Murnane, 2018). Yet a more generalizable method of describing artificial intelligence would be: an artificially created system that, when given external data, identifies patterns through features to achieve classification of novel stimuli (figure 3, Dietterich & Kong, 1995). As an example, for sensory intelligence, a voice to text translator might be trained on a set of dictated speeches (external data), which results in the recognition of certain sounds (features) being related to phonemes (pattern recognition), to be able to classify (i.e. translate) voice to text in real time (classification of novel stimuli). This limited definition allows for a more focused understanding of the current issue of bias feeding into classification systems. Other types of artificial intelligence exist in the form of regression artificial intelligence as well as simple artificial intelligence (e.g. algorithmic calculators). For the purpose of the current paper, artificial intelligence can be understood as systems that classify individual cases into a subclass in a multiclass distinction.
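To make this working definition concrete, the following minimal Python sketch walks through the four aspects named above: external data, features, pattern recognition, and classification of novel stimuli. It is an illustration only; the dataset, model, and parameter choices are assumptions for this sketch (it assumes the scikit-learn library is available) and are not part of any system discussed in this paper.

    # Minimal sketch of the data -> features -> pattern recognition -> classification
    # pipeline described above (illustrative only; assumes scikit-learn is installed).
    from sklearn.datasets import load_digits               # external data
    from sklearn.model_selection import train_test_split
    from sklearn.linear_model import LogisticRegression    # model choice

    digits = load_digits()                                  # data selection
    X, y = digits.data, digits.target                       # features: pixel intensities

    # Hold out "novel stimuli" that the system never sees during training.
    X_train, X_novel, y_train, y_novel = train_test_split(X, y, random_state=0)

    model = LogisticRegression(max_iter=2000)               # pattern recognition via fitting
    model.fit(X_train, y_train)

    # Classification of novel stimuli into one of the multiclass labels (digits 0-9).
    print(model.predict(X_novel[:10]))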

Figure 3: visual representation of the four aspects generally associated with creating an artificial intelligence algorithm: classification choice, data selection, feature selection, and model choice.

To achieve this classification, pattern recognition is done on a set of features informative to the data. This can be done either a priori, in the case of supervised learning, or through unsupervised learning, where the artificial intelligence algorithm creates its own set of features derived from the initially entered data. In the famous example of Deep Blue, the chess artificial intelligence program, the programmers chose to digitize the position of each chess piece as a value (i.e. a white bishop on C3 might correspond to an arbitrary value of 3), which altogether comprised the current state of the board (thus being an example of supervised learning). The values corresponding to the placement of the pieces are the features which are informative of the current data. Classification can then be performed to determine what the optimal move would be in the current pattern of pieces. An unsupervised counterpart of Deep Blue, the recent artificial intelligence chess champion AlphaZero, created its own features through simulated self-training. While it is not possible to explain these features in a straightforward manner, it can be understood that AlphaZero considers more than just the state of the board, possibly creating features of the emotional state of the opponent, oversight of a part of the board by the opponent, and other aspects of the game of chess.

The classification results from a decision criterion: a set of rules which delineate particular outputs (generally referred to as classes) from one another. Emotion recognition algorithms, for example, use the state of facial muscles as features, and classify instances of faces as being in certain emotional states (Howard, Zhang, & Horvitz, 2017). Hypothetically, if there were 10 facial muscles, and fear corresponded to the five lower digits whereas anger corresponded to the upper digits, a feature pattern of [1, 4, 5] would be classified as fear, as all features point toward the lower digits. Classifications can become more complex, for example when sub-goals are used (e.g. the emotion of a face can only be fear if muscle 2 is activated, resulting in a classification as angry), hidden features are used (e.g. muscles 4 and 5 used together point towards anger, resulting in a classification as angry), multiple classes are used (e.g. muscles 4-7 are used when a face is sad, resulting in a classification as sad), features are not all unidirectional (e.g. muscles [1, 8, 9] are detected, which could be taken as an average to result in the classification as angry), or when other subsets and combinations of data are used.
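As a toy illustration of such a decision criterion, the hypothetical ten-muscle example above can be written out directly. The rule below is an assumption built only from the hypothetical in this paragraph (lower muscle indices point to fear, upper ones to anger, with averaging for mixed patterns) and does not correspond to any real emotion recognition system.

    # Toy decision criterion for the hypothetical ten-muscle example above
    # (illustrative only; thresholds and labels are taken from the hypothetical).
    def classify_emotion(active_muscles):
        if not active_muscles:
            return "neutral"
        # Unidirectional rule: average the activated muscle indices and classify
        # according to which half of the 1-10 range the average falls in.
        average_index = sum(active_muscles) / len(active_muscles)
        return "fear" if average_index <= 5 else "anger"

    print(classify_emotion([1, 4, 5]))   # all features point to the lower digits -> fear
    print(classify_emotion([1, 8, 9]))   # average falls in the upper digits -> anger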

Artificial intelligence is thus an artificially created system that, when given external data, identifies patterns through features to achieve classification of novel stimuli. Dietterich and Shavlik, in their early influential work on machine learning, a subcategory of artificial intelligence, offer a definition that exists alongside the aforementioned one: “[Artificial intelligence] can be described as the study of biases” (Dietterich & Shavlik, 1990). This line of reasoning follows from the idea that any classification is inherently biased, as any distinction requires an inequality of distribution between classes. The current research will not follow this interpretation of bias, also known as algorithmic bias (Danks & London, 2017), though references to the relation between the algorithmic bias explained in the current section and the human bias defined in the following section will be made throughout the research.

3: What is bias?

Bias is a complicated term used in a multitude of ways across as well as within disciplines (Danks & London, 2017). In general, the term refers to an unequal distribution of attributes between classes. This definition holds quite well across disciplines but is too unspecific to be applied to particular disciplines and situations. Disciplines therefore make further subdivisions of bias, such as implicit and explicit bias in the cognitive sciences (Amodio & Mendoza, 2010), statistical and machine bias in computer science (Danks & London, 2017), or cognitive and machine bias in philosophy (G. M. Johnson, 2018). The purpose of the current chapter is to elaborate on the multiple uses of the term bias, and to derive a taxonomy relevant to the current study. The following section will elaborate on the types of human cognitive function that are associated with the negative bias which currently feeds into our artificial intelligence systems. Subsequently, a section will elaborate on the use of bias in the artificial intelligence literature, and state the definition of bias used in the current paper. Finally, a short section on bias from a philosophical account, using multiple logic-based analyses, will be used to elaborate on the specifics of the current definition of bias.

3.1: Bias in human cognition

Cognitive bias can be understood as a set of heuristic tools employed by the brain to make more sense of the world (Rosso, 2018). A simple example would be the checker shadow illusion, where our brain generalizes the luminance of a square according to the surrounding squares and misclassifies the color due to generalization (figure 4). Many such heuristics have been found in the field of psychology, of which many are translatable to artificial intelligence systems. In the current paper, however, only biases related to social situations will be considered. An example would be to attribute a member of a social category with a certain trait due to a generalization of the group (e.g. a wealthy individual is smarter than poor individuals).

Furthermore, the type of bias relevant to the current research is one which results in a negative general outcome. These are outcomes that society deems unfair, ethically reprehensible, or counterproductive. Biased hiring practices that consider educational background (e.g. mathematics, art history, etc.) can often be considered a fair, ethically acceptable, and productive way to generalize towards a suitable candidate. Yet if this same judgement is based on race, it would be considered unfair, ethically dubious, and unproductive. This can be understood as a metric of relevance, where an individual's generalization to a subgroup must be relevant to the decision-making process. However, there are cases in which relevance to the decision-making process might be subservient to the issue of protection of a subgroup. This is often not a straightforward form of bias, but rather indirect bias against a subgroup. An example would be to increase insurance rates for individuals living in high-crime neighborhoods. While this metric might be relevant to the decision-making process, it does put an unfair burden on poor individuals, who tend to live in high-crime neighborhoods. These latter cases are not clear-cut types of negative social bias, and their relevance is often disagreed upon in political and legal contexts (Kehl, Guo, & Kessler, 2017; Knight, 2017).

Finally, bias in the context of social cognition is generally split into implicit and explicit bias (Amodio & Mendoza, 2010). This distinction can be understood as the level of conscious perception of the bias. Explicit negative social biases are discriminatory tendencies consciously understood by the biased actor. Implicit negative social biases are generalizations made cognitively that are discriminatory in nature but not consciously perceived. An example of the latter is the well-researched implicit association effect, where minority groups are subconsciously associated with negative images and words (Amodio & Mendoza, 2010). In the current research, a focus is put on the effects of implicit social bias on artificial intelligence, where subconscious discriminatory tendencies tend to feed into these systems. The effects of explicit social bias on artificial intelligence are not as widespread, and a legal framework is already in place to prevent explicit bias in artificial intelligence (Goodman & Flaxman, 2016; Kehl et al., 2017). Thus, the type of bias that is of interest to the current study is social in nature, results in a negative outcome, and is generally implicit (for a summary of the relevant types of bias see: Kliegr, Bahník, & Fürnkranz, 2018).

Figure 4: Checker shadow illusion illustration. The grey hue in box A and box B has the exact same luminance. This illusion is illustrative of a cognitive heuristic called light consistency (or perceptual consistency), where observation is biased toward the luminance expected from the surrounding squares.

Since the current research attempts to solve the issue of bias in artificial intelligence systems that is caused by human cognitive bias, it can be argued that the source rather than the consequence should be subject to change. It is therefore important to highlight the tenacity of human cognitive bias. Human bias in general is an extremely useful tool, and generally functions aptly to navigate a complex world (Amodio & Mendoza, 2010). In fact, in a recent paper on the creation of artificial general intelligence (defined as a true and full recreation of human cognition), Potapov and Rodionov argue that social and ethical bias is imperative to human cognition (Potapov & Rodionov, 2014). Indeed, neuropsychological research also suggests social biases resist revision, as they are an integral part of human cognition (Amodio & Mendoza, 2010; G. M. Johnson, 2018).

Furthermore, social biases can be caused by long-term historical bias, as well as cumulative cultural experiences (Martin, Cunningham, Hutchison, Slessor, & Smith, 2017; Rosso, 2018). These implicit biases, which often used to be explicit forms of negative bias, can still strongly influence current cognition. This occurs as cultural expressions linger and evolve to fit a social narrative inside the current system, thus morphing into implicit social understandings. Another reason that this negative bias is difficult to get rid of is the lack of current political and commercial desire to do so (Knight, 2017). Training programs to reduce negative implicit bias have been shown to be marginally effective, yet political actors (e.g. the US legislative branch) seem uninterested in legislating for or promoting such programs (Knight, 2017).

As it is difficult to change innate cognitive bias mechanisms, it seems more reasonable to prevent mechanical systems from mimicking their negative effects. Better still, artificial intelligence systems could be used to reduce the negative effects of our cognitive biases (Nadella, 2018; Rosso, 2018).

3.2: Bias in machine cognition

The earliest use of the term bias in the context of artificial intelligence was in 1980, in an article on contemporary software creation (Mitchell, 1980). In this influential paper, bias was described as “…any basis for choosing one generalization [hypothesis] over another, other than strict consistency with the observed training instances”. The author argued that, if machine learning was to handle large sets of data, non-representative biases would need to be included in machine learning algorithms. Essentially, Mitchell argued for an introduction of human-style bias and generalization to artificial intelligence. This bias would make classification (i.e. generalization) efficient and effective in terms of computational power as well as learning curve (given enough data). Indeed, any learning algorithm needs to function on such biases in order to generalize beyond the training data, in case the test data does not perfectly represent the training data (Dietterich & Kong, 1995). Although computational power and the amount of available training data have increased exponentially since then, strong algorithmic biases are still the hallmark of effective and efficient artificial intelligence (Barocas, Bradley, Honavar, & Provost, 2017).

Recently, researchers have advocated the use of bias in artificial intelligence to more realistically simulate human intelligence (Potapov & Rodionov, 2014). This highlights the inherent nature of bias in artificial intelligence systems, yet it refers to an entirely different type of bias. The bias referred to in machine learning can be understood as “…an inductive bias that induces some sort of capacity control (i.e. restricts or encourages predictors to be ‘simple’ in some way), which in turn allows for generalization” (Neyshabur, Tomioka, & Srebro, 2014). Capacity control allows for an effective and efficient classification that resembles human cognitive functioning (Araujo, 2004; Melita, Ganapathy, & Hailemariam, 2012).
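Capacity control of this kind is easiest to see in a regularized classifier. The sketch below, which assumes scikit-learn and uses a synthetic dataset with arbitrary parameter values, compares a weakly and a strongly regularized linear model; the regularization strength C plays the role of the inductive bias (a preference for 'simpler', smaller-norm predictors) described above.

    # Sketch of capacity control as an inductive bias: a smaller C forces "simpler"
    # (smaller-norm) decision functions, trading training fit for generalization.
    # The synthetic data and parameter values are illustrative assumptions.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import LinearSVC

    X, y = make_classification(n_samples=500, n_features=50, n_informative=5,
                               random_state=0)
    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

    for C in (100.0, 0.01):  # weak vs. strong capacity control
        model = LinearSVC(C=C, max_iter=20000).fit(X_train, y_train)
        print(f"C={C}: train accuracy {model.score(X_train, y_train):.2f}, "
              f"test accuracy {model.score(X_test, y_test):.2f}")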

“Weapons of Math Destruction” describes how negative implicit social bias (in the narrow definition described in the section on human bias) has been feeding into algorithmic bias (in the definition given in the above paragraphs) (O’Neil, 2016). The process by which this occurs is multifaceted (Van Dijk, 2012), and multiple factors might influence bias in artificial intelligence (figure 1). Firstly, implicit human bias might directly influence artificial intelligence through how algorithms are constructed by their creators (Shepperd, Bowes, & Hall, 2014). Secondly, a lack of diversity in the creation of artificial intelligence systems might cause negative biases not to be detected (Howard & Borenstein, 2017; Nadella, 2018; Rosso, 2018). Thirdly, a lack of data might lead to suboptimal categorization for minority groups (Danks & London, 2017; Rajaraman, 2008). And finally, historical social bias - through intermediate variables (e.g. economic suppression of African-Americans leading to wealth being correlated with race) - might skew training data and so enforce existing implicit biases (Chander, 2017; Danks & London, 2017).

Box 1a: Recidivism models. The US legal system has been relying on recidivism models, artificial intelligence systems that calculate the likelihood that a criminal will re-commit his or her crimes, in determining length of sentencing (a longer sentence is given to a felon deemed likely to re-commit their crimes). While the model has been found to operate more efficiently and effectively than human actors, evidence of discriminatory behavior has recently been found. Research found that African-American individuals had a higher chance of being misclassified as re-offenders than their white counterparts. Conversely, white individuals were more likely to be misclassified as 'safe'. Importantly, no feature of race was used (as this would be illegal), and differences between classes are likely due to proxy attributes (figure 5 C2), like arrest records.

Box 1b: Credit scoring. Credit scoring systems are artificial intelligence algorithms that determine the likelihood of an individual defaulting on their loans. In Western countries, these scores are involved in any large financial transaction: getting a mortgage, buying a car, getting insurance, etc. Recent research found that poor individuals, regardless of their financial past or present, receive a worse credit score than their wealthy counterparts. Credit agencies reacted by mentioning the use of social metrics in their algorithms, where family history and social associates are considered to determine these credit scores. It has been argued that the use of this metric is in conflict with the notion of equality of opportunity (as a person cannot choose where he or she is born), as well as the right to individualized treatment under the law.

Box 1c: Google image recognition. A somewhat older example of negative bias being fed into artificial intelligence is the case of Google's image recognition algorithm classifying images of African-American children as gorillas. This instance, which was quickly corrected by Google, was one of the major triggers that led to the explosion of research into bias in algorithms. In this case, an artificial intelligence system was trained to detect multiple types of stimuli (e.g. humans, animals, etc.) using Google's hyperlink system (matching images to the words used to describe them). The misclassification was likely caused by the historical racism existent in the database that is the internet (figure 5 B3). Proportionally, pictures of black people, compared to white people, are much more likely to be mislabeled (by hand), thus leading to a disparity in the training data.


Many examples exist of how social bias has fed into artificial intelligence systems (box 1). It is essential to understand why these instances represent negative instances of bias, not just in the context of ethical judgement, but in the context of artificial intelligence. Classification in the context of artificial intelligence tries to predict the correct class of any individual instance as well as possible (Danks & London, 2017). If bias leads to wrongful predictions, as in the case of recidivism models in the US justice system (box 1a), this is a failure on both an ethical and a machine learning level. However, it is important to note that artificial intelligence systems always operate on a notion of goal-directed behavior (i.e. maximizing a certain predetermined value). These goals are often not in line with the larger concept of ‘fairness’ strived for in society (box 1b). A possible answer to this would be to have goal-directed behavior be geared towards a metric of ‘fairness’ (Binns, 2018). Unfortunately, optimizing a goal of ‘fairness’ often leads to negative bias in other domains (Kleinberg, Mullainathan, & Raghavan, 2016). Possible answers to this problem will be discussed in the recommendation section.
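To make the idea of a 'fairness' goal concrete, one commonly used metric is the gap in positive-classification rates between two groups (demographic parity). The sketch below is an illustration only; the 0/1 group coding and variable names are assumptions, and, as Kleinberg et al. (2016) show, driving such a metric to zero generally worsens other fairness measures.

    # Sketch of one possible "fairness" goal: the demographic parity gap, i.e. the
    # difference in positive-classification rates between two groups
    # (illustrative only; the 0/1 group coding is an assumption).
    import numpy as np

    def demographic_parity_gap(predictions, group):
        predictions, group = np.asarray(predictions), np.asarray(group)
        rate_group_0 = predictions[group == 0].mean()
        rate_group_1 = predictions[group == 1].mean()
        return abs(rate_group_0 - rate_group_1)

    # Example: a gap of 0.5 between the two groups' positive-classification rates.
    print(demographic_parity_gap([1, 1, 0, 1], [0, 0, 1, 1]))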

Outside the scope of the current study is thus bias as a consequence of unequal opportunity in using artificial intelligence (Accenture, 2017; Furman et al., 2016), as well as bias that is caused by the economic model in which artificial intelligence is used (O’Neil, 2016). The notion of negative bias that feeds into machines endorsed in the current research relates to bias embedded in created artificial intelligence algorithms (figure 5). The following section will go into the more philosophical understanding of the notion of bias.

Possible ways negative bias is introduced in artificial intelligence:

A: Classification
1. Bias in choice of what to classify
2. Lack of diversity in algorithmic coders
3. Lack of external validity of the classification question

B: Data
1. Insufficient data to train the complexity of subgroups
2. Lack of representation of minority groups
3. Historical inequality due to inaccessibility of technology

C: Features
1. Direct use of 'protected attributes' as informative features
2. Use of proxy attributes as informative features
3. Lack of transparency due to feature complexity
4. Inclusion of irrelevant features

D: Model
1. Lack of transparency due to model complexity
2. Individual model limitations for multiclass corrections
3. Lack of questionability of output
4. Lack of distinction between algorithmic bias and negative bias

Figure 5: distinctive ways in which negative social bias is fed into algorithmic bias, divided into the four stages of artificial intelligence creation shown in figure 3.


3.3: Bias in logic and philosophy

As explained in the previous section, bias in artificial intelligence is an inherent property, rather than a normative aspect. It is thus important to highlight in what way, if any, artificial intelligence can truly have ‘negative’ bias. This section will go into the ethical machine (Anderson & Anderson, 2007) by considering why it is important to study ethics in artificial intelligence. This will be done by considering philosophical notions of ‘negative’ bias relevant to artificial intelligence, and by considering proposed logical frameworks for viewing ethics in artificial intelligence. The question of whether artificial intelligence systems can possibly be ethical is outside of the scope of the current research, if answerable at all.

Machine ethics can be considered a sub-discipline within the philosophical inquiry of ethics (Anderson & Anderson, 2007). In the early 2000s, when artificial intelligence systems were first being adopted into commercial practice, researchers working in various areas started considering the implications of machines playing a larger role in people's lives. While consideration of the ethical implications of novel technology was not new (e.g. the industrial revolution), the difficulty of understanding the technology and its resemblance to human decision making posed new issues (Gershgorn, 2018). In a foundational document on machine ethics, Moor highlights the need for a distinction between different levels of ethics in machine learning: general ethical machines, explicit ethical machines, implicit ethical machines, and non-ethical machines (Moor, 2006). This distinction is derived from older notions of 'ethical agents', which are then mapped onto artificial intelligence. General ethical machines would be considered complete ethical agents, similar to ethical humans (Moor, 2006). Explicit ethical machines are artificial intelligence systems that are able to create and act upon an internal ethical concept (i.e. create a concept of 'good'). Implicit ethical machines are artificial intelligence systems which have an externally implemented concept of morality (e.g. not hurting humans is 'good'). Finally, non-ethical machines are systems which do not take ethics into consideration at all (e.g. an interactive terminal for ordering at McDonald's).

Multiple researchers have argued that, at minimum, implicitly ethical machines should be strived for when making technology. “Weapons of Math Destruction” shows that any artificial intelligence system can have ethical implications, purely due to the inherent properties of creating such systems (O’Neil, 2016). It is therefore important to strive for a constant consideration of ethics, to minimize the negative bias feeding into artificial intelligence (Anderson & Anderson, 2007; Moor, 2006). Furthermore, explicit ethical agents might be preferable to implicit ethical agents, as externally implemented ethical rules might conflict internally, as well as lack consideration of other ethical aspects (Kleinberg et al., 2016).

The reason this emphasis is placed on the introduction of implicit ethics in machines is the relative simplicity with which artificial intelligence systems might display ethical behavior. In a reinterpretation of Braitenberg's vehicles, Headleand and Teahan showed how little input a machine needs in order to display ethical behavior, and how its behavior changes with slight changes to that input (Headleand & Teahan, 2016). Indeed, much research has shown that slight changes in data input, feature selection, and output parameters can have drastic ethical implications for machine systems (Danks & London, 2017; Rajaraman, 2008).


The question of how to implement ethical judgement into artificial intelligence will be considered in the technical solutions section; here, it is important to consider what ethical framework should be implemented. Reuben Binns, an Oxford professor of philosophy, notes the importance for researchers in machine ethics of considering political philosophical frameworks (Binns, 2018). Binns notes that competing understandings of, for example, utilitarianism can lead to entirely different implementations of an implicitly ethical machine. The judgement of which philosophical framework is best is outside the scope of this research. However, for all presented solutions it should be considered that implementation might rely strongly on which notion of ethics is used.

Another relevant philosophical aspect is the logical framework in which to view issues of bias in artificial intelligence. Over the last decade, multiple approaches have been proposed for how to logically frame bias in machine learning. Traditional conditional logic has been used to distinguish types of bias in artificial intelligence, which allows for a comprehensive taxonomy of bias specific to artificial intelligence (Kleinberg et al., 2016). Hegelian-dialectic-style counterfactual logic has been proposed to distinguish negative bias in artificial intelligence from the standard statistical bias present in artificial intelligence (Kusner, Loftus, Russell, & Silva, 2017). This type of logic allows for a consideration of 'fairness' not from an absolute standpoint, but rather as a relative gain compared to the lost classification performance. Finally, a representational logic approach has been taken to compare bias in humans and bias in machines (G. M. Johnson, 2018). This novel approach makes it easier to understand the relationship between the two, and how bias in one domain can aid understanding in the other.

The philosophical concepts and logical frameworks should not be considered as solutions on their own, yet they are essential to take into consideration when evaluating the proposed solutions. Comparisons between any of the following solutions (e.g. algorithmic audit vs. algorithmic transparency) should be considered from such philosophical standpoints. As much as possible, the following sections will be supplemented with the abovementioned philosophical and logical frameworks to place solutions into a larger context.

4.1: Technical solutions

Braitenberg vessels are a thought experiment created by Headleand and Teahan in their recent article on ethical machines (Headleand & Teahan, 2016). A play on Braitenberg's vehicles, vessels are simple machines that display different ethical behaviors (e.g. egoistic and altruistic) as a result of very simple changes in their mechanism (figure 6). While the simplicity of Braitenberg's vessels does not easily translate to complex artificial intelligence, they do serve as an example of how technical changes can lead to alternate ethical behavior. Technical changes to artificial intelligence algorithms can occur at any stage of the creation process. The previously proposed solutions will be discussed in the order in which choices are generally made when creating artificial intelligence systems: choice of classification, data selection, feature selection, and model choice.


4.1.1: Classification choice

Choice of classification should be interpreted as the question which the artificial intelligence system will ask itself (e.g. who will relapse into crime after their sentence (box 1a)). While not yet a technical implementation of the algorithm, this choice is highly determinative of what technical choices will be made during the process (figure 5 A). External variables that have been shown to introduce bias in the choice of classification are the experience of the researchers (Shepperd et al., 2014) as well as the diversity of the researchers (figure 5 A2, Crawford, 2018; Reese, 2016). Besides correcting for these by changing hiring practices (e.g. hiring a more diverse research team), an important technical method to reduce such introduction of bias has recently been introduced. Kusner and colleagues created algorithms that check the classification against multiple general ethical testing procedures, termed counterfactual algorithms (Kusner et al., 2017). These algorithms allow researchers to test their artificial intelligence against a set of predefined questions (e.g. would the classification be the same if the subject were part of a minority group) which were specifically created by a diverse and experienced group of researchers (figure 5 A1 & A2). The method does, however, need a specific measurement in the data which relates to protected attributes (e.g. African-American, figure 5 C1). Unfortunately, such data is not always available, making counterfactual testing impossible. Also, other researchers have shown that this technique falls short in cases where the real distribution of groups is not equal, which is often the case (figure 5 A3, Kleinberg et al., 2016).
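A heavily reduced version of such a counterfactual check can be sketched as follows: flip only the protected attribute for every individual, re-run the trained classifier, and flag any case whose predicted class changes. The column name, the 0/1 attribute coding, and the trained model object are assumptions made for illustration; the full procedure of Kusner et al. additionally models the causal dependencies between attributes, which this sketch ignores.

    # Minimal sketch of a counterfactual check (illustrative; the column name, the
    # 0/1 coding, and the trained `model` are assumed, and causal structure is ignored).
    import pandas as pd

    def counterfactual_flags(model, data, protected_column):
        """Mark cases whose prediction changes when only the protected attribute flips."""
        original_predictions = model.predict(data)
        flipped = data.copy()
        flipped[protected_column] = 1 - flipped[protected_column]
        counterfactual_predictions = model.predict(flipped)
        return pd.Series(original_predictions != counterfactual_predictions,
                         index=data.index)

    # Hypothetical usage: a high flag rate suggests the classifier is not
    # counterfactually fair with respect to the protected attribute.
    # flags = counterfactual_flags(model, X, "minority_status")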

4.1.2: Data selection

In “Weapons of Math Destruction”, O’Neil mainly points towards data as the source of bias introduction in artificial intelligence systems (O’Neil, 2016). This bias can be due to choices made about which data to use (Tantithamthavorn, McIntosh, Hassan, & Matsumoto, 2016), as well as bias that results from historical inequality (figure 7, Furman et al., 2016; Van Dijk, 2012). The simplest approach to this problem would be to actively rectify data-sets to be representative (figure 5 B2). Unfortunately, data-sets are often too big (Howard & Borenstein, 2017), and interpretability of representativeness too difficult (Lipton, 2016), to apply this method to most artificial intelligence systems. Efficiency testing using multiple different data-sets has been proposed as a possible solution, which could be done in combination with counterfactual testing (Tantithamthavorn et al., 2016).

Figure 6: Braitenberg vessel, adapted from Headleand & Teahan, 2016. A vessel approaching a resource: (r) is the range at which the resource can be used at maximum efficiency, (R) is the maximum range the vessel could be from the resource and still charge, and (d) is the current distance of the vessel from the resource. This vessel has a decision function where (d) and (r) are taken as features to classify whether to move or to stay still. This 'egoistic' vessel can easily be modified into an 'altruistic' vessel by adding feature (e), which considers the surrounding low-energy vessels (so as to move away and make room near the resource).


A possibly effective and simpler method that has been proposed is to keep adding relevant training data, specifically data that corrects for biases that have been found (figure 5 B1, Rajaraman, 2008). This approach has the added benefit of generally improving classification, much more than changing the data-set would. However, adding enough data can be bothersome in artificial intelligence systems that use large datasets, and impossible in algorithms that use cloud-based data services (Howard & Borenstein, 2017). This technical solution can work symbiotically with the creation of large-scale open source data, which will be discussed in the transparency section (Sherman, 2018).
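One simple, admittedly partial, way of making a training set more representative is to reweight or oversample under-represented subgroups. The sketch below assumes a pandas DataFrame with a hypothetical 'group' column; it illustrates the general idea only and does not address historical bias hidden in the labels themselves.

    # Sketch of rebalancing a training set by subgroup (the DataFrame layout and
    # the "group" column name are assumptions made for illustration).
    import pandas as pd

    def subgroup_weights(data, group_column):
        """Inverse-frequency weights so every subgroup contributes equally."""
        counts = data[group_column].value_counts()
        return data[group_column].map(lambda g: len(data) / (len(counts) * counts[g]))

    def oversample_minorities(data, group_column, random_state=0):
        """Resample every subgroup up to the size of the largest one."""
        target = data[group_column].value_counts().max()
        parts = [group.sample(target, replace=True, random_state=random_state)
                 for _, group in data.groupby(group_column)]
        return pd.concat(parts).reset_index(drop=True)

    # Hypothetical usage: many scikit-learn style classifiers accept the weights via
    # model.fit(X, y, sample_weight=subgroup_weights(train, "group"))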

4.1.3: Feature selection

Feature selection refers to the attributes of the data which will be used for classification. Highly informative features (as related to the classification question) should be used to obtain a good and representative classification (figure 5 C4). Using a large number of features will generally lead to overfitting (i.e. making the classifier too specific to the training data), while using very few features will lead to underfitting (Neyshabur et al., 2014). Both supervised and unsupervised learning algorithms can suffer from bias introduction through feature selection (Kliegr et al., 2018). Two important technical solutions have been proposed to solve these issues.

Firstly, Neyshabur and colleagues argued that, counterintuitively, selecting a large number of features might alleviate bias introduction. They researched the effect of capacity control (the number of features) on bias in artificial intelligence systems and found that it was norm factorization (i.e. the length and complexity of the decision function) that caused overfitting. They mention that feature quantity, as long as the features are relevant and easily interpretable for the classification (figure 5 C4), can improve classification results while also reducing bias across different classes (Neyshabur et al., 2014). Unfortunately, capacity control and norm factorization are often related, as an increased number of features generally leads to more complex decision functions (Dougherty, 2013).

Secondly, Kliegr and colleagues, in a large-scale study on the different cognitive biases that play into artificial intelligence bias, note the importance of the reliability of features (Kliegr et al., 2018). Reliability, in this context, refers to the amount of predictive power a single feature holds when used without other features.

Figure 7: model of the appropriation of technology. Historical inequalities lead to less physical and material access, which results in lower skill in suppressed minority groups as well as lower representation of diversity in the technology. As an example, African-Americans have lower access to technology, which leads to a lack of diversity in the Google directory of images, resulting in misclassification by image recognition technology (box 1c). The chain depicted is: historical bias towards minority groups → unequal access to technology → lack of diversity in technology → lack of minority representation in training data → poor technology performance for minority groups.


While this solution was mainly framed as an active human debiasing technique, it is important to note the technical method which would be used to derive a human-interpretable choice. A pruning technique is used to derive the most reliable features, reducing the number of features by removing those that have the smallest impact term in the decision function (Lakkaraju, Bach, & Jure, 2016). Such pruning techniques are very useful in tandem with a human observer (figure 5 C3 & C4), as non-sensible features (e.g. using race when determining credit scores) can be taken out, while maintaining relevant, highly reliable features.
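A crude stand-in for such a pruning step, assuming a fitted scikit-learn linear model and a list of feature names (both hypothetical here), is to rank features by the magnitude of their term in the decision function and keep only the largest ones, after which a human reviewer can strike non-sensible survivors. This is a sketch of the general idea, not the specific procedure of Lakkaraju et al.

    # Sketch of pruning low-impact features from a linear decision function
    # (illustrative; assumes a fitted scikit-learn linear model and named features).
    import numpy as np

    def prune_features(model, feature_names, keep=10):
        """Return the `keep` features with the largest absolute coefficients."""
        coefficients = np.atleast_2d(model.coef_)
        impact = np.abs(coefficients).mean(axis=0)    # average impact across classes
        ranked = sorted(zip(feature_names, impact), key=lambda pair: pair[1],
                        reverse=True)
        survivors = [name for name, _ in ranked[:keep]]
        # A human reviewer can now remove non-sensible survivors (e.g. proxies for
        # protected attributes) before retraining on the reduced feature set.
        return survivors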

4.1.4: Model choice

The model chosen for any artificial intelligence system has massive consequences for the resulting classification scheme, as well as for the possible biases that are fed into the system (Dougherty, 2013; Shepperd et al., 2014). Both the choice of model (e.g. k-nearest neighbors, support vector machine, etc.) and the way the model is implemented can allow bias to feed into the system. The following technical solutions all relate to choosing models, as well as how to implement them.

The only completely novel model that has been proposed in the context of bias prevention is the Input Causal Model (Kusner et al., 2017). This model, created by researchers at the Alan Turing Institute, is a methodological implementation of the previously mentioned concept of counterfactual fairness. The model can be initialized as a standard support vector machine, a simple machine learning classifier, with specialized weights for chosen protected attributes (Shadowen, 2017). In practice, this allows for standard usage of artificial intelligence, with the added benefit of an automated reduction of negative bias (Kusner et al., 2017). A complete expansion of how this is implemented is beyond the scope of the current article, but in summary: an algorithm called FairLearning compares the resulting classification to counterfactual examples in which only the protected attributes are changed (but not the relevant features that determine the classification criterion; figure 5 C1 & D2). This recently proposed solution has not yet gained much attention, and further testing will be needed to assess the effectiveness of these types of models for the reduction of bias.

Another solution that has been proposed is the use of more interpretable models so that transparency can be increased (DARPA, 2016; Kliegr et al., 2018). While these types of solutions will be individually discussed in a later section, it is important to note their technical aspect. In a recent paper examining the technical relevance of models that are supposedly more interpretable, it was found that no such differentiation currently exists (figure 5 D4, Lipton, 2016). Lipton mentioned that supposedly interpretable models (e.g. k-nearest neighbors, support vector machines, etc.) often use highly complex and abstract features when considering the classification criterion (figure 5 D1). Models that are supposedly non-interpretable (e.g. neural networks, deep learning models, etc.), on the other hand, often have feature inputs which can be clearly understood, although the process by which they are implemented in the decision criterion is more obscure (figure 5 D3). Lipton's research points toward a possible solution where a simple model could potentially be interpretable, which allows for transparency, if the norm factorization is limited and the features are straightforward measures relevant to the classification (Lipton, 2016; Neyshabur et al., 2014).

Another important consideration for any model that tries to mend the effects of bias in machine learning is how the model represents bias internally. Artificial intelligence, as has previously been mentioned, relies on the existence of statistical bias to do classification. It is therefore important to internally differentiate negative bias from bias relevant to the data (figure 5 D4).

Multiple proposals for how to implement this have been made over the last couple of decades, yet special emphasis has been put on one solution in the context of bias prevention (Shadowen, 2017). Error-correcting output coding (ECOC), an added implementation of a decision tree model working synchronously with any artificial intelligence model, has been shown to reduce overall classification error (Dietterich & Kong, 1995). While this is an older method of reducing classification error, it has recently been shown to effectively reduce negative bias when specific bias-correcting hypotheses are implemented for the decision tree model (figure 5 A3 & D2). Unfortunately, the current literature only finds success in highly specific cases, and little is known about how transferable this method is across different models and classification problems (Lipton, 2016; Shadowen, 2017).
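Generic error-correcting output coding is available in standard libraries; the sketch below, which assumes scikit-learn and uses its bundled iris dataset purely for illustration, wraps a base classifier in an output-code scheme. It only reproduces the generic error-reduction part of the method; the bias-correcting hypotheses described above would still have to be designed for the specific problem.

    # Sketch of generic error-correcting output coding around a base classifier
    # (illustrative only; the bias-specific hypotheses discussed above are not included).
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.multiclass import OutputCodeClassifier

    X, y = load_iris(return_X_y=True)

    # Each class is encoded as a binary code word; errors in single bits can then
    # be "corrected" by choosing the nearest valid code word.
    ecoc = OutputCodeClassifier(LogisticRegression(max_iter=1000),
                                code_size=2.0, random_state=0)
    ecoc.fit(X, y)
    print(ecoc.predict(X[:5]))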

ECOC methods of measuring bias in classification schemes can be greatly complemented by a recent solution for the removal of known biases. In a cooperative effort between the Georgia Institute of Technology and Microsoft, Howard and colleagues proposed a hierarchical classification system to account for known biases (Howard et al., 2017). The hierarchical model is built to reconsider instances where a minority group (or a group with protected attributes) gets classified into the most similar majority group, with added weight for the feature that would classify into the minority class specifically (figure 5 D2). This 2nd-layer hierarchy is then complemented with a 3rd layer, which reconsiders cases where the classification switched from minority to majority class, according to a classification where the minority group is grouped with a dissimilar majority (figure 8). While they proposed using simple support vector machine classifiers for both structures, others have proposed using classifiers that are problem-specific to increase the effectiveness of this method (Lipton, 2016; Shadowen, 2017). Unfortunately, hierarchical models of this kind suffer from having to know the bias a priori. Furthermore, this method relies strongly on the minority and majority groups being highly similar and translatable (e.g. facial muscles used in emotional expression), which is not the case for most forms of bias (figure 5 B2). This method could, however, be very promising in conjunction with other methods that compensate for these flaws.
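The structure of such a hierarchy can be caricatured in a few lines of Python: a base classifier produces a label, a second stage re-examines cases assigned to the majority class most often confused with a known minority class, and a third stage only confirms the switch against an independent classifier. The class names, the notion of a 'similar majority', and all three model objects below are assumptions for illustration, not the implementation of Howard et al.

    # Caricature of a hierarchical re-classification scheme for one known bias
    # (class names, the "similar majority" notion, and the three classifiers are
    # all assumptions made for illustration).
    def hierarchical_classify(x, base_model, minority_model, confirm_model,
                              minority_class="fear", similar_majority="anger"):
        label = base_model.predict([x])[0]            # 1st layer: standard classification
        if label == similar_majority:
            # 2nd layer: re-check cases landing in the majority class that the
            # minority class is most often confused with, weighted toward the minority.
            if minority_model.predict([x])[0] == minority_class:
                # 3rd layer: keep the switch only if an independent classifier,
                # trained with the minority grouped against dissimilar classes,
                # also assigns the minority label.
                if confirm_model.predict([x])[0] == minority_class:
                    label = minority_class
        return label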

Figure 8: visual representation of the hierarchical correction model, adapted from Howard et al., 2017. The standard classification (On-Line Generalized Classifier) is supported by a 2nd-hierarchy SVM, which reclassifies the cases classified as the majority class most similar to a minority group, with a bias towards the minority. The 3rd-hierarchy SVM reclassifies all altered instances into both groups, now without bias. This hierarchical model resulted in a 17.3% improvement of the classification in general, and a 41.5%

Finally, an as of yet unexplored solution which has been proposed in the literature is the introduction of randomness into artificial intelligence systems (C. G. Johnson, 2001; Shadowen, 2017). While this method can be applicable only in cases where non-specificity is desirable (e.g. bias in advertisement), it can potentially be extremely useful. The use of randomness factors in artificial intelligence models is not a novel concept (Araujo, 2004). Genetic algorithms, models that use 'mutations' (i.e. randomness over generations), have been shown to be highly effective in certain classification cases where diverse outputs are desirable (C. G. Johnson, 2001; Melita et al., 2012). While the term 'diverse' here does not refer to social diversity, this approach has been proposed as a possible solution for creating unbiased search results in web mining (Melita et al., 2012). This highly specific example does not transfer directly onto the larger issue of bias in artificial intelligence, so further research will be needed to assess the feasibility of using such methods.
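Where non-specificity is acceptable, the simplest way to inject randomness is to sample the output class from the model's predicted probabilities instead of always returning the single most likely class, which spreads exposure across classes. The sketch below assumes a scikit-learn style classifier exposing predict_proba and classes_; it illustrates this general idea only, not the genetic-algorithm approaches cited above.

    # Sketch of output randomization: sample from the predicted class probabilities
    # rather than always returning the arg-max class (illustrative only; the
    # genetic-algorithm style mutation cited above is a separate technique).
    import numpy as np

    def sample_predictions(model, X, random_state=0):
        rng = np.random.default_rng(random_state)
        probabilities = model.predict_proba(X)        # shape: (n_samples, n_classes)
        return np.array([rng.choice(model.classes_, p=row) for row in probabilities])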

4.2: Commercial solutions

While technical solutions try to solve the problem of bias in artificial intelligence at its core, it is important to highlight the applicability of such solutions in the commercial market. The successful implementation of solutions often hinges on their practicality. Many methods cause a reduction in classification accuracy (e.g. Howard et al., 2017), thus leading to a potential degradation of the performance of artificial intelligence systems. While there are strong arguments to be made for why this trade-off would be ethical (Gordon-Murnane, 2018; O'Neil, 2017), commercial enterprises do not always function in this manner (Spielkamp, 2017). As such, the commercial intricacies of the problem of bias in machine learning will be highlighted, as well as the proposed solutions.

Firstly, it is important to note that the commercial world has changed its tone with regard to this problem over the past couple of years. In a 2013 symposium on novel techniques and technologies in the field of electrical engineering, artificial intelligence systems were named as the key future technology for the commercial world (Popescu & Popescu, 2013). That document only briefly mentioned the possibility of bias feeding into artificial intelligence systems, and immediately countered it with the possibility of the spread of wealth that artificial intelligence could bring. Only five years later, in a review on artificial intelligence and business, the essential problem of bias in artificial intelligence was highlighted (Coval, 2018; Rockwood, 2018). This shift occurred because, in the long term, these systems lead to increased performance in the commercial market, which is untenable if negative bias is maintained and amplified (figure 5 A1). O'Neil also mentions this issue in “Weapons of Math Destruction”, using the example of the loan industry, which increases its market share as credit-safe individuals who had been excluded from the market are introduced to the system (O'Neil, 2016).

In this environment of increased awareness and relevance of the problem, multiple organizations have sprung up specifically to deal with the issue of bias in artificial intelligence. The most important of these is a research center at New York University called AI Now (AI Now, 2017; Gershgorn, 2018). This research agency hosts a multidisciplinary academic staff, which focuses on solving the issue of bias in artificial intelligence (Gordon-Murnane, 2018). Though created very recently, AI Now has already published multiple articles which highlight the contentiousness of the problem (figure 5 A1), as well as promote research on the subject. They have also initiated research projects with the Alan Turing Institute, as well as with big companies like Microsoft (AI Now, 2017; Nadella, 2018).


Another, somewhat older, organization dedicated to reducing negative bias in artificial intelligence is Fairness, Accountability, and Transparency in Machine Learning (FATML) (FATML, 2018). This organization has been working on bringing together technicians and researchers to work on the issue of bias in artificial intelligence since 2014. Every year they host a hackathon-like event, which brings together diverse groups of computer scientists to work on this issue (figure 5 A2). This event has also spawned smaller new companies which specialize in bias prevention in artificial intelligence. One example that has gained traction due to its political advocacy is the 2017 company AlgorithmWatch (Spielkamp, 2017).

The importance of such commercial advocacy groups is highlighted in a recent paper on possible political solutions for the problem of bias in artificial intelligence (Wachter, Mittelstadt, & Floridi, 2017). Wachter and colleagues found that all current political frameworks still allow for bias to be fed into the system unchecked. They mention that any effective solutions will have to come from the commercial world, as big companies generally operate across borders, thus making national legislation ineffective. Fortunately, big companies in the tech industry have recently emphasized their willingness to find a solution to the problem of bias in artificial intelligence (figure 2), though no concrete plans have been formed as of yet (Nadella, 2018). It is also important to highlight the interplay of commercial and governmental institutions, as governmental institutions have shown interest in outsourcing the creation of possible solutions to this problem (Gordon-Murnane, 2018). One such initiative is the recent Explainable Artificial Intelligence (XAI) project by the Defense Advanced Research Projects Agency (DARPA, 2016). This project aims to improve the transparency of artificial intelligence systems (figure 5 C3 & D1), as well as to reduce possible human bias effects (mostly due to the inability to understand artificial intelligence decision-making criteria). Interestingly, DARPA also hired a group of specialized cognitive scientists for this project, highlighting the importance of understanding bias in this endeavor. While the project is currently classified, possible solutions could become major contributors to reducing bias in artificial intelligence systems at large (Gordon-Murnane, 2018). Similarly, other governmental initiatives focused on promoting bias prevention in artificial intelligence might indirectly spawn solutions in the commercial world (e.g. STEM-C, 2018).

Finally, a recent article on artificial intelligence in human resources highlights some interesting commercial solutions to the problem of bias in artificial intelligence (Rockwood, 2018). While these have not yet been technically implemented, they include multiple conceptual approaches that might become important in specific commercial settings. One example is the use of gamification in hiring practices (Rockwood, 2018). Artificial intelligence systems are already being used to determine qualified candidates, often leading to biased classification due to historical bias feeding into the data (O’Neil, 2016). The gamification of this process (making applicants play through games which mimic on-the-job scenarios) could level the playing field (figure 5 B3) and focus purely on qualifications as determined by the game scenarios. Further research is needed to assess the success and feasibility of such solutions.

4.3: Political solutions

Political solutions to the problem of bias in artificial intelligence have developed over the last decade (Goodman & Flaxman, 2016). Political solutions tend to be imperfect as legal frameworks always lag behind technological developments and are often followed only to a limited degree (Wachter et al., 2017).

Furthermore, political frameworks tend to rectify the worst offenses through legislation and can work toward improvement via government funding of certain projects. The following paragraphs will outline the current legal and political situation, after which solutions that have not yet been implemented are discussed. It is important to note that the literature reviewed here focuses specifically on Western countries, due to the scope of the current article as well as the language barrier that limits translation of frameworks from other countries.

Two important documents that outlined the relevance of artificial intelligence and the bias that might feed into these systems are The Future of AI and Advances in Artificial Intelligence (Furman et al., 2016). Both documents, created during the Obama Administration, stated the importance of active promotion and regulation of artificial intelligence systems, highlighting both the promise and the dangers of the technology. While the legality of these systems was left for the courts to decide (Kehl et al., 2017), multiple directives allowed for an increase in research funding into this issue (DARPA, 2016; Furman et al., 2016). After the Obama Administration left office, these directives were canceled, and no new legislation has replaced them. In an interview, Catherine O’Neil mentioned that under the Trump Administration: “The Office of Science and Technology Policy is no longer actively engaged in AI policy- or much of anything according to their website” (Knight, 2017).

The American courts have recently seen a few cases in which the use of artificial intelligence systems was challenged, which set legal precedent (Kehl et al., 2017). The case of State of Wisconsin v. Loomis has been especially influential in this matter. In this case, an inmate who received a longer sentence due to the COMPAS recidivism model (box1a) challenged the state’s use of this system. The case ended in a rejection of Mr. Loomis’s appeal, though the court did note that future governmental use of artificial intelligence should be done with greater transparency. This case set the precedent for multiple other cases, which have generally ruled in favor of continued use of artificial intelligence.

The European Union has chosen to actively legislate against bias in artificial intelligence systems. In an update to the 1995 Data Protection Directive, an EU agreement to nationally legislate bias caused by computerized systems, the General Data Protection Regulation (GDPR) seeks to actively legislate abuses of artificial intelligence systems across borders (Goodman & Flaxman, 2016). Enshrined in this regulation is the so-called “right to explanation”, a right of individuals to challenge the use of artificial intelligence (figure 5 D3). While this political framework is groundbreaking as the first cross-border artificial intelligence regulation worldwide, many critics have pointed out some fundamental flaws. In a breakdown of the relevant articles of the agreement, Wachter and colleagues found that, at most, the regulation allows for a “right to be informed”. This right is a mere shadow of what others have lauded as a “right to explanation”: individuals can only actively pursue information on how the artificial intelligence works (figure 5 C3). This leaves individuals or groups that are subject to bias in artificial intelligence powerless when pursuing legal action (Wachter et al., 2017). No other commonly used frameworks for bias in artificial intelligence currently exist in the Western world, though some governmental structures, like the UK House of Lords, have recently announced that national legislation can soon be expected (Daley, 2018).

The intent of giving a “right to explanation” has been a hallmark of many proposed solutions that are political in nature. Unfortunately, this right is often in conflict with copyright laws, under which companies that create and employ artificial intelligence algorithms have a right to protect their intellectual property. Wachter and colleagues noted this as one of the main reasons why the GDPR was altered to limit the power of individuals to challenge artificial intelligence classification (Wachter et al., 2017). Shortly after this criticism was levied, a novel idea was posed to circumvent this issue: the Affirmative Algorithmic Audit (Chander, 2017). This proposed legal tool would allow individuals to audit the output of artificial intelligence systems (figure 5 D3), much in the same way audits function in the commercial sector. While this does not infringe on copyright laws, as no internal algorithmic function must be disclosed, it does allow individuals to find out what the range and classes of classification are that are being employed by the system. Other authors have criticized this concept, as output tends to be difficult to interpret (Kliegr et al., 2018) and might be subject to reverse engineering (Howard et al., 2017).
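
As a rough illustration of what such an output-only audit could look like in practice, the sketch below queries a classifier purely through its prediction interface and tabulates the classes it emits, overall and per demographic group, without touching any internal parameters. The data, group labels, and stand-in model are all hypothetical; the proposal in Chander (2017) is legal rather than technical, so this is only one possible technical reading of it.

```python
# Minimal sketch of an output-only ("affirmative") audit, assuming a black-box
# classifier exposed only through a predict() call; all names are illustrative.
from collections import Counter

import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression


def audit_outputs(predict, X, groups):
    """Summarize the classes a black-box model emits, overall and per group,
    without inspecting its internal parameters."""
    preds = predict(X)
    report = {"classes": sorted(set(preds)), "overall": Counter(preds)}
    for g in np.unique(groups):
        report[f"group={g}"] = Counter(preds[groups == g])
    return report


# Illustrative stand-in for a proprietary system the auditor can only query.
X, y = make_classification(n_samples=500, n_features=8, random_state=0)
groups = np.random.RandomState(0).randint(0, 2, size=len(y))  # hypothetical group labels
black_box = LogisticRegression(max_iter=1000).fit(X, y)

print(audit_outputs(black_box.predict, X, groups))
```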

Other political solutions involve the promotion of diverse research groups (Barocas et al., 2017), international cooperation in research and development of artificial intelligence (Accenture, 2017), an increase in in-house human auditors of artificial intelligence systems (Nonnecke et al., 2017), and the creation of public-sector databases that can be used as training data for artificial intelligence systems (Accenture, 2017). These solutions all fall within the category of non-legislative affirmative action, and are thought to enhance possible technical solutions and to speed up the discovery of novel solutions to the problem of bias in artificial intelligence (figure 5 A1 & A2).

4.4: Transparency solutions

The concept of transparency has been pervasive throughout the previous chapters: in the creation of transparent models (Neyshabur et al., 2014), in the advocacy groups working for increased transparency in machine learning (AI Now, 2017; FATML, 2018), as well as in the political concept of the Affirmative Algorithmic Audit (Chander, 2017). Indeed, the need for transparency is generally seen as the most important first step toward solving the problem of bias in artificial intelligence (Barocas et al., 2017; Reese, 2016; Shadowen, 2017). Yet criticism has been levied at this call for transparency, because it is believed by some to be technically untenable (Lipton, 2016) and to conflict with notions of fairness in technology (Kleinberg et al., 2016). This section will discuss which concept of transparency is relevant in weeding out bias in artificial intelligence, and present several proposals for implementing more transparent artificial intelligence practices.

In his recent work on model interpretability, Lipton notes the lack of a formalized technical definition of the term transparency (Lipton, 2016). Transparency is often interpreted as the opposite of opaqueness or, informally, ‘blackbox-ness’. Lipton notes that this definition lacks an understanding of the goal of transparency in the context of artificial intelligence. In his formalization of interpretability, and its reliance on transparency, Lipton identifies five general goals of transparency in artificial intelligence: trust in the results, understanding of causality in the model, transferability of training examples to novel test examples, informativeness of the outputs, and the ability to make fair and ethical decisions (Lipton, 2016). With these five goals in mind, a distinction is made between a formalized understanding of transparency and post-hoc interpretations. Transparency is further subdivided into simulatability (i.e. complexity of the decision function), decomposability (i.e. intelligibility of the parameters), and algorithmic transparency (i.e. algorithmic complexity of the model). Post-hoc interpretation can also be further subdivided, though all subcategories can be understood as metrics of human understanding (Lipton, 2016).

This formalized definition was used to explicate the idea that some ‘simpler’ models (e.g. k-nearest neighbors) are more transparent than ‘complex’ models (e.g. deep neural networks). However, the definition can also be used to categorize proposed solutions on transparency (figure 5 C3), as some fall into the category of post-hoc interpretations (figure 5 D3), whereas others can be classified under one of the subcategories of formalized transparency. DARPA’s project on transparent artificial intelligence, for example, focused on the technical implementation of explainable models (i.e. algorithmic transparency solutions), as well as on using a cognitive psychologist to review the visual aids used in these models (i.e. post-hoc interpretations). The following solutions will be discussed using this categorization.
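
To make the simulatability contrast concrete, the sketch below trains a deliberately shallow decision tree whose entire decision function can be printed and stepped through by hand, something that is not possible for a deep neural network with millions of parameters. The dataset and the depth limit are illustrative choices, not part of Lipton's formalization.

```python
# Minimal sketch of simulatability: a shallow decision tree whose full decision
# function can be read line by line. Dataset and depth are illustrative choices.
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
tree = DecisionTreeClassifier(max_depth=2, random_state=0).fit(data.data, data.target)

# The printed rules ARE the model: a human can simulate every prediction by hand.
print(export_text(tree, feature_names=list(data.feature_names)))
```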

One decomposability solution suggested by Barocas and colleagues is the decoding of features from existing artificial intelligence classifiers (Barocas et al., 2017). They note that, while not possible in certain deep learning models, many artificial intelligence systems have straightforward weighted features. Labeling these features, and collecting data with explicit instruction on which feature it will influence (a post-hoc interpretation), would allow for more transparency (figure 5 C3). For models that do not allow a direct understanding of feature effects, like neural networks, the top-level feature input should be explicitly stated for the individual that is subject to classification (Barocas et al., 2017). Other authors have stated that this simple solution is often untenable, as it would allow for ‘gaming of the system’, where individuals alter their behavior according to the effect it will have on the artificial intelligence system (Kleinberg et al., 2016; Shadowen, 2017).
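
A minimal sketch of this idea, for the simple case of a linear classifier, is given below: each labeled feature is paired with its learned weight, so the relative influence of every input can be inspected directly. The dataset and model are illustrative stand-ins and are not taken from Barocas et al.

```python
# Minimal sketch of the decomposability idea: for models with straightforward
# weighted features, label each feature and expose its learned weight so its
# contribution to a classification can be inspected. Names are illustrative.
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X = StandardScaler().fit_transform(data.data)
model = LogisticRegression(max_iter=1000).fit(X, data.target)

# Pair each labeled feature with its weight, ordered by absolute influence.
weights = sorted(zip(data.feature_names, model.coef_[0]),
                 key=lambda fw: abs(fw[1]), reverse=True)
for name, w in weights[:5]:
    print(f"{name}: {w:+.3f}")
```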

Another post-hoc interpretation suggestion regarding feature sets in artificial intelligence concerns the order in which features are presented to the programmer. In a long review on cognitive biases and how they give rise to bias in artificial intelligence, Kliegr and colleagues made multiple post-hoc interpretation suggestions regarding the visualization of artificial intelligence systems (Kliegr et al., 2018). Amongst these, the presentation of features by order of strength (figure 5 C3), a metric definable by the confidence interval of individual features in a test set, seems most promising. Confidence intervals in artificial intelligence testing (a tool for algorithmic transparency) have not previously been proposed in this manner. No conceptual account of how these confidence intervals should affect feature choice is given, yet the approach is suggested to serve as a debiasing technique in that it allows programmers to directly assess the relevance of features (Kliegr et al., 2018).
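
The sketch below gives one loose interpretation of this suggestion: features are ranked by strength on a held-out test set, with a rough confidence band obtained from repeated permutation. The use of permutation importance, the number of repeats, and the 1.96 multiplier are assumptions made for illustration; Kliegr and colleagues do not prescribe a specific procedure.

```python
# Minimal sketch of presenting features in order of strength with a rough
# confidence band estimated on a test set; a loose reading of Kliegr et al.,
# not their implementation. All modelling choices here are illustrative.
import numpy as np
from sklearn.datasets import load_breast_cancer
from sklearn.inspection import permutation_importance
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

data = load_breast_cancer()
X_tr, X_te, y_tr, y_te = train_test_split(data.data, data.target, random_state=0)
model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)).fit(X_tr, y_tr)

# Repeated permutation on the test set yields a mean strength and a spread
# (used here as an approximate 95% interval) for every feature.
result = permutation_importance(model, X_te, y_te, n_repeats=30, random_state=0)
order = np.argsort(result.importances_mean)[::-1]
for i in order[:5]:
    mean, half = result.importances_mean[i], 1.96 * result.importances_std[i]
    print(f"{data.feature_names[i]}: {mean:.3f} ± {half:.3f}")
```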

Another interesting decomposability solution is the use of open source in artificial intelligence. Open source is a concept familiar to most computer scientists: the online availability of algorithms and data is widespread, with modules like scikit-learn and DeepXplore being important tools used in many artificial intelligence systems. A recent article in a magazine aimed specifically at computer scientists mentioned the creation and proliferation of open-source data sets and ‘stress-testing’ tools (Sherman, 2018). These would give companies free, easily accessible tools to test their models for possible bias feeding into the system. It is further mentioned that artificial intelligence systems could state on what open-source data the system was tested, and whether it has been tested by online tools. This would offer legitimacy to the created artificial intelligence system, and could even serve as an easy tool for legislation (Sherman, 2018).
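
A very simple instance of such a stress test is sketched below: a trained model’s selection rates are compared across groups on a shared test set, and the gap between them is reported. The synthetic data, the group column, and the choice of metric are assumptions made for illustration; they are not features of any specific open-source tool mentioned by Sherman (2018).

```python
# Minimal sketch of an open-source style "stress test": check a trained model's
# selection rates across groups on a shared test set. The group attribute and
# the parity metric are illustrative assumptions, not part of any cited tool.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split


def selection_rates(y_pred, groups):
    """Fraction of positive predictions per group."""
    return {int(g): float(np.mean(y_pred[groups == g])) for g in np.unique(groups)}


X, y = make_classification(n_samples=2000, n_features=10, random_state=1)
groups = np.random.RandomState(1).randint(0, 2, size=len(y))  # hypothetical protected attribute
X_tr, X_te, y_tr, y_te, g_tr, g_te = train_test_split(X, y, groups, random_state=1)

model = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
rates = selection_rates(model.predict(X_te), g_te)
gap = max(rates.values()) - min(rates.values())
print(rates, "demographic parity gap:", round(gap, 3))
```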

Another interesting post-hoc interpretation solution (although it is applied before algorithmic implementation rather than after it), posed by Howard and Borenstein, suggests the radical concept of community involvement in artificial intelligence creation (Howard & Borenstein, 2017). They propose that the involvement of relevant stakeholders in any classification problem (e.g. inmates in the recidivism case (box1a)) should be an essential part of artificial intelligence programming (figure 5 A3). This should be done in combination with the encouragement of multidisciplinary teams (figure 5 A2) that include specialists in the field of implicit bias research. Translatability of the process of artificial intelligence creation should allow for increased trust in and understanding of the underlying system (figure 5 D3). Yet this radically democratized idea seems untenable for a field that is strongly capitalistic. However, other research has already pointed toward this concept as a possible solution for governmentally supported artificial intelligence systems (Accenture, 2017; Nonnecke et al., 2017).

In fact, an extreme view on governmental use of artificial intelligence systems is that they should only be adopted if they include some form of the abovementioned transparency criteria (Kehl et al., 2017). This view is advanced by, amongst others, Priscilla Guo, a researcher who played a key part in revealing the effect of social bias in artificial intelligence used in the US criminal justice system. In a paper on the use of algorithms in the criminal justice system, Guo and colleagues note the absolute disconnect between the opaqueness of artificial intelligence and the concepts of good governance and criminal justice. While this view is generally seen as quite extreme, it is worth noting the inherent incompatibility between the opacity of artificial intelligence and transparent governance.

5: Discussion

This paper has highlighted solutions at different levels to the problem of implicit negative social bias feeding into algorithmic bias in artificial intelligence. Technical solutions try to solve this problem by changing the nature of algorithmic bias or by implementing algorithmic fail-safes to correct any bias that is found. Commercial solutions focus on spreading awareness of the problem, and on research and development. Political solutions determine the legality of existing artificial intelligence and serve to promote and direct research efforts. Finally, transparency solutions address the issue of opaqueness in artificial intelligence, which precludes a solid understanding of the underlying mechanisms and thereby hinders ethical consideration.

When considering the effectiveness of any proposed solution, it is important to consider how it relates to the other fields (figure 2). Technical implementations must be commercially desirable in order to be successfully adopted, which in turn presupposes the spread of awareness. Commercial efforts in this field often rely strongly on governmental support, in financial terms as well as in validation. Increasing transparency in artificial intelligence will only be possible through technical innovation, and can reinforce efforts in all other categories. The strength of proposed solutions thus lies in their synthesis with other solutions, rather than just in the individual feasibility discussed in previous sections.

In relation to this concept of synthesis, it is important to note that the scope of the current article is limited by the chosen actor fields (e.g. commercial). The current research focused on all levels of actors that have a clear effect on artificial intelligence algorithm creation (figure 2). Further research,
