
Data-driven AI development: an integrated and iterative bias mitigation approach

YOUSSEF ENNALI
Student number: 10873104

University of Amsterdam, Faculty of Science
Thesis Master Information Studies: Information Systems

Supervisor: dhr. prof. T. (Tom) van Engers
Examiner: dhr. dr. F.M. (Frank) Nack

Abstract

This explanatory case study explores the bias issue leading to discriminatory decisions generated by artificial intelligence decision-making systems (AI-DMS). AI-DMS depend on data emerging from society, in which discriminatory bias can be concealed. This bias can be carried over into AI-DMS models, leading to biased predictions. Due to the benefits of such systems and the technological developments of recent years, implementation of AI-DMS has become more accessible, triggering a wide interest in the bias issue in both academia and industry. Academic literature generally focuses on various bias mitigation methods, while the integration of these methods into the development process of AI-DMS models remains underexposed. In this study the concepts of bias identification and bias mitigation methods are explored to conceive an integrated approach to bias identification and mitigation in the AI-DMS model development process. Reviewing this approach with a case study shows that its application contributes to the development of fair and accurate AI-DMS models. Its iterative nature enables the combination of multiple bias mitigation methods in one model. Additionally, its step-by-step design makes designers aware of bias pitfalls in AI, opening the door to “unbiased by design” model development. From a governance perspective the proposed approach might serve as an instrument for internal auditing of AI-DMS models.

Keywords: Artificial Intelligence Decision-Making Systems, Bias Mitigation, Bias, Legal Compliance, Explainable Artificial Intelligence

1 Introduction

“Unfortunately, we have biases that live in our data, and if we don’t acknowledge that and if we don’t take specific actions to address it then we’re just going to continue to perpetuate them or even make them worse.”

– Kathy Baxter, Ethical AI Practice Architect, Salesforce

A quote revealing a hidden danger in utilizing real-world data in artificial intelligence decision-making systems (AI-DMS). Data-driven AI-DMS are applied to support decision-making processes, e.g. to reduce the workload. Several cases revealed that utilization of such technology also comes with a major drawback: bias in prediction and/or decision outcomes. Besides their benefits, these systems might produce undesirably biased outcomes, making them a double-edged sword. The biased outcomes derive from data which contain explicit and/or implicit human biases (Shrestha et al., 2019), as the data used is a representation of the real world. These pre-existing biases manifested in data emerge from society and end up in our technical systems (Gu & Oelke, 2019; Ntoutsi et al., 2020), eventually sustaining and even amplifying a discriminative society (Karimi et al., 2018; Ntoutsi et al., 2020; Shrestha et al., 2019).

A well-known example is the COMPAS system used in the US to determine a risk score for recidivism amongst convicts. Criminal history, among other variables, is used to predict the risk score. African-Americans were more likely to be assigned a higher risk value than their actual risk compared to Caucasians (Angwin & Larson, 2016; Shrestha et al., 2019; Sileno et al., 2019). COMPAS is used to support decisions regarding the placement, supervision and case management of defendants. In the Netherlands the authorities used an AI-DMS called SyRI to track down suspects of fraud. A court order concluded that, due to the lack of transparency, there is a considerable risk that the system might discriminate, stigmatize and invade the privacy of citizens (Steen, 2020). Evidently, the presented examples reveal that implementation of these systems is intertwined with ethical and legal implications.

Although artificial intelligence (AI) emerged more than half a century ago, the interest in this field has grown considerably in recent years, in both academia and industry (Pretorius & Parry, 2016). Access to computational power and the possibility of storing large amounts of data contribute to this increase in interest (Louridas & Ebert, 2016a). AI is a broad computer science discipline with the purpose of designing intelligence into machines. The field consists of several sub-fields, for instance robotics and machine learning. The purpose of the latter is getting machines to learn from data by incorporating statistical models and algorithms to perform a specific task, mostly regarding predictions and decisions (Louridas & Ebert, 2016a; Tecuci, 2012), which altogether are the core of an AI-DMS. Due to the complex nature of such AI-DMS, an explanation for generated outcomes is difficult to achieve.

It is argued that debiasing data should contribute to the development of fair, accountable and transparent AI-DMS (Bolukbasi et al., 2016; Shrestha et al., 2019). This is a difficult task, since human bias might be hidden in data through certain proxy variables, resulting in proxy discrimination (Datta et al., 2017; Ntoutsi et al., 2020). An example is the home address, which indicates a neighborhood where the majority of the population is of a certain ethnicity, making the address a variable related to ethnicity. An AI-DMS model could incorporate these proxy variables to generate predictions used for decisions, resulting in discriminative decisions, e.g. rejecting an insurance application. There is currently an ongoing concern in the Netherlands regarding a system similar to COMPAS incorporating postal codes of convicts to predict their recidivism score. It is argued that this system, RISC, facilitates ethnic profiling through the proxy variable postal code (Schuilenburg, 2020).

Although AI-DMS have huge potential and various benefits, the bias drawback is not one to be ignored. Organizations are required to resolve these ethical and legal issues to achieve acceptance of their AI-DMS. In the academic field a considerable amount of research focuses on debiasing methods to cope with the bias issue. However, these studies focus on a specific debiasing method and/or subject, while the development process of machine learning models remains underexposed. Also, debiasing methods may differ depending on the algorithm and data used for the model. Different machine learning frameworks are suggested in studies and in industry; however, a framework in which debiasing is incorporated is yet to be conceived.

In this explanatory case study, the related key concepts of the bias issue and model development are gathered. Academic literature covering the research topic is reviewed to explore and describe relevant concepts, enabling a thorough and substantial understanding of the research topic. Eventually a framework is proposed in which debiasing is incorporated, providing an end-to-end framework for bias-free and accurate AI-DMS development. This framework is reviewed by utilizing it to develop an AI-DMS model.

For the case study, concepts of bias in machine learning and their relation with model development are explored. The aim is to draft a framework for model development incorporating bias mitigation methods. This framework should enable organizations and data scientists to: 1) be aware of the bias pitfalls in data, 2) understand and carry out bias mitigation methods, while 3) ensuring the model's accuracy, altogether leading to legally and ethically acceptable AI-DMS models. The point of departure in this study is the following research question: what are the bias mitigation methods and how could these be incorporated into the AI-DMS model development process?


This study starts with the exploration of academic literature regarding the research context. A framework and proposed approach are conceived by incorporating concepts from the explored literature. With a case study the proposed approach is reviewed, leading to insights on the research topic.

2 Conceptual framework

Figure 1 visualizes a conceptual framework for understanding the concepts relevant to this study. This conceptual framework forms the context in which the study takes place. The concepts are discussed in the following sections.

Figure 1: Illustration of a conceptual framework for machine learning in relation with bias

2.1 Artificial Intelligence, machine learning and deep learning

AI is a broad discipline concerned with the development of systems that exhibit what humans perceive as intelligence (Tecuci, 2012). As the study's focus is on data-driven AI systems, the topics are restricted to the following relevant core principles of AI: machine learning and deep learning. Machine learning is currently considered a disruptive technology with huge potential for organizations. It is believed that machine learning could enable organizations to reduce costs, reduce the workload and improve customer retention and acquisition (Lee & Shin, 2020). The general idea behind machine learning is to teach a computer, hence "machine learning", to perform a task by feeding it training examples, in most cases data. The machine is then presented with new data to perform the task for which it was trained (Louridas & Ebert, 2016a); generally these tasks concern predictions.

Three machine learning approaches can be applied: supervised, unsupervised and semi-supervised learning. Supervised learning entails feeding training data together with its solutions, while unsupervised learning feeds training data without solutions, essentially letting the machine figure out the structure itself (Barredo Arrieta et al., 2020; Lee & Shin, 2020; Louridas & Ebert, 2016a). In the semi-supervised approach, elements of supervised and unsupervised learning are combined to train a machine. This is mostly used when classifications of data are missing, when there are time constraints on labeling output data, or to obtain classification rules for labeling purposes (Lee & Shin, 2020).
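To make the distinction concrete, the following sketch contrasts a supervised and an unsupervised scikit-learn model on a toy dataset. The data and the model choices are illustrative assumptions only, not taken from the case study later in this thesis.

from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

# Toy feature matrix: [age, number of prior offences]
X = [[22, 3], [45, 0], [31, 5], [52, 1], [19, 4], [60, 0]]
y = [1, 0, 1, 0, 1, 0]  # known outcomes (solutions) for the supervised case

# Supervised: the model is trained on examples together with their solutions.
clf = LogisticRegression().fit(X, y)
print(clf.predict([[28, 2]]))

# Unsupervised: no solutions are given; the model groups the examples itself.
km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(km.labels_)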

To perform a task, machine learning often relies on a classification method known as an artificial neural network (ANN). An ANN consists of nodes. In simple terms, a neural network is a collection of algorithms for approximation purposes (Louridas & Ebert, 2016b; Pretorius & Parry, 2016). Algorithms can be used to develop a model which seeks relations between elements of data and recognizes patterns to generate an approximation (Lee & Shin, 2020; Shrestha et al., 2019).

Deep learning is a subset of machine learning using a deep neural network (DNN) (Lee & Shin, 2020; Ntoutsi et al., 2020). It consists of more (hidden) layers and nodes. Unlike traditional machine learning, it is suited to more complex problem solving. Its accuracy is higher, though it takes more time to train a model compared to the ANNs used in traditional machine learning methods. Another important property of a DNN is that it provides less transparency due to its hidden layers, complexity and independent mechanism for seeking patterns and relations in data, resulting in black-box models (Barredo Arrieta et al., 2020). Deep learning models are usually trained with a large amount of (unclassified) data (Lee & Shin, 2020).

Machine learning algorithms depend on training data to learn to perform a specific task. Training data is a set of examples with correct outcomes (supervised) (Gu & Oelke, 2019; Louridas & Ebert, 2016b) or without correct outcomes (unsupervised). Usually these examples derive from the real world, e.g. a data set of customers with purchase history. In the process of model development, the data set is split into a training and a test set. While the training set is used to train the model, the test set is used to validate it (Barredo Arrieta et al., 2020; Gu & Oelke, 2019; Liao et al., 2015). In this study the focus will be on the supervised learning approach.

2.2 Bias and fairness

Bias, a persistent and multifaceted societal problem, is generally understood as favoring or disfavoring an individual or group with certain properties (e.g. age, gender, ethnicity, sexual orientation, religious background and so forth) in a way that is unfair (Gu & Oelke, 2019). It is a longstanding phenomenon, as old as human civilization (Ntoutsi et al., 2020). Due to its multifaceted character it is studied in many disciplines, including social science, computer science, psychology, philosophy and law. In this study the coverage of bias is restricted to the following definition: a prejudice or tendency in predictions made by an AI-DMS leading to decisions against or in favor of an individual or group in a way considered to be unfair.

2.2.1 Explicit and implicit bias

Usually the training data consists of historical events in the real world used to predict future outcomes. Since the bias problem originates in society, the problem is inevitably transitioned to the data. Bias in data can be present in an explicit or implicit manner, in other words direct or indirect bias (Bolukbasi et al., 2016). Explicit bias is more obvious to identify in data, for instance data containing variables depicting the ethnicity, gender or other properties which could result in discriminative implications. Such variables are also known as sensitive features or attributes.

A more challenging bias to identify is the implicit kind. Bias can be present through proxies (Datta et al., 2017; Ntoutsi et al., 2020): variables which indirectly correlate with sensitive features. Examples are postal codes of areas where the majority of the population is of a certain ethnicity, a first name related to the gender of an individual, or a first and/or last name typically used in certain cultures or religions.

2.2.2 Other types of bias in machine learning

Besides bias leading to discriminating or unfair decisions, machine learning experts distinguish three other types of (statistical) bias (Gu & Oelke, 2019; Sileno et al., 2019); a simple check for the first and third type is sketched after this list.

1) Covariate shift is a type of bias where the distribution of the training set differs from that of the test set, e.g. training the model on a younger population while the test set has an older population.

2) Sample selection bias is a flaw in the selection process where non-random data is selected, causing over- or under-sampling and eventually a non-representative sample of the intended population.

3) Imbalance bias means fewer examples of one outcome than of the other, e.g. more examples of convicts whose early release application was rejected than of applications that were accepted.
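The following minimal sketch illustrates how the first and third type can be surfaced in practice with pandas and scikit-learn. The dataframe, column names and split are hypothetical assumptions used only for illustration.

import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical dataframe with an 'age' feature and a binary 'label' outcome.
df = pd.DataFrame({
    "age":   [19, 22, 25, 31, 35, 40, 44, 52, 60, 67],
    "label": [1, 1, 1, 1, 0, 1, 0, 0, 0, 0],
})

# Imbalance bias: check how many examples exist per outcome.
print(df["label"].value_counts(normalize=True))

# Covariate shift: compare the age distribution of the training and test split.
train, test = train_test_split(df, test_size=0.3, random_state=42)
print(train["age"].describe())
print(test["age"].describe())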

2.3 Legal implications

Most countries have anti-discrimination laws included in their constitution to achieve equality between citizens. The social-cultural structure differs per country, and so does the legislation regarding anti-discrimination. In some countries same-sex marriages are forbidden and prosecutable by law; in contrast, other countries permit same-sex marriages and even have protection laws for citizens with a same-sex sexual orientation. Considering the Dutch anti-discrimination law, article 1 of the constitution prescribes that all citizens should be treated equally. This article acts as the foundation for all lawbooks in the Netherlands. Additional laws in these lawbooks prescribe that equality applies to all citizens regardless of their religion, beliefs, political preference, race, gender, nationality, sexual orientation (hetero- or homosexual), age, physical or mental disability, chronic and psychological diseases and form of employment (part- or full-time). Citizens, organizations and authorities are obliged to adhere to these laws and are prosecutable should compliance fail. Making discriminative remarks, making remarks to incite hatred, or participating in activities with the aim of discrimination is punishable by law (Rijksoverheid, 1954).

Taking these laws into account, a biased AI-DMS is an undesired legal implication for all beforementioned stakeholders. However, due to the novelty of the bias problem in AI-DMS and the complexity and lack of transparency of these systems, law enforcement is apparently a difficult task. Nevertheless, organizations in the Netherlands should design AI-DMS free of bias to overcome these legal implications.

It should be noted that using AI models to explain a phenomenon is legally permissible, for instance investigating causal relations between the fatality of disease outbreaks and population properties like gender, age and ethnicity. Applying these causalities to predict survival rates of infected individuals is still legal. It becomes illegal once these predictions are used to decide on the door policy of hospital ICUs. In that situation predictions lead to unfair decisions against individuals or groups with certain traits, thus establishing inequality between citizens.

2.4 Governance

The European Commission (EC) prescribes that data governance should be in place when personal data is used, for privacy purposes (Otto, 2018). Its core principle revolves around ensuring quality and integrity of data, data privacy, protection and data accessibility. Such guidelines of authorities for AI systems are not yet established; authors emphasize that these should be conceived as well and call for Responsible AI (Barredo Arrieta et al., 2020). Responsible AI entails a series of principles, governance being one of them, that are necessary when deploying AI in applications.

There are two perspectives on governance: 1) committees that review and approve AI development, and 2) leaving the responsibility to employees. Both perspectives could co-exist, however the first perspective is more likely to decrease the agility of organizations in AI development (Barredo Arrieta et al., 2020).

The preceding governance perspectives focus on internal organizational activities. However, since organizations are required to govern their AI-DMS to comply with laws and regulations, another actor is needed to oversee compliance, specifically authorities acting as regulators.

With regulations, laws and regulators, transparency of AI-DMS could be achieved, contributing to trust (Barredo Arrieta et al., 2020; Kroll, 2018) and inducing a wider acceptance of these systems. Arranging internal and external audits to assess compliance is a well-known mechanism in the information technology area. The audit reports should be made available to contribute to the trustworthiness of AI-DMS. An external third-party auditor is necessary for this to succeed (Barredo Arrieta et al., 2020).

2.5 Intellectual property

With the introduction of governance, organizations might be reluctant to cooperate, since transparency could mean revealing the source code of AI systems. This puts organizations at a disadvantage, since competitors could procure and reuse the source code to their own advantage. Barredo Arrieta et al. (2020) argue that the assessment of algorithms, data and design process contributes to the trustworthiness of AI-DMS. In this assessment the authors emphasize that preservation of the intellectual property of these AI systems is necessary. Explainable AI (XAI) methods are considered to be a solution for audit purposes. However, a recent study showed this to be a rather challenging ordeal (Oh et al., 2019), since confidentiality could be compromised merely by giving access to the input and output of these systems (Barredo Arrieta et al., 2020): by acquiring the input and output of an AI-DMS, the model could be reverse engineered through XAI methods. In conclusion, further research in the XAI domain is required to assess bias in AI-DMS while preserving confidentiality.


2.6 Delegation issues

As a result of the emergence of AI-DMS in recent years, organizations are confronted with a revision of their decision-making structures. Since AI-DMS serve as actors in decision-making processes, organizations should consider the role of AI-DMS in these structures. Traditionally, decision-making is delegated to managers as actors; a full or partial (hybrid) delegation to AI-DMS is now to be considered, each with its own implications.

A full human-to-AI delegation means leaving the decision-making fully to AI-DMS (Shrestha et al., 2019). The benefits of this approach are a high decision-making speed while processing a huge amount of data (not restricted by human capacity), and outcomes that can be replicated easily. Limitations are low interpretability due to the complex nature of the algorithms, and the design should be carried out thoroughly to prevent bias in the models.

A hybrid delegation partially incorporates humans and AI-DMS in the decision-making process: either humans contribute at the start or at the end, with the AI-DMS on the opposite side of the process. Both alternatives have a low decision speed due to human involvement. Interpretability depends on whether the AI-DMS is at the start or at the end of the process: at the start means lower interpretability, while at the end means the opposite, since humans are involved in the final decision. In both cases replicability is low, since outcomes are vulnerable to human variability. Here, too, the design of algorithms should be carried out thoroughly to prevent bias.

2.7 Debiasing phases in the development process

Authors argue that bias mitigation (debiasing) should contribute to fair AI-DMS outcomes (Bolukbasi et al., 2016; Ntoutsi et al., 2020; Shrestha et al., 2019; Sileno et al., 2019). Debiasing can be carried out in three different phases of the development process of AI systems (Barredo Arrieta et al., 2020; Ntoutsi et al., 2020). The three phases are:

1) pre-processing: mitigation is applied before the model is trained. Bias is tracked down and removed or hidden from data. Reweighing of sensitive features is also a common method in this phase.

2) in-processing: mitigation is carried out during the training of the model. The aim is to achieve fairness by minimizing the ability to predict sensitive features while simultaneously optimizing accuracy of predictions.

3) post-processing: carried out after the model is trained. Outcomes can be adjusted by applying other weight factors to reduce differences between groups.

In the second and last phase XAI techniques are used to track down possible bias. To identify bias in data, designers should be aware of bias types in both a technical and a non-technical sense (Barredo Arrieta et al., 2020); a multi-disciplinary background is therefore necessary. Debiasing data can be achieved by firstly identifying the types of bias and secondly determining whether to prune or neutralize the bias (Sileno et al., 2019). Pruning is removing the corresponding biased variables; neutralizing is hiding or anonymizing them. Other debiasing methods are discussed in the next section.

2.7.1 Bias mitigation methods

Depending on the phase of the AI-DMS development process, different bias mitigation methods can be carried out. Academic literature describes these mitigation methods, which has resulted in multiple Python packages covering them. One of these packages is Artificial Intelligence Fairness 360 (AIF360), developed by IBM Research. In AIF360, different bias mitigation methods in the three phases of model development are incorporated into algorithms (IBM Corporation, 2018). These methods are depicted per phase in figure 2; each method is briefly explained below.


Figure 2: Representation of mitigation methods in machine learning phases

Pre-processing:

• Learning fair representations: by obfuscating information about sensitive features, fair representations are achieved

• Optimized preprocessing: a probabilistic transformation approach which edits features and labels in the data with group fairness, individual distortion and data fidelity constraints and objectives

• Reweighing: a technique where sensitive attributes are provided with a weight factor to generate predictions. Another measure in this technique could be the removal (pruning) of sensitive features

• Disparate impact remover: the transformation of features to achieve group fairness

In-processing

• Adversarial debiasing: maximizing a model’s accuracy while preventing an adversary’s ability to incorporate sensitive features into the predictions. Equality constraints are used to achieve this goal

• Prejudice remover: bias is neutralized by adding a discrimination-aware regularization term to the learning objective

Post-processing

• Equalized odds postprocessing: changes output labels by solving a linear program to find the probabilities with which to change them, optimizing equalized odds.

• Calibrated equalized odds postprocessing: similar to the prior method, however a calibrated classifier is used in the process.

• Reject option classification: a positive/negative discrimination approach, where the privileged group is given unfavorable outcomes and the unprivileged group favorable outcomes. This is done within a bandwidth around the decision boundary to neutralize the gap between the two groups.

In the pre-processing methods the training data is manipulated to achieve fairness. In-processing methods generate classifiers to cope with bias. Lastly, in the post-processing methods bias is mitigated in the predictions. An example of applying one of these methods with AIF360 is sketched below.
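As an illustration of how one of these methods is applied in code, the sketch below uses the Reweighing pre-processing algorithm of AIF360 on a small hypothetical dataframe. The column names, groups and data are assumptions for illustration only; the AIF360 calls follow the package documentation (IBM Corporation, 2018), although exact signatures may differ between versions.

import pandas as pd
from aif360.datasets import BinaryLabelDataset
from aif360.algorithms.preprocessing import Reweighing
from aif360.metrics import BinaryLabelDatasetMetric

# Hypothetical data: 'sex' is the sensitive feature (1 = privileged group).
df = pd.DataFrame({
    "sex":   [1, 1, 1, 0, 0, 0, 1, 0],
    "score": [7, 5, 6, 8, 7, 9, 4, 6],
    "label": [1, 1, 0, 0, 0, 1, 1, 0],
})

dataset = BinaryLabelDataset(
    df=df, label_names=["label"], protected_attribute_names=["sex"],
    favorable_label=1, unfavorable_label=0,
)
privileged = [{"sex": 1}]
unprivileged = [{"sex": 0}]

# Fairness before mitigation: mean difference in favorable outcomes between groups.
print(BinaryLabelDatasetMetric(dataset, unprivileged, privileged).mean_difference())

# Reweighing assigns instance weights so that both groups are weighted equally.
rw = Reweighing(unprivileged_groups=unprivileged, privileged_groups=privileged)
transformed = rw.fit_transform(dataset)
print(BinaryLabelDatasetMetric(transformed, unprivileged, privileged).mean_difference())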

2.8 Fairness versus accuracy

The model's accuracy is an important factor in measuring the performance of the model. Different statistical measures can be used to determine a model's accuracy, depending on the kind of algorithm used. Debiasing is bound to affect a model's accuracy, since debiasing entails the modification of either the variables, the algorithm or the target labels. To achieve a fair and accurate model, both measures should be considered in the development process.

2.9 Explainable Artificial Intelligence (XAI)

Explainable AI (XAI) is a research area aiming to grasp the unexplainable character of AI systems and thus achieve transparency and interpretability (Páez, 2019; Sileno et al., 2019; Nassar et al., 2020), without limiting the effectiveness of AI-DMS. Therefore, XAI suggests (1) the generation of more explainable AI-DMS models while maintaining accuracy and performance, and at the same time (2) providing humans with understandable decision outcomes, eventually reaching a higher level of trust (Barredo Arrieta et al., 2020). In this study XAI methods are used to explain the model by, for instance, calculating and visualizing feature importance and detecting bias in models.
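The case study below uses the FairML and AIF360 packages for this purpose (see section 3.2). As a package-agnostic illustration of the feature-importance idea, the following sketch uses scikit-learn's permutation importance on a toy model; the data, feature names and coefficients are hypothetical assumptions.

import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.inspection import permutation_importance

rng = np.random.default_rng(0)
n = 500

# Hypothetical features: one sensitive attribute and two neutral ones.
sensitive = rng.integers(0, 2, n)      # e.g. binary group membership
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
X = np.column_stack([sensitive, x1, x2])

# Outcome partly driven by the sensitive attribute: the bias we want to expose.
y = (0.8 * sensitive + x1 + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)
result = permutation_importance(model, X, y, n_repeats=10, random_state=0)

# A high importance for 'sensitive' signals potential bias in the model.
for name, score in zip(["sensitive", "x1", "x2"], result.importances_mean):
    print(name, round(score, 3))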

2.10 AI model development process

In this study a generic development process is proposed for AI model development. Following this process step by step enables bias detection and mitigation. The process is the result of integrating additional activities into the usual AI model development process: bias detection, bias mitigation method selection and application of the mitigation method. The iterative nature of the process serves to achieve acceptable fairness while maintaining the model's accuracy. Since the process is set up in a generic way, it can be applied broadly (supervised and unsupervised) and its utilization is platform-independent. The process is illustrated in figure 3.

Figure 3: Machine learning development process with an integration of bias identification and mitigation

Step 1: Data preparation

After the collection of data for a certain business case, the data should be prepared. In this step the data is cleansed of data quality issues and variables are transformed (categorical data to binary variables, calculation of date fields, etcetera). During this step biased variables could be uncovered.

Step 2: Data exploration

The exploration of data is carried out in this step to understand the shape of the data and its emergence in society. Demographics and other distributions of the population are analyzed. Also, the determination of correlations between variables and selection of the target variable is done in this step.

Step 3: Algorithm selection and model development

In this step an appropriate algorithm is selected. The selection depends on the number of observations and on whether a quantity or a category is to be predicted. A last important criterion is whether the dataset contains the solutions (labels). A dataset with solutions calls for a supervised learning algorithm; otherwise an unsupervised learning algorithm is selected.

After selection the model is developed. In the case of a supervised learning algorithm the dataset is split into a training and a test dataset; the training dataset is used to train the model and the test dataset to validate it.
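A minimal, generic sketch of this step for a supervised case with scikit-learn; the dataset is a placeholder standing in for the prepared data of steps 1 and 2.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Placeholder labeled dataset; in practice X and y come from the prepared data.
X, y = make_classification(n_samples=1000, n_features=8, random_state=0)

# Split into a training set (to fit the model) and a test set (to validate it).
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# Supervised algorithm selected for a categorical (binary) target.
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
print(model.score(X_test, y_test))  # mean accuracy on the held-out test set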


Step 4: Accuracy determination

Once the model is developed the accuracy should be measured. Depending on the use case a threshold could be used to determine whether the accuracy score is acceptable or not.

Step 5: Biased variables identification

In the first steps, possibly biased variables could already be identified with the correlation measures. In this step XAI methods can be used to measure feature importance, to discover whether sensitive features contribute to the model's prediction. Sensitive features identified in step 1 can be assessed with the same XAI methods.

Step 6: Fairness determination

XAI methods with fairness computation functions are used to calculate fairness differences between sensitive features. In this step a division is made between privileged and unprivileged groups using the sensitive features.
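The fairness measure used later in the case study is the mean difference in favorable outcomes between the unprivileged and the privileged group (statistical parity difference). A plain-numpy sketch of that computation, with hypothetical predictions and group labels:

import numpy as np

# Hypothetical binary predictions and a binary sensitive feature (1 = privileged).
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0, 1, 0])
group  = np.array([1, 1, 1, 1, 1, 0, 0, 0, 0, 0])

# Mean difference: P(favorable | unprivileged) - P(favorable | privileged).
mean_difference = y_pred[group == 0].mean() - y_pred[group == 1].mean()
print(mean_difference)  # negative values indicate the unprivileged group is worse off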

Step 7: Bias mitigation method selection

A bias mitigation method is chosen to neutralize biased outcomes. This step is repeated until a suitable mitigation method resulting in an acceptable fairness and accuracy score is found.

Step 8: Application of the bias mitigation method

The mitigation method is applied to the model to neutralize bias in the model’s outcome.

Step 8a: Model retraining and revalidation

Provided that a pre-processing mitigation method is used, the model should be retrained and revalidated, due to the modifications in the training data.

Step 9: Fairness and accuracy determination

After neutralizing the bias in the model, the accuracy and fairness scores of the model are recalculated. If the mitigation method results in acceptable scores, a fair and accurate model is achieved. If the scores are not acceptable, another mitigation method should be selected (step 7), provided that another bias mitigation method is available; otherwise other variables should be sought to enrich the model, and the process is repeated from step 4. If enriching the model is not feasible due to a lack of variables, more data should be collected, or alternatives for the prediction should be considered.
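A compact sketch of how this iterative loop (steps 4 to 9) could be orchestrated in code. The helper functions, the method attribute and the thresholds are hypothetical placeholders for the activities described above, not part of an existing library.

ACCURACY_THRESHOLD = 0.80   # step 4/9: minimum acceptable accuracy
FAIRNESS_THRESHOLD = 0.05   # step 6/9: maximum acceptable absolute mean difference

def develop_fair_model(data, mitigation_methods, train_model, apply_mitigation,
                       accuracy_of, fairness_of):
    model = train_model(data)                                                 # step 3
    for method in mitigation_methods:                                         # step 7: select a method
        mitigated_data, mitigated_model = apply_mitigation(method, data, model)   # step 8
        if method.is_preprocessing:                                           # step 8a: retrain on modified data
            mitigated_model = train_model(mitigated_data)
        acc = accuracy_of(mitigated_model)                                    # step 9: recalculate scores
        fair = abs(fairness_of(mitigated_model))
        if acc >= ACCURACY_THRESHOLD and fair <= FAIRNESS_THRESHOLD:
            return mitigated_model                                            # fair and accurate model found
    # no suitable method left: enrich the data with other variables and repeat from step 4
    return None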

3 Case study

For this research an explanatory case study is conducted using a sample dataset to develop a fair and accurate model with the proposed development process. The case study entails a supervised machine learning model used to explore possible bias emerging from the data. Subsequently the bias is mitigated by applying one or more mitigation methods. Fairness and accuracy calculations are conducted to reach acceptable scores for a fair and accurate model.

3.1 Explanatory case study

Case studies can mainly be distinguished into three types: descriptive, explanatory and exploratory (Fletcher et al., 1997; Yin, 2014), which are shortly described in this paragraph. Firstly, descriptive case studies describe natural phenomena which occur within the data, usually in a narrative form. In this type of case study, the researcher begins with a descriptive theory to support the explanation of the phenomena or story. Secondly, exploratory case studies are for the exploration of phenomena in the data that are a point of interest of the researcher. Customarily, general research questions are used in order to arrive at preliminary findings on the research topic. A researcher might also carry out prior fieldwork and gather small-scale data before drafting the research question(s) and hypotheses, assisting in the preparation of a framework. Lastly, explanatory case studies are for the examination of data, both on the surface and at a deep level, to explain phenomena in the data. The researcher may form a theory based on the data and eventually test this theory (Fletcher et al., 1997).

For this research an explanatory case study approach was chosen to establish a thorough exploration of possible bias pitfalls in designing AI-DMS models. Data was extracted by evaluating previously conducted research revolving around the topics machine learning, bias in data, explainable artificial intelligence, governance and so forth, enabling the arrival at a research question and conceptual framework. The conceptual framework assists firstly in the comprehension of the research area. Secondly, it serves as a framework to assess and explore bias in data used for AI-DMS models. Lastly, the proposed framework provides a step-by-step walkthrough to arrive at a fair and accurate AI-DMS model.

For the case study a dataset of COMPAS (Ofer, 2017) is used to review the proposed process of the conceptual framework. This dataset contains data of convicts: demographics, prior criminal history, recidivism scores etcetera. A model is built in which the recidivism risk is predicted, using other variables to contribute to the prediction. The recidivism risk is based on the COMPAS assessment outcome of whether a defendant is low, medium or high risk. Since the dataset contains the recidivism risk levels on which the target variable is based, a supervised learning approach is carried out.

3.2 Python

Python, an open source programming language known for its simplicity, is used to design the model. It is broadly used by data scientists and machine learning experts. Python provides an extensive selection of machine learning libraries and frameworks, which contributed to the choice for this programming language. Due to its popularity and large community, various online sources are available regarding the development of machine learning models (Raschka et al., 2020). For the development of models the scikit-learn package is used (Scikit Learn Developers, 2020). For model explainability and bias detection the FairML (Adebayo, 2017) and AIF360 packages are used, which derive from XAI concepts. With the AIF360 package bias mitigation is applied.

4 Findings

4.1.1 Data preparation

The COMPAS dataset contains data of over 10,000 criminal defendants in Broward County, Florida, in the United States. For these defendants, 18,316 observations are included in the dataset. The dataset contains a track record of the criminal history of defendants and the outcomes of the risk scores generated by COMPAS, spanning a period of two years. The following variables of defendants are enclosed in the dataset:

• first name: first name of the defendant.
• last name: last name of the defendant.
• age: age of the defendant.
• gender: gender of the defendant.
• ethnicity: the ethnicity of the defendant.
• juvenile count: the number of times the defendant broke the law in juvenile years; a combination of two features: juvenile considered misdemeanors and other juvenile incidents.
• priors count: the number of times the defendant was criminally charged and convicted.
• violence score: a score on a scale of 0 to 10 indicating how violent the defendant is (estimated by COMPAS). The higher the score, the more violent the defendant.
• recidivism score: a score on a scale of 0 to 10 indicating the risk that the defendant is likely to recidivate. The higher the score, the higher the risk.
• event: a binary value indicating whether an event occurred while the defendant was incarcerated.
• recidivist: a binary value which indicates whether the defendant is a recidivist or not.
• recidivism risk level: a level indicating whether the defendant is considered a low, medium or high risk to recidivate (target/dependent variable).


The demographic data consists mostly of sensitive features, e.g. sex, ethnicity and age. First and last name are also included, which could be proxy variables (related to ethnicity and/or religious background).

The dataset was cleansed in this step of data quality issues, e.g. duplicate variables (which could cause multi-collinearity issues), incomplete records, records with incorrect dates, etcetera. After cleansing, 14,241 observations remained. Additionally, a transformation between two date fields was carried out (duration of jail time, days_in_jail). Lastly, the recidivism risk level was transformed into a binary variable, in which the low level is indicated by 0 and the medium and high levels by 1. This variable is the target variable of the model. The data preparation source code is enclosed in Appendix A.

4.1.2 Data exploration

4.1.2.1 Correlation coefficients

To measure the statistical relationships between the variables, Pearson's correlation coefficient was calculated. The correlation matrix is depicted in a heatmap, together with the correlation coefficients between the independent variables and the target variable, in figure 4.

Figure 4: Correlation matrix and correlation coefficients

The strongest positive relations with the target variable (medium_to_high_risk) are the recidivism score and violence score, followed by priors count, the African-American race and whether the defendant is a recidivist. The target variable is based on the COMPAS recidivism risk level, which directly derives from the recidivism score (0-4: low, 5-7: medium, 8-10: high). Due to this, the correlation coefficient is high and might cause multi-collinearity.

On the negative correlation side it is mainly age, suggesting that the older the defendant, the lower the recidivism risk. There are some weaker relations which could also contribute to the prediction accuracy of the model. Based on these correlation coefficient values, it seems that the model could be biased should these features be incorporated.

4.1.2.2 Demographics analysis

Considering the demographics of the observations in the data, the largest group is male. From an ethnicity perspective, African-Americans are the largest group, followed by Caucasians and Hispanics. The distribution of sex and ethnicity is illustrated in figure 5. Furthermore, the mean age is 34, the median is 30 and the mode is 20.


Figure 5: Sex and race distribution

The distribution of age and sex is depicted in the graph in figure 6. Most observations have an age between 19 and 33; the higher the age, the fewer observations. The distribution for both sexes follows the same trend.

Figure 6: Sex and age distribution

Considering age in relation to the means of the recidivism score, violence score, priors count and whether the defendant is a recidivist results in the trend illustrated in figure 7.


Figure 7: Trend graph from an age perspective per means of recidivist, priors counts and violence score

It seems that the recidivist flag shows no visible trend. The violence and recidivism score, however, show a similar trend: both appear to be high in young adulthood and slowly decrease as age increases. The priors count seems to increase with age, a logical development since a criminal track record is likely to be longer at an older age.

The distributions of the recidivism score, violence score and priors count are depicted in figure 8, together with their means, medians and modes. All distributions appear to be skewed to the right (positive skewness). The violence and recidivism score have the same mode: most observations have a score of 1, with an average of 4 for violence and 5 for recidivism.


Figure 8: Distribution of recidivism score, decile score and prior count, along with means, medians and modes

4.1.3 Model selection

For this case study, the number of observations and the type of target variable were considered to select an algorithm. The data consists mainly of numerical variables and the target variable is binary. Considering these properties, a logistic regression algorithm was used for the model.

4.1.4 Model development and accuracy determination

The logistic regression model was trained with several independent variables that indicated a correlation with the target variable in the data exploration step. To prevent multi-collinearity (overfitting) of the model, a variance inflation factor (VIF) calculation was performed; refer to Appendix B for the calculation. This resulted in the exclusion of the recidivism score and dropping one of the variables after dummy transformation of the categorical attributes. Violence score, priors count, juvenile incident counts, days in jail, event during imprisonment, age, sex and race were used as independent variables to train the model for recidivism risk predictions.

For the accuracy measures, k-fold cross-validation (k = 5) was carried out. Secondly, the accuracy was calculated for the training and test set using the mean accuracy. The cross-validation resulted in an average accuracy of 0.83. A nearly equal outcome (0.82) was computed for the accuracy of the training and test set, which in all cases is an acceptable accuracy for the model. Lastly, a confusion matrix was generated, which confirms the prior outcomes, as depicted in figure 9.


Figure 9: Confusion matrix of logistic regression model predictions

1,629 true positives and 1,934 true negatives were predicted, against 339 false positives and 371 false negatives. From this it can be concluded that 83% of the total predictions were correct. Appendix C contains the source code of the validation methods.
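As the validation code itself is enclosed in Appendix C, the sketch below is only a generic scikit-learn illustration of the three measures mentioned (k-fold cross-validation, mean accuracy and confusion matrix), using placeholder data rather than the actual COMPAS features.

from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import confusion_matrix
from sklearn.model_selection import cross_val_score, train_test_split

# Placeholder data standing in for the prepared features and binary target.
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

model = LogisticRegression(max_iter=1000)

# k-fold cross-validation (k = 5) on the training data.
print(cross_val_score(model, X_train, y_train, cv=5).mean())

# Mean accuracy on the training and the test set.
model.fit(X_train, y_train)
print(model.score(X_train, y_train), model.score(X_test, y_test))

# Confusion matrix of the test-set predictions: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_test, model.predict(X_test)))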

4.1.5 Bias identification and fairness determination

In the data exploration step, sensitive features in the demographics were identified. These attributes (age, sex and race) were used as input to develop the model. Subsequently, the feature importance was computed with the XAI package FairML. Fairness was calculated with IBM's AIF360 package; this method computes the mean difference of the prediction outcome between sensitive features. Refer to Appendix D for the source code of these steps of the model development. Figure 10 illustrates the feature importances that contribute to the prediction.

Figure 10: Feature importance of biased model

Features contributing positively to the model's predictions are the violence score, age, African-American ethnicity and priors count, meaning that a higher age, violence score and priors count result in a higher recidivism risk level, and that African-Americans are more likely to score higher than other ethnicities. On the negative side it is mainly the male gender and the Caucasian ethnicity.

Based on this outcome, gender and ethnicity are identified as biases, where females and African-Americans are the unprivileged groups and males and Caucasians the privileged groups. A minor contributing feature is the Hispanic ethnicity, indicating that more ethnicities might be privileged compared to African-Americans. The fairness calculation for gender resulted in a mean difference of -0.06 between male and female. For ethnicity the fairness gap was larger, with a mean difference of -0.26 between African-Americans and other ethnicities. Another identified bias is the defendant's age, although this could be defensible in the context of the justice system. Clearly the model generates biased outcomes which should be eliminated. For bias mitigation the focus will be on gender and ethnicity.

4.1.6 Bias mitigation selection and application

Considering the gender bias, a reweighing method was chosen since the fairness gap between the groups is small. Since the ethnicity attribute consists of more groups, it was transformed into a binary variable indicating whether the ethnicity is African-American. The other ethnicities are grouped under one binary value, which amounts to a partial anonymization bias mitigation method. This also enables unfairness detection between African-Americans and other ethnicity groups besides Caucasians and Hispanics, while simultaneously eliminating possible bias between non-African-American ethnicities. The source code for these steps is enclosed in Appendix E.

Reweighing the gender feature resulted in a fairness mean difference of 0.00 between male and female, indicating that the model is debiased from gender inequality.

As expected, the sex-debiased model still exhibits a racial bias, illustrated in figure 11. The accuracy computation resulted in 0.82 for all accuracy calculation methods: cross-validation, confusion matrix and mean accuracy.

Figure 11: Feature importance of sex debiased model

Between the African-Americans and the other ethnicities, a fairness mean difference of -0.06 was computed. By anonymizing the other ethnicities, the unfairness gap for African-Americans seems to decrease. Still, this small bias should be eliminated to achieve a fair model. This was done by using the reweighing method again.

After reweighing the race attribute, the model's accuracy decreased slightly to a score of 0.81. The fairness mean difference between the two ethnicity groups was reduced to an acceptable level of 0.00, meaning that the racial bias was eliminated from the model. Recalculating the feature importance resulted in figure 12.


Figure 12: Feature importance of race debiased model

As the graph indicates, the African-American ethnicity now has a negative effect on the prediction; this is due to the reweighing mitigation method. This method neutralizes the gap between groups by generating the same weight for the feature importance of African-Americans as for the other ethnicity groups, thus arriving at an equal opportunity for both groups in the model's predictions.

5 Conclusion and future research

The point of departure for this study was the following research question: what are the bias mitigation methods and how could these be incorporated into the AI-DMS model development process?

Exploration of academic literature resulted in a conceptual framework and an approach integrating iterative bias identification and mitigation into the AI-DMS model development process. Subsequently the proposed approach was reviewed by developing an AI-DMS model. The iterative nature of this approach enabled the combination of multiple bias mitigation methods in the model. An accurate and fair model was the result of the proposed approach. Bias mitigation methods prove to be powerful in eliminating unfairness between groups while limiting the effect on the model's accuracy.

Obviously, the incorporation of sensitive features should be prevented; enriching data with non-sensitive features is the better alternative. Should this alternative not be feasible for any reason, then the proposed integrated and iterative approach provides a clear step-by-step walkthrough to develop an accurate and fair model. Even in cases where a model is not fed with sensitive features, this approach could be applied to assess whether bias is incorporated in a model, making it an instrument for internal audit purposes (first line of defense).

Additionally, this approach fosters awareness of bias and bias mitigation, which should enable organizations to arrive at a solution for the bias issue that is acceptable to all stakeholders. The approach is platform-independent, meaning that other machine learning tools could be used. Certainly, an understanding of the mentioned concepts is the criterion for utilizing the approach; the accompanying framework provides the explanation of relevant concepts that helps to meet this criterion. Utilization of the proposed framework and approach provides a roadmap for an unbiased-by-design development of AI-DMS models.

It proved to be challenging to acquire data for this research. The intention was to collect data on a recent case; however, this was not feasible due to privacy and intellectual property issues. Carrying out research in a recent organizational context could contribute new valuable insights.

In this research multiple pre-processing methods were combined. For future research, insights could be gained by combining other mitigation methods across other model development phases (in-processing and post-processing).

How XAI could contribute to model auditing while protecting intellectual property is an interesting direction for future research, since law enforcement is only achievable if AI-DMS are auditable by an independent party.


It proves to be challenging to delegate decision-making fully to machines, especially due to the bias issue, which prevents the usage of AI-DMS to its full potential. This makes it interesting to explore whether the proposed framework and process could contribute to automated bias identification by implementing this approach in machine learning pipelines.

References

Adebayo, J. (2017, June 28). fairml · PyPI. Pypi.Org. https://pypi.org/project/fairml/

Angwin, J., & Larson, J. (2016). Machine Bias. ProPublica. https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing

Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., & Herrera, F. (2020). Explainable Artificial Intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI. Information Fusion, 58, 82–115. https://doi.org/10.1016/j.inffus.2019.12.012

Bolukbasi, T., Chang, K. W., Zou, J., Saligrama, V., & Kalai, A. (2016). Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 4356–4364.

Datta, A., Fredrikson, M., Ko, G., Mardziel, P., & Sen, S. (2017). Proxy Non-Discrimination in Data-Driven Systems. http://arxiv.org/abs/1707.08120

Fletcher, R. H., Fletcher, S. W., Jiménez, V., Díaz De Salas, S., Mendoza, V., Porras, C., Eisenhardt, K. M., Flyvbjerg, B., Yin, R. K., Study, C., Tellis, W., Gerring, J., Zainal, Z., Dooley, L. M., & Noor, K. B. M. (1997). Case study as a research method. Academy of Management Review, 5(2), 301–316. https://doi.org/10.1177/15222302004003007

Gu, J., & Oelke, D. (2019). Understanding Bias in Machine Learning. 1–12. http://arxiv.org/abs/1909.01866

IBM Corporation. (2018). aif360.algorithms — aif360 0.1.0 documentation. Read the Docs. https://aif360.readthedocs.io/en/v0.2.3/modules/algorithms.html

Karimi, F., Génois, M., Wagner, C., Singer, P., & Strohmaier, M. (2018). Homophily influences ranking of minorities in social networks. Scientific Reports, 8(1), 1–12. https://doi.org/10.1038/s41598-018-29405-7

Kroll, J. A. (2018). Data Science Data Governance. IEEE Security & Privacy, 16(6), 61–70. https://www.data-science.ruhr/about_us/

Lee, I., & Shin, Y. J. (2020). Machine learning for enterprises: Applications, algorithm selection, and challenges. Business Horizons, 63(2), 157–170. https://doi.org/10.1016/j.bushor.2019.10.005

Liao, P. H., Hsu, P. T., Chu, W., & Chu, W. C. (2015). Applying artificial intelligence technology to support decision-making in nursing: A case study in Taiwan. Health Informatics Journal, 21(2), 137–148. https://doi.org/10.1177/1460458213509806

Louridas, P., & Ebert, C. (2016a). Machine Learning. IEEE Software, 33(September), 110–115. https://books.google.ca/books?id=EoYBngEACAAJ&dq=mitchell+machine+learning+1997&hl=en&sa=X&ved=0ahUKEwiomdqfj8TkAhWGslkKHRCbAtoQ6AEIKjAA

Louridas, P., & Ebert, C. (2016b). Machine Learning. IEEE Software, 33(September), 110–115. https://books.google.ca/books?id=EoYBngEACAAJ&dq=mitchell+machine+learning+1997&hl=en&sa=X&ved=0ahUKEwiomdqfj8TkAhWGslkKHRCbAtoQ6AEIKjAA

Nassar, M., Salah, K., ur Rehman, M. H., & Svetinovic, D. (2020). Blockchain for explainable and trustworthy artificial intelligence. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(1), 1–13. https://doi.org/10.1002/widm.1340

Ntoutsi, E., Fafalios, P., Gadiraju, U., Iosifidis, V., Nejdl, W., Vidal, M. E., Ruggieri, S., Turini, F., Papadopoulos, S., Krasanakis, E., Kompatsiaris, I., Kinder-Kurlanda, K., Wagner, C., Karimi, F., Fernandez, M., Alani, H., Berendt, B., Kruegel, T., Heinze, C., … Staab, S. (2020). Bias in data-driven artificial intelligence systems—An introductory survey. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 10(3), 1–14. https://doi.org/10.1002/widm.1356

Ofer, D. (2017). COMPAS [Data set]. Kaggle. https://www.kaggle.com/danofer/compass?select=cox-violent-parsed.csv

Oh, S. J., Schiele, B., & Fritz, M. (2019). Towards Reverse-Engineering Black-Box Neural Networks. Lecture Notes in Computer Science (Including Subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 11700 LNCS(1), 121–144. https://doi.org/10.1007/978-3-030-28954-6_7

Otto, M. (2018). Regulation (EU) 2016/679 on the protection of natural persons with regard to the processing of personal data and on the free movement of such data (General Data Protection Regulation – GDPR). International and European Labour Law, 2014(April), 958–981. https://doi.org/10.5771/9783845266190-974

Páez, A. (2019). The Pragmatic Turn in Explainable Artificial Intelligence (XAI). Minds and Machines, 29(3), 441–459. https://doi.org/10.1007/s11023-019-09502-w

Pretorius, A., & Parry, D. A. (2016). Human Decision Making and Artificial Intelligence. 2217(96), 1–10. https://doi.org/10.1145/2987491.2987493

Raschka, S., Patterson, J., & Nolet, C. (2020). Machine learning in Python: Main developments and technology trends in data science, machine learning, and artificial intelligence. Information (Switzerland), 11(4). https://doi.org/10.3390/info11040193

Rijksoverheid. (1954). Wettelijk verbod op discriminatie | Discriminatie | Rijksoverheid.nl. Rijksoverheid. https://www.rijksoverheid.nl/onderwerpen/discriminatie/verbod-op-discriminatie

Schuilenburg, M. (2020, July 6). Ook politiedata kunnen gekleurd zijn of vervuild - NRC. NRC.Nl. https://www.nrc.nl/nieuws/2020/07/06/ook-politiedata-kunnen-gekleurd-zijn-of-vervuild-a4005092

Scikit Learn Developers. (2020). sklearn.linear_model.LogisticRegression — scikit-learn 0.23.2 documentation. Scikitlearn.Org. https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Shrestha, Y. R., Ben-Menahem, S. M., & von Krogh, G. (2019). Organizational Decision-Making Structures in the Age of Artificial Intelligence. California Management Review, 66–83. https://doi.org/10.1177/0008125619862257

Sileno, G., Boer, A., & van Engers, T. (2019). The role of normware in trustworthy and explainable AI. CEUR Workshop Proceedings, 2381(December).

Steen, M. (2020, July 19). ‘Discussie over de transparantie van algoritmen blijft nodig’ | Het Parool. Het Parool. https://www.parool.nl/columns-opinie/discussie-over-de-transparantie-van-algoritmen-blijft-nodig~b882ed5f/

Tecuci, G. (2012). Artificial intelligence. Wiley Interdisciplinary Reviews: Computational Statistics, 4(2), 168–180. https://doi.org/10.1002/wics.200

Yin, R. K. (2014). Case Study Research. 282. https://doi.org/10.1086/421629

Appendices

A Data preparation code

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

import seaborn as sns
sns.set_theme(style="ticks", color_codes=True)

path = '/Users/youssefennali/Desktop/Thesis/Python Projects/Data Sets/'
filename = 'cox-violent-parsed.csv'

# read the csv file
df = pd.read_csv(path + filename, sep=',',
                 parse_dates=['dob', 'c_jail_in', 'c_jail_out', 'c_offense_date', 'r_offense_date'])

# keep only rows with valid (non-negative) scores and prior counts
df = df[df['decile_score'] >= 0]
df = df[df['v_decile_score'] >= 0]
df = df[df['priors_count'] >= 0]

# transform to dummies
# df = pd.get_dummies(df, columns=['sex', 'race'], drop_first=False)

# drop empty column violent_recid and duplicate columns
# del df['is_violent_recid_1']
del df['priors_count.1']
del df['decile_score.1']
del df['violent_recid']
del df['id']
# del df['event']  # removed because it is strongly related to the is_violent_recid variable
# del df['start']  # removed because it is strongly related to the is_violent_recid variable
# del df['end']    # removed because it is strongly related to the is_violent_recid variable

# Remove rows with empty jail dates
df.dropna(subset=['c_jail_in'], inplace=True)

# Remove rows with a jail-in date after the jail-out date
df = df.drop(df[df.c_jail_out < df.c_jail_in].index)

# Remove timestamps from dates
df['c_jail_out'] = df['c_jail_out'].dt.date
df['c_jail_in'] = df['c_jail_in'].dt.date

# Number of days in jail transformation
df['days_in_jail'] = df['c_jail_out'] - df['c_jail_in']
df['days_in_jail'] = pd.to_numeric((df['days_in_jail'] / np.timedelta64(1, 'D')), downcast='integer')

# transform target variable

# create a list of conditions for medium_to_high_risk
conditions = [
    (df['score_text'] == 'Low'),
    (df['score_text'] == 'Medium'),
    (df['score_text'] == 'High')
]

# create a list of the values we want to assign for each condition
values = [0, 1, 1]

# create a new column and use np.select to assign values to it using our lists as arguments
df['medium_to_high_risk'] = np.select(conditions, values)

# create a list of conditions for african_american
race_conditions = [
    (df['race'] == 'African-American'),
    (df['race'] == 'Caucasian'),
    (df['race'] == 'Asian'),
    (df['race'] == 'Hispanic'),
    (df['race'] == 'Native American'),
    (df['race'] == 'Other')
]

# create a list of the values we want to assign for each condition
race_values = [1, 0, 0, 0, 0, 0]

# create a new column and use np.select to assign values to it using our lists as arguments
df['african_american'] = np.select(race_conditions, race_values)

df.to_csv('/Users/youssefennali/Desktop/Thesis/Python Projects/Stats/Final/cox-violent-parsed_cleansed.csv')

# Pearson correlation matrix of all numeric columns
CorrMatrix = df.corr(method="pearson", min_periods=1)

CorrMatrix.to_csv('/Users/sef/Desktop/Thesis/Python Projects/Data Sets/temp/compas_correlation.csv')

# sns.heatmap(df.iloc[:, :50].corr())
# sns.heatmap(df.iloc[:, 50:].corr())

# print(CorrMatrix)

ax = sns.heatmap(
    CorrMatrix, xticklabels=True, yticklabels=True,
    vmin=-1, vmax=1, center=0,
    cmap=sns.diverging_palette(20, 220, n=200),
    square=True
)
ax.set_xticklabels(
    ax.get_xticklabels(),
    rotation=90,
    horizontalalignment='right'
);
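
The listing above ends by writing the correlation matrix to disk and plotting it as a heatmap. As an optional follow-up, not part of the original script, the same CorrMatrix can be ranked by absolute correlation with the target variable; the snippet below assumes the listing has just been executed in the same session, and the variable name target_corr is illustrative.

# Optional follow-up sketch (assumption): rank features by their absolute Pearson
# correlation with the target variable, reusing the CorrMatrix computed above.
target_corr = CorrMatrix['medium_to_high_risk'].abs().sort_values(ascending=False)
print(target_corr.head(10))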

B Variance inflation factor for multi-collinearity elimination

import pandas as pd
import warnings
warnings.filterwarnings("ignore")
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor
from statsmodels.tools.tools import add_constant

path = '/Users/youssefennali/Desktop/Thesis/Python Projects/Stats/Final/'
filename = 'cox-violent-parsed_cleansed.csv'

# read the csv file
df = pd.read_csv(path + filename, sep=',',
                 parse_dates=['dob', 'c_jail_in', 'c_jail_out', 'c_offense_date', 'r_offense_date'])

df = pd.get_dummies(df, columns=['sex', 'race'], drop_first=False)
# ,'is_recid','is_violent_recid'], drop_first=False)

X = df[[
    'v_decile_score'
    , 'decile_score'
    , 'age'
    # , 'is_recid_1'
    , 'sex_Male'
    , 'sex_Female'
    , 'race_African-American'
    , 'race_Caucasian'
    , 'race_Asian'
    , 'race_Hispanic'
    , 'race_Native American'
    , 'race_Other'
    , 'juv_misd_count'
    , 'juv_other_count'
    # , 'event'  # no effect on the model's accuracy
    , 'is_recid'
    , 'days_in_jail'  # this causes multicollinearity
    , 'decile_score'
    , 'medium_to_high_risk'
    , 'african_american'
]]

# target variable
Y = df['medium_to_high_risk']

# Here we can see that X1 and X2 have a high and similar correlation coefficient
# (Also X3 and X4 have similar coefficients, but they are lower, so we can allow low collinearity)

# Method 2 to detect multicollinearity

def get_VIF(X, target):
    X = add_constant(X.loc[:, X.columns != 'medium_to_high_risk'])
    seriesObject = pd.Series([variance_inflation_factor(X.values, i) for i in range(X.shape[1])],
                             index=X.columns)
    return seriesObject

target = Y
print(get_VIF(X, target))

# Here we observe that X1 and X2 have a VIF value of infinity, so we need to drop one of them
# (Any value greater than 5-6 shows multicollinearity)
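
The listing prints the VIF values and leaves the actual elimination to manual inspection. A minimal sketch of how that elimination could be automated is shown below; it reuses get_VIF() and X from the listing, and the helper name drop_high_vif and the threshold of 5 are illustrative assumptions rather than part of the original code.

# Illustrative sketch (assumption): repeatedly drop the feature with the highest
# VIF until all remaining VIF values fall below the chosen threshold.
def drop_high_vif(X, threshold=5.0):
    X_reduced = X.loc[:, X.columns != 'medium_to_high_risk'].copy()
    while True:
        vif = get_VIF(X_reduced, None).drop('const')  # 'const' is the intercept column added by add_constant
        if vif.max() <= threshold:
            return X_reduced
        X_reduced = X_reduced.drop(columns=[vif.idxmax()])  # drop the worst offender and recompute

X_no_collinearity = drop_high_vif(X)
print(get_VIF(X_no_collinearity, None))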

C Model validation methods

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import LogisticRegression
from sklearn import metrics
from datetime import date
import datetime as dt
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.metrics import classification_report, confusion_matrix
from fairml import audit_model
from fairml import plot_dependencies
import seaborn as sns

path = '/Users/youssefennali/Desktop/Thesis/Python Projects/Stats/Final/'
filename = 'cox-violent-parsed_cleansed.csv'

# read the csv file
df = pd.read_csv(path + filename, sep=',',
                 parse_dates=['dob', 'c_jail_in', 'c_jail_out', 'c_offense_date', 'r_offense_date'])

df = pd.get_dummies(df, columns=['sex', 'race'], drop_first=False)

X = df[[
    'v_decile_score'
    # , 'decile_score'  # removed since the target variable is based on this, otherwise the model will be overfitted
    , 'priors_count'
    , 'age'
    , 'sex_Male'
    # , 'sex_Female'
    # , 'race_African-American'
    , 'race_Caucasian'
    , 'race_Asian'
    , 'race_Hispanic'
    , 'race_Native American'
    , 'race_Other'
    , 'juv_misd_count'
    , 'juv_other_count'
    , 'event'
    , 'is_recid'
    , 'days_in_jail'
]]

# target variable
Y = df['medium_to_high_risk']

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.30)

# scaler = StandardScaler()
# x_train = scaler.fit_transform(x_train)

# with sklearn
logisticRegr = LogisticRegression(max_iter=700)
logisticRegr.fit(x_train, y_train)

y_pred = logisticRegr.predict(x_test)

# evaluate model
print('Score (train set):', logisticRegr.score(x_train, y_train))
print('Score (test set):', logisticRegr.score(x_test, y_test))

scores = cross_val_score(logisticRegr, X, Y, cv=5)
print('Cross-validated scores', scores)

accuracy = logisticRegr.score(x_test, y_test)
print('Cross-predicted Accuracy:', accuracy)

cm = metrics.confusion_matrix(y_test, y_pred)
print(cm)

# confusion matrix heatmap
sns.heatmap(cm, annot=True, fmt=".3f", linewidths=.5, square=True, cmap='Blues_r');
plt.ylabel('Actual label');
plt.xlabel('Predicted label');
all_sample_title = 'Accuracy Scores: {0}'.format(accuracy)
plt.title(all_sample_title, size=15);

# FairML audit: feature importance / dependence of the fitted model
importances, _ = audit_model(logisticRegr.predict, X)

print(importances)

plot_dependencies(
    importances.median(),
    reverse_values=False,
    title="Model feature dependence"
)

CorrMatrix = df.corr(method="pearson", min_periods=1)
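
Appendix C imports classification_report but the listing never calls it. A one-line addition, not in the original script, that complements the accuracy scores and confusion matrix with per-class precision, recall and F1 would be:

# Optional addition (assumption): per-class precision, recall and F1 for the held-out predictions.
print(classification_report(y_test, y_pred))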

D Model explanation and fairness computation

%matplotlib inline
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn import linear_model
from sklearn import metrics
from datetime import date
import datetime as dt
from sklearn.model_selection import train_test_split
from sklearn.model_selection import cross_val_score, cross_val_predict
from sklearn.linear_model import LogisticRegression
from fairml import audit_model
from fairml import plot_dependencies
from sklearn.preprocessing import StandardScaler, MaxAbsScaler

# IBM's fairness toolbox:
from aif360.datasets import BinaryLabelDataset  # To handle the data
from aif360.metrics import BinaryLabelDatasetMetric, ClassificationMetric  # For calculating metrics
from aif360.explainers import MetricTextExplainer  # For explaining metrics
from aif360.algorithms.preprocessing import Reweighing  # Preprocessing technique

from IPython.display import Markdown, display
import seaborn as sns

sns.set_theme(style="ticks", color_codes=True)

path = '/Users/youssefennali/Desktop/Thesis/Python Projects/Stats/Final/'
filename = 'cox-violent-parsed_cleansed.csv'

# read the csv file
df = pd.read_csv(path + filename, sep=',',
                 parse_dates=['dob', 'c_jail_in', 'c_jail_out', 'c_offense_date', 'r_offense_date'])

df = pd.get_dummies(df, columns=['sex', 'race'], drop_first=False)

X = df[[
    'v_decile_score'
    # , 'decile_score'  # removed since the target variable is based on this, otherwise the model will be overfitted
    , 'priors_count'
    , 'age'
    , 'sex_Male'
    , 'sex_Female'
    , 'race_African-American'
    , 'race_Caucasian'
    , 'race_Asian'
    , 'race_Hispanic'
    , 'race_Native American'
    , 'race_Other'
    # , 'african_american'
    # , 'african_american_0'
    # , 'african_american_1'
    , 'juv_misd_count'
    , 'juv_other_count'
    # , 'start'
    , 'is_recid'
    # , 'event'  # no effect on the model's accuracy
    , 'days_in_jail'  # this causes multicollinearity
]]

# target variable
Y = df['medium_to_high_risk']

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=0.30)

# scaler = StandardScaler()
# x_train = scaler.fit_transform(x_train)

# with sklearn
biasedReg = LogisticRegression(max_iter=700)
biasedReg.fit(x_train, y_train)

y_pred_biased = biasedReg.predict(x_test)

# evaluate the biased model

# accuracy assessment
print('Sex biased - Score (train set):', biasedReg.score(x_train, y_train))
print('Sex biased - Score (test set):', biasedReg.score(x_test, y_test))

scores = cross_val_score(biasedReg, X, Y, cv=5)
print('Sex biased - Cross-validated scores', scores)

accuracy = biasedReg.score(x_test, y_test)
print('Sex biased - Cross-predicted Accuracy:', accuracy)

# confusion matrix assessment of the biased model
cm_biased = metrics.confusion_matrix(y_test, y_pred_biased)

plt.figure(figsize=(9, 9))
plt.ylabel('Actual label');
plt.xlabel('Predicted label');
all_sample_title = 'Accuracy Scores: {0}'.format(accuracy)
plt.title(all_sample_title, size=15);

# XAI: feature importance
importances, _ = audit_model(biasedReg.predict, X)

print(importances)

plot_dependencies(
    importances.median(),
    reverse_values=False,
    title="Biased - model feature dependence"
)

# Fairness computation sex
sex_privileged_groups = [{'sex_Male': 1}]
sex_unprivileged_groups = [{'sex_Male': 0}]

# Metric for the train dataset
train_biased = BinaryLabelDataset(df=pd.concat((x_train, y_train), axis=1),
                                  label_names=['medium_to_high_risk'],
                                  protected_attribute_names=['sex_Male'],
                                  favorable_label=1,
                                  unfavorable_label=0)

# Metric for the test dataset
test_biased = BinaryLabelDataset(df=pd.concat((x_test, y_test), axis=1),
                                 label_names=['medium_to_high_risk'],
                                 protected_attribute_names=['sex_Male'],
                                 favorable_label=1,
                                 unfavorable_label=0)

# Create the metric object for the training set
metric_train_biased = BinaryLabelDatasetMetric(train_biased,
                                               unprivileged_groups=sex_unprivileged_groups,
                                               privileged_groups=sex_privileged_groups)

display(Markdown("#### Original training dataset"))
print("Sex biased - Difference in mean outcomes between unprivileged and privileged sex groups = %f"
      % metric_train_biased.mean_difference())

# Create the metric object for the testing set
metric_test_biased = BinaryLabelDatasetMetric(test_biased,
                                              unprivileged_groups=sex_unprivileged_groups,
                                              privileged_groups=sex_privileged_groups)
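
The excerpt above ends after constructing the fairness metric for the test set and before the imported Reweighing pre-processor is applied. A minimal sketch of how Reweighing could be applied to the training data, reusing the group definitions and dataset objects from the listing, follows; the variable names RW, train_reweighed and metric_train_reweighed are assumptions.

# Illustrative sketch (assumption): reweigh the training instances so that the
# mean-outcome difference between the sex groups is reduced, then recompute the metric.
RW = Reweighing(unprivileged_groups=sex_unprivileged_groups,
                privileged_groups=sex_privileged_groups)
train_reweighed = RW.fit_transform(train_biased)

metric_train_reweighed = BinaryLabelDatasetMetric(train_reweighed,
                                                  unprivileged_groups=sex_unprivileged_groups,
                                                  privileged_groups=sex_privileged_groups)
print("Sex reweighed - Difference in mean outcomes = %f" % metric_train_reweighed.mean_difference())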
