

MSc Artificial Intelligence

Master Thesis

Fairness in Machine Learning models using Causality

by

Rik Helwegen

10516034

July 19, 2019

36 ECTS, Jan 2019 - Jul 2019

Supervisor:

Dhr. Dr. P.D. (Patrick) Forré

Supervisor & Examiner:

Dhr. C. (Chris) Louizos MSc

Assessor:

Dhr. Dr. J.M. (Joris) Mooij

Supervisor Gemeente Amsterdam:

Dhr. T. (Tamas) Erkelens MSc

Supervisor Statistics Netherlands:

Dhr. Dr. B. (Barteld) Braaksma


Abstract

Machine Learning (ML) models discriminate between data points in order to classify, cluster or predict. In an expanding range of settings, including predictive policing, fraud detection and job hiring, these data points are people's profiles. When algorithmic outcomes are based on sensitive attributes, such as ethnicity, this can conflict with ethical and legal standards. Therefore, the need rises for practical methods which include fairness in the optimal solution of complex problems. Causality theory has proven capable of alleviating fundamental problems in algorithmic fairness by defining Counterfactual Fairness. The intuition behind this metric is that model outcomes should not change when switching the sensitive variable for a person, not even when other attributes change as a causal result of the switch. In this work we present the novel method FairTrade, which provides a framework to construct, train and deploy fair ML models in practical applications. Theoretical results are obtained in a simulation setting and on the IHDP data set. Further experiments are conducted on real data in the context of identifying risk profiles for unlawful social welfare. The experiments suggest successful disabling of Path Specific Effects in simple relational scenarios, creating a trade-off between fairness and accuracy. Under the imposed assumptions, counterfactually fair risk profiles are still predicted with reasonable accuracy on a balanced data set of social welfare receivers. Together, the approaches work towards deployment of ML algorithms while honouring our ethical and legal boundaries.1

1 The views expressed in this work are those of the author and do not necessarily reflect the policy of Statistics Netherlands.


Contents

1 Introduction
2 Background
2.1 Causality
2.1.1 Need for Causal reasoning
2.1.2 Notation and definitions
2.1.3 Path Specific Effects
2.2 Approximate Inference
2.2.1 Variational Autoencoder
2.2.2 Causal Effect Inference
2.3 Fairness
2.3.1 Observational based metrics
2.3.2 Causal based metrics
2.3.3 Feedback loops
3 The FairTrade Method
3.1 Constructing a Causal Graph
3.2 Causal Inference
3.3 Fairness by Causal Path Enabler
3.3.1 Fairness evaluation
3.4 Related work
4 Experiment Setup
4.1 Experiment 1: Fairness Through Unawareness
4.2 Experiment 2: Infant Health and Development Program
4.2.1 Data
4.2.2 Optimisation
4.2.3 Fairness and PSE
4.3 Experiment 3: Risk Profiles in Unlawful Social Welfare
4.3.1 Data
4.3.2 FairTrade implementation
5 Results
5.1 Experiment 1: Fairness through Unawareness
5.2 Experiment 2: Infant Health and Development Program
5.3 Experiment 3: Risk Profiles in Unlawful Social Welfare
6 Discussion
6.1 Interpretation of Results
6.2 Limitations
6.3 Future work


Chapter 1

Introduction

At the end of June 2019, around 80 people gather in a small venue next to the children's playground of their neighbourhood in Rotterdam South.1 The group has a predominantly Moroccan background, and comes together to protest against the algorithmic fraud detection method called SyRI, or System Risk Indicator. The municipality has implemented SyRI in their neighbourhood to fight social welfare fraud, resulting in 1,263 addresses with an indication of increased fraud risk. A respondent calls the selection of particular neighbourhoods discriminatory. The investigated areas house many people with a migration background, have low average income, and have relatively many people living off social welfare.

The reason for the municipality to experiment with this data-driven approach can be understood in the perspective of recent technological developments. ML, and Deep Learning in particular, have shown a steep performance increase, surpassing human level in a number of tasks (LeCun et al., 2015). This gain in efficiency motivates institutions from a wide range of domains to deploy ML models. For example, the praised music recommendations by Spotify rely heavily on learning algorithms. Data profiles based on the user's behaviour are clustered to group people with a similar music taste. Accordingly, Spotify can provide people with just the right song, relying on the behaviour of people in the same group. This method exposes a core objective of most ML models, which is to discriminate between input variables. As long as it concerns music recommendations, this will hardly ever be harmful. However, if a person's ethnicity influences decisions like police screenings (Magee et al., 2008), loan approvals (Mahoney and Mohen, 2007) and fraud checks, the matter quickly becomes controversial. These examples entail algorithmic decisions which are influenced by a sensitive personal attribute. A sensitive attribute is a piece of information which people deem unfair to use as input for a decision or prediction. Gender and ethnicity are often considered sensitive attributes. Sensitivity thus relies on a complicated concept central to this work: fairness. What is fair and what is unfair can be a highly philosophical discussion, hardly compatible with the mathematical expressions underlying algorithmic decision making.

In simple models, the output can be clearly understood from the input values. In the case of Linear Regression, a single coefficient explains how the output varies with a change in one of the input variables. This makes it possible to remove unwanted effects from the predictive outcome (Frisch and Waugh, 1933; Lovell, 1963). Unfortunately, this does not generalise to more complex models such as the neural network, which forms the basis of Deep Learning advances. The relation between one input variable and the outcome can now easily be captured by thousands of coefficients and might depend on the values of all other variables. Hence it is unclear how a variable affects the output, let alone how to correct for it. The field of research focused on fairness in ML is gaining popularity, and progress is being made in capturing fairness in metrics. However, different intuitive measures of fairness are shown to conflict on a structural basis (DeDeo, 2014). Because removing variable effects from a statistical model requires knowledge of the actual relations between variables, Causality theory is required to alleviate fundamental problems in the formalisation of algorithmic fairness (Kusner et al., 2018). In particular, Counterfactual Fairness captures the intuition that model outcomes should not change when switching the sensitive variable of a profile, not even when other attributes change as a causal result of the switch. Although consensus is lacking in this domain, recent works attempt to create Counterfactual Fairness for situations in which effects from a sensitive variable can be partially allowed, depending on which variables propagate the effect to the outcome, referred to as Path Specific Effects (PSE).

The described scenario leads to our main research question: In what way can Causality theory be utilised to create Fairness in practical Machine Learning applications? The answer is constructed by considering three sub-questions: 1) What are the advantages and disadvantages of known Fairness metrics? 2) How can one effectively enforce Counterfactual Fairness on Path Specific Effects in the presence of unobserved confounders? and 3) How does enforcing Counterfactual Fairness affect the performance of a Risk Profile classification algorithm in the context of unlawful social welfare? We tackle the first question by an assessment of existing literature, and conduct a simulation study to investigate a frequently proposed solution. The answer to the second question builds on related literature, combining theoretical results in a newly proposed method called FairTrade. Experiments are conducted on semi-simulated and real data. Finally, the practical case of detecting risk profiles for unlawfulness in social welfare is investigated using a sample of social welfare receivers in the Netherlands. Following the different steps of the FairTrade method, a model is constructed, trained, and used for prediction. This application of novel fairness techniques to a leading problem of discrimination by algorithms delivers a valuable contribution to the field of research.

This work continues by revisiting a foundation of background knowledge. Specifically, elements from the fields of Causality, Approximate Inference and Fairness are elaborated on. In Chapter 3, the FairTrade method is presented, providing guidelines to obtain prediction models which are optimised under the posed fairness constraints. Chapter 4 provides details on the conducted experiments, for which the results are presented in Chapter 5. In Chapter 6, a discussion follows on the interpretation, limitations and future directions of the study. Finally, Chapter 7 holds the concluding remarks on the presented work.


Chapter 2

Background

The approach for fair ML algorithms developed in this work builds on the domains of Causality, Approximate Inference and Fairness. Here we briefly review relevant topics within these domains. The first part considers the field of Causality, introducing assumptions, rules and machinery to overcome limits inherent to purely correlation based statistics. Second, relevant techniques within Approximate Inference are discussed, most importantly introducing the Variational Autoencoder and its integration within causal reasoning. Finally, the active research field of Fairness is elaborated on, including both observational and counterfactual fairness metrics.

2.1 Causality

Cause and effect are long-studied concepts. Their popularity within the field of ML, however, is mainly concentrated in the last decade (Peters et al., 2017). With the growing number of ML applications, the need for explainability, control and generalisation increases. Considering causal relations rather than statistical correlations alleviates a number of fundamental problems in fulfilling these goals. As a downside, stronger model assumptions are required to work with causality. The methods introduced in the book Causality by Pearl have been an important contribution to the common formalisation of causal models (Pearl, 2009).

2.1.1 Need for Causal reasoning

Practitioners in the field of Causality investigate how variables affect other variables, rather than solely how variables are observed together. This more in-depth understanding of the data is harder to obtain, and therefore not the first choice in common ML practice. However, for certain problems causal knowledge is a necessity, and in this work it is argued that creating fairness in ML applications is one of these problems.

In recent work, Pearl distinguishes three classes of causal information, based on what type of questions they can answer (Pearl, 2018). The classes are Association, Intervention and Counterfactuals, in ascending order of hierarchy: questions that can be answered with one class of information can also be answered using higher classes. The structure helps to determine the level of causal information needed to solve a posed problem.

The first class, 'Association', relates to what can be answered by observations only. For example: "What value do I expect for Y, given that I observe X = x?", where capital X denotes a random variable and x the value it takes on. This coincides with a conditional expectation in standard statistics, which is common practice in non-causal ML.

Now consider the question: "What value do I expect for Y, if I set X to the value x′?". This action is thought of as an external intervention in the system. Answering the question requires knowing something about the causal effect of X on Y. The presence of such a relation determines whether the intervention affects the value of Y, hence the name of the second class: 'Intervention'.

Finally, retrospective reasoning is an objective for which the 'Counterfactuals' class of information is needed. After observing an outcome, one might like to ask: "Given this observation of the system, what would Y have been had I set X = x′?". In this setting, information about the state of the system and the causal relationships needs to be combined to formulate an answer.
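As an illustration of the three classes, the minimal numpy sketch below runs the corresponding queries on a toy linear SCM with a confounder; the SCM, its coefficients and the variable names are hypothetical and chosen only for this example.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Hypothetical toy SCM with a confounder Z:
#   Z := N_Z,   X := Z + N_X,   Y := 2*X + 3*Z + N_Y
n_z, n_x, n_y = rng.normal(size=(3, n))
z = n_z
x = z + n_x
y = 2 * x + 3 * z + n_y

# Class 1 -- Association: E[Y | X ~ 1], from observations only.
near_one = np.abs(x - 1.0) < 0.05
print("E[Y | X=1]       ~", round(y[near_one].mean(), 2))   # confounded, larger than 2

# Class 2 -- Intervention: E[Y | do(X=1)], replay the SCM with X forced to 1.
y_do = 2 * 1.0 + 3 * z + n_y
print("E[Y | do(X=1)]   ~", round(y_do.mean(), 2))          # close to 2

# Class 3 -- Counterfactual for a single observed unit: abduct its noise
# term from the observation, then replay the assignments with X set to 1.
z0, x0, y0 = z[0], x[0], y[0]
n_y0 = y0 - 2 * x0 - 3 * z0
print("Y_{X<-1}(unit 0) =", round(2 * 1.0 + 3 * z0 + n_y0, 2))
```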

As elaborated on in the section on Fairness, the Association level of information is not capable of ensuring satisfactory results for many posed fairness problems. Relying on observations only does not explicitly take into account how a sensitive attribute affected the data distribution, and hence cannot correct for it in new observations. To capture an intuitive and fundamental sense of fairness, fairness metrics in the Counterfactual class of information have been proposed recently (Kusner et al., 2017; Kilbertus et al., 2017; Loftus et al., 2018).


2.1.2 Notation and definitions

In order to work with causality in a formal manner, Pearl (2009) introduced concepts complementary to the canonical form of statistics. This has proven to be a basis for the developing field of Causality in ML. In this section relevant aspects are reviewed, following Chapter 6 of Elements of Causal Inference (Peters et al., 2017) and Chapter 1 of the book Causality (Pearl, 2009).

Graph Terminology

The theory of Causality deals with the causal effect that variables exercise upon other variables. The causal relations can be visualised as a directed graph in which the variables are nodes and the causal relations are edges.

Definition 2.1.1 (Directed Graph)

A directed graph G is defined as a tuple (V, E) consisting of finite sets of nodes V and edges E. Each edge creates a directed connection between two distinct nodes.

Definition 2.1.2 (Directed Acyclic Graph (DAG))

A Directed Acyclic Graph (DAG) is a directed graph which does not contain cyclic connections of directed edges.

Each node represents a variable in the system. The edges are directed, connecting one start node with one end node. When the nodes x and y occur in an edge e ∈ E as (x, y), this edge expresses the direct effect of x on y, making y a child of x and x a parent of y, denoted x ∈ PA_y^G. In this case x and y are adjacent. If for a group of three variables it holds that two of them are parents of the third, and there exists no direct edge between those parents, this is referred to as a v-structure. When a directed graph does not contain cyclic effects, i.e., following the direction of the edges one cannot end up at the same node, the graph is called a DAG. A sequence of adjacent edges is called a path, defined as follows:

Definition 2.1.3 (Path)

A path is a sequence of two or more unique nodes where each successive node has an edge with its predecessor in the sequence.

If edges are understood as relations between the nodes, a path captures the intuition of a string of relations, as determined by the sequence of nodes in the path. When all relations in such a path point in the same direction, we speak of a directed path, following the direction of the edges. Such a directed path makes x an ancestor of y, written as x ∈ AN_y^G, and y a descendant of x, written as y ∈ DE_x^G. If a node in a path has an incoming edge from both the previous and the next node, this node is called a collider relative to the path.
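The graph vocabulary above is straightforward to make concrete in code. The sketch below implements parents, ancestors and descendants for a DAG given as an edge list; the helper names and the example graph are hypothetical, not part of the thesis material.

```python
from collections import defaultdict

def parents(edges, node):
    """Direct causes of `node`: all x with an edge (x, node)."""
    return {x for x, y in edges if y == node}

def ancestors(edges, node):
    """All nodes with a directed path into `node` (AN_node^G)."""
    result, frontier = set(), parents(edges, node)
    while frontier:
        result |= frontier
        frontier = {p for f in frontier for p in parents(edges, f)} - result
    return result

def descendants(edges, node):
    """All nodes reachable from `node` along directed edges (DE_node^G)."""
    children = defaultdict(set)
    for x, y in edges:
        children[x].add(y)
    result, frontier = set(), set(children[node])
    while frontier:
        result |= frontier
        frontier = {c for f in frontier for c in children[f]} - result
    return result

# Example DAG: A -> X -> Y and A -> Y (a simple fairness-style graph).
edges = [("A", "X"), ("X", "Y"), ("A", "Y")]
print(parents(edges, "Y"))      # {'A', 'X'}
print(ancestors(edges, "Y"))    # {'A', 'X'}
print(descendants(edges, "A"))  # {'X', 'Y'}
```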

Structural Causal Models

Pearl (2009) names three roles of graphs in probabilistic and statistical modelling:

1. Representing assumptions
2. Efficient representation of a joint probability function
3. Facilitating observational based inference

Each of these roles is now elaborated on, in the order presented.

When constructing the graph from knowledge about relations between variables, a graph can be an effective representation of the causal assumptions and relationships. In order to make use of graph functionalities in causal reasoning, we introduce Structural Causal Models (SCM).

In this work, the formal definition of SCM by Peters et al. (2017) is used:

Definition 2.1.4 (Structural causal models)
A structural causal model (SCM) C := (S, P_N) consists of a collection S of d (structural) assignments

X_j := f_j(PA_j, N_j),   j = 1, ..., d,   (2.1)

where PA_j ⊆ {X_1, ..., X_d}\{X_j} are called parents of X_j; and a joint distribution P_N = P_{N_1,...,N_d} over the noise variables, which we require to be jointly independent; that is, P_N is a product distribution.

The graph G of an SCM is obtained by creating one vertex for each X_j and drawing directed edges from each parent in PA_j to X_j, that is, from each variable X_k occurring on the right-hand side of equation (2.1) to X_j. We henceforth assume this graph to be acyclic.

We sometimes call the elements of PA_j not only parents but also direct causes of X_j, and we call X_j a direct effect of its direct causes.
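To make Definition 2.1.4 concrete, the following sketch represents a small SCM as a set of structural assignments with jointly independent noise terms and performs ancestral sampling, optionally under an intervention. The assignments, coefficients and function names are hypothetical toy choices, not taken from the thesis experiments.

```python
import numpy as np

rng = np.random.default_rng(0)

# Structural assignments X_j := f_j(PA_j, N_j) for a three-variable SCM
# (a hypothetical example; the noise terms are drawn independently).
assignments = {
    "A": lambda pa, noise: (noise > 0.5).astype(float),   # PA_A = {}
    "X": lambda pa, noise: pa["A"] + noise,                # PA_X = {A}
    "Y": lambda pa, noise: 2 * pa["X"] - pa["A"] + noise,  # PA_Y = {A, X}
}
noise_dists = {
    "A": lambda size: rng.uniform(size=size),
    "X": lambda size: rng.normal(size=size),
    "Y": lambda size: rng.normal(size=size),
}
order = ["A", "X", "Y"]   # a topological order of the (acyclic) graph

def sample(n, interventions=None):
    """Ancestral sampling from the SCM; `interventions` maps node -> value (do-operator)."""
    interventions = interventions or {}
    values = {}
    for j in order:
        if j in interventions:
            values[j] = np.full(n, interventions[j], dtype=float)
        else:
            values[j] = assignments[j](dict(values), noise_dists[j](n))
    return values

obs = sample(5)                    # observational samples
intv = sample(5, {"A": 1.0})       # samples from p(. | do(A=1))
```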


In order to efficiently represent the joint probability function, the SCM relies on the Markov Property. The SCM is always assumed to satisfy the Markov Property, which relates the graph structure to probabilistic independencies. Before giving the formal definition, the prerequisite definition of d-separation is provided, as proposed in Pearl (2009):

Definition 2.1.5 (d-separation)

A path p is said to be d-separated (or blocked) by a set of nodes Z if and only if

1. p contains a chain i→m→j or a fork i←m→j such that the middle node m is in Z, or

2. p contains an inverted fork (or collider) i→m←j such that the middle node m is not in Z and such that no descendant of m is in Z.

A set Z is said to d-separate X from Y if and only if Z blocks every path from a node in X to a node in Y, written as X ⊥⊥_G Y | Z.

Using d-separation, the Markov property is defined as proposed by Peters et al. (2017):

Definition 2.1.6 (Markov property)
Given a DAG G and a joint distribution P_X, this distribution is said to satisfy

1. the global Markov property with respect to the DAG G if

A ⊥⊥_G B | C  ⟹  A ⊥⊥ B | C

for all disjoint node sets A, B, C (the symbol ⊥⊥_G denotes d-separation, see Definition 2.1.5),

2. the local Markov property with respect to the DAG G if each variable is independent of its non-descendants given its parents, and

3. the Markov factorization property with respect to the DAG G if

p(x) = p(x_1, ..., x_d) = ∏_{j=1}^{d} p(x_j | pa_j^G)

For this last property, we have to assume that P_X has a density p; the factors in the product are referred to as causal Markov kernels describing the conditional distributions P_{X_j | PA_j^G}.

Figure 2.1: Directed acyclic graphical model with nodes V = {A, B, C} and edges E = {A → B, B ← C}

Peters et al. (2017) note that the three definitions of the Markov property are equivalent as long as the joint distribution has a density. Forré and Mooij (2017) derive a wide variety of related properties from the Markov property. From the Markov property, it becomes apparent how the graphical model stores information about the distributional dependencies of the variables in play. From the first Markov property, it follows that by using the d-separation criterion, we can obtain information about (conditional) independencies of sets of nodes (Geiger et al., 1990). The second property states that all information about the distributional characteristics of a variable is held by its parents. As a result, given its parents, the variable is independent of all other variables except its descendants. The third property makes direct use of this, by concluding that the joint distribution can be factorised into distributions of individual variables which depend only on the parent nodes.

Finally, the graph facilitates structure for observational based inference (Pearl, 2009). Graph based inference algorithms exist, such as the Belief Propagation algorithm (Pearl, 1986), also known as the sum-product algorithm (Kschischang et al., 2001). In these algorithms, a central concept is the passing of information-containing messages along the edges of the graph. In the proposed method, a Variational Autoencoder (Kingma and Welling, 2013) setting will be used to do approximate inference, as elaborated on in Section 2.2.

As an example, consider a graph with nodes V = {A, B, C} and edges E = {A → B, B ← C}, depicted in Figure 2.1. Several insights on the joint distribution and probability dependencies can be obtained using the aforementioned definitions. First of all, considering the collider path A → B ← C, we see that when conditioning on the empty set, denoted as Ø, the path is blocked according to Definition 2.1.5, implying A ⊥⊥_G C ⟹ A ⊥⊥ C according to Definition 2.1.6.


The result can also be obtained using the Markov factorization property (MFP):

p(A, B, C) = p(B|A, C) p(A) p(C)   (MFP)
⟹ ∫ p(A, B, C) dB = ∫ p(B|A, C) p(A) p(C) dB
⟹ p(A, C) = p(A) p(C) ∫ p(B|A, C) dB
⟹ p(A, C) = p(A) p(C) · 1
⟹ A ⊥⊥ C

When considering independence conditioned on B, we see that the path A → B ← C is open, so A and C are not d-separated by B. However, from this we cannot infer the absence of conditional independence between A and C given B, since d-connection does not imply dependence.

Interventions

With SCMs, the graphs and probability distributions obtain causal meaning. The model is assumed to represent the data generative process, i.e., how the variables relate in the creation of observations. When reasoning about causal processes it is often desired to formulate an expression for the data distribution when a variable is set to a certain value. Such events cannot be clearly expressed in the standard notation of statistics, as they are subject to one-way causal relationships. Intuitively, such an intervention will affect the variable and its descendants, but not its ancestors. As a solution, Pearl (2009) introduces do-calculus, with rules proven in Pearl (1995). In this work an intervention is expressed by using the do-operator as a condition, written as p(Y | do(X = x)). When additionally conditioning on an observation, this is called a counterfactual, for example p(Y | do(X = x), A = a).

Identifiability

As multiple graphs can be responsible for the exact same output distributions, objectives like interventional distributions are in some cases impossible to deduce, or identify, from observational data. In particular, when there are unmeasured variables or multiple variables being influenced by unobserved confounders, the discussion of identifiability is relevant (Tian and Shpitser, 2010). In case an objective measure can be rewritten as a composition of purely observational distributions, the effect can be estimated using the variable densities of those observed variables. These other variables are then called a Valid adjustment set and defined in Peters et al. (2017) as:

Definition 2.1.7 (Valid adjustment set)
Consider an SCM C over nodes V and let Y ∉ PA_X. We call a set Z ⊆ V\{X, Y} a valid adjustment set for the ordered pair (X, Y) if

p_C(y | do(X = x)) = Σ_z p_C(y | x, z) p_C(z)

Here, p_C is the probability function belonging to the SCM C, and the sum (which could also be an integral) is over all values z that Z can take.

A number of results specify in what cases a set is indeed a valid adjustment set. To be able to quickly refer to the most important of these, Proposition 6.31 by Peters et al. (2017) is repeated.

Proposition 1 (Valid adjustment sets)
Consider an SCM over variables X with X, Y ∈ X and Y ∉ PA_X. Then, the following three statements are true.

1. 'Parent adjustment':

Z := PA_X

is a valid adjustment set for (X, Y).

2. 'Backdoor criterion': Any Z ⊆ X\{X, Y} with

• Z contains no descendant of X AND
• Z blocks all paths from X to Y entering X through the backdoor (X ← ...)

is a valid adjustment set for (X, Y).

3. Any Z ⊆ X\{X, Y} with

• Z contains no descendant of any node on a directed path from X to Y (except for descendants of X that are not on a directed path from X to Y) AND
• Z blocks all non-directed paths from X to Y

is a valid adjustment set for (X, Y).

The three statements in the proposition allow for quickly recognising valid adjustment sets, which proves useful when discussing the identifiability of the proposed causal models.
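As a sketch of how Definition 2.1.7 is used in practice, the following toy simulation compares naive conditioning with the adjustment formula. The binary SCM and its probabilities are hypothetical; Z is a valid adjustment set for (X, Y) here by the parent adjustment and backdoor criteria above.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 500_000

# Hypothetical binary SCM: Z -> X, Z -> Y, X -> Y.
z = rng.binomial(1, 0.5, n)
x = rng.binomial(1, 0.2 + 0.6 * z)
y = rng.binomial(1, 0.1 + 0.3 * x + 0.4 * z)

def p(event):                      # empirical probability of a boolean mask
    return event.mean()

# Naive conditioning: biased by the backdoor path X <- Z -> Y.
naive = p(y[x == 1] == 1)

# Backdoor adjustment: p(y | do(X=1)) = sum_z p(y | X=1, z) p(z)
adjusted = sum(p(y[(x == 1) & (z == v)] == 1) * p(z == v) for v in (0, 1))

print("p(Y=1 | X=1)     ~", round(naive, 3))
print("p(Y=1 | do(X=1)) ~", round(adjusted, 3))   # ground truth: 0.1 + 0.3 + 0.4*0.5 = 0.6
```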

2.1.3 Path Specific Effects

The notion of Path Specific Effects (PSE) distinguishes between different ways in which one variable affects another. In this work, where the fairness of an effect might depend on the path it has taken, PSE are of particular interest. Pearl (2009) clarifies the need for PSE by the Berkeley admission problem (Bickel et al., 1975), which has become a canonical example in the field (Kilbertus et al., 2017; Chiappa and Gillam, 2018). The story goes that discrimination was suspected after data showed a higher admission rate for male students compared to female students. After investigation, however, female students turned out to apply to more competitive departments on average, which was lowering their admission rate. Once corrected for this effect, even a small bias towards female applicants could be found in the Berkeley admissions. Kilbertus et al. (2017) discuss the problem using the simple graphical model of Figure 2.2.

Figure 2.2: Graphical model of the Berkeley admission (A) (Bickel et al., 1975), being influenced by both Gender (G) and Department (D) choice (Kilbertus et al., 2017).

Although including the gender of an applicant is deemed unfair, choosing competitive departments is considered a natural reason for a lower chance of admission. Therefore the direct effect of gender on admission would indicate unfairness in the model, whereas the indirect effect via choice of department would not. Pearl (2009) rightfully includes the applicant's career objectives and unobserved aptitude in the model. However, the simple example already makes clear that there is a need to consider PSE in fairness applications.

Robins and Greenland (1992) consider the problem of separating direct and indirect effects. For now we refer to the direct effect as X → Y and the indirect effect as X → I → Y, considering the case of an SCM with only these effects and nodes. Robins and Greenland conclude that the effects can be separated when X and I do not interact to cause Y and both X and I are randomised over (Robins and Greenland, 1992). However, the G-computation algorithm must be used for this calculation, preventing the bias obtained with other adjustment methods (Robins, 1986). In the case of interacting X and I, the direct and indirect effects are not separable.

Holding variables constant is the way direct effects are calculated in causal models, following the definition by Pearl (2009).

Definition 2.1.8 (Direct Effect)

The direct effect of X on Y is given by p(y | do(x), do(s_XY)), where S_XY is the set of all endogenous variables except X and Y in the system.

Pearl (2001) discusses the struggle of defining indirect effects in nonlinear models, as there is no way of isolating the effect by holding a set of variables constant. To come to a definition of indirect effects, Pearl first describes Natural Direct Effects. The need for these stems from considering nonlinear systems: when measuring an effect in such a system, the value at which other variables are held constant can influence this effect. The natural direct effect is defined in Pearl (2001):

DE_{x,x′}(Y) = E[Y(x′, Z(x))] − E[Y(x)]

This indicates the effect on Y when X is changed from the value x to x′, while Z is set to the value it obtains when X = x. Pearl shows that if 'no confounding' assumptions apply, the natural direct effect can be rewritten in the following form:

DE_{x,x′}(Y) = Σ_z [E(Y | do(x′, z)) − E(Y | do(x, z))] p(z | do(x))

This definition of the natural direct effect provides ground for a definition of indirect effects. We again consider the event of changing X from x to x′; however, now the direct effect stays constant, and the other variables take the values they would obtain under X = x′.


Definition 2.1.9 (Pearl's Natural Indirect Effect, Pearl (2009))

IE_{x,x′}(Y) = E[Y(x, Z(x′))] − E[Y(x)]

for which Pearl (2001) derives the identifiability under unconfounded mediators:

IE_{x,x′}(Y) = Σ_z E(Y | x, z) [p(z | x′) − p(z | x)]

In their work utilising PSE for fairness, Nabi and Shpitser (2018) discuss the difficulties which arise from optimising a problem with causality based constraints. They define the fairness problem as prohibiting effects from a sensitive feature to the outcome via 'disallowed' causal pathways. Accordingly, it is suggested to solve a new optimisation problem, creating a fair world (Nabi and Shpitser, 2018). They hold on to the above PSE definition, for which they give an example using Figure 2.3.

Figure 2.3: Graphical model by Nabi and Shpitser (2018), Figure 1b

In this example, the variable A represents the sensitive variable, a convention holding for the rest of the presented work. Consider the indirect effect of interest to be A → W → Y when changing A from a to a′. Following Definition 2.1.9, all other paths should be held at the original value a, and the path under consideration is adjusted. This yields:

IE_{a,a′}(Y) = E[Y(a, W(M(a), a′), M(a))] − E[Y(a)]

which again is a nested counterfactual. Using assumptions by Shpitser (2013) and the edge g-formula (Shpitser and Tchetgen, 2016), the counterfactual mean of the above indirect effect is identified. Nabi and Shpitser use the definition to define a new optimisation criterion, in which the 'disallowed' PSE are constrained to lie between a predefined upper and lower bound.

2.2 Approximate Inference

For the goal of causal inference, the next step after structuring a causal graph is to do inference on the relations using observational data. Algorithms exist for exact inference, such as the sum-product algorithm (Kschischang et al., 2001), where unbiased estimates are guaranteed under certain assumptions. However, a large number of variables and the presence of unobserved confounders make the calculations required by these exact algorithms intractable.

Approximate inference can be used to do model inference in computationally intractable cases, at the cost of accuracy and convergence guarantees (Bishop, 2006; Louizos et al., 2015; Kingma and Welling, 2013). Two main approaches can be distinguished in approximate inference: Variational Inference and sampling methods (Bishop, 2006). The latter includes methods to create samples from a desired distribution, such as Markov Chain Monte Carlo (MCMC) sampling (Gilks et al., 1995), the Metropolis-Hastings algorithm (Chib and Greenberg, 1995) and Gibbs sampling by Carter and Kohn (1994).

When using Variational Inference, (variational) Expectation Maximisation (EM) (Moon, 1996) is a common approach (Bishop, 2006). Elementary to Variational Inference is changing the problem from maximising a likelihood to maximising a lower bound on the likelihood. The method of choice in the proposed method is an application of Variational Bayes called the Variational Autoencoder (Kingma and Welling, 2013).

2.2.1 Variational Autoencoder

The Variational Autoencoder (VAE) is an inference and learning method for directed probabilistic models, a technique introduced around the same time by Rezende et al. (2014) and Kingma and Welling (2013). In particular, the VAE proves valuable for models which describe large scale problems, contain continuous latent variables and/or have intractable posterior distributions.

In Figure 2.4 the graphical model of the VAE is depicted (Kingma and Welling, 2013).

Figure 2.4: Graphical model of the generative network of the VAE. Grey nodes are observed, white nodes are latent. (Kingma and Welling, 2013)

We hold on to the convention of presenting distributions for inference on latent variables with a q, e.g. q(z|x), and distributions of the generative network with a p, like p(x|z). The latent distribution is inferred using the data as input, and regularised to be close to a standard normal distribution. The metric used for this is the Kullback-Leibler (KL) divergence, with the following definition for the variables P ∼ p(x) and Q ∼ q(x) defined on the same space x.

Definition 2.2.1 (Kullback-Leibler divergence)

D_KL(Q || P) = ∫ q(x) log( q(x) / p(x) ) dx

The data is reconstructed by using the log probability of the observations occurring under the inferred distributions. The gradient can be propagated through the sample towards the inference network by making use of the reparameterization trick (Kingma and Welling, 2013).
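A minimal PyTorch sketch of this objective is given below, assuming a Gaussian q(z|x), a unit-variance Gaussian likelihood p(x|z) and hypothetical layer sizes; it shows the reparameterised sample and the closed-form KL term against a standard normal prior.

```python
import torch
import torch.nn as nn

class TinyVAE(nn.Module):
    """Minimal VAE sketch (hypothetical sizes): q(z|x) Gaussian, p(x|z) Gaussian with unit variance."""
    def __init__(self, x_dim=10, z_dim=2, h=32):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(x_dim, h), nn.ReLU())
        self.mu, self.logvar = nn.Linear(h, z_dim), nn.Linear(h, z_dim)
        self.dec = nn.Sequential(nn.Linear(z_dim, h), nn.ReLU(), nn.Linear(h, x_dim))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: z = mu + sigma * eps keeps the sample differentiable.
        z = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)
        x_hat = self.dec(z)
        # Reconstruction term: log p(x|z) up to constants (Gaussian with unit variance).
        recon = -0.5 * ((x - x_hat) ** 2).sum(dim=1)
        # KL(q(z|x) || N(0, I)) in closed form.
        kl = -0.5 * (1 + logvar - mu ** 2 - logvar.exp()).sum(dim=1)
        return -(recon - kl).mean()   # negative lower bound, to be minimised

vae = TinyVAE()
loss = vae(torch.randn(64, 10))
loss.backward()
```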

2.2.2 Causal Effect Inference

The VAE falls under the class of generative models, because the generative part of this probabilistic model can be understood as a data generative process. An important problem in causal inference is the presence of unobserved confounders, affecting multiple observational variables in the model. One proposed solution for this is structuring the generative model of a VAE as a causal graph, creating a causal generative process in which the latent space plays the role of unobserved confounder. This is the idea behind the Causal Effect Variational Autoencoder (CEVAE) by Louizos et al. (2017). When a causal graph is assumed with identifiable relations, the CEVAE can approximately infer the relations in the structure. It is therefore an inference and learning method that can be applied in the presence of unobserved confounders and intractable distributions. Guarantees on whether the CEVAE structure, optimised using gradient descent, converges to the true data generating process are limited (Louizos et al., 2017).

2.3 Fairness

The goal of creating fairness needs specification. What is fair and what is not fair is an ongoing philosophical debate, subject to fundamental disagreements. To bring fairness into the domain of statistical modelling, the discussion focuses on different metrics capturing fairness, a field also known as algorithmic fairness. A concise overview of different metrics is provided by Loftus et al. (2018). One fundamental distinction in the set of known fairness metrics is whether they are based on observations only, or have a background in causal reasoning. This relates back to the different types of causal information (Pearl, 2018), as discussed in Section 2.1.1. Accordingly, both types of metrics are discussed in this section.

2.3.1 Observational based metrics

Observational metrics are not based on causal reasoning, and thus include no interventional or counterfactual distributions. In practice, one of the most commonly proposed solutions against unwanted discrimination is fairness by unawareness, following the definition of Kusner et al. (2017).


Definition 2.3.1 (Fairness by Unawareness)

An algorithm is fair so long as any protected attributes A are not explicitly used in the decision-making process.

This metric allows for any model as long as the sensitive attribute is not included in its input. The solution is attractive in its simplicity, but suffers from a clear shortcoming: variables which are influenced by the sensitive variable A can take the role of proxy variables, thereby propagating unwanted discrimination into the model.
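A small simulation, with hypothetical data standing in for A, X and Y, illustrates the proxy problem: a model fitted without A as input still produces systematically different predictions per group of A, because X carries the information of A.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 50_000

# Hypothetical data: sensitive A influences proxy X, and Y depends on X.
a = rng.binomial(1, 0.5, n)
x = a + rng.normal(size=n)           # proxy: carries information about A
y = x + rng.normal(scale=0.5, size=n)

# "Fairness by unawareness": fit y on X only (A is not an input).
features = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(features, y, rcond=None)
y_hat = features @ coef

# The prediction still differs systematically between the groups of A.
print("mean prediction, A=0:", round(y_hat[a == 0].mean(), 2))
print("mean prediction, A=1:", round(y_hat[a == 1].mean(), 2))
```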

A second intuitive and well known metric is Equalised Odds. Throughout this work, Ŷ denotes the predictive distribution of Y, which is the objective of the model. As before, A represents the sensitive variable.

Definition 2.3.2 (Equalised Odds)

A model satisfies Equalised Odds if the following condition holds

p(Ŷ = y | A = a, Y = y) = p(Ŷ = y | A = a′, Y = y)

for all possible values of a and y.

The metric says that the true positive rate should be equal when conditioning on different values of the sensitive variable. Equalised Odds especially gained attention after its usage in the Machine Bias case with the COMPAS dataset (Angwin et al., 2016). The notion of Equalised Odds can be adjusted by swapping the roles of Y and Ŷ, leading to the Calibration metric.

Definition 2.3.3 (Calibration)

A model satisfies Calibration if the following condition holds

p(Y = y | A = a, Ŷ = y) = p(Y = y | A = a′, Ŷ = y)

for all possible values of a and y.

In this case, the probability of being correct after predicting Ŷ = y should be the same across different conditionals of the sensitive variable. This metric is shown to be inherently incompatible with Equalised Odds, except in special cases such as a perfect prediction (Kleinberg et al., 2016).

Definition 2.3.4 (Statistical Parity)

A model satisfies Statistical Parity if the following condition holds

p(Ŷ = y | A = a) = p(Ŷ = y | A = a′)

for all possible values of a and y.

Statistical Parity, also called Demographic Parity, is commonly used for fairness in ML (Loftus et al., 2018; Edwards and Storkey, 2015; Kamiran and Calders, 2009; Kamishima et al., 2012). The metric demands independence between the prediction distribution and the sensitive variable. A common critique of this approach is that it can lead to disproportionate positive discrimination. The metric can cause the model to treat sensitive groups very differently, possibly treating very similar individuals differently (Loftus et al., 2018).
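For reference, the three observational metrics above reduce to simple per-group rates that can be estimated directly from predictions. The sketch below, with hypothetical toy data and hypothetical function names, computes the quantities that Statistical Parity, Equalised Odds and Calibration compare across groups.

```python
import numpy as np

def group_rates(y_true, y_pred, a):
    """Per-group quantities behind the observational metrics (binary y_true, y_pred, a)."""
    out = {}
    for g in np.unique(a):
        m = a == g
        out[g] = {
            "statistical parity  p(Yhat=1 | A=g)": y_pred[m].mean(),
            "equalised odds      p(Yhat=1 | A=g, Y=1)": y_pred[m & (y_true == 1)].mean(),
            "calibration         p(Y=1 | A=g, Yhat=1)": y_true[m & (y_pred == 1)].mean(),
        }
    return out

# Toy usage with random predictions (hypothetical data).
rng = np.random.default_rng(0)
a = rng.binomial(1, 0.5, 1000)
y_true = rng.binomial(1, 0.4 + 0.2 * a)
y_pred = rng.binomial(1, 0.5, 1000)
for g, stats in group_rates(y_true, y_pred, a).items():
    print("A =", g, stats)
```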

Originating from this concern of treating similar individuals differently, Dwork et al. (2012) introduce Individual Fairness.

Definition 2.3.5 (Individual Fairness)

A model satisfies Individual Fairness if the following condition holds

p(Ŷ^(i) = y | X^(i), A^(i)) ≈ p(Ŷ^(j) = y | X^(j), A^(j)),   if d(i, j) ≈ 0

for all possible values of x, a and y.

In this definition, the function d(·) is a metric which measures individual similarity for the specific task at hand. The goal is to create similar treatment for similar individuals. One downside of this approach is the question of what a suitable similarity function looks like in the pursuit of fairness. This focus on fairness at an individual level allows for a smooth introduction to causal based fairness metrics.

2.3.2 Causal based metrics

Kusner et al. (2017) introduced Counterfactual Fairness, for which the intuition relies on the actual world and a counterfactual world. An individual needs to be treated the same by the model in the real world as when, 'counterfactually', the individual would have belonged to a different group of the sensitive attribute. Hence this metric says the sensitive variable cannot have any causal influence on the model outcomes. Kusner et al. (2017) provide the following definition:


Definition 2.3.6 (Counterfactual Fairness)

Predictor Ŷ is counterfactually fair if, under any context X = x and A = a,

p(Ŷ(U, do(a)) = y | X = x, A = a) = p(Ŷ(U, do(a′)) = y | X = x, A = a)

for all y and for any value a′ attainable by A.

The metric has been an inspiration for a number of applications (Loftus et al., 2018; Kilbertus et al., 2017). Of special interest are the works by Chiappa and Gillam (2018) and Nabi and Shpitser (2018), in which the idea of counterfactual fairness is combined with PSE. Nabi and Shpitser (2018) apply a transformation to the inference problem in order to estimate a fair world. In this fair world, the unwanted effect is constrained to lie within an interval; to prevent the unwanted effect, a small interval around zero can be used as the constraint. Accordingly, the model is optimised to resemble the real world as well as possible, measured by the KL divergence (Definition 2.2.1).

Chiappa and Gillam (2018) reflect on Nabi and Shpitser (2018) by pointing out the efficiency challenge of explicitly computing PSE. In Nabi and Shpitser (2018) this is needed in order to constrain the value; however, Chiappa and Gillam argue that this counterfactual distribution can be intractable. As a solution, they propose to correct observations which are influenced by effects from the sensitive attribute. This is done by learning latent spaces meeting the fairness conditions, for all descendants of the sensitive attribute separately. Learning the latent embedding and predictive distributions is done simultaneously in this work (Chiappa and Gillam, 2018). Loftus et al. (2018) comment on this by noting that simultaneous learning might influence which optimum is pursued in the optimisation. Hence it is suggested to learn a causal model, and separately use a black box predictor including the variables known to be fair. Finally, the work of Kilbertus et al. (2017) elaborates on the concept of resolving variables. These are variables which are influenced by the sensitive variable, but are nevertheless deemed fair to influence the model outcomes. The Berkeley example (Figure 2.2) shows such a situation for the university admissions. While the department of application is influenced by the gender of new students, it is deemed fair to condition on department when considering the admission rate per gender.

2.3.3 Feedback loops

The emergence of feedback loops in algorithmic decision making is an important phenomenon when considering algorithmic fairness. The idea behind a feedback loop is that the outcome of a process influences the input of that same process. Hence, this can amplify the outcomes when the process is run multiple times. In terms of ML models, this idea applies to bias in data. A classifier learns to recognise biases in the data in order to assign data points to the right class, and it is possible that this class assignment in turn changes the data point. For example, if a single demographic group in society is denied loans more often due to classification results, this might prohibit people in this group from actually building up financial stability. The next time the classification model is trained, this higher instability is recognised, and loans will be denied on an even more regular basis.

This is an example where the model amplifies bias in the population. A relevant distinction can be made between the origins of bias: bias in the population and selection bias. In the first case, the population is properly reflected and shows an unequal distribution over different groups. In the second case, the sampled data does not reflect the population properly, because the sample is not taken properly at random. Lum and Isaac (2016) work out a clear example of feedback by selection bias, backed by simulation. The police, in this case, direct attention towards neighbourhoods which are predicted to be of higher risk. As a result of the extra police officers in a targeted neighbourhood, more observations of problems in that neighbourhood enter the system. The next time the model is trained, it detects an even stronger negative bias for these particular neighbourhoods, and assigns even more attention to them. The problematic feedback in this case is that the model does not recognise that observations are partially being caused by the attention assignment. In this scenario, a bias can be amplified which does not even exist in the population, but only in the observational distributions.

Feedback loops are difficult to model using Causality theory, as the methods proposed by Pearl (2009) in general discard the notion of time. Further, introducing cycles in the graphical models poses serious challenges for causal inference, but is an active field of research (Forré and Mooij, 2018). Feedback loops hence fall outside the scope of this work. However, when model outcomes filter out biases according to metrics such as Counterfactual Fairness, this also solves the problem of feedback loops for those biases.


Chapter 3

The FairTrade Method

In this chapter, we present a novel method named FairTrade, which allows controlling the trade-off between fairness and accuracy in practical ML applications by making use of causality theory. The description provides an overview of techniques and decisions, and can serve as a guideline to prevent unwanted discrimination in prediction models. As the practical implementation can differ vastly per case, an in-depth description of the implementation per experiment is provided in Chapter 4. The FairTrade method follows three consecutive steps:

I. Construct Causal Graph
II. Infer Causal Relations
III. Fair prediction by Causal Path Enabler

The first step is concerned with creating a causal graph, thereby determining the causal and distributional assumptions. The second step uses the CEVAE method to approximately estimate the relationships in the graph from observational data. The third and final step delivers a predictor by training an auxiliary model with restricted input. This input is in general the information from the CEVAE which is known to be counterfactually fair, and can be extended to allow specific paths to be active. As different causal paths are enabled by adjusting the input selection, the name Causal Path Enabler (CPE) is coined for the auxiliary model. A graphical overview of the steps is shown in Figure 3.1.

The steps are clearly separated, and the results of steps I and II serve as input for steps II and III respectively. The reason for separating II and III, as suggested by Loftus et al. (2018) but in contrast to Nabi and Shpitser (2018) and Chiappa and Gillam (2018), is to prevent distorting the causal inference. Theory on the ability of VAEs to recover the relations of the true model is already scarce (Louizos et al., 2017); adding additional constraints to the optimisation objective hence does not seem justifiable without further research.

Figure 3.1: Schematic overview of the proposed method for creating fairness in machine learning applications. The dashed graph indicates an assumed causal structure. The CPE provides predictions utilising only the elements of the graph which adhere to the posed fairness constraints.

3.1 Constructing a Causal Graph

Fundamental to the approach is the construction of a causal graph. The starting point for this graph is domain knowledge, making the construction a domain specific task. The graph is restricted to be a DAG. As theory might suggest more complicated relations, such as cyclic structures, this is a simplifying assumption. Research is being done on more complicated graphs (Forré and Mooij, 2019), but utilising this in practice at scale is a challenging task left for future research. Given the constructed graph, the global Markov property (Definition 2.1.6) implies statistical independencies. As a check on the assumptions, statistical independence tests can be performed on the implied independencies. Note that such a check can merely falsify the assumptions; it does not prove them true when passed successfully. In this section, three general graph structures are proposed: G1, G2 and G3.


The three structures are not the only possible structures, but fit a variety of cases when pursuing fairness in ML models.

Because the goal is to achieve fairness with respect to the sensitive variable A, the assumed independencies involving this variable are of particular importance. Incorrectly assuming variables to be independent of A might lead to information leaking into the prediction, preventing fair treatment. The most conservative proposal is therefore to assume that all variables are affected by the sensitive variable, which is the basis for G1. As the graph needs to be directed, this in turn means the sensitive variable is not influenced by the other variables. Hence, the proposed structures only fit well for problems in which the sensitive attribute is not influenced by other covariates; gender and ethnicity are attributes for which this argument could be made. Further, we assume that not all causes of the data are observed, meaning a latent space Z is involved in the data generating process. This latent space accounts for unobserved confounders, which can be hard or even impossible to measure. One common example of such a confounder is Social Economic Status. As the variables already have direct access to A, Z is defined as the background variables excluding the information of A. Formally, Z can be described as the projection of the unobserved confounders onto the orthogonal complement of A. The structure corresponds to Figure 3.2.

Figure 3.2: CEVAE structure with minimal set of assumptions on data relations. (a) G1a, conservative CEVAE structure without X → Y; (b) G1b, conservative CEVAE structure with X → Y.

The difference between the (a) and (b) version in this figure is the presence of the X → Y effect. From an information theory point of view, one could argue the link is redundant, as the confounder Z can account for relations between X and Y. However, the goal of the CEVAE optimisation is to do inference on the causal graph relations. Leaving out X → Y poses the strong assumption that none of the covariates has a direct causal effect on the objective Y. As an example of what could happen if this assumption is violated: effects following the pathway A → X → Y might be absorbed by the relation A → Y. In Experiment 1 a similar phenomenon is observed for the unawareness approach, in which a direct effect manifests itself in an indirect effect once the direct effect is omitted. Therefore, leaving out the arrow from X to Y interferes with the inference of the causal relations, unless the assumption holds true that X does not directly influence Y. In general this leads us to argue that the direct effect of X on Y needs to be seriously considered, and not be discarded on account of the latent variable Z. As most of the encountered problems involve direct effects from X to Y, the graph G1b is used as basis for the following structures. According to the factorisation property, the joint probability function of G1b factorises by making each distribution depend on the parent nodes only:

pG1b(A, X, Y, Z) = p(A)p(Z)p(X|A, Z)p(Y |A, X, Z) (3.1)

Further, we see that the children of Z are also children of A. This way the graph creates an incentive to capture information in Z which is not related to A. This is not enforced explicitly, as is done for example by Louizos et al. (2015) using a Maximum Mean Discrepancy (Gretton et al., 2007). Instead, as the latent space Z has a restricted dimensionality, the independence is expected to emerge in order to achieve memory efficiency. As all children of the latent space have access to the sensitive variable, it would be a waste of memory for the space to maintain this information itself. Hence, all information following from the sensitive attribute is expected to be forced out of the space in order to create memory capacity for other reconstruction. Underlying this expectation is the assumption that the Z space indeed has restricted memory, and is not able to create a copy of its children. For the independence to emerge we rely on all children of Z also being children of A.

With a better understanding of the data, it is possible that certain variables can be assumed to be uninfluenced by A. Constrained to keeping the graph acyclic, we consider the variables that do affect the rest of the variables but are not influenced by them, constituting G2, shown in Figure 3.3. In order to adhere to this, such variables should either be fully compensated for incoming effects, or be at the very basis of the data generative process. For example, one could think of age as a variable which is not influenced by other variables, but does affect other variables such as work experience. This group of variables is called B from here on, abbreviating Basis.


Figure 3.3: G2, CEVAE structure including assumptions on Base variables

The Markov property suggests the following factorisation of the joint distribution for G2:

pG2(A, B, X, Y, Z) = p(A)p(B)p(Z)p(X|A, B, Z)p(Y |A, B, X, Z) (3.2)

The benefit of splitting variables from X into B is to provide the auxiliary model with more complete information about these variables. The variables in B can directly be used as input for fair prediction, as B is not a descendant of A, and the two are marginally independent. Because the variables in B are independent of A, one could argue that Z would absorb all of their information if they were left in X. However, as this is subject to approximation and Z is limited in capacity, separating B will increase the predictive performance of an auxiliary model.

The concept of resolving variables, as used by Kilbertus et al. (2017), often proves valuable in ML applications. Therefore, a third general structure is introduced including a separate node for these variables, denoted by the letter R. A resolving variable is a variable which is influenced by the sensitive variable, but which is nevertheless deemed fair to use for the prediction of Y. In the example of Figure 2.2, the choice of department is such a variable: although gender influences the choice of department, the department choice is seen as a natural reason to influence admission, and the indirect unfairness is resolved. The location of the resolving variables is depicted in Figure 3.4.

Figure 3.4: G3, CEVAE structure including assumptions on Base variables and Resolving variables

In the structure G3, the set PA_R consists of {Z, X, B, A}, and only Y is a descendant of R. According to the implied assumptions, the variables selected as resolving cannot influence X, but can be influenced by X. The relation R → Y is subject to the same considerations as discussed for the relation X → Y in Figure 3.2. The structure G3 implies the factorisation:

pG3(A, B, R, X, Y, Z) = p(A)p(B)p(Z)p(X|A, B, Z)p(R|A, B, X, Z)p(Y |A, B, X, R, Z) (3.3)

The structures above are proposed as a starting point for a general set of applications. The final graph, however, should always be in agreement with the case specific assumptions.

3.2 Causal Inference

Step II of the FairTrade method is to infer the causal relations. This is done by combining the observational data and the causal graph of step I in the CEVAE technique as proposed by Louizos et al. (2017). The nodes are modelled as probability distributions: continuous variables are modelled as Normal distributions, and binary variables as Bernoulli distributions. Details on the distributions are case specific, and hence provided in the experiment setup chapter. The relations between nodes are parameterised by neural networks. This is achieved by having a network map from an output sample of the source node to the distributional parameters of the influenced node. The inference network, also called the encoder (Kingma and Welling, 2013), maps from the observed variables to the latent variables and confounders.


In the case of independent relations, this requires multiple functions. As an exception, the objective variable Y is not included in the inference network, so as to train for an out-of-sample prediction scenario. The generative network, or decoder, becomes a hierarchy of networks in the shape of the causal graph. A forward pass through the generative network is the analogue of performing ancestral sampling in a causal graph. Like a normal VAE, the CEVAE is optimised by maximising an objective consisting of a reconstruction term and a regularisation term. The reconstruction term encourages the model to recover the data by taking the log probability of the observations occurring under the estimated data distribution. The generative network of a CEVAE is generally hierarchical, as opposed to the single data node in a standard VAE. As each node yields its own distribution, each needs a separate reconstruction term. For the structure G3, for example, this yields the following reconstruction loss over N observations:

L_Recon = Σ_{i=1}^{N} E_{q(z_i|a_i,b_i,x_i,r_i)}[ log p(x_i|b_i, a_i, z_i) + log p(r_i|b_i, a_i, x_i, z_i) + log p(y_i|b_i, a_i, r_i, x_i, z_i) ]   (3.4)

Because continuous and binary variables are modelled by different distributions, in practice these will also require separate reconstruction terms, possibly within a single node. The regularisation term creates stability in the inferred distribution of hidden confounders. As proposed by Kingma and Welling (2013), the Z space is regularised to be close to a product of standard normal distributions using the KL divergence (Definition 2.2.1). Rewriting the KL divergence between the inferred distribution q(z_i|a_i, b_i, x_i, r_i), abbreviated as q(z_i), and a regularisation distribution p(z_i) yields:

D_KL(q(z_i) || p(z_i)) = ∫ q(z_i) log( q(z_i) / p(z_i) ) dz   (3.5)
= ∫ q(z_i) log[q(z_i)] dz − ∫ q(z_i) log[p(z_i)] dz   (3.6)
= E_{q(z_i)}[log q(z_i)] − E_{q(z_i)}[log p(z_i)]   (3.7)

As the divergence is to be minimised, the regularisation loss for G3 becomes:

$$\mathcal{L}_{\text{Regul}} = \sum_{i=1}^{N} \mathbb{E}_{q(z_i \mid a_i, b_i, x_i, r_i)}\big[\log p(z_i) - \log q(z_i \mid a_i, b_i, x_i, r_i)\big] \tag{3.8}$$

This gives us the variational lower bound, following Kingma and Welling (2013) and Louizos et al. (2017), for N observations:

$$\mathcal{L}_{G_3} = \mathcal{L}_{\text{Recon}} + \mathcal{L}_{\text{Regul}} \tag{3.9}$$

In a forward pass during training, the observed values are used for all consecutive steps rather than samples from the approximating distributions. To clarify, in Figure 3.4 the value of X is observed, but in a forward pass it is also approximately recreated from Z, B and A. As input for R, we can choose between the recovered value and the observed value. Because the recovered value might propagate errors from earlier relations, and is a variance-dependent sample, the observed value of X is used as input for R. The CEVAE is optimised using an SGD-based optimiser such as Adam (Kingma and Ba, 2014).
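As an illustration, one possible training step under these choices is sketched below. The `model` interface (an `encode` method and per-node decoder heads returning distributions) is an assumption for the purpose of the sketch, not the exact implementation used in the experiments.

```python
import torch

def train_step(model, optimiser, a, b, x, r, y):
    """One CEVAE training step for G3: maximise L_Recon + L_Regul (Eq. 3.9)."""
    optimiser.zero_grad()
    qz = model.encode(a, b, x, r)                        # diagonal Gaussian q(z | a, b, x, r)
    z = qz.rsample()                                     # reparameterised sample
    log_px = model.px(a, b, z).log_prob(x).sum(-1)
    log_pr = model.pr(a, b, x, z).log_prob(r).sum(-1)    # observed x fed to downstream node
    log_py = model.py(a, b, x, r, z).log_prob(y).sum(-1)
    pz = torch.distributions.Normal(torch.zeros_like(z), torch.ones_like(z))
    regul = (pz.log_prob(z) - qz.log_prob(z)).sum(-1)    # Monte Carlo estimate of Eq. 3.8
    elbo = (log_px + log_pr + log_py + regul).mean()
    (-elbo).backward()                                   # minimise the negative lower bound
    optimiser.step()
    return elbo.item()

# Usage, assuming a suitable `model`:
# optimiser = torch.optim.Adam(model.parameters(), lr=1e-3)
# elbo = train_step(model, optimiser, a, b, x, r, y)
```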

3.3 Fairness by Causal Path Enabler

Following Kusner et al. (2017), counterfactual fairness is enforced by training an auxiliary model which uses a restricted selection of variables from the graph as input. The input selection determines which paths of the causal model are enabled in the prediction model. This is shown schematically in Figure 3.5. The prediction model is called the CPE, short for Causal Path Enabler.

Figure 3.5: Schematic of the CPE (Causal Path Enabler) within the graph structure (nodes: Z, X, Ŷ, B, R, A)

To construct the input selection for a counterfactually fair prediction model, Kusner et al. (2017) formulate a number of criteria and considerations. The criteria for the input selection are in line with this work. First, observed variables which are non-descendants of the sensitive variable are used; these variables have not received any causal influence from the sensitive variable. In the proposed graphs G2 and G3, the B variables are of this kind.

Second, the hidden confounder space Z as trained in the CEVAE is used for prediction. When background variables only have children which in turn have the sensitive variable as a parent, the latent space is expected to become marginally independent of the sensitive variable, as explained in the previous section. When using B and Z as input for any auxiliary prediction model, the outcomes are expected to be counterfactually fair, as derived by Kusner et al. (2017).

As a third option of variables to include in the auxiliary model, the resolving variables are considered. As these variables are descendants of the sensitive variable, sensitive information is propagated when they are used for prediction. Hence, when including these variables, the expectation of counterfactual fairness is lost. Therefore, the user needs to decide with care whether the resolving variables are indeed admissible under the fairness constraints of the application.
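A minimal sketch of such a CPE as an auxiliary classifier is given below, assuming the trained CEVAE encoder provides the inferred confounders Z (e.g. as posterior means); the class and attribute names are illustrative.

```python
import torch
import torch.nn as nn

class CPE(nn.Module):
    """Auxiliary prediction model using only the enabled causal paths.

    Inputs: base variables B, inferred confounders Z, and optionally the
    resolving variables R (only if deemed admissible for the application).
    """
    def __init__(self, dim_b, dim_z, dim_r=0, hidden=64):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(dim_b + dim_z + dim_r, hidden), nn.ReLU(),
            nn.Linear(hidden, 1),
        )

    def forward(self, b, z, r=None):
        inp = torch.cat([b, z] + ([r] if r is not None else []), dim=-1)
        return self.net(inp)  # logits for the fair prediction of Y

# Training can use e.g. BCEWithLogitsLoss against the observed outcomes,
# with z taken as the posterior mean of q(z | a, b, x, r) from the CEVAE.
```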

3.3.1 Fairness evaluation

With a known ground truth, counterfactual fairness can be evaluated explicitly. Data points can be constructed in pairs whose only difference is the interventional value of the sensitive variable. According to Definition 2.3.6, the two data points should receive an equal outcome under a counterfactually fair model.

Without access to the ground truth, evaluating counterfactual fairness poses a serious challenge. One idea is to assume that the CEVAE reconstructs the data up to a close approximation. The CEVAE model can then be used as the DGP, and counterfactual instances can be generated. However, as this technique relies on correct inference of the causal model, further research is needed before it can be applied in practice. Counterfactual fairness theoretically holds for auxiliary models which take the latent space Z and the base variables B as input (Kusner et al., 2017). Therefore, the approach taken to evaluate counterfactual fairness is to check whether the optimised model is in line with these theoretical expectations. First, it is important to test that distributions which are expected to be independent of the sensitive variable are indeed independent. Second, as counterfactual fairness implies statistical parity, the latter is used as a metric to evaluate fairness in cases with an unknown ground truth. Note that statistical parity does not rule out discrimination at the individual level, which counterfactual fairness does protect against.
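As an illustration of these checks, the sketch below computes the statistical parity gap of CPE predictions and, when counterfactual pairs are available (e.g. in the simulation setting), the rate at which predictions change under an intervention on the sensitive variable; the function names are hypothetical.

```python
import numpy as np

def statistical_parity_gap(y_hat, a):
    """Absolute difference in positive prediction rates between groups A=0 and A=1."""
    y_hat, a = np.asarray(y_hat), np.asarray(a)
    return abs(y_hat[a == 1].mean() - y_hat[a == 0].mean())

def counterfactual_change_rate(y_hat_factual, y_hat_counterfactual):
    """Fraction of individuals whose prediction flips when only A is intervened on.

    Zero for a counterfactually fair model (Definition 2.3.6); requires access to a
    ground-truth or assumed data-generating process to construct the pairs.
    """
    return np.mean(np.asarray(y_hat_factual) != np.asarray(y_hat_counterfactual))
```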

3.4 Related work

The FairTrade method is a combination of recent work complemented with several contributions. Step I of the method provides an extensive discussion of graph structures for fair applications. This includes three proposed structures that function as starting points for general fair prediction problems. The second step of the method, inference of causal effects using a CEVAE, is closely related to the work of Louizos et al. (2017) and Madras et al. (2019). The former proposes the CEVAE by connecting deep latent-variable models with causality, and the latter applies this in order to create fairness. While Madras et al. (2019) mention counterfactual fairness, it is left for future work. In these previous works using the CEVAE, the interest is focused on a treatment effect. This is in contrast with the presented work, in which a general framework for fair prediction and classification is proposed. Step III of the method pursues fairness based on the work of Kusner et al. (2017) and Loftus et al. (2018). These works provide the guidelines for the selection of the CPE input. Besides, Loftus et al. (2018) point out the importance of separating the causal inference step from possible corrections on the model to create fairness; performing these steps simultaneously might distort the causal inference. Applying this separation distinguishes the FairTrade method from the work by Nabi and Shpitser (2018) and Chiappa and Gillam (2018). The work by Kilbertus et al. (2017) clearly explains the idea of resolving variables, which is combined in this work with the notion of Loftus et al. (2018) on how to create prediction models that include certain PSEs.

As a generalisation result, several known methods can be recognised as special cases of the proposed method. Considering the level of causal assumptions the user is willing to make, the VFAE by Louizos et al. (2015) is closely related to the structure G1. Further, on the scale of fairness, counterfactual fairness by Kusner et al. (2017) can be seen as the special case in which only counterfactually fair PSEs are allowed as input for the CPE. On the other end of the fairness spectrum, allowing all pathways brings us back to a standard MLP setting, that is, an optimisation problem without fairness constraints.

An interesting perspective on the FairTrade method appears when considering its similarity to the linear regression model. In a linear regression model, one can account for sensitive effects by including the sensitive variable as an explanatory variable. This yields coefficient estimates for the other variables which are neutralised for the effect of the sensitive variable (Heij et al., 2004). Frisch and Waugh (1933) and Lovell (1963) show that this


yields the same coefficient estimates as when correcting the dependent and the independent variables for the sensitive effect prior to the analysis. However, an important misconception, discussed by Frisch and Waugh (1933), is to interpret significant coefficients as the true relation between the variables. Only after carefully structuring the regression model based on a theoretical framework of variable relations can the coefficients be interpreted as effect estimates of the input on the output (Heij et al., 2004). This model structuring is the same as the causal structuring in Step I of the FairTrade method, the only difference being the allowed graph structure. The subsequent usage of the ‘neutralised’ relations between variables and the output coincides with the idea of using the Z space in the CPE. The Z space includes a ‘neutralised’ version of the explanatory variable information. Hence, the FairTrade model could be viewed as a step towards generalising the results by Frisch and Waugh (1933) and Lovell (1963) to a setting with more complicated graph structures and parameterisations.
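To make the Frisch and Waugh (1933) and Lovell (1963) argument concrete, the sketch below verifies numerically on synthetic data that including the sensitive variable A in the regression yields the same coefficient for X as first residualising both the outcome and X on A; the data-generating coefficients are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
a = rng.binomial(1, 0.5, n).astype(float)      # sensitive variable
x = 1.5 * a + rng.normal(size=n)               # explanatory variable influenced by A
y = 2.0 * x - 1.0 * a + rng.normal(size=n)     # outcome

def ols(design, target):
    return np.linalg.lstsq(design, target, rcond=None)[0]

# (1) Full regression: Y on [1, A, X]
beta_full = ols(np.column_stack([np.ones(n), a, x]), y)
coef_x_full = beta_full[2]

# (2) Partialled-out regression: residualise Y and X on [1, A], then regress residuals
za = np.column_stack([np.ones(n), a])
res_y = y - za @ ols(za, y)
res_x = x - za @ ols(za, x)
coef_x_partial = ols(res_x.reshape(-1, 1), res_y)[0]

assert np.allclose(coef_x_full, coef_x_partial)  # Frisch-Waugh-Lovell equality
```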

From a literature perspective, the FairTrade method is valuable in proposing a complete approach for creating fairness in practical ML models. The elaboration on constructing a causal graph for fairness purposes is an important contribution. Further, the method contributes by including counterfactual fairness and PSEs in recent approaches to creating fairness in practical ML models.


Chapter 4

Experiment Setup

A number of experiments are conducted to justify, explore and test the method posed above. As a start, the importance of causal reasoning is emphasised by means of a simulation experiment on fairness through unawareness. While this experiment does not offer new theoretical insight, the frequent reference to this solution in practice marks the need for a clear and intuitive example of its fundamental problems.

Further, connecting to the existing literature, the proposed method is applied in the setting of the Infant Health and Development Program (IHDP) dataset (Hill, 2011). In related work, Madras et al. (2019) use this data to experiment with creating fairness through causal awareness.

The exploration of a relevant practical application is fulfilled by a third experiment on data of Dutch social welfare recipients. In this experiment, unwanted discrimination is prevented while classifying risk profiles for unlawful receipt of social welfare. In the Netherlands, this topic is among the most discussed in the controversy around discriminating algorithms.

4.1 Experiment 1: Fairness Through Unawareness

In practical deployment and public debate, it is still frequently proposed to create fairness through unawareness. Unfortunately, unwanted effects being propagated through proxy variables is a significant shortcoming of this method. To illustrate this clearly, we consider a simple world with only four variables: Ethnicity (E), Neighbourhood (N), Crime committed (C), and Observed crime (O). Observations concern single individuals. Ethnicity is binary, indicating a western background (0) or a non-western background (1). Neighbourhood has four categorical values: North (0), East (1), South (2), West (3). The variables Crime committed and Observed crime are binary; the latter indicates whether the police observed this person committing a crime, which can only happen if the person did commit a crime. In this world, each observation is generated by the same causal process. The directional relations in this process are depicted in Figure 4.1. The variables of this world are distributed according to the following distributions:

Figure 4.1: Causal graph for Ethnicity (E), Neighbourhood (N), Crime committed (C) and Observed crime (O)

$$
\begin{aligned}
E &\sim \text{Bern}(p = 0.5) \\
N &\sim \text{Categorical}
\begin{cases}
p(N = 0) = E \cdot 0.1 + (1 - E) \cdot 0.5 \\
p(N = 1) = E \cdot 0.5 + (1 - E) \cdot 0.1 \\
p(N = 2) = E \cdot 0.2 + (1 - E) \cdot 0.2 \\
p(N = 3) = E \cdot 0.2 + (1 - E) \cdot 0.2
\end{cases} \\
C &\sim \text{Bern}(p = 0.5 - 0.1 \cdot N) \\
O &\sim C \cdot \text{Bern}(p = 0.8 \cdot E + 0.2 \cdot (1 - E))
\end{aligned}
$$
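A sketch of this data-generating process in code, directly following the distributions above, could look as follows.

```python
import numpy as np

def simulate_world(n, rng):
    """Sample n individuals from the causal process of Figure 4.1."""
    e = rng.binomial(1, 0.5, n)                           # Ethnicity
    p_n = np.column_stack([
        e * 0.1 + (1 - e) * 0.5,                          # North
        e * 0.5 + (1 - e) * 0.1,                          # East
        np.full(n, 0.2),                                  # South
        np.full(n, 0.2),                                  # West
    ])
    nbh = np.array([rng.choice(4, p=p) for p in p_n])     # Neighbourhood
    c = rng.binomial(1, 0.5 - 0.1 * nbh)                  # Crime committed
    o = c * rng.binomial(1, 0.8 * e + 0.2 * (1 - e))      # Observed crime
    return e, nbh, c, o

rng = np.random.default_rng(0)
e, nbh, c, o = simulate_world(1000, rng)
```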

The police in this world have access to 1000 random individual observations, and want to predict which neighbourhood needs the most surveillance. Further, the police know that people who committed a crime are likely to commit one again. Finally, the people of this world have decided that a person's ethnicity is not allowed to influence any decisions made by the police. The reason for this is that people with a non-western background have received more police attention, which has caused a bias in the data. This is reflected in the higher Bernoulli probability for Observed crime for people with a non-western background.


When using the fairness through unawareness method, Ethnicity is left out of scope. Hence, the police will only obtain a distribution of observed crimes over the neighbourhoods.
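Continuing the sketch above, the unawareness approach then amounts to tallying observed crimes per neighbourhood; the resulting surveillance ranking inherits the observation bias even though Ethnicity is never used as input.

```python
# Uses e, nbh, c, o from the simulation sketch above.
# Observed crime rate per neighbourhood (unawareness: Ethnicity is ignored).
observed_rate = np.array([o[nbh == k].mean() for k in range(4)])
true_rate = np.array([c[nbh == k].mean() for k in range(4)])
print(observed_rate)  # biased towards neighbourhoods with many non-western residents
print(true_rate)      # ground-truth crime rate, unavailable to the police
```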

4.2 Experiment 2: Infant Health and Development Program

To develop theory in a controlled environment known from the related literature, the IHDP dataset (Hill, 2011) is used. By generating outcomes, a semi-simulated dataset is created. Causal effect inference and counterfactual fairness are considered in this experiment, but most importantly, the control over PSEs in a causal model is investigated.

4.2.1 Data

The data originates from a study in which the treatment effect of aid care for a target group of low-weight premature infants was investigated. A randomised experiment was run by the Infant Health and Development Program (Ramey et al., 1992). The analysis relies on the non-sensitive proxies of the dataset, which are openly available.1 The obtained set contains 767 observations of 25 covariates, such as birth weight, mother's age, and gender. Following Madras et al. (2019), a generative process is defined in order to obtain outcomes for counterfactual situations. Besides, a process for generating sensitive attribute data is defined, as the observed sensitive attributes are only obtainable under strict security conditions, which complicates efforts to validate the presented work. The steps of the generating process are provided in Algorithm 1, and the generation of the treatments in Algorithm 2. The remaining sub-algorithms are provided in the appendix.

Algorithm 1 Generate IHDP dataset
1: procedure GenerateIHDP(x)
2:     x_con ← (x_con − mean(x_con)) / std(x_con)    % normalise continuous variables
3:     z ← x[:, idx_z]
4:     x ← x \ x[:, idx_z]                           % remove latent confounder from data
5:     GenerateSensitive(x, z)
6:     for repetitions do
7:         GenerateOutcomes(x, z)
8:         GenerateTreatments(z)

Algorithm 2 Generate Treatments from unobserved Confounders
1: procedure GenerateTreatments(z)
2:     z ← (z − min(z)) / (max(z) − min(z))
3:     p0 ← Clip(α0 + ζ · z, 0, 1)
4:     p1 ← Clip(α1 + ζ · z, 0, 1)
5:     t0 ∼ Ber(p0)
6:     t1 ∼ Ber(p1)
7:     return t0, t1

In Algorithm 2, the following parameter settings are used: α0 = −0.05, α1 = 0.05, ζ = 1, and Clip(x, l, u) is defined as max(l, min(u, x)), i.e. values are clipped to the interval [l, u].
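A minimal sketch of Algorithm 2 in code, using the parameter settings above, could look as follows.

```python
import numpy as np

ALPHA_0, ALPHA_1, ZETA = -0.05, 0.05, 1.0

def generate_treatments(z, rng):
    """Generate treatments for both counterfactual arms from confounder z (Algorithm 2)."""
    z = (z - z.min()) / (z.max() - z.min())     # rescale z to [0, 1]
    p0 = np.clip(ALPHA_0 + ZETA * z, 0.0, 1.0)
    p1 = np.clip(ALPHA_1 + ZETA * z, 0.0, 1.0)
    t0 = rng.binomial(1, p0)
    t1 = rng.binomial(1, p1)
    return t0, t1
```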

4.2.2 Optimisation

The causal model structuring and optimisation are done in line with the work of Madras et al. (2019). The causal graph is shown in Figure 4.2. The implementation of a basic CEVAE training procedure is made openly available.
