MSc Artificial Intelligence
Master Thesis
Examining the stability of counterfactual
explanations
by
Puja Chandrikasingh
11059842
January 8, 2021
36 ECTS
July 2020 - January 2021
Supervisors:
Ana Lucic, MSc
Dr. Flavia Barsotti
Assessor:
Prof. Dr. Hinda Haned
This thesis was written during a 6-month internship at ING.
ING Contact: Flavia Barsotti, Research Coordinator, Model Risk Oversight, Model Risk Management, Flavia.Barsotti@ing.com
Disclaimer: The views expressed in the thesis are solely those of the author and supervisors
and do not represent the views of ING Bank.
Abstract
In the context of Explainable Artificial Intelligence (XAI), this thesis focuses on the stability
of counterfactual (CF) explanations. The study compares four existing CF generating methods
(CFGMs) based on different criteria related to both stability and implementation aspects. The
experiments are conducted on two publicly available financial datasets, consisting of data on
loan repayments and on credit card transactions, using two tree-based classification models. In
order to assess the stability of the explanations of the CFGMs, we propose a novel stability
framework that intends to provide a rigorous analysis of the stability of CF explanations, which
is an open research area in the field of XAI. The proposed stability framework consists of
(i) our direction based framework (DBF) and (ii) our P−stability measure. We compare the
proposed P−stability measure (based on probabilities) with an existing stability measure (based
on a specific distance function). The experiments performed on the two financial datasets and
a synthetic dataset indicate that, in contrast to the existing measure, the P−stability measure
can distinguish between different levels of stability. Furthermore, the results on the two financial
datasets suggest that the CF explanations of the tested CFGMs are quite similar in terms of
position (i.e. direction and distance) with respect to the original instance. However, there are,
at least for the tested settings, large differences in (i) the computational time, (ii) the number
of instances for which a CF explanation can be found and (iii) the stability of the explanations
according to the proposed stability framework.
Acknowledgements
I would like to express my gratitude to Ana Lucic, who supervised me on behalf of the University
of Amsterdam. Her enthusiasm and devotion guided and encouraged me to get the most out of
my thesis. I took great pleasure in the brainstorm sessions in which she shared her extensive
knowledge in the field of XAI with me.
Moreover, I want to thank Prof. Dr. Hinda Haned from the University of Amsterdam for
taking the time to assess my thesis. Furthermore, I am grateful to Chiel Bakkeren, the head
of my team at ING, for his belief in me, as he gave me the opportunity to write two theses
(AI and Actuarial Science) at ING. I would also like to thank the entire Data Science Model
Validation team at ING for their kindness and sincere interest in my research while we were
mostly working from home due to the turbulent times of the COVID-19 pandemic.
I would particularly like to thank Dr. Flavia Barsotti, who supervised me on behalf of ING.
She worked tirelessly and spared neither time nor effort to assist and advise me on both my
thesis and my future career. In addition to learning a lot about research in the banking sector, I
also learned a lot from her about working in the financial industry.
Lastly, I would like to thank my parents and my siblings for providing me with comfort and
warmth during my entire academic career. I took great pleasure in working at the same desk as
my little sister, Jaya, who stimulated me to reach for the stars and at the same time have a lot
of fun.
Contents

1 Introduction
  1.1 Research questions
2 Theory behind counterfactual explanations
  2.1 Counterfactual explanations in XAI
  2.2 How to obtain counterfactual explanations?
    2.2.1 Counterfactual generating methods
    2.2.2 Objective function for nearest counterfactual
3 Related Work
  3.1 Jaccard index
  3.2 L−stability
4 Method
  4.1 Direction based framework
    4.1.1 Pillars
    4.1.2 Framework
  4.2 P−stability measure
5 Experimental setup
  5.1 Comparison among CFGMs
    5.1.1 Data
    5.1.2 Classification models
    5.1.3 CFGMs
  5.2 Comparison of stability measures
    5.2.1 L−stability implementation
    5.2.2 Synthetic experiment
6 Results
  6.1 Comparison among CFGMs
  6.2 P−stability versus L−stability
7 Discussion & Conclusion
  7.1 Main findings concerning CF explanations & their stability
  7.2 Future research directions
    7.2.1 Potential impact
    7.2.2 Limitations
A Mathematical proofs
  A.1 P−stability measure
  A.2 Synthetic CFGMs in the synthetic experiment
B Additional graphs
  B.1 Decision tree
  B.2 Synthetic data
Chapter 1
Introduction
According to Bughin et al. (2018) from the McKinsey Global Institute, implementing Artificial
Intelligence (AI) in daily business could deliver $13 trillion in value by 2030. Chui
(2017) shows that the implementation has already started in various industries ranging from
health care (Ahsan et al., 2020; Wang and Preininger, 2019) to e-commerce (Adam et al., 2020)
to mobility (Karpathy, 2019) to (social) media (Covington et al., 2016; Amatriain, 2013; Bakshy
et al., 2015). This thesis focuses on the financial industry as it considers publicly available datasets
based on applications from the banking sector. The banking sector is already embedding AI for
modelling and has the potential to progressively increase its use for different types of models. As
an example, some banks already use AI applications for the development of credit score models
as discussed by Königstorfer and Thalmann (2020).
As with every new invention or technology, AI first needs to be accepted and trusted before
its full potential can be reached (Fernández et al., 2019). This requires transparency among
other principles (Shin and Park, 2019; Jobin et al., 2019; Toreini et al., 2020). Transparency
measures the degree to which the model is understandable and explainable (Shin and Park, 2019;
Olteanu et al., 2019). The importance of transparency for trust in models is strictly connected
to the research field of Explainable AI (XAI), which is becoming increasingly important as
regulation has recognised the right to an explanation. This means that companies are obliged to
provide the customer with an explanation for an (adverse) decision made by algorithms used
by the company. The Fair Credit Reporting Act (FCRA) in the United States and the General
Data Protection Regulation (GDPR) in Europe are examples of such regulations (Wachter et al.,
2017).
The explanation methods within the field of XAI can be classified as global or local methods
(Das and Rad, 2020). Global methods consider how a model behaves in general, while local
methods aim to explain why in a specific case a specific prediction is made. Given a local
perspective, one type of explanation that could potentially address a set of the new regulatory
demands is the counterfactual (CF) explanation (Barocas et al., 2019; Wachter et al., 2017). Lucic
et al. (2020) define these explanations as the minimal perturbation to an input instance such that
the prediction changes. Hence, a CF explanation is specific to one instance and it argues how the
instance should have been different in order to get classified as another (opposite) class. The CF
explanation is based on the difference between the original instance and a CF example. A CF
example is an instance that is as similar as possible to the original instance but has a different
prediction. Karimi et al. (2020a) provide a survey of all CF generating methods (CFGMs).
Barocas et al. (2019) argue that the potential to satisfy new regulation is one of the drivers
behind the fast-growing body of literature available on CFGMs. Besides this, there are more
reasons for the interest in obtaining CF explanations. First, Fernández et al. (2019) illustrate
that humans can intuitively understand CF explanations as they are constructed parallel to the
way humans think. Second, CF explanations use fewer features than other explanations in order
to clarify a prediction as discussed by Van der Waa et al. (2018). They argue that this implies
that a more precise distinction can be made between the differences across classes.
The main focus of this thesis is on the stability of CFGMs. If a CFGM is stable, each explanation
should be coherent locally, i.e. its neighbours should have similar explanations (Laugel
et al., 2019). In other words, the stability of a CFGM should measure the degree of similarity
between the CF examples (and thus CF explanations) of comparable instances. Hence, the more
similar the CF examples of comparable instances are, the more stable the CFGM is.
Having stable CF explanations is important from a banking perspective as it is linked to the
trust in these explanations. Laugel et al. (2019) highlight the importance of trust in explanations
if they are for example used to explain to customers the decision made in a loan application
setting (e.g. similar individuals should expect similar explanations). Furthermore, Tyler et al.
(2007) and Cvetkovich et al. (2002) discuss how the lack of trust can negatively impact the
industry.
Moreover, measuring the stability of CFGMs is an open research area. The reason for this
is that there is no well-defined measure that captures the stability of CFGMs (Laugel et al.,
2019). This is striking as stability is essential for all interpretability methods (Laugel et al., 2019;
Alvarez-Melis and Jaakkola, 2018b). To the best of our knowledge there are at this moment
only two stability measures that can be applied to CFGMs, namely the Jaccard index and the
L−stability measure. The Jaccard index, which is defined by Ivchenko and Honov (1998), can
be used to compare sets (of CF examples or changed features) with each other. Hence, this
measure is not applicable to all CFGMs since it needs several CF explanations per instance
(Guidotti et al., 2019). Many methods, however, only provide one optimal CF explanation per
instance.
The other stability measure, the L−stability measure, is based on Lipschitz continuity:
it compares the Euclidean distance between similar instances to the Euclidean distance
between the corresponding CF examples. There are three main shortcomings of this method:
(i) it is influenced by the stability of the underlying classification model, which makes it difficult
to determine where the (in)stability comes from (Laugel et al., 2019), (ii) it has no ideal value
(Alvarez-Melis and Jaakkola, 2018b) and (iii) it implicitly assumes that the Euclidean distance
is the best measure of similarity, which is not necessarily the case.
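To make the ratio-of-distances idea behind L−stability concrete, it can be sketched as follows. This is an illustrative implementation under our own assumptions (the neighbourhood and the toy `cfgm` function are hypothetical), not the exact formulation evaluated later in the thesis:

```python
import numpy as np

def l_stability(x, neighbours, cfgm):
    """Illustrative Lipschitz-style ratio: the largest ratio of the Euclidean
    distance between CF examples to the Euclidean distance between the
    corresponding original instances. Larger values suggest less stability."""
    cf_x = cfgm(x)
    ratios = []
    for x_prime in neighbours:
        d_inputs = np.linalg.norm(x - x_prime)
        if d_inputs == 0:
            continue  # identical instances carry no information here
        d_cfs = np.linalg.norm(cf_x - cfgm(x_prime))
        ratios.append(d_cfs / d_inputs)
    return max(ratios)

# Toy CFGM that shifts every instance by a constant vector: perfectly
# stable, so every ratio (and hence the maximum) equals 1.
toy_cfgm = lambda z: z + np.array([1.0, 0.0])
x = np.array([0.0, 0.0])
neigh = [np.array([0.1, 0.0]), np.array([0.0, 0.2])]
print(l_stability(x, neigh, toy_cfgm))  # → 1.0
```

Note that the ratio has no ideal value, which illustrates shortcoming (ii) above: a result of, say, 3.0 cannot on its own be labelled stable or unstable.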
Besides the stability of CFGMs, several criteria related to the implementation aspects are
important for a bank. For example, regulators have given all customers the right to receive an
explanation when they ask for it (e.g. FCRA and GDPR). This implies that it is crucial to
have a method that can find an explanation for each individual. For CFGMs, this boils down
to the requirement that the CFGM should be able to find a CF explanation for each instance.
Additionally, it is desirable that the CF explanation can be found within reasonable time limits
(considering IT and computational time constraints). Lastly, the position of the CF explanation
in the feature space is important, as a company should consider which CFGM provides
explanations that are most relevant or easiest to work with for its customers and employees.
1.1
Research questions
Given the bank’s perspective, the leading research question in this thesis is defined as follows.
RQ1 How do existing CFGMs differ in terms of:
(a) the number of instances for which a CF example is found;
(b) the computational time;
(c) the similarity between the CF examples in terms of position with respect to the original
instance (i.e. direction, distance, number of changed features, the magnitude of the
changes and how often the changes have no effect on the classification model);
(d) the stability of the CFGM itself.
In particular, we look at the four CFGMs that are given below.
1. Contrastive Explanations with Local Foil Trees, which is introduced by Van der Waa et al.
(2018). We denote it as CE. This method is model-agnostic (i.e. it can be used for all
classification models) and heuristic (Karimi et al., 2020a).
2. Actionable Feature Tweaking, which is proposed by Tolomei et al. (2017). We abbreviate
this method as FT. The method is model-specific (i.e. it cannot be used for all
classification models) and heuristic (Karimi et al., 2020a).
3. Flexible Optimizable CoUnterfactual Examples for Tree EnsembleS, which is constructed
by Lucic et al. (2020) and by them denoted as FOCUS. This method utilises gradient-based
optimisation and is not model-agnostic (Karimi et al., 2020a).
4. Distribution-Aware Counterfactual Explanation, which is developed by Kanamori et al.
(2020) and by them abbreviated as DACE. This method is model-specific and it utilises
MILP-based optimisation techniques (Karimi et al., 2020a).
We analyse the stability of the listed CFGMs combined with tree-based classification models.
The reason for this choice is that these models are often used with tabular data, which is common
in the banking industry (Bedeley and Iyer, 2014).
In order to evaluate the fourth criterion in RQ1 we propose a novel measure for stability
that can be applied to any CFGM. This is considered in the following research question.
RQ2 How can we construct a stability framework that is able to (i) assess the stability of all
CFGMs (ii) without being influenced by the classification model and (iii) without assuming
a specific distance function?
In this thesis we propose a direction based framework (DBF) to filter out the effect of the
classification model, and the P−stability measure, which is based on probabilities instead of
distance functions, to assess the stability of a CFGM. The P−stability measure compares (i) the
probability that a CF example of an instance belongs to the set of CF examples of similar instances with
(ii) the probability that it belongs to the set of training data instances with the same prediction.
For a stable CF generating method, the first probability should be higher than the second, as
the CF examples of comparable instances should be more similar to each other than to other
(random) instances that have the same predicted class.
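As a purely schematic illustration of this comparison (not the formal P−stability definition, which is developed in Chapter 4), one could contrast a naive density estimate of the CF example under the two sets. The function names, kernel, and bandwidth below are all our own illustrative choices:

```python
import numpy as np

def gaussian_density(point, sample, bandwidth=0.5):
    """Naive Gaussian kernel density estimate of `sample` evaluated at `point`."""
    sq_dists = np.sum((sample - point) ** 2, axis=1)
    return np.mean(np.exp(-sq_dists / (2 * bandwidth ** 2)))

def p_stable_indicator(cf_x, cf_neighbours, same_class_train):
    """Schematic check: is the CF example more 'probable' under (i) the set of
    CF examples of similar instances than under (ii) the set of training
    instances with the same prediction?"""
    p_neigh = gaussian_density(cf_x, cf_neighbours)
    p_train = gaussian_density(cf_x, same_class_train)
    return p_neigh > p_train

# CF examples of neighbouring instances cluster tightly around cf_x, while
# the same-class training data is spread out: the method looks stable here.
cf_x = np.array([1.0, 1.0])
cf_neigh = np.array([[1.1, 1.0], [0.9, 1.1], [1.0, 0.9]])
train = np.array([[4.0, 4.0], [5.0, -3.0], [-2.0, 6.0]])
print(p_stable_indicator(cf_x, cf_neigh, train))  # → True
```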
To evaluate our novel stability framework and the proposed stability measure in particular,
we examine RQ3.
RQ3 How does the P−stability measure perform compared to the L−stability measure in terms
of ability to differentiate between stable and unstable CFGMs?
The remainder of this thesis is organised as follows. Chapter 2 dives into the theory behind
CF explanations. Chapter 3 discusses the existing stability measures. Thereafter, we propose
our own stability framework in Chapter 4 in order to answer RQ2. Subsequently, we describe
the experimental setup needed for RQ1 and RQ3 in Chapter 5. The results for these two
research questions are elaborated on in Chapter 6. Finally, Chapter 7 presents the conclusion
and discussion.
Chapter 2
Theory behind counterfactual
explanations
This chapter elaborates on CF explanations and their position in the field of XAI. This broadens
the scope to which the stability framework can be applied. Subsequently, this chapter briefly
discusses different methods that can be used to obtain CF explanations in order to illustrate the
type of methods for which the stability needs to be determined.
2.1
Counterfactual explanations in XAI
CF explanations are used to clarify the behaviour of AI classification models, i.e. they explain
why a classification model makes a certain prediction for a specific instance. In order to construct
a CF explanation, we need a CF example which can be found by a CFGM. A CF example of
instance x, denoted as CF (x), is defined as an instance that is similar to x but has an alternative
prediction. The difference between the original instance and its CF example is defined as the CF
explanation. The mathematical version of the CF explanation, i.e. CF explanation = CF (x)−x,
can be translated into a sentence. In a loan approval situation, this could for example result
in explanations such as (1) the client’s loan is not approved because the income is $10,000 too low, or
(2) if the client’s income was $10,000 higher, the loan would have been approved. This is a very
simple example; in reality more than one variable might need to change in order to flip the
prediction. These changes would then be added to the explanation.
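The arithmetic behind such a sentence, CF explanation = CF(x) − x, can be sketched in a few lines; the feature names and dollar amounts below are invented for illustration:

```python
import numpy as np

# Hypothetical loan applicant encoded as [income in $, number of open loans]
x = np.array([40_000.0, 2.0])

# CF example returned by some CFGM: the closest instance whose loan
# would have been approved
cf_x = np.array([50_000.0, 2.0])

# The CF explanation is the difference between the CF example and x:
# here only the income changes, by +$10,000
explanation = cf_x - x

# Translate the non-zero changes into a sentence
for name, delta in zip(["income", "open loans"], explanation):
    if delta != 0:
        print(f"If the client's {name} was ${delta:,.0f} higher, "
              "the loan would have been approved.")
```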
Within the field of XAI there are other explanation methods that give explanations similar
to CF explanations. For example, adversarial perturbations, also known as adversarial examples
(AEs) or adversarial attacks, and CF explanations have in common that they solve the
same problem, i.e. finding the minimal perturbation needed to flip the prediction of an instance
(Freiesleben, 2020). In addition, Wachter et al. (2017) state that AEs are essentially CF
explanations for instances that are being classified with a deep neural network (NN), such as
resnet.
A difference between AEs and CF explanations is the purpose that they have as described by
Karimi et al. (2019a) and Freiesleben (2020). AEs aim to fool a classifier, while CF explanations
aim to explain the classifier in terms of what must happen in order to change the prediction.
In other words, AEs assess the robustness of a classification model rather than explaining it
(Ustun et al., 2019). Another difference between CF explanations and AEs is that the latter
are frequently used with images while the former are used on tabular data (Karimi et al., 2019a).
This thesis focuses on the use of tabular data. Hence, we do not evaluate AE methods with
the proposed stability framework. Note, however, that any stability metric that is defined for
CFGMs can be applied to methods that generate AEs as their structures are similar.
Another related form of explanations is algorithmic recourse, which offers the same type of
explanation as CF explanations. The difference between explanations based on CFs and the
ones based on recourse is that the methods of algorithmic recourse focus on finding a feasible
example (i.e. an example that is in line with, and can happen in, the real world), while a majority
of the CFGMs focus on finding the most similar example (Karimi et al., 2020b). Note that
it is debatable whether we should target feasibility as what is feasible can differ per instance
(Venkatasubramanian and Alfano, 2020). The methods, which are used to generate algorithmic
recourse examples, are closely related to CFGMs and therefore recourse methods can also be
evaluated with stability measures that are made for CFGMs. A difference between the methods
is that the algorithmic recourse methods often require expert knowledge in order to determine
what is and is not “feasible”. Hence, we focus on CF explanations.
Besides methods that provide explanations that are similar to CF explanations, there is a
wide range of other methods within the field of XAI. First, CFGMs belong to the class of local
XAI methods. Local methods aim to explain why, in a specific case, a specific prediction is made.
Global methods, on the other hand, consider how a model behaves in general. An example of
another local explanation method is the gradient-based saliency map for Convolutional Neural
Networks (ConvNets) introduced by Simonyan et al. (2013). These maps can explain a certain
prediction by showing on which pixels the prediction was based. An example of a global method
is the Global Attribution Mapping (GAM) which is proposed by Ibrahim et al. (2019) and ranks
the most important features for the entire model.
Second, CF explanations are post-hoc explanations. This means that the explanations are
obtained by analysing the output of a model after it has been trained (Das and Rad, 2020). This
is different from intrinsic explanations, as for this type of explanation the model is intrinsically
interpretable (such as a linear regression). The Neural Additive Model (NAM) introduced by
Agarwal et al. (2020) is an example of a method that uses intrinsic explanations. This method
restricts the structure of NNs by using a NN for each input variable. This ensures that there
is no interaction between features, which implies that the effect of a feature can be explained
directly. The other methods that are mentioned in this section are all post-hoc (Das and Rad,
2020).
Aside from being local or global and post-hoc or intrinsic, an explanation method can be
model-agnostic or model specific (Das and Rad, 2020). Model-specific explanation methods can
generate explanations for one specific model type only as they typically use the inner working of
the model. Model-agnostic explanation methods, on the other hand, can generate explanations
for all model types. These methods are usually query-based so that they do not use the inner
workings of the model. The GAM method is an example of a model-agnostic method as it can
generate global explanations for all model types and not only NNs. An example of a
model-specific explanation method is NAM (Das and Rad, 2020). CFGMs can be both model-agnostic
and model-specific. Note that more information about the different types of XAI methods can be
found in the work of Das and Rad (2020), as they consider and classify all XAI methods from
2007 to 2020.
2.2
How to obtain counterfactual explanations?
There are over fifty CFGMs as can be seen from the survey paper of Karimi et al. (2020a). In
this thesis, we examine four methods that differ in the chosen approach for finding the minimal
perturbation, i.e. the CF explanation. This section considers other approaches as well in order
to clarify that the proposed stability framework can be used for any CFGM. The underlying
reason for this is that all CFGMs, regardless of the chosen search approach, need to choose an
optimal CF example for an instance. In order to separate the optimal CF example from the
other examples, an objective function is required. This is one of the assumptions of the proposed
stability framework. After discussing the different search approaches, this chapter considers some
objective functions that can be used. The difference in what is being minimised by the objective
functions motivates why the stability framework utilises probability distributions instead of
distance functions in order to capture similarity.
2.2.1
Counterfactual generating methods
The approach that is used most often to find CF examples is the heuristic approach as can be
seen from the survey of Karimi et al. (2020a). Two CFGMs that use this approach are the CE
method introduced by Van der Waa et al. (2018) and the FT method proposed by Tolomei et al.
(2017). The stability of these methods is evaluated in this thesis. Section 5.1.3 considers these
methods in more detail. Another example of a heuristic method is the CFGM defined by Laugel
et al. (2017) that uses the two-step heuristic growing spheres algorithm.
Besides the utilisation of heuristics, CF explanations can also be found by applying an
optimisation-based approach. This thesis considers one method that uses gradient-based
optimisation (FOCUS) and one that uses (mixed-)integer linear programming (i.e. (M)ILP) based
optimisation (DACE; see Section 5.1.3). The latter has been introduced in order to solve the
problem of finding the closest CF example while taking certain constraints into consideration.
For example, Ustun et al. (2019) have used this approach in order to force the CF example to
be feasible, i.e. to be consistent with a recourse example.
Using a gradient-based optimisation approach implies that CF examples are generated by
utilising the gradients of a loss function (an objective function). This is done by Wachter et al.
(2017) with a rather simple loss function consisting of a part that captures the distance and a part
that forces the prediction to flip. Mothilal et al. (2020) use an extended distance function that promotes
diverse CF examples. Note that in principle gradient-based CFGMs are not model-agnostic as
the classification model needs to be differentiable. Hence, in general these methods will not work
for tree-based classification models. The method evaluated in this thesis (FOCUS) is, however,
applicable to tree-based classification models. Section 5.1.3 elaborates on this.
Both the (M)ILP-based and the gradient-based optimisation approaches are often used in
CFGMs, as can be seen in the survey constructed by Karimi et al. (2020a). Another approach
that is used multiple times is the genetic algorithm approach. For instance, Sharma et al. (2019)
apply a genetic algorithm to directly solve an objective function with the constraint that the
prediction is flipped for the CF example.
Two other approaches are applied by Karimi et al. (2019b) and Poyiadzi et al. (2020) among
others. Karimi et al. (2019b) constructed MACE (model-agnostic counterfactual explanations)
that consists of transforming a minimisation problem with an objective function plus constraints
to a sequence of logic formulae. This means that the problem is turned into several satisfiability
(SAT) problems that are solved with a SAT solver (instead of solving the objective and
con-straints with a MILP application). The method suggested by Poyiadzi et al. (2020), FACE, is
based on graph theory. The method finds the optimal CF example by constructing the shortest
path through the high density regions of the input space. The authors argue that this leads to
feasible and actionable CF examples.
2.2.2
Objective function for nearest counterfactual
Before we define the objective function that is used to find the optimal CF example, we discuss
the notation used in this thesis. The original instance is denoted as x, which is a vector that
contains the characteristics or input features of the instance. The CF example of x is denoted as
CF(x). Note that CF(x) results from the CFGM. This implies that the CFGM can be seen as
a function CF(·). Each observation has D dimensions and the value of dimension i of instance
x (respectively CF(x)) is denoted as x_i (respectively CF(x)_i). Furthermore, the classification
model M is probabilistic: for each x there is a probability that it belongs to class y, where the
probability is denoted as M(y|x). Lastly, given a condition a, the indicator function 1_{a} is one
if condition a holds and zero otherwise.
In general, the objective function of the CFGMs, i.e. the loss function that is being minimised
by these methods, can be defined as

    L_objective = L_flip + L_similarity + L_additional constraints    (2.1)

where L_flip is the part of the objective function that stimulates the prediction of the CF example
to be opposite to the prediction of the original instance, L_similarity is the part that aims to
make the CF example and the original instance as similar as possible, and L_additional constraints
is the part that stimulates the CF example to possess some characteristic such as feasibility or
actionability. In all CFGMs the second term, L_similarity, is directly given in the objective function
that is being minimised. This term is usually equivalent to a distance function. The other two
terms can be included in the objective (e.g. in gradient-based methods) or as constraints (e.g.
in (M)ILP approaches).
Mathematically, the first term of eq. (2.1) can be defined as follows. In case L_flip is incorporated
as a constraint, then the constraint can, as in the work of Sharma et al. (2019), be defined as:

    argmax_y M(y|x) ≠ argmax_y M(y|CF(x)).

If by contrast the first term is part of the objective, then this part can be represented by the
hinge loss as argued by Lucic et al. (2020):

    1_{argmax_y M(y|x) = argmax_y M(y|CF(x))} · M(y|CF(x)).
Hence, this loss is active as long as the prediction has not flipped, which stimulates the CF
example to move in the direction for which the prediction will flip. Another possibility is to
use the squared difference between the actual predicted class (or probability value) of the CF
example and the class (or probability value) it should have, as is done by Wachter et al. (2017).
In order to keep pushing the prediction to flip, they multiply this part of the loss with a scalar
that they increase in each round of the minimisation process. Hence, this loss is also active
as long as the prediction of the CF example is not opposite to the prediction of the original
instance.
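The hinge-style L_flip described above can be sketched as follows for a probabilistic binary classifier; this is an illustrative reading of the formula, in which we take the probability of the class currently predicted for the CF example (our own simplification):

```python
import numpy as np

def flip_loss(probs_x, probs_cf):
    """Hinge-style L_flip: active (equal to the probability of the CF
    example's predicted class) only while the CF example still has the
    same predicted class as the original instance; zero once the
    prediction has flipped."""
    same_class = np.argmax(probs_x) == np.argmax(probs_cf)
    return float(probs_cf[np.argmax(probs_cf)]) if same_class else 0.0

# Prediction not yet flipped: loss is active and pushes the probability down
print(flip_loss(np.array([0.9, 0.1]), np.array([0.7, 0.3])))  # → 0.7
# Prediction flipped: loss vanishes
print(flip_loss(np.array([0.9, 0.1]), np.array([0.4, 0.6])))  # → 0.0
```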
Euclidean (L2-norm)
    Formula: sqrt((CF(x) − x)^T (CF(x) − x)) = sqrt(Σ_i (CF(x)_i − x_i)^2)
    Goal is to minimise: the absolute value of the change in each feature
    As in: Lucic et al. (2020)

Weighted squared L2-norm
    Formula: Σ_i (1 / std_i) × (CF(x)_i − x_i)^2
    Goal is to minimise: the absolute value of the change in each feature while accounting for the standard deviation (std)
    As in: Wachter et al. (2017)

Mahalanobis with covariance Σ (standardised L2-norm if Σ is diagonal)
    Formula: sqrt((CF(x) − x)^T Σ (CF(x) − x))
    Goal is to minimise: the distance (dissimilarity) between the points while accounting for the distribution they are from
    As in: Kanamori et al. (2020); Lucic et al. (2020)

Cosine
    Formula: x^T CF(x) / (||x|| · ||CF(x)||) = Σ_i x_i CF(x)_i / (sqrt(Σ_i x_i^2) sqrt(Σ_i CF(x)_i^2))
    Goal is to minimise: the change in the relationship between features
    As in: Lucic et al. (2020)

Weighted Manhattan (L1-norm)
    Formula: Σ_i |CF(x)_i − x_i| / MAD_i, with MAD_i = median_{x'∈X} |x'_i − median_i|
    Goal is to minimise: the number of features that are changed while accounting for the intrinsic volatility of the input space
    As in: Lucic et al. (2020); Wachter et al. (2017)

Table 2.1: Several distance functions that measure the distance between x and CF(x).
While the goal of the first term is well defined, the definition of the distance term, L_similarity,
is open for interpretation. In fact, it is one of the undefined areas in the CF literature (Barocas
et al., 2019). The distance function has to be chosen in accordance with the belief about how
instances can change most easily. For example, if one believes that an instance can change easily
as long as the changes that it has to make are small, then the Euclidean distance (L2-norm)
should be used. If one, on the other hand, suspects that it is the number of changes rather than
the magnitude of the individual changes that drives the ease of change, then the Manhattan
distance (L1-norm) is better suited than the Euclidean distance. Table 2.1 summarises some of
the distance functions that are used in the literature.
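The distance functions of Table 2.1 can be sketched directly. Following the descriptions in the table, the Mahalanobis variant below takes a user-supplied weight matrix Σ and the weighted Manhattan takes precomputed median absolute deviations:

```python
import numpy as np

def euclidean(x, cf_x):
    """L2-norm: penalises the absolute change in each feature."""
    return np.sqrt(np.sum((cf_x - x) ** 2))

def weighted_squared_l2(x, cf_x, std):
    """Squared L2-norm, with each feature scaled by its standard deviation."""
    return np.sum((cf_x - x) ** 2 / std)

def mahalanobis(x, cf_x, sigma):
    """Mahalanobis-style distance with weight matrix sigma."""
    d = cf_x - x
    return np.sqrt(d @ sigma @ d)

def cosine(x, cf_x):
    """Cosine similarity: captures the relationship between features."""
    return (x @ cf_x) / (np.linalg.norm(x) * np.linalg.norm(cf_x))

def weighted_manhattan(x, cf_x, mad):
    """Weighted L1-norm, each feature scaled by its median absolute deviation."""
    return np.sum(np.abs(cf_x - x) / mad)

x, cf_x = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(x, cf_x))                                     # → 5.0
print(weighted_manhattan(x, cf_x, mad=np.array([1.0, 2.0])))  # → 5.0
```

Note how the same pair of points yields different values under different notions of similarity, which is precisely why the choice of distance function matters.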
The last term of eq. (2.1), L_additional constraints, is optional. Several authors have proposed
specific constraints in their (M)ILP optimisation problem in order to force actionability or
feasibility of the CF explanation. For example, Ustun et al. (2019) introduce two constraints to
ensure feasibility. The constraints assure that each feature in the CF example is on a certain grid
with feasible values. The CFGM of Poyiadzi et al. (2020), which is based on a graph approach,
can also take custom conditions into consideration in order to create feasible and actionable CF
explanations.
Chapter 3
Related Work
There are, to the best of our knowledge, only two stability measures that can be applied to
CFGMs. These methods, their drawbacks and the results that have been obtained so far are
discussed in this chapter.
3.1
Jaccard index
The Jaccard index is a stability measure from the field of statistics. Given two sets of instances X_1 and X_2, the Jaccard index, J, is defined as

J = |X_1 ∩ X_2| / |X_1 ∪ X_2|    (3.1)

in which the numerator reflects the number of instances that are in both X_1 and X_2, while the denominator captures the number of instances in the union of the two sets (Ivchenko and Honov, 1998).
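As a concrete sketch, eq. (3.1) can be computed directly on Python sets; the feature-set variant discussed below (comparing which features the CF explanations of different runs change) only changes what the sets contain. The helper and the example feature names are illustrative, not code from any of the cited works.

```python
def jaccard_index(set_1, set_2):
    """Jaccard index J = |X1 ∩ X2| / |X1 ∪ X2| of two sets (eq. 3.1)."""
    union = set_1 | set_2
    if not union:  # convention: two empty sets are identical
        return 1.0
    return len(set_1 & set_2) / len(union)

# e.g. the features changed by the CF explanations of two runs
run_1 = {"income", "age"}
run_2 = {"income", "balance"}
```

Here `jaccard_index(run_1, run_2)` yields 1/3: one shared feature out of three in total.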
In the context of the stability of CFGMs, the Jaccard index could be applied by comparing the sets of features that need to be changed according to the CF explanations of one instance in different runs, as is done by Guidotti et al. (2018, 2019). For each instance, they generated ten CF explanations over ten runs. The Jaccard index then compares the sets of features that needed to be changed in the different runs.
There are two shortcomings of calculating the Jaccard index in this way. First, this procedure
implies that the CFGM should have a stochastic component, which is not always the case. For
example, one would obtain deterministic results from a method that systematically evaluates
all possible changes before selecting the optimal CF explanation. Second, in this procedure the
Jaccard index measures how comparable the explanations are for exactly the same instance.
This is important, but not sufficient when the stability of a CFGM is quantified. The reason for
this is that the stability measure of a CFGM should also consider similar original instances as
a CFGM is stable if each explanation is coherent locally, i.e. its neighbours should have similar
explanations (Laugel et al., 2019).
Another way to apply the Jaccard index to CFGMs is to compare the sets of CF explanations for similar initial instances. This solves the two shortcomings discussed above. However, this approach has two pitfalls of its own. First, the CFGM should be able to generate more than one CF explanation for each instance, which many CFGMs do not do. Second, the final value of the Jaccard index is influenced by the objective function. For example, Mothilal et al. (2020) focus on creating a diverse set of CF explanations. More diverse explanations imply that the instances are further apart from each other. Hence, in this case the resulting Jaccard index might differ more for similar original instances than when the only objective for CF explanations is to be as close as possible to the original instance.
The stability framework that we propose does not require a stochastic component in the CFGM, nor does it restrict the CFGM to generating multiple CF examples per instance, as it only needs one CF example per instance. Furthermore, the stability framework does consider the neighbours of the original instance in order to determine the stability. Hence, it is in line with the definition of stability used by Laugel et al. (2019). We do not compare against the Jaccard index, as the CFGMs which we evaluate do not produce multiple CF examples for the same instance.
Our work relates to that of Guidotti et al. (2018, 2019) as follows. Guidotti et al. (2018, 2019) only evaluated model-agnostic methods, while in this thesis a model-agnostic method and three model-specific methods are considered. As in the work of Guidotti et al. (2018, 2019), this thesis focuses on methods that provide local explanations.
3.2 L−stability
The L−stability measure is introduced by Alvarez-Melis and Jaakkola (2018a). Their measure is based on Lipschitz continuity, which is a well-known notion from calculus for the stability of a function. For an instance x and corresponding CF explanation CF(x) the stability measure solves the optimisation problem:

argmax_{x′ ∈ B_ε(x)}  ||CF(x) − CF(x′)||_2 / ||x − x′||_2    (3.2)

where B_ε(x) is a circle of radius ε and centre x. This implies that for each point in B_ε(x) the ratio of the distance between the CF explanations to the distance between the original instances is calculated. Subsequently, the highest ratio is chosen as the quantification of the degree of stability for instance x. The highest ratio itself is the value of the stability measure for instance x. The stability measure that solves eq. (3.2) assumes that the stability is continuous, which might be too strict, especially in the case of discrete features. Hence, Alvarez-Melis and Jaakkola (2018b,a) modified eq. (3.2), which resulted in a weaker stability measure, referred to in this thesis as the L−stability. The measure solves the following optimisation problem:

argmax_{x′ ∈ N_ε(x)}  ||CF(x) − CF(x′)||_2 / ||x − x′||_2    (3.3)

where N_ε(x) = {x′ ∈ X | ||x − x′||_2 ≤ ε}. Hence, B_ε(x) is replaced by a finite set of instances.
CHAPTER 3. RELATED WORK
13
Note that this thesis refers to the resulting maximum value of the fraction in eq. (3.3) as the
L−stability.
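To make eq. (3.3) concrete, the measure can be sketched in a few lines of NumPy. Here `cf` is a stand-in for any CFGM, the neighbourhood N_ε(x) is built from a finite sample X, and x itself is excluded to avoid division by zero; this is our reading of the definition, not a reference implementation by Alvarez-Melis and Jaakkola.

```python
import numpy as np

def l_stability(x, X, cf, eps):
    """L-stability at x (eq. 3.3): the largest ratio
    ||CF(x) - CF(x')||_2 / ||x - x'||_2 over x' in N_eps(x)."""
    cf_x = cf(x)
    ratios = [
        np.linalg.norm(cf_x - cf(xp)) / np.linalg.norm(x - xp)
        for xp in X
        if 0 < np.linalg.norm(x - xp) <= eps  # N_eps(x), excluding x itself
    ]
    return max(ratios) if ratios else 0.0

# A perfectly "stable" toy CFGM that translates every instance by a constant:
shift_cf = lambda z: z + np.array([2.0, 0.0])
```

For `shift_cf` the ratio equals 1 for every neighbour, so `l_stability` returns exactly 1 regardless of ε, illustrating that even a maximally well-behaved CFGM does not reach the measure's lower bound of zero.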
The L−stability does not require the CFGM to be stochastic or to be able to generate sets of
CF explanations, while the Jaccard index does. There are, however, still three major drawbacks
of this measure: (i) it is influenced by the stability of the underlying classification model, (ii) it
has no ideal value and (iii) it implicitly implies that the Euclidean distance is the best measure
for similarity, which is not necessarily always the case.
Influence of the classification model
The first drawback of the L−stability measure is that it does not differentiate between the effect
of the CF generating model and the effect of the classification model (Laugel et al., 2019). This
implies that the L−stability could be biased. This can be explained as follows. Consider a
case in which the CFGM is by construction stable. Hence, it always discovers the nearest CF
explanation if its objective is to minimise the distance. In this case the L−stability should be
optimal as the CFGM always finds the optimal CF explanation. Note that the optimal value of
the L−stability is not predefined (Alvarez-Melis and Jaakkola, 2018a). However, one could still
assume that the classification model should not be able to influence the value of the L−stability.
If it can influence the value of the L−stability then the L−stability might be biased.
Figure 3.1 illustrates the effect the classification model can have on the L−stability value. In this figure the L−stability of a stable CFGM is compared to the L−stability of an unstable CFGM. The CFGM in Figure 3.1a and Figure 3.1c is stable by construction, while Figure 3.1b and Figure 3.1d illustrate what could happen with an unstable CFGM (given the respective same classification models). We see that even for the stable CFGM the L−stability value differs per classification model, which is the only difference between both situations. We postulate that this might be problematic for a stability measure, as a stability measure of a certain CFGM should only be influenced by that CFGM. Otherwise the resulting value of the measure might be biased by the stability of the underlying classification model, which might happen for the L−stability measure.
Ideal value
The second disadvantage of the L−stability measure is that there is no ideal value as discussed
by Alvarez-Melis and Jaakkola (2018a). They state that for different goals of interpretability and
even for different applications, the most reasonable value differs. Alvarez-Melis and Jaakkola
(2018b) argue that the method with the lowest L−stability value is the most stable method as
eq. (3.2) and eq. (3.3) are defined to find the worst-case scenario. Note that this implies that an absolutely stable CFGM would have a value of zero. However, eq. (3.3) shows that an L−stability close to zero means that the numerator is small while the denominator is large. This in turn implies that the CF examples are close to each other while the original instances are not. This means that dissimilar instances have similar CF examples, which is not optimal
from a stability perspective. Note that this ratio is the maximum; hence the other instances are situated even further from each other, i.e. even more dissimilar instances have comparable CF examples.
Figure 3.1: Example of the influence of the classification model.
(a) Classification boundary is a line on the right, stable CFGM; L−stability is 1 due to x′_3.
(b) Classification boundary is a line on the right, unstable CFGM; L−stability is 5 due to x′_3.
(c) Classification boundary is a circle, stable CFGM; L−stability is 10.5 due to x′_2.
(d) Classification boundary is a circle, unstable CFGM; L−stability is 16 due to x′_2.
Note. The difference in L−stability between cases a and c illustrates the effect that the classification model has on the estimate of the stability of the CFGM. The difference in L−stability between the stable and unstable CFGMs, i.e. between a and b and between c and d, highlights that the optimal value for the L−stability is ambiguous.
1 Note that in this discussion the distance is directly related to similarity. The reasoning behind this is that the link is indirectly defined in the definition of the L−stability, as eq. (3.2) and eq. (3.3) are said to represent the worst-case scenarios.
Figure 3.1 also reveals the ambiguity of the optimal value of the L−stability. For both classification models, i.e. the straight line and the circle, the L−stability of the stable CFGM is lower than the L−stability of the unstable CFGM. Hence, one could argue that the higher the L−stability, the less stable the CFGM. However, if one were to compare the situation in Figure 3.1b with the situation in Figure 3.1c, one would conclude that the CFGM in the latter situation is less stable. This is striking, as that CFGM is constructed to be stable, while the former is constructed to be unstable. Comparing these two situations is not illogical, as in reality it might be unclear how the classification model behaves or how stable the CFGM is by construction.
Similarity assumption
The third downside of the L−stability is that it uses the Euclidean distance, which raises several problems. For example, a CFGM that minimises the Euclidean distance might incorrectly appear to be more stable than CFGMs that minimise other distance functions. Another example is that by using the Euclidean distance, the L−stability implicitly assumes that an instance can change easily as long as the changes that it has to make are small. This could, however, differ per situation, as discussed in Section 2.2.2. One can consult the work of Lucic et al. (2020) to examine the effects that minimising different distance metrics can have on the resulting CF explanations.
In the academic literature, the L−stability measure is used by Alvarez-Melis and Jaakkola (2018a,b). They compared the stability of five gradient-based model-specific and two model-agnostic explainability methods. Their findings are twofold. First, they notice a trade-off between how accurate a model is and how stable it is. Second, they found that model-agnostic methods are in general less stable than model-specific methods. In this thesis we examine whether this also holds for the explanation methods and stability measures used in our experiments.
Unlike Alvarez-Melis and Jaakkola (2018a,b), we evaluate multiple types of model-specific CFGMs (as opposed to solely gradient-based CFGMs). Moreover, we evaluate the stability of CFGMs with the L−stability measure (as proposed by Laugel et al. (2019)), while Alvarez-Melis and Jaakkola (2018a,b) consider other types of XAI methods (not CFGMs).
Chapter 4
Method
This chapter proposes a stability framework consisting of the direction based framework (DBF) and the P−stability measure. This framework fulfils the requirements in RQ2, i.e. the stability framework can (i) assess the stability of all CFGMs (ii) without being influenced by the classification model and (iii) without assuming a specific distance function.
Note that we assume that a stability measure for CFGMs should capture whether explanations are coherent locally, as in the work of Laugel et al. (2019). In other words, similar instances should have similar CF explanations.
Furthermore, note that we make the simplification that all classification problems are binary. This is, however, without loss of generality, as any classification problem can be converted into a binary classification problem in the context of CF explanations (i.e. one-versus-all classification). Karimi et al. (2020a) argue that regression problems can also be converted into a binary classification problem if a threshold is used.
4.1 Direction based framework
The DBF is introduced in order to assure that the classification model does not influence the
assessment of the stability of a CFGM. Section 4.1.1 introduces the two pillars on which the
DBF is based and Section 4.1.2 describes the framework itself.
4.1.1 Pillars
The first pillar on which the DBF is based is defined as follows.

P1: If a CFGM is able to produce a CF example, the CF example is locally optimal according to the CFGM.

The motivation behind this pillar is that each CFGM utilises an objective function. Hence, the final CF example needs to be optimal according to that function. This implies that at least a local optimum is found.
Based on P1 we can formulate the following proposition for a stable CFGM, which we can prove by contradiction.
Prop. 1: For a stable CFGM it holds that:
∀x′ between x and CF(x) ⇒ CF(x′) is close to CF(x)

Proof: According to P1, both CF(x′) and CF(x) are the best CF examples that can be found for x′ and x respectively. This implies that given x we should move in the direction of CF(x) in order to get to its optimal CF example, namely CF(x). Hence, if we take a point on this route, i.e. x′, it should be the case that we have to continue on our route to the optimal CF example, i.e. CF(x), in order to find CF(x′). Because of slight differences between x and x′, CF(x) and CF(x′) could also differ slightly. However, if CF(x′) lies far from CF(x) it means that we have deviated from the optimal route, which implies that (i) either our old route was not optimal or (ii) the new route is not optimal. This would imply that either CF(x) or CF(x′) is not a local optimum, which contradicts P1.
Hence, given P1 and a stable CFGM, each observation x′ between the original observation, x, and its counterfactual example, CF(x), has a CF example close to CF(x). This is exactly what Prop. 1 states.

Note that we only considered the implication of P1 for points x′ that are between x and CF(x). The reasoning behind this is that the points that are not between x and CF(x) do not follow the same optimal route to a CF example by definition. This is a consequence of CF(x) being a local optimum instead of a global optimum.
The second pillar, P2, can be formulated as follows.

P2: If the stability of a CFGM is determined by taking an original observation x and assessing whether, for x′ similar to x, CF(x) is similar to CF(x′), then only the x′ that have the same prediction as x can be considered.

If this does not hold, i.e. if all x′ are considered, then CF(x) and CF(x′) could belong to different classes. It is not meaningful to compare the similarity across classes for the stability measure, as it is not expected that two different classes lie arbitrarily close to each other.
4.1.2 Framework
The DBF is given in Framework 4.1. It requires an original instance, the CFGM, the number
of instances it needs to create and the chosen stability measure as input. The first step is to
obtain the CF example for the original instance x. Step two (a-c) consists of creating points
between x and CF (x) (due to P1 and Prop. 1). Step three assures that P2 is satisfied, i.e. it
removes all created instances that do not have the same predicted class as x. Subsequently, the
CF examples are generated for the remaining instances in step 4. The last step is to calculate
the value of the chosen stability measure.
The DBF enables the chosen stability measure to quantify the stability of a CFGM while
filtering out the effect that the decision boundaries of a classification model can have on the
stability value. This is done by restricting the instances that are considered to be similar to the
original instance (by using P1, P2 and Prop. 1). Hence, the DBF assures that the final stability
value is independent of how stable the underlying classification model is.
Framework 4.1: Direction based framework

Inputs:
x: the original instance (has D dimensions)
n: number of instances created per changed dimension
m: number of instances created on the direct line between x and CF(x)
r: number of instances created randomly between x and CF(x)
SM: the stability measure
CF(·): the CFGM

1. Generate CF(x) for the original instance x
2. Create points that are similar to x (while taking P1 and Prop. 1 into consideration) and store them in X′
   (a) For each dimension d in which CF(x) differs from x (i.e. x_d ≠ CF(x)_d):
       For i in 1, ..., n:
       i. x′ = x
       ii. x′_d = x_d + i × (CF(x)_d − x_d) / (n + 1)
       iii. Store x′ in X′
   (b) For i in 1, ..., m:
       i. x′ = x + i / (m + 1) × (CF(x) − x)
       ii. Store x′ in X′
   (c) For i in 1, ..., r:
       x′ = x
       For each dimension d in 1, ..., D:
       i. If x_d < CF(x)_d: x′_d = randomly sampled point in (x_d, CF(x)_d)
       ii. If x_d > CF(x)_d: x′_d = randomly sampled point in (CF(x)_d, x_d)
       Store x′ in X′
3. Eliminate all points x′ ∈ X′ that have a different predicted class than x (due to P2)
4. Generate CF examples for all instances that are still in X′
5. Calculate SM with the obtained data
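Step 2 of Framework 4.1 can be sketched in NumPy as follows; the function and variable names are our own, and steps 1 and 3-5 (generating CF examples, filtering on predicted class and evaluating SM) would wrap around it.

```python
import numpy as np

def create_similar_points(x, cf_x, n=1, m=4, r=10, rng=None):
    """Sketch of step 2 of the DBF (Framework 4.1): sample points
    between an instance x and its counterfactual example cf_x."""
    if rng is None:
        rng = np.random.default_rng(0)
    points = []
    changed = np.flatnonzero(cf_x != x)           # (a) per changed dimension
    for d in changed:
        for i in range(1, n + 1):
            xp = x.copy()
            xp[d] = x[d] + i * (cf_x[d] - x[d]) / (n + 1)
            points.append(xp)
    for i in range(1, m + 1):                     # (b) on the direct line
        points.append(x + i / (m + 1) * (cf_x - x))
    for _ in range(r):                            # (c) random in the box
        lo, hi = np.minimum(x, cf_x), np.maximum(x, cf_x)
        points.append(rng.uniform(lo, hi))
    return np.array(points)
```

Note that unchanged dimensions stay fixed in all three branches: in (c) the sampling interval collapses to a single point, matching the framework's per-dimension conditions.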
4.2 P−stability measure
In order to measure the stability of a CFGM, we propose the P−stability measure. The P−stability aims to determine whether CF(x), i.e. the CF example of an original instance x, is more likely to belong (i) to the set of CF examples that correspond to instances that are similar to x or (ii) to the set of original instances that have the same predicted value as CF(x). Instead of measuring the similarity with the Euclidean distance, as is done in the L−stability measure, the P−stability measure uses probabilities. Hence, it compares (i) the probability that a CF example of an instance belongs to the set of CF examples of similar instances with (ii) the probability that it belongs to the set of training data instances with the same prediction.
Intuitively, it is clear that the CF examples of similar original instances should be more similar to each other than to other (random) instances that have the same predicted class. The reason for this is that the second set consists of points that are scattered throughout the entire input space, while the first set is clustered if the CF explanations are stable. Hence, for a stable CFGM the first probability that is considered in the P−stability measure should be higher than the second.
In order to define the P−stability mathematically, we need to introduce some notation in addition to the notation in Section 2.2.2. Denote the set of original instances as X. For each instance from this set, i.e. ∀x ∈ X, the classification model f(·) can predict the class to which the instance belongs. Instance x is said to have predicted class f(x). The set of original instances that have the same prediction as x is denoted as X^{f(x)}, and thus it holds that X^{f(x)} ⊆ X. The set that contains the dth dimension (i.e. the dth feature) of all instances from X is denoted as X_d. Given this notation, X_d^{f(x)} is the set that contains the values of the dth dimension of the original instances with predicted class f(x). Assume that we have sampled N instances around x, which are contained in the set X^s = {x^{s,1}, x^{s,2}, ..., x^{s,N}}. Keeping the notation for the dimensions and predicted classes the same, we denote X_d^{s,f(CF(x))} as the set that contains the values of the dth dimension of the CF examples that are generated for the N instances around x and that have predicted class f(CF(x)). Recall that there are D features in total.
Given this notation, the P−stability evaluated in x is defined as

P(x) = (1/D) Σ_{d=1}^{D} 1{ p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d) }    (4.1)

where 1{·} is the indicator function that is one if the expression (in {·}) is true and zero otherwise, and p_H(·) is the probability function defined on set H. Hence, the indicator function is one if the dth dimension of the CF example of x fits better in X_d^{s,f(CF(x))} than in X_d^{f(CF(x))}, which is what is expected if the CFGM is stable.
Given the definition in eq. (4.1), we claim that the P−stability evaluated in x is contained in the interval [0, 1]. Furthermore, a higher value corresponds to more stable explanations.
Claim 1: The lower bound of the P−stability evaluated in x (see eq. (4.1)) is zero.
Proof: Each term 1{p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d)} ∈ {0, 1} and takes its lowest value (0) if p_{X_d^{f(CF(x))}}(CF(x)_d) > p_{X_d^{s,f(CF(x))}}(CF(x)_d). If this holds ∀d ∈ {1, 2, ..., D}, then Σ_{d=1}^{D} 1{·} = D × 0 = 0, and thus P(x) = (1/D) × 0 = 0.
Claim 2: The upper bound of the P−stability evaluated in x (see eq. (4.1)) is one.
Proof: Each term 1{p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d)} ∈ {0, 1} and takes its highest value (1) if p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d). If this holds ∀d ∈ {1, 2, ..., D}, then Σ_{d=1}^{D} 1{·} = D × 1 = D, and thus P(x) = (1/D) × D = 1.
Claim 3: The higher the P−stability evaluated in x (see eq. (4.1)), the more stable the explanation is.
Proof: The indicator for dimension d is 1 if p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d), i.e. if X_d^{s,f(CF(x))} is more clustered around CF(x)_d than X_d^{f(CF(x))} is, which is what we defined as a characteristic of stable explanations. If this holds for all dimensions we obtain a value of 1 (analogous to the proof of Claim 2). Conversely, the indicator is 0 if p_{X_d^{f(CF(x))}}(CF(x)_d) > p_{X_d^{s,f(CF(x))}}(CF(x)_d), i.e. if X_d^{s,f(CF(x))} is less clustered around CF(x)_d than X_d^{f(CF(x))} is (random points are more similar to CF(x)), which is what we defined as a characteristic of unstable explanations. If this holds for all dimensions we obtain a value of 0 (analogous to the proof of Claim 1). Since P(x) is increasing in each indicator, and an indicator equal to one corresponds to a dimension in which the explanation is stable, the more dimensions are stable (and thus the more stable the explanation), the higher P(x).
Note that the probability function in eq. (4.1) can be chosen to be the empirical distribution. Another option is to fit a parametric probability distribution, for example a Gaussian or Student's t distribution. This, however, requires a comparison between the possible probability distributions on a theoretical (i.e. academic literature) and a practical level (i.e. AIC and BIC values). Hence, this thesis uses the empirical distribution (statsmodels package1) as this is straightforward for different datasets and features.
Furthermore, note that eq. (4.1) shows that we compare the distribution per feature dimension instead of using a multivariate probability distribution for the entire instance. The main reason for this is tractability. The multivariate distribution becomes less accessible than the univariate distributions when the number of features increases as the (theoretical) relationships between the features need to be taken into account.
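A minimal sketch of eqs. (4.1) and (4.2) is given below. As a stand-in for the probability function p_H we use a crude window-based empirical density (the fraction of sample points within a small window of the evaluation value); the thesis itself uses the empirical distribution from statsmodels, so this density and its bandwidth are simplifying assumptions of our own.

```python
import numpy as np

def empirical_p(sample, value, bandwidth=0.05):
    """Crude empirical probability of `value` under `sample`:
    the fraction of points within +/- bandwidth of `value`."""
    return float(np.mean(np.abs(np.asarray(sample) - value) <= bandwidth))

def p_stability_at(cf_x, X_orig_same_class, X_cf_sampled):
    """P-stability at x (eq. 4.1): the fraction of dimensions in which
    CF(x)_d fits the sampled-CF set better than the original-data set."""
    D = len(cf_x)
    hits = sum(
        empirical_p(X_orig_same_class[:, d], cf_x[d])
        <= empirical_p(X_cf_sampled[:, d], cf_x[d])
        for d in range(D)
    )
    return hits / D

def p_stability(per_instance_values):
    """Dataset-level P-stability (eq. 4.2): the mean over all instances."""
    return float(np.mean(per_instance_values))
```

With CF examples of neighbours tightly clustered around CF(x) and the original same-class instances spread over the input space, every dimension's indicator is one and `p_stability_at` returns 1, matching the intuition behind the measure.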
Given the P−stability measure evaluated in x, the P−stability measure for the entire dataset X = {x_1, x_2, ..., x_O} is defined as

P = (1/|X|) Σ_{j=1}^{|X|} P(x_j)    (4.2)

where |X| = O stands for the number of instances that X contains. As with eq. (4.1), the value of eq. (4.2) is contained in the interval [0, 1] and a higher value corresponds to a more stable CFGM. The reason for this is that a high value means that for many instances most dimensions fit better in the CF set than in the set of the original data. The proofs of these claims are analogous to the proofs for P(x); hence they are shown in Appendix A.1.
The P−stability measure as defined in eq. (4.2) differs from the Jaccard index and the L−stability in the following ways. First, the P−stability is, unlike the Jaccard index, applicable to any CFGM. The reason for this is that the measure only requires the final CF example for each original and sampled instance. Furthermore, only one CF example is needed per instance. Second, unlike the L−stability measure, the P−stability measure can be applied to compare the stability of different CFGMs across different datasets and classification models. This is a consequence of the P−stability being bounded on the interval [0, 1], where 1 corresponds to the most stable CFGM (regardless of the setting). Third, the P−stability differs from the L−stability in that it utilises probabilities instead of a distance function in order to evaluate to what extent CF examples are similar. Hence, no assumptions are needed on the preferred type of change, i.e. (i) only a few adjustments or (ii) only small adjustments. Furthermore, the P−stability does not favour any CFGM based on the distance function it uses in order to find the optimal CF example, as the measure is not defined based on a distance function.
Chapter 5
Experimental setup
The experimental setup needed to answer RQ1 (defined in Section 1.1) consists of evaluating four CFGMs on two datasets and with two classification models. This results in 16 experiments. Each part of the experimental setup is considered in this chapter. Subsequently, the experimental setup for RQ3 (defined in Section 1.1) is elaborated on. This consists of the implementation of the L−stability measure and the setup for the synthetic experiment.
5.1 Comparison among CFGMs
RQ1 requires us to compare CFGMs based on:
(a) the number of instances for which a CF example is found,
(b) the computational time,
(c) the similarity between the CF examples in terms of position with respect to the original instance (i.e. direction, distance, number of changed features, the magnitude of the changes and how often the changes have no effect on the classification model),
(d) the stability of the CFGM itself.
The measures in (a), (b) and (c) are straightforward to implement in Python. In order to analyse the fourth point, (d), this thesis uses the P−stability measure and the DBF, which are also implemented in Python. In order to limit the number of instances for which a CF example needs to be calculated, this thesis uses the following settings for the DBF (see Framework 4.1). For each instance, one new instance is created per changed dimension (i.e. n = 1), four instances are created on the line directly between the original instance and the corresponding CF example (i.e. m = 4) and ten instances are created randomly (i.e. r = 10). Note that alternative values can be used for these hyperparameters.
For the probability distributions in the P−stability measure (see eq. (4.1)) this thesis utilises the empirical distribution from the statsmodels package1. The P−stability is calculated for the test sets of the datasets. This implies that the CFGMs create CF examples for the same instances in step one of the DBF (see Framework 4.1). The created CF examples do differ per CFGM, dataset and classification model. Hence, the points that are created between the original instances and the corresponding CF examples (i.e. step two of the DBF) differ per experiment. Note that a maximum of 1,500 instances is used to calculate the stability on, as this can already result in more than 66,000 instances for which a CF example should be generated.2
1See: https://www.statsmodels.org/stable/index.html
2 The over 66,000 instances result from applying the DBF with the hyperparameters n = 1, m = 4, r = 10 on 1,500 initial instances.
5.1.1 Data
The first dataset is the credit card transactions dataset from Kaggle.3 The data was collected over two days in 2013. Some of these transactions were not made by the owners of the cards; these are fraudulent transactions. The goal of the classification model is to identify these fraudulent credit card transactions. In total this dataset contains 284,807 transactions, of which only 0.172%, i.e. 492 transactions, are fraudulent. Due to confidentiality considerations the suppliers of the data performed a principal component analysis (PCA) transformation before making the data available on Kaggle. This results in 28 of the 30 features that the dataset has in total. The other two variables are Time and Amount. The former measures the number of seconds between the transaction and the first transaction in the dataset. The latter is the transaction amount. Hence, all variables are numerical.
The second dataset is the HELOC dataset, which is also used by Kanamori et al. (2020) and Lucic et al. (2020).4 The data was provided for the FICO Explainable Machine Learning Challenge in 2017. HELOC stands for Home Equity Line of Credit, which is a type of credit that banks provide as a percentage of home equity. Home equity at time t is the amount by which the market value of the home at t exceeds the price it was purchased for. In the provided dataset, the amounts range from $5,000 to $150,000. The goal of the classification model is to predict whether customers will repay their HELOC loan. In total the dataset contains 10,459 observations, of which roughly 50% are not able to repay their loan. Hence, in contrast to the first dataset, this dataset is balanced. The number of variables is in the same order of magnitude, as this dataset contains 23 (numerical) features.
All features in both datasets are scaled between zero and one, which is feasible as there are no categorical variables. The reason for this transformation is that the implementation of the FOCUS method restricts the CF examples to have feature values between zero and one. This implies that FOCUS might not find CF explanations for certain instances if the features are not scaled between zero and one.
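The min-max scaling step can be sketched as follows; fitting the per-feature bounds on the training split and reusing them for the test split is our assumption of the usual convention, as the thesis does not spell this detail out, and the toy values below merely echo the HELOC amount range.

```python
import numpy as np

def fit_min_max(X_train):
    """Return per-feature bounds computed from the training data."""
    return X_train.min(axis=0), X_train.max(axis=0)

def min_max_scale(X, lo, hi):
    """Scale each feature linearly so the training range maps to [0, 1]."""
    return (X - lo) / (hi - lo)

# Toy example with two numerical features (e.g. a loan amount and a count).
X_train = np.array([[5_000.0, 2.0], [150_000.0, 23.0], [77_500.0, 12.0]])
lo, hi = fit_min_max(X_train)
X_scaled = min_max_scale(X_train, lo, hi)
```

After scaling, every training feature lies in [0, 1], which is exactly the box in which the FOCUS implementation searches for CF examples.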
5.1.2 Classification models
For each dataset two classification models are considered, namely a decision tree and a random forest. We selected tree-based models as these models are frequently used with tabular data which is in turn common in the banking industry (Bedeley and Iyer, 2014). The decision tree is chosen to be shallow (maximum depth of three) in order to keep it interpretable. The random forest is restricted to a maximum of 20 trees (each with a maximum depth of three) as to reduce the computational cost.
Both classification models are trained on the same data, as each dataset is randomly split once into a train set (70%) and a test set (30%). The training is performed without cross validation and with the default settings of the sklearn package5, except for the number of trees and the maximum depth. Note that we do not tune the hyperparameters of the classification models (which would increase the accuracy), as this is not the main point of this thesis. For the same reason, the degree of data imbalance is also not taken into consideration.
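The training setup described above can be reproduced with a few lines of scikit-learn; the toy data and the random_state below are placeholders of our own, since the thesis uses a single random 70/30 split of the real (scaled) datasets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for a scaled dataset with a binary target.
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [1.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Shallow tree (interpretable) and small forest (cheap); defaults otherwise.
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=20, max_depth=3,
                                random_state=0).fit(X_train, y_train)
```

The depth-3 cap keeps the single tree readable, while the 20-tree cap keeps the forest (and the later CF generation over its trees) computationally manageable.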
The accuracy of both classification models for both test sets is shown in Table 5.1.
3 Data from: https://www.kaggle.com/mlg-ulb/creditcardfraud
4 Data from: https://community.fico.com/s/explainable-machine-learning-challenge
5