MSc Artificial Intelligence
Master Thesis
Examining the stability of counterfactual
explanations
by
Puja Chandrikasingh
11059842
January 8, 2021
36 ECTS
July 2020 - January 2021
Supervisors:
Ana Lucic, MSc
Dr. Flavia Barsotti
Assessor:
Prof. Dr. Hinda Haned
This thesis was written during a 6-month internship at ING.
ING Contact: Flavia Barsotti, Research Coordinator, Model Risk Oversight, Model Risk Management, Flavia.Barsotti@ing.com
Disclaimer: The views expressed in the thesis are solely those of the author and supervisors
and do not represent the views of ING Bank.
Abstract
In the context of Explainable Artificial Intelligence (XAI), this thesis focuses on the stability
of counterfactual (CF) explanations. The study compares four existing CF generating methods
(CFGMs) based on different criteria related to both stability and implementation aspects. The
experiments are conducted on two publicly available financial datasets, consisting of data on
loan repayments and on credit card transactions, using two tree-based classification models. In
order to assess the stability of the explanations of the CFGMs, we propose a novel stability
framework that intends to provide a rigorous analysis of the stability of CF explanations, which
is an open research area in the field of XAI. The proposed stability framework consists of
(i) our direction based framework (DBF) and (ii) our P−stability measure. We compare the
proposed P−stability measure (based on probabilities) with an existing stability measure (based
on a specific distance function). The experiments performed on the two financial datasets and
a synthetic dataset indicate that, in contrast to the existing measure, the P−stability measure
can distinguish between different levels of stability. Furthermore, the results on the two financial
datasets suggest that the CF explanations of the tested CFGMs are quite similar in terms of
position (i.e. direction and distance) with respect to the original instance. However, there are,
at least for the tested settings, large differences in (i) the computational time, (ii) the number
of instances for which a CF explanation can be found and (iii) the stability of the explanations
according to the proposed stability framework.
Acknowledgements
I would like to express my gratitude to Ana Lucic, who supervised me on behalf of the University
of Amsterdam. Her enthusiasm and devotion guided and encouraged me to get the most out of
my thesis. I took great pleasure in the brainstorm sessions in which she shared her extensive
knowledge in the field of XAI with me.
Moreover, I want to thank Prof. Dr. Hinda Haned from the University of Amsterdam for
taking the time to assess my thesis. Furthermore, I am grateful to Chiel Bakkeren, the head
of my team at ING, for his belief in me, as he gave me the opportunity to write two theses
(AI and Actuarial Science) at ING. I would also like to thank the entire Data Science Model
Validation team at ING for their kindness and sincere interest in my research while we were
mostly working from home due to the turbulent times of the COVID-19 pandemic.
I would particularly like to thank Dr. Flavia Barsotti, who supervised me on behalf of ING.
She worked tirelessly and spared neither time nor effort to assist and advise me on both my
thesis and my future career. In addition to learning a lot about research in the banking sector, I
also learned a lot from her about working in the financial industry.
Lastly, I would like to thank my parents and my siblings for providing me with comfort and
warmth during my entire academic career. I took great pleasure in working at the same desk as
my little sister, Jaya, who stimulated me to reach for the stars and at the same time have a lot
of fun.
Contents

1 Introduction
  1.1 Research questions
2 Theory behind counterfactual explanations
  2.1 Counterfactual explanations in XAI
  2.2 How to obtain counterfactual explanations?
    2.2.1 Counterfactual generating methods
    2.2.2 Objective function for nearest counterfactual
3 Related Work
  3.1 Jaccard index
  3.2 L−stability
4 Method
  4.1 Direction based framework
    4.1.1 Pillars
    4.1.2 Framework
  4.2 P−stability measure
5 Experimental setup
  5.1 Comparison among CFGMs
    5.1.1 Data
    5.1.2 Classification models
    5.1.3 CFGMs
  5.2 Comparison of stability measures
    5.2.1 L−stability implementation
    5.2.2 Synthetic experiment
6 Results
  6.1 Comparison among CFGMs
  6.2 P−stability versus L−stability
7 Discussion & Conclusion
  7.1 Main findings concerning CF explanations & their stability
  7.2 Future research directions
    7.2.1 Potential impact
    7.2.2 Limitations
A Mathematical proofs
  A.1 P−stability measure
  A.2 Synthetic CFGMs in the synthetic experiment
B Additional graphs
  B.1 Decision tree
  B.2 Synthetic data
Chapter 1
Introduction
According to Bughin et al. (2018) from the McKinsey Global Institute, implementing Artificial
Intelligence (AI) in daily business could deliver $13 trillion in value by 2030. Chui
(2017) shows that the implementation has already started in various industries ranging from
health care (Ahsan et al., 2020; Wang and Preininger, 2019) to e-commerce (Adam et al., 2020)
to mobility (Karpathy, 2019) to (social) media (Covington et al., 2016; Amatriain, 2013; Bakshy
et al., 2015). This thesis focuses on the financial industry as it considers publicly available datasets
based on applications from the banking sector. The banking sector is already embedding AI for
modelling and has the potential to progressively increase its use for different types of models. As
an example, some banks already use AI applications for the development of credit score models
as discussed by Königstorfer and Thalmann (2020).
As with every new invention or technology, AI first needs to be accepted and trusted before
its full potential can be reached (Fernández et al., 2019). This requires transparency among
other principles (Shin and Park, 2019; Jobin et al., 2019; Toreini et al., 2020). Transparency
measures the degree to which the model is understandable and explainable (Shin and Park, 2019;
Olteanu et al., 2019). The importance of transparency for trust in models is strictly connected
to the research field of Explainable AI (XAI), which is becoming increasingly important as
regulation has recognised the right to an explanation. This means that companies are obliged to
provide the customer with an explanation for an (adverse) decision made by algorithms used
by the company. The Fair Credit Reporting Act (FCRA) in the United States and the General
Data Protection Regulation (GDPR) in Europe are examples of such regulations (Wachter et al.,
2017).
The explanation methods within the field of XAI can be classified as global or local methods
(Das and Rad, 2020). Global methods consider how a model behaves in general, while local
methods aim to explain why in a specific case a specific prediction is made. Given a local
perspective, one type of explanation that could potentially address a set of the new regulatory
demands is the counterfactual (CF) explanation (Barocas et al., 2019; Wachter et al., 2017). Lucic
et al. (2020) define these explanations as the minimal perturbation to an input instance such that
the prediction changes. Hence, a CF explanation is specific to one instance and it argues how the
instance should have been different in order to get classified as another (opposite) class. The CF
explanation is based on the difference between the original instance and a CF example. A CF
example is an instance that is as similar as possible to the original instance but has a different
prediction. Karimi et al. (2020a) provide a survey of all CF generating methods (CFGMs).
Barocas et al. (2019) argue that the potential to satisfy new regulation is one of the drivers
behind the fast-growing body of literature available on CFGMs. Besides this, there are more
reasons for the interest in obtaining CF explanations. First, Fernández et al. (2019) illustrate
that humans can intuitively understand CF explanations as they are constructed parallel to the
way humans think. Second, CF explanations use fewer features than other explanations in order
to clarify a prediction as discussed by Van der Waa et al. (2018). They argue that this implies
that a more precise distinction can be made between the differences across classes.
The main focus of this thesis is on the stability of CFGMs. If a CFGM is stable, each explanation
should be coherent locally, i.e. its neighbours should have similar explanations (Laugel
et al., 2019). In other words, the stability of a CFGM should measure the degree of similarity
between the CF examples (and thus CF explanations) of comparable instances. Hence, the more
similar the CF examples of comparable instances are, the more stable the CFGM is.
Having stable CF explanations is important from a banking perspective as it is linked to the
trust in these explanations. Laugel et al. (2019) highlight the importance of trust in explanations
if they are for example used to explain to customers the decision made in a loan application
setting (e.g. similar individuals should expect similar explanations). Furthermore, Tyler et al.
(2007) and Cvetkovich et al. (2002) discuss how the lack of trust can negatively impact the
industry.
Moreover, measuring the stability of CFGMs is an open research area. The reason for this
is that there is no well-defined measure that captures the stability of CFGMs (Laugel et al.,
2019). This is striking as stability is essential for all interpretability methods (Laugel et al., 2019;
Alvarez-Melis and Jaakkola, 2018b). To the best of our knowledge there are at this moment
only two stability measures that can be applied to CFGMs, namely the Jaccard index and the
L−stability measure. The Jaccard index, which is defined by Ivchenko and Honov (1998), can
be used to compare sets (of CF examples or changed features) with each other. Hence, this
measure is not applicable to all CFGMs since it needs several CF explanations per instance
(Guidotti et al., 2019). Many methods, however, only provide one optimal CF explanation per
instance.
The other stability measure, the L−stability measure, is based on Lipschitz continuity:
it compares the Euclidean distance between similar instances to the Euclidean distance
between the corresponding CF examples. There are three main shortcomings of this method:
(i) it is influenced by the stability of the underlying classification model, which makes it difficult
to determine where the (in)stability comes from (Laugel et al., 2019), (ii) it has no ideal value
(Alvarez-Melis and Jaakkola, 2018b) and (iii) it implicitly assumes that the Euclidean distance
is the best measure of similarity, which is not necessarily the case.
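To make the ratio-of-distances idea behind L−stability concrete, it can be sketched as follows. This is an illustrative implementation under our own assumptions (the neighbourhood and the toy `cfgm` function are hypothetical), not the exact formulation evaluated later in the thesis:

```python
import numpy as np

def l_stability(x, neighbours, cfgm):
    """Illustrative Lipschitz-style ratio: the largest ratio of the Euclidean
    distance between CF examples to the Euclidean distance between the
    corresponding original instances. Larger values suggest less stability."""
    cf_x = cfgm(x)
    ratios = []
    for x_prime in neighbours:
        d_inputs = np.linalg.norm(x - x_prime)
        if d_inputs == 0:
            continue  # identical instances carry no information here
        d_cfs = np.linalg.norm(cf_x - cfgm(x_prime))
        ratios.append(d_cfs / d_inputs)
    return max(ratios)

# Toy CFGM that shifts every instance by a constant vector: perfectly
# stable, so every ratio (and hence the maximum) equals 1.
toy_cfgm = lambda z: z + np.array([1.0, 0.0])
x = np.array([0.0, 0.0])
neigh = [np.array([0.1, 0.0]), np.array([0.0, 0.2])]
print(l_stability(x, neigh, toy_cfgm))  # → 1.0
```

Note that the ratio has no ideal value, which illustrates shortcoming (ii) above: a result of, say, 3.0 cannot on its own be labelled stable or unstable.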
Besides the stability of CFGMs, several criteria related to the implementation aspects are
important for a bank. For example, regulators have given all customers the right to receive an
explanation when they ask for it (e.g. FCRA and GDPR). This implies that it is crucial to
have a method that can find an explanation for each individual. For CFGMs, this boils down
to the requirement that the CFGM should be able to find a CF explanation for each instance.
Additionally, it is desirable that the CF explanation can be found within reasonable time limits
(considering IT and computational time constraints). Lastly, the position of the CF explanation
in the feature space is important, as a company should consider which CFGM provides
explanations that are most relevant or easiest to work with for its customers and employees.
1.1
Research questions
Given the bank’s perspective, the leading research question in this thesis is defined as follows.
RQ1 How do existing CFGMs differ in terms of:
(a) the number of instances for which a CF example is found;
(b) the computational time;
(c) the similarity between the CF examples in terms of position with respect to the original
instance (i.e. direction, distance, number of changed features, the magnitude of the
changes and how often the changes have no effect on the classification model);
(d) the stability of the CFGM itself.
In particular, we look at the four CFGMs that are given below.
1. Contrastive Explanations with Local Foil Trees, which is introduced by Van der Waa et al.
(2018). We denote it as CE. This method is model-agnostic (i.e. it can be used for all
classification models) and heuristic (Karimi et al., 2020a).
2. Actionable Feature Tweaking, which is proposed by Tolomei et al. (2017). We abbreviate
this method as FT. The method is model-specific (i.e. it cannot be used for all
classification models) and heuristic (Karimi et al., 2020a).
3. Flexible Optimizable CoUnterfactual Examples for Tree EnsembleS, which is constructed
by Lucic et al. (2020) and by them denoted as FOCUS. This method utilises gradient-based
optimisation and is not model-agnostic (Karimi et al., 2020a).
4. Distribution-Aware Counterfactual Explanation, which is developed by Kanamori et al.
(2020) and by them abbreviated as DACE. This method is model-specific and it utilises
MILP-based optimisation techniques (Karimi et al., 2020a).
We analyse the stability of the listed CFGMs combined with tree-based classification models.
The reason for this choice is that these models are often used with tabular data, which is common
in the banking industry (Bedeley and Iyer, 2014).
In order to evaluate the fourth criterion in RQ1 we propose a novel measure for stability
that can be applied to any CFGM. This is considered in the following research question.
RQ2 How can we construct a stability framework that is able to (i) assess the stability of all
CFGMs (ii) without being influenced by the classification model and (iii) without assuming
a specific distance function?
In this thesis we propose a direction based framework (DBF) to filter out the effect of the
classification model, and the P−stability measure, which is based on probabilities instead of
distance functions, to assess the stability of a CFGM. The P−stability measure compares (i) the
probability that a CF example of an instance belongs to the set of CF examples of similar instances with
(ii) the probability that it belongs to the set of training data instances with the same prediction.
For a stable CF generating method, the first probability should be higher than the second, as
the CF examples of comparable instances should be more similar to each other than to other
(random) instances that have the same predicted class.
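As a purely schematic illustration of this comparison (not the formal P−stability definition, which is developed in Chapter 4), one could contrast a naive density estimate of the CF example under the two sets. The function names, kernel, and bandwidth below are all our own illustrative choices:

```python
import numpy as np

def gaussian_density(point, sample, bandwidth=0.5):
    """Naive Gaussian kernel density estimate of `sample` evaluated at `point`."""
    sq_dists = np.sum((sample - point) ** 2, axis=1)
    return np.mean(np.exp(-sq_dists / (2 * bandwidth ** 2)))

def p_stable_indicator(cf_x, cf_neighbours, same_class_train):
    """Schematic check: is the CF example more 'probable' under (i) the set of
    CF examples of similar instances than under (ii) the set of training
    instances with the same prediction?"""
    p_neigh = gaussian_density(cf_x, cf_neighbours)
    p_train = gaussian_density(cf_x, same_class_train)
    return p_neigh > p_train

# CF examples of neighbouring instances cluster tightly around cf_x, while
# the same-class training data is spread out: the method looks stable here.
cf_x = np.array([1.0, 1.0])
cf_neigh = np.array([[1.1, 1.0], [0.9, 1.1], [1.0, 0.9]])
train = np.array([[4.0, 4.0], [5.0, -3.0], [-2.0, 6.0]])
print(p_stable_indicator(cf_x, cf_neigh, train))  # → True
```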
To evaluate our novel stability framework and the proposed stability measure in particular,
we examine RQ3.
RQ3 How does the P−stability measure perform compared to the L−stability measure in terms
of ability to differentiate between stable and unstable CFGMs?
The remainder of this thesis is organised as follows. Chapter 2 dives into the theory behind
CF explanations. Chapter 3 discusses the existing stability measures. Thereafter, we propose
our own stability framework in Chapter 4 in order to answer RQ2. Subsequently, we describe
the experimental setup needed for RQ1 and RQ3 in Chapter 5. The results for these two
research questions are elaborated on in Chapter 6. Finally, Chapter 7 presents the conclusion
and discussion.
Chapter 2
Theory behind counterfactual
explanations
This chapter elaborates on CF explanations and their position in the field of XAI. This broadens
the scope to which the stability framework can be applied. Subsequently, this chapter briefly
discusses different methods that can be used to obtain CF explanations in order to illustrate the
type of methods for which the stability needs to be determined.
2.1
Counterfactual explanations in XAI
CF explanations are used to clarify the behaviour of AI classification models, i.e. they explain
why a classification model makes a certain prediction for a specific instance. In order to construct
a CF explanation, we need a CF example which can be found by a CFGM. A CF example of
instance x, denoted as CF (x), is defined as an instance that is similar to x but has an alternative
prediction. The difference between the original instance and its CF example is defined as the CF
explanation. The mathematical version of the CF explanation, i.e. CF explanation = CF (x)−x,
can be translated into a sentence. In a loan approval situation, this could for example result
in explanations such as (1) the client’s loan is not approved because the income is $10,000 too low, or
(2) if the client’s income was $10,000 higher, the loan would have been approved. This is a very
simple example; in reality more than one variable might need to change in order to flip the
prediction. These changes would then be added to the explanation.
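The arithmetic behind such a sentence, CF explanation = CF(x) − x, can be sketched in a few lines; the feature names and dollar amounts below are invented for illustration:

```python
import numpy as np

# Hypothetical loan applicant encoded as [income in $, number of open loans]
x = np.array([40_000.0, 2.0])

# CF example returned by some CFGM: the closest instance whose loan
# would have been approved
cf_x = np.array([50_000.0, 2.0])

# The CF explanation is the difference between the CF example and x:
# here only the income changes, by +$10,000
explanation = cf_x - x

# Translate the non-zero changes into a sentence
for name, delta in zip(["income", "open loans"], explanation):
    if delta != 0:
        print(f"If the client's {name} was ${delta:,.0f} higher, "
              "the loan would have been approved.")
```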
Within the field of XAI there are other explanation methods that give explanations similar
to CF explanations. For example, adversarial perturbations, also known as adversarial examples
(AEs) or adversarial attacks, and CF explanations have in common that they solve the
same problem, i.e. finding the minimal perturbation needed to flip the prediction of an instance
(Freiesleben, 2020). In addition, Wachter et al. (2017) state that AEs are essentially CF
explanations for instances that are being classified with a deep neural network (NN), such as
resnet.
A difference between AEs and CF explanations is the purpose that they have as described by
Karimi et al. (2019a) and Freiesleben (2020). AEs aim to fool a classifier, while CF explanations
aim to explain the classifier in terms of what must happen in order to change the prediction.
In other words, AEs assess the robustness of a classification model rather than explaining it
(Ustun et al., 2019). Another difference between CF explanations and AEs is that the latter
are frequently used with images while the former are used on tabular data (Karimi et al., 2019a).
This thesis focuses on the use of tabular data. Hence, we do not evaluate AE methods with
the proposed stability framework. Note, however, that any stability metric that is defined for
CFGMs can be applied to methods that generate AEs as their structures are similar.
Another related form of explanations is algorithmic recourse, which offers the same type of
explanation as CF explanations. The difference between explanations based on CFs and the
ones based on recourse is that the methods of algorithmic recourse focus on finding a feasible
example (i.e. an example that is in line with, and can happen in, the real world), while a majority
of the CFGMs focus on finding the most similar example (Karimi et al., 2020b). Note that
it is debatable whether we should target feasibility as what is feasible can differ per instance
(Venkatasubramanian and Alfano, 2020). The methods, which are used to generate algorithmic
recourse examples, are closely related to CFGMs and therefore recourse methods can also be
evaluated with stability measures that are made for CFGMs. A difference between the methods
is that the algorithmic recourse methods often require expert knowledge in order to determine
what is and is not “feasible”. Hence, we focus on CF explanations.
Besides methods that provide explanations that are similar to CF explanations, there is a
wide range of other methods within the field of XAI. First, CFGMs belong to the class of local
XAI methods. Local methods aim to explain why, in a specific case, a specific prediction is made.
Global methods, on the other hand, consider how a model behaves in general. An example of
another local explanation method is the gradient-based saliency map for Convolutional Neural
Networks (ConvNets) introduced by Simonyan et al. (2013). These maps can explain a certain
prediction by showing on which pixels the prediction was based. An example of a global method
is the Global Attribution Mapping (GAM) which is proposed by Ibrahim et al. (2019) and ranks
the most important features for the entire model.
Second, CF explanations are post-hoc explanations. This means that the explanations are
obtained by analysing the output of a model after it has been trained (Das and Rad, 2020). This
is different from intrinsic explanations, as for this type of explanation the model is intrinsically
interpretable (such as a linear regression). The Neural Additive Model (NAM) introduced by
Agarwal et al. (2020) is an example of a method that uses intrinsic explanations. This method
restricts the structure of NNs by using a NN for each input variable. This ensures that there
is no interaction between features, which implies that the effect of a feature can be explained
directly. The other methods that are mentioned in this section are all post-hoc (Das and Rad,
2020).
Aside from being local or global and post-hoc or intrinsic, an explanation method can be
model-agnostic or model specific (Das and Rad, 2020). Model-specific explanation methods can
generate explanations for one specific model type only as they typically use the inner working of
the model. Model-agnostic explanation methods, on the other hand, can generate explanations
for all model types. These methods are usually query-based so that they do not use the inner
workings of the model. The GAM method is an example of a model-agnostic method as it can
generate global explanations for all model types and not only NNs. An example of a
model-specific explanation method is NAM (Das and Rad, 2020). CFGMs can be both model-agnostic
and model-specific. Note that more information about the different types of XAI methods can be
found in the work of Das and Rad (2020), as they consider and classify all XAI methods from
2007 to 2020.
2.2
How to obtain counterfactual explanations?
There are over fifty CFGMs as can be seen from the survey paper of Karimi et al. (2020a). In
this thesis, we examine four methods that differ in the chosen approach for finding the minimal
perturbation, i.e. the CF explanation. This section considers other approaches as well in order
to clarify that the proposed stability framework can be used for any CFGM. The underlying
reason for this is that all CFGMs, regardless of the chosen search approach, need to choose an
optimal CF example for an instance. In order to separate the optimal CF example from the
other examples, an objective function is required. This is one of the assumptions of the proposed
stability framework. After discussing the different search approaches, this chapter considers some
objective functions that can be used. The difference in what is being minimised by the objective
functions motivates why the stability framework utilises probability distributions instead of
distance functions in order to capture similarity.
2.2.1
Counterfactual generating methods
The approach that is used most often to find CF examples is the heuristic approach as can be
seen from the survey of Karimi et al. (2020a). Two CFGMs that use this approach are the CE
method introduced by Van der Waa et al. (2018) and the FT method proposed by Tolomei et al.
(2017). The stability of these methods is evaluated in this thesis. Section 5.1.3 considers these
methods in more detail. Another example of a heuristic method is the CFGM defined by Laugel
et al. (2017) that uses the two-step heuristic growing spheres algorithm.
Besides the utilisation of heuristics, CF explanations can also be found by applying an
optimisation-based approach. This thesis considers one method that uses gradient-based
optimisation (FOCUS) and one that uses (mixed-)integer linear programming (i.e. (M)ILP) based
optimisation (DACE; see Section 5.1.3). The latter has been introduced in order to solve the
problem of finding the closest CF example while taking certain constraints into consideration.
For example, Ustun et al. (2019) have used this approach in order to force the CF example to
be feasible, i.e. to be consistent with a recourse example.
Using a gradient-based optimisation approach implies that CF examples are generated by
utilising the gradients of a loss function (an objective function). This is done by Wachter et al.
(2017) with a rather simple loss function consisting of a part that captures the distance and a part
that forces the prediction to flip. Mothilal et al. (2020) use an extended distance function that promotes
diverse CF examples. Note that in principle gradient-based CFGMs are not model-agnostic as
the classification model needs to be differentiable. Hence, in general these methods will not work
for tree-based classification models. The method evaluated in this thesis (FOCUS) is, however,
applicable to tree-based classification models. Section 5.1.3 elaborates on this.
Both the (M)ILP-based and the gradient-based optimisation approaches are often used in
CFGMs, as can be seen in the survey constructed by Karimi et al. (2020a). Another approach
that is used multiple times is the genetic algorithm approach. For instance, Sharma et al. (2019)
apply a genetic algorithm to directly solve an objective function with the constraint that the
prediction is flipped for the CF example.
Two other approaches are applied by Karimi et al. (2019b) and Poyiadzi et al. (2020) among
others. Karimi et al. (2019b) constructed MACE (model-agnostic counterfactual explanations)
that consists of transforming a minimisation problem with an objective function plus constraints
to a sequence of logic formulae. This means that the problem is turned into several satisfiability
(SAT) problems that are solved with a SAT solver (instead of solving the objective and
con-straints with a MILP application). The method suggested by Poyiadzi et al. (2020), FACE, is
based on graph theory. The method finds the optimal CF example by constructing the shortest
path through the high density regions of the input space. The authors argue that this leads to
feasible and actionable CF examples.
2.2.2
Objective function for nearest counterfactual
Before we define the objective function that is used to find the optimal CF example, we discuss
the notation used in this thesis. The original instance is denoted as x, which is a vector that
contains the characteristics or input features of the instance. The CF example of x is denoted as
CF(x). Note that CF(x) results from the CFGM. This implies that the CFGM can be seen as
a function CF(·). Each observation has D dimensions and the value of dimension i of instance
x (respectively CF(x)) is denoted as x_i (respectively CF(x)_i). Furthermore, the classification
model M is probabilistic: for each x there is a probability that it belongs to class y, where the
probability is denoted as M(y|x). Lastly, given a condition a, the indicator function 1_{a} is one
if condition a holds and zero otherwise.
In general, the objective function of the CFGMs, i.e. the loss function that is being minimised
by these methods, can be defined as

    L_objective = L_flip + L_similarity + L_additional constraints    (2.1)

where L_flip is the part of the objective function that stimulates the prediction of the CF example
to be opposite to the prediction of the original instance, L_similarity is the part that aims to
make the CF example and the original instance as similar as possible, and L_additional constraints
is the part that stimulates the CF example to possess some characteristic such as feasibility or
actionability. In all CFGMs the second term, L_similarity, is directly given in the objective function
that is being minimised. This term is usually equivalent to a distance function. The other two
terms can be included in the objective (e.g. in gradient-based methods) or as constraints (e.g.
in (M)ILP approaches).
Mathematically, the first term of eq. (2.1) can be defined as follows. In case L_flip is incorporated
as a constraint, then the constraint can, as in the work of Sharma et al. (2019), be defined as:

    argmax_y M(y|x) ≠ argmax_y M(y|CF(x)).

If by contrast the first term is part of the objective, then this part can be represented by the
hinge loss as argued by Lucic et al. (2020):

    1_{argmax_y M(y|x) = argmax_y M(y|CF(x))} · M(y|CF(x)).
Hence, this loss is active as long as the prediction has not flipped, which stimulates the CF
example to move in the direction for which the prediction will flip. Another possibility is to
use the squared difference between the actual predicted class (or probability value) of the CF
example and the class (or probability value) it should have, as is done by Wachter et al. (2017).
In order to keep pushing the prediction to flip, they multiply this part of the loss with a scalar
that they increase in each round of the minimisation process. Hence, this loss is also active
as long as the prediction of the CF example is not opposite to the prediction of the original
instance.
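The hinge-style L_flip described above can be sketched as follows for a probabilistic binary classifier; this is an illustrative reading of the formula, in which we take the probability of the class currently predicted for the CF example (our own simplification):

```python
import numpy as np

def flip_loss(probs_x, probs_cf):
    """Hinge-style L_flip: active (equal to the probability of the CF
    example's predicted class) only while the CF example still has the
    same predicted class as the original instance; zero once the
    prediction has flipped."""
    same_class = np.argmax(probs_x) == np.argmax(probs_cf)
    return float(probs_cf[np.argmax(probs_cf)]) if same_class else 0.0

# Prediction not yet flipped: loss is active and pushes the probability down
print(flip_loss(np.array([0.9, 0.1]), np.array([0.7, 0.3])))  # → 0.7
# Prediction flipped: loss vanishes
print(flip_loss(np.array([0.9, 0.1]), np.array([0.4, 0.6])))  # → 0.0
```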
Euclidean (L2-norm)
    Formula: sqrt((CF(x) − x)^T (CF(x) − x)) = sqrt(Σ_i (CF(x)_i − x_i)^2)
    Goal is to minimise: the absolute value of the change in each feature
    As in: Lucic et al. (2020)

Weighted squared L2-norm
    Formula: Σ_i (1 / std_i) × (CF(x)_i − x_i)^2
    Goal is to minimise: the absolute value of the change in each feature while accounting for the standard deviation (std)
    As in: Wachter et al. (2017)

Mahalanobis with covariance Σ (standardised L2-norm if Σ is diagonal)
    Formula: sqrt((CF(x) − x)^T Σ (CF(x) − x))
    Goal is to minimise: the distance (dissimilarity) between the points while accounting for the distribution they are from
    As in: Kanamori et al. (2020); Lucic et al. (2020)

Cosine
    Formula: x^T CF(x) / (||x|| · ||CF(x)||) = Σ_i x_i CF(x)_i / (sqrt(Σ_i x_i^2) sqrt(Σ_i CF(x)_i^2))
    Goal is to minimise: the change in the relationship between features
    As in: Lucic et al. (2020)

Weighted Manhattan (L1-norm)
    Formula: Σ_i |CF(x)_i − x_i| / MAD_i, with MAD_i = median_{x'∈X} |x'_i − median_i|
    Goal is to minimise: the number of features that are changed while accounting for the intrinsic volatility of the input space
    As in: Lucic et al. (2020); Wachter et al. (2017)

Table 2.1: Several distance functions that measure the distance between x and CF(x).
While the goal of the first term is well defined, the definition of the distance term, L_similarity,
is open for interpretation. In fact, it is one of the undefined areas in the CF literature (Barocas
et al., 2019). The distance function has to be chosen in accordance with the belief about how
instances can change most easily. For example, if one believes that an instance can change easily
as long as the changes that it has to make are small, then the Euclidean distance (L2-norm)
should be used. If one, on the other hand, suspects that it is the number of changes rather than
the magnitude of the individual changes that drives the ease of change, then the Manhattan
distance (L1-norm) is better suited than the Euclidean distance. Table 2.1 summarises some of
the distance functions that are used in the literature.
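The distance functions of Table 2.1 can be sketched directly. Following the descriptions in the table, the Mahalanobis variant below takes a user-supplied weight matrix Σ and the weighted Manhattan takes precomputed median absolute deviations:

```python
import numpy as np

def euclidean(x, cf_x):
    """L2-norm: penalises the absolute change in each feature."""
    return np.sqrt(np.sum((cf_x - x) ** 2))

def weighted_squared_l2(x, cf_x, std):
    """Squared L2-norm, with each feature scaled by its standard deviation."""
    return np.sum((cf_x - x) ** 2 / std)

def mahalanobis(x, cf_x, sigma):
    """Mahalanobis-style distance with weight matrix sigma."""
    d = cf_x - x
    return np.sqrt(d @ sigma @ d)

def cosine(x, cf_x):
    """Cosine similarity: captures the relationship between features."""
    return (x @ cf_x) / (np.linalg.norm(x) * np.linalg.norm(cf_x))

def weighted_manhattan(x, cf_x, mad):
    """Weighted L1-norm, each feature scaled by its median absolute deviation."""
    return np.sum(np.abs(cf_x - x) / mad)

x, cf_x = np.array([1.0, 2.0]), np.array([4.0, 6.0])
print(euclidean(x, cf_x))                                     # → 5.0
print(weighted_manhattan(x, cf_x, mad=np.array([1.0, 2.0])))  # → 5.0
```

Note how the same pair of points yields different values under different notions of similarity, which is precisely why the choice of distance function matters.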
The last term of eq. (2.1), L_additional constraints, is optional. Several authors have proposed
specific constraints in their (M)ILP optimisation problem in order to force actionability or
feasibility of the CF explanation. For example, Ustun et al. (2019) introduce two constraints to
ensure feasibility. The constraints assure that each feature in the CF example is on a certain grid
with feasible values. The CFGM of Poyiadzi et al. (2020), which is based on a graph approach,
can also take custom conditions into consideration in order to create feasible and actionable CF
explanations.
Chapter 3
Related Work
There are, to the best of our knowledge, only two stability measures that can be applied to
CFGMs. These methods, their drawbacks and the results that have been obtained so far are
discussed in this chapter.
3.1
Jaccard index
The Jaccard index is a stability measure from the field of statistics. Given two sets of instances X_1 and X_2, the Jaccard index, J, is defined as

J = |X_1 ∩ X_2| / |X_1 ∪ X_2|    (3.1)

in which the numerator reflects the number of instances that are in both X_1 and X_2, while the denominator captures the number of instances in the union of the two sets (Ivchenko and Honov, 1998).
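As a concrete sketch, eq. (3.1) can be computed directly on Python sets; the feature-set variant discussed below (comparing which features the CF explanations of different runs change) only changes what the sets contain. The helper and the example feature names are illustrative, not code from any of the cited works.

```python
def jaccard_index(set_1, set_2):
    """Jaccard index J = |X1 ∩ X2| / |X1 ∪ X2| of two sets (eq. 3.1)."""
    union = set_1 | set_2
    if not union:  # convention: two empty sets are identical
        return 1.0
    return len(set_1 & set_2) / len(union)

# e.g. the features changed by the CF explanations of two runs
run_1 = {"income", "age"}
run_2 = {"income", "balance"}
```

Here `jaccard_index(run_1, run_2)` yields 1/3: one shared feature out of three in total.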
In the context of the stability of CFGMs, the Jaccard index could be applied by comparing the sets of features that need to be changed according to the CF explanations of one instance in different runs, as is done by Guidotti et al. (2018, 2019). For each instance, they generated ten CF explanations over ten runs. The Jaccard index then compares the sets of features that needed to be changed in the different runs.
There are two shortcomings of calculating the Jaccard index in this way. First, this procedure
implies that the CFGM should have a stochastic component, which is not always the case. For
example, one would obtain deterministic results from a method that systematically evaluates
all possible changes before selecting the optimal CF explanation. Second, in this procedure the
Jaccard index measures how comparable the explanations are for exactly the same instance.
This is important, but not sufficient when the stability of a CFGM is quantified. The reason for
this is that the stability measure of a CFGM should also consider similar original instances as
a CFGM is stable if each explanation is coherent locally, i.e. its neighbours should have similar
explanations (Laugel et al., 2019).
Another way to apply the Jaccard index to CFGMs is to compare the sets of CF explanations for similar initial instances. This solves the two shortcomings discussed above. However, this approach has two pitfalls of its own. First, the CFGM should be able to generate more than one CF explanation for each instance, which many CFGMs do not do. Second, the final value of the Jaccard index is influenced by the objective function. For example, Mothilal et al. (2020) focus on creating a diverse set of CF explanations. More diverse explanations imply that the instances are further apart from each other. Hence, in this case the resulting Jaccard index might differ more for similar original instances than when the only objective for CF explanations is to be as close as possible to the original instance.
The stability framework that we propose does not require a stochastic component in the CFGM, nor does it restrict the CFGM to generating multiple CF examples per instance, as it only needs one CF example per instance. Furthermore, the stability framework does consider the neighbours of the original instance in order to determine the stability. Hence, it is in line with the definition of stability used by Laugel et al. (2019). We do not compare against the Jaccard index, as the CFGMs which we evaluate do not produce multiple CF examples for the same instance.
Our work relates to that of Guidotti et al. (2018, 2019) as follows. Guidotti et al. (2018, 2019) only evaluated model-agnostic methods, while in this thesis a model-agnostic method and three model-specific methods are considered. As in the work of Guidotti et al. (2018, 2019), this thesis focuses on methods that provide local explanations.
3.2 L−stability
The L−stability measure is introduced by Alvarez-Melis and Jaakkola (2018a). Their measure is based on Lipschitz continuity, which is a well-known notion from calculus for the stability of a function. For an instance x and corresponding CF explanation CF(x) the stability measure solves the optimisation problem:

argmax_{x′ ∈ B_ε(x)}  ||CF(x) − CF(x′)||_2 / ||x − x′||_2    (3.2)

where B_ε(x) is a circle of radius ε and centre x. This implies that for each point in B_ε(x) the ratio of the distance between the CF explanations to the distance between the original instances is calculated. Subsequently, the highest ratio is chosen as the quantification of the degree of stability for instance x. The highest ratio itself is the value of the stability measure for instance x. The stability measure that solves eq. (3.2) assumes that the stability is continuous, which might be too strict, especially in the case of discrete features. Hence, Alvarez-Melis and Jaakkola (2018b,a) modified eq. (3.2), which resulted in a weaker stability measure, referred to in this thesis as the L−stability. The measure solves the following optimisation problem:

argmax_{x′ ∈ N_ε(x)}  ||CF(x) − CF(x′)||_2 / ||x − x′||_2    (3.3)

where N_ε(x) = {x′ ∈ X | ||x − x′||_2 ≤ ε}. Hence, B_ε(x) is replaced by a finite set of instances.
CHAPTER 3. RELATED WORK
13
Note that this thesis refers to the resulting maximum value of the fraction in eq. (3.3) as the
L−stability.
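To make eq. (3.3) concrete, the measure can be sketched in a few lines of NumPy. Here `cf` is a stand-in for any CFGM, the neighbourhood N_ε(x) is built from a finite sample X, and x itself is excluded to avoid division by zero; this is our reading of the definition, not a reference implementation by Alvarez-Melis and Jaakkola.

```python
import numpy as np

def l_stability(x, X, cf, eps):
    """L-stability at x (eq. 3.3): the largest ratio
    ||CF(x) - CF(x')||_2 / ||x - x'||_2 over x' in N_eps(x)."""
    cf_x = cf(x)
    ratios = [
        np.linalg.norm(cf_x - cf(xp)) / np.linalg.norm(x - xp)
        for xp in X
        if 0 < np.linalg.norm(x - xp) <= eps  # N_eps(x), excluding x itself
    ]
    return max(ratios) if ratios else 0.0

# A perfectly "stable" toy CFGM that translates every instance by a constant:
shift_cf = lambda z: z + np.array([2.0, 0.0])
```

For `shift_cf` the ratio equals 1 for every neighbour, so `l_stability` returns exactly 1 regardless of ε, illustrating that even a maximally well-behaved CFGM does not reach the measure's lower bound of zero.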
The L−stability does not require the CFGM to be stochastic or to be able to generate sets of
CF explanations, while the Jaccard index does. There are, however, still three major drawbacks
of this measure: (i) it is influenced by the stability of the underlying classification model, (ii) it
has no ideal value and (iii) it implicitly implies that the Euclidean distance is the best measure
for similarity, which is not necessarily always the case.
Influence of the classification model
The first drawback of the L−stability measure is that it does not differentiate between the effect
of the CF generating model and the effect of the classification model (Laugel et al., 2019). This
implies that the L−stability could be biased. This can be explained as follows. Consider a
case in which the CFGM is by construction stable. Hence, it always discovers the nearest CF
explanation if its objective is to minimise the distance. In this case the L−stability should be
optimal as the CFGM always finds the optimal CF explanation. Note that the optimal value of
the L−stability is not predefined (Alvarez-Melis and Jaakkola, 2018a). However, one could still
assume that the classification model should not be able to influence the value of the L−stability.
If it can influence the value of the L−stability then the L−stability might be biased.
Figure 3.1 illustrates the effect the classification model can have on the L−stability value. In this figure the L−stability of a stable CFGM is compared to the L−stability of an unstable CFGM. The CFGM in Figure 3.1a and Figure 3.1c is stable by construction, while Figure 3.1b and Figure 3.1d illustrate what could happen with an unstable CFGM (given the respective same classification models). We see that even for the stable CFGM the L−stability value differs per classification model, which is the only difference between both situations. We postulate that this might be problematic for a stability measure, as a stability measure of a certain CFGM should only be influenced by that CFGM. Otherwise the resulting value of the measure might be biased by the stability of the underlying classification model, which might happen for the L−stability measure.
Ideal value
The second disadvantage of the L−stability measure is that there is no ideal value as discussed
by Alvarez-Melis and Jaakkola (2018a). They state that for different goals of interpretability and
even for different applications, the most reasonable value differs. Alvarez-Melis and Jaakkola
(2018b) argue that the method with the lowest L−stability value is the most stable method as
eq. (3.2) and eq. (3.3) are defined to find the worst-case scenario. Note that this implies that an absolutely stable CFGM would have a value of zero. However, eq. (3.3) shows that an L−stability close to zero means that the numerator is small while the denominator is large. This in turn implies that the CF examples are close to each other while the original instances are not. This means that dissimilar instances have similar CF examples, which is not optimal
from a stability perspective. Note that this ratio is the maximum; hence the other instances are situated even further from each other, i.e. even more dissimilar instances have comparable CF examples.
Figure 3.1: Example of the influence of the classification model.
(a) Classification boundary is a line on the right, stable CFGM; L−stability is 1 due to x′_3.
(b) Classification boundary is a line on the right, unstable CFGM; L−stability is 5 due to x′_3.
(c) Classification boundary is a circle, stable CFGM; L−stability is 10.5 due to x′_2.
(d) Classification boundary is a circle, unstable CFGM; L−stability is 16 due to x′_2.
Note. The difference in L−stability between cases a and c illustrates the effect that the classification model has on the estimate of the stability of the CFGM. The difference in L−stability between the stable and unstable CFGMs, i.e. between a and b and between c and d, highlights that the optimal value for the L−stability is ambiguous.
1 Note that in this discussion the distance is directly related to similarity. The reasoning behind this is that the link is indirectly defined in the definition of the L−stability, as eq. (3.2) and eq. (3.3) are said to represent the worst-case scenarios.
Figure 3.1 also reveals the ambiguity of the optimal value of the L−stability. For both classification models, i.e. the straight line and the circle, the L−stability of the stable CFGM is lower than the L−stability of the unstable CFGM. Hence, one could argue that the higher the L−stability, the less stable the CFGM. However, if one were to compare the situation in Figure 3.1b with the situation in Figure 3.1c, one would conclude that the CFGM in the latter situation is less stable. This is striking, as that CFGM is constructed to be stable, while the former is constructed to be unstable. Comparing these two situations is not illogical, as in reality it might be unclear how the classification model behaves or how stable the CFGM is by construction.
Similarity assumption
The third downside of the L−stability is that it uses the Euclidean distance, which raises several problems. For example, a CFGM that minimises the Euclidean distance might incorrectly appear to be more stable than CFGMs that minimise other distance functions. Another example is that by using the Euclidean distance, the L−stability implicitly assumes that an instance can change easily as long as the changes that it has to make are small. This could, however, differ per situation, as discussed in Section 2.2.2. One can consult the work of Lucic et al. (2020) to examine the effects that minimising different distance metrics can have on the resulting CF explanations.
In the academic literature, the L−stability measure is used by Alvarez-Melis and Jaakkola (2018a,b). They compared the stability of five gradient-based model-specific and two model-agnostic explainability methods. Their findings are twofold. First, they notice a trade-off between how accurate a model is and how stable it is. Second, they found that model-agnostic methods are in general less stable than model-specific methods. In this thesis we examine whether this also holds for the explanation methods and stability measures used in our experiments.
Unlike Alvarez-Melis and Jaakkola (2018a,b), we evaluate multiple types of model-specific CFGMs (as opposed to solely gradient-based CFGMs). Moreover, we evaluate the stability of CFGMs with the L−stability measure (as proposed by Laugel et al. (2019)), while Alvarez-Melis and Jaakkola (2018a,b) consider other types of XAI methods (not CFGMs).
Chapter 4
Method
This chapter proposes a stability framework consisting of the direction based framework (DBF) and the P−stability measure. This framework fulfils the requirements in RQ2, i.e. the stability framework can (i) assess the stability of all CFGMs (ii) without being influenced by the classification model and (iii) without assuming a specific distance function.
Note that we assume that a stability measure for CFGMs should capture whether explanations are coherent locally, as in the work of Laugel et al. (2019). In other words, similar instances should have similar CF explanations.
Furthermore, note that we make the simplification that all classification problems are binary. This is, however, without loss of generality, as any classification problem can be converted into a binary classification problem in the context of CF explanations (i.e. one-versus-all classification). Karimi et al. (2020a) argue that regression problems can also be converted into a binary classification problem if a threshold is used.
4.1 Direction based framework
The DBF is introduced in order to assure that the classification model does not influence the
assessment of the stability of a CFGM. Section 4.1.1 introduces the two pillars on which the
DBF is based and Section 4.1.2 describes the framework itself.
4.1.1 Pillars
The first pillar on which the DBF is based is defined as follows.

P1: If a CFGM is able to produce a CF example, the CF example is locally optimal according to the CFGM.

The motivation behind this pillar is that each CFGM utilises an objective function. Hence, the final CF example needs to be optimal according to that function. This implies that at least a local optimum is found.
Based on P1 we can formulate the following proposition for a stable CFGM, which we can prove by contradiction.
Prop. 1: For a stable CFGM it holds that:
∀x′ between x and CF(x) ⇒ CF(x′) is close to CF(x)

Proof: According to P1, both CF(x′) and CF(x) are the best CF examples that can be found for x′ and x respectively. This implies that given x we should move in the direction of CF(x) in order to get to its optimal CF example, namely CF(x). Hence, if we take a point on this route, i.e. x′, it should be the case that we have to continue on our route to the optimal CF example, i.e. CF(x), in order to find CF(x′). Because of slight differences between x and x′, CF(x) and CF(x′) could also differ slightly. However, if CF(x′) lies far from CF(x) it means that we have deviated from the optimal route, which implies that (i) either our old route was not optimal or (ii) the new route is not optimal. This would imply that either CF(x) or CF(x′) is not a local optimum, which contradicts P1.
Hence, given P1 and a stable CFGM, each observation x′ between the original observation, x, and its counterfactual example, CF(x), has a CF example close to CF(x). This is exactly what Prop. 1 states.

Note that we only considered the implication of P1 for points x′ that are between x and CF(x). The reasoning behind this is that the points that are not between x and CF(x) do not follow the same optimal route to a CF example by definition. This is a consequence of CF(x) being a local optimum instead of a global optimum.
The second pillar, P2, can be formulated as follows.

P2: If the stability of a CFGM is determined by taking an original observation x and assessing whether, for x′ similar to x, CF(x) is similar to CF(x′), then only the x′ that have the same prediction as x can be considered.

If this does not hold, i.e. if all x′ are considered, then CF(x) and CF(x′) could belong to different classes. It is not meaningful to compare the similarity across classes for the stability measure, as it is not expected that two different classes lie arbitrarily close to each other.
4.1.2 Framework
The DBF is given in Framework 4.1. It requires an original instance, the CFGM, the number
of instances it needs to create and the chosen stability measure as input. The first step is to
obtain the CF example for the original instance x. Step two (a-c) consists of creating points
between x and CF (x) (due to P1 and Prop. 1). Step three assures that P2 is satisfied, i.e. it
removes all created instances that do not have the same predicted class as x. Subsequently, the
CF examples are generated for the remaining instances in step 4. The last step is to calculate
the value of the chosen stability measure.
The DBF enables the chosen stability measure to quantify the stability of a CFGM while
filtering out the effect that the decision boundaries of a classification model can have on the
stability value. This is done by restricting the instances that are considered to be similar to the
original instance (by using P1, P2 and Prop. 1). Hence, the DBF assures that the final stability
value is independent of how stable the underlying classification model is.
Framework 4.1: Direction based framework

Inputs:
x: the original instance (has D dimensions)
n: number of instances created per changed dimension
m: number of instances created on the direct line between x and CF(x)
r: number of instances created randomly between x and CF(x)
SM: the stability measure
CF(·): the CFGM

1. Generate CF(x) for the original instance x
2. Create points that are similar to x (while taking P1 and Prop. 1 into consideration) and store them in X′
   (a) For each dimension d in which CF(x) differs from x (i.e. x_d ≠ CF(x)_d):
       For i in 1, ..., n:
       i. x′ = x
       ii. x′_d = x_d + i × (CF(x)_d − x_d) / (n + 1)
       iii. Store x′ in X′
   (b) For i in 1, ..., m:
       i. x′ = x + i / (m + 1) × (CF(x) − x)
       ii. Store x′ in X′
   (c) For i in 1, ..., r:
       x′ = x
       For each dimension d in 1, ..., D:
       i. If x_d < CF(x)_d: x′_d = randomly sampled point in (x_d, CF(x)_d)
       ii. If x_d > CF(x)_d: x′_d = randomly sampled point in (CF(x)_d, x_d)
       Store x′ in X′
3. Eliminate all points x′ ∈ X′ that have a different predicted class than x (due to P2)
4. Generate CF examples for all instances that are still in X′
5. Calculate SM with the obtained data
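Step 2 of Framework 4.1 can be sketched in NumPy as follows; the function and variable names are our own, and steps 1 and 3-5 (generating CF examples, filtering on predicted class and evaluating SM) would wrap around it.

```python
import numpy as np

def create_similar_points(x, cf_x, n=1, m=4, r=10, rng=None):
    """Sketch of step 2 of the DBF (Framework 4.1): sample points
    between an instance x and its counterfactual example cf_x."""
    if rng is None:
        rng = np.random.default_rng(0)
    points = []
    changed = np.flatnonzero(cf_x != x)           # (a) per changed dimension
    for d in changed:
        for i in range(1, n + 1):
            xp = x.copy()
            xp[d] = x[d] + i * (cf_x[d] - x[d]) / (n + 1)
            points.append(xp)
    for i in range(1, m + 1):                     # (b) on the direct line
        points.append(x + i / (m + 1) * (cf_x - x))
    for _ in range(r):                            # (c) random in the box
        lo, hi = np.minimum(x, cf_x), np.maximum(x, cf_x)
        points.append(rng.uniform(lo, hi))
    return np.array(points)
```

Note that unchanged dimensions stay fixed in all three branches: in (c) the sampling interval collapses to a single point, matching the framework's per-dimension conditions.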
4.2 P−stability measure
In order to measure the stability of a CFGM, we propose the P−stability measure. The P−stability aims to determine whether CF(x), i.e. the CF example of an original instance x, is more likely to belong (i) to the set of CF examples that correspond to instances that are similar to x or (ii) to the set of original instances that have the same predicted value as CF(x). Instead of measuring the similarity with the Euclidean distance, as is done in the L−stability measure, the P−stability measure uses probabilities. Hence, it compares (i) the probability that a CF example of an instance belongs to the set of CF examples of similar instances with (ii) the probability that it belongs to the set of training data instances with the same prediction.
Intuitively, it is clear that the CF examples of similar original instances should be more similar to each other than to other (random) instances that have the same predicted class. The reason for this is that the second set consists of points that are scattered throughout the entire input space, while the first set is clustered if the CF explanations are stable. Hence, for a stable CFGM the first probability that is considered in the P−stability measure should be higher than the second.
In order to define the P−stability mathematically, we need to introduce some notation in addition to the notation in Section 2.2.2. Denote the set of original instances as X. For each instance from this set, i.e. ∀x ∈ X, the classification model f(·) can predict the class to which the instance belongs. Instance x is said to have predicted class f(x). The set of original instances that have the same prediction as x is denoted as X^{f(x)}, and thus it holds that X^{f(x)} ⊆ X. The set that contains the dth dimension (i.e. the dth feature) of all instances from X is denoted as X_d. Given this notation, X_d^{f(x)} is the set that contains the values of the dth dimension of the original instances with predicted class f(x). Assume that we have sampled N instances around x, which are contained in the set X^s = {x^{s,1}, x^{s,2}, ..., x^{s,N}}. Keeping the notation for the dimensions and predicted classes the same, we denote X_d^{s,f(CF(x))} as the set that contains the values of the dth dimension of the CF examples that are generated for the N instances around x and that have predicted class f(CF(x)). Recall that there are D features in total.
Given this notation, the P−stability evaluated in x is defined as

P(x) = (1/D) Σ_{d=1}^{D} 1{ p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d) }    (4.1)

where 1{·} is the indicator function that is one if the expression (in {·}) is true and zero otherwise, and p_H(·) is the probability function defined on set H. Hence, the indicator function is one if the dth dimension of the CF example of x fits better in X_d^{s,f(CF(x))} than in X_d^{f(CF(x))}, which is what is expected if the CFGM is stable.
Given the definition in eq. (4.1), we claim that the P−stability evaluated in x is contained in the interval [0, 1]. Furthermore, a higher value corresponds to more stable explanations.
Claim 1: The lower bound of the P−stability evaluated in x (see eq. (4.1)) is zero.
Proof: Each term 1{p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d)} ∈ {0, 1} and takes its lowest value (0) if p_{X_d^{f(CF(x))}}(CF(x)_d) > p_{X_d^{s,f(CF(x))}}(CF(x)_d). If this holds ∀d ∈ {1, 2, ..., D}, then Σ_{d=1}^{D} 1{·} = D × 0 = 0, and thus P(x) = (1/D) × 0 = 0.
Claim 2: The upper bound of the P−stability evaluated in x (see eq. (4.1)) is one.
Proof: Each term 1{p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d)} ∈ {0, 1} and takes its highest value (1) if p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d). If this holds ∀d ∈ {1, 2, ..., D}, then Σ_{d=1}^{D} 1{·} = D × 1 = D, and thus P(x) = (1/D) × D = 1.
Claim 3: The higher the P−stability evaluated in x (see eq. (4.1)), the more stable the explanation is.
Proof: The indicator for dimension d is 1 if p_{X_d^{f(CF(x))}}(CF(x)_d) ≤ p_{X_d^{s,f(CF(x))}}(CF(x)_d), i.e. if X_d^{s,f(CF(x))} is more clustered around CF(x)_d than X_d^{f(CF(x))} is, which is what we defined as a characteristic of stable explanations. If this holds for all dimensions we obtain a value of 1 (analogous to the proof of Claim 2). Conversely, the indicator is 0 if p_{X_d^{f(CF(x))}}(CF(x)_d) > p_{X_d^{s,f(CF(x))}}(CF(x)_d), i.e. if X_d^{s,f(CF(x))} is less clustered around CF(x)_d than X_d^{f(CF(x))} is (random points are more similar to CF(x)), which is what we defined as a characteristic of unstable explanations. If this holds for all dimensions we obtain a value of 0 (analogous to the proof of Claim 1). Since P(x) is increasing in each indicator, and an indicator equal to one corresponds to a dimension in which the explanation is stable, the more dimensions are stable (and thus the more stable the explanation), the higher P(x).
Note that the probability function in eq. (4.1) can be chosen to be the empirical distribution. Another option is to fit a parametric probability distribution, for example a Gaussian or Student's t distribution. This, however, requires a comparison between the possible probability distributions on a theoretical (i.e. academic literature) and a practical level (i.e. AIC and BIC values). Hence, this thesis uses the empirical distribution (statsmodels package1) as this is straightforward for different datasets and features.
Furthermore, note that eq. (4.1) shows that we compare the distribution per feature dimension instead of using a multivariate probability distribution for the entire instance. The main reason for this is tractability. The multivariate distribution becomes less accessible than the univariate distributions when the number of features increases as the (theoretical) relationships between the features need to be taken into account.
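A minimal sketch of eqs. (4.1) and (4.2) is given below. As a stand-in for the probability function p_H we use a crude window-based empirical density (the fraction of sample points within a small window of the evaluation value); the thesis itself uses the empirical distribution from statsmodels, so this density and its bandwidth are simplifying assumptions of our own.

```python
import numpy as np

def empirical_p(sample, value, bandwidth=0.05):
    """Crude empirical probability of `value` under `sample`:
    the fraction of points within +/- bandwidth of `value`."""
    return float(np.mean(np.abs(np.asarray(sample) - value) <= bandwidth))

def p_stability_at(cf_x, X_orig_same_class, X_cf_sampled):
    """P-stability at x (eq. 4.1): the fraction of dimensions in which
    CF(x)_d fits the sampled-CF set better than the original-data set."""
    D = len(cf_x)
    hits = sum(
        empirical_p(X_orig_same_class[:, d], cf_x[d])
        <= empirical_p(X_cf_sampled[:, d], cf_x[d])
        for d in range(D)
    )
    return hits / D

def p_stability(per_instance_values):
    """Dataset-level P-stability (eq. 4.2): the mean over all instances."""
    return float(np.mean(per_instance_values))
```

With CF examples of neighbours tightly clustered around CF(x) and the original same-class instances spread over the input space, every dimension's indicator is one and `p_stability_at` returns 1, matching the intuition behind the measure.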
Given the P−stability measure evaluated in x, the P−stability measure for the entire dataset X = {x_1, x_2, ..., x_O} is defined as

P = (1/|X|) Σ_{j=1}^{|X|} P(x_j)    (4.2)

where |X| = O stands for the number of instances that X contains. As with eq. (4.1), the value of eq. (4.2) is contained in the interval [0, 1] and a higher value corresponds to a more stable CFGM. The reason for this is that a high value means that for many instances most dimensions fit better in the CF set than in the set of the original data. The proofs of these claims are analogous to the proofs for P(x); hence they are shown in Appendix A.1.
The P−stability measure as defined in eq. (4.2) differs from the Jaccard index and the L−stability in the following ways. First, the P−stability is, unlike the Jaccard index, applicable to any CFGM. The reason for this is that the measure only requires the final CF example for each original and sampled instance. Furthermore, only one CF example is needed per instance. Second, unlike the L−stability measure, the P−stability measure can be applied to compare the stability of different CFGMs across different datasets and classification models. This is a consequence of the P−stability being bounded on the interval [0, 1], where 1 corresponds to the most stable CFGM (regardless of the setting). Third, the P−stability differs from the L−stability in that it utilises probabilities instead of a distance function in order to evaluate to what extent CF examples are similar. Hence, no assumptions are needed on the preferred type of change, i.e. (i) only a few adjustments or (ii) only small adjustments. Furthermore, the P−stability does not favour any CFGM based on the distance function it uses in order to find the optimal CF example, as the measure is not defined based on a distance function.
Chapter 5
Experimental setup
The experimental setup needed to answer RQ1 (defined in Section 1.1) consists of evaluating four CFGMs on two datasets and with two classification models. This results in 16 experiments. Each part of the experimental setup is considered in this chapter. Subsequently, the experimental setup for RQ3 (defined in Section 1.1) is elaborated on. This consists of the implementation of the L−stability measure and the setup for the synthetic experiment.
5.1 Comparison among CFGMs
RQ1 requires us to compare CFGMs based on:
(a) the number of instances for which a CF example is found,
(b) the computational time,
(c) the similarity between the CF examples in terms of position with respect to the original instance (i.e. direction, distance, number of changed features, the magnitude of the changes and how often the changes have no effect on the classification model),
(d) the stability of the CFGM itself.
The measures in (a), (b) and (c) are straightforward to implement in Python. In order to analyse the fourth point, (d), this thesis uses the P−stability measure and the DBF, which are also implemented in Python. In order to limit the number of instances for which a CF example needs to be calculated, this thesis uses the following settings for the DBF (see Framework 4.1). For each instance, one new instance is created per changed dimension (i.e. n = 1), four instances are created on the line directly between the original instance and the corresponding CF example (i.e. m = 4) and ten instances are created randomly (i.e. r = 10). Note that alternative values can be used for these hyperparameters.
For the probability distributions in the P−stability measure (see eq. (4.1)) this thesis utilises the empirical distribution from the statsmodels package1. The P−stability is calculated for the test sets of the datasets. This implies that the CFGMs create CF examples for the same instances in step one of the DBF (see Framework 4.1). The created CF examples do differ per CFGM, dataset and classification model. Hence, the points that are created between the original instances and the corresponding CF examples (i.e. step two of the DBF) differ per experiment. Note that a maximum of 1,500 instances is used to calculate the stability on, as this can already result in more than 66,000 instances for which a CF example should be generated.2
1See: https://www.statsmodels.org/stable/index.html
2 The over 66,000 instances result from applying the DBF with the hyperparameters n = 1, m = 4, r = 10 on 1,500 initial instances.
5.1.1 Data
The first dataset is the credit card transactions dataset from Kaggle.3 The data was collected over two days in 2013. Some of these transactions were not made by the owners of the cards; these are fraudulent transactions. The goal of the classification model is to identify these fraudulent credit card transactions. In total this dataset contains 284,807 transactions, of which only 0.172%, i.e. 492 transactions, are fraudulent. Due to confidentiality considerations the suppliers of the data performed a principal component analysis (PCA) transformation before making the data available on Kaggle. This results in 28 of the 30 features that the dataset has in total. The other two variables are Time and Amount. The former measures the number of seconds between the transaction and the first transaction in the dataset. The latter is the transaction amount. Hence, all variables are numerical.
The second dataset is the HELOC dataset, which is also used by Kanamori et al. (2020) and Lucic et al. (2020).4 The data was provided for the FICO Explainable Machine Learning Challenge in 2017. HELOC stands for Home Equity Line of Credit, which is a type of credit that banks provide as a percentage of home equity. Home equity at time t is the amount by which the market value of the home at t exceeds the price it was purchased for. In the provided dataset, the amounts range from $5,000 to $150,000. The goal of the classification model is to predict whether customers will repay their HELOC loan. In total the dataset contains 10,459 observations, of which roughly 50% are not able to repay their loan. Hence, in contrast to the first dataset, this dataset is balanced. The number of variables is in the same order of magnitude, as this dataset contains 23 (numerical) features.
All features in both datasets are scaled between zero and one, which is feasible as there are no categorical variables. The reason for this transformation is that the implementation of the FOCUS method restricts the CF examples to have feature values between zero and one. This implies that FOCUS might not find CF explanations for certain instances if the features are not scaled between zero and one.
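The min-max scaling step can be sketched as follows; fitting the per-feature bounds on the training split and reusing them for the test split is our assumption of the usual convention, as the thesis does not spell this detail out, and the toy values below merely echo the HELOC amount range.

```python
import numpy as np

def fit_min_max(X_train):
    """Return per-feature bounds computed from the training data."""
    return X_train.min(axis=0), X_train.max(axis=0)

def min_max_scale(X, lo, hi):
    """Scale each feature linearly so the training range maps to [0, 1]."""
    return (X - lo) / (hi - lo)

# Toy example with two numerical features (e.g. a loan amount and a count).
X_train = np.array([[5_000.0, 2.0], [150_000.0, 23.0], [77_500.0, 12.0]])
lo, hi = fit_min_max(X_train)
X_scaled = min_max_scale(X_train, lo, hi)
```

After scaling, every training feature lies in [0, 1], which is exactly the box in which the FOCUS implementation searches for CF examples.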
5.1.2 Classification models
For each dataset two classification models are considered, namely a decision tree and a random forest. We selected tree-based models as these models are frequently used with tabular data which is in turn common in the banking industry (Bedeley and Iyer, 2014). The decision tree is chosen to be shallow (maximum depth of three) in order to keep it interpretable. The random forest is restricted to a maximum of 20 trees (each with a maximum depth of three) as to reduce the computational cost.
Both classification models are trained on the same data, as each dataset is randomly split once into a train set (70%) and a test set (30%). The training is performed without cross validation and with the default settings of the sklearn package5, except for the number of trees and the maximum depth. Note that we do not tune the hyperparameters of the classification models (which would increase the accuracy), as this is not the main point of this thesis. For the same reason, the degree of data imbalance is also not taken into consideration.
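The training setup described above can be reproduced with a few lines of scikit-learn; the toy data and the random_state below are placeholders of our own, since the thesis uses a single random 70/30 split of the real (scaled) datasets.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier

# Toy stand-in for a scaled dataset with a binary target.
X = np.array([[0.0], [0.1], [0.2], [0.3], [0.7], [0.8], [0.9], [1.0]])
y = np.array([0, 0, 0, 0, 1, 1, 1, 1])
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=0)

# Shallow tree (interpretable) and small forest (cheap); defaults otherwise.
tree = DecisionTreeClassifier(max_depth=3).fit(X_train, y_train)
forest = RandomForestClassifier(n_estimators=20, max_depth=3,
                                random_state=0).fit(X_train, y_train)
```

The depth-3 cap keeps the single tree readable, while the 20-tree cap keeps the forest (and the later CF generation over its trees) computationally manageable.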
The accuracy of both classification models for both test sets is shown in Table 5.1.
3 Data from: https://www.kaggle.com/mlg-ulb/creditcardfraud
4 Data from: https://community.fico.com/s/explainable-machine-learning-challenge
5