
Bachelor thesis

Underlying Problems and Assumptions

in Data-Informed Decision-Making from

a Legal Point of View

David J. C. van Rooij

supervised by Dr. A. W. F. Boer

co-supervisor

Prof. Dr. T. M. van Engers


Preface

This is my bachelor thesis, completing my undergraduate degree in Information Science at the University of Amsterdam in summer 2016. This thesis consists of twelve EC from the regular curriculum and an additional six EC for the honours programme.

At a young age, as a freshman in high school, I had already acquired a great passion for computers. This passion resulted in the desire to pursue a degree in Information Science. I would like to express my gratitude to and thank my parents: my dad for guiding me through my very first steps in the digital age and inspiring me by showing the endless possibilities of the computer, and my mum for always believing in me, offering unconditional support and letting me determine my own objectives in life.

I am indebted to my supervisor Dr. Alexander Boer, who guided and mentored me throughout the entire process of writing my bachelor thesis. He provided constructive advice, encouraged me to be critical, and offered general guidance in the process. Boer had the patience to let me come up with my own ideas while encouraging me to dig deeper and explore further possibilities. I am grateful to my close peers and especially my close friends, who encouraged me to pursue the best within my reach and mentally helped me through the ups and downs I experienced while writing this thesis and in life. Without their endless support, encouragement and discussions this thesis would not have been this comprehensive.

In line with my passion for statistics and data, I wanted to explore the possibilities of classifications in the real world. Initial research showed a fairly unexplored area of statistics in law, more specifically how the use of statistics as evidence can be justified. By writing this thesis I hope to contribute to the acceptability of statistics in legal proceedings.


Contents

Contents
List of Figures
List of Tables
1 Introduction
  1.1 Methodology
2 Background information
  2.1 Statistical evidence
  2.2 Bayesian versus Frequentist
3 Decision making
  3.1 Introduction
    3.1.1 Statistical decision theory
  3.2 Data-informed decision making
    3.2.1 Probability theory
    3.2.2 Subjective probability
    3.2.3 Utility Theory
    3.2.4 The Prospect Theory
    3.2.5 The Loss function
  3.3 Bayesian Analysis
  3.4 F-Measure
  3.5 Summary
4 Game Theory
  4.1 Introduction
  4.2 Prisoners' dilemma
  4.3 Dictator game
  4.4 Representation of Games


    4.5.1 Static Bayesian Games
    4.5.2 Dynamic Bayesian games (Perfect Bayesian Equilibrium)
  4.6 Summary
5 Reasoning
  5.1 Introduction
  5.2 Defeasible reasoning
  5.3 Causality
    5.3.1 Bayesian Networks
  5.4 Inferred Causation
  5.5 Data snooping
  5.6 Opportunistic behaviour
  5.7 Summary
6 Sample Selection
  6.1 Introduction
  6.2 Data provenance
  6.3 Selectivity bias
  6.4 Population drift
  6.5 Truncated data
  6.6 Summary
7 Framework for data-informed decision making
  7.1 Introduction
  7.2 Punishment and reward regime
  7.3 Proposed framework
    7.3.1 Reduction of data
  7.4 Fraud detection case
    7.4.1 Primary process
    7.4.2 Secondary process
8 Conclusion
  8.1 Discussion
References
Appendices


List of Figures

3.1 Common choice environment
3.2 Common utility function
3.3 Illustration of a hypothetical value function
4.1 Example pay-offs in a dynamic game setting
5.1 Example spurious correlation
5.2 Reasoning example with a causal network
5.3 An example of a serial belief-network structure
5.4 An example of a diverging belief-network structure with a common immediate predecessor
5.5 An example of a converging belief-network structure with a common immediate descendant
5.6 Example of a Bayesian Network using a fingerprint match as evidence
6.1 Misclassification rate over time
7.1 Reduction from population to court cases
7.2 Decision process


List of Tables

3.1 Basic decision table
3.2 Basic statistical decision table (loss matrix)
4.1 Prisoners' dilemma
7.1 Expected utility compliance and non-compliance in different regimes
7.2 Prospects compliance and non-compliance in different regimes
7.3 Hypothetical actions


Chapter 1

Introduction

Decisions can be made after careful consideration of the situation at hand. Subjectively, people tend to evaluate the problem presented and aim to make a careful consideration of all the possible outcomes of different choices. In this consideration, the different outcomes are evaluated and placed in a preferential order. We set out to pick the most beneficial course of action, hopefully maximising gain whilst minimising losses. As opposed to personal affairs, a more formalised process is necessary in areas such as job admissions, mortgage applications, insurance claims, social security claims, tax claims, and loans. The decision maker would be helped by a more formalised process to justify actions and choices. Moreover, the formalisation of a decision-making process can provide means to minimise errors. An overview of common errors in statistical evidence, which are the product of a decision process, is portrayed in section 2.1. In these processes there are multiple players involved, often a decision maker and a client: the decision maker determines a course of action, and the client is used here to describe the player whose fate is determined by the decision maker. Many of these decision-making processes in business and public administration can be characterised by the following features: binary classification, the behaviour of clients, the client being a stakeholder, a bulk process, rules associating behaviour with outcomes, and diagnostic and corrective feedback in place for recodifying those rules. These characteristics are briefly introduced below and further examined as the thesis progresses.

Classification. The decision task at hand is a binary classification problem. A binary classification can be used for the classification of customers: banks use credit scores to determine whether or not a loan is granted, based on a set of classification rules which define the categorisation of the clients. All applications are evaluated and classified as either good or bad clients. Subsequently, a decision is made based on this classification.
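As an illustration of such a codified binary classification, the sketch below encodes a toy credit-scoring rule. The features, thresholds and scoring logic are hypothetical and only meant to show the general shape of a rule-based classifier; they are not taken from any real bank or from the cases discussed later in this thesis.

# Hypothetical rule-based credit classifier. The feature names and
# threshold values are illustrative only, not taken from a real scoring model.
from dataclasses import dataclass


@dataclass
class Applicant:
    income: float          # yearly income in euros
    missed_payments: int   # missed payments in the last year
    existing_debt: float   # outstanding debt in euros


def classify(applicant: Applicant) -> str:
    """Binary classification of a loan applicant as 'good' or 'bad'."""
    if applicant.missed_payments > 2:
        return "bad"
    if applicant.existing_debt > 0.5 * applicant.income:
        return "bad"
    return "good"


if __name__ == "__main__":
    print(classify(Applicant(income=40_000, missed_payments=0, existing_debt=5_000)))   # good
    print(classify(Applicant(income=30_000, missed_payments=4, existing_debt=20_000)))  # bad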

Judging past behaviour or future behaviour of a client. The classification is based on past behaviour of the client. In the example of credit scoring, the bank aims to predict the future behaviour of clients based on their past behaviour. When a client persistently fails to make payments on time, the likelihood of an excellent rating is small. Based on data about the past behaviour of clients, the bank aims to infer and predict their future behaviour.

Client stake. In all the areas mentioned, the client has a stake in the outcome of the decision. Given the binary classification aspect of the decision process, one outcome is favourable and the other is undesirable from the client's perspective. Even though the client has a stake in the outcome, the influence of the client on the decision process is often limited or absent. Clearly, the influence of the client's past behaviour is present; however, at the time of the decision process these have become fixed figures, and hence the client can no longer influence them.

Bulk process. The cases are not examined individually first hand but rather are the result of a bulk analysis process. The bulk process can either focus on narrowing down possible candidates or on inferring general classification rules based on behaviour derived from a sample. The decision maker benefits from an effective and efficient process: optimising this process leads to resource reduction and increases reliability.

Rules associating behaviour with outcomes. The decision task is guided by codified rules that associate behaviour with outcomes and guide the attention of the decision maker towards data that can be used as evidence in decision-making.

Diagnostic and corrective feedback. There are two types of diagnostic and corrective back-office processes (referred to as the secondary process in the final framework, which is introduced in section 7) that have the potential to recodify the rules in the primary front-office process (the primary process is concerned with the execution of the decision-making process in individual cases). Back-office processes are the processes used for determining association and classification rules. The front office, on the other hand, is entrusted with the task of executing the policies and rules determined by the back office. The two processes are triggered by different events but share the ability to recodify the rules used in front-office processes.

1. A secondary decision-making process that can overrule the primary decision-making process, that is triggered either externally (an appeal) or internally (auditing), and that may use evidence not considered in the primary process.

2. A statistical analysis process on aggregate data from the decision-making process that may discover and disclose new patterns in the data, detect potential frauds (to be forwarded to auditing), and may involve research into parallel processes to (dis)confirm patterns.

These are the characteristics of a decision-making process which are taken into account in the construction of a framework for the formalisation of the decision-making process. Since the client has a stake in the outcome, he can seek ways to alter his appearance in such a way that the new appearance will have beneficial consequences from his point of view. This could be achieved through normal good behaviour or through less ethically sound methods such as fraud. By committing fraud, a client tries to conceal, alter, manipulate, fake or construct evidence to avoid undesirable outcomes such as fines, incarceration, more expensive rates, denial of a mortgage, et cetera. Alternatively, he tries to benefit himself by altering evidence to get more money, better ratings, higher loans and so on. Relative to the status quo there are positive and negative outcomes. The client who is determined to commit fraud seeks to minimise the negative outcomes and maximise the positive outcomes. Clients have the tendency to be risk-averse, which will be explained in section 3.2.4. A formalised decision-making process can be used in the detection of fraud. Depending on the area where this decision-making process takes place, the desired and undesired outcomes will differ. The possibility of fraud is caused by the type of data collected: decisions are usually not based on first-hand data about behaviour, but on (second-hand) claims made by the client, brought in by the client, or supplied via a third party. The implications of different types of data are covered in the framework. Examples of these second-hand claims are tax forms, insurance claim forms, income statements, and the like. Clients may have an interest in altering and manipulating the decision process, and can do so relatively easily. For example, a fraudulent insurance claim can relatively easily be submitted to an insurance company by reporting a case which never happened. New methods of committing fraud are being explored all the time. Moreover, successful methods are copied by other people and induce a domino effect: fraudulent methods are exploited until too many people start using them and the decision maker can effectively and easily detect these methods. By that time fraudsters have moved on to exploiting new strategies and the game recommences.

The first time statistical analysis was presented in legal proceedings as evidence, according to N. Fenton, Neil, and Berger (2016), dates back to 1867. This was not the first time it was used, but it was the first time it was well documented. Before the use of statistical analysis, most cases were built on defeasible reasoning, see section 5.2. The Howland case (1867) was studied by Meier and Zabell (1980); a forged signature was used as the basis for a claim on a will. The solicitor disputed the authenticity of the signature by comparing the downstrokes of multiple instances, and it was stated that the observed correspondence between them was improbable under a binomial model. The judge ruled against the provided evidence by holding that statistics were not to be used as evidence. In legal proceedings, the judge acts as the decision maker since the judge is the one determining the fate of the client. Meier and Zabell (1980) also refuted the statistical evidence by pointing out the abuse of the product rule: probabilities were multiplied as if the underlying events were independent. Ever since the first appearance of statistical evidence in legal proceedings, there has been considerable growth in its use. N. Fenton et al. (2016) state that the use of statistical evidence is still limited to small cases where mostly classic statistical methods are used in evidence. However, it has been discussed extensively that the classical methods are severely limited and induce common misinterpretations (e.g., N. E. Fenton & Neil, 2013; Ziliak & McCloskey, 2004, 2008; Wagenmakers, Lee, Lodewyckx, & Iverson, 2008; J. O. Berger, 1985). Nonetheless, there are instances where statistical analysis is presented as evidence. These statistical methods have made their way into court and are often presented as evidence in legal proceedings. More specifically, trait evidence is an example of evidence which is (partly) based on statistical analysis. These traits include forensic features like DNA, fingerprints, handwriting, bite marks, earmarks, and footwear. Besides forensic features, trait evidence also includes non-human artefacts such as clothing, weapons, and soil (N. Fenton et al., 2016; Verheij et al., 2016). A more comprehensive overview of statistical evidence and common errors will be presented in section 2.1.
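The product rule mentioned above in connection with Meier and Zabell's critique can be stated compactly. The identities below are standard and are given here only as a reminder; section 3.2.1 treats them in more detail.

\[
  P(A \cap B) = P(A)\,P(B) \qquad \text{only if } A \text{ and } B \text{ are independent,}
\]
\[
  P(A \cap B) = P(A)\,P(B \mid A) \qquad \text{in general.}
\]

Multiplying the unconditional probabilities of events that are not independent misstates the probability of their joint occurrence, which is exactly the abuse Meier and Zabell (1980) point out.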

Since a classical approach to statistics is limited, an alternative to the classical approach is considered: the Bayesian approach. A comparison between these two methods is made in section 2.2. Recent research seems to favour the Bayesian approach. Given the magnitude of both approaches and due to limited resources, the focus of this thesis will be on the Bayesian approach. Unfortunately, within the legal profession there has been a very rigid, negative attitude towards the adoption of a Bayesian approach, and therefore this area is examined. N. Fenton et al. (2016) suggest various reasons for this resistance. Firstly, the social, legal, and logical obstructions to its utilisation. Secondly, the authors point to the tendency to oversimplify the underlying legal arguments and causality. The simplification was necessary to ensure the models could be handled manually. However, as elaborated by N. E. Fenton and Neil (2013), nowadays there are sophisticated algorithms and tools available to use machine computation as opposed to manual computation. Furthermore, information science tools are used to learn from aggregated statistical data. Difficulties arise when probabilistic evidence has to be applied to a single case. There seems to be a gap which has to be addressed: the distance between probabilistic evidence and the interpretation of this type of evidence. A possible solution is to let an expert interpret the findings as opposed to leaving the interpretation to a judge or a jury. An expert evaluation is credible based on the assumption that the expert ought to be knowledgeable in the domain and has a motive to testify. By accepting or rejecting an expert interpretation of statistical evidence, the evidence is thereby simplified. A more sophisticated approach could be achieved through formalising a decision-making process. In court there is still an emphasis on coherent scenarios, and the use of statistical evidence is fairly limited (Vlek, Prakken, Renooij, & Verheij, 2014; Verheij et al., 2016).

Besides statistical analysis brought up as evidence, statistical analysis is also used in the production of evidence. In the production of evidence, there are several additional factors which have to be addressed. Assume a decision maker wants to gain insight into the possible fraudulent behaviour of clients; in this particular case, these clients are the citizens. Gaining insight into fraudulent behaviour requires the ability to make a distinction between suspicious behaviour and normal behaviour, which is the primary binary classification in this example. Subsequently, suspicious behaviour can be investigated to establish whether there was indeed fraudulent behaviour. One method of gaining this insight is to conduct a mandatory survey, such as a tax declaration form. Given the two proposed categories (suspicious behaviour and normal behaviour), the collected data should make it possible to assign each instance to one of the categories by first establishing classification rules and then classifying every instance. Therefore, the desired categories dictate which data are collected and how, e.g. if gender is relevant, the gender has to be documented. Since the data are used as evidence, there is a need for a sensible approach to avoiding biases, interpretation errors, reasoning errors, sample errors, and statistical errors. To address these problems there is the need for a feedback clause: the feedback should generate information to refine existing processes or establish new ones. An additional factor has to be taken into account, namely avoiding educating the citizens that they are being investigated, since this knowledge can have negative consequences. Cost effectiveness and resource distribution should also be taken into account since there is limited funding to collect and analyse data. Lastly, the performance of different classification rules and methods should be evaluated. An example of such an evaluation is the F-measure. This measure can also be used to quantify the acceptance of various errors in classification, and this formalisation can be used to articulate the acceptance of false positives and false negatives. The F-measure is covered in more detail in section 3.4. Given the growing interest in statistical evidence and the need for a sophisticated approach to a formalisation of the decision-making process, it is important that these issues can be adequately addressed. A framework is proposed that incorporates the factors mentioned in this chapter. The research interest is a methodology for designing a framework for decision-making processes. The method applied will be a review of literature, specifically looking at the friction between court decision making and data analysis. To address these issues, the following research questions are considered:

What are the problems with statistical analysis presented as evidence from a legal point of view? Can these problems be addressed in design?

To answer the main research question, it is specified into two sub-questions, each addressing a different area within the legal profession:

I. How should the decision maker’s preferences be dealt with regarding false negatives and false positives in decision making?

II. How should fraud detection be properly implemented as a process design?

These questions form the basis of this thesis. How the questions are answered and what approach is used will be discussed in the next section, Methodology.

1.1 Methodology

In the previous section, the main research question was formulated: “What are the problems with statistical analysis presented as evidence from a legal point of view? Can these problems be addressed in design?”. This question was established to gain insight into statistical analysis, the production of evidence and the entire process prior to evidence, and to review these topics from a legal point of view. This thesis takes the form of a literature review. The main purpose is to gain knowledge of the aspects which contribute to this process. The research is exploratory, with the emphasis on reviewing existing literature. The literature is used to define all the different aspects which contribute to, or have to be considered in, mapping statistical analysis as evidence from a legal point of view. The objective is not to provide conclusive research, but merely to develop an understanding of the landscape of statistical findings as evidence.

To answer the research question, two different approaches are presented in section 7 in the form of a framework. This framework builds on the different topics discussed in the chapters; throughout the framework the relevant sections are referenced. The first approach presents a model which maps the entire decision-making process whilst clearly defining the relevant factors from a legal point of view. This process model is a schematic representation of all the different aspects which can be used to improve the sensible handling of biases, interpretation errors, reasoning errors, sample errors, statistical errors, feedback, educating the opponent, and cost effectiveness. The process is divided into two major processes: the primary process and the secondary process. The primary process is concerned with the decision-making process for individual cases. The secondary process is concerned with the data collection, data analysis and the establishment of the classification rules used in the primary process. The difference between them and how the two interoperate will be discussed in more detail in chapter 7. The framework is described from a fraud detection case perspective to illustrate how it can be used in practical terms, but it can also be used in other areas where decision-making processes are relevant. In some cases not all of these aspects are relevant; however, including all of them provides a more generic model.

To legitimise this model and to create a more workable form, the second approach is to formulate critical questions, designed for decision makers such as legislators, law enforcers, data analysts and domain experts. These critical questions provide a self-reflective means for an authority to minimise unwanted errors and to increase awareness of implicit and explicit choices. The questions can be consulted whenever any form of statistical analysis is used as evidence (primary process) or whenever statistical evidence is produced (secondary process).

Before the framework can be constructed, different aspects of decisions and statistical inferences are covered. Hereafter a brief overview of the various chapters is given.

Chapter 2 gives a more detailed examination of the differences between a Bayesian approach and a classical approach. Several arguments are addressed to develop a more general understanding of both and to justify the decision to favour a Bayesian approach over a classical approach. The Bayesian analysis is extensively reviewed. It is argued why a Bayesian approach is favourable over a classical approach and why the classical approach should be avoided in most cases due to complications. After these two approaches are considered, statistical evidence is further examined and an overview is provided of common errors which can be found in statistical evidence. This chapter contains the background information this thesis is built upon.

In chapter 3 a broad introduction to the decision-making process is given. First the foundations of the decision-making process are examined by covering the psychological aspects of decision making before moving to statistical decision making, where decisions are based on knowledge derived from data. After a basic introduction to the process of decision making, section 3.2 goes into more detail on how the probability of events and outcomes can be represented mathematically in a way relevant to decisions. Probability theory, which lies at the basis of the Bayesian theorem, is needed to develop the Bayesian Networks used to illustrate causality and can be used for inferences and estimating joint probabilities. Statistical decisions can be evaluated with different methods such as utility theory, prospect theory, the loss function and subjective probability, which will be discussed in detail.


The quantifications can be used in the development of a game setting where different choices lead to different outcomes. Game theory is discussed in chapter 4. Additionally, a game-theoretic approach can incorporate multiple players, which in some situations is essential; when multiple players are present, the pay-off depends on the actions of all players. Subsequently, in chapter 5 the concepts of reasoning and inference are introduced. Reasoning is essential for drawing logical conclusions and inferences. These concepts are to be used sensibly to minimise biases and thus provide better data. Data provenance provides insight into the nature of data. Bayesian Networks are introduced as a schematic representation of joint probabilities and conditional dependencies. Based on these graphs, inferences can be made about relations. However, even when underlying relations are found they are not necessarily true, which will be discussed in section 5.5 on data snooping. Lastly, opportunistic behaviour in decision-making is discussed.

Sample selection will be discussed in chapter 6. Often a sample is taken from a population and used in decision-making and reasoning. In establishing classification rules it is of the utmost importance that the sample is representative of the population it is taken from. In the assessment of individual cases, the representativeness for the population is less significant. Biases might arise when sample selection is not handled carefully. Several tools and methods are provided which can be used for reducing biases and ensuring the best possible representation of the population. Based on all the methods, findings and techniques discussed in the previous chapters, a framework is proposed in chapter 7 which aims to facilitate decision makers in their process by providing a process model applicable to decision-making processes. It is completed with critical questions that can be used by decision makers to improve both the decision process and the resulting decisions.

All the findings are summarised and evaluated in the final chapter. A brief overview of the most important points is recapitulated. The discussion reviews this thesis, acknowledges its shortcomings and suggests further research.


Chapter 2

Background information

2.1 Statistical evidence

In a relatively recent case, the court of appeal deemed evidence based on a Bayesian probability inadmissible. The judge in the case Nulty & Ors v Milton Keynes Borough Council, dated 2013, stated that “The chances of something happening in the future may be expressed in terms of percentage. Epidemiological evidence may enable doctors to say that on average smokers increase their risk of lung cancer by X%. However, you cannot properly say that there is a 25 percent chance that something has happened: Either it has or it has not.”, placing the Bayesian theorem in a difficult position (Nulty & Ors v Milton Keynes Borough Council, 2013). Moreover, as mentioned in N. Fenton et al. (2016), “It is quite clear that outside the field of DNA (and possibly other areas where there is a firm statistical base) this court has made it clear that Bayes’ theorem and likelihood ratios should not be used”. However, there are some cases where these methods were recognised in countries like the Netherlands, Slovenia, and Switzerland (N. Fenton et al., 2016).

A common form of statistical evidence is the DNA match. A DNA match indicates plausible evidence but has to be assessed from a statistical point of view. The interpretation of these findings requires knowledge of the process by which statistical information is derived. Verheij et al. (2016) state that these interpretations are often made by fact finders such as judges and juries. Note that those fact finders are often more used to narrative and argumentative approaches to reasoning involving arguments and scenarios, as opposed to forensic experts, who are more familiar with a probabilistic approach and are well trained in the interpretation of statistical findings (Vlek et al., 2014). Without the proper background, misinterpretation and miscommunication become a real danger, which induces various reasoning errors. Errors in probabilistic reasoning are amongst the most common. Thompson (2012) illustrates that the reliability of DNA tests is commonly overestimated; this nuance is easier to understand for forensic experts than for judges and juries. The author supports this claim by providing evidence of false positives and false negatives where innocent people are incarcerated and guilty people are liberated, respectively. This example illustrates that there are inadequate measures in place to avoid errors and biases. More specifically, in the case of DNA matches used as evidence, genetic pairs referred to as loci are matched with each other. Based on the assumption that 13 gene loci are statistically independent, the probability is calculated accordingly; e.g. there is a chance of 1 in 31 trillion to have 12 matching loci, which is considered indisputable (Schneps & Colmez, 2013, p. 82-83). Commonly 6 or 7 loci are required by law before running matching tests. However, an interesting discovery was made by Kathryn Troyer, discussed in Schneps and Colmez (2013, p. 82-83). In a database of 10,000 DNA profiles, she was able to find two matching profiles of unrelated people who shared 9 identical loci (which has a probability of 1 in 13 billion). This illustrates that despite the extremely small probability of this occurrence, it does occur. This phenomenon is not limited to DNA profiling but reaches into different areas where statistical evidence is unfounded. To illustrate some of the errors made in the production and interpretation of statistical evidence, Schneps and Colmez (2013) provide a review of common errors made in statistical evidence by examining various court cases. The authors categorise the following types of errors.

• Multiplying non-independent probabilities.

Probabilities can be multiplied when the events are independent of each other. Events which conditionally depend on each other should not be multiplied in this way. A more in-depth explanation is given in section 3.2.1.

• Unjustified estimates

According to the author, there is a substantial number of numerical estimates which are simply wrong. These errors can be caused intentionally or by accident and can be as simple as forgetting to include a decimal point.

• Trying to get something from nothing

The prior information and the characteristics of the sample, such as whether errors are likely to be distributed equally throughout the sample, should dictate which measures are to be used.

• Double experiment

When an experiment is reproduced multiple times and happens to provide the same outcome, the reliability increases with the number of experiments. The author presents a case where the judge did not value the second experiment since it provided the same information.

• The birthday problem

A well-known pitfall is the birthday problem, a puzzle that asks “How many people do you have to put in a room for there to be a 50-50 chance that two of them share the same birthday?”. The answer, 23 people, contradicts intuition (see the sketch after this list).

• Simpson’s paradox

Simpson's paradox describes the phenomenon that a trend which appears in each of several disjoint subgroups of a sample can disappear or even reverse when the subgroups are combined. The explanation can be found in the differences in group sizes, which carry different weights in the overall average.


• The incredible confidence

Often there are certain probabilities attached to events. However, when a probability is retroactively calculated for an event that has already occurred, the interpreter might become too confident (since it did occur) despite the very slim probability.

• Underestimation

The author argues that since we are used to coping with small distances and small amounts in general, we have become hard-wired to small quantities and therefore experience difficulties dealing with large quantities. To exemplify, the author gives the following example: “Assume that the world is a perfect sphere and wrap a wire tightly around the equator. Now take a second wire that is exactly one meter longer than the first one, and wrap that around the equator as well. Because this wire is longer, it will be slightly loose, a bit up off the ground all the way around. However, how high will it actually be? Can you slip a razor blade underneath?”. The answer of 16 cm might be surprising to most (the computation is included in the sketch after this list).

• Choosing a wrong model

Using mathematical models to explain real life situations can be deceptive as they have a tendency not to be completely accurate. Models simplify the complexity of real-world events to be useful. The greater the simplification the greater the chance of wrongful predictions.

• Mathematical madness

This error shares characteristics with the error addressed under incredible confidence. In contrast to that error, mathematical madness states that there is always something you will find when you run a hundred tests. Say you were to compose more than a dozen hypotheses regarding a data set: the more hypotheses you test, the more likely it becomes to find data supporting one of them (see the sketch after this list).
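Some of the pitfalls above can be checked with elementary computations. The short sketch below is an illustration added here, not taken from the cited sources; it reproduces the birthday-problem answer (assuming 365 equally likely birthdays), the 16 cm answer to the equator-wire puzzle, and the multiple-testing effect behind the mathematical madness error, assuming independent tests each run at a significance level of 0.05.

import math

# Birthday problem: smallest group size with >= 50% chance of a shared birthday.
def birthday_threshold(target: float = 0.5) -> int:
    p_all_distinct = 1.0
    n = 0
    while 1.0 - p_all_distinct < target:
        n += 1
        p_all_distinct *= (365 - (n - 1)) / 365
    return n

print(birthday_threshold())      # 23

# Equator wire: adding 1 m to the circumference raises the wire by 1/(2*pi) m.
print(1 / (2 * math.pi))         # ~0.159 m, i.e. roughly 16 cm

# Mathematical madness: chance of at least one false positive in 100
# independent tests, each run at significance level alpha = 0.05.
alpha, tests = 0.05, 100
print(1 - (1 - alpha) ** tests)  # ~0.994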

The examples above illustrate that errors in statistical evidence are common and stem from a variety of sources. These errors can be attributed to both the production and the interpretation of evidence. This also shows the need for a sensible approach to statistical evidence and to statistical evidence used in decision-making processes. Given this relevance, this thesis aims to explore some of the fundamentals of decision-making processes in order to reduce errors like the above.

2.2 Bayesian versus Frequentist

As briefly mentioned in the introduction of this thesis, there are compelling arguments to favour a Bayesian approach over a classical approach. Note that “classical approach” and “frequentist approach” are used interchangeably. Statistical inference is to be used as an interpretation of a phenomenon as opposed to an explanation (Robert, 2007). The statistical theories used in this thesis draw heavily on the Bayesian approach. This was a conscious decision and therefore warrants some additional information regarding the differences between the Bayesian theorem and the most common alternative, the frequentist approach. There are additional approaches to statistical inference, such as the fiducial approach (Fisher, 1935) or a randomised approach (J. O. Berger, 1985, p. 12). The fiducial approach, however, is ill-suited for statistical inference and logical reasoning, and its value is mainly historical (Seidenfeld, 1992); a randomised approach is only suited for randomised events such as a game of coin tossing (J. O. Berger, 1985, p. 12). Therefore the comparison focuses only on the difference between a frequentist approach and a Bayesian approach. This section provides a thorough comparison to illustrate why in most cases the Bayesian theorem is favourable as opposed to the alternative. Interestingly, Bland and Altman (1998) argue that most statisticians did not make a conscious decision regarding adherence to the Bayesian or the frequentist paradigm; it is commonly a result of the university they attended, with no knowledge of the alternatives acquired before it was too late and the definite choice had been made.

The Bayesian approach was originally developed as part of scientific inference by Jeffreys (1961). His approach has since been widely adopted among scholars (see Kass and Raftery (1995)) and is used for hypothesis testing. Since Bayesian analysis is a self-contained model within statistics, the method is first assessed in a broader scope before discussing various applications of Bayesian analysis in terms of decision-making analysis in section 3.3 and concerning causalities used for reasoning in section 5.3. Bayes' theorem is based on the assumption that there exists some probability distribution for quantities such as a population or proportions (Bland & Altman, 1998). Parameters are variable whereas data are fixed. A probability distribution discloses the prior knowledge and beliefs. Furthermore, a Bayesian approach can be used to express a degree of belief that events will occur in terms of probability, known as probability theory (see section 3.2.1), and additionally to express conditional dependencies and the degree of belief in the co-occurrence of events.
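For reference, Bayes' theorem itself, which underlies everything in this section, updates a prior distribution over a parameter θ into a posterior after observing data x:

\[
  P(\theta \mid x) \;=\; \frac{P(x \mid \theta)\, P(\theta)}{P(x)} \;\propto\; P(x \mid \theta)\, P(\theta),
\]

where P(θ) is the prior, P(x | θ) the likelihood, and P(θ | x) the posterior.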

The frequentist approach, as opposed to a Bayesian approach, is based on the assumption that the population is a fixed quantity and does not have a probability distribution. Furthermore, data are treated as a repeatable random sample: parameters are fixed and data are variable. A frequentist can subsequently calculate a confidence interval used for hypothesis testing and significance tests (the well-known p-values). With confidence intervals of 95%, hypotheses are either rejected or accepted. The issues of the misinterpretation of confidence intervals are addressed under Uncertainty as quantified probability. The following arguments compare these two approaches and are based on the arguments placed together by J. O. Berger (1985, p. 118-126).

Important Prior Information

As argued by J. O. Berger (1985), failure to utilise prior information can lead to conclusions that range from unlikely to utter nonsense. The Bayesian approach helps with using prior information in a sensible way. The comparison of the Bayesian theorem to a frequentist approach is best illustrated by the chances of obtaining a full house while engaged in a game of poker. Say there is a certain probability of obtaining such a card combination. A frequentist cannot take into account that the chances of actually winning the round might also depend on the opponents (say, playing against the world champion) (Wagenmakers et al., 2008).


Uncertainty as quantified probability

Expressing uncertainty can be achieved through a probabilistic expression. Be aware that a confidence interval is not equal to a probability. A common way to express confidence in findings is through significance levels (p-values); say there is a confidence of 95% for the interval [1,5]. It is important to be aware of the difference between 95% confidence and a probability of 95%; they are commonly, but falsely, used interchangeably (N. E. Fenton & Neil, 2013, p. 12). Classically, a 95% confidence interval entails that, over repeated sampling, 95% of the intervals constructed in this way contain the true value. However, this does not mean that the probability P(θ ∈ [1, 5]) = 95% (Wagenmakers et al., 2008; J. O. Berger, 1985). The probability statement says that the chance is 95% that θ lies within the interval [1,5], which is a fundamentally different concept: “a 95% confidence interval is not to be interpreted as an interval that has a 0.95 probability of containing θ” (J. O. Berger, 1985, p. 199). The problem with confidence intervals is that they are very often misinterpreted, given their counter-intuitive characteristics. N. E. Fenton and Neil (2013) state that even well-trained statisticians experience difficulties correctly interpreting p-values. To emphasise the magnitude of this problem, Ziliak and McCloskey (2004, 2008) conducted a survey of well-established journals and concluded that the vast majority contained flawed analysis caused by misinterpretation, suggesting economic losses as the result of those misinterpretations. J. O. Berger (1985) gives two issues that have to be addressed. Firstly, the philosophical question of what is the best method of quantifying uncertainty; based on previous studies (see J. O. Berger, 1985, p. 119) the author argues that there are compelling reasons to favour a Bayesian approach over a classical approach. Secondly, a practical issue arises with the interpretation of statistical conclusions: J. O. Berger (1985) argues that the majority of users are incapable of interpreting a classical confidence interval correctly and often read it as a Bayesian probability.
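The distinction can be made concrete with a small simulation, added here purely for illustration: the 95% refers to the procedure, in that roughly 95% of intervals constructed this way contain the fixed parameter, and not to a probability statement about any single computed interval. The example assumes normally distributed data with a known standard deviation; all numbers are arbitrary.

import random

# Coverage of a classical 95% confidence interval for the mean of a normal
# distribution with known standard deviation: roughly 95% of the intervals
# constructed over repeated samples contain the fixed true mean.
random.seed(1)
true_mean, sigma, n, z = 3.0, 1.0, 25, 1.96
repetitions = 10_000
covered = 0
for _ in range(repetitions):
    sample = [random.gauss(true_mean, sigma) for _ in range(n)]
    m = sum(sample) / n
    half_width = z * sigma / n ** 0.5
    if m - half_width <= true_mean <= m + half_width:
        covered += 1
print(covered / repetitions)  # close to 0.95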

Conditional viewpoint

The importance of a conditional viewpoint can easily be explained using an example based on the one given by J. O. Berger (1985, p. 24). Imagine a researcher trying to examine the performance of a hypothetical device. After running experiments, the researcher found two independent sets of outcomes, both equal in size. The first category showed that the performance was 50%. The second category showed that the performance was 100%. A frequentist approach would conclude that the overall performance of the device was 75%. Concluding that the actual performance is 75% makes little sense; the results are better understood given their conditionals. Thus a conditional viewpoint of the performance provides better insight into the actual performance of the device.

Coherency and Rationality

The Bayesian approach is built upon the foundation that there are a prior probability distribution and a utility function. Briefly described, a utility function assigns a value to the outcome given the action and the state of nature, see section 3.2.3. Given this prerequisite, an individual must have a certain level of preference and a preference ordering over various outcomes, actions, decision rules, statistical procedures, and inferences. J. O. Berger (1985) argues that having some preference ordering results in rationality. Violating the preferential order is counter-intuitive, since it amounts to “violating the common sense axiom of behaviour”, according to J. O. Berger (1985). Following the Bayesian approach, rational behaviour comes more intuitively, thereby exposing the irrationality in other approaches.

Selecting optimal methods

Whilst conducting statistical analysis it is natural to limit the statistical procedures to only relevant statistical tests. Furthermore, when facing a decision, it is natural to limit the consideration to only executable actions. In light of Bayesian versus frequentist, both theorems can be used for selecting a best-suited test. Imagine a situation where two hypotheses are tested. In a classical approach it is common to select a test based on a type I error probability of α = 0.05 or α = 0.01; this approach has been refuted by most statisticians, who argue that α should be determined based on a conscientious comparison of the two hypotheses. However, the conscientious comparison is vaguely defined and rather obscure (J. O. Berger, 1985). In contrast, the Bayesian theorem dictates the following approach:

i. determine the probability distributions of both hypotheses;
ii. identify the consequences of false positives and false negatives;
iii. use the best-suited Bayesian test based on the answers.

Based on the above, a test can be selected which corresponds to the most powerful test for some α level. Both approaches are used for selecting the most powerful test. The major difference between the two is that the latter, the Bayesian approach, gives explicit guidance for the selection process, whereas the former, the frequentist approach, fails to provide explicit guidance and remains ill-defined. Making an intuitive and subjective decision based on a Bayesian approach seems more adequate than making an intuitive and subjective decision purely based on the selection of some α level (J. O. Berger, 1985).
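A minimal numerical sketch of the three steps above is given below. The prior probabilities, likelihoods and the loss values assigned to false positives and false negatives are hypothetical; the point is only that, once they are stated explicitly, the choice between the two hypotheses follows mechanically from the expected losses.

# Hypothetical two-hypothesis decision: prior beliefs, likelihoods of the
# observed evidence under each hypothesis, and losses for the two error types.
prior = {"H0": 0.7, "H1": 0.3}
likelihood = {"H0": 0.10, "H1": 0.60}      # P(evidence | hypothesis)
loss = {("accept H0", "H1"): 10.0,         # false negative
        ("accept H1", "H0"): 1.0,          # false positive
        ("accept H0", "H0"): 0.0,
        ("accept H1", "H1"): 0.0}

# Step i: posterior distribution over the hypotheses (Bayes' theorem).
evidence = sum(prior[h] * likelihood[h] for h in prior)
posterior = {h: prior[h] * likelihood[h] / evidence for h in prior}

# Steps ii and iii: expected posterior loss of each action; pick the smallest.
actions = ["accept H0", "accept H1"]
expected_loss = {a: sum(posterior[h] * loss[(a, h)] for h in prior) for a in actions}
print(posterior)       # {'H0': 0.28, 'H1': 0.72}
print(expected_loss)   # {'accept H0': 7.2, 'accept H1': 0.28}
print(min(expected_loss, key=expected_loss.get))  # 'accept H1'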

Operational Advantages

A common counter-argument against the use of a Bayesian approach concerns its ease of use, which is a founded argument since the determination of a prior distribution and a loss function is not simple. It is accompanied by the argument that computers lack the power to compute such complex models; however, given the developments in computational power, this argument has become less relevant (N. E. Fenton & Neil, 2013). The ease of use nevertheless remains lower than with a classical approach. Still, according to J. O. Berger (1985), upon examining which approach yields the best answers, a Bayesian approach is the most likely to produce a better outcome.


Objectivity and Scientific Uncertainty

Another common critique against the use of a Bayesian approach is its supposed lack of objectivity. Objectivity is favourable in science as opposed to subjectivity; even though it is very hard to be completely objective, it should be pursued nonetheless. To use a Bayesian approach properly, one must take a conditional viewpoint. To refute this argument, a Bayesian approach can be applied with non-informative priors. By doing so, the subjective viewpoint is omitted and the effectiveness and objectivity of this approach can be illustrated (J. O. Berger, 1985).

Based on all these arguments, the Bayesian theorem is adhered to in this thesis. Most importantly, the difficulties in interpreting confidence intervals in a classical approach and the ability to formulate conditional standpoints in a Bayesian approach contributed to this decision. From a legal point of view, the conditional probabilities are particularly useful in reconstructing scenarios, as is the ability to calculate joint probabilities.


Chapter 3

Decision making

3.1 Introduction

This chapter begins by exploring the fundamental factors of decision making from a psychological point of view: how humans use their perception to value a prospect and the effort associated with obtaining that prospect. Based on the trade-off between those two, humans arrive at a decision. The preferences can be influenced just by reframing the relevant factors, which will be covered thereafter. After this general introduction to decision making, a third factor is introduced: the state of nature, which also influences the outcomes of choices. The probability of these outcomes, i.e. how likely they are to occur, can be quantified from a mathematical approach or a subjective approach. The mathematical approach is introduced in probability theory and its application is further examined in Bayesian analysis. Statistical decision theory is discussed, where decisions are (partly) based on insight gained from statistical information. Within statistical decision theory there are different approaches to quantifying the value of prospects, which will be discussed under prospect theory, utility theory, and the loss function. The quantification of preferences, based on the value of outcomes and their probability, is very useful for evaluating which outcome is most preferred. Lastly, quantifying the attitude towards falsely classified instances is very useful in the decision-making process. In the section on the F-measure, a method is discussed for how these quantities can be used to advantage. That section is slightly different from the other sections dealt with in this chapter, but it is important in the evaluation of the different strategies discussed in the proposed framework.

Making decisions is a crucial part of human life; decisions can determine the course of one's life or a business goal. A common classroom example of a simple choice is whether or not to grab an orange. In an experimental setting the answer can be either yes or no. Depending on the perceived value of something in contrast to the associated costs, the observed person makes a decision (Rangel & Clithero, 2013). Rangel and Clithero (2013) illustrate in figure 3.1 a common choice environment simulated in research. The green line depicts the curve associated with the effort required; the effort has a negative influence on the probability of choosing. The red line depicts the curve associated with the (perceived) value; the value (rating) has a positive influence on the probability of choosing.

Figure 3.1: Common choice environment

Even though figure 3.1 is a simple example, it covers some of the cognitive basis of why we choose certain things, and it is the basic assumption on which decision theory is built. Decision theory is concerned with the process of and the factors contributing to making decisions. The Oxford dictionary describes a decision as the resolution that is made after consideration (Oxford Dictionary of English, 2010).
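One common way to formalise the trade-off shown in figure 3.1 is a logistic choice rule, in which the probability of choosing rises with perceived value and falls with required effort. The functional form and the weights below are assumptions made purely for illustration; Rangel and Clithero (2013) do not prescribe this particular model.

import math

def choice_probability(value: float, effort: float,
                       w_value: float = 1.0, w_effort: float = 1.5,
                       bias: float = 0.0) -> float:
    """Hypothetical logistic choice rule: value pushes the probability of
    choosing up, effort pushes it down."""
    score = bias + w_value * value - w_effort * effort
    return 1.0 / (1.0 + math.exp(-score))

# A highly rated, low-effort option is likely chosen; a low-value,
# high-effort option is unlikely to be chosen.
print(choice_probability(value=2.0, effort=0.5))   # ~0.78
print(choice_probability(value=0.5, effort=2.0))   # ~0.08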

Decision theory, more formally described, is a collection of mathematical, logical, and philosophical theories of decision making by rational individuals or groups. These decisions can be made individually, in competition, and in groups (Resnik, 1987). The motivation behind this theory is to gain insight into how individuals or groups, given certain attributes, arrive at decisions. Resnik (1987) points out that it has become a fundamental part of important approaches to the philosophy of science, the theory of rationality and ethics.

It is common in the literature to find decision theory divided into two categories: normative (sometimes referred to as prescriptive) and descriptive. Normative decision theory is interested in how decisions are made by ideally rational agents, more specifically how decisions ought to be made. Descriptive decision theory, on the other hand, is concerned with the process of how decisions are made by agents, which are not necessarily completely rational. Even though it is common to find decision theory categorised as either one, some more abstract theories fit in neither and are more focussed on the logical and philosophical foundations (Resnik, 1987).

Besides the categorisation into normative and descriptive, decision theory can be branched into individual and group decisions. It is important to bear in mind that groups (such as governments, states, sports clubs) can act both as an individual and as a group. A group can act as an individual when it pursues an organisational target and can be treated as an individual; this type is assumed in this thesis. However, within a group, decisions about the best goals or priorities are part of a group decision. Group decisions are made amongst more than one individual, where the individuals prioritise maximising individual goals. When an agent acts as an individual, the principal objective is to maximise its own benefits.

The process of making a decision consists, in abstract terms, of three main parts: acts, states, and outcomes. Whilst making a decision, the relevant set of acts, states and outcomes are taken into account. Important or desirable entities are prioritised from each set in order to come to a decision. Table 3.1 illustrates a basic decision table with two states and two actions, with the outcomes based on those. The agent has two possible acts: light the wood or do not. Given the state of the wood, either wet or dry, lighting it will or will not result in a fire.

Table 3.1: Basic decision table

                          States
                          Wet wood   Dry wood
Acts   Light wood         No fire    Fire
       Do not light wood  No fire    No fire

Table 3.1 illustrates an intuitive example of a basic decision table. Often, in a more complex setting, there are costs associated with certain acts and the outcomes can vary in terms of value. These issues are discussed in section 3.2. Moreover, decisions are often based on uncertainties. Uncertainty can arise from various sources, e.g. there may be uncertainty as to what extent samples differ from the population, there may be uncertainty about tomorrow's events, or there may be uncertainty about unobservable values (e.g. diagnoses of cancer, based on descriptive symptoms) (Hand, Mannila, & Smyth, 2001, p. 94). A formal and mathematical representation of uncertainty in the form of probability is covered in section 3.2.1. There are some methods available to reduce uncertainty. The uncertainty reduction theory was first proposed by C. R. Berger and Calabrese (1975) and describes how uncertainty in social interactions can be reduced. However, I could not find any evidence of this theory being used outside social interactions.

Lastly, another approach to decision theory is from a psychological perspective. Tversky and Kahneman (1981) state that decision strategies can be altered depending on the framing. Since decisions can be altered by different formulations of the same problem, framing can have an impact on strategies and perspective, and it is important to acknowledge these potential issues. According to the article, the main reasons decisions are altered by framing can be summarised in the following points:

1. individuals might have a different perspective and preference in a different framing of the same problem.

2. individuals might lack knowledge of alternative frames which potentially have an effect on the effectiveness of the various options.

3. individuals strive to have an opinion independent of the framing but,

4. individuals experience difficulties upon encountering inconsistencies and lack the ability to resolve these.

An example of an inconsistency in preferences will be discussed in the framework after some additional relevant aspects are covered.


3.1.1 Statistical decision theory

Whereas table 3.1 illustrates a simple real-world decision table, in light of data analysis a statistical equivalent is necessary for the formalisation of the decision-making process. Statistical decision theory, in particular, is the process of decision making based upon statistical knowledge, which includes the uncertainties present in the decision problem (J. O. Berger, 1985). This theory does not significantly attribute to either normative or descriptive theory but addresses decision making more broadly. Regarding individual or group decisions, this theory leans towards decision theory based upon agents acting as individuals. A group decision can be formalised in a mathematical approach as well; however, this is beyond the scope of this thesis (see J. O. Berger (1985) for a mathematical approach to group decisions). Statistical decision theory combines the decision-making process and statistical knowledge of a population. Both statistical decision theory and Bayesian analysis attempt to use prior information to quantify uncertainties as probabilities, and they are often used together. J. O. Berger (1985, p. vii) argues that there is almost no point in learning one without the other. As argued in section 2.2, this thesis opts for a Bayesian approach as opposed to a frequentist approach.

Whereas the basic example (shown in table 3.1) is pretty straightforward and intuitive, moving to a representational form raises new issues. Often a decision is based on risk and uncertainty. Risk is a mathematical quantification and is the combination of the probability, the associated costs and the prospect; these will be discussed in the following sections. Uncertainty is a major factor in most decision cases, and it implies that we may get it wrong (Tobler & Weber, 2013). Because a decision is based on a representation of reality, the available data are limited and thus incomplete. Ideally, complete information is present. However, only a limited amount of information can be collected; it is usually impossible to collect everything. Costs are associated with the collection of data. Therefore the data collection is limited and focuses on (presumably) relevant data points or observations to reduce complexity. Reducing the complexity of a model leads to a more pragmatic solution since it leads to a more workable form (Robert, 2007, p. 5).

Uncertainty can be considered as an unknown set of quantities which represent the possible states in a vector or a matrix. The states are represented by θ (e.g. wet wood) and are called the parameter or the state of nature, where θ ∈ Θ. Additionally, the parameter space is noted as Θ (J. O. Berger, 1985). The particular actions (e.g. light wood) are denoted as a. All the possible actions form the collection A, where a ∈ A. The outcomes, given the state and actions, will be denoted as X, where X = (X1, X2, . . . , Xn) and Xi is an independent observation from the distribution. A particular realisation of X will be denoted as x. The entire set of possible outcomes, X, is the sample space.

Table 3.2: Basic statistical decision table (loss matrix)

    A \ Θ      θ1      θ2
    a1        -200    -300


Table 3.2 illustrates a fictional decision table where, based on the action and the given state of nature, there is a different pay-off. This example is a finite set, whereas more often than not in a real-world setting this is not the case, or there is no knowledge about the size. Furthermore, the table contains several actions and states, where (a1, a2) ∈ A and (θ1, θ2) ∈ Θ. Depending on the state of nature and the action, the entry may represent either a loss or a gain (e.g. -700 or 900). This table is also known as a loss matrix, since it provides insight into the possible losses depending on the variables (J. O. Berger, 1985). Bear in mind that in real-world decision problems there may not be a clearly formulated loss function and prior information as suggested in table 3.2. J. O. Berger (1985) states that in the majority of real-world problems these quantities range from very vague to non-unique.
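To make the loss-matrix formalism concrete, the short Python sketch below computes the Bayesian expected loss of each action under a prior over the states of nature and picks the action that minimises it. The prior probabilities and the a2 row are illustrative assumptions and are not taken from table 3.2.

    import numpy as np

    # Loss matrix L[a, θ]: rows are actions, columns are states of nature.
    # The a1 row follows table 3.2; the a2 row and the prior are illustrative only.
    loss = np.array([
        [-200.0, -300.0],   # a1 under θ1, θ2 (from table 3.2)
        [-700.0,  900.0],   # a2 under θ1, θ2 (hypothetical placement)
    ])

    # Assumed prior beliefs π(θ1), π(θ2) over the states of nature.
    prior = np.array([0.6, 0.4])

    # Bayesian expected loss of each action: E[L(a, θ)] = Σ_θ L(a, θ) π(θ).
    expected_loss = loss @ prior          # [-240.0, -60.0]

    best = int(np.argmin(expected_loss))  # index of the action with minimal expected loss
    print("expected losses:", expected_loss)
    print("chosen action: a%d" % (best + 1))

The same computation generalises to any finite loss matrix and any prior over the states of nature.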

3.2 Data-informed decision making

The objective in decision making is to find the best outcome, possibly minimising loss and/or maximising gain. Finding the best outcome in a particular context entails two major problems. The first problem is that the desired consequences are sometimes not easily measurable. Goodwill and reputation can be important desired effects, yet they are not easily measured on a point scale. J. O. Berger (1985) illustrates this problem with a prestigious company that is presented with the opportunity to market its products through a discount store. The short-term effect is increased sales; the long-term effect can harm the company's prestigious status, which may result in a decline of its steady customer base. The second problem is that, even if there is an obvious and straightforward scale to measure the effects of an action, the value is often not a true representation and has to be interpreted within a context. The true value of the decision can be misleading. J. O. Berger (1985) illustrates this with a situation in which $ 100 can be gained by doing a rather unpleasant task. The value of $ 100 can be considered objectively or marginally; while the objective value of $ 100 can be discussed, the marginal value is of greater significance in this context. Since the marginal value follows a curved line, it will either increase or decrease as the point is moved along the x-axis. When there is a prior amount of money involved, say $ 10.000.000, one is far less likely to execute the unpleasant task than when there is no prior money involved. Both issues will be discussed in more detail later on: the potential loss is discussed in section 3.2.5, whereas the true value of the consequences of an action is addressed in sections 3.2.3 and 3.2.4. Before addressing these topics, the basic principles of probability theory are covered in the next section.

3.2.1 Probability theory

Probability theory is fundamental to the Bayesian theorem. Several decision strategies are built on the basis of probability theory, which describes the degree of belief that a certain event will occur. The probability is used to either strengthen, weaken or maintain the degree of belief (Pearl, 2003, p. 32). Formally described, a probability is the degree of belief in propositions concerning real-world events. The (predicted) probability that an event A occurs is denoted as

P(A), where 0 ≤ P(A) ≤ 1.

The probability of rain tomorrow is noted as P(rain tomorrow) = 0.6 if there is a belief of a 60% chance of rain. A probability of 1 corresponds to a certain proposition that is guaranteed to be true, and a probability of 0 to one that is guaranteed to be false. Even though these probabilities are denoted as such, they are always to be understood within a larger context K, which comprises the assumptions taken as common knowledge. An example of such common knowledge is the fairness of the coin when tossing it, or of the die when rolling it.

The occurrence of an event is often linked to other events, which can be dependent, conditionally independent or completely independent. The probability of multiple events occurring together, a joint event, is denoted P(A, B) or P(A ∧ B). Joint events, e.g. the chance of finding money on the street while it is raining, can be independent of or dependent on each other. If the events A and B are mutually exclusive (e.g. heads or tails when tossing a coin: only one can be true, so they are disjoint), then P(A ∨ B) = P(A) + P(B). Therefore

P (A) = P (A, B) + P (A, ¬B) (3.1)

because B and ¬B are mutually exclusive and together exhaustive. In addition to 0 ≤ P(A) ≤ 1, a consequence is that the belief in a proposition and its antithesis must sum to the total belief,

P (A) + P (¬A) = 1. (3.2)

If B is not a single event but a set of mutually exclusive events Bi, i = 1, 2, . . . , n, then

P(A) = Σi P(A, Bi)    (3.3)

This is referred to as the law of total probability (Pearl, 2003, p. 33). Throwing two dice provides an example: let A be the event that both dice show the same value and let Bi, i = 1, 2, . . . , 6, be the mutually exclusive events for the value of the second die. Each joint event P(A, Bi) has probability 1/36, and summing over the six of them gives

P(A) = Σi P(A, Bi) = 6 × 1/36 = 1/6.
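As an illustration of the law of total probability, the sketch below enumerates the 36 equally likely outcomes of two fair dice and recovers the 1/6 result above by summing the joint probabilities P(A, Bi) over the mutually exclusive events Bi; the event names match the dice example and carry no further meaning.

    from fractions import Fraction
    from itertools import product

    # All 36 equally likely outcomes (first die, second die) of two fair dice.
    outcomes = list(product(range(1, 7), repeat=2))

    # A: both dice show the same value.  B_i: the second die shows i.
    # Law of total probability: P(A) = Σ_i P(A, B_i) over the exclusive events B_i.
    p_equal = Fraction(0)
    for i in range(1, 7):
        joint = [o for o in outcomes if o[1] == i and o[0] == o[1]]
        p_equal += Fraction(len(joint), len(outcomes))

    print(p_equal)  # 1/6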

In contrast to mutually exclusive events, conditional probabilities concern events that depend on each other. The probability of being fired, for instance, depends on the probability of actually being employed in the first place. The conditional probability denotes the degree of belief in A when event B is given with absolute certainty and is written P(A | B). When the events are independent, P(A | B) = P(A), since P(B) does not contribute to P(A).

P(A | B) = P(A, B) / P(B)    (3.4)

Rewriting this formula gives P(A, B) = P(A | B) P(B).
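A minimal sketch of equation (3.4) is given below, using a hypothetical joint distribution over two binary events; it also checks the rewritten form P(A, B) = P(A | B) P(B). The numbers are chosen only for illustration.

    # Hypothetical joint probabilities over two binary events A and B.
    p_joint = {
        (True, True): 0.12,    # P(A, B)
        (True, False): 0.28,   # P(A, ¬B)
        (False, True): 0.18,   # P(¬A, B)
        (False, False): 0.42,  # P(¬A, ¬B)
    }

    p_b = p_joint[(True, True)] + p_joint[(False, True)]    # P(B) = 0.30
    p_a_given_b = p_joint[(True, True)] / p_b                # equation (3.4): P(A | B)

    # The rewritten form: the joint probability factorises as P(A, B) = P(A | B) P(B).
    assert abs(p_a_given_b * p_b - p_joint[(True, True)]) < 1e-12

    print(round(p_a_given_b, 2))  # 0.4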

Knowledge of conditional probabilities is beneficial for the understanding of assumption-based reasoning. The quantification of a probability and of joint probabilities will be used later in the evaluation of the different outcomes faced by a decision maker in section 3.3. Probability theory is fundamental to the Bayesian theorem and can be used for the construction of the prior probability distribution, a set of probabilities for the possible outcomes. The probability distribution depends on θ; therefore the probability of an event A can be formalised as Pθ(X ∈ A). When X is a discrete set, then

Pθ(A) = Σx∈A f(x | θ).

The establishment of a prior probability distribution can be challenging and is therefore sometimes unsuitable (J. O. Berger, 1985). The construction of a prior probability distribution is beyond the scope of this thesis and has been well documented in existing research (e.g., J. O. Berger, 1985; Hand et al., 2001).
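As a concrete, purely illustrative instance of the discrete case above, the sketch below assumes a binomial sampling model f(x | θ) with n = 10 trials and computes Pθ(X ∈ A) for the event A = {x : x ≥ 8}; neither the model nor the numbers are prescribed by the text.

    from math import comb

    def f(x, theta, n=10):
        """Assumed binomial sampling model f(x | θ): x successes in n trials."""
        return comb(n, x) * theta**x * (1 - theta)**(n - x)

    # P_θ(X ∈ A) for the event A = {x : x >= 8}, evaluated at θ = 0.5.
    theta = 0.5
    A = range(8, 11)
    p_A = sum(f(x, theta) for x in A)

    print(round(p_A, 4))  # 0.0547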

3.2.2 Subjective probability

Since the establishment of a prior probability distribution can be complex, another approach is discussed here. As opposed to a mathematical approach for quantifying probabilities, a subjective approach can be used, in which the probabilities are assessed based on subjective belief. Depending on the available resources and the given situation, either a mathematical or a subjective approach is more suited. It is common to have a personal belief about the probability of certain events; for example, it is likely that one has some degree of belief about tomorrow's weather based on personal experience (what the weather was like the last couple of days) and the cycle of the seasons (Kahneman & Tversky, 1972). Betting on sporting events is another example that is based upon personal belief. Subjective probability can be used to quantify such personal beliefs, and there are several ways of applying it in statistical cases (J. O. Berger, 1985). The first method is quite straightforward: assume one's belief in an event is P(E) = 3/4, so that the probability of its complement is P(¬E) = 1/4. The assumption is that E is considered three times as likely to occur as ¬E. This notion is quite intuitive and can therefore be used relatively easily.

The alternative approach suggested by J. O. Berger (1985) is to formalise a betting scenario for P(E): a negative prospect occurs when E occurs and z is lost; consequently ¬E results in a payoff of (1 − z), where 0 ≤ z ≤ 1. For the gamble to be fair, the total expected utility must equal 0. Therefore 0 = U(−z)P(E) + U(1 − z)P(¬E), and rewriting this formula for P(E) yields

P(E) = U(1 − z) / (U(1 − z) − U(−z)).

This equation can be used to determine the z value at which the gamble becomes fair, that is, at which the overall expected utility equals zero. Even though this provides a mechanism for quantifying uncertainty, its performance as an operational device is limited.
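The relation can also be used in the other direction: given a subjective belief P(E) and a utility function, the fair stake z is the one for which the expected utility of the gamble is zero. The sketch below finds this z by bisection, using an assumed utility U(r) = 1 − e^(−r) and P(E) = 0.75; both choices are illustrative and not prescribed by the text.

    from math import exp

    def U(r):
        """Assumed concave utility with U(0) = 0; negative for losses."""
        return 1.0 - exp(-r)

    def fair_stake(p_event, tol=1e-10):
        """Find z in (0, 1) such that P(E) U(-z) + P(¬E) U(1 - z) = 0, by bisection."""
        lo, hi = 0.0, 1.0
        while hi - lo > tol:
            z = (lo + hi) / 2.0
            g = p_event * U(-z) + (1.0 - p_event) * U(1.0 - z)
            if g > 0.0:   # gamble still favourable on average: the stake can be raised
                lo = z
            else:         # gamble unfavourable: the stake must be lowered
                hi = z
        return (lo + hi) / 2.0

    print(round(fair_stake(0.75), 4))  # fair stake z for a subjective belief P(E) = 0.75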


Both methods can be used to quantify uncertainty by formalising subjective probability. J. O. Berger (1985) argues that there are a few practical issues that have to be addressed. Firstly, it is common for people without knowledge of statistics to judge the probability of joint events incorrectly, for instance when combining conditionally independent events. Secondly, people experience difficulties when it comes to estimating very small probabilities (Tversky & Kahneman, 1974), such as the probability of winning a lottery; according to Tversky and Kahneman (1974), small quantities are hard to grasp and are not judged adequately. Assigning a subjective probability score to events can nevertheless provide a workable and straightforward solution when there is no sound mathematical approach or when such an approach is deemed too complex. Since subjective probability is far from uncommon, formalising it can provide valuable insight. Therefore it should be a consideration when evaluating different strategies for determining a probability.

3.2.3 Utility Theory

After the probability has been quantified, it is important to understand how much the different outcomes are valued and, more importantly, how they are valued relative to each other. By assigning numbers that indicate how much different outcomes are valued, the outcomes can be prioritised; a mathematical representation of the value discloses the preference of one outcome over the other (Fishburn, 1970, p. 2). Utility theory is used for the development of these numbers or indicators, and the numbers indicating how much a prospect is valued are called utilities (J. O. Berger, 1985). Firstly, it is necessary to define which prospects are considered relevant. Let 𝒳 denote the set of all prospects. For these prospects there is a certain degree of belief that each will actually occur; P denotes the set of all probability distributions. Utility theory is standard practice in economics and can be found in issues such as premium determination, risk aversion, Pareto optimal risk exchanges, and market equilibrium (Gerber & Pafum, 1998). Besides economics, utility theory can be used for estimating the values of different outcomes in a decision-making process.

U(r) is a function defined on the rewards r ∈ R. For every P ∈ P there is an expected utility, denoted EP[U(r)]. The expected utility takes into account both the prospects and their probabilities; the expected utility function is referred to as the utility function. The process of deriving and/or calculating a utility function is complex and reaches beyond the scope of this thesis, so we assume that such a function U(r) exists and can be computed. For a detailed explanation of a method to compose an expected utility function, see J. O. Berger (1985, p. 47-55). For the expected utility theory, the following axioms should be met:

1. Completeness: ∀P1, P2 ∈ P: P1 ⪯ P2 or P2 ⪯ P1

2. Transitivity: ∀P1, P2, P3 ∈ P: if P1 ≺ P2 and P2 ≺ P3, then P1 ≺ P3

3. Continuity: ∀P1, P2, P3 ∈ P with P1 ≺ P2 ≺ P3, ∃λ ∈ [0, 1] such that λP1 + (1 − λ)P3 ∼ P2

4. Independence: ∀P1, P2, P3 ∈ P with P1 ≺ P2 and ∀α ∈ (0, 1]: αP1 + (1 − α)P3 ≺ αP2 + (1 − α)P3

The axioms mentioned above correspond with rational behaviour. Axiom 1, completeness, states that for any two distributions there is always a preference for one over the other, or the preference holds mutually for both. Axiom 2, transitivity, states that if there is a preference for P1 over P2 and for P2 over P3, then P1 is also preferred over P3. Axiom 3, continuity, states that if P1 is preferred over P2 and P2 is preferred over P3, there is a mixture of P1 and P3 sufficiently close to P1 that is still preferred over P2. The last axiom, independence, states that if P1 is preferred over P2, then a mixture of P1 with some third distribution P3 is also preferred over the same mixture of P2 with P3. When αP1 + (1 − α)P3 is compared with αP2 + (1 − α)P3, the emphasis should be on the distinction between P1 and P2, since both compounds share the same α and P3. In other words, when P3 is substituted for a part of both P1 and P2, the preferential order should be unaffected (Levin, 2006). The expected utility theory only holds when all axioms are met; in section 3.2.4 the validity of these axioms is questioned for situations in which people are presented with real-world dilemmas.
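A small numerical illustration of the expected utility EP[U(r)] and of the independence axiom is sketched below; the lotteries P1, P2, P3 and the square-root utility are hypothetical choices made for this example only.

    def U(r):
        """Assumed concave utility over non-negative monetary prospects."""
        return r ** 0.5

    def expected_utility(lottery):
        """E_P[U(r)] for a discrete lottery given as {prospect: probability}."""
        return sum(p * U(r) for r, p in lottery.items())

    def mix(alpha, Pa, Pb):
        """The compound lottery alpha*Pa + (1 - alpha)*Pb."""
        out = {}
        for r, p in Pa.items():
            out[r] = out.get(r, 0.0) + alpha * p
        for r, p in Pb.items():
            out[r] = out.get(r, 0.0) + (1.0 - alpha) * p
        return out

    # Hypothetical lotteries over monetary prospects.
    P1 = {0: 0.5, 100: 0.5}
    P2 = {36: 1.0}
    P3 = {400: 1.0}

    # P1 ≺ P2 under U (expected utilities 5.0 and 6.0) ...
    assert expected_utility(P1) < expected_utility(P2)

    # ... and, as the independence axiom requires, mixing both with P3 preserves the order.
    alpha = 0.5
    assert expected_utility(mix(alpha, P1, P3)) < expected_utility(mix(alpha, P2, P3))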

Figure 3.2: Common utility function (U(r) plotted against r)

A typical function U(r) is shown in figure 3.2. This figure can be used to explain the example given in the introduction of section 3.2, where the difference between no prior money and $ 10.000.000 of prior money is compared in light of an additional $ 100. The difference between the points a and a + 100 decreases as a increases. When a = 0, an additional $ 100 is appealing; however, when a = 10.000.000, the marginal value of an additional $ 100 is a lot smaller. Another example is the marginal value of apples, where two apples are not necessarily twice as rewarding as one apple (Tobler & Weber, 2013). If the marginal value is substantial, the willingness to perform certain actions is greater than when the marginal value is minimal. This is an important aspect to bear in mind when considering possible actions.
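The diminishing marginal value described above can be illustrated with any concave utility function; the sketch below assumes U(r) = √r and compares the utility gain of an extra $ 100 at zero prior wealth with the gain at a prior wealth of $ 10.000.000. The choice of √r is an assumption for illustration, not the curve shown in figure 3.2.

    from math import sqrt

    def U(r):
        """Assumed concave utility; the argument only requires concavity."""
        return sqrt(r)

    def marginal_value(wealth, extra=100):
        """U(a + extra) - U(a): the marginal utility of an additional amount."""
        return U(wealth + extra) - U(wealth)

    print(round(marginal_value(0), 4))           # 10.0    -> an extra $ 100 matters a lot
    print(round(marginal_value(10_000_000), 4))  # 0.0158  -> the same $ 100 barely registers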

3.2.4 The Prospect Theory

Even though the utility theory has been widely adopted (see Gerber and Pafum (1998) for different applications of the utility theory), Kahneman and Tversky (1979) argue that users of the utility theory fail to obey the required axioms consistently: the preferences people express often do not comply with the axioms. Therefore the authors propose a different model, the prospect theory. This is not an entirely new theory but rather an addition to the utility theory.
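As a preview of that addition, the sketch below implements the value function commonly associated with Kahneman and Tversky's prospect theory, which is defined over gains and losses relative to a reference point and is steeper for losses than for gains. The functional form and parameter values used here are illustrative estimates reported in the literature, not a formulation taken from this thesis.

    def value(x, alpha=0.88, lam=2.25):
        """Prospect-theory style value function over gains and losses relative to a
        reference point: concave for gains, convex and steeper for losses
        (loss aversion). Parameter values are illustrative literature estimates."""
        if x >= 0:
            return x ** alpha
        return -lam * (-x) ** alpha

    # A loss looms larger than an equally sized gain.
    print(round(value(100), 2))   # 57.54
    print(round(value(-100), 2))  # -129.47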
