Effectiveness of training in Bayesian inference on accuracy of posterior probability judgment

Effectiveness of Training in Bayesian Inference on Accuracy of Posterior Probability Judgment

Shurin Hase
B.A., University of Western Ontario, 1998

A Thesis Submitted in Partial Fulfillment of the Requirements for the Degree of
MASTER OF ARTS
in the Department of Psychology

© Shurin Hase, 2004
University of Victoria

All rights reserved. This thesis may not be reproduced in whole or in part, by photocopy or other means, without the permission of the author.


ABSTRACT

The effectiveness of training to improve participants' posterior probability judgment was investigated. Eighty undergraduate students were randomly assigned to a training or control group. Participants' deviation scores from normative solutions were used to evaluate the effect of training. The experiment demonstrated that training in Bayesian inference was an effective means of improving the accuracy of posterior probability judgments and, conversely, of attenuating a judgment bias known as the inverse fallacy, namely, the tendency to confuse posterior probabilities with their inverse conditional probabilities. The effect of providing diagrams was also investigated at the test phase. Diagrams facilitated normative posterior probability judgment for both groups.

Participants also filled out the need for cognition (NFC) scale, which measures people's general tendency toward reflective thinking. The effect of training was significant across individual differences on the NFC scale. Findings extended previous research by demonstrating that evaluating the effect of training in terms of deviation scores was dependable.

Table of Contents

Abstract
List of Tables
List of Figures
Introduction
  Probability Background
  The Inverse Fallacy
  Facilitating Normative Probability Judgment
  Need for Cognition
  Summary
Methods
  Participants
  Materials and Design
  Training Program
  Quizzes for the Control Group
  The Need for Cognition Scale
  Game Show Quiz as Dependent Variable
  Cab Problem and Disease Problem
  Procedure
Results
  The Effect of Training
  The Analysis of Accuracy across Items
  The Effect of Training on the Inverse Algorithm
  The Effect of Training on the Cab and Disease Problems
  The Effect of Training across Individual Differences
Discussion
References
Appendixes
  The Training Program Slides
  The Need for Cognition Scale

List of Tables

Table 1. Mystery Person's Answer and Percent of Group A and Group B on the Described Characteristics
Table 2. Mean Absolute Deviation Scores and Standard Deviations for Training by Diagram
Table 3. Mean Deviation Scores and Standard Deviations on the Accuracy across Items
Table 4. Mean Deviation and Standard Deviation with Magnitude by Direction
Table 5. Mean Absolute Deviation Scores and Standard Deviations for the Cab Problem
Table 6. Mean Absolute Deviation Scores and Standard Deviations for the Disease Problem

List of Figures

Figure 1. Organizing the Statistical Information into a Frequency Tree
Figure 2. Obtaining a Bayesian Answer
Figure 3. Question Format for Mystery Person #1 with a Diagram
Figure 4. Mean Absolute Deviation Scores for the Two Groups
Figure 5. Mean Deviation Scores with Stimuli for Overestimation
Figure 6. Mean Deviation Scores with Stimuli for Underestimation
Figure 7. Mean Absolute Deviation Scores for the Cab Problem
Figure 8. Mean Absolute Deviation Scores for the Disease Problem


Introduction

Empirical research (Eddy, 1982; Kahneman, Slovic, & Tversky, 1982; Kahneman & Tversky, 1996; Loftus, 1996; Wagenaar, 1988) in the field of judgment and decision making indicates that people do not follow normative probability theory when they think about the likelihood of an event. Thus, it is believed that people's probability judgments are incoherent and unreliable in real life. This claim carries a significant implication, namely, that people's probability judgments are irrational in the sense of not acting optimally in pursuit of their goals. To explain nonnormative judgment, Kahneman, Slovic, and Tversky (1982) suggested that people often perceive external cues in terms of a psychological interpretation rather than a statistical interpretation. That is, the mind interprets the world using its past experience and available information to understand the meaning of the event or situation. In addition, because of limited mental computational capacity, people often use simple general rules, called heuristics, to guide their probability judgments.

A heuristic allows people to reduce a complex task to a simple and efficient strategy. However, a heuristic may also lead to imprecise processing, which may result in systematic biases. Kahneman and Tversky (1996) compared such judgment biases to cognitive illusions. Even though a cognitive illusion does not correspond with the way things really are, researchers can study biases to understand how people actually interpret probabilistic information to make probability judgments in everyday life. For instance, when asked to estimate the probability that "John is an engineer rather than a lawyer," people often rely on the similarity of John's personality profile to the stereotypes of engineers and lawyers, while ignoring the base rates that give statistically important information about the number of engineers and lawyers, namely the relative sizes of population subgroups, when judging the likelihood of contingent events involving the subgroup. This tendency to focus on similarity cues between categories was named the representativeness heuristic (Kahneman, Slovic, & Tversky, 1982).

However, many issues concerning what variables actually determine people's nonnormative probability judgment have not been settled. For instance, subsequent research showed that the representativeness heuristic can be influenced by the nature of the problem, the sophistication of the subjects, and the presence of clues or other demand characteristics, and that it is not clear whether or to what degree people rely on this heuristic (Goldman, 1986). Furthermore, Gigerenzer (1987) suggested that, given the limitation of human computational capability, people have a collection of specialized cognitive tools that natural selection has built into the human mind for adaptive probability judgment. Finally, unless there is consensus about the choice or application of the normative criteria for rationality, the interpretation of empirical data can be debatable (Cohen, 1981).

Probability background

In spite of the complexity of probability judgment in real life, possible ways of measuring uncertainty have been contemplated by many great thinkers. In the beginning, Aristotle defined 'probable' as that which usually occurs, to characterize plausible opinions or likely sense impressions. Medieval and Renaissance philosophers applied the concept to beliefs and opinions by counting the number of authorities who supported them. Since then, several concepts of probability have been introduced. Among diverse theories of probability, the ideas of how such probabilities are actually conceived and how they ought to be conceived have been a topic of controversy. Yet another distinction refers to realist and idealist theories of probability (Cohen, 1989). When one treats probability as computable in relation to events, objects, classes of events or objects, natural kinds, or features in reality (e.g., relative frequencies of events), one is referring to realist theories. When one treats probability as computable in relation to arguments, propositions, beliefs, or other elements of our thoughts about reality (e.g., probability as the strength of belief), one is referring to idealist theories of probability. Such views necessarily conflict with one another. However, theories that are based on principles such as the 'principle of indifference' have included both realist and idealist views.

Regarding the axiomatic development of the probability calculus, Kolmogorov gave axioms that allow one to distinguish problems about probability function syntax from problems about propositional semantics. Furthermore, the axioms made it possible to state precisely the premises under which any given real events could be regarded as relative frequencies in the context of random experiments with sets of outcomes as events. As a result, with respect to the principle of indifference, when one defines the probability of an outcome as the ratio of the number of favorable cases to the total number of equally possible cases, a realist believes that two events should be treated as equally probable because there is nothing to cause one type of outcome rather than another. On the other hand, an idealist interprets that when one does not know of any reason to prefer one event to another prior to one's judgment (with respect to gaps in his or her knowledge), two events are equally possible because there is no reason to expect one type of outcome rather than another.

If the conditional probability p(A|B), which refers to the probability that A is true given that B is true, is understood as the relative frequency of As among Bs, then not only do the observed frequencies relative to the total number of equally possible cases serve as evidence for the assessment of probabilities, but the probability simply is the overall relative frequency of As among Bs (i.e., p(A|B) = p(B|A)p(A)/p(B)). With a finite number of events, the mathematics of probability can be reduced to the arithmetic of rational numbers. However, when p(A|B) is to be thought of as a propensity, one measures the probability as the degree of causal or physical relation, such that the .5 probability that a coin will fall tails is a physical propensity that is operative when tossing a fair coin. Generally, the issues of probability reasoning are more concerned with statistical tendencies than with fundamental physical properties or causal relations (empirical science). Thus, the focus on propensity to define probability makes the additivity principle ambiguous. The additivity principle defines the sum of the probabilities of two mutually exhaustive and exclusive complementary hypotheses to equal unity: p(A|B) + p(not-A|B) = 1. Thus, if one thinks of probability in terms of propensity, then one can either compare the extent of B's propensity for A with the extent to which B has no propensity for A, or one can compare the extent of B's propensity for A with the extent to which B has a propensity for not-A. The former requires additive judgments for negation; the latter needs to avoid additive judgments. For instance, drinking a cup of tea a day may have a small probability of making a person well, but it does not have a complementary probability of making a person unwell (Cohen, 1989).

When one defines a judgment of probability as his or her degree of rational belief, subjective aspects of probability judgments can complicate the estimation of probabilities. However, by treating p(A|B) as measuring the ratio of the lowest odds acceptable on A&B to the lowest odds acceptable on B alone, one can treat this ratio as evaluating the strength of a person's belief in the truth of A given just B as evidence. Thus, the ratio would constitute a measure of conditional belief. By construing the question in terms of how to reason about probabilities rather than what one's probabilities are, p(A|B) is the probability assigned to A where B states the only relevant additional fact to become known.

Suppose the question raised refers to the probability of the hypothesis H (e.g., a person has cancer). One can begin by assigning a prior value to the probability for data, D (e.g., a mammography test), given the truth of H. In order to find the posterior probability, p(H|D), one cannot simply take the conditional probability, p(D|H), because the probability of H given D is not necessarily identical with the probability of D given H. It is, however, possible to exploit the mathematical law governing the inversion of a probability, which connects the two conditional probabilities, p(D|H) and p(H|D), to derive the posterior probability as Thomas Bayes did:

p(H|D) = p(D|H)p(H) / [p(D|H)p(H) + p(D|-H)p(-H)]

According to Bayes' theorem, H and -H refer to two alternative hypotheses (e.g., "having breast cancer" and "not having breast cancer"). The probability of the focal hypothesis H, p(H), and the probability of the alternative hypothesis -H, p(-H), are called prior probabilities (or base rates when they are based on estimates of long-range relative frequencies). Let D represent a new bit of data (e.g., a test result). Thus, p(D|H) and p(D|-H) refer to the conditional probabilities of obtaining the data, D, given that the focal hypothesis is true or false, respectively. So, one who obtains new data should update his or her degree of belief in the hypothesis H by conditioning on the data to find the posterior probability, p(H|D).

Furthermore, because p(A|B) = p(B|A)p(A)/p(B) where p(B) > 0, if one knows the values of p(H) and p(D), a simpler formula can be derived: p(H|D) = p(D|H)p(H)/p(D). The formula states that the posterior probability, p(H|D), is proportional to the product of the prior probability, p(H), and the diagnostic probability, p(D|H), divided by the probability of the data, p(D). As a model of optimal revision of initial beliefs, Bayes' theorem has been considered the normative model of how one should update his or her degree of belief based on the evidence in well-structured probability assessment problems.
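The two-hypothesis form of Bayes' theorem above is straightforward to sketch in code. The numbers in the example are illustrative assumptions only (a 1% base rate and a 9.6% false-positive rate, roughly in the spirit of Eddy's mammography case discussed below); the thesis does not fix them here.

```python
def posterior(p_h, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem: p(H|D) = p(D|H)p(H) / [p(D|H)p(H) + p(D|-H)p(-H)]."""
    p_d = p_d_given_h * p_h + p_d_given_not_h * (1 - p_h)
    return p_d_given_h * p_h / p_d

# Assumed illustrative values: 1% prior, 80% hit rate, 9.6% false-positive rate.
p = posterior(p_h=0.01, p_d_given_h=0.80, p_d_given_not_h=0.096)
print(round(p, 3))  # 0.078 -- far below the 80% diagnostic probability
```

Note how the posterior tracks the low base rate far more closely than the diagnostic probability does; this is exactly the relation that the inverse fallacy, discussed next, ignores.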

The inverse fallacy

Past psychological studies showed that people's probability judgments deviate from Bayes' theorem. Such instances can be found in the evaluation of forensic evidence in criminal trials, in medical diagnosis, and in hypothesis testing. When forensic evidence was presented in terms of a mathematical probability, the jury often confused the probability of the DNA matching the defendant's if the defendant is innocent, p(DNA match | innocent), with the posterior probability that the defendant is innocent if there is a DNA match, p(innocent | DNA match) (Wagenaar, 1988). The matching probability is usually very small (e.g., 1/1,000,000,000), but the posterior probability could be any value. In the case of medical diagnoses, medical experts often equate the posterior probability (predictive accuracy) and its inverse probability (retrospective accuracy) (Eddy, 1982). Eddy found that when physicians were asked "What is the probability that a particular woman's breast lesion is malignant given a positive mammography test result?", many (95 out of 100 in an informal sample taken by Eddy) estimated the posterior probability based on the accuracy of mammography. So, when the accuracy of mammography was around 80%, they assessed the probability of cancer to be about 75% rather than the 7% predicted by Bayes' theorem. Even research scientists often fail to discriminate the posterior probability from its inverse probability (Krueger, 2000; Loftus, 1996; Meehl, 1978). With respect to the null hypothesis testing procedure, psychologists analyze the data to obtain how likely the obtained data are given that chance factors alone are operating, p(data | H0), but instead often believe that the obtained probability is the probability that the null hypothesis is true given the obtained data, p(H0 | data). In sum, these findings have important implications: (a) by confusing two conditional probabilities, a jury could reach a wrong verdict if the forensic evidence was the main evidence available; (b) physicians could misdiagnose a disease; and (c) scientific progress is hindered when researchers misinterpret their findings.

In order to understand the psychological cause of biased probability judgment, Kahneman, Slovic, and Tversky (1982) suggested that the Bayesian way of combining available evidence with the prior probability (or base rate) is not intuitive for many people. They suggested that people's errors in estimating the posterior probability were due to relying on heuristics that are easily accessible (e.g., similarity cues or causal relations) rather than using an analytical assessment of all the data, such as the base rate. To support this hypothesis, Kahneman, Slovic, and Tversky (1982) gave the cab problem to participants:

A cab was involved in a hit and run accident at night. Two cab companies, the Green and the Blue, operate in the city. (a) 85% of the cabs in the city are Green and 15% are Blue. (b) A witness identified the cab as Blue. The court tested the reliability of the witness under the same circumstances that existed on the night of the accident and concluded that the witness correctly identified each one of the two colors 80% of the time and failed 20% of the time. (Kahneman, Slovic, & Tversky, 1982)

They found that participants gave the witness's reliability of 80% when answering the question as to whether it was a Blue cab rather than a Green cab that was involved in the accident. The base rate of cabs in the city (85% Green and 15% Blue) was ignored by most subjects. However, when the base rate was described as the proportion of cabs involved in accidents in the past, they adjusted their posterior probabilities to be around .6, which indicated the use of the base rate by those participants. This suggested that when the base rate is causally related to the problem, people often use the base rate, whereas when the base rate is not apparently related to the problem within a particular causal order, people often ignore it (a.k.a. the base rate fallacy). This result was consistent with the notion that the psychological interpretation of different kinds of base rates in probability assessment problems can explain why people treat the base rate as important or irrelevant, because according to their analysis, the posterior probability that 'the cab involved in the accident was Green' is the same regardless of whether the base rate refers to the cabs in the city (incidental base rate) or the rate of accidents (causal base rate). It follows that how much people's posterior probability judgments deviate

from normative solutions can be described as a function of the psychological interpretation of cues such as causality or similarity cues.

Much research has been done on the use of heuristics by asking participants to provide various probability judgments. However, recent research indicates multiple interpretations of such empirical data. For example, Koehler (1995) suggested that real-world base rates are often undetermined and constantly changing, and that people's use of the base rate should depend on the problem structure in the ecological environment and the internal task representation. Thus, unless researchers know more about the accuracy of the model applied to a given problem and people's interpretation of the provided information (prior probabilities, base rates), it is difficult to interpret the empirical data in a straightforward manner. Baratgin (2000), on the other hand, suggested that people's deviation from the normative solution might be due to theoretical constraints on probability theory rather than use of a particular heuristic. For example, Baratgin found that participants often ignore the additivity rule in cases of binary complementary judgments (i.e., numeric probability estimates for two mutually exclusive and exhaustive complementary hypotheses should sum to unity). However, when probabilistic cues were included in the question to promote a probabilistic interpretation (e.g., an instruction to make sure the two estimates add up to 100%), participants gave more normative probability judgments.
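For reference, the normative answer to the cab problem quoted earlier follows directly from Bayes' theorem, using only values given in the problem (15% Blue base rate, 80% witness reliability):

```python
# Cab problem: p(Blue | witness says Blue).
p_blue = 0.15           # base rate of Blue cabs in the city
p_hit = 0.80            # witness correctly identifies a Blue cab as Blue
p_false_alarm = 0.20    # witness mistakes a Green cab for Blue

numerator = p_hit * p_blue
posterior_blue = numerator / (numerator + p_false_alarm * (1 - p_blue))
print(round(posterior_blue, 2))  # 0.41, not the 0.80 most subjects report
```

The witness's 80% reliability thus translates into a posterior of only about .41 once the 15% base rate is taken into account.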

Several factors could explain the neglect of the complementarity of events: (a) participants may consider hypotheses individually instead of as a complementary set; (b) participants may focus on only one hypothesis, thus forgetting the other in their judgment; (c) participants may mistake the diagnostic probabilities (e.g., p(D|H) or p(D|-H)) for the posterior probabilities (e.g., p(H|D) and p(-H|D), respectively). Consequently, more research is needed to investigate how people derive their posterior probabilities and to understand why people deviate from Bayes' theorem.

Distinguishing judgment output from the judgment process using multiple trials, Villejoubert and Mandel (2002) found that people's judgments are not complementary because people confuse posterior probabilities with their inverse conditional probabilities. In other words, people use the inverse algorithm, which relies solely on the diagnostic probability to assess the posterior probability. They asked participants to predict whether a creature was a "Glom" or a "Fizo," two kinds of invisible creatures that participants meet. Each time the participant meets a creature, she or he can ask one question about the creature's features (e.g., "Do you play the harmonica?"). The participants were also given the following information: a true response by the creature (e.g., yes or no) and the prevalence of a certain feature (e.g., 98% of Gloms and 58% of Fizos play the harmonica).

About half of the participants estimated the posterior probability, p(H|D), as the diagnostic probability, p(D|H), on over 80% of the judgment trials, and those judgments were predicted in the direction of overestimation or underestimation. Both estimations contained predicted estimates of large, medium, and small magnitude. Because participants who commit the inverse fallacy exclusively rely on

the diagnostic probabilities rather than finding the posterior probabilities in the Bayesian manner, it was possible to predict the direction and magnitude of probability estimates. That is, when the diagnostic probability exceeded its posterior probability, p(D|H) > p(H|D), participants overestimated the posterior probability, and when the diagnostic probability was less than its posterior probability, p(D|H) < p(H|D), participants underestimated the posterior probability.

Consequently, when the sum of the two diagnostic probabilities for Glom and Fizo was more than unity, p(D|H) + p(D|-H) > 1, the sum of the two complementary posterior probabilities was subadditive (the two mutually exclusive and exhaustive hypotheses added up to more than unity). The term 'subadditive' refers to complementary probabilities where the implicit disjunctions of events are given less weight than their explicitly decomposed equivalents (i.e., p(H) < p(H|D) + p(-H|D)). Likewise, when the sum of the two diagnostic probabilities for Glom and Fizo was less than unity, p(D|H) + p(D|-H) < 1, the sum of the two complementary posterior probabilities was superadditive; the term 'superadditive' refers to complementary probabilities where the implicit disjunctions of events are given more weight than their explicitly decomposed equivalents (i.e., p(H) > p(H|D) + p(-H|D)).
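The pattern can be illustrated with the Glom/Fizo numbers given earlier (98% of Gloms and 58% of Fizos play the harmonica). The equal base rates in the Bayesian comparison are an assumption for illustration, since the thesis does not restate them here.

```python
p_yes_given_glom = 0.98   # diagnostic probability for "Glom"
p_yes_given_fizo = 0.58   # diagnostic probability for "Fizo"

# Inverse algorithm: report each diagnostic probability as the posterior.
inverse_sum = p_yes_given_glom + p_yes_given_fizo
print(round(inverse_sum, 2))  # 1.56 -- complements exceed unity (subadditive pattern)

# Bayesian posteriors (assuming equal base rates) necessarily sum to 1.
p_glom = (p_yes_given_glom * 0.5) / (p_yes_given_glom * 0.5 + p_yes_given_fizo * 0.5)
p_fizo = 1.0 - p_glom
print(round(p_glom, 2), round(p_glom + p_fizo, 2))  # 0.63 1.0
```

Whatever the base rates, Bayesian posteriors for two exhaustive hypotheses sum to unity, so a sum above or below 1 is a signature of the inverse algorithm.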

Villejoubert and Mandel (2002) argued that when participants were asked to provide posterior probabilities, not only were the base rates neglected, but the complementary posterior probability judgments also did not add up to unity, precisely because the majority of participants used the inverse algorithm, by which one considers only the diagnostic probability to derive the posterior probability.


Accordingly, if people are using an inappropriate algorithm to judge posterior probabilities, then by learning an appropriate one, people should improve their posterior probability judgments. The present experiment was carried out to test this hypothesis. Several studies point in the same direction. For instance, Mill, Gray, and Mandel (1994) investigated whether research methods and statistics courses would improve students' general reasoning abilities. They found that although research methods and statistics courses by themselves did not have an impact on students' reasoning abilities, a series of brief training sessions that helped students see the relationship between statistical concepts and everyday life (e.g., the use of a control group in everyday-type situations to evaluate a particular argument or claim) significantly improved students' general reasoning skills.

Similarly, Lehman, Lempert, and Nisbett (1988) demonstrated that graduate training can influence students' statistical reasoning, because such training promotes the importance of certain statistical concepts (e.g., use of a control group in an experiment) in addition to teaching specific procedures and tools for arriving at correct answers. Furthermore, Fong, Krantz, and Nisbett (1986) also indicated that people can learn to use the law of large numbers in their everyday reasoning after a series of tutorials using concrete examples. They argued that people already possess statistical heuristics and statistical rules, such as a preference for more rather than less evidence and application of the base rate in certain situations. They found that when participants were taught the law of large numbers at the beginning of the semester and were asked to evaluate statistical arguments at the end of the semester, those who were taught the law of large numbers with concrete examples were much more statistically sophisticated in answering probability questions than those in the control, rule-only, or example-only conditions. Most importantly, they claimed that statistical training actually promoted the adoption of a distributional view, that is, thinking of events and situations statistically rather than on the basis of intuition or anecdotal evidence.

Although the inverse fallacy is well known, no research to date has examined the effects of training to correct this bias. The lack of such data represents a gap in the debiasing literature, especially with respect to the psychological aspects of probability judgment. In the next section I review some proposals from the field of cognitive psychology regarding how to facilitate people's normative judgment. These theoretical ideas were incorporated into the training program.

Facilitating normative probability judgment

Sedlmeier and Gigerenzer (2001) argued that when people judge probabilities using natural frequencies (e.g., 80 women out of 100 will get a positive test), their judgments are more normative than when people judge probabilities using single-event probabilities (e.g., 80% will get a positive test). The implication is that changing the way the information is presented can affect the amount of computation required for normative posterior probabilities. Accordingly, their participants were given training in one of three conditions: frequency tree, probability tree, or rule learning. At the testing session, participants were given the cab problem. They found that the immediate training effect was strong in all three groups. However, participants in the frequency tree group showed no decay after three months compared to the other two groups.
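The frequency tree idea can be made concrete with a small worked example. The counts below are illustrative assumptions (not Sedlmeier and Gigerenzer's materials): out of 1,000 women, 10 have the disease and 8 of those test positive, while 99 of the 990 without the disease also test positive.

```python
# Leaves of a natural-frequency tree: 1,000 cases split by disease, then by test result.
positive_and_disease = 8    # of the 10 women with the disease
positive_no_disease = 99    # of the 990 women without the disease

# With natural frequencies, the Bayesian posterior is a simple ratio of branch counts.
p_disease_given_positive = positive_and_disease / (positive_and_disease + positive_no_disease)
print(round(p_disease_given_positive, 3))  # 0.075
```

No explicit multiplication by priors is needed; the branch counts already embody the base rate, which is one reason the frequency format is thought to ease the computation.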

Macchi (2000) suggested that when probability information clearly indicates the relationships between the base rate, p(H), and the diagnostic probabilities, p(D|H) and p(D|-H), this clarification facilitates people's posterior probability judgments. Thus, when participants read in the cab problem, "The witness made correct identifications in 80% of the cases and erred in 20% of the cases," they inadvertently read those probabilities as posterior probabilities rather than as diagnostic probabilities. Instead, they should have been given the sentence "The witness recognized as Blue 80% of the Blue cabs and mistook 20% of the Green cabs for Blue cabs." Macchi (2000) argued that clear communication might be the most important aspect of normative probability judgment. Accordingly, she found that 70% of the subjects gave Bayesian answers when the problem was clearly written, whereas only about 30% of the subjects gave Bayesian answers in the ambiguous version.

Focusing on visual presentation of information, Yamagishi (2003) suggested that diagrammatic presentation of statistical information could change the mental representation into one that is more effective for processing probabilistic information. The emphasis is on visually displaying the statistical data so that participants can arrive at the normative solutions. Yamagishi (2003) tested several difficult probability problems (e.g., the prisoner's dilemma) using roulette-diagram, tree-diagram, and frequency-format conditions, and found that the roulette and tree diagram conditions outperformed the frequency condition. He argued that the visual display of statistical information could be the most suitable strategy for improving probability judgments.

Need for Cognition

Baron (1988) distinguished the capacity and dispositional aspects of intelligence with respect to normative judgment and the conception of rationality. IQ or computational capacity might be important aspects of rationality that could explain people's nonnormative probability judgments. Baron, on the other hand, emphasized that individual differences and psychological variables such as an individual's cognitive style, thinking dispositions, attitudes, goals, or motivation are just as important in explaining rationality. While Baron surveyed broad aspects of rationality, Stanovich and West (2000) tested individuals' dispositional differences to see whether those differences could explain probability judgment biases and nonnormative probability judgments.

One of the dispositional variables was measured by the need for cognition (NFC) scale (Cacioppo, Petty, & Kao, 1984). The NFC scale has been considered to reflect a generalized tendency toward reflective thinking, which includes a tendency toward thoughtful analysis, a desire for understanding, and a tendency toward greater information search. Individuals who score high on the NFC scale tend to seek, acquire, think about, and reflect back on information to make sense of stimuli, other people, and events in the world; individuals who score low on the NFC scale tend to rely on others, cognitive heuristics, or social comparison processes to provide this structure. Although the scale is known to be influenced by one's experience, knowledge, and attitudes, Stanovich and West found that participants who scored high on the NFC scale responded in a statistical manner significantly more often than those who scored low. The present study investigated whether the effect of training was significant across individual differences on the NFC scale.

Summary

One of the research goals was to investigate whether training in Bayesian inference would effectively improve participants' posterior probability judgments. Simultaneously, the effect of diagrams on participants' posterior probability judgments was examined. Participants' absolute deviation scores were computed to test the hypothesis that those in the training group would have significantly smaller absolute deviation scores than those in the control group. It was also hypothesized that the use of diagrams would facilitate participants' normative posterior probability judgments in both groups. No interaction was expected. Furthermore, the effect of training was expected to be consistent across the test questions.

The second goal was to investigate whether the effect of training can be analyzed in terms of the inverse algorithm. Deviation scores were computed for each participant. By manipulating the diagnostic probabilities with respect to direction (i.e., overestimation when p(D|H) > p(H|D) and underestimation when p(D|H) < p(H|D)) and magnitude of deviation (i.e., large, medium, and small) from the Bayesian solution, one could see the extent of the effect of training on the inverse algorithm between the training and control groups. This analysis was done as part of evaluating the effect of training on a specific aspect of the bias, and no prediction was made.

The cab problem and the disease problem were also tested. The cab problem was described with an emphasis on the clear distinction between the base rate and the diagnostic probabilities, as in Macchi's (2000) version. Absolute deviation scores were compared to see the effect of training and diagrams. It was hypothesized that the effect of training would also be significant for posterior probability questions that contain incidental base rates. An incidental base rate refers to the lack of any causal factor that explains why any particular instance is more likely to yield one outcome rather than another, while a causal base rate supports such an inference. Thus, the cab problem has been a topic of controversy with respect to whether Bayes' theorem is the correct model for describing what people actually use to derive posterior probabilities. However, assuming the effectiveness of the training program, it was hypothesized that the training group would give more normative posterior probabilities than the control group. No effect of providing diagrams, nor an interaction, was expected. The disease problem was also tested to investigate the effect of training with an extremely low base rate probability in addition to its incidental base rate. It has been known that people are insensitive or less intuitive when dealing with extremely low probabilities. It was hypothesized that the difference would be more apparent with this question because it gives the diagnostic probability as 80% while the posterior probability is 0.3% due to the extremely low base rate.

Finally, the relationship between the Need for Cognition (NFC) scale (a measure of an individual's tendency toward thoughtful analysis) and participants' normative posterior probabilities was investigated to see whether the effect of training was significant across individual differences. High scores on the NFC scale have been positively related to normative probability judgments (Stanovich & West, 2000). In this study, it was hypothesized that the effect of training would be significant across individual differences on the NFC scale, evidenced by no association between the NFC scale and participants' posterior probability judgments. There were two independent variables: training (training vs. control) and diagram (with diagram vs. without diagram at the test phase). The dependent variables consisted of the answers to the 12 sets of posterior probability questions, the cab problem, the disease problem, and participants' NFC scale scores.


Method

Participants

Eighty undergraduate students in the Psychology 100 introductory course at the University of Victoria participated in the experiment. Each student signed up for the experiment to get extra credit for the course. Students had the chance to win a $50 prize for the highest score on the posterior probability questions. They were also told that participation was voluntary and that the experiment was not a test that had any bearing on their grade.

Materials and Design

Tutorial program. The tutorial program contained four medical examples showing how to organize and integrate statistical information to obtain normative posterior probabilities. At the beginning, participants were welcomed to the training program on the computer and instructed to proceed to the first example on how to obtain a correct probability. The first step showed how to integrate probability information into a frequency tree (see Figure 1). After the participants proceeded step by step to complete the frequency tree, they were shown how to compute posterior probabilities using a ratio (see Figure 2). Care was taken to ensure the participants paid attention to the content of the program. That is, before they were shown the construction of the frequency tree, they were asked to write down their own solution. At the end of each example, they were again instructed to comment on their own solution by comparing it to the correct answer.



Now, we will work through one approach that presents the solution as a graphically illustrated step-by-step process. First, let's start with a relatively large sample of the general population, say, 1,000 people. Then let's divide this sample into the percentage that has Hep-C and the remaining percentage that does not have Hep-C. Since 10% of the population have Hep-C, we expect that 100 of the 1,000 people will have Hep-C and the remaining 900 will not.

Figure 1. Organizing the Statistical Information into a Frequency Tree.

A frequency tree diagram divides the observed events into four subclasses. The population size is shown at the top of the tree above. The population size is broken down into the two middle nodes indicating the base rate frequencies from the statistical information available. One node indicates the number of cases for which the hypothesis is true and the other the number of cases for which the hypothesis is not true.


We're almost there! To figure out the probability of John having Hep-C given that he tested positive, you will want to know how many out of the 450 people who test positive also have Hep-C. That ratio will be the answer to John's question. Since we know that 90 of the 450 people who test positive also have Hep-C, we simply divide 90 over 450 to get the answer. Since 901450 = .20, there is exactly a 20% chance that John has Hep-C given his positive test result.

[The completed frequency tree is shown here, with nodes labeled "with Hep-C," "without Hep-C," "Test Positive," and "Test Negative."]

Figure 2. Obtaining a Normative Answer.

The four nodes at the bottom of the diagram above divide the base rate frequencies based on the evidence obtained (e.g., a medical test result). The posterior probability of having cancer in the light of a positive result can be calculated by dividing the number of positive test results given that the hypothesis is true (e.g., cancer) by the sum of the number of positive results given that the hypothesis is true and given that it is not true. Bayes' theorem is often quoted in a form attuned to cases in which there are clear probabilities p(H), p(-H), etc., for mutually incompatible, collectively exhaustive hypotheses H and -H, and clear conditional probabilities p(D|H), p(D|-H) for data D on each of them.
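As a check on the worked Hep-C example, the frequency-tree arithmetic can be sketched in a few lines of Python. The 90% hit rate and 40% false-positive rate are not stated directly in the example but are inferred here from the counts of 90 and 450 given above; this is a minimal sketch, not part of the tutorial program itself.

```python
# Frequency-tree version of Bayes' theorem for the Hep-C example.
population = 1000
base_rate = 0.10                          # 10% of people have Hep-C

with_hep = round(population * base_rate)  # 100 people with Hep-C
without_hep = population - with_hep       # 900 people without

# Rates inferred from the worked figures: 90 of the 100 carriers test
# positive, and 450 people test positive in total, so 360 of the 900
# non-carriers must also test positive (a 40% false-positive rate).
pos_with = round(with_hep * 0.90)         # 90
pos_without = round(without_hep * 0.40)   # 360

posterior = pos_with / (pos_with + pos_without)
print(posterior)                          # 90 / 450 = 0.2
```

The ratio of true positives to all positives reproduces the 20% answer from Figure 2.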


Quizzes for the control group. The quiz booklet for the control group contained 14 sets of problems that were not related to probability judgment. They were taken from Fletcher's (1971) book. For example, the students were presented with a brief instruction to demonstrate their knowledge of English vocabulary as follows:

On each line below, underline the two words which mean most nearly the same. Example: person, man, lad.

1. Absurd, logical, preposterous, popular
2. Receive, deceive, accept, disown
3. Negligent, unimportant, careless, cautious
4. Comparable, intricate, comprehensible, understandable
5. Conquer, achieve, find, accomplish
6. Soft, fragile, severed, brittle
7. Serene, seething, mobility, tranquil
8. Subservient, menial, manly, morbid
9. Stupid, idle, activity, inactive
10. Transient, immutable, transport, momentary

The Need for Cognition scale. A short form of the Need for Cognition (NFC) scale was used to measure the tendency toward reflective thinking (Cacioppo, Petty, & Kao, 1984). Participants were instructed to indicate their responses to the 18 statements (e.g., "I really enjoy a task that involves coming up with new solutions to problems") by circling a number on a 9-point rating scale (-4 = very strongly disagree, 0 = neither agree nor disagree, 4 = very strongly agree). NFC scale items can be found in the appendix.


Reading comprehension. The reading comprehension test was used as the filler task before the test phase. The task consisted of a set of readings typical of what undergraduates read in university courses. Each passage was followed by a set of multiple-choice questions. Participants worked on the reading comprehension test for 10 minutes.

Game show quiz as dependent variable. At the test phase, participants were asked to complete the posterior probability questionnaire. The questionnaire contained 12 sets of posterior probability questions, the cab problem, and the disease problem.

The diagnostic probabilities were controlled in such a way that a Mystery person's response (i.e., yes or no) corresponded to the same set of diagnostic values (see Table 1). The order of stimulus presentation was randomized. The order of questions was the same for all participants.

Participants were asked to imagine that they were contestants in a game show. They were then introduced to twelve 'Mystery persons', one person at a time, by the game show host. The participants were told that their task was to classify each mystery person into one of two categories, Group A or Group B (Group A has 100 people and Group B has 100 people for all the questions). To start, the host would ask each mystery person one question about a particular characteristic that he or she might possess (e.g., 'Do you smoke cigarettes?'). Each mystery person answers honestly, either 'yes' or 'no'. Then, the host provides the


Table 1

Mystery Person's Answer and Percent of Group A and Group B with the Described Characteristics

[The table lists, for each of the 12 mystery persons, the answer given ("yes" or "no") and the percentage of Group A and Group B members possessing the characteristic in question. The 12 characteristics were: play the harmonica, been a firefighter, dance, sing, have a flying license, afraid of heights, smoke, drink beer, have a child, eat meat, eat seafood, and skateboard. For example, for "afraid of heights" the answer was "no," with 40% of Group A and 80% of Group B possessing the characteristic.]

Note: For "no" responses, participants were provided with statistical information complementary to the percentages shown. For example, participants would read 42% and 2% for Group A and Group B respectively for "been a firefighter" rather than 58% for Group A.

percentage in Group A and in Group B who possess that characteristic (e.g., 2% of members of Group A smoke cigarettes and 42% of members of Group B smoke cigarettes). Finally, the participants would answer three questions for each of the 12 mystery persons: (1) What is the probability that this person belongs to Group A? (2) What is the probability that this person belongs to Group B? (3) Pick either Group A or B (see Figure 3).

Cab problem and disease problem as dependent variables. Participants were asked to assess the probability that the cab involved in the accident was Blue rather than Green. Only two cab companies operate in the city, the Green cab company and the Blue cab company. The following statistical information was given:

85% of the cabs in the city are Green cabs, 15% are Blue cabs.

A witness identified the cab as a Blue cab. The court tested the witness's ability to identify cabs under the same conditions that existed on the night of the accident.

When presented with a sample of cabs (half of which were Blue cabs, the rest Green cabs), the witness made correct identifications in 80% of the cases and erred in 20% of the cases. In other words, the witness correctly identified a Green cab as Green 80% of the time and wrongly identified a Green cab as Blue 20% of the time. Similarly, the witness correctly identified a Blue cab as Blue 80% of the time and wrongly identified a Blue cab as Green 20% of the time.
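Combining these figures under Bayes' theorem gives a normative answer of roughly 41% that the cab was Blue, well below the 80% diagnostic figure. A minimal sketch of the computation, using only the numbers stated above:

```python
# Posterior probability that the cab was Blue, given the witness said "Blue".
p_blue, p_green = 0.15, 0.85          # base rates of the two cab companies
p_say_blue_given_blue = 0.80          # witness identifies a Blue cab correctly
p_say_blue_given_green = 0.20         # witness misidentifies a Green cab as Blue

posterior = (p_blue * p_say_blue_given_blue) / (
    p_blue * p_say_blue_given_blue + p_green * p_say_blue_given_green)
print(round(posterior, 3))            # 0.12 / 0.29, about 0.414
```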


Host: MYSTERY PERSON #1, I will now ask you a question and you will respond honestly. The question is: Do you smoke cigarettes?

Mystery Person #1: Yes, I do smoke cigarettes.

Host: Thank you! Now Contestant, to make your task easier, I will also let you know that 2% of members of Group A smoke cigarettes and 42% of members of Group B smoke cigarettes, as summarized in this figure:

[A frequency tree follows: the sample of people splits into Group A (100 people) and Group B (100 people); Group A splits into 2 who smoke and 98 who don't smoke, Group B into 42 who smoke and 58 who don't smoke.]

Host: Now Contestant, on the basis of what you know from the Mystery Person's response and the information I gave you, please answer the following three questions:

1. What is your estimate of the probability that Mystery Person #1 belongs to Group A? Please indicate your response by writing down a value from 0 to 100 where 0 stands for "absolutely no chance at all" and 100 stands for "absolutely certain".

The probability that this individual belongs to Group A is

2. What is your estimate of the probability that Mystery Person #1 belongs to Group B? Please indicate your response by writing down a value from 0 to 100 where 0 stands for "absolutely no chance at all" and 100 stands for "absolutely certain".

The probability that this individual belongs to Group B is

3. Finally, if you had to pick Group A or Group B as the group that Mystery Person #1 belongs to, which would it be? Circle one:

[GROUP A] [GROUP B]

Host: Thank you for your responses. Now please turn the page to meet Mystery Person #2.
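For this item the normative answer follows directly from the tree: of the 44 smokers across the two equal-sized groups, 2 are in Group A, so p(A | smokes) = 2/44 and p(B | smokes) = 42/44. A minimal sketch of that arithmetic:

```python
# Posterior group membership given a "yes" answer to the smoking question.
group_size = 100                      # both groups have 100 members
smokers_a = 0.02 * group_size         # 2 smokers in Group A
smokers_b = 0.42 * group_size         # 42 smokers in Group B

p_a = smokers_a / (smokers_a + smokers_b)
p_b = smokers_b / (smokers_a + smokers_b)
print(round(100 * p_a, 1), round(100 * p_b, 1))   # about 4.5 and 95.5 percent
```

Because the base rates are equal, the posterior reduces to the ratio of smokers in each group, which is the comparison the frequency tree makes visible.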


After the cab problem, participants proceeded to the disease problem. They were asked to assess the probability that a patient has a disease given that he or she obtained a negative test result. The following statistical information was given:

Imagine you are a psychotherapist trying to diagnose which type of disorder one of your clients is likely to have. You think that your client might have a disorder called psycho-meiotic personality disorder (PMPD). Only 1% of the population has PMPD. Your client completes a diagnostic test. Past research has shown that 80% of people with PMPD who take the test will test positive on it and 80% of people who don't have PMPD will test negative on it. Your client gets a negative result on the diagnostic test. What is your estimate of the probability that your client has PMPD?
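With a 1% base rate, the normative answer here is tiny even though the diagnostic probabilities are 80%: p(PMPD | negative) = (.01 x .20) / (.01 x .20 + .99 x .80), about a quarter of one percent. A minimal sketch, using only the numbers in the vignette:

```python
# Posterior probability of PMPD given a negative test result.
base_rate = 0.01                      # 1% of the population has PMPD
p_neg_given_pmpd = 1 - 0.80           # 80% of PMPD cases test positive
p_neg_given_healthy = 0.80            # 80% of non-cases test negative

posterior = (base_rate * p_neg_given_pmpd) / (
    base_rate * p_neg_given_pmpd
    + (1 - base_rate) * p_neg_given_healthy)
print(round(100 * posterior, 2))      # about 0.25 percent
```

A respondent who reports the 80% diagnostic figure, or even 20%, has overestimated the posterior by roughly two orders of magnitude, which is what makes this item a sensitive test of the training.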

Procedure

An experimenter welcomed each participant and briefly introduced the experiment as a probability judgment study. Then, participants in the training group were introduced to the computerized training program and instructed to go through the program carefully at their own pace. They were free to ask any technical questions, but were encouraged to go back to the previous screens if they did not understand the content of the training program. Also, participants were instructed to write down their own solution for each example before proceeding, and then to comment on their solution compared to the correct solution. In place of the training program, participants in the control group were given a booklet containing 8 sets of quizzes that were not related to probability judgment problems. Participants were asked to work on those quizzes for 15 minutes.

After the training phase, all participants were asked to fill out the Need for Cognition (NFC) survey and worked on a 10-minute reading comprehension test. At the test phase, half of the training and control groups received the posterior probability questionnaire with diagrams and the other half without diagrams. Participants were debriefed by the experimenter at the end of the experiment.

Results

Effectiveness of training on normative posterior probability judgments.

It was hypothesized that training in Bayesian inference and the use of diagrams would effectively improve participants' posterior probability judgments. Ten participants were replaced for various reasons: the first 6 participants were replaced because they were not provided with a calculator, 1 participant was replaced because she provided the same answer for all the problems, and 3 were replaced because the experiment was interrupted by fire alarms. Participants' answers for the game show quiz were transformed into absolute deviation scores by subtracting the corresponding Bayesian solution and taking the absolute value. Thus, a higher score indicated a larger absolute deviation from normative responding. To examine the effects of training and diagrams on the absolute deviation scores, a 2 (training vs. control) x 2 (with diagram vs. without diagram) analysis of variance (ANOVA) was calculated. The alpha level was set at .05 (two-tailed) for all analyses. The normality and homogeneity of variance assumptions were both supported. Group means and standard deviations are shown in Table 2 and in Figure 4.
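The scoring rule described above can be sketched in a few lines; the response and normative values below are hypothetical illustrations, not data from the study:

```python
# Mean absolute deviation from the Bayesian answers for one participant.
# Both lists are hypothetical example values (in percent).
responses = [20.0, 45.0, 80.0]        # participant's probability estimates
normative = [4.5, 50.0, 95.5]         # corresponding Bayesian solutions

deviations = [abs(r - n) for r, n in zip(responses, normative)]
mean_abs_dev = sum(deviations) / len(deviations)
print(mean_abs_dev)                   # (15.5 + 5.0 + 15.5) / 3 = 12.0
```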

As expected, the two-way ANOVA found a main effect of training, F(1, 76) = 24.43, p < .001, η² = .243. The training group had significantly smaller absolute deviation scores, M = 10.52, SD = 10.18, than the control group, M = 19.99, SD = 7.26. The effect size was considered to be very large. There was also a main effect of providing diagrams, F(1, 76) = 7.10, p = .009, η² = .085. Participants' posterior probabilities were significantly more normative with diagrams, M = 12.70, SD = 9.39, than without diagrams, M = 17.81, SD = 10.04, indicating that diagrams facilitated participants' normative posterior probability judgments. The effect size was medium to large. There was no significant interaction, F(1, 76) < 1, p = .81, ns. The effects of training and providing diagrams were consistent across the groups.

Analysis of accuracy across items. To examine the extent to which a participant's response to any one item on the posterior probability questions is a good indicator of his or her performance on the other questions, reliability across questions was tested. The posterior probability questions were grouped into a set of four blocks. Reliability analysis showed a satisfactory result, Cronbach's α = .90. Accuracy across the game show quiz was also analyzed. The hypothesis that there


Table 2

Mean Absolute Deviation Scores and Standard Deviations by Training by Diagram

Condition          Training   Control   Mean
With Diagram
  M                7.75       17.66     12.70
  SD               9.35       6.17      9.39
Without Diagram
  M                13.30      22.31     17.81
  SD               10.45      7.41      10.04
Mean
  M                10.52      19.99
  SD               10.18      7.26

Note: The mean absolute deviation score was obtained first by subtracting the normative answer from each score. Then the absolute value of each deviation score was obtained. Finally, the mean of the absolute deviations was obtained. The higher the score, the greater the deviation.


[Figure 4: mean absolute deviation scores for the training group and the control group; x-axis labeled "Types of Group".]

would be a consistent effect of training was supported. A 2 (training vs. control) x 2 (with diagram vs. without diagram) x 4 (Q1-2-3, Q4-5-6, Q7-8-9, Q10-11-12) mixed analysis of variance (ANOVA) with four question blocks as the within-subject factor was calculated. The sphericity assumption was not met, so the Huynh-Feldt correction was applied. As predicted, the ANOVA revealed significant main effects for training, F(1, 76) = 24.43, p < .001, η² = .243, and providing diagrams, F(1, 76) = 7.09, p = .009, η² = .085. There was also a statistically significant main effect across blocks, F(2.72, 206.44) = 7.69, p < .001, η² = .092. Within-subject contrasts indicated that accuracy was reduced during the 4th block (last block) for both the training and the control groups. There was no significant difference among the first three blocks. There was no significant interaction, F < 1, indicating that the effects of training and diagram were present throughout the test phase. The means and standard deviations for accuracy across the sets are shown in Table 3.

Effects of training on the inverse fallacy. Previous research indicated that the inverse algorithm can be demonstrated by systematically biased posterior probability estimates as a function of the difference between the posterior probability and its inverse probability, and by patterns of complementary probabilities as a function of the sum of the inverse probabilities. Thus, Villejoubert and Mandel (2002) were able to predict the direction and magnitude of deviations


Table 3

Mean Absolute Deviation Scores and Standard Deviations on the Accuracy across Items

Block              1        2        3        4
Training Group
  M                9.40     8.80     10.98    10.62
Control Group
  M                19.05    19.11    19.08    22.72
  SD               8.25     7.42     10.11    9.82


from Bayesian responding. Because those who use the inverse algorithm rely only on the diagnostic probability, p(D|H), the stimuli were manipulated in such a way that if participants used the inverse algorithm, their deviation scores would be positive (overestimation) when p(H|D) < p(D|H) and negative (underestimation) when p(H|D) > p(D|H). Furthermore, participants' deviation scores for overestimation could be predicted to be large, medium, and small when p(D1|H1) - p(H1|D1) > p(D2|H2) - p(H2|D2) > p(D3|H3) - p(H3|D3) respectively, and for underestimation to be large, medium, and small when p(D1|H1) - p(H1|D1) < p(D2|H2) - p(H2|D2) < p(D3|H3) - p(H3|D3) respectively. Therefore, a significant interaction between direction and magnitude would be demonstrated. Consequently, those who use Bayesian inference would have deviation scores close to zero, whereas those who use the inverse algorithm would have deviation scores that systematically deviate from Bayesian responding as a function of the difference between the diagnostic probability and its inverse probability. This analysis was carried out to evaluate the extent to which training reduced use of the inverse algorithm. It was done as part of evaluating the effect of training on a specific aspect of the bias, and no specific prediction was made. The mean scores and standard deviations are shown in Table 4.
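The logic of this manipulation can be sketched as follows: a respondent who uses the inverse algorithm reports p(D|H) itself, so the predicted signed deviation from the Bayesian answer is simply p(D|H) - p(H|D). The probabilities below are hypothetical illustrations, not the study's stimuli:

```python
def bayes_posterior(base_rate, p_d_given_h, p_d_given_not_h):
    """Normative posterior p(H|D) from the base rate and the two likelihoods."""
    num = base_rate * p_d_given_h
    return num / (num + (1 - base_rate) * p_d_given_not_h)

# Hypothetical item with equal base rates, as in the game show quiz.
base_rate = 0.5
p_d_given_h, p_d_given_not_h = 0.42, 0.02

p_h_given_d = bayes_posterior(base_rate, p_d_given_h, p_d_given_not_h)

# Predicted deviation for an inverse-algorithm respondent: here
# p(D|H) < p(H|D), so the deviation is negative (underestimation),
# and its size grows with the gap between p(D|H) and p(H|D).
deviation = p_d_given_h - p_h_given_d
print(round(deviation, 3))
```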

A 2 (training vs. control) x 2 (over vs. under) x 3 (large, medium, and small) mixed-measures ANOVA with the training factor as the between-subjects measure revealed a significant two-way interaction between direction and training, F(1, 78) = 4.07, p = .047, η² = .050, indicating that the main effect for training was


modified by direction. Simple effects tests suggested that the mean deviation score for the training group was significantly smaller than that of the control group only for underestimation, F(1, 78) = 15.14, p < .001, η² = .163, while there was no significant difference between the two groups for overestimation, F(1, 78) = 1.56, p = .216, ns (see Figures 5 and 6). There was a significant two-way interaction between direction and magnitude, F(2, 136.30) = 26.63, p < .001, η² = .255. Significant main effects for training, F(1, 78) = 10.57, p = .002, η² = .119, and direction, F(1, 78) = 38.80, p < .001, η² = .332, were also observed. The main effect for magnitude was not significant, F < 1. There was no significant interaction between magnitude and training, F(2, 139.51) = .286, p = .68, η² = .005. There was no three-way interaction, F(1.75, 136.29) = 3.16, p = .052, η² = .039, though the significance level approached the alpha level.

The cab problem and the disease problem. It was hypothesized that the effect of training would be significant for the questions with incidental base rates. An incidental base rate refers to a base rate that suggests no causal factor for inference. It has been argued that normative models may not be applicable for describing posterior probability judgment in incidental base rate problems (Cohen, 1981; Stanovich & West, 2000).

The hypothesis that training in Bayesian inference would improve participants' posterior probability judgments was supported.


Table 4

Mean Deviation Scores and Standard Deviations with Magnitude by Direction by Training

                              Magnitude
                     Large     Medium    Small
Overestimation
  Training Group
    M                3.62      2.49      0.13
    SD               9.63      8.64      5.14
  Control Group
    M                2.30      1.31      -3.86
    SD               15.59     14.81     12.23
Underestimation
  Training Group
    M                -5.46     -4.67     -2.43
    SD               11.07     9.26      6.07
  Control Group
    M                -16.53    -13.70    -9.47
    SD               14.27     14.14     12.06


Figure 5. Mean Deviation Scores and Standard Deviations with Stimuli for Overestimation.

Figure 6. Mean Deviation Scores and Standard Deviations with Stimuli for Underestimation.

A 2 (training vs. control) x 2 (with diagram vs. without diagram) ANOVA was conducted to evaluate the effects of training and diagram on the absolute deviation scores for the cab problem. Normality and homogeneity of variance assumptions were supported. The two-way ANOVA yielded a significant main effect for training, F(1, 76) = 9.06, p = .004, η² = .106. The effect size was considered to be large. There was no significant main effect for diagram, F(1, 76) = .641, p = .426, η² = .008, nor an interaction, F < 1, as expected. The means and standard deviations are shown in Table 5 and Figure 7.

The disease problem. It was hypothesized that the effect of training would be significant on participants' posterior probability judgments with an extremely low base rate. If participants used the diagnostic probability (80%) for the posterior probability (0.3%), the mean absolute deviation scores between the two groups would be significantly different. This difference would be much more apparent with the low base rate. Normality and homogeneity of variance assumptions were not supported, so significance was analyzed with a Mann-Whitney U test. As expected, the training group had a significantly smaller mean absolute deviation score, M = 10.72, SD = 18.96, than the control group, M = 24.13, SD = 26.17, U(n1 = 40, n2 = 40) = 482.00, p = .002, two-tailed. The means and standard deviations are shown in Table 6 and Figure 8.


Table 5

Mean Absolute Deviation Scores and Standard Deviations for the Cab Problem

Condition          Training   Control   Mean
With Diagram
  M                21.95      29.70     25.82
  SD               12.36      9.16      11.44
Without Diagram
  M
  SD
Mean
  M
  SD


Figure 7. Mean Absolute Deviation Scores and Standard Deviations for the Cab Problem.

Table 6

Mean Absolute Deviation Scores and Standard Deviations for the Disease Problem

Condition          Training   Control
  M                10.72      24.13
  SD               18.96      26.17


[Figure 8: mean absolute deviation scores for the training group and the control group on the disease problem.]

Need for cognition scale. Stanovich and West (2000) demonstrated that the Need for Cognition (NFC) scale was positively associated with normative solutions. Thus, participants with higher scores on the NFC scale generally give more normative answers than those with lower scores. It was hypothesized that training would be effective across individual differences on the NFC scale. The correlational analysis revealed that the association between the NFC scale and participants' deviation scores from the Bayesian standard was not reliable for the posterior probability questions, Kendall's tau = -.079, n = 80, p = .304 (two-tailed), suggesting that the effect of training was significant across individual differences on the NFC scale.

Discussion

The training program results reported in this thesis extend our understanding of whether the probability judgment bias called the inverse fallacy can be corrected. Previous research that has emphasized the cognitive-illusion aspects of intuitive probability judgment can be viewed in a new light. The first hypothesis, that the training program would improve participants' posterior probability judgments, was supported. Similarly, the hypothesis that providing diagrams would facilitate participants' probability judgments was also supported. The results are consistent with those of Sedlmeier and Gigerenzer (2001) and Sedlmeier (1999), who found that the frequency format facilitates people's probability judgments, and with those of Yamagishi (2003b), who found the


positive effect of providing diagrams. Although it is impossible to determine the causal factors, the findings suggest that incorporating cognitive and psychological aspects of research into the training program was also important for participants' skill improvement.

The consistent effect of the training and the reliability across items support the stronger claim that training in Bayesian inference improves participants' posterior probability judgment skill. Single-trial results have been criticized for unwarranted conclusions (Villejoubert & Mandel, 2002): participants' posterior probability estimates could appear close to Bayesian responding unless the output of judgment is distinguished from the process of judgment. Consequently, using multiple trials with a systematic manipulation of stimuli in evaluating the training program was also more dependable.

In analyzing the effect of training on the inverse algorithm, the effect of training was significant for the underestimate manipulation, where p(D|H) < p(H|D). This suggests that participants' posterior probability judgments tend to be more superadditive than subadditive. It is somewhat surprising that there was no significant difference in deviation scores between the two groups for the overestimate manipulation, where p(D|H) > p(H|D). The reason for this is not clear, but the results still suggest the influence of the training program on participants' posterior probability judgment skill. Participants in the training group did significantly better than those in the control group with the stimuli for underestimation.


This finding could also be explained by recent research showing that people often think about and search for information selectively. That is, people tend to search for confirmation rather than disconfirmation (Margolis, 2000), or people fail to recognize the extent to which non-focal components of a complementary hypothesis also have an impact on posterior probability judgments. Future research could evaluate whether training in Bayesian inference effectively influences participants to recognize the importance of non-focal components of a complementary hypothesis in other judgment tasks.

For the cab problem, the hypothesis that participants in the training group would give more normative posterior probability judgments than those in the control group was supported. That is, the effect of training appears to be significant for problems with incidental base rates. Likewise, for the disease problem, the participants in the training group did much better than those in the control group. This suggests that the effect of training was consistent for the incidental base rate problems (the cab problem and the disease problem) as well as the extremely low base rate problem (the disease problem).

Finally, the present study shows that the effect of training was significant across individual differences on the Need for Cognition (NFC) scale. While Stanovich and West (2000) argued that differences on the NFC scale could explain people's nonnormative probability judgments, the present study provides only weak support for the theory of individual differences in posterior probability judgment bias. However, outside the frequency interpretation of


probability, the NFC scale might be more significant in determining the normative standard (e.g., under a subjective or propensity theory of probability). Continued investigation of the relationship between psychological variables and normative probability models is necessary for the establishment of appropriate prescriptive models. Furthermore, the study suggests a limit to the theory of cognitive illusion with respect to debiasing efforts. Since cognitive illusion is analogous to perceptual illusion, it implies more resistance to alternate forms of problem interpretation. However, the findings of the current study suggest that a 15-minute training session was effective at promoting normative posterior probability judgments.

A limitation of the study is the absence of a transfer test. It is difficult to say how long the effect of training would stay with participants. Sedlmeier (1999) showed that participants who used a frequency tree diagram showed a strong transfer effect at a 15-week follow-up test. Nonetheless, transfer is an important aspect of evaluating any future training program.

Thus, overall, the present results indicated that training in Bayesian inference was an effective means of correcting the inverse fallacy. The effect of training was further analyzed in light of previous research showing that the inverse fallacy is associated with systematic deviation from Bayesian responding as a function of the difference between the diagnostic probability and its inverse probability. Thus, the effect of training on participants' posterior probability judgments was evaluated with a clear distinction between the output and the process of judgment. Further theoretical development could incorporate those particular aspects of the debiasing

(55)

data. Finally, the study makes a contribution to an accumulating literature that rationality of probability judgments is multifaceted, and that Gobability judgment skill is an important aspect of rationality in this context. Direct application of the study might be that a brief computerized training program could be used as a supplemental course material for those who are learning how to reason about probabilities.
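As an illustration (using hypothetical numbers, not values drawn from the study materials), the inverse fallacy can be made concrete: a reasoner who confuses the posterior P(H|D) with its inverse conditional P(D|H) can overestimate dramatically when the base rate is low. A minimal sketch:

```python
# Illustrative sketch of the inverse fallacy. All probabilities below are
# hypothetical values chosen for illustration only.

def posterior(p_h, p_d_given_h, p_d_given_not_h):
    """Bayes' theorem:
    P(H|D) = P(D|H)P(H) / [P(D|H)P(H) + P(D|~H)P(~H)]
    """
    numerator = p_d_given_h * p_h
    denominator = numerator + p_d_given_not_h * (1.0 - p_h)
    return numerator / denominator

p_h = 0.01               # hypothetical base rate of the hypothesis, P(H)
p_d_given_h = 0.80       # hypothetical diagnostic probability, P(D|H)
p_d_given_not_h = 0.096  # hypothetical false-positive rate, P(D|~H)

p_h_given_d = posterior(p_h, p_d_given_h, p_d_given_not_h)
print(f"P(H|D) = {p_h_given_d:.3f}")  # normative posterior, about 0.078
print(f"P(D|H) = {p_d_given_h:.3f}")  # inverse conditional, 0.800
# A reasoner committing the inverse fallacy reports about 0.80
# where the normative answer is only about 0.08.
```

With these numbers the gap between the posterior and its inverse is an order of magnitude; the lower the base rate, the larger the gap, which is why the fallacy is most consequential in low-base-rate settings.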


References

Baron, J. (1988). Thinking and deciding. Cambridge, UK: Cambridge University Press.

Baratgin, J., & Noveck, I. A. (2000). Not only base rates are neglected in the Engineer-Lawyer problem: An investigation of reasoners' underutilization of complementarity. Memory & Cognition, 28, 79-91.

Cacioppo, J. T., Petty, R. E., & Kao, C. F. (1984). The efficient assessment of need for cognition. Journal of Personality Assessment, 48, 306-307.

Cohen, L. J. (1981). Can human irrationality be experimentally demonstrated? Behavioral and Brain Sciences, 4, 317-370.

Cosmides, L., & Tooby, J. (1996). Are humans good intuitive statisticians after all? Rethinking some conclusions from the literature on judgment under uncertainty. Cognition, 58, 1-73.

Eddy, D. M. (1982). Probabilistic reasoning in clinical medicine: Problems and opportunities. In D. Kahneman, P. Slovic, & A. Tversky (Eds.), Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.

Fletcher, H. J. (1971). Puzzles and quizzles. New York: Abelard-Schuman.

Fong, G. T., Krantz, D. H., & Nisbett, R. E. (1986). The effects of statistical training on thinking about everyday problems. Cognitive Psychology, 18, 253-292.


Gigerenzer, G., & Murray, D. J. (1987). Cognition as intuitive statistics. Hillsdale, NJ: Erlbaum.

Gigerenzer, G., & Goldstein, D. G. (1996). Reasoning the fast and frugal way: Models of bounded rationality. Psychological Review, 103, 650-669.

Goldman, A. (1986). Epistemology and cognition. Cambridge, MA: Harvard University Press.

Kahneman, D., Slovic, P., & Tversky, A. (Eds.). (1982). Judgment under uncertainty: Heuristics and biases. Cambridge: Cambridge University Press.

Kahneman, D., & Tversky, A. (1996). On the reality of cognitive illusions. Psychological Review, 103, 582-591.

Koehler, J. J. (1996). The base rate fallacy reconsidered: Normative, descriptive and methodological challenges. Behavioral and Brain Sciences, 19, 1-53.

Kolmogorov, A. N. (1993). Foundations of the theory of probability. New York: Chelsea.

Krueger, J. (2001). Null hypothesis significance testing: On the survival of a flawed method. American Psychologist, 56, 16-26.

Lehman, D. R., Lempert, R. O., & Nisbett, R. E. (1988). The effects of graduate training on reasoning: Formal discipline and thinking about everyday-life events. American Psychologist, 43, 431-442.


Loftus, G. R. (1996). Psychology will be a much better science when we change the way we analyze data. Current Directions in Psychological Science, 5, 161-171.

Margolis, H. (2000). Wason's selection task with a reduced array. Psycoloquy, 11(5). Retrieved January 5, 2004, from http://psycprints.ecs.soton.ac.uk~archive/OOOOOOO5/

Meehl, P. E. (1978). Theoretical risks and tabular asterisks: Sir Karl, Sir Ronald, and the slow progress of soft psychology. Journal of Consulting and Clinical Psychology, 46, 806-834.

Mill, D., Gray, T., & Mandel, D. R. (1994). Influence of research methods and statistics courses on everyday reasoning, critical abilities, and belief in unsubstantiated phenomena. Canadian Journal of Behavioural Science, 26, 246-258.

Sedlmeier, P., & Gigerenzer, G. (2001). Teaching Bayesian reasoning in less than two hours. Journal of Experimental Psychology: General, 130, 380-400.

Sedlmeier, P. (1999). Improving statistical reasoning: Theoretical models and practical implications. Mahwah, NJ: Lawrence Erlbaum.

Stanovich, K. E., & West, R. F. (2000). Individual differences in reasoning: Implications for the rationality debate? Behavioral and Brain Sciences, 23.


Villejoubert, G., & Mandel, D. R. (2002). The inverse fallacy: An account of deviations from Bayes's theorem and the additivity principle. Memory & Cognition, 30, 171-178.

Wagenaar, W. A. (1988). The proper seat: A Bayesian discussion of the position of expert witness. Law and Human Behavior, 12, 499-510.

Wason, P. C. (1968). Reasoning about a rule. Quarterly Journal of Experimental Psychology, 20, 273-281.

Yamagishi, K. (2003b). Facilitating normative judgments of conditional probability: Frequency or nested sets? Experimental Psychology, 50, 97-106.
