REDUCED CONFIDENCE IN LOSS CONTEXT ENABLES SWIFT ADJUSTMENTS IN A CHANGING ENVIRONMENT

MASTER THESIS
2018, AUGUST 15

C. L. LUSINK
J. B. ENGELMANN, SUPERVISOR
C. M. VAN VEELEN, 2ND SUPERVISOR

MSC. ECONOMICS
BEHAVIOURAL ECONOMICS AND GAME THEORY
UNIVERSITY OF AMSTERDAM
FACULTY OF ECONOMICS AND BUSINESS


Reduced Confidence in Loss Context Enables Swift Adjustments in a Changing Environment

Caspar Leon Lusink

Faculty of Economics and Business, University of Amsterdam

The human valuation system is accompanied by the ability to estimate the probability of being correct: a meta-cognitive, second-order valuation process that can be consciously perceived as a confidence judgement. Although many standard economic and psychological theories assume gain/loss symmetry, a general discrepancy between punishment and reward processing is found at the neurobiological and behavioural level. Recent evidence from reinforcement learning studies suggests that confidence judgements are biased downwards when learning to avoid punishment relative to learning to seek reward. From an evolutionary point of view, we hypothesise that while reward learning is accompanied by a high-confidence, rigid learning strategy, punishment avoidance is characterised by a more flexible, low-confidence strategy that enables rapid adjustments under threat. We tested this hypothesis in an instrumental learning task with manipulated outcome valence, to which we added a reversal learning condition to simulate changing conditions. Replicating earlier findings, we found a general overconfidence in reward conditions and relative underconfidence in punishment conditions, despite similar performance. Furthermore, we confirmed our main hypothesis by showing an interaction effect between the punishment and reversal conditions relative to the reward condition. Additionally, we showed that learning is context-dependent: values are learned relative to a context variable. Specifically, small punishments presented in a loss context were valued more highly than small rewards presented in a gain context.

Introduction

Human behaviour is shaped by the positive and negative interactions we encounter in our environment. We inherently learn which actions create rewards and seek them out; in contrast, we learn to avoid actions that result in punishment. Reward and punishment might appear to be the inverse of one another, and in many classic economic theories (Von Neumann & Morgenstern, 1944; Arrow, 1965) and reinforcement learning models (Sutton & Barto, 1998; Kim et al., 2006) they are assumed to be. From an evolutionary perspective, however, punishing and rewarding outcomes have played very different roles in survival. Historically, humans lived in close-knit clans, where we encountered many complex social decisions. Perhaps the strongest reward an individual could face was becoming the group leader (and securing offspring); conversely, the biggest punishment was being expelled from the clan (and thus facing near-certain death). One can easily imagine that the decisions and strategies involved in obtaining such a reward and avoiding such a punishment differ in fundamental ways. Indeed, a great deal of research has shown a general discrepancy in evaluating, learning and deciding in gain and loss situations. This study further explores this discrepancy and connects it to its evolutionary nature.


For many years, the field of economics used a set of assumptions that describe human behaviour with mathematically optimal models. In standard expected utility theory, for instance, the gain or loss of money (or utility) is assumed to be valued equally (Von Neumann & Morgenstern, 1944). The first to address human biases in gain and loss contexts were Markowitz (1952) and Williams (1966), who observed that people are risk averse in gains and risk seeking in losses. The famous theory that formalised these findings is Prospect Theory (Kahneman & Tversky, 1979), which describes the discrepancy between gain and loss in decision making under risk. The authors showed experimentally that subjects value gains differently from losses. This relationship is expressed in a value function, from which the authors famously concluded that 'losses loom larger than gains'. Since then, the theory has received much criticism for its descriptive shortcomings and lack of practical applicability outside an experimental setting (Barberis, 2013). Nevertheless, the discrepancy between gain and loss still stands, and researchers from different fields are working hard to fully comprehend it.

More recently, the field of neuroscience has started contributing to research on the gain/loss discrepancy. Studies have shown that many shared brain areas are involved in reward and punishment (Brooks & Berns, 2013), but that punishment recruits additional brain areas (Knutson et al., 2000) and involves different types of neurotransmitters (Daw et al., 2002; Seymour et al., 2007; Matsumoto & Hikosaka, 2009). Although researchers have shown that brain activity in the orbitofrontal cortex correlates positively with the magnitude of reward and negatively with that of punishment (O'Doherty et al., 2001), others argue that, given the general background firing rate of neurons, the inhibitory response to punishment is too weak to encode punishment fully (Bayer & Glimcher, 2005; Doya, 2008). Thus far, most research indicates that the ways in which we process reward and punishment differ in complex ways.

Yet there is converging evidence for systematic biases in the discrepancy. A study by Tom et al. (2007) found a neural foundation for loss aversion in the ventral striatum and prefrontal cortex, showing that losses are weighted more heavily than gains. This was further substantiated by De Martino et al. (2010), who showed that loss aversion is eliminated when the amygdala is impaired. Furthermore, clinical studies show different responses to reward and punishment conditions. McFarland and Klein (2009) showed that clinically depressed patients exhibit lower responsiveness to reward anticipation but normal responses to punishment anticipation. Moreover, anxiety disorders tend to unbalance reward seeking and punishment avoidance in divergent ways (Aupperle & Paulus, 2010). Apparently, our brain computes gains and losses differently, and the quest to understand this discrepancy and map its biases is starting to bear fruit.


This study looks at how we learn to seek rewards and avoid punishment, and investigates the ways in which these processes differ. Reinforcement learning (RL) models offer interesting insights into the workings of many parts of the brain (Schultz et al., 1997). In RL models, punishment can easily be modelled as a negative reward, so that avoiding punishment can be seen as a reward. In fact, one study showed that the brain processes an avoided punishment in the same way as a reward (Kim et al., 2006). However, as mentioned before, this is not consistent with other neurological findings. Palminteri et al. (2015) offer a computational solution by introducing a context-dependent modulation. In this model, learning is accompanied by a so-called context value, which over time becomes negative in loss contexts and positive in gain contexts.
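To make this model concrete, here is a minimal sketch of such a context-dependent update rule, assuming a simple delta-rule form; the function name, learning rates and exact structure are illustrative assumptions, not the fitted model from the cited papers.

```python
def context_rl_update(q, v, choice, outcome, alpha_q=0.3, alpha_v=0.3):
    """One learning step in the spirit of Palminteri et al. (2015):
    option values q are learned relative to a context value v, which
    drifts positive in gain contexts and negative in loss contexts.
    Names and learning rates are illustrative assumptions."""
    v += alpha_v * (outcome - v)              # context value tracks outcomes
    rel = outcome - v                         # outcome relative to its context
    q[choice] += alpha_q * (rel - q[choice])  # update chosen option's value
    return q, v
```

In a loss context v drifts negative, so a small loss (-€0.10) produces a positive relative outcome; this is the mechanism behind the prediction, tested later in the transfer task, that small punishments can end up being preferred over small gains.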

To further understand the strategies underlying reward and punishment learning, Lebreton et al. (2015) identified a second-order valuation metric that is incorporated in the brain's valuation system. This metric measures the strategy reliability of a decision through a confidence estimate (Daw et al., 2005). The confidence estimate is defined as the participant's subjective feeling of being correct (Adams, 1957; Pouget et al., 2016). This second-order, meta-cognitive valuation process (Fleming et al., 2010; Pleskac & Busemeyer, 2010) occurs automatically (Lebreton et al., 2015) and is general: it acts as a common currency (De Gardelle et al., 2016), as does first-order valuation (Ruff & Fehr, 2014). Confidence can be consciously perceived, and can therefore be reliably reported by eliciting incentivised judgements. Interestingly, converging evidence shows that confidence judgements are inherently biased. In particular, people tend to be generally overconfident (Lichtenstein et al., 1982), and confidence is biased by the value at stake in a perceptual decision-making task (Lebreton et al., 2017). Furthermore, recent work showed that confidence follows the gain/loss discrepancy: confidence is biased downwards when learning to avoid punishment compared to learning to seek reward, despite similar performance (Lebreton et al., 2018, working paper). Interestingly, that work shows that the bias correlates with the context-value model of Palminteri et al. (2015). This study dives deeper into the workings and evolutionary background of this confidence bias. Looking back at our primal history, the confidence discrepancy between reward seeking and punishment avoidance might have an evolutionary origin. For group leaders or individuals who challenge the alpha male, overconfidence is a positive trait, while in a threat situation, where punishment avoidance is of the essence, it is not. In these kinds of loss situations, a more flexible strategy might be advantageous: perhaps lower confidence prepares an individual to swiftly change strategies and maximise survival chances. Therefore, we investigate the question:

Does lower confidence in punishment-avoidance learning enable better performance in changing conditions? We used an instrumental learning task with manipulated outcome valence (reward/punishment) and an added reversal condition to simulate a changing environment. We found a significant interaction effect of the punishment and reversal condition on post-reversal performance, hence showing compelling evidence for our postulated theory and confirming our research question. Furthermore, we replicated findings from Palminteri et al. (2015; 2016) and Lebreton et al. (2018, working paper), strengthening existing evidence for a confidence bias in reward seeking and punishment avoidance. In addition, our data provide further support for the reliability of the context-dependent reinforcement model of Palminteri et al. (2015). The recent neuroeconomic investigation into the underlying mechanisms of the human valuation system is starting to bear fruit, opening doors for future research into the true (dys)functions of our cognitive abilities.

Methodology

Hypotheses

This study used the instrumental learning task of Palminteri et al. (2015) and Pessiglione et al. (2006), with small modifications by Lebreton et al. (2018, working paper), and included a reversal learning condition to investigate the effect of change on confidence and performance. Earlier findings suggest a general downward confidence bias in a punishment condition relative to a reward condition, despite similar performance (Lebreton et al., 2018, working paper). Our first hypothesis is, therefore, that we replicate these findings by showing a significantly lower confidence level in the punishment conditions, without a significant effect of this condition on performance. From our evolutionary argument above, we conjecture that learning behaviour in a loss context requires a more flexible learning strategy. Hence, our second and main hypothesis states that lower confidence in a loss context enables better performance in the reversal learning condition. We investigate this by looking at the interaction effect of the reversal and outcome valence conditions. Furthermore, we hypothesise that participants learn values context-dependently. Specifically, our third hypothesis states that symbols are learned relative to the pairs in which they are presented, resulting in a higher valuation of small punishments over small gains in the transfer task.

Participants

All studies were approved by the local Ethics Committee of the Center for Research in Experimental Economics and political Decision-making (CREED), at the University of Amsterdam. All participants gave informed consent prior to partaking in the study. The participants were recruited from the laboratory's participant database (www.creedexperiment.nl).


A total of 48 participants took part in this study, divided over two groups of 25 (52% male; age = 22.8 ± 5.1) and 23 (60% male; age = 22.5 ± 2.4). Both groups performed the same experiment. Participants were compensated with a show-up fee (€5) plus the additional gains and losses they made during the trials (exchange rate: 1 token = €0.30). Additionally, three trials were randomly selected to pay out a possible €5 each, following the confidence incentivisation scheme (explained below).

Experimental Design

The experiment consisted of two parts. The first was an instrumental learning task with a reversal condition; the second was a transfer task in which learned preferences were tested. The first part consisted of three sessions; the second part was performed once by each participant. Every session consisted of 120 trials, and the transfer task of 112 trials.


The design of the experiment was adopted from the probabilistic instrumental learning task of Palminteri et al. (2015) and follows the adjustments implemented by Lebreton et al. (2018, working paper). Eight stimuli from the Agathodæmon alphabet were randomly divided into four pairs, which remained fixed throughout each session. Every pair corresponded to one of four conditions: a stable gain, a stable loss, a reversal gain and a reversal loss condition (2x2 design). The stimuli were pseudo-randomly presented on both sides of a centralised fixation dot; in other words, all stimuli were presented an equal number of times in total and on both sides. Within every pair, one stimulus was associated with an outcome (gain or loss) of €1 with probability 80% (G80/L80) and the other with probability 20% (G20/L20); with the remaining probability the outcome was €0.10. For the two reversal-condition pairs, the same randomisation and incentivisation rules applied, but after trial 60 the stimuli switched roles: the probabilities of winning/losing €1, and thereby the average values, were interchanged within those pairs in the second half of the task.
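To illustrate these contingencies, the following sketch samples one feedback outcome under the 2x2 design described above; the function and its layout are an illustrative assumption, not the actual task code.

```python
import random

def draw_outcome(is_80_symbol, trial, valence, reversal):
    """Sample feedback for one symbol: EUR 1 (gained or lost) with
    probability .8 or .2, and EUR 0.10 otherwise; in reversal pairs
    the probabilities swap after trial 60. Illustrative sketch only."""
    p_large = 0.8 if is_80_symbol else 0.2
    if reversal and trial > 60:
        p_large = 1.0 - p_large          # the pair's symbols switch roles
    amount = 1.00 if random.random() < p_large else 0.10
    return amount if valence == "gain" else -amount
```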

Procedure

Before the experiment started, participants were informed via written instructions. They began with a practice session of 16 trials, in which the symbols were replaced with letters to enhance understanding of the task. After the practice session, participants were asked whether they understood the task, after which the confidence incentivisation scheme was demonstrated using their data from the practice session.

In the learning task, participants were asked to select one of the two stimuli as quickly as possible (reaction times were measured), after which they were asked to rate their confidence in their decision on a scale from 50% to 100%. After they committed their confidence score, the outcome of their choice only was presented (no counterfactual information was shown). See Figure 1 for an illustration.

Figure 1: Main task procedure for one trial. First a fixation dot is presented for 500-1500 ms, after which two symbols are pseudo-randomly drawn and presented for 3000 ms. When the participant chooses one side, a red arrow is presented on that side for 500 ms. If no choice is made, no arrow is presented and the minimum amount is earned (+€0.10) or the maximum is lost (-€1.00). After the choice, the participant is asked to report his/her confidence in the decision at their own pace. Finally, feedback is presented for the chosen symbol only (not for the counterfactual option).

The transfer task consisted of the same stimuli (with their corresponding probabilities and valences) as in the last session of part one. For the reversal conditions, the stimulus qualifications of the second part (after the reversal) were used. In this task the eight stimuli were presented four times in all possible combinations, resulting in a total of 112 trials. For each pair, participants selected the symbol they believed had the highest average value. No feedback was presented in this task and no tokens could be earned.

Confidence Incentivisation

A confidence incentivisation scheme, derived from the Becker-DeGroot-Marschak auction (Becker et al., 1964), was designed to ensure reliable confidence judgements. Participants were asked to report their confidence judgement on a scale from 50% (no idea which symbol is correct) to 100% (completely sure) for every decision they made. Confidence was defined as the estimated probability (p) of choosing the correct symbol (the symbol with the highest average value: G80 in the gain conditions and L20 in the loss conditions).

From all trials, three were randomly chosen to be incentivised. For each of those trials, a random number (r) was drawn from the interval [0.5, 1]. This number determined whether the additional bonus (€5) was won for that specific trial. In scenario 1, if p ≥ r, the bonus was earned, provided that the participant chose the correct symbol (the one with the highest average value). In scenario 2, if p < r, the bonus was earned with probability r. Under this procedure, the optimal strategy is to report your true confidence judgement. When a higher confidence judgement is reported, the expected pay-out decreases, since the chance of scenario 1 increases while the estimated probability of having chosen the correct symbol is lower than reported. If a lower judgement is reported, the chance of scenario 2 increases, thereby decreasing the expected pay-off, since the probability of being correct is higher than reported. It has been shown theoretically (Karni, 2009) and experimentally (Hollard et al., 2015) that this mechanism reliably measures participants' true confidence judgements.
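To make the incentive logic concrete, the following Monte-Carlo sketch (an illustration, not the lab's implementation; the function name and simulation approach are assumptions) computes the expected bonus for a reported confidence given a true probability of being correct, and shows that the expectation peaks at truthful reporting.

```python
import numpy as np

def expected_bonus(report, p_correct, bonus=5.0, n=100_000, seed=0):
    """Expected payout of the mechanism above: draw r ~ U[0.5, 1];
    if report >= r, pay the bonus when the choice was correct
    (scenario 1); otherwise pay it with probability r (scenario 2)."""
    rng = np.random.default_rng(seed)
    r = rng.uniform(0.5, 1.0, n)
    correct = rng.random(n) < p_correct   # was the chosen symbol correct?
    lottery = rng.random(n) < r           # scenario-2 lottery outcome
    paid = np.where(report >= r, correct, lottery)
    return bonus * paid.mean()

# The expected bonus is maximised at the truthful report (0.7 here):
for rep in (0.5, 0.6, 0.7, 0.8, 0.9):
    print(rep, round(expected_bonus(rep, p_correct=0.7), 3))
```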

Before the experiment, participants were informed on the specifics of the mechanism and the workings were demonstrated with the practice round data of the subject.

Statistical Analysis

All statistical analyses were performed using Matlab R2018a. All reported p-values correspond to two-sided tests, unless stated otherwise. The types of t-tests used are reported in the text. The ANOVAs employed are repeated-measures ANOVAs on main effects, unless stated otherwise.
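For reference, the 2x2 repeated-measures ANOVA on performance can be sketched in Python with statsmodels (the actual analyses were run in Matlab); the file name and column names below are hypothetical stand-ins for the real data.

```python
import pandas as pd
from statsmodels.stats.anova import AnovaRM

# One row per participant x condition cell: columns 'subject',
# 'valence' (gain/loss), 'reversal' (stable/reversal), 'accuracy'.
df = pd.read_csv("performance_cells.csv")  # hypothetical file

# Within-subject factors valence and reversal; the valence x reversal
# interaction is the test of the main hypothesis.
res = AnovaRM(df, depvar="accuracy", subject="subject",
              within=["valence", "reversal"]).fit()
print(res)
```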

Results

48 participants took part in our experiment, divided over two groups (25 and 23). They performed a learning task, which involved three sessions, and a single-session transfer task. In the learning task, symbols were presented in pairs, and these pairs were fixed throughout every session. In a factorial 2x2 design, each of the four pairs belonged to one condition, with both symbols giving a loss or a gain of €1 with 80% or 20% probability and €0.10 with the remaining probability. One of the two symbols was better on average (the 80% symbol in the gain condition and the 20% symbol in the loss condition). In the two reversal pairs, the best and worst symbols exchanged probabilities (and thereby average values) halfway through, after trial 60. After choosing one symbol, participants were asked how confident they were in their decision on a scale from 50% to 100%. Confidence judgements were fully incentivised using a confidence incentivisation scheme derived from the Becker-DeGroot-Marschak auction (Becker et al., 1964).

We replicated the findings of Palminteri et al. (2015; 2016) and Lebreton et al. (2018, working paper) on participants' performance. Participants learned by trial and error over time, and every participant performed better than chance in experiment 1 (average correct: 71.70% ± 11.08; t-test vs. chance: t(24) = 46.76, P < 0.001) and experiment 2 (average correct: 70.99% ± 9.76; t-test vs. chance: t(22) = 42.66, P < 0.001). Combining both experiments, we showed that participants' performance was not affected by outcome valence (ANOVA valence: F(1,47) = 1.78, P = 0.19), but was clearly affected by the reversal condition (ANOVA reversal: F(1,47) = 45.46, P < 0.001). Although participants performed similarly across valence conditions, outcome valence did affect confidence reporting (ANOVA valence: F(1,47) = 22.47, P < 0.001; ANOVA reversal: F(1,47) = 390.3, P < 0.001; ANOVA interaction (valence × reversal): F(1,47) = 16.71, P < 0.001). Participants were less confident in the loss condition despite similar performance, which is in line with our hypothesis and previous work (Lebreton et al., 2018, working paper).

We also replicated Lebreton et al.'s (2018, working paper) findings concerning overconfidence: an upward bias in confidence reporting relative to performance. Participants were especially overconfident in the reversal conditions (two-sample t-test on reversal trials, confidence vs. performance: t(47) = 27.40, P = 2.7×10⁻¹⁵⁸), because the reversal is not anticipated by the participants and confidence reporting requires time to adjust. Hence, for the further overconfidence analyses we excluded the trials that occurred after the reversal. Participants were found to be slightly overconfident in general (one-sided, two-sample t-test on the remaining trials, confidence vs. performance: t(47) = 1.88, P = 0.03) and in the gain condition (two-sample t-test on gain trials, confidence vs. performance: t(47) = 2.46, P = 0.01). In contrast, in the loss condition no over- or underconfidence was found (two-sample t-test on loss trials, confidence vs. performance: t(47) = 0.25, P = 0.80). This confirms our first hypothesis that overconfidence is present, but mostly confined to a gain context.

Figure 2: Average performance and confidence levels reported per trial for all four conditions. 2A shows performance in the non-reversal conditions for reward (condition 1) and punishment (condition 3); no significant difference between the reward and punishment condition can be observed. 2B shows the confidence reports for conditions 1 and 3, where the difference between the reward and punishment condition can be observed. 2C shows performance in the reversal conditions for reward (condition 2) and punishment (condition 4); the post-reversal difference is analysed further in Figure 3. 2D shows confidence levels for conditions 2 and 4, where the general confidence discrepancy is observable. Error bars denote one standard error.


To confirm our main hypothesis, we investigated performance and confidence after the reversal. For illustrative purposes, we plotted average confidence and performance per trial for every condition (Figure 2, above). The graphs clearly demonstrate the results described above. Moreover, they hint at increased performance after reversal in the loss trials relative to the gain trials. To analyse this effect, we looked at the interaction of the reversal and outcome valence conditions. In line with our main hypothesis, we found a statistically significant effect of reversal and loss on performance, relative to reversal and gain (ANOVA interaction (valence × reversal): F(1,47) = 16.71, P < 0.001; see Figure 3). In other words, participants performed better after the reversal of the symbols' values for the loss pairs than for the gain pairs.

Additionally, we replicated the findings of Lebreton et al. (2015; 2018, working paper) in the transfer task. Participants performed better than chance (average correct: 61.92% ± 1.91; t-test vs. chance: t(47) = 6.24, P = 2.8×10⁻⁹), indicating that learning occurred in the main task. Furthermore, we found evidence for context-dependent learning (Palminteri et al., 2015), by showing that small losses are preferred to small gains (two-sample t-test on average choice frequency, symbol 2 vs. symbol 5: t(47) = 4.35, P = 7.21×10⁻⁵; see Figure 4). This effect is not present in the reversal condition.

Figure 3: The interaction effect of reversal and punishment versus reversal and reward. Average performance is plotted against the reversal condition for both the punishment and reward conditions. The lines indicate the change in performance from the non-reversal to the reversal condition for reward and punishment. The interaction effect is not directly observable, but the difference in steepness of the blue (reward) and orange (punishment) lines demonstrates that the reversal condition influences the reward condition more than the punishment condition. Error bars denote one standard error.

Figure 4: The average frequency with which each symbol was chosen in the transfer task. The symbols are ordered in their pairs (1,2; 5,6; 3,4; 7,8), with the left symbol of each pair the better one on average (the reversal pairs are therefore switched, since they correspond to the post-reversal values). As can be seen in the non-reversal symbols, the better punishment symbol (5: L20) is chosen more often than the worse reward symbol (2: G20). This effect is not present in the reversal condition. Error bars denote one standard error.


Nevertheless, this result indicates context-dependent learning, since these symbols (G20 and L20) are learned relative to their paired symbols. Hence, participants learn these symbols only in their respective context (reward or punishment). Computational modelling can account for this effect by implementing a 'context variable' that measures the relative value of a decision using a reference point. A computational confirmation of our data within the model lies beyond the scope of this study, but our results point in the right direction and can be used for this purpose in the future.
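As a rough qualitative check of this mechanism, the sketch below simulates one stable gain pair and one stable loss pair with the context-value rule sketched in the introduction (random choices, illustrative parameters, hypothetical function name) and prints the learned relative value of the 20% symbol in each context.

```python
import random

def learn_20_symbol(valence, n_trials=10_000, alpha=0.1, seed=7):
    """Learn one stable pair with a context value v; returns the
    relative value of the 20% symbol (G20 or L20)."""
    rng = random.Random(seed)
    q, v = [0.0, 0.0], 0.0            # q[0]: 80% symbol, q[1]: 20% symbol
    sign = 1.0 if valence == "gain" else -1.0
    for _ in range(n_trials):
        c = rng.randrange(2)          # choose a symbol at random
        p_large = 0.8 if c == 0 else 0.2
        out = sign * (1.00 if rng.random() < p_large else 0.10)
        v += alpha * (out - v)        # context value tracks average outcome
        q[c] += alpha * ((out - v) - q[c])
    return q[1]

print("G20 relative value:", round(learn_20_symbol("gain"), 2))  # negative
print("L20 relative value:", round(learn_20_symbol("loss"), 2))  # positive
```

Because v tracks the (positive or negative) average outcome of the pair, G20 ends up below the gain-context reference point while L20 ends up above the loss-context reference point, reproducing the observed preference for small punishments over small gains.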

Discussion

This study investigated the discrepancy between reward seeking and punishment avoidance in a reinforcement learning context. We used a modified instrumental learning task (Pessiglione et al., 2006; Palminteri et al., 2015; Lebreton et al., 2018, working paper) to manipulate the outcome valence (gain/loss) and added a reversal condition to test strategic adaptation under change. We elicited incentivised confidence judgements to acquire a deeper understanding of the evolutionary origin of the bias, and revealed increased performance in the punishment condition after reversal, confirming our main hypothesis. Furthermore, we replicated earlier findings that people tend to be overconfident in general (Lichtenstein et al., 1982), that participants report lower confidence in a loss context despite similar performance, and that values are learned context-dependently (Lebreton et al., 2018, working paper).

In contrast with earlier findings (Lebreton et al., 2018, working paper), we did not find overconfidence in the loss condition. Additionally, the general overconfidence was marginal; both observations support our finding that overconfidence is specific to a gain context. From an evolutionary perspective, this corresponds to the intuition that pursuing large rewards calls for overconfidence. This argument is supplemented by the finding that participants showed relatively increased performance in the punishment condition after the reversal. The reversal condition, representing changing factors in decision situations, tests the strategy reliability of the participants. A lower confidence level in the loss context therefore enables easier re-evaluation of the situation, resulting in increased performance. Moreover, the context-dependent learning effect in the transfer task was found to be limited to the non-reversal condition. This could be explained by the slow adjustment of the context value over time; further analysis will be needed to investigate this result.

As Palminteri and Pessiglione (2017) have noted, behavioural tasks in general might not purely target instrumental learning; hence, other processes or strategies might underlie the behaviour in this experiment. We therefore added the second experiment, using the exact same paradigm, to enhance power and test the robustness of the effect. The result appears to be robust, thereby showing an effect to a statistical degree of certainty. The effect itself must be interpreted with care, however, since a reversal learning condition does not constitute a general change. To ensure the generalisability of the effect, different change conditions should be tested to investigate broader strategy reliability and the size of the effect on performance. Moreover, without neuroimaging data to back up the findings of this study, it is not viable to make statements about the underlying processes that cause the confidence bias in outcome valence and the increased performance after reversal. Possibly, the two observations have no causal relationship and another underlying process generates both the discrepancy and the enhanced performance. Future research should investigate the neural foundation of this confidence bias and its effect on performance in a reversal condition, in order to connect the observations and confirm the theory.


References

Adams, J. K. (1957). A Confidence Scale Defined in Terms of Expected Percentages. Am. J. Psychol. 70, 432–436.

Arrow, K. J. (1965). Aspects of the theory of risk-bearing. Yrjö Jahnssonin Säätiö.

Aupperle, R. L., & Paulus, M. P. (2010). Neural systems underlying approach and avoidance in anxiety disorders. Dialogues in Clinical Neuroscience, 12(4), 517.

Barberis, N. C. (2013). Thirty years of prospect theory in economics: A review and assessment. Journal of Economic Perspectives, 27(1), 173-96.

Bayer, H. M., & Glimcher, P. W. (2005). Midbrain dopamine neurons encode a quantitative reward prediction error signal. Neuron, 47(1), 129-141.

Brooks, A. M., & Berns, G. S. (2013). Aversive stimuli and loss in the mesocorticolimbic dopamine system. Trends in Cognitive Sciences, 17(6), 281-286.

Becker, G.M., DeGroot, M.H., and Marschak, J. (1964). Measuring Utility by a Single-Response Sequential Method. Behav. Sci. 9, 226–232.

Daw, N. D., & Doya, K. (2006). The computational neurobiology of learning and reward. Current opinion in neurobiology, 16(2), 199-204.

Daw, N.D., Kakade, S. & Dayan, P. (2002). Opponent interactions between serotonin and dopamine. Neural Netw. 15, 603–616.

Daw, N. D., Niv, Y., & Dayan, P. (2005). Uncertainty-based competition between prefrontal and dorsolateral striatal systems for behavioral control. Nature neuroscience, 8(12), 1704.

De Gardelle, V., Le Corre, F., & Mamassian, P. (2016). Confidence as a common currency between vision and audition. PloS one, 11(1), e0147901.

De Martino, B., Fleming, S. M., Garrett, N., & Dolan, R. J. (2013). Confidence in value-based choice. Nature neuroscience, 16(1), 105.

Doya, K. (2008). Modulators of decision making. Nature neuroscience, 11(4), 410.

Fleming, S. M., Weil, R. S., Nagy, Z., Dolan, R. J., & Rees, G. (2010). Relating introspective accuracy to individual differences in brain structure. Science, 329(5998), 1541-1543.

Guillemette, M. A., Yao, R., James, I. I. I., & Russell, N. (2015). An analysis of risk assessment questions based on loss-averse preferences. Journal of Financial Counseling and Planning, 26(1), 17-29.

Hollard, G., Massoni, S., and Vergnaud, J.-C. (2015). In search of good probability assessors: an experimental comparison of elicitation rules for confidence judgments. Theory Decis. 80, 363–387.

Holt, C. A., & Laury, S. K. (2002). Risk aversion and incentive effects. American economic review, 92(5), 1644-1655. Kahneman, D., & Tversky, A. (1979). Prospect theory: An analysis of decision under risk. Econometrica 263 – 291. Karni, E. (2009). A mechanism for eliciting probabilities. Econometrica 77, 603–606.

Kim, H., Shimojo, S., & O'Doherty, J. P. (2006). Is avoiding an aversive outcome rewarding? Neural substrates of avoidance learning in the human brain. PLoS biology, 4(8), e233.

Knutson, B., Westdorp, A., Kaiser, E., & Hommer, D. (2000). FMRI visualization of brain activity during a monetary incentive delay task. Neuroimage, 12(1), 20-27.

Lebreton, M., Abitbol, R., Daunizeau, J., & Pessiglione, M. (2015). Automatic integration of confidence in the brain valuation signal. Nature neuroscience, 18(8), 1159.

Lebreton, M., Bacily, K., Palminteri, S., and Engelmann, J. B. (Working Paper). Contextual influence on confidence judgment in human reinforcement learning.


Lebreton, M., Langdon, S., Slieker, M. J., Nooitgedacht, J. S., Goudriaan, A. E., Denys, D., ... & Luigjes, J. (in press). Two sides of the same coin: monetary incentives concurrently improve and bias confidence judgments. Science Advances.

Lichtenstein, S., Fischhoff, B., and Phillips, L.D. (1982). Calibration of probabilities: the state of the art to 1980. In Judgment Under Uncertainty: Heuristics and Biases, D. Kahneman, P. Slovic, and A. Tversky, eds. (Cambridge, UK: Cambridge University Press), pp. 306–334.

Markowitz, H. (1952). The utility of wealth. Journal of political Economy, 60(2), 151-158.

Matsumoto, M., & Hikosaka, O. (2009). How do dopamine neurons represent positive and negative motivational events?. Nature, 459(7248), 837.

McFarland, B. R., & Klein, D. N. (2009). Emotional reactivity in depression: diminished responsiveness to anticipated reward but not to anticipated punishment or to nonreward or avoidance. Depression and anxiety, 26(2), 117-122. O'Doherty, J., Kringelbach, M. L., Rolls, E. T., Hornak, J., & Andrews, C. (2001). Abstract reward and punishment

representations in the human orbitofrontal cortex. Nature neuroscience, 4(1), 95.

Palminteri, S., Khamassi, M., Joffily, M., & Coricelli, G. (2015). Contextual modulation of value signals in reward and punishment learning. Nature communications, 6, 8096.

Palminteri, S., Kilford, E.J., Coricelli, G., and Blakemore, S.-J. (2016). The Computational Development of Reinforcement Learning during Adolescence. PLOS Comput. Biol. 12, e1004953.

Palminteri, S., & Pessiglione, M. (2017). Opponent brain systems for reward and punishment learning: causal evidence from drug and lesion studies in humans. In Decision Neuroscience (pp. 291-303).

Pleskac, T. J., & Busemeyer, J. R. (2010). Two-stage dynamic signal detection: a theory of choice, decision time, and confidence. Psychological review, 117(3), 864.

Pouget, A., Drugowitsch, J., and Kepecs, A. (2016). Confidence and certainty: distinct probabilistic quantities for different goals. Nat. Neurosci. 19, 366–374.

Ruff, C. C., & Fehr, E. (2014). The neurobiology of rewards and values in social decision making. Nature Reviews Neuroscience, 15(8), 549.

Schultz, W., Dayan, P., & Montague, P. R. (1997). A Neural Substrate of Prediction and Reward. Science, 275, 1593-1599.

Seymour, B., Daw, N., Dayan, P., Singer, T., & Dolan, R. (2007). Differential encoding of losses and gains in the human striatum. Journal of Neuroscience, 27(18), 4826-4831.

Sutton, R. S., & Barto, A. G. (1998). Reinforcement learning: An introduction. MIT press.

Tom, S. M., Fox, C. R., Trepel, C., & Poldrack, R. A. (2007). The neural basis of loss aversion in decision-making under risk. Science, 315(5811), 515-518.

Von Neumann, J., & Morgenstern, O. (1944). Theory of games and economic behavior. Princeton, NJ, US: Princeton University Press.

Williams, C. A. (1966). Attitudes toward speculative risks as an indicator of attitudes toward pure risks. The Journal of Risk and Insurance, 33(4), 577-586.


Appendix

Participant instructions: Main task

Dear participant,

Thank you for participating in our experiment. Before the experiment begins, it is important that you understand exactly what is going to happen. Therefore, please read the text below carefully and do not hesitate to ask for clarification if anything is unclear. Your experimenter will answer any questions you may have.

Purpose of the research

The purpose of the experiment is to investigate how you learn the probabilities of winning and losing money. This will be investigated using a computer task.

Procedures during research

During this game, you are asked to make repeated choices between two symbols shown to you on the computer screen (one on the left, one on the right of a central cross). All symbols carry a certain value, and the two symbols that are displayed simultaneously are not equivalent: one is on average more advantageous than the other, either because it brings big gains more often, or because it brings big losses less often than the other symbol of the pair. The result of your choice may be that:

• you earn money (+€1 or +€0.10)
• you lose money (-€1 or -€0.10)

The goal of the game is to win as much money as possible, even if avoiding losses is not possible at all times. Gains and losses never occur together in one trial, meaning that the symbols of a pair always both win or both lose. With losing symbol pairs, the goal is to get the small loss more frequently than the big loss; with winning symbol pairs, the goal is to get the big gain more often. Keep in mind that the rewards are probabilistic: the actual best symbols will, in some trials, give a smaller gain (or a bigger loss) than the other symbol. They are only better on average.

We ask you to try to make a choice on every trial. Not making a choice within the allotted time of 3 seconds is disadvantageous to you, as you will be attributed the worst possible outcome for that trial: either a failure to gain the large amount of money (so you win €0.10; winning symbol pairs), or in fact a loss of the large amount of money (so you lose €1; losing symbol pairs). If you do not make a response, a red upward-pointing arrow ( ^ ) will appear in the centre of the screen. If you choose one of the options, your choice will be indicated by a red arrow under the symbol that you selected.


You will receive feedback after every choice telling you whether you won or lost. We ask you to try to optimise your winnings based on this feedback. Please be aware that stimulus-outcome associations may change during the game. This means an advantageous symbol can become disadvantageous, switching from bringing gains more often relative to the other symbol to bringing gains less often (with the reverse for symbols that bring losses). However, each and every change in outcome (trial-to-trial differences) does not necessarily indicate a change in value of a certain symbol, so be mindful of the average outcome of all symbols. At the end of the session, the experimenter will tell you the total of your winnings, which reflects the choices that you made.

After you choose a symbol, but before you receive feedback on your choice, you are asked to indicate how confident you are that you chose the best symbol of the pair, on a scale from 50% to 100%. 50% means that you made your choice completely at random and do not know which of the symbols has the best value; 100% means that you are absolutely certain you chose the symbol with the best average value of the two. Your final confidence rating should represent your estimated probability that you correctly chose the symbol that is, on average, the most advantageous of the pair.

At each trial, the outcome associated with the symbol of your choice is not influenced by your confidence rating. However, you can win a confidence bonus, which is highest if your confidence matches your actual probability of having chosen the most advantageous symbol (this will be explained in detail after practice). It is therefore important to be as truthful and as accurate as possible when estimating your confidence during the learning task. You have to confirm your confidence choices before you get to see your feedback, but confirmation is not necessary for any other part of the experiment.

Before the task starts, you will go through a few practice trials to familiarise yourself with the task and its pace. Please make sure to ask your experimenter any questions during or after the practice trials. When everything is clear to you, you can start with the main experiment, which consists of 3 parts, each of which should take about 15-20 minutes (112 trials per part). Each part contains different symbols and you will re-learn all values, but your task is the same. After these 3 parts, you will do a post-learning task, for which you will be given separate instructions before the start. All tasks combined will take about one hour and 15 minutes.

Payment

Throughout the experiment we will use laboratory currency (MU) that will be converted to Euros after all procedures are completed (exchange rate: 1 MU = 0.3 Euros). You will receive a show-up fee of 5 Euros, to which additional winnings and losses based on your decisions will be applied. The computer will keep track of your wins and losses during the game. A confidence bonus can be won (as mentioned before). The mechanism used to determine the bonus is visualised in an example after your practice trials. After the last task, the same mechanism will tell you which bonuses you earned and how much your final payout is. It will use random trials from your sessions, so it is in your best interest to be mindful of your answers in all trials.

Confidentiality

All research data are strictly confidential and are processed anonymously. Personal information will not be publicised to third parties without your explicit permission.

Voluntary

If you decide now not to participate in this research, this will have no consequences for you. If you decide during the research to stop your participation, this too will have no consequences. Furthermore, you can withdraw your permission for the use of your data up to 24 hours after the research. You can thus stop your participation at any time, without having to give a reason. If you decide not to participate at any time, or withdraw your permission within 24 hours, your information and data will be deleted from our files and destroyed.

Insurance

Because this research does not compromise or risk your health or safety in any way, the terms and conditions of the regular liability insurance of the UvA are applicable.

Further information

If you have any questions regarding this research, before or after it is conducted, you can turn to the responsible researcher, Dr. Jan Engelmann, tel. (0) 205 255 5651, email j.b.engelmann@uva.nl, Roeterstraat 11, 1018 WB

Amsterdam. For potential complaints about this research you can turn to the member of the Ethics Committee, Prof. Dr. Joep Sonnemans, tel. 0205254249, email j.sonnemans@uva.nl, Weesperplein 4, 1018 AX Amsterdam.

Participant instruction: Transfer task

This last task consists of a series of new choices, constructed in a similar fashion to the previous task. Specifically, as in the experiment before, you are asked to choose which symbol you prefer over the other. You will be familiar with all the symbols from the first experiment (only the third part), and they retain their value, so please base your decisions on the values the symbols had at the end of the third part. However, in this task you cannot win or lose any money. Nonetheless, please make your choices as if you were actually getting paid.

In this task, you will also not receive any feedback on your choices, and this means that you will be presented with a new choice as soon as you have rated your confidence. In contrast to the main task, you can make your choice of symbols without any time limitation, so if you want you can take your time to think about it. This task will also take about 20 minutes (112 trials).
