Academic year: 2021


Valence and Reward Magnitude in Perceptual Choice

Scott Isherwood

11407093

36EC

12/02/18 – 10/08/18

Integrative Model-based Cognitive Neuroscience (IMCN) Research Unit

MSc in Brain and Cognitive Sciences, University of Amsterdam (UvA)

Track: Behavioural Neuroscience

Examiner 1: Birte Forstmann

Examiner 2: Leendert van Maanen

Supervisor: Steven Miletić


Abstract

The process of making perceptual decisions is a complex matter. Not only do we have to encode and integrate all of the information in front of us, we also have to incorporate information prior to our choices. This prior knowledge can influence how we weight the noisy sensory input we receive by biasing us towards an alternative. Here, we investigate the influence of valence and reward magnitude on biased and unbiased decision-making in two experiments. The use of modelling techniques (e.g. DDM, LBA) has provided insight into how prior knowledge affects decision-making on a computational level, suggesting that observations in behaviour are best accounted for by a shift in starting point (DDM) or threshold (LBA). Though these are different accumulator models, the shifts are analogous. By manipulating the saliency associated with the outcome of a simple two-alternative perceptual choice task, we aimed to observe differences in how individuals set their threshold criteria. We were able to replicate previous findings of how bias influences computational models, but did not find differences in how people respond to this bias when valence or reward magnitude were manipulated. Although the framing of economic, value-based decisions heavily modulates behaviour, it appears that this influence does not cross over into perceptual choices in the same way. We therefore conclude that manipulating the outcome salience of simple motion discrimination decisions does not significantly impact behaviour.

Introduction

The purpose of this study was to shed light on how individuals respond to bias while making simple perceptual decisions, and on what factors modulate this bias. Bias in this case can be defined as an asymmetrical weight in favour of or in opposition to a choice alternative. Bias is largely shaped by prior knowledge and, even if only at a low level, guides our everyday choices. Prior knowledge allows us to better prepare for choices in advance, resulting in quicker decisions in the direction of the bias and slower decisions in directions opposite to it. Importantly, this bias can be modulated or overcome by new information. Here, the bias was introduced in the form of a simple arrow cue, whose direction conveyed information about the potential payoff of a choice. In an attempt to alter behaviour towards this bias, we manipulated aspects of the outcomes associated with a simple two-alternative forced choice (2AFC) task.

How individuals respond to noisy sensory information has been at the heart of the wide-ranging literature on decision-making. These perceptual decisions are commonly investigated in simple behavioural tasks, where participants have to pick between two choice-options of differing visual composition (Ho et al., 2009; Liu, 1999; Niwa & Ditterich, 2008; Mulder et al., 2010; Forstmann et al., 2010; Jogan & Stocker, 2014). Most provide feedback on the decisions of the subjects, but it is less clear how these subjects respond to prior information about their decisions, and how the framing of these decisions or their feedback influences perceptual choice. The dynamics of these choices have been formalized in recent years through the use of accumulation-to-bound models, all sharing the same basic assumption: sensory evidence from the environment is sampled until a decision criterion is reached, and a decision is executed (Ratcliff, 1985; Ratcliff and McKoon, 2008; Brown and Heathcote, 2008; Usher and McClelland, 2001). This noisy evidence accumulation process can be affected by many environmental factors, notably prior information, or bias (Mulder et al., 2012; Forstmann et al., 2010; Bogacz et al., 2006; Diederich & Busemeyer, 2006). Here, we shed light on how this prior information is incorporated into the decision process using two experiments. The first uses valence (reward and punishment) to observe whether framing affects biased or unbiased decision-making. In the second experiment, we aimed to maximize the effect of bias in decision-making by increasing the magnitude of the reward participants received during the perceptual task.

Two main types of prior knowledge have been proposed: information indicative of probability, and information indicative of reward (potential payoff). The latter will be the main focus of this paper, though previous work has suggested that both types share a common mechanism with regards to computational integration (Mulder et al., 2012). Previous work has identified the most likely site of action of bias as a modulator of decision thresholds (Mulder et al., 2012; Wagenmakers et al., 2008; Forstmann et al., 2010). Through this mechanism, bias can either shift the threshold away from the starting point, requiring more information for a decision, or shift the threshold closer to the starting point, requiring less information. In terms of the potential payoff bias, if one choice were indicated to confer a larger reward than the other, we would by this mechanism expect less evidence to be needed for that choice to be made. There is greater evidence for this decision-threshold mechanism of bias incorporation, but that is not to say that there are no other possible explanations. Another suggested mechanism is that bias affects the speed of evidence accumulation, where evidence in favour of the biased response is accumulated more quickly than evidence for the unbiased response (Ratcliff, 1985; Diederich & Busemeyer, 2006). An objective of this paper is to add further evidence as to which of these mechanisms is at play. This model-based strategy will therefore quantify bias in terms of differences in decision thresholds, or differences in drift rates. Bias cues can take three forms during the decision-making task: valid, invalid or neutral. Valid cues bias the subjects towards the correct answer, whereas invalid cues bias subjects towards the incorrect response. Neutral cues bias the participants in neither direction.
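The three cue types follow a simple labelling rule, which can be sketched as follows (a hypothetical helper for illustration, not code from the study):

```python
def classify_cue(cue_direction, correct_side):
    """Label a bias cue relative to the correct response.

    A valid cue points at the correct option, an invalid cue points
    at the incorrect option, and a neutral cue points at neither.
    """
    if cue_direction == "neutral":
        return "neutral"
    return "valid" if cue_direction == correct_side else "invalid"
```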

In order to quantify the effect of this bias, the data presented in this paper were fitted using the Linear Ballistic Accumulator (LBA) model (Brown and Heathcote, 2008). The LBA is one of the simplest models of decision-making and can be applied to multi-alternative as well as two-alternative choices (Brown and Heathcote, 2008). As mentioned, the model assumes choices are made when a decision criterion is met, in light of the evidence available to the decision maker at the time. One major difference between the LBA and other accumulation models (e.g., DDM, LCA) is that each possible choice has an independent accumulator unit (Ratcliff, 1985; Usher and McClelland, 2001). In this sense, the choice alternatives are not relative to one another and "race", in a ballistic fashion, towards their respective thresholds. In the case of a simple two-choice perceptual task such as this, or the random dot motion task, there is a correct and an incorrect accumulator (see Figure 1). The overall time to make a choice is made up of a non-decision phase (e.g., motor and perceptual processes) and a decision phase. This non-decision time (t0) is one of the parameters estimated by the model. The model assumes that noisy evidence is accumulated over time from a starting point (k), which is a randomly drawn value in the interval [0, A]. The rate at which evidence is accumulated is defined by the drift rate (v) parameter, with the independent units having discrete rates drawn from a normal distribution. These two parameters make up the instances of trial-to-trial variability in the model. This accumulation process occurs until a decision threshold for one of the choices is reached. For unbiased decisions, these thresholds are assumed to be the same for the two alternatives. Unbiased decisions are therefore described by seven parameters: decision threshold (b), upper bound of the starting point (A), mean drift rate of each accumulator (v1, v2), non-decision time (t0), and standard deviation of the drift rate of each accumulator (sd1, sd2).
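A minimal simulation of this accumulation process can be sketched as below; the parameter values are purely illustrative, not estimates from this study:

```python
import random

def simulate_lba_trial(A, b, v_means, v_sds, t0, rng):
    """Simulate a single Linear Ballistic Accumulator trial.

    Each accumulator starts at k ~ U[0, A] and rises linearly at a
    drift rate drawn from N(v_mean, v_sd); the first accumulator to
    reach the shared threshold b determines choice and decision time.
    """
    finish_times = []
    for v_mean, v_sd in zip(v_means, v_sds):
        k = rng.uniform(0, A)        # start point
        v = rng.gauss(v_mean, v_sd)  # trial-specific drift rate
        # A non-positive drift never reaches threshold on this trial
        t = (b - k) / v if v > 0 else float("inf")
        finish_times.append(t)
    decision_time = min(finish_times)
    choice = finish_times.index(decision_time)  # 0 = correct, 1 = incorrect
    return choice, t0 + decision_time

rng = random.Random(1)
# Illustrative values: v_means[0] > v_means[1], so the correct
# accumulator wins most trials; sd of each accumulator set to 1.
trials = [simulate_lba_trial(A=0.5, b=1.0, v_means=(2.5, 1.5),
                             v_sds=(1.0, 1.0), t0=0.2, rng=rng)
          for _ in range(5000)]
accuracy = sum(1 for choice, _ in trials if choice == 0) / len(trials)
```

Because the correct accumulator has the higher mean drift rate here, simulated accuracy settles well above chance, and every response time exceeds the non-decision time t0.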

Prior knowledge, on the other hand, could influence these intrinsic thresholds (Mulder et al., 2012; Forstmann et al., 2010), adding an extra parameter to each decision. In this case, instead of estimating a single b parameter, it is split into b1 and b2, corresponding to the correct and incorrect accumulator, respectively. Fortunately, to simplify things, we can assume that there is some consistency between the thresholds set by each participant, such that the threshold of the correct accumulator, when faced with a valid cue, is analogous to the threshold of the incorrect accumulator in the presence of an invalid cue (see Figure 2). Such a simplification is logical: since thresholds are set prior to the evaluation of the stimuli, the decision-maker does not know which is the correct or incorrect choice, and the boundaries are therefore set consistently and in relation to one another. Of course, a neutral cue (unbiased condition) would not confer any differences in thresholds between the two accumulators.
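This threshold symmetry can be sketched as a small mapping; the helper and the threshold values below are hypothetical, chosen only to illustrate the logic:

```python
def trial_thresholds(cue, b_cued, b_uncued, b_neutral):
    """Map a cue type to (b_correct, b_incorrect) thresholds.

    Assumes the symmetry described above: the cued accumulator always
    receives b_cued, regardless of whether the cue turns out valid.
    """
    if cue == "valid":      # cue points at the correct option
        return b_cued, b_uncued
    if cue == "invalid":    # cue points at the incorrect option
        return b_uncued, b_cued
    return b_neutral, b_neutral  # neutral cue: equal thresholds
```

With a lowered cued threshold (e.g. b_cued = 0.9 < b_uncued = 1.1), a valid cue lowers the correct accumulator's threshold, while an invalid cue lowers the incorrect accumulator's threshold instead.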

Figure 1. A depiction of the accumulation process in a two-choice perceptual task, where drift rates differ between accumulators. The example shows how evidence is accumulated for the correct (left) and incorrect (right) responses. The starting point (k) is drawn randomly and independently from identical uniform distributions between the values of [0, A]. The drift rate (v) is drawn independently from normal distributions with standard deviation (s). In this case, the threshold (b) for the correct accumulator is reached before the incorrect one. Adapted from Brown and Heathcote, 2008.

Figure 2. Diagram depicting the shift in threshold due to prior information. The left panel shows the accumulator associated with the correct response. The right panel shows the incorrect accumulator. The threshold (b) can be seen to vary depending on the bias type of the decision. A indicates the range of possible starting points between the interval [0, A]. On the x-axis is the decision time associated with the response. In this depiction, the drift rates of the correct and incorrect accumulators are displayed to be the same.

If drift rate changes were found to best explain the observations of differences between bias types, it would suggest that persons would accumulate evidence for a biased option faster than an unbiased option. Although this is a theoretically sound intuition of how individuals could respond to bias, changes in drift rate do not fully account for the RT distributions and their interactions with performance found in the data. The drift rate is typically related to stimulus difficulty, but not prior knowledge (Ratcliff and McKoon, 2008; Brown and Heathcote, 2008).

Economic decision-making (EDM) and perceptual decision-making (PDM) have markedly different mechanisms at play, namely the locus of uncertainty surrounding the choice. In PDM, this uncertainty sits at the stimulus level: features of the choice phase are the point of ambiguity, but the outcome of the choice is explicit. In EDM, the opposite is true: the stimuli presented in the choice phase are unambiguous, but the outcomes associated with each choice are uncertain. This is an important delineation between the two decision types. On top of this, there is not always a 'correct' and 'incorrect' response in EDM in the canonical sense. Even so, aspects of the framing of choices in either type of decision-making may cross over into the other. A principal objective of this paper is therefore to investigate whether perceptual and economic choices are, or can be, shaped by a common mechanism, that is, instrumental conditioning.

From the EDM literature we know that valence affects certain types of behaviour, e.g., risk preferences and choices with uncertain outcomes (Kahneman & Tversky, 1971; Allais, 1953). Related to this, the magnitude of these outcomes also has an asymmetric effect across valence, as well as eliciting different behaviours within valence (see Figure 3). According to prospect theory, a descriptive model of behaviour, individuals tend to put more emphasis on losses than on gains of the same magnitude (Kahneman and Tversky, 1971). Differential responses to changes in reward magnitude in the orbitofrontal cortex have also been reported (Knutson et al., 2003; Van Duuren et al., 2007; O'Doherty et al., 2001), indicating that perceived magnitude in some way modulates a neural response. Whether this crosses over to modulate behavioural responses is an empirical question. Much previous work has aimed attention at value-driven responses in the brain in both perceptual and economic decision-making tasks (Mulder et al., 2012; Mulder et al., 2014; Summerfield and Koechlin, 2010; Forstmann et al., 2010; Palminteri et al., 2015; Guitart-Masip et al., 2012; O'Doherty et al., 2004). And although such manipulations are rife in the economic literature, investigations of valence and reward magnitude in perceptual choice have been few and far between (Clay et al., 2017). Moreover, how valence and reward magnitude interact with value-driven bias has also received little attention.

Figure 3. Value function describing the subjective value of outcomes across the valences. The y-axis indicates the value; the x-axis indicates the valence (gains to the right, losses to the left). The steeper slope of the value of losses suggests that losses outweigh gains. Adapted from Kahneman and Tversky (1971).


Reward and punishment contexts have been applied extensively to economic choice tasks involving reinforcement learning (Palminteri et al., 2015; Wright et al., 2013; O'Doherty et al., 2004) and other learning types (e.g. procedural; Wächter et al., 2009). Although the paradigm presented here is not a learning task per se, participants in some way learn how to optimize their perceptual strategies in order to maximize their accuracy. Along this line of reasoning, reward and punishment, in the form of positive and negative points, may differ in how they impact attention, motivation, caution, strategy, or all of the above.

The salience associated with punishment contexts may therefore cause more cautious behaviour, which in diffusion models is defined as a widening of the decision thresholds required for the execution of responses (Donkin et al., 2011; van Maanen et al., 2016). This may also explain the longer reaction times observed in punishment over reward contexts (Gonzalez et al., 2005; Pabst, Brand and Wolf, 2013). Evolutionarily, this differential outlook towards gain and loss makes sense; one gain may be beneficial for a short amount of time, whereas one loss could be the difference between life and death. Hence, 'losses loom larger than gains' (Kahneman, Knetsch and Thaler, 1991). Moreover, due to the nature of seeking and avoidance behaviour, we inherently respond differently to rewards and punishments. As we actively seek rewards, they are reinforced more frequently; punishments, on the other hand, are actively avoided and thus receive less reinforcement (Thorndike, 1911; Skinner, 1938; Palminteri et al., 2015).

The use of the LBA framework had both a confirmatory and an exploratory role in the analysis of the acquired behavioural data. We aimed to confirm or find evidence against previous findings of how bias computationally affects decision-making in both experiments 1 and 2 (changes in decision thresholds or changes in drift rates). In experiment 1, we also sought to explore the influence of valence on such computations, and whether it interacts with the bias manipulation. In experiment 2, we test whether higher reward magnitudes are able to increase the influence of bias, by comparing the behavioural and modelling results to the lower-magnitude rewards used in experiment 1. Model comparison will be used to test the applicability of the selected models, and model averaging will be employed as another level of corroboration. In addition to this, we will run a parameter recovery with the winning model of experiment 1 to verify its validity.
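Model comparison of this kind is typically scored with an information criterion. The text does not specify which criterion is used, so the BIC below is an assumption offered only as a sketch; the log-likelihoods are hypothetical:

```python
import math

def bic(log_likelihood, n_params, n_obs):
    """Bayesian Information Criterion: lower is better.

    Penalizes each free parameter by log(n_obs), so with equal fits
    the more parsimonious model wins.
    """
    return n_params * math.log(n_obs) - 2.0 * log_likelihood

# Two hypothetical fits to the same 720 main-task trials: the model
# with the higher log-likelihood (and equal parameter count) scores
# the lower, i.e. better, BIC.
bic_threshold_model = bic(log_likelihood=-1250.0, n_params=8, n_obs=720)
bic_driftrate_model = bic(log_likelihood=-1262.0, n_params=8, n_obs=720)
```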
As a further exploratory analysis we will also fit a conditional accuracy function (CAF) to the accuracy and reaction time data collected (Van Maanen et al., submitted). This function, similarly to the LBA, allows us to observe the relationship between responses and response times, as opposed to studying either in isolation. Based on previous work, and on our understanding of how bias can computationally influence choices, we can make predictions about how parameters of the CAF vary with bias (e.g., fast errors, accuracy, late responses). This function can therefore add a valuable level of analysis for observing the dynamics of bias in choice behaviour.


Materials and Methods & Results

Experiment 1 – The Effect of Valence on Perceptual Choice

Materials and Methods

Subjects

Nineteen healthy subjects (female = 15; mean age = 22; right-handed = 18) completed the study, which was approved by the ethical committee at the University of Amsterdam (UvA). Written informed consent was obtained from all participants. The subjects were recruited from the UvA and had corrected-to-normal vision and no history of epilepsy. Three subjects were excluded from the final analysis for either not following the experiment instructions or performing at chance level. This left us with a total of 16 subjects (female = 13; mean age = 22.3; right-handed = 15). Each subject was informed upon arrival that they could receive an additional performance-dependent endowment. This bonus was expected to increase the motivation of the participants.

Flashing dot task

The behavioural data were obtained using a PsychoPy-generated perceptual choice task, where participants had to repeatedly pick between two options. The perceptual choice required subjects to pick between two "flashing" dots, selecting the one (left or right) that appeared to flash more. The two dots on the screen had a consistent diameter of 1 cm and were positioned 2 cm to the left and right of the central fixation cross. The speeds of the flashes were set as a probability of flashing at each frame. For example, at a flash rate of 0.5 the dot would have a 50% chance of flashing on any given frame. A larger difference in flash rates between the two stimuli would therefore increase discriminability. The flash rates used here were consistently set at 0.7 for the correct stimulus and 0.5 for the incorrect stimulus. These rates were used to elicit an average accuracy rate of 70%, found in a previous study (Miletić et al., submitted).

Procedure

In total, each subject completed 1004 trials, 44 of which were in the practice or training phases (see Figure 4). To introduce bias, prior information was given before the choice phase of each trial in the form of an arrow cue. Specifically, we used a potential payoff manipulation, where the direction of the arrow (left, right or neutral) indicated the response that would be most advantageous to the score of the participant (Mulder et al., 2012). In the gain condition, the biased response would gain them the most points if they were correct, whereas in the loss condition, the biased response would lose them the least points if they were incorrect. The subjects had two seconds to make their choice; otherwise 3 points would be deducted from their score.
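The per-frame flash probability used in the task can be sketched as a Bernoulli draw per frame; the frame count and seed below are illustrative:

```python
import random

def flash_this_frame(flash_rate, rng):
    """On each frame, a dot flashes with probability equal to its flash rate."""
    return rng.random() < flash_rate

rng = random.Random(0)
frames = 600  # e.g. 10 s at 60 Hz (illustrative, not from the study)
# Correct stimulus: rate 0.7; incorrect stimulus: rate 0.5
correct_flashes = sum(flash_this_frame(0.7, rng) for _ in range(frames))
incorrect_flashes = sum(flash_this_frame(0.5, rng) for _ in range(frames))
```

Over many frames the 0.7-rate dot reliably flashes more often than the 0.5-rate dot, which is what makes the discrimination possible.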

There were two sets of formal tasks in this experiment, where behavioural data were analyzed. The first was a neutral task, where only valence was manipulated in the absence of a bias manipulation. Blocks consisted of 60 trials, where the gain and loss conditions were separated between blocks. The valence of the condition would be indicated to the participant by a coloured circle (blue for gain, and orange for loss) prior to the presentation of the stimuli. In the gain condition, correct responses would receive 2 points and incorrect responses would receive no points. In the loss condition, correct responses would receive no points and incorrect responses would lose the subject 2 points. Feedback was given after each response as either “correct!” or “wrong!”, in addition to the amount of points they had gained or lost.

The implemented design comprised six conditions in the main task, formed by crossing two factors: two levels of valence (gain or loss) and three levels of bias (valid, invalid or neutral). Figure 5 shows a step-by-step example of a single trial for each valence condition. For the biased conditions, an arrow was displayed prior to the stimuli, pointing either left or right. Valid and invalid trials comprise the biased conditions, where valid cues correspond to trials in which the direction of the arrow matched the correct stimulus. Invalid trials, on the other hand, describe trials where the cue direction was opposite to the correct stimulus option. For the unbiased conditions, a neutral arrow was shown, pointing in both directions. The colour scheme indicating the valence was the same as in the neutral task. Each block consisted of 60 trials, with intermixed bias conditions but separated valence conditions.
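The 2 x 3 factorial crossing described above can be enumerated directly (a trivial sketch of the design cells):

```python
from itertools import product

valences = ("gain", "loss")
biases = ("valid", "invalid", "neutral")

# The six design cells of the main task: every valence paired with
# every bias type.
conditions = list(product(valences, biases))
```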

Figure 4. Outline of the procedures for experiment 1: practice task (t = 20), neutral task (t = 240; 4 blocks of 60 trials, 120 of each valence), training task (t = 24), and main task (t = 720; 12 blocks of 60 trials, 360 of each valence), where t indicates the number of trials within each task. Participants first received an instruction sheet with explanations and a description of the task. The neutral task was preceded by the short practice task, and the main task was preceded by the training task.

Figure 5. Schematic of the main task of the experiment. Each trial consisted of fixation, cue and stimulus phases followed by feedback (screen durations of 300 ms, 500 ms and 300 ms for the fixation and cue screens, a self-paced stimulus phase, and 500 ms of feedback). The cue screens explained the payoff ("If incorrect, right will lose you the least points"; "If correct, left will gain you the most points"). The loss condition (orange) shows a single valid trial in which the bias cue was in the right direction, the correct answer was in the right direction and the subject responded left (feedback: "Wrong! -8"). The gain condition (blue) shows a single valid trial in which the bias cue was in the left direction, the correct answer was in the left direction and the participant responded left (feedback: "Correct! +8").


The participants were instructed to respond as fast and as accurately as possible. The points scheme differed between the valences, as indicated in Table 1. A crucial similarity between the points systems is that the arrow cue in both conditions always indicates the more advantageous option if the trial is difficult. Even so, these points schemes do have a fundamental difference. In the gain condition, the bias type affects the reward if the response is correct, whereas in the loss condition, the bias type affects the punishment if the response is incorrect. The reason for this is that we wanted to implement a perceptually equivalent points system, but therefore had to sacrifice mathematical equivalence between the valences. For mathematical equivalence, the potential payoffs within the gain and loss conditions would have to have the same corresponding expected values. But, to mirror the gain condition, subjects would have to lose points even when they had answered correctly on invalid trials in the loss condition (e.g. -6 for a correct response, -8 for an incorrect response). Although this would result in the same expected values in each condition, it would mean that participants would be punished for both correct and incorrect answers on some of the trials. We believe that this would induce an unwanted perceptual effect and make the conditions less comparable. We therefore opted for the system displayed in Table 1. Both the side of the correct responses and the direction of the bias cues were counterbalanced over the experiment.

        NEUTRAL          BIASED (LEFT OR RIGHT)
GAIN    Correct: +0      Valid correct: +8
        Incorrect: +0    Invalid correct: +2
                         Incorrect: +0
LOSS    Correct: -0      Correct: -0
        Incorrect: -0    Invalid incorrect: -8
                         Valid incorrect: -2

Table 1. Points scheme for both the gain and loss conditions. Each column indicates a bias type and each row signifies the two valences (gain and loss).


Data Analysis

Model selection

Hypotheses about the expected differences in behaviour in experiment 1 are coupled with assumptions about the effect these differences would have on the model describing that behaviour. To this end, we have specific expectations about the effect of valence on the model. As described, the effect of bias on the dynamics of decision processes is well established (Forstmann et al., 2012; Mulder et al., 2012), and a main effect of bias on decision thresholds appears to best account for the data. Nevertheless, another aim of this study is to replicate these previous findings, and we will therefore also test a model in which bias modulates drift rates as opposed to the threshold parameter. Response caution is defined in the LBA as b - A/2, which translates to the value of the threshold (b) minus the expected value of the starting point (i.e., A/2). Due to the differences in risk-seeking and risk-averse behaviour when individuals are confronted with the prospect of gain or loss, a site of difference could be in response caution. As such, we hypothesize the most likely location of difference to be in the threshold parameter (see Figure 6). We hypothesize that responses to bias will differ between the reward and punishment conditions. How they differ is a more complex matter. Due to the typical observation that individuals are more risk-seeking in response to punishment, it is possible that persons would be less biased in the loss condition. This is because subjects may select the more 'risky' prospect when unsure of the correct response. On the other hand, subjects could place a stronger emphasis on the bias cue, as they are more motivated to incur a lesser punishment. If either of these were the case, it would imply two sources of modulation on the threshold parameter, giving rise to six different estimates across the conditions.
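Response caution as defined above is a one-line computation; the threshold and start-point values below are hypothetical, chosen only to illustrate how a raised loss-condition threshold would translate into higher caution:

```python
def response_caution(b, A):
    """Response caution in the LBA: threshold minus the mean start
    point, b - A/2 (the mean of a U[0, A] draw is A/2)."""
    return b - A / 2.0

# Hypothetical estimates: a higher threshold in the loss condition
# maps directly onto higher response caution.
caution_gain = response_caution(b=1.0, A=0.5)
caution_loss = response_caution(b=1.2, A=0.5)
```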

We are also manipulating reward magnitude, as we wanted to test the hypothesis that higher magnitudes would incur a greater degree of bias. We believe that the difference between the valid and invalid thresholds, as estimated by the LBA model, is the most accurate quantifier of bias. Our best-fitting model for experiment 1 will therefore be the model selected for fitting the data in experiment 2. This will allow us to compare the model parameters between the low-magnitude (experiment 1) and high-magnitude (experiment 2) conditions. To act as a scaling factor, sd1, defined as the between-trial variability of the drift rate of the correct accumulator unit, was fixed at 1 for all conditions and all models fitted in this study.

Figure 6. Hypothesized threshold differences caused by valence and reward magnitude manipulations. A) Depiction of the difference in response caution between the valences, if individuals were to exhibit more cautious behaviour in the punishment conditions. B) Depiction of a standard difference in threshold estimates between the valid and invalid accumulators in the low magnitude condition. C) Depiction of a larger difference in threshold estimates between the valid and invalid accumulators in the high magnitude condition, suggesting a larger degree of bias.


Results

We used a three-way repeated-measures analysis of variance (ANOVA) to explore the data over the course of the experiment. For this, valence and bias type were used as categorical factors, the subjects as random factors, and reaction time and accuracy as dependent variables. The results indicate that bias type (p < 0.01), but not valence (p = 0.25), had a significant effect on reaction time. Similar results were obtained for accuracy, where bias type (p < 0.001) had a significant effect but valence did not (p = 0.88). Descriptive statistics of experiment 1 can be found in Figure 7. Correct responses were significantly faster than incorrect responses over all trials (p < 0.01), but this effect was mainly driven by the difference in reaction times in the valid and neutral conditions.

We next turned to the analysis of the neutral task. This unbiased block displayed similar results for accuracy between the two valences, indicating that reward and punishment contexts did not influence performance. We did find a significant difference in reaction time between the valences (p < 0.05), where the punishment trials appeared to increase the reaction time of choices; however, this was not true for the unbiased trials in the main task. Participants also exhibited a right-side bias, where the right stimulus was selected significantly more often than the left, regardless of whether it was the correct or incorrect choice (p < 0.05).

Figure 7. Descriptive results of experiment 1. a) Bar graph displaying the mean accuracy (%) obtained in each condition. Bias type is indicated on the x-axis, with colours indicating the gain (blue) and loss (orange) conditions. b) Bar graph displaying the mean reaction time (RT, in seconds) of participants in each condition. c) Bar graph displaying the mean reaction time in each bias type, depending on whether the response was correct (1) or incorrect (0). Valence conditions are not separated when comparing based on correctness. All error bars indicate one standard error. ** indicates p < 0.01, *** indicates p < 0.001.


Conditional Accuracy Function (CAF)

To assess the dynamics of the perceptual choices between each of the conditions in greater depth, we applied a conditional accuracy function (CAF) to the obtained data (Van Maanen et al., submitted). This function, as with the LBA, can simultaneously take the accuracy and reaction time responses of each participant into account (Heitz, 2014; Van Maanen et al., submitted). This means far less information is lost through averaging than with approaches such as reaction time bins, which are currently a research standard. CAFs provide a means to relate the probability of correct responses to the distribution of execution times over the course of all trials. The use of the CAF also allows us to observe whether there is any discrepancy between its results and those of the LBA.

The CAF comprises four parameters, each accounting for different patterns observed in the data. The formula can be found in equation 1; to simplify its visualization, a logistically transformed version is shown in equation 2. The first parameter, a, defines the upper bound of the asymptote, signifying the maximum accuracy in the data. Secondly, parameter b defines the steepness of the gradient within the later response times of the curve. Thirdly, parameter c defines the initial bend of the first segment, with lower values of c indicating a larger proportion of fast errors. Lastly, parameter d defines the shift of the function, expressing the overall positioning of the graph in relation to the x-axis. In terms of bias, we could intuitively expect differences in parameters a, b and c between valid and invalid cues. As performance is hypothesized to be lower in the invalid conditions, we would anticipate a lower a value to accompany it. Invalid trials, due to an increase in threshold, would see a larger amount of slow correct responses, which would be indicated by a lower b value, and therefore a subtler downward slope. Due to the discrepancy between the cue and the correct response, we would also assume a greater degree of fast errors to occur in the invalid conditions, resulting in a lower c value.

f(x) = a - b(x - d) - c/(x - d)    (1)

p(x) = e^f(x) / (1 + e^f(x))    (2)

The graphs describing results taken from this test are shown below (see Figure 8). They are in concordance with the descriptive statistics and LBA parameters mentioned previously. The left panel shows that bias type (validity) has a clear effect on the shape of the distribution generated. In terms of parameters, a and b displayed significant differences between the bias types (p < 0.01; p < 0.05) (see Table 2). Since parameter a describes the maximum accuracy of the condition, this result makes sense.
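The CAF can be sketched in code as follows; the functional form used here, f(x) = a - b(x - d) - c/(x - d) passed through a logistic transform, is our reading of the parameter descriptions above and should be treated as an assumption:

```python
import math

def caf(x, a, b, c, d):
    """Conditional accuracy at response time x (requires x > d).

    a: upper asymptote (maximum accuracy); b: steepness of the late
    gradient; c: initial bend (lower c -> more fast errors);
    d: horizontal shift of the whole function.
    """
    f = a - b * (x - d) - c / (x - d)
    return 1.0 / (1.0 + math.exp(-f))  # logistic transform onto [0, 1]

# Parameter estimates for the valid and invalid conditions (Table 2),
# evaluated at a mid-range response time of 0.7 s.
p_valid = caf(0.7, a=2.68, b=1.72, c=0.40, d=0.070)
p_invalid = caf(0.7, a=1.69, b=1.06, c=0.27, d=0.15)
```

With these estimates, the valid condition yields the higher conditional accuracy at 0.7 s, in line with the pattern shown in Figure 8.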
Similarly, Figure 8 shows clear differences in the accuracies of later responses between the valid and invalid conditions. This is reflected in the b parameter of the CAF, which can be explained by the difference in the threshold criterion suggested by the LBA model fits. Parameters c and d displayed minor differences, though these were not significant.

Also in line with the earlier results, the right-hand panel displays little difference in choice dynamics between the gain and loss conditions. As no parameter differed significantly between the valence conditions, we can reasonably conclude that, in terms of conditional performance, the two conditions do not elicit distinct behaviours.
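For concreteness, the CAF described above can be sketched as a small function. The exact functional form used here is an assumption reconstructed from the text (the original specification follows Van Maanen et al., submitted): a linear predictor with asymptote a, late slope b, initial bend c and shift d, passed through a logistic transform.

```python
import numpy as np

def caf(x, a, b, c, d):
    """Conditional accuracy function: P(correct) as a function of RT.

    Assumed form (a sketch, not the authors' exact equation):
    g(x) = a - b*(x - d) - c**(x - d), logistically transformed
    so that predicted accuracy stays within (0, 1)."""
    g = a - b * (x - d) - c ** (x - d)
    return np.exp(g) / (1.0 + np.exp(g))

# Evaluate at two RTs (s) with the valid-cue estimates from Table 2
p = caf(np.array([0.3, 1.5]), a=2.68, b=1.72, c=0.40, d=0.07)
```

With these estimates, predicted accuracy declines at long RTs, mirroring the late-RT differences that the b parameter captures between valid and invalid cues.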

Bias type   a**    b*     c      d
Valid       2.68   1.72   0.40   0.070
Neutral     2.61   1.78   0.44   0.057
Invalid     1.69   1.06   0.27   0.15

Valence     a      b      c      d
Gain        2.31   1.51   0.35   0.11
Loss        2.34   1.53   0.39   0.074

Table 2. Parameter estimates for the conditional accuracy function across all conditions. a) Table a displays the estimates between bias types. b) Table b displays the estimates between the gain and loss conditions. * indicates significance level. * p < 0.05; ** p < 0.01.

[Figure 8 panels: left, "CAF validity, all subjects"; right, "CAF loss vs gain, all subjects".]

Figure 8. Conditional accuracy functions (CAF) describing behavioural results for conditions in experiment 1. Conditional accuracies for the bias types are shown on the left, with the CAFs for the two valences shown on the right. The y-axis displays the probability of being correct and the x-axis displays reaction time (RT).


Model Comparison

Models are typically compared on the basis of their maximum likelihood estimates (Myung, 2003): the likelihood that a given set of parameters describes the data. The differential evolution optimization algorithm used in the model fitting (DEoptim; Ardia et al., 2016) aims to maximize this likelihood function. After estimating the parameters of each model, we used both the Bayesian Information Criterion (BIC) and the Akaike Information Criterion (AIC) to assess the adequacy of our models for each participant (Schwarz, 1978; Akaike, 1974). These criteria allowed us to quantitatively compare the quality of the model estimates and to indicate where the manipulations exerted their effect. Although useful for model comparison, such techniques should not be relied on exclusively; decisions should also be based on visual and graphical checks. The BIC inherently penalizes model complexity more heavily than the AIC (see Equations 3 and 4), which helps guard against overfitting but comes with its own pitfalls: due to its conservativeness, it can sacrifice accuracy of model selection in order to keep complexity down. Complexity, in this case, is defined as the number of free parameters within a model. To overcome the drawbacks associated with each method, we used both criteria to determine the best model.

BIC = ln(n)k − 2ln(L)    (3)

AIC = 2k − 2ln(L)    (4)
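Equations 3 and 4 are straightforward to compute from a model's maximized log-likelihood; a minimal sketch (the example values are made up for illustration):

```python
import math

def bic(log_lik, k, n):
    """BIC = ln(n)*k - 2*ln(L), where log_lik is ln(L),
    k the number of free parameters, n the number of trials."""
    return math.log(n) * k - 2.0 * log_lik

def aic(log_lik, k):
    """AIC = 2*k - 2*ln(L)."""
    return 2.0 * k - 2.0 * log_lik

# Example: a 6-parameter model fit to 720 trials
example_bic = bic(log_lik=-1000.0, k=6, n=720)
example_aic = aic(log_lik=-1000.0, k=6)
```

With 720 trials, ln(720) ≈ 6.58 > 2, so the BIC's per-parameter penalty exceeds the AIC's, which is why it tends to favour sparser models here.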

In the above equations, L indicates the likelihood of the model, k is the number of parameters estimated by the model and n is the number of trials. Due to the setup of the experiment and the inherent makeup of the LBA model, there was a large range of possible free parameters. By ‘free’, we refer to parameters that could vary between conditions and need to be estimated to describe the data. As standard in the use of the LBA, the standard deviation of the drift rate of the correct accumulator (sd1) was fixed at 1 so to act as a scaling parameter for the rest of the model. Six parameter values (b, v1, v2, t0, A and sd2) are therefore left as a minimum amount to fit to the data. As there were six conditions, this leaves 36 possible free parameters to estimate as the freest model available. Of course, such a lack of constraint would provide us with a nice fitting model, but we would be sacrificing any interpretable results. The theoretical ideas behind our choice of models has already been explained, this section will focus on how to quantitatively compare them, in light of the data. Only theory-driven models will be mentioned and compared as there are a large number of exploratory fits that could be considered, too many to report here. We tested 8 models with varying constraints, ranging from 6 to 36 free parameters. For all parameters (except sd1), there were four possible constraints: 1) describe all six conditions with a single parameter estimate, 2) vary between valence (two estimates), 3) vary between bias types (three estimates) or 4) vary across both valence and bias type (six estimates).


Model comparison with the BIC heavily favours model 1, which is the best fitting model for 12 of the 16 participants (see Table 3) and ranks in the top three models for 14 of the 16. Although this seems like conclusive evidence for model 1, it contrasts with the descriptive results and the CAF parameters. Model 1 is the least complex model, comprising only six free parameters; it implies that bias type had no effect on response thresholds or drift rates, even though there were clear behavioural differences between the experimental conditions. We then turned to the AIC comparisons. These suggested that model 4 best fits the data, ranking first for 5 of the 16 participants and in the top 3 for 12 of the 16. The AIC therefore paints a different picture of the effect of the bias manipulation in the task: model 4 lets the threshold parameter vary across all conditions, both between valences and between bias types. To resolve the discrepancy between the BIC and AIC, rather than taking a single winner, we examined the top two ranked models under each criterion, looking for patterns in their predictions. The BIC suggests that models 1 and 2 best account for the data; the AIC implies models 3 and 4. A relationship shared by three of the four models is an effect of bias type on response thresholds. Neither comparison technique implied an effect of bias on the drift rate parameters, in line with previous findings.
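The nBest and nTop3 tallies reported in Table 3 can be computed from a subjects × models matrix of criterion values; a sketch, using a made-up matrix rather than the experiment's actual values:

```python
import numpy as np

def rank_counts(ic):
    """ic: (n_subjects, n_models) array of BIC or AIC values (lower is better).
    Returns, per model, how often it ranked first and how often in the top 3."""
    order = np.argsort(ic, axis=1)      # model indices sorted best-to-worst
    ranks = np.argsort(order, axis=1)   # rank of each model for each subject
    n_best = (ranks == 0).sum(axis=0)
    n_top3 = (ranks < 3).sum(axis=0)
    return n_best, n_top3

# Hypothetical criterion values for 3 subjects and 4 models
ic = np.array([[1.0, 2.0, 3.0, 4.0],
               [2.0, 1.0, 3.0, 4.0],
               [1.0, 3.0, 2.0, 4.0]])
n_best, n_top3 = rank_counts(ic)
```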

Table 3. Overview of the models tested in experiment 1. The models are shown row by row. The parameter columns indicate whether they varied over the experimental conditions: 1 indicates that the parameter was estimated once over all conditions, V that it varied over the valence conditions, and B that it varied over the bias types. A, upper bound of starting point interval; b, response threshold; v1/v2, drift rate of correct/incorrect accumulator; sd2, standard deviation of drift rate of incorrect accumulator; t0, non-decision time. nPar = number of free parameters; nBest = number of times the model won; nTop3 = number of times the model ranked in the top 3 models.


As the difference in t0 values between the valences is not significant (p = 0.43), we can assume that model 3 overfits the data. Similarly, because the differences between the thresholds (b) of the bias types in model 4 do not significantly differ between the valences (p = 0.29), we can infer that this model also does not describe the data parsimoniously. Model 2 is the simplest model that lets response thresholds differ over the bias types, and it displays significant differences between them (p < 0.01). Model 2 therefore appears to be the simplest model that can adequately describe the data, assuming a single source of variability across the conditions. In line with our hypotheses and the results of previous studies, the presence of bias appears to modulate participants' response thresholds during their perceptual choices. Specifically, the threshold of the cued response was lower than that of the uncued response. Averaging the results of model 2 over participants, the b parameter of the correct accumulator during a valid cue (b_valid) was 4.01, whereas the threshold of the incorrect accumulator during such a cue (b_invalid) was 4.13. The averaged values of the other parameters in the best fitting model were: b_neutral = 3.96, A = 2.81, v1 = 3.73, v2 = 2.63, t0 = 0.099 and sd2 = 1.27.

Due to the uncertainties surrounding the model fits of experiment 1, we applied both a Bayesian and an Akaike model averaging technique as a further quantification of the effect of bias in the task (see Equations 5 and 6). Model averaging, both Bayesian and frequentist, helps overcome the issue of model uncertainty (Kim, Bartell & Gillen, 2015; Hoeting et al., 1999). By applying a weighting factor to each parameter value across all subjects, we can extract averaged parameter estimates that take all model specifications into account (Wagenmakers & Farrell, 2004).

w_i(BIC) = e^(−Δ_i(BIC)/2) / Σ_k e^(−Δ_k(BIC)/2)    (5)

w_i(AIC) = e^(−Δ_i(AIC)/2) / Σ_k e^(−Δ_k(AIC)/2)    (6)

These weighting factors, termed Schwarz and Akaike weights, give each model a probability of being the best fitting model for each subject, relative to all models considered (Akaike, 1974; Schwarz, 1978); Δ_i denotes the difference between model i's criterion value and the minimum across all K models. For this we used the 8 theory-driven models discussed above. The results of this averaging are in line with our theoretical stance that bias type influences decision thresholds. Using a one-way ANOVA, we found that the b parameters derived from the model averaging methods differ significantly across bias types; the same was not found for the drift rates. As expected, the averaged parameters from the Akaike weights indicate a larger effect of bias on decision thresholds (p < 0.001) than their more conservative Bayesian counterpart (p < 0.05). The Bayesian model averaging (BMA) estimates the b_valid parameter at 4.12 and the b_invalid parameter at 4.18; the other Bayesian averaged parameters are: b_neutral = 4.08, A = 2.84, v1 = 3.76, v2 = 2.66, t0 = 0.086 and sd2 = 1.28. The Akaike model averaging estimates the b_valid parameter at 4.03 and the b_invalid parameter at 4.14; the other Akaike averaged parameters are: b_neutral = 3.96, A = 2.83, v1 = 3.73, v2 = 2.63, t0 = 0.10 and sd2 = 1.26. Inferences drawn from the parameter estimates and model specification of model 2 would therefore be similar had we used the model averaged parameters from either method.
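The Schwarz and Akaike weights of Equations 5 and 6 can be sketched as follows; per subject, the weights across the models multiply the corresponding parameter estimates to give the model-averaged values. The criterion values and b estimates below are hypothetical.

```python
import numpy as np

def ic_weights(ic_values):
    """Schwarz/Akaike weights (Wagenmakers & Farrell, 2004):
    w_i = exp(-delta_i / 2) / sum_k exp(-delta_k / 2),
    with delta_i = IC_i - min(IC). Works for AIC or BIC values alike."""
    delta = np.asarray(ic_values, dtype=float) - np.min(ic_values)
    w = np.exp(-0.5 * delta)
    return w / w.sum()

# Hypothetical criterion values for three candidate models
w = ic_weights([100.0, 102.0, 110.0])

# Model-averaged threshold, given hypothetical per-model b estimates
b_avg = float(np.sum(w * np.array([4.0, 4.1, 4.3])))
```

Subtracting the minimum before exponentiating keeps the computation numerically stable even for large criterion values.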

In addition to this quantitative method of model selection, qualitatively visualizing the data and fits can be just as informative. Figure 9 plots the model-estimated accuracies and reaction times, derived from the BMA parameters, against the data recorded in the experiment. The first column shows the five quantile estimates of the model against the observed reaction times for correct responses; the second does the same for incorrect responses. The greater the correspondence between fit and data, the better the model describes the data. The third column displays each participant's accuracy against the accuracy predicted by the model. As there were six conditions, there are six rows visualizing the precision of the model within each manipulation.
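The RT quantiles plotted in each panel can be computed directly; a minimal sketch, where the five probability levels are the conventional ones for such quantile plots and are assumed here:

```python
import numpy as np

def rt_quantiles(rts, probs=(0.1, 0.3, 0.5, 0.7, 0.9)):
    """Five RT quantiles of a condition's response times,
    comparable between observed data and model simulations."""
    return np.quantile(np.asarray(rts, dtype=float), probs)

# Toy RTs (ms): 1..100, so the median interpolates to 50.5
q = rt_quantiles(np.arange(1, 101))
```

Computing the same quantiles for observed and simulated RTs, per subject and condition, yields the paired points plotted in the first two columns of Figure 9.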


Figure 9. Plots of RT quantiles and accuracy for data (y-axis) and fits (x-axis). Each row displays one of six conditions in the task; all panels contain actual data and predicted data for all subjects. [Panels: correct-response RTs, incorrect-response RTs, and accuracy for each of the gain/loss × valid/neutral/invalid conditions.]

Parameter Recovery

In order to verify the model, we performed a parameter recovery on model 2 (model averaged parameters cannot be used for parameter recovery). The procedure first generates reaction time and accuracy datasets using randomly sampled parameter values, with the same number of trials (720) as in the experimental task. These datasets are then fit using the same parameter makeup as the data-generating model. Figure 10 displays the results of this recovery, indicating that the parameters can indeed be reliably re-retrieved. Each point within the graphs signifies an individual dataset; if the coloured line follows the diagonal dotted line exactly, the estimated parameters have perfectly recovered the true parameters. The correlations between parameters of the same or different type are displayed in the legend box within each graph, above the root mean squared errors (RMSEs) of the two parameter sets, which indicate the magnitude of error between the true and estimated values. The results show that v1, v2, sd2, t0 and b can all be recovered very well. The A parameter, on the other hand, was not recovered to the same extent; recovery of the starting point parameters in both the DDM (sz) and the LBA (A) has, however, always yielded inconsistent results. As seen in Figure 10, the relationships between non-analogous parameters are relatively uncorrelated, indicating that the influence of each parameter is reasonably independent.
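The recovery logic can be illustrated with a toy stand-in for the LBA (simulating and refitting the full LBA is beyond a short sketch): sample "true" parameters, simulate 720 trials per dataset, re-estimate, then compute the correlation and RMSE between true and estimated values.

```python
import numpy as np

rng = np.random.default_rng(0)

def recovery_study(n_datasets=50, n_trials=720):
    """Toy parameter recovery: the 'model' is a normal RT distribution
    whose mean plays the role of a to-be-recovered parameter."""
    true_mu = rng.uniform(0.4, 1.2, n_datasets)   # sampled true parameters
    est_mu = np.empty(n_datasets)
    for i, mu in enumerate(true_mu):
        data = rng.normal(mu, 0.2, n_trials)      # simulate one dataset
        est_mu[i] = data.mean()                   # maximum-likelihood refit
    r = float(np.corrcoef(true_mu, est_mu)[0, 1]) # true-vs-estimated correlation
    rmse = float(np.sqrt(np.mean((true_mu - est_mu) ** 2)))
    return r, rmse

r, rmse = recovery_study()
```

A correlation near 1 and a small RMSE correspond to points hugging the diagonal in Figure 10; poorly identified parameters (such as A) would show a flatter cloud and a larger RMSE.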
