
Examining the impact of outcome-precision on reinforcement learning

26EC

03/02/2017 – 16/06/2017

Scott Isherwood

11407093

Supervisor: Maël Lebreton

Co-assessor/UvA Representative: Jan Engelmann

Center for Research in Experimental Economics and Political Decision Making

(CREED)

MSc in Brain and Cognitive Sciences, University of Amsterdam (UvA)

Track: Behavioural Neuroscience


Abstract

Naturalistically, decision-making is shrouded in uncertainty. Many aspects affect these choices, whether you are selecting a trade on the financial market or trying to find the most efficient route to your favourite restaurant. Both irreducible (aleatory) and reducible (epistemic) uncertainty surround everyday life, but the latter has received little attention. The purpose of this study was to investigate if and how outcome-precision influences the dynamics of reinforcement learning in a simple binary economic choice task, and how this may translate to real-world instances. The term outcome-precision refers simply to the inverse of the variance associated with an outcome. Through trial-and-error, participants (n = 18; m/f = 4/14) were asked to learn the values of abstract symbols, defined by a set of independent probability distributions, and to maximize their instrumental performance. By observing their performance in conditions differing in outcome-precision (high or low) and valence (reward or punishment), it was possible to elucidate the effect of this precision on learning. The features of the environment that are integrated into each decision-process are of significant importance in the study of behavioural economics. Moreover, the weight that individuals give to these features can differ, especially under uncertain conditions. The current paper demonstrates that outcome-precision does indeed impact reinforcement learning and that this impact appears to be sensitive to modulation by the reward and punishment learning systems.


Introduction

Decision-making under uncertain conditions is something that humans face on a day-to-day basis. From estimating the quickest route to work to selecting the right investment in the financial market, previous outcomes, predictions and uncertainty all influence the decision apparatus. In order to know which actions provide beneficial outcomes, people must receive feedback from the environment. By nature, our environment is very uncertain: stimulus-action-outcome associations are never deterministic but rather probabilistic – or stochastic. Through reinforcement learning, we are able to adapt to these noisy environments. Rangel et al. (2008) describe a simplified algorithm of how these stimulus-action-outcome associations are learnt and how representations of the outside world are updated (see figure 1). For example, arriving at work late after choosing what turned out to be a slower route acts as negative feedback and presumably drives the updating of the value assigned to this less advantageous action. At an elementary level, reinforcement learning involves forming representations of the environment through assigning values to actions

based on the outcomes these actions yield; this is the internal model. This process is, however, by no means simple: the world is inherently uncertain, and uncertainty is therefore an important factor that needs to be integrated into the decision-process. Yu and Dayan (2005) describe this uncertainty as a deviation from the standard underlying patterns defining the environment.

Investigating the way in which we make decisions under uncertain conditions is therefore an important line of academic research.

Figure 1. Computational processes underlying value-based decision-making. The process of choice can be split into five stages. First, a representation of the decision problem must be formed, including the internal and external states along with the possible executable actions. Second, a valuation of the possible actions is incorporated into the decision-process. Third, an action is selected on the basis of its valuation. Fourth, the outcome of the selected action is evaluated to infer how advantageous it was. The last stage integrates this evaluation into the previous processes, updating them so as to increase the quality of future decisions. Adapted from Rangel et al., 2008.


With regard to this uncertainty, two fundamentally different types arise: aleatory and epistemic. The former, aleatory, describes uncertainty that is irreducible. This is due to the unpredictability and stochasticity of the system it describes. Being irreducible consequently means that the action you take in a certain state will not affect your knowledge at subsequent states. For example, rolling a die is an aleatory case: although all of the possible outcomes of the die are known, this does not help you predict what the next outcome will be. Each state is therefore independent. Epistemic uncertainty, on the other hand, which will be the core of this work, arises from a lack of complete knowledge and is reducible through the gaining of new information. For example, to find the quickest path from point A to point B, an agent would have to obtain details about aspects of the route (distance, obstructions, traffic). To start with, all routes between A and B are uncertain, but this uncertainty can be resolved by gathering knowledge of their features. Over time, past experience and new information can be integrated to indicate which is the quickest route. The information you receive at one state therefore aids the decision-process at the next state. In the case of this example, the outcome of the agent's choice would be the length of time taken to traverse from A to B. The time taken on one route may vary greatly, while another route may be slower on average but more precise in its timing. If and how people take into account the outcome-precision of their choices is the focus of this work. Precision here is defined as the inverse of the variance (Dodge and Marriott, 2003).

Much previous literature has attempted to model the effects of uncertainty on reinforcement learning (Palminteri et al., 2015; Behrens et al., 2007; Courville et al., 2006; O'Doherty et al., 2003; Pessiglione et al., 2006). These studies have typically comprised paradigms based on Bernoulli processes, where the binary probability of obtaining a reward was manipulated (Palminteri et al., 2015; Pessiglione et al., 2006). Crucially, then, most of these studies have investigated the learning of stimulus-action-outcome associations using discrete outcome-distributions. As these generated outcomes are stochastic, each state is independent and participants cannot rely on learning to resolve the uncertainty. Lottery tasks also share this feature of independence between trials (Wright et al., 2013; Symmonds et al., 2011).

As such, instances of aleatory uncertainty have been examined extensively, but another important source of uncertainty, epistemic uncertainty, has received little attention. Processes like the examples given do not cover the majority of naturalistic decision problems: the uncertainty surrounding choices in real-world settings is rarely discrete. Additional research has attempted to investigate the influence of uncertainty resulting from environmental volatility using complex reversal-learning tasks (Behrens et al., 2007; Iglesias et al., 2013). Little to no research has focused on uncertainty in simple reinforcement learning tasks where the outcomes are continuous.

We aimed to capture decision-making in a more naturalistic setting, so as to emphasize the importance of epistemic uncertainty and to better understand how we approach more ecological choice situations. The issue with binary choices is that they do not fit the sequential decision structure of the real world. Interdependent and successive decisions, based on the prediction of future states, are fundamentally different from discrete, independent choices such as lotteries. Most decisions in daily life, such as successful driving, do not occur as a result of one action, but as a result of multiple successive actions and outcomes. As such, it is essential to plan ahead and construct representations that can be built upon over time in order to make better decisions.

The effect of using continuous outcome distributions with varying precision has not been examined to such a large extent (Hertz et al., 2017). With respect to the previous examples, rolling a normal die has a discrete distribution associated with it: there are only six possible outcomes, a pre-defined and limited number of values that can be learnt, with an outcome uncertainty that is non-reducible. Naturalistic decisions, on the other hand, are generally more complex. For example, say you had to pick a restaurant in the city center on a busy Saturday night. Your favourite restaurant would normally be the obvious choice, but previously at this time it has varied from almost empty to a queue leading out onto the street. You also know of another restaurant on the other side of the city that always has space, but the food is not as good. The way in which we would approach this decision exemplifies the role of outcome-precision in our choices. We could reduce the uncertainty through further iterations of this decision, but the variance in the outcome of our choices will always remain a factor. The optimization of this tradeoff between mean and variance has often been modeled in finance (Markowitz, 1952). In contrast to psychological and mathematical outlooks on risk, which see risk as the probability or magnitude of a loss, theories in finance commonly relate risk to the variance of returns. The relationship between risk and variance will be described in further detail below.

Risk can be decomposed on the basis of the mean, variance and skewness of the probability distribution. These so-called 'moments' are used for descriptive purposes and provide the intrinsic features of the distribution. Tendencies towards these components can be highly individual and have been shown to vary (Symmonds et al., 2011; Paulsen et al., 2012; Alderfer, 1970). The relationship between variance and risk is simple: in general, greater variance increases the likelihood of more extreme values (if the mean is kept constant). These extreme values can be more advantageous or more disadvantageous, but by our definition of risk, the variance has increased the probability of a more negative outcome occurring. Because of this, variance-averse tendencies often parallel risk-averse tendencies (Symmonds et al., 2011). The two concepts are so closely related that some have even used variance as their definition of risk (Bossaerts, 2010; Wright et al., 2013; Markowitz, 1972). Mean-variance theory has high application efficacy in portfolio selection, but its simplified classification of risk means its generalization to circumstances outside of finance is limited (Markowitz, 1972). Formally, the two should be kept distinct from one another, and as such, variance has been used as an approximation of risk (e.g. the coefficient of variation) rather than a definition (Tobler et al., 2007; Weber et al., 2004; Christopoulos et al., 2009). Neuroimaging data concur with the assumption that risk-perception does not consist solely of variance encoding (Symmonds et al., 2011). Precision here therefore refers to the inverse of this statistical variance, and not an analogy to risk, even though instances of greater variance are often accompanied by greater risk.

Knight (1921) was the first to discriminate between the notions of risk and uncertainty. In decisions under risk, the volatility of the environment is known: the agent is fully informed about the probability distributions of the possible outcomes. During uncertain decision-making, these probabilities are not entirely known. Through sampling and exploratory behaviour, uncertainty can be reduced to risk, a transition that participants made through trial-and-error in the learning task of this experiment. Importantly, due to the feedback presented (factual and counterfactual), sampling activity remained consistent across options. A riskier decision implies a higher likelihood of a negative or undesirable outcome occurring. Risky decisions have also been shown to be favoured over uncertain ones, a phenomenon known as ambiguity aversion (Ellsberg, 1961).

In a probabilistic experimental paradigm, where abstract symbols are paired with rewards or punishments, individuals can use trial and error to enhance their instrumental performance (their ability to seek rewards and avoid punishments). Computationally, this can be done by learning the expected values (EV) of the available prospects (Daw and O'Doherty, 2013; Pessiglione et al., 2006; Sutton and Barto, 1998). Trial-by-trial, the values assigned to the chosen prospects can be updated with prediction errors (δ), using the discrepancy between the expected value and the actual value received. Uncertainty in the environment should increase the elasticity of the EVs, but these values should remain stable in the presence of noise (Yu and Dayan, 2002; Symmonds et al., 2011). If the decision-maker can predict when rewards are going to occur, they are more likely to choose the more rewarding actions. Elasticity here is analogous to the learning rate (α), or speed of learning, which is an important component of the delta-learning rule (Rescorla and Wagner, 1972). Prior studies have attempted to model reinforcement learning under uncertainty by exploiting modifications of the delta-learning rule (Palminteri et al., 2015; Hertz et al., 2017; Rescorla and Wagner, 1972). Trial and error (experiential learning) works through this general error-driven learning mechanism. Presumably, this could be used to solve both types of uncertainty, but the algorithms used in the face of epistemic or aleatory uncertainty may exhibit subtle but fundamental differences.
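To make this update concrete, a single trial of the delta-learning rule can be sketched as below (a minimal illustration with hypothetical numbers, not fitted parameters; the formal update rule is given in the Discussion).

```python
# Minimal sketch of one delta-rule update; all numbers are hypothetical.
expected_value = 5.0   # current value estimate V_t of the chosen symbol
outcome = 7.0          # outcome R_t observed on this trial
alpha = 0.1            # learning rate (the "elasticity" of the value estimate)

prediction_error = outcome - expected_value   # delta_t = R_t - V_t = 2.0
expected_value += alpha * prediction_error    # V_{t+1} = 5.0 + 0.1 * 2.0 = 5.2
print(expected_value)  # the estimate moves a fraction alpha toward the observed outcome
```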

In unstable environments, estimating the expected values of the cues presented is accompanied by an assessment of their uncertainty (Rushworth and Behrens, 2008). Uncertainty therefore has an effect on the subjective preferences between choice options (Mengel, Tsakas & Vostroknutov, 2016). The goal of reinforcement learning strategies is simple: to use current data and prior knowledge to determine the best action to take next. In order to know the appropriateness of an action, agents must receive feedback in the form of positive or negative reinforcements. Here, these take the form of positive points in the reward conditions and negative points in the punishment conditions. In the paradigm described here, the task contingencies, or possible outcomes, are unknown and must be learnt throughout the experiment through the use of rewards and punishments. Thus, a model-free reinforcement learning method is used, as the subjects rely on the rewards received and do not need to learn the state representations necessary for a model-based mechanism.

In order to investigate the impact of epistemic uncertainty on reinforcement learning, we adapted a previous simple reinforcement-learning task (Palminteri et al., 2015). The goal of the participant is to learn the outcomes of the stimuli (symbols) presented so as to maximize their rewards and minimize their punishments. The difference is that, instead of generating stimulus-outcome associations with a fixed probability, we manipulated their outcome-precision. We suspect that the manipulation of outcome-precision will cause individuals to conform to a subjective mean-variance tradeoff. Their experience-based choices will work to optimize these mean-variance preferences, which likely vary between individuals. For example, some subjects may be very variance averse and thus sacrifice the potential for a higher reward so as to keep the associated precision as high as possible.


Symbols in the learning phase were presented in fixed pairs, each symbol with a set of pre-defined and independent numerical outcomes. One symbol within each pair represented a distribution with a higher expected value (EV). The better option can have a high EV and high precision (congruent) or a high EV and low precision (incongruent). Likewise, the worse option can have a low EV and high precision (incongruent) or a low EV and low precision (congruent). Model-free learning, through trial-and-error, will work to extract the EVs associated with each symbol so as to identify the most advantageous symbol to select. The outcomes of each stimulus will be drawn from predefined distributions, such that learning these distributions reveals the better symbol (higher EV).

Previous investigations indicate that individuals perform equally well in reward and punishment contexts during binary choice tasks (Palminteri et al., 2015; Pessiglione et al., 2006). Given that there appear to be distinct neural signatures for reward- and punishment-learning (Palminteri et al., 2012), and dissimilarities in behavioural signatures such as their value and weighting functions (Kahneman and Tversky, 1979), it is not unreasonable to assume that the manipulation of outcome-precision will impact the two domains in different ways. A discrepancy between performance in the reward and punishment conditions would be in contrast to normative theories such as Expected Utility Theory (EUT), which would exclude an effect of framing (Bernoulli, 1738; Von Neumann and Morgenstern, 1944). Whether people make normative choices under the uncertainty introduced here is our key research question.

Aims and Objectives

The aim of this research was to elucidate the influence of outcome-precision on reinforcement learning in a simple behavioural task. We therefore manipulated the outcome-precision of the distributions paired with abstract symbols over the course of an instrumental task. By implementing two contexts (gain and loss), the effects of outcome-precision on learning can be analyzed separately in a reward-learning and a punishment-learning context. Behavioural results will be used to indicate whether this type of uncertainty influences reinforcement learning in general, and further analysis can reveal its interaction with valence. A subsequent preference task will indicate to what degree value learning occurred in the learning task.

The null hypothesis (H0) for performance in the learning task assumes no influence of valence or congruency independently, nor of their interaction.


Computationally, as a normative solution, we should assume neutrality towards precision and skewness. On the basis of standard reinforcement learning algorithms, we would predict performance to be the same, simply due to the overlap of statistical moments in the congruent and incongruent conditions. That is, the differences in EV, skewness and variance of the prospects within each pair are equivalent, regardless of the congruency. The computation of the difference in EVs therefore arrives at the same answer. This would entail that the learning of the most advantageous symbol would occur at a similar level in both conditions, and differences in performance between the congruencies would therefore be the result of individual biases. This is likely not true for the valences: the processes underlying reward learning and punishment learning have recently been proposed to be encoded through separate neural networks (Palminteri and Pessiglione, 2017). Therefore, differences in performance may be expected across valences, but within a given valence, outcome-precision should not, normatively, induce differences.

Of course, individual biases are innate and likely have a large impact on decision-making and learning. With respect to congruency, we therefore propose two contrasting theories of its influence. Our first hypothesis (H1) rests on the notion of variance aversion; we predict that the congruent conditions will display a greater degree of learning in both valences. As the better option in the congruent condition has less variance, and thus more outcome-precision, we expect the correct policy to be learnt more easily. An alternative hypothesis (H2) is that people will exhibit valence-dependent risk-preferences; valence and congruency may interact in a way that produces contrasting results between the reward and punishment domains. The differences in risk-averse and risk-seeking behaviour between valences may shift preferences for the intrinsic moments defining the outcome distributions. A graphical illustration of the results we would expect with regard to these hypotheses is presented below (figure 2). We hypothesise that the ability to discriminate between better and worse prospects in the preference task will reflect performance in the learning task. That is, greater discrimination will occur in conditions where greater learning took place.


In addition to a measure of performance in the learning task, reaction times (RT) were also recorded. Previous work has indicated differences in RT between reward and punishment conditions (Guitart-Masip et al., 2011; Wright et al., 2012). Because losses cause greater psychological arousal than gains, it is thought that we naturally attend more to punishments, so as to minimize the likelihood of receiving one. Although RT can rarely be used as a predictor of performance, it can indicate differences in computational processes. Due to the contrast between EV and outcome-precision in the incongruent conditions, we predict longer RTs in these conditions. Participants may have to allocate more cognitive resources to learning the policy associated with the incongruent conditions, both in the gain and loss domains.

Figure 2. Illustration describing the learning curves expected as a result of each of the three hypotheses. The images display the proposed average percentage correct across all trials of each condition. H0 – There will be no difference in learning between conditions. H1 – Congruency will largely influence learning, with greatest performance in the congruent conditions. H2 – There will be an interaction between valence and congruency, producing a marked difference between conditions.


Materials and Methods

This study was approved by the Center for Research in Experimental Economics and Political Decision Making (CREED) department at the University of Amsterdam (UvA). Healthy subjects (n = 18; male/female = 4/14) performed a probabilistic reinforcement-learning task comprising both reward and punishment domains. The 2x2 factorial design presented was adapted from previous instrumental learning paradigms (Palminteri et al., 2012, 2015; Pessiglione et al., 2006). The subjects received an instruction sheet (see supplementary data) and a practice task of 24 trials so as to demonstrate the paradigm of task 1. The study was divided into three experimental tasks: learning, preferences and monetary lotteries, as visualized in figure 3. Tasks 1 and 2 were divided into 3 sessions, each composed of 120 or 56 trials, respectively. The data from the monetary preference task (task 3) will not be discussed in this paper, as irregularities in the behavioural results meant it is not relevant to this work.

Figure 3. Schematic overview of the experimental setup. Participants first received an instruction sheet with explanations and a description of the tasks. Task 1 was preceded by a short practice task to aid in the understanding of the learning task. The learning (task 1) and preference (task 2) tasks were run three times (3 sessions each). Lastly, subjects performed a monetary lottery task (task 3).


Learning task

Per trial, subjects were required to choose between two abstract symbols, one of which represented a distribution with a higher expected value (EV); objectively, then, this symbol is more advantageous to the participant. These pairs were presented as stable choice contexts, with each pair being displayed 30 times over the course of each session. Participants received feedback on their choices, with both factual and counterfactual information being displayed. Each session of the learning task presented 4 symbol pairs, representing the 4 conditions of the experiment. Within each pair, the underlying outcome distributions vary with respect to EV, skewness and precision, where outcome-precision is the main component of interest.

An important notion in this work is that of congruency. Congruency refers to the correspondence between the expected value of a symbol and its outcome-precision. That is, a prospect with a higher expected value that also has greater outcome-precision than its associate distribution can be said to be congruent. Likewise, a prospect with a lower expected value but higher outcome-precision than its associate distribution would be described as incongruent. Congruency therefore does not refer to a single symbol, nor does it indicate the more advantageous symbol within a pair; it simply describes the relationship between the two distributions of a pair. Both a congruent and an incongruent condition arise in each of the valences. Table 1 illustrates this setup, and the numerical distributions of each symbol are displayed in figure 4. The four conditions described in table 1 are: congruent pair – gain domain (Con+), incongruent pair – gain domain (Inc+), congruent pair – loss domain (Con-) and incongruent pair – loss domain (Inc-).

The ranges of the distributions represented by the symbols are consistent, spanning the values 3 to 8 for the reward conditions and -3 to -8 for the punishment conditions. Ranges were kept consistent so as to remove the confound of extreme values.

                      Low outcome-precision    High outcome-precision
Low expected value    Congruent                Incongruent
High expected value   Incongruent              Congruent

Table 1. Make-up of the experimental conditions. Each condition, congruent or incongruent, has one symbol with a low EV and one symbol with a high EV. Congruent and incongruent conditions arise in both the reward and punishment contexts.


Each session was presented in two blocks of 60 trials, each block consisting of either congruent pairs or incongruent pairs, one from each valence. Two pairs therefore gave the opportunity for gaining points within the task, while the other two pairs gave the opportunity for losing points. The pairs presented in each valence are strictly equivalent, allowing for accurate comparison between performances.

Figure 4. Graphical depictions of the outcome distributions associated with the symbol pairs (panels a–d). Blue represents the flat associate distribution of each pair. Green signifies the reward conditions, red the punishment conditions. Each outcome of the flat distributions occurs with a probability of 16.67%. The outcomes of the skewed distributions within the congruent conditions (a and c) occur with probabilities of 6.7%, 10.0%, 16.7%, 26.7%, 33.3% and 6.7%, respectively. The outcomes of the skewed distributions within the incongruent conditions (b and d) occur with probabilities of 6.7%, 33.3%, 26.7%, 16.7%, 10.0% and 6.7%, respectively.



Symbol  Type    EV     Skewness  Precision
Con+    Flat    5.5    0.00      0.3314
Con+    Skewed  5.9    -0.59     0.5503
Inc+    Flat    5.5    0.00      0.3314
Inc+    Skewed  5.1    0.59      0.5503
Con-    Flat    -5.5   0.00      0.3314
Con-    Skewed  -5.1   -0.59     0.5503
Inc-    Flat    -5.5   0.00      0.3314
Inc-    Skewed  -5.9   0.59      0.5503

Table 2. Description of the statistical moments defining the probability distributions of each symbol. Each condition is made up of one flat distribution and one skewed distribution with a difference in expected value of 0.4 points.

Distribution pools for each condition

Symbol  Type    Points:   3    4    5    6    7    8
Con+    Flat              5    5    5    5    5    5
Con+    Skewed            2    3    5    8    10   2
Inc+    Flat              5    5    5    5    5    5
Inc+    Skewed            2    10   8    5    3    2

Symbol  Type    Points:  -3   -4   -5   -6   -7   -8
Con-    Flat              5    5    5    5    5    5
Con-    Skewed            2    10   8    5    3    2
Inc-    Flat              5    5    5    5    5    5
Inc-    Skewed            2    3    5    8    10   2

Table 3. Frequency of values in the outcome pools of each symbol. The numerical values in the header rows indicate the range of each symbol's distribution. Each row gives the frequency with which each value occurred in the outcome pool of that symbol. These pools are made up of 30 outcomes overall, each specific to the symbol presented.
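As a cross-check, the moments reported in Table 2 can be recomputed directly from the outcome pools in Table 3. The short sketch below does this for the Con+ pair, assuming that precision was taken as the inverse of the unbiased sample variance of each 30-outcome pool (an assumption that reproduces the reported values).

```python
import numpy as np
from scipy.stats import skew

# Rebuild the Con+ outcome pools from the frequencies in Table 3
values = np.array([3, 4, 5, 6, 7, 8])
flat_pool = np.repeat(values, [5, 5, 5, 5, 5, 5])      # flat symbol, 30 outcomes
skewed_pool = np.repeat(values, [2, 3, 5, 8, 10, 2])   # skewed symbol, 30 outcomes

for name, pool in [("Con+ flat", flat_pool), ("Con+ skewed", skewed_pool)]:
    ev = pool.mean()                       # expected value
    precision = 1.0 / pool.var(ddof=1)     # precision as inverse of the sample variance
    print(name, round(ev, 2), round(skew(pool), 2), round(precision, 4))
# Con+ flat   5.5  0.0   0.3314
# Con+ skewed 5.9 -0.59  0.5503
```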


The procedure of the learning task is displayed in figure 5, showing examples from both the reward and punishment conditions. Regret theory suggests that counterfactual information aids memory formation by adding a strong emotional component, in the form of regret, to the circumstance (Loomes & Sugden, 1982). Upon each choice, 3 factual and 3 counterfactual outcomes are presented. Each outcome presented is drawn at random from the appropriate outcome pool describing the displayed symbols (see table 3); statistical descriptions of the distributions can be found in table 2. By increasing sampling in this way, we predict that learning will be augmented, as more information can be integrated into the participants' knowledge of the distributions defining the symbols. The use of such continuous distributions is also more realistic than the use of binary outcomes (Courville et al., 2006; Behrens et al., 2007; Palminteri et al., 2015). We believe the noisy distributions therefore confer greater ecological validity, as raw sensory stimuli in the real world are also unstructured and scattered with noise.


Figure 5. Experimental design of the learning task. The fixation point (500ms) is followed by the presentation of the symbol pairs. This phase is self-paced and participants were required to select the left or right option. Their choice is then displayed in a box for 1500ms (the right symbol selected in this example), after which the factual and counterfactual information is also displayed in boxes (3000ms), with the coloured box indicating the points that will be added to or deducted from their score. This process was repeated 120 times per session. As shown, symbol pairs can represent either positive or negative values.


Preference task

The preference task consisted of three sessions, each comprising 56 trials. In this task, participants were once again required to select one of two symbols displayed on the screen: the one they believed to be most advantageous. Each session presented the 8 symbols the subjects had learnt in the previous learning task, but this time in mixed pairs covering all 28 possible combinations. As this is a preference task, no feedback was given after each choice (neither factual nor counterfactual). All possible pairs were displayed twice per session and their positions (left or right) were pseudorandomized. Participants had to indicate the symbol that they believed to harbor the highest expected value, with correct answers indicating a confirmation of learning. That is, if participants were able to discriminate between two previously unpaired symbols and select the one with the higher EV, it would suggest they carried a representation of the EV of individual symbols. In this case, we would say that symbol values were learnt context-independently, as their EVs could be retrieved in the absence of the original symbol pairs. On the other hand, if participants scored well in the learning task but were unable to discriminate accurately between previously unpaired symbols in the preference task, we would assume a context-dependent learning mechanism, as this would suggest the original pairs (contexts) were needed for successful value retrieval. Our predictions of performance within this task are displayed below in figure 6.

Figure 6. Graphical hypotheses for the preference task. Green depicts reward contexts, red depicts punishment contexts. The first symbol presented for each condition represents the one with the higher EV. H0 – The null hypothesis (left) suggests a context-dependent learning mechanism, where the only retrievable information without stable choice contexts is that of valence. As positive values are always more advantageous than negative values, they should be selected more by default, even if absolute values are not learnt. H1 – We hypothesize (right) that symbol values will be encoded in a context-independent fashion, allowing value retrieval in the preference task and discrimination of more and less advantageous symbols.
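The trial structure of the preference task described above follows directly from its constraints: 8 learnt symbols yield 28 unique pairs, each shown twice per session. A minimal sketch of how such a trial list could be constructed (the symbol labels and the randomization routine are illustrative, not the exact procedure used):

```python
from itertools import combinations
import random

# Illustrative symbol labels: one better and one worse member per learning-task pair
symbols = ["Con+_1", "Con+_2", "Inc+_3", "Inc+_4",
           "Con-_5", "Con-_6", "Inc-_7", "Inc-_8"]

pairs = list(combinations(symbols, 2))                       # 28 unique symbol combinations
trials = pairs * 2                                           # each pair displayed twice -> 56 trials
trials = [tuple(random.sample(pair, 2)) for pair in trials]  # randomize left/right position
random.shuffle(trials)                                       # randomize trial order

print(len(pairs), len(trials))  # 28 56, matching one session of the preference task
```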


Results

Participants of this study completed three components within the experiment: a learning task, a preference task and a monetary lottery task. As described in the methodology section, the learning task required subjects to use trial-and-error to learn the values of pairs of abstract symbols so as to maximize their scores. The preference tasks closely followed the learning tasks and required subjects to use the values of the symbols they had learned to select the most advantageous symbol in the absence of feedback.

Learning task

The implemented design comprised four conditions, formed by crossing our 2x2 factors: 2 levels of valence (gain/loss) and 2 levels of congruency (congruent/incongruent). The first aspect we examined was whether the subjects exhibited a significant level of learning, that is, whether overall performance was above chance level (0.5) in all conditions. Performance here was measured as the percentage of correct choices, defined by the fraction of responses in which the participants selected the symbol representing the distribution with the higher EV in each condition. The average correct choice rate indicated that performance in the conditions was indeed above chance level (P < 0.05) (see figure 7). In addition, consistent with previous findings, it would appear that subjects learnt equally well in the gain and loss domains (Palminteri et al., 2015; Pessiglione et al., 2006).

We then tested one of our central research questions: does valence impact preferences towards outcome-precision? In order to answer this question, we entered the trial-by-trial performance of individuals into a three-way repeated-measures analysis of variance (ANOVA). For this, the categorical factors of valence and congruency were used, as well as the continuous explanatory variable trial number. The ANOVA showed no significant influence of valence (F = 1.26, P > 0.2) or congruency (F = 0.21, P > 0.6) but, as expected, trial position appeared to be a significant predictor (F = 13.69, P < 0.01). The interaction between valence and congruency did not appear to be significant (F = 0.46, P > 0.5), but the combined interaction of valence, congruency and trial number was significant (F = 4.56, P < 0.05). This suggests that, in isolation, neither valence nor congruency contributes substantially to the dynamics of learning. However, it does indicate that an interaction between valence and congruency develops over the course of learning, across trials. This finding is not predicted by hypotheses H0 or H1, and thus favours our latter hypothesis, H2. Although performances in the four conditions do not display significant differences, the influence of congruency does appear to be asymmetric across valence, which is in line with our alternative hypothesis proposed earlier (H2). Hypothesis H1 predicted greater performance in the congruent conditions of both valences; this observation was not made upon analysis of the results. These results therefore suggest that the dynamics of reinforcement learning in this task were impacted by an interaction between outcome-precision and valence. The fact that an interaction arises between valence and congruency indicates that precision is integrated into the decision process, and that this integration is in some way influenced by the perception of reward or punishment.
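For reference, an analysis in the spirit of the three-way repeated-measures ANOVA reported above could be sketched as follows. The column names ("subject", "correct", "valence", "congruency", "trial") and the input file are assumptions about the data layout, and a mixed-effects model with a per-subject random intercept is used here as a stand-in for the repeated-measures structure rather than the exact analysis performed.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical long-format trial-by-trial data: one row per subject x trial
data = pd.read_csv("learning_task_trials.csv")

# Accuracy modelled as a function of valence, congruency and trial number,
# with a random intercept per subject to account for repeated measures
model = smf.mixedlm("correct ~ valence * congruency * trial",
                    data, groups=data["subject"])
result = model.fit()
print(result.summary())  # the valence:congruency:trial term corresponds to the reported interaction
```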

Figure 7. Graphical plot of performance in the learning task, separated by condition. The reward context can be seen on the left and the punishment context on the right. Performance at chance level is indicated by the dotted black line at 50%. Dots indicate mean % correct at each trial interval (averaged per 3 trials). Error bars indicate the standard error of the mean (sem) for each dot.


Condition   % correct in learning task   % increase in performance over all trials
Con+        0.59 ± 0.03*                 31.5
Inc+        0.55 ± 0.03                  9.3
Con-        0.60 ± 0.03**                7.4
Inc-        0.63 ± 0.03**                13.0

Table 4. Table of results for the learning task, presented per condition. The second column indicates overall performance per condition. The third column displays the increase in performance per condition over all trials. % correct data reported as mean ± sem. * P < 0.05, ** P < 0.01, t-test, comparing performance in individual conditions against chance level (50% performance).

In order to test which aspects of the task required greater attention, we turned to the analysis of reaction times (RT). To do this, we used a three-way repeated-measures ANOVA, with valence and congruency as categorical factors and trial number as an explanatory variable. The ANOVA showed a significant influence of valence (F = 7.07, P < 0.05) and congruency (F = 17.78, P < 0.001) on RT. RT in the loss domain was significantly longer than in the gain domain, which is consistent with previous work (Guitart-Masip et al., 2011; Wright et al., 2012). RT in the incongruent conditions also appeared to be longer than in the congruent conditions, consistent with the predictions made earlier (see figure 8). Due to the differences in performance between the reward and punishment conditions, it is problematic to extrapolate the differences in RT to account for instances of greater attention or computational dissimilarities.



Figure 8. Reaction times distributed according to condition. Average reaction times for reward and punishment for both the congruent and incongruent contexts. *P < 0.05, **P < 0.01 one sample t-test. ns: not significant.


Preference task

Results from the preference task were used to indicate the extent of learning in task 1. Greater performance, and therefore greater value retrieval, in this task would imply that greater learning had taken place. To test this, we examined the difference in selection rate between the symbols in each pair. As all symbols were displayed the same number of times, we could use their overall selection rate as an indicator of which symbols participants believed to be more advantageous. The results of this analysis are shown in figure 9. In conditions Con+ and Inc-, the symbol with the higher EV was selected significantly more often than the less advantageous symbol in the pair, thus indicating value retrieval. We then used a two-way repeated-measures ANOVA, with valence and congruency as categorical variables, to test whether either factor influenced performance in the preference task. Results indicated there was no significant effect of valence (F = 2.71, P > 0.1) or congruency (F = 1.13, P > 0.3) within this task. This uniformity across valence contexts in the preference task is in line with previous findings investigating similar decision-making paradigms (Palminteri et al., 2015).


Figure 9. Bar graph displaying the percentage of trials on which each symbol was selected, averaged over the 18 participants and the 3 sessions. Symbols 1 & 2 constitute the Con+ condition, symbols 3 & 4 the Inc+ condition, symbols 5 & 6 the Con- condition and symbols 7 & 8 the Inc- condition. Within each condition, the odd-numbered symbol depicts the better option within the pair. That is, with regard to the expected values of the symbols: 1>2, 3>4, 5>6 and 7>8. * p < 0.05, one sample t-test; NS, not significant.


Discussion

The purpose of this work was to elucidate the influence of epistemic uncertainty on reinforcement learning. In order to do so, subjects took part in a binary choice task in which they had to learn the values of abstract symbols over the course of the experiment. The outcomes of these symbols during the task were sampled from continuous distributions rather than being defined by discrete probabilities. We orthogonally manipulated the expected value and the precision of the probability distributions associated with each abstract symbol.

The null hypothesis, as a normative assumption, predicted uniform performance regardless of the outcome-precision and expected value associated with each condition. This is in contrast to our obtained results, which indicate that the relationship between expected value and precision does indeed influence performance. In the reward condition, when the better symbol (higher EV) had a more precise outcome-distribution, performance was greater than when the better symbol had less outcome-precision. The opposite was true for the punishment condition, where performance appeared greater when the better symbol had a less precise outcome-distribution. The fact that performance differed markedly within each valence indicates that precision does influence the dynamics of reinforcement learning. To reiterate an earlier point, from a normative and unbiased view, performance in the congruent and incongruent conditions is expected to be the same: the calculations required to discriminate the symbols use the same algorithm and arrive at the same differences in EV. The differences in performance signify that precision is incorporated into the learning of both reward and punishment. On top of this, we also provide evidence for the opposing nature of this influence between the reward and punishment conditions: when gaining points, precision appears to be favoured, whereas, when losing points, greater outcome-precision is not weighted as highly.

The results observed here are not consistent with the assumptions put forward by Expected Utility Theory (EUT). EUT would expect invariance between performances in the congruent conditions across valence, as it suggests framing does not have an effect on individual preferences. Here, we show an asymmetric effect of valence on congruency, indicating differences in the computational processes required for reward and punishment learning. Thus, normative EU theory is unable to accurately explain the results obtained. Prospect theory, on the other hand, a descriptive extension of EU theory, similarly describes a difference in behaviour between the reward and punishment domains.


In the reward condition, when the better symbol had a more precise outcome-distribution, variance-averse preferences may increase the probability of attending to the more precise prospect, which in this case is the more advantageous option. Conversely, in the punishment condition, when the better symbol had a less precise outcome-distribution, this risk-preference may shift to variance-seeking behaviour, which could increase the probability of attending to the less precise prospect, augmenting performance. These notions are speculative, however, and elucidation of the mechanisms involved will require a greater sample size and imaging techniques. In a similar study, Hertz et al. (2017) investigated the influence of outcome-variance in a two-armed bandit task. The use of continuous outcome distributions in studies such as these is more closely related to real-world instances, such as the examples given, than typical fixed-probability tasks. Of course, in reality, the extent to which uncertainty can be resolved is limited by the complexity of the system involved. Their 2x2 design incorporated the manipulation of expected value and outcome-variance, but not valence (only reward learning was tested). Within this task, participants had to learn, through trial-and-error, the values of the normal distributions depicted by two choice options on the screen. Performance when the better option was more precise appeared greater than when the worse option was more precise, paralleling the findings of this study. This serves as further evidence that precision, as well as expected value, is incorporated into the decision-process. As valence was not manipulated in their task, we cannot compare our observation of the asymmetric effect of precision on reward and punishment learning. In addition, a limitation arises because counterfactual feedback was not presented during their task: differences in the amount of information gained on each prospect may affect their results, due to discrepancies in how well the distributions of each option could be learnt. Displaying both factual and counterfactual feedback serves as a sampling control, so that there is the opportunity for both distributions to be learnt equally well. Therefore, in an attempt to augment learning, three values of factual and three values of counterfactual information were presented on each trial in this study; essentially, sampling was not dependent on the subjects' choices. These contextual influences have previously been noted to aid instrumental performance (Vlaev et al., 2011). In fact, counterfactual prediction errors have been reported, as well as the encoding of counterfactual reward-expectations (Boorman, Behrens & Rushworth, 2011). An interesting aim of future research will be the impact of the learning rate on these counterfactual expectations. Previous work has indicated that these systems do not have distinct learning rates and that both are modulated in the same way under equal information (Boorman, Behrens & Rushworth, 2011).


Instances of value-retrieval in the preference task correlated with the performance of individuals in the preceding learning task, as hypothesized. The preference task score was therefore used as a confirmation of learning of the symbol distributions. In addition to this, the ability to differentiate between previously unpaired symbols in the preference task strongly indicates that the values of the individual symbols were learnt. Hence, we can rule out the possibility of participants simply learning the policy of presented symbol pairs in the learning task.

In real-world settings, where decisions are more intricate and have many more variables, it is likely that individuals use a heuristic rather than an exhaustive strategy to solve problems. It would be cognitively demanding to use an optimal approach as a solution for continuous distributions; therefore, a more tractable method may be used in its place. As such, investigations into reinforcement learning using tasks involving stimulus-outcome associations are often modeled with variants of the Rescorla-Wagner model (Palminteri et al., 2015; Hertz et al., 2017; O'Doherty et al., 2003). This model is used as an algorithmic rule for how to update previous values or associations on the basis of prediction errors (Rescorla and Wagner, 1972; Sutton and Barto, 1998). In bandit tasks commonly used to assess decision-making, prediction errors (δ) can be formulated as the difference between the observed and expected value:

δ_t = R_t − V_t

where R_t represents the observed value at trial t, and V_t the expected value at trial t. Q-learning, a model-free reinforcement-learning algorithm, uses this rule to infer the values of the rewards obtained through specific actions, where Q simply describes the predicted value of a specific action. In its simplest form, Q-learning relies on the learning rate (α) and the prediction error (δ) to establish the weight with which new information is integrated into the Q-value (Dayan, Kakade and Montague, 2000). This can be expressed algorithmically as (Rescorla and Wagner, 1972):

Q_{t+1} = Q_t + α δ_t

In this view, iterations of the equation update the value estimate of a given action, facilitating learning. Factors that influence learning, such as uncertainty, are typically labeled as modulators of the learning rate, α (Behrens et al., 2007). As such, previous investigations into the impact of uncertainty on reinforcement learning have indicated this effect (Courville et al., 2006; Behrens et al., 2007). Moreover, this learning rate may vary due to differences in risk-preferences between individuals. If a risk-averse agent perceives a higher level of risk in a situation, their learning rate will be tailored to this outlook, thus decreasing learning. It would then follow that a decrease in some aspect of risk would increase the learning rate. Indeed, Hertz et al. (2017) found that decreasing the variance of both choice options increased the likelihood of choosing the better option, in line with this view.
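To make the role of α concrete in the context of the present task, the sketch below simulates Q-learning on the Con+ pair using the delta rule above, with outcomes drawn from the Table 3 pools and a softmax choice rule. The learning rate, the softmax temperature and the use of a single factual outcome per trial are simplifying assumptions for illustration (the actual task showed three factual and three counterfactual outcomes per trial, and no model fitting is reported in this thesis).

```python
import numpy as np

rng = np.random.default_rng(0)

# Outcome pools for the Con+ pair, rebuilt from Table 3
values = np.arange(3, 9)
pools = {
    "flat":   np.repeat(values, [5, 5, 5, 5, 5, 5]),   # EV 5.5, lower precision
    "skewed": np.repeat(values, [2, 3, 5, 8, 10, 2]),  # EV 5.9, higher precision (better option)
}
options = ["flat", "skewed"]

alpha, beta = 0.1, 0.5                 # learning rate and inverse temperature (assumed values)
Q = {"flat": 0.0, "skewed": 0.0}       # initial value estimates

choices = []
for t in range(30):                    # 30 presentations of the pair, as in one session
    q = np.array([Q[o] for o in options])
    p = np.exp(beta * q) / np.exp(beta * q).sum()      # softmax choice probabilities
    choice = options[rng.choice(len(options), p=p)]
    outcome = rng.choice(pools[choice])                # one factual outcome from the pool
    Q[choice] += alpha * (outcome - Q[choice])         # delta-rule update: Q_{t+1} = Q_t + alpha * PE
    choices.append(choice)

print(Q, choices.count("skewed") / len(choices))       # learnt values and correct-choice rate
```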

To what extent naturalistic instances of outcome-precision modulate learning will be a focus of future research. In addition, it will be interesting to see whether variance-averse or variance-seeking behaviour correlates in some manner with performance in tasks manipulating outcome-precision. We know that human behaviour strays from normative solutions such as EUT and mean-variance theory, but this deviation is not random; it is typically systematic and predictable (Camerer, 2000; Ellsberg, 1961; Allais, 1953). The risk-attitudes that individuals form are somewhat generalizable but can be influenced by a number of factors, including personality traits and environmental conditions; variance preferences are likewise unlikely to be static (Lopes, 1987; Weber and Kirsner, 1997; Hanoch et al., 2006). The neural encoding of EV and variance appears to be modulated by individuals' risk-preferences (Tobler et al., 2009): EV-encoding activity was reduced in risk-averse subjects in the presence of high variance, whereas the opposite was true for risk-seeking subjects. Knowledge of how individuals form preferences towards outcome-precision, and how this is integrated in the brain, may serve educational purposes in describing how deviations from normative solutions affect decision-making.


Concluding remarks

In light of the results found in this work, we suggest that individuals are both able to learn the outcome-precision of the distributions associated with abstract symbols and to form preferences towards this precision. In addition, it appears that these precision preferences are valence-dependent. These effects remain subtle in the choice task implemented in this study, however, and appear limited by the difficulty of the task: subjects sometimes had trouble learning the precision associated with the symbols in specific conditions. The asymmetry in performance relating to congruency between the valences suggests either a difference in the computational mechanisms employed by the reward and punishment learning systems or a dissimilarity in the preferences built across the domains. The task implemented here captures greater ecological validity of naturalistic decision-processes than typical fixed-probability tasks. Real-world instances of choice under uncertainty yield outcomes that are often continuous and complex; take trading on the financial market or simply selecting a restaurant, where continuous and variable outcomes are dominant. In summary, we believe this work serves as a proof of concept for the influence of outcome-precision on reinforcement learning.


Acknowledgements

The completion of this work could not have been possible without the guidance of the Center for Research in Experimental Economics and Political Decision Making (CREED). I would like to gratefully acknowledge Maël Lebreton, my supervisor, for his input and support throughout the project, and also my co-assessor Jan Engelmann.

References

1. Alderfer, C.P. (1970). Choices with risk: beyond the mean and variance. The Journal of business. 43: 341.

2. Allais, P.M. (1953). Le comportement de l'homme rationnel devant le risque: critique des postulats et axiomes de l'école américaine. Econometrica. 21: 503-546.

3. Behrens, T.E.J., Woolrich, M.W., Walton, M.E., and Rushworth, M.F.S. (2007). Learning the value of information in an uncertain world. Nat Neurosci. 10: 1214–1221.

4. Bernoulli, D. (1738). Exposition of a New Theory on the Measurement of Risk. Econometrica. The Econometric Society. 22(1): 22–36.

5. Bernoulli, D. (1738/1954). Exposition of a new theory on the measurement of risk. Translation published in Econometrica. 22: 23–36.

6. Boorman, E.D., Behrens, T.E., & Rushworth, M.F. (2011). Counterfactual Choice and Learning in a Neural Network Centered on Human Lateral Frontopolar Cortex. PLoS Biology. 9(6).

7. Camerer, C. (2000). Prospect theory in the wild. In: Kahneman, D., Tversky, A. (Eds.), Choice, Values, and Frames. Cambridge University Press. 288-300.

8. Christopoulos, G.I., Tobler, P.N., Bossaerts, P., Dolan, R.J., & Schultz, W. (2009). Neural correlates of value, risk, and risk aversion contributing to decision making under risk. J Neurosci. 29(40): 12574-83.

9. Courville, A.C., Daw, N.D., & Touretzky, D.S. (2006). Bayesian theories of conditioning in a changing world. Trends Cogn Sci. 10: 294–300.

10. Daw, N.D., & O’Doherty, J.P. (2013). Multiple systems for value learning. Neuroeconomics Decision Making. Brain.

11. Dayan, P., Kakade, S., & Montague, P.R. (2000). Learning and selective attention. Nature Neuroscience. 3: 1218-23.


12. Dodge, Y., & Marriott, F. H. C. (2003). The Oxford dictionary of statistical terms. Oxford, Oxford University Press.

13. Ellsberg, D. (1961). Risk, Ambiguity and the Savage Axioms. The Quarterly Journal of Economics. 75(4): 643-669.

14. Guitart-Masip, M., Fuentemilla, L., Bach, D.R., Huys, Q.J.M., Dayan, P., Dolan, R.J., & Duzel, E. (2011). Action dominates valence in anticipatory representations in the human striatum and dopaminergic midbrain. The Journal of Neuroscience: The Official Journal of the Society for Neuroscience. 31(21): 7867–7875.

15. Hanoch, Y., Johnson, J.G., & Wilke, A. (2006). Domain specificity in experimental measures and participant recruitment: an application to risk-taking behavior. Psychol Sci. 17: 300-304.

16. Hertz, U., Bahrami, B., & Keramati, M. (2017). Stochastic satisficing account of choice and confidence in uncertain value-based decisions. bioRxiv: 107532.

17. Iglesias, S., Mathys, C., Brodersen, K.H., Kasper, L., Piccirelli, M., denOuden, H.E.M., and Stephan, K.E. (2013). Hierarchical Prediction Errors in Midbrain and Basal Forebrain during Sensory Learning. Neuron. 80: 519–530.

18. Kahneman, D., & Tversky, A. (1979). Prospect Theory: An analysis of decision under risk. Econometrica. 47(2): 263-291.

19. Knight, F.H. (1921). Risk, Uncertainty and Profit. Sentry Press, New York.

20. Loomes, G., & Sugden, R. (1982). Regret Theory: An Alternative Theory of Rational Choice Under Uncertainty. The Economic Journal. 92(368): 805-824

21. Lopes, L.L. (1987). Between Hope and Fear: The Psychology of Risk. Advances in Experimental Social Psychology. 20: 255-295.

22. Markowitz, H. (1952). Portfolio Selection. The Journal of Finance. 7(1): 77-91.

23. Mengel, F., Tsakas, E., & Vostroknutov, A. (2016). Past experience of uncertainty affects risk aversion. Experimental Economics. 19(1): 151-176.

24. O'Doherty, J.P., Dayan, P., Schultz, J., Deichmann, R., Friston, K., & Dolan, R.J. (2004). Dissociable roles of ventral and dorsal striatum in instrumental conditioning. Science. 304: 452.

25. Palminteri, S., Justo, D., Jauffret, C., Pavlicek, B., Dauta, A., Delmaire, C., Czernecki, V., Karachi, C., Capelle, L., Durr, A., et al. (2012). Critical Roles for Anterior Insula and Dorsal Striatum in Punishment-Based Avoidance Learning. Neuron. 76: 998–1009.


26. Palminteri, S., & Pessiglione, M. (2017). Opponent Brain Systems for Reward and Punishment Learning: Causal Evidence From Drug and Lesion Studies in Humans. Decision Neuroscience: An integrative approach, Chapter 23. 291-303.

27. Palminteri, S., Clair, A.H., Mallet, L., & Pessiglione, M. (2012). Similar improvement of reward and punishment learning by serotonin reuptake inhibitors in obsessive-compulsive disorder. Biol Psychiatry. 72(3): 244-50.

28. Palminteri, S., Khamassi, M., Joffily, M., & Coricelli, G. (2015). Contextual modulation of value signals in reward and punishment learning. Nat Commun. 6: 8096.

29. Paulsen, D.J., Platt, M.L., Huettel, S.A., & Brannon, E.M. (2012). From Risk-Seeking to Risk-Averse: The Development of Economic Risk Preference from Childhood to Adulthood. Frontiers in Psychology. 3: 313.

30. Pessiglione, M., Seymour, B., Flandin, G., Dolan, R.J., and Frith, C.D. (2006). Dopamine-dependent prediction errors underpin reward-seeking behaviour in humans. Nature. 442: 1042–1045.

31. Rangel, A., Camerer, C., & Montague, P.R. (2008). A framework for studying the neurobiology of value-based decision making. Nature Reviews Neuroscience. 9: 545–556.

32. Rescorla, R.A. and Wagner, A.R. (1972). A theory of Pavlovian conditioning: variations in the effectiveness of reinforcement and nonreinforcement. In: A.H. Black and W.F. Prokasy (eds), Classical Conditioning II: Current Research and Theory. 64–99.

33. Rushworth, M.F.S., and Behrens, T.E.J. (2008). Choice, uncertainty and value in prefrontal and cingulate cortex. Nat Neurosci. 11: 389–397.

34. Sutton, R.S., and Barto, A.G. (1998). Reinforcement learning: An introduction (MIT press Cambridge).

35. Symmonds, M., Wright, N.D., Bach, D.R., & Dolan, R.J. (2011). Deconstructing risk: Separable encoding of variance and skewness in the brain. Neuroimage. 58: 1139-1149.

36. Tobler, P.N., Christopoulos, G.I., O'Doherty, J.P., Dolan, R.J., & Schultz, W. (2009). Risk-dependent reward value signal in human prefrontal cortex. Proc. Natl. Acad. Sci. 106: 7185-7190.

37. Tobler, P.N., O'Doherty, J.P., Dolan, R.J. & Schultz, W. (2007). Reward value coding distinct from risk attitude-related uncertainty coding in human reward systems. J. Neurophysiol. 97: 1621–1632.

38. Tversky, A., and Kahneman, D. (1981). The framing of decisions and the psychology of choice. Science. 211(4481): 453–458.


39. Vlaev, I., Chater, N., Stewart, N., & Brown, G.D. (2011). Does the brain calculate value? Trends in Cognitive Science. 15(11): 546-54.

40. von Neumann, J., & Morgenstern, O. (1944). The Theory of Games and Economic Behavior. Princeton. NJ: Princeton University Press.

41. Weber, B.J., & Huettel, S.A. (2008). The neural substrates of probabilistic and intertemporal decision making. Brain Research. 1234: 104–115.

42. Weber, E. & Kirsner, B. (1997). Reasons for Rank-Dependent Utility Evaluation. Journal of Risk and Uncertainty. 14: 41.

43. Wright, N. D., Bahrami, B., Johnson, E., Di Malta, G., Rees, G., Frith, C. D., & Dolan, R. J. (2012). Testosterone disrupts human collaboration by increasing egocentric choices. Proceedings of the Royal Society B: Biological Sciences. 279(1736): 2275–2280.

44. Wright, N.D., Morris, L.S., Guitart-Masip, M., & Dolan, R.J. (2013). Manipulating the contribution of approach-avoidance to the perturbation of economic choice by valence. Frontiers in Neuroscience. 7: 228.
