
THE EFFECT OF ACUTE STRESS ON GOAL-DIRECTED AND HABITUAL ACTIONS IN MICE

Esther Visser1,2, Isabell Ehmer2,3 & Ingo Willuhn2,3

ABSTRACT

Different learning systems are thought to manage instrumental actions, with one controlling goal-directed actions and another controlling habitual actions. The flexible nature of goal-directed actions (outcome-dependent) allows adaptation to the ever-changing environment, while habitual actions (outcome-independent) reduce cognitive load. Cardinal features of habitual behaviour are its persistence after reward devaluation and insensitivity to reversal of action-outcome contingency. Interestingly, previous studies have shown that stress can promote a shift from goal-directed to habitual actions. Therefore, this study validated an animal model for habitual behaviour, the within-subject habit formation task, and subsequently investigated the effects of an acute stressor on goal-directed and habitual behaviour in this paradigm.

This study successfully replicated the within-subject habit formation task in mice, by showing that animals display goal-directed actions after short training on a random ratio (RR) schedule. In addition, we confirmed that animals use habitual action strategies after training on a random interval (RI) schedule, or extended training on an RR schedule. In contrast to previous studies, we found that breaking a habit, by reversal of action-outcome contingency, was faster in animals on an RI than on an RR reinforcement schedule. Furthermore, this study investigated the effect of a single acute forced swim test stressor on sensitivity to outcome devaluation. In contrast to our expectations, we found no effect of stress on performance in the outcome devaluation session, where instrumental action remained goal-directed. All in all, our study paves the way for investigation of neuronal substrates underlying goal-directed and habitual behaviour, in healthy and pathological neuronal circuits (e.g. obsessive compulsive disorder or addiction).

1Research Master Brain & Cognitive Sciences, track Behavioural Neuroscience, Institute for Interdisciplinary Studies, University of Amsterdam, The Netherlands
2Neuromodulation and Behaviour group, Netherlands Institute for Neuroscience, Royal Netherlands Academy of Arts and Sciences, Amsterdam, The Netherlands
3Department of Psychiatry, Academic Medical Center, Amsterdam, The Netherlands

Correspondence to: esther-visser@hotmail.com

KEYWORDS

Habits, Goal-directed behaviour, Reinforcement schedules, Devaluation, Omission, Stress, Forced Swim Test

INTRODUCTION

Throughout the day, we constantly make decisions by selecting certain actions to obtain desired outcomes. An action can be selected based on its consequences, for example refuelling your car with petrol to continue driving. This kind of goal-directed action is necessary for adaptive behaviour, but does require considerable cognitive capacity for monitoring of the response. After repeated rewarded repetition of the action, i.e. fuelling your car with petrol and being able to complete your journey, this action is automated and becomes habitual. Habitual actions are more efficient as these require less cognitive capacity, since they depend more on the history of reinforcement of an action and less on the expected outcome of the action at the moment of action selection1. However, when you borrow a diesel engine car, your petrol-fuelling habit should be inhibited and you should employ a goal-directed strategy to fill it with diesel. The ability to flexibly shift between goal-directed and habitual actions is crucial for proper decision making, but evidently difficult, as is illustrated by the 150,000 yearly misfuelling cases in the United Kingdom2. Also, aberrant shifting between goal-directed and habitual actions is hypothesised to underlie several psychopathologies, such as addiction3, obsessive compulsive disorder (OCD)4 and Tourette syndrome5.

Goal-directed and habitual actions can be modelled in rodents by using different reinforcement schedules6–9. Random Ratio (RR) schedules bias animals to employ goal-directed strategies, because of a strong relationship between action and outcome. In contrast, Random Interval (RI) schedules bias towards using habitual strategies, because of uncertainty in the action-outcome relationship. A hallmark of habitual behaviour is its persistence after devaluation of a reward. Studies have repeatedly shown that upon devaluation of a reward, animals on an RI schedule will continue responding (habitual actions), while animals on an RR schedule will decrease responding (goal-directed actions)10–12. Moreover, when an action has repeatedly been rewarded, animals will perform the action regardless of the outcome. Consequently, after extended training, animals on an RR schedule become insensitive to reward devaluation as well1,6,8. Another cardinal feature of habitual behaviour is its insensitivity to reversal of the action-outcome contingency6,13. This can be assessed by using an omission test, in which the previously rewarded action should be withheld in order to obtain a reward. Prior studies have shown that animals on an RR schedule can omit their responses more easily than animals on an RI schedule6,10,14, further supporting the notion that RR schedules bias animals towards goal-directed actions and RI towards habitual actions.

Insights into the neurobiological mechanisms and circuits underlying goal-directed and habitual actions point towards a pivotal role for basal ganglia and frontostriatal circuits6,12,14–18. More specifically, studies have shown that the dorsomedial striatum (DMS) is necessary for goal-directed action12,16, while the dorsolateral striatum (DLS) is necessary for habitual actions17,18. In vivo recordings of neurons in the DMS and orbitofrontal cortex (OFC) showed an increase in modulation during goal-directed actions12. In contrast, DLS neurons became more inhibited during goal-directed actions on an RR schedule, but more active during habitual actions on an RI schedule. The OFC is thought to have a role in balancing goal-directed and habitual actions, by conveying information regarding action value. Moreover, the shift from goal-directed to habitual actions is governed by a decrease in excitatory transmission of lateral OFC projection neurons12. It was shown that optogenetic activation of these OFC neurons increased goal-directed actions, while chemogenetic inactivation led to habitual actions. Recently, a study showed that habitual behaviour correlated with stronger DLS output to both the direct and indirect basal ganglia pathways14. Interestingly, they also found that during habitual behaviour the action-promoting direct-pathway striatal projection neurons (SPNs) fired before indirect-pathway SPNs. Habit suppression, on the other hand, correlated with a decreased direct pathway output. Taken together, these studies suggest that the shift from goal-directed to habitual behaviour corresponds with a change in OFC output, leading to a decrease in DMS and an increase in DLS involvement on a neuronal level.

Another factor that can influence the shift from goal-directed to habitual actions in both humans and animal models is stress19,20. These effects are thought to be mediated by glucocorticoid, catecholamine and noradrenergic activity, which bias instrumental actions towards habits19. It has been found that chronic stress can induce a shift to habitual actions, as it renders rats insensitive to changes in outcome value and action-outcome contingency21. This corresponds with changes found on a morphological neuronal level, as Dias-Ferreira et al. showed global hypertrophy of the DLS and shrinkage of the DMS21. These findings correspond to previous studies showing the involvement of the DLS in habitual and DMS in goal-directed actions, respectively. This study shows that chronic stress causes structural reorganisation of corticostriatal circuits, accompanied by a shift towards habitual actions and inability to execute actions based on their outcomes. More recently, a study investigated the effects of different acute stressors on goal-directed behaviour in rats22. They showed that a pharmacological (systemic administration of corticosterone and yohimbine) or a single restraint stressor did not influence goal-directedness, i.e. rats were sensitive to outcome devaluation. In contrast, multiple restraint stressors can lead to habitual actions, i.e. rats were insensitive to outcome devaluation.

The Forced Swim Test (FST) induces severe and acute physiological and psychological stress in rodents and can be used to assess stress coping strategies23,24. During the FST, animals experience an uncontrollable and unpredictable stressor25, for which an active (swimming) or passive (immobile floating) coping strategy can be used24. In contrast to restraint stress, the FST is a compound stressor26, as it encompasses a physiological exertion component and an aversive cold (wet) water component. An advantage of the FST over pharmacological stressors is that it elicits a natural activation of the glucocorticoid/noradrenergic system; other practical advantages are the length of the test, easy setup and reliable induction of acute stress27. Therefore, this study investigated the effect of an acute FST stressor on goal-directed and habitual actions.

In this report we first describe our attempt to replicate the behavioural findings of Gremel et al.11,12,15 by using the within-subject design habit formation task (Experiment 1). During training mice were subjected to both RR and RI reinforcement schedules in different contexts. Subsequently, goal-directed and habitual behaviour was assessed by early and late outcome devaluation tests and an omission test. We hypothesised that mice on an RR schedule, but not RI schedule, would be sensitive to outcome devaluation and reversal of action-outcome contingency after short training. Furthermore, after extended training we expected that mice would be insensitive to outcome devaluation on both the RR and RI schedule. Secondly, we have assessed the effect of a severe acute stressor induced by the reliable and well-established FST on sensitivity to outcome devaluation (Experiment 2). We hypothesised that stress would induce a shift towards habitual responding during early outcome devaluation.

ARTICLE INFO

Received: 21st of July 2016 36 EC 10047247


MATERIAL & METHODS

Animals

Male C57Bl6/J mice (Envigo, United Kingdom) arrived at 6 weeks of age and were kept on a 12 h reversed light-dark cycle. Upon arrival, animals were single-housed in plastic conventional cages (365 x 207 x 140 mm) with corncob bedding material and enriched with nesting material and a plastic handling tube. This housing procedure allowed visual, olfactory and auditory contact between animals. All animals were allowed to habituate to the facility for at least two weeks, with ad libitum access to food and water. In the subsequent two weeks, animals were handled in their plastic tube and weighed daily. After one week of handling, animals were food restricted to ~85% of their baseline free-feeding weight (minimal 85% weight: 19 g). The operant experimental procedures started one week later. All animals used in this study were maintained in accordance with the guidelines of the European Union Welfare Strategy and the study was approved by the Animal Experimentation Committee (DEC) of the Royal Netherlands Academy of Arts and Sciences (Amsterdam). After experimental procedures were finished, animals were transcardially perfused with 4% paraformaldehyde and brains were removed and stored for future studies.

Experiment 1: Validation of within-subject habit formation task

Making a Habit - Training

Mice (n=8 (Exp 1A) and n=12 (Exp 1B)) were consecutively trained twice daily in different contexts to nose poke to obtain a sucrose pellet reward (14 mg, Dustless Precision Pellets, Bio Serv). The contexts differed in visual, olfactory, tactile and circadian cues, while the reward, position of the poke hole and training order (RR/RI schedule training first) remained the same for each animal (Figure 1a). The position of the poke hole, context and training order was counterbalanced across animals. Training sessions were at least four hours apart, to promote distinction between the reinforcement schedules. A within-subject design was chosen to minimise effects of inter-subject variation. Performance in the operant boxes was recorded by Med-PC (Med-Associates Inc.), which registered the number of pokes, number of rewards, number of magazine entries and the duration of the session. Furthermore, an overhead infrared camera recorded behaviour in the operant chamber.

Figure 1 - Experimental setup. (a) Animals were exposed to one training session in the striped box (e.g. RR) and one training session in the Plexiglas sleek box (e.g. RI) daily. The position of the poke hole (grey), cue light (yellow) and magazine (blue) remained the same for each individual animal throughout the experiment. (b) Animals on an RR20 schedule receive a reward after an average of 20 pokes. Animals on an RI60 schedule receive a reward on average after the first poke after an interval of 60 s, if they poke once in every 6 s interval. (c) Timeline experimental setup. Group A (n=8) underwent an omission test before the second and third devaluation, Group B (n=12) underwent an omission test after the second devaluation, because we reasoned experiencing an omission test could influence the results of the second devaluation test.

The operant training started with a single magazine training (Mag training), in which the animal was habituated to the operant boxes and rewards (Figure 1c). During the magazine training, the nose poke hole was absent and the animal could obtain 15 rewards in 15 min on a random time schedule. Each training session commenced by turning on the house light and finished by turning it off. The training continued with three days on a continuous reinforcement (CRF) schedule, in which the animal learned to nose poke to obtain rewards. Each poke resulted in a reward, with the opportunity to earn 5, 15 and 30 rewards in each context over days. Then, mice underwent two days of RR10 and RI30 training, followed by four days of RR20 and RI60 training (Figure 1b). On the RRX schedule, each poke has a 1/X probability of being rewarded. On the RIY schedule, each first poke after a set time interval of Y/10 has a 0.1 chance of being rewarded. This means that on the RR10 schedule, a reward follows the 10th poke on average and on the RI30 schedule reward follows the first poke after on average 30 s, if the animal pokes once every 3 s. With a spaced stable pokerate of 20 pokes/min, this would lead to similar reward rates in both schedules (RR20 1/min, RI60 1/min). A training session ended after the maximum number of rewards (15) was obtained or 60 min had passed. Half an hour after the afternoon training session, animals were fed with regular grain pellets in their home cage to maintain body weight.
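The schedule rules described above can be illustrated with a small simulation. This is a minimal sketch and not the Med-PC program used in the study: the function names, the perfectly steady poke train and the choice to restart the RI interval at the qualifying poke are all assumptions made for illustration.

```python
import random


def steady_pokes(rate_per_min, minutes):
    """Poke times (s) for a hypothetical animal poking at a perfectly steady rate."""
    gap = 60.0 / rate_per_min
    return [(i + 1) * gap for i in range(int(rate_per_min * minutes))]


def simulate_rr(x, poke_times, rng):
    """RRx: every poke is rewarded with probability 1/x."""
    return sum(1 for _ in poke_times if rng.random() < 1.0 / x)


def simulate_ri(y, poke_times, rng, p_reward=0.1):
    """RIy: the first poke after each y/10 s interval is rewarded with p_reward."""
    interval = y / 10.0
    interval_start = 0.0
    rewards = 0
    for t in poke_times:
        if t - interval_start >= interval:
            if rng.random() < p_reward:
                rewards += 1
            # Assumption: the next interval starts at this qualifying poke.
            interval_start = t
    return rewards


if __name__ == "__main__":
    rng = random.Random(1)
    minutes = 600  # long simulated run, only to average out the randomness
    pokes = steady_pokes(20, minutes)
    print("RR20 rewards/min:", simulate_rr(20, pokes, rng) / minutes)
    print("RI60 rewards/min:", simulate_ri(60, pokes, rng) / minutes)
```

At 20 pokes/min both schedules should come out close to one reward per minute, in line with the reward rates stated in the text. The session limits (15 rewards or 60 min) are deliberately left out so that the long-run rate is visible.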

Outcome devaluation

The action strategy of an animal cannot be assessed during training, therefore we investigated whether behaviour was goal-directed or habitual in an outcome devaluation test. Animals were subjected to a pre-feeding session, in which they were placed in a new cage and received one hour of ad libitum access to sucrose pellets (task reward is devalued) or regular grain pellets (task reward remains valued) before entering the devaluation test. To avoid differences in satiation caused by variations in water intake, water bottles were absent during pre-feeding. The devaluation test consisted of two consecutive days, so that animals experienced both the valued and devalued state. During the devaluation probe test of 5 min in each context, all cues were similar to training, except that no reward was delivered. Animals were placed in the first training context (e.g. RR) for 5 min, and when finished immediately placed in the second training context (e.g. RI) for another 5 min. Order of context exposure was the same as during training, and the order of revaluation (valued or devalued state first) was counterbalanced between animals. The amount of food intake during pre-feeding is described in Supplementary Figure 1 (S1).

Breaking a habit – Omission

To assess the effects of reversal of action-outcome contingency, animals were subjected to an omission test. The omission test took place twice a day on two consecutive days, similar to training. In this test animals had to refrain from the previously rewarded action for a certain time interval, in order to obtain a reward. In Experiment 1A the timer interval was set at 20 s. However, this resulted in unreliable data as the interval was too short for animals to display behaviour other than eating and grooming (data not shown). Therefore, in Experiment 1B an interval of 40 s was chosen. Whenever an animal poked, the timer was reset to 40 s. The omission sessions lasted until an animal had received the maximum number of rewards (15/session) or 60 min had passed.
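The omission contingency can be sketched as a timer that every poke resets. The function below is a hypothetical illustration under stated assumptions, not the program run in the operant boxes: it assumes the timer starts at session onset and restarts immediately when a reward is delivered, neither of which is specified in the text.

```python
def omission_rewards(poke_times, interval=40.0, max_rewards=15, session_s=3600.0):
    """Return reward delivery times (s) for one omission session.

    A reward is delivered once `interval` seconds pass without a poke;
    every poke resets the timer. The session stops after `max_rewards`
    rewards or `session_s` seconds.
    """
    pokes = sorted(poke_times)
    rewards = []
    timer_start = 0.0  # assumption: timer starts at session onset
    i = 0
    while len(rewards) < max_rewards:
        due = timer_start + interval
        if due > session_s:
            break
        if i < len(pokes) and pokes[i] < due:
            # A poke before the timer runs out resets it.
            timer_start = pokes[i]
            i += 1
        else:
            rewards.append(due)
            timer_start = due  # assumption: timer restarts at reward delivery
    return rewards


if __name__ == "__main__":
    print(omission_rewards([])[:3])      # animal never pokes
    print(omission_rewards([35.0])[:3])  # a single poke delays the first reward
```

An animal that never pokes collects all 15 rewards within the first 10 min, whereas an animal that keeps poking more often than once per 40 s earns nothing for the whole hour.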

Video analysis Bonsai

To gain further insight into the action strategy of the animal during omission, we performed a region of interest video analysis during the omission test. The quality of the video recordings was insufficient to perform reliable tracking of the animal; still, a region of interest analysis was possible with these videos. Four out of six operant boxes were equipped with a low-resolution infrared camera (320x256 pixels, 25 fps). The video files were trimmed and then divided into five regions of interest (Figure 2), encompassing each quadrant and a magazine area. For the offline analysis of video material the open-source visual programming framework Bonsai was used28. This allowed investigation of the time the animal spent in the poking region, by determining the number of moving pixels in each region of interest.
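The idea of counting moving pixels per region of interest can be sketched in plain Python. This is not the Bonsai workflow used in the study (Bonsai is a visual programming framework); the frame representation, the difference threshold and the rule that the animal is assigned to the region with the most moving pixels are assumptions chosen for illustration.

```python
def moving_pixels(prev, curr, roi, threshold=20):
    """Count pixels inside roi whose grey value changed by more than threshold.

    Frames are 2D lists of grey values; roi is (x0, y0, x1, y1), end-exclusive.
    """
    x0, y0, x1, y1 = roi
    return sum(
        1
        for y in range(y0, y1)
        for x in range(x0, x1)
        if abs(curr[y][x] - prev[y][x]) > threshold
    )


def fraction_of_time_in(frames, rois, target, threshold=20):
    """Fraction of frame pairs in which `target` holds the most moving pixels."""
    hits = 0
    pairs = 0
    for prev, curr in zip(frames, frames[1:]):
        counts = {
            name: moving_pixels(prev, curr, roi, threshold)
            for name, roi in rois.items()
        }
        pairs += 1
        best = max(counts, key=counts.get)
        # Frame pairs without any movement are not assigned to a region.
        if counts[best] > 0 and best == target:
            hits += 1
    return hits / pairs if pairs else 0.0
```

With real recordings the frames would come from the 320x256 videos and the regions from Figure 2; here both are left to the caller.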

Figure 2 - Example of video analysis with Bonsai. (a) Orange boxes indicate the regions of interest. In the left frame (a) the animal is in the nose poke region, while in the right frame (b) the animal is in the magazine region.

Experiment 2: Influence of Forced Swim Test stressor on goal-directed behaviour

Making a Habit - Training

Animals (n=18) were trained similarly to Experiment 1, with one day of magazine training, three days of CRF training, followed by two days of RR10/RI30 and four days of RR20/RI60 training (Figure 3). The position of the poke hole, context and training order was counterbalanced across the stress and control group. Experimenters were blind to the conditions.

Outcome Devaluation

After four days of RR20/RI60 training animals were subjected to the early devaluation test. Since we hypothesised stress would induce a shift towards habitual behaviour already after short training, only the early devaluation session was assessed. Similar to Experiment 1, animals were pre-fed with sucrose pellets (devalued) or grain pellets (valued), but in contrast to previous experiments, animals were now subjected to a stress protocol (n=10) or waited 30 min in their home cage (n=8). After this, animals were subjected to the 5 min devaluation tests consecutively in both contexts (RR and RI), like in Experiment 1.

Forced Swim Test

Animals in the stress group were subjected to an acute stressor, the Forced Swim Test (FST)25,27,29,30. For this a large glass cylinder (30 x 15 cm) was filled three-quarters with water of ~30°C, so animals were not able to touch the bottom of the cylinder with their tails nor escape from the top. Before each test, the cylinder was cleaned and refilled. Animals were picked up gently by the tail and quickly placed in the water for 10 min and were closely monitored by two experimenters. When an animal went under the surface of the water for more than 2 s, it was rescued and allowed to dry, which finished the FST session. After the FST animals were picked up by the tail, dried with a paper towel and allowed to dry for 20 min in a new cage with paper towels under a heat lamp (250 W). The drying period was incorporated to avoid interference with poking behaviour due to extensive grooming in the operant boxes during the devaluation probe tests.

Figure 3 - Design Experiment 2. (a) Animals underwent a similar training as animals in Experiment 1. During the devaluation test after pre-feeding, 10 out of 18 animals were placed in the FST setup (b) for 10 min and were allowed to dry for 20 min under a heat lamp. Then, all animals were subjected to the devaluation tests (5 min) in the same sequence as during training.

Statistical analyses

Data are denoted as mean ± standard error of the mean (S.E.M.). Repeated measures ANOVAs were used to examine the effects of factors Training Day and Reinforcement Schedule on pokerate (pokes/min) during Training, and factors Value and Reinforcement Schedule during Outcome Devaluation (Exp 1). To examine the effect of stress (Exp 2), Condition was added as a between-subject factor. When the assumption for sphericity was violated, a Greenhouse-Geisser correction was performed. In case data were not normally distributed, a log-transformation was applied, or a non-parametric Wilcoxon Signed Rank test was used to detect differences31. Follow-up paired comparisons were Bonferroni corrected. An α-level of 0.05 was used for all analyses. All data were analysed using SPSS, Version 22.0 (International Business Machines Corporation. Released 2013. Armonk, NY: IBM Corp.).

RESULTS

Experiment 1: Validation of within-subject habit formation task

In this experiment we aimed to replicate previous studies11,12,15 by implementing the within-subject habit formation task in our lab. We hypothesised that after short training, animals on an RR schedule would be sensitive to outcome devaluation and action-outcome contingency reversal, while they would not on an RI schedule. After prolonged training, animals on an RR schedule were expected to be insensitive to outcome devaluation as well.

Making a Habit - Training

To ensure any differences between performance on different reinforcement schedules during outcome devaluation and omission testing are not due to differences in acquisition of the task, the training data were inspected. Since animals in Exp 1A and 1B underwent the same training regime up to the first outcome devaluation test, these data were pooled (n=20).

During training animals consistently obtained the maximum number of rewards on both reinforcement schedules. Notably, RR and RI during CRF training indicate the context that will later in training become RR or RI. During CRF training, no baseline differences were found between the RR and RI context, F(1,19)=0.387, p=0.541, ensuring no innate preference or aversion for either context. The pokerate in each session (except for the first CRF5 training) was normally distributed according to the Kolmogorov-Smirnov test (p’s>0.07). Analysis revealed an increase in pokerate over the training days, as there was a significant effect of Training Day, F(1.509, 28.668)=69.83, p<0.001, η²=0.963 (Greenhouse-Geisser corrected). Importantly, there was no difference in pokerate between the Reinforcement Schedules, F(1,19)=55.658, p=0.52, η²=0.022 (Figure 4a). With regard to the entries in the magazine, we found a significant effect of Training Day, F(3.465,41.584)=11.031, p<0.001, η²=0.479 (Greenhouse-Geisser corrected), with a decreasing number of head entries into the magazine over days. This indicates the animal learns it has to perform an action to obtain an outcome. There was no difference in magazine entry rate between the RR and RI Reinforcement Schedule, F(1,12)=0.578, p=0.462, η²=0.046 (Figure 4b).

Outcome Devaluation

Early Devaluation – 1A & 1B

To assess the action strategies of our mice, we performed an outcome devaluation test after six training sessions on an RR and RI schedule. Animals had ad libitum access to either sucrose pellets (devalued) or regular grain pellets (valued) for 1 h before the test. Animals did not have a significant preference for sucrose (1.08±0.07) or grain pellets (1.01±0.04), t(19)=0.89, p=0.38 (Figure S1g). Visual inspection of the pokerate data revealed a right skew, therefore the data were log-transformed. This transformation led to normally distributed data (Kolmogorov-Smirnov all p’s>0.13). During early outcome devaluation 1, we found a significant effect of Reinforcement Schedule, F(1,19)=8.681, p=0.008, η²=0.314, Value, F(1,19)=17.505, p=0.001, η²=0.480, and an interaction effect of Reinforcement Schedule x Value, F(1,19)=5.889, p=0.025, η²=0.237 (Figure 5a). Further investigation by pairwise comparisons (Bonferroni corrected) revealed a significant difference between the RR Valued condition and the RR Devalued condition (p<0.01). This indicates animals were sensitive to outcome devaluation, i.e. used a goal-directed action strategy, in the RR schedule context. Importantly, the same animals in the RI schedule context were not sensitive to outcome devaluation (p=0.202), i.e. used a habitual action strategy. These data also show that mice were able to shift between performing goal-directed and habitual actions in the RR and RI context, respectively.
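The effect sizes in the early devaluation test can be cross-checked against the F statistics. The sketch below assumes that the reported η² is partial eta squared, which the text does not state explicitly; under that assumption it equals F·df_effect / (F·df_effect + df_error). The function name is hypothetical.

```python
def partial_eta_squared(f_value, df_effect, df_error):
    """Partial eta squared recovered from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


if __name__ == "__main__":
    # F values and degrees of freedom as reported for the early devaluation test.
    reported = {
        "Reinforcement Schedule": (8.681, 1, 19),
        "Value": (17.505, 1, 19),
        "Schedule x Value": (5.889, 1, 19),
    }
    for name, (f_value, df1, df2) in reported.items():
        print(name, round(partial_eta_squared(f_value, df1, df2), 3))
```

Under this assumption the three early devaluation effects reproduce the reported values of 0.314, 0.480 and 0.237 to three decimals.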

Figure 4 - Pokerate and Magazine entry rate during training on RR / RI schedule. (a) The pokerate increases over training days and there is no difference in pokerate between the reinforcement schedules (RR vs. RI) during training. (b) The magazine entry rate, in contrast, decreases over time and there is no difference in magazine entry rate between the reinforcement schedules. Error bars display ± S.E.M.

Second and Third Devaluation – Experimental group 1A

For subsequent analyses, experimental group 1A and 1B were split, as they experienced a different training protocol before the second devaluation. Group 1A underwent an omission session before the second devaluation, which may have led to decreased pokerates in the second devaluation.

For the second devaluation session after extended training on an RR schedule, we expected that animals were insensitive to devaluation of the outcome. The pokerate data showed a right skew, therefore a log-transformation was applied, which resulted in normally distributed data (Kolmogorov-Smirnov all p’s>0.09). After extended training, we found no significant effect of Reinforcement Schedule, F(1,7)=0.503, p=0.501, η²=0.067, no significant effect of Value, F(1,7)=1.769, p=0.225, η²=0.202, and no interaction effect, F(1,7)=3.452, p=0.106, η²=0.330 (Figure 5b). These data demonstrate that after extended training, animals became insensitive to reward devaluation (i.e. used a habitual strategy) on both reinforcement schedules.

Before the third, and last, devaluation session animals were trained for another six sessions on the RR20/RI60 schedule. Animals showed a significant preference for the sucrose pellets during pre-feeding, t(7)=2.39, p=0.048 (Figure S1c). During this devaluation session, the pokerate declined and several animals refrained from poking (Figure 5c). Nevertheless, the pokerate data were normally distributed (Kolmogorov-Smirnov, all p’s>0.06) and there were no significant effects of Reinforcement Schedule, F(1,7)=1.578, p=0.249, η²=0.184, or Value, F(1,7)=4.070, p=0.083, η²=0.368. There was a significant interaction effect of Reinforcement Schedule x Value, F(1,7)=6.088, p=0.043, η²=0.465. Follow-up paired comparisons did not reveal significant differences between the Valued and Devalued state on either reinforcement schedule (all p’s>0.13). Nevertheless, these results should be interpreted with caution, because of the floor effect on pokerate.

Second Devaluation – Experimental group 1B

To investigate the effect of extensive training, without possible influences of an omission test, mice in group 1B (n=12) underwent six RR20 and RI60 retraining sessions after the early devaluation. Animals had a preference for sucrose pellets during pre-feeding, t(11)=3.62, p<0.01. Analysis of the data revealed a significant effect of Reinforcement Schedule, F(1,9)=8.258, p=0.018, η²=0.479, and Value, F(1,9)=20.867, p=0.001, η²=0.699, but no interaction, F(1,9)=2.749, p=0.132, η²=0.234 (Figure 5d). More specifically, we found a higher pokerate in the RR than RI schedule context, and in the valued versus devalued state. Further investigation by pairwise comparisons revealed significant differences between the RR Valued and RI Valued (p=0.005) and RI Devalued (p<0.001). In contrast, no significant differences were present between the RR Valued and RR Devalued (p=0.09) nor the RI Valued and RI Devalued (p=1.00). This suggests that after extended training, animals became insensitive to reward devaluation on the RR and RI schedule, pointing towards the use of a habitual action strategy.

Figure 5 - Devaluation tests. (a) Shows the pokerate of the animals at the first (early) devaluation test (n=20). Animals poke significantly more in the RR Valued than RR Devalued condition and are therefore sensitive to outcome devaluation in the RR schedule context (goal-directed action), but not in the RI Valued than RI Devalued condition, indicating insensitivity to outcome devaluation in the RI schedule context (habitual action). (b) Depicts the second (late) devaluation of group 1A (8 training sessions + omission after Devaluation 1) and analysis revealed no significant differences between conditions. This indicates that after extended training, animals become insensitive to outcome devaluation. (c) Illustrates the third (last) devaluation of experimental group 1A (6 training sessions after the Devaluation 2) where no sensitivity to outcome devaluation was detected on the RR nor the RI schedule, indicating habitual action. However, pokerates were very low. (d) Shows (late) Devaluation 2 of experimental group 1B (6 training sessions after the Devaluation 1) with no significant differences in pokerate between the valued and devalued state on the RR or RI schedule. Error bars ± S.E.M., *p<0.05.

Breaking a habit – Omission

To investigate the effect of reversal of action-outcome contingency, animals were subjected to an omission test. In this test animals had to refrain from nose poking for 40 s to obtain a reward. We expected that after extended training (two retraining sessions after the second devaluation) behaviour on both schedules would be habitual, and therefore we would not find differences in omission of poking behaviour. Data are presented as percentage of nose pokes omitted, relative to the last pre-omission training day.

The data indicated one animal that did not decrease its pokerate and was therefore excluded from further analysis (n=11). This animal may be considered compulsive, because of its persistence in repetitive behaviour, despite the negative consequence of not receiving a reward. Due to a ceiling effect on the second day (percentage pokes omitted 91.8% ±2.5), data were not normally distributed. Therefore, the non-parametric Wilcoxon signed-rank test was used to investigate differences between the two reinforcement schedules and two days of omission testing (Bonferroni corrected for multiple comparisons, α=0.0125). Animals omitted more pokes on the second than the first day in both the RR (p=0.003) and RI (p=0.003) schedule context, indicating they learned to withhold actions previously associated with reward (Figure 6a). This analysis showed no significant difference between RR and RI on Day 1 (p=0.05 n.s.), but unexpectedly revealed that on Day 2, animals in the RI schedule context omitted more pokes than when in the RR schedule context (p=0.004).
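The corrected threshold of α=0.0125 corresponds to dividing 0.05 by four comparisons. The sketch below applies that threshold to the p-values reported above; the grouping into four comparisons (Day 1 vs Day 2 within each schedule, and RR vs RI on each day) is our reading of the text, and the function and label names are hypothetical.

```python
def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after Bonferroni correction."""
    return alpha / n_comparisons


def significant(p_values, alpha=0.05):
    """Flag each comparison as significant at the Bonferroni-corrected level."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


if __name__ == "__main__":
    # p-values as reported for the omission test.
    omission_p = {
        "RR: Day 1 vs Day 2": 0.003,
        "RI: Day 1 vs Day 2": 0.003,
        "Day 1: RR vs RI": 0.05,
        "Day 2: RR vs RI": 0.004,
    }
    print(bonferroni_alpha(0.05, len(omission_p)))
    print(significant(omission_p))
```

With four comparisons, only the Day 1 difference between RR and RI (p=0.05) fails to pass the corrected threshold, matching the conclusions drawn in the text.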

To gain further insight into the action strategy of the animals, region-of-interest video analysis was performed in Bonsai. We determined the percentage of time spent in the poking region (n=6) (Figure 6b). After log-transformation, the data were normally distributed (Kolmogorov-Smirnov all p’s>0.06). In accordance with the pokerate data, we found a significant decrease in time spent in the poke hole region on Day 2 compared to Day 1 of omission, F(1,4)=28.659, p=0.006, η2=0.878. The video analysis did not detect a difference between RR and RI, F(1,4)=0.411, p=0.556, η2=0.093, nor an interaction effect, F(1,4)=0.000, p=0.997, η2<0.001. This video analysis underscores that animals learn to inhibit the action previously associated with reward and adds that animals actively avoid the poke hole region.
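The reported effect sizes can be cross-checked against the F statistics. The sketch below assumes that η2 denotes partial eta squared, which is not stated in the text but is consistent with the reported values; the function name is illustrative.

```python
def partial_eta_squared(f_value: float, df_effect: float, df_error: float) -> float:
    """Partial eta squared recovered from an F statistic and its degrees of freedom.

    Uses the identity eta_p^2 = (F * df_effect) / (F * df_effect + df_error).
    """
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics from the video analysis above.
print(round(partial_eta_squared(28.659, 1, 4), 3))  # Day effect: 0.878
print(round(partial_eta_squared(0.411, 1, 4), 3))   # Schedule effect: 0.093
```

The same identity also holds for tests with fractional (Greenhouse-Geisser corrected) degrees of freedom, since it depends only on F and the two df values.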

Experiment 2: Influence of Forced Swim Test stressor on goal-directed behaviour

In this experiment we investigated whether an acute stressor induced by the Forced Swim Test could promote a shift from goal-directed to habitual actions. To this end, ten out of eighteen animals underwent a 10 min Forced Swim Test in water of 30 °C before the outcome devaluation probe test.

Making a Habit - Training

To assess any baseline group differences, the pokerate and magazine entry rate during training were investigated. Visual inspection of the data showed they were right skewed; therefore, a log-transformation was applied. Similar to the previous groups, we found a significant increase in pokerate over days, F(2.762, 41.434)=4.00, p=0.016, η2=0.211 (Greenhouse-Geisser corrected) (Figure 7a). In contrast to the previous groups, this group showed a significant effect of Reinforcement Schedule, F(1,15)=6.148, p=0.026, η2=0.291. Since animals showed a higher pokerate in the RR than in the RI condition during training, the pokerates in the outcome devaluation test were normalised to the pokerate on the last training day. Furthermore, there was a significant Day x Reinforcement Schedule interaction effect, F(1.878, 28.172)=4.747, p=0.018, η2=0.240 (Greenhouse-Geisser corrected). Importantly, during training we found no significant differences between the stress and control group, F(1,15)=1.152, p=0.300, η2=0.071, nor any significant interactions of Condition with other factors. This indicates that there were no baseline differences between animals assigned to the stress and control group.
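The normalisation and log-transformation described above can be sketched as follows. This is a minimal illustration rather than the analysis script used in this study: the function names and example rates are hypothetical, and the use of the natural logarithm is an assumption, since the base of the log-transformation is not specified.

```python
import math


def normalise_to_baseline(test_rate: float, last_training_rate: float) -> float:
    """Express a test-session pokerate as a percentage of the last training day."""
    if last_training_rate <= 0:
        raise ValueError("last_training_rate must be positive")
    return 100.0 * test_rate / last_training_rate


def log_transform(values):
    """Natural-log transform to reduce right skew (requires positive values)."""
    values = list(values)
    if any(v <= 0 for v in values):
        raise ValueError("log-transformation requires strictly positive values")
    return [math.log(v) for v in values]


# Hypothetical pokerates (pokes/min) for one animal, for illustration only.
last_training = {"RR": 50.0, "RI": 40.0}
devaluation_test = {"RR": 25.0, "RI": 30.0}

relative = {
    schedule: normalise_to_baseline(devaluation_test[schedule], last_training[schedule])
    for schedule in last_training
}
print(relative)  # {'RR': 50.0, 'RI': 75.0}
print(log_transform(relative.values()))
```

Normalising first removes the baseline difference between the schedules, so that the subsequent comparison of valued and devalued states is not driven by the higher training pokerate in the RR context.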

With regard to the magazine entries during training, we found a similar pattern as in Experiment 1. We observed a significant effect of Day, F(2.464, 39.423)=4.363, p=0.014, η2=0.214 (Greenhouse-Geisser corrected): animals decreased the number of magazine entries over days. We did not find a significant effect of Reinforcement, F(1,16)=1.227, p=0.284, η2=0.071, no interactions and no difference between the stress and control group (Figure 7b).

Figure 6 - Omission test. (a) Shows the percentage of pokes omitted on the first and second day of omission testing, for both reinforcement schedules. There is a clear learning effect, as animals learn to withhold their responses. Unexpectedly, we found a significant increase in percentage of pokes omitted in the RI versus the RR schedule associated context. (b) Depicts the results of the Bonsai video analysis, and confirms the findings of (a). It shows a significant decrease in percentage of time spent in the poke hole region on the second day compared to the first day. Error bars display ± S.E.M., * p<0.05.


This shows that during training, animals learned to perform an action to obtain an outcome and that baseline magazine entry rate was similar in the stress group and the control group.

Outcome Devaluation

After the regular training procedure, the first (early) outcome devaluation test took place. Since we aimed to investigate an effect of acute stress on goal-directed behaviour, we subjected animals to the FST before the devaluation test. To avoid stress effects on appetite during pre-feeding, animals underwent the FST immediately after an hour of pre-feeding (see Figure 3a). One animal from the control group was excluded, as this animal had access to water during the pre-feeding session, while the other animals had not.

Because of the difference in pokerate during training between the reinforcement schedules, pokerates in the outcome devaluation were analysed as percentage of pokerate on the last training day. As the data were right skewed, a log-transformation was applied to comply with the assumption of normally distributed data (Kolmogorov-Smirnov all p’s>0.07). We found a significant effect of Value, F(1,15)=5.970, p=0.027, η2=0.285, showing animals had a higher relative pokerate in the valued than in the devalued condition. Furthermore, we found a significant effect of Reinforcement Schedule, F(1,15)=4.888, p=0.043, η2=0.246, and an interaction between Value and Reinforcement, F(1,15)=5.932, p=0.028, η2=0.283. This indicates animals have a higher relative pokerate in the RR than the RI schedule associated context. Further investigation revealed that on a group level (stress and control group combined) animals showed a higher poking rate in the RR Valued condition than in all others (all p’s<0.033), indicating goal-directed behaviour. Remarkably, we detected no significant effects of stress, F(1,15)=0.905, p=0.357, η2=0.057, nor any interaction effects of stress with other factors. This leads to the conclusion that the acute stress imposed by the FST did not induce a shift from goal-directed to habitual behaviour. Moreover, when analysing the groups separately, we found a significant effect of Reinforcement, F(1,9)=6.104, p=0.036, and Value, F(1,9)=6.592, p=0.03, only in the Stress group. This implies that the overall effect is carried by the effects in the Stress (and not control) group. Animals in the stress group had a higher pokerate in the Valued versus Devalued state and in the RR versus RI context. The lack of effect in the control group is noteworthy, and may be caused by insufficient group size, or the waiting period between the pre-feeding session and test (30 min). This may have caused an increased motivation to poke in the devalued condition, obscuring possible differences.

Figure 7 - Pokerate and magazine entry rate during training. (a) The pokerate increases over training days and there is a significant difference in pokerate between the reinforcement schedules (RR>RI) during training. Most importantly, there is no difference between the stress and control condition. (b) The magazine entry rate decreases over time and there is no difference in magazine entry rate between the reinforcement schedules, nor between the stress and control group. Error bars display ± S.E.M.

Figure 8 - Outcome Devaluation Experiment 2. (a) Demonstrates the pokerate of the whole group (irrespective of stress or control), which reveals sensitivity to outcome devaluation in the RR, but not in the RI, schedule associated context. This means the animals were using a goal-directed strategy, similar to Experiment 1. (b) Shows the pokerate for the groups separately, with a significant effect of Value and Reinforcement schedule in the Stress (but not control) group.


DISCUSSION

Validation of within-subject habit formation task

In this study we have successfully replicated the within-subject habit formation task (Experiment 1), since we show that animals on an RR reinforcement schedule are sensitive to outcome devaluation, i.e. display goal-directed actions, after short training. In accordance with previous studies, we demonstrate that after extensive training on an RR, or training on an RI reinforcement schedule, animals become insensitive to outcome devaluation, i.e. display habitual actions.

Another hallmark of habitual behaviour is relative insensitivity to action-outcome contingency reversal [6,13], which we measured with an omission test. In contrast to previously published studies, we found that animals on an RI reinforcement schedule omitted their responses more easily than animals on an RR reinforcement schedule. This finding might in part be explained by the fact that the animals experienced the omission test after seventeen RR/RI training sessions; previous studies have found that after extensive training, animals on an RR reinforcement schedule also become habitual, i.e. insensitive to action-outcome contingency reversal [14]. While this explanation describes the habitual action strategies of animals on an RR schedule, it does not explain why animals on an RI schedule ‘break a habit’ relatively easily (99.2% ±0.41 of responses omitted on Day 2). This result might be explained by the difference between RR and RI schedules with regard to the relationship between response and reward rates. While this relation is relatively strong in an RR schedule, an RI schedule imposes an uncertain relation between response and reward rate. One may speculate that this uncertainty on an RI schedule leads to easier omission of responses, compared with an RR schedule with a strong relationship between response and reward rate. This corresponds to previous studies in which response rates were found to be more resistant to extinction in the presence of cues associated with high reinforcement than low reinforcement [32]. In our study, the RR-associated context can be seen as a relatively high reinforcement context because of the high number of rewards earned per time unit compared to the RI-associated context. This theory may explain why animals are more resistant to extinction of learned behaviour in the RR than RI context.
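The difference in the response-reward rate relationship can be made concrete with a small calculation. The sketch below is illustrative only: it assumes an RR20 schedule in which, on average, one in twenty pokes is rewarded, an RI60 schedule in which a reward becomes available on average every 60 s and is collected by the next poke, and evenly spaced pokes. The function names are hypothetical.

```python
def rr_rewards_per_min(pokes_per_min: float, ratio: float = 20.0) -> float:
    """Expected reward rate on a random ratio schedule.

    Each poke is rewarded with probability 1/ratio, so the reward rate
    grows in direct proportion to the poke rate.
    """
    return pokes_per_min / ratio


def ri_rewards_per_min(pokes_per_min: float, mean_interval_s: float = 60.0) -> float:
    """Approximate reward rate on a random interval schedule.

    A reward becomes available after mean_interval_s on average and is
    collected by the next poke. With evenly spaced pokes (a simplifying
    assumption) the extra wait is half an inter-poke interval on average.
    """
    if pokes_per_min <= 0:
        return 0.0
    inter_poke_s = 60.0 / pokes_per_min
    return 60.0 / (mean_interval_s + inter_poke_s / 2.0)


for rate in (10, 20, 40, 80):
    print(f"{rate:>3} pokes/min -> RR20: {rr_rewards_per_min(rate):.2f}, "
          f"RI60: {ri_rewards_per_min(rate):.2f} rewards/min")
```

Under these assumptions, doubling the poke rate doubles the reward rate on RR20, whereas the RI60 reward rate stays just under one per minute however fast the animal pokes.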

To our knowledge, this is the first report of an omission test in a within-subject design. This means that in our study, animals underwent four omission sessions (Day 1 RR and RI, Day 2 RR and RI) compared to only two omission sessions in studies using a between-subject design (Day 1 RR or RI, Day 2 RR or RI). The decrease in responding over sessions revealed a strong carry-over effect of omission action strategies. This, combined with the strong ceiling effect on the second day, indicates this within-subject paradigm may be less suitable for omission testing in this manner.

The divergent results of sensitivity to outcome devaluation and sensitivity to action-outcome contingency reversal may hint towards different habit strengths imposed by the RR and RI training schedule. We have found support for the notion that a habit induced by extensive training on an RR schedule leads to more compulsive-like responding. After extensive training on an RR schedule, animals become insensitive to outcome devaluation and continue responding in the omission test, despite the negative consequence of not receiving a reward. On the other hand, we have found that habitual responding induced by uncertainty between action and outcome on an RI schedule leads to insensitivity to outcome devaluation, but not insensitivity to action-outcome contingency. Future studies may investigate this hypothesis by using a stronger negative consequence than absence of reward (like delivery of a foot-shock), to gain further understanding of whether this behaviour after extensive RR schedule training can indeed be perceived as compulsive.

Influence of Forced Swim Test stressor on goal-directed behaviour

Surprisingly, this study also revealed that an acute forced swim stressor did not induce a shift from goal-directed to habitual actions in this paradigm (Experiment 2). We did not detect a significant effect of stress on the pokerate during the outcome devaluation test. This absence of effect may be caused by the fact that animals were not intensely stressed by the FST. This may be explained by the relatively high water temperature (30 °C), which is less stressful than cold water [29,30]. Observation of the animals during and after the FST showed no obvious (stress-induced) abnormal behaviour. However, as we did not measure blood corticosterone or adrenocorticotrophic hormone levels, we cannot provide a conclusive answer as to whether or not the FST was sufficiently stressful. Another feature that may explain the absence of an effect of the FST is the strain of mice used in this study (C57Bl6/J). As this is a strain with relatively low stress-sensitivity [33,34], one acute stressor may have been insufficient to induce a shift from goal-directed to habitual behaviour. A previous study in rats also found that a single acute pharmacological or single restraint stressor did not alter goal-directed behaviour [22]; the compound stressor FST appears to have a similar effect. Therefore, it is recommended for future studies to investigate multiple acute stressors, for example a series of unpredicted foot shocks, to unravel how stress influences goal-directed behaviour in a within-subject habit formation task. As this is a quick acute stressor protocol, it would also decrease the waiting time for the control group, ensuring proper devaluation of the reward.


Additionally, a notable finding in Experiment 2 was the divergence of pokerate between animals on the RR and RI reinforcement schedule during training, which was absent in Experiment 1A and 1B. This may have been caused by a deviation from the procedure during CRF training, in which animals exhibited an increase in weight and were trained on different time points than groups in Experiment 1. The weight increase may have caused a decreased motivation to learn the task and the time schedule deviation may have hampered the distinction of action strategies between the two contexts. While there was no apparent difference in responding on the CRF30 schedule the day after, these abnormalities may have weakened initial learning and thereby the foundation of the task.

General

This habit formation task aims to provide a tool to distinguish goal-directed from habitual actions on the basis of sensitivity to outcome devaluation and reversal of action-outcome contingency. When a statistically significant difference arises between the pokerate in the valued and the devalued state, this is interpreted as sensitivity to outcome devaluation or goal-directed behaviour. However, in case of a non-significant difference between the valued and devalued state, this is construed as insensitivity to outcome devaluation or habitual behaviour. But a non-significant result does not mean the null hypothesis is true and should never be interpreted as there being no difference between the means [31]. This means the task can be used to reliably measure and statistically demonstrate goal-directed behaviour, but should be adapted in order to statistically demonstrate habitual behaviour.

In our study, we consistently find the highest pokerates in the RR Valued condition during outcome devaluation sessions, which we interpret as motivated, goal-directed behaviour. A possible contributing factor to the increase in response rates in this condition is the occurrence of extinction bouts. This is a phenomenon where animals increase their responding once the reward is not delivered when expected [35], as is the case in the outcome devaluation test. Based on their training history, animals expect to receive rewards faster on the RR than RI schedule. Since the outcome devaluation sessions only last 5 min, one would expect these extinction bouts predominantly in the RR schedule. While this may partly explain the difference between the reinforcement schedules, it does not explain the difference between the valued and devalued state.

One remarkable difference between our study and reports by other groups is the number of responses an animal makes. For example, on the last day of training before the first outcome devaluation test, our animals have an average action rate (responses/min) of 41.46 (±2.06), while other studies reported action rates of 7 [11], 10.5 [12] and 15 [9] in control groups. This also has implications for the action-reward rate relationship animals experience on the two reinforcement schedules. Animals on an RR20 schedule receive a reward after an average of 20 nose pokes, as each poke, irrespective of timing, has a probability of 0.05 of being rewarded. Therefore, animals on an RR schedule experience a strong action-reward rate relationship; the more nose pokes an animal makes, the faster it will receive a reward. With an average pokerate of over 40 pokes per minute, animals complete the RR schedule training after approximately 7.5 min. This is in stark contrast to animals on an RI60 schedule, where only the first poke after an interval of 6 s has a chance of 0.1 of being rewarded. On this schedule, animals experience a weak action-reward rate relationship and are unable to speed up reward delivery by adapting their behaviour. As such, it will take an animal on average 60 s to obtain a reward and 15 min to complete the RI schedule training. This means that animals on an RR schedule spend considerably less time to complete a training session and have more control over reward delivery compared to the RI schedule. The number of pokes an animal has to make in order to receive a reward is based on a lever press rate of 20 actions per minute, which would lead to one reward in the RR schedule and one reward in the RI schedule per minute. However, since our animals have a nose poke rate of 40 per minute, the reward rates diverge between the RR (on average 2 rewards per min) and RI (on average 1 reward per min) reinforcement schedule. It is therefore recommended for future studies to equate the RR and RI schedules on reward rate, so effects found in the devaluation session are determined by the reinforcement schedules and not by different reward rates.

This discrepancy between the action rates in our study and the aforementioned studies [9,11,12] may be caused by a difference in manipulandum, since the cited studies used a lever to measure instrumental action and we used a nose poke hole. Nose poking is seen as a more natural, species-specific response for rodents, while lever pressing is a learned skill [36]. In support of this, a recent study found a differential activation pattern of dopamine in the Nucleus Accumbens core and shell between responding for sucrose rewards on a lever versus a nose poke hole [36]. The latter was associated with a selective increase of dopamine in the shell, while lever pressing was associated with an increase of dopamine in the shell and core. These differential striatal dopamine patterns, caused by the manipulandum used for instrumental responding, should be taken into account in the design of future neuronal manipulation studies.

Furthermore, there were no upper or lower limits for the number of pokes programmed, which may result in a high number of pokes required before obtaining the reward. This may be especially problematic for the RI schedule, as animals experience fewer pairings of action and outcome per time period, compared to the RR schedule. Particularly in the beginning of RI30 and RI60 training, experiencing a weak relationship between responding and obtaining a reward may lead to incomprehension of the task, and thereby reduced responding. It is therefore recommended to set upper and lower boundaries on the number of pokes required before obtaining a reward, to induce regulated randomness and to homogenise training sessions inter- and intra-individually.

CONCLUSION

To summarise, this study has successfully replicated the within-subject habit formation task in mice. Thereby, it has paved the way for investigation of neuronal substrates underlying habitual behaviour, in healthy and pathological neuronal circuits, for example obsessive compulsive disorder, Tourette syndrome or addiction. In contrast to our expectations, we found no effect of Forced Swim Test induced stress on performance in the outcome devaluation session, where instrumental action remained goal-directed. It is therefore recommended to induce acute stress by multiple unexpected stressors, which may result in the intended shift from goal-directed to habitual behaviour.

ACKNOWLEDGEMENTS

The author thanks Isabell Ehmer, Ingo Willuhn and Harm Krugers for supervising this project, and Bastijn van den Boom, Nicole Yee and Marleen van der Meer for technical help and productive discussions. Thanks also to Kees Visser, Allard Haarman and Anna Lien Bouhuis for their helpful comments on the manuscript.

REFERENCES

1. Dickinson, A. Actions and Habits: The Development of Behavioural Autonomy. Philos. Trans. R. Soc. Lond. B. Biol. Sci. 308, 67–78 (1985).

2. The Guardian. Don’t be driven mad by misfuelling clauses. (2007). at <https://www.theguardian.com/money/2007/nov/04/cash1>

3. Everitt, B. J. & Robbins, T. W. Neural systems of reinforcement for drug addiction: from actions to habits to compulsion. Nat. Neurosci. 8, 1481–9 (2005).

4. Graybiel, A. M. & Rauch, S. L. Toward a Neurobiology of Obsessive Compulsive Disorder. Neuron 28, 343–347 (2000).

5. Marsh, R. et al. Habit Learning in Tourette Syndrome. Arch Gen Psychiatry 61, 1259–1268 (2004).

6. Yin, H. H. & Knowlton, B. The role of the basal ganglia in habit formation. Nat. Rev. Neurosci. 7, 464–76 (2006).

7. Rossi, M. A. & Yin, H. H. Methods for studying habitual behavior in mice. 1–13 (2013). doi:10.1002/0471142301.ns0829s60.Methods

8. Hilario, M. R. F. & Costa, R. M. High on habits. Front. Neurosci. 2, 208–217 (2008).

9. Hilario, M. R. F., Clouse, E., Yin, H. H. & Costa, R. M. Endocannabinoid signaling is critical for habit formation. Front. Integr. Neurosci. 1, 1–12 (2007).

10. Derusso, A. L. et al. Instrumental uncertainty as a determinant of behavior under interval schedules of reinforcement. Front. Integr. Neurosci. 4, 1–8 (2010).

11. […] goal-directed actions. Front. Comput. Neurosci. 7, 1–8 (2013).

12. Gremel, C. M. & Costa, R. M. Orbitofrontal and striatal circuits dynamically encode the shift between goal-directed and habitual actions. Nat. Commun. 4, 1–12 (2013).

13. Balleine, B. W. & Dickinson, A. Goal-directed instrumental action: Contingency and incentive learning and their cortical substrates. Neuropharmacology 37, 407–419 (1998).

14. O’Hare, J. K. et al. Pathway-Specific Striatal Substrates for Habitual Behavior. Neuron 89, 472–479 (2016).

15. Gremel, C. et al. Endocannabinoid modulation of orbitostriatal circuits gates habit formation. Neuron 90, 1312–1324 (2016).

16. Yin, H. H., Ostlund, S. B., Knowlton, B. J. & Balleine, B. W. The role of the dorsomedial striatum in instrumental conditioning. Eur. J. Neurosci. 22, 513–523 (2005).

17. Yin, H. H., Knowlton, B. J. & Balleine, B. W. Inactivation of dorsolateral striatum enhances sensitivity to changes in the action-outcome contingency in instrumental conditioning. Behav. Brain Res. 166, 189–196 (2006).

18. Yin, H. H., Knowlton, B. J. & Balleine, B. W. Lesions of dorsolateral striatum preserve outcome expectancy but disrupt habit formation in instrumental learning. Eur. J. Neurosci. 19, 181–189 (2004).

19. Schwabe, L. & Wolf, O. T. Stress-induced modulation of instrumental behavior: From goal-directed to habitual control of action. Behav. Brain Res. 219, 321–328 (2011).

20. Schwabe, L., Wolf, O. T. & Oitzl, M. S. Memory formation under stress: Quantity and quality. Neurosci. Biobehav. Rev. 34, 584–591 (2010).

21. Dias-Ferreira, E. et al. Chronic Stress Causes Frontostriatal Reorganization and Affects Decision-Making. Science 325, 621–625 (2009).

22. Braun, S. & Hauber, W. Acute stressor effects on goal-directed action in rats. Learn. Mem. 20, 700–9 (2013).

23. Het Europees Parlement en de Raad van de Europese Unie. 2010/63/EU Richtlijn betreffende de bescherming van dieren die voor wetenschappelijke doeleinden worden gebruikt. Publ. van Eur. Unie 2010, 33–79 (2010).

24. De Kloet, E. R. & Molendijk, M. L. Coping with the Forced Swim Stressor: Towards Understanding an Adaptive Mechanism. Neural Plast. 2016, (2016).

25. Koolhaas, J. M. et al. Stress revisited: A critical evaluation of the stress concept. Neurosci. Biobehav. Rev. 35, 1291–1301 (2011).

26. Ottenweller, J. E. in Encyclopedia of Stress (ed. Fink, G.) 202 (Academic Press, 2000).

27. Petit-Demouliere, B., Chenu, F. & Bourin, M. Forced swimming test in mice: A review of antidepressant activity. Psychopharmacology 177, 245–255 (2005).

28. Lopes, G. et al. Bonsai: an event-based framework for processing and controlling data streams. Front. Neuroinform. 9, 7 (2015).

29. Chen, L., Faas, G. C., Ferando, I. & Mody, I. Novel insights into the behavioral analysis of mice subjected to the forced-swim test. Transl. Psychiatry 5, e551 (2015).

30. Arai, I., Tsuyuki, Y., Shiomoto, H., Satoh, M. & Otomo, S. Decreased body temperature dependent appearance of behavioral despair in the forced swimming test in mice. Pharmacol. Res. 42, 171–6 (2000).

31. Field, A. Discovering Statistics using SPSS. (SAGE Publications Ltd, 2009).

32. Shull, R. L., Gaynor, S. T. & Grimes, J. A. Response rate viewed as engagement bouts: resistance to extinction. J. Exp. Anal. Behav. 77, 211–31 (2002).

33. Brinks, V., Mark, M. Van Der, Kloet, R. De & Oitzl, M. Emotion and cognition in high and low stress sensitive mouse strains: a combined neuroendocrine and behavioral study in BALB/c and C57BL/6J mice. Front. Behav. Neurosci. 1, 1–12 (2007).

34. Tannenbaum, B. & Anisman, H. Impact of chronic intermittent challenges in stressor-susceptible and resilient strains of mice. Biol. Psychiatry 53, 292–303 (2003).

35. Thompson, T. & Bloom, W. Aggressive behavior and extinction-induced response-rate increase. Psychon. Sci. 5, 335–336 (1966).

36. Bassareo, V., Cucca, F., Frau, R. & Di Chiara, G. Differential activation of accumbens shell and core dopamine by sucrose reinforcement with nose poking and with lever pressing. Behav. Brain Res. 294, 215–223 (2015).


SUPPLEMENTARY MATERIALS

In Supplementary Figure 1 (S1) the amount of food intake during the pre-feeding session is displayed for each of the groups. It shows some animals have a preference for grain pellets, while others prefer sucrose pellets. In the pre-feeding session before Devaluation 3 in group 1A (c), before Devaluation 1 in group 1B (d) and before Devaluation 2 in group 1B (e), there was a significantly higher sucrose intake than grain intake.

Figure S1 - Amount of food intake in grams during pre-feeding session of 1 h. In the pre-feeding sessions before Devaluation 1 and 2 in group 1A, (a) and (b), intake of sucrose and grain pellets did not differ significantly. Before Devaluation 3 in group 1A (c), before Devaluation 1 in group 1B (d) and before Devaluation 2 in group 1B (e), there was a significantly higher sucrose intake than grain intake. In Experiment 2 before Devaluation 1 (f) and in Experiment 1A and B (g), no differences in sucrose or grain intake were observed. Error bars display S.E.M., * p<0.05

These findings indicate a preference for the sucrose pellets in some groups, but this did not influence nose poking behaviour during the outcome devaluation probe test. Subsequent correlational analyses did not reveal a significant correlation between the pokerate and amount of food intake in groups with a significant preference for the sucrose pellets: Devaluation 3 - 1A (c):
